Updated last year

I am using JSONReader and ChromaDB to

I am using JSONReader and ChromaDB to load a json file containing ~150,000 tweets and store the embeddings. The process has been running for about half an hour, but is only using ~5% of my CPU. Any suggestions on how to speed this up?

5 comments

Logan M

oh boy, that's a lot of tweets

The best bet is probably increasing the embedding batch size -- the default is 10

from llama_index import ServiceContext, set_global_service_context
from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding(embed_batch_size=100)
ctx = ServiceContext.from_defaults(embed_model=embed_model)
set_global_service_context(ctx)

Logan M

I think the max batch size is 2048? But not sure when the rate limiting will start lol
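
For a rough sense of why the batch size matters, here is a small sketch that counts how many embedding requests ~150,000 tweets would need at a few batch sizes. It assumes one request per batch and one text per tweet (real chunking may produce more); `request_count` is a hypothetical helper for illustration, not part of llama_index.

```python
import math


def request_count(num_texts: int, batch_size: int) -> int:
    """Requests needed if each request carries up to batch_size texts (assumed)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(num_texts / batch_size)


NUM_TWEETS = 150_000  # approximate figure from the question

# Compare the default (10) with larger batch sizes mentioned in this thread.
for size in (10, 100, 2000, 2048):
    print(f"batch size {size:>4}: {request_count(NUM_TWEETS, size):>6} requests")
```

Fewer, larger requests cut the per-request overhead, but each one is bigger, so rate limits may kick in sooner.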

Logan M

If you are using local embeddings, the process is similar, but you'll be bound by memory

Seldo

That does seem to have sped things up, thank you!

Seldo

(I switched to 2000 for the batch size)
