
Updated 2 years ago

I am using JSONReader and ChromaDB to

At a glance

The community member is using JSONReader and ChromaDB to load a JSON file containing approximately 150,000 tweets and store the embeddings. The process is running slowly and only uses about 5% of the CPU. Another community member suggests increasing the embedding batch size from its default of 10. The original poster switched to a batch size of 2000, which sped up the process.

I am using JSONReader and ChromaDB to load a JSON file containing ~150,000 tweets and store the embeddings. The process has been running for about half an hour but is only using ~5% of my CPU. Any suggestions on how to speed this up?
oh boy, that's a lot of tweets

The best bet is probably increasing the embedding batch size -- the default is 10

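```python
from llama_index import ServiceContext, set_global_service_context
from llama_index.embeddings import OpenAIEmbedding

# Send 100 texts per embedding request instead of the default 10
embed_model = OpenAIEmbedding(embed_batch_size=100)
# from_defaults is called on the ServiceContext class
ctx = ServiceContext.from_defaults(embed_model=embed_model)
# Make indexes built after this point use the new embedding model
set_global_service_context(ctx)
```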
I think the max batch size is 2048? But not sure when the rate limiting will start lol
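A rough sketch of why the batch size matters so much here: each batch becomes one embedding request, so the number of requests is the tweet count divided by the batch size, rounded up. `num_requests` is a hypothetical helper (not part of LlamaIndex), and the 2048 per-request cap is only the guess from the comment above, treated as an assumption.

```python
import math


def num_requests(num_texts: int, batch_size: int, max_batch: int = 2048) -> int:
    """Hypothetical helper: embedding calls needed when texts are sent in batches.

    max_batch is an ASSUMED per-request cap (the thread guesses 2048).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    effective = min(batch_size, max_batch)  # requests can't exceed the cap
    return math.ceil(num_texts / effective)


for bs in (10, 100, 2000):
    print(bs, num_requests(150_000, bs))
# 10 15000
# 100 1500
# 2000 75
```

Going from the default of 10 to 2000 cuts roughly 15,000 round trips down to 75, which is consistent with the low CPU usage in the question: the time was spent waiting on network calls, not computing.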
If you are using local embeddings, the process is similar, but you'll be bound by memory
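For local embeddings, a minimal sketch of the batching loop, assuming a hypothetical `embed_batch` callable that stands in for a local model (`fake_embed_batch` below is a toy placeholder, not a real model). With a real model, each call holds one whole batch in memory, so a larger `batch_size` means fewer calls but a higher peak memory footprint. None of these names are LlamaIndex APIs.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[Vector]],
    batch_size: int,
) -> List[Vector]:
    """Embed texts one batch at a time with a (hypothetical) embed_batch function."""
    vectors: List[Vector] = []
    for batch in batched(texts, batch_size):
        # With a real local model, this whole batch is processed at once,
        # so peak memory grows with batch_size.
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed_batch(batch: Sequence[str]) -> List[Vector]:
    """Toy stand-in for a local model: maps each text to [its length]."""
    return [[float(len(t))] for t in batch]


tweets = ["first tweet", "second tweet", "third tweet"]
vectors = embed_all(tweets, fake_embed_batch, batch_size=2)
print(len(vectors))  # 3
```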
That does seem to have sped things up, thank you!
(I switched to 2000 for the batch size)