Updated last year

I am using JSONReader and ChromaDB to load a json file containing ~150,000 tweets and store the embeddings. The process has been running for about half an hour, but is only using ~5% of my CPU. Any suggestions on how to speed this up?
oh boy, that's a lot of tweets

The best bet is probably increasing the embedding batch size -- the default is 10

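from llama_index import ServiceContext, set_global_service_context
from llama_index.embeddings import OpenAIEmbedding

# Send 100 texts per embedding request instead of the default 10
embed_model = OpenAIEmbedding(embed_batch_size=100)
# from_defaults is a classmethod, so call it on the ServiceContext class
ctx = ServiceContext.from_defaults(embed_model=embed_model)
# Make this context the default for everything built afterwards
set_global_service_context(ctx)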
I think the max batch size is 2048? But not sure when the rate limiting will start lol
If you are using local embeddings, the process is similar, but you'll be bound by memory
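If rate limiting does kick in at larger batch sizes, one common workaround is to retry the failed batch with exponential backoff. This is only a rough sketch: `embed_with_backoff` and the `embed_batch` callable are hypothetical names for illustration, not part of llama_index or the OpenAI client, and a real version should catch the client's specific rate-limit exception rather than every `Exception`.

```python
import random
import time


def embed_with_backoff(embed_batch, texts, batch_size=100, max_retries=5,
                       base_delay=1.0, sleep=time.sleep):
    """Embed texts in batches, retrying a failed batch with exponential backoff.

    embed_batch is a hypothetical callable: list[str] -> list of vectors.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_batch(batch))
                break  # this batch succeeded, move on to the next one
            except Exception:  # assumption: narrow this to the rate-limit error
                if attempt == max_retries:
                    raise  # give up after the final retry
                # Wait base_delay * 2^attempt seconds, plus up to 0.5 s of jitter
                sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    return vectors
```

The `sleep` parameter is only there so the delay can be swapped out when trying the function without real waiting.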
That does seem to have sped things up, thank you!
(I switched to 2000 for the batch size)
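For a rough sense of why the batch size matters so much here, the number of embedding requests can be estimated directly. This is a back-of-the-envelope sketch that assumes each tweet becomes exactly one text to embed (chunking could produce more); `request_count` is just an illustrative helper.

```python
import math


def request_count(num_texts, batch_size):
    """Number of embedding requests needed to cover num_texts texts."""
    return math.ceil(num_texts / batch_size)


num_tweets = 150_000  # approximate figure from the question
for batch_size in (10, 100, 2000):
    print(f"batch size {batch_size}: {request_count(num_tweets, batch_size)} requests")
```

Under that assumption, going from the default of 10 to 2000 cuts the run from 15,000 requests to 75. With a hosted embedding API, most of the time is likely spent waiting on those round trips, which would also be consistent with the low CPU usage.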