
Updated last year

Embedding

At a glance

The community member is building a VectorStoreIndex from a large list of documents, but embedding generation is taking a long time. Suggestions include indexing with async, increasing the embedding batch size, and updating the llama_index library to a newer version, along with a code example for creating a VectorStoreIndex with use_async=True. Increasing the batch size to 2048 helped a little, but the process was still slow for 100k documents. The solution was updating llama_index, which resolved an AssertionError about the batch size and made the embedding process much faster.

Hey, I have a big list of documents and I'm trying to run VectorStoreIndex.from_documents on it, but the embeddings generation takes very long. How can I fix this? Thanks!
Attachment
b95eb106-dc19-4456-9cb6-5b78572aba43.png
8 comments
thanks imma try it out
Increasing the batch size to 2048 sped things up a little, but async does not work. For context: I'm using Flask with ChromaDB, reading a CSV file with the paged CSV loader, and inserting those documents into Chroma. For 100k documents it's still taking well over 40 minutes, any help is appreciated!!
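To see why the batch size matters here, it helps to picture what the embedding client does under the hood: it splits the document list into capped batches and makes one API call per batch. The sketch below is a plain-Python illustration of that pattern, not llama_index's actual internals; `MAX_BATCH_SIZE` and `chunked` are hypothetical names chosen for the example.

```python
# Hypothetical sketch of how an embedding client splits documents into
# capped batches before calling the embedding API. MAX_BATCH_SIZE and
# chunked() are illustrative names, not llama_index's real API.

MAX_BATCH_SIZE = 2048  # many embedding backends reject larger requests


def chunked(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    if batch_size > MAX_BATCH_SIZE:
        # Mirrors the error seen in this thread on older versions
        raise AssertionError(
            f"The batch size should not be larger than {MAX_BATCH_SIZE}"
        )
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


docs = [f"doc-{i}" for i in range(100_000)]
batches = list(chunked(docs, 2048))
print(len(batches))  # 49 batches for 100k docs at batch size 2048
```

With 100k documents at the 2048 cap you still need 49 sequential API round trips, which is why raising the batch size alone only helps a little; the bigger win comes from sending batches concurrently.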
Attachment
image.png
Did you pass use_async=True in VectorStoreIndex?
I think you'll have to create an instance of VectorStoreIndex with use_async=True.
Something like this:

Plain Text
from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents,
    use_async=True,
    storage_context=storage_context,
    show_progress=True,
)
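The reason use_async=True speeds things up is that embedding batches can be dispatched concurrently instead of one after another, so total wall time is closer to one network round trip than to one round trip per batch. Here is a minimal asyncio sketch of that pattern; `embed_batch` is a stand-in for a real async embedding call, not llama_index's internal function.

```python
import asyncio

# Illustrative sketch of why use_async helps: batches are sent
# concurrently with asyncio.gather rather than sequentially.
# embed_batch is a hypothetical stand-in for an async embedding call.


async def embed_batch(batch):
    await asyncio.sleep(0.01)  # simulate the latency of one API call
    return [f"vector-for-{doc}" for doc in batch]


async def embed_all(batches):
    # All batches are in flight at once, so total wall time is roughly
    # one round trip instead of len(batches) round trips.
    results = await asyncio.gather(*(embed_batch(b) for b in batches))
    # Flatten the per-batch results back into one list of vectors
    return [vec for batch_result in results for vec in batch_result]


batches = [["doc-a", "doc-b"], ["doc-c"]]
vectors = asyncio.run(embed_all(batches))
print(vectors)  # ['vector-for-doc-a', 'vector-for-doc-b', 'vector-for-doc-c']
```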
I specified the batch size to be 2048 but I'm still getting AssertionError: The batch size should not be larger than 2048. The progress bar also changed to .../2, which I think means it went over 2048.
Attachments
image.png
image.png
Updating llama_index to a newer version got rid of AssertionError: The batch size should not be larger than 2048.
The embedding process is much faster now, thanks @WhiteFang_Jr @Logan M!