
Updated last year

Embedding

hey, I have a big list of documents and I'm trying to run VectorStoreIndex.from_documents on it, but the embedding generation takes very long. How can I fix this? thanks
8 comments
thanks imma try it out
Increasing the batch size to 2048 sped things up a little, but async does not work. For more context: I'm using Flask with ChromaDB, reading a CSV file with the paged CSV loader and inserting those documents into Chroma. For 100k documents it's still taking well over 40 minutes, any help is appreciated!!
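As a side note on why the batch size matters so much here: each embedding batch is one API round trip, so the batch size directly determines how many requests 100k rows turn into. A minimal sketch in plain Python (no llama_index dependency; the batched helper is illustrative, not the library's internal code, and the default batch size of 10 is an assumption):

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

docs = list(range(100_000))  # stand-in for 100k CSV rows

# assumed small default batch size of 10 -> 10,000 round trips
print(sum(1 for _ in batched(docs, 10)))    # 10000
# batch size 2048 -> 49 round trips
print(sum(1 for _ in batched(docs, 2048)))  # 49
```

Fewer round trips means less per-request overhead, but each request still runs sequentially unless async is also enabled.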
Did you pass use_async=True in VectorStoreIndex?
I think you'll have to create an instance of VectorStoreIndex with use_async=True.
Something like this

Plain Text
from llama_index import VectorStoreIndex
index = VectorStoreIndex(documents, use_async=True, storage_context=storage_context, show_progress=True)
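For intuition on why use_async helps: instead of waiting for each embedding batch to return before sending the next, the batches are dispatched concurrently, so total time is closer to one round trip than to the sum of all of them. A rough sketch of that idea with asyncio; embed_batch here is a hypothetical stand-in for the real embedding call, not llama_index API:

```python
import asyncio

async def embed_batch(batch):
    # hypothetical stand-in for one embedding API round trip
    await asyncio.sleep(0.01)  # simulated network latency
    return [[0.0] * 3 for _ in batch]  # fake 3-dim vectors

async def embed_all(batches):
    # dispatch all batches concurrently instead of one at a time
    return await asyncio.gather(*(embed_batch(b) for b in batches))

batches = [["doc"] * 100 for _ in range(20)]
vectors = asyncio.run(embed_all(batches))
print(len(vectors), len(vectors[0]))  # 20 100
```

With sequential calls this would take about 20 simulated round trips; with gather it takes roughly one, which is the effect use_async aims for against a real embedding endpoint.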
I specified the batch size as 2048 but I'm still getting AssertionError: The batch size should not be larger than 2048. The progress bar also changed to .../2, which I think means it's over 2048
Updating llama_index to a newer version got rid of AssertionError: The batch size should not be larger than 2048.
The embedding process is much faster now, thanks @WhiteFang_Jr @Logan M !