hey, I have a big list of documents and
Tungdepzai · last year
hey, I have a big list of documents and Im trying to do VectorStoreIndex.from_documents on it but the embeddings generation takes very long, how can I fix this, thanks
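For context, a minimal sketch of the pattern being described, assuming the legacy llama_index API (directory path and variable names are hypothetical):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Hypothetical setup: load a large document set and build the index.
# Embedding generation happens inside from_documents and is the slow step.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)
```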
Teemu · last year
Which embedding model are you using?
Tungdepzai · last year
I'm using OpenAI's ada
Teemu · last year
With default settings I tried it, and it took me 20 min for 75,000 pages of PDFs
Teemu · last year
Is yours taking a similar time?
Tungdepzai · last year
I'm using the paged CSV loader, which splits each row of my CSV into a document. For 100k+ documents the process is taking more than an hour.
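A rough sketch of that setup, assuming the LlamaHub PagedCSVReader loader fetched with the legacy download_loader helper (the CSV path is hypothetical):

```python
from pathlib import Path

from llama_index import VectorStoreIndex, download_loader

# The paged CSV loader emits one Document per CSV row, so a 100k-row file
# becomes 100k documents, each of which has to be embedded.
PagedCSVReader = download_loader("PagedCSVReader")
documents = PagedCSVReader().load_data(file=Path("./rows.csv"))  # hypothetical path

index = VectorStoreIndex.from_documents(documents, show_progress=True)
```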
Logan M · last year
Also increasing the batch size could help
https://gpt-index.readthedocs.io/en/stable/module_guides/models/embeddings.html#batch-size
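A minimal sketch of that suggestion, assuming the legacy ServiceContext-based API covered in the linked embeddings docs; the batch size value is just an example (OpenAI's embeddings endpoint accepts up to 2048 inputs per request):

```python
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings import OpenAIEmbedding

# Send more texts per embedding request than the small default batch;
# fewer round trips to the API usually means a much faster index build.
embed_model = OpenAIEmbedding(embed_batch_size=100)  # example value

service_context = ServiceContext.from_defaults(embed_model=embed_model)
index = VectorStoreIndex.from_documents(
    documents,  # e.g. the CSV-derived documents above
    service_context=service_context,
    show_progress=True,
)
```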