hey, I have a big list of documents and
Tungdepzai · last year
hey, I have a big list of documents and Im trying to do VectorStoreIndex.from_documents on it but the embeddings generation takes very long, how can I fix this, thanks
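For context, a minimal sketch of the pattern being described, assuming the legacy llama_index API (directory path and variable names are hypothetical):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Hypothetical setup: load a large document set and build the index.
# Embedding generation happens inside from_documents and is the slow step.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)
```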
Teemu · last year
Which embedding model are you using?
Tungdepzai · last year
I'm using OpenAI's ada
Teemu · last year
With default settings I tried it, and it took me 20 min for 75,000 pages of PDFs
Teemu · last year
Is yours taking a similar time?
Tungdepzai · last year
I'm using the paged CSV loader, which splits each row of my CSV into a document. For 100k+ documents the process is taking more than an hour.
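A rough sketch of that setup, assuming the LlamaHub PagedCSVReader loader fetched with the legacy download_loader helper (the CSV path is hypothetical):

```python
from pathlib import Path

from llama_index import VectorStoreIndex, download_loader

# The paged CSV loader emits one Document per CSV row, so a 100k-row file
# becomes 100k documents, each of which has to be embedded.
PagedCSVReader = download_loader("PagedCSVReader")
documents = PagedCSVReader().load_data(file=Path("./rows.csv"))  # hypothetical path

index = VectorStoreIndex.from_documents(documents, show_progress=True)
```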
Logan M · last year
Also increasing the batch size could help
https://gpt-index.readthedocs.io/en/stable/module_guides/models/embeddings.html#batch-size
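A minimal sketch of that suggestion, assuming the legacy ServiceContext-based API covered in the linked embeddings docs; the batch size value is just an example (OpenAI's embeddings endpoint accepts up to 2048 inputs per request):

```python
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings import OpenAIEmbedding

# Send more texts per embedding request than the small default batch;
# fewer round trips to the API usually means a much faster index build.
embed_model = OpenAIEmbedding(embed_batch_size=100)  # example value

service_context = ServiceContext.from_defaults(embed_model=embed_model)
index = VectorStoreIndex.from_documents(
    documents,  # e.g. the CSV-derived documents above
    service_context=service_context,
    show_progress=True,
)
```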