save_to_string speed

Hey. I'm also using AWS Lambda, but apparently save_to_string takes a very long time to run, even on small text files. Any idea how things can be sped up?
Hmm, it's taking too long on lambda, or before lambda?
@Logan M On Lambda, but I also tried locally on my laptop. It can take more than 10 seconds for a 30kb text file that was indexed with chunk_size = 256.
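For reference, a minimal sketch of the kind of code under discussion, assuming the llama_index API from around this time (GPTSimpleVectorIndex with save_to_string; class and parameter names vary across versions, and chunk_size_limit may be chunk_size in yours):

```python
import time

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Load the small (~30kb) text file and build a vector index with small chunks.
documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex(documents, chunk_size_limit=256)

# Time the serialization step that's reported as slow.
start = time.perf_counter()
index_str = index.save_to_string()
print(f"save_to_string took {time.perf_counter() - start:.2f}s")
print(f"serialized size: {len(index_str):,} bytes")  # much larger than the source file
```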
Yea, if you have a vector index, it's generating 1536-dimensional vectors for each chunk, so saving this might take some time when you have a lot of chunks. Not sure how that can be sped up 🤔

If saving is going to be a common operation, it might be worthwhile to set up a third-party vector store (Pinecone, etc.). Then the vectors are never saved in the index.
When you save the index to disk, I would guess it's much larger than 30kb
I guess I'm getting a huge JSON when getting the embeddings from OpenAI. If using Pinecone, don't I need to somehow convert this JSON to a string before saving it in the DB? Where would the vectors be saved?
I'm pretty sure when using a store like Pinecone, it can send the raw numbers rather than converting them to strings (type conversion is probably what's slowing things down, if I had to guess). And this is all handled under the hood.

Maybe @jerryjliu0 can confirm this haha
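To make the type-conversion point concrete, here's a self-contained comparison. The chunk count and dimensions are illustrative, roughly what a 30kb file at chunk_size = 256 with 1536-dimensional embeddings would produce; JSON turns every float into a long decimal string, while a binary upsert ships a few bytes per value:

```python
import json
import random
import struct
import time

# ~120 chunks of 1536-dim embeddings: illustrative numbers only.
vectors = [[random.random() for _ in range(1536)] for _ in range(120)]

start = time.perf_counter()
as_json = json.dumps(vectors)  # each float becomes a ~17-character decimal string
json_time = time.perf_counter() - start

start = time.perf_counter()
as_binary = b"".join(struct.pack(f"{len(v)}f", *v) for v in vectors)
bin_time = time.perf_counter() - start

print(f"json:   {len(as_json):>10,} bytes in {json_time:.3f}s")
print(f"binary: {len(as_binary):>10,} bytes in {bin_time:.3f}s")
```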
@yoelk With Pinecone, you don't need to do save_to_disk or save_to_string; when you add documents to the Pinecone index, they're automatically stored in the Pinecone backend.
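A sketch of what that looks like, assuming the GPTPineconeIndex class from llama_index releases of this era and the pre-v3 pinecone client (pinecone.init / pinecone.Index); the API key, environment, and index name here are placeholders:

```python
import pinecone
from llama_index import GPTPineconeIndex, SimpleDirectoryReader

# Connect to an existing Pinecone index created with dimension=1536
# (to match OpenAI ada-002 embeddings). Placeholder credentials and names.
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
pinecone_index = pinecone.Index("quickstart")

documents = SimpleDirectoryReader("data").load_data()

# Embeddings are upserted into Pinecone as the documents are indexed,
# so there is no save_to_disk / save_to_string step afterwards.
index = GPTPineconeIndex(documents, pinecone_index=pinecone_index)

response = index.query("What is this document about?")
print(response)
```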
Thanks @jerryjliu0 and @Logan M !