Hi Team,

Issue: Preprocessing a ~50 MB txt file is taking a very long time.

Explanation of the issue:

In our preprocessing flow (the add-to-knowledge-base flow), we tried uploading a 58 MB txt file. The file was broken into ~80k chunks, which needed to be uploaded into our Pinecone vector store via the llama_index wrappers.

We are seeing that the storage_context.docstore.add_documents() call is taking a very long time to execute.
After that, GPTVectorStoreIndex(nodes, storage_context, service_context) also takes a very long time.
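For reference, here is a minimal sketch of the flow described above. It assumes the legacy llama_index API (GPTVectorStoreIndex, ServiceContext) and an existing Pinecone index; credentials, the index name, chunk settings, and the file path are placeholders, not our real values.

```python
import pinecone
from llama_index import (
    GPTVectorStoreIndex,
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
)
from llama_index.node_parser import SimpleNodeParser
from llama_index.vector_stores import PineconeVectorStore

# Placeholder credentials and index name.
pinecone.init(api_key="...", environment="...")
pinecone_index = pinecone.Index("knowledge-base")

# Load the large txt file (path is a placeholder).
documents = SimpleDirectoryReader(input_files=["big_file.txt"]).load_data()

# Same kind of splitter/chunk settings we have been using (values are placeholders).
parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(documents)  # ends up as ~80k nodes

vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults()

# Step 1: this is the first call where we see the slowdown.
storage_context.docstore.add_documents(nodes)

# Step 2: building the index (embedding + upserting into Pinecone) is also very slow.
index = GPTVectorStoreIndex(
    nodes,
    storage_context=storage_context,
    service_context=service_context,
)
```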

My interpretation of it:

I think the number of chunks (80k+) is causing the slowness and making the document get "stuck" in the process. I'm not sure how to fix this, because we have been using the same chunk size and text splitter for months and they have performed really well.

Can someone help us with this? Any ideas on how to scale up in such cases?
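For example, would breaking the work into smaller batches be a reasonable direction? Something like the untested sketch below (continuing the sketch above); the insert_nodes batching here is only our guess, not something we have verified.

```python
# Build an empty index first, then insert nodes in batches so no single call
# has to handle all ~80k nodes at once (assumption, not a verified fix).
index = GPTVectorStoreIndex(
    [],
    storage_context=storage_context,
    service_context=service_context,
)

BATCH_SIZE = 1000  # placeholder batch size
for start in range(0, len(nodes), BATCH_SIZE):
    batch = nodes[start : start + BATCH_SIZE]
    index.insert_nodes(batch)  # embeds and upserts this batch into Pinecone
```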
4 comments
Also having this same issue; it started after the update.
Might also be a today-only issue because of the outage.