sahilthakur3
Joined September 25, 2024
Hi Team,

Issue: Preprocessing a 50 MB txt file is taking a very long time.

Explanation of the issue:

In our preprocessing flow (the add-to-knowledge-base flow), we tried uploading a 58MB txt file. The file was broken into 80k chunks, which needed to be uploaded into our Pinecone vector store via the llama_index wrappers.

We are seeing that the storage_context.docstore.add_documents() call is taking a very long time to execute.
After that, GPTVectorStoreIndex(nodes, storage_context, service_context) also takes a very long time.
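Both of the slow calls above receive all 80k nodes at once. One common mitigation for this kind of slowdown is to build the index empty and insert the nodes in smaller batches, so progress is visible and a single stall doesn't block the whole upload. A minimal, generic sketch (the batch size of 1000 and the `insert_nodes` usage shown in the comments are assumptions, not part of the original flow):

```python
def batched(items, batch_size):
    """Yield successive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical usage with llama_index (not executed here):
#   index = GPTVectorStoreIndex([], storage_context=storage_context,
#                               service_context=service_context)
#   for batch in batched(nodes, 1000):
#       index.insert_nodes(batch)
```

Batching also makes it easier to log per-batch timings, which helps tell apart "slow but progressing" from "actually stuck".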

My interpretation:

I think the number of chunks (80k+) is causing this slowness and making our document get "stuck" in the process. I'm not sure how to fix this, because we have been using the same chunk size and text splitter for months and they have performed really well.

Can someone help us with this? Any ideas on how to scale up in such cases?
4 comments
Hi team, I asked the same question on Monday as well but did not receive any response, so I'm bringing it up again:

Hi everyone, we're getting the following error sometimes (no idea how it's reproducible, as I haven't been able to reproduce it myself). I think it started appearing ever since we upgraded the llama_index version. Can someone help with a solution?

Error occurred in get_answer: 1 validation error for VectorStoreQuerySpec
root
VectorStoreQuerySpec expected dict not NoneType (type=type_error)
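For what it's worth, "expected dict not NoneType" is a pydantic validation error: something upstream is occasionally passing `None` where a query-spec dict is expected, which would explain why the failure is intermittent. While debugging, a generic retry wrapper can keep such transient failures from surfacing to users; a minimal sketch (the function and exception types here are placeholders, not llama_index APIs):

```python
def retry_on_error(fn, retries=2, exceptions=(ValueError,)):
    """Call fn(); retry up to `retries` more times if it raises one of `exceptions`."""
    last_exc = None
    for _ in range(retries + 1):
        try:
            return fn()
        except exceptions as exc:
            last_exc = exc
    raise last_exc
```

This only papers over the symptom, of course; the underlying fix still needs the `None` to be traced back to whichever component produces the query spec.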
10 comments
Hi, I'm facing an issue using llamaindex - the error seems to be coming from llama_index/utils.py ->
4 comments
Getting an error when updating our model to gpt-4-0613. The error is:

Error - Unknown model: gpt-4-0613. Please provide a valid OpenAI model name. Known models are: gpt-4, gpt-4-0314, gpt-4-32k, gpt-4-32k-0314, gpt-3.5-turbo, gpt-3.5-turbo-0301, text-ada-001, ada, text-babbage-001, babbage, text-curie-001, curie, davinci, text-davinci-003, text-davinci-002, code-davinci-002, code-davinci-001, code-cushman-002, code-cushman-001.

Is anyone else facing this while updating their model version?

PS - I have updated the openai library to 0.27.8.
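The "Known models are: ..." list in that error comes from the installed llama_index version's model table, not from the openai package, so upgrading only openai doesn't help; upgrading llama_index to a release that knows about gpt-4-0613 is the usual fix (this is an inference from the error text, not something confirmed in the thread). A small helper for checking which versions are actually installed in the environment:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(pkg):
    """Return the installed version string for `pkg`, or None if not installed."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None


# Example: compare what pip actually installed, e.g.
#   print(installed_version("llama-index"), installed_version("openai"))
```

Running this before and after `pip install --upgrade llama-index` confirms the upgrade actually took effect in the environment the app runs in.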
1 comment