
I am facing high memory consumption

At a glance
The community member is facing high memory consumption when refreshing documents with index.refresh_ref_docs(documents). There are around 230 documents, and memory usage spikes to over 16 GB, leading to an out-of-memory (OOM) error. The community members discuss possible causes and suggest processing the documents in batches to reduce memory usage. They also discuss how the use of Weaviate and the docstore might affect memory usage.
I am facing a high memory consumption issue when refreshing documents:
Plain Text
index.refresh_ref_docs(documents)

There are around 230 documents. The memory consumption spikes to over 16 GB, which the node does not have, and that leads to an out-of-memory (OOM) error.

What could be the reason for it, and are there any resolutions?

I can provide other details if needed.
6 comments
how big is your saved index?
or I guess, the docstore.json file specifically
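
(If the index was persisted locally, one quick way to check; both the ./storage path and the prior persist step are assumptions:)

Python
import os

# assumption: the index was saved with storage_context.persist(persist_dir="./storage")
size_mb = os.path.getsize("./storage/docstore.json") / 1e6
print(f"docstore.json: {size_mb:.1f} MB")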
Where can I check that?

Should I check it via the Weaviate client, or is it in llama_index?

Infra information: Weaviate is currently deployed in k8s.
ohhhh you are using weaviate 👀

Are you also persisting/managing the docstore object? Without the docstore, refresh doesn't work (it will insert everything). But with external vector dbs, the docstore is not used unless you set store_nodes_override=True 🤔
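
(For reference, a minimal sketch of wiring Weaviate together with a locally persisted docstore; the weaviate-client v3 style connection, the import paths, and the ./storage directory are assumptions that vary by version:)

Python
import weaviate
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.weaviate import WeaviateVectorStore

client = weaviate.Client("http://localhost:8080")  # assumed endpoint, weaviate-client v3 style
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# store_nodes_override=True keeps nodes in the docstore even with an
# external vector db, so refresh_ref_docs can detect already-inserted docs
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    store_nodes_override=True,
)

# persist the docstore and index metadata so refresh works across runs
index.storage_context.persist(persist_dir="./storage")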
I'm guessing your 230 documents are causing so many embedding vectors to be loaded into memory that it is causing memory issues. You will probably have to process them in batches
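
(A minimal sketch of the batching approach; the batch size of 20 is an arbitrary assumption to tune against the node's memory:)

Python
batch_size = 20  # assumption: tune to the memory available on the node
for i in range(0, len(documents), batch_size):
    # refresh only a slice at a time to cap peak memory usage
    index.refresh_ref_docs(documents[i : i + batch_size])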
Ah ok, I will try that.

The 230 documents are being created out of 50 files, and I am using SimpleDirectoryReader, so probably some of the readers create multiple documents from each file (a quick way to check is sketched below).

I definitely have to get a better understanding of indexes.
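
(If useful, a quick way to see how many documents each file produced; the ./data path is an assumption, and the file_name metadata key is what recent SimpleDirectoryReader versions set:)

Python
from collections import Counter
from llama_index.core import SimpleDirectoryReader  # import path varies by version

documents = SimpleDirectoryReader("./data").load_data()
# tally documents per source file to see which readers split files
print(Counter(doc.metadata.get("file_name") for doc in documents))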