
I am facing high memory consumption

At a glance
The community member is facing high memory consumption when refreshing documents with index.refresh_ref_docs(documents). There are around 230 documents, and memory usage spikes to over 16 GB, leading to an out-of-memory (OOM) error. The community members discuss possible causes and suggest processing the documents in batches to reduce memory usage. They also discuss how the use of Weaviate and the docstore might affect memory usage.
I am facing a high memory consumption issue when refreshing documents:
Plain Text
index.refresh_ref_docs(documents)

There are around 230 documents. The memory consumption spikes to over 16 GB, which the node does not have, and that leads to an out-of-memory (OOM) error.

What could be the reason for it, and are there any resolutions?

I can provide other details if needed.
6 comments
how big is your saved index?
or I guess, the docstore.json file specifically
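
(If the index was persisted locally, one quick way to check; both the ./storage path and the prior persist step are assumptions:)

Python
import os

# assumption: the index was saved with storage_context.persist(persist_dir="./storage")
size_mb = os.path.getsize("./storage/docstore.json") / 1e6
print(f"docstore.json: {size_mb:.1f} MB")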
Where can I check that?

Should I check it via the Weaviate client, or is it in llama_index?

Infra information: Weaviate is currently deployed in k8s.
ohhhh you are using weaviate 👀

Are you also persisting/managing the docstore object? Without the docstore, refresh doesn't work (it will insert everything). But with external vector dbs, the docstore is not used unless you set store_nodes_override=True 🤔
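
(For reference, a minimal sketch of wiring Weaviate together with a locally persisted docstore; the weaviate-client v3 style connection, the import paths, and the ./storage directory are assumptions that vary by version:)

Python
import weaviate
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.weaviate import WeaviateVectorStore

client = weaviate.Client("http://localhost:8080")  # assumed endpoint, weaviate-client v3 style
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# store_nodes_override=True keeps nodes in the docstore even with an
# external vector db, so refresh_ref_docs can detect already-inserted docs
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    store_nodes_override=True,
)

# persist the docstore and index metadata so refresh works across runs
index.storage_context.persist(persist_dir="./storage")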
I'm guessing your 230 documents are causing so many embedding vectors to be loaded into memory that it is causing memory issues. You will probably have to process them in batches
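
(A minimal sketch of the batching approach; the batch size of 20 is an arbitrary assumption to tune against the node's memory:)

Python
batch_size = 20  # assumption: tune to the memory available on the node
for i in range(0, len(documents), batch_size):
    # refresh only a slice at a time to cap peak memory usage
    index.refresh_ref_docs(documents[i : i + batch_size])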
Ah ok, I will try that.

The 230 documents are being created out of 50 files, and I am using SimpleDirectoryReader, so probably some of the readers create multiple documents from each file (a quick way to check is sketched below).

I definitely have to get a better understanding of indexes.
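
(If useful, a quick way to see how many documents each file produced; the ./data path is an assumption, and the file_name metadata key is what recent SimpleDirectoryReader versions set:)

Python
from collections import Counter
from llama_index.core import SimpleDirectoryReader  # import path varies by version

documents = SimpleDirectoryReader("./data").load_data()
# tally documents per source file to see which readers split files
print(Counter(doc.metadata.get("file_name") for doc in documents))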