When using PGVectorStore with AutoMergingRetriever I receive the error: ValueError: doc_id e6646445-0d1d-4626-aec8-f9389e12a038 not found. -- but this doc id exists in my vector table. What am I missing?

(Tried to ask our kapa bot but it does not know the answer: https://discord.com/channels/1059199217496772688/1195412574838194277)
The auto-merging retriever uses the docstore to retrieve nodes, not the vector db. Hence, if you aren't using a docstore, the node probably won't be there
Ah interesting! Is there any reference example that goes from a vector db to a docstore? Or, in other words, what is the correct setup to go from a vector db to the AutoMerging retriever?
Ideally as you build the index, you are also using and persisting a docstore

The auto-merging retriever example shows using the docstore directly to keep track of nodes that aren't actually embedded, but needed to perform the auto-merging step

https://docs.llamaindex.ai/en/stable/examples/retrievers/auto_merging_retriever.html#load-into-storage

Only the leaf nodes get embedded into your vector db
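To make the split concrete, here's a minimal plain-Python sketch of the idea being described (this is illustrative, not the actual llama-index implementation; the node layout and the 0.5 merge threshold are assumptions for the example). The docstore holds every node, including parents that are never embedded; the vector db holds only the leaves, and the merge step has to look parents up in the docstore:

```python
# Illustrative sketch: why auto-merging needs a docstore alongside the vector db.
# The docstore holds ALL nodes, including parents that are never embedded.
docstore = {
    "parent-1": {"text": "full parent chunk", "children": ["leaf-a", "leaf-b", "leaf-c"]},
    "leaf-a": {"text": "part a", "parent": "parent-1"},
    "leaf-b": {"text": "part b", "parent": "parent-1"},
    "leaf-c": {"text": "part c", "parent": "parent-1"},
}

# The vector db only holds embeddings for the leaf nodes.
vector_store_ids = {"leaf-a", "leaf-b", "leaf-c"}

def auto_merge(retrieved_ids, threshold=0.5):
    """If enough of a parent's children were retrieved, swap them for the parent."""
    merged = set(retrieved_ids)
    parents = {docstore[i]["parent"] for i in retrieved_ids if "parent" in docstore.get(i, {})}
    for pid in parents:
        # If the parent was never persisted to a docstore, this lookup fails --
        # which is exactly the "doc_id not found" error from the original question.
        children = set(docstore[pid]["children"])
        hit_ratio = len(children & merged) / len(children)
        if hit_ratio > threshold:
            merged -= children
            merged.add(pid)  # parent text comes from the docstore, not the vector db
    return merged

# Two of three leaves retrieved -> merged up into the parent.
print(auto_merge({"leaf-a", "leaf-b"}))  # {'parent-1'}
```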
So I need to persist both the leaf nodes in a vector db + a doc store?

I'm a little confused because I had the impression that the vector db would store everything and I would be able to rebuild from it... πŸ€”

Excerpt from https://docs.llamaindex.ai/en/stable/module_guides/storing/storing.html
Many vector stores (except FAISS) will store both the data as well as the index (embeddings). This means that you will not need to use a separate document store or index store.


So, in the case of AutoMerging I need both?
Right, but in this case, there is extra data that we aren't embedding, hence it has to live somewhere else
There are a few features in llama-index that rely on it tbh.

Tbh been noodling on some ideas to make this clearer in the framework. A docstore has a lot of interesting applications overall
Got it! Thank you so much @Logan M ! Btw I've been binge watching your content and it's amazing! I'm learning tons! πŸ™‚

Just to wrap up: if I have an architecture where I have multiple server nodes reading from the same pgvector, I also need a redis/mongo/s3 store to persist the docstore for them too, right?
(I was looking at the docstores in the llamaindex docs and it seems that we can't persist into PG, right?)
Glad you like the content! :dotsCATJAM:

Yea that's correct. Technically we could maybe implement a generic "database" docstore using sqlalchemy πŸ€” But using a db just to store a lot of text feels a little dirty haha.
I'd rather do that vs adding more complexity to the infra. PG works well for storing text blobs anyway...
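The core of that "generic database" idea is just a key-value table of JSON blobs. Here's a hedged sketch using stdlib sqlite3 as a stand-in for Postgres (a real contribution would go through sqlalchemy so it works across backends; the table and column names here are made up for illustration):

```python
import json
import sqlite3

# Stand-in for a Postgres connection; schema and names are invented for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS kvstore ("
    "  collection TEXT, key TEXT, value TEXT,"
    "  PRIMARY KEY (collection, key))"
)

def put(key, val, collection="docstore"):
    """Upsert a node as a JSON blob keyed by (collection, doc id)."""
    conn.execute(
        "INSERT OR REPLACE INTO kvstore VALUES (?, ?, ?)",
        (collection, key, json.dumps(val)),
    )

def get(key, collection="docstore"):
    """Fetch a node back by doc id, or None if it was never persisted."""
    row = conn.execute(
        "SELECT value FROM kvstore WHERE collection = ? AND key = ?",
        (collection, key),
    ).fetchone()
    return json.loads(row[0]) if row else None

# Nodes are just JSON blobs keyed by doc id, so any relational db can hold them.
put("e6646445-0d1d-4626-aec8-f9389e12a038", {"text": "parent node text"})
print(get("e6646445-0d1d-4626-aec8-f9389e12a038"))  # {'text': 'parent node text'}
```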
If I wanted to extend and contribute to llamaindex docstores to add the functionality to persist into a pgvectorstore table, which classes should I look at?
That would be awesome πŸ™ I might be wrong, but I think it could be a generic database store if you use sqlalchemy?

You'd want to add a new kvstore here (here's the mongodb one):
https://github.com/run-llama/llama_index/blob/main/llama_index/storage/kvstore/mongodb_kvstore.py

And then also add a docstore that uses that kvstore
https://github.com/run-llama/llama_index/blob/main/llama_index/storage/docstore/mongo_docstore.py

And then also add an index store that uses that kvstore
https://github.com/run-llama/llama_index/blob/main/llama_index/storage/index_store/mongo_index_store.py

The kvstore has most of the logic, the other two are just light wrappers for specific storage
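The layering above can be sketched in a few lines. The method names below follow the general shape of llama-index's kvstore interface (put/get/get_all/delete with a collection name), but this is a dict-backed illustration, not the actual base classes:

```python
class InMemoryKVStore:
    """The kvstore layer: all the real storage logic lives here."""
    def __init__(self):
        self._data = {}

    def put(self, key, val, collection="data"):
        self._data.setdefault(collection, {})[key] = val

    def get(self, key, collection="data"):
        return self._data.get(collection, {}).get(key)

    def get_all(self, collection="data"):
        return dict(self._data.get(collection, {}))

    def delete(self, key, collection="data"):
        return self._data.get(collection, {}).pop(key, None) is not None

class KVDocumentStore:
    """The docstore layer: a light wrapper pinning the kvstore to one collection."""
    def __init__(self, kvstore, namespace="docstore"):
        self._kv = kvstore
        self._collection = f"{namespace}/data"

    def add_document(self, doc_id, doc):
        self._kv.put(doc_id, doc, collection=self._collection)

    def get_document(self, doc_id):
        return self._kv.get(doc_id, collection=self._collection)

# A new Postgres backend would only need to reimplement the kvstore methods;
# the docstore (and index store) wrappers stay the same.
kv = InMemoryKVStore()
docstore = KVDocumentStore(kv)
docstore.add_document("doc-1", {"text": "hello"})
print(docstore.get_document("doc-1"))  # {'text': 'hello'}
```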