When using PGVectorStore with AutoMergingRetriever I receive the error: ValueError: doc_id e6646445-0d1d-4626-aec8-f9389e12a038 not found. -- but this doc id exists in my vector table. What am I missing?

(Tried to ask our kapa bot but it does not know the answer: https://discord.com/channels/1059199217496772688/1195412574838194277)
The auto-merging retriever uses the docstore to retrieve nodes, not the vector db. Hence, if you aren't using a docstore, the node probably won't be there
Ah interesting! Is there any reference example that goes from a vector db to a docstore? Or, in other words, what is the correct setup to go from a vector db to the AutoMerging retriever?
Ideally as you build the index, you are also using and persisting a docstore

The auto-merging retriever example shows using the docstore directly to keep track of nodes that aren't actually embedded, but needed to perform the auto-merging step

https://docs.llamaindex.ai/en/stable/examples/retrievers/auto_merging_retriever.html#load-into-storage

Only the leaf nodes get embedded into your vector db
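To make the split concrete, here's a minimal plain-Python sketch of the idea being described (this is illustrative, not the actual llama-index implementation; the node layout and the 0.5 merge threshold are assumptions for the example). The docstore holds every node, including parents that are never embedded; the vector db holds only the leaves, and the merge step has to look parents up in the docstore:

```python
# Illustrative sketch: why auto-merging needs a docstore alongside the vector db.
# The docstore holds ALL nodes, including parents that are never embedded.
docstore = {
    "parent-1": {"text": "full parent chunk", "children": ["leaf-a", "leaf-b", "leaf-c"]},
    "leaf-a": {"text": "part a", "parent": "parent-1"},
    "leaf-b": {"text": "part b", "parent": "parent-1"},
    "leaf-c": {"text": "part c", "parent": "parent-1"},
}

# The vector db only holds embeddings for the leaf nodes.
vector_store_ids = {"leaf-a", "leaf-b", "leaf-c"}

def auto_merge(retrieved_ids, threshold=0.5):
    """If enough of a parent's children were retrieved, swap them for the parent."""
    merged = set(retrieved_ids)
    parents = {docstore[i]["parent"] for i in retrieved_ids if "parent" in docstore.get(i, {})}
    for pid in parents:
        # If the parent was never persisted to a docstore, this lookup fails --
        # which is exactly the "doc_id not found" error from the original question.
        children = set(docstore[pid]["children"])
        hit_ratio = len(children & merged) / len(children)
        if hit_ratio > threshold:
            merged -= children
            merged.add(pid)  # parent text comes from the docstore, not the vector db
    return merged

# Two of three leaves retrieved -> merged up into the parent.
print(auto_merge({"leaf-a", "leaf-b"}))  # {'parent-1'}
```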
So I need to persist both the leaf nodes in a vector db + a doc store?

I'm a little confused because I had the impression that the vector db would store everything and I would be able to rebuild from it... πŸ€”

Excerpt from https://docs.llamaindex.ai/en/stable/module_guides/storing/storing.html
Many vector stores (except FAISS) will store both the data as well as the index (embeddings). This means that you will not need to use a separate document store or index store.


So, in the case of AutoMerging I need both?
Right, but in this case, there is extra data that we aren't embedding, hence it has to live somewhere else
There are a few features in llama-index that rely on it tbh.

Tbh been noodling on some ideas to make this clearer in the framework. A docstore has a lot of interesting applications overall
Got it! Thank you so much @Logan M ! Btw I've been binge watching your content and it's amazing! I'm learning tons! πŸ™‚

Just to wrap up: if I have an architecture where I have multiple server nodes reading from the same pgvector, I also need a redis/mongo/s3 store to persist the docstore for them too, right?
(I was looking at the docstores in the llamaindex docs and it seems that we can't persist into PG, right?)
Glad you like the content! :dotsCATJAM:

Yea that's correct. Technically we could maybe implement a generic "database" docstore using sqlalchemy πŸ€” But using a db just to store a lot of text feels a little dirty haha.
I'd rather do that vs adding more complexity to the infra. PG works well for storing text blobs anyway...
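The core of that "generic database" idea is just a key-value table of JSON blobs. Here's a hedged sketch using stdlib sqlite3 as a stand-in for Postgres (a real contribution would go through sqlalchemy so it works across backends; the table and column names here are made up for illustration):

```python
import json
import sqlite3

# Stand-in for a Postgres connection; schema and names are invented for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS kvstore ("
    "  collection TEXT, key TEXT, value TEXT,"
    "  PRIMARY KEY (collection, key))"
)

def put(key, val, collection="docstore"):
    """Upsert a node as a JSON blob keyed by (collection, doc id)."""
    conn.execute(
        "INSERT OR REPLACE INTO kvstore VALUES (?, ?, ?)",
        (collection, key, json.dumps(val)),
    )

def get(key, collection="docstore"):
    """Fetch a node back by doc id, or None if it was never persisted."""
    row = conn.execute(
        "SELECT value FROM kvstore WHERE collection = ? AND key = ?",
        (collection, key),
    ).fetchone()
    return json.loads(row[0]) if row else None

# Nodes are just JSON blobs keyed by doc id, so any relational db can hold them.
put("e6646445-0d1d-4626-aec8-f9389e12a038", {"text": "parent node text"})
print(get("e6646445-0d1d-4626-aec8-f9389e12a038"))  # {'text': 'parent node text'}
```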
If I wanted to extend and contribute to llamaindex docstores to add the functionality to persist into a pgvectorstore table, which classes should I look at?
That would be awesome πŸ™ I might be wrong, but I think it could be a generic database store if you use sqlalchemy?

You'd want to add a new kvstore here (here's the mongodb one):
https://github.com/run-llama/llama_index/blob/main/llama_index/storage/kvstore/mongodb_kvstore.py

And then also add a docstore that uses that kvstore
https://github.com/run-llama/llama_index/blob/main/llama_index/storage/docstore/mongo_docstore.py

And then also add an index store that uses that kvstore
https://github.com/run-llama/llama_index/blob/main/llama_index/storage/index_store/mongo_index_store.py

The kvstore has most of the logic, the other two are just light wrappers for specific storage
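The layering above can be sketched in a few lines. The method names below follow the general shape of llama-index's kvstore interface (put/get/get_all/delete with a collection name), but this is a dict-backed illustration, not the actual base classes:

```python
class InMemoryKVStore:
    """The kvstore layer: all the real storage logic lives here."""
    def __init__(self):
        self._data = {}

    def put(self, key, val, collection="data"):
        self._data.setdefault(collection, {})[key] = val

    def get(self, key, collection="data"):
        return self._data.get(collection, {}).get(key)

    def get_all(self, collection="data"):
        return dict(self._data.get(collection, {}))

    def delete(self, key, collection="data"):
        return self._data.get(collection, {}).pop(key, None) is not None

class KVDocumentStore:
    """The docstore layer: a light wrapper pinning the kvstore to one collection."""
    def __init__(self, kvstore, namespace="docstore"):
        self._kv = kvstore
        self._collection = f"{namespace}/data"

    def add_document(self, doc_id, doc):
        self._kv.put(doc_id, doc, collection=self._collection)

    def get_document(self, doc_id):
        return self._kv.get(doc_id, collection=self._collection)

# A new Postgres backend would only need to reimplement the kvstore methods;
# the docstore (and index store) wrappers stay the same.
kv = InMemoryKVStore()
docstore = KVDocumentStore(kv)
docstore.add_document("doc-1", {"text": "hello"})
print(docstore.get_document("doc-1"))  # {'text': 'hello'}
```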