Updated 3 months ago

Building Multi-Tenancy RAG System with L...

I am currently implementing a multi-user solution using PGVectorStore together with the recently added PostgresDocumentStore and PostgresIndexStore for key-value storage. From what I have read, should each user have a separate index stored in the PostgresIndexStore? The only example of a multi-user/multi-tenant application on the LlamaIndex blog (https://blog.llamaindex.ai/building-multi-tenancy-rag-system-with-llamaindex-0d6ab4e0c44b) covers metadata filtering and does not mention the index store or separate indexes per user. For this reason, I still struggle to understand the idea behind the IndexStore, since most of the available examples and tutorials use only the VectorStore. After digging around the web a bit, I also noticed that PGVectorStore does not seem to work as expected with the IndexStore, as described in this issue: https://github.com/run-llama/llama_index/issues/7360. Should I pursue the integration of those three stores? I would greatly appreciate any guidance or support.
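For context, the metadata-filtering approach that the blog post describes can be illustrated with a small toy model. This is only a sketch in plain Python, not LlamaIndex code: the `rows` table, the `user_id` key, and the `query()` helper are hypothetical names chosen for illustration. The idea is that all tenants share one store, every row is tagged with its owner, and each query is restricted to the caller's rows before ranking.

```python
import math

# Hypothetical shared table: every row carries the owning user's id as metadata.
rows = [
    {"user_id": "alice", "text": "alice's invoice", "embedding": [1.0, 0.0]},
    {"user_id": "bob", "text": "bob's contract", "embedding": [0.9, 0.1]},
    {"user_id": "alice", "text": "alice's notes", "embedding": [0.0, 1.0]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def query(rows, user_id, query_embedding, top_k=1):
    """Filter by tenant first, then rank the remaining rows by similarity."""
    candidates = [r for r in rows if r["user_id"] == user_id]
    ranked = sorted(
        candidates,
        key=lambda r: cosine(r["embedding"], query_embedding),
        reverse=True,
    )
    return [r["text"] for r in ranked[:top_k]]


# The same query embedding returns different results per tenant.
alice_hits = query(rows, "alice", [1.0, 0.0])
bob_hits = query(rows, "bob", [1.0, 0.0])
```

With this layout a single shared index serves every user, and isolation comes from the filter rather than from maintaining one index per tenant.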
5 comments
By default, the vector store stores all the node data; the index store and docstore are not used.

You can override this if you want, as the GitHub issue mentions: VectorStoreIndex(..., store_nodes_override=True). In most cases, you would only need to do this if you need easy/quick key/value access to all the nodes in your index.

Most of the time, users do not need this capability.
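The behaviour described above can be pictured with a toy model. This is a sketch under stated assumptions, not LlamaIndex code: `ToyVectorStore`, `ToyDocStore`, and `build_index()` are hypothetical stand-ins, and only the `store_nodes_override` flag name comes from the thread. It shows that when the vector store already keeps the node text, the docstore stays empty unless the override is set.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVectorStore:
    """Hypothetical vector store that can keep node text next to the embedding."""
    stores_text: bool = True
    rows: dict = field(default_factory=dict)

    def add(self, node_id, embedding, text):
        self.rows[node_id] = (embedding, text if self.stores_text else None)


@dataclass
class ToyDocStore:
    """Hypothetical key/value store: node_id -> node text."""
    docs: dict = field(default_factory=dict)


def build_index(nodes, vector_store, docstore, store_nodes_override=False):
    """Write nodes to the vector store; mirror them to the docstore only if needed."""
    for node_id, (embedding, text) in nodes.items():
        vector_store.add(node_id, embedding, text)
        # The docstore is only populated when the vector store cannot hold
        # the text itself, or when the caller explicitly asks for it.
        if store_nodes_override or not vector_store.stores_text:
            docstore.docs[node_id] = text


nodes = {"n1": ([0.1, 0.2], "hello"), "n2": ([0.3, 0.4], "world")}

default_docstore = ToyDocStore()
build_index(nodes, ToyVectorStore(), default_docstore)

override_docstore = ToyDocStore()
build_index(nodes, ToyVectorStore(), override_docstore, store_nodes_override=True)
```

In the default case all node data lives in the vector store alone; with the override, the same nodes are also reachable by key through the docstore.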
First of all, thank you so much for your expert guidance. One of the common claims I see is that persisted docstores and indexstores improve performance when loading big knowledge bases, because when these do not exist, the framework will have to regenerate them in memory from the vectorstore, as you clarified today on another thread:

You don't need to call persist() on most 3rd party vector stores

You can reload the index by doing
VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)

For an application that may have thousands of documents or entries for thousands of users, does it make sense to avoid using the docstore and indexstore? What criteria should I keep in mind when deciding whether to use them?

Lastly, when you say that most of the time users do not need this capability, when would be a right time to use it?

Once again, thanks for your amazing help and support.
"the framework will have to regenerate them in memory from the vectorstore" -- this isn't true, actually. Since the data is hosted in Postgres, it stays there. from_vector_store() is essentially a no-op 👀

"when would be a right time to use it" -- some features require access to all nodes (BM25 retriever, hierarchical retriever, etc.). In these cases, having at least the docstore makes this possible.
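To illustrate why a keyword retriever such as BM25 needs access to every node, here is a minimal sketch in plain Python. It is not the LlamaIndex BM25 retriever: the `docstore` dict and the `bm25_scores()` helper are hypothetical, and the parameters `k1=1.5` and `b=0.75` are common defaults assumed for the example. The point is that document frequencies and the average node length are corpus-wide statistics, so the scorer has to iterate over all node texts, which is exactly what key/value access through a docstore provides.

```python
import math
from collections import Counter

# Hypothetical stand-in for a docstore: node_id -> node text.
docstore = {
    "n1": "postgres stores vectors for each tenant",
    "n2": "the docstore gives key value access to nodes",
    "n3": "bm25 ranks nodes by keyword overlap",
}


def bm25_scores(query, docstore, k1=1.5, b=0.75):
    """Score every node in the docstore against the query with Okapi BM25."""
    tokenized = {nid: text.lower().split() for nid, text in docstore.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs

    # Document frequency per term: requires a pass over all nodes.
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    scores = {}
    for nid, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[nid] = score
    return scores


scores = bm25_scores("docstore nodes", docstore)
best = max(scores, key=scores.get)  # node with the highest keyword score
```

A vector store queried only by embedding similarity does not expose this kind of full scan, which is why such features rely on the docstore being populated.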
Once again, thank you so much. One last question: what is the purpose of the indexstore in the grand scheme of things?
At one time, it held the "structure" of the index (i.e. mapping keywords to chunks, etc.), but this is used less and less these days. It also keeps track of which "nodes" are available to an index (but this mostly matters if you saved multiple indexes to disk under the same storage context).