
Updated 6 months ago

Building Multi-Tenancy RAG System with L...

At a glance
The community member is implementing a multi-user solution using PGVectorStore, PostgresDocumentStore, and PostgresIndexStore. They are unsure whether each user should have a separate index stored in the PostgresIndexStore, since the only available example of a multi-user/multi-tenant application does not cover this concept. They also struggle to understand the purpose of the IndexStore, as most examples and tutorials focus on the VectorStore. Another community member clarifies that, by default, the vector store holds all the node data, and the index store and document store are not used. This can be overridden if needed, but most users do not require that capability. The responder further explains that persisted document and index stores are not needed for performance, because the data stays hosted in Postgres and reloading the index from the vector store is essentially a no-op. The IndexStore was once used to hold the structure of the index, but that role has diminished over time; it now mainly keeps track of which nodes belong to an index. There is no explicitly marked answer in the provided information.
So, I am currently implementing a multi-user solution using PGVectorStore together with the recently created PostgresDocumentStore and PostgresIndexStore for key-value storage. Based on what I have read, should each user have a separate index stored in the PostgresIndexStore? The only example of a multi-user/multi-tenant application on the LlamaIndex blog (https://blog.llamaindex.ai/building-multi-tenancy-rag-system-with-llamaindex-0d6ab4e0c44b) relies on metadata filtering and does not use separate indexes for this purpose. For this reason, I still struggle to understand the idea behind the IndexStore, since most of the available examples and tutorials use only the VectorStore.

After digging around the web a bit, I also noticed that PGVectorStore does not seem to work as expected with the IndexStore, as described in this GitHub issue: https://github.com/run-llama/llama_index/issues/7360. Should I pursue integrating all three stores? I would greatly appreciate any guidelines or support.
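The metadata-filtering approach from the blog post keeps every user's chunks in one shared store and restricts each query to the caller's own entries, so no per-user index is involved. The toy sketch below models that idea in plain Python under stated assumptions: `Chunk`, `SharedStore`, the `"user"` metadata key, and the dot-product scoring are all hypothetical stand-ins, not LlamaIndex or PGVectorStore APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stored chunk: text, a toy embedding, and free-form metadata."""
    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


class SharedStore:
    """Hypothetical single store shared by all tenants (one table, many users)."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        self.chunks.append(chunk)

    def query(self, embedding: list[float], user: str, top_k: int = 2) -> list[Chunk]:
        # Filter on metadata first so one tenant never sees another tenant's chunks.
        allowed = [c for c in self.chunks if c.metadata.get("user") == user]

        # Rank the remaining chunks by dot-product similarity to the query embedding.
        def score(c: Chunk) -> float:
            return sum(a * b for a, b in zip(c.embedding, embedding))

        return sorted(allowed, key=score, reverse=True)[:top_k]


store = SharedStore()
store.add(Chunk("alice: invoice terms", [1.0, 0.0], {"user": "alice"}))
store.add(Chunk("alice: refund policy", [0.0, 1.0], {"user": "alice"}))
store.add(Chunk("bob: invoice terms", [1.0, 0.0], {"user": "bob"}))

hits = store.query([1.0, 0.0], user="alice", top_k=1)
print([c.text for c in hits])  # ['alice: invoice terms']
```

Bob's chunk has the same embedding as Alice's best match but is never returned for Alice, because isolation comes from the filter rather than from a separate index.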
By default, the vector store stores all the node data; the index store and docstore are not used.

You can override this if you want, as the GitHub issue mentions: VectorStoreIndex(..., store_nodes_override=True). In most cases, you would only need to do this if you want easy, quick key/value access to all the nodes in your index.

Most of the time, users do not need this capability.
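The default behaviour described in this reply, and what the override changes, can be modelled with a toy class. This is a hypothetical sketch, not LlamaIndex's implementation: `ToyVectorIndex`, its attributes, and `get_node` are made up for illustration, and only the flag name `store_nodes_override` comes from the thread.

```python
class ToyVectorIndex:
    """Hypothetical model of where node data lands when an index is built."""

    def __init__(self, nodes: dict[str, str], store_nodes_override: bool = False) -> None:
        # The vector store always receives the full node data.
        self.vector_store = dict(nodes)
        # By default the docstore stays empty; the override copies nodes into it too.
        self.docstore = dict(nodes) if store_nodes_override else {}

    def get_node(self, node_id: str) -> str:
        # Direct key/value lookup only works when the docstore was populated.
        if node_id not in self.docstore:
            raise KeyError(f"{node_id} not in docstore (build with store_nodes_override=True)")
        return self.docstore[node_id]


nodes = {"n1": "first chunk", "n2": "second chunk"}
default_index = ToyVectorIndex(nodes)
override_index = ToyVectorIndex(nodes, store_nodes_override=True)
print(len(default_index.docstore), len(override_index.docstore))  # 0 2
print(override_index.get_node("n2"))  # second chunk
```

In both cases the vector store holds the same data; the override only adds a second, directly addressable copy.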
First of all, thank you so much for your expert guidance. One claim I often see is that persisted docstores and index stores improve performance when loading large knowledge bases, because when they do not exist, the framework has to regenerate them in memory from the vector store. This relates to what you clarified today on another thread:

You don't need to call persist() on most third-party vector stores.

You can reload the index by doing:
VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)

For an application that may have thousands of documents or entries for possibly thousands of users, does it make sense to avoid using the docstore and index store? What criteria should I keep in mind when deciding whether to use them?

Lastly, when you say that most users do not need this capability, when would be the right time to use it?

Once again, thanks for your amazing help and support.
"The framework will have to regenerate them in memory from the vector store": this isn't actually true. Since the data is hosted in Postgres, it stays there. from_vector_store() is essentially a no-op 👀
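Why reloading is cheap can be shown with a toy model: when the data lives in an external database, "loading" the index only wraps a handle to that database instead of copying rows into memory. `ExternalStore`, `ToyIndex`, and its `from_vector_store` classmethod are hypothetical stand-ins that mirror the reply's description; they are not LlamaIndex code.

```python
class ExternalStore:
    """Hypothetical stand-in for a database-backed vector store such as Postgres."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}
        self.rows_read = 0  # counts how many rows have actually been fetched

    def fetch(self, node_id: str) -> str:
        self.rows_read += 1
        return self.rows[node_id]


class ToyIndex:
    def __init__(self, vector_store: ExternalStore) -> None:
        # Only a reference is kept; nothing is copied out of the store.
        self.vector_store = vector_store

    @classmethod
    def from_vector_store(cls, vector_store: ExternalStore) -> "ToyIndex":
        # "Reloading" just wraps the existing store.
        return cls(vector_store)


db = ExternalStore()
db.rows.update({f"n{i}": f"chunk {i}" for i in range(10_000)})
index = ToyIndex.from_vector_store(db)
print(db.rows_read)  # 0: no rows were read to rebuild anything
```

Rows are only read later, when a query actually needs them, so the size of the knowledge base does not affect reload time in this model.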

"When would be the right time to use it": some features require access to all nodes (BM25 retriever, hierarchical retriever, etc.). In these cases, having at least the docstore makes this possible.
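BM25 shows why some retrievers need every node: its IDF term depends on how many documents in the whole corpus contain each query word, which a top-k vector query cannot supply. Below is a minimal Okapi BM25 scorer over an in-memory list that stands in for the docstore's nodes. It uses the conventional defaults k1=1.5 and b=0.75 (assumed values) and whitespace tokenization; it is a sketch of the technique, not LlamaIndex's BM25 retriever.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency per term: this requires a pass over the *whole* corpus.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


nodes = [
    "postgres stores the vectors",
    "the docstore keeps every node",
    "bm25 ranks nodes by keyword overlap",
]
scores = bm25_scores("docstore node", nodes)
best = max(range(len(nodes)), key=scores.__getitem__)
print(nodes[best])  # the docstore keeps every node
```

Both `df` and `avgdl` are computed across all nodes, which is exactly the information a persisted docstore makes available.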
Once again, thank you so much. One last question: what is the purpose of the index store in the grand scheme of things?
At one time, it held the "structure" of the index (i.e., mapping keywords to chunks, etc.), but this role is used less and less these days. It also keeps track of which nodes are available to an index, though this is mostly relevant if you saved multiple indexes to disk under the same storage context.
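The bookkeeping role described in this reply can be modelled as one shared docstore plus an index store that maps each index ID to the node IDs it owns. `ToyStorageContext`, `add_to_index`, and `nodes_for` are hypothetical names for illustration only; LlamaIndex's actual index store persists serialized index structures rather than plain sets.

```python
class ToyStorageContext:
    """Hypothetical storage context: one shared docstore, one index store."""

    def __init__(self) -> None:
        self.docstore: dict[str, str] = {}          # node_id -> node text
        self.index_store: dict[str, set[str]] = {}  # index_id -> node_ids in that index

    def add_to_index(self, index_id: str, node_id: str, text: str) -> None:
        # The node goes into the shared docstore; membership goes into the index store.
        self.docstore[node_id] = text
        self.index_store.setdefault(index_id, set()).add(node_id)

    def nodes_for(self, index_id: str) -> list[str]:
        # The index store tells us which docstore entries belong to this index.
        return [self.docstore[n] for n in sorted(self.index_store.get(index_id, set()))]


ctx = ToyStorageContext()
ctx.add_to_index("index_a", "n1", "chunk one")
ctx.add_to_index("index_a", "n2", "chunk two")
ctx.add_to_index("index_b", "n3", "chunk three")
print(ctx.nodes_for("index_a"))  # ['chunk one', 'chunk two']
print(len(ctx.docstore))         # 3
```

Without the index store, the shared docstore alone could not say which of its three nodes belong to `index_a` versus `index_b`.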