Quick question

For VectorStoreIndex, we have the option to store the index in various vector stores, such as Chroma, Pinecone, and so on.

For other types of indexes (KeywordTableIndex, DocumentSummaryIndex, etc.), do we have a similar storage solution? Or is it just in-memory and persist-to-disk?
Good question!

Other indexes can be stored remotely using integrations like MongoDB or Redis. You have to store both the index store and the docstore for these indexes.

You can also use fsspec and specify remote storage (AWS, Google Cloud, etc.).
Here's an example, @Rendy Febry: https://gpt-index.readthedocs.io/en/stable/core_modules/data_modules/storage/docstores.html It uses VectorStoreIndex, but you can pass storage_context to almost any index.
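To make the storage_context point concrete, here is a minimal sketch of backing a keyword table index with MongoDB for both the docstore and the index store. It assumes the legacy llama_index import layout used elsewhere in this thread; the function name build_keyword_index and its parameters nodes and mongo_uri are made up for illustration. It also swaps in SimpleKeywordTableIndex (regex-based keyword extraction) so that no LLM call is needed, which is not the KeywordTableIndex from the question.

```python
def build_keyword_index(nodes, mongo_uri):
    """Build a keyword table index whose docstore and index store live in MongoDB.

    Hypothetical helper: `nodes` is a list of parsed nodes and `mongo_uri`
    is a MongoDB connection string. Both names are assumptions.
    """
    # Imported lazily so this file loads even without llama_index installed.
    # These paths match the legacy package layout quoted in this thread.
    from llama_index import SimpleKeywordTableIndex, StorageContext
    from llama_index.storage.docstore import MongoDocumentStore
    from llama_index.storage.index_store import MongoIndexStore

    # Both stores are needed: the docstore holds node text and metadata,
    # the index store holds the index structure itself.
    storage_context = StorageContext.from_defaults(
        docstore=MongoDocumentStore.from_uri(uri=mongo_uri),
        index_store=MongoIndexStore.from_uri(uri=mongo_uri),
    )
    storage_context.docstore.add_documents(nodes)

    # Assumption: SimpleKeywordTableIndex stands in for KeywordTableIndex here.
    return SimpleKeywordTableIndex(nodes, storage_context=storage_context)
```

Newer llama_index releases moved these imports, so the paths may need adjusting for the installed version.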
Thanks @Logan M @bmax

I know IndexStore and DocStore have multiple storage options.

But I think KeywordTableIndex, DocumentSummaryIndex, etc. don't have them.
Thanks @bmax

I just tried them, and yep, like you said, it stores to MongoDB (I use MongoDocumentStore).

Follow-up question then: should the keyword table index hold the same data as the vector store index? The generated data I see in Mongo only contains the document text and the metadata.

As I understand it, the keyword table index should extract the relevant keywords from the document. Where are those keywords stored?
Attachment
Screenshot_2023-09-06_at_09.57.47.png
Wait a sec, I can see the generated keywords in the debugger, but I can't find them stored anywhere in the MongoDocumentStore.

Any idea @Logan M ?
Attachment
Screenshot_2023-09-06_at_10.07.29.png
I think the keywords are stored on the nodes, but I'm not sure the storage context saves those
The keywords are in the index store actually 👀
will let Logan illuminate
@Logan M In which collection exactly? I can't find them in any of them
Attachment
Screenshot_2023-09-06_at_10.10.03.png
if you're using the thing I sent
that's the Mongo docstore
Oh wait a sec, you said the index store. If I understand correctly, those are docstore collections
from llama_index.storage.docstore import MongoDocumentStore
vs
from llama_index.storage.index_store import MongoIndexStore
I use both, @bmax, that's why I got confused. Let me check the IndexStore one
Okay, found it.

I assume this is a mapping from each keyword to the node/doc IDs that contain it, right?
Attachment
Screenshot_2023-09-06_at_10.12.55.png
looks right to me
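For readers following along, here is a toy, standard-library-only sketch of the split the thread just worked out: the docstore maps node IDs to text, while the index store holds the keyword table mapping each keyword to the node IDs that contain it. This is not LlamaIndex's implementation; the names docstore, index_store, extract_keywords, build_keyword_table and lookup, and the naive "words longer than four letters" rule, are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Toy docstore: node ID -> node text (what shows up in the docstore collection).
docstore = {
    "node-1": "Chroma and Pinecone are vector stores.",
    "node-2": "MongoDB can back the docstore and the index store.",
}


def extract_keywords(text):
    """Naive stand-in for keyword extraction: lowercase words over four letters."""
    return {word for word in re.findall(r"[a-z]+", text.lower()) if len(word) > 4}


def build_keyword_table(docs):
    """Map each keyword to the set of node IDs whose text contains it."""
    table = defaultdict(set)
    for node_id, text in docs.items():
        for keyword in extract_keywords(text):
            table[keyword].add(node_id)
    return dict(table)


# Toy index store: keyword -> node IDs (what shows up in the index store collection).
index_store = build_keyword_table(docstore)


def lookup(keyword):
    """Resolve a keyword to node texts via the index store, then the docstore."""
    node_ids = sorted(index_store.get(keyword.lower(), ()))
    return [docstore[node_id] for node_id in node_ids]
```

A query goes through both stores: the keyword table gives node IDs, and the docstore turns those IDs back into text, which is why both have to be persisted.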
All this time I thought the IndexStore itself wasn't useful, because it only stores a very limited amount of info when paired with a VectorStore. Interesting.

I wonder why it stores the "summary" in the index store instead of, you know, storing the extracted keywords in the docstore itself, just like the embedding.
Anyway, thanks guys!