
Updated 9 months ago


At a glance

The community member is interested in storing a DocumentSummaryIndex in persistent storage, such as a Chroma Database, but has not found any working examples. They are trying to use the DocumentSummaryIndex together with Chroma, but are unsure if it is possible to store the index in the Chroma Database. Another community member responds that the DocumentSummaryIndex uses both a docstore and a vector store, and that Chroma cannot be used as the docstore. The community member then asks for an example of how to properly set up the storage context and persist the index. Another community member provides a code snippet demonstrating how to use a SimpleDocumentStore, ChromaVectorStore, and SimpleIndexStore to create a StorageContext and persist the DocumentSummaryIndex.

I have read quite a bit on DocumentSummaryIndex. However, I have not found a single example that stores this index in persistent storage such as a Chroma database. Of course, I could write custom tools, but I would rather not. Has anybody stored this type of index on a file system? I am interested in working examples. Since a summary index can be expensive to compute when there are many long documents, and it is meant to be reused many times, I surmise that such examples must exist. I am working on my laptop (i.e., not in the cloud). Thanks.

I am trying to work with a DocumentSummaryIndex together with Chroma, and I fear I have a serious misunderstanding. All the examples I have seen that discuss this particular index do so using a VectorStoreIndex. The DocumentSummaryIndex is composed of nodes and summaries. Given a set of documents, I chunk them into nodes. I then save these nodes into a Chroma database, with the idea of reloading them later to construct my DocumentSummaryIndex. Since I know that indexes can be persisted, I figured that the DocumentSummaryIndex could be stored in the Chroma database. Is this correct, or am I mistaken? If the former, I would really appreciate a minimal working example that demonstrates saving the nodes and the index to the database and reloading the data. I am working 100% with open-source models.
The document summary index uses the docstore and the vector store attached to the storage context.

You need both. And Chroma can't be a docstore.
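To make the "you need both" point concrete, here is a toy, library-free Python model of how (as we understand it) a document summary index splits its data: chunk text and summary text live in a docstore, a summary-to-chunks mapping lives in an index store, and only the summary embeddings live in the vector store. `ToyDocumentSummaryIndex`, `embed`, and the word-count "embedding" are hypothetical stand-ins, not LlamaIndex APIs; the sketch only illustrates why retrieval needs both stores and why the vector store is handed back in on reload.

```python
import json
import math
import tempfile
from collections import Counter
from pathlib import Path


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyDocumentSummaryIndex:
    """Hypothetical toy model of the storage split, not LlamaIndex's class."""

    def __init__(self, vector_store, docstore=None, index_store=None):
        self.vector_store = vector_store      # summary_id -> summary embedding
        self.docstore = docstore or {}        # node_id -> text of chunks and summaries
        self.index_store = index_store or {}  # summary_id -> ids of that document's chunks

    def add_document(self, doc_id, chunks, summary):
        chunk_ids = [f"{doc_id}-chunk-{i}" for i in range(len(chunks))]
        self.docstore.update(zip(chunk_ids, chunks))
        summary_id = f"{doc_id}-summary"
        self.docstore[summary_id] = summary
        self.index_store[summary_id] = chunk_ids
        self.vector_store[summary_id] = embed(summary)  # only summaries are embedded

    def retrieve(self, query):
        """Find the closest summary in the vector store, then fetch its chunks from the docstore."""
        q = embed(query)
        best = max(self.vector_store, key=lambda sid: cosine(q, self.vector_store[sid]))
        return [self.docstore[node_id] for node_id in self.index_store[best]]

    def persist(self, persist_dir):
        """Write docstore and index store to disk; the vector store is assumed to persist itself."""
        path = Path(persist_dir)
        path.mkdir(parents=True, exist_ok=True)
        (path / "docstore.json").write_text(json.dumps(self.docstore))
        (path / "index_store.json").write_text(json.dumps(self.index_store))

    @classmethod
    def load(cls, persist_dir, vector_store):
        """Rebuild from disk; the vector store has to be handed back in."""
        path = Path(persist_dir)
        docstore = json.loads((path / "docstore.json").read_text())
        index_store = json.loads((path / "index_store.json").read_text())
        return cls(vector_store, docstore, index_store)


vector_store = {}  # stands in for a store that persists itself, such as Chroma
index = ToyDocumentSummaryIndex(vector_store)
index.add_document("cats", ["Cats sleep a lot.", "Cats purr."], "A document about cats and their habits")
index.add_document("rust", ["Rust has a borrow checker."], "A document about the Rust programming language")

with tempfile.TemporaryDirectory() as tmp:
    index.persist(tmp)
    reloaded = ToyDocumentSummaryIndex.load(tmp, vector_store)

print(reloaded.retrieve("tell me about cats"))  # → ['Cats sleep a lot.', 'Cats purr.']
```

Dropping either store breaks `retrieve`: the vector store alone only knows summary ids and embeddings, and the docstore alone has nothing to search by similarity.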
Thanks, Logan. Is there an example somewhere? So you are saying to first generate the nodes, store them in a VectorStore (along with metadata, and even content), and then create a docstore for the document summary index?
No example I saw in my ctrl-f journey, but it's similar to other indexes:

```python
from llama_index.core import DocumentSummaryIndex, StorageContext, load_index_from_storage
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore
from llama_index.vector_stores.chroma import ChromaVectorStore

# could also use redis, mongodb, etc. docstores/index stores
docstore = SimpleDocumentStore()
vector_store = ChromaVectorStore(...)  # configure with your Chroma collection
index_store = SimpleIndexStore()

# if left blank, they default to the "simple" versions; just showing this for consistency
storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
    docstore=docstore,
    index_store=index_store,
)

# `documents` are your loaded Document objects
index = DocumentSummaryIndex.from_documents(documents, ..., storage_context=storage_context)

# chroma saves automatically (when backed by a persistent chroma client)
# the simple docstore/index store are written to disk here; other backends save automatically as well
index.storage_context.persist(persist_dir="./storage")

# then load -- pass the vector store back in, since persist() does not write chroma's data to ./storage
index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="./storage", vector_store=vector_store)
)
```
Thank you, @Logan M! This will get me going.