Updated 7 months ago

I have read quite a bit on

I have read quite a bit on DocumentSummaryIndex. However, I have not found a single example that stored this index in persistent storage such as a Chroma Database. Of course, I could write custom tools, but would rather not. Has anybody stored this type of index on a file system? I am interested in working examples. Since a summary index can be expensive to compute if there are many long documents, and it is meant to be reused many times, I surmise that such examples must exist. I am working on my laptop (i.e., not in the cloud). Thanks.

I am trying to work with a DocumentSummaryIndex together with Chroma, and I fear I have a serious misunderstanding. All the examples I have seen that discuss this particular index do so using VectorStoreIndex. The DocumentSummaryIndex is composed of nodes and summaries. Given a set of documents, I chunk them into nodes. I then save these nodes into a Chroma database with the idea of reloading them later to construct my DocumentSummaryIndex. Since I know that indexes can be persisted, I figured that the DocumentSummaryIndex could be stored in the Chroma database. Is this correct, or am I mistaken? If the former, I would really appreciate a minimal working example that demonstrates saving the nodes and the index to the database and reloading the data. I am working 100% with open source models.
4 comments
The document summary index uses the docstore and vector store attached to the storage context

You need both. And Chroma can't be a docstore
Thanks Logan. Is there an example somewhere? So you are saying to first generate the nodes, store them in a VectorStore (along with metadata, and even content), and then create a docstore for the document summary index?
no example I saw in my ctrl-f journey, but it's similar to other indexes

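from llama_index.core import DocumentSummaryIndex, StorageContext
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore
# ChromaVectorStore ships in the separate llama-index-vector-stores-chroma package
from llama_index.vector_stores.chroma import ChromaVectorStore

# could also use Redis or MongoDB docstores/index stores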
docstore = SimpleDocumentStore()
vector_store = ChromaVectorStore(...)
index_store = SimpleIndexStore()

# if left blank, they default to the "simple" versions, just showing this for consistency 
storage_context = StorageContext.from_defaults(
  vector_store=vector_store,
  docstore=docstore,
  index_store=index_store
)

index = DocumentSummaryIndex.from_documents(documents, ..., storage_context=storage_context)

# chroma saves automatically
# simple document store saves to disk, others would save automatically as well
index.storage_context.persist(persist_dir="./storage")

# then load -- pass the vector store again, since it's not saved in persist_dir
from llama_index.core import load_index_from_storage

storage_context = StorageContext.from_defaults(
  persist_dir="./storage",
  vector_store=vector_store
)
index = load_index_from_storage(storage_context)
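Since the summaries are expensive to compute, it may help to guard the build step so it only runs when nothing has been persisted yet. The following is a minimal sketch, not part of the snippet above: the helper names `has_persisted_index` and `load_or_build` are hypothetical, and the file names `docstore.json` and `index_store.json` are an assumption about what the "simple" stores write into `persist_dir`, so check your own `./storage` directory before relying on them.

```python
from pathlib import Path

# Assumed default file names written by the "simple" docstore and index store.
# Verify against your own persist_dir; they may differ between versions.
EXPECTED_FILES = ("docstore.json", "index_store.json")


def has_persisted_index(persist_dir):
    """Return True if persist_dir contains every file in EXPECTED_FILES."""
    root = Path(persist_dir)
    return all((root / name).is_file() for name in EXPECTED_FILES)


def load_or_build(persist_dir, load, build):
    """Call load() if a persisted index is found, otherwise call build().

    load and build are zero-argument callables supplied by the caller.
    Returns an (index, was_loaded) tuple.
    """
    if has_persisted_index(persist_dir):
        return load(), True
    return build(), False
```

Here `build` would wrap the `DocumentSummaryIndex.from_documents(...)` call followed by `index.storage_context.persist(persist_dir="./storage")`, and `load` would wrap the `load_index_from_storage(...)` call, both with the same Chroma vector store passed in.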
Thank you, @Logan M ! This will get me going.