Updated last year

Is it possible to store a

At a glance

The community member is trying to store a DocumentSummaryIndex in a Chroma Vector store, but is encountering issues. The VectorStoreIndex has a .from_vector_store() method, but DocumentSummaryIndex does not. When trying to load the index, the community member gets an error about no persist_dir. The community member then persists the DocumentSummaryIndex to a file and loads it back.

The comments clarify that the DocumentSummaryIndex still requires a docstore: only the summaries are embedded and inserted into the vector database. When querying, a summary is retrieved, and all the nodes that make up that document are pulled from the docstore and passed to the LLM for the RAG Q&A.

Is it possible to store a DocumentSummaryIndex in a Chroma vector store? VectorStoreIndex has a .from_vector_store() method, but DocumentSummaryIndex does not. When I call load_index_from_storage(storage_context=storage_context, service_context=service_context), I get this error about a missing persist_dir: ValueError: No index in storage context, check if you specified the right persist_dir. Loading only works after I .persist() the DocumentSummaryIndex to a file.

Here is how I build and persist the index:

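    # Imports assume the legacy (pre-0.10) llama_index package layout;
    # newer releases moved these modules, so adjust for your installed version.
    import chromadb
    from llama_index import DocumentSummaryIndex, ServiceContext, StorageContext
    from llama_index.vector_stores import ChromaVectorStore

    # docs, chatgpt, extractors, embedding and system_prompt are defined elsewhere.
    db = chromadb.PersistentClient(path="./chroma_db")
    chroma_collection = db.get_or_create_collection("test")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection, persist_dir="./chroma_db")
    storage_context = StorageContext.from_defaults(vector_store=vector_store, persist_dir="./chroma_db")
    service_context = ServiceContext.from_defaults(
        llm=chatgpt,
        transformations=extractors,
        embed_model=embedding,
        system_prompt=system_prompt,
    )

    # Summaries are embedded into Chroma; the nodes themselves go to the docstore.
    doc_summary_index = DocumentSummaryIndex.from_documents(
        documents=docs,
        storage_context=storage_context,
        service_context=service_context,
        show_progress=True,
    )
    # Write the docstore and index store to disk next to the Chroma data.
    doc_summary_index.storage_context.persist(persist_dir="./chroma_db")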


And then loading it back afterwards:

    # Same legacy (pre-0.10) import path assumption as above.
    from llama_index import load_index_from_storage

    doc_summary_index = load_index_from_storage(storage_context=storage_context, service_context=service_context)

    query_engine = doc_summary_index.as_query_engine(
        response_mode="tree_summarize", use_async=True, service_context=service_context
    )
7 comments
DocumentSummaryIndex still requires a docstore; only the summaries are embedded and inserted into the vector db
ah ok thats where i was getting confused.
How is the docstore used when querying then?
So when a summary is retrieved, it's replaced by all the nodes that make up that document. Those nodes live in the docstore
ah i see, the summary has a reference to all the nodes (split by SentenceSplitter or whatever), so whichever summary gets chosen for the query, it then pulls all the nodes with that summary as a parent to pass to the LLM for the RAG Q&A
correct ✅
super helpful thank u
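
The flow described in the comments above can be sketched as a toy model in plain Python. This is not LlamaIndex code: ToyDocSummaryIndex, its fields, and the word-overlap "embedding" are all hypothetical stand-ins, chosen only to show why the vector db alone is not enough and a docstore is still needed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyDocSummaryIndex:
    """Hypothetical toy model of the thread's explanation, not a real LlamaIndex class."""

    # node_id -> chunk text (plays the role of the docstore)
    docstore: Dict[str, str] = field(default_factory=dict)
    # summary_id -> ids of the nodes that make up that document
    summary_to_nodes: Dict[str, List[str]] = field(default_factory=dict)
    # summary_id -> stand-in "embedding" (plays the role of the vector db)
    vector_db: Dict[str, Set[str]] = field(default_factory=dict)

    def add_document(self, doc_id: str, chunks: List[str], summary: str) -> None:
        node_ids = []
        for i, chunk in enumerate(chunks):
            node_id = f"{doc_id}-node-{i}"
            self.docstore[node_id] = chunk  # nodes live only in the docstore
            node_ids.append(node_id)
        summary_id = f"{doc_id}-summary"
        self.summary_to_nodes[summary_id] = node_ids
        # Only the summary is "embedded"; a set of lowercase words
        # stands in for a real embedding vector here.
        self.vector_db[summary_id] = set(summary.lower().split())

    def retrieve(self, query: str) -> List[str]:
        query_words = set(query.lower().split())
        # Pick the summary with the largest word overlap with the query.
        best = max(self.vector_db, key=lambda sid: len(self.vector_db[sid] & query_words))
        # Replace the chosen summary with every node of its document.
        return [self.docstore[node_id] for node_id in self.summary_to_nodes[best]]


index = ToyDocSummaryIndex()
index.add_document("cats", ["Cats sleep a lot.", "Cats purr when content."], "A document about cats")
index.add_document("rust", ["Rust has ownership.", "Borrowing is checked at compile time."],
                   "A document about the Rust language")
chunks = index.retrieve("tell me about cats")
```

In this sketch the vector db holds one entry per document summary, while the chunks that would be handed to the LLM come entirely from the docstore. If only the vector db were reloaded, the summary lookup would still succeed but there would be no nodes to return, which matches the need to persist the docstore alongside Chroma.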