Is it possible to store a DocumentSummaryIndex in a Chroma vector store?
Is it possible to store a DocumentSummaryIndex in a Chroma vector store? VectorStoreIndex has a .from_vector_store() method, but DocumentSummaryIndex does not. When I try load_index_from_storage(storage_context=storage_context, service_context=service_context) I get an error about a missing persist_dir: ValueError: No index in storage context, check if you specified the right persist_dir. It seems I'm required to .persist() the DocumentSummaryIndex to a file first.

Here is how I build and persist the index:

Plain Text
    import chromadb
    from llama_index import (
        DocumentSummaryIndex,
        ServiceContext,
        StorageContext,
        load_index_from_storage,
    )
    from llama_index.vector_stores import ChromaVectorStore

    # Chroma collection backing the vector store
    db = chromadb.PersistentClient(path="./chroma_db")
    chroma_collection = db.get_or_create_collection("test")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection, persist_dir="./chroma_db")
    storage_context = StorageContext.from_defaults(vector_store=vector_store, persist_dir="./chroma_db")
    service_context = ServiceContext.from_defaults(
        llm=chatgpt,
        transformations=extractors,
        embed_model=embedding,
        system_prompt=system_prompt,
    )

    # Build the index and persist its storage context (docstore, index store) to disk
    doc_summary_index = DocumentSummaryIndex.from_documents(
        documents=docs,
        storage_context=storage_context,
        service_context=service_context,
        show_progress=True,
    )
    doc_summary_index.storage_context.persist(persist_dir="./chroma_db")


And then loading it back afterwards:

Plain Text
    # Reload the index from the persisted storage context
    doc_summary_index = load_index_from_storage(
        storage_context=storage_context, service_context=service_context
    )

    query_engine = doc_summary_index.as_query_engine(
        response_mode="tree_summarize", use_async=True, service_context=service_context
    )
7 comments
The document summary index still requires a docstore; only the summaries are embedded and inserted into the vector DB.
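For illustration, here's a minimal sketch of the loading pattern that implies, assuming llama-index 0.9-style imports and that the docstore/index store were persisted to ./chroma_db as in the snippets above (same collection name and paths):

Plain Text
    import chromadb
    from llama_index import StorageContext, load_index_from_storage
    from llama_index.vector_stores import ChromaVectorStore

    # Reconnect to the chroma collection that holds the summary embeddings
    db = chromadb.PersistentClient(path="./chroma_db")
    chroma_collection = db.get_or_create_collection("test")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

    # The docstore and index store were written to ./chroma_db by
    # storage_context.persist(), so point the storage context at both
    # the vector store and that persist_dir
    storage_context = StorageContext.from_defaults(
        vector_store=vector_store,
        persist_dir="./chroma_db",
    )

    # With the index store present on disk, load_index_from_storage can rebuild the index
    doc_summary_index = load_index_from_storage(storage_context=storage_context)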
Ah ok, that's where I was getting confused.
How is the docstore used when querying then?
So when a summary is retrieved, it's replaced by all the nodes that make up that document. Those nodes live in the docstore.
Ah I see, the summary has a reference to all the nodes (split by SentenceSplitter or whatever), so whichever summary gets chosen for the query, it then pulls all the nodes with that summary as a parent to pass to the LLM for the RAG Q&A.
Correct ✅
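As a rough illustration of that retrieval flow (assuming doc_summary_index was built or loaded as above; the query string here is made up):

Plain Text
    # Default retriever for a DocumentSummaryIndex: the query is matched
    # against the per-document summaries
    retriever = doc_summary_index.as_retriever()

    # The winning summary is swapped for that document's chunk nodes,
    # which are fetched from the docstore and returned for the LLM to
    # synthesize over
    nodes = retriever.retrieve("What does the report say about revenue?")
    for node_with_score in nodes:
        print(node_with_score.node.ref_doc_id, node_with_score.node.get_content()[:80])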
Super helpful, thank you!