Updated 3 months ago

SummaryIndex persistence

Hey, I am trying to use the SummaryIndex mechanism for the agent tool.

I have:
Plain Text
summary_index = SummaryIndex(nodes)
summary_index.storage_context.persist(os.path.join(persist_dir, "summary_index"))

summary_query_engine = summary_index.as_query_engine(
    llm=self.model,
    response_mode="tree_summarize",
    use_async=True,
)
summary_tool = QueryEngineTool.from_defaults(
    name=f"summary_tool_{class_name}",
    query_engine=summary_query_engine,
    description=(f"Useful for summarization questions related to {class_name}"),
)

I cannot figure out how to persist both the vector_index and the summary_index on disk so I do not have to regenerate them. How do you recommend I do that?

Also, how do I check that the summary mechanism is even working? summary_index.summary is None, which tells me something is off. Is the summary text generated and stored somewhere by any chance?
4 comments
@WhiteFang_Jr would you know? 👆
I feel like I am doing this wrong. Am I supposed to save the summary index in a separate location, or store it in the same place as the vector index?
I think SummaryIndex only breaks the docs into nodes. When you ask a query, it iterates over all the nodes to formulate the final answer, so no summary text is generated or stored ahead of time.
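
To make that concrete, here is a rough sketch of the idea behind a tree-style summarize step: the node texts are merged group by group at query time until one answer is left. This is only an illustration in plain Python, not LlamaIndex's actual implementation. The names tree_summarize, fake_llm_combine and fan_in are made up for this example, and fake_llm_combine stands in for a real LLM call.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str],
    combine: Callable[[List[str]], str],
    fan_in: int = 2,
) -> str:
    """Merge groups of `fan_in` texts, level by level, until one text remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    while len(level) > 1:
        # Each pass replaces every group of `fan_in` texts with one combined text.
        level = [combine(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def fake_llm_combine(texts: List[str]) -> str:
    """Hypothetical stand-in for an LLM summarization call."""
    return "(" + " + ".join(texts) + ")"


if __name__ == "__main__":
    print(tree_summarize(["a", "b", "c"], fake_llm_combine))
```

Nothing in this flow produces a summary before a query arrives, which fits with summary_index.summary being None.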

Persisting should work. Does index.storage_context.persist(persist_dir='dir_name') not work?
summary_index.summary is not relevant

As long as both indexes have the same storage context, you can persist both to disk:

Plain Text
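# Import path for llama_index >= 0.10; older releases import these from `llama_index` directly
from llama_index.core import StorageContext, SummaryIndex, VectorStoreIndex, load_index_from_storage

# Build both indexes on one shared storage context
storage_context = StorageContext.from_defaults()

vector_index = VectorStoreIndex(..., storage_context=storage_context)
vector_index.set_index_id("vector")
summary_index = SummaryIndex(..., storage_context=storage_context)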
summary_index.set_index_id("summary")

storage_context.persist(...)

storage_context = StorageContext.from_defaults(persist_dir="...")
vector_index = load_index_from_storage(storage_context, index_id="vector")
summary_index = load_index_from_storage(storage_context, index_id="summary")
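
To avoid regenerating the indexes on every run, one option is a small "load if present, otherwise build" wrapper around the snippet above. This is a minimal sketch under assumptions: load_or_build is a hypothetical helper, not part of LlamaIndex, and it treats any non-empty persist directory as a valid saved state. The load and build callables are yours to supply.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    persist_dir: str,
    load: Callable[[str], T],
    build: Callable[[str], T],
) -> T:
    """Call `load` if `persist_dir` already has files in it, else call `build`.

    Assumption: a non-empty directory means a previous run persisted there.
    `build` is expected to create the indexes and persist them to `persist_dir`.
    """
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load(persist_dir)
    return build(persist_dir)
```

Here build would create both indexes on a shared storage context and call storage_context.persist(persist_dir=...), while load would call StorageContext.from_defaults(persist_dir=...) followed by load_index_from_storage for each index ID, as in the reply above.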