SummaryIndex persistence

At a glance

The community member is trying to use the SummaryIndex mechanism for an agent tool. They have created a summary_index and persisted it to disk, but they are unsure how to persist both the vector_index and the summary_index so they don't have to regenerate them. They also want to check if the summary mechanism is working, as summary_index.summary is None.

In the comments, another community member explains that the summary_index only breaks the documents into nodes; the final answer is generated by iterating over all the nodes when a query is made. They also ask whether calling index.storage_context.persist(persist_dir='dir_name') works.

Another community member gives a more detailed answer: summary_index.summary is not relevant, and as long as both indexes share the same storage_context, they can be persisted together. Their example code creates the shared storage_context, sets an ID on each index, persists once, and then loads each index from storage by its ID.

ccmosguy

Hey, I am trying to use the SummaryIndex mechanism for the agent tool.

I have:

Plain Text

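import os

from llama_index.core import SummaryIndex  # llama-index >= 0.10 import path
from llama_index.core.tools import QueryEngineTool

# Runs inside a class method: nodes, persist_dir, class_name and self.model
# come from the surrounding code.
summary_index = SummaryIndex(nodes)
summary_index.storage_context.persist(os.path.join(persist_dir, "summary_index"))

summary_query_engine = summary_index.as_query_engine(
    llm=self.model,
    response_mode="tree_summarize",  # summarizes over all nodes at query time
    use_async=True,
)
summary_tool = QueryEngineTool.from_defaults(
    name=f"summary_tool_{class_name}",
    query_engine=summary_query_engine,
    description=f"Useful for summarization questions related to {class_name}",
)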

I cannot figure out how to persist both the vector_index and the summary_index on disk so I do not have to regenerate them. How do you recommend I do that?

Also, how do I check that the summary mechanism is even working? summary_index.summary is None, which tells me something is off. Is the summary text generated and stored somewhere by any chance?


ccmosguy

@WhiteFang_Jr would you know 👆

ccmosguy

I feel like I am doing this wrong. Am I supposed to save the summary index in a separate location, or store it in the same place as the vector index tool?


WhiteFang_Jr

I think SummaryIndex only breaks the docs into nodes, and when you ask a query it iterates over all the nodes to formulate the final answer.
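The query-time behaviour described above can be illustrated with a toy, LlamaIndex-free sketch of tree summarization (the response_mode="tree_summarize" used in the question). It assumes a hypothetical summarize(query, texts) callable standing in for the LLM and a fixed fan_in group size; the library's synthesizer, roughly speaking, packs chunks to fit the model's context window instead, so this is an illustration of the idea rather than its implementation. What it shows: nothing is summarized ahead of time, so there is no stored summary text to inspect, and every query re-reads every node.

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM call: (query, texts) -> one summary string
Summarizer = Callable[[str, List[str]], str]


def tree_summarize(
    chunks: List[str], query: str, summarize: Summarizer, fan_in: int = 2
) -> str:
    """Toy query-time tree summarization over *all* node texts.

    `fan_in` is a simplification: a fixed group size instead of
    packing chunks into a context window.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be >= 2 so each level shrinks")
    level = list(chunks)
    while True:
        # Summarize each group of up to `fan_in` texts into one partial answer
        level = [
            summarize(query, level[i : i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
        if len(level) == 1:
            # Root of the tree: the final answer, computed fresh for this query
            return level[0]
```

With a fake summarizer such as `lambda q, texts: "(" + "+".join(texts) + ")"`, calling it on `["a", "b", "c", "d"]` returns `"((a+b)+(c+d))"`: three summarizer calls for four nodes, all made at query time.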

Persisting should work. Does index.storage_context.persist(persist_dir='dir_name') not work?
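To avoid regenerating an index on every run, the usual shape is "load if already persisted, otherwise build and persist". Below is a library-agnostic toy sketch of that guard; the build callable and the index.json file name are hypothetical stand-ins. With LlamaIndex, the load branch would instead go through StorageContext.from_defaults(persist_dir=...) and load_index_from_storage, as in the thread.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Tuple

IndexData = Dict[str, str]  # hypothetical: node_id -> node text


def load_or_build(persist_dir: str, build: Callable[[], IndexData]) -> IndexData:
    """Load persisted data if it exists; otherwise build once and persist it."""
    path = os.path.join(persist_dir, "index.json")  # hypothetical file name
    if os.path.exists(path):
        # Cache hit: skip the expensive build entirely
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    data = build()  # expensive step, e.g. chunking documents into nodes
    os.makedirs(persist_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data


def demo() -> Tuple[bool, int]:
    """Call load_or_build twice on one directory and count the builds."""
    builds = []

    def build() -> IndexData:
        builds.append(1)
        return {"n1": "alpha", "n2": "beta"}

    with tempfile.TemporaryDirectory() as d:
        first = load_or_build(d, build)
        second = load_or_build(d, build)
    return first == second, len(builds)
```

`demo()` returns `(True, 1)`: the second call reads the data back from disk and never calls `build`.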

Logan M

summary_index.summary is not relevant here.

As long as both indexes share the same storage context, you can persist both to disk together.

Plain Text

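from llama_index.core import (  # llama-index >= 0.10 import path
    StorageContext,
    SummaryIndex,
    VectorStoreIndex,
    load_index_from_storage,
)

# Both indexes share one storage context (replace ... with your nodes / persist dir)
storage_context = StorageContext.from_defaults()

vector_index = VectorStoreIndex(..., storage_context=storage_context)
vector_index.set_index_id("vector")
summary_index = SummaryIndex(..., storage_context=storage_context)
summary_index.set_index_id("summary")

storage_context.persist(...)  # writes both indexes to the same directory

# Later: reload the shared context and fetch each index by its ID
storage_context = StorageContext.from_defaults(persist_dir="...")
vector_index = load_index_from_storage(storage_context, index_id="vector")
summary_index = load_index_from_storage(storage_context, index_id="summary")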
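Why one shared storage context is enough can be shown with a toy model (not LlamaIndex's implementation): nodes go into a single docstore, and each index only records its own ID plus the node IDs it uses, so both indexes persist into one directory and are picked out again by ID. The class, function and file names here are hypothetical, chosen to echo the docstore / index store split; a real persist directory also holds other files, such as the vector store.

```python
import json
import os
import tempfile
from typing import Dict, List


class ToyStorageContext:
    """Hypothetical stand-in for a storage context shared by several indexes."""

    def __init__(self) -> None:
        self.docstore: Dict[str, str] = {}  # node_id -> text, stored once
        self.index_store: Dict[str, dict] = {}  # index_id -> index metadata

    def add_index(self, index_id: str, kind: str, nodes: Dict[str, str]) -> None:
        # Shared nodes are deduplicated by ID in the common docstore
        self.docstore.update(nodes)
        self.index_store[index_id] = {"kind": kind, "node_ids": sorted(nodes)}

    def persist(self, persist_dir: str) -> None:
        os.makedirs(persist_dir, exist_ok=True)
        for name, data in (
            ("docstore.json", self.docstore),
            ("index_store.json", self.index_store),
        ):
            with open(os.path.join(persist_dir, name), "w", encoding="utf-8") as f:
                json.dump(data, f)

    @classmethod
    def from_persist_dir(cls, persist_dir: str) -> "ToyStorageContext":
        ctx = cls()
        with open(os.path.join(persist_dir, "docstore.json"), encoding="utf-8") as f:
            ctx.docstore = json.load(f)
        with open(os.path.join(persist_dir, "index_store.json"), encoding="utf-8") as f:
            ctx.index_store = json.load(f)
        return ctx


def load_toy_index(ctx: ToyStorageContext, index_id: str) -> dict:
    """Rebuild one index by ID from the shared store (KeyError if unknown)."""
    meta = ctx.index_store[index_id]
    texts: List[str] = [ctx.docstore[n] for n in meta["node_ids"]]
    return {"kind": meta["kind"], "nodes": texts}


def round_trip_demo() -> dict:
    """Persist two indexes into one directory, then reload one of them by ID."""
    nodes = {"n1": "alpha", "n2": "beta"}  # hypothetical nodes
    ctx = ToyStorageContext()
    ctx.add_index("vector", "vector", nodes)
    ctx.add_index("summary", "summary", nodes)
    with tempfile.TemporaryDirectory() as d:
        ctx.persist(d)
        files = sorted(os.listdir(d))
        reloaded = ToyStorageContext.from_persist_dir(d)
    return {
        "files": files,
        "summary": load_toy_index(reloaded, "summary"),
        "docstore_size": len(reloaded.docstore),
    }
```

`round_trip_demo()` shows both indexes coming back from one directory while the docstore holds each node only once (`docstore_size` is 2, not 4).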
