LlamaIndex


Updated 2 years ago

Any art on how to properly summarize a


yourbuddyconner

Any art on how to properly summarize a SimpleVectorIndex? I am seeing that it picks out subsets of my document, and mode="tree_summarize" doesn't seem to be a thing on this index type.

Getting KeyError: <QueryMode.SUMMARIZE: 'summarize'> too...

Goal is to summarize the index such that it can be placed in a TreeIndex for hierarchical organization. And then facilitate vector querying at query-time for efficient retrieval.


22 comments

yourbuddyconner

Answer: response_mode="tree_summarize"

yourbuddyconner

Though, it's still producing weird summaries; need to dig in more...

yourbuddyconner

Something like this...

    # Legacy gpt-index style call; `index` is the existing vector index.
    summary = index.query(
        "Write a summary of the index and table of contents, "
        "including the title and author.",
        response_mode="tree_summarize",
        similarity_top_k=5,  # only the 5 most similar nodes get summarized
    )

yourbuddyconner

Having an issue with the summary for a particularly large text; this works for the smaller corpora.

BCM [wade.digital]

I loaded the gpt-index code into a vector index and asked it if it was possible to use vector indexes in composable ones and it said no. Not sure if that's accurate, was coming here to ask if that was true.

yourbuddyconner

Yeah it's definitely possible, I have a TreeIndex -> multiple VectorIndexes

yourbuddyconner

My issue is the summaries at each node in the TreeIndex seem to be coming from subsets of the document as opposed to the entire document.

Trying to have it grab the ToC but that doesn't seem to work.

Considering trying out the langchain HyDE workflow to see if that produces embeddings which summarize better...

@yourbuddyconner since SimpleVectorIndex first extracts the top-k nodes determined by similarity_top_k, tree summarize will only summarize over these extracted nodes.

If you want to summarize over all your nodes, I'd put everything into a list index, and then just run query with response_mode="tree_summarize"
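To make the point above concrete, here is a minimal pure-Python sketch of tree-style summarization. It is not the gpt-index implementation: `tree_summarize`, `fake_llm_combine`, and `fan_in` are hypothetical names, and the "LLM" is a stand-in that just joins its inputs so you can see which chunks reached the summary.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str],
    combine: Callable[[List[str]], str],
    fan_in: int = 4,
) -> str:
    """Merge groups of `fan_in` texts, level by level, until one text remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    while len(level) > 1:
        level = [combine(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def fake_llm_combine(texts: List[str]) -> str:
    # Hypothetical stand-in for an LLM summarization call:
    # it only concatenates, so the output lists every chunk it saw.
    return " ".join(texts)


nodes = [f"chunk{i}" for i in range(10)]

# Vector-index style: only the retrieved top-k nodes are summarized.
top_k_nodes = nodes[:3]  # assumption: pretend these were the 3 most similar
partial = tree_summarize(top_k_nodes, fake_llm_combine)

# List-index style: every node is summarized.
full = tree_summarize(nodes, fake_llm_combine)

print(partial)
print(full)
```

The summarizer can only cover what it is handed, so a small similarity_top_k yields a summary of a slice of the document rather than the whole thing.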

yourbuddyconner

Nice

yourbuddyconner

Will try that out

yourbuddyconner

So maybe a workflow would be:

Load corpus into list to summarize
Load corpus into vector and attach summary as text
Load vectors into tree for retrieval

?

For list index, yes it's a nice tool for complete summarization. For the latter two steps you could also try using the SimpleVectorIndex! It just embeds text chunks under the hood, and does top-k lookup during query-time. Can be good for more specific retrieval queries

yourbuddyconner

yeah brain LLM was summarizing lmao, referring to SimpleVectorIndex for sure

yourbuddyconner

Any ideas on making tree-summarization more efficient on large documents?

Seems to be a tradeoff between speed and correctness...

yourbuddyconner

E.g. the document I am working with has 2535 nodes in the ListIndex
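A rough way to see the speed side of that tradeoff is to count summarization calls. The sketch below assumes a simplified model where each call merges a fixed number of texts (`fan_in`, a hypothetical parameter); the real library packs chunks by token budget, so treat the numbers as order-of-magnitude only.

```python
import math


def count_summarize_calls(num_nodes: int, fan_in: int) -> int:
    """Count merge calls needed to reduce `num_nodes` texts to one summary."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    calls = 0
    remaining = num_nodes
    while remaining > 1:
        # One call per group at this level; each call emits one merged text.
        remaining = math.ceil(remaining / fan_in)
        calls += remaining
    return calls


print(count_summarize_calls(2535, 10))  # all nodes in the list index
print(count_summarize_calls(50, 10))    # a retrieved subset of 50 nodes
```

Under this model, 2535 nodes at 10 texts per call take 254 + 26 + 3 + 1 = 284 calls, while a 50-node subset takes 5 + 1 = 6, which is why narrowing the input first is tempting even though it risks dropping content.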

One idea is first using a SimpleVectorIndex with a larger top_k on a "prequery", e.g. "Give me documents that are relevant for this summary query I want to run". Then convert retrieved nodes back into Documents, feed into a list index
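The prequery idea can be sketched in plain Python as follows. This is an illustration under assumptions, not library code: `prequery`, `cosine`, and the two-dimensional toy embeddings are all made up for the example, and the selected texts would then be wrapped back into Documents and loaded into a list index for full summarization.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prequery(
    query_vec: Sequence[float],
    nodes: List[Tuple[str, Sequence[float]]],
    top_k: int,
) -> List[str]:
    """Return the texts of the `top_k` nodes most similar to the query."""
    ranked = sorted(nodes, key=lambda n: cosine(query_vec, n[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Toy (text, embedding) pairs; real embeddings would come from a model.
nodes = [
    ("intro", (1.0, 0.0)),
    ("toc", (0.9, 0.1)),
    ("appendix", (0.0, 1.0)),
    ("chapter", (0.7, 0.7)),
]

# Step 1: generous top_k so the later summary has broad coverage.
selected = prequery((1.0, 0.0), nodes, top_k=3)

# Step 2 (not shown): turn `selected` back into Documents and
# summarize all of them via a list index.
print(selected)
```

A larger top_k trades more summarization calls for better coverage, so it is the knob to tune here.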

yourbuddyconner

Nice yeah I saw you mentioned that to Dan earlier

haven't had the chance to include the new HYDE paper yet 🙂 e.g. using the implementation from langchain in a seamless way

yourbuddyconner

Word, I am just gonna kludge it for now (google a summary lol) and will figure out how to best programmatically summarize docs later

yourbuddyconner

Encourages me that we're still soooooooo early in the development of the tooling here

yourbuddyconner

You're a pioneer Jerry lol

thanks for the support!
