
Updated 2 years ago


Any prior art on how to properly summarize a SimpleVectorIndex? I'm seeing that it picks out subsets of my document, and mode="tree_summarize" doesn't seem to be supported on this index type.

Getting KeyError: <QueryMode.SUMMARIZE: 'summarize'> too...

The goal is to summarize the index so that it can be placed in a TreeIndex for hierarchical organization, and then use vector querying at query time for efficient retrieval.
22 comments
Answer: response_mode="tree_summarize"
Though it's still producing weird summaries; I need to dig in more...
Something like this...

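    # `index` is the SimpleVectorIndex built earlier. With similarity_top_k=5,
    # only the 5 most similar chunks are passed on to tree_summarize.
    summary = index.query(
        "Write a summary of the index and table of contents, "
        "including the title and author.",
        response_mode="tree_summarize",
        similarity_top_k=5,
    )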
I'm having an issue with the summary for a particularly large text; this works for the smaller corpora.
I loaded the gpt-index code into a vector index and asked it whether vector indexes can be used in composable ones, and it said no. Not sure if that's accurate; I was coming here to ask.
Yeah it's definitely possible, I have a TreeIndex -> multiple VectorIndexes
My issue is the summaries at each node in the TreeIndex seem to be coming from subsets of the document as opposed to the entire document.

I'm trying to have it grab the ToC, but that doesn't seem to work.

I'm considering trying out the LangChain HyDE workflow to see if it produces embeddings that summarize better...
@yourbuddyconner since SimpleVectorIndex first extracts the top-k nodes determined by similarity_top_k, tree_summarize will only summarize over those extracted nodes.

If you want to summarize over all your nodes, I'd put everything into a list index and then just run a query with response_mode="tree_summarize".
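To make that difference concrete, here is a minimal, library-free sketch. It is not gpt-index's implementation: `summarize_chunk` is a hypothetical stand-in for one LLM call (it only wraps its inputs in parentheses so you can see which nodes were covered), and `fan_in`, the number of texts merged per call, is an assumed parameter. The vector-index path hands only the `top_k` best-scoring nodes to the bottom-up summarizer, while the list-index path hands it every node.

```python
from typing import Callable, List

def summarize_chunk(texts: List[str]) -> str:
    # Hypothetical stand-in for one LLM call: wraps its inputs in
    # parentheses so the final string shows which nodes were covered.
    return "(" + " ".join(texts) + ")"

def tree_summarize(nodes: List[str], fan_in: int = 4,
                   llm: Callable[[List[str]], str] = summarize_chunk) -> str:
    # Bottom-up: summarize groups of `fan_in` texts, then summarize those
    # summaries, repeating until a single summary remains.
    if not nodes:
        raise ValueError("nothing to summarize")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(nodes)
    while True:
        level = [llm(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]

def vector_index_summary(nodes: List[str], scores: List[float],
                         top_k: int = 5, fan_in: int = 4) -> str:
    # Vector-index path: only the top_k highest-scoring nodes reach the summarizer.
    ranked = sorted(range(len(nodes)), key=lambda i: scores[i], reverse=True)
    return tree_summarize([nodes[i] for i in ranked[:top_k]], fan_in)

def list_index_summary(nodes: List[str], fan_in: int = 4) -> str:
    # List-index path: every node is summarized.
    return tree_summarize(nodes, fan_in)

nodes = [f"node{i:02d}" for i in range(12)]
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6, 0.4, 0.95, 0.15, 0.5]
print(vector_index_summary(nodes, scores))  # covers 5 of the 12 nodes
print(list_index_summary(nodes))            # covers all 12 nodes
```

Any node outside the top 5 (here `node00`) simply never appears in the vector-index summary. That matches the "summaries from subsets of the document" symptom described above.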
So maybe a workflow would be:
  • Load corpus into list to summarize
  • Load corpus into vector and attach summary as text
  • Load vectors into tree for retrieval
?
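The three steps above can be sketched as a tiny composable structure. This is a hedged illustration, not gpt-index's API. `SubIndex`, `embed`, and `route_and_retrieve` are hypothetical names. The bag-of-words `embed` stands in for a real embedding model, and each `summary` is assumed to have already been produced by the list-index step. The root compares the query against each child's summary, descends into the best match, and then runs a top-k lookup over that child's chunks.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" (assumption); a real setup would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class SubIndex:
    # Hypothetical child node: a whole-document summary (step 1) plus
    # embedded chunks for top-k lookup (step 2).
    summary: str
    chunks: List[str]
    vectors: List[Counter] = field(init=False)

    def __post_init__(self) -> None:
        self.vectors = [embed(c) for c in self.chunks]

    def top_k(self, query: str, k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(range(len(self.chunks)),
                        key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.chunks[i] for i in ranked[:k]]

def route_and_retrieve(root: List[SubIndex], query: str, k: int = 2) -> List[str]:
    # Step 3: pick the child whose summary best matches the query,
    # then do top-k retrieval inside that child only.
    q = embed(query)
    best = max(root, key=lambda child: cosine(q, embed(child.summary)))
    return best.top_k(query, k)

docs = [
    SubIndex("a cookbook about baking bread and pastry",
             ["knead the bread dough", "bake pastry at high heat", "history of the author"]),
    SubIndex("a manual for how to configure the network router",
             ["set the router password", "configure dhcp on the router", "warranty terms"]),
]
print(route_and_retrieve(docs, "how do I configure the router", k=2))
```

The whole-document summary is only used for routing. The answer itself still comes from the top-k chunks, which is the split of responsibilities the workflow proposes.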
For the list index, yes, it's a nice tool for complete summarization. For the latter two steps you could also try the SimpleVectorIndex! It just embeds text chunks under the hood and does a top-k lookup at query time, which can be good for more specific retrieval queries.
Yeah, my brain LLM was summarizing lmao; I definitely meant SimpleVectorIndex.
Any ideas on making tree-summarization more efficient on large documents?

Seems to be a tradeoff between speed and correctness...
E.g., the document I'm working with has 2535 nodes in the ListIndex.
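To see why 2535 nodes is slow, you can count the LLM calls in a bottom-up tree summarization. The sketch assumes each call merges at most `fan_in` texts. That parameter is hypothetical, because the real number depends on chunk size and the model's context window. Each level needs ceil(n / fan_in) calls, and levels repeat until one summary remains.

```python
import math

def tree_summarize_calls(n_nodes: int, fan_in: int) -> int:
    # Count LLM calls when each call merges up to `fan_in` texts into one
    # summary, level by level, until a single summary is left.
    if n_nodes < 1 or fan_in < 2:
        raise ValueError("need n_nodes >= 1 and fan_in >= 2")
    calls, level = 0, n_nodes
    while True:
        level = math.ceil(level / fan_in)
        calls += level
        if level == 1:
            return calls

for fan_in in (5, 10, 20):
    print(fan_in, tree_summarize_calls(2535, fan_in))
# 5 636
# 10 284
# 20 135
```

For example, with `fan_in=10` the levels cost 254 + 26 + 3 + 1 = 284 calls. The first level dominates, so cost grows roughly linearly with the node count. A top-5 vector query needs only one call (`tree_summarize_calls(5, 10) == 1`), which is the speed-versus-coverage tradeoff noted above.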
One idea is first using a SimpleVectorIndex with a larger top_k on a "prequery", e.g. "Give me documents that are relevant for this summary query I want to run". Then convert retrieved nodes back into Documents, feed into a list index
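The two-stage "prequery" idea can be sketched as a filter that runs before summarization. This is a hedged, library-free illustration. `prequery_filter` is a hypothetical name, and the vectors are made-up toy embeddings rather than output from a real model. Stage 1 ranks every chunk by cosine similarity to the prequery embedding and keeps a generous `top_k`. The kept chunks are then put back into their original document order, on the assumption that a list index should read them in sequence when producing the summary.

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def prequery_filter(chunks: List[str], vectors: List[Sequence[float]],
                    query_vec: Sequence[float], top_k: int) -> List[str]:
    # Stage 1: rank all chunks by similarity to the prequery embedding and
    # keep the top_k (deliberately larger than a normal query's top_k).
    ranked = sorted(range(len(chunks)),
                    key=lambda i: cosine(query_vec, vectors[i]), reverse=True)
    keep = sorted(ranked[:top_k])
    # Restore document order so the summarization stage reads chunks in sequence.
    return [chunks[i] for i in keep]

chunks = ["intro", "methods", "appendix", "results", "index"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2], [0.1, 0.9]]
print(prequery_filter(chunks, vectors, [1.0, 0.0], top_k=3))
# ['intro', 'methods', 'results']
```

In gpt-index terms, the returned texts would then be turned back into Documents and loaded into a list index for the `tree_summarize` pass, as suggested above. Only `top_k` chunks get summarized instead of all 2535.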
Nice yeah I saw you mentioned that to Dan earlier
haven't had the chance to include the new HyDE paper yet 🙂 e.g. using the implementation from LangChain in a seamless way
Word, I am just gonna kludge it for now (google a summary lol) and will figure out how to best programmatically summarize docs later
Encourages me that we're still soooooooo early in the development of the tooling here
You're a pioneer Jerry lol
thanks for the support!