
Updated 2 months ago

Include summary

looking at the source code it seems like include_summary is deprecated? Is there still some way to use/does it still currently use the actual summary text during refinement?
16 comments
Sorry, where is this option used? It doesn't ring any immediate bells lol
You can see it commented out in a bunch of the example notebooks (e.g. https://colab.research.google.com/drive/1uL1TdMbR4kqa0Ksrd_Of_jWSxWt1ia7o#scrollTo=82b43d58-5753-4035-9ea6-f8bfa860f89c&line=1&uniqifier=1) and it's also referenced in this PR (https://github.com/jerryjliu/llama_index/pull/148). It seems like a useful function, but after looking at the code, it doesn't seem used anywhere ( https://github.com/jerryjliu/llama_index/search?q=include_summary )?
Oh wait, is it used? Lol
Doesn't seem like it xD It's just set on init now?
Yeaaa I see, it's not used lol.

I think the main point of the summary was to use it in the composable graphs πŸ€” I can't say why it was removed from refinement tbh
So I'm not exactly sure what the flow is for composable graphs. When I query my composable graph (a tree in embedding mode on top of a bunch of vector stores), does it first check the embedding similarity of each vector store's summary against the query before doing the embedding search in the vector store itself? Or does it first do the embedding search and then pass the results up to the tree to pick num_branch_children vector store sources?
Ngl with the tree, it's a bit of a mystery to me too.

At a high level, a tree index on top of your vector indexes organizes the summaries of each vector index into a tree, where each parent node summarizes its two child nodes

But for the query (normal and embedding), I haven't looked at the source code enough to understand how it works.

Maybe @jerryjliu0 has a good summary of how tree queries work on hand haha
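The bottom-up summarization described above can be sketched roughly like this. This is a toy illustration, not the actual llama_index implementation; `summarize` here is a stand-in for an LLM summarization call:

```python
# Toy sketch of building a summary tree over sub-index summaries.
# NOT the real llama_index code -- `summarize` stands in for an LLM call.

def summarize(texts):
    # Placeholder: a real implementation would ask an LLM to summarize.
    return " + ".join(texts)

def build_tree(leaf_summaries, num_children=2):
    """Group leaves into parents until a single root node remains."""
    level = list(leaf_summaries)
    while len(level) > 1:
        level = [
            summarize(level[i:i + num_children])
            for i in range(0, len(level), num_children)
        ]
    return level[0]

root = build_tree(["summary of index A", "summary of index B",
                   "summary of index C", "summary of index D"])
print(root)  # single root node covering all four sub-index summaries
```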
I feel like if it's not already, being able to use the summaries in some way to get the final answer would be good... in my use case, oftentimes the summary itself has a lot of useful info... I guess alternatively I could add the summary text itself into each vector store so that it's also query-able hmm πŸ€”
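The workaround suggested here (making the summary itself query-able by inserting it into the vector store as another chunk) might look something like this toy sketch. All names are hypothetical, and the Jaccard-overlap "embedding" is only a stand-in for a real embedding model:

```python
# Toy sketch of the workaround: store the index summary as just another
# chunk in the vector store so queries can retrieve it directly.
# Hypothetical helpers; a real setup would use model embeddings.

def embed(text):
    return set(text.lower().split())  # stand-in for a real embedding

def similarity(a, b):
    return len(a & b) / len(a | b)  # Jaccard overlap as a stand-in

chunks = [
    "raw chunk about API authentication details",
    "raw chunk about rate limits and retries",
]
summary = "summary: this index covers API authentication and rate limits"
chunks.append(summary)  # the workaround: the summary becomes query-able

def retrieve(query, top_k=1):
    q = embed(query)
    return sorted(chunks, key=lambda c: similarity(q, embed(c)),
                  reverse=True)[:top_k]

print(retrieve("what does this index cover"))
```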
cc @Logan M @LLYX if you have a tree on top of a bunch of vector stores, it would first query the tree, "retrieve" the relevant leaf nodes (which correspond to your vector indexes), and then query the retrieved vector indexes.

The way a tree index works in a composed graph is basically "routing" to child nodes. So imagine a one-layer tree with one root node and 5 child nodes: querying the tree index would just pick the child node to go down into
@LLYX include_summary=True will also add the summary of each index as context. Make sure you set include_summary=True for each subindex in your overall query config
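Conceptually, what include_summary does for a subindex is prepend that subindex's summary to the retrieved chunks before the answer-synthesis prompt is built. A minimal sketch of that idea (hypothetical helper, not the llama_index source):

```python
# Toy sketch of the include_summary idea: when enabled, the subindex's
# summary is added to the context alongside the retrieved chunks.
# Hypothetical function; not the actual llama_index implementation.

def build_context(index_summary, retrieved_chunks, include_summary=True):
    parts = []
    if include_summary and index_summary:
        parts.append(f"Index summary: {index_summary}")
    parts.extend(retrieved_chunks)
    return "\n\n".join(parts)

ctx = build_context(
    "This index covers the 2023 billing docs.",
    ["chunk: invoices are issued monthly", "chunk: refunds take 5 days"],
)
print(ctx.splitlines()[0])  # the summary leads the synthesized context
```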
So if my tree layer is set to embedding mode, is it doing a similarity search between the query embedding and the embedding of the vector store summaries? Thanks for answering!
And you mean set include_summary for each of my simple_dicts, right?
yep! since tree index in embedding mode just matches query against embedding of child nodes
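The embedding-mode routing described here boils down to: embed the query, compare it against each child node's summary embedding, and descend into the closest child. A toy sketch with made-up embedding values (not llama_index code):

```python
# Toy sketch of embedding-mode routing in the tree layer: pick the
# child (sub-index) whose summary embedding is closest to the query.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Pretend embeddings of each vector store's summary (hypothetical values).
child_summaries = {
    "billing_index": [0.9, 0.1, 0.0],
    "auth_index":    [0.1, 0.9, 0.2],
}

def route(query_embedding):
    return max(child_summaries,
               key=lambda name: cosine(query_embedding, child_summaries[name]))

print(route([0.2, 0.8, 0.1]))  # closest summary embedding wins -> auth_index
```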
Awesome, thanks πŸ™‚