Say I already have a collection of tree

Say I already have a collection of tree indices. How should I compose a graph over these tree indices to handle users' questions about them? And how do I generate an effective summary for each of them?

Logan M

If you only have a few indexes, you can manually write the summaries to guide the query, like this:
https://gpt-index.readthedocs.io/en/latest/guides/tutorials/graph.html#defining-a-graph-for-compare-contrast-queries

But otherwise, your best bet is just querying each tree to generate a summary of itself

Milkman

I wonder what should go on top of the tree indices if I want to compare and contrast. A keyword table doesn't seem to perform well. I also tried building a list index on top, but it seems like the query was answered from the summaries only, not from the actual subindexes

Logan M

A list index uses the summaries as well as the subindexes, but it will query EVERY subindex, which can be slow

Probably put another tree index on top, or a vector index.
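To make the trade-off concrete, here is a toy sketch in plain Python. It does not use the library: `route_all` and `route_top_k` are hypothetical helpers, the summaries are made up, and simple word overlap stands in for the embedding similarity a real vector index would use. It only illustrates why a list-style root touches every subindex while a vector-style root touches the k best matches.

```python
# Toy model only: each subindex is represented by its summary text.
summaries = {
    "boston": "wikipedia article about boston",
    "seattle": "wikipedia article about seattle",
    "chicago": "wikipedia article about chicago",
}


def route_all(summaries: dict, query: str) -> list:
    """List-style root (hypothetical): every subindex gets queried."""
    return list(summaries)


def route_top_k(summaries: dict, query: str, k: int = 1) -> list:
    """Vector-style root (hypothetical): only the k best-matching subindexes.

    Word overlap is a stand-in for embedding similarity.
    """
    query_words = set(query.lower().split())

    def overlap(name: str) -> int:
        return len(query_words & set(summaries[name].lower().split()))

    return sorted(summaries, key=overlap, reverse=True)[:k]


print(route_all(summaries, "compare boston and seattle"))       # 3 subindex queries
print(route_top_k(summaries, "compare boston and seattle", 2))  # 2 subindex queries
```

For compare/contrast questions, k has to be at least the number of subindexes being compared, otherwise the root never reaches one of them.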

Milkman

All of these top-level index structs should descend into the subindexes for details, right? I think the problem might be the way I wrote my query configs

Milkman

Does the order of query configs matter? If I have two trees, one on top of the other, how do I specify their query configs separately?

Logan M

Yea, the configs are applied in order

So the more general ones should go on top

When you have multiple indexes of the same type, you can specify a config for a specific index using its ID, lemme find an example

Logan M

Here they set an ID for an index
https://gpt-index.readthedocs.io/en/latest/guides/tutorials/graph.html#defining-the-set-of-indexes

Then it's used in this config
https://gpt-index.readthedocs.io/en/latest/guides/tutorials/graph.html#querying-our-unified-interface
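For the two-trees case asked about above, a minimal sketch of what such a config list could look like. The dictionary keys (`index_struct_type`, `index_struct_id`, `query_mode`, `query_kwargs`) are assumed from the linked tutorial for that version of the library; the ID `"top_tree"` and the `child_branch_factor` values are purely illustrative. `resolve_config` is a hypothetical helper, not a library function: it only models the ordering rule described in this thread (later matching entries win, so general entries go first).

```python
from typing import Optional

# Shape assumed from the linked tutorial; values are illustrative only.
query_configs = [
    # General config: applies to every tree index.
    {
        "index_struct_type": "tree",
        "query_mode": "default",
        "query_kwargs": {"child_branch_factor": 1},
    },
    # Specific config: applies only to the tree whose ID is "top_tree".
    {
        "index_struct_id": "top_tree",
        "index_struct_type": "tree",
        "query_mode": "default",
        "query_kwargs": {"child_branch_factor": 2},
    },
]


def resolve_config(
    configs: list, struct_type: str, struct_id: Optional[str] = None
) -> Optional[dict]:
    """Hypothetical model of "configs are applied in order".

    Walks the list top to bottom; every entry that matches the index's type
    (and has either no ID or the same ID) replaces the previous choice.
    """
    chosen = None
    for cfg in configs:
        if cfg["index_struct_type"] != struct_type:
            continue
        cfg_id = cfg.get("index_struct_id")
        if cfg_id is None or cfg_id == struct_id:
            chosen = cfg
    return chosen


top = resolve_config(query_configs, "tree", "top_tree")
bottom = resolve_config(query_configs, "tree", "bottom_tree")
print(top["query_kwargs"])     # {'child_branch_factor': 2}
print(bottom["query_kwargs"])  # {'child_branch_factor': 1}
```

The ID itself would be set on the index before querying, in the same way the thread's snippet does with `index_struct.index_id = "compare_contrast"`.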

Lau Fla

Very interesting example in that link. Can you please explain:

Why similarity_top_k: 1 for the vector indices?
Is the first query_configs still used once you have composed the final one that contains the KW, Tree, and all Vector indices? Or was it overwritten?
You said more general first, and I see that the final query_configs goes from KW, then Tree, to the single vector ones. But to me the most general one seems to be the Tree index (the abstraction layer), followed by the KW index (the root node of graph #1, used to compare), and then the single vector indices for straight questions.
Why is query_decompose then set on the single simple_dict/Vector indices, which are meant for straight simple questions, and not on the root index that we built for more complex compare/contrast questions?

Great example to learn
BTW, this one part I could not find an explanation for in the references, what is happening here?

# get root index
root_index = graph.get_index(graph.index_struct.root_id, GPTSimpleKeywordTableIndex)

# set id of root index
root_index.index_struct.index_id = "compare_contrast"
root_summary = (
    "This index contains Wikipedia articles about multiple cities. "
    "Use this index if you want to compare multiple cities. "
)

Lau Fla

@Logan M

Logan M

similarity_top_k: 1? Just because 😄

The first query_configs gets overwritten by the end

Yea, when I said more general, I meant when you have configs of the same type. So, if you wanted to set a config for one specific vector index, you can put its specific config after the config that sets up all vector indexes.

Where the decompose goes is just a design choice. It starts the compare/contrast query once it gets to the vector indexes at the bottom level

As for the root index snippet: in order to use that graph within another graph, it needs to set the ID and write a summary of the graph.
