Could anyone please share a working

Is this example incomplete? What errors are you getting when you include query_configs? https://gpt-index.readthedocs.io/en/latest/how_to/composability.html#querying-the-top-level-index

@Logan M I used this example but changed "tree" to "dict", since my underlying indices are simple vector indices.
I got:
ValueError: Vector store is required for vector store query.

If you are using GPTSimpleVectorIndex, I think the correct index_struct_type is simple_dict

I couldn't find it listed in the docs here:
https://gpt-index.readthedocs.io/en/latest/reference/indices/composability_query.html?highlight=index_struct_type#module-gpt_index.data_structs.struct_type

The docs must be out of date; I was looking at the codebase: https://github.com/jerryjliu/gpt_index/blob/main/gpt_index/data_structs/struct_type.py

Thanks @Logan M! One related question: I thought that in the above case the list index would query each node underneath it, each of which is a simple vector index. If so, why is it required to call set_text() on each simple vector index? I'm not sure how that part works.

This is mostly a guess, judging from the code and what's written in the docs.

Each index in the composable graph uses a summary to represent itself. During query time, this summary is used to help find answers. If you don't provide a summary, llama_index will use the LLM to generate one for each of your vector indexes.

So in your case, you have a list index, with each item in the list being a vector index. So if I query something, it will use the index summary text, plus the closest matching embedding(s) under that index to generate an answer. @jerryjliu0 can probably clarify this further if I'm way off base haha

@Logan M Hmm, I guess that means the embedding of the summary is calculated on the fly with every query, and then the closest node (based on vector similarity) is queried to retrieve its top_k?
Unfortunately I couldn't find anywhere that explains how it works; hopefully @jerryjliu0 will come to the rescue 🙂

Since the top-level index is a list index, the default query should go through every vector index inside it 🤔 (and from each vector index, it retrieves that index's top_k nodes)
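To make the list-index behavior above concrete, here is a toy model in plain Python. It does not use gpt_index: the sub-indices, chunk texts, and 2-D embeddings are made up, and the helper names (`cosine`, `top_k`, `list_index_query`) are hypothetical. It only models the retrieval step, where a list index at the top visits every child and each child contributes its own top_k.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical sub-indices: each is a list of (chunk_text, embedding) pairs.
sub_indices = {
    "index_a": [("a1", [1.0, 0.0]), ("a2", [0.7, 0.7]), ("a3", [0.0, 1.0])],
    "index_b": [("b1", [0.9, 0.1]), ("b2", [0.1, 0.9])],
}


def top_k(chunks, query_emb, k):
    """Return the k chunk texts in one sub-index most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(c[1], query_emb), reverse=True)
    return [text for text, _ in ranked[:k]]


def list_index_query(sub_indices, query_emb, k):
    """A list index at the top visits every child, so each child returns its own top_k."""
    return {name: top_k(chunks, query_emb, k) for name, chunks in sub_indices.items()}


print(list_index_query(sub_indices, [1.0, 0.0], k=1))  # → {'index_a': ['a1'], 'index_b': ['b1']}
```

With n children you get n × k chunks back, not k. The step where the per-child results are passed to the LLM to produce an answer is left out.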

@Logan M so in this case I don't understand why a summary is required or how it is used

Yeah, I agree. With a ListIndex at the top level, the best use case I can think of is that it helps with summarization queries? 🤷‍♂️ A little weird

A better use case might be a vector index at the top level as well (so it uses the embeddings of the summaries to find the best sub-index). In this example, they use a keyword index instead: https://github.com/jerryjliu/gpt_index/blob/main/examples/composable_indices/ComposableIndices.ipynb

Then, how the summaries are used is much more obvious lol
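With a vector index at the top level, the summary text passed to set_text() is the thing that gets embedded and compared to the query, which is why it matters there. Below is a minimal sketch of that routing idea, assuming hand-written toy embeddings in place of a real embedding model. `summaries`, `route`, and the index names are hypothetical and not part of gpt_index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical sub-index summaries with made-up embeddings of that summary text.
summaries = {
    "finance_index": ("Quarterly revenue and expense reports", [0.9, 0.1, 0.0]),
    "hr_index": ("Hiring policies and employee handbook", [0.1, 0.9, 0.1]),
    "eng_index": ("Service architecture and on-call runbooks", [0.0, 0.2, 0.9]),
}


def route(query_emb, summaries, k=1):
    """Pick the k sub-indices whose summary embedding is closest to the query."""
    ranked = sorted(
        summaries,
        key=lambda name: cosine(summaries[name][1], query_emb),
        reverse=True,
    )
    return ranked[:k]


print(route([0.8, 0.2, 0.0], summaries))  # → ['finance_index']
```

Only the chosen sub-index would then be queried for its own top_k. A blank summary would leave nothing meaningful to embed, so routing would not work.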

@Logan M thanks for your insights. The reason I thought of using a ListIndex on top is that there's no way to merge multiple index files (JSONs) into one. I guess I could simply query them one by one without a ListIndex; I only wonder how to send the LLM just the global top_k paragraphs across all indices instead of the top_k from each one.
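One way to get a global top_k instead of a per-index top_k is to score chunks from every index against the query and keep the k best overall. This sketch assumes each chunk's text and embedding are already in memory (toy 2-D vectors here). `indices` and `global_top_k` are hypothetical names, not gpt_index API.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical per-index chunks: one list of (text, embedding) pairs per index file.
indices = [
    [("a1", [1.0, 0.0]), ("a2", [0.6, 0.8])],
    [("b1", [0.8, 0.6]), ("b2", [0.0, 1.0])],
    [("c1", [0.95, 0.31])],
]


def global_top_k(indices, query_emb, k):
    """Score every chunk from every index, then keep the k best overall."""
    scored = (
        (cosine(emb, query_emb), text)
        for chunks in indices
        for text, emb in chunks
    )
    return [text for _, text in heapq.nlargest(k, scored)]


print(global_top_k(indices, [1.0, 0.0], k=2))  # → ['a1', 'c1']
```

You don't have to score every chunk. As long as all indices use the same embedding model, the global top k is always contained in the union of each index's own top k (up to ties). So you could query each index for its k best, then run the same `heapq.nlargest` over the combined scored results.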

@yoelk You could use the index.insert() function to add documents after the index is created, effectively merging the indexes. This notebook uses the list/tree indexes, but I think it will work with vector indexes as well https://github.com/jerryjliu/gpt_index/blob/main/examples/paul_graham_essay/InsertDemo.ipynb

So if you have two indexes, you could iterate over the nodes in one index to add to the other? 🤔

oops, yeah, the code has been updated: "simple_dict" is the old struct type and "dict" is the new one
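For reference, a query_configs entry for the vector sub-indices using the newer struct type name might look like this. Only the "dict" value comes from this thread. The "list" value for the top-level list index, the "query_mode"/"query_kwargs" keys, "similarity_top_k", and the commented-out graph.query call are assumptions based on the composability docs of that era, so check them against your installed version.

```python
# Assumed shape of query_configs for a list index composed over
# GPTSimpleVectorIndex sub-indices; verify keys against your gpt_index version.
query_configs = [
    {
        "index_struct_type": "dict",  # was "simple_dict" in older releases
        "query_mode": "default",
        "query_kwargs": {"similarity_top_k": 3},
    },
    {
        "index_struct_type": "list",  # assumed name for the top-level list index
        "query_mode": "default",
        "query_kwargs": {},
    },
]

# response = graph.query("your question", query_configs=query_configs)
```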

cc @Logan M @yoelk almost! we actually don't automatically generate the summary for an index if it's not set. The reason we require it for composability in the first place is this: imagine loading "indices" into another index the same way as documents. We'd need text associated with each index, so the top-level index can do the text chunking + data structure formation over this text.

there are special edge cases, e.g. the list index doesn't technically need any text to start with (you could set it to a blank string and you'd get largely the same default behavior), but other indices would (e.g. if you tried to compose a vector index on top of another vector index)

@jerryjliu0 Is it possible to iterate over the nodes of one index and add them to the other?

@yoelk ooh yeah not officially supported but let me think about this and get back to you!

@jerryjliu0 Thanks! If possible that would be super helpful for me

you could manually go through index.index_struct.nodes_dict, create a Document object over each node.get_text(), and insert into a new index. but again this isn't officially supported in a nice way

^^ that's exactly what I was thinking too. Doesn't look nice, but should work!
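Here is the nodes_dict approach, sketched with hypothetical stand-in classes (`Node`, `Document`, `ToyIndex`) rather than the real gpt_index ones. In the library the nodes live under index.index_struct.nodes_dict. To keep the sketch short, the toy index holds nodes_dict directly. A real index would chunk and embed every re-inserted text again.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins, NOT the gpt_index classes.
@dataclass
class Node:
    text: str

    def get_text(self) -> str:
        return self.text


@dataclass
class Document:
    text: str


@dataclass
class ToyIndex:
    # In gpt_index this would be index.index_struct.nodes_dict.
    nodes_dict: dict = field(default_factory=dict)

    def insert(self, doc: Document) -> None:
        # A real index would chunk and embed doc.text here.
        self.nodes_dict[len(self.nodes_dict)] = Node(doc.text)


def merge_into(target: ToyIndex, source: ToyIndex) -> None:
    """Copy every node of `source` into `target` by re-inserting its text."""
    for node in list(source.nodes_dict.values()):
        target.insert(Document(node.get_text()))


a = ToyIndex()
a.insert(Document("alpha"))
b = ToyIndex()
b.insert(Document("beta"))
b.insert(Document("gamma"))
merge_into(a, b)
print([n.get_text() for n in a.nodes_dict.values()])  # → ['alpha', 'beta', 'gamma']
```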

@jerryjliu0 But that would effectively re-index everything, even though I already indexed the files in parallel (on AWS Lambda). I was hoping I could just append the nodes with their precalculated vectors.

Hmm... as a quick solution, maybe I could load them as JSON dictionaries (using json.load()), merge them somehow, and save the merged JSON?

actually, a Document object contains an "embedding" field. If that is specified, then we don't compute an embedding under the hood

i think that kind of answers your concern about "re-indexing"
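The point about the Document "embedding" field can be shown with a toy example. If a vector is already present, insert() can store it as-is instead of calling the embedding model. `ToyVectorIndex`, its `_embed` placeholder, and the `embed_calls` counter are hypothetical. They mimic the behavior described above, not gpt_index's actual code. Where the precomputed vectors are stored in a saved index depends on the index type and version, so reading them back out is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    text: str
    embedding: Optional[List[float]] = None  # precomputed vector, if any


@dataclass
class ToyVectorIndex:
    embed_calls: int = 0
    entries: list = field(default_factory=list)

    def _embed(self, text: str) -> List[float]:
        # Placeholder for a (costly) embedding API call.
        self.embed_calls += 1
        return [float(len(text)), 0.0]

    def insert(self, doc: Document) -> None:
        # Reuse the stored vector when present; only embed when it is missing.
        emb = doc.embedding if doc.embedding is not None else self._embed(doc.text)
        self.entries.append((doc.text, emb))


index = ToyVectorIndex()
index.insert(Document("already embedded", embedding=[0.1, 0.2]))
index.insert(Document("needs embedding"))
print(index.embed_calls)  # → 1
```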

Thanks @jerryjliu0. Do you think merging the JSON files is also worth looking at?

sorry wdym, that's something you can try out atm

I meant that the index files are JSON, which I can load with Python's json module (json.load(file)). Then maybe I can merge them and save the result as merged_index.json.

It's a little trickier than that, since the JSON contains many top-level keys that can be specific to each index type. You can try to figure it out, but I think the method using index.index_struct.nodes_dict will be more straightforward, since it uses the insert function that every index implements.
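Merging the saved JSON directly is trickier than it looks. A plain dict merge overwrites any top-level key the two files share, and node IDs can collide. The layout below is a hypothetical, heavily simplified stand-in for a saved index file, not the real gpt_index format.

```python
import json

# Hypothetical, simplified saved-index contents (not the real gpt_index layout).
index_a = json.loads('{"index_struct": {"nodes_dict": {"0": "alpha"}}, "docstore": {"doc_a": "..."}}')
index_b = json.loads('{"index_struct": {"nodes_dict": {"0": "beta"}}, "docstore": {"doc_b": "..."}}')

# A shallow merge keeps only index_b's value for every shared top-level key.
naive = {**index_a, **index_b}
print(naive["index_struct"]["nodes_dict"])  # → {'0': 'beta'}, "alpha" is gone
print(naive["docstore"])  # → {'doc_b': '...'}

# Even a key-by-key merge of nodes_dict would clash, since both files use node ID "0".
shared_ids = index_a["index_struct"]["nodes_dict"].keys() & index_b["index_struct"]["nodes_dict"].keys()
print(shared_ids)  # → {'0'}
```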

Play around with it, I'm sure you'll get it 💪