Hello! I just had a question about how composability works under the hood. My use case is as follows:
I have three docs as vector indexes:
index1 = GPTSimpleVectorIndex.from_documents(doc1)
index2 = GPTSimpleVectorIndex.from_documents(doc2)
index3 = GPTSimpleVectorIndex.from_documents(doc3)

I then created a list index and composed it into a graph:
graph = ComposableGraph.build_from_indices(
    GPTListIndex,
    [index1, index2, index3],
    index_summaries=[index1_summary, index2_summary, index3_summary],
)

My queries involve a mixture of questions, such as querying individual docs (e.g. summarise doc1), comparing/contrasting pairs or triplets of docs, etc.

I was just wondering if there are any specific query_configs I need to use, and how exactly the graph works to address the above query use cases?
Good question!

Since you've wrapped it with a list index, every query will, with default settings, get the top 1 matching node from each vector index and send it to the LLM to answer the query.

For comparing docs, you might be interested in this page:
https://gpt-index.readthedocs.io/en/latest/how_to/query/query_transformations.html#single-step-query-decomposition
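For reference, a rough sketch of that decomposition transform (based on the docs page above for the 0.5-era API; the import path and the LLM predictor setup here are assumptions and may differ between versions):

from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform

# LLM used to break a compare/contrast question into per-document sub-questions
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0))
decompose_transform = DecomposeQueryTransform(llm_predictor, verbose=True)

# attach it to the vector-index entry of your query_configs, e.g.
vector_config = {
    "index_struct_type": "simple_dict",
    "query_mode": "default",
    "query_kwargs": {"similarity_top_k": 1},
    "query_transform": decompose_transform,
}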

Generally though, to answer the rest of your questions, I think you will find that having specific indexes for specific use cases ends up working quite well. Your current structure works well for general queries, and also for comparing/contrasting if you use the query decomposition.

For summarizing, you may need to increase the similarity_top_k parameter in the query configs for the vector indexes to get decent summaries. Normally, a plain list index over a document will give the best summaries.

The advantage I find with having specific indexes for specific use cases is that you can tune configs for each use case, and if you are using langchain, each index can be a "tool" in langchain, which is a very powerful pattern.
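As a rough illustration of that pattern (the tool names and descriptions below are made up, and the agent setup assumes the pre-0.1 langchain API, so adjust to your version):

from langchain.agents import Tool, initialize_agent
from langchain.llms import OpenAI

# one tool per index, each tuned for its own use case
tools = [
    Tool(
        name="doc1 index",
        func=lambda q: str(index1.query(q, similarity_top_k=3)),
        description="Useful for questions about, or summaries of, doc1.",
    ),
    Tool(
        name="comparison graph",
        func=lambda q: str(graph.query(q)),
        description="Useful for comparing or contrasting doc1, doc2 and doc3.",
    ),
]

agent = initialize_agent(tools, OpenAI(temperature=0), agent="zero-shot-react-description")
print(agent.run("Summarise doc1 and compare it with doc3"))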
exactly what @Logan M said!
we also just landed a change (like an hour ago), that allows you to easily reuse nodes across different indices, so that building a new index on the same nodes doesn't duplicate your data. this should allow you to easily create a bunch of indices over the same data for different use cases. you can see an initial snippet here: https://gpt-index.readthedocs.io/en/latest/guides/primer/usage_pattern.html#reusing-nodes-across-index-structures
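The snippet on that page looks roughly like this (assuming your documents are already loaded; module paths may differ slightly between gpt_index/llama_index versions):

from llama_index.node_parser import SimpleNodeParser

# parse the documents into nodes once...
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)

# ...then build several index structures over the same nodes,
# so the underlying data is not duplicated per index
vector_index = GPTSimpleVectorIndex(nodes)
list_index = GPTListIndex(nodes)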
Thank you so much @Logan M and @jerryjliu0 ! Just a follow-up question on "Since you've wrapped it with a list index, every query will, with default settings, get the top 1 matching node from each vector index and send it to the LLM to answer the query.":

So in a general query, I would have at most three candidate nodes in total (one from each vector index under the list index), which then get fed into the response synthesis module? And these nodes will be used to create the final response (e.g. using create and refine)?

And just another follow-up on that. My current understanding is that the actual querying strategy used for the composability graph (e.g. the default setting mentioned by @Logan M ) depends entirely on the query_config. I was just wondering if you had a bit more comprehensive documentation on how to construct the query_config, as it would be of great help, especially to people from a non-technical background πŸ™‚

The query_config that I am currently using is as follows:
query_configs = [
    {
        "index_struct_type": "dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 1,
        },
    },
    {
        "index_struct_type": "list",
        "query_mode": "default",
        "query_kwargs": {
            "response_mode": "tree_summarize",
        },
    },
]
My interpretation of this query_config is that I will get at most three candidate nodes (one per vector index) and these will be used by the response synthesis module to create the final response using the tree_summarize technique?
Just wanted to check if that is accurate
You are correct! Basically the query config is setting the config for each type of index in your graph.

BTW, for a simple vector index, if you have issues with your config example, you might need to change "dict" to "simple_dict"
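So, with that correction applied, the config from above would look something like this (the query string is just an example):

query_configs = [
    {
        "index_struct_type": "simple_dict",   # simple vector index ("dict" -> "simple_dict")
        "query_mode": "default",
        "query_kwargs": {"similarity_top_k": 1},
    },
    {
        "index_struct_type": "list",          # the outer list index
        "query_mode": "default",
        "query_kwargs": {"response_mode": "tree_summarize"},
    },
]

response = graph.query("Compare and contrast doc1 and doc2", query_configs=query_configs)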
Thank you so much!
Hello again! Just a quick question
I created a graph and while querying it I am getting an error:
TypeError Traceback (most recent call last)
<ipython-input-78-e09dfe80b36a> in <cell line: 26>()
24
25 query = "Hello"
---> 26 response_summary = graph.query(query)
27 print(response_summary)

8 frames
/usr/local/lib/python3.9/dist-packages/llama_index/indices/query/embedding_utils.py in _hash(self, node)
51 """Generate a unique key for each node."""
52 # TODO: Better way to get unique identifier of a node
---> 53 return str(abs(hash(node.get_text())))
54
55 def add(self, node: Node, similarity: float) -> None:

TypeError: unhashable type: 'dict'

Could you kindly help? @jerryjliu0 @Logan M
And also just an unrelated question regarding tokens:

chunk_size_limit = 3347

max_input_size = 4097

max_tokens = 600

Let's say for a given query llama-index picks up a single chunk which is used as context. Does this mean the total number of input tokens is 3347 + tokens in the query? And let's say we get an output. Will this mean 3347 + tokens in query + tokens in response = 4097?
I think this error might be related to how you created the index. Somehow node.get_text() is returning a dictionary instead of a string? πŸ™ƒ
I think I messed up my dataloader
Not quite!

The chunk size + prompt is the input size, yes. But it won't always generate max_tokens (in fact, in most cases the tokens in the response are much lower than the max; it depends on the query and what you asked the LLM to do)
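To make that concrete with the numbers above (the prompt/query overhead figure is made up for illustration):

max_input_size = 4097     # model context window
chunk_tokens = 3347       # the single retrieved chunk
prompt_and_query = 150    # hypothetical: prompt template plus the query text

input_tokens = chunk_tokens + prompt_and_query      # 3497 tokens sent to the model
room_for_response = max_input_size - input_tokens   # 600 tokens left
# max_tokens = 600 is only a cap; the actual response is usually shorter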
Ok I see. So max_input_size defines the total context size of the model and max_tokens is the total possible response tokens?
Because I have a large txt document and I calculated the total number of tokens using the tiktoken cl100k_base encoding. I then divided that by chunk_size_limit to get the expected number of chunks. But once I ran GPTSimpleVectorIndex over the document, the number of chunks was larger than that calculation. I used the following code:
items = index_set['ceimia']._index_struct.embeddings_dict
print(len(items))
Which is why I got a bit confused about how exactly the chunk size is defined
Yeaaaa it can be mysterious sometimes πŸ˜… internally llama index uses tiktoken.get_encoding("gpt2")
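That tokenizer difference is probably why your chunk estimate was off; a quick way to check (the file name is just a placeholder):

import tiktoken

with open("my_large_doc.txt") as f:   # placeholder path
    text = f.read()

gpt2_count = len(tiktoken.get_encoding("gpt2").encode(text))
cl100k_count = len(tiktoken.get_encoding("cl100k_base").encode(text))

# the two encodings split text differently, so a chunk count estimated with
# cl100k_base won't match what a gpt2-based splitter actually produces
print(gpt2_count, cl100k_count)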
Ohh I see. So when we define chunk_size_limit = 3000 for example, it is not guaranteed to split my document into chunks of 3000 tokens?
I think it tries to get as close as possible without going over? The logic is a little hard to understand (I'm reading the source code now haha)
And this is just for embeddings. Since queries have dynamic sizes, it might get split again at query time to fit in the prompt
Oh that's fine! I sometimes like to get to the bottom of stuff!
Thank you for all your help! Llama-index is definitely a game changer in the LLM space
Haha I agree! Happy to help πŸ’ͺ
Hi @Logan M , just another Q πŸ˜† - for step_decompose_transform, is there any max_iter limit on the number of times it can ask follow-up questions? Because I noticed it sometimes stops asking follow-up questions after 3 iterations?
This is without the new query being None
hmmm I don't see anything in the code about a limit πŸ€”
That is a strange effect then. Because after three follow-up questions, the chatbot automatically moves to the next node in the list_index
@jerryjliu0 if you can help debug this it'll be great! πŸ™‚
yep there's a limit!
hmm it's a bit tricky since it's actually in the outer QueryCombiner class, which we don't officially expose to the user atm
check get_default_query_combiner in gpt_index.indices.query.query_combiner.base to see how to construct a MultiStepQueryCombiner
you can pass in both a "query_transform" and a "query_combiner" into the query_configs (right now i'm assuming you're just passing in the query_transform)
we'll expose the query combiner class soon!