Updated 4 months ago

Hello I just had a question about how

Hello! I just had a question about how composability works under the hood. My use case is as follows.
I have three docs as vector stores:
index1 = GPTSimpleVectorIndex.from_documents(doc1)
index2 = GPTSimpleVectorIndex.from_documents(doc2)
index3 = GPTSimpleVectorIndex.from_documents(doc3)

I then created a list index and further transformed it into a graph:-
graph = ComposableGraph.build_from_indices(
    GPTListIndex,
    [index1, index2, index3],
    index_summaries=[index1_summary, index2_summary, index3_summary],
)

My queries involve a mixture of questions, such as querying individual docs (e.g. summarise doc1), comparing/contrasting pairs or triplets of docs, etc.

I was just wondering if there are any specific query_config I need to use and how exactly it works to address the above query use cases?
Good question!

Since you've wrapped it with a list index, every query will, with default settings, get the top 1 matching node from each vector index and send it to the LLM to answer the query.

For comparing docs, you might be interested in this page:
https://gpt-index.readthedocs.io/en/latest/how_to/query/query_transformations.html#single-step-query-decomposition

Generally though, to answer the rest of your questions, I think you will find that having specific indexes for specific use cases ends up working quite well. Your current structure works well for general queries, and also for comparing/contrasting if you use the query decomposition.

For summarizing, you may need to increase the similarity_top_k parameter in the query configs for the vector indexes to get decent summaries. Normally, a plain list index over a document will give the best summaries.

The advantage I find with having specific indexes for specific use cases is that you can tune configs for each use case, and if you are using langchain, each index can be a "tool" in langchain, which is a very powerful pattern.
exactly what @Logan M said!
we also just landed a change (like an hour ago), that allows you to easily reuse nodes across different indices, so that building a new index on the same nodes doesn't duplicate your data. this should allow you to easily create a bunch of indices over the same data for different use cases. you can see an initial snippet here: https://gpt-index.readthedocs.io/en/latest/guides/primer/usage_pattern.html#reusing-nodes-across-index-structures
Thank you so much @Logan M and @jerryjliu0 ! Just a follow up question on "Since you've wrapped it with a list index, every query will, with default settings, get the top 1 matching node from each vector index and send it to the LLM to answer the query." :-

So in a general query, I would have at most three candidate nodes in total (one from each list index node), which then get fed into the response synthesis module? And these nodes will be used to create the final response (e.g. using create and refine)?

And just another follow-up on that. My current understanding is that the actual querying strategy used for the composability graph (e.g. the default setting mentioned by @Logan M ) depends entirely on the query_config. I was just wondering if you had more comprehensive documentation on constructing the query_config, as it would be of great help, especially to people from a non-technical background 🙂

My current query_config that I am using is as follows:-
query_configs = [
    {
        "index_struct_type": "dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 1,
        },
    },
    {
        "index_struct_type": "list",
        "query_mode": "default",
        "query_kwargs": {
            "response_mode": "tree_summarize",
        },
    },
]
My interpretation of this query_config is that I will get at most three candidate nodes (one per list index node), and these will be used by the response synthesis module to create the final response using the tree_summarize technique?
Just wanted to check if that is accurate.
You are correct! Basically the query config is setting the config for each type of index in your graph.

BTW, for a simple vector index, if you have issues with your config example, you might need to change "dict" to "simple_dict"
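To make the flow confirmed above concrete, here is a toy model in plain Python. It is only a sketch of the behaviour described in this thread (a list-style parent visits every child index and keeps each child's top-k nodes), not the library's actual code. The data, the `cosine`, `top_k` and `gather_candidates` helpers, and the two-dimensional vectors are all made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(nodes, query_vec, k=1):
    """Return the k nodes most similar to the query (the similarity_top_k idea)."""
    ranked = sorted(nodes, key=lambda n: cosine(n["vec"], query_vec), reverse=True)
    return ranked[:k]

def gather_candidates(sub_indices, query_vec, k=1):
    """A list-style parent visits every child and keeps each child's top-k nodes."""
    candidates = []
    for nodes in sub_indices:
        candidates.extend(top_k(nodes, query_vec, k))
    return candidates

# Hypothetical toy data: three "vector indices" with two nodes each.
doc1 = [{"text": "doc1-a", "vec": [1.0, 0.0]}, {"text": "doc1-b", "vec": [0.0, 1.0]}]
doc2 = [{"text": "doc2-a", "vec": [0.9, 0.1]}, {"text": "doc2-b", "vec": [0.1, 0.9]}]
doc3 = [{"text": "doc3-a", "vec": [0.5, 0.5]}, {"text": "doc3-b", "vec": [1.0, 0.2]}]

candidates = gather_candidates([doc1, doc2, doc3], query_vec=[1.0, 0.0], k=1)
print([c["text"] for c in candidates])
```

With k=1 and three child indices, at most three candidates reach the synthesis step; raising k to 2 in this toy would give six.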
Thank you so much!
Hello again! Just a quick question
I created a graph and during query I am getting an error
TypeError Traceback (most recent call last)
<ipython-input-78-e09dfe80b36a> in <cell line: 26>()
24
25 query = "Hello"
---> 26 response_summary = graph.query(query)
27 print(response_summary)

8 frames
/usr/local/lib/python3.9/dist-packages/llama_index/indices/query/embedding_utils.py in _hash(self, node)
51 """Generate a unique key for each node."""
52 # TODO: Better way to get unique identifier of a node
---> 53 return str(abs(hash(node.get_text())))
54
55 def add(self, node: Node, similarity: float) -> None:

TypeError: unhashable type: 'dict'

Could you kindly help? @jerryjliu0 @Logan M
Also, an unrelated question regarding tokens:

chunk_size_limit = 3347

max_input_size = 4097

max_tokens = 600

Let's say that for a given query llama-index picks up a single chunk, which is used as context. Does this mean the total number of input tokens is 3347 + tokens in query? And let's say we get an output. Will this mean 3347 + tokens in query + tokens in response = 4097?
I think this error might be related to how you created the index. Somehow node.get_text() is returning a dictionary instead of a string? 🙃
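One quick way to check for that before building the index is to scan whatever the loader produced for entries that are not plain strings. This is a minimal sketch; `find_non_string_texts` and the `loaded` list are hypothetical stand-ins for your own dataloader output.

```python
def find_non_string_texts(texts):
    """Return (position, type name) for every entry that is not a plain string."""
    return [(i, type(t).__name__) for i, t in enumerate(texts) if not isinstance(t, str)]

# Hypothetical loader output: the second entry is a dict instead of a string.
loaded = ["first document text", {"text": "second document"}, "third document text"]
bad = find_non_string_texts(loaded)
print(bad)

# The same failure the traceback shows: a dict cannot be hashed.
try:
    hash(loaded[1])
except TypeError as err:
    print(err)
```

Any position reported in `bad` is a document whose text needs to be converted to a string before indexing.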
I think I messed up my dataloader
Not quite!

The chunk size + prompt is the input size, yes. But it won't always generate max_tokens. In most cases the number of tokens in the response is much lower than the max, though this depends on the query and what you asked the LLM to do.
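The arithmetic can be sketched like this, using the numbers from the question. The `token_budget` helper is hypothetical, and the query and prompt-template sizes (50 and 100 tokens) are invented purely for illustration.

```python
def token_budget(max_input_size, chunk_tokens, query_tokens, template_tokens, max_tokens):
    """Split the context window into prompt tokens and room left for the response."""
    prompt_tokens = chunk_tokens + query_tokens + template_tokens
    room_for_response = max_input_size - prompt_tokens
    return {
        "prompt_tokens": prompt_tokens,
        "room_for_response": room_for_response,
        # The response is capped by max_tokens and by the space left in the window.
        "response_cap": max(0, min(max_tokens, room_for_response)),
    }

# chunk_size_limit, max_input_size and max_tokens come from the question above;
# query_tokens and template_tokens are assumed values.
budget = token_budget(
    max_input_size=4097,
    chunk_tokens=3347,
    query_tokens=50,
    template_tokens=100,
    max_tokens=600,
)
print(budget)
```

With these assumed sizes the prompt uses 3497 tokens, which leaves 600 for the response. That figure is only an upper bound: the model may stop well before it.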
Ok I see. So max_input_size defines the total context size of the model and max_tokens is the total possible response tokens?
Because I have a large txt document and I calculated the total number of tokens using the tiktoken cl100k_base model. I then divided that by chunk_size_limit to get the expected number of chunks. But once I ran GPTVectorindex on the document, the number of chunks was larger than the calculation. I used the following code:
items = index_set['ceimia']._index_struct.embeddings_dict
print(len(items))
Which is why I got a bit confused about how exactly the chunk size is defined.
Yeaaaa it can be mysterious sometimes 😅 internally llama index uses tiktoken.get_encoding("gpt2")
Ohh I see. So when we define chunk_size_limit = 3000, for example, it is not guaranteed to split my document into chunks of 3000 tokens?
I think it tries to get as close as possible without going over? The logic is a little hard to understand (I'm reading the source code now haha)
And this is just for embeddings. Since queries have dynamic sizes, it might get split again at query time to fit in the prompt
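A toy illustration of why "as close as possible without going over" produces more chunks than total tokens divided by the limit. This is not llama-index's splitter: `greedy_chunks` is a hypothetical helper, whitespace-separated words stand in for tokens, and whole sentences are the unit that cannot be broken.

```python
import math

def greedy_chunks(sentences, limit):
    """Pack whole sentences into chunks without exceeding `limit` tokens each.

    A single sentence longer than `limit` still becomes its own chunk.
    """
    chunks, current, used = [], [], 0
    for sentence in sentences:
        size = len(sentence.split())  # stand-in token count: whitespace words
        if current and used + size > limit:
            chunks.append(current)
            current, used = [], 0
        current.append(sentence)
        used += size
    if current:
        chunks.append(current)
    return chunks

# Hypothetical document: five sentences of 6 words each, 30 "tokens" in total.
sentences = ["one two three four five six"] * 5
limit = 10

chunks = greedy_chunks(sentences, limit)
naive_estimate = math.ceil(30 / limit)
print(len(chunks), naive_estimate)
```

Here the naive estimate is 3 chunks, but because two sentences (12 words) do not fit under the limit of 10, every sentence ends up in its own chunk and the splitter returns 5.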
Oh thats fine! I sometimes like to get to the bottom of stuff!
Thank you for all your help! Llama-index is definitely a game changer in the LLM space
Haha I agree! Happy to help 💪
Hi @Logan M , just another Q 😆 - for step_decompose_transform, is there any max_iter for the number of times it can ask follow-up questions? Because I noticed it sometimes stops asking follow-up questions after 3 iterations.
This is without the new query being None
hmmm I don't see anything in the code about a limit 🤔
That is a strange effect then. Because after three follow-up questions, the chatbot automatically moves to the next node in the list_index
@jerryjliu0 if you can help debug this it'll be great! 🙂
yep there's a limit!
hmm it's a bit tricky since it's actually in the outer QueryCombiner class, which we don't officially expose to the user atm
see get_default_query_combiner in gpt_index.indices.query.query_combiner.base to see how to construct a MultiStepQueryCombiner
you can pass in both a "query_transform" and a "query_combiner" into the query_configs (right now i'm assuming you're just passing in the query_transform)
we'll expose the query combiner class soon!
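For intuition only, a capped multi-step loop can be sketched as below. This is a generic illustration, not the library's query combiner: `run_multi_step`, `always_ask`, `fake_answer` and the `num_steps` parameter are hypothetical names, and the default of 3 is chosen only to mirror the behaviour reported above.

```python
def run_multi_step(ask_follow_up, answer, query, num_steps=3):
    """Ask follow-up questions until none is produced or the step cap is reached."""
    history = []
    for _ in range(num_steps):
        follow_up = ask_follow_up(query, history)
        if follow_up is None:  # the transform has nothing more to ask
            break
        history.append((follow_up, answer(follow_up)))
    return history

# Hypothetical stand-ins for the query transform and the index query.
def always_ask(query, history):
    return f"{query} (follow-up {len(history) + 1})"

def fake_answer(question):
    return f"answer to: {question}"

steps = run_multi_step(always_ask, fake_answer, "compare doc1 and doc2")
print(len(steps))
```

Even though `always_ask` never returns None, the loop stops after three follow-ups because of the cap, which matches the symptom of the query moving on after three iterations.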