
Updated 4 months ago

index service context llm predictor

At a glance
index._service_context.llm_predictor.last_token_usage()

index._service_context.embed_model.last_token_usage()
69 comments
I just printed response.source_nodes. However, the first node is not a context; I think it's the output of an LLM call? Not sure why this is appearing 🤔
Hmmm sounds sus lol, what kind of index do you have?
lol I have the vector index
Essentially I want to access all the candidate nodes and their cosine sim scores so I can add a citation feature
Yea for sure! That makes sense

What did the source node look like that seemed weird?
Basically it is not part of any context. It doesn't have a docID and seems identical to the final response
This is my query config
Attachment
Screenshot_2023-04-17_at_6.53.16_pm.png
I have two vector indices composed
Oh so you have a graph, not a single vector index 😅

That source might be the summary of that particular sub-index
I will explore this further, but thanks!
Sounds good! 💪
Also just a quick question, what is the query mode exactly and what is the difference between default and recursive?
I think that might be leading to this node issue
Because from my current understanding, my query config is over a graph where each node is a vector index. So during query time I should get the most similar graph node and then within that the most similar chunk?
Similarly this config should give me the top two graph nodes and within each the top chunk?
Attachment
Screenshot_2023-04-17_at_7.09.36_pm.png
But I think I may be missing something
don't worry about setting query mode for now. For graphs, recursive is default which means it checks the index summaries, then goes into the corresponding matching index (i.e. recursive). Each sub-index will use default, which is fine

One thing you are missing I think is that you only need a query config PER index type.

Since your graph is a vector index on top of vector indexes, you can specify a config for both by using index IDs

How it works is you set a config for the top level. You probably want to set top k = 1, so that it looks at all the summaries of the sub-indexes and returns the sub-index that most closely matches the query

Then, that sub index is queried with a different config (maybe top k = 2) and the answer is returned

Check out this page for setting index IDs in configs: https://gpt-index.readthedocs.io/en/latest/how_to/index_structs/composability.html#querying-the-graph

You also might be interested in this very new tutorial, that covers some other cool things (like building graphs with graphs!) https://gpt-index.readthedocs.io/en/latest/guides/tutorials/graph.html
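A minimal sketch of the two-layer setup described above, assuming the same config keys that show up later in this thread (`index_struct_id`, `index_struct_type`, `query_mode`, `query_kwargs`). The ID `"top_level"` is a made-up placeholder; use whatever ID you actually set on your root index.

```python
# Hypothetical ID for the root (top-level) vector index; set it on your own index first.
TOP_LEVEL_ID = "top_level"

query_configs = [
    {
        # Carries an ID, so it is meant for the top-level index only.
        "index_struct_id": TOP_LEVEL_ID,
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        # Pick the single sub-index whose summary best matches the query.
        "query_kwargs": {"similarity_top_k": 1},
    },
    {
        # No ID, so it is meant for the indexes of this type (the sub-indexes).
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        # Fetch two chunks from the chosen sub-index.
        "query_kwargs": {"similarity_top_k": 2},
    },
]
```

The list would then be passed as `query_configs=query_configs` when querying the graph.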
Thank you, you are amazing!
So in my case I have 5 graph nodes, so will I have 6 total query configs? (one for the graph and 1 each for the 5 graph nodes?)
Well, it depends actually. I usually like to think of it in terms of layers

You might have a specific config for the top layer, using the struct ID field to specify the config for that

Then, if all the sub-nodes are vector indexes, one more query config will apply to all of them.

Basically, query configs are applied per type per ID
Hope that makes sense lol
Oh so you only need two, so one for the graph node (by mentioning the struct ID) and one for all the subindices?
Even if those subindices have different index IDs?
Also do you mind explaining how the default query mode works? 🙂
Yea you got it. If you don't specify the ID in the config, then it applies to all the indexes in your graph that have that type! 💪
Like, for the entire graph? Or for a single vector index? 😅
Just a single vector index or any appropriate index
from IPython.display import Markdown, display

graph2.index_struct.index_id = "compare_contrast"
query_configs_fast = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 3,
            "verbose": True
        },
        # NOTE: set query transform for subindices
        "query_transform": step_decompose_transform
    },
    {
        "index_struct_id": "compare_contrast",
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 2,
            "verbose": True
        }
    },
]

def ask_ai():
    # Keep prompting until the notebook cell is interrupted.
    while True:
        query = input("Ask: ")
        response = graph2.query(query, query_configs=query_configs_fast)
        display(Markdown(f"Response: <b>{response.response}</b>"))

Also the index id method doesn't seem to work. The model is overriding the first config and directly applying the second config 😦
And so I am getting 4 total candidate nodes
i.e. two graph nodes and 2 chunks per graph node
Alright, here it goes!

So, when you create an index, all your data is chunked/embedded. At query time, the query text is also embedded. Using cosine similarity, llama index fetches the top_k closest matching text chunks
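The retrieval step described above can be sketched in plain Python. This is only an illustration of top_k selection by cosine similarity, not LlamaIndex's actual code; the function names and the toy two-dimensional embeddings are made up, and it assumes no embedding is the zero vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_embedding, chunk_embeddings, k):
    """Return (chunk_id, score) pairs for the k chunks closest to the query."""
    scored = [
        (chunk_id, cosine_similarity(query_embedding, embedding))
        for chunk_id, embedding in chunk_embeddings.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for real chunk embeddings.
chunks = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
best = top_k_chunks([1.0, 0.1], chunks, k=2)
```

Keeping the score next to each chunk ID is the kind of pairing a citation feature would need.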

Next step is creating the answer. Llama Index takes the first text chunk and asks the LLM to answer the query. Then, with the initial answer, the second text chunk is sent to the LLM, and llama index asks the LLM to either update the existing answer using the new text, or just repeat the existing answer.
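A rough sketch of that answer-then-refine loop, assuming `llm` is any callable that takes a prompt string and returns a string. The prompt wording here is invented for illustration and is not the library's real prompt text.

```python
def refine_answer(query, chunks, llm):
    """Answer from the first chunk, then refine once per remaining chunk.

    Makes one LLM call per chunk, matching the flow described above.
    """
    if not chunks:
        return ""
    # Initial answer from the first retrieved chunk.
    answer = llm(f"Context:\n{chunks[0]}\n\nAnswer the question: {query}")
    # Each further chunk gets a chance to update the existing answer.
    for chunk in chunks[1:]:
        answer = llm(
            f"Question: {query}\n"
            f"Existing answer: {answer}\n"
            f"New context:\n{chunk}\n\n"
            "Update the existing answer using the new context, "
            "or repeat the existing answer if the context does not help."
        )
    return answer
```

With a top k of 3 this makes three LLM calls, which is the cost that the compact response mode tries to reduce.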

If you set response_mode="compact", then instead of making one call per top k node, it stuffs as much node text as possible into each LLM call. This is usually most helpful if you increase the top k and decrease the chunk size limit, since it can reduce the LLM calls
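The stuffing idea behind the compact mode can be sketched as a greedy packer. This is an assumption-laden illustration: `pack_chunks` is a hypothetical helper, and character length stands in for whatever size budget the real library applies.

```python
def pack_chunks(chunks, max_chars):
    """Greedily merge consecutive chunks so each merged text stays within max_chars.

    A single chunk longer than max_chars is kept on its own rather than split.
    """
    packed = []
    current = ""
    for chunk in chunks:
        candidate = f"{current}\n\n{chunk}" if current else chunk
        if current and len(candidate) > max_chars:
            # Adding this chunk would overflow, so close the current group.
            packed.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return packed
```

Each packed text would then need one LLM call, so smaller chunks with a larger top k can end up in fewer calls than one call per chunk.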
So, the top level has a top k of two here, so it will use the 2 sub-indexes that have summaries that best match the query

Then for each sub-index, it fetches two nodes to help answer the query

So, 4 nodes in total
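The count above is just a product, assuming every selected sub-index has at least as many chunks as its top k. The helper name is made up for illustration.

```python
def candidate_node_count(top_level_k, sub_index_k):
    """Chunks fetched when each of the top_level_k chosen sub-indexes returns sub_index_k chunks."""
    return top_level_k * sub_index_k


applied = candidate_node_count(2, 2)   # the top k of 2 that was actually used for sub-indexes -> 4
intended = candidate_node_count(2, 3)  # a top k of 3 per sub-index would give -> 6
```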
Hmm I want top 3 nodes for each subindex
I thought the first query config should allow that lool
me too lol

Something fishy is going on. Did you set the index id of the root node to be "compare_contrast" ?
root node as in the sub indices?
I've set the graph index id with graph2.index_struct.index_id = "compare_contrast"
lemme double check if that works or not haha
Try this instead

# get root index
root_index = graph2.get_index(graph2.index_struct.root_id, GPTSimpleVectorIndex)
# set id of root index
root_index.index_struct.index_id = "compare_contrast"
I thiiiink that might work
Hmm still only 4 nodes
didn't work 😦
Maybe try reordering the configs (that seems silly to do but who knows lol)
ahaha it works!
does the order actually matter then?
I guess it does! It must apply them in order 😅
from outer to inner
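Going only by what worked in this thread (the config carrying the top-level ID had to come before the generic one), a small helper can enforce that order. `id_specific_first` is hypothetical and not part of the library, and it assumes the ID-specific configs are the outer ones, as they were here.

```python
def id_specific_first(configs):
    """Return configs with entries that carry an index_struct_id ahead of generic ones.

    sorted() is stable, so the relative order inside each group is preserved.
    """
    return sorted(configs, key=lambda cfg: "index_struct_id" not in cfg)


# Same shape as the configs posted earlier, minus the query transform.
configs = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {"similarity_top_k": 3},
    },
    {
        "index_struct_id": "compare_contrast",
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {"similarity_top_k": 2},
    },
]
ordered = id_specific_first(configs)
```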
Hi @Logan M! Sorry for bothering 😅 but just a quick one. I tried llm_predictor.last_token_usage but it keeps giving me 0. Just wondering if you could help with this lol
I tried multiple strategies and still getting 0
The response I am getting is "answer"*256 (i.e. 256 times)
ohhh I think the graph kind of messes with the token counts... What if you do llm_predictor.total_tokens_used ? (This is the accumulated count, but it might work for graphs)
What is the difference?
So this is the token count across the lifetime of the llm_predictor
It doesn't get reset like last_token_usage does
something about graphs is resetting I guess?
is there a manual way of resetting llm_predictor.total_tokens_used?
It might be easier to keep track of prev_total_tokens_used in its own variable and just subtract to find the difference?

But you might be able to reset it like this (not sure if it's protected or not lol, this is major hax): llm_predictor._total_tokens_used = 0
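The subtract-the-previous-total idea can be wrapped in a tiny helper so nothing private has to be touched. This is a sketch: `TokenDeltaTracker` is a made-up name, and it only assumes the predictor exposes the accumulated `total_tokens_used` value mentioned above.

```python
class TokenDeltaTracker:
    """Report tokens used since the last checkpoint without resetting the predictor."""

    def __init__(self, predictor):
        self.predictor = predictor
        # Remember the running total at the moment tracking starts.
        self.prev_total = predictor.total_tokens_used

    def delta(self):
        """Return tokens used since the previous call, then move the checkpoint."""
        current = self.predictor.total_tokens_used
        used = current - self.prev_total
        self.prev_total = current
        return used
```

Usage would be along the lines of creating `tracker = TokenDeltaTracker(llm_predictor)` before the graph query and calling `tracker.delta()` right after it.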