I just printed response.source_nodes. However, the first node is not a context chunk but I think an output of an LLM call? Not sure why this is appearing
Hmmm sounds sus lol, what kind of index do you have?
lol I have the vector index
Essentially I want to access all the candidate nodes and their cosine sim scores so I can add a citation feature
Yea for sure! That makes sense
What did the source node look like that seemed weird?
Basically it is not part of any context. It doesn't have a docID and seems identical to the final response
I have two vector indices composed
Oh so you have a graph, not a single vector index
That source might be the summary of that particular sub-index
I will explore this further, but thanks!
Also, just a quick question: what is the query mode exactly, and what is the difference between default and recursive?
I think that might be leading to this node issue
Because from my current understanding, my query config is over a graph where each node is a vector index. So during query time I should get the most similar graph node and then, within that, the most similar chunk?
Similarly this config should give me the top two graph nodes and within each the top chunk?
But I think I may be missing something
Don't worry about setting query mode for now. For graphs, recursive is the default, which means it checks the index summaries, then goes into the corresponding matching index (i.e. recursive). Each sub-index will use default, which is fine
One thing you are missing I think is that you only need a query config PER index type.
Since your graph is a vector index on top of vector indexes, you can specify a config for both by using index IDs
How it works is you set a config for the top level. You probably want to set top k = 1, so that it looks at all the summaries of the sub-indexes and returns the sub-index that most closely matches the query
Then, that sub-index is queried with a different config (maybe top k = 2) and the answer is returned
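As a rough sketch (assuming graph is your ComposableGraph and "root_vector" is an index ID you've set yourself, so the names and kwargs here are just placeholders), that might look like:
# Hypothetical sketch: one config for the top-level vector index (matched by ID),
# one for every sub vector index (matched by type only).
query_configs = [
    {
        "index_struct_id": "root_vector",          # only the index with this ID
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {"similarity_top_k": 1},   # pick the single best-matching sub-index
    },
    {
        "index_struct_type": "simple_dict",        # no ID, so all other vector indexes
        "query_mode": "default",
        "query_kwargs": {"similarity_top_k": 2},   # fetch 2 chunks inside that sub-index
    },
]
response = graph.query("your question here", query_configs=query_configs)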
Check out this page for setting index IDs in configs:
https://gpt-index.readthedocs.io/en/latest/how_to/index_structs/composability.html#querying-the-graph
You also might be interested in this very new tutorial, that covers some other cool things (like building graphs with graphs!)
https://gpt-index.readthedocs.io/en/latest/guides/tutorials/graph.html
Thank you, you are amazing!
So in my case I have 5 graph nodes, so will I have 6 total query configs? (one for the graph and 1 each for the 5 graph nodes?)
Well, it depends actually. I usually like to think of it in terms of layers
You might have a specific config for the top layer, using the struct ID field to specify the config for that
Then, if all the sub-nodes are vector indexes, one more query config will apply to all of them.
Basically, query configs are applied per type per ID
Hope that makes sense lol
Oh so you only need two, so one for the graph node (by mentioning the struct ID) and one for all the subindices?
Even if those subindices have different index IDs?
Also, do you mind explaining how the default query mode works?
Yea you got it. If you don't specify the ID in the config, then it applies to all the indexes in your graph that have that type!
Like, for the entire graph? Or for a single vector index?
Just a single vector index or any appropriate index
from IPython.display import Markdown, display

graph2.index_struct.index_id = "compare_contrast"
query_configs_fast = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 3,
            "verbose": True
        },
        # NOTE: set query transform for sub-indices
        # (step_decompose_transform is assumed to be defined earlier,
        # e.g. a step-decompose query transform instance)
        "query_transform": step_decompose_transform
    },
    {
        "index_struct_id": "compare_contrast",
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 2,
            "verbose": True
        }
    },
]
def ask_ai():
    while True:
        query = input("Ask: ")
        response = graph2.query(query, query_configs=query_configs_fast)
        display(Markdown(f"Response: <b>{response.response}</b>"))
Also, the index ID method doesn't seem to work. The model is overriding the first config and directly applying the second config. And so I am getting 4 total candidate nodes
i.e two graph nodes and 2 chunks per graph node
Alright, here it goes!
So, when you create an index, all your data is chunked/embedded. At query time, the query text is also embedded. Using cosine similarity, llama index fetches the top_k closest matching text chunks
Next step is creating the answer. Llama Index takes the first text chunk and asks the LLM to answer the query. Then, with the initial answer, the second text chunk is sent to the LLM, and llama index asks the LLM to either update the existing answer using the new text, or just repeat the existing answer.
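In pseudocode, that create-and-refine loop looks roughly like this (just the idea, not the actual llama index prompts or internals):
# Conceptual sketch only -- not the real llama_index prompts/internals.
def create_and_refine(llm, query, chunks):
    # first chunk: create an initial answer
    answer = llm(f"Context: {chunks[0]}\nQuestion: {query}\nAnswer:")
    # remaining chunks: refine (or keep) the existing answer
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context: {chunk}\n"
            f"Question: {query}\n"
            "Update the existing answer using the new context, "
            "or just repeat it if the new context isn't useful."
        )
    return answer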
If you set response_mode="compact", then instead of making one call per top k node, it stuffs as much node text as possible into each LLM call. This is usually most helpful if you increase the top k and decrease the chunk size limit, since it can reduce the LLM calls
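On a single vector index that could look something like this (a sketch in the gpt_index-style API used in this thread; the data directory, query string, and chunk size are just placeholders):
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()

# smaller chunks + larger top k, packed into fewer LLM calls via "compact"
index = GPTSimpleVectorIndex(documents, chunk_size_limit=512)
response = index.query(
    "your question here",
    similarity_top_k=5,
    response_mode="compact",
)
print(response.response)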
So, the top level has a top k of two here, so it will use the 2 sub-indexes that have summaries that best match the query
Then for each sub-index, it fetches two nodes to help answer the query
So, 4 nodes in total
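You can sanity-check that count from the response itself, e.g. (rough snippet using your existing configs):
response = graph2.query("your question here", query_configs=query_configs_fast)
# with top k = 2 at the graph level and top k = 2 per sub-index,
# you'd expect 2 * 2 = 4 source nodes here
print(len(response.source_nodes))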
Hmm I want top 3 nodes for each subindex
I thought the first query config should allow that lool
me too lol
Something fishy is going on. Did you set the index id of the root node to be "compare_contrast" ?
root node as in the sub indices?
I've set the graph index ID as graph2.index_struct.index_id = "compare_contrast"
lemme double check if that works or not haha
Try this instead
# get root index
root_index = graph2.get_index(graph2.index_struct.root_id, GPTSimpleVectorIndex)
# set id of root index
root_index.index_struct.index_id = "compare_contrast"
I thiiiink that might work
Maybe try reordering the configs (that seems silly to do but who knows lol)
does the order actually matter then?
I guess it does! It must apply them in order
Hi @Logan M! Sorry for bothering
but just a quick one. I tried the llm_predictor.last_token_usage but it keeps giving me 0. Just wondering if you could help with this lol
I tried multiple strategies and still getting 0
The response I am getting is "answer"*256 (i.e. 256 times)
ohhh I think the graph kind of messes with the token counts... What if you do llm_predictor.total_tokens_used? (This is the accumulated count, but it might work for graphs)
So this is the token count across the lifetime of the llm_predictor
It doesn't get reset like last_token_usage does
something about graphs is resetting I guess?
is there a manual way of resetting llm_predictor.total_tokens_used?
It might be easier to keep track of prev_total_tokens_used in its own variable and just subtract to find the difference?
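Something like this, roughly (assuming llm_predictor is the LLMPredictor you're already passing in):
# Rough sketch: measure per-query usage by diffing the running total,
# since total_tokens_used accumulates over the predictor's lifetime.
prev_total_tokens_used = llm_predictor.total_tokens_used

response = graph2.query(query, query_configs=query_configs_fast)

tokens_this_query = llm_predictor.total_tokens_used - prev_total_tokens_used
print(f"Tokens used by this query: {tokens_this_query}")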
But you might be able to reset it like this (not sure if it's protected or not lol, this is major haxs) llm_predictor._total_tokens_used = 0