
Updated last year

Top_K

At a glance

The community member is building an index and adding a URL to the document's extra information. When a response is returned, it includes multiple source_nodes, and the community member wants to select only the specific node they need to pull the link from. The community members discuss using top_k and similarity_top_k in the query engine to limit the number of source_nodes returned, as well as using a SimilarityPostprocessor to set a similarity threshold. However, they find that these approaches do not work as expected for a composable graph. The community members suggest that the top_K functionality is not yet implemented for the composable graph, and the community member has to handle this manually for now.

When I build my index, I'm adding a URL for each document to its extra info.

When a response is returned, it includes multiple source_nodes, one of which is the node I need to pull the link from. Is there a way I can select only this node? Or have the response return only this one as the source node?
You can set a top_k value in your query_engine.
It fetches the top_k most similar records from across all your documents, ranked by cosine similarity.

The default value for top_k is 2.
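
To make the ranking step concrete, here is a minimal, library-independent sketch of top-k selection by cosine similarity. The helper names `cosine_similarity` and `top_k_indices` and the toy vectors are assumptions for illustration only; they are not LlamaIndex APIs, and the real retriever may work differently internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_indices(query, embeddings, k=2):
    """Return the indices of the k embeddings most similar to the query."""
    scored = [(cosine_similarity(query, e), i) for i, e in enumerate(embeddings)]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy 2-D "embeddings" (hypothetical values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(top_k_indices([1.0, 0.0], embeddings, k=2))
```

With k=2, only the two vectors pointing most nearly in the query's direction are kept; everything else is dropped before the response is built.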



I would also suggest looking into SimilarityPostprocessor, which can be helpful here as well. With it you can set a threshold value; nodes scoring below that threshold will not be picked as source_nodes when preparing your response.
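
As a rough illustration of what a similarity threshold does, the sketch below filters a list of scored candidates. The helper name `apply_similarity_cutoff` and the sample scores are hypothetical; this shows the thresholding idea only and is not SimilarityPostprocessor's actual implementation.

```python
def apply_similarity_cutoff(scored_nodes, cutoff):
    """Keep (name, score) pairs scoring at least `cutoff`; drop unscored ones."""
    return [
        (name, score)
        for name, score in scored_nodes
        if score is not None and score >= cutoff
    ]


# Hypothetical candidates: one below the cutoff, one with no score at all.
candidates = [("node_a", 0.91), ("node_b", 0.74), ("node_c", None), ("node_d", 0.75)]
kept = apply_similarity_cutoff(candidates, cutoff=0.75)
print(kept)
```

Unlike top_k, a cutoff does not fix how many nodes come back: it can return every candidate or none, depending on the scores.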


Sample code combining the two options above would look like this:

    query_engine = vector_index.as_query_engine(
        similarity_top_k=3,
        node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)],
    )

Check out more here: https://gpt-index.readthedocs.io/en/stable/core_modules/query_modules/node_postprocessors/root.html#modules
that's awesome, thank you. I've implemented top_k, but this doesn't seem to work for a graph.


    graph = ComposableGraph.from_indices(
        SummaryIndex,
        [notion_index, gdrive_index],
        index_summaries=["General purpose index from Notion", "General purpose index from Google Drive"],
        service_context=service_context,
        top_k=1,
    )

    query_engine = graph.as_query_engine(
        top_k=1,
        text_qa_template=prompt
    )
the above returns 6 source nodes
You have to use similarity_top_k in place of top_k inside your query_engine

    query_engine = graph.as_query_engine(
        similarity_top_k=1,
        text_qa_template=prompt
    )
😦 that still returns multiple
I've built a little function that finds the lowest-scoring one for now, so it works, but I'm not sure what I'm missing
Yep, it seems the top_k part is not implemented in the ComposableGraph query_engine.
You'll have to handle it on your end for now, it seems.
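
For anyone handling this manually, a minimal sketch of picking one source node by score might look like the following. `ScoredNode` is a hypothetical stand-in for an entry of response.source_nodes, and the attribute names (`score`, `extra_info`) are assumptions; check them against the objects your LlamaIndex version actually returns. The sketch assumes a higher score means a better match, so flip `reverse` if your scores run the other way.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScoredNode:
    """Hypothetical stand-in for one entry of response.source_nodes."""
    text: str
    score: Optional[float] = None
    extra_info: dict = field(default_factory=dict)


def pick_top_sources(source_nodes, k=1):
    """Return the k highest-scoring nodes; nodes without a score sort last."""
    ranked = sorted(
        source_nodes,
        key=lambda n: n.score if n.score is not None else float("-inf"),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical nodes; the URLs are placeholders.
nodes = [
    ScoredNode("intro", 0.71, {"url": "https://example.com/a"}),
    ScoredNode("answer", 0.88, {"url": "https://example.com/b"}),
    ScoredNode("summary", None),
]
best = pick_top_sources(nodes, k=1)[0]
print(best.extra_info.get("url"))
```

Nodes with no score are deliberately ranked last here, since some index types in a graph may not attach a similarity score to the nodes they return.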
thanks for your help!
Hey! Are you using SimpleKeywordTableIndex in your composable graph?
SummaryIndex