Top_K

At a glance

A community member adds a URL to each document's extra info when building an index. Responses come back with multiple source_nodes, and they want to select only the node that carries the link. Suggestions in the thread are to set similarity_top_k (not top_k) on the query engine to limit how many source_nodes are returned, and to use a SimilarityPostprocessor to apply a similarity threshold. Neither reduces the number of source nodes returned for a ComposableGraph. The thread concludes that top-k limiting does not appear to be implemented for the ComposableGraph query engine, so the selection has to be handled manually for now.

HABBYMAN

When I build my index, I'm adding a URL to each document's extra info.

When a response is returned, it includes multiple source_nodes, one of which is the node I need to pull the link from. Is there a way I can select only this node? Or have it return only this one as the source node?

10 comments

WhiteFang_Jr

You can set a top_k value (the similarity_top_k argument) in your query_engine.
It fetches the top_k most similar records from all of your documents, ranked by cosine similarity.

The default value for top_k is 2.

I would also suggest looking into SimilarityPostprocessor, which can be helpful here as well. It lets you set a threshold; nodes scoring below it will not be picked as source_nodes when preparing your response.

Sample code combining both options would look like this:

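# Import path as of the LlamaIndex version these docs describe; it differs in later releases.
from llama_index.indices.postprocessor import SimilarityPostprocessor

query_engine = vector_index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)],
)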

Check out more here: https://gpt-index.readthedocs.io/en/stable/core_modules/query_modules/node_postprocessors/root.html#modules
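To make the two knobs concrete, here is a minimal pure-Python sketch of the idea: rank by cosine similarity, keep the top k, then drop anything under the cutoff. This is an illustration only, not LlamaIndex's actual implementation; the `cosine` and `retrieve` helpers and the toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, doc_vecs, similarity_top_k=2, similarity_cutoff=None):
    """Hypothetical helper: top-k by cosine similarity, then an optional cutoff."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = scored[:similarity_top_k]  # the similarity_top_k step
    if similarity_cutoff is not None:
        # the postprocessor step: keep only nodes at or above the threshold
        top = [(doc_id, score) for doc_id, score in top if score >= similarity_cutoff]
    return top

# Toy embeddings, made up for illustration.
docs = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}
hits = retrieve([1.0, 0.0], docs, similarity_top_k=2, similarity_cutoff=0.75)
print(hits)
```

Note that the cutoff can return fewer than k nodes (or none), while top_k alone always returns up to k regardless of how weak the matches are.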

HABBYMAN

that's awesome, thank you. I've implemented top_k, but this doesn't seem to work for a graph.

    graph = ComposableGraph.from_indices(
        SummaryIndex,
        [notion_index, gdrive_index],
        index_summaries=["General purpose index from Notion", "General purpose index from Google Drive"],
        service_context=service_context,
        top_k=1,
    )

    query_engine = graph.as_query_engine(
        top_k=1,
        text_qa_template=prompt
    )

HABBYMAN

the above returns 6 source nodes

WhiteFang_Jr

You have to use similarity_top_k instead of top_k in your query_engine:

    query_engine = graph.as_query_engine(
        similarity_top_k=1,
        text_qa_template=prompt
    )

HABBYMAN

😦 that still returns multiple

HABBYMAN

I've built a little function that finds the lowest-scoring node for now, so it works, but I'm not sure what I'm missing.

WhiteFang_Jr

Yep, it seems the top_k part is not implemented on the ComposableGraph query_engine.
You'll have to handle it on your end for now.
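For anyone handling this manually, a minimal sketch of pulling one URL out of response.source_nodes could look like the following. It assumes each source node exposes `.score` and `.node.metadata` (older versions may name the field extra_info) and that the URL was stored under a "url" key. `FakeNode`, `FakeNodeWithScore`, and `pick_source_url` are hypothetical stand-ins so the example runs without LlamaIndex installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class FakeNode:
    # Stand-in for a LlamaIndex node; only the metadata dict is modeled.
    metadata: Dict[str, Any] = field(default_factory=dict)

@dataclass
class FakeNodeWithScore:
    # Stand-in for an entry of response.source_nodes.
    node: FakeNode
    score: Optional[float] = None

def pick_source_url(source_nodes, key="url", highest=True):
    """Return the URL from one source node, or None if no node carries one.

    Nodes without the metadata key are ignored. Among the rest, the
    highest-scoring (or lowest-scoring, if highest=False) node wins.
    If no candidate has a score, the first candidate is used.
    """
    candidates = [n for n in source_nodes if key in n.node.metadata]
    if not candidates:
        return None
    scored = [n for n in candidates if n.score is not None]
    if not scored:
        return candidates[0].node.metadata[key]
    chooser = max if highest else min
    return chooser(scored, key=lambda n: n.score).node.metadata[key]

# Made-up nodes for illustration.
nodes = [
    FakeNodeWithScore(FakeNode({"url": "https://example.com/a"}), 0.71),
    FakeNodeWithScore(FakeNode({}), 0.95),
    FakeNodeWithScore(FakeNode({"url": "https://example.com/b"}), 0.83),
]
print(pick_source_url(nodes))
```

With a real response you would pass `response.source_nodes` in place of `nodes`. The `highest` flag is there because whether the best match has the highest or lowest score depends on how the underlying index scores nodes.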

HABBYMAN

thanks for your help!

WhiteFang_Jr

Hey! Are you using SimpleKeywordTableIndex in your composable graph?

HABBYMAN

SummaryIndex
