

Top_K

At a glance
The community member is interested in making the similarity_top_k argument dynamic, so that it adjusts based on the number of relevant paragraphs. The comments suggest using a SimilarityPostprocessor to set a relevancy threshold, and then using the top_k value to limit the number of nodes returned. One community member provides example code for how to set this up. Another community member mentions plans to derive the top_k value programmatically based on the number of relevant nodes.
Hello, everyone. I am interested in the similarity_top_k argument.

Is it possible to make it dynamic? That is, if only one paragraph is really relevant, similarity_top_k == 1; if there are several relevant ones, similarity_top_k == 2, and so on.
You can use the similarity node postprocessor and set a threshold value for relevancy.

What this will do is keep only the nodes whose similarity is above the threshold, and those are the nodes used for response generation.

You can then set the top_k value to 10 or 15, but only the nodes that exceed the threshold will actually be passed through.

https://docs.llamaindex.ai/en/stable/module_guides/querying/node_postprocessors/node_postprocessors.html#similaritypostprocessor
Pass this postprocessor to the query engine, like this:

Plain Text
from llama_index.indices.postprocessor import SimilarityPostprocessor

# Keep only nodes with similarity >= 0.7
postprocessor = SimilarityPostprocessor(similarity_cutoff=0.7)

# `index` is an existing index, e.g. a VectorStoreIndex
query_engine = index.as_query_engine(
    similarity_top_k=10, node_postprocessors=[postprocessor]
)
response = query_engine.query(
    "How much did the author raise in seed funding from Idelle's husband"
    " (Julian) for Viaweb?"
)
I haven't done this yet, but I plan to do it programmatically: if you extract some metadata and label the nodes during node parsing, you can count the relevant nodes for each query and derive the top_k value from that information.
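For reference, here is a minimal sketch of that idea. It assumes an existing `index` (e.g. a VectorStoreIndex) and reuses the SimilarityPostprocessor from the example above; the 0.7 cutoff and the cap of 10 candidates are illustrative values, not recommendations.

Plain Text
from llama_index.indices.postprocessor import SimilarityPostprocessor

def dynamic_top_k(index, query: str, cutoff: float = 0.7, max_k: int = 10) -> int:
    # Retrieve a generous candidate set first...
    retriever = index.as_retriever(similarity_top_k=max_k)
    nodes = retriever.retrieve(query)
    # ...then keep only the nodes above the relevancy threshold.
    postprocessor = SimilarityPostprocessor(similarity_cutoff=cutoff)
    relevant = postprocessor.postprocess_nodes(nodes)
    # Always keep at least one node so the query engine has some context.
    return max(len(relevant), 1)

query = "How much did the author raise in seed funding for Viaweb?"
query_engine = index.as_query_engine(similarity_top_k=dynamic_top_k(index, query))
response = query_engine.query(query)

This costs one extra retrieval pass per query, but it lets similarity_top_k shrink or grow with the number of nodes that actually clear the threshold.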