

Top_K

At a glance
The community member is interested in making the similarity_top_k argument dynamic, so that it adjusts based on the number of relevant paragraphs. The comments suggest using a SimilarityPostprocessor to set a relevancy threshold, and then using the top_k value to limit the number of nodes returned. One community member provides example code for how to set this up. Another community member mentions plans to derive the top_k value programmatically based on the number of relevant nodes.
Hello, everyone. I am interested in the similarity_top_k argument.

Is it possible to make it dynamic? That is, if only one paragraph is really relevant, similarity_top_k == 1; if there are several relevant ones, similarity_top_k == 2, and so on.
You can use the similarity node postprocessor and set a threshold value for relevancy.

What this will do is keep only the nodes whose similarity is above the threshold, and those are the nodes used for response generation.

You can then set the top_k value to 10 or 15, but only the nodes that exceed the threshold will actually be passed through.

https://docs.llamaindex.ai/en/stable/module_guides/querying/node_postprocessors/node_postprocessors.html#similaritypostprocessor
Pass this postprocessor to the query engine, like this:

Plain Text
from llama_index.indices.postprocessor import SimilarityPostprocessor

# Keep only nodes with similarity >= 0.7
postprocessor = SimilarityPostprocessor(similarity_cutoff=0.7)

# `index` is an existing index, e.g. a VectorStoreIndex
query_engine = index.as_query_engine(
    similarity_top_k=10, node_postprocessors=[postprocessor]
)
response = query_engine.query(
    "How much did the author raise in seed funding from Idelle's husband"
    " (Julian) for Viaweb?"
)
I haven't done this yet, but I plan to do it programmatically: if you extract some metadata and label the nodes during node parsing, you can count the relevant nodes for each query and derive the top_k value from that information.
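For reference, here is a minimal sketch of that idea. It assumes an existing `index` (e.g. a VectorStoreIndex) and reuses the SimilarityPostprocessor from the example above; the 0.7 cutoff and the cap of 10 candidates are illustrative values, not recommendations.

Plain Text
from llama_index.indices.postprocessor import SimilarityPostprocessor

def dynamic_top_k(index, query: str, cutoff: float = 0.7, max_k: int = 10) -> int:
    # Retrieve a generous candidate set first...
    retriever = index.as_retriever(similarity_top_k=max_k)
    nodes = retriever.retrieve(query)
    # ...then keep only the nodes above the relevancy threshold.
    postprocessor = SimilarityPostprocessor(similarity_cutoff=cutoff)
    relevant = postprocessor.postprocess_nodes(nodes)
    # Always keep at least one node so the query engine has some context.
    return max(len(relevant), 1)

query = "How much did the author raise in seed funding for Viaweb?"
query_engine = index.as_query_engine(similarity_top_k=dynamic_top_k(index, query))
response = query_engine.query(query)

This costs one extra retrieval pass per query, but it lets similarity_top_k shrink or grow with the number of nodes that actually clear the threshold.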