Similarity_top_k limit not being respected

I set similarity_top_k to 100 but only get back 20-22 nodes. Why is that? I'm using the AgentRunner and a basic knowledge tool.
Do you have a similarity cutoff and/or a reranker as a node postprocessor?
Looks like the default is 0.0

similarity_cutoff: float = Field(default=0.0)
Maybe you only have 20-22 nodes?
You can test the retriever directly too
Plain Text
# Bypass the query engine and check how many nodes come back raw
nodes = index.as_retriever(similarity_top_k=100).retrieve("test")
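If a cutoff postprocessor is attached downstream, a quick way to see it shrinking the result set is something like this (a sketch; the 0.7 threshold is purely illustrative):
Python
from llama_index.core.postprocessor import SimilarityPostprocessor

# A cutoff above 0.0 silently drops every node scoring below the
# threshold, which can shrink 100 requested nodes down to 20-22.
cutoff = SimilarityPostprocessor(similarity_cutoff=0.7)  # illustrative value
filtered = cutoff.postprocess_nodes(nodes)
print(f"{len(nodes)} retrieved -> {len(filtered)} after cutoff")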
Thanks guys, I solved it. I had some extra logic that fits my nodes into the context window, capped at some max size.
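For illustration, logic along these lines would cap the node count regardless of similarity_top_k (a hypothetical reconstruction; the budget and helper name are not from the thread):
Python
# Hypothetical reconstruction: packing chunks into a fixed token budget
# caps how many nodes survive, no matter what similarity_top_k was.
def fit_to_context(chunks: list[str], max_tokens: int = 4000) -> list[str]:
    kept, used = [], 0
    for chunk in chunks:
        size = len(chunk.split())  # crude whitespace token estimate
        if used + size > max_tokens:
            break  # everything past the budget gets dropped here
        kept.append(chunk)
        used += size
    return kept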
Another question on this topic: can I somehow set top_k to infinity, so that my retriever just returns the maximum number of nodes?
I'm using a second reranker model afterwards, bge-reranker-v2-m3, and it performs really well. Some nodes are actually ranked really badly by the RetrieverQueryEngine; the node that matters in the end is sometimes the 200th or lower, and bge resolves it correctly, so I would just like to pass all nodes to it.
Currently I'm using a workaround like similarity_top_k=9999, but I'd like to know if there is a better way.
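Roughly, the workaround looks like this (a sketch, assuming an existing `index`; the top_n of 10 is illustrative):
Python
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker

# Over-fetch with a huge top_k, then let the cross-encoder pick winners.
reranker = FlagEmbeddingReranker(model="BAAI/bge-reranker-v2-m3", top_n=10)
query_engine = index.as_query_engine(
    similarity_top_k=9999,  # effectively "give me everything"
    node_postprocessors=[reranker],
)
response = query_engine.query("your question here")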
But the reranker is also kind of slow. If you have any ideas for getting better reranking performance while maintaining acceptable response latency, please tell me ^^
Yea, there's no way to set it to infinity. Some vector stores do have a get_nodes method, but you still need to provide either node_ids or metadata filters.
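For reference, that path looks roughly like this (a sketch; whether get_nodes is implemented depends on the vector store, and the filter key/value are purely illustrative):
Python
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters

# get_nodes still needs node_ids or filters -- there is no "fetch all" flag.
filters = MetadataFilters(filters=[MetadataFilter(key="source", value="docs")])
nodes = vector_store.get_nodes(filters=filters)  # assumes a supporting store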
Not sure which reranker you are using; if it's running locally, speed is largely dependent on your machine specs (having a GPU will help)

LLM rerankers are slow af in general

API-based rerankers (like Cohere) are your best bet for latency
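A minimal sketch with Cohere's reranker (assumes COHERE_API_KEY is set in the environment and an existing `index`; top_n is illustrative):
Python
from llama_index.postprocessor.cohere_rerank import CohereRerank

# The heavy model runs on Cohere's side, so local cost is one API round trip.
reranker = CohereRerank(top_n=10)  # reads COHERE_API_KEY from the environment
query_engine = index.as_query_engine(
    similarity_top_k=100,
    node_postprocessors=[reranker],
)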
As I said, I use bge-reranker-v2-m3 via FlagEmbeddingReranker. I think there is also the possibility of using a BGEM3Index directly? I've seen it somewhere in the docs. It would be cool to apply this directly in the retrieval step instead of having to postprocess.
Hmm, that's a decent-sized model (2.5 GB) -- without a GPU this will definitely be painfully slow.

It will also get slower the higher the initial top_k is

There is a BGE index, but it uses multi-vector (i.e. ColBERT-style) retrieval, so I'd expect it to be pretty resource-hungry (it generates a vector per token rather than per chunk)
https://docs.llamaindex.ai/en/stable/api_reference/indices/bge_m3/
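A hedged sketch of using it (the import path is from memory and may differ by version; `documents` is assumed):
Python
from llama_index.indices.managed.bge_m3 import BGEM3Index

# ColBERT-style multi-vector index: one vector per token, so expect
# heavy memory use compared to a per-chunk vector index.
index = BGEM3Index.from_documents(documents)
retriever = index.as_retriever()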