Updated 3 months ago

Has anyone else noticed that the similarity score is often quite high (>0.7) even for chunks whose content is not directly related to the user query? I would assume it would be much lower. Are there any tips, resources, or pointers from the community on this issue?
13 comments
I assume you're using OpenAI embeddings for this? With those, it's normal to see values like that.
You can just use a higher similarity cutoff in that case
Why is that? And what's a reasonable cutoff in your experience?
Yeah, with OpenAI embeddings, around 0.7-0.75 seems to be the neutral score
For some evals I've done before I used 0.9 as the cutoff
Thanks @Logan M, is there any doc that comes to mind on that?
Seems high, but I will try. Thank you!
It's probably too high for retrieval though
Argh. I will try cutting off around 0.75
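A minimal sketch of the cutoff idea discussed above, in plain Python so it does not depend on any particular library. The function names (`cosine_similarity`, `filter_by_cutoff`) and the toy 2-D vectors are made up for illustration; real embeddings have far more dimensions, and the 0.75 default is just the value mentioned in this thread, not a recommended setting.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filter_by_cutoff(query_vec, chunks, cutoff=0.75):
    """Keep only chunks scoring at or above `cutoff`, best first.

    `chunks` is a list of (text, vector) pairs (hypothetical shape).
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    kept = [(text, score) for text, score in scored if score >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy data, not real embeddings.
query = [1.0, 0.0]
chunks = [
    ("related", [0.9, 0.1]),
    ("unrelated", [0.6, 0.8]),  # scores 0.6 against the query
]
results = filter_by_cutoff(query, chunks, cutoff=0.75)
```

If you are on LlamaIndex, I believe the built-in `SimilarityPostprocessor` with its `similarity_cutoff` argument does the same kind of filtering on retrieved nodes, but check the docs for your version.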
Do you know where I can find out more about the neutral score?
It's just from experience tbh
Just a general observation from working with LlamaIndex
I see. I find that very interesting. Will dig into that
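One way to dig into the "neutral score" yourself is to measure it on your own data: embed a handful of texts you know are unrelated and average their pairwise similarities. The sketch below assumes that approach; it is not something from the thread or from any library, and `estimate_baseline` and `rescale` are hypothetical helpers. The toy vectors stand in for real embeddings.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def estimate_baseline(vectors):
    """Mean pairwise similarity across vectors of supposedly unrelated texts."""
    return mean(cosine_similarity(a, b) for a, b in combinations(vectors, 2))


def rescale(score, baseline):
    """Map `baseline` to 0.0 and a perfect match (1.0) to 1.0."""
    return (score - baseline) / (1.0 - baseline)


# Toy stand-ins for embeddings of unrelated texts.
unrelated = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
baseline = estimate_baseline(unrelated)
```

With real OpenAI embeddings, the claim in this thread is that the baseline would come out somewhere around 0.7-0.75; a cutoff then only makes sense somewhere above whatever baseline you measure.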