LlamaIndex


Updated 5 months ago

Has any one else noticed that the


Has anyone else noticed that the similarity score is often quite high (>0.7) even for chunks whose content is not directly related to the user query? I would assume it would be much lower. Are there any tips, resources, or pointers from the community on this issue?


13 comments

I assume you're using OpenAI for this? With their embeddings, values like that are normal.

You can just use a higher similarity cutoff in that case
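For anyone who wants to see what a cutoff amounts to, here is a minimal sketch in plain Python. `ScoredChunk` and `apply_similarity_cutoff` are hypothetical names made up for illustration, and the scores are invented examples, not measurements. LlamaIndex has a `SimilarityPostprocessor` with a `similarity_cutoff` argument that plays this role in a real pipeline; the sketch only shows the idea.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredChunk:
    """Hypothetical container for a retrieved chunk and its similarity score."""
    text: str
    score: float


def apply_similarity_cutoff(chunks: List[ScoredChunk], cutoff: float = 0.75) -> List[ScoredChunk]:
    """Keep only chunks scoring at or above the cutoff, highest score first."""
    kept = [c for c in chunks if c.score >= cutoff]
    return sorted(kept, key=lambda c: c.score, reverse=True)


# Invented scores, chosen to mimic the "everything is above 0.7" pattern.
retrieved = [
    ScoredChunk("Directly answers the query", 0.86),
    ScoredChunk("Loosely related background", 0.78),
    ScoredChunk("Unrelated boilerplate", 0.72),
]

filtered = apply_similarity_cutoff(retrieved, cutoff=0.75)
for chunk in filtered:
    print(f"{chunk.score:.2f}  {chunk.text}")
```

Raising `cutoff` trades recall for precision, so a value that works for one embedding model or dataset may drop everything on another.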

Why is that? And what's a reasonable cutoff in your experience?

Yeah, with OpenAI embeddings, around 0.7-0.75 seems to be the neutral score
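One way to make scores easier to read, if you accept a neutral baseline around 0.7, is to rescale them so the baseline maps to 0 and a perfect match stays at 1. This is only a rough sketch: `rescale_score` is a hypothetical helper, and the 0.7 default is the rule-of-thumb value from this thread, not a documented constant.

```python
def rescale_score(raw: float, neutral: float = 0.7) -> float:
    """Map raw similarity so `neutral` becomes 0.0 and 1.0 stays 1.0.

    Scores below the neutral baseline are clamped to 0.0.
    """
    if not 0.0 <= neutral < 1.0:
        raise ValueError("neutral must be in [0, 1)")
    return max(0.0, (raw - neutral) / (1.0 - neutral))


# Assumed baseline of 0.7; adjust it for your own embedding model and data.
for raw in (0.70, 0.75, 0.85, 0.95):
    print(f"raw={raw:.2f} -> rescaled={rescale_score(raw):.2f}")
```

The rescaling does not change the ranking of chunks; it only spreads the useful range out so that a cutoff is easier to reason about.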

For some evals I've done before I used 0.9 as the cutoff

Thanks @Logan M, is there any doc that comes to mind on that?

Seems high, but I will try. Thank you!

It's probably too high for retrieval though

Argh. I will try cutting off around 0.75

Do you know where I can find out more about the neutral score?

It's just from experience tbh

Just a general observation from working with LlamaIndex

I see. I find that very interesting. Will dig into that
