so for things like faithfulness and relevancy, the scores are generated by some outside LLM right? There's no formula that's used to generate that value?
Correct, it's LLM-as-a-judge (except for the semantic similarity one, that's just cosine similarity)
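For context, a minimal from-scratch sketch of the similarity measures being discussed (cosine similarity, plus the dot product and Euclidean distance alternatives mentioned below). The vectors here are made-up placeholders; a real evaluator would compare model-generated embeddings of the response and reference.

```python
# Illustrative sketch only: the three vector similarity measures
# discussed in this thread, applied to embedding vectors.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def dot_product(a, b):
    """Unnormalized alternative: sensitive to vector magnitude."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Distance alternative: 0.0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Placeholder embeddings, not real model output:
v1 = [0.1, 0.8, 0.3]
v2 = [0.2, 0.7, 0.4]
sim = cosine_similarity(v1, v2)
```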
Thanks! I saw that cosine similarity can be swapped for things like dot product and Euclidean distance, but is there any way to use other metrics like BLEU or ROUGE?
open to adding them (those assume you have ground truth to compare to)
but they have their own pitfalls
yes, I've been using a labeled RAG dataset example from LlamaHub for now
also interesting, is there a reason why you chose those metrics (were they good enough for measuring semantic similarity), or was it just a speed/ease-of-implementation situation?
In most cases people don't have ground truth to compare to, so it was a lower priority. Also, imo, they are a tad less helpful? maybe a hot take hahaha

There's so many ways to write a response. A ROUGE score of 30 isn't really that informative, even if it's what academia has clung to for the past few years
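To illustrate that point, here's a simplified from-scratch sketch of ROUGE-1 recall (unigram overlap with a reference). The example sentences are invented, and real ROUGE implementations add stemming, ROUGE-2/ROUGE-L variants, and precision/F1; the point is just that a correct paraphrase can score middling against its reference:

```python
# Simplified ROUGE-1 recall: fraction of the reference's unigrams
# that also appear in the candidate. Illustrative only, not any
# particular library's implementation.
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(cand_counts[w], ref_counts[w]) for w in ref_counts)
    return overlap / sum(ref_counts.values())

# Two semantically equivalent answers with different wording:
ref = "the capital of france is paris"
cand = "paris serves as the french capital"
score = rouge1_recall(cand, ref)  # only 0.5, despite being fully correct
```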
But if someone contributes it to the repo, it will definitely get merged πŸ™‚
sounds good! I'll see what I can do