so for things like faithfulness and

At a glance

The community members discuss how the faithfulness and relevancy scores are generated. One community member confirms that these scores come from an LLM acting as a judge rather than from a formula; the exception is semantic similarity, which is plain cosine similarity. Another community member asks whether metrics like BLEU or ROUGE can be used and is told they are not currently supported. Those metrics require ground truth data, which most people don't have, so they were a lower priority. The discussion also touches on the limits of ROUGE: a response can be written in many ways, so a single score is not very informative. The community members are open to merging contributions that add new metrics.

Gpt. Alex

so for things like faithfulness and relevancy, the scores are generated by some outside LLM right? There's no formula that's used to generate that value?

10 comments

Logan M

Correct, it's LLM-as-a-judge (except for the semantic similarity one, that's just cosine similarity)

Gpt. Alex

Thanks! I saw that cosine similarity can be switched for things like dot product and Euclidean distance, but is there any way to use other metrics like BLEU or ROUGE?
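For reference, the three vector comparisons mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the math, not the library's own code; the function names are made up for this example, and the vectors stand in for the embeddings of a response and a reference answer.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; larger means more aligned (and longer) vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product normalized by both vector lengths; ranges from -1 to 1."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance; unlike the other two, smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Note that cosine similarity and dot product are similarities (higher is closer), while Euclidean distance is a distance (lower is closer), so a pass/fail threshold has to be flipped when switching between them.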

Logan M

nope

Logan M

open to adding them (those assume you have ground truth to compare to)

Logan M

but they have their own pitfalls

Gpt. Alex

yes, I've been using a labeled RAG dataset example from LlamaHub for right now

Gpt. Alex

also interesting, is there a reason why you chose those metrics (were they good enough for measuring semantic similarity), or was it just a speed/ease of implementation situation?

Logan M

In most cases people don't have ground truth to compare to, so it was a lower priority. Also, imo, they are a tad less helpful? maybe a hot take hahaha

There are so many ways to write a response. A ROUGE score of 30 isn't really that informative, even if it's what academia has clung to the past few years
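To make the point about ROUGE concrete, here is a rough sketch of a unigram-overlap (ROUGE-1 style) F1 score. It is a simplified illustration, not a reference implementation: the `rouge1_f1` name and the whitespace tokenization are assumptions for this example, and real ROUGE packages usually add stemming and other normalization.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between candidate and reference."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
paraphrase = "a feline rested upon the rug"

# The paraphrase says roughly the same thing but shares only one word ("the")
# with the reference, so its score is far below that of an exact copy.
print(rouge1_f1(reference, reference))
print(rouge1_f1(paraphrase, reference))
```

A response that means the same thing but uses different words scores poorly here, which is why the number alone says little about answer quality.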

Logan M

But if someone contributes it to the repo, it will definitely get merged 🙂

Gpt. Alex

sounds good! I'll see what I can do
