The community member is looking for a built-in method to evaluate a set of responses against a test set: they have a set of FAQs with expected answers and want to compare each generated response to its expected one. The comments suggest using automated metrics such as ROUGE and BERTScore to measure the similarity between a response and its expected result, and one community member mentions the "ragas" package, which combines multiple evaluation metrics. However, the community members indicate that there is no built-in method for this task.
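As a starting point, the metric-based comparison suggested in the comments can be sketched in plain Python. The snippet below computes a unigram-overlap F1, which is a simplified version of what ROUGE-1 measures; in practice you would use the `rouge-score` or `bert-score` packages, or the "ragas" library mentioned in the thread. The FAQ data and field names here are purely illustrative.

```python
# Minimal sketch: score each generated response against its expected FAQ
# answer using unigram-overlap F1 (a simplified ROUGE-1). For real
# evaluation, prefer the rouge-score, bert-score, or ragas packages.
from collections import Counter

def unigram_f1(expected: str, response: str) -> float:
    """F1 over shared word tokens between expected and generated text."""
    ref = Counter(expected.lower().split())
    hyp = Counter(response.lower().split())
    overlap = sum((ref & hyp).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical FAQ test set: pairs of expected answers and model responses.
faqs = [
    {"question": "What is the refund window?",
     "expected": "Refunds are accepted within 30 days of purchase.",
     "response": "You can get a refund within 30 days of purchase."},
]

for item in faqs:
    score = unigram_f1(item["expected"], item["response"])
    print(f"{item['question']}: unigram F1 = {score:.2f}")
```

A score near 1.0 means the response closely matches the expected answer at the word level; note that token-overlap metrics miss paraphrases, which is where embedding-based metrics like BERTScore help.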
Is there a built-in method to evaluate a set of responses against a test set? I've seen the evaluation pipeline, but that seems geared toward telling you which source produced the response and self-checking whether the response is good (or hallucinated). I have a set of FAQs with expected results, and I want to compare each response to its expected answer.