
Eval

Is there a built-in method to evaluate a set of responses against a test set? I've seen the evaluation pipeline, but that seems to tell you which source produced a response and to self-check whether the response is good (or hallucinated). I have a set of FAQs with expected results, and I want to compare each response to its expected response.
Since you have the expected results, there are plenty of automated metrics you can use.

ROUGE is a very popular one; it measures the n-gram overlap between two pieces of text. However, I'm sure you can see the limitations of that approach (it captures surface overlap, not meaning).
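
For example, here's a minimal sketch using the rouge-score package (the FAQ texts are made up for illustration):

```python
# pip install rouge-score
from rouge_score import rouge_scorer

# Hypothetical FAQ pair: expected answer vs. model response
expected = "You can reset your password from the account settings page."
response = "Go to account settings to reset your password."

# Score unigram overlap (ROUGE-1) and longest common subsequence (ROUGE-L)
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(expected, response)

print(scores["rouge1"].fmeasure, scores["rougeL"].fmeasure)
```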

BERTScore is another; it measures the semantic similarity between two pieces of text using contextual embeddings, so it's more robust to paraphrasing.
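
A minimal sketch with the bert-score package (same made-up texts; it takes lists of candidates and references and returns per-pair scores):

```python
# pip install bert-score
from bert_score import score

responses = ["Go to account settings to reset your password."]
expected = ["You can reset your password from the account settings page."]

# Returns precision, recall, and F1 tensors, one entry per pair
P, R, F1 = score(responses, expected, lang="en")
print(F1.mean().item())
```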
I haven't tried it yet, but I saw this package that combines these kinds of metrics into a single tool:

https://github.com/explodinggradients/ragas
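
For reference, a rough sketch of how ragas can compare answers against ground truth. This is hedged since I haven't used it and its API has changed between versions; the column names and the answer_correctness metric here follow the ragas 0.1-era docs:

```python
# pip install ragas datasets
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness

# Hypothetical FAQ eval set: model answers plus expected answers
data = {
    "question": ["How do I reset my password?"],
    "answer": ["Go to account settings to reset your password."],
    "ground_truth": ["You can reset your password from the account settings page."],
}

# answer_correctness compares each answer against its ground truth
# (it calls an LLM under the hood, so an API key is expected by default)
result = evaluate(Dataset.from_dict(data), metrics=[answer_correctness])
print(result)
```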
Ok. So nothing built in. Thanks for the reference πŸ™‚