The community member is looking for a built-in method to evaluate a set of responses against a test set: they have a set of FAQs with expected answers and want to compare each generated response to its expected one. The comments suggest using automated metrics such as ROUGE and BERTScore to measure the similarity between a response and its expected result, and one community member mentions the "ragas" package, which combines multiple evaluation metrics. However, the community members indicate that there is no built-in method for this task.
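As a starting point, the metric-based comparison suggested in the comments can be sketched in plain Python. The snippet below computes a unigram-overlap F1, which is a simplified version of what ROUGE-1 measures; in practice you would use the `rouge-score` or `bert-score` packages, or the "ragas" library mentioned in the thread. The FAQ data and field names here are purely illustrative.

```python
# Minimal sketch: score each generated response against its expected FAQ
# answer using unigram-overlap F1 (a simplified ROUGE-1). For real
# evaluation, prefer the rouge-score, bert-score, or ragas packages.
from collections import Counter

def unigram_f1(expected: str, response: str) -> float:
    """F1 over shared word tokens between expected and generated text."""
    ref = Counter(expected.lower().split())
    hyp = Counter(response.lower().split())
    overlap = sum((ref & hyp).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical FAQ test set: pairs of expected answers and model responses.
faqs = [
    {"question": "What is the refund window?",
     "expected": "Refunds are accepted within 30 days of purchase.",
     "response": "You can get a refund within 30 days of purchase."},
]

for item in faqs:
    score = unigram_f1(item["expected"], item["response"])
    print(f"{item['question']}: unigram F1 = {score:.2f}")
```

A score near 1.0 means the response closely matches the expected answer at the word level; note that token-overlap metrics miss paraphrases, which is where embedding-based metrics like BERTScore help.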
Is there a built-in method to evaluate a set of responses against a test set? I've seen the evaluation pipeline, but that seems geared toward telling you which source produced the response and self-checking whether the response is good (or hallucinated). I have a set of FAQs with expected results, and I want to compare each response to its expected answer.