Updated 6 months ago

Evaluating

Hi, we are fine-tuning an LLM for a use-case-specific app (data insights). We are now looking for a scalable way to evaluate the quality of the results and to make sure a new fine-tune doesn't disturb the results of the previous one. Can someone give some hints on whether there are tools or frameworks for doing this at scale? Most current benchmarks, like GLUE, are built for generic cases and don't cater to specific use cases.

1 comment

Logan M

Your options are either creating expected input/output pairs to evaluate against (using ROUGE score or similar), or using a larger LLM to generate questions and evaluate responses for you 😅 llama-index has the latter in the repo!

There is also the ragas repo for evaluating responses
https://github.com/explodinggradients/ragas
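
For the first option (expected input/output pairs), here is a rough sketch of what a regression check between two fine-tunes could look like. It is not taken from llama-index or ragas: the ROUGE-L here is a simplified pure-Python version (whitespace tokens, no stemming), and `evaluate`, `regressed`, the 0.02 tolerance, and the idea of passing a model as a plain callable are all assumptions made for illustration. In practice you would likely swap in a maintained ROUGE implementation.

```python
from typing import Callable, List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-L F1: lowercase, whitespace tokens, no stemming."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(model: Callable[[str], str], pairs: List[Tuple[str, str]]) -> float:
    """Mean ROUGE-L F1 of a model over (prompt, expected_output) pairs."""
    if not pairs:
        raise ValueError("need at least one (prompt, expected) pair")
    scores = [rouge_l_f1(expected, model(prompt)) for prompt, expected in pairs]
    return sum(scores) / len(scores)


def regressed(old_score: float, new_score: float, tolerance: float = 0.02) -> bool:
    """True if the new score is worse than the old one by more than `tolerance`.

    The default tolerance is an arbitrary placeholder, not a recommendation.
    """
    return new_score < old_score - tolerance


if __name__ == "__main__":
    # Hypothetical golden set; replace with pairs from your own use case.
    golden = [
        ("Summarize Q1 sales", "sales rose ten percent in the first quarter"),
        ("Top region?", "the north region had the highest revenue"),
    ]

    # Stand-ins for the previous and the new fine-tuned model.
    def old_model(prompt: str) -> str:
        return dict(golden)[prompt]

    def new_model(prompt: str) -> str:
        return "sales rose in the first quarter"

    old_score = evaluate(old_model, golden)
    new_score = evaluate(new_model, golden)
    print(f"old={old_score:.3f} new={new_score:.3f} "
          f"regressed={regressed(old_score, new_score)}")
```

Keeping the golden pairs fixed and re-running the same check after every fine-tune is what gives you the "did we break the previous behaviour" signal; the LLM-as-judge route can be layered on top for answers where word overlap is a poor proxy for quality.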
