Evaluating

Hi, we are fine-tuning an LLM for a use-case-specific app (data insights). We are now looking for a scalable way to evaluate the quality of the results and to make sure new training doesn't degrade the previous fine-tune. Can someone provide some hints on tools/frameworks to do this in a scalable manner? Most of the current frameworks, like GLUE, are built for generic benchmarks and don't cater to use-case-specific evaluation.
1 comment
Your options are either creating expected input/output pairs to evaluate against (using ROUGE score or similar), or using a larger LLM to generate questions and evaluate the responses for you 😅 llama-index has the latter in the repo!
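For the first option, here's a minimal sketch of scoring your fine-tuned model against a small golden set using the `rouge-score` package. The golden-set rows and the `generate_answer` callable are hypothetical placeholders, stand-ins for your own data-insights prompts and model wrapper:

```python
# Sketch: regression-test a fine-tune against a small golden set with ROUGE.
# Assumes `pip install rouge-score`. `generate_answer` is a placeholder for
# whatever function wraps your fine-tuned model.
from rouge_score import rouge_scorer

golden_set = [
    {"prompt": "Summarise last quarter's revenue trend.",
     "expected": "Revenue grew 12% quarter over quarter, driven by the EU region."},
    # ... more use-case-specific input/output pairs
]

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

def evaluate_model(generate_answer):
    """Return the mean ROUGE-L F1 over the golden set."""
    total = 0.0
    for example in golden_set:
        prediction = generate_answer(example["prompt"])
        scores = scorer.score(example["expected"], prediction)
        total += scores["rougeL"].fmeasure
    return total / len(golden_set)

# Re-run after every new fine-tune; a drop in the score flags a regression
# against behaviour you had already trained in.
```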

There is also the ragas repo for evaluating responses
https://github.com/explodinggradients/ragas
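A rough sketch of a ragas run, assuming the 0.1-style `evaluate()` API over a Hugging Face `Dataset` (newer releases may have changed the interface, so check the repo). The single example row is made up for illustration, and the metrics call a judge LLM under the hood, so a model/API key has to be configured:

```python
# Sketch of a ragas evaluation, assuming the 0.1-style API; verify column
# names and imports against the version you install.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

eval_data = Dataset.from_dict({
    "question":     ["What drove the revenue increase last quarter?"],
    "answer":       ["Growth in the EU region drove a 12% increase."],
    "contexts":     [["Q3 report: EU revenue up 12% quarter over quarter."]],
    "ground_truth": ["The EU region's 12% growth drove the increase."],
})

result = evaluate(eval_data, metrics=[faithfulness, answer_relevancy])
print(result)  # per-metric scores you can track across fine-tune iterations
```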