
Hi all,
I am using the llama-index evaluation module to evaluate a RAG system, and I was wondering about the theory behind the eval questions. For example, in context_relevancy, the prompt is:
<"Your task is to evaluate if the retrieved context from the document sources are relevant to the query.\n"
"The evaluation should be performed in a step-by-step manner by answering the following questions:\n"
"1. Does the retrieved context match the subject matter of the user's query?\n"
"2. Can the retrieved context be used exclusively to provide a full answer to the user's query?\n">
https://github.com/run-llama/llama_index/blob/f5263896121721de1051ce58338a1e0ea6950ca7/llama-index-core/llama_index/core/evaluation/context_relevancy.py
Does anyone know what principles these questions are based on?
hey @yixin_hu thanks for the question. we spent some back and forth tuning the eval prompts until they passed the eyeball test 🙂 of course, if you have a better prompt for measuring context relevancy, by all means feel free to customize! that should be possible with this module

cc @nerdai to provide any additional context
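
[Editor's note: for anyone landing here later, a minimal sketch of the customization Jerry describes might look like the following. It assumes ContextRelevancyEvaluator accepts an eval_template keyword and that the template uses the {query_str} and {context_str} variables from the linked source file; double-check both against your installed llama-index version.]

```python
# Minimal sketch: swapping in a custom context-relevancy prompt.
# Assumptions (verify against the linked context_relevancy.py):
# - ContextRelevancyEvaluator accepts an `eval_template` keyword
# - the template uses {query_str} and {context_str} variables
# - the default parser extracts a numeric score from a "[RESULT] <score>"
#   marker, so a custom prompt should keep that output instruction
from llama_index.core.evaluation import ContextRelevancyEvaluator
from llama_index.core.prompts import PromptTemplate

CUSTOM_EVAL_TEMPLATE = PromptTemplate(
    "Your task is to evaluate whether the retrieved context is relevant "
    "to the query.\n"
    "Reason step by step:\n"
    "1. Does the context match the subject matter of the query?\n"
    "2. Can the context alone fully answer the query?\n"
    "Query: {query_str}\n"
    "Context: {context_str}\n"
    "Give your feedback, then a final line of the form '[RESULT] <score>' "
    "with a score between 0 and 4.\n"
)

# Uses the globally configured LLM unless one is passed via `llm=`.
evaluator = ContextRelevancyEvaluator(eval_template=CUSTOM_EVAL_TEMPLATE)

result = evaluator.evaluate(
    query="What does the context_relevancy evaluator measure?",
    contexts=["It scores how relevant retrieved chunks are to the query."],
)
print(result.score, result.feedback)
```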
Yeah, what Jerry mentions essentially covers it. High-level principles here include providing a rubric (taking inspiration from the research results in Prometheus on the importance of rubrics) as well as CoT, in the sense of answering/evaluating step by step.
thank you for the response! This is what we were looking for; we will read the materials, thank you