Hey, I want to evaluate the effect that GPT-3.5 with RAG has on question answering over a specific dataset, compared to using no RAG (i.e. no vector index store). I wanted to simply rerun the Evaluator modules that LlamaIndex provides for both options. This works for the RAG option (just create a query engine from the vector store and then perform a query). With the no-RAG option this is not possible, since we don't have a VectorStore. Is there any other way to directly compare the two methods using the same benchmark in LlamaIndex?
You can try something like this:
Plain Text
from llama_index.llms.openai import OpenAI
from llama_index.core.evaluation import FaithfulnessEvaluator

# create the judge LLM
llm = OpenAI(model="gpt-4", temperature=0.0)

# define the evaluator
evaluator = FaithfulnessEvaluator(llm=llm)

# response_str is the answer you want to grade; TEXT_1 and TEXT_2
# are the reference texts you want to check it against
eval_result = evaluator.evaluate(
    response=response_str, contexts=[TEXT_1, TEXT_2]
)


Check if this works!
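If it helps, here is a rough end-to-end sketch of how you could produce response_str for both setups and then score them with the same evaluator. The data directory, query string, and TEXT_1/TEXT_2 reference texts below are placeholders I made up, not something from this thread:
Plain Text
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.evaluation import FaithfulnessEvaluator
from llama_index.llms.openai import OpenAI

# the LLM under test, plus a separate judge LLM for the evaluator
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.0)
evaluator = FaithfulnessEvaluator(llm=OpenAI(model="gpt-4", temperature=0.0))

query = "YOUR_BENCHMARK_QUESTION"  # placeholder

# RAG option: build a vector index over your dataset and query it
documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
index = VectorStoreIndex.from_documents(documents)
rag_response_str = str(index.as_query_engine(llm=llm).query(query))

# no-RAG option: send the question straight to the LLM, no retrieval
no_rag_response_str = llm.complete(query).text

# score both answers against the same reference texts
rag_result = evaluator.evaluate(
    response=rag_response_str, contexts=[TEXT_1, TEXT_2]
)
no_rag_result = evaluator.evaluate(
    response=no_rag_response_str, contexts=[TEXT_1, TEXT_2]
)
print(rag_result.passing, no_rag_result.passing)

Since the evaluator only takes strings, it doesn't care whether the answer came from a query engine or a plain completion, so you can run the same benchmark over both.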
I'll give it a try, thanks so much for the response!