Hey, I want to evaluate the effect that GPT-3.5 with RAG has on question answering over a specific dataset, compared to using no RAG (i.e. no vector index store). I wanted to simply rerun the Evaluator modules that LlamaIndex provides for both options. This works for the RAG option (just create a query engine from the vector store and then perform a query). With the no-RAG option this is not possible, since we don't have a VectorStore. Is there any other way to directly compare the two methods using the same benchmark in LlamaIndex?
You can try something like this:
Plain Text
from llama_index.llms.openai import OpenAI
from llama_index.core.evaluation import FaithfulnessEvaluator

# create the judge LLM
llm = OpenAI(model="gpt-4", temperature=0.0)

# define the evaluator
evaluator = FaithfulnessEvaluator(llm=llm)

# response_str is the answer you want to grade; TEXT_1 and TEXT_2
# are the reference texts you want to check it against
eval_result = evaluator.evaluate(
    response=response_str, contexts=[TEXT_1, TEXT_2]
)


Check if this works!
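If it helps, here is a rough end-to-end sketch of how you could produce response_str for both setups and then score them with the same evaluator. The data directory, query string, and TEXT_1/TEXT_2 reference texts below are placeholders I made up, not something from this thread:
Plain Text
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.evaluation import FaithfulnessEvaluator
from llama_index.llms.openai import OpenAI

# the LLM under test, plus a separate judge LLM for the evaluator
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.0)
evaluator = FaithfulnessEvaluator(llm=OpenAI(model="gpt-4", temperature=0.0))

query = "YOUR_BENCHMARK_QUESTION"  # placeholder

# RAG option: build a vector index over your dataset and query it
documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
index = VectorStoreIndex.from_documents(documents)
rag_response_str = str(index.as_query_engine(llm=llm).query(query))

# no-RAG option: send the question straight to the LLM, no retrieval
no_rag_response_str = llm.complete(query).text

# score both answers against the same reference texts
rag_result = evaluator.evaluate(
    response=rag_response_str, contexts=[TEXT_1, TEXT_2]
)
no_rag_result = evaluator.evaluate(
    response=no_rag_response_str, contexts=[TEXT_1, TEXT_2]
)
print(rag_result.passing, no_rag_result.passing)

Since the evaluator only takes strings, it doesn't care whether the answer came from a query engine or a plain completion, so you can run the same benchmark over both.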
I'll give it a try, thanks so much for the response!