I'm using almost the exact same code as the EDD notebooks from this repo. Basically, I'm doing this:
from llama_index.core.evaluation import DatasetGenerator, FaithfulnessEvaluator, RelevancyEvaluator, BatchEvalRunner
from llama_index.llms.openai import OpenAI

evaluator_llm = OpenAI(model="gpt-4", temperature=0)
faithfulness_evaluator = FaithfulnessEvaluator(llm=evaluator_llm)
relevancy_evaluator = RelevancyEvaluator(llm=evaluator_llm)

eval_runner = BatchEvalRunner(
    {"faithfulness": faithfulness_evaluator, "relevancy": relevancy_evaluator},
    workers=6,
    show_progress=True,
)

eval_results = await eval_runner.aevaluate_queries(
    query_engine=hybrid_query_engine, queries=subsample_synth_questions
)
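For context, after the run I look at pass rates roughly like this. This is a minimal sketch, assuming `aevaluate_queries` returns a dict mapping each evaluator name to a list of result objects with a boolean `passing` attribute (as llama_index's `EvaluationResult` has); the `summarize` helper and the mocked data are mine, not part of llama_index:

```python
from types import SimpleNamespace

# Hypothetical helper (not llama_index API): compute the fraction of
# passing results for each evaluator in the results dict.
def summarize(eval_results):
    """Return the fraction of passing results per evaluator."""
    return {
        name: sum(1 for r in results if r.passing) / len(results)
        for name, results in eval_results.items()
    }

# Mocked objects standing in for real EvaluationResult instances:
fake = {
    "faithfulness": [SimpleNamespace(passing=True), SimpleNamespace(passing=False)],
    "relevancy": [SimpleNamespace(passing=True), SimpleNamespace(passing=True)],
}
print(summarize(fake))  # {'faithfulness': 0.5, 'relevancy': 1.0}
```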