Updated 12 months ago

Is anyone using real-time evaluation (of the retrieved nodes and/or response) to dictate the business logic for the rest of the RAG pipeline? For example, we could evaluate the retrieved nodes against the user query and decide the nodes are of poor quality so we tell the user "We can't answer your question" and the RAG pipe ends there (saving us the cost of synthesis from an LLM). Or we could evaluate the response and decide it wasn't good enough so we ask an Agent to break apart the initial query into sub-queries and try RAG again on the sub-queries (being careful not to go into endless recursion).

Is RAG evaluation more meant to be a method of observability, QA, and monitoring or can it actually influence business logic?
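The branching described above can be sketched as plain control flow. This is a hedged, standalone illustration: every function name here (`retrieve`, `evaluate_nodes`, `synthesize`, `evaluate_response`, `split_query`) is a placeholder for your own components, not a real LlamaIndex API.

```python
# Hypothetical sketch: evaluation results gate each stage of a RAG pipeline.
# All callables are placeholders you'd wire up to your own stack.

def rag_with_eval_gates(query, retrieve, evaluate_nodes, synthesize,
                        evaluate_response, split_query,
                        depth=0, max_depth=2):
    nodes = retrieve(query)
    if not evaluate_nodes(query, nodes):
        # Poor retrieval: bail out before paying for LLM synthesis.
        return "We can't answer your question."
    response = synthesize(query, nodes)
    if evaluate_response(query, response) or depth >= max_depth:
        # Good enough, or recursion budget exhausted.
        return response
    # Otherwise decompose the query and retry RAG on each sub-query,
    # with a depth counter so we never recurse endlessly.
    answers = [rag_with_eval_gates(sub, retrieve, evaluate_nodes, synthesize,
                                   evaluate_response, split_query,
                                   depth + 1, max_depth)
               for sub in split_query(query)]
    return "\n".join(answers)
```

The `max_depth` guard is the piece that prevents the endless recursion the question worries about.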
7 comments
I think RAG evaluation can influence business logic if you want it to 🙂

The process you described can be achieved by separating the retrieval and response synthesis steps 👀
Does anyone have examples of doing this?
I don't have an example, but you could retrieve NodeWithScore objects, do reranking and then maybe put some hard limit on the minimum score?
This hard limit isn't a great solution though, to be honest.
I can write one:

Plain Text
from llama_index.core import get_response_synthesizer

index = <VectorStoreIndex>  # your existing index
retriever = index.as_retriever(similarity_top_k=2)

nodes = retriever.retrieve("query")
# <do something with nodes?>

synthesizer = get_response_synthesizer(response_mode="compact", llm=llm)
response = synthesizer.synthesize("query", nodes=nodes)
For the `<do something with nodes?>` step, I suppose you could filter them by score.
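To make the score-filtering idea concrete, here's a simplified standalone sketch. Real LlamaIndex retrievers return `NodeWithScore` objects (where you'd read `node.score`); this sketch mocks them as `(text, score)` tuples, and the 0.7 threshold is an arbitrary example, not a recommended value.

```python
# Hard minimum-score filter over retrieved nodes, mocked as (text, score)
# tuples. With NodeWithScore you'd check node.score instead.

def filter_by_min_score(nodes_with_scores, min_score=0.7):
    """Keep only nodes whose similarity score clears a hard threshold."""
    return [(text, score) for text, score in nodes_with_scores
            if score is not None and score >= min_score]

nodes = [("relevant passage", 0.91),
         ("marginal passage", 0.72),
         ("noise", 0.31)]
kept = filter_by_min_score(nodes)
if not kept:
    # Nothing cleared the bar: end the pipeline without paying for synthesis.
    print("We can't answer your question.")
```

As noted above, a fixed threshold is a blunt instrument: raw similarity scores vary by embedding model and corpus, so the cutoff usually needs tuning.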
You could also pass the content of retrieved nodes to an LLM and ask the LLM if it's relevant to the query? But that's expensive.
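The LLM-as-judge idea above could look like the following sketch. `llm_complete` is a placeholder for whatever completion call you use (it is not a real library function); note this pattern costs one LLM call per node, which is why it's expensive.

```python
# Hedged sketch: ask the model a YES/NO relevance question per node.
# `llm_complete` is a stand-in for your own LLM completion callable.

def node_is_relevant(llm_complete, query, node_text):
    prompt = (
        "Answer YES or NO only. Is the following context relevant to the "
        f"question?\nQuestion: {query}\nContext: {node_text}"
    )
    return llm_complete(prompt).strip().upper().startswith("YES")

def judge_nodes(llm_complete, query, node_texts):
    """Keep only the nodes the LLM judges relevant to the query."""
    return [t for t in node_texts
            if node_is_relevant(llm_complete, query, t)]
```

A cheaper middle ground, as the next reply notes, is to do this kind of check with a reranker model rather than a full LLM call per node.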
Yeah exactly -- it's up to you how to handle the nodes. Anything involving an LLM will be slower than things like filtering or reranking
Ok, your example is along the same lines as how I was envisioning it.