The community member is trying to create a RAG (Retrieval-Augmented Generation) pattern for Q&A that includes a second request to an LLM (Large Language Model) for generating a response. This second request is made when the first attempt produces a response that scores low on relevancy and faithfulness to the retrieval results. The community member is also considering trying another vector index if the retrieval returns poor results. They are unsure whether LlamaIndex supports this kind of complex dynamic routing or whether they should break the flow into smaller pieces and call them manually. Other community members suggest that it is possible to run the query call twice with a different LLM and service setup, but that a layer above LlamaIndex may be needed to manage the routing using traditional development techniques like if/then/else statements.
Hi, I'm trying to create a RAG pattern for Q&A that potentially includes a 2nd request to an LLM for generating a response, for example when the first attempt produces a response that scores low on relevancy to the question and faithfulness to the retrieval results (automatic metrics). The 2nd call would go to a different LLM and use a different prompt. The same question also applies to retrieval: sometimes the retrieval returns bad results that can be detected by automatic metrics, and in those cases it could make sense to try another vector index. I'm trying to understand whether this kind of pattern is supported by LlamaIndex, or whether I should break it down into smaller pieces and call them manually. Thanks!
I'm still not sure whether LlamaIndex is meant for this kind of complex dynamic routing. Most examples in the documentation focus on the indexing, retrieval, and post-processing (i.e. post-retrieval, before generation) stages. I tend to think that I should break the flow into smaller flows and call them manually as needed, along the lines of the sketch below.
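For now I'm leaning towards something like this: a thin orchestration layer above LlamaIndex that runs the built-in evaluators and falls back with plain if/else logic, as the other replies suggested. This is only a rough sketch under my own assumptions; the two prebuilt indexes, the model choices, and the fallback prompt are placeholders, not anything LlamaIndex prescribes:

```python
# A sketch of manual orchestration around LlamaIndex, not a built-in feature:
# evaluate the first answer, and only fall back to a second LLM/prompt
# (and possibly a second vector index) when the scores are poor.
# `primary_index` and `fallback_index` are assumed to be prebuilt
# VectorStoreIndex objects; the model names are illustrative.
from llama_index.core import PromptTemplate
from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator
from llama_index.llms.openai import OpenAI

primary_llm = OpenAI(model="gpt-3.5-turbo")
fallback_llm = OpenAI(model="gpt-4")

# Evaluators that judge a response against the query and retrieved context.
faithfulness = FaithfulnessEvaluator(llm=fallback_llm)
relevancy = RelevancyEvaluator(llm=fallback_llm)

# A stricter QA prompt for the retry attempt (placeholder wording).
FALLBACK_PROMPT = PromptTemplate(
    "Context information is below.\n{context_str}\n"
    "Answer the question strictly from the context: {query_str}\n"
)

def answer_with_fallback(question, primary_index, fallback_index):
    # First attempt: default prompt, cheaper LLM.
    engine = primary_index.as_query_engine(llm=primary_llm)
    response = engine.query(question)

    faith = faithfulness.evaluate_response(query=question, response=response)
    rel = relevancy.evaluate_response(query=question, response=response)
    if faith.passing and rel.passing:
        return response

    # If relevancy failed, the retrieval itself may be bad, so retry
    # against the other vector index; otherwise keep the same index and
    # just re-generate with a stronger LLM and a stricter prompt.
    retry_index = fallback_index if not rel.passing else primary_index
    retry_engine = retry_index.as_query_engine(
        llm=fallback_llm,
        text_qa_template=FALLBACK_PROMPT,
    )
    return retry_engine.query(question)
```

It may also be worth looking at LlamaIndex's `RetryQueryEngine` / `RetrySourceQueryEngine`, which wrap a query engine with an evaluator and automate part of this retry loop, though as far as I can tell the cross-LLM and cross-index fallback above would still need to live in your own code.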