
For the SubQueryEngine, it makes an LLM call per subquestion and then a final synthesizer LLM call. How do I prevent the LLM call per subquestion and just take all the retrieved nodes and questions and dump them into the final synthesizer?

I'm reading the SubQueryEngine source code, but I'm having trouble seeing how the retrieved nodes get passed into an LLM right after retrieval for the subquestions. Help appreciated.
Yeah, there's no option to stop the LLM call per subquestion.

Basically each sub-question is asked to an underlying query engine

I think the query fusion retriever will accomplish what you want instead (or at least somewhat similar)
https://docs.llamaindex.ai/en/stable/examples/retrievers/reciprocal_rerank_fusion.html#reciprocal-rerank-fusion-retriever
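For intuition, the core idea behind the reciprocal-rerank fusion retriever can be sketched in plain Python. This is a toy illustration of the scoring scheme, not llama_index's actual implementation, and the node IDs are made up:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked result lists into one.

    Each document's fused score is the sum of 1 / (k + rank) over
    every list it appears in, so documents ranked highly by multiple
    sub-query retrievals float to the top. No LLM call is involved.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document IDs by fused score, best first.
    return sorted(scores, key=scores.get, reverse=True)


# Two generated queries return overlapping candidate nodes:
fused = reciprocal_rank_fusion([
    ["node_a", "node_b", "node_c"],
    ["node_b", "node_d", "node_a"],
])
```

Here `node_b` wins because it ranks well in both lists, which is exactly the property that makes fusion a retrieval-only substitute for per-subquestion LLM answers.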

Otherwise, you can make a custom query pipeline

https://docs.llamaindex.ai/en/stable/module_guides/querying/pipeline/root.html
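The flow the original question asks for (retrieve per subquestion, then a single synthesizer call) can be sketched as plain Python. `retrieve` and `llm_complete` are placeholder callables standing in for whatever retriever and LLM you wire into the pipeline; none of this is a llama_index API:

```python
def answer_with_single_synthesis(question, sub_questions, retrieve, llm_complete):
    """Hypothetical sketch: gather nodes for every sub-question,
    then make exactly one final LLM call.

    retrieve(sub_question) -> list[str]: retrieval only, no LLM.
    llm_complete(prompt) -> str: the single synthesizer call.
    """
    context_parts = []
    for sq in sub_questions:
        nodes = retrieve(sq)  # no per-subquestion LLM call here
        context_parts.append(f"Sub-question: {sq}\n" + "\n".join(nodes))
    prompt = (
        "Answer the question using the context below.\n\n"
        + "\n\n".join(context_parts)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return llm_complete(prompt)  # one synthesis call total


# Usage with stub callables, just to show the shape of the prompt:
result = answer_with_single_synthesis(
    "What changed between v1 and v2?",
    ["what is v1", "what is v2"],
    retrieve=lambda q: [f"doc about: {q}"],
    llm_complete=lambda prompt: prompt,  # echo the prompt instead of calling an LLM
)
```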
I see, thanks. BTW, can you please point me to the source code where the query engine makes the LLM call after getting the retrieved nodes?
Is this the line where a response is synthesized for a subquestion (separate from the synthesizer that takes in all sub-questions + answers for the final synthesis)?
https://github.com/run-llama/llama_index/blob/2c75572c3f57ae0017a711258161f2f99c2b40ee/llama_index/query_engine/retriever_query_engine.py#L172
I'm a bit confused about where the subquestion synthesis LLM call happens vs. the final synthesis LLM call (the one that synthesizes all the subquestion answers).
When calling _query_subq, it calls response = query_engine.query(question),
which in turn calls RetrieverQueryEngine,
which calls:
```python
def _query(self, query_bundle: QueryBundle) -> RESPONSE_TYPE:
    """Answer a query."""
    with self.callback_manager.event(
        CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
    ) as query_event:
        nodes = self.retrieve(query_bundle)
        response = self._response_synthesizer.synthesize(
            query=query_bundle,
            nodes=nodes,
        )
        query_event.on_end(payload={EventPayload.RESPONSE: response})
    return response
```

And since I'm using Qdrant, the self.retrieve would somehow call Qdrant's query function?
https://github.com/run-llama/llama_index/blob/2c75572c3f57ae0017a711258161f2f99c2b40ee/llama_index/vector_stores/qdrant.py#L416
The retriever is a VectorIndexRetriever, which contains the qdrant vector store
I thought the query engine's query was calling Qdrant's query function directly, which didn't make sense because Qdrant's query function has no LLM call. Now it makes sense 👍🏻
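That layering (query engine → retriever → vector store, with the LLM only in the synthesizer) can be made concrete with toy stand-ins. These classes are illustrative only and share no code with llama_index; the "similarity" is just word overlap:

```python
class ToyVectorStore:
    """Stand-in for QdrantVectorStore: pure similarity search, no LLM."""

    def __init__(self, docs):
        self.docs = docs

    def query(self, query_str, top_k=2):
        # Toy "similarity": number of words shared with the query.
        def overlap(doc):
            return len(set(doc.lower().split()) & set(query_str.lower().split()))
        return sorted(self.docs, key=overlap, reverse=True)[:top_k]


class ToyRetriever:
    """Stand-in for VectorIndexRetriever: owns the store, still no LLM."""

    def __init__(self, store):
        self.store = store

    def retrieve(self, query_str):
        return self.store.query(query_str)


class ToyQueryEngine:
    """Stand-in for RetrieverQueryEngine: the only LLM call lives here."""

    def __init__(self, retriever, llm_complete):
        self.retriever = retriever
        self.llm_complete = llm_complete

    def query(self, query_str):
        nodes = self.retriever.retrieve(query_str)  # vector search only
        prompt = "Context:\n" + "\n".join(nodes) + f"\nQuestion: {query_str}"
        return self.llm_complete(prompt)  # the synthesis LLM call


store = ToyVectorStore([
    "qdrant stores vectors",
    "llamaindex wraps llms",
    "cats sleep a lot",
])
engine = ToyQueryEngine(ToyRetriever(store), llm_complete=lambda p: p)
out = engine.query("qdrant vectors")
```

The point of the shape: the vector store and retriever never touch an LLM, so swapping Qdrant in or out changes only the retrieval step.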