
For the SubQueryEngine, it makes an LLM call per subquestion and then a final synthesizer LLM call. How do I prevent the LLM call per subquestion and just take all the retrieved nodes and questions and dump them into the final synthesizer?

I'm reading the SubQueryEngine source code, but I'm having trouble seeing how the retrieved nodes get passed into an LLM right after retrieval for the subquestions. Help appreciated.
Yeah, there's no option to stop the LLM call per subquestion.

Basically each sub-question is asked to an underlying query engine

I think the query fusion retriever will accomplish what you want instead (or at least somewhat similar)
https://docs.llamaindex.ai/en/stable/examples/retrievers/reciprocal_rerank_fusion.html#reciprocal-rerank-fusion-retriever
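For intuition, the core idea behind the reciprocal-rerank fusion retriever can be sketched in plain Python. This is a toy illustration of the scoring scheme, not llama_index's actual implementation, and the node IDs are made up:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked result lists into one.

    Each document's fused score is the sum of 1 / (k + rank) over
    every list it appears in, so documents ranked highly by multiple
    sub-query retrievals float to the top. No LLM call is involved.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document IDs by fused score, best first.
    return sorted(scores, key=scores.get, reverse=True)


# Two generated queries return overlapping candidate nodes:
fused = reciprocal_rank_fusion([
    ["node_a", "node_b", "node_c"],
    ["node_b", "node_d", "node_a"],
])
```

Here `node_b` wins because it ranks well in both lists, which is exactly the property that makes fusion a retrieval-only substitute for per-subquestion LLM answers.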

Otherwise, you can make a custom query pipeline

https://docs.llamaindex.ai/en/stable/module_guides/querying/pipeline/root.html
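The flow the original question asks for (retrieve per subquestion, then a single synthesizer call) can be sketched as plain Python. `retrieve` and `llm_complete` are placeholder callables standing in for whatever retriever and LLM you wire into the pipeline; none of this is a llama_index API:

```python
def answer_with_single_synthesis(question, sub_questions, retrieve, llm_complete):
    """Hypothetical sketch: gather nodes for every sub-question,
    then make exactly one final LLM call.

    retrieve(sub_question) -> list[str]: retrieval only, no LLM.
    llm_complete(prompt) -> str: the single synthesizer call.
    """
    context_parts = []
    for sq in sub_questions:
        nodes = retrieve(sq)  # no per-subquestion LLM call here
        context_parts.append(f"Sub-question: {sq}\n" + "\n".join(nodes))
    prompt = (
        "Answer the question using the context below.\n\n"
        + "\n\n".join(context_parts)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return llm_complete(prompt)  # one synthesis call total


# Usage with stub callables, just to show the shape of the prompt:
result = answer_with_single_synthesis(
    "What changed between v1 and v2?",
    ["what is v1", "what is v2"],
    retrieve=lambda q: [f"doc about: {q}"],
    llm_complete=lambda prompt: prompt,  # echo the prompt instead of calling an LLM
)
```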
I see, thanks. BTW, can you please point me to the source code where the query engine makes the LLM call after getting the retrieved nodes?
Is this the line where a response is synthesized for a subquestion (separate from the synthesizer that takes in all sub-questions + answers for the final synthesis)?
https://github.com/run-llama/llama_index/blob/2c75572c3f57ae0017a711258161f2f99c2b40ee/llama_index/query_engine/retriever_query_engine.py#L172
I'm a bit confused about where the subquestion synthesis LLM call happens vs. the final synthesis LLM call (the one that synthesizes all the subquestion answers).
When calling _query_subq, it calls response = query_engine.query(question),
which in turn calls RetrieverQueryEngine,
which calls:
```python
def _query(self, query_bundle: QueryBundle) -> RESPONSE_TYPE:
    """Answer a query."""
    with self.callback_manager.event(
        CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
    ) as query_event:
        nodes = self.retrieve(query_bundle)
        response = self._response_synthesizer.synthesize(
            query=query_bundle,
            nodes=nodes,
        )
        query_event.on_end(payload={EventPayload.RESPONSE: response})
    return response
```

And since I'm using Qdrant, the self.retrieve would somehow call Qdrant's query function?
https://github.com/run-llama/llama_index/blob/2c75572c3f57ae0017a711258161f2f99c2b40ee/llama_index/vector_stores/qdrant.py#L416
The retriever is a VectorIndexRetriever, which contains the qdrant vector store
I thought the query engine's query was calling Qdrant's query function directly, which didn't make sense because Qdrant's query function has no LLM call. Now it makes sense 👍🏻
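That layering (query engine → retriever → vector store, with the LLM only in the synthesizer) can be made concrete with toy stand-ins. These classes are illustrative only and share no code with llama_index; the "similarity" is just word overlap:

```python
class ToyVectorStore:
    """Stand-in for QdrantVectorStore: pure similarity search, no LLM."""

    def __init__(self, docs):
        self.docs = docs

    def query(self, query_str, top_k=2):
        # Toy "similarity": number of words shared with the query.
        def overlap(doc):
            return len(set(doc.lower().split()) & set(query_str.lower().split()))
        return sorted(self.docs, key=overlap, reverse=True)[:top_k]


class ToyRetriever:
    """Stand-in for VectorIndexRetriever: owns the store, still no LLM."""

    def __init__(self, store):
        self.store = store

    def retrieve(self, query_str):
        return self.store.query(query_str)


class ToyQueryEngine:
    """Stand-in for RetrieverQueryEngine: the only LLM call lives here."""

    def __init__(self, retriever, llm_complete):
        self.retriever = retriever
        self.llm_complete = llm_complete

    def query(self, query_str):
        nodes = self.retriever.retrieve(query_str)  # vector search only
        prompt = "Context:\n" + "\n".join(nodes) + f"\nQuestion: {query_str}"
        return self.llm_complete(prompt)  # the synthesis LLM call


store = ToyVectorStore([
    "qdrant stores vectors",
    "llamaindex wraps llms",
    "cats sleep a lot",
])
engine = ToyQueryEngine(ToyRetriever(store), llm_complete=lambda p: p)
out = engine.query("qdrant vectors")
```

The point of the shape: the vector store and retriever never touch an LLM, so swapping Qdrant in or out changes only the retrieval step.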