I'm using SubQuestionQueryEngine.from_defaults. Is it possible to stream the final response from the LLM? I was hoping to reduce the apparent latency by streaming, but haven't figured out how to do it yet.
1 comment
Python
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import SubQuestionQueryEngine

# Build a response synthesizer with streaming enabled and hand it to the engine
synthesizer = get_response_synthesizer(llm=llm, response_mode="compact", streaming=True)

query_engine = SubQuestionQueryEngine.from_defaults(..., response_synthesizer=synthesizer)

response = query_engine.query("..")

# Stream the final synthesized answer token by token
for token in response.response_gen:
    print(token, end="", flush=True)
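
For context, a minimal end-to-end sketch might look like the following. The data directories, index names, and tool names here are hypothetical placeholders; the query_engine_tools list stands in for the "..." argument above.

Python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, get_response_synthesizer
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# Hypothetical source documents; swap in your own data
docs_a = SimpleDirectoryReader("data/product_docs").load_data()
docs_b = SimpleDirectoryReader("data/faq").load_data()

index_a = VectorStoreIndex.from_documents(docs_a)
index_b = VectorStoreIndex.from_documents(docs_b)

# Wrap each index's query engine as a tool the sub-question engine can route to
tools = [
    QueryEngineTool.from_defaults(
        query_engine=index_a.as_query_engine(),
        name="product_docs",
        description="Answers questions about the product documentation",
    ),
    QueryEngineTool.from_defaults(
        query_engine=index_b.as_query_engine(),
        name="faq",
        description="Answers frequently asked questions",
    ),
]

# Streaming synthesizer so the final answer is emitted token by token
synthesizer = get_response_synthesizer(response_mode="compact", streaming=True)

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
    response_synthesizer=synthesizer,
)

response = query_engine.query("How do I configure streaming?")
for token in response.response_gen:
    print(token, end="", flush=True)

Note that the sub-questions still have to be generated and answered against the tools before the final synthesis starts, so streaming only hides the latency of that last synthesis step, not of the sub-question round trips.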