I'm using SubQuestionQueryEngine.from_defaults. Is it possible to stream the final response from the LLM? I was hoping to reduce the apparent latency by streaming, but haven't figured out how to do it yet.
1 comment
Python
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import SubQuestionQueryEngine

# Build a response synthesizer with streaming enabled and hand it to the engine
synthesizer = get_response_synthesizer(llm=llm, response_mode="compact", streaming=True)

query_engine = SubQuestionQueryEngine.from_defaults(..., response_synthesizer=synthesizer)

response = query_engine.query("..")

# Stream the final synthesized answer token by token
for token in response.response_gen:
    print(token, end="", flush=True)
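
For context, a minimal end-to-end sketch might look like the following. The data directories, index names, and tool names here are hypothetical placeholders; the query_engine_tools list stands in for the "..." argument above.

Python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, get_response_synthesizer
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# Hypothetical source documents; swap in your own data
docs_a = SimpleDirectoryReader("data/product_docs").load_data()
docs_b = SimpleDirectoryReader("data/faq").load_data()

index_a = VectorStoreIndex.from_documents(docs_a)
index_b = VectorStoreIndex.from_documents(docs_b)

# Wrap each index's query engine as a tool the sub-question engine can route to
tools = [
    QueryEngineTool.from_defaults(
        query_engine=index_a.as_query_engine(),
        name="product_docs",
        description="Answers questions about the product documentation",
    ),
    QueryEngineTool.from_defaults(
        query_engine=index_b.as_query_engine(),
        name="faq",
        description="Answers frequently asked questions",
    ),
]

# Streaming synthesizer so the final answer is emitted token by token
synthesizer = get_response_synthesizer(response_mode="compact", streaming=True)

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
    response_synthesizer=synthesizer,
)

response = query_engine.query("How do I configure streaming?")
for token in response.response_gen:
    print(token, end="", flush=True)

Note that the sub-questions still have to be generated and answered against the tools before the final synthesis starts, so streaming only hides the latency of that last synthesis step, not of the sub-question round trips.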