Hello. I've looked in the docs and asked kapa, but I couldn't find an answer to this. I'm interested in using QueryPipeline to encapsulate the entire RAG chain in one object. Is it possible to stream the final LLM answer as the result of QueryPipeline().run(), i.e., get back a generator from which chunks can be retrieved?
So far I have only managed to do it by removing the LLM from the pipeline (so it only retrieves docs and prepares a prompt) and calling the LLM separately, like this:
# run the pipeline up to (and including) prompt preparation
output, intermediates = pipeline.run_with_intermediates(
    input="Who is a good boy?"
)

# then call the LLM outside the pipeline and stream the chunks manually
response = llm_azure.stream_chat(messages=output)
for chunk in response:
    print(chunk.delta, end="", flush=True)
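For context, the LLM-less pipeline above is wired roughly like this. This is just a minimal sketch: my_retriever, the module names, and the prompt text are placeholders, and the final step that actually builds the chat messages passed to stream_chat is omitted for brevity.

from llama_index.core import PromptTemplate
from llama_index.core.query_pipeline import InputComponent, QueryPipeline

# illustrative prompt; context_str receives the retrieved nodes
# (in practice I format/join the node texts before this step)
prompt_tmpl = PromptTemplate(
    "Context:\n{context_str}\n\nAnswer the question: {query_str}"
)

pipeline = QueryPipeline()
pipeline.add_modules(
    {
        "input": InputComponent(),
        "retriever": my_retriever,  # placeholder for my actual retriever
        "prompt": prompt_tmpl,
    }
)
# fan the input out to the retriever and the prompt, then join on the prompt;
# note there is no LLM module at the end of the DAG
pipeline.add_link("input", "retriever")
pipeline.add_link("input", "prompt", dest_key="query_str")
pipeline.add_link("retriever", "prompt", dest_key="context_str")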