Alex
Hello. I've looked in the docs and asked kapa, but I couldn't find an answer. I am interested in using QueryPipeline to encapsulate the entire RAG chain in one object. Is it possible to stream the final LLM answer as the result of QueryPipeline().run(), i.e., have it yield a generator from which chunks can be retrieved?

So far I have only managed to do it by removing the LLM from the pipeline (so it only retrieves documents and prepares the prompt) and calling the LLM separately, like this:
Python
# Run the pipeline without the LLM: it retrieves context and builds the chat messages
output, intermediates = pipeline.run_with_intermediates(
    input="Who is a good boy?"
)

# Stream the final answer from the LLM outside the pipeline
response = llm_azure.stream_chat(messages=output)
for chunk in response:
    print(chunk.delta, end="", flush=True)
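What I would ideally like is roughly the following, with the LLM kept as the last step of the pipeline and run() itself returning a generator of chunks. This is only a sketch of the desired behaviour, not working code, and I'm assuming the chunks would expose a delta attribute the way the stream_chat responses do:
Python
# Hypothetical, desired usage: the pipeline includes the LLM
# and run() returns a generator of streamed chunks
response = pipeline.run(input="Who is a good boy?")
for chunk in response:
    print(chunk.delta, end="", flush=True)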