Hello. I've looked in the docs and asked kapa but I couldn't find this answer. I am interested in using QueryPipeline to encapsulate the entire RAG chain in one object. Is it possible to stream the final LLM answer as a result of QueryPipeline().run(), i.e., yield a generator from which chunks can be retrieved?

So far I've only managed to do it by removing the LLM from the pipeline (so it only retrieves docs and prepares a prompt) and calling the LLM separately, like:
Python
output, intermediates = pipeline.run_with_intermediates(
    input="Who is a good boy?"
)
# `output` here is the prompt/messages prepared by the last pipeline step
response = llm_azure.stream_chat(messages=output)
for chunk in response:
    print(chunk.delta, end="", flush=True)
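
For reference, a rough sketch of what streaming from inside the pipeline might look like, based on the streaming example in the LlamaIndex query pipeline docs. It assumes the LLM component accepts streaming=True via as_query_component, reuses llm_azure from the snippet above, and uses a placeholder prompt_tmpl whose variable name must match the run() kwarg; with streaming on, run() should return a generator of chunks rather than a finished response:
Python
from llama_index.core import PromptTemplate
from llama_index.core.query_pipeline import QueryPipeline

# placeholder prompt template; the run() kwarg below must match its variable name
prompt_tmpl = PromptTemplate("Answer the question: {query}")

# keep the LLM inside the pipeline, but ask its component to stream
pipeline = QueryPipeline(
    chain=[prompt_tmpl, llm_azure.as_query_component(streaming=True)]
)

# with streaming enabled, run() yields response chunks instead of a final answer
output = pipeline.run(query="Who is a good boy?")
for chunk in output:
    print(chunk.delta, end="", flush=True)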
4 comments
Although... I hope the intermediates thing doesn't exhaust the stream before you get it 😅
Cool, it works, thanks a lot! It didn't occur to me to go to the examples; normally I look at the documentation/API reference for that kind of thing.
In a library that moves this fast, the examples are probably your best bet 😅

The API reference has gotten a lot better lately, but it's still not perfect.