Alex
Hello. I've looked in the docs and asked kapa, but I couldn't find an answer. I am interested in using QueryPipeline to encapsulate the entire RAG chain in one object. Is it possible to stream the final LLM answer as the result of QueryPipeline().run(), i.e., have it yield a generator from which chunks can be retrieved?

So far I have only managed to do it by removing the LLM from the pipeline (so it only retrieves documents and prepares the prompt) and calling the LLM separately, like this:
Python
# Run the pipeline without the LLM: it retrieves context and builds the chat messages
output, intermediates = pipeline.run_with_intermediates(
    input="Who is a good boy?"
)

# Stream the final answer from the LLM outside the pipeline
response = llm_azure.stream_chat(messages=output)
for chunk in response:
    print(chunk.delta, end="", flush=True)
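What I would ideally like is roughly the following, with the LLM kept as the last step of the pipeline and run() itself returning a generator of chunks. This is only a sketch of the desired behaviour, not working code, and I'm assuming the chunks would expose a delta attribute the way the stream_chat responses do:
Python
# Hypothetical, desired usage: the pipeline includes the LLM
# and run() returns a generator of streamed chunks
response = pipeline.run(input="Who is a good boy?")
for chunk in response:
    print(chunk.delta, end="", flush=True)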