
Updated 4 months ago

Response streaming

At a glance

The community members are discussing an issue with streaming the response nodes from a query engine. The original post indicates that the response nodes being printed are "nonsensical" and keep repeating a single source node, which is not the expected behavior. In the comments, a community member explains that the source nodes are calculated before the response stream starts, and suggests iterating over the response generator and then accessing the source nodes separately. However, the community members are having trouble getting the source nodes to print at the same time as the response stream. They mention trying different approaches, but ultimately finding a "patchy workaround" that was not easy to implement.

I mean it starts printing out/streaming the response nodes, but they're nonsensical: it just keeps repeating one source node and doesn't function the same way it does when not streaming.
5 comments
Not sure I know what you mean haha

So here's my understanding. You can enable streaming and run a query:

response = query_engine.query("query")

From there, you can either do

response.print_response_stream() to print to stdout

Or you can iterate over the generator yourself to handle the tokens

for word in response.response_gen:
    # do a thing with each piece of text as it arrives, e.g. print it
    print(word, end="", flush=True)


At the same time, independent of those things, you can do response.source_nodes to get a list of source nodes and similarity scores. These are not streamed, as they are static and should be set before the response even starts streaming

So you could iterate over the generator, and after the generator is done, do something with the source nodes, right?
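For anyone reading along, here is a minimal sketch of that "stream first, sources after" flow. It does not call the real library: FakeSourceNode and FakeStreamingResponse are hypothetical stand-ins that only mimic the two attributes mentioned above (response_gen and source_nodes), and the text and score fields are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FakeSourceNode:
    """Hypothetical stand-in for a retrieved node with a similarity score."""
    text: str
    score: float


@dataclass
class FakeStreamingResponse:
    """Hypothetical stand-in exposing the two attributes named in the thread."""
    tokens: List[str]
    source_nodes: List[FakeSourceNode]

    @property
    def response_gen(self) -> Iterator[str]:
        # The real object yields text as the LLM produces it; here we replay a list.
        return iter(self.tokens)


def stream_then_sources(response) -> str:
    """Print tokens as they arrive, then print each source node exactly once."""
    pieces = []
    for token in response.response_gen:
        print(token, end="", flush=True)
        pieces.append(token)
    print()  # finish the streamed line before listing sources
    for i, node in enumerate(response.source_nodes, start=1):
        print(f"[{i}] score={node.score:.2f} {node.text}")
    return "".join(pieces)


demo = FakeStreamingResponse(
    tokens=["Hello", ", ", "world"],
    source_nodes=[FakeSourceNode("doc A", 0.91), FakeSourceNode("doc B", 0.78)],
)
answer = stream_then_sources(demo)
```

The point of the sketch is that the source list is read once, outside the token loop. Printing it inside the loop would repeat the same nodes for every token.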
Yeah, I'm streaming the LLM response stream; however, I was trying to get the response source nodes printed at the same time (I managed to get them streaming), but it doesn't really work properly.

I also tried just streaming the LLM response stream and then printing out the source nodes simultaneously onto a different element (without streaming) but that didn't seem to work too well either.
But I guess if they're calculated before the response even starts streaming there must be a way to print them visible even before the LLM response starts streaming?
But that part seemed to break it for me because the data types are different I guess?
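One possible way around the data type mismatch, sketched under assumptions: turn the source nodes into plain strings first, emit those, and only then forward the streamed tokens, so the consumer only ever receives strings. The helper names format_sources and sources_then_stream are hypothetical, and sources are represented here as simple (text, score) tuples rather than the library's own node objects.

```python
from typing import Iterable, Iterator, List, Tuple


def format_sources(sources: Iterable[Tuple[str, float]]) -> List[str]:
    """Convert (text, score) pairs into display strings."""
    return [
        f"[{i}] score={score:.2f} {text}"
        for i, (text, score) in enumerate(sources, start=1)
    ]


def sources_then_stream(
    sources: Iterable[Tuple[str, float]], token_gen: Iterable[str]
) -> Iterator[str]:
    """Yield formatted source lines first, then the streamed tokens."""
    for line in format_sources(sources):
        yield line + "\n"
    yield "\n"  # blank line between the sources and the answer
    for token in token_gen:
        yield token


# Stand-in data: one source and a two-token "stream".
chunks = list(sources_then_stream([("doc A", 0.91)], iter(["Hi", " there"])))
print("".join(chunks))
```

In a real app, the (text, score) pairs would be pulled from each entry in response.source_nodes before the token loop starts. The exact attribute names for that depend on the library version, so check them against the docs.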
I found a patchy workaround but let's say it wasn't too easy 🙏🏼