Return Direct

At a glance

The post asks how to handle streaming responses from query engine tools, since the current ToolOutput pydantic validator does not seem compatible with them (for now it outputs the whole response as a single string). The community member who posted the question wonders whether the project has already addressed this, and is willing to contribute if not.

In the comments, another community member explains that handling streaming tool outputs was too complicated, so they took a "slightly hacky" approach of faking the stream when return_direct is triggered while streaming. They add that handling streaming properly would be a lot of work because of how the library spins up a thread to write to the chat history.

The original poster acknowledges the difficulty and says they will instead focus on using other tricks to speed up inference and return the final outputs with less latency.

DrSebastianK

@Logan M Hey! I saw you did some updates regarding tool output of agents (return_direct). Is there a simple way to handle streaming responses of query engine tools? The pydantic validator ToolOutput does not seem to be compatible with it (at least it just outputs the whole response as a string for now). I am wondering if you guys already did something for that purpose. Otherwise I am happy to contribute.

4 comments

Logan M

For streaming outputs of tools, it was waaaay too complicated to handle

Instead, I did a (slightly hacky) approach where I just fake the stream if return direct is triggered while streaming

You'll still have response.response_gen or response.async_response_gen()
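To make the "fake the stream" idea concrete, here is a minimal standalone sketch. It is not the library's actual implementation: `fake_stream` and `chunk_size` are hypothetical names used only for illustration. It assumes the tool has already produced its full output as one string, and simply replays that string in small pieces so the caller can iterate over it the same way it would over a real token generator.

```python
from typing import Iterator


def fake_stream(text: str, chunk_size: int = 8) -> Iterator[str]:
    """Replay an already-complete string as a sequence of small chunks.

    Hypothetical helper: nothing is generated incrementally here, the
    full text exists up front and is only sliced for the consumer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


if __name__ == "__main__":
    # Stand-in for a tool output that came back all at once.
    full_output = "The query engine tool returned this answer in one piece."
    for chunk in fake_stream(full_output):
        # A UI would append each chunk as it arrives.
        print(chunk, end="", flush=True)
    print()
```

The trade-off is that this keeps the consumer-side interface uniform but does nothing for latency: the first chunk only appears after the tool has finished.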

Logan M

If you see an easy way to handle it, go for it! But tbh my impression is that it will be a lot of work due to how we spin up a thread to write to chat history

DrSebastianK

Thanks for the help. I also looked into the code and tried to work out a logic, but now that you've told me the same, I'd rather just use other tricks to speed up inference and return the final outputs with less latency.

Logan M

yea it's pretty tricky
