@Logan M Hey! I saw you made some updates to agent tool output (return_direct). Is there a simple way to handle streaming responses from query engine tools? The ToolOutput pydantic validator doesn't seem to be compatible with them (for now, at least, it just outputs the whole response as a string). I'm wondering whether you guys have already done something for this. Otherwise, I'm happy to contribute.
If you see an easy way to handle it, go for it! But tbh my impression is that it will be a lot of work because of how we spin up a thread to write to chat history.
Thanks for the help. I also looked into the code and tried to work out the logic, but since you've come to the same conclusion, I'd rather just use other tricks to speed up inference and return the final outputs with less latency.
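For anyone reading later, here is a rough sketch of the general pattern mentioned above (a background thread that writes the full response to chat history while tokens are streamed to the caller). It uses only the Python standard library and is not the library's actual code: `stream_and_record`, the plain-list `history`, and the token iterator are all hypothetical stand-ins.

```python
import queue
import threading
from typing import Iterator, List

_SENTINEL = object()  # marks the end of the token stream


def stream_and_record(tokens: Iterator[str], history: List[str]) -> Iterator[str]:
    """Yield tokens as they arrive; a worker thread stores the full text in `history`.

    Hypothetical helper: `history` stands in for a real chat history store.
    """
    q: "queue.Queue[object]" = queue.Queue()

    def worker() -> None:
        parts: List[str] = []
        for tok in tokens:
            parts.append(tok)
            q.put(tok)  # hand each token to the consumer right away
        # Record the complete response before signalling the end of the stream.
        history.append("".join(parts))
        q.put(_SENTINEL)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    def consume() -> Iterator[str]:
        while True:
            item = q.get()
            if item is _SENTINEL:
                break
            yield item  # type: ignore[misc]
        thread.join()

    return consume()


if __name__ == "__main__":
    chat_history: List[str] = []
    for token in stream_and_record(iter(["Streaming ", "tool ", "output."]), chat_history):
        print(token, end="", flush=True)
    print()
    print(chat_history)
```

The hard part is that a validated tool output usually expects a finished string, while here the full text only exists once the stream has been consumed, so the history write has to wait for the worker to finish.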