Curious if anyone has experienced the same - I'm using a custom workflow agent, and I noticed when testing with models served in the OpenAI format (instead of OpenAI itself) that the workflow step that intercepts a streaming response when there's a tool call doesn't work as well as it does with OpenAI. OpenAI's responses seem to signal a tool call almost immediately when the stream starts, while some of the other models I'm testing with only reveal it after the first few chunks
I've actually noticed this too with non-OpenAI models - sometimes there's some preamble before the tool call shows up
I think the way around this is to use event streaming in workflows. That way you can expose the stream only while the LLM is outputting non-tool-call responses
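One way to sketch that idea: buffer the first few deltas before forwarding anything to the client, so a tool call that only appears a few chunks in (possibly after preamble text) can still be intercepted cleanly. This is a minimal illustration, not a specific framework's API - the chunk shapes are assumptions modeled on the OpenAI streaming format, and `route_stream` and `buffer_limit` are hypothetical names.

```python
def route_stream(chunks, buffer_limit=3):
    """Yield ("text", s) events for plain content, or a single
    ("tool_call", name) event once a tool call is detected.

    Buffers up to buffer_limit leading chunks so late-arriving
    tool calls aren't streamed to the client as text.
    """
    buffered = []
    for chunk in chunks:
        delta = chunk.get("choices", [{}])[0].get("delta", {})
        if delta.get("tool_calls"):
            # Tool call detected: drop any buffered preamble and
            # hand control back to the workflow instead of streaming.
            yield ("tool_call", delta["tool_calls"][0]["function"]["name"])
            return
        buffered.append(delta.get("content") or "")
        if len(buffered) >= buffer_limit:
            # Enough chunks without a tool call: treat it as a plain
            # text response and flush the buffer downstream.
            yield ("text", "".join(buffered))
            buffered = []
    if buffered:
        yield ("text", "".join(buffered))
```

The trade-off is a small added latency on the first flushed chunk in exchange for never leaking tool-call preamble to the user; tune the buffer size to however many chunks your models typically take to reveal the call.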