The community member is using a custom workflow agent and has noticed that, when testing with models served in the OpenAI format (rather than OpenAI itself), the workflow step that intercepts a streaming response when there is a tool call doesn't work as well as it does with OpenAI. OpenAI's responses seem to signal a tool call almost immediately when the stream starts, while some of the other models only signal it after the first few chunks.
Another community member has noticed the same issue with non-OpenAI models and suggests using event streaming in workflows as a way around this. They provide a link to a Colab notebook that demonstrates this approach, which also includes dynamic context retrieval on every user message (which can be ignored if not needed).
The original community member expresses appreciation for the suggestion and plans to check out the Colab notebook.
Curious if anyone has experienced the same - I'm using a custom workflow agent, and I noticed when testing with models served in the OpenAI format (instead of OpenAI itself) that the workflow step that intercepts a streaming response when there is a tool call doesn't work as well as it does with OpenAI. OpenAI's responses seem to signal a tool call almost immediately when the stream starts, while some of the other models I'm testing with only signal it after the first few chunks.
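Roughly what I mean, as a hypothetical repro against a raw OpenAI-compatible endpoint (the base_url, model name, and tool schema below are placeholders, not my actual setup):

```python
# With OpenAI, `tool_calls` usually shows up in the very first delta, but
# some OpenAI-compatible backends emit a few content chunks first, so
# deciding on chunk 1 alone can misclassify the response as plain text.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

stream = client.chat.completions.create(
    model="my-local-model",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
    stream=True,
)

buffered = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        # tool call detected mid-stream: stop treating this as plain text
        print(f"tool call detected after {len(buffered)} content chunks")
        break
    if delta.content:
        buffered.append(delta.content)  # possible preamble, not final text
```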
I've actually noticed this too with non-OpenAI models, sometimes there's some preamble.
I think the way around this is to use event streaming in workflows. That way you can expose the stream whenever the LLM is outputting non-tool-call responses, see the sketch below.
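Something like this, as a minimal sketch (StreamEvent, ChatFlow, and the get_weather tool are made-up names for illustration, and exact method names like astream_chat_with_tools can vary between llama-index versions):

```python
# Forward deltas via write_event_to_stream, and bail out of the streaming
# path the moment a tool call appears, so tool-call chunks never reach the user.
import asyncio

from llama_index.core.llms import ChatMessage
from llama_index.core.tools import FunctionTool
from llama_index.core.workflow import (
    Context, Event, StartEvent, StopEvent, Workflow, step,
)
from llama_index.llms.openai import OpenAI

def get_weather(city: str) -> str:
    """Get the weather for a city."""
    return f"It is sunny in {city}."

class StreamEvent(Event):
    delta: str

class ChatFlow(Workflow):
    @step
    async def chat(self, ctx: Context, ev: StartEvent) -> StopEvent:
        llm = OpenAI(model="gpt-4o-mini")
        tool = FunctionTool.from_defaults(fn=get_weather)
        gen = await llm.astream_chat_with_tools([tool], chat_history=ev.messages)
        text = ""
        async for resp in gen:
            # tool calls accumulate in additional_kwargs on the partial message
            if resp.message.additional_kwargs.get("tool_calls"):
                return StopEvent(result="(tool call; handle it in another step)")
            # non-tool-call output: expose the delta on the event stream
            ctx.write_event_to_stream(StreamEvent(delta=resp.delta or ""))
            text += resp.delta or ""
        return StopEvent(result=text)

async def main():
    handler = ChatFlow(timeout=60).run(
        messages=[ChatMessage(role="user", content="Tell me a joke")]
    )
    async for ev in handler.stream_events():
        if isinstance(ev, StreamEvent):
            print(ev.delta, end="", flush=True)
    print(await handler)

asyncio.run(main())
```

The key bit is ctx.write_event_to_stream: the consumer only sees the deltas the step chose to forward, so it doesn't matter whether the backend signals the tool call on the first chunk or a few chunks in.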