Agent

At a glance

The community member is building an agent that uses a large language model (LLM) and wants to stream the final output. They mention the option of passing the full final message to a final step and streaming that, but this would result in a latency hit. They haven't found a nice solution yet, as the full message is required to determine if a function call is needed.

Another community member suggests using an async generator in a workflow to first return a boolean to determine if it's a tool call, and then return the stream. They provide a link to a Colab notebook demonstrating this approach.

The original community member says they are building the agent mostly from scratch using Workflows and had a similar idea, so they will take a look at the notebook. The other community member says the approach in the notebook seemed to work pretty well.

The community members also discuss other potential solutions, such as creating a "Final Answer" tool that requires only a boolean to limit output tokens, and using the new event streaming API from LlamaIndex.

There is no explicitly marked answer, but the community members are collaborating and sharing ideas to find a solution to the original problem.

Hey all, anyone have an example of building an agent using a function-calling LLM where they stream the final output? There is the option of passing the full final message to a final step and streaming that, but you'll take a latency hit; I haven't found a nice solution yet, since the full message is needed to determine whether a function call is required.
Are you building the agent from mostly scratch?

Detecting the tool call is very sneaky. I did it here in a workflow, using an async generator that first returns a boolean to determine if it's a tool call, and then returns the stream if it isn't
https://colab.research.google.com/drive/1UjDJMyXR11HKIki3tuMew6EEzq91ewYw?usp=sharing#scrollTo=1XoDZK0YvQQe
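For reference, a minimal sketch of that async-generator trick (not the exact notebook code): peek at the first streamed chunk, yield a boolean saying whether it's a tool call, then yield the text stream if it isn't. It assumes a llama-index FunctionCallingLLM (OpenAI here) whose first chunk already carries tool-call deltas; the model choice and function name are illustrative.

```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")  # illustrative model choice

async def flag_then_stream(chat_history: list[ChatMessage], tools: list):
    # Start the streaming, tool-capable chat call.
    gen = await llm.astream_chat_with_tools(tools, chat_history=chat_history)
    # Peek at the first chunk: OpenAI-style APIs send tool-call deltas
    # in the earliest chunks, before any text content.
    first = await gen.__anext__()
    is_tool_call = bool(first.message.additional_kwargs.get("tool_calls"))
    yield is_tool_call  # the caller reads this flag before consuming the rest
    if not is_tool_call:
        yield first.delta or ""
        async for chunk in gen:
            yield chunk.delta or ""
```

The caller pulls the first item as the flag; if it's False, every remaining item is a text delta that can be forwarded straight to the client with no extra LLM call, and if it's True the caller can fall back to the non-streaming tool-call path.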
Yeah, mostly from scratch using Workflows. I had a similar idea to what you did, so I'll take a look at the notebook. Was it relatively successful?
Seemed to work pretty well!
I was also thinking of creating a "Final Answer" tool that requires only a boolean (to keep output tokens to a minimum), then passing the final message on to a final step if that tool is called
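A hedged sketch of how that could look (the tool name and return value are illustrative, not from the thread): the tool takes a single boolean, so the model spends almost no output tokens invoking it, and the workflow treats the call as a signal to re-run the LLM without tools and stream the actual answer.

```python
from llama_index.core.tools import FunctionTool

def final_answer(ready: bool) -> str:
    """Call with ready=True when no more tools are needed and you
    are ready to write the final answer."""
    return "ok"

final_answer_tool = FunctionTool.from_defaults(fn=final_answer)

# When the agent picks this tool, the workflow can re-invoke the LLM
# without any tools attached and stream that response to the user.
```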
Another option that could also work is using the new event streaming API

(It wasn't available yet when I wrote that notebook)
https://docs.llamaindex.ai/en/stable/understanding/workflows/stream/
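That API lets a workflow step push events to the caller while it runs, which fits the streaming use case nicely. A minimal sketch based on the linked docs; the `TokenEvent` class and hard-coded tokens stand in for a real LLM stream:

```python
import asyncio

from llama_index.core.workflow import (
    Context, Event, StartEvent, StopEvent, Workflow, step,
)

class TokenEvent(Event):
    delta: str

class StreamingFlow(Workflow):
    @step
    async def generate(self, ctx: Context, ev: StartEvent) -> StopEvent:
        # Stand-in for an LLM token stream.
        for token in ["Hello, ", "world!"]:
            ctx.write_event_to_stream(TokenEvent(delta=token))
        return StopEvent(result="Hello, world!")

async def main():
    handler = StreamingFlow().run()
    # Consume streamed events as they arrive, then await the final result.
    async for ev in handler.stream_events():
        if isinstance(ev, TokenEvent):
            print(ev.delta, end="", flush=True)
    print("\nresult:", await handler)

if __name__ == "__main__":
    asyncio.run(main())
```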
This could work too! But depends on the LLM properly calling this tool
Just curious, but are you building this as part of where you work? Workflows are new, so I'm always curious about the use cases and business cases people are working on with them πŸ”₯
Yeah, I actually had a discussion with Biswaroop recently about the use cases and was going to ping him again for a follow-up call; I'll mention including you as well if you're interested
Oh sweet! Bis already chatted πŸ”₯