Could anyone explain what a LlamaIndex Agent expects in an astream_chat() call when it receives tools to use?
Could anyone explain what a LlamaIndex Agent expects in an astream_chat() call when it receives tools to use? Specifically, in a non-streaming call, I send back a list of tools parsed from the LLM response in the request.tools list.

I'm trying to determine whether, on my backend, I should:
  1. Detect a tool call, stop yielding tokens to the agent until I encounter a stop token, then parse out the tools and send the final buffered request with request.tools included (roughly sketched below), or
  2. Continue streaming tokens to the agent and only send the request back with request.tools in the final response.
Which approach is correct?
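For concreteness, here is a rough sketch of what option 1 could look like on my side. This is not LlamaIndex's API; the chunk shape follows the OpenAI streaming format, and the "{" detection and field names are just assumptions.

Python
import json
from typing import Iterator


def relay_stream(deltas: Iterator[str]) -> Iterator[dict]:
    """Forward plain-text deltas; buffer once a tool call appears to start."""
    buffer = ""
    in_tool_call = False
    for delta in deltas:
        if not in_tool_call and delta.lstrip().startswith("{"):
            # Assume the model has started emitting a tool call.
            in_tool_call = True
        if in_tool_call:
            # Stop yielding to the agent; keep accumulating until the stream ends.
            buffer += delta
        else:
            # Plain text: forward the delta as an OpenAI-style chunk.
            yield {"choices": [{"delta": {"content": delta}, "finish_reason": None}]}

    if in_tool_call:
        # Parse the buffered tool call and emit one final chunk that
        # carries it in tool_calls (OpenAI streaming format).
        call = json.loads(buffer)
        yield {
            "choices": [{
                "delta": {
                    "content": None,
                    "tool_calls": [{
                        "index": 0,
                        "id": "call_0",
                        "type": "function",
                        "function": {
                            "name": call["name"],
                            "arguments": json.dumps(call.get("arguments", {})),
                        },
                    }],
                },
                "finish_reason": "tool_calls",
            }]
        }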
10 comments
Streaming with tool calls is tricky

The main requirement for agents is that they expect tool/function calls to be returned first

i.e. the first response dict from the LLM should have tool calls, if tools are being called
OK, so it sounds like I should detect a "{" in the first token and build up a buffer, with the first delta response being the full message and a list of the tools.
Pretty much (to be clear, it's expecting this to show up in the message.tool_calls part of the LLM response, if you're using the OpenAI format/client)
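For reference, a hand-written example of the shape being described, in the standard OpenAI format (the id, tool name, and arguments below are made up for illustration):

Python
# What the agent expects to find: the assistant message of the *first*
# response already carries tool_calls, rather than plain text content.
first_assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "generate_restaurant",      # hypothetical tool name
                "arguments": '{"city": "Boston"}',  # JSON-encoded string
            },
        }
    ],
}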
A quick way to check if it's working: we can stream tool calls

Python
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI
from pydantic.v1 import BaseModel
from typing import List
from IPython.display import clear_output
from pprint import pprint


class MenuItem(BaseModel):
    """A menu item in a restaurant."""

    course_name: str
    is_vegetarian: bool


class Restaurant(BaseModel):
    """A restaurant with name, city, and cuisine."""

    name: str
    city: str
    cuisine: str
    menu_items: List[MenuItem]


llm = OpenAI(model="gpt-3.5-turbo")

input_msg = ChatMessage.from_str("Generate a restaurant in Boston")

# Wrap the LLM so responses are parsed into Restaurant objects, then stream
# partial objects as they are built up from the model's output.
sllm = llm.as_structured_llm(Restaurant)
stream_output = sllm.stream_chat([input_msg])
for partial_output in stream_output:
    clear_output(wait=True)
    pprint(partial_output.raw.dict())
    restaurant_obj = partial_output.raw
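If tool-call streaming is wired up end to end, the partial Restaurant dict should fill in field by field as chunks arrive. As far as I understand, as_structured_llm on the OpenAI client is driven by function calling under the hood, so this exercises the same message.tool_calls path discussed above.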
Yeah, I'm trying to maintain parity with the ecosystem, so I'm using the OpenAI format everywhere. Hopefully it's the same format that things like Ollama and other open-source tools use.
I'd like my chat and agent endpoints to work with all the tools if I point them at my API, but I don't have a full grasp of everything yet. Right now my LlamaIndex agent is behind a /v1/chat/completion endpoint that proxies to my real completion endpoint in another Docker container at /v1/chat/completion. I don't know yet whether this should really be implementing the OpenAIAgent endpoint or not...
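As an aside, one way to point LlamaIndex at a custom OpenAI-compatible endpoint like that is the OpenAILike wrapper (this assumes the llama-index-llms-openai-like package; the model name, URL, and key below are placeholders):

Python
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="my-local-model",                # placeholder model name
    api_base="http://localhost:8000/v1",   # placeholder: your proxy's /v1 root
    api_key="unused",                      # placeholder key
    is_chat_model=True,
    is_function_calling_model=True,  # tells LlamaIndex it can return tool_calls
)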
My initial API completion endpoint proxies between my local Docker LLM and a version of that image deployed to RunPod, depending on what LLM is set in the model field.
When my Docker image boots up, it negotiates a PKI key with my server so I can have private communication into the memory of the RunPod host for some obfuscated privacy. So it's private as long as they're not digging my prompts out of memory... and I'm not that special, lol. New keys are negotiated for every chat after 60 seconds.
Got it working... Thanks again!