Reduce the token limit of the memory buffer?
from llama_index.core.memory import ChatMemoryBuffer
memory = ChatMemoryBuffer.from_defaults(token_limit=5000)
agent = <agent cls>(..., memory=memory)
I have this...
from llama_index.core.agent import AgentRunner, FunctionCallingAgentWorker

agent_worker = FunctionCallingAgentWorker.from_tools(...)
agent = AgentRunner(..., memory=ChatMemoryBuffer.from_defaults(token_limit=5000))
response = agent.chat(...)
but it doesn't seem to respect that at all; I still get this:
Retrying llama_index.llms.openai.base.OpenAI._chat in 0.44005995072639026 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 16404 tokens (15574 in the messages, 830 in the functions). Please reduce the length of the messages or functions.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}.
do you have a tool that's perhaps returning a huge output?
no, the tool outputs aren't exceeding the token limit
my steps do run for a long time though, as I have several back-and-forth calls until eventually it runs out of context space
I tried digging through all the source and can't seem to find where/what would actually limit the tokens sent. I assume ChatMemoryBuffer would (but it only appears to on the .get() method), and I'm not sure it's counting tokens from tool responses against that.
Even using a model with a much larger context (128k), it still ends up exceeding it, because the memory doesn't appear to be managed/buffered at all, despite setting it to 5000 (or any other number)
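For reference, a minimal sketch of where the limit does kick in (the numbers and messages below are just illustrative): ChatMemoryBuffer only trims when get() is called; put() appends unconditionally, so the underlying store keeps growing.

from llama_index.core.llms import ChatMessage
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=200)

# put() appends without trimming anything
for i in range(50):
    memory.put(ChatMessage(role="user", content=f"turn {i}: " + "lorem ipsum " * 10))

# get() is where token_limit is applied: only the most recent messages
# that fit under 200 tokens come back
print(len(memory.get()))      # a handful of messages
print(len(memory.get_all()))  # all 50 are still in the store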
ah I think I see the issue maybe
So the top-level agent memory is separate from the worker memory. Then when the task is complete, the new memory gets committed to the top level
In this case, it would be defaulting to a token limit of 3000 though...
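Roughly what that would mean, assuming the per-task/worker buffer is built with a bare from_defaults() call:

from llama_index.core.memory import ChatMemoryBuffer

# the buffer you pass in yourself honors your limit
outer = ChatMemoryBuffer.from_defaults(token_limit=5000)
print(outer.token_limit)  # 5000

# a buffer created with no arguments (and no llm) falls back to the
# library default of 3000
inner = ChatMemoryBuffer.from_defaults()
print(inner.token_limit)  # 3000

So the 5000 set on the AgentRunner wouldn't reach the buffer the worker uses while the task is running.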
This probably needs to be updated to count tokens in tool calls, I think you are right
yeah most of my tokens are from the tool calls, and I think that's what's missing
is there any existing code that counts tool-call tokens? I couldn't find any great documentation on this that would help me patch it
there is llama-index-core/llama_index/core/utilities/token_counting.py, which should probably be used here
(And also updated; right now it only looks for function_call, but I think tool_calls is the kwarg now?)
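Something along these lines might be the shape of the fix; estimate_message_tokens is a hypothetical helper (not in the library), and it assumes the tool calls live in message.additional_kwargs the way the OpenAI integration stores them:

import tiktoken
from llama_index.core.llms import ChatMessage

def estimate_message_tokens(message: ChatMessage, model: str = "gpt-3.5-turbo") -> int:
    """Hypothetical helper: rough token count for content plus function/tool call payloads."""
    enc = tiktoken.encoding_for_model(model)
    count = len(enc.encode(message.content or ""))

    # legacy single function call (the case token_counting.py handles today)
    function_call = message.additional_kwargs.get("function_call")
    if function_call is not None:
        count += len(enc.encode(str(function_call)))

    # newer parallel tool calls, which otherwise go uncounted
    for tool_call in message.additional_kwargs.get("tool_calls") or []:
        count += len(enc.encode(str(tool_call)))

    return count

Summing that over the full message list before the buffer decides what to trim would at least make the limit reflect what actually gets sent.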
Appreciate the help, I've been fighting this for weeks and you've given me more leads in a few minutes than I'd found in all that time!
Yea no worries! I'm not sure how quickly I can jump on this, but I think the issue is definitely identified. If you can make a PR, that would be super cool!