Reduce the token limit of the memory buffer?
from llama_index.core.memory import ChatMemoryBuffer
memory = ChatMemoryBuffer.from_defaults(token_limit=5000)
agent = <agent cls>(..., memory=memory)
I have this...
from llama_index.core.agent import AgentRunner, FunctionCallingAgentWorker

agent_worker = FunctionCallingAgentWorker.from_tools(...)
agent = AgentRunner(..., memory=ChatMemoryBuffer.from_defaults(token_limit=5000))
response = agent.chat(...)
but it doesn't seem to respect that at all; I still get this:
Retrying llama_index.llms.openai.base.OpenAI._chat in 0.44005995072639026 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 16404 tokens (15574 in the messages, 830 in the functions). Please reduce the length of the messages or functions.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}.
do you have a tool that's perhaps returning a huge output?
no, the tool outputs aren't exceeding the token limit
my steps do run for a long time though, as I have several back-and-forth calls until eventually it runs out of context space
I tried digging through all the source and can't seem to find where/what would actually limit the tokens sent. I assume ChatMemoryBuffer would (but it only appears to on the .get() method), and I'm not sure it's counting tokens from tool responses against that.
Even using a model with a much larger context (128k), it still ends up exceeding it, because the memory doesn't appear to be managed/buffered at all, despite setting it to 5000 (or any other number)
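For reference, a minimal sketch of where the limit does kick in (the numbers and messages below are just illustrative): ChatMemoryBuffer only trims when get() is called; put() appends unconditionally, so the underlying store keeps growing.

from llama_index.core.llms import ChatMessage
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=200)

# put() appends without trimming anything
for i in range(50):
    memory.put(ChatMessage(role="user", content=f"turn {i}: " + "lorem ipsum " * 10))

# get() is where token_limit is applied: only the most recent messages
# that fit under 200 tokens come back
print(len(memory.get()))      # a handful of messages
print(len(memory.get_all()))  # all 50 are still in the store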
ah I think I see the issue maybe
So the top-level agent memory is separate from the worker memory. Then when the task is complete, the new memory gets committed to the top level
In this case, it would be defaulting to a token limit of 3000 though...
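Roughly what that would mean, assuming the per-task/worker buffer is built with a bare from_defaults() call:

from llama_index.core.memory import ChatMemoryBuffer

# the buffer you pass in yourself honors your limit
outer = ChatMemoryBuffer.from_defaults(token_limit=5000)
print(outer.token_limit)  # 5000

# a buffer created with no arguments (and no llm) falls back to the
# library default of 3000
inner = ChatMemoryBuffer.from_defaults()
print(inner.token_limit)  # 3000

So the 5000 set on the AgentRunner wouldn't reach the buffer the worker uses while the task is running.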
This probably needs to be updated to count tokens in tool calls, I think you are right
yeah most of my tokens are from the tool calls, and I think that's what's missing
is there any existing code that counts tool-call tokens? I couldn't find any great documentation on this that would help me patch it
there is llama-index-core/llama_index/core/utilities/token_counting.py, which should probably be used here
(And also updated; right now it only looks for function_call, but I think tool_calls is the kwarg now?)
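Something along these lines might be the shape of the fix; estimate_message_tokens is a hypothetical helper (not in the library), and it assumes the tool calls live in message.additional_kwargs the way the OpenAI integration stores them:

import tiktoken
from llama_index.core.llms import ChatMessage

def estimate_message_tokens(message: ChatMessage, model: str = "gpt-3.5-turbo") -> int:
    """Hypothetical helper: rough token count for content plus function/tool call payloads."""
    enc = tiktoken.encoding_for_model(model)
    count = len(enc.encode(message.content or ""))

    # legacy single function call (the case token_counting.py handles today)
    function_call = message.additional_kwargs.get("function_call")
    if function_call is not None:
        count += len(enc.encode(str(function_call)))

    # newer parallel tool calls, which otherwise go uncounted
    for tool_call in message.additional_kwargs.get("tool_calls") or []:
        count += len(enc.encode(str(tool_call)))

    return count

Summing that over the full message list before the buffer decides what to trim would at least make the limit reflect what actually gets sent.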
Appreciate the help, I've been fighting this for weeks and you've given me more leads in a few minutes than I'd found in all that time!
Yea no worries! I'm not sure how quickly I can jump on this, but I think the issue is definitely identified. If you can make a PR, that would be super cool!