Hi! When using a query engine alone, I can control the size of the data so that it doesn't exceed the context window:
from llama_index.core.memory import ChatMemoryBuffer  # llama_index.memory in pre-0.10 versions

chatmemory = ChatMemoryBuffer.from_defaults(token_limit=history_limit + context_limit)
query_engine = index.as_chat_engine(
    chat_mode='condense_plus_context',
    similarity_top_k=similarity_top_k,
    llm=llm_engine,
    system_prompt=prepared_system_prompt,
    memory=chatmemory,
)
When using an agent, I'm trying to do the same:
agent = OpenAIAgent.from_tools(
    tools,
    llm=self.llm_engine,
    verbose=True,
    system_prompt=self.system_prompt,
    memory_cls=chatmemory,  # this chatmemory was created with token_limit=14385
)
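For context, that 14385 is just the model window minus the completion tokens I reserve; the names below are my own, only there to make the arithmetic explicit:

MODEL_WINDOW = 16385       # gpt-3.5-turbo-16k context window (from the error message)
COMPLETION_RESERVE = 2000  # completion tokens I request (also from the error message)
token_limit = MODEL_WINDOW - COMPLETION_RESERVE  # 14385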
But some tools' outputs are still too big, and I get this exception: "This model's maximum context length is 16385 tokens. However, you requested 17561 tokens (15405 in the messages, 156 in the functions, and 2000 in the completion)". Why is that, and how can I fix it? Thanks!
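The only workaround I can think of is capping each tool's output myself before it ever reaches the agent. Here's a rough sketch of what I mean (capped, search_docs, and MAX_TOOL_OUTPUT_CHARS are my own illustrative names, and a character cap is only a crude proxy for a token count):

import functools

from llama_index.core.tools import FunctionTool

MAX_TOOL_OUTPUT_CHARS = 8000  # arbitrary cap of my own, measured in characters

def capped(fn):
    # Truncate the wrapped tool function's string output before the agent sees it.
    @functools.wraps(fn)  # preserves name/docstring/signature for schema inference
    def wrapper(*args, **kwargs):
        return str(fn(*args, **kwargs))[:MAX_TOOL_OUTPUT_CHARS]
    return wrapper

def search_docs(query: str) -> str:
    """Hypothetical tool that can return a very large string."""
    return '... potentially huge result ...'

tools = [FunctionTool.from_defaults(fn=capped(search_docs))]

But that feels hacky, so I'm hoping there's a built-in setting I'm missing.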