Hi! When using query_engine alone, I can control the size of the data so it doesn't exceed the context window:
Plain Text
chatmemory = ChatMemoryBuffer.from_defaults(token_limit=(history_limit + context_limit))
query_engine = index.as_chat_engine(
    chat_mode='condense_plus_context',
    similarity_top_k=similarity_top_k,
    llm=llm_engine,
    system_prompt=prepared_system_prompt,
    memory=chatmemory,
)

When using an agent, I'm trying to do the same:
Plain Text
agent = OpenAIAgent.from_tools(
    tools,
    llm=self.llm_engine,
    verbose=True,
    system_prompt=self.system_prompt,
    memory_cls=chatmemory,  # <= token_limit: 14385
)

But some tools' output is still too big, and I get the exception: "This model's maximum context length is 16385 tokens. However, you requested 17561 tokens (15405 in the messages, 156 in the functions, and 2000 in the completion)". Why is that, and how can I fix it? Thanks!
There are no assumptions made about the size of tool outputs

Maybe using a load-and-search tool would help, to wrap tools that return large outputs
An additional question: in "15405 in the messages, 156 in the functions", what are "messages" and "functions" here?
Sorry, can you please elaborate on "using a load-and-search tool "?
The OpenAI API uses messages and functions/tool calls. Both count towards the overall token count.
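In other words, the three parts of the error message simply add up past the model's 16385-token window (the 2000 "in the completion" is presumably the space reserved for the response, i.e. max_tokens):
Plain Text
15405 (messages) + 156 (functions) + 2000 (completion) = 17561 > 16385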
Basically, you can wrap any tool with a load and search tool
https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/tools/llamahub_tools_guide/#loadandsearchtoolspec

It will create an index on the fly if your tool returns long outputs
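For reference, a minimal sketch of what that wrapping might look like, based on the linked docs. Here my_tool stands in for whichever of your tools returns large outputs, llm_engine is the LLM from your own snippet, and the import paths assume a recent llama_index layout (they may differ by version):
Plain Text
from llama_index.agent.openai import OpenAIAgent
from llama_index.core.tools.tool_spec.load_and_search import LoadAndSearchToolSpec

# my_tool is whichever existing tool blows up the context window with its output
wrapped_tools = LoadAndSearchToolSpec.from_defaults(my_tool).to_tool_list()

# The wrapper exposes two tools: a "load" tool that runs the original tool and
# stores its output in an index, and a "read" tool that does a top-k search over it.
agent = OpenAIAgent.from_tools(wrapped_tools, llm=llm_engine, verbose=True)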
Or you could implement this yourself in the definition of your tool (if it's a function tool)
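A rough sketch of the do-it-yourself version, assuming a hypothetical fetch_big_document helper that returns a long string; exact import paths again depend on your llama_index version:
Plain Text
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.tools import FunctionTool

def fetch_and_search(url: str, query: str) -> str:
    """Fetch a large document and return only the parts relevant to the query."""
    raw_text = fetch_big_document(url)  # hypothetical helper returning long text
    # Index the raw output on the fly ...
    index = VectorStoreIndex.from_documents([Document(text=raw_text)])
    # ... and hand the agent a top-k answer instead of the full text,
    # so the tool output stays within a bounded token budget.
    return str(index.as_query_engine(similarity_top_k=2).query(query))

tool = FunctionTool.from_defaults(fn=fetch_and_search)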
Thank you! Let me look into it.
You know what? It worked great for me! But I don't understand why wrapping with this load-and-search class solved the problem. Where can I learn more about it?
Probably reading the source code is the best spot 🙂
(click the first source code dropdown)
https://docs.llamaindex.ai/en/stable/api_reference/tools/load_and_search/
Ah, nice! Thank you!
Basically it creates a load tool and a read tool (which creates an index and a query engine under the hood)
Yeah, I got that part, but does that query engine check the context window itself? Because when I don't use tools and use the chat engine, I have to pass the calculated chat memory, otherwise I could get the same exception.
Since it's using a query engine, the context window is handled. It takes all the tool output and does a top-k search on top of it