Query Engine's Handling of Maximum Context Window Limits

At a glance

The community members discuss how a query_engine handles going over the maximum context window when using GPT-4 with an 8,192-token context window. They note that the query engine makes multiple LLM calls, with a response synthesizer refining the answer across those calls. The discussion also covers the ChatEngine, which typically requires a retriever or query engine (for example, over a vector store index), and the option of using a simpler agent or SimpleChatEngine when the context is passed in directly. However, SimpleChatEngine likely does not handle exceeding the context window itself; it relies on memory to filter messages and keep the conversation within limits.

Hello there. Was wondering, does query_engine handle going over the maximum context window?

Like if you're using GPT-4 with an 8,192-token context window, and your nodes are over that limit, how does the query engine handle that?
13 comments
I think there are multiple LLM calls in this case if I'm not wrong
Yea exactly. There is a response synthesizer that handles the context window, refining an answer over multiple LLM calls.
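In LlamaIndex terms, a minimal sketch of that refine behavior might look like this (the data directory and query string are illustrative, and the imports assume the llama_index.core package layout):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load documents and build an index (the path is hypothetical).
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# "refine" mode sends retrieved nodes to the LLM one batch at a time,
# asking it to refine the running answer, so no single call has to fit
# every node inside the model's context window.
query_engine = index.as_query_engine(response_mode="refine")
print(query_engine.query("Summarize the key points."))
```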
ahh right thanks!
And just to double confirm, a Chat Engine requires a vector store index to be used, right?
I can't just use a Chat Engine like the ChatGPT UI.
Because for this specific thing I'm doing, I don't need vector data, just context passed in from the prompt.
A chat engine typically requires either a retriever or a query engine.

If you are just passing in all the context, you could use a simple agent, or SimpleChatEngine.
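A minimal SimpleChatEngine sketch, assuming the llama_index.core imports and an OpenAI LLM (the model name and prompt are placeholders):

```python
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.llms.openai import OpenAI

# No index or retriever needed; the context goes straight into the prompt.
llm = OpenAI(model="gpt-4")
chat_engine = SimpleChatEngine.from_defaults(llm=llm)

response = chat_engine.chat("Here is my context: ...\n\nQuestion: ...")
print(response)
```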
I'm assuming SimpleChatEngine also handles going over the context window, refining with multiple LLM calls?
Hmmm, I don't think it does actually
It relies on the memory to filter out older messages and keep the conversation within limits
Not to say it couldn't be updated I suppose
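For reference, that memory-based trimming might look like the following with ChatMemoryBuffer (the token_limit value is illustrative); messages beyond the limit are simply dropped from the prompt rather than refined over extra LLM calls:

```python
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.core.memory import ChatMemoryBuffer

# Cap how much chat history is replayed to the LLM; older messages past
# the token limit are dropped, not summarized or refined in extra calls.
memory = ChatMemoryBuffer.from_defaults(token_limit=6000)
chat_engine = SimpleChatEngine.from_defaults(memory=memory)
```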
I see, good to know!