Context window

This may not be related to LlamaIndex, but does anyone know how to directly access the context window? I do not mean setting the window size via Settings.context_window but accessing the actual contents inside the window, which permanently stores the system prompt and the query-response exchanges up to the token limit.

I want to give the user the option to 1) change the system prompt on the fly depending on the model's responses; and/or 2) manually edit the model's responses to be more to their liking. I figured the most convenient way to do this would be to access the context window contents directly.

If there's no way to do this, then I guess the only recourse is via the chat store or memory buffer, but the system prompt is not stored in either.
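For 2), the closest I can think of is editing the memory buffer directly. A rough sketch of what I mean, assuming a ChatMemoryBuffer (the edit itself is hypothetical):
Python
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.memory import ChatMemoryBuffer

chat_memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

# ... after some exchanges have gone through the chat engine ...

# get_all() returns every stored message (note: no system prompt in here)
messages = chat_memory.get_all()

# Hypothetical edit: replace the last assistant response with a hand-edited one
if messages and messages[-1].role == MessageRole.ASSISTANT:
    messages[-1] = ChatMessage(
        role=MessageRole.ASSISTANT,
        content="(edited) The response text I actually wanted.",
    )
    chat_memory.set(messages)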
21 comments
The context window isn't permanent

Whenever you do llm.complete("text") or llm.chat(<list of chat messages>), that is exactly what gets sent to the llm
So I guess to answer your question, using the llm directly like that gives access
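For example, a minimal sketch with the Ollama model from this thread (the system prompt is just the first message in the list you pass; nothing is stored on the LLM's side):
Python
from llama_index.core.llms import ChatMessage
from llama_index.llms.ollama import Ollama

llm = Ollama(model="mistral", request_timeout=180.0)

# This list is exactly what goes over the wire on this call
messages = [
    ChatMessage(role="system", content="You are a helpful assistant. Keep answers short."),
    ChatMessage(role="user", content="What is a context window?"),
]
print(llm.chat(messages).message.content)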
No, the context window is rolling, but the system prompt should be permanent because the LLM is supposed to always follow the instructions of the prompt (shouldn't it?).
However, I was not able to verify this: I set up a network traffic viewer (Wireshark) to see exactly what I'm sending to the LLM with each query, and the system prompt is not part of it.
This is my model and chat engine setup:
Plain Text
from llama_index.core import Settings
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

Settings.embed_model = HuggingFaceEmbedding(model_name = "BAAI/bge-small-en-v1.5")
Settings.llm         = Ollama(model = "mistral", request_timeout = 180.0)

if index is None:
    chat_engine = SimpleChatEngine.from_defaults(
        memory        = chat_memory,
        system_prompt = template
    )
else:
    chat_engine = index.as_chat_engine(
        chat_mode        = "condense_plus_context",
        memory           = chat_memory,
        similarity_top_k = 3,
        system_prompt    = template,
        verbose          = True
    )
As such, even if system_prompt is a permanent part of the context window, it doesn't show up in the network traffic, so I'm looking for another way to access the context window contents directly.
Setting a breakpoint would verify this immediately
Try setting the system prompt to say something like "talk like a pirate in every response" and watch it do that
I'm gonna look these up πŸ‘
I was always under the impression the rolling context window worked like this:
Plain Text
[START]

[sys_prompt]

[query_1]
[response_1]

[query_2]
[response_2]

...

[END]
...and then later in the conversation, once older exchanges have been dropped to stay under the token limit:
Plain Text
[START]

[sys_prompt]

[query_31]
[response_31]

[query_32]
[response_32]

...

[END]
So, as the conversation proceeds, newer exchanges replace older exchanges, but the system prompt is permanently stored.
It's the only way for the model to "remember" its main objective.
I verified that the system prompt is indeed sent with every message by running Wireshark on the Ollama server port instead of my application port ... because duh πŸ€¦β€β™‚οΈ
There doesn't seem to be any way to directly access the contents in the context window, though.
I'll have to play around with the system_prompt or prefix_messages variables.
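Something like this might work for swapping the system prompt mid-conversation: rebuild the engine with a new prompt but the same memory buffer, so the existing history carries over (new_template here is hypothetical):
Python
# Hypothetical replacement prompt, chosen on the fly
new_template = "You are a terse assistant. Answer in one sentence."

# Same ChatMemoryBuffer as before, only the system prompt changes
chat_engine = index.as_chat_engine(
    chat_mode        = "condense_plus_context",
    memory           = chat_memory,
    similarity_top_k = 3,
    system_prompt    = new_template,
    verbose          = True,
)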
The context window is built up from several things:
  • templates
  • prefix messages
  • chat history
  • (in a context chat engine) the retrieved context
All of these are accessed individually
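Rough sketch of poking at them (attribute names can differ between versions, so check your installed source):
Python
# Chat history (the query/response exchanges), exposed by the chat engine
for msg in chat_engine.chat_history:
    print(msg.role, ":", msg.content)

# The memory buffer you passed in: get() returns the token-limited view
# that actually gets sent, get_all() returns everything stored
print(chat_memory.get())

# For a context chat engine, the retrieved context for a turn is on the response
response = chat_engine.chat("What does the document say about context windows?")
for node in response.source_nodes:
    print(node.node.get_content()[:100])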