Context window

This may not be related to LlamaIndex, but does anyone know how to directly access the context window? I do not mean setting the window size via Settings.context_window but accessing the actual contents inside the window, which permanently stores the system prompt and the query-response exchanges up to the token limit.

I want to give the user the option to 1) change the system prompt on the fly depending on the model's responses; and/or 2) manually edit the model's responses to be more to their liking. I figured the most convenient way to do this would be to access the context window contents directly.

If there's no way to do this, then I guess the only recourse is via the chat store or memory buffer, but the system prompt is not stored in either.
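For 2), the closest I can think of is editing the memory buffer directly. A rough sketch of what I mean, assuming a ChatMemoryBuffer (the edit itself is hypothetical):
Python
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.memory import ChatMemoryBuffer

chat_memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

# ... after some exchanges have gone through the chat engine ...

# get_all() returns every stored message (note: no system prompt in here)
messages = chat_memory.get_all()

# Hypothetical edit: replace the last assistant response with a hand-edited one
if messages and messages[-1].role == MessageRole.ASSISTANT:
    messages[-1] = ChatMessage(
        role=MessageRole.ASSISTANT,
        content="(edited) The response text I actually wanted.",
    )
    chat_memory.set(messages)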
21 comments
The context window isn't permanent

Whenever you do llm.complete("text") or llm.chat(<list of chat messages>), that is exactly what gets sent to the llm
So I guess to answer your question, using the llm directly like that gives access
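For example, a minimal sketch with the Ollama model from this thread (the system prompt is just the first message in the list you pass; nothing is stored on the LLM's side):
Python
from llama_index.core.llms import ChatMessage
from llama_index.llms.ollama import Ollama

llm = Ollama(model="mistral", request_timeout=180.0)

# This list is exactly what goes over the wire on this call
messages = [
    ChatMessage(role="system", content="You are a helpful assistant. Keep answers short."),
    ChatMessage(role="user", content="What is a context window?"),
]
print(llm.chat(messages).message.content)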
No, the context window is rolling, but the system prompt should be permanent because the LLM is supposed to always follow the instructions of the prompt (shouldn't it?).
However, I was not able to verify this: I set up a network traffic viewer (Wireshark) to see exactly what I'm sending to the LLM with each query, and the system prompt is not part of it.
This is my model and chat engine setup:
Plain Text
from llama_index.core import Settings
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

Settings.embed_model = HuggingFaceEmbedding(model_name = "BAAI/bge-small-en-v1.5")
Settings.llm         = Ollama(model = "mistral", request_timeout = 180.0)

if index is None:
    chat_engine = SimpleChatEngine.from_defaults(
        memory        = chat_memory,
        system_prompt = template
    )
else:
    chat_engine = index.as_chat_engine(
        chat_mode        = "condense_plus_context",
        memory           = chat_memory,
        similarity_top_k = 3,
        system_prompt    = template,
        verbose          = True
    )
As such, even if system_prompt is a permanent part of the context window, it doesn't show up in the network traffic, so I'm looking for another way to access the context window contents directly.
Setting a breakpoint would verify this immediately
Try setting the system prompt to say something like "talk like a pirate in every response" and watch it do that
I'm gonna look these up πŸ‘
I was always under the impression the rolling context window worked like this:
Plain Text
[START]

[sys_prompt]

[query_1]
[response_1]

[query_2]
[response_2]

...

[END]
...and then later in the conversation, once older exchanges have been dropped to stay under the token limit:
Plain Text
[START]

[sys_prompt]

[query_31]
[response_31]

[query_32]
[response_32]

...

[END]
So, as the conversation proceeds, newer exchanges replace older exchanges, but the system prompt is permanently stored.
It's the only way for the model to "remember" its main objective.
I verified that the system prompt is indeed sent with every message by running Wireshark on the Ollama server port instead of my application port ... because duh πŸ€¦β€β™‚οΈ
There doesn't seem to be any way to directly access the contents in the context window, though.
I'll have to play around with the system_prompt or prefix_messages variables.
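Something like this might work for swapping the system prompt mid-conversation: rebuild the engine with a new prompt but the same memory buffer, so the existing history carries over (new_template here is hypothetical):
Python
# Hypothetical replacement prompt, chosen on the fly
new_template = "You are a terse assistant. Answer in one sentence."

# Same ChatMemoryBuffer as before, only the system prompt changes
chat_engine = index.as_chat_engine(
    chat_mode        = "condense_plus_context",
    memory           = chat_memory,
    similarity_top_k = 3,
    system_prompt    = new_template,
    verbose          = True,
)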
The context window is built up from several things:
  • templates
  • prefix messages
  • chat history
  • (in a context chat engine) the retrieved context
All of these are accessed individually
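Rough sketch of poking at them (attribute names can differ between versions, so check your installed source):
Python
# Chat history (the query/response exchanges), exposed by the chat engine
for msg in chat_engine.chat_history:
    print(msg.role, ":", msg.content)

# The memory buffer you passed in: get() returns the token-limited view
# that actually gets sent, get_all() returns everything stored
print(chat_memory.get())

# For a context chat engine, the retrieved context for a turn is on the response
response = chat_engine.chat("What does the document say about context windows?")
for node in response.source_nodes:
    print(node.node.get_content()[:100])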