Please stand by @praveenks30#120294 as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The context_window is a parameter used in LlamaIndex's language models. It specifies the maximum number of context tokens that the model can handle. This parameter is important because it determines the amount of information the model can consider when generating a response.
For example, in the CleanlabTLM class, the context_window is set to DEFAULT_CONTEXT_WINDOW by default, but it can be overridden by user options. If the user specifies a model, the context_window is adjusted accordingly: if the model is "gpt-4", the context_window is set to 8192, and if the model is "gpt-3.5-turbo-16k", it is set to 16385. If the model is not supported, the context_window falls back to a default value.
In some cases, the context_window is set lower than the model's maximum to leave some wiggle room. For example, in one of the examples the context_window for llama2 is set to 3900, even though the model can handle up to 4096 tokens.
The context_window parameter has replaced the now-deprecated max_input_size parameter in the prompt helper; these parameters are now specified directly on the service_context and node_parser.
You can find more details about context_window in the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai what do we mean by this?
It specifies the maximum number of context tokens that the model can handle
Let me check the knowledge base for you. One moment! :books:
----------------------
In the context of language models, a token is a unit of text that the model reads. A token can be as short as one character or as long as one word. For example, in the sentence "I love AI", there are three tokens: "I", "love", and "AI".
The context_window parameter specifies the maximum number of these tokens that the model can consider when generating a response. This is important because it determines how much context the model has to understand the input and generate appropriate output.
For instance, if the context_window is set to 100, the model can consider up to 100 tokens of the input text. If the input has more than 100 tokens, the model will only consider the last 100 tokens and ignore the rest, which can affect its understanding of the input and the quality of the output.
In other words, the context_window parameter controls how much information from the input the model can use, which can have a significant impact on the model's performance.
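As a rough illustration, you can count tokens yourself to see how much of a text would fit in a given context_window. This sketch assumes the tiktoken package is installed (the tokenizer commonly used for OpenAI-style models), and the 100-token window is just an example:
# Sketch: count tokens to check whether a text fits in an example context_window.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
tokens = encoding.encode("I love AI")
print(len(tokens))                      # number of tokens the model would "see"

context_window = 100                    # example window size
print(len(tokens) <= context_window)    # True if the whole text fits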
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai What's the difference between context_window and max_tokens?
Let me check the knowledge base for you. One moment! :books:
----------------------
The context_window and max_tokens are two different parameters used in LlamaIndex's language models, and they serve different purposes:
context_window: This parameter specifies the maximum number of context tokens that the model can handle. It determines the amount of information from the input that the model can use when generating a response. For instance, if the context_window is set to 100, the model can consider up to 100 tokens of the input text.
max_tokens: This parameter specifies the maximum number of tokens that the model can generate in its response. It controls the length of the output. For example, if max_tokens is set to 50, the model's response will not exceed 50 tokens.
In summary, context_window affects the model's understanding of the input, while max_tokens affects the length of the output.
You can find more details about these parameters in the LlamaIndex documentation.
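A small sketch of where each parameter shows up, assuming the llama-index-llms-openai package is installed and an OpenAI API key is configured:
# Sketch: max_tokens caps the response length; context_window is a property of the model.
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", max_tokens=50)   # responses capped at 50 tokens
print(llm.metadata.context_window)   # total context size the model can handle
print(llm.metadata.num_output)       # tokens reserved for the model's output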
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai What's the default value of the context window for different models? Is there a way to get those values?
Let me check the knowledge base for you. One moment! :books:
----------------------
The default value of the context_window varies by model. For instance, in the CleanlabTLM class, if the model is "gpt-4", the context_window is set to 8192, and if the model is "gpt-3.5-turbo-16k", it is set to 16385. If the model is not supported, the context_window falls back to a default value.
In the Perplexity class, the context_window is determined by the model name. For example, the models "llama-3-sonar-small-32k-chat", "llama-3-sonar-large-32k-chat", "llama-3-8b-instruct", and "llama-3-70b-instruct" have a context_window of 32768, 32768, 8192, and 8192 respectively.
In the IpexLLM, LocalTensorRTLLM, AnthropicMultiModal, and TextGenerationInference classes, the context_window is set to DEFAULT_CONTEXT_WINDOW.
In the HuggingFaceLLM class, the context_window is set from the model's max_position_embeddings value if available; otherwise the provided context_window value is used.
In the EverlyAI class, the context_window is determined by the everlyai_modelname_to_contextsize function.
To get these values, you can refer to the respective classes in the LlamaIndex documentation.
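In practice, you can also read the value off an instantiated LLM through its metadata. A minimal sketch, assuming the llama-index-llms-openai package is installed (other LLM integrations expose the same .metadata attribute):
# Sketch: read the context_window from an instantiated LLM's metadata.
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-16k")
meta = llm.metadata
print(meta.model_name, meta.context_window)   # e.g. gpt-3.5-turbo-16k 16385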
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai could you please define chunk_size and chunk_overlap as well?
Let me check the knowledge base for you. One moment! :books:
----------------------
Sure, chunk_size and chunk_overlap are parameters used in LlamaIndex's node parsers, specifically in the TokenTextSplitter and SentenceSplitter classes. They control how text is split into chunks for processing.
chunk_size: This parameter specifies the maximum number of tokens in each chunk. When documents are ingested into an index, they are split into chunks of this size. For example, if chunk_size is set to 1024, each chunk of text will contain up to 1024 tokens. The default chunk size is typically 1024, but it can be customized based on the type of data you are indexing or the results from your retrieval.
chunk_overlap: This parameter specifies the number of tokens that overlap between consecutive chunks. The overlap helps ensure that context is not lost at chunk boundaries. For example, if chunk_overlap is set to 20, each chunk shares its last 20 tokens with the start of the next chunk. The default chunk overlap is typically 20, but it can also be customized.
Changing either of these parameters changes the embeddings that are calculated. A smaller chunk size makes the embeddings more precise, while a larger chunk size makes them more general, but they can miss fine-grained details.
Here is an example of how to set these parameters globally:
from llama_index.core import Settings

# Global defaults picked up by node parsers and index construction
Settings.chunk_size = 512
Settings.chunk_overlap = 50
You can find more details about these parameters in the LlamaIndex documentation.
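If you prefer to configure chunking on a specific node parser rather than globally, here is a minimal sketch assuming llama-index-core is installed (the document text is a placeholder):
# Sketch: configure chunking on a specific splitter instead of the global Settings.
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(
    [Document(text="... your long text here ...")]
)
print(len(nodes))   # number of chunks produced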
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai what do we mean by this error?
ValueError: Calculated available context size -234 was not non-negative.
Let me check the knowledge base for you. One moment! :books:
----------------------
The error message "ValueError: Calculated available context size -234 was not non-negative." typically indicates that the size of the input text (in tokens) has exceeded the context_window size of the model.
In LlamaIndex, the context_window parameter specifies the maximum number of tokens from the input text that the model can handle. If the input text is too long and exceeds this limit, the calculated available context size (the context_window minus the number of tokens in the input) becomes negative, which raises this error.
To resolve the issue, you can reduce the size of your input text so that it fits within the model's context_window, or, if possible, use a model with a larger context_window.
You can find more details about the context_window parameter in the LlamaIndex documentation.
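As a rough sketch of the arithmetic behind the message (the exact bookkeeping is done internally by LlamaIndex's prompt helper, which also reserves room for the model's output; the numbers here are only illustrative):
# Sketch: illustrative arithmetic behind the error (numbers are made up).
context_window = 4096    # what the model can handle in total
prompt_tokens = 4100     # tokens already consumed by the prompt and retrieved context
num_output = 230         # tokens reserved for the model's answer

available = context_window - prompt_tokens - num_output
print(available)         # -234 -> negative, so LlamaIndex raises the ValueError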
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai are you sure context_window is only for input tokens, or is it input + output tokens?
Let me check the knowledge base for you. One moment! :books:
----------------------
The context_window parameter in LlamaIndex's language models refers to the total number of tokens that the model can handle, which includes both the input tokens and the output tokens.
This means that the sum of the tokens in the input text and the tokens that the model generates as output should not exceed the context_window. If the input text is too long, it might leave little to no room for the model to generate output, which can lead to issues.
For example, if the context_window is set to 4096 and the input text is 4000 tokens long, the model can only generate up to 96 tokens of output. If the model tries to generate more than 96 tokens, it would exceed the context_window, which could lead to errors.
You can find more details about the context_window parameter in the LlamaIndex documentation.
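A minimal sketch of reserving part of the window for the output. It assumes a recent llama-index-core where Settings exposes context_window and num_output; the numbers are illustrative:
# Sketch: split the window budget between input and output (illustrative numbers).
from llama_index.core import Settings

Settings.context_window = 4096   # total budget shared by input and output
Settings.num_output = 256        # keep roughly 256 tokens free for the response
# Context packed into the prompt is then limited to about 4096 - 256 = 3840 tokens,
# minus whatever the prompt template itself consumes.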
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------