Please stand by @praveenks30#120294 as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The context_window is a parameter used in LlamaIndex's language models. It specifies the maximum number of context tokens that the model can handle. This parameter is important because it determines the amount of information the model can consider when generating a response.
For example, in the CleanlabTLM class, the context_window is set to DEFAULT_CONTEXT_WINDOW by default, but it can be overridden by user options. If the user specifies a model, the context_window is adjusted accordingly: if the model is "gpt-4", the context_window is set to 8192, and if the model is "gpt-3.5-turbo-16k", it is set to 16385. If the model is not supported, the context_window falls back to a default value.
In some cases, the context_window is set lower than the model's maximum to leave some wiggle room. For example, in one of the examples the context_window for llama2 is set to 3900, even though the model can handle up to 4096 tokens.
The context_window parameter has replaced the now-deprecated max_input_size parameter in the prompt helper; these parameters are now specified directly on the service_context and node_parser.
You can find more details about context_window in the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai what do we mean by this?
It specifies the maximum number of context tokens that the model can handle
Let me check the knowledge base for you. One moment! :books:
----------------------
In the context of language models, a token is a unit of text that the model reads. A token can be as short as one character or as long as one word. For example, in the sentence "I love AI", there are three tokens: "I", "love", and "AI".
The context_window parameter specifies the maximum number of these tokens that the model can consider when generating a response. This is important because it determines how much context the model has to understand the input and generate appropriate output.
For instance, if the context_window is set to 100, the model can consider up to 100 tokens of the input text. If the input has more than 100 tokens, the model will only consider the last 100 tokens and ignore the rest, which can affect its understanding of the input and the quality of the output.
In other words, the context_window parameter controls how much information from the input the model can use, which can have a significant impact on the model's performance.
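As a rough illustration, you can count tokens yourself to see how much of a text would fit in a given context_window. This sketch assumes the tiktoken package is installed (the tokenizer commonly used for OpenAI-style models), and the 100-token window is just an example:
# Sketch: count tokens to check whether a text fits in an example context_window.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
tokens = encoding.encode("I love AI")
print(len(tokens))                      # number of tokens the model would "see"

context_window = 100                    # example window size
print(len(tokens) <= context_window)    # True if the whole text fits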
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai What's the difference between context_window and max_tokens?
Let me check the knowledge base for you. One moment! :books:
----------------------
The context_window and max_tokens are two different parameters used in LlamaIndex's language models, and they serve different purposes:
context_window: This parameter specifies the maximum number of context tokens that the model can handle. It determines the amount of information from the input that the model can use when generating a response. For instance, if the context_window is set to 100, the model can consider up to 100 tokens of the input text.
max_tokens: This parameter specifies the maximum number of tokens that the model can generate in its response. It controls the length of the output. For example, if max_tokens is set to 50, the model's response will not exceed 50 tokens.
In summary, context_window affects the model's understanding of the input, while max_tokens affects the length of the output.
You can find more details about these parameters in the LlamaIndex documentation.
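A small sketch of where each parameter shows up, assuming the llama-index-llms-openai package is installed and an OpenAI API key is configured:
# Sketch: max_tokens caps the response length; context_window is a property of the model.
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", max_tokens=50)   # responses capped at 50 tokens
print(llm.metadata.context_window)   # total context size the model can handle
print(llm.metadata.num_output)       # tokens reserved for the model's output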
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai What's the default value of the context window for different models? Is there a way to get those values?
Let me check the knowledge base for you. One moment! :books:
----------------------
The default value of the context_window varies by model. For instance, in the CleanlabTLM class, if the model is "gpt-4", the context_window is set to 8192, and if the model is "gpt-3.5-turbo-16k", it is set to 16385. If the model is not supported, the context_window falls back to a default value.
In the Perplexity class, the context_window is determined by the model name. For example, the models "llama-3-sonar-small-32k-chat", "llama-3-sonar-large-32k-chat", "llama-3-8b-instruct", and "llama-3-70b-instruct" have a context_window of 32768, 32768, 8192, and 8192 respectively.
In the IpexLLM, LocalTensorRTLLM, AnthropicMultiModal, and TextGenerationInference classes, the context_window is set to DEFAULT_CONTEXT_WINDOW.
In the HuggingFaceLLM class, the context_window is set from the model's max_position_embeddings value if available; otherwise the provided context_window value is used.
In the EverlyAI class, the context_window is determined by the everlyai_modelname_to_contextsize function.
To get these values, you can refer to the respective classes in the LlamaIndex documentation.
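In practice, you can also read the value off an instantiated LLM through its metadata. A minimal sketch, assuming the llama-index-llms-openai package is installed (other LLM integrations expose the same .metadata attribute):
# Sketch: read the context_window from an instantiated LLM's metadata.
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-16k")
meta = llm.metadata
print(meta.model_name, meta.context_window)   # e.g. gpt-3.5-turbo-16k 16385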
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai could you please define chunk_size and chunk_overlap as well?
Let me check the knowledge base for you. One moment! :books:
----------------------
Sure, chunk_size and chunk_overlap are parameters used in LlamaIndex's node parsers, specifically in the TokenTextSplitter and SentenceSplitter classes. They control how text is split into chunks for processing.
chunk_size: This parameter specifies the maximum number of tokens in each chunk. When documents are ingested into an index, they are split into chunks of this size. For example, if chunk_size is set to 1024, each chunk of text will contain up to 1024 tokens. The default chunk size is typically 1024, but it can be customized based on the type of data you are indexing or the results from your retrieval.
chunk_overlap: This parameter specifies the number of tokens that overlap between consecutive chunks. The overlap helps ensure that context is not lost at chunk boundaries. For example, if chunk_overlap is set to 20, each chunk shares its last 20 tokens with the start of the next chunk. The default chunk overlap is typically 20, but it can also be customized.
Changing either of these parameters changes the embeddings that are calculated. A smaller chunk size makes the embeddings more precise, while a larger chunk size makes them more general, but they can miss fine-grained details.
Here is an example of how to set these parameters globally:
from llama_index.core import Settings

# Global defaults picked up by node parsers and index construction
Settings.chunk_size = 512
Settings.chunk_overlap = 50
You can find more details about these parameters in the LlamaIndex documentation.
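If you prefer to configure chunking on a specific node parser rather than globally, here is a minimal sketch assuming llama-index-core is installed (the document text is a placeholder):
# Sketch: configure chunking on a specific splitter instead of the global Settings.
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(
    [Document(text="... your long text here ...")]
)
print(len(nodes))   # number of chunks produced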
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai what do we mean by this error?
ValueError: Calculated available context size -234 was not non-negative.
Let me check the knowledge base for you. One moment! :books:
----------------------
The error message "ValueError: Calculated available context size -234 was not non-negative." typically indicates that the size of the input text (in tokens) has exceeded the context_window size of the model.
In LlamaIndex, the context_window parameter specifies the maximum number of tokens from the input text that the model can handle. If the input text is too long and exceeds this limit, the calculated available context size (the context_window minus the number of tokens in the input) becomes negative, which raises this error.
To resolve the issue, you can reduce the size of your input text so that it fits within the model's context_window, or, if possible, use a model with a larger context_window.
You can find more details about the context_window parameter in the LlamaIndex documentation.
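As a rough sketch of the arithmetic behind the message (the exact bookkeeping is done internally by LlamaIndex's prompt helper, which also reserves room for the model's output; the numbers here are only illustrative):
# Sketch: illustrative arithmetic behind the error (numbers are made up).
context_window = 4096    # what the model can handle in total
prompt_tokens = 4100     # tokens already consumed by the prompt and retrieved context
num_output = 230         # tokens reserved for the model's answer

available = context_window - prompt_tokens - num_output
print(available)         # -234 -> negative, so LlamaIndex raises the ValueError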
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai are you sure context_window is only for input tokens, or is it input + output tokens?
Let me check the knowledge base for you. One moment! :books:
----------------------
The context_window parameter in LlamaIndex's language models refers to the total number of tokens that the model can handle, which includes both the input tokens and the output tokens.
This means that the sum of the tokens in the input text and the tokens that the model generates as output should not exceed the context_window. If the input text is too long, it might leave little to no room for the model to generate output, which can lead to issues.
For example, if the context_window is set to 4096 and the input text is 4000 tokens long, the model can only generate up to 96 tokens of output. If the model tries to generate more than 96 tokens, it would exceed the context_window, which could lead to errors.
You can find more details about the context_window parameter in the LlamaIndex documentation.
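A minimal sketch of reserving part of the window for the output. It assumes a recent llama-index-core where Settings exposes context_window and num_output; the numbers are illustrative:
# Sketch: split the window budget between input and output (illustrative numbers).
from llama_index.core import Settings

Settings.context_window = 4096   # total budget shared by input and output
Settings.num_output = 256        # keep roughly 256 tokens free for the response
# Context packed into the prompt is then limited to about 4096 - 256 = 3840 tokens,
# minus whatever the prompt template itself consumes.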
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------