The community members are discussing the configuration of the "chunk_size", "chunk_overlap", "context_window", and "num_output" fields in the llamaindex library. A community member asks for advice on how to set these values, and the other community members provide the following insights:
The defaults for "chunk_size" (1024) and "chunk_overlap" (20) are generally good, and the community member is advised not to change them unless they have a specific reason to do so. The "context_window" is set automatically when using an OpenAI language model, and it represents the maximum input size for the model. The "num_output" field determines the maximum number of tokens the language model will generate, with the default for OpenAI models being 256. The community members explain that this parameter does not actually limit the model's output, but rather affects the amount of "space" reserved for the output when computing the available context window size.
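To make this concrete, here is a minimal sketch (not from the thread) of where these four fields are typically passed, assuming the legacy ServiceContext API that the thread appears to be using; the model name and the numbers are just the defaults mentioned above, and newer LlamaIndex releases expose the same knobs through the global Settings object instead.

```python
# Sketch only: legacy LlamaIndex ServiceContext-style configuration.
# Values are the defaults discussed in the thread; adjust for your own model.
from llama_index import ServiceContext
from llama_index.llms import OpenAI

service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", max_tokens=256),  # max_tokens corresponds to num_output
    chunk_size=1024,       # tokens per Node chunk when splitting documents
    chunk_overlap=20,      # tokens shared between adjacent chunks
    context_window=4096,   # normally inferred from the OpenAI model's metadata
    num_output=256,        # tokens reserved for the model's answer
)
```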
One community member provides a more detailed explanation, stating that language models are primarily decoder models, which means they generate one token at a time and add it to the input. This means that if the desired maximum output is 256 tokens, the input must have enough space reserved for those 256 tokens.
The community members seem to have provided a helpful explanation to the original poster.
Hi all, can people who have experience with llamaindex help me figure out how you decide on the "chunk_size" and "chunk_overlap" field values? And similarly, the "context_window" and "num_output" fields in prompt_helper?
I know how to change num_output but I don't understand what it signifies. The documentation says: "The number of maximum output from the LLM. Typically we set this automatically given the model metadata. This parameter does not actually limit the model output, it affects the amount of “space” we save for the output, when computing available context window size for packing text from retrieved Nodes." Why are we saving space for the output? How can the output from the LLM for a query affect the context window? Doesn't the context window mean setting aside space for the past records?
If you changed to some custom LLM that has a smaller/larger context window, then this likely needs to be set manually for things to work smoothly. I can't really think of a good reason to change chunk overlap besides just experimenting lol
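As a hedged illustration of that point, a PromptHelper can be built by hand for a hypothetical custom LLM with a smaller context window (the numbers below are made up for the example, not taken from the thread):

```python
# Sketch: manually sizing the prompt budget for a custom, smaller LLM.
from llama_index import PromptHelper, ServiceContext

prompt_helper = PromptHelper(
    context_window=2048,      # the custom model's maximum context size (illustrative)
    num_output=256,           # space reserved for the generated answer
    chunk_overlap_ratio=0.1,  # overlap used when packing retrieved text
)

service_context = ServiceContext.from_defaults(prompt_helper=prompt_helper)
```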
LLMs are almost entirely decoder models. What this means is that they generate one token at a time, add that token to the input, and generate the next
This means that if we want to generate at most 256 tokens, then we have to leave room in the input for 256 tokens.
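A quick back-of-the-envelope calculation (illustrative numbers, not from the thread) shows why that reservation shrinks the room left for retrieved text:

```python
# Rough token budget for a 4096-token model with the default num_output of 256.
context_window = 4096   # total tokens the model can attend to, input plus output
num_output = 256        # tokens kept free so the answer fits inside the window
prompt_overhead = 100   # hypothetical allowance for the query and prompt template

available_for_nodes = context_window - num_output - prompt_overhead
print(available_for_nodes)  # 3740 tokens left for packing retrieved Node text
```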
The only models that break this pattern are Google FLAN-T5 models, which are encoder-decoder instead of decoder. In this case, the input/output are completely disconnected (although this architecture doesn't seem to be catching on for LLMs -- it's very resource heavy)