Hi all, can people who have experience with LlamaIndex help me figure out how you guys decide on the "chunk_size" and "chunk_overlap" field values? And similarly, the "context_window" and "num_output" fields in prompt_helper?
The defaults for chunk size and chunk overlap (1024 and 20) are pretty good; I wouldn't touch them tbh unless you had a good reason to.

context window gets set automatically if you are using an openai LLM. If not, it's just the max input size to your LLM

num_output is how many tokens the LLM will output. By default, openai has a max of 256. You can read more on how to change this here: https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/llms/usage_custom.html#example-explicitly-configure-context-window-and-num-output
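For reference, here's a minimal sketch of what that configuration looked like with the ServiceContext API the linked docs describe (import paths and defaults have since moved in newer LlamaIndex releases, so treat the exact names as illustrative rather than canonical):

```python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import OpenAI

# Total tokens the LLM can handle (prompt + generated output).
context_window = 4096
# Tokens reserved for the LLM's answer.
num_output = 256

# max_tokens on the LLM should match num_output so the model is actually
# allowed to use the space reserved for it.
llm = OpenAI(model="gpt-3.5-turbo", temperature=0, max_tokens=num_output)

service_context = ServiceContext.from_defaults(
    llm=llm,
    context_window=context_window,
    num_output=num_output,
    chunk_size=1024,   # the default chunk size mentioned above
    chunk_overlap=20,  # the default chunk overlap mentioned above
)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```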
Thanks! A few follow-up questions: you said "I wouldn't touch them tbh unless you had a good reason to" - what are examples of some good reasons, Logan?
I know how to change num_output but I don't understand what it signifies. The documentation says: "The number of maximum output from the LLM. Typically we set this automatically given the model metadata. This parameter does not actually limit the model output, it affects the amount of “space” we save for the output, when computing available context window size for packing text from retrieved Nodes." Why are we saving space for the output? How can the output from the LLM for a query affect the context window - doesn't the context window mean setting the space for the past records?
If you changed to some custom LLM that has a smaller/larger context window, then this likely needs to be set manually for things to work smoothly. I can't really think of a good reason to change chunk overlap besides just experimenting lol
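As an example of that "custom LLM" case, here's a hedged sketch using the HuggingFaceLLM wrapper (the model name is just a placeholder). The point is that for a non-OpenAI model you declare the context window and output budget yourself instead of having them inferred from model metadata:

```python
from llama_index import ServiceContext
from llama_index.llms import HuggingFaceLLM

# Placeholder local model -- swap in whatever you're actually running.
llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    context_window=2048,  # the model's real max input size
    max_new_tokens=256,   # how many tokens it may generate per call
)

service_context = ServiceContext.from_defaults(llm=llm)
```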
So here's a longer explanation:
LLMs are almost entirely decoder models. What this means is that they generate one token at a time, add that token to the input, and generate the next

This means that if we want to generate at most 256 tokens, then we have to leave room in the input for 256 tokens.

The only models that break this pattern are Google's FLAN-T5 models, which are encoder-decoder instead of decoder-only. In that case, the input and output are completely disconnected (although this architecture doesn't seem to be catching on for LLMs -- it's very resource heavy)
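Put another way, the budget for packing retrieved text is roughly the context window minus the reserved output and the prompt/query text. A back-of-the-envelope sketch (the prompt_overhead number is made up, and the real PromptHelper accounting is more detailed):

```python
# Rough token-budget arithmetic, illustrative numbers only.
context_window = 4096   # total tokens the model can see, input + output combined
num_output = 256        # tokens reserved so the answer has room to be generated
prompt_overhead = 200   # hypothetical: prompt template + the query itself

available_for_nodes = context_window - num_output - prompt_overhead
print(available_for_nodes)          # 3640 tokens left for retrieved text
print(available_for_nodes // 1024)  # ~3 full default-sized chunks per LLM call
```

So the bigger num_output is, the less room each call has for retrieved chunks -- which is why the space gets reserved up front even though it doesn't cap the model's actual output.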
Wow! Got it! I had no idea!
You are god sent Logan!
Happy to help! 💪 :dotsCATJAM: