Hi all, can people with LlamaIndex experience help me figure out how you decide on the "chunk_size" and "chunk_overlap" values? And similarly the "context_window" and "num_output" fields in prompt_helper?
I know how to change num_output but I don't understand what it signifies. The documentation says: "The number of maximum output from the LLM. Typically we set this automatically given the model metadata. This parameter does not actually limit the model output, it affects the amount of “space” we save for the output, when computing available context window size for packing text from retrieved Nodes." - Why are we saving space for the output? How can the output from the LLM for a query affect the context window - doesn't the context window just mean the space for the input/history?
If you switch to a custom LLM with a smaller/larger context window, then this likely needs to be set manually for things to work smoothly. I can't really think of a good reason to change chunk overlap besides just experimenting lol
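For example, here's a minimal sketch of setting those values manually, assuming the older llama_index API with ServiceContext/PromptHelper (newer versions use a global Settings object instead; the model name and numbers below are just placeholders):

```python
from llama_index import ServiceContext, PromptHelper
from llama_index.llms import OpenAI  # placeholder LLM, swap in your custom one

# Tell LlamaIndex how big the model's context window is and how many
# tokens to reserve for the generated answer.
prompt_helper = PromptHelper(
    context_window=2048,      # total tokens the custom LLM can see at once
    num_output=256,           # tokens reserved for the model's answer
    chunk_overlap_ratio=0.1,  # overlap used when text has to be re-split to fit
)

service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo"),  # placeholder; use your own LLM here
    prompt_helper=prompt_helper,
    chunk_size=512,      # size of each node's text chunk at index time
    chunk_overlap=20,    # tokens shared between adjacent chunks
)
```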
LLMs are almost entirely decoder-only models. What this means is that they generate one token at a time, add that token to the input, and generate the next.
This means that if we want to generate at most 256 tokens (num_output), then we have to leave room in the context window for those 256 tokens.
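To make that concrete, here's a rough sketch of the budgeting arithmetic (not the library's exact internals; the token counts are made up):

```python
context_window = 4096          # total tokens the model can attend to
num_output = 256               # space reserved for the answer the model will generate
prompt_template_tokens = 100   # system prompt + query wrapper (made-up number)
query_tokens = 30              # the user's question (made-up number)

# Tokens left over for packing text from retrieved nodes into the prompt.
available_for_nodes = context_window - num_output - prompt_template_tokens - query_tokens
print(available_for_nodes)  # 3710 tokens of retrieved context fit in this call
```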
The only models that break this pattern are Google's FLAN-T5 models, which are encoder-decoder instead of decoder-only. In that case, the input and output are completely disconnected (although this architecture doesn't seem to be catching on for LLMs -- it's very resource heavy)