No worries! Which part is confusing to you?
what does "input" refer to in "max_input_size"?
the input to the model -> GPT is a decoder-based model, and it has a limited context window.
Basically, this means there's an absolute cap on how much can be fed into the model, max_input_size (which for GPT-3 and GPT-3.5 is 4096 tokens)
It being a decoder model is important, because it predicts one token at a time until a special stop token is predicted.
After each token is predicted, it is added to the input and the next token is predicted
So technically, it will keep predicting until the special stop token, or until the input becomes greater than max_input_size
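Just to make that loop concrete, here's a rough Python sketch of what a decoder-only model does at generation time (the `predict_next_token` function and the stop-token value are stand-ins for illustration, not real API calls):

```python
MAX_INPUT_SIZE = 4096
STOP_TOKEN = "<|endoftext|>"

def generate(prompt_tokens, predict_next_token):
    tokens = list(prompt_tokens)
    while True:
        next_token = predict_next_token(tokens)  # model looks at everything so far
        if next_token == STOP_TOKEN:
            break
        tokens.append(next_token)                # generated token is fed back in as input
        if len(tokens) >= MAX_INPUT_SIZE:        # context window is full, have to stop
            break
    return tokens
```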
I highly recommend this blog post for better details, a lot of this is specific to the architecture of GPT:
https://jalammar.github.io/illustrated-gpt2/
It talks about GPT-2, but GPT-3 is basically the same, just bigger
Other models are a little different, because they use encoder/decoder architectures instead (like Google's FLAN-T5)
It's my understanding that the GPT models have only a single token limit, and it applies to the total tokens in both the prompt and the completion
And for 3.5, that limit is 4096.
It doesn't make sense to describe this token limit as the "max input"
Unless I'm misunderstanding something
It's described as a "max input size" because 4096 is the max - if you try to input 4097 tokens into the model, you'll get an error
So back to the calculation
max_input_size=4096
prompt_tokens=200 (a guess)
num_output=256 (the maximum number of expected output tokens)
chunk_size=4096-200-256
Now, the model might not use all 256 output tokens that we left room for, but we can't know this ahead of time, so we've left space for them. Remember, each token generated is then added back into the input. So that's why we need to leave that "space"
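If it helps, here's that arithmetic written out as code (prompt_tokens is just a guess, like above):

```python
max_input_size = 4096  # hard cap for GPT-3 / GPT-3.5
prompt_tokens = 200    # estimated size of the prompt template (a guess)
num_output = 256       # space reserved for the generated tokens

# whatever is left over is the room for the actual chunk text
chunk_size = max_input_size - prompt_tokens - num_output
print(chunk_size)  # 3640
```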
Maybe this is going in circles though lol, that's about the best I can explain it
the word "input" was really throwing me off
No worries! I hope it's a little more clear!
Most chat models use this same architecture, so it should be the same idea for most models. The only one that's slightly different (that I can think of) is Google's FLAN, but maybe don't worry about that unless you use that model lol
as a follow-up - shouldn't we be subtracting padding * num_chunks on line 112?
since the padding is the space between chunks, num_chunks * padding gives us the aggregated padding amount
Since the function is calculating the size of a single chunk, no need to worry about how many other chunks there are
If LlamaIndex ends up creating 10 chunks, and the padding is one, that will be accumulated across all chunks like you said
hmm ok, so it takes "num_chunks" as a parameter just to fuck with me, ey
because it's always going to be 1, is what you're saying I think
Lol yea! Looking at it closer, we also divide by num_chunks before we subtract the padding
After that division, it basically turns into "size per 1 chunk", and so we subtract the padding given a single chunk
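So the logic is roughly this (the names here are assumed for illustration, not copied from the LlamaIndex source):

```python
def get_chunk_size(max_input_size, num_prompt_tokens, num_output,
                   num_chunks, padding=1):
    # total space left after the prompt and the reserved output tokens
    available = max_input_size - num_prompt_tokens - num_output
    # divide across chunks first, *then* subtract padding,
    # so the padding is applied once per chunk rather than once total
    return available // num_chunks - padding

# e.g. 10 chunks: each chunk gives up 1 token to padding, so 10 tokens total
print(get_chunk_size(4096, 200, 256, num_chunks=10, padding=1))  # 363
```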