
At a glance

The community member asks what happens when the content they want to embed exceeds the embedding model's max token limit. The kapa.ai bot suggests that the model will only consider the first max_length tokens and ignore the rest. The comments indicate that some models truncate and only calculate embeddings for the first X tokens, while others error out; no definitive answer is given.

Hi everyone. There is a max token limit for every embedding model. So if the size of the content I want to embed exceeds that token limit, what will happen? The kapa bot says that the model will only consider the first max_length tokens and ignore the rest. Is that the correct answer?
2 comments
If you are ingesting data via LlamaIndex, it will chunk your content into 1024-token pieces by default.

If the embedding model you have chosen has an even smaller limit, you'll have to chunk it into smaller pieces (see the sketch below).

Regarding whether it truncates past the max token limit, I'm not totally sure, but I think it does.
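
A minimal sketch of lowering that default, assuming llama-index >= 0.10 and its global `Settings` object; the 512 value is illustrative, not from this thread:

```python
# Hedged sketch: shrink LlamaIndex's default 1024-token chunks when the
# embedding model's max input length is smaller. Assumes llama-index >= 0.10.
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter

# 512 is an illustrative chunk size; pick one below your model's token limit.
Settings.text_splitter = SentenceSplitter(chunk_size=512, chunk_overlap=20)
```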
Some will "truncate" and only calculate embeddings for the first X tokens; others will error out.
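
If your model is in the error-out camp, one workaround is to truncate the input yourself before embedding. A hedged sketch, assuming the tiktoken package and an 8192-token limit (the documented limit for OpenAI's text-embedding models); both are assumptions to adapt to your model:

```python
# Hedged sketch: pre-truncate text so an embedding call never exceeds the
# model's token limit. Assumes the tiktoken package; 8192 is the documented
# limit for OpenAI's text-embedding models and is an assumption here.
import tiktoken

MAX_TOKENS = 8192  # adjust to your embedding model's documented limit

def truncate_to_limit(text: str, model: str = "text-embedding-3-small") -> str:
    """Keep only the first MAX_TOKENS tokens, mirroring what truncating models do."""
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    return text if len(tokens) <= MAX_TOKENS else enc.decode(tokens[:MAX_TOKENS])
```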