So the bge embedding models are the leading open-source embedding models; however, they have a 512-token context window. I can see that the default chunk size in LlamaIndex is 1024. I presume it is just taking the first 512 tokens when embedding a 1024-token chunk with such a model.

Shouldn't the upper bound of the chunk size be equal to the max_position_embeddings of the embedding model?
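For context, one way to avoid that silent truncation is to set the chunk size explicitly when configuring the embedding model. A minimal sketch, assuming a recent LlamaIndex release with the global `Settings` API and the HuggingFace embedding integration installed (the model name and overlap value are only illustrative):

```python
# Pin the chunk size to the bge model's 512-token window instead of relying
# on the 1024 default. Assumes llama-index-embeddings-huggingface is installed.
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# bge models have a 512-token max sequence length
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.chunk_size = 512        # keep chunks within the embedding window
Settings.chunk_overlap = 50      # overlap value chosen just for illustration

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
```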
3 comments
Yeah, I think it will truncate if the chunk size exceeds the max context window for the embedding model.
Yeah, that's a good suggestion. Right now the default setting is not dynamic based on the model selected, and the settings are mostly optimized for OpenAI embeddings/models.
Sweet. Maybe a warning would be sufficient so the user knows that there is a mismatch between the chunk size and the embedding model.
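As an illustration of that kind of warning, one could compare the configured chunk size against the model's max_position_embeddings and warn when the chunk size is larger. A rough sketch, assuming transformers is available to read the model config (the helper function is hypothetical, not part of LlamaIndex):

```python
# Warn on a mismatch between the configured chunk size and the embedding
# model's max sequence length, as suggested above. Hypothetical helper.
import warnings
from transformers import AutoConfig

def warn_on_chunk_mismatch(chunk_size: int, model_name: str) -> None:
    config = AutoConfig.from_pretrained(model_name)
    max_len = getattr(config, "max_position_embeddings", None)
    if max_len is not None and chunk_size > max_len:
        warnings.warn(
            f"chunk_size={chunk_size} exceeds {model_name}'s max sequence "
            f"length ({max_len}); chunks will be truncated when embedded."
        )

warn_on_chunk_mismatch(1024, "BAAI/bge-small-en-v1.5")  # would emit a warning
```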