So the BGE embedding models are the leading open-source embedding models; however, they have a 512-token context window. I can see that the default chunk size in LlamaIndex is 1024. I presume that only the first 512 tokens are actually embedded when a 1024-token chunk is passed to such a model.
Shouldn't the upper bound of the chunk size be equal to the max_position_embeddings of the embedding model?
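For reference, that limit can be read straight from the model config. A minimal sketch, assuming transformers is installed and using "BAAI/bge-base-en-v1.5" purely as an example BGE checkpoint:

```python
# Sketch: derive the chunk-size cap from the embedding model's own config.
# Assumes the model is on the Hugging Face Hub; BGE models are BERT-based,
# so max_position_embeddings is 512.
from transformers import AutoConfig

embed_model_name = "BAAI/bge-base-en-v1.5"
config = AutoConfig.from_pretrained(embed_model_name)

# Tokens beyond this limit get truncated at embed time.
max_tokens = config.max_position_embeddings
print(f"{embed_model_name} supports at most {max_tokens} positions")

# Leave a little headroom for special tokens like [CLS]/[SEP].
chunk_size_cap = max_tokens - 2
```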
Yeah, that's a good suggestion. Right now the default chunk size isn't dynamic based on the selected model; the defaults are mostly tuned for OpenAI embedding models.
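In the meantime, one workaround is to set the chunk size explicitly to match the embedding model. A minimal sketch, assuming a recent LlamaIndex install (llama-index-core plus the llama-index-embeddings-huggingface package); on older releases the same settings went through ServiceContext.from_defaults(chunk_size=...), so adjust imports to your version:

```python
# Workaround sketch: pin chunk_size to the BGE limit instead of the 1024 default.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
Settings.chunk_size = 512      # match the model's 512-token window
Settings.chunk_overlap = 50    # optional; keep it well under chunk_size

# Chunks produced here now fit inside the embedding model's context window.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
```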