It will do rather large chunks at index time (4000 tokens I think, with some overlap), and then break them up again at query time to make sure they fit into the prompt
You can also pass chunk_size_limit during index construction to manually control the size of each chunk
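Something like this, as a minimal sketch, assuming an older LlamaIndex / GPT Index release where chunk_size_limit is accepted directly at index construction (newer versions moved chunk settings elsewhere, so check your installed version):

```python
# Minimal sketch: smaller chunks for dense data.
# Assumes an older LlamaIndex / GPT Index API where chunk_size_limit
# can be passed straight to the index constructor.
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Load documents from a local folder (path is just an example)
documents = SimpleDirectoryReader("data").load_data()

# Override the default (large) chunk size with something smaller
index = GPTSimpleVectorIndex(documents, chunk_size_limit=512)

# Query as usual; chunks are already sized to fit the prompt
response = index.query("What does the document say about X?")
print(response)
```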
Ah, chunk_size_limit sounds familiar! I think that's what I did last time - the data I'm working with is kind of dense, so smaller chunks end up getting me better results. Thank you very much for your help!