The community members are running into an issue with the LlamaIndex library when loading a PDF file into a Postgres database using the Mistral embedding model: they get an error about exceeding the token limit. They tried splitting the document into pages and using the TokenTextSplitter, but the only "solution" they found was to lower the insert_batch_size parameter, which they believe should only affect the database, not the embedding model.
In the comments, a community member asks which embedding model class is being used, and the original poster responds that they are using MistralAIEmbedding from llama_index.embeddings.mistralai. Another community member suggests that the issue may be related to the chunk size being too large, and recommends setting embed_batch_size to 20 or a similar value on the embedding model (see the sketch below).
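A minimal sketch of that suggestion, assuming the llama-index-embeddings-mistralai package is installed and the API key lives in a MISTRAL_API_KEY environment variable (the env var name and the exact batch value are illustrative, not confirmed by the thread):

```python
import os

from llama_index.core import Settings
from llama_index.embeddings.mistralai import MistralAIEmbedding

# embed_batch_size caps how many text chunks are sent to the Mistral
# embeddings endpoint in a single request, so it directly limits the
# tokens-per-request that the error complains about.
Settings.embed_model = MistralAIEmbedding(
    model_name="mistral-embed",
    api_key=os.environ["MISTRAL_API_KEY"],  # assumed env var, not from the thread
    embed_batch_size=20,  # the value suggested in the comments
)
```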
Hello, we have a small problem with LlamaIndex: when we try to load a PDF file into a database (Postgres on Neon) with Mistral's embedding model, we get an error message about going over the token limit. We tried splitting the document page by page and using the TokenTextSplitter, with no success. The only "solution" was to lower the insert_batch_size parameter (to at most 21), but that should only affect the database, not the embedding model, right?
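For context, here is a hedged end-to-end sketch of the ingestion path the question describes. insert_batch_size on VectorStoreIndex controls how many nodes are processed per ingestion pass (a vector-store concern), while embed_batch_size on the embedding model controls how many chunks go into each Mistral embeddings request (the knob that actually governs tokens per API call). All connection details, file names, and sizes below are placeholders, not values from the thread:

```python
import os

from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.core.node_parser import TokenTextSplitter
from llama_index.embeddings.mistralai import MistralAIEmbedding
from llama_index.vector_stores.postgres import PGVectorStore

# Embedding model: embed_batch_size limits chunks per embedding request.
Settings.embed_model = MistralAIEmbedding(
    model_name="mistral-embed",
    api_key=os.environ["MISTRAL_API_KEY"],
    embed_batch_size=20,
)

# Keep each individual chunk well under the model's per-input token limit.
splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=50)

# Hypothetical Neon/Postgres connection details; substitute your own.
vector_store = PGVectorStore.from_params(
    host=os.environ["PGHOST"],
    port="5432",
    database="vectordb",
    user=os.environ["PGUSER"],
    password=os.environ["PGPASSWORD"],
    table_name="pdf_chunks",
    embed_dim=1024,  # mistral-embed returns 1024-dimensional vectors
)

documents = SimpleDirectoryReader(input_files=["report.pdf"]).load_data()

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=StorageContext.from_defaults(vector_store=vector_store),
    transformations=[splitter],
    # insert_batch_size governs how many nodes are handled per ingestion
    # pass before being written to the vector store; it is a database-side
    # batching knob, separate from embed_batch_size above.
    insert_batch_size=512,
)
```

If this sketch is right, insert_batch_size should not change the token count per embedding request, since the embedding client batches by embed_batch_size regardless; if lowering it did change the behavior, that may be version-specific, and the portable fix is tuning embed_batch_size and chunk_size.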