Updated 2 weeks ago

Hello, we have a little problem with LlamaIndex: when we try to load a PDF file into a database (Postgres on Neon) with Mistral's embed model, we get an error message about going over the token limit. We tried splitting the document per page and using the TokenTextSplitter, with no good result. The only "solution" that worked was lowering the insert_batch_size parameter (21 was the max that worked), but that should affect the DB, not the embed model, right? πŸ˜…
3 comments
What embedding model class are you using?

The insert batch size will become the upper bound on the embed_batch_size
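That capping behavior can be illustrated with a toy sketch. This is pure Python for illustration, not LlamaIndex internals; the function names here are made up. The point is that when documents are inserted in batches, each batch is embedded independently, so no single embedding request can carry more texts than the insert batch:

```python
# Toy illustration (not LlamaIndex code): why a small insert batch size
# also caps the number of texts sent per embedding request.
def embed(texts, embed_batch_size=32):
    """Pretend embedding call; records how many texts each request carries."""
    request_sizes = []
    for i in range(0, len(texts), embed_batch_size):
        request_sizes.append(len(texts[i : i + embed_batch_size]))
    return request_sizes

def insert(texts, insert_batch_size=21, embed_batch_size=32):
    """Insert into the DB in batches; each DB batch is embedded on its own,
    so every embedding request holds at most insert_batch_size texts."""
    request_sizes = []
    for i in range(0, len(texts), insert_batch_size):
        request_sizes.extend(embed(texts[i : i + insert_batch_size], embed_batch_size))
    return request_sizes

sizes = insert([f"chunk {n}" for n in range(100)], insert_batch_size=21, embed_batch_size=32)
print(max(sizes))  # never exceeds min(insert_batch_size, embed_batch_size)
```

So a lower insert_batch_size does end up shrinking the embedding requests as a side effect, which is why it "fixed" the token-limit error.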
We used MistralAIEmbedding from llama_index.embeddings.mistralai; that was for our project at the LLM x Law hackathon in Paris yesterday
You didn't share the exact error, but my first thought was that your chunk size was too big

If it truly is the batch size, you can set embed_batch_size=20 or similar on the embed model