Initial eval latency

Hello, I am using LlamaIndex with Ollama to build a chatbot that leverages our fine-tuned model with RAG and a custom vector database. I use bge_onnx for the embedding model and DuckDB for the database. Previously, the setup used an embedding model of about 125MB and a vector database of about 1GB. In that configuration, the FaithfulnessEvaluator typically completed evaluations in about 2 seconds.
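
Roughly, the evaluation path looks like this (a simplified sketch; the model name, file paths, and query below are placeholders rather than my exact configuration):

```python
# Simplified sketch of the setup: fine-tuned model served by Ollama, local bge
# ONNX embeddings, DuckDB vector store that was already built and persisted.
from llama_index.core import VectorStoreIndex
from llama_index.core.evaluation import FaithfulnessEvaluator
from llama_index.embeddings.huggingface_optimum import OptimumEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.duckdb import DuckDBVectorStore

llm = Ollama(model="my-finetuned-model", request_timeout=120.0)
embed_model = OptimumEmbedding(folder_name="./bge_onnx")  # local ONNX bge model

# Reopen the existing DuckDB vector store and build a query engine over it.
vector_store = DuckDBVectorStore(database_name="vectors.duckdb", persist_dir="./db")
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)
query_engine = index.as_query_engine(llm=llm)

# Evaluate whether the generated answer is faithful to the retrieved context.
evaluator = FaithfulnessEvaluator(llm=llm)
response = query_engine.query("example question")
result = evaluator.evaluate_response(response=response)
print(result.passing, result.score)
```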

Recently, I switched to a new version of the bge_onnx embedding model (~2.2GB) and re-vectorized the database with DuckDB, which brought the database size to 1.75GB. After these updates, the FaithfulnessEvaluator takes more than 25 seconds for the first evaluation, while subsequent evaluations (2nd, 3rd, etc.) take only about 1 second.

Could you help me understand why the first evaluation is significantly slower after the updates and suggest ways to optimize the evaluation process?
You're downloading the model on the initial run, so the first evaluation pays the download cost. You need to pre-download it before the evaluation runs.
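
For example, you can fetch the models up front and do a warm-up call before the first real evaluation. A rough sketch, assuming the embedding model is hosted on Hugging Face as BAAI/bge-large-en-v1.5 and your fine-tuned model is registered in Ollama as my-finetuned-model (adjust both names to your setup):

```python
# Pre-download the models so the multi-GB fetch doesn't happen inside the first
# evaluation call. The repo id and Ollama model name are placeholders.
import subprocess

from huggingface_hub import snapshot_download

# Cache the bge ONNX embedding model in the local Hugging Face cache ahead of time.
snapshot_download(repo_id="BAAI/bge-large-en-v1.5")

# Pull the fine-tuned LLM into Ollama's local store before serving or evaluating.
subprocess.run(["ollama", "pull", "my-finetuned-model"], check=True)
```

After that, running a single throwaway query and evaluation at startup will also load the weights into memory, so the first user-facing evaluation should land close to the ~1 second you see on later runs.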