A community member stores their index in a Postgres vector database and noticed that the first time they ask a question, it takes about 4 seconds to answer, while asking the same question again takes only about 1 second. They couldn't find any cache logic in their code and want to understand the reason for this behavior. Other community members suggest that caching is likely built into both the Postgres database and the llama-cpp library being used, which would explain the faster responses on repeated queries.
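One caching mechanism on the llama-cpp side is prompt-prefix reuse. As I understand it, llama-cpp-python keeps the key/value (KV) cache from the previous evaluation, so tokens that a new prompt shares with the previous one don't have to be recomputed. The toy sketch below models only that bookkeeping with plain integer token IDs. The function names are hypothetical, and this is not llama.cpp's actual code.

```python
from typing import List, Sequence


def common_prefix_len(cached: Sequence[int], new: Sequence[int]) -> int:
    """Count how many leading token IDs the two sequences share."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_evaluate(cached: Sequence[int], new: Sequence[int]) -> List[int]:
    """Return the suffix of `new` that still needs a forward pass.

    Toy model: assumes every token in the shared prefix can be served
    from the KV cache left over from the previous prompt.
    """
    return list(new[common_prefix_len(cached, new):])


if __name__ == "__main__":
    system_and_context = [1, 15, 22, 22, 9, 40]  # hypothetical token IDs
    first_prompt = system_and_context + [101, 102]
    repeat_prompt = system_and_context + [101, 102]
    other_prompt = system_and_context + [205]

    print(len(tokens_to_evaluate([], first_prompt)))             # cold start: 8
    print(len(tokens_to_evaluate(first_prompt, repeat_prompt)))  # same question: 0
    print(len(tokens_to_evaluate(first_prompt, other_prompt)))   # new question: 1
```

Asking the identical question twice is the best case for this kind of reuse, because the entire prompt matches what was evaluated last time.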
I am storing my index in a Postgres vector database. The first time I ask a question, it takes, say, 4 seconds to answer, but if I ask the same question again, it takes about 1 second. I couldn't find any cache logic in the code. Why does this happen? I'm trying to understand the reasoning behind this behavior. Has anyone else run into this?
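A first step toward an answer is to time the exact same call several times inside one process. The helper below makes no assumptions about the stack. `fake_query` is a hypothetical stand-in that only mimics a cache warming up. Swap it for your own retrieval or query call.

```python
import time
from typing import Any, Callable, List


def time_repeated(fn: Callable[..., Any], *args: Any, repeats: int = 3) -> List[float]:
    """Call fn(*args) `repeats` times and return each call's wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical stand-in for a real query: slow the first time a question
# is seen, fast afterwards, to mimic a cold cache warming up.
_warm = set()


def fake_query(question: str) -> str:
    if question not in _warm:
        time.sleep(0.2)   # "cold" path, e.g. disk reads or loading model weights
        _warm.add(question)
    else:
        time.sleep(0.01)  # "warm" path, data already in memory
    return "answer to " + question


if __name__ == "__main__":
    timings = time_repeated(fake_query, "What is in my index?", repeats=3)
    for i, t in enumerate(timings, 1):
        print(f"call {i}: {t:.3f}s")
```

If you time the retrieval step and the generation step separately with the same helper, you can see which stage accounts for most of the 3-second difference.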
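Maybe someone has a smarter answer, but there is probably caching at several steps. Postgres doesn't cache query results, but it does keep recently read table and index pages in memory (its shared buffers, plus the OS page cache), so repeating the same vector search skips most of the disk reads. I'm sure the LLM side has something similar built in too, e.g. the model weights staying in memory after the first run.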
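You can check the Postgres side by running the same vector search twice with `EXPLAIN (ANALYZE, BUFFERS)`. On the first run, pages usually show up as `read`, meaning they were not found in shared buffers. On the second run, they usually show up as `hit`. The sketch below parses the first `Buffers: shared ...` line of that text output. The optional database helper assumes a psycopg 3 connection, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable

# Matches e.g. "Buffers: shared hit=12 read=345" and stops at ", temp ..." etc.
_SHARED = re.compile(r"Buffers: shared((?: \w+=\d+)+)")


def shared_buffer_stats(plan_lines: Iterable[str]) -> Dict[str, int]:
    """Parse the first 'Buffers: shared ...' line of EXPLAIN (ANALYZE, BUFFERS) output.

    That line normally belongs to the top plan node, whose counts include its
    children. Postgres omits zero counters, so hit/read default to 0 here.
    """
    for line in plan_lines:
        match = _SHARED.search(line)
        if match:
            stats = {"hit": 0, "read": 0}
            for pair in match.group(1).split():
                key, value = pair.split("=")
                stats[key] = int(value)
            return stats
    return {"hit": 0, "read": 0}


def explain_buffers(conn, sql: str) -> Dict[str, int]:
    """Run EXPLAIN (ANALYZE, BUFFERS) on `sql` using a psycopg 3 connection (assumed)."""
    rows = conn.execute("EXPLAIN (ANALYZE, BUFFERS) " + sql).fetchall()
    return shared_buffer_stats(row[0] for row in rows)
```

For example, call `explain_buffers(conn, "SELECT id FROM items ORDER BY embedding <-> '[0.1,0.2,0.3]' LIMIT 5")` twice in a row. Here `items` and `embedding` are hypothetical table and column names, and `<->` is pgvector's L2 distance operator. On a cold cache, `read` should be large on the first call and should drop on the second.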