Find answers from the community

Updated last year

I am storing my index in a postgres

At a glance

The community member is storing their index in a Postgres vector database and noticed that the first time they ask a question, it takes 4 seconds to answer, but subsequent questions take only 1 second. They couldn't find any cache logic in the code and are trying to understand the reasoning behind this behavior. Other community members suggest that there is likely caching built into the Postgres database and the llama-cpp library being used, which may be responsible for the faster response times on subsequent queries.

I am storing my index in a postgres vector database. The first time I ask a question, let's say it takes 4 seconds to answer but if I ask the same question, it takes about 1 second. I couldn't find any cache logic in the code. Why does this happen? I am trying to understand the reasoning for this behavior. Has someone else encountered it?
9 comments
what llm are you using?
Llama-2 7B locally
Maybe someone has a smarter answer, but there is probably a lot of caching built into each step. Postgres definitely has a cache for recently used queries, and I'm sure the LLM has something built in too.
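One way to see which layer is responsible is to time each stage separately on the first and second run. A minimal sketch with hypothetical stand-in functions (not the real llama-index API) showing the timing pattern:

```python
import time

def timed(label, fn, *args):
    # Measure one call and report the wall-clock time.
    t0 = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {time.perf_counter() - t0:.3f}s")
    return result

# Stand-ins: in the real app these would be the embedding call,
# the Postgres vector search, and the Llama-2 generation step.
def embed(question):
    return [0.0] * 384

def retrieve(vector):
    return ["doc1", "doc2"]

def generate(question, docs):
    return "answer"

q = "my question"
vec = timed("embed", embed, q)
docs = timed("retrieve", retrieve, vec)
ans = timed("generate", generate, q, docs)
```

Running the real equivalents twice with the same question would show which stage drops from the first run to the second, pointing at where the caching happens.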
@Logan M Would you like to comment please?
like with llama-cpp? I think llama-cpp does some caching internally
Yes with llama-cpp
yea I think it caches some stuff automatically, but haven't dove too deep into it
it's not llama-index doing it though
No problem! I'll check it out. Just wanted to understand where to look. This helps! Thanks so much!