The community members discuss whether prompt caching can be added for large language model (LLM) calls without manually handling every request. They explore ideas such as building a custom retriever that extracts nodes and performs the LLM calls, or intercepting LLM calls to insert a cache layer. However, they indicate that no such feature currently exists, so the user will have to implement it themselves, for example by rewriting classes or a query engine so that prompts are intercepted and checked against the cache.
Is there any kind of prompt caching in place? How can I intercept LLM calls to put a cache layer in front of them? Is there any built-in mechanism for this, instead of implementing it by hand?
I want a cache layer so that every prompt gets intercepted and run through the cache, and I want to avoid doing it manually for every request. Say we have an agent that makes multiple requests I don't have manual access to; I would have to rewrite all of its classes to be able to add caching. Likewise, if I use a query engine for agent tools, I would have to rewrite the query engine so that it intercepts the prompts and checks the cache.
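For reference, the kind of interception described above can be approximated with a thin wrapper around the LLM object itself rather than rewriting each agent or query engine. The sketch below is a hypothetical, minimal example: it assumes the wrapped LLM exposes a `complete(prompt)` method, the `CachingLLM` name is made up, and the in-memory dict stands in for whatever cache backend (Redis, disk, etc.) you actually use.

```python
import hashlib


class CachingLLM:
    """Hypothetical wrapper that checks a cache before delegating to the wrapped LLM.

    Assumes the wrapped object exposes a `complete(prompt)` method; the cache
    here is an in-memory dict, but it could be swapped for Redis, SQLite, etc.
    """

    def __init__(self, llm, cache=None):
        self._llm = llm
        self._cache = cache if cache is not None else {}

    def _key(self, prompt: str) -> str:
        # Hash the prompt so arbitrarily long prompts become compact cache keys.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def complete(self, prompt: str, **kwargs):
        key = self._key(prompt)
        if key in self._cache:
            return self._cache[key]  # cache hit: skip the LLM call entirely
        response = self._llm.complete(prompt, **kwargs)  # cache miss: call the real LLM
        self._cache[key] = response
        return response
```

If the agent and query engine are only ever given the wrapped object as their LLM, every prompt they issue passes through `complete()` and therefore through the cache, without rewriting the query engine itself.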