
Is there any kind of prompt caching in place?

At a glance

The community members discuss adding a prompt cache for large language model (LLM) calls so that every request does not have to be handled manually. Ideas explored include creating a retriever to extract nodes and perform the LLM call directly, as well as intercepting LLM calls to add a cache layer. However, no such feature is currently available, so the asker will have to implement it themselves, for example by rewriting the relevant classes or the query engine so that prompts are intercepted and checked against the cache.

Is there any kind of prompt caching in place? How can I intercept LLM calls to put a cache layer in front? Is there any mechanism in place to do this, instead of implementing it by hand?
8 comments
You can create a retriever, extract all the nodes, and then do the LLM call in your own format, adding all the node content.
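For concreteness, a minimal sketch of that suggestion, assuming LlamaIndex-style `index` and `llm` objects (i.e. `index.as_retriever(...)` and `llm.complete(...)` are available); the function name and prompt format are illustrative, not part of any library:

```python
def answer_with_manual_prompt(index, llm, query: str, top_k: int = 3) -> str:
    """Retrieve nodes directly and build the LLM prompt yourself,
    instead of going through a query engine."""
    retriever = index.as_retriever(similarity_top_k=top_k)
    nodes = retriever.retrieve(query)

    # Concatenate the retrieved node content in whatever format you need.
    context = "\n\n".join(n.get_content() for n in nodes)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

    # Single LLM call with the hand-built prompt.
    return llm.complete(prompt).text
```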
Yeah, but implicitly
without explicitly creating a vector store
Isn't there anything in the LLM abstraction?
Let me see if I got your query right:
  • You want to use the LLM without creating a vector store, with the ability to change the prompt?
So I want to have a cache layer so that every prompt gets intercepted and run through the cache.
I want to avoid doing it manually for every request. Let's say we have an agent that makes multiple requests I don't have manual access to; I would have to rewrite all of the classes to be able to have a cache.
Let's say I use a query engine for the agent's tools; I would have to rewrite the query engine so that it intercepts the prompts and checks the cache.
Yeah, currently there is no such feature present; you'll have to write this on your own.
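As a starting point for rolling your own, here is a hedged sketch of a wrapper that sits in front of the LLM and short-circuits repeated prompts. `CachedLLM` and its method names are hypothetical; the only assumption is that the wrapped LLM object exposes a `complete(prompt)` method.

```python
import hashlib


class CachedLLM:
    """Checks a cache before delegating to the wrapped LLM.

    The cache is a plain dict here, but anything with dict-like
    __contains__/__getitem__/__setitem__ (e.g. a Redis-backed
    mapping) would work just as well.
    """

    def __init__(self, llm, cache=None):
        self.llm = llm
        self.cache = {} if cache is None else cache

    @staticmethod
    def _key(prompt: str) -> str:
        # Hash the prompt so long prompts produce compact, stable keys.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def complete(self, prompt: str):
        key = self._key(prompt)
        if key in self.cache:
            return self.cache[key]            # cache hit: no LLM call
        response = self.llm.complete(prompt)  # cache miss: call the real LLM
        self.cache[key] = response
        return response
```

Usage would be to wrap your real LLM once (`cached = CachedLLM(my_llm)`) and pass `cached` wherever the LLM is normally passed (e.g. to a query engine or agent), so every repeated prompt is answered from the cache without rewriting those classes.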