The community members discuss whether prompt caching can be added for large language model (LLM) calls without manually handling every request. They explore ideas such as building a custom retriever that extracts nodes and performs the LLM calls, or intercepting LLM calls to insert a cache layer. However, they indicate that no such feature currently exists, so the user will have to implement it themselves, for example by rewriting classes or a query engine so that prompts are intercepted and checked against the cache.
Is there any kind of prompt caching in place? How can I intercept LLM calls to put a cache layer in front of them? Is there any built-in mechanism for this, instead of implementing it by hand?
I want a cache layer so that every prompt gets intercepted and run through the cache, and I want to avoid doing it manually for every request. Say we have an agent that makes multiple requests I don't have manual access to; I would have to rewrite all of its classes to be able to add caching. Likewise, if I use a query engine for agent tools, I would have to rewrite the query engine so that it intercepts the prompts and checks the cache.
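For reference, the kind of interception described above can be approximated with a thin wrapper around the LLM object itself rather than rewriting each agent or query engine. The sketch below is a hypothetical, minimal example: it assumes the wrapped LLM exposes a `complete(prompt)` method, the `CachingLLM` name is made up, and the in-memory dict stands in for whatever cache backend (Redis, disk, etc.) you actually use.

```python
import hashlib


class CachingLLM:
    """Hypothetical wrapper that checks a cache before delegating to the wrapped LLM.

    Assumes the wrapped object exposes a `complete(prompt)` method; the cache
    here is an in-memory dict, but it could be swapped for Redis, SQLite, etc.
    """

    def __init__(self, llm, cache=None):
        self._llm = llm
        self._cache = cache if cache is not None else {}

    def _key(self, prompt: str) -> str:
        # Hash the prompt so arbitrarily long prompts become compact cache keys.
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def complete(self, prompt: str, **kwargs):
        key = self._key(prompt)
        if key in self._cache:
            return self._cache[key]  # cache hit: skip the LLM call entirely
        response = self._llm.complete(prompt, **kwargs)  # cache miss: call the real LLM
        self._cache[key] = response
        return response
```

If the agent and query engine are only ever given the wrapped object as their LLM, every prompt they issue passes through `complete()` and therefore through the cache, without rewriting the query engine itself.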