Cache-Augmented Generation (CAG): The Fu...

There may not be a direct implementation available yet, but you could do something like this:

Start keeping a dict of user-query/bot-answer pairs. When the user asks a new query, first run cosine similarity against the cached queries; if the similarity threshold is breached, use the cached answer, otherwise fall back to your engine.
I guess that should help you get started with CAG.
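A minimal sketch of the approach described above. All names here are illustrative: `embed()` is a toy stand-in for a real sentence-embedding model, and `engine` stands in for your existing RAG pipeline.

```python
import math

# Cache of (query, query_embedding, answer) triples.
cache = []

def embed(text):
    # Toy embedding: a character-frequency vector. In practice you would
    # swap in a real sentence-embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def answer(query, engine, threshold=0.9):
    q_vec = embed(query)
    # 1. Check the cache: reuse the stored answer of the most similar
    #    past query if its similarity clears the threshold.
    best = max(cache, key=lambda rec: cosine(q_vec, rec[1]), default=None)
    if best and cosine(q_vec, best[1]) >= threshold:
        return best[2]
    # 2. Cache miss: fall back to the engine and store the new pair.
    result = engine(query)
    cache.append((query, q_vec, result))
    return result
```

The threshold is the knob to tune: too low and users get stale or mismatched answers, too high and you rarely get a cache hit.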
@WhiteFang_Jr this is what I am trying to wrap my mind around. The entire point of RAG is that the knowledge cannot fit in the context window, so somehow it has to be distilled into a cache. You recommended keeping track of queries and storing them in the cache, but at the end of the day this seems viable only once you have enough interactions.
Hey!

I have the following take for CAG:
  • If I try to maintain the chat history in the context, then in a multi-user system I need to maintain a separate context for each user.
  • With a cache of dict pairs I can easily manage each user's conversation. In addition to acting as CAG, this system can also serve as memory for your bot: for example, you can extract the most similar records from it and use them to maintain the conversation context.
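The cache-as-memory idea in the second bullet could look something like this sketch. The per-user store and the word-overlap `similarity` function are illustrative; a real setup would rank by embedding similarity instead.

```python
from collections import defaultdict

# Per-user store of (query, answer) exchanges.
user_caches = defaultdict(list)

def remember(user_id, query, answer):
    user_caches[user_id].append((query, answer))

def recall(user_id, query, k=3, similarity=None):
    # Return the k past exchanges most similar to the new query,
    # e.g. to prepend to the prompt as conversation memory.
    if similarity is None:
        # Crude word-overlap score; swap in cosine similarity over
        # real embeddings in practice.
        similarity = lambda a, b: len(set(a.lower().split()) & set(b.lower().split()))
    records = user_caches[user_id]
    ranked = sorted(records, key=lambda rec: similarity(query, rec[0]), reverse=True)
    return ranked[:k]
```

Because each user's records live under their own key, the same structure handles both the multi-user isolation and the memory retrieval described above.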
But tbh I'm not a fan of this CAG approach.
Thanks @WhiteFang_Jr. Yeah, there is a lot of noise being made out there about this technique. I am not sure whether it will be just a fad, but I think the end goal is to eliminate latency, and caching would help at some point.