Cache-Augmented Generation (CAG): The Fu...

There may not be a direct implementation available yet, but you could do something like this:

Start keeping a dict of user-query/bot-answer pairs. When the user asks a new query, first run cosine similarity against the cached queries; if the similarity threshold is breached, use the cached answer, otherwise fall back to your engine.
I guess that should help you get started with CAG.
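A minimal sketch of the approach described above. All names here are illustrative: `embed()` is a toy stand-in for a real sentence-embedding model, and `engine` stands in for your existing RAG pipeline.

```python
import math

# Cache of (query, query_embedding, answer) triples.
cache = []

def embed(text):
    # Toy embedding: a character-frequency vector. In practice you would
    # swap in a real sentence-embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def answer(query, engine, threshold=0.9):
    q_vec = embed(query)
    # 1. Check the cache: reuse the stored answer of the most similar
    #    past query if its similarity clears the threshold.
    best = max(cache, key=lambda rec: cosine(q_vec, rec[1]), default=None)
    if best and cosine(q_vec, best[1]) >= threshold:
        return best[2]
    # 2. Cache miss: fall back to the engine and store the new pair.
    result = engine(query)
    cache.append((query, q_vec, result))
    return result
```

The threshold is the knob to tune: too low and users get stale or mismatched answers, too high and you rarely get a cache hit.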
@WhiteFang_Jr this is what I am trying to wrap my mind around. The entire point of RAG is that the knowledge cannot fit in the context window, so somehow it has to be distilled into a cache. You recommended keeping track of queries and storing them in the cache, but at the end of the day this seems viable only once you have enough interactions.
Hey!

I have the following take for CAG:
  • If I try to maintain the chat history in the context, then in a multi-user system I need to maintain a separate context for each user.
  • With a cache of dict pairs I can easily manage each user's conversation. In addition to acting as CAG, this system can also serve as memory for your bot: for example, you can extract the most similar records from it and use them to maintain the conversation context.
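The cache-as-memory idea in the second bullet could look something like this sketch. The per-user store and the word-overlap `similarity` function are illustrative; a real setup would rank by embedding similarity instead.

```python
from collections import defaultdict

# Per-user store of (query, answer) exchanges.
user_caches = defaultdict(list)

def remember(user_id, query, answer):
    user_caches[user_id].append((query, answer))

def recall(user_id, query, k=3, similarity=None):
    # Return the k past exchanges most similar to the new query,
    # e.g. to prepend to the prompt as conversation memory.
    if similarity is None:
        # Crude word-overlap score; swap in cosine similarity over
        # real embeddings in practice.
        similarity = lambda a, b: len(set(a.lower().split()) & set(b.lower().split()))
    records = user_caches[user_id]
    ranked = sorted(records, key=lambda rec: similarity(query, rec[0]), reverse=True)
    return ranked[:k]
```

Because each user's records live under their own key, the same structure handles both the multi-user isolation and the memory retrieval described above.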
But tbh I'm not a fan of this CAG approach.
Thanks @WhiteFang_Jr. Yeah, there is a lot of noise being made out there about this technique. I am not sure whether it will be just a fad, but I think the end goal is to eliminate latency, and caching would help at some point.