Cached Augmented Generation (CAG) with Gemini or other LLMs integrated with LlamaIndex
At a glance
Community members discuss implementing Cached Augmented Generation (CAG) with Gemini or other large language models (LLMs) integrated with LlamaIndex. One member suggests that CAG may require direct model access (e.g., via PyTorch) and may not be feasible over an API. Another member shares an example CAG implementation. The discussion also notes that Gemini offers one of the largest context windows among LLMs, which makes CAG particularly meaningful with Gemini, though local models with large context windows may also be suitable.
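To illustrate why CAG typically needs direct model access: the technique precomputes the attention key/value (KV) cache for a long document once, then reuses it for every query, which requires reading and storing internal model state that hosted APIs generally do not expose. The sketch below is a toy single-head attention in NumPy (not a real transformer, and not the implementation shared in the thread); all names and dimensions are illustrative assumptions.

```python
# Conceptual sketch of Cached Augmented Generation (CAG): precompute the
# key/value tensors for a long "document" prefix once, then reuse them for
# every query. Toy single-head attention in NumPy for illustration only;
# real CAG stores the KV cache of an actual transformer (e.g. via PyTorch),
# which is why direct model access is usually required.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                       # toy embedding dimension
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q_tokens, K, V):
    """Attention of query tokens over precomputed keys/values."""
    Q = q_tokens @ Wq
    return softmax(Q @ K.T / np.sqrt(d)) @ V

# 1. Precompute the KV cache for the document prefix (done once, offline).
doc = rng.standard_normal((100, d))          # stand-in for a long document
kv_cache = (doc @ Wk, doc @ Wv)

# 2. Answer queries against the cached document without re-encoding it.
query = rng.standard_normal((3, d))          # stand-in for a short query
out_cached = attend(query, *kv_cache)

# Sanity check: identical to recomputing K/V from the document each time.
out_full = attend(query, doc @ Wk, doc @ Wv)
assert np.allclose(out_cached, out_full)
```

Over a chat-completions-style API there is no handle to this cached state, so the document would be re-sent and re-encoded on every call; some providers offer server-side context caching as a partial substitute, but that is a provider feature rather than something LlamaIndex can implement client-side.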
Thanks for sharing that with me. I'm unclear about direct model access too, which is why I'm looking into how it might work with Gemini. It might not be doable!