Cache-Augmented Generation (CAG) with Gemini or other LLMs integrated with LlamaIndex

At a glance

The community members discuss implementing Cache-Augmented Generation (CAG) with Gemini or other large language models (LLMs) integrated with LlamaIndex. One community member suggests that CAG may require direct model access (e.g., with PyTorch) and may not be possible to implement over an API. Another community member shares an example implementation of CAG. The discussion also touches on context windows: one member argues that Gemini's especially long context window makes CAG particularly meaningful there, while others note that local models with large context windows may also be suitable.

Hi everyone,
Is there any implementation of Cache-Augmented Generation (CAG) with Gemini or other LLMs integrated with LlamaIndex?
Correct me if I'm wrong, but doesn't CAG require direct model access (e.g., with PyTorch)? I don't think you can implement this over an API
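For context on what "direct model access" means here: CAG prefills the model's KV cache with the knowledge text once, then answers queries by decoding from that cached prefix, so you need to hold the cache between calls. Below is a minimal sketch of the idea using Hugging Face transformers; the model name and texts are placeholders, not the implementation shared in this thread.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a stand-in; any causal LM with KV-cache support would do.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# 1) Prefill: encode the knowledge text once and keep its KV cache.
doc = "LlamaIndex is a framework for building LLM applications over your data."
doc_ids = tokenizer(doc, return_tensors="pt").input_ids
with torch.no_grad():
    prefill = model(doc_ids, use_cache=True)
kv_cache = prefill.past_key_values  # the "cache" in cache-augmented generation

# 2) Decode: feed only the query tokens, continuing from the cached prefix
#    instead of re-encoding the document for every question.
query_ids = tokenizer("\nQ: What is LlamaIndex?\nA:", return_tensors="pt").input_ids
past, next_ids, answer = kv_cache, query_ids, []
with torch.no_grad():
    for _ in range(30):  # greedy decoding, capped at 30 new tokens
        out = model(next_ids, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_ids = out.logits[:, -1:].argmax(dim=-1)
        if next_ids.item() == tokenizer.eos_token_id:
            break
        answer.append(next_ids.item())
print(tokenizer.decode(answer))
```

A hosted API never exposes `past_key_values` like this, which is why the comment above doubts CAG can be done remotely.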
Thanks for sharing that with me.
I'm unclear about direct model access too, which is why I'm looking into how it might work with Gemini. It might not be doable!
As far as I know, Gemini has the longest context window among LLMs, so CAG is really only meaningful with Gemini!
There are lots of local models with large context windows too, though of course not as big as Gemini's.
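Over an API, where the KV cache isn't exposed, the closest substitute is long-context stuffing: resend the whole corpus in the prompt on each call and let the provider handle it. A rough sketch assuming the llama-index-llms-gemini integration; the model name and file path are illustrative, and GOOGLE_API_KEY must be set in the environment.

```python
from llama_index.llms.gemini import Gemini

# Assumes GOOGLE_API_KEY is set and the corpus fits in the context window.
llm = Gemini(model="models/gemini-1.5-pro")

corpus = open("knowledge.txt").read()  # illustrative path
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{corpus}\n\n"
    "Question: What is LlamaIndex?\nAnswer:"
)
print(llm.complete(prompt).text)
```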