Cached Augmented Generation (CAG) with Gemini or other LLMs integrated with LlamaIndex
At a glance
Community members discuss implementing Cached Augmented Generation (CAG) with Gemini or other large language models (LLMs) integrated with LlamaIndex. One member suggests that CAG may require direct model access (e.g., via PyTorch) and may not be feasible over an API. Another member shares an example CAG implementation. The discussion also notes that Gemini offers one of the largest context windows among LLMs, which makes CAG particularly meaningful with Gemini, though local models with large context windows may also be suitable.
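To illustrate why CAG typically needs direct model access: the technique precomputes the attention key/value (KV) cache for a long document once, then reuses it for every query, which requires reading and storing internal model state that hosted APIs generally do not expose. The sketch below is a toy single-head attention in NumPy (not a real transformer, and not the implementation shared in the thread); all names and dimensions are illustrative assumptions.

```python
# Conceptual sketch of Cached Augmented Generation (CAG): precompute the
# key/value tensors for a long "document" prefix once, then reuse them for
# every query. Toy single-head attention in NumPy for illustration only;
# real CAG stores the KV cache of an actual transformer (e.g. via PyTorch),
# which is why direct model access is usually required.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                       # toy embedding dimension
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q_tokens, K, V):
    """Attention of query tokens over precomputed keys/values."""
    Q = q_tokens @ Wq
    return softmax(Q @ K.T / np.sqrt(d)) @ V

# 1. Precompute the KV cache for the document prefix (done once, offline).
doc = rng.standard_normal((100, d))          # stand-in for a long document
kv_cache = (doc @ Wk, doc @ Wv)

# 2. Answer queries against the cached document without re-encoding it.
query = rng.standard_normal((3, d))          # stand-in for a short query
out_cached = attend(query, *kv_cache)

# Sanity check: identical to recomputing K/V from the document each time.
out_full = attend(query, doc @ Wk, doc @ Wv)
assert np.allclose(out_cached, out_full)
```

Over a chat-completions-style API there is no handle to this cached state, so the document would be re-sent and re-encoded on every call; some providers offer server-side context caching as a partial substitute, but that is a provider feature rather than something LlamaIndex can implement client-side.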
Thanks for sharing that with me. I'm unclear about direct model access too, which is why I'm looking into how it might work with Gemini. It might not be doable!