Hey team - Awesome work you're doing with LlamaIndex! I'm keen to replace my existing RAG pipeline with it, and I'm hoping someone could help with two quick questions:
- Is there a caching mechanism available out of the box for the OpenAIEmbedding and OpenAI classes? Development iteration speed is the top priority in my use case, and I've found it helpful to cache these results in my custom RAG pipeline (the first snippet below sketches what I mean).
- Has anyone developed and tuned an adaptive max_token mechanism for the semantic splitter? The semantic splitter is awesome, but I keep running into chunks that are far too large when working with Wikipedia data, and even dialing down the breakpoint percentile often doesn't help (the second snippet below shows the workaround I've been experimenting with). Any tips and tricks would be greatly appreciated!
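
For the first question, here's a minimal sketch of the kind of caching I mean. It's my own wrapper, not a LlamaIndex API: the `cached_embedding` helper and the cache file path are mine, and results are keyed on a hash of the model name plus the input text.

```python
# My own disk-backed embedding cache (not a LlamaIndex API): results are
# keyed by a hash of model + text and persisted to a local JSON file, so
# re-running the pipeline during development never re-embeds the same text.
import hashlib
import json
from pathlib import Path

from openai import OpenAI

CACHE_PATH = Path("embedding_cache.json")  # hypothetical location, pick your own
client = OpenAI()


def _load_cache() -> dict:
    return json.loads(CACHE_PATH.read_text()) if CACHE_PATH.exists() else {}


def cached_embedding(text: str, model: str = "text-embedding-3-small") -> list[float]:
    cache = _load_cache()
    key = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
    if key not in cache:
        # Cache miss: hit the API once, then persist the result.
        response = client.embeddings.create(model=model, input=text)
        cache[key] = response.data[0].embedding
        CACHE_PATH.write_text(json.dumps(cache))
    return cache[key]
```

Having this behind a single function made it trivial to swap caching strategies later, so ideally I'd love to find the equivalent hook in LlamaIndex rather than keep maintaining it.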
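
For context on the second question, this is the workaround I've been experimenting with: run the semantic splitter first, then re-split anything over a token budget with a plain SentenceSplitter. The MAX_TOKENS budget and the split_with_cap helper are my own names, and the chars/4 token estimate is deliberately rough.

```python
# My current workaround (not a built-in feature): semantic split first, then
# re-split any oversized chunk with a size-based splitter as a hard cap.
from llama_index.core import Document
from llama_index.core.node_parser import SemanticSplitterNodeParser, SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

MAX_TOKENS = 512  # my budget, tuned by eye

semantic_splitter = SemanticSplitterNodeParser.from_defaults(
    embed_model=OpenAIEmbedding(),
    breakpoint_percentile_threshold=90,  # lowering this alone hasn't been enough
)
fallback_splitter = SentenceSplitter(chunk_size=MAX_TOKENS)


def split_with_cap(documents: list[Document]) -> list[str]:
    chunks: list[str] = []
    for node in semantic_splitter.get_nodes_from_documents(documents):
        text = node.get_content()
        # Rough token estimate (~4 chars/token); swap in tiktoken for accuracy.
        if len(text) / 4 > MAX_TOKENS:
            chunks.extend(fallback_splitter.split_text(text))
        else:
            chunks.append(text)
    return chunks
```

It works, but the fallback split ignores the semantic boundaries entirely, which is why an adaptive max_token mechanism inside the splitter itself would be so much nicer.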