The community members are discussing the idea of caching queries to improve the performance of a Q/A application. The main points are:
- The original poster has a proof-of-concept for semantic query caching using Pinecone (or a vector index) and is considering integrating it into the gpt_index library instead of shipping it as a separate library.
- The caching mechanism is based on approximate similarity rather than exact similarity, since humans rarely ask exactly the same question twice. The cache stores high-quality answers and uses them to respond to similar questions, avoiding additional LLM calls (see the sketch after this summary).
- The community members discuss the ability to manually "seed" the cache with high-quality answers, and the possibility of incorporating a human feedback element to bust cached answers that are negatively scored.
There is no explicitly marked answer in the comments.
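A minimal sketch of the mechanism described above, independent of Pinecone or gpt_index: `embed` and `call_llm` are placeholders for a real embedding model and the real LLM (or gpt_index) query, and the 0.9 similarity threshold is purely illustrative.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.9  # illustrative cutoff for "close enough" questions


def embed(text: str) -> np.ndarray:
    """Placeholder embedding: swap in a real embedding model or API here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(64)
    return vec / np.linalg.norm(vec)


def call_llm(question: str) -> str:
    """Placeholder for the expensive LLM / gpt_index query."""
    return f"(fresh LLM answer to: {question})"


class SemanticQueryCache:
    """Cache keyed by approximate (cosine) similarity rather than exact string match."""

    def __init__(self) -> None:
        self._embeddings: list[np.ndarray] = []
        self._answers: list[str] = []

    def seed(self, question: str, answer: str) -> None:
        """Manually insert a known high-quality Q/A pair."""
        self._embeddings.append(embed(question))
        self._answers.append(answer)

    def query(self, question: str) -> str:
        q = embed(question)
        if self._embeddings:
            scores = [float(q @ e) for e in self._embeddings]
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] >= SIMILARITY_THRESHOLD:
                return self._answers[best]  # cache hit: no extra LLM call
        answer = call_llm(question)  # cache miss: pay for the LLM call once
        self.seed(question, answer)  # remember it for similar future questions
        return answer
```

A human-feedback hook could then evict entries whose answers are scored negatively, which is the cache-busting idea mentioned above.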
@jerryjliu0 have you thought at all before about caching queries?
I have a cool PoC for semantic query caching via Pinecone right now (it could use the vector index instead), and I feel like there might be a place in gpt_index to slot this in, as opposed to shipping an external library.
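For the Pinecone route specifically, the cache lookup and write might look roughly like this, assuming the pinecone-client 2.x API of the time; the index name, 0.9 threshold, and metadata layout are illustrative rather than taken from the actual PoC.

```python
import pinecone

# Assumes the pinecone-client 2.x API; the index name, similarity threshold,
# and metadata layout are invented for illustration.
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
cache = pinecone.Index("query-cache")


def lookup(question_embedding: list[float]) -> str | None:
    """Return a cached answer if a sufficiently similar question exists."""
    res = cache.query(vector=question_embedding, top_k=1, include_metadata=True)
    matches = res["matches"]
    if matches and matches[0]["score"] >= 0.9:
        return matches[0]["metadata"]["answer"]
    return None


def store(question_id: str, question_embedding: list[float], answer: str) -> None:
    """Upsert the question embedding with its answer attached as metadata."""
    cache.upsert(vectors=[(question_id, question_embedding, {"answer": answer})])
```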
i've been thinking about it! what did you have in mind more specifically? like being able to re-use the exact same query? or also making use of related queries?
And I am going to seed the cache with high-quality answers via the hypothetical Q/A mechanism I mentioned previously, so that the majority of questions get good answers, because vanilla gpt_index queries are lackluster on the dataset I am using (mostly because I haven't optimized them).
thanks. it seems like this cache is more about approximate similarity than exact similarity. i was thinking about something similar w.r.t. a query cache of previous questions/answers. i'm assuming you'd want the ability to manually "seed" the cache though?
And yes, it would be a useful feature to be able to seed, though I guess "seeding" here could just mean creating a new SimpleVectorIndex at runtime with documents (or loading it from disk).
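A minimal sketch of that seeding route, assuming the gpt_index API of the era (the class referred to here as SimpleVectorIndex shipped as `GPTSimpleVectorIndex`); the seed Q/A pairs and file name are invented for illustration, and exact class/method names may differ across versions.

```python
from gpt_index import Document, GPTSimpleVectorIndex

# Assumes the gpt_index API of the era (GPTSimpleVectorIndex, Document,
# save_to_disk / load_from_disk); names may differ across versions.
seed_pairs = [  # invented Q/A pairs purely for illustration
    ("How do I install gpt_index?", "pip install gpt-index"),
    ("What does the vector index do?", "It embeds documents and retrieves them by similarity."),
]

# "Seeding" is just building a vector index over curated Q/A text at runtime.
seed_docs = [Document(f"Q: {q}\nA: {a}") for q, a in seed_pairs]
cache_index = GPTSimpleVectorIndex(seed_docs)

# Persist and reload instead of re-embedding the seed answers on every run.
cache_index.save_to_disk("query_cache.json")
cache_index = GPTSimpleVectorIndex.load_from_disk("query_cache.json")

# Query against the seeded index; a low-similarity hit could fall back to a fresh query.
print(cache_index.query("How can I install gpt_index?"))
```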