The community members are discussing the idea of caching queries to improve the performance of a Q/A application. The main points are:
- The original poster has a proof-of-concept for semantic query caching using Pinecone (or a vector index) and is considering integrating it into the gpt_index library instead of shipping it as a separate library.
- The caching mechanism is based on approximate similarity rather than exact similarity, since humans rarely ask exactly the same question twice. The cache stores high-quality answers and uses them to respond to similar questions, avoiding additional LLM calls (see the sketch after this summary).
- The community members discuss the ability to manually "seed" the cache with high-quality answers, and the possibility of incorporating a human feedback element to bust cached answers that are negatively scored.
There is no explicitly marked answer in the comments.
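A minimal sketch of the mechanism described above, independent of Pinecone or gpt_index: `embed` and `call_llm` are placeholders for a real embedding model and the real LLM (or gpt_index) query, and the 0.9 similarity threshold is purely illustrative.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.9  # illustrative cutoff for "close enough" questions


def embed(text: str) -> np.ndarray:
    """Placeholder embedding: swap in a real embedding model or API here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(64)
    return vec / np.linalg.norm(vec)


def call_llm(question: str) -> str:
    """Placeholder for the expensive LLM / gpt_index query."""
    return f"(fresh LLM answer to: {question})"


class SemanticQueryCache:
    """Cache keyed by approximate (cosine) similarity rather than exact string match."""

    def __init__(self) -> None:
        self._embeddings: list[np.ndarray] = []
        self._answers: list[str] = []

    def seed(self, question: str, answer: str) -> None:
        """Manually insert a known high-quality Q/A pair."""
        self._embeddings.append(embed(question))
        self._answers.append(answer)

    def query(self, question: str) -> str:
        q = embed(question)
        if self._embeddings:
            scores = [float(q @ e) for e in self._embeddings]
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] >= SIMILARITY_THRESHOLD:
                return self._answers[best]  # cache hit: no extra LLM call
        answer = call_llm(question)  # cache miss: pay for the LLM call once
        self.seed(question, answer)  # remember it for similar future questions
        return answer
```

A human-feedback hook could then evict entries whose answers are scored negatively, which is the cache-busting idea mentioned above.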
@jerryjliu0 have you thought at all before about caching queries?
I have a cool PoC for semantic query caching via Pinecone right now (it could use the vector index instead), and I feel like there might be a place in gpt_index to slot this in, as opposed to shipping an external library.
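For the Pinecone route specifically, the cache lookup and write might look roughly like this, assuming the pinecone-client 2.x API of the time; the index name, 0.9 threshold, and metadata layout are illustrative rather than taken from the actual PoC.

```python
import pinecone

# Assumes the pinecone-client 2.x API; the index name, similarity threshold,
# and metadata layout are invented for illustration.
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
cache = pinecone.Index("query-cache")


def lookup(question_embedding: list[float]) -> str | None:
    """Return a cached answer if a sufficiently similar question exists."""
    res = cache.query(vector=question_embedding, top_k=1, include_metadata=True)
    matches = res["matches"]
    if matches and matches[0]["score"] >= 0.9:
        return matches[0]["metadata"]["answer"]
    return None


def store(question_id: str, question_embedding: list[float], answer: str) -> None:
    """Upsert the question embedding with its answer attached as metadata."""
    cache.upsert(vectors=[(question_id, question_embedding, {"answer": answer})])
```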
i've been thinking about it! what did you have in mind more specifically? like being able to re-use the exact same query? or also making use of related queries?
And I am going to seed the cache with high-quality answers via the hypothetical Q/A mechanism I mentioned previously, so that the majority of questions get good answers, because vanilla gpt_index queries are lackluster on the dataset I am using (mostly because I haven't optimized them).
thanks. it seems like this cache is more about approximate similarity than exact similarity. i was thinking about something similar w.r.t. a query cache of previous questions/answers. i'm assuming you'd want the ability to manually "seed" the cache though?
And yes, it would be a useful feature to be able to seed, though I guess "seeding" here could just mean creating a new SimpleVectorIndex at runtime with documents (or loading it from disk).
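A minimal sketch of that seeding route, assuming the gpt_index API of the era (the class referred to here as SimpleVectorIndex shipped as `GPTSimpleVectorIndex`); the seed Q/A pairs and file name are invented for illustration, and exact class/method names may differ across versions.

```python
from gpt_index import Document, GPTSimpleVectorIndex

# Assumes the gpt_index API of the era (GPTSimpleVectorIndex, Document,
# save_to_disk / load_from_disk); names may differ across versions.
seed_pairs = [  # invented Q/A pairs purely for illustration
    ("How do I install gpt_index?", "pip install gpt-index"),
    ("What does the vector index do?", "It embeds documents and retrieves them by similarity."),
]

# "Seeding" is just building a vector index over curated Q/A text at runtime.
seed_docs = [Document(f"Q: {q}\nA: {a}") for q, a in seed_pairs]
cache_index = GPTSimpleVectorIndex(seed_docs)

# Persist and reload instead of re-embedding the seed answers on every run.
cache_index.save_to_disk("query_cache.json")
cache_index = GPTSimpleVectorIndex.load_from_disk("query_cache.json")

# Query against the seeded index; a low-similarity hit could fall back to a fresh query.
print(cache_index.query("How can I install gpt_index?"))
```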