I'm using LlamaIndex to build a question-answering bot on top of my private knowledge base (a set of prepared question-answer pairs in a CSV file). The knowledge base is embedded into LlamaIndex's vector store. Everything works well so far, except for the latency caused by the LLM API calls. I want to improve it like this: when a user asks a question, the bot should search the vector store first; if there is a good match, return the matched answer directly without calling the LLM; if there isn't a good match, fall back to the LLM for an answer. The goal is to reduce unnecessary LLM calls. Does anyone know how to do this? Thanks a ton!
You can set response_mode to no_text to fetch only the similar documents without making an LLM call. Obviously those chunks will be pretty raw and not nicely formatted like an LLM response. There is also a similarity filter you can apply to the fetched nodes: https://gpt-index.readthedocs.io/en/latest/understanding/querying/querying.html
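Here's a minimal sketch of how the two pieces could fit together: a no_text query engine with a SimilarityPostprocessor does the cheap lookup, and only if nothing passes the cutoff do you fall back to the regular LLM-backed query engine. The imports assume a recent llama_index release (older versions import from `llama_index` instead of `llama_index.core`), and the 0.8 cutoff is an arbitrary placeholder you'd tune for your embedding model.

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor import SimilarityPostprocessor


def answer(index: VectorStoreIndex, question: str, cutoff: float = 0.8) -> str:
    # Step 1: retrieval only. response_mode="no_text" skips LLM synthesis,
    # and the postprocessor drops any node scoring below the cutoff.
    lookup = index.as_query_engine(
        response_mode="no_text",
        similarity_top_k=1,
        node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=cutoff)],
    )
    result = lookup.query(question)

    # Step 2: if a node survived the cutoff, return its stored text directly
    # (for a Q/A-pair knowledge base this is the prepared answer).
    if result.source_nodes:
        return result.source_nodes[0].node.get_content()

    # Step 3: no good match -- fall back to the usual LLM-synthesized answer.
    return str(index.as_query_engine().query(question))
```

The fallback query engine call is the only place the LLM gets invoked, so strong matches are answered purely from the vector store.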