
Updated 3 months ago

I'm using llama index to build a

I'm using LlamaIndex to build a question-answering bot based on my private knowledge base (a bunch of prepared question-answer pairs in a CSV file). The knowledge base is embedded into LlamaIndex's vector store. For now everything works well, except for the latency of the LLM API calls. I want to improve it this way: when a user asks a question, the bot should search the vector store first; if there is a good match, it should return the matched answer directly without turning to the LLM, and if there isn't a good match, it should turn to the LLM for an answer. The aim is to reduce unnecessary calls to the LLM. Does anyone know how to do this? Thanks a ton!
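A minimal sketch of that flow, assuming a LlamaIndex `VectorStoreIndex` named `index` has already been built from the CSV pairs; the `SIMILARITY_CUTOFF` value and the `answer` helper are made-up for illustration and would need tuning on your own data (method names like `get_content()` also vary slightly between LlamaIndex versions):

```python
# Check the vector store first; only fall back to the LLM when retrieval
# is not confident enough. `index` is assumed to already exist.
SIMILARITY_CUTOFF = 0.75  # arbitrary example threshold, tune on your data

retriever = index.as_retriever(similarity_top_k=1)
query_engine = index.as_query_engine()  # used only for the LLM fallback


def answer(question: str) -> str:
    nodes = retriever.retrieve(question)
    if nodes and nodes[0].score is not None and nodes[0].score >= SIMILARITY_CUTOFF:
        # Good match: return the stored answer text directly, no LLM call.
        return nodes[0].node.get_content()
    # No good match: fall back to the LLM-backed query engine.
    return str(query_engine.query(question))
```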
4 comments
You can set response_mode to no_text to fetch only the similar documents without making an LLM call. Obviously those chunks will be pretty raw and not nicely formatted like an LLM response. There is also the similarity filter for the fetched nodes: https://gpt-index.readthedocs.io/en/latest/understanding/querying/querying.html
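A rough sketch of that suggestion, again assuming an existing `index`; the import path for SimilarityPostprocessor differs between LlamaIndex versions (this follows the legacy `llama_index` layout), and the 0.8 cutoff is just an example value:

```python
from llama_index.indices.postprocessor import SimilarityPostprocessor

# response_mode="no_text" skips LLM synthesis; the postprocessor drops
# retrieved nodes whose similarity score is below the cutoff.
query_engine = index.as_query_engine(
    similarity_top_k=2,
    response_mode="no_text",
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.8)],
)

response = query_engine.query("How do I reset my password?")
# No LLM call was made; the matched chunks are in response.source_nodes.
if response.source_nodes:
    print(response.source_nodes[0].node.get_content())
else:
    print("No good match found; fall back to the LLM here.")
```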
You mean "SimilarityPostprocessor", right? That is a good direction. Thank you so much!
That's the one