Updated 3 months ago

How to restrict the query response when the input is out of domain

3 comments
Please stand by @Navagridhar Ramsait C as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
When developing your LLM application with LlamaIndex, you may encounter situations where your data is extremely specific (medical, legal, scientific, financial, or other documents with technical lingo) and falls outside the domain that general-purpose embedding models handle well. Here are some strategies you can use to improve the results:

  1. Try out alternate embeddings: Check the MTEB Leaderboard for candidates, or configure a local embedding model by following the Local Embedding Models guide.
  2. Test out fine-tuning of embeddings: Tools like setfit can be used for this purpose. Anecdotally, curating a small annotated dataset from production data has improved retrieval accuracy by ~12%. Even synthetic data generation without human labels has been shown to improve retrieval metrics across similar documents in train / val sets.
  3. Test out sparse and late-interaction retrieval methods: Methods like SPLADE (learned sparse) and ColBERT (late interaction) have been shown to generalize well to out-of-domain data. Some of these methods are starting to become available in enterprise systems, such as Elasticsearch's ELSER.
  4. Evaluate the changes: Check the evaluation principles guide for how you might evaluate the above changes.
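To make the evaluation step concrete, here is a minimal sketch of comparing two retrieval setups by hit rate and mean reciprocal rank (MRR). It is plain Python rather than the LlamaIndex evaluation API, and the query ids, document ids, and the `baseline` / `candidate` rankings are all made-up placeholders; in practice they would come from your own labelled queries and retriever outputs.

```python
from typing import Dict, List


def hit_rate(results: Dict[str, List[str]], expected: Dict[str, str], k: int = 5) -> float:
    """Fraction of queries whose expected document appears in the top-k results."""
    hits = sum(1 for query, doc in expected.items() if doc in results.get(query, [])[:k])
    return hits / len(expected)


def mrr(results: Dict[str, List[str]], expected: Dict[str, str]) -> float:
    """Mean reciprocal rank of the expected document (0 when it is not retrieved)."""
    total = 0.0
    for query, doc in expected.items():
        ranked = results.get(query, [])
        if doc in ranked:
            total += 1.0 / (ranked.index(doc) + 1)
    return total / len(expected)


# Hypothetical labelled data: query id -> the document that should be retrieved.
expected = {"q1": "doc_a", "q2": "doc_b", "q3": "doc_c"}

# Hypothetical ranked outputs from two setups (e.g. before and after changing embeddings).
baseline = {"q1": ["doc_a", "doc_x"], "q2": ["doc_x", "doc_y"], "q3": ["doc_y", "doc_c"]}
candidate = {"q1": ["doc_a", "doc_x"], "q2": ["doc_b", "doc_y"], "q3": ["doc_c", "doc_y"]}

for name, run in [("baseline", baseline), ("candidate", candidate)]:
    print(name, round(hit_rate(run, expected, k=2), 3), round(mrr(run, expected), 3))
```

Running the same labelled queries against each setup keeps the comparison fair; a change is only worth keeping if it moves these numbers on data that resembles production traffic.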
Remember, these strategies improve retrieval accuracy on domain-specific data. On their own, they will not stop the query engine from responding when the input is out of domain.
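One common way to actually refuse out-of-domain input is to gate the answer on retrieval similarity: if nothing in the index is close enough to the query, return a fixed fallback instead of calling the LLM. The sketch below shows the idea in plain Python with toy 3-dimensional vectors. The document ids, vectors, `FALLBACK` text, and the `0.75` cutoff are illustrative assumptions, not values from LlamaIndex. LlamaIndex's node postprocessors include a similarity cutoff that can play the same role; check the current docs for the exact class and parameter names.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Assumed fallback message; adapt the wording to your application.
FALLBACK = "Sorry, this question is outside the scope of the indexed documents."


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query_vec: Sequence[float],
               doc_vecs: Dict[str, Sequence[float]]) -> Tuple[Optional[str], float]:
    """Return the id and score of the most similar document, or (None, -1.0) if empty."""
    best_id, best_score = None, -1.0
    for doc_id, vec in doc_vecs.items():
        score = cosine(query_vec, vec)
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id, best_score


def answer_or_refuse(query_vec: Sequence[float],
                     doc_vecs: Dict[str, Sequence[float]],
                     cutoff: float = 0.75) -> str:
    """Refuse when the best retrieval score falls below the cutoff."""
    doc_id, score = best_match(query_vec, doc_vecs)
    if doc_id is None or score < cutoff:
        return FALLBACK
    # A real application would pass the retrieved text to the LLM here.
    return f"Answer grounded in {doc_id}"


# Toy embeddings standing in for real model output.
docs = {"dosage_guide": [0.9, 0.1, 0.0], "side_effects": [0.7, 0.7, 0.0]}
in_domain_query = [1.0, 0.2, 0.0]
out_of_domain_query = [0.0, 0.1, 1.0]

print(answer_or_refuse(in_domain_query, docs))
print(answer_or_refuse(out_of_domain_query, docs))
```

The cutoff has to be tuned per embedding model, since score distributions differ between models. A prompt instruction telling the LLM to answer only from the retrieved context is a useful second layer, because a similarity gate alone can still let borderline queries through.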
Source: Development Pathway
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------