Moderation, prompt, and response restrictions

Hi guys, the project I work on is building a chatbot assistant. Users can ask it anything, but it should only answer questions on a specific set of subjects and decline everything else. What strategies do you guys use for identifying what the chatbot doesn't know about? Right now I'm trying a reranking postprocessor (FlagEmbedding with BAAI/bge-reranker-large) with a score threshold: if no documents remain after that step, the chatbot treats the query as something it doesn't know how to answer. I still have to tune that threshold, because in my tests a query that was supposed to match a document sometimes gets a reranking score below the threshold, and the chatbot then says it doesn't know about the subject of that query.

Do you guys know of more methods or strategies for restricting the topics a chatbot will answer? Thanks!
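
For reference, the threshold step described above could be sketched roughly like this. This is a minimal sketch, not the poster's actual code: `keep_relevant` and `pick_threshold` are hypothetical helper names, and the scorer is passed in as a plain function so the logic can be tried without loading a model. With FlagEmbedding, the scorer would presumably be the `compute_score` method of a `FlagReranker("BAAI/bge-reranker-large")` instance, assumed here to return one score per (query, passage) pair.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed scorer shape: a list of (query, passage) pairs in, one float per pair out.
Scorer = Callable[[List[Tuple[str, str]]], List[float]]


def keep_relevant(
    query: str,
    passages: Sequence[str],
    scorer: Scorer,
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (passage, score) pairs scoring at or above the threshold, best first.

    An empty result is the signal for the chatbot to say it doesn't know.
    """
    if not passages:
        return []
    scores = scorer([(query, p) for p in passages])
    kept = [(p, s) for p, s in zip(passages, scores) if s >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def pick_threshold(
    in_scope_scores: Sequence[float],
    out_of_scope_scores: Sequence[float],
) -> float:
    """Pick the cutoff that classifies the most labeled examples correctly.

    in_scope_scores: best rerank score for test queries the bot SHOULD answer.
    out_of_scope_scores: best rerank score for test queries it should decline.
    """
    candidates = sorted(set(in_scope_scores) | set(out_of_scope_scores))
    if not candidates:
        raise ValueError("need at least one labeled score")
    best_t, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum(s >= t for s in in_scope_scores)
        correct += sum(s < t for s in out_of_scope_scores)
        if correct > best_correct:  # ties keep the lower (more permissive) cutoff
            best_t, best_correct = t, correct
    return best_t
```

Collecting a small labeled set of in-scope and out-of-scope queries and running something like `pick_threshold` over their best scores is one way to replace hand-tweaking the cutoff. It won't remove false refusals entirely, but it makes the trade-off visible.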

5 comments

verdverm

This is difficult to do in general. The big providers have "moderation" features, but how they work is pretty opaque. Some things they are likely doing:

filtering specific terms (input and/or output)
filtering based on embeddings (input and/or output)
prompt-based techniques
using an LLM to check (input and/or output)

Those are generally more open-ended systems. If you restrict the bot to what it should know about (for example, if RAG doesn't return anything meaningful, reply that you don't know), the problem may get easier.

Prompt injection is another area to look into, as it shows how cleverly people can break these protection systems, to the point that it is likely an impossible problem to solve.
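
To make the layering concrete, here is a rough sketch of chaining input and output checks around a generator. Every name in it (`Verdict`, `blocked_terms_check`, `predicate_check`, `guarded_reply`) is hypothetical, and the embedding-based or LLM-based checks are stood in for by an arbitrary yes/no function, since the actual model calls depend on your stack.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


Check = Callable[[str], Verdict]


def blocked_terms_check(terms: Sequence[str]) -> Check:
    """Hardcoded term filter: reject text containing any listed term."""
    if not terms:
        return lambda text: Verdict(True)
    pattern = re.compile(
        "|".join(r"\b" + re.escape(t) + r"\b" for t in terms), re.IGNORECASE
    )

    def check(text: str) -> Verdict:
        match = pattern.search(text)
        if match:
            return Verdict(False, f"blocked term: {match.group(0)}")
        return Verdict(True)

    return check


def predicate_check(is_ok: Callable[[str], bool], reason: str) -> Check:
    """Wrap any yes/no function (embedding similarity, LLM judge, ...) as a check."""
    return lambda text: Verdict(True) if is_ok(text) else Verdict(False, reason)


def run_checks(text: str, checks: Sequence[Check]) -> Verdict:
    """Run checks in order and stop at the first rejection."""
    for check in checks:
        verdict = check(text)
        if not verdict.allowed:
            return verdict
    return Verdict(True)


def guarded_reply(
    user_input: str,
    generate: Callable[[str], str],
    input_checks: Sequence[Check],
    output_checks: Sequence[Check],
    refusal: str = "Sorry, I can't help with that topic.",
) -> str:
    """Check the input, generate, then check the output before returning it."""
    if not run_checks(user_input, input_checks).allowed:
        return refusal
    reply = generate(user_input)
    if not run_checks(reply, output_checks).allowed:
        return refusal
    return reply
```

Putting the cheap checks (term filters) first and the expensive ones (an LLM judge) last keeps most of the cost off the common path. None of this makes the system injection-proof; it only raises the bar.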

ChicoButico

hey @verdverm thanks for your answer. You gave many ideas for improving moderation. I'll definitely try some of the approaches you mentioned.

verdverm

yeah, when you look at real production AI systems, you will generally see a mix of hardcoded values, pattern-based rules, and ML-powered systems. I saw this kind of thing even before there were LLMs, chatbots, and prompts.

Yolk

@ChicoButico we've just introduced a ZenGuard AI integration to LlamaIndex - https://llamahub.ai/l/llama-packs/llama-index-packs-zenguard. It covers prompt attacks (prompt injection, jailbreaks), topic restrictions (allowed and banned topics), PII, sensitive info and keyword leakage (control what you share vs. what is sent to you), toxicity, and other security features.
