The community member is building a Q&A chatbot using llama-index, which is fed by multiple data sources (Slack, Confluence, Jira, Google Docs). They want to ensure that when a user talks to the bot, it only fetches documents that the user is allowed to see. For example, if a user is allowed to see document X but not document Y, the semantic search should exclude document Y.
The community members discuss potential approaches, with one suggesting the use of strict metadata filtering. The original poster confirms that for some data sources, they can fetch users' permissions directly from the API, but other sources do not expose this information.
There is no explicitly marked answer in the comments.
Hi everyone! I have a general question about RAG and Data Privacy. I'm using llama-index to build a Q&A chatbot, which is fed by multiple data sources (Slack, Confluence, Jira, Google Docs).
Now, when a user talks to the bot, I want it to fetch only the documents that this user is allowed to see. For example, if a user is allowed to see document X but not document Y, I want the semantic search to exclude document Y.
What's the best way of doing that? Are there any best practices around this issue? I couldn't find much information online, especially about llama-index and privacy.
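To make the strict metadata filtering idea from the discussion concrete, here is a minimal, library-agnostic sketch. It is not llama-index code: the `Doc` class, the `allowed_users` field, and the `retrieve` helper are all hypothetical names, and it assumes the permissions were captured from each source's API at ingestion time. Documents with no recorded permissions are excluded (deny by default), which is one possible way to treat sources that do not expose this information.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Set


@dataclass
class Doc:
    doc_id: str
    embedding: List[float]
    # Hypothetical metadata: user ids allowed to read this document,
    # recorded at ingestion time from the source's API.
    allowed_users: Set[str] = field(default_factory=set)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: List[float], docs: List[Doc],
             user_id: str, top_k: int = 3) -> List[Doc]:
    # Strict filter first: a document with no explicit grant for this
    # user never reaches the ranking step (deny by default).
    visible = [d for d in docs if user_id in d.allowed_users]
    # Then rank only the visible documents by similarity to the query.
    ranked = sorted(visible,
                    key=lambda d: cosine(query_embedding, d.embedding),
                    reverse=True)
    return ranked[:top_k]


docs = [
    Doc("X", [1.0, 0.0], {"alice", "bob"}),
    Doc("Y", [0.9, 0.1], {"bob"}),
    Doc("Z", [0.0, 1.0]),  # source exposed no permissions: nobody sees it
]

alice_hits = [d.doc_id for d in retrieve([1.0, 0.0], docs, "alice")]
bob_hits = [d.doc_id for d in retrieve([1.0, 0.0], docs, "bob")]
print(alice_hits, bob_hits)
```

Filtering before ranking means a restricted document cannot leak into the prompt, even when it is the closest semantic match. llama-index vector stores generally accept metadata filters at query time, so the same check could probably be pushed down into the store instead of being done in Python; the exact filter classes and the level of support depend on the version and the vector store in use.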