Hi everyone! I have a general question

At a glance

The community member is building a Q&A chatbot using llama-index, which is fed by multiple data sources (Slack, Confluence, Jira, Google Docs). They want to ensure that when a user talks to the bot, it only fetches documents that the user is allowed to see. For example, if a user is allowed to see document X but not document Y, the semantic search should exclude document Y.

The community members discuss potential approaches, with one suggesting the use of strict metadata filtering. The original poster confirms that for some data sources, they can fetch users' permissions directly from the API, but other sources do not expose this information.

There is no explicitly marked answer in the comments.

ddavid1542

Hi everyone! I have a general question about RAG and Data Privacy. I'm using llama-index to build a Q&A chatbot, which is fed by multiple data sources (Slack, Confluence, Jira, Google Docs).

Now, when a user talks to the bot, I want to fetch documents which this user is allowed to see. For example, if a user is allowed to see document X but not document Y, I want the semantic search to exclude document Y.

What's the best way of doing that? Are there any best practices around this issue? I couldn't find much information online, especially on llama-index and privacy.

2 comments

TTeemu

One approach could be using strict metadata filtering. Do you currently have the permissions defined somewhere regarding what each user can see?

ddavid1542

For some data sources, yes. I can fetch users' permissions directly from the API. Other sources don't expose this kind of information.
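
To make the metadata-filtering idea above concrete, here is a minimal sketch in plain Python. It is not llama-index code. The `Chunk` class, the `allowed_users` field, and the `retrieve` function are hypothetical names made up for illustration. It assumes each chunk is stored together with the set of users allowed to read it, and that chunks from sources which don't expose permissions carry no ACL and are hidden by default. The filter is applied before ranking, so a forbidden document can never reach the LLM context.

```python
from dataclasses import dataclass
from math import sqrt
from typing import FrozenSet, List, Optional, Sequence


@dataclass
class Chunk:
    doc_id: str
    embedding: Sequence[float]
    # Hypothetical ACL metadata captured at ingestion time.
    # None means the source did not expose permissions.
    allowed_users: Optional[FrozenSet[str]] = None


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_embedding: Sequence[float],
    chunks: Sequence[Chunk],
    user_id: str,
    top_k: int = 3,
) -> List[Chunk]:
    # Strict filter first: keep only chunks whose ACL explicitly lists the user.
    # Chunks with an unknown ACL (None) are dropped (deny by default).
    visible = [
        c for c in chunks
        if c.allowed_users is not None and user_id in c.allowed_users
    ]
    # Rank only the visible chunks by similarity to the query.
    ranked = sorted(
        visible,
        key=lambda c: cosine(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:top_k]


chunks = [
    Chunk("X", [1.0, 0.0], frozenset({"alice", "bob"})),
    Chunk("Y", [0.9, 0.1], frozenset({"bob"})),
    Chunk("Z", [1.0, 0.0], None),  # source without permission data
]

print([c.doc_id for c in retrieve([1.0, 0.0], chunks, "alice")])  # ['X']
print([c.doc_id for c in retrieve([1.0, 0.0], chunks, "bob")])    # ['X', 'Y']
```

In llama-index the same idea would typically be expressed through metadata filters passed to the retriever, provided the chosen vector store supports filtering. For the sources that don't expose permissions, denying by default as above is the conservative choice. Whether that is acceptable depends on how much of the corpus it would hide.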
