
Updated 2 months ago

Privacy

hey y'all, I just have kind of a dumb newbie question regarding data privacy and llama index - what, if any, private info is exposed when you send over an index + query to LLMs? say I wanted to analyze my private health care data, would I be at any risk of exposing personal info? is this dependent on the language model used? thanks in advance!
Yea, with default settings all of your data is sent to OpenAI over their API

So at that point you are subject to their privacy policies

You can definitely run a local LLM and embedding model, but you'll need some powerful resources to run that
(And also, open-source models aren't that great yet 🥲)
dang okay i sort of figured that would be the case
im just confused, if im using something like a SimpleVectorIndex doesn't that construct the index locally without sending anything to the LLM? so the only data the LLM gets is the index itself, right?
also thanks for answering me! love how active this community is
When you construct the index, you still need to generate embeddings for the data, so the data gets sent to the embed model (which is openai by default)

Embedding models are easier to run locally though, if that helps
The embed model and the LLM both eventually read the plain text of the data you indexed
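To make that point concrete, here's a toy stdlib-only sketch of what an embedding step does. The hash-based "embedding" below is purely illustrative, not a real model, but the key property is the same as for OpenAI's or any local embedding model: the plain text is the input, so whoever runs the model sees your data.

```python
import hashlib
import math

def toy_embed(text, dim=8):
    """Illustrative only: map text to a fixed-size unit vector by
    hashing words into buckets. Real embedding models are neural
    networks, but like them, this function needs the plain text as
    input -- so whoever runs it sees your data."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# The "index" is just vectors like this one; the privacy question is
# about where the embedding function runs (your machine vs. an API).
print(toy_embed("blood pressure reading 120/80"))
```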
If you only need the embeddings, you can set response_mode="no_text" to only retrieve the nodes, without sending to the LLM.

This still requires an embed model, but you could run that locally as mentioned above (it might still complain about an openai key, but just set that to a random string)

query_engine = index.as_query_engine(response_mode="no_text")
response = query_engine.query("query")
print(response.source_nodes)
got it, this is super helpful! for some reason in my head i totally ignored the embeddings 🤦