Twelve AI Privacy Risks
Deepfakes, Phrenology, Surveillance, and More! A Taxonomy of AI Privacy Risks, by Hao-Ping (Hank) Lee et al., Carnegie Mellon University, United States.

A recent paper discusses twelve privacy risks inherent to the use of artificial intelligence.

If you don’t want to read further, there is one key takeaway.

Effective AI governance will include privacy harm-envisioning techniques performed early in design. I encourage this work to be led by privacy engineering.

These harm-envisioning techniques should be led by privacy engineers well-versed in privacy harms and AI-enabled system privacy risks. It is not sufficient to apply the typical Privacy by Design measure, the Privacy Impact Assessment questionnaire, after design. Instead, engage privacy engineering during initial project ideation. It is also important for the privacy engineer to consider all potential privacy risks, as advances in AI may meaningfully change or exacerbate them.

Now to a summary of this excellent paper, covering the risks in the order they are presented.

1. Data Collection risks

Surveillance: Watching, listening to, or recording an individual's activities without their knowledge or consent. AI systems exacerbate the potential human harm due to their scale and ubiquity.

  • Systems that collect audio, video, or other sensor data could monitor users' activities or behaviors.
  • This risk could be further exacerbated by a capability to link together audio, video, or other sensor data specific to an individual across user accounts.
  • A real-world example is a predictive policing system implemented in Xinjiang, China.

2. Data Processing risks

Identification: Linking specific data points to an individual's identity. AI capabilities create new scalable types of identification threats.

  • Multiple data sources may be combined until they contain enough information to identify or re-identify specific individuals (see the linkage sketch after this list).
  • Processing audio, video, or biometrics to identify or re-identify individuals should occur only when required for a specific purpose; if no such purpose exists, controls should be in place to reduce the identification threat.
  • Impact may be greater, such as a higher decision error rate, if the AI operates on low-quality data.
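The linkage mentioned in the first bullet is easy to illustrate. Below is a minimal sketch, not taken from the paper, of how joining a "de-identified" dataset with a public one on shared quasi-identifiers can re-attach a name to sensitive data; all field names and records are hypothetical.

```python
# Minimal sketch: re-identification by joining two datasets on quasi-identifiers.
# All records and field names are hypothetical.

hospital_visits = [  # "de-identified": names removed, quasi-identifiers kept
    {"zip": "15213", "birth_year": 1984, "gender": "F", "diagnosis": "asthma"},
    {"zip": "15217", "birth_year": 1990, "gender": "M", "diagnosis": "diabetes"},
]

voter_roll = [  # public record: names plus the same quasi-identifiers
    {"name": "A. Smith", "zip": "15213", "birth_year": 1984, "gender": "F"},
    {"name": "B. Jones", "zip": "15217", "birth_year": 1990, "gender": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")

def link(records_a, records_b, keys=QUASI_IDENTIFIERS):
    """Yield merged records where the quasi-identifier values match."""
    index = {tuple(r[k] for k in keys): r for r in records_b}
    for record in records_a:
        match = index.get(tuple(record[k] for k in keys))
        if match is not None:
            yield {**match, **record}  # the name is now attached to the diagnosis

for reidentified in link(hospital_visits, voter_roll):
    print(reidentified)
```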

Aggregation: Combining various pieces of data about a person to make inferences beyond what is explicitly captured. Forecasting behavior and inferring end-user attributes are defining capabilities of AI.

  • Combining data from multiple sources may yield insights about individuals' attributes, behaviors, or preferences.
  • The insights derived may not be responsible or ethical by the norms of society and may carry social and legal impact; inferring that an individual has a specific medical condition, for example, may expose the organization to consequences from acting on that inference.
  • Impact may be greater, such as a higher decision error rate, if the AI operates on low-quality data.

Phrenology/Physiognomy: Inferring personality, social, and emotional attributes about an individual from their physical attributes. AI may learn correlations between arbitrary inputs and outputs that are based on debunked pseudoscience.

  • A project may attempt to infer users' characteristics, traits, or proclivities from their physical appearance or other physical attributes obtained directly or indirectly.
  • Phrenology and physiognomy are debunked pseudosciences and related decisions are likely to create privacy harm.
  • Impacts may go beyond individual harm; historical cases show how these pseudosciences have enabled mass discrimination.

Secondary Use: Using personal data collected for one purpose for a different purpose without end-user consent.

  • Projects may involve repurposing or reusing personal data that was collected for a different original purpose, such as using it as training data.
  • Without informed consent, it is unlikely that users contemplated that their contributed material would be used as training data for a different use case. Training a facial recognition security system on photographs collected for a user photo album would clearly be a secondary use.

Exclusion: Failing to provide end-users with notice and control over how their data is being used. This lack of agency and control is enabling powerful AI systems to be created without individuals being able to exclude their information from the system.

  • Users should be aware that their personal data is being collected and used for the AI system, and should have the ability to consent or opt out.
  • Users should also be given the ability to remove their data from training datasets (a minimal sketch of such a filter follows this list).
  • Take care to understand the data lineage of “public” datasets, and whether the data they contain is truly public.
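As a concrete illustration of the opt-out and removal controls above, here is a minimal sketch of a consent filter in a training-data pipeline. The opted_out registry, record format, and function names are hypothetical, not from the paper or any specific product.

```python
# Minimal sketch: honoring opt-out before personal data reaches a training set.
# The opted-out registry and record format are hypothetical.

opted_out = {"user-123", "user-456"}  # users who withdrew consent for training use

def filter_training_records(records, opted_out_ids):
    """Drop records belonging to users who opted out of training use."""
    return [r for r in records if r["user_id"] not in opted_out_ids]

raw_records = [
    {"user_id": "user-123", "text": "private message"},
    {"user_id": "user-789", "text": "consented contribution"},
]

training_records = filter_training_records(raw_records, opted_out)
print(training_records)  # only user-789's record remains
```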

Insecurity: Carelessness in protecting collected personal data from leaks and improper access. Without sufficient security controls, such as end-to-end encryption, an AI model could gain unexpected access to personal data. AI models may also be attacked to reveal their training data, causing leaks.

  • Measures must be taken to secure and protect any personal data collected or processed by the project.
  • Any use of personal data by the AI system must be authorized and granted through approved procedures for logical access (see the sketch after this list).
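A minimal sketch of what "approved procedures to provide logical access" could look like in code: personal data is released only for an approved purpose-and-role combination. The registry, roles, and record contents below are hypothetical.

```python
# Minimal sketch: purpose- and role-based access to personal data used by an AI system.
# The approved-access registry, roles, and record contents are hypothetical.

APPROVED_ACCESS = {
    ("model-training", "ml-pipeline"),   # (purpose, role) pairs allowed to read this data
    ("support-lookup", "support-agent"),
}

def fetch_personal_data(purpose: str, role: str, user_id: str) -> dict:
    """Return personal data only for approved (purpose, role) combinations."""
    if (purpose, role) not in APPROVED_ACCESS:
        raise PermissionError(f"{role!r} is not authorized to access data for {purpose!r}")
    # A real system would read from an encrypted store and log the access.
    return {"user_id": user_id, "email": "user@example.com"}

print(fetch_personal_data("model-training", "ml-pipeline", "user-42"))
```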

3. Data Dissemination risks

Exposure: Revealing sensitive private information that people typically conceal. Generative AI may reconstruct censored or redacted content, or infer and expose sensitive information, preferences and intentions.

  • Controls need to be designed to prevent the system’s exposure of private information or activities.
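One illustrative control, not prescribed by the paper, is an output filter that redacts obvious personal identifiers before generated text is shown. The patterns below are hypothetical and deliberately incomplete; real systems typically need much more than regular expressions.

```python
# Minimal sketch: redacting obvious personal identifiers from generated output.
# The regex patterns are illustrative and far from exhaustive.
import re

REDACTION_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with a labeled placeholder."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text

print(redact("Contact Jane at jane.doe@example.com or 412-555-0101."))
```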

Distortion: Disseminating false or misleading information about people.

  • Controls need to be designed so that the Generative AI does not generate or disseminate synthetic media (images, audio, video) that could misrepresent or falsely depict individuals.

Disclosure: Revealing and improperly sharing individuals' personal data. AI expands the disclosure risk as it may infer additional information beyond what was captured in the initial data.

  • Determine whether or not the system will share or disclose users' personal data with third parties, and if so, for what purposes.
  • Determine whether or not the information shared or disclosed will contain information that is inferred.

Increased Accessibility: Making personal information more easily accessible to a broader audience than intended. Widely available LLM chatbots are making it easier for a wide audience to access potentially sensitive information.

  • Determine if users' personal data will be more accessible or available to a broader audience than intended.

4. Invasion risks

Intrusion: Actions that disturb one's solitude or encroach on personal space. AI enables ubiquitous and centralized surveillance infrastructures.

  • Products used to monitor the exterior of a home or a business could be used to encroach on the privacy of a neighbor.
  • Employers are increasingly incorporating AI-infused workforce monitoring, connecting data from smartwatches and computer webcams to track performance, attendance, and time-on-task.
  • Consolidation of sources to an aggregator, such as a local police force, could conflict with citizens’ reasonable expectation of privacy.
  • Implementation of AI-enabled centralized surveillance infrastructures may occur without the awareness of the impacted population.

Addressing AI-enabled system privacy risks

The researchers found that, in almost all cases, privacy risks were exacerbated by the AI enablement of the system. Typical approaches to these risks are also insufficient. Consider the following privacy-enhancing technologies and procedures:

  • Differential privacy and federated learning. These approaches apply only to some of the data processing risks and do not specifically address the physiognomy risk (a minimal sketch of the differential privacy idea appears after this list).
  • Data privacy auditing. Privacy impact assessments performed after the model is trained are inherently limited in their ability to mitigate risks arising from data collection and processing.
  • Ethics checklists and toolkits. These tools may be applied earlier than data privacy audits; however, they generally approach privacy risks at a higher level and rely on the practitioner’s own awareness of privacy risks.
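For readers unfamiliar with the first item in the list, here is a minimal sketch of the core differential privacy idea: answer an aggregate query with noise calibrated to the query's sensitivity, so that no single individual's presence changes the result much. The dataset, predicate, and epsilon below are hypothetical, and this is not an example from the paper.

```python
# Minimal sketch of differential privacy: a noisy count query (Laplace mechanism).
# The dataset and epsilon are hypothetical; a counting query has sensitivity 1.
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponential samples."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float = 1.0) -> float:
    """Count matching records, with noise scaled to sensitivity / epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

users = [{"age": a} for a in (25, 31, 44, 52, 29)]
print(private_count(users, lambda r: r["age"] > 30, epsilon=0.5))
```

Smaller epsilon means more noise and stronger privacy; the trade-off against accuracy is why the approach fits some data processing risks but not others.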

AI-specific privacy guidance is required so that these risks are evaluated early in design, prior to AI model training. Care should also be taken, as this research may not have uncovered all potential privacy risks. Four subcategories of privacy risk in Solove’s taxonomy had no relevant incidents among those reviewed in the research: Interrogation, Blackmail, Breach of Confidentiality, and Decisional Interference.

Effective AI governance will include privacy harm-envisioning techniques performed early in design by privacy professionals well-versed in AI-enabled system privacy risks.

This is "early in design" not the typical "after design" questionnaires used for privacy impact assessments.

The paper is available at: https://arxiv.org/pdf/2310.07879.pdf

Isabel Barberá

AI Advisor & Researcher | AI Privacy & Security | AI Risk & Safety | Tech & Legal | PLOT4AI author | ENISA Data Protection Engineering advisor | Expert @CEN/CENELEC JTC21 developing AI European standards

11 months ago

Eric Lybeck, maybe interesting for you: do you know PLOT4ai? It is a threat modeling library containing 86 AI threats. It is open source, and in a couple of weeks it gets an update that includes GenAI, third-party related threats, and adaptation to the latest AI Act text. https://www.plot4.ai It contains an online tool and is also available in card deck format at Agile Stationery

Ece Gumusel

Ph.D. in Information Science Candidate at Indiana University Bloomington

11 months ago

I highly recommend reading our paper on conversational chatbots here: https://arxiv.org/pdf/2402.09716.pdf#:~:text=These%20studies%20emphasized%20the%20following,Manipulation.

Daniel SUCIU

Data Protection & Governance dude | Founding member of Data Protection City | unCommon Sense "creative" | Proud dad of 2 daughters

11 months ago

Right, but at least 10 of these 12 risks are present in other products/services/systems, even without AI. Moreover, even though these have been known for some time, little has been done to mitigate them.

Melissa Gorgei, JD, CIPP/US, CIPP/E

Technology, AI, & Data Privacy Attorney | Manager, Responsible AI @Accenture

11 months ago

Looking forward to taking a look!
