Privacy and AI #16

In this edition of Privacy and AI

- AI & Algorithms in Risk Assessments (ELA, 2023)

- Hamburg DPA position on Personal Data processing by LLMs

- Dutch Data Protection Authority and the supervision of algorithms

- More on Privacy and AI - Data Protection Commissioner

- Privacy Threshold Assessment and Privacy Impact Assessment (Cal Dept Tech)

- AWS guidance to choose GenAI services

- Generative AI Controls Framework (IBM)

- Ethical Use of AI by SMEs - CAN-CIOSC 101:2019

- Foundation models vs task-specific models

- AI's Trillion Dollar Time Bomb (CNBC)

- Generative AI releases


AI & Algorithms in Risk Assessments (ELA, 2023)

The European Labour Authority (ELA) commissioned a report that addresses discrimination, bias, and other ethical issues in the use and development of automated systems.

Interesting points

- Analysis of the non-discrimination law, evaluation of what constitutes unlawful discrimination, and concrete applications in the context of fairness in AI.

- Evaluation of the gaps and limitations of the legal framework (hierarchy of protection, intersectional discrimination, and emergent patterns of discrimination)

- Mitigation framework, where the authors provide mitigations following the CRISP-DM process phases (business understanding, data understanding, data preparation, modelling, evaluation, deployment).

Link here


Hamburg DPA position on Personal Data processing by LLMs

The Hamburg Commissioner for Data Protection and Freedom of Information (HmbBfDI) discussed the applicability of the GDPR to LLMs.

The paper is very important because it clarifies how LLMs work and, in particular, whether they store personal data. It is in fact one of the most relevant papers I've seen lately on privacy and AI, because it addresses several matters related to the processing of personal data by GenAI.

Some highlights

1) Tokens are the basic unit of information processing in LLMs. Tokens are small pieces of text, shorter than words but longer than individual letters. E.g., the text “Mia Muller” might be stored as [M] [ia] [Mu] [ller] (see the tokenization sketch after this list).

2) Specific pieces of text are not stored anywhere within the model. So if the training data contains personal data, the data is converted into abstract mathematical representations. Crucially, “This abstraction process results in the loss of concrete characteristics and references to specific individuals”

3) LLMs do not store personal data; they store only individual tokens. Individual tokens, as language fragments ([M] [ia] [Mu] [ller]), lack individual information content and do not function as placeholders for it. Even the embeddings, which represent relationships between these tokens, are merely mathematical representations of the trained input.

4) The fact that privacy attacks can extract personal data does not mean that LLMs contain personal data.

5) The deployer of an LLM is not responsible for the unlawful training of the LLM by the developer. If the deployer fine-tunes the model, it is then responsible for complying with privacy regulations.

6) Data subject rights:

6.1) Data subjects have no right of action against the LLM provider. This has tremendous consequences since, for instance, the rights to erasure and rectification (in the context of accuracy) are not applicable. This seems to clash with the initial position of the Italian DPA in the ChatGPT case.

6.2) Controllers must fulfil DSRs when using an AI system based on an LLM (e.g., an LLM-powered chatbot) that processes personal data. This is relevant in particular regarding the outputs and database queries.

7) Organizations fine-tuning LLMs must comply with privacy regulations, and the HmbBfDI advises using the least amount of personal data possible (or synthetic data), identifying a proper legal basis, and ensuring DSR fulfilment.

8) Locally hosted LLMs are safer from a privacy perspective

9) When using a third-party LLM integrated into an AI system, the organization must enable the fulfilment of DSRs, ensure protective measures are in place, and assign responsibilities for processing (controller, processor, or joint controllers).
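
To make point 1 more concrete, here is a minimal tokenization sketch using the open-source tiktoken library (my own illustration, not taken from the HmbBfDI paper; the exact splits and token IDs depend on the tokenizer used by a given model):

# Minimal sub-word tokenization sketch (illustrative only).
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common byte-pair-encoding vocabulary

text = "Mia Muller"
token_ids = enc.encode(text)  # the model only ever operates on these integer IDs
pieces = [enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
          for t in token_ids]

print(token_ids)  # integer IDs, not the name itself
print(pieces)     # sub-word fragments, e.g. something like ['M', 'ia', ' M', 'uller']

The fragments and their IDs carry no information about the person on their own; what the model's weights encode are the statistical relationships learned between such fragments.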

Link here


Dutch Data Protection Authority and the supervision of algorithms

The Dutch DPA (AP) highlights its work regarding the supervision of algorithms, in particular the use of AI by public bodies.

The AP cited the following cases since 2023:

- The Education Executive Agency (DUO) used an algorithm to detect scholarship fraud that was discriminatory in nature and lacked any substantiation.

- The UWV illegally used an algorithm to detect fraud with unemployment benefits.

- Some municipalities unlawfully used the ‘fraud score card’.

- There are questions about the use of facial recognition by the police and about the functioning of the Police Public Order Information Team (TOOI).

The AP also highlights:

- its role in the supervision of algorithms, but it points out the need to take an intersectional approach

- its role in the monitoring of big tech processing of personal data

Finally, the AP criticises the government's choice to rely on private organizations to evaluate privacy violations when this should have been the role of the AP. In particular, it mentions that the involvement of private parties often "leads to, for example, duplication, waste of tax money and confusion about the right assessment framework. In addition, private agencies sometimes have access to highly sensitive personal information during such an investigation."

Link here


More on Privacy and AI - Data Protection Commissioner

Some interesting points

- Some AI models have inherent risks relating to the way in which they respond to inputs or “prompts”, such as memorisation, which can cause passages of (personal) training data to be unintentionally regurgitated by the product.

- Sometimes, AI products rely on a process of “filtering” to prevent certain types of data from being provided to a user in response to a query or prompt. In some cases, those filters can be attacked and circumvented to cause such data to be made available or processed in unintended, unauthorised, insecure or risky ways.

- "Consider if you publish personal data on your own website. This may be from your staff or from your website users. You may need to ensure that you protect that personal data from being collected and used for AI training or other processing where you have not already agreed that purpose with your staff or users, or if they do not have a reasonable expectation it will be used for AI training." is this a responsibility of the controller that publishes the data or the company training and algorithm?

- Consider the purpose and goals of your processing and whether there are other non-AI technologies or means of reaching them. These alternatives may be less risky or more appropriate for you.

- Some points on legitimate expectations and processing data for training AI models.

- As well as data protection obligations, you may need to consider other obligations like copyright, safety and security.

Link here


Privacy Threshold Assessment and Privacy Impact Assessment (Cal Dept Tech)

In July 2024, the California Department of Technology updated the Privacy Threshold Assessment and Privacy Impact Assessment to incorporate, among other things, provisions related to Generative AI.

Link here


AWS guidance to choose GenAI services

AWS launched guidance to help organizations determine which AWS GenAI services are the best fit. The guidance could also serve as a blueprint for selecting other GenAI service providers.


1) Understand the range of generative AI services, applications, tools, and supporting infrastructure

AWS GenAI Stack

2) Choose the appropriate foundation model for your particular needs. For this, consider the following (a shortlisting sketch follows the list):

- Modality: the type of data the model processes—text, images (vision), or embeddings

- Model size: number of parameters in the model

- Inference latency: the time it takes for a model to process input and return an output

- Context window: the number of tokens (in LLMs, the amount of text) that the model can consider at any one time when generating responses

- Pricing: cost of using the FM

- Fine-tuning capability (further training a model, pre-trained on a large, generic dataset, on a smaller, task-specific dataset) and continuous pre-training capability (extending the initial pre-training phase with additional training on new, emerging data that wasn't part of the original training set, helping the model stay relevant as data evolves)

- Data quality: considering factors such as relevance, accuracy, consistency, bias, annotation and labelling, data processing

- Data quantity: consider the sufficiency of the data, data augmentation, balancing data, transfer learning

- Quality of the response: evaluate the output of a model based on several quality metrics, including accuracy, relevance, toxicity, fairness, and robustness against adversarial attacks
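
As a purely illustrative sketch (not part of the AWS guidance; the model names and figures below are invented placeholders), these criteria can be captured in a simple data structure and used to shortlist candidate models before the softer quality criteria are evaluated by hand:

# Hypothetical sketch for shortlisting foundation models against hard requirements.
# Model names, prices, and figures are placeholders, not actual AWS data.
candidates = [
    {"name": "fm-small", "modality": "text", "params_b": 8, "context_tokens": 8_000,
     "latency_ms": 150, "price_per_1k_tokens": 0.0002, "fine_tunable": True},
    {"name": "fm-large", "modality": "text", "params_b": 70, "context_tokens": 128_000,
     "latency_ms": 700, "price_per_1k_tokens": 0.0030, "fine_tunable": False},
]

def shortlist(models, *, modality, min_context, max_latency_ms, must_fine_tune=False):
    """Return models meeting the hard requirements; soft criteria (response quality,
    bias, robustness) still need human evaluation on representative prompts."""
    return [
        m for m in models
        if m["modality"] == modality
        and m["context_tokens"] >= min_context
        and m["latency_ms"] <= max_latency_ms
        and (m["fine_tunable"] or not must_fine_tune)
    ]

print(shortlist(candidates, modality="text", min_context=16_000, max_latency_ms=1_000))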

3) Train your employees to use the tool.

This step was not included in the AWS guidance, but I think it's essential. Employees must be trained on key topics such as prompting strategies, what data they cannot use as input, and how to evaluate the outputs.

4) Use: get ready to use the model

Link here


Generative AI Controls Framework (IBM)

IBM published a whitepaper identifying gen AI-specific risks at each layer of the tech stack.

The approach looks at risks in the context of:

- Securing the Data

- Securing the Model

- Securing the Usage

- Securing the Applications

- Securing the Infrastructure

Within each category, one or more controls were identified to address the risks associated with that layer of the stack (except for the Infrastructure Security layer, which does not have specific controls; organizations should leverage the industry or custom framework they use for the rest of their technology landscape).

Link here


Ethical Use of AI by SMEs - CAN-CIOSC 101:2019

In 2019 the Canadian Digital Governance Standards Institute launched CAN-CIOSC 101:2019, which is a standard that specifies minimum requirements for incorporating ethics in the use of AI by SMEs (<500 employees).

The standard provides a framework and process to help SMEs align with international and Canadian guidance and norms on the governance of AI systems (including the OECD's AI Principles, the Treasury Board of Canada Secretariat's Directive on Automated Decision-Making, and the NIST RMF).

The standard was first launched in 2019, amended in 2021, and is now being updated, with the revision open for comments until September.

Link here


Foundation models vs task-specific models

Taken from "Generative AI: Implications for Trust and Governance" (AI Verify Foundation, 2023).

The AI Act uses:

- "General Purpose AI Model" instead of Foundation Models

- "General Purpose AI System" instead of Task-Specific Models

Link here


AI's Trillion Dollar Time Bomb (CNBC)

Experts are sounding the alarm on the widening gap between what big tech companies are spending on AI and what they're getting back from it.


Important points to understand what could happen with the AI hype in the short and long term, and why this wave is different from the blockchain wave.


Link here


Generative AI releases

  • Llama 3.1 - Meta's most capable model to date (Meta)

https://ai.meta.com/blog/meta-llama-3-1/

  • Mistral Large 2 (Mistral)

https://mistral.ai/news/mistral-large-2407/

  • SearchGPT (prototype) (OpenAI)

https://openai.com/index/searchgpt-prototype/



Transparency note: GenAI tools

  1. Has any text been generated using AI? NO
  2. Has any text been improved using AI? This might include an AI system like Grammarly offering suggestions to reorder sentences or words to increase a clarity score. NO
  3. Has any text been suggested using AI? This might include asking ChatGPT for an outline, or having the next paragraph drafted based on previous text. NO
  4. Has the text been corrected using AI and – if so – have suggestions for spelling and grammar been accepted or rejected based on human discretion? YES, the Grammarly app was used for typos and grammar
  5. Has GenAI been used in another way? YES, Google Translate was used to translate materials (e.g., Dutch to English)


Unsubscription

You can unsubscribe from this newsletter at any time. Follow this link to find out how.


ABOUT ME

I'm a senior privacy and AI governance consultant currently working for White Label Consultancy. I previously worked for other data protection consulting companies.

I'm specialised in the legal and privacy challenges that AI poses to the rights of data subjects and how companies can comply with data protection regulations and use AI systems responsibly. This is also the topic of my PhD thesis.

I have an LL.M. (University of Manchester) and a PhD (Bocconi University, Milano).

I'm the author of “Data Protection Law in Charts. A Visual Guide to the General Data Protection Regulation“ and "Privacy and AI". You can find the books here

Privacy and AI (2023) Paperback version available at Amazon
GDPR in Charts (2021)


