Anonymization: the holy grail of data protection.
Mauro Provenzano
CIPP/E | Data Protection Compliance | Privacy & AI | Legal Counsel
GDPR defines anonymous data as data that “does not relate to an identified or identifiable natural person, or to personal data rendered anonymous” such that “the data subject is not or no longer identifiable.” Data that meets this criterion is therefore not subject to the GDPR, making anonymous data the holy grail of data protection. If you can anonymize data, regulations like the GDPR simply no longer apply. From a compliance perspective, anonymous data makes our lives easier.
Despite the GDPR’s reference to anonymous data, and even though European data protection authorities have publicly discussed anonymization for decades, it remains unclear whether anyone really knows what “anonymization” means in practice.
What is the difference between anonymisation and pseudonymisation?
'Anonymisation' means that individuals are not identifiable and cannot be reidentified by any means reasonably likely to be used (ie, the risk of reidentification is sufficiently remote). Anonymous information is not personal data and data protection law does not apply.
By contrast, 'pseudonymisation' means that individuals are not identifiable from the dataset itself, but can be identified by referring to other information held separately. Pseudonymous data is therefore still personal data and data protection law applies. For example, Recital 26 of the GDPR makes it clear that personal data which has undergone pseudonymisation remains in scope of the law.
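To make the distinction concrete, here is a minimal Python sketch of one common pseudonymisation technique: replacing a direct identifier with a keyed token. The key, field names, and records are illustrative assumptions; the point is that whoever holds the separately stored key (and the original data) can still re-link individuals, so the output remains personal data under the GDPR.

```python
import hmac
import hashlib

# Secret key held separately from the dataset (e.g. in a key vault).
SECRET_KEY = b"example-key-stored-elsewhere"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a keyed, consistent token."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

records = [
    {"email": "alice@example.com", "purchase": 42.0},
    {"email": "bob@example.com", "purchase": 13.5},
    {"email": "alice@example.com", "purchase": 7.25},
]

# The shared dataset carries tokens instead of emails. The same person
# always maps to the same token, so analysis across records still works,
# which is also why this is pseudonymisation rather than anonymisation.
pseudonymized = [
    {"user": pseudonymize(r["email"]), "purchase": r["purchase"]} for r in records
]
```

Note that the consistency of the token is exactly what keeps individuals linkable: reversibility lives in the key, not in the dataset itself.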
From a regulatory point of view, going back to 2007, the predecessor of the EDPB, the ‘Working Party 29’ (WP), issued an opinion that clearly articulated the difference between anonymization and pseudonymization. It defined ‘pseudonymization’ as privacy-protective but technically reversible, whereas 'anonymization' was defined as follows: “Disguising identities can also be done in a way that no reidentification is possible, e.g. by one-way cryptography, which creates in general anonymized data.”
But later, in its 2014 opinion on anonymization techniques, the WP revisited the distinction, warning that treating pseudonymized data as equivalent to anonymized data could create a data protection hazard: “Pseudonymity allows for identifiability and therefore stays inside the scope of the legal regime of data protection.”
In this new analysis, the difference between anonymization and pseudonymization lies in the likelihood of reidentification. However, several studies have since demonstrated that it is remarkably difficult to perfectly anonymize data, meaning some possibility of reidentification often remains.
Therefore, the WP enumerated three specific reidentification risks:
• singling out: the possibility of isolating records that identify an individual in the dataset;
• linkability: the ability to link at least two records concerning the same individual, whether in the same database or in different ones; and
• inference: the possibility of deducing, with significant probability, the value of an attribute from the values of other attributes.
Hence, it seemed that whenever those risks are addressed, an organization can rely on its anonymization process.
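As an illustration of why these risks matter, the classic linkage attack joins a supposedly deidentified dataset with a public auxiliary dataset on quasi-identifiers such as postcode, birth year, and sex. All records and names below are fabricated for the sketch:

```python
# "Deidentified" dataset: direct identifiers removed, quasi-identifiers kept.
deidentified = [
    {"zip": "02138", "birth_year": 1954, "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_year": 1988, "sex": "M", "diagnosis": "asthma"},
]

# Auxiliary public dataset (e.g. a voter roll) that still contains names.
auxiliary = [
    {"name": "A. Smith", "zip": "02138", "birth_year": 1954, "sex": "F"},
    {"name": "B. Jones", "zip": "02140", "birth_year": 1971, "sex": "M"},
]

def link(deid, aux):
    """Reidentify records by joining the two datasets on quasi-identifiers."""
    matches = []
    for d in deid:
        for a in aux:
            if (d["zip"], d["birth_year"], d["sex"]) == (a["zip"], a["birth_year"], a["sex"]):
                matches.append({"name": a["name"], "diagnosis": d["diagnosis"]})
    return matches

reidentified = link(deidentified, auxiliary)
# One record is relinked to a named person despite the removal of identifiers.
```

Removing direct identifiers alone addresses none of the three risks when a unique combination of quasi-identifiers survives in the released data.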
However, the WP also stated that “It is critical to understand that when a data controller does not delete the original (identifiable) data at event-level, and the data controller hands over part of this dataset (for example after removal or masking of identifiable data), the resulting dataset is still personal data. Only if the data controller would aggregate the data to a level where the individual events are no longer identifiable, the resulting dataset can be qualified as anonymous.”
Therefore, in addition to addressing the reidentification risks, both aggregation and destruction of the raw data are needed to achieve anonymization.
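The aggregation the WP describes can be sketched as rolling event-level records up into counts and suppressing any cell small enough to single out an individual. The records and the threshold K are illustrative assumptions; the original event-level data would, per the WP, still need to be deleted.

```python
from collections import Counter

K = 3  # suppression threshold: publish only cells covering at least K individuals

# Event-level records: each row describes one individual's event.
events = [
    {"city": "Paris", "service": "cardiology"},
    {"city": "Paris", "service": "cardiology"},
    {"city": "Paris", "service": "cardiology"},
    {"city": "Paris", "service": "oncology"},
    {"city": "Lyon", "service": "cardiology"},
]

counts = Counter((e["city"], e["service"]) for e in events)

# Publish only aggregate cells large enough that no individual event can be
# singled out; small cells are suppressed entirely rather than released.
aggregate = {cell: n for cell, n in counts.items() if n >= K}
```

Here only the (Paris, cardiology) cell survives; the two single-event cells are dropped because each count of 1 points to exactly one person.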
Nevertheless, EU regulators still vacillate today between these WP standards. Some in fact state that a residual risk of reidentification is acceptable as long as the right precautions are in place.
What can organizations attempting to comply with these standards and aiming to meet EU anonymization requirements do?
1. The first option is to give up on anonymizing data entirely and simply treat all deidentified data as pseudonymized. While pseudonymized data does not fall outside the scope of EU data protection law, because reidentification is still possible, the compliance burden on pseudonymous data can be significantly lighter, assuming the processing purpose is legitimate, a legal basis is established (or the secondary purpose is considered compatible with the initial purpose) and the data controller is not in a position to identify individuals (making most individual rights virtually inapplicable, except the rights to information and to object).
2. The second option lies in arguing that the means of reidentification are not reasonably likely to be used, relying more heavily on the WP's 2007 opinion than on its 2014 opinion. The question becomes: how can an organization argue that, even though some reidentification risk remains, the risk is sufficiently remote and its data is therefore anonymous?
The answer lies in context: account must be taken of all the means reasonably likely to be used for identification by the controller and by third parties, paying special attention to the current state of technology, given the increase in computational power and tools available.
3. The next option is to rely on what are called “trusted third parties” (TTPs), which can serve as intermediaries between organizations possessing the raw data and those who seek to use anonymous data. Specifically, when one party wants to share anonymous data with a secondary organization, a TTP can “broker” the exchange, implementing deidentification techniques on the raw data, which remains under the control of the original party, while sharing the deidentified data with the secondary organization.
4. Synthetic data is also an option for many companies. It consists of creating new data from a sample set of data that preserves the correlations within that sample set but does not recreate any direct identifiers. It is artificial data generated from original data and a model that is trained to reproduce the characteristics and structure of the original data. This means that synthetic data and original data should deliver very similar results when undergoing the same statistical analysis. The degree to which synthetic data is an accurate proxy for the original data is a measure of the utility of the method and the model. The generation process, also called synthesis, can be performed using different techniques, such as decision trees or deep learning algorithms.
The use of synthetic data is growing in the health care space in particular, offering a promising way to extract trends from health data without using patient identifiers directly. Indeed, one such solution was designated as anonymous data under GDPR standards by the CNIL (France).
5. Finally, differential privacy is a mathematical privacy framework that also holds promise for anonymization. The framework is a method of inserting controlled randomization into a data analysis process, resulting in limits on the amount of personal information inferable by any attacker.
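The synthesis idea in option 4 can be sketched with a deliberately minimal model: fit a bivariate Gaussian to two correlated numeric attributes and sample fresh records from it. The sample records and the model choice are illustrative assumptions; production systems use far richer models such as decision trees or deep generative networks.

```python
import random

random.seed(0)

# Original numeric records: two correlated attributes (hypothetical ages and spends).
original = [(25, 120.0), (32, 180.0), (41, 260.0), (29, 150.0), (55, 400.0)]
ages, spends = zip(*original)
n0 = len(original)

# Fit a simple bivariate Gaussian: means, sample standard deviations, correlation.
mu_a, mu_s = sum(ages) / n0, sum(spends) / n0
sd_a = (sum((a - mu_a) ** 2 for a in ages) / (n0 - 1)) ** 0.5
sd_s = (sum((s - mu_s) ** 2 for s in spends) / (n0 - 1)) ** 0.5
rho = sum((a - mu_a) * (s - mu_s) for a, s in original) / ((n0 - 1) * sd_a * sd_s)

def sample_synthetic(n):
    """Draw artificial records that mimic the fitted distribution
    without copying any original row."""
    rows = []
    for _ in range(n):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        age = mu_a + sd_a * z1
        spend = mu_s + sd_s * (rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
        rows.append((age, spend))
    return rows

synthetic = sample_synthetic(1000)
```

The synthetic sample reproduces the means and the positive age/spend correlation of the original data (its utility), while containing no row copied from the source.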
The Information Commissioner’s Office’s approach
The UK ICO’s recent guidelines on anonymization aim to bring some light to the ‘full anonymization’ vs. ‘effective anonymization’ discussion. The aim is to help organisations unlock the potential of data by putting PETs into practice.
Privacy Enhancing Technologies (PETs) are technologies that can help organisations share and use people’s data responsibly, lawfully, and securely, including by minimising the amount of data used and by encrypting or anonymising personal information. The draft PETs guidance explains the benefits and different types of PETs currently available, as well as how they can help organisations comply with data protection law.
The special mention from the UK DPA is, again, for ‘Differential Privacy’: a method for measuring how much information the output of an algorithm reveals about an individual. It is based on the randomised injection of “noise”. Noise is a random alteration of data in a dataset so that values such as direct or indirect identifiers of individuals are harder to reveal, depending on the level of added noise.
However, data processed by a differentially private algorithm does not necessarily result in anonymous information. If you do not properly configure differential privacy with the right amount of noise, there is a risk of personal data leakage across a series of different queries. The key then resides in the precise injection of noise.
Benefits of effective anonymisation:
• better understand the legal requirements about the information you hold and intend to share or disclose;
• improve your decision-making and risk reduction and management processes;
• adopt a data protection by design approach;
• protect individuals’ identities;
• reduce reputational risks caused by inappropriate or insecure disclosure or publication of personal data;
• reduce questions, complaints or disputes about your disclosure of information derived from personal data; and
• develop greater confidence in publishing anonymous information in rich, re-useable formats.
Wider benefits of anonymisation include:
• developing greater public trust and confidence that data is being used for the public good, while privacy is protected (ie by ensuring legally required safeguards are in place and being complied with);
• greater transparency as a result of organisations being able to make anonymous information more widely available;
• incentivising researchers and others to use anonymous information instead of personal data, wherever this is possible;
• economic and societal benefits deriving from the availability of rich data sources; and
• improved public authority accountability through better availability of information about service outcomes and improvements.
Benefits of pseudonymisation:
An overarching benefit of pseudonymisation is that it can make your data protection compliance simpler in a number of areas. The general processing regime in the GDPR provides a number of examples, such as:
• general analysis – Recital 29 of the GDPR incentivises you to adopt pseudonymisation not just as a security measure. This is because it enables you to undertake ‘general analysis’ of pseudonymised datasets that you hold, provided you put in place appropriate technical and organisational measures;
• purpose limitation – pseudonymisation is one of the factors you should take into account when deciding if further processing for a new purpose is compatible with your original purpose. This is also one of the important safeguards for processing personal data for scientific, historical and statistical purposes;
• data protection by design – pseudonymisation is one of the key ways in which you can implement appropriate safeguards for the personal data you process, both at the design stage and throughout any project lifecycle;
• security – pseudonymisation is referenced as one of the ‘appropriate technical and organisational measures’ in both the security principle and the specific provisions on security of processing;
• personal data breach notifications – pseudonymisation techniques can reduce the risk of harm to individuals that may arise from personal data breaches. This will assist you in assessing when you need to notify individuals (both anonymisation and pseudonymisation techniques have application here); and
• individual rights – employing pseudonymisation techniques may reduce the amount of data you have to consider when responding to requests from individuals.
How do we proceed?
Unfortunately, there is no one-size-fits-all approach to anonymization or pseudonymization for organizations seeking to comply with EU data protection standards. There are, however, a host of concrete options, and clear arguments, that organizations can use to get value out of their data while ensuring it remains protected.