Is data anonymous when we remove the personal identifiers?
This is a question we get all the time, and answering it requires some knowledge on the legal-tech side.
Anonymisation is a critical piece of the digital health scenario and is really hard to achieve.
Sharing health data for the purposes of data analysis, product improvements, and research can have huge benefits. The question is how to do so in a way that protects individual privacy but still ensures that the data is of sufficient quality and that the analytics are useful.
When personal identifiers are removed from data, the result can be considered de-identified data, but it may not always be fully anonymous.
It's important to note that the level of anonymisation necessary will depend on the sensitivity of the data, the purpose for which it will be used, and the relevant legal and ethical requirements.
Why is it so important?
Because if the data is not anonymous, you will need some legal basis (like consent) to share or use it.
What are personal identifiers?
Personal Identifiers (PID) are a subset of personal data which identify an individual and can permit another person to “assume” that individual's identity, such as:
Data is pseudonymised (or de-identified) when it doesn’t contain explicit personal data but only unique references to it. Pseudonymisation is a good and relatively easy-to-manage security technique to make sensitive health data less explicit while still linking it to a physical subject.
Remember that pseudonymised data are still personal data according to the GDPR.
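To make the idea concrete, here is a minimal Python sketch of pseudonymisation, assuming a keyed hash (HMAC-SHA256) is used to build the unique reference. The field names, the sample record, and the key are hypothetical, and this is only one possible approach, not a recommended implementation. Whoever holds the key can link the pseudonym back to the patient, which is why the output is still personal data.

```python
import hashlib
import hmac

# Hypothetical secret key: in practice it would be stored separately from the data.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymise_id(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for patient_id using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


def pseudonymise_record(record: dict) -> dict:
    """Drop direct identifiers and keep only a pseudonym plus the clinical fields."""
    direct_identifiers = {"patient_id", "name", "email"}  # illustrative field names
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["pseudonym"] = pseudonymise_id(record["patient_id"])
    return cleaned


record = {
    "patient_id": "P-001",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "diagnosis": "asthma",
}
pseudo = pseudonymise_record(record)
```

The same patient always gets the same pseudonym, so records can still be linked over time without showing the name or the original ID.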
What is anonymisation?
In brief, anonymisation is the process of removing personal data and then treating the remaining data to remove indirect identifiers. There must be nothing, not a single piece of information, that links the data back to a patient.
Anonymization is relevant when health data is used for secondary purposes. Secondary purposes are generally understood to be purposes that are not related to providing patient care. So, things such as research, healthcare, and marketing would be considered secondary purposes.
The standard techniques to perform anonymisation include:
- Generalisation
- Swapping
- Perturbation
- Aggregation
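As a rough illustration of the four techniques listed above, here is a small Python sketch. The function names, the sample records, and the parameters (band width, noise scale) are all assumptions made for the example. Applying these functions does not by itself make a dataset anonymous.

```python
import random
from collections import Counter


def generalise_age(age: int, width: int = 10) -> str:
    """Generalisation: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def swap_column(rows: list[dict], column: str, rng: random.Random) -> list[dict]:
    """Swapping: shuffle the values of one column across the records."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def perturb(value: float, scale: float, rng: random.Random) -> float:
    """Perturbation: add bounded random noise to a numeric value."""
    return value + rng.uniform(-scale, scale)


def aggregate_counts(rows: list[dict], column: str) -> dict:
    """Aggregation: publish group counts instead of individual rows."""
    return dict(Counter(row[column] for row in rows))


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
patients = [
    {"age": 34, "postcode": "38122", "weight_kg": 70.0},
    {"age": 37, "postcode": "38068", "weight_kg": 82.5},
    {"age": 61, "postcode": "38122", "weight_kg": 64.0},
]

generalised = [{**p, "age": generalise_age(p["age"])} for p in patients]
swapped = swap_column(generalised, "postcode", rng)
noisy = [{**p, "weight_kg": perturb(p["weight_kg"], 2.0, rng)} for p in swapped]
age_band_counts = aggregate_counts(generalised, "age")
```

Each step trades some precision for privacy, so the right mix depends on what the analysis still needs from the data.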
Keep in mind that anonymous data is not covered by GDPR.
In healthcare, anonymisation allows for the sharing of health information when it’s not mandated or practical to obtain consent, and when the sharing is discretionary and the data controller doesn’t want to share that data in identifiable form.
It is very hard to achieve full anonymisation. The point is this: before it became anonymous, the data was either personal or pseudonymised, and at that stage it was still under your responsibility (as a data controller).
Turning pseudonymised data into anonymous data is still considered a processing activity. Here, you still have obligations under the GDPR: this means that you need consent or another proper legal basis.
To guarantee that the data is anonymous, you have to really guarantee that it can’t be re-identified with any other publicly available dataset.
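One simple way to start probing re-identification risk is to check how many records share the same combination of indirect identifiers, which is the idea behind k-anonymity. The Python sketch below is only a first screening step under assumed field names and sample data. A high group size does not prove that data is anonymous, but a group of one is a clear warning sign.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def smallest_group(rows, quasi_identifiers):
    """Return the size of the smallest group (the 'k' in k-anonymity), or 0 if empty."""
    sizes = group_sizes(rows, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def unique_rows(rows, quasi_identifiers):
    """Return the records whose quasi-identifier combination appears only once."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [
        row for row in rows
        if sizes[tuple(row[q] for q in quasi_identifiers)] == 1
    ]


# Hypothetical dataset with direct identifiers already removed.
dataset = [
    {"age_band": "30-39", "postcode": "381", "diagnosis": "asthma"},
    {"age_band": "30-39", "postcode": "381", "diagnosis": "diabetes"},
    {"age_band": "60-69", "postcode": "380", "diagnosis": "asthma"},
]

k = smallest_group(dataset, ["age_band", "postcode"])
at_risk = unique_rows(dataset, ["age_band", "postcode"])
```

In this sample, the only person aged 60-69 in postcode area 380 stands alone, so anyone who knows those two facts from another dataset could learn the diagnosis.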
If you are interested in learning more about this, we talked about the legal basis for anonymisation here!
What is de-identified data?
De-identified data refers to information that has had personal identifiers removed, such as names, addresses, and Social Security numbers. However, even after removing personal identifiers, it may still be possible to re-identify individuals through the remaining data, especially if it contains other sensitive information or is combined with other datasets. This is known as re-identification risk.
While the de-identification and anonymisation process both look to remove key identifiers from data, they take different approaches that result in differing outcomes.
De-identification is an important capability. It looks at a single item and removes sensitive information, such as the person’s name or social security number, so outsiders can’t tell who it is. What’s considered sensitive depends on the use case. In the case of clinical trials, it could be a patient’s current health information or medical history.
De-identification involves protecting fields covering things like demographics and individuals’ socioeconomic information. This can be useful if you are training an ML model.
So, is de-identified data anonymous?
As we have seen, de-identification is part of the anonymisation process, BUT on its own it is not anonymisation according to the GDPR (although it is useful for data minimisation).
In fact, de-identification only sometimes achieves anonymity, because there are so many other sources of data in the outside world that still hold information that can help re-identify individuals. Re-identification therefore remains possible, and data in that state is pseudonymous, not anonymous.
With true anonymisation, on the other hand, re-identification is not possible, no matter what other information you have in your hands!
So, if you have the idea to perform a de-identification process on your data set, remember that you will need consent (or any other legal basis) to process those data!
What about aggregated data? Are they anonymous?
The answer is: technically, yes. Let’s see why:
- Aggregated data can be anonymous, but it depends on the level of aggregation and the data used. Aggregated data refers to data that has been combined or summarised from individual-level data to provide insights at a higher level.
- If the level of aggregation is high enough and there is enough variability in the data, it can be difficult or even impossible to identify individuals from the aggregated data. On the other hand, if the level of aggregation is low or there are only a few data points, it may be possible to re-identify individuals.
- In addition, even if the aggregated data is anonymous, there may still be privacy concerns if the data is sensitive.
For these reasons, we suggest you conduct an anonymisation assessment to protect yourself.
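The point about low aggregation levels can be sketched in a few lines of Python. The example below counts records per group and hides any group below a minimum size, a practice often called small-cell suppression. The threshold of 5, the field name, and the data are assumptions for illustration only; the right threshold depends on your own assessment.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # illustrative threshold, not a legal or regulatory value


def aggregate_with_suppression(rows, column, min_count=MIN_CELL_SIZE):
    """Count records per group and replace counts below min_count with None."""
    counts = Counter(row[column] for row in rows)
    return {
        group: (count if count >= min_count else None)
        for group, count in counts.items()
    }


# Hypothetical visit records: 12 from one region, 2 from another.
visits = [{"region": "North"}] * 12 + [{"region": "South"}] * 2
report = aggregate_with_suppression(visits, "region")
```

Here the "South" group has only two records, so its count is withheld, while the larger "North" group is reported as is.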
Use case: Is it better to de-identify or anonymise data for clinical trials?
Under HIPAA's Safe Harbor method, companies and hospitals must remove a host of potential identifiers, including names, email addresses, IP addresses, social security numbers, patient IDs, and biometric identifiers.
Expert determination, meanwhile, requires the evaluation of de-identifying techniques by someone with knowledge and experience in this area to verify that the overall risk of re-identification is small. In the case of GDPR, data anonymisation is required to ensure that an individual’s personal data cannot be reconstructed and used.
In practice, de-identification and anonymisation help clear the way for improved clinical trial speed without sacrificing patient privacy. It’s worth noting, however, that regulatory obligations are a moving target: while HIPAA currently requires de-identification, this may change as larger and larger data sets are leveraged to inform new healthcare efforts.
De-identification is now considered the “base expectation” for data handling in clinical trials. And while it’s possible to strike a balance between privacy and clinical trial procedures using this method, statistical analysis at scale typically leans on anonymization.
However, it is important to ensure that these data are collected and used in a way that protects patient privacy and maintains data security.
Never use anonymisation to avoid consent for data processing: you can’t say, “We don’t need consent” or “We don’t need to comply with GDPR” just because you declare that the data is anonymous.
Want to know more?
I really hope you enjoyed this content, and I’d love to hear your thoughts in the comments!
If you want to know more, go to our blog, contact us, or visit our website www.chino.io
See you soon!
Jovan Stevovic - CEO at Chino.io