D = For data

Willem K.

Practice Lead Auto |Q Noord | Community Enthusiast | Testpeditionist|

Published: February 18, 2025

Let’s face it: we need data for everything we do as testers. And if this data holds value to us, we consider it information. In a time of extensive data collection and increasing concern for privacy, data is a hot topic.

As automation engineers we work with data on a daily basis. We:

- Generate test data

- Anonymize large datasets

- Ensure data is protected

Types of data

If you asked me to explain the difference between synthetic, masked and anonymous test data, I probably couldn’t give you a straight answer right away. So examining this is of great value to me as well.

Let’s see what we can find:

Personal data[1]

Data by itself is not inherently personal. But when it either directly or indirectly refers or relates to an individual, it becomes personal data. So a birthday entry by itself is not personal data, but a combination of fields such as “07-07-1977” and the name “John Doe” is. So what about a common name, like John Smith in the US or Piet de Vries in the Netherlands? Is this considered personal data? It is if additional context narrows it down. For example, Piet de Vries living at Herengracht 44 is personal data.

Anonymous data[2]

So what is anonymous data? According to the European Union’s data protection law, in particular the General Data Protection Regulation (GDPR), anonymous data is “information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable”.

Let’s break this down: “information which does not relate to an identified or identifiable natural person”. In this context:

Identified means someone who can be directly identified from the data.

Identifiable means someone who can be recognized indirectly from the available data.

Identified:

- Name

- Date of birth

- Social security number

Identifiable:

- IP address

- MAC address

- Geolocation

Now for the second part: “personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable”. So when anonymizing test data, it’s important that the data subject is no longer identifiable. If you anonymize all directly identifying data but forget the indirectly identifying data, it’s still possible to identify someone. Also don’t forget the other ways data can be used to draw conclusions. For example: you work at the traffic fine department as a tester. All personal data has been anonymized and street addresses are static; however, the number of traffic fines is not altered. In such a situation you can still work out that your next-door neighbor received 5 traffic fines in a 3-month period, even though he is known in the test data by a different name and probably also drives a different type of car.
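To make the traffic fine scenario concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the rows, the field names and the two helper functions are assumptions, not taken from a real system. It shows that replacing names does little when an untouched field such as the address still points at exactly one record.

```python
from collections import Counter

# Hypothetical "anonymized" test rows: names are replaced,
# but addresses and fine counts are left untouched.
test_rows = [
    {"name": "Person A", "address": "Herengracht 44", "fines_last_3_months": 0},
    {"name": "Person B", "address": "Herengracht 46", "fines_last_3_months": 5},
    {"name": "Person C", "address": "Herengracht 48", "fines_last_3_months": 1},
]


def lookup_by_known_fact(rows, field, value):
    """Return the rows matching something the tester already knows from outside the dataset."""
    return [row for row in rows if row[field] == value]


def unique_combinations(rows, fields):
    """Return the value combinations of the given fields that occur in exactly one row."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# The tester knows where the neighbor lives, so the replaced name does not help.
neighbor = lookup_by_known_fact(test_rows, "address", "Herengracht 46")

# Every address here is unique, so every row can be singled out by address alone.
risky = unique_combinations(test_rows, ["address"])
```

A check like `unique_combinations` is only a rough signal. A field combination that singles out one row deserves a closer look before the dataset is called anonymous.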

Synthetic data

So next up: what is synthetic data? Again, the European Data Protection Supervisor helps us out[3] with what is, in my opinion, a very clear definition: “Synthetic data is artificial data that is generated from original data and a model that is trained to reproduce the characteristics and structure of the original data.” So the data is generated using existing data. Does this mean synthetic data guarantees anonymity? Well…[4] According to this very clear article written by Marina Anagnostaki, it really depends on the feasibility and possibility of re-identification.
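As a rough illustration of “original data plus a model”, the Python sketch below fits a deliberately simple model (just the mean and standard deviation of one numeric column) and samples new values from it. The sample amounts and the function names `fit_model` and `generate` are hypothetical, and real synthetic data tooling models far more structure than this.

```python
import random
import statistics

# Hypothetical original data: fine amounts in euros taken from a production table.
original_amounts = [95.0, 120.0, 60.0, 240.0, 95.0, 150.0]


def fit_model(values):
    """'Train' a toy model: keep only the mean and standard deviation of the column."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}


def generate(model, count, seed=None):
    """Sample new values from a normal distribution with the learned parameters."""
    rng = random.Random(seed)
    return [
        max(0.0, round(rng.gauss(model["mean"], model["stdev"]), 2))
        for _ in range(count)
    ]


model = fit_model(original_amounts)
synthetic_amounts = generate(model, count=10, seed=42)
```

The generated values follow the shape of the original column without copying its rows. As the article referenced above points out, that alone is no guarantee of anonymity: a model that follows the original data too closely can still leak it.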

Pseudonymous data[5]

So what is pseudonymous data, and how does it differ from synthetic data? The main difference is that pseudonymous data is still derived from real data and has a higher risk of re-identification. So for testing, synthetic data would be the weapon of choice.
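One common way to pseudonymize a field is a keyed hash. The Python sketch below uses the standard library’s `hmac` module for this. It is an example of the idea, not a method prescribed by the sources cited here, and the key and the `pseudonymize` helper are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key. In practice it would be stored outside the test environment.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a real value with a stable pseudonym derived from it with a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability in test data


record = {"name": "Piet de Vries", "fines_last_3_months": 5}
pseudonymous_record = {**record, "name": pseudonymize(record["name"])}
```

The same input always yields the same pseudonym, which keeps relations between tables intact. It also means that anyone holding the key can recompute the pseudonym for a known name and link the record back to a real person, which is why the re-identification risk is higher than with synthetic data.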

Use cases where test data matters most

Large amounts of test data are usually required for the more intensive tests, such as E2E or performance tests, and for testing database migrations.

Conclusion

For testing, synthetic data is your friend. As a tester you work risk-based, and working with sensitive data introduces additional risks. To get data that matches the requirements of your test scenarios, it’s best to discuss this with a (senior) data engineer who can synthesize the data for you. In addition, there are several tools available, such as DATPROF - Test Data Simplified, that offer out-of-the-box solutions.

[1] https://geo-data-support.sites.uu.nl/personal-data/personal-vs-anonymous-data/

[2] https://www.edps.europa.eu/system/files/2021-04/21-04-27_aepd-edps_anonymisation_en_5.pdf

[3] https://www.edps.europa.eu/press-publications/publications/techsonar/synthetic-data_en

[4] https://www.datenschutz-notizen.de/synthetic-data-anonymized-data-or-pseudonymized-data-3541386/

[5] https://www.dataprotection.ie/en/dpc-guidance/anonymisation-pseudonymisation
