
K-Anonymity's process for protecting the data of its users

Mitch N.

Founder and Managing Partner | Comprehensive Solutions for Growth

Published: November 17, 2021

In the wake of the Facebook-Meta announcement, discussions have intensified around the need for reimagined user data protection regulations and what users can do to protect their data [1]. It is equally essential, however, that organizations handle user data responsibly. Protecting sensitive information is one of the primary responsibilities of data owners, both to preserve the data's integrity and to keep users' trust. Anonymizing data is not as simple as redacting or randomizing it: even when no single attribute identifies a person, a combination of attributes often can. Anonymized data can be re-identified the way jigsaw pieces complete a picture, by finding attributes that two datasets have in common and linking them to build a more detailed profile. For example, knowing someone's postal code, date of birth, and gender can be enough to find their record in medical or voter registration data.
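The linkage described above can be illustrated with a minimal sketch. The two tables, their column names (`zip`, `dob`, `sex`), and the `link` helper are all hypothetical; the point is only that removing names from a dataset does not prevent a join on the attributes it shares with another dataset.

```python
# Hypothetical data: a "de-identified" medical table (no names) and a
# public list that still carries names alongside the same attributes.
medical = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1962-03-14", "sex": "M", "diagnosis": "asthma"},
]
voters = [
    {"name": "Alice", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"name": "Bob", "zip": "02140", "dob": "1962-03-14", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")

def link(released, auxiliary, keys=QUASI_IDENTIFIERS):
    """Join two tables on shared quasi-identifiers (a linkage attack) and
    return (name, diagnosis) for every record with exactly one match."""
    index = {}
    for row in auxiliary:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for row in released:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches

print(link(medical, voters))  # [('Alice', 'hypertension')]
```

Bob is not re-identified only because his postal code differs between the two hypothetical tables; with a matching code, his record would be linked too.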

Businesses must, therefore, first understand how data is anonymized before formulating a data protection strategy and mitigating the risk of data breaches. The k-Anonymity model is one of the most widely used techniques for anonymizing data.

How the k-Anonymity model works

K-Anonymity was first proposed in 1998 and refined in 2002 [2]. The technique is often described as "hiding the identity of an individual in plain sight" or "hiding the identity of an individual in a crowd." If every record shares its identifying characteristics with enough other records, it is hard to link any record back to a specific person.

The k works like the x in an algebraic expression: it is a parameter whose value you choose. A dataset satisfies k-Anonymity if every individual in it shares the same identifying characteristics with at least k-1 other individuals. Imagine a dataset where k is 100 and the characteristic is the postal code. For any individual, at least 99 others have the same postal code, so the postal code alone cannot single that person out; the best an attacker can do is a 1-in-100 guess. To gain more clarity, let's review the following table.

The table is divided into three parts: Identifiers, Quasi-Identifiers, and Sensitive Data. An identifier is a piece of information that directly identifies a person. A quasi-identifier may or may not identify an individual on its own, but it can reveal the individual when combined with other quasi-identifiers. Here, the identifier is 'Name,' the quasi-identifiers are 'Age,' 'Postal Code,' and 'Gender,' and the sensitive data is 'Disease.'

The table above is k-Anonymous with k=3, which is achieved by generalizing and masking the quasi-identifiers. Notice that exact ages have been generalized into age groups, while the identifying digits of the postal codes have been masked. Similarly, all names and some genders are redacted entirely.
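A minimal sketch of these transformations, assuming hypothetical records and leaving out Gender for brevity: the Name identifier is dropped, ages are generalized into 10-year bands, the trailing postal-code digits are masked, and a hypothetical `k_of` helper reports the size of the smallest group sharing the same quasi-identifier values, which is the k of the dataset. The records, band width, and number of masked digits are illustrative choices, not the values from the article's table.

```python
from collections import Counter

# Hypothetical raw records (not the article's table); Gender omitted for brevity.
RAW = [
    {"Name": "Patient A", "Age": 28, "Postal Code": "13053", "Disease": "Flu"},
    {"Name": "Patient B", "Age": 29, "Postal Code": "13068", "Disease": "Cold"},
    {"Name": "Patient C", "Age": 21, "Postal Code": "13068", "Disease": "Flu"},
    {"Name": "Patient D", "Age": 47, "Postal Code": "14853", "Disease": "Asthma"},
    {"Name": "Patient E", "Age": 49, "Postal Code": "14850", "Disease": "Flu"},
    {"Name": "Patient F", "Age": 45, "Postal Code": "14853", "Disease": "Cold"},
]
QUASI_IDENTIFIERS = ("Age", "Postal Code")

def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def mask_postal(code: str, keep: int = 3) -> str:
    """Keep the leading `keep` characters and replace the rest with '*'."""
    return code[:keep] + "*" * (len(code) - keep)

def anonymize(row: dict) -> dict:
    """Drop the identifier, generalize/mask quasi-identifiers, keep the sensitive value."""
    return {
        "Age": age_band(row["Age"]),
        "Postal Code": mask_postal(row["Postal Code"]),
        "Disease": row["Disease"],
    }

def k_of(rows, quasi_identifiers=QUASI_IDENTIFIERS) -> int:
    """Size of the smallest group of rows with identical quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values()) if groups else 0

released = [anonymize(r) for r in RAW]
print(k_of(RAW))       # 1: every raw (age, postal code) pair is unique
print(k_of(released))  # 3: groups ('20-29', '130**') and ('40-49', '148**')
```

Note that the sensitive 'Disease' column is left untouched so the released data remains useful for analysis.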

Generalization

In the context of k-Anonymity, generalization means replacing specific values with broader categories. Grouping identifiable values into larger categories removes the detail that could single out an individual. Think of it as increasing the radius. For example, if a dataset refers to Italian cities such as Palermo, Turin, Milan, Rome, and Naples, they can all be generalized to 'Italy.' Similarly, in the preceding example, exact ages are generalized into age groups to achieve anonymity.

Here is an example of how Generalization can be accomplished.
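Generalization can be sketched as a hierarchy of levels, where each level replaces values with a broader category and a search picks the lowest level that yields groups of at least k. The city-to-country mapping, the age band widths, and the `lowest_level` helper below are hypothetical illustrations, not a specific published algorithm.

```python
from collections import Counter

# Hypothetical hierarchies; each level generalizes the previous one further.
CITY_TO_COUNTRY = {
    "Palermo": "Italy", "Turin": "Italy", "Milan": "Italy",
    "Rome": "Italy", "Naples": "Italy",
}
AGE_BAND_WIDTHS = (1, 5, 10, 20)  # level 0 = exact age, then wider bands

def generalize_location(city: str, level: int) -> str:
    """Level 0: city, level 1: country, level 2 and above: fully generalized ('*')."""
    if level == 0:
        return city
    if level == 1:
        return CITY_TO_COUNTRY.get(city, "*")
    return "*"

def generalize_age(age: int, level: int) -> str:
    """Map an age to a band whose width grows with the level."""
    width = AGE_BAND_WIDTHS[min(level, len(AGE_BAND_WIDTHS) - 1)]
    if width == 1:
        return str(age)
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def lowest_level(values, generalize, k, max_level=3):
    """Return the smallest level at which every generalized value is shared
    by at least k records, or None if no level up to max_level suffices."""
    for level in range(max_level + 1):
        counts = Counter(generalize(v, level) for v in values)
        if min(counts.values()) >= k:
            return level
    return None

CITIES = ["Palermo", "Turin", "Milan", "Rome", "Naples"]
AGES = [23, 27, 31, 36, 38]
print(lowest_level(CITIES, generalize_location, k=5))  # 1 (all become 'Italy')
print(lowest_level(AGES, generalize_age, k=2))         # 2 (bands 20-29 and 30-39)
```

Stopping at the lowest sufficient level keeps as much detail as possible, which reflects the "increasing the radius" idea: widen only as far as needed.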


Suppression

Suppression is the process of removing data from a dataset, whether entire attributes, individual values, or whole records. However, data handlers should be cautious about what data is suppressed. For example, if we need to know which disease is more common in which age group of patients, suppressing the age group would defeat the purpose. Instead, we should suppress data that is irrelevant to the current study.
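Both forms of suppression can be sketched briefly, assuming hypothetical patient records and a study that only asks which disease is most common per age group. The helper names are illustrative: attribute suppression drops the columns the study does not need, and record suppression removes rows whose quasi-identifier values are shared by fewer than k rows.

```python
from collections import Counter

# Hypothetical records; the study asks which disease is most common per age group.
PATIENTS = [
    {"age_group": "20-29", "postal": "13053", "disease": "Flu"},
    {"age_group": "20-29", "postal": "13068", "disease": "Cold"},
    {"age_group": "30-39", "postal": "14853", "disease": "Flu"},
    {"age_group": "30-39", "postal": "14850", "disease": "Asthma"},
    {"age_group": "60-69", "postal": "14850", "disease": "Flu"},
]

def suppress_columns(rows, keep):
    """Attribute suppression: keep only the columns the study needs."""
    return [{col: row[col] for col in keep} for row in rows]

def suppress_rare_records(rows, quasi_identifiers, k):
    """Record suppression: drop rows whose quasi-identifier values are
    shared by fewer than k rows."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)
    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] >= k]

study = suppress_columns(PATIENTS, keep=["age_group", "disease"])
released = suppress_rare_records(study, quasi_identifiers=["age_group"], k=2)
print(len(released))  # 4: the lone 60-69 patient is suppressed
```

The postal code is dropped because the study does not need it, while the age group is kept because it is the variable under study.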

To better understand the mechanism behind k-Anonymization, consider another hypothetical dataset of the number of goals scored by youth players in a soccer league.

Here, the dataset (Table 3) is k-Anonymous only with k=1, because each of the age-gender combinations (13, M), (13, F), (15, M), (15, F), (14, F), (14, M), (17, M), and (17, F) appears exactly once.

By contrast, the generalized table (Table 4) is k-Anonymous with k=2, as each of its age-gender pairs, (13, F), (15, M), (14, F), and (17, M), appears in at least two rows. That is, every combination of identity-revealing characteristics is shared by at least two players.

Within a soccer academy, anyone who knows a player's age and gender could use the Table 3 dataset to determine exactly how many goals that player scored, since each combination is represented only once. Because the data in Table 4 is obfuscated, it is difficult to determine any individual player's goals accurately.
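The same effect can be reproduced with a minimal sketch. The eight players cover the age-gender pairs listed for Table 3, but their goal counts are invented, and the generalization (two age groups, 13-14 and 15-17) is an assumed choice rather than the one behind Table 4.

```python
from collections import Counter

# Hypothetical players covering the eight (age, gender) pairs named above;
# the goal counts are invented for illustration.
PLAYERS = [
    {"age": 13, "gender": "M", "goals": 4},
    {"age": 13, "gender": "F", "goals": 2},
    {"age": 14, "gender": "M", "goals": 5},
    {"age": 14, "gender": "F", "goals": 3},
    {"age": 15, "gender": "M", "goals": 1},
    {"age": 15, "gender": "F", "goals": 6},
    {"age": 17, "gender": "M", "goals": 2},
    {"age": 17, "gender": "F", "goals": 0},
]

def age_group(age: int) -> str:
    """Assumed generalization: two coarse age groups."""
    return "13-14" if age <= 14 else "15-17"

def k_of(rows):
    """Smallest number of rows sharing one (age, gender) combination."""
    return min(Counter((r["age"], r["gender"]) for r in rows).values())

def candidate_goals(rows, age, gender):
    """Goal counts an observer could attribute to a player with this age and gender."""
    return sorted(r["goals"] for r in rows if r["age"] == age and r["gender"] == gender)

generalized = [{**r, "age": age_group(r["age"])} for r in PLAYERS]

print(k_of(PLAYERS), k_of(generalized))                  # 1 2
print(candidate_goals(PLAYERS, 13, "F"))                 # [2]
print(candidate_goals(generalized, age_group(13), "F"))  # [2, 3]
```

Before generalization, a 13-year-old girl's goal count is exposed exactly; afterwards, an observer can only narrow it down to one of two values.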

As k increases, the anonymity of the dataset becomes more robust: an attacker who knows only a person's quasi-identifiers has at most a 1/k chance of correctly attributing a row to that person. As a result, organizations that apply a higher k in their data protection mechanism can further reduce re-identification risk, although larger groups also mean coarser, less detailed data.
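The 1/k bound can be checked with a short sketch, assuming an attacker who knows only the quasi-identifiers and guesses uniformly among the matching rows. Under that assumption, each record's risk is 1 divided by the size of its group, so the maximum risk is 1/k. The `reidentification_risks` helper and the five-row release are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def reidentification_risks(rows, quasi_identifiers):
    """Per-record chance that a uniform guess among matching rows is correct:
    1 / (number of rows sharing that record's quasi-identifier values)."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)
    sizes = Counter(key(row) for row in rows)
    return [Fraction(1, sizes[key(row)]) for row in rows]

# Hypothetical 2-anonymous release: one group of 2 rows and one group of 3.
ROWS = [{"postal": "130**"}] * 2 + [{"postal": "148**"}] * 3
risks = reidentification_risks(ROWS, ["postal"])
print(max(risks))               # 1/2, i.e. 1/k with k = 2
print(sum(risks) / len(risks))  # 2/5, the average risk across all records
```

The average risk equals the number of groups divided by the number of records, so it is typically lower than the worst-case 1/k.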

To Conclude

With businesses relying more and more on collected data to gain insights, data masking is becoming increasingly important. Google, for example, already uses k-Anonymity to protect user data [3]. Meanwhile, other privacy-preserving techniques such as l-diversity, t-closeness, and differential privacy are already being incorporated into the larger picture of data masking.

Nevertheless, k-Anonymity and its two core operations, generalization and suppression, remain the foundations of more advanced anonymization algorithms and are among the most widely used techniques for masking data.

Driving business at scale



References

[1] Facebook is Meta now - How it will impact data privacy regulations law. shorturl.at/eBEK3
[2] k-Anonymity: A model for protecting privacy. shorturl.at/dxBLS
[3] How Google Anonymised Data. shorturl.at/lDM28
