Methods and challenges of de-identifying data

Data de-identification is the process of stripping data of personal identifiers. It is not a single technique but a set of practices, algorithms, and tools that can be applied to data at different levels, with varying degrees of effectiveness. HIPAA brought the concept into the mainstream through its Privacy Rule, which, among other things, sets the standard for de-identifying Protected Health Information (PHI). HIPAA has become a benchmark for privacy regulation across the United States and has brought much-needed oversight to how health data is handled. Data de-identification is not limited to hospital and patient records, however. Examples include:

  • Agencies masking private data to comply with the CCPA, CPRA, or GDPR.
  • Petroleum and mining companies that want to de-identify the spatial locations of deposits.
  • Environmental protection agencies that wish to de-identify endangered species data.
  • Businesses involved in statistical surveys.
  • Government agencies that wish to make data public.

De-identification is one of the fastest ways to comply with regulations such as HIPAA and to strengthen data protection. In a digital age governed by strict privacy regulations, it has become a necessity. Let's dive deeper into de-identification and see how it works.

Methods of De-Identification

The HIPAA Privacy Rule sets out two methods of de-identifying data in Sections 164.514(b) and (c): Expert Determination and Safe Harbor. [1]

Expert Determination

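As the name suggests, Expert Determination relies on a person with appropriate knowledge of and experience with generally accepted statistical and scientific principles. The expert applies those principles to determine that the risk of re-identification is very small, that is, that an anticipated recipient could not reasonably use the information, alone or combined with other available information, to identify an individual, and documents the methods and results that justify this conclusion. However, hiring such an expert can be expensive.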
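To make "statistical principles" more concrete, the sketch below computes one common family of re-identification risk measures. Records that share the same values on a chosen set of quasi-identifiers (attributes such as ZIP prefix, birth year, and sex) form an equivalence class, and a record in a class of size n is assumed to carry a 1/n risk of being singled out. HIPAA does not prescribe this metric or any threshold; the function name, the quasi-identifiers, and the sample records are illustrative assumptions, and a real expert determination would also weigh what external data an anticipated recipient could access.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Summarize equivalence-class re-identification risk for dict records.

    Records with identical values on every quasi-identifier form one
    equivalence class; a record in a class of size n is assigned risk 1/n.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    per_record = [1 / class_sizes[k] for k in keys]
    return {
        "max_risk": max(per_record),  # risk of the most exposed record
        "avg_risk": sum(per_record) / len(per_record),  # equals classes / records
        "unique_records": sum(1 for k in keys if class_sizes[k] == 1),
    }


# Made-up example records using hypothetical quasi-identifiers.
sample = [
    {"zip3": "021", "birth_year": 1980, "sex": "F"},
    {"zip3": "021", "birth_year": 1980, "sex": "F"},
    {"zip3": "021", "birth_year": 1975, "sex": "M"},
    {"zip3": "100", "birth_year": 1990, "sex": "F"},
]
print(reidentification_risk(sample, ["zip3", "birth_year", "sex"]))
# {'max_risk': 1.0, 'avg_risk': 0.75, 'unique_records': 2}
```

Two of the four records are unique on these attributes, so anyone who knows those three facts about either person could single them out; an expert would look for ways to push the maximum risk down before release.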

Safe Harbor


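Safe Harbor requires removing 18 types of identifiers of the individual and of the individual's relatives, employers, or household members. In addition, the covered entity must not have actual knowledge that the remaining information could be used to identify the individual. The 18 identifiers are: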

  1. Names
  2. All geographic subdivisions smaller than a State
  3. All elements of dates (except year)
  4. Telephone numbers
  5. Fax numbers
  6. Email addresses
  7. Social security numbers
  8. Medical record numbers
  9. Health plan beneficiary numbers
  10. Account numbers
  11. Certificate/license numbers
  12. Vehicle identifiers and serial numbers, including license plate numbers
  13. Device identifiers and serial numbers
  14. Web Universal Resource Locators (URLs)
  15. Internet Protocol (IP) address numbers
  16. Biometric identifiers, including finger and voice prints
  17. Full face photographic images and any comparable images
  18. Any other unique identifying number, characteristic, or code
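The sketch below applies part of the Safe Harbor rules to a single record stored as a Python dictionary. The field names are hypothetical placeholders for your own schema, and the function covers only a subset of the list: it drops direct-identifier fields, reduces dates to the year, keeps the first three ZIP code digits (replacing them with "000" for prefixes whose area has 20,000 or fewer people, a list the caller must supply), and groups ages over 89 into a single category. It does not catch identifiers buried in free text or birth years that reveal an age over 89, so treat it as an illustration rather than a compliance tool.

```python
from datetime import date

# Hypothetical field names standing in for direct identifiers; adapt to your schema.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "phone", "fax", "email", "ssn", "medical_record_number",
    "health_plan_id", "account_number", "license_number", "vehicle_id",
    "device_id", "url", "ip_address", "biometric_id", "photo",
}
DATE_FIELDS = ("birth_date", "admission_date", "discharge_date")


def safe_harbor_record(record, restricted_zip3=frozenset()):
    """Return a copy of `record` with a subset of the Safe Harbor rules applied.

    restricted_zip3: 3-digit ZIP prefixes whose combined area has 20,000 or
    fewer people (the caller supplies this set, e.g. from census data).
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}
    if "zip" in out:
        zip3 = str(out.pop("zip"))[:3]  # store ZIPs as strings to keep leading zeros
        out["zip3"] = "000" if zip3 in restricted_zip3 else zip3
    for field in DATE_FIELDS:
        if isinstance(out.get(field), date):
            out[field] = out[field].year  # keep the year, drop month and day
    if isinstance(out.get("age"), int) and out["age"] > 89:
        out["age"] = "90+"  # ages over 89 collapse into one category
    return out


patient = {
    "name": "Jane Doe", "ssn": "000-00-0000", "zip": "02139",
    "birth_date": date(1980, 5, 17), "age": 44, "diagnosis": "asthma",
}
print(safe_harbor_record(patient))
# {'birth_date': 1980, 'age': 44, 'diagnosis': 'asthma', 'zip3': '021'}
```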

Safe Harbor is one of the most cost-effective ways of protecting user data. However, it is not suitable for every use case, and removing entire fields can cause significant information loss. Beyond Safe Harbor, experts often apply data masking techniques such as generalization and randomization. Let's see how each of them achieves de-identification.

De-Identification through Generalization

In the context of k-anonymity, generalization means replacing precise values with broader ones, such as an exact age with an age range or a full ZIP code with its first few digits. [2] The technique involves "hiding an individual's identity in plain sight": if every record shares its combination of identifying attributes (quasi-identifiers) with at least k-1 other records, none of them can be linked back to a single person. Grouping identifiable data into larger categories in this way removes much of the identifying information that could otherwise be derived from it. Because values are coarsened rather than deleted, generalization reduces the need to redact data, limiting information loss while preserving the integrity of the dataset.
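A minimal sketch of generalization, assuming records are Python dictionaries and that age and ZIP code are the quasi-identifiers (both choices are illustrative, and the records are made up). Ages are coarsened into 10-year bands, ZIP codes keep only their first three digits, and the hypothetical helper `is_k_anonymous` checks whether every combination of quasi-identifier values appears at least k times.

```python
from collections import Counter


def generalize_age(age, band=10):
    """Replace an exact age with a band such as '30-39'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"


def generalize_zip(zip_code, keep=3):
    """Keep the leading digits of a ZIP code and mask the rest with '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values()) >= k


raw = [
    {"age": 34, "zip": "02139", "diagnosis": "flu"},
    {"age": 37, "zip": "02141", "diagnosis": "asthma"},
    {"age": 52, "zip": "10025", "diagnosis": "flu"},
    {"age": 58, "zip": "10027", "diagnosis": "diabetes"},
]
generalized = [
    {"age": generalize_age(r["age"]), "zip": generalize_zip(r["zip"]),
     "diagnosis": r["diagnosis"]}
    for r in raw
]
print(is_k_anonymous(raw, ["age", "zip"], k=2))          # False: every row is unique
print(is_k_anonymous(generalized, ["age", "zip"], k=2))  # True
```

Before generalization each record is unique on age and ZIP code; afterwards each one shares its values with at least one other record, so the table is 2-anonymous while the diagnosis column is left untouched. Wider bands or more masked digits raise k at the cost of precision.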

De-Identification through Randomization

De-identification can also be achieved through randomization, in which data is perturbed so that personal information cannot be read off directly. A prominent, mathematically formal approach is differential privacy, which adds carefully calibrated random noise to data or query results so that the output reveals very little about whether any single individual's record is included. The data can then still be used for aggregate statistical analysis without exposing personal information. Technology giants such as Facebook, Amazon, and Apple already use differential privacy to anonymize and de-identify data.
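The textbook example of differential privacy is the Laplace mechanism, sketched below for a counting query using only Python's standard library. Adding or removing one person changes a count by at most 1, so adding Laplace noise with scale 1/epsilon gives epsilon-differential privacy for that single query. The function names, the sample ages, and the epsilon value are illustrative assumptions; production systems also need to track the privacy budget spent across repeated queries and should rely on a vetted library rather than the `random` module, whose generator is not cryptographically secure.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution centered on 0.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon, rng=random):
    """Return an epsilon-differentially private count of items matching `predicate`.

    A count has sensitivity 1 (one person changes it by at most 1),
    so the Laplace mechanism uses noise with scale 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


# Made-up ages: roughly how many patients are over 60?
ages = [23, 67, 45, 71, 62, 38, 80, 55]
rng = random.Random(42)  # seeded only so the sketch is repeatable
print(private_count(ages, lambda a: a > 60, epsilon=0.5, rng=rng))
```

A smaller epsilon means more noise and stronger privacy. Each call returns a different noisy answer centered on the true count, which is why averages over many records stay useful while any single person's contribution is masked.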

Drawbacks

Although data de-identification is necessary, it can still carry severe privacy risks if not done correctly. In 2006, AOL, one of the best-known internet companies of the 1990s, published a set of search log data on its subscribers in which user names had been replaced with numeric identifiers. Yet The New York Times was able to trace AOL searcher No. 4417749 back to a real person using the content of the searches alone. [3] In the same year, Netflix, today probably the world's favorite streaming platform but then an online DVD rental service, released over 100 million movie ratings by roughly 500,000 of its subscribers. The dataset was anonymized, but researchers still cross-referenced it with information from other movie review platforms to match names to profiles and their viewing behavior. [4] These incidents are a stark reminder that companies should conduct due diligence and perform a risk assessment before releasing any data online, even de-identified data, to prevent re-identification.
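Both incidents are linkage attacks: a released dataset is joined with auxiliary data on attributes the two have in common. The toy sketch below, with made-up records and a hypothetical `link_records` helper, shows the mechanism using ZIP code, birth year, and sex as the shared attributes. It is a simplification: the Netflix study relied on approximate matching of ratings and dates rather than an exact join.

```python
def link_records(released, auxiliary, keys):
    """Re-identify released records by joining them with auxiliary data.

    An auxiliary row whose `keys` values match exactly one released row is
    counted as a re-identification; ambiguous matches are skipped.
    Assumes each auxiliary row carries a "name" field.
    """
    index = {}
    for rec in released:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    matches = []
    for person in auxiliary:
        candidates = index.get(tuple(person[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match pins down one person
            matches.append((person["name"], candidates[0]))
    return matches


# Made-up "de-identified" release: names removed, other attributes kept.
released = [
    {"zip": "02138", "birth_year": 1945, "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02138", "birth_year": 1960, "sex": "M", "diagnosis": "asthma"},
    {"zip": "02138", "birth_year": 1960, "sex": "M", "diagnosis": "flu"},
]
# Made-up public list (for example, a directory) that includes names.
public = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1945, "sex": "F"},
    {"name": "Bob Example", "zip": "02138", "birth_year": 1960, "sex": "M"},
]
for name, record in link_records(released, public, ["zip", "birth_year", "sex"]):
    print(name, "->", record["diagnosis"])
# Alice Example -> hypertension
```

Only the record that is unique on the shared attributes is exposed; the two records with identical values cannot be told apart, which is exactly the protection generalization aims to provide.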

Conclusion

Real-time data analysis is the fuel that modern organizations run on. Data de-identification preserves data integrity, confidentiality, and privacy while still allowing the data to be used to gain insights. Although de-identification can be intimidating, automated tools and experts or institutions that provide counsel can help address much of its complexity. In any case, any data that is made public must be de-identified and anonymized.

Mitch N.

Founder and Managing Partner | Comprehensive Solutions for Growth

References cited in the article:

[1] Guidance Regarding Methods for De-identification of Protected Health Information. https://bit.ly/3EuJfVK
[2] K-Anonymity's process for protecting the data of its users. https://bit.ly/3dkaRAO
[3] A Face Is Exposed for AOL Searcher No. 4417749. https://nyti.ms/3rFS23w
[4] Who’s Watching? De-anonymization of Netflix Reviews using Amazon Reviews. https://bit.ly/31wB2BE
