Privacy Policies: The Forest and the Trees

One of the advantages of building a large-scale automated privacy policy annotation engine (which we described earlier: https://lnkd.in/geDMuqHN) is that it finally allows us to understand industry-wide trends. Using our collection and annotation engine, we studied roughly 3,000 organizations (the Russell 3000 companies) and extracted structured annotations from each organization's privacy policy.

The results are eye-opening. Recall that we annotate four areas: types of data collected, purpose of collection, data protection mechanisms, and user rights.

The first thing that stands out is the sheer lack of consistency. Other than the fact that we can generally identify those four areas in a policy, there are oddities everywhere. More than half of the companies collect data in 13 or more distinct categories (a category is, for example, "contact information" or "location data"), which is a very broad scope of collection. Companies generally do a good job of the first step, "let's list the data that we collect", but everything that comes after that is where it all turns to chaos.
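To make the four annotation areas concrete, here is a minimal sketch of what a structured annotation record could look like, assuming one record per company with one field per area. The class `PolicyAnnotation`, its field names, the helper `share_with_at_least`, and the sample companies are all hypothetical illustrations, not the engine's actual schema. The helper computes the kind of statistic quoted above: the fraction of companies whose policy lists at least 13 data categories.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PolicyAnnotation:
    """Hypothetical structured annotation of one company's privacy policy,
    with one field per annotation area."""
    company: str
    data_categories: Set[str] = field(default_factory=set)  # types of data
    purposes: Set[str] = field(default_factory=set)         # purpose of collection
    protections: Set[str] = field(default_factory=set)      # data protection mechanisms
    user_rights: Set[str] = field(default_factory=set)      # user rights


def share_with_at_least(annotations: List[PolicyAnnotation], k: int) -> float:
    """Fraction of companies whose policy lists at least k distinct data categories."""
    if not annotations:
        return 0.0
    hits = sum(1 for a in annotations if len(a.data_categories) >= k)
    return hits / len(annotations)


# Toy records for illustration only; these are not results from the study.
sample = [
    PolicyAnnotation("ExampleCo A", data_categories={f"cat{i}" for i in range(15)}),
    PolicyAnnotation("ExampleCo B", data_categories={"contact information", "location data"}),
    PolicyAnnotation("ExampleCo C", data_categories={f"cat{i}" for i in range(13)}),
]

print(share_with_at_least(sample, 13))  # 2 of the 3 toy records qualify
```

Keeping each area as a set of normalized labels makes cross-company counts like this a one-line aggregation, which is what enables industry-wide comparisons in the first place.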

Some interesting observations from our data:

1: While almost all companies state that they may use collected data to monitor or improve their services, only 26% mention that data might be shared with third parties. Yet we also know that almost all companies use analytics and pixel trackers. Clearly there is an industry-wide gap in how companies understand the nature of analytics/pixel tracker data and whether collecting it constitutes third-party sharing.

2: While close to 60% of the companies mention that data is retained only for a limited amount of time, only 10% state the actual duration. Usually the wording is along the lines of "we only keep data as long as necessary...".

3: Similarly, 60% of the companies mention only generic data protection mechanisms (usually phrased as something like "we use industry best practices...").
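The gap in point 2, between saying retention is limited and actually stating a period, can be illustrated with a small sketch. It assumes each policy has already been reduced to a single retention clause (or `None` if the policy is silent on retention), and it uses a simple regular expression for "a number followed by a time unit" as a stand-in detector. The function names, the regex, and the sample clauses are hypothetical; this heuristic is not how the annotation engine described above works.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical heuristic: an "explicit duration" is a number followed by a time
# unit, e.g. "90 days", "7 years" or "12-month". A stand-in for illustration only.
DURATION_RE = re.compile(r"\b\d+[\s-]*(?:days?|weeks?|months?|years?)\b", re.IGNORECASE)


def has_explicit_duration(clause: str) -> bool:
    """True if the retention clause names a concrete period of time."""
    return DURATION_RE.search(clause) is not None


def retention_stats(clauses: List[Optional[str]]) -> Tuple[float, float]:
    """Given one retention clause per company (None if the policy is silent on
    retention), return (share of companies that mention retention,
    share of companies that state an explicit duration)."""
    if not clauses:
        return 0.0, 0.0
    mentioned = [c for c in clauses if c]
    explicit = [c for c in mentioned if has_explicit_duration(c)]
    return len(mentioned) / len(clauses), len(explicit) / len(clauses)


# Toy clauses for illustration only; these are not data from the study.
clauses = [
    "We only keep data as long as necessary to provide our services.",
    "Account data is deleted 90 days after you close your account.",
    None,
    "We retain personal information for as long as needed for the purposes described.",
]

print(retention_stats(clauses))  # (0.75, 0.25)
```

Both shares use the full set of companies as the denominator, matching how the 60% and 10% figures above are phrased.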

These are just a few of the findings. Data such as this is the starting point for a meaningful conversation about what a privacy policy should be. Some policies will clearly be written for legal compliance, while others will be written to be genuinely helpful. Starting with annotated data makes any such problem a little easier to tackle.

References:

[1] Z. Huang, J. Tang, M. Karir, M. Liu, and A. Sarabi, "Analyzing Corporate Privacy Policies using AI Chatbots," in Proceedings of the ACM Internet Measurement Conference (IMC), November 2024.