Privacy Policies: The Forest and the Trees
One of the advantages of building a large-scale engine for automatically annotating privacy policies (which we described earlier: https://lnkd.in/geDMuqHN) is that it finally allows us to understand industry-wide trends. Using our collection and annotation engine, we studied roughly 3,000 organizations (the Russell 3000 companies) [1] and extracted structured annotations from each organization's privacy policy.
The results are eye-opening. Recall that we are looking at 4 areas: types of data, purpose of collection, data protection mechanisms, and user rights.
The first thing that stands out is the sheer lack of consistency. Beyond the fact that we can generally identify those 4 areas in a policy, there are oddities everywhere. More than half of the companies collect data in 13 or more distinct categories (a category is, for example, "contact information" or "location data"), which is a very broad scope of collection. Companies generally do a good job with the first step, "let's list the data that we collect", but everything after that is where it all turns to chaos.
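To make "structured annotations" concrete, here is a minimal Python sketch of what one annotation record might look like when organized around the four areas, and how the "13 or more categories" statistic could be computed over a set of such records. The `PolicyAnnotation` class, its field names, the `share_collecting_at_least` helper, and the sample companies are hypothetical illustrations, not the actual schema or output of our engine.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyAnnotation:
    """Hypothetical structured annotation of one company's privacy policy."""

    company: str
    # Area 1: types of data collected, e.g. "contact information", "location data".
    data_types: set[str] = field(default_factory=set)
    # Area 2: stated purposes of collection.
    purposes: set[str] = field(default_factory=set)
    # Area 3: data protection mechanisms the policy mentions.
    protections: set[str] = field(default_factory=set)
    # Area 4: user rights the policy describes.
    user_rights: set[str] = field(default_factory=set)


def share_collecting_at_least(annotations: list[PolicyAnnotation], threshold: int) -> float:
    """Fraction of companies whose policy lists at least `threshold` data categories."""
    if not annotations:
        return 0.0
    hits = sum(1 for a in annotations if len(a.data_types) >= threshold)
    return hits / len(annotations)


if __name__ == "__main__":
    # Toy data: made-up companies with 15, 13 and 4 data categories.
    categories = [f"category_{i}" for i in range(15)]
    sample = [
        PolicyAnnotation("Company A", data_types=set(categories)),
        PolicyAnnotation("Company B", data_types=set(categories[:13])),
        PolicyAnnotation("Company C", data_types=set(categories[:4])),
    ]
    print(f"{share_collecting_at_least(sample, 13):.2f}")  # 0.67
```

Storing each area as a set keeps repeated mentions of the same category from inflating the count.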
Some interesting observations from our data:
1: While almost all companies state that they may use collected data for monitoring or improving their services, only 26% mention that data might be shared with third parties. Yet we also know that almost all companies use analytics tools or tracking pixels, so there is clearly an industry-wide gap in understanding whether this constitutes third-party sharing, and in how the data collected by analytics tools and tracking pixels should be characterized.
2: While close to 60% of the companies mention that data is retained only for a limited time, only 10% state the actual retention period. Typically, the language is along the lines of "we only keep data as long as necessary...".
3: Similarly, 60% of the companies describe their data protection mechanisms only in generic terms (mostly phrases like "we use industry best practices...").
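Figures like these are mention rates: the share of annotated policies in which a given property appears. A minimal Python sketch of that aggregation follows. It assumes each policy has already been reduced to a dictionary of boolean flags; the flag names (`mentions_third_party_sharing`, `mentions_limited_retention`, `states_retention_period`), the `mention_rates` helper, and the sample records are hypothetical, not our engine's actual output.

```python
from collections.abc import Iterable, Mapping


def mention_rates(records: Iterable[Mapping[str, bool]], flags: list[str]) -> dict[str, float]:
    """Percentage of policy records in which each boolean flag is True.

    A flag missing from a record is treated as False, i.e. the policy
    did not mention that property.
    """
    records = list(records)
    if not records:
        return {flag: 0.0 for flag in flags}
    return {
        flag: 100.0 * sum(bool(r.get(flag, False)) for r in records) / len(records)
        for flag in flags
    }


if __name__ == "__main__":
    # Hypothetical flag names and toy records for four made-up policies.
    FLAGS = [
        "mentions_third_party_sharing",
        "mentions_limited_retention",
        "states_retention_period",
    ]
    sample = [
        {"mentions_limited_retention": True, "states_retention_period": True},
        {"mentions_limited_retention": True},
        {"mentions_third_party_sharing": True},
        {},
    ]
    print(mention_rates(sample, FLAGS))
    # {'mentions_third_party_sharing': 25.0, 'mentions_limited_retention': 50.0, 'states_retention_period': 25.0}
```

Treating a missing flag as False matches how the findings are framed: a policy that never states a retention period counts as not stating one.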
These are just a few of the findings. Data like this is the starting point for a meaningful conversation about what a privacy policy should be. Some policies will clearly be written for legal compliance, while others will be written to be genuinely helpful. Starting with annotated data makes any problem a little bit better.
References:
[1] Z. Huang, J. Tang, M. Karir, M. Liu, A. Sarabi. "Analyzing Corporate Privacy Policies using AI Chatbots." In Proceedings of the ACM Internet Measurement Conference (IMC), November 2024.