The Personal Data Primer
How others profit from your data, and how you can too
Most of us are coming around to the idea that we need a better grasp of how our personal data actually drives the business of the Internet. We tend to fret over the idea of personal data privacy while simultaneously sharing loads of information about ourselves across a wide variety of online platforms, from social media to web browsers. We share email and calendar contents, opinions on issues and products, reading and shopping habits. We even share our physical locations 24/7 simply by carrying a mobile phone. What choice do we have if we want to keep up with our friends on Facebook, find the best deal on a new pair of sunglasses, and have all the conveniences of constant Internet access? Most of us pragmatists want to continue using nifty “free” apps, but as we charge into the era of automated assistants delving into more and more of our lives, we simply want to understand what we’re really signing up for. Who is gathering and using my personal data, to what end ($), and how can I exert some control over that process or get a cut of the value my data generates?
What do we mean by “personal data”?
For the purposes of this discussion, personal data is defined as the information we share and create about ourselves, intentionally or automatically, through our use of digital platforms (e.g. social media sites, web browsers, mobile apps) and traditional platforms that have a digital link (e.g. our credit cards). This information falls into two broad categories: data about you (profile information such as age, zip code, gender, job title) and data about your activities and interests generated by your online activity, also referred to as “digital exhaust.” Think, for instance, of a list of all online shopping sites you visited in the last month or a record of your recent ride-sharing trips. This information is valuable to businesses because it helps them understand who you are and who you could be as a consumer (of goods, services, media, etc.) based on your exhibited interests and behaviors. What the definition does NOT include is personally identifiable details that are understandably sensitive and not quite as commercially relevant, such as contact information and social security numbers.
What’s going on behind the scenes?
Our data drives a lot of behind-the-scenes commerce, and there is a growing awareness that our digital activities are being tracked with increasing sophistication. We notice, for instance, that Facebook places ads on our page that are targeted based on our posted content and likes, even related to our behavior on other sites. We see special offers on our bank page next to our credit card purchases and suspect the bank must partner with advertisers targeting those offers to us. The more we learn, the less we focus on the idea of privacy (this is not about someone trying to determine who you had lunch with on Tuesday but really about where you shopped this week and what else you or people like you could be convinced to buy). But we get rightfully agitated as we realize there is an enormous moneymaking engine attached to all the data we share and create that is mostly shielded from our view. What’s more, even when we do know how our data is being monetized, we worry that we have no effective leverage to assert ourselves as the valuable creators of all that data.
The players making money from your data fall into a few broad categories:
1) Those that spy on you online and track your activities without your knowledge or permission. They collect bits of your browsing history, make an educated guess at who you are and lump that data in with other searchable data to label you as a certain type of customer. They then sell all that data to marketing companies. These are data brokers, whose activity generates over $15 billion a year. They share exactly zero of that money with you.
2) Those that offer you a free service and then use what they learn about you to make money. By using these services, which include social networks, mobile check-in apps and free email accounts, you make these sites/apps the central warehouses for large amounts of data about you. Some use the data to sell advertising targeting you with relevant ads/offers; others sell the data to market researchers and other data buyers. They fuel a market valued at over $200 billion a year. Again, though your data drives this, you get none of that value beyond free use of the service.
3) Those that just want to sell you something. Merchants want to learn more about people to help tailor products and find new customers. This is one reason why grocery and drug stores offer loyalty cards that record your purchases. If they know people like you buy certain items, they can target people like you with offers on similar items. Because it can be difficult for many merchants to collect accurate data directly from customers, though, they still spend big to buy data from the two groups above.
Internet-user beware?
Are any of these actors up front with us? OK, we typically agreed to the terms of service (TOS) when we installed an app or signed up for a service, but who is able to decipher all that vague legalese? Internet services, while often “free,” are take-it-or-leave-it propositions rife with ambiguity over what they do or may do with the data our online actions generate. Those TOSs also change regularly, often with very little explanation as to why. Unless you can organize millions of users to demand different terms, you are unlikely to influence the situation. If we have no influence, how can we control how our data is used, or use it to our benefit in ways beyond simply getting free services and seeing more targeted ads (because, let’s be honest, targeted ads are much more useful for the advertiser than for the individual targeted)? If this data is so valuable, can we make money from our own data? Or can we put it to some other form of good use in ways that we choose?
Where can we turn?
There is a growing set of options available to help individual data creators take greater control over their digital lives. One interesting effort to increase transparency on how apps treat your data is called Terms of Service; Didn’t Read, which attempts to rate TOSs for popular services (e.g. Google, Facebook, Twitter) on specific privacy or data usage features (though it lacks clear explanations of the ways those services make money from your data). Ghostery, an app that identifies and exposes the trackers following you around the web, gives a very clear, up-front explanation of how it makes money from users of its free app. If only every service would just explain simply and clearly how it makes money from your data! Ghostery even allows you to opt in or out of letting it collect information about your web browsing experience and the trackers you encounter (which is how it gets paid). But what if we want to go beyond greater transparency and make some money from our own data?
The you-centric personal data agent
If we can’t force large platforms like our social media, search engines, banks, email providers etc. to share more of the spoils of our data with us, what are we to do? One of the beauties of working with data as an asset is that it is easy to duplicate and deploy in many different ways at once. As long as you can get a copy of the data you have built up on service platforms (your bank site, social media app etc.) you can use it in other applications at the same time. You are unlikely to successfully create your own targeted ad network using a copy of your social data from Facebook or your credit card transaction list (shopping history) from your bank. But that data has value in other ways.
Datacoup helps people get value from their data. The driving belief is that individuals should have an easy way (connecting existing data from multiple platforms in a few clicks and keystrokes) to use their data as they see fit, including getting paid. The simplest method involves pooling and selling anonymized data and sharing the proceeds (cash payment, a donation to a favorite charity etc.) with the members of the group. The data is gathered, for instance, into an anonymized customer data report (e.g. males aged 25-35 in the southeast region spent 22% more this month than last month at Kohl’s; online searches for Nike shoes among the same group declined 10%). In that case your data is valuable even without any connection back to you as an individual. You get the network value of leveraging your data together with many others, but separate from a tightly controlled platform that isn’t sharing proceeds with you. And that combination of data from multiple sites is more valuable because it is complete, accurate, and comes with your overt consent.
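For readers who like to see the mechanics, here is a minimal sketch of how pooled spending records could be rolled up into the kind of group-level report described above. It is an illustration only, not a description of how Datacoup actually works: the record fields, the sample figures, the function names (cohort_report, split_proceeds) and the equal split of proceeds are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical pooled records. Each row carries only a group label,
# a merchant, a month and an amount: no names or contact details.
RECORDS = [
    {"cohort": "males 25-35, Southeast", "merchant": "Kohl's", "month": "month_1", "spend": 100.0},
    {"cohort": "males 25-35, Southeast", "merchant": "Kohl's", "month": "month_1", "spend": 50.0},
    {"cohort": "males 25-35, Southeast", "merchant": "Kohl's", "month": "month_1", "spend": 50.0},
    {"cohort": "males 25-35, Southeast", "merchant": "Kohl's", "month": "month_2", "spend": 120.0},
    {"cohort": "males 25-35, Southeast", "merchant": "Kohl's", "month": "month_2", "spend": 64.0},
    {"cohort": "males 25-35, Southeast", "merchant": "Kohl's", "month": "month_2", "spend": 60.0},
]


def cohort_report(records, prev_month, curr_month):
    """Percent change in total spend per (cohort, merchant) between two months."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in records:
        totals[(row["cohort"], row["merchant"])][row["month"]] += row["spend"]

    report = {}
    for key, by_month in totals.items():
        prev = by_month.get(prev_month, 0.0)
        curr = by_month.get(curr_month, 0.0)
        if prev > 0:  # skip groups with no baseline to compare against
            report[key] = round((curr - prev) / prev * 100, 1)
    return report


def split_proceeds(sale_price, member_ids):
    """Assumed policy for this sketch: every contributing member gets an equal share."""
    share = round(sale_price / len(member_ids), 2)
    return {member: share for member in member_ids}


report = cohort_report(RECORDS, "month_1", "month_2")
payouts = split_proceeds(30.0, ["member_a", "member_b", "member_c"])
print(report)
print(payouts)
```

The point of the sketch is that the buyer only ever sees the group-level figure, while each member who contributed data still receives a share of the sale.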
Your data is also valuable to individual merchants, assuming they have an easy way of compensating you for sharing it (e.g. a trusted brand offering a special discount in exchange for some shopping data about your recent purchases at other stores). Sharing your shopping history helps them learn about the behavior and preferences of their customers. Since they buy variations of this type of data from third parties anyway, why not just pay you directly for it and cut out the middleman?
None of this stops social media platforms, banks, search engines, mobile check-in services etc. from continuing to cash in on the same data (which is still on their servers), but it leverages that data for the direct benefit of the data creator (you), allowing the individual to grab a greater share of the value their data represents. It creates a parallel “you-centric” method of squeezing value from your data. Once that data is connected and under control of the individual, there is a whole world of ways it can be put to use.
The bigger picture
It should be no surprise that companies try to ratchet up the amount of data their services collect, or go out and purchase data, as it increases the value of their operations and provides competitive advantages. IBM buys companies with access to healthcare data (like costs and treatment options), among others, in order to increase the value of its Watson offering. If Watson has access to valuable data sets no other artificial intelligence tool can search, the resulting advice is likely to be superior and to command a higher price. What if all the patients whose data was used to power Watson had copies of their own treatment data and pooled that data to sell to other service providers? What if they simply decided to donate that data to researchers to speed the cure of certain diseases? Those are the types of collective actions services like Datacoup are designed to allow. To be fair to IBM, it also contributes health data to research efforts, but shifting power over large data sets toward diffuse groups of people allows for more ways the same types of data could be deployed. The more that data is open to more people in effective ways, the greater the opportunity for innovation in the ways that data is used, whether for the profit of the data creators themselves or for the general good. Companies have proven adept at coming up with new and interesting ways to generate and capture data, either as a primary or an ancillary effect of the service they provide. Technology is starting to allow for an explosion in the ways that data can be used and in the number of people who can benefit.
There are other interesting projects in the works to give individuals more control over their digital lives. Tim Berners-Lee, the inventor of the World Wide Web, is developing technology to make individual control of social media data easier. His project, Solid, will allow people to choose where their data resides and make it easy to switch social applications without risk of the data getting trapped in a particular platform. Another effort, called Blockstack, will negate the need to set up separate accounts for every app people use. It allows for the creation of profiles that interact with external services, automatically identifying the individual and allowing that person to shut off the connection whenever they choose. These types of efforts should enable new innovations in the ways people use their data, independent of the restrictions imposed by today’s favorite apps.
The near future
As we enter the era of the digital personal assistant, we are creating and centralizing more and more data every time we interact with Siri, Alexa, Cortana or Google Assistant. As those technologies learn more about us through our email and calendar entries, text messages, search queries, music and TV selections, purchasing histories, etc., they will become increasingly tailored and responsive. They will also generate large profits for the companies that make them, as that data will increasingly drive commercial activity. We will also want to deploy that trove of data in ways we choose, as data agents continue offering new ways for us to use that data to our benefit. Maybe we’ll simply swipe left or right on requests to grant access to specific mixes of our personal data to help city planning or medical discoveries, or to collect a fee for sharing it with a market researcher. In any event, the more options that we have, as individuals, to channel and deploy the data we create, the more innovation will follow and the more we will benefit.