Data, segmentation and privacy: where's the limit?
Johanna Álvarez
Digital Transformation | Attribution Modeling | Web Analytics | International Speaker
What would happen if…? It's one of those questions that live almost permanently in the back of our minds, trying to predict what the consequence of a certain action or situation would be, or, in other words, trying to predict the future.
We are going through a time in which making fairly accurate predictions is pretty close to impossible. That's why in this post I'll be talking about some true truths about data, segmentation and privacy.
The pandemic our society is dealing with has opened a very interesting debate related to the options proposed to limit its spread, which are mainly associated with data-sharing. This makes people think about the effects and consequences that this could have.
In Spain, specifically, this is already a reality. A royal decree has been approved that assigns the Ministry of Economic Affairs and Digital Transformation a task: creating an app that lets its users get reliable information about COVID and run a "self-evaluation" if necessary. One of the app's main features is that it will let the government obtain geolocation data, although it has not been reported whether this will be optional or enabled by default when the app is downloaded.
An additional point here is that mobile phone companies must provide the government with our geolocation data to check who we've been in contact with. This makes it easier to know who we've interacted with in case either they or we get sick.
Reactions to the approach are diverse, but mostly doubts arise: we question the idea and wonder what the consequences will be if we give the government free access to our information. We are facing a textbook "what would happen if...?" scenario.
I have to admit that I have also asked myself these questions, so I want to share my view of the limitations that exist when collecting, analyzing and stitching together data points, to truly understand what can be done with our data today and how accurate it is to say that you can "know" a person based on the data you have about them.
Let's begin.
Did you know that in 2018, 96% of searches made online were done through Google? There is no doubt about the amount of information they can collect and, if you think about it, you have probably used Google more than once or twice today to search for something. So have I, and so have our families, our friends and possibly everyone with an internet connection.
Now, although searches are their main source of information, their data collection does not stop there. Google is a tech company that offers its users a wide variety of free platforms (YouTube, Google Maps, Google Calendar, Google Sheets, Google Slides, Google Drive, among others) and, at the same time, offers other companies the entire stack of tools they need to run their online advertising strategies. It does not end here either: Google has a wide variety of other products that let it obtain information and associate it with your user profile even when no regular PC or smartphone is involved.
There are two perfect examples for this case:
Nest: allows you to turn your house into a “smart home”. It is a product designed to simplify home automation. It is made up of different pieces and all are interconnected with your Google account.
Chromecast: allows you to convert your regular TV into a smart one. As in the previous case, the session starts with your Google account.
At a glance, these would be the main products owned by Google with which they can collect information (the list is long, if you want to have a look at it, the link will be in the references):
Let's put it plain and simple: all our interactions with these platforms and with all those that are part of the Google product list are continuously stored on their servers and the different information points are joined through unique identifiers associated with our navigation, sometimes in the form of cookies, other times in the form of user identifiers and the vast majority of times, based on a key piece: our email.
Please note that although in this post we are focusing on Google, what was previously mentioned can be extrapolated to other companies such as Facebook and Amazon.
Having said this and with all this context, we can delve into the realm of true truths.
True truth # 1: free does not mean without compensation. As I mentioned earlier, the products these companies make available to users are mostly free, but they are not like just any free product: they are developed to provide differential value. That is the main reason why an exchange takes place every time we use these platforms; in this case, we hand over information about our browsing and our searches.
There is a key idea to make sense of all this situation: "data is the new oil". And the reality is that in recent years the data economy has come to play a fundamental role in the day-to-day life of many companies, becoming the basis for their decision-making, analysis, automation, digital marketing activation, among others.
Basically, we could say that we hand over our data in exchange for using very well thought-out platforms, while Google, Facebook and Amazon use that data to target us later on with hyper-segmented advertising. With this, they manage not only to pay for the development of the platforms we use, but also to generate profits for their company. Broadly speaking, we could say that it is a win-win for both parties, although at very different scales, of course.
True truth # 2: online advertising is one of Google's main sources of income (about 83.3% of its income in 2019 came from online advertising) and it is also one of the foundations of the digital strategy of many companies (only a few companies, in specific sectors, do no online marketing). Once we have internalized this idea, the question to ask is: would I rather know my data is used so that the advertising I am shown is aligned with my preferences and interests? Or would I rather limit access to my data and know that the advertising I see is completely random?
True truth # 3: you decide how much you share since you have the possibility to totally or partially disable data collection through the privacy settings of Google or the websites you browse. If you want more information on how to manage your privacy, we will discuss it in more detail in another post.
Needless to say, it is a totally personal decision. Now, if you ask me whether you should do it, my opinion is no, and here's why:
Data collection has a very powerful argument behind it: collecting as much useful information as possible about the users who browse the web so that the content, experiences and advertising they see are as personalized to their preferences as possible.
Not only this, but through this transfer of data and segmented advertising, we get access to platforms that give us great value without paying money for them. Without going too far, this post is written in Google Docs, all my presentations are made in shared format through Google Slides and many of my calls are made through Hangouts. I have a Chromecast that allowed me to turn my TV into a smart one, and while I finish this post I'm listening to some great acoustic versions of my favorite artists on YouTube.
Do they exploit my data? Yes. Do I get a benefit in return? Correct. The key to this relationship pivots around three concepts: consent, freedom of decision and information.
True truth # 4: information does not mean knowledge. If you were to keep one idea from this post, I kindly ask you to keep this one. Although companies like Google, Facebook and Amazon, along with all the other companies that have begun their digital transformation to become data-driven, collect information about us, we can confidently say there's an important gap that separates that information from true knowledge.
At this point, you are probably thinking that with the amount of information that is collected, they could get to know more about us than we do ourselves. But the reality is different, and this is where the limitations begin.
In 2020, a person had an average of 6.67 connected devices; that list includes mobiles, tablets, computers, TVs, smartphones and other IoT devices.
In addition, we usually have an average of 2 emails: one personal and the other for work-related topics.
This, added to the vast volume of data that is generated and managed online every day, makes it extremely difficult to assign each interaction to its real owner with 100% accuracy. Let me put it this way: in an ideal scenario, for each and every interaction to be deterministically and 100% reliably attributable to its "owner", users would need a unique identifier present in each and every one of their online actions and devices.
The reality is that this unique and universal identifier, at least today, does not exist. The closest thing could be email, but it is still not a 100% reliable method (taking into account what I mentioned before about us having an average of 2 emails). Let's see it with an example:
In the example, a user searches in Google Chrome from 3 different devices with 2 different emails and, in one of the cases, without an email. In Google's databases, unless a probabilistic model is used to link the data points, 3 different user profiles will be created.
Please allow me to challenge a common assumption here. It is often said in hallway conversations that the methodology Google uses to cross-reference data is deterministic and, to a large extent, I agree. They have such a large amount of data and interactions generally associated with emails or Google IDs that the level of traceability is extremely high.
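The scenario above can be sketched in a few lines of code. This is only a minimal illustration, assuming a purely deterministic rule in which the email is the only linking key; the device names, email addresses and the build_profiles function are hypothetical and say nothing about how Google actually stores or links data.

```python
from collections import defaultdict

# Hypothetical interaction log: (device, email used, or None if not signed in).
interactions = [
    ("laptop", "personal@example.com"),
    ("phone", "work@example.com"),
    ("tablet", None),
]

def build_profiles(events):
    """Group events deterministically: same email means same profile.

    Events without an email cannot be linked to anyone, so each such
    device gets its own anonymous profile.
    """
    profiles = defaultdict(list)
    for device, email in events:
        key = f"email:{email}" if email else f"anon:{device}"
        profiles[key].append(device)
    return dict(profiles)

profiles = build_profiles(interactions)
print(len(profiles))  # 3 profiles for what is really one person
```

With only the email as a key, one person ends up split into three profiles, which is exactly the gap a probabilistic model would try to close.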
But what about those cases in which they are not collecting enough first-party data on their digital assets to say for sure that that touchpoint is yours? Two options: create a new user profile and complete it as you interact more with the digital world or apply a probabilistic methodology to try to associate it with one of the previously existing profiles.
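To make the second option more concrete, here is a rough sketch of what a probabilistic assignment could look like. It assumes a very simple similarity measure (the share of signals two sets have in common) and an arbitrary threshold of 0.5; the signal names, the profiles and the assign_touchpoint function are all hypothetical, and real systems are certainly far more sophisticated.

```python
def jaccard(a, b):
    """Share of signals two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def assign_touchpoint(signals, profiles, threshold=0.5):
    """Return the id of the most similar profile, or None when no profile
    is similar enough and a new profile should be created instead."""
    best_id, best_score = None, 0.0
    for profile_id, known_signals in profiles.items():
        score = jaccard(signals, known_signals)
        if score > best_score:
            best_id, best_score = profile_id, score
    return best_id if best_score >= threshold else None

# Hypothetical existing profiles, described by the signals seen so far.
profiles = {
    "profile_a": {"ip:home", "browser:chrome", "topic:cycling", "topic:analytics"},
    "profile_b": {"ip:office", "browser:chrome", "topic:finance"},
}

# A touchpoint that looks a lot like profile_a, and one that matches nobody.
match_known = assign_touchpoint(
    {"ip:home", "browser:chrome", "topic:cycling"}, profiles
)
match_unknown = assign_touchpoint(
    {"ip:cafe", "browser:firefox", "topic:cooking"}, profiles
)
print(match_known, match_unknown)
```

The threshold is where the uncertainty lives: set it too low and touchpoints get attached to the wrong person, set it too high and the same person is split into several profiles.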
The situation becomes even more interesting when we look at how segmentation is done: although they have a large volume of data, they mainly rely on "events" and correlations to define our interests (or at least, that is what we know...). In this image you will see some of the segments I belong to according to the interest categories that Google has defined based on my searches and interactions with the digital world.
The symbols to the right of each category indicate whether this segment is indeed related to my interests or not. My conclusion after analyzing it was that only 56% of the segments I belong to represent me and my interests correctly.
This mismatch between my reality and the digital reality that has been created from my browsing gives us an idea of the distance that exists between information and knowledge.
I'd like to return to an earlier idea: all the relationships the data economy has created must pivot around these concepts: consent, freedom of decision and information. Our daily lives are already composed of continuous data-sharing. These are small exchanges that we carry out almost automatically, without giving the matter much thought.
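For anyone who wants to repeat this check on their own ad settings, the calculation is simply the share of segments marked as correct. The sketch below assumes a hypothetical review of 25 segments with 14 marked as correct; these counts are made up purely so the result matches the 56% above, since the actual number of segments reviewed is not stated in this post.

```python
def share_correct(labels):
    """Percentage of segments marked as correct, rounded to a whole percent."""
    return round(100 * sum(labels) / len(labels))

# Hypothetical review: True means "this segment really is one of my interests".
labels = [True] * 14 + [False] * 11

accuracy = share_correct(labels)
print(f"{accuracy}% of segments describe me correctly")
```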
For this reason I would like to conclude this article by asking 2 questions:
- Do you consider that as users we should exercise the right to privacy when faced with the possibility of transferring our data to try to stop a pandemic?
- What risks do you think there are, in the short and long term, in transferring our data, not only to our governments but also to the companies mentioned above?
References:
Data and Analytics Strategy | Driving Results with Technology | Turning Vision to Business Benefits
4 years ago: Based on historical experience, governments have this nasty tendency of "normalizing" restrictions. Sometimes it works for our benefit (like the security controls at airports), sometimes as societies we just carry on and adapt to the new "normal" over time. So there is always this fine balance between how much data should be shared in order to help the government provide better security measures, and how much of this data is actually processed. The problems start when we talk about the hierarchy in this structure. As citizens, we are always below the government and are bound by the regulations that define particular areas of our lives. Therefore, what we as citizens agree to in terms of how much data we would like to share with the government is always secondary to what will be imposed on us by the law. The funny thing is that in dictatorships this practice is everyday life, while in democracies it works only in times of crisis, naturally for those democratic regimes that would like to remain democratic after the crisis is over.
Thanks for your point of view. I highly mistrust the government, so I am against it. They had all the information on an upcoming pandemic but failed to prepare and to protect their healthcare staff and people from this (see Bill Gates and many others before him). On top of that, just over 40 years ago Spain was a fascist state that oppressed dissenting voices by all means possible. The governments failed in public health. Don't trust them with your private data.
Marketing | Tech | Strategy | Media
4 years ago: This is a very interesting point of view, even though I do not agree with some of it. As for the question of governments using location data, etc. post COVID, I haven't wrapped my mind around it yet, but the key here is aggregation for proper analysis vs individual recognition. This is no different from the Google ADH idea for post-3rd party cookie deprecation. Governments and institutions should have a way to diagnose issues such as this one in an aggregated manner, but it should never point towards an individual; at least that's my take so far. Now, as for cross-device and how accurate or not your "interests" are, that is not particularly tied to Google. If you look at Instagram or Twitter for your profile, you will see the same situation, where you'd be included in certain topics you are not even nearly as interested in. Now, both FB and Twitter rely much more on PII data than, say, Google, yet the situation happens anyway. Why is that? Simple: activation data scale. Marketers want to reach niche audiences in a scalable manner, even when it does not make sense at all. Since there isn't a universal playbook defining how many interactions/visits you need to have to be defined as "interested" in something, everyone applies that freely in whatever way suits them best, even when it leads to this type of thing. Evidently, the new wave of privacy and disclosure for users has forced these practices into the open, yet they have been happening for quite some time. These companies have the technology to understand a whole lot more about us. However, we are only looking at this from an advertising perspective; there may be tools or features being used in the back end for purposes outside of ads which can allow them to have a somewhat solid definition of what we like (emails we send and receive, pages we visit, search terms we use, social posts we like). How they use that to build profiles depends on the advertising business model, not particularly the tech.
My 2 cents, I believe there needs to be a more evident trade off for data vs value to the user. Very glad I ran into your post. Keep it up
Activation Strategy Senior Manager @ JAKALA
4 years ago: As always, a great point of view on subjects that affect us both as marketers and as individuals. Congrats!