This Is ... The Cagle Report
Welcome to the inaugural newsletter for The Cagle Report, or, as I frequently use in hashtag references, #theCagleReport. I started blogging about eighteen years ago, though I had written a couple of books and managed a technical newsletter before then, while my "day job" was working as a programmer for Microsoft, among other companies.
Blogging is, like so many innovations of the Internet era, something of a hybrid art form. Blogs are not news articles in the traditional sense, though many are "newsy". Nor are they necessarily essays (though many are essayish). They are typically written from a first-person perspective, though whether first person singular or plural usually depends upon whether such blogs are individual musings or corporate announcements. I suppose that if the Queen of England ever wrote a blog, it would likely be in the Royal We (as in We are not amused), but the individual expressiveness of a blog makes it a medium of its own.
The Cagle Report started out life as something of a gag metaphor for a conference presentation, a faux newspaper typed out on a faux Royal typewriter from the 1930s. Over the years since then, it has become something of a tag line for me as I blogged about everything from enterprise data management to quantum physics, so when I was approached by LinkedIn about writing a newsletter, it seemed the most logical choice for a name.
For me, the biggest question about doing such a newsletter was determining what its focus would be. In the end, I realized that the best answer was the notion of digital work and augmented intelligence. As to what those mean: we human beings have, in the last few decades, radically changed the nature of how we work - what we produce, how we produce it, what we call ourselves as we are doing it, and most comprehensively, the very tools that we use to accomplish those tasks. This covers most of the significant areas that I have explored in previous articles, at both a high level and the occasional deep dive, while also occasionally exploring related areas of interest (such as drones, space exploration and astronomy, genetics, medicine and so forth) and how they relate to the contextual environment.
I will also be doing a regular podcast in conjunction with this newsletter (to be announced shortly), and if you are interested either in contributing articles or in being interviewed to promote your products, services or ideas, please contact me at [email protected]. I plan on publishing daily (which is insane, to be honest, but right now I have the time), and will usually concentrate on one to two articles per day. For the inaugural issue, I wanted to summarize what I've learned from multiple discussions (and way too many Zoom meetings) about the impact of Covid-19 on data collection:
Privacy vs. Need To Know: A Covid-19 Briefing
Covid-19 has dominated the news cycle for several months now. One immediate effect of the virus has been the extent to which it has pitted privacy advocates against those who need actionable data for research in both the development of treatments and the tracking of potential cases.
This tracking, known as contact tracing, attempts to create a graph of the people who were in general proximity to one another in venues such as restaurants, bars, sporting events, or churches. In the hands of an epidemiologist, this information is invaluable, as it can be used to identify and isolate those who may have contracted Covid-19 and, from this, to warn of its spread to other areas. However, in the hands of civil authorities, marketers, or even employers, this same kind of information raises significant civil-liberties red flags, especially when it reaches bad actors.
One solution that is frequently applied to prevent this is the use of anonymizers, tools that strip out any obvious identifying information about individuals or organizations. In theory, such applications keep the data private by removing anything that could be used to identify a given individual. While this works when the data is effectively isolated, once you combine it with other information from different sources, there are frequently contextual clues, such as nine-digit zip codes or frequent location data, that can dramatically reduce the set of people under consideration.
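To make that re-identification risk concrete, here is a minimal Python sketch - the datasets, field names, and people are entirely hypothetical - of how joining an "anonymized" health record against a public directory on quasi-identifiers such as zip code, birth year, and gender can shrink the candidate pool to a single person:

```python
# Hypothetical illustration: "anonymized" records still carry quasi-identifiers.
anonymized_health = [
    {"zip": "98027-1234", "birth_year": 1971, "gender": "F", "diagnosis": "covid-19"},
    {"zip": "98027-5678", "birth_year": 1985, "gender": "M", "diagnosis": "negative"},
]

# A separate, public dataset (voter rolls, marketing lists, etc.) with names attached.
public_directory = [
    {"name": "Alice Example", "zip": "98027-1234", "birth_year": 1971, "gender": "F"},
    {"name": "Bob Example",   "zip": "98027-5678", "birth_year": 1985, "gender": "M"},
]

def reidentify(record, directory):
    """Return everyone in the directory who shares the record's quasi-identifiers."""
    return [
        person for person in directory
        if (person["zip"], person["birth_year"], person["gender"])
           == (record["zip"], record["birth_year"], record["gender"])
    ]

for record in anonymized_health:
    matches = reidentify(record, public_directory)
    if len(matches) == 1:  # a unique match defeats the anonymization
        print(f"{matches[0]['name']} -> {record['diagnosis']}")
```

The point is not the code but the principle: every additional attribute narrows the candidate set, which is precisely why stripping names alone is a weak guarantee.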
What's more, the epidemiologist is basically a detective, and the kind of information that may be critical for identifying potential environmental factors - What's the gender of the person? How old are they? Do they smoke? Do they have heart disease? Do they live near a waste site? and so forth - can also provide clues to that person's identity. We are defined by our attributes, and making data anonymous enough to keep identities hidden usually strips out too much contextual information for it to be of value.
Contact tracing takes this problem and amplifies it. Most contact tracing solutions work either by having people entering or leaving a facility sign in with their names and contact information, or by tracking Bluetooth devices around a smartphone and correlating them with device identifiers that can then be resolved to device owners. The former solution depends upon the honesty of patrons entering or leaving an establishment and creates a transactional barrier; the latter is more seamless but also less reliable, as Bluetooth devices can vary in range from a few feet to dozens of yards.
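As a rough illustration of the Bluetooth approach - with device identifiers, distances, and thresholds that are my own assumptions rather than any particular app's protocol - the core of such a tracer is little more than turning pairwise proximity sightings into a contact graph that can be queried once a device owner tests positive:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical proximity log: (device_a, device_b, timestamp, estimated_distance_m).
# A real system would also require sustained contact over time, not a single sighting.
sightings = [
    ("dev-001", "dev-042", datetime(2020, 5, 1, 12, 5), 1.5),
    ("dev-001", "dev-077", datetime(2020, 5, 1, 12, 7), 9.0),
    ("dev-042", "dev-313", datetime(2020, 5, 2, 18, 30), 2.0),
]

MAX_DISTANCE_M = 3.0  # assumed exposure threshold; real apps infer this from signal strength
contact_graph = defaultdict(set)

for a, b, when, distance in sightings:
    if distance <= MAX_DISTANCE_M:
        contact_graph[a].add(b)
        contact_graph[b].add(a)

def exposed(device_id, graph):
    """Devices one hop away from a device whose owner has tested positive."""
    return graph[device_id]

# If dev-001's owner tests positive, dev-042 is flagged; dev-077 was too far away.
print(exposed("dev-001", contact_graph))
```

Note that the privacy question lives entirely in the last step - resolving those device identifiers back to named owners - which is exactly where the concerns below arise.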
Such a system has raised concerns among data privacy advocates, in large part because of both the worry about security and the potential for abuse. Who has access to this data? Your employer, who can use it as evidence to pressure you? A political rival, who can use it to fabricate accusations that appear credible even if the events never happened? The police, who can use it to corroborate your whereabouts (or to create a false trail)? What happens if this data gets stolen, providing material for a blackmailer or for identity theft?
It's worth noting, though, that this is not really that much different from the way things are today, where data breaches are far more common than is generally reported. This is not to say that it's acceptable, but rather that the problem extends far beyond contact tracing and represents a failure on the part of data gatherers to ensure data privacy. Of course, it's also in their best interest for this to continue, and as such, they are likely to resist any effort to comply unless there are sufficient penalties for failing to do so.
This also raises the question of whether our concern about the privacy of electronic health records and similar data may be misplaced. This question gets answered differently based upon age, with older generations being very concerned about medical records (and privacy in general), while younger generations work from the assumption that their data will be accessible regardless, and that, if it is, they might as well take as much advantage of that fact as they can.
This spectrum of opinion raises the possibility that legislators working on privacy legislation may be encoding an outdated set of assumptions into law - not so much the worry that personal information is and should remain private, but rather that people wish they had a better means of fine-tuning their privacy profile to match whom they are comfortable giving access to this information. However, this capability also needs to be software-mediated through an agent or proxy, as granting permission to every service that pings the user would be tedious at best.
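A minimal sketch of what such an agent might look like - with data categories, purposes, and rules that are purely illustrative - is simply a preference table consulted on the user's behalf whenever a service requests access:

```python
# Hypothetical consent proxy: the user sets preferences once, and the proxy
# answers routine access requests so the user is not pinged for each one.
PREFERENCES = {
    ("location_history", "epidemiology"): "allow",
    ("location_history", "marketing"): "deny",
    ("health_metrics", "epidemiology"): "ask",   # escalate to the user for a decision
}

def decide(data_category: str, purpose: str) -> str:
    """Return 'allow', 'deny', or 'ask' for an incoming request."""
    return PREFERENCES.get((data_category, purpose), "deny")  # deny anything unrecognized

# A contact-tracing study asking for location history is granted automatically;
# the same data requested for marketing is refused without bothering the user.
print(decide("location_history", "epidemiology"))   # allow
print(decide("location_history", "marketing"))      # deny
print(decide("fitness_data", "insurance_pricing"))  # deny (unknown -> default)
```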
For all this, the flip side - gathering such information specifically for critical research - is proving to be just as complex and intractable.
While Covid-19 is generating a huge amount of data, not all of it is of equal value. A lack of adequate testing, whether due to tests not yet being ready or to political malfeasance, has made it difficult to establish reliable epidemiological data. At the same time, other factors have hampered the acquisition of data: confusing and overlapping privacy regulation at the state and federal levels, incompatible (or incomplete) data models, and limited access to precisely those factors - environmental and genetic data, for instance - that are so necessary to train models.
This period has also highlighted the fact that simply knowing Python or R does not actually make one a data scientist. That might not seem immediately relevant, but it underscores the point that not everyone needs access to specific data, even if they would like to have it. This determination of qualification becomes critical in any discussion about data use and data privacy.
The same conundrum comes up when discussing companies that gather third-party data through devices such as Fitbits or other wearable health devices. The kind of information that is gathered could very well be invaluable to medical researchers: it provides detailed evidence of not only the conditions of its wearers but also their trajectories and schedules, it may provide a way of determining a priori whether someone who is asymptomatic still has a disease, and in conjunction with other data it may very well establish who is most likely to catch the disease.
There are few formal protocols in place for how such private data should or can be acquired, which points back to the broader issue about the nature of both data and data privacy. Legislation such as the General Data Protection Regulation (GDPR) in the European Union or the California Consumer Privacy Act (CCPA) does have some provisions in place for common-good use of data, but it can be argued that in both cases these acts provide at best only cursory guidelines.
It is likely that the conflict between data privacy and data access ultimately will not be solved by technological means. There are tools that may mitigate the worst of the abuses, such as those that only provide access to such data when an agreement is made between the data owner (or a software proxy, such as a data agent) and the data seeker, and even then only for a limited time. This is a model similar to that employed by the music and movie industries, yet at the same time, it's worth noting that piracy there shows this isn't an ideal solution either.
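One way to picture such an agreement-based model - again, a hypothetical sketch rather than any existing system - is as a grant object that names the owner, the seeker, and the scope of data involved, and that expires automatically:

```python
from datetime import datetime, timedelta, timezone

class DataGrant:
    """A time-limited agreement between a data owner (or their proxy) and a data seeker."""

    def __init__(self, owner: str, seeker: str, scope: str, days_valid: int):
        self.owner = owner
        self.seeker = seeker
        self.scope = scope
        self.expires_at = datetime.now(timezone.utc) + timedelta(days=days_valid)

    def permits(self, seeker: str, scope: str) -> bool:
        """The seeker may read this scope only while the grant is in force."""
        return (
            seeker == self.seeker
            and scope == self.scope
            and datetime.now(timezone.utc) < self.expires_at
        )

# A research lab gets thirty days of access to one stream of wearable data; nothing more.
grant = DataGrant("patient-123", "university-lab", "wearable_heart_rate", days_valid=30)
print(grant.permits("university-lab", "wearable_heart_rate"))  # True until expiry
print(grant.permits("ad-network", "wearable_heart_rate"))      # False
```

The enforcement problem, of course, is the same one the music and movie industries face: once the data has been copied out, the grant object no longer governs it.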
As digital twins - records that recreate a person, organization, or similar entity - become more commonplace, we will need to acknowledge that such twins need certain protections under the law, while at the same time acknowledging that they have corresponding responsibilities. Marketing data (which was the original target of both the GDPR and CCPA regulations) became generally available because no such protections existed at the time, but there is a legitimate concern that the marketing beast will need to be tamed.
Similarly, while governments have specific rights that include protecting the common health and welfare of their citizens, the potential for misuse by corrupt officials who abuse the trust placed in them to dig up potentially damaging information also needs to be guarded against. Ultimately, this necessitates putting initial control of data into the hands of the individuals about whom it's collected and letting them make these decisions on a case-by-case basis - not corporations, not governments.
This is one area where individual knowledge graphs - collections of information that can be locked and unlocked based upon user preference - may help, but ultimately it is a societal decision, one put into place by consensus against a wide array of data exploiters, that will determine whether we become keepers of our data or prisoners of it.
Kurt Cagle is the editor of The Cagle Report, and a longtime blogger and writer focused on the field of information and knowledge management. The Cagle Report is a daily update on what's happening in the Digital Workplace. He lives in Issaquah, Washington with his wife, kid, and cat. For more of The Cagle Report, please subscribe.