Citizen-protecting Proximity Tracing: COVID-19 between word-of-mouth gossip and a critical information infrastructure
Paulo Esteves-Veríssimo
Professor at CEMSE, Director of RC3-- Resilient Computing and Cybersecurity Center, KAUST (King Abdullah University of Science and Technology, KSA)
In this article (1), amongst other things that you may find interesting, I will talk about:
· Analysis of the several takes on implementing contact tracing, and of why the controversy between centralised and decentralised approaches may not be the right target of discussion.
· Why in our opinion, this is a global problem that requires a permanent, citizen-protecting public critical information infrastructure.
· Bringing ideas to action, I present the outline of a proposal of my team #CritiX @SnT, University of Luxembourg --- PriLok: Citizen-protecting distributed epidemic tracing critical information infrastructure
POST PUB NOTE: We have published, after this article, the promised Open Architecture Proposal and Draft Design on arXiv. Find the pointer here: https://www.dhirubhai.net/feed/update/urn:li:activity:6666032920169324544/
(1) Together with @Marcus Völp and @Jérémie Decouchant (#SnT #UniversityOfLuxembourg)
NOTE: Since this somewhat violates my style rule for my LinkedIn area (reserving my research for my university page), I should explain: the urgency and importance of the situation justify it, and we are also looking at the immediate practical implementation of the ideas, hopefully by industrial teams.
#COVID19 #proximityTracing #contactTracing #LocationTracking #epidemics #pandemic #privacy #coronavirus #SARSCoV2
Part I - The several takes on implementing contact tracing
The liveliest debates we seem to have now about COVID-19 are on contact tracing, which in public health epidemiology means identifying and following up on persons who may have been in contact with an infected person in a way that carries infection risk. A number of countries used contact-tracing systems during the height of the pandemic, and a number of countries, companies and research groups are developing them as I write these lines, with a view to the de-confinement phases that are progressively happening.
Most of the debates we are seeing concern one or more of a few dilemmas:
· Security of the nation versus privacy of the individual
· Health of the individual versus community/nation health
· Ad-hoc Covid-19 versus generic alert readiness
· Proximity tracing vs. location tracking
· Peer-to-peer versus trusted-third-party system control
I’ll briefly summarize my take on these, asking more questions than giving answers. In the second part, I will give what my team thinks should be some of the answers.
Security of the nation versus privacy of the individual
This is a debate that has been going on for the last twenty years, since “nine-eleven”, and re-surfaces every time there is a crisis, like now. Sufficient wisdom should already have been gained by countries in the sense that: ‘security’ and ‘privacy’ are two faces of the same coin, and that undermining the privacy of the individuals and organisations of a whole nation with systematic/mass surveillance, destroys significant value (for individuals, organisations, and even nations’ business sectors), ultimately endangering the security of the nation.
So, should populations in democratic regimes, in very stressful times such as epidemics, accept systems with limited or no privacy-preserving features, in a “temporary sacrifice of privacy”, in exchange for a promise of immediate safety?
Even if past history has shown that there is no easy going back from loss of privacy, once systems are in the field and instrumented?
Health of the individual versus community/nation health
Where health is concerned, privacy is even more critical, since a breach of an individual’s health data cannot be remedied the way a compromised password can. But there is another facet bearing some (indirect) connection to privacy. Some proposals approach privacy protection by focusing on voluntarist word-of-mouth actions, whose ultimate goal is the protection of a collection of willing (and technology-enabled) individuals, putting the state out of the loop.
How far do individual rights go, when the health of a whole nation is at stake? Think e.g., about what ‘state of emergency’ means.
How far should individual self-determination and independence go, when in the end she/he will want the National Health Service (NHS) to assist her/him and possibly save her/his life, and her/his actions may influence, positively or negatively, the health of others?
Ad-hoc Covid-19 versus generic alert readiness
Tracking and retroactive construction of location and interaction profiles are effective instruments for tracing infection chains in near real-time and for identifying root causes, hot-spots, and propagation routes of outbreaks (see here).
Although the world’s antennas are tuned to COVID-19, this is true of any epidemic infection. Even COVID-19 will not fade out suddenly: there will most probably be several recurrences of the “Hammer and Dance” cycle before a vaccine. But RNA viruses mutate, so the vaccine will only alleviate epidemic recurrences, and they will come back. Last but not least, there will be new infections coming up, and some may be harsher and more lethal. This pandemic is anything but a surprise (see here). Therefore, maybe it is time to think of something permanently in place that strikes a balance between safeguarding health and protecting the economy.
Are we going through a black-swan event with the outbreak of this strange SARS-CoV-2 coronavirus, or are repeated and possibly harsh epidemics/pandemics here to stay, given the life-style we have been developing?
Shouldn’t nations think about effective and permanent alert infrastructures for tracing epidemic infection chains in near real-time during their whole life cycle, safeguarding the population during heights, and allowing safe progressive re-opening of economy during fade-outs?
Contact tracing vs. proximity tracing vs. location tracking
Some confusion has existed between these terms. The first (CT) was explained earlier: in epidemiological terms, it means finding out (by whatever means, even phone calls) whether there was contact between an infected individual and others. Automated (i.e. digital) ways of doing it rely on proximity tracing (PT). You do PT by any means that registers that two people were in proximity, for example, their phones making contact over NFC or Bluetooth. The third (LT) means registering the absolute geographical location at a given time, and/or a trajectory over a time interval, in a given space-time region (e.g., a city, or the zone covered by a mobile phone cell base station). You can do PT without LT, and you can do LT without PT, at least directly.
If some entity gets your interval LT data, that entity (e.g., Google, Apple, or a government) will know where you have been and when, during that interval. If they get LT from many other people, for that space-time region, they will also determine quite accurately near whom you have been, i.e. PT data. Now, if some entity gets your interval PT data, that entity only gets to know with whom you have been in close contact, not when, or where. Unless, of course, you or that entity annotate it with additional space-time information, or it is found by OSINT (open-source intelligence), with e.g., the help of ML/AI methods. This may be good or bad, depending on who does it, and I will get back later to this question.
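To make the point about deriving PT from pooled LT concrete, here is a minimal sketch. It assumes location samples already projected onto flat x/y coordinates in metres; the function name, the 2 m distance threshold and the 60 s time slot are illustrative assumptions, not values from any deployed system.

```python
from collections import defaultdict
from itertools import combinations
from math import hypot


def proximity_from_locations(samples, max_dist_m=2.0, slot_s=60):
    """Derive proximity contacts from pooled location samples.

    samples: iterable of (person_id, unix_time_s, x_m, y_m).
    Returns a set of (person_a, person_b, time_slot) tuples.
    All thresholds are hypothetical, for illustration only.
    """
    # Group samples into coarse time slots so that only people seen
    # in the same slot are compared with each other.
    by_slot = defaultdict(list)
    for person, t, x, y in samples:
        by_slot[int(t // slot_s)].append((person, x, y))

    contacts = set()
    for slot, points in by_slot.items():
        for (p, px, py), (q, qx, qy) in combinations(points, 2):
            if p != q and hypot(px - qx, py - qy) <= max_dist_m:
                contacts.add((min(p, q), max(p, q), slot))
    return contacts


samples = [
    ("alice", 0, 0.0, 0.0),
    ("bob", 10, 1.0, 1.0),
    ("carol", 20, 50.0, 50.0),
    ("carol", 70, 0.5, 0.5),
    ("alice", 75, 0.0, 0.0),
]
contacts = proximity_from_locations(samples)
```

Whoever holds the pooled location samples can run this kind of join at will, which is why collecting LT data implies having PT data as well, whereas the reverse does not hold without extra annotation.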
Bottom line: both have pros and cons in terms of precision and recall with respect to the ultimate objective (determining infection risk), and your privacy is not miraculously protected just because you pick one or the other.
Should we tie the high-level objectives of proximity tracing for epidemic infection follow-up to low-level architectural designs, or just pick the best trade-off, allied to other architectural measures?
Peer-to-peer versus trusted-third-party system management and control
Peer-to-peer (a.k.a. decentralised) PT with bilateral Bluetooth contacts, as suggested by some, allows a word-of-mouth use of the contact (e.g., you only find that ‘you’ were near an infectious person ‘b’, if that person tells you directly or indirectly later). The proposals have quite a few problems pointed out by several analysts recently (here, here, here and here). The technical ones can perhaps be fixed, so I will focus on the conceptual ones.
The found-infected person ‘b’ may or may not give anyone her list of contacts (sent and/or received IDs). And even if she does, if it is word-of-mouth, it simply may not get to enough people in useful time. Then we may create a centralised public database of possibly infected people’s IDs (contacted by ‘b’), but here we are “violating” the decentralisation principle, and “contaminating” our approach with a trusted third party, whom we have to trust without being sure it is trustworthy (more ahead).
Let’s assume this works anyway. Now, you know you are at risk. What do you do? You may or may not tell anyone else! Asymptomatic, in fear, you decide to just stay home and wait. Looks good, except that after ‘b’ came into contact with (and possibly infected) YOU, but before ‘b’ was diagnosed positive, you went around possibly silently infecting other people, and this will be known quite late in the process.
What is the alternative? An approach with some centralisation could perform systematic operations, and thus put us much nearer the desired (national) objective of tracing infection chains in near real-time and identifying root causes, hot-spots, and propagation routes of outbreaks. It could (and should) be enriched with complementary tracing information. Let us think for a moment about PT derived from LT, which might then seem the most effective approach. LT with GPS will be the most precise, though it does not work inside buildings, and many old phones do not have GPS. LT with telco provider base-station triangulation is the one with the best legacy support, since it probably goes back to phones from 2005 or so.
The problem with LT is that it has been prey to phone and phone OS vendors. Indiscriminate, forced or unauthorised use of phone owners’ LT is a recurring problem that has already cost hefty fines (from the EC) to some of them. Google and Apple (G/A) announced in a recent post, that they are developing contact tracing APIs to enable interoperability between Android and iOS devices, and that the associated functionalities would later be included directly in their respective operating systems, backward compatible to 2016. Given the history of these companies, the fact that these companies are willing to integrate these tracing features into their systems must be considered extremely critical, a view shared by many (see here).
Should we orchestrate the process in a peer-to-peer voluntarist way, to keep information away from central authorities, safeguarding privacy, at the cost of leaving it in the hands of individuals, and trusting their adherence, their reaction times and their strategies to be mostly altruistic, especially in these stressful times?
Can our peer-to-peer approach be totally decentralised and independent from some central trusted third party to finalise the process?
If we focus on an approach with some centralisation, relying on a trusted third party, which e.g. we could feed with richer PT/LT information, drastically improving the effectiveness of the epidemic follow-up system, how far would we be willing to go in terms of the privacy risks?
Could we possibly come up with an architecture where LT and/or PT (GPS, Bluetooth info) can evade interception by Apple and Google, and avoid the corresponding privacy penalty?
Trust versus Trustworthiness
Going back to the Apple and Google (G/A) question, it is a fact that without further care, and given what the situation suggests, applications relying on the forthcoming G/A APIs --- whoever the developers, and whether peer-to-peer or TTP --- would energize a gigantic, world-wide, data collection system which could centralize the collected data in the realm of the Google/Apple coalition. Given the GAFAM data ownership and use model, it is not exaggerated to say that this would introduce severe privacy and security issues, not to mention digital sovereignty ones, in what concerns national public health and security policy. At the date of this article, not much is known, except that the French government has been heard as strongly opposing this effort from G/A, since it interferes with the app France is designing. It would be interesting to know the details.
Anyway, the title of this section is probably the crux of the discussion, and of some misunderstandings that we should not risk. I wrote about this at some length in a previous LinkedIn article. (For the interested reader, have a look at the section Trust without Trustworthiness: or “Trust me because I tell you to!”). To make a long story short, the message was that nicely written trusted data processing algorithms will fail if the systems processing them are not trustworthy, and there is no trusted data over non-trustworthy systems. But on the other hand, there is nothing wrong with centralized trusted third party systems, if they are made to be trustworthy! And precisely because many useful problems are better served by systems with some centralization, technical solutions that keep these centralized systems from failing or being compromised (or from compromising citizens’ rights) might be a great advantage.
Is it possible, with current, or at least near-market, technology, to design and architect semi-centralized systems processing edge information from mobile phones in a way that serves the national purposes of defence against epidemics, whilst at the same time giving trustworthiness guarantees of protecting citizens’ fundamental rights in compliance with the law?
Part II - Are we looking in the right direction?
Or: On why this is a distributed systems problem, of national interest, which requires a permanent, secure and dependable critical information infrastructure
In this second part, I will try to give some answers to the dilemmas above, presenting an outline of the proposal from my team CritiX@SnT, Univ. of Luxembourg --- PriLok: Citizen-protecting distributed epidemic tracing
Our proposal attempts at striking a balance between securing health, protecting privacy and safeguarding the economy.
I have written enough, and made statements in several public keynotes, events and appearances on media in the past years, to show without doubt that I am a privacy militant as a citizen. However, besides being a scientist researching on systems cybersecurity and resilience in general, and privacy-preserving biomedical information processing in particular, I am also a systems architect who believes that we should strike a balance between what we want and what we can achieve.
That is, the most beautiful theory or architectural concept must find a way to work in practice, and correctly. E.g., it only makes sense to implement a beautiful veil such as the one above (concept) if it can really protect from sun or rain (a functional property) and we are sure it will work well (not crumble --- safety, a non-functional property). The meaning of these comments will become evident throughout the text, since some architectural compromises must be made to meet all the challenges at hand.
Desirable objectives and implied requirements
We believe that, given the reasons above, the almost indispensable functional objectives of any system doing digital contact tracing (CT) are:
1. Be epidemic-agnostic: act on any epidemic, even the unexpected, in near real-time.
2. Find out about infected individuals in near real-time.
3. Find out about potential individual infection chains in near real-time.
4. Alert, monitor, confine, and trace potentially infected individuals in near real-time.
5. Diagnose country/region/community epidemic dynamics in near real-time (map basic infection evolution numbers; locate and map infection hotspots and trajectories; detect super infectors and/or lone wolves; predict collections of asymptomatic individuals; discern between external and communal infection paths).
6. Learn from first epidemic outbreaks and act during individual re-infections and epidemic recurrences, in near real-time.
Additionally, the following non-functional objectives should be met:
7. Guarantees of protecting citizens’ fundamental rights (such as transparency, privacy and equality) in compliance with the law.
8. Resilience to manipulation and forging, fake-news, gossip, panic, denial of service.
9. Sustained real-time capability under overload, to maintain situational analysis and reaction capacity (infection roadblocks; sanitary fences around hotspots; group quarantines; and later, precise selective re-opening).
10. Smoothly incremental precision and recall, from an acceptable nation-wide baseline technology level, to levels attainable by s.o.t.a. technology (not only but including 5G).
If those requirements are met, we are bound to have a CII (critical information infrastructure) that really serves the nation and its individuals, in the possibly hard times to come in the next years.
We believe the state, as one stakeholder of the nation, should have the important responsibility of its implementation and operation, relying on other stakeholders (regulated companies, regulators, public associations, for example).
However, the people, individuals or collections thereof (who ‘are’ the nation) have the right to enjoy the CII equally (regardless of their technical literacy), release PII (personally identifiable information) lawfully and only on a need basis, and have transparent access to the modus operandi and regulation of use.
In this sense, we believe that approaches that are peer-to-peer managed (actually or pseudo-decentralised) and voluntarist (totally or mostly based on word-of-mouth gossip) will work, but will miss some important objectives of the list above.
Likewise, approaches that are centrally managed (single entity) and top-down controlled (totally or mostly based on the “Trust me because I tell you to!” principle), and as such opaque, will work as well, but will miss another set of equally important objectives of the list.
As wise people say, maybe the truth lies in the middle. A few colleagues have written about the cons of the extremes (here, here, or here).
PriLok: Citizen-protecting distributed epidemic tracing critical information infrastructure
In consequence, we propose the architecture of PriLok: Citizen-protecting distributed epidemic tracing, a critical information infrastructure (CII), whose values, which we wish to safeguard in the design that we will make public, are:
- Maximizing nation-wide coverage of people and territory
- Ensuring near real-time continued situational awareness through whole epidemic life cycle
- Transparent protection of citizens’ rights within compliance with the law
- Resilience against data- and system-based social and technical threats.
- Protection of economy by precise and selective throttling (closing and opening).
In brief, PriLok should be oriented at the protection of populations, cities, countries and trans-border regions in the face of epidemics. The objectives outlined above imply the participation of a plurality of stakeholders, and PriLok should leverage existing CIIs, such as the NHS systems and the telco networks.
Decentralisation or centralisation
The question of decentralisation versus centralisation must be better explained. The fact that something is logically centralised does not imply that it is physically centralised, with a single locus of control.
It may rely on a set of entities, which share power and control of the system and, leveraging natural redundancy, increase its resilience. Entities such as: government, through the national health service (NHS), directorate and family doctors; telco network incumbents and service providers; regulators and people’s ombudsman, patients, and so forth.
Trustworthy trusted third parties
Efficiency of such a system obviously requires digital workflows, as happens today with the majority of e-gov operations of the kind foreseen. Security of operations, namely privacy, would rely, amongst other techniques, on the systematic encryption of PII, contact and location data, in transit and at rest. It would rely as well on fault and intrusion tolerance techniques based on thresholds and quorums of correct entities, such as secret sharing, erasure coding, and voting, to obviate attacks or abuses from a minority of (possibly colluding) malicious entities.
Such measures seek to materialise the concept of a trusted third party (the said “centralised” controller) that is trustworthy, i.e. worthy of trust (because it is implemented by the mutually controlling actions, and the consensus thereof, of “decentralised” independently acting entities). These entities vouch for that trust to their respective stakeholders, including the individuals.
Contact tracing methodology
Much has been said about the contact tracing methodology itself, some arguing that finding a significant global epidemic map requires harvesting as much individual data as possible, so that s.o.t.a. ML/AI (machine learning / artificial intelligence) algorithms can make sense of it. This view is influenced by the undeniable successes of Google in some prior flu events.
We take instead a distributed systems approach to the problem, and argue that tracing epidemic infection chains in a data set of proximity and/or location traces can be reduced to a potential causality determination problem, leading to a partially ordered directed acyclic graph (DAG), from which many of the insights named in the objectives we listed for such a system can be derived.
ML/AI mechanisms should still be a useful asset, to prune, enrich and fine-tune such DAG information (which may have precision and recall limitations), but we believe that the initial DAG creation methodology (which will be privacy-preserving as we explain next) will carry enough information that may be complemented by OSINT (open-source intel), instead of putting political stress on knowing too much PII from individuals.
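The reduction to potential causality can be illustrated with a small sketch. It is only one plausible reading of the idea, not the PriLok design: it assumes timestamped pairwise contacts and a single index case, marks each person with the earliest time they could have been exposed, and keeps only edges that go from an earlier potential exposure to a later one, which is what keeps the graph acyclic. The function and variable names are hypothetical.

```python
def potential_causality_dag(contacts, index_case, t0):
    """Build a potential-infection DAG from timestamped contacts.

    contacts: iterable of (time, person_a, person_b).
    index_case: person known to be infectious from time t0.
    Returns (exposed, edges): the earliest potential exposure time per
    person, and a list of directed edges (source, target, contact_time).
    """
    exposed = {index_case: t0}
    edges = []
    for t, a, b in sorted(contacts):  # process contacts in time order
        for src, dst in ((a, b), (b, a)):
            # Only someone potentially exposed strictly before this
            # contact can have passed the infection on during it.
            if src in exposed and exposed[src] < t:
                if dst not in exposed:
                    exposed[dst] = t
                # Edges point from earlier to later potential exposure,
                # so no cycle can ever be formed.
                if exposed[src] < exposed[dst]:
                    edges.append((src, dst, t))
    return exposed, edges


contacts = [
    (3, "b", "c"),
    (1, "a", "b"),
    (4, "a", "c"),
    (2, "d", "e"),
    (5, "c", "b"),
]
exposed, edges = potential_causality_dag(contacts, index_case="a", t0=0)
```

In this toy run, 'd' and 'e' met each other but never anyone on a potential infection path, so they do not appear in the graph, while 'c' has two potential sources ('a' and 'b'), which is why the result is a DAG rather than a tree.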
Sensing technologies for contact tracing
Continuing now our discussion in the introduction about the sensing technologies for contact tracing, if this system is to be inclusive, it should contemplate the lowest level of sensing feasible and perfect and improve from there. Let us note that 30% of the population is estimated not to own a smartphone, and most old people are not tech-savvy.
As such, it should leverage the existence of the primordial, though imprecise, sensing apparatus that promotes inclusion, composed of the mobile phone cells and associated base stations. It is quite imprecise, but can be improved with telco provider based base-station triangulation, and we have proposals to incrementally and securely improve on that, which we will detail in our design document soon to be issued. However, from the viewpoint of the national interest, note that it is the one offering better reliability, security and sovereignty conditions for a start, since it does not suffer from the considerable threat plane affecting Bluetooth from phone-to-phone attacks, or the phone/OS vendor interference in GPS sensing.
Leveraging CII state regulation
Since telecommunications are a regulated activity, and we argue that PriLok should be a (state endorsed and regulated) CII, leveraging and being supported by the existing mobile telecommunications infrastructure, there is some margin for legislating about support for this new CII, namely: protocol and information workflow modifications; technical enhancements and additions.
The reader still thinking of the possible benefits of escaping state control that decentralised Bluetooth approaches seem to offer should remember at this point that all her/his mobile phone calls, and the base stations within whose area she/he has been, are systematically logged and stored by providers, in special registers --- call detail records (CDR). This is extremely business relevant for invoicing, but also very privacy sensitive, and the regulations about how to use, keep and dispose of CDRs are ambiguous at best. Telco being a regulated activity though, abusing this information is a contravention, if not a crime, but there is no technical enforcement or protection of privacy whatsoever. Yet, we seem to live with that…
Privacy-preserving location
So, imagine that we turn the epidemic threat into an opportunity, and argue that we need good enough contact tracing, but done in a secure way. We start to store, along with CDRs, an additional structure, Proximity Detail Records (PDR), containing, for each region (cell), the timestamps of the periods of time spent in the region, plus some parameters of the region (size, centroid coordinates, etc.). Then, all the PDRs are pre-processed, likely proximity contacts between pairs of phones are flagged, and enhanced PDRs for these pairs are created.
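As a rough illustration of the pre-processing step, the sketch below models a PDR as a phone, a cell and a time interval, and flags pairs of phones whose stays in the same cell overlap long enough. The field names, the 300-second overlap threshold and the phone identifiers are assumptions made for this example; the region parameters mentioned above (size, centroid) are omitted for brevity.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PDR:
    """Hypothetical Proximity Detail Record: one stay of a phone in a cell."""
    phone: str
    cell_id: str
    start: int  # seconds since some epoch
    end: int


def paired_pdrs(records, min_overlap_s=300):
    """Flag pairs of phones that stayed in the same cell at the same time.

    Returns a list of ((phone_a, phone_b), cell_id, overlap_start, overlap_end).
    """
    by_cell = {}
    for r in records:
        by_cell.setdefault(r.cell_id, []).append(r)

    pairs = []
    for cell, recs in by_cell.items():
        for r1, r2 in combinations(recs, 2):
            if r1.phone == r2.phone:
                continue
            start = max(r1.start, r2.start)
            end = min(r1.end, r2.end)
            if end - start >= min_overlap_s:
                phones = tuple(sorted((r1.phone, r2.phone)))
                pairs.append((phones, cell, start, end))
    return pairs


records = [
    PDR("phone-1", "cell-A", 0, 1000),
    PDR("phone-2", "cell-A", 600, 2000),
    PDR("phone-3", "cell-A", 950, 3000),
    PDR("phone-3", "cell-B", 0, 1000),
]
pairs = paired_pdrs(records)
```

Cell-level co-presence is a coarse proxy for proximity, so a real pipeline would need the refinements (triangulation and beyond) that the text alludes to before such pairs could be treated as likely contacts.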
Now comes privacy: all these registers are encrypted, with threshold encryption, and the cleartext disposed of, so that they cannot be reopened except by a threshold number of “authorisation signatures” (keys or key shares), coming of course, from the above-mentioned set of entrusted entities. These data would stay encrypted at rest in data centers under control of the CII management. It would be deleted periodically according to conservative incubation estimates for infections.
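The "threshold number of key shares" idea can be sketched with Shamir secret sharing, one of the secret sharing techniques mentioned earlier. This is a swapped-in simplification rather than a threshold encryption scheme: it assumes the records are encrypted under an ordinary key, and only that key is split among the entrusted entities so that any 3 of 5 can rebuild it. The prime, the 3-of-5 parameters and the function names are illustrative assumptions.

```python
import secrets

# A Mersenne prime large enough to hold a 126-bit key; chosen for illustration.
PRIME = 2**127 - 1


def split_secret(secret, threshold, n_shares):
    """Split an integer secret into n_shares points of a random polynomial
    of degree threshold - 1 whose constant term is the secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n_shares + 1)]


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the given (x, y) shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical set of entrusted entities holding one share each.
entities = ["health service", "telco operator", "regulator", "ombudsman", "magistrate"]
key = secrets.randbelow(PRIME)
shares = dict(zip(entities, split_secret(key, threshold=3, n_shares=5)))

recovered = recover_secret(
    [shares["health service"], shares["regulator"], shares["magistrate"]]
)
```

With fewer than three shares the polynomial is underdetermined, so a minority of colluding entities learns nothing useful about the key, which is the property the text relies on.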
To give an idea of what follows, imagine an epidemic comes, and person ‘a’ is found infected. Through the NHS (doctor, delegate, etc.) she is asked for her phone number, and an encrypted search is triggered on the paired-PDR register depot, searching for the (encrypted) records containing that phone number. The list of registers is handed (obviously, all this in an electronic workflow) to an authorised threshold of entities (possibly including a magistrate), who extract the other phone numbers that ‘a’s phone has been in proximity with, and seek to alert the respective users. Patients can even be given, with their collaboration, roadmaps of their tracks during the asymptomatic but contagious period, in order to help locate infection risk targets as precisely as possible.
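One simple way to picture a search over encrypted records is a keyed "blind index": each paired record is filed under an HMAC of the phone numbers it concerns, so the depot can answer a lookup without ever storing numbers in cleartext. This is an assumption about how such a search could work, not the mechanism of the PriLok design; the class and function names are hypothetical and the record payloads are opaque placeholder bytes standing in for threshold-encrypted data.

```python
import hashlib
import hmac
from collections import defaultdict


def blind_tag(index_key: bytes, phone: str) -> str:
    """Keyed hash of a phone number; useless without index_key."""
    return hmac.new(index_key, phone.encode("utf-8"), hashlib.sha256).hexdigest()


class PairedRecordDepot:
    """Hypothetical depot of encrypted paired PDRs, indexed by blind tags."""

    def __init__(self):
        self._by_tag = defaultdict(list)

    def store(self, index_key: bytes, phone_a: str, phone_b: str, ciphertext: bytes):
        # File the same ciphertext under the tag of each phone in the pair.
        for phone in (phone_a, phone_b):
            self._by_tag[blind_tag(index_key, phone)].append(ciphertext)

    def search(self, index_key: bytes, phone: str):
        # Returns ciphertexts only; opening them still needs the threshold
        # of authorisations described in the text.
        return list(self._by_tag.get(blind_tag(index_key, phone), []))


index_key = b"demo-index-key-not-for-production"
depot = PairedRecordDepot()
depot.store(index_key, "phone-1", "phone-2", b"<ciphertext 1>")
depot.store(index_key, "phone-2", "phone-3", b"<ciphertext 2>")

hits = depot.search(index_key, "phone-2")
```

Because phone numbers are easy to enumerate, the index key itself has to be protected, for instance by splitting it among the same entrusted entities, otherwise the tags could be reversed by brute force.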
Stream processing
All the other uses in our requirements list can be served by a generalisation of the example above, through efficient, but secure and privacy-preserving threshold and/or encrypted search operations, as we will detail later.
We foresee the use of stream processing techniques, e.g. for DAG construction and exploration, for building complex maps of epidemic hotspots and trajectories, and for feeding epidemic propagation models. Such processes will allow automatic and thus efficient data de-anonymization requests on demand, directed to cooperating authorized actors only (e.g., epidemiologists with judicial and governmental clearance for epidemic predictions or, in general, one of, or a threshold combination of, physicians, representatives of the National Health Service and public authorities, depending on the criticality of the operation).
Such a high-performance solution will also enable operation under advanced adversarial and threat models that help optimise, in the way possible, the precision and recall of findings, and the predictive power. About predictions, we foresee running models allowing selective post-epidemic-height re-opening of economy, based on risk assessment.
Thanks for your interest!
EUDI Wallet | Wallets | Digital Identity | Digital economy | Payments
4 years ago: Interesting read; however, I think the focus for now should be on handling the current situation. It has been 100 years since the previous significant pandemic, and the good thing about this virus is that it mutates little. The bad thing is that it is a tricky variant to counter, both by immune response and by vaccine, and the likelihood is that we are looking at years here. I do agree that in terms of privacy the real black swan is GSM records, but I do believe we should not repeat that mistake again. Any suggestion that does not provide explicit end-user consent by design is, in my view, absolutely not an avenue to pursue. That implies no reliance on third parties to protect and play by the rules in order to obtain control and privacy.
Cryptography Researcher at AZKR Research
4 years ago: Paulo Esteves-Veríssimo fantastic, I will give it a look immediately! Here is a work of ours: https://eprint.iacr.org/2020/493 . Any comment is welcome.
Associate Professor in Computer Engineering at Federico II University of Naples & Founder Critiware s.r.l., Secureware s.r.l.
4 years ago: A "critical information infrastructure" for analyzing data with strong security would indeed make it easier for citizens to accept (trust!) epidemic tracing apps. Thanks for sharing your ideas!
Fellow of the Learned Society of Wales at Learned Society of Wales
4 years ago: Would be super keen to discuss with you a bit, Paulo Esteves-Veríssimo
Group Leader at Luxembourg Institute of Science and Technology
4 years ago: Very interesting observations, Paulo. I agree with many of your points! Looking forward to reading your paper.