Confidential computing
Since the early days of the IaaS and PaaS Public Cloud, data protection has been cited as one of the top roadblocks faced by corporations that handle highly sensitive information (quite understandably). With the sprouting of data privacy regulations, those concerns have only grown more acute.
So what is to be done? For real, I mean.
Which patterns are useful, which are not? And why? Read on!
Looking for something to chase
It's weird, but when I talk about data protection, an all-too-common stance among security analysts is to jump to conclusions without even describing the risk we are trying to chase. So let's go back to basics:
Against whom should you protect?
If you are familiar with handling highly sensitive data like personally identifiable information, it's safe to assume you already guard yourself on premises against internal and third-party threats.
Now in the Public Cloud, the key actors to guard against are a rogue Cloud provider administrator and an adverse co-resident (a malevolent or compromised tenant).
Against what?
Although data protection has many other aspects and side-effects, here our only worry is preventing read access to the data, no matter its state (in transit, in memory, at rest) and regardless of the health of its environment (compromised keys, critical vulnerabilities, unmonitored endpoints, pass-through firewalls…).
Now… how?
The only serious mitigation attempting to cover this risk that I know of was announced in late 2017 by Mark Russinovich: it's an early access service called Azure Confidential Computing. In a nutshell, it leverages Intel enclaves to process customer data in an isolated and encrypted area of an Azure CPU. This area is managed by a new set of CPU instructions called SGX (Software Guard Extensions). A crucial point to be aware of is that the Cloud provider (here: Microsoft) has no access to the encryption keys.
So to get back to what I said earlier, on paper it seems to protect against read access by disgruntled Cloud administrators or adverse co-residents.
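To make that a bit more concrete, here is a minimal host-side sketch of the SGX programming model with the Intel SGX SDK: the untrusted host loads a signed enclave image and transitions into it through an ECALL, and only code running inside the enclave ever sees the plaintext. The enclave image name and the ECALL ecall_process_record are assumptions for illustration; in a real project they would come from your own EDL interface definition, not from the Azure service itself.

```cpp
// Minimal host-side sketch of the SGX programming model, assuming the Intel
// SGX SDK is installed and an enclave image "enclave.signed.so" has been
// built. The ECALL `ecall_process_record` is hypothetical: it would be
// declared in your own EDL file and its untrusted proxy generated into
// enclave_u.h by the SDK's edger8r tool.
#include <cstdint>
#include <cstdio>
#include <sgx_urts.h>
#include "enclave_u.h"   // generated proxies for the hypothetical ECALLs

int main() {
    sgx_enclave_id_t eid = 0;
    sgx_launch_token_t token = {0};
    int token_updated = 0;

    // Load and initialize the enclave: its code and data pages live in
    // protected memory (the EPC) that the host OS and hypervisor cannot read.
    sgx_status_t ret = sgx_create_enclave("enclave.signed.so", /*debug=*/1,
                                          &token, &token_updated, &eid, nullptr);
    if (ret != SGX_SUCCESS) {
        printf("enclave creation failed: 0x%x\n", (unsigned)ret);
        return 1;
    }

    // Transition into the enclave. Sensitive data is only decrypted and
    // processed inside; the host sees ciphertext and a return code.
    uint8_t encrypted_record[256] = {0};   // ciphertext produced by the data owner
    int result = 0;
    ret = ecall_process_record(eid, &result, encrypted_record, sizeof(encrypted_record));

    sgx_destroy_enclave(eid);
    return (ret == SGX_SUCCESS) ? result : 1;
}
```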
Limitations from the ecosystem
There are a few points of attention regarding the health of the environment into which sensitive data is plunged:
- specialized tools may be used by a privileged Cloud employee to attempt tampering with the Intel hardware (physical breaches should trigger a fuse, and the critical parts of the microprocessor should self-destruct. Should. Hopefully.)
- a Cloud provider employee may collude with some Intel SGX experts to get the keys (far-fetched, but not impossible)
- key management may be error-prone in a context of service chaining, where most services are hosted not on premises but in the Cloud (some private keys of yours, distinct from your enclave keys, must be shared across pub/sub enclaves); see the sketch after this list
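To give a feel for why sharing keys across chained enclaves takes deliberate work, here is a minimal trusted-side sketch using the SGX SDK's sealing API. Sealed data is bound to the identity of the sealing enclave, so a secret sealed by one enclave cannot simply be unsealed by another; anything that must travel between pub/sub enclaves has to be provisioned explicitly, typically after remote attestation. The ECALL names and the notion of an "application key" are illustrative assumptions, not part of the Azure offering.

```cpp
// Trusted (in-enclave) sketch, assuming the Intel SGX SDK (sgx_tseal.h).
// Sealing encrypts data with a key derived from this enclave's identity:
// handy for local persistence, but useless for handing a secret to a
// *different* enclave in a service chain; that requires explicit key
// provisioning, typically after remote attestation.
#include <cstdint>
#include <sgx_error.h>
#include <sgx_tseal.h>

// Hypothetical ECALL: seal an application-level key so the host can store it.
sgx_status_t ecall_seal_app_key(const uint8_t* app_key, uint32_t key_len,
                                uint8_t* sealed_out, uint32_t sealed_cap) {
    const uint32_t sealed_size = sgx_calc_sealed_data_size(0, key_len);
    if (sealed_size == UINT32_MAX || sealed_size > sealed_cap)
        return SGX_ERROR_INVALID_PARAMETER;

    // The default sealing policy binds the blob to the enclave signer
    // (MRSIGNER), so only an enclave with a matching identity can unseal it.
    return sgx_seal_data(0, nullptr, key_len, app_key, sealed_size,
                         reinterpret_cast<sgx_sealed_data_t*>(sealed_out));
}

// Hypothetical ECALL: unseal it again (only works inside a matching enclave).
sgx_status_t ecall_unseal_app_key(const uint8_t* sealed_in,
                                  uint8_t* app_key_out, uint32_t* key_len_inout) {
    return sgx_unseal_data(reinterpret_cast<const sgx_sealed_data_t*>(sealed_in),
                           nullptr, nullptr, app_key_out, key_len_inout);
}
```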
Limitations by age
If that were the only concern, it would not be too much of an issue. But the most obvious limitation lies elsewhere.
I gather that:
to handle highly sensitive data, you have to have very good reasons. And if you have very good reasons, you have to have very high expectations. If you have very high expectations, you need a strong track record backed by a large user base.
You need independent, well-informed papers, third-party cryptanalysis, and plenty of case studies. If you are risk-averse, you are not going to hop lightly onto Elon Musk's ship to Mars. The usual bootstrap problem…
Limitations from nativeness (or lack thereof)
There is another, less obvious limitation, and it has nothing to do with security or risks: the range of use cases you can currently run inside Cloud enclaves. Enclaves look great for grid computing and batch processing: you send some (encrypted) job and data to the enclave(s), you wait for the job to be done, you get the output and you decrypt it.
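The client-side half of that batch pattern is ordinary envelope encryption: encrypt the payload under a key that only the enclave will hold, ship the ciphertext, and decrypt the result when it comes back. Here is a minimal sketch of that step with OpenSSL; the submit_to_enclave transport call and the way the key gets provisioned to the enclave (normally after remote attestation) are assumptions left out of scope.

```cpp
// Client-side sketch of the batch pattern, using OpenSSL's EVP interface.
// The payload is encrypted with AES-256-GCM before leaving your premises;
// submit_to_enclave() and the key provisioning are assumptions.
#include <cstdio>
#include <string>
#include <vector>
#include <openssl/evp.h>
#include <openssl/rand.h>

std::vector<unsigned char> encrypt_payload(const std::vector<unsigned char>& plaintext,
                                           const unsigned char key[32],
                                           unsigned char iv_out[12],
                                           unsigned char tag_out[16]) {
    std::vector<unsigned char> ciphertext(plaintext.size());
    RAND_bytes(iv_out, 12);                              // fresh 96-bit nonce

    EVP_CIPHER_CTX* ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), nullptr, key, iv_out);

    int len = 0, total = 0;
    EVP_EncryptUpdate(ctx, ciphertext.data(), &len,
                      plaintext.data(), static_cast<int>(plaintext.size()));
    total = len;
    EVP_EncryptFinal_ex(ctx, ciphertext.data() + total, &len);
    total += len;
    EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag_out);
    EVP_CIPHER_CTX_free(ctx);

    ciphertext.resize(total);
    return ciphertext;
}

int main() {
    unsigned char key[32];
    RAND_bytes(key, sizeof(key));                        // shared with the enclave only
    const std::string job = "highly sensitive batch job payload";
    const std::vector<unsigned char> plaintext(job.begin(), job.end());

    unsigned char iv[12], tag[16];
    const auto ciphertext = encrypt_payload(plaintext, key, iv, tag);

    // submit_to_enclave(ciphertext, iv, tag);           // hypothetical transport call
    // ...later, fetch the (encrypted) result and decrypt it the same way.
    printf("prepared %zu encrypted bytes for submission\n", ciphertext.size());
    return 0;
}
```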
If you are a little bolder and consider deploying a service-oriented architecture, making full use of PaaS, then as of today the opportunity to leverage enclaves shrinks a lot: as your stateless runtimes need to get data from your stateful services, you need to make encrypted queries to your backends.
In practice, most of the time you will need to load the backend into enclave memory to process a given query. You cannot just pass the encrypted query straight to the stateful engine. Until homomorphic encryption becomes practical and widely accepted, loading the whole data structure is probably the only option at hand (a sketch of this pattern follows the list below). This brings some big constraints on your side:
- if you rely on a third-party solution (a database, a memory cache,…) to maintain state, you must ensure that (a) the solution can be loaded into memory, (b) it works within the limited subset of instructions SGX has to offer, and (c) it is written in C++ (which comes with its own security caveats, being unmanaged code);
- if you use a Cloud native solution, the assumption that the Cloud provider has no read access to your data becomes quite weak;
- if you use an in-house custom solution, you can say goodbye to your hopes of being Cloud native, and your return on investment will certainly take a hit.
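To make the "load everything into the enclave" pattern above a bit more concrete, here is a trusted-side sketch built on the Intel SGX SDK's sgx_tcrypto primitives: the whole key/value store sits in an in-memory map inside the enclave, every query arrives as AES-GCM ciphertext, and the plaintext only ever exists inside enclave memory. The ECALL, the store layout and the pre-provisioned query key are illustrative assumptions, not features of any particular product.

```cpp
// Trusted (in-enclave) sketch of the "load the whole backend into enclave
// memory" pattern, assuming the Intel SGX SDK (sgx_tcrypto.h). The dataset
// sits in an ordinary in-memory map inside the enclave; each query arrives
// as AES-GCM ciphertext and the plaintext key only ever exists in here.
// The ECALL itself and the pre-provisioned g_query_key are assumptions.
#include <cstdint>
#include <map>
#include <string>
#include <sgx_error.h>
#include <sgx_tcrypto.h>

// Entire key/value store loaded into enclave memory (bounded by EPC size).
static std::map<std::string, std::string> g_store;
// Symmetric key shared with the caller, provisioned after remote attestation.
static sgx_aes_gcm_128bit_key_t g_query_key;

// Hypothetical ECALL: answer one encrypted lookup.
sgx_status_t handle_encrypted_query(const uint8_t* ct, uint32_t ct_len,
                                    const uint8_t iv[12],
                                    const sgx_aes_gcm_128bit_tag_t* tag,
                                    char* value_out, uint32_t value_cap) {
    std::string lookup_key(ct_len, '\0');
    // Decrypt the query inside the enclave; the host never sees the plaintext.
    sgx_status_t ret = sgx_rijndael128GCM_decrypt(
        &g_query_key, ct, ct_len,
        reinterpret_cast<uint8_t*>(&lookup_key[0]),
        iv, 12, nullptr, 0, tag);
    if (ret != SGX_SUCCESS) return ret;

    const auto it = g_store.find(lookup_key);
    const std::string value = (it != g_store.end()) ? it->second : std::string();
    if (value.size() >= value_cap) return SGX_ERROR_INVALID_PARAMETER;
    // A real design would re-encrypt the value here instead of copying it
    // out in clear text; omitted to keep the sketch short.
    value.copy(value_out, value.size());
    value_out[value.size()] = '\0';
    return SGX_SUCCESS;
}
```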
Tentative conclusion
Enclaves are our best hope for running sensitive workloads in public Clouds, and Azure clearly has a head start. They can already be leveraged for specific asynchronous, long-running jobs with a simple data structure (no query decryption needed). For more complex cases, you will need to find a third-party stateful engine that is SGX-compatible without drifting too far from a PaaS-native approach and cutting yourself off from its amazing integration and cost-efficiency opportunities.