Towards the Confidential PaaS: a detailed whitepaper
I have two quite pressing reasons for sharing this whitepaper today:
Part 1 - hard customer requirements
When considering CC, two extremely common pitfalls of Cloud customers are:
I believe it is critical to set industry standards that are commonly shared and approved by the most demanding customers (i.e. those from regulated industries), so that the definition of such standards is not left to the sole judgement of providers and chipmakers. Without customer involvement, CC is likely to end up in a dead end where the delivered solution is not adopted by the very customers who, as it turns out, need it most.
So here is a proposal of hard (non-negotiable) customer requirements:
Requirement #1: confidential boundary
If you've followed my posts on CC, you may have noticed I keep claiming that to design a proper confidential PaaS, one must first design a proper confidential IaaS. What do I mean by this?
In IaaS, the confidential boundary is clear: it is the whole VM itself (leaving aside SGX enclaves, which are only sparingly used). Such a VM is called a CVM (Confidential VM).
The reason is that no Cloud Service Provider (CSP) code may be stored or executed in a customer compute environment: the CVM image must not contain any CSP products or libraries, including the firmware itself.
In PaaS, the compute environment is split into worker nodes (aka the 'data plane') and master nodes (aka the 'control plane'). While the latter is under the full control of the CSP, the former is a customer compute environment and must be enclosed in a CVM.
This constraint on the confidential boundary dictates the following requirements:
Requirement 1.1: in PaaS, the confidential boundary is the worker node. (Put another way: each worker node is a CVM.)
Requirement 1.2: at build time, no CSP code (including firmware) may be stored in worker nodes. At run time, worker nodes must only execute customer code. (The numerous pods run by the CSP in PaaS offerings must not run in a confidential worker node.)
Requirement 1.3: the memory and registers of worker nodes must only be readable and writable in the clear from within the CVM. (Memory and registers must be encrypted and signed by the confidential processor.)
Requirement #2: secure boot
Each time a CVM is spun up, customers must be offered absolute guarantees that the instance is theirs, that it has not been duplicated and that it is not being emulated out of band (i.e. outside a confidential processor).
Here is an illustration of CVM emulation and why it is dangerous:
Requirement 2.1: CVM images (including firmware) must be encrypted with a key pair not belonging to the CSP.
Requirement 2.2: The confidential processor and the customers are the only actors able to decrypt the CVM image. This decryption must happen outside of provider reach.
Encrypted images may be stored by customers in an online gallery managed by their CSP.
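To make requirement 2.1 more concrete, here is a minimal Go sketch (purely illustrative, not any provider's API) of how a customer could seal a CVM image with a fresh data key before pushing the ciphertext to the CSP gallery. The function names are hypothetical, and the wrapping of the data key to the customer/confidential-processor key pair (requirement 2.2) is deliberately left as an abstract step.

```go
// Sketch only: the CSP ever sees ciphertext; the data key is wrapped so that
// only the customer and the confidential processor can unwrap it.
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"io"
)

// encryptImage seals a raw CVM image (firmware included) with a fresh
// AES-256-GCM data key. Wrapping that key to the customer-held key pair is
// out of scope here.
func encryptImage(image []byte) (ciphertext, dataKey []byte, err error) {
	dataKey = make([]byte, 32)
	if _, err = rand.Read(dataKey); err != nil {
		return nil, nil, err
	}
	block, err := aes.NewCipher(dataKey)
	if err != nil {
		return nil, nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err = io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, nil, err
	}
	// Prepend the nonce so decryption inside the confidential boundary is self-contained.
	ciphertext = gcm.Seal(nonce, nonce, image, nil)
	return ciphertext, dataKey, nil
}

func main() {
	ct, _, err := encryptImage([]byte("raw CVM image bytes, firmware included"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("encrypted image is %d bytes; only this ciphertext reaches the CSP gallery\n", len(ct))
}
```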
Requirement #3: hermeticism
To customers, CVMs must act like any "normal" VM: they have the usual egress/ingress network connectivity, they can have external disk mounts, and so on.
To any other entity, they are hermetic systems, meaning two things:
Requirement 3.1: By design, no communication can be established from the outside of the CVM (except by the customer herself).
Requirement 3.2: By design, communications established from the CVM to the outside of customer environment mustn't leak any customer information.
Please note that this definition of a hermetic system is somewhat stronger than AWS' own. For AWS, a hermetic system corresponds to requirement 3.1, with a focus on built-in noOps for CSP activities.
Part 2 - solution design
In this section, we discuss "a" solution among many possible ones. The purpose here is not to find "the" perfect design, but a design that would fit the bill.
Everybody is invited to challenge this design and to improve it wherever it makes sense, provided all hard requirements are met of course.
Confidential boundary
Requirement 1.1 (worker node is a CVM) is already implemented in a PaaS offering: Azure Kubernetes Service (AKS) with confidential nodes.
For the IaaS, requirement 1.2 (no CSP code in CVM) is currently in preview in Azure (in-guest custom firmware).
For the PaaS, requirement 1.2 is not yet implemented to the best of my knowledge, and it is likely to impact the shared responsibility model in a couple of ways:
Requirement 1.3 (protected memory and registers) is already implemented in AMD SEV-SNP processors.
Secure boot
Requirements 2.1 (on-premises image encryption) and 2.2 (CSP-proof decryption) are currently not supported by chipmakers for IaaS, hence not for PaaS either.
I have already shared a possible solution:
Hermeticism
This remains by far the most complex and innovative topic to tackle. Globally it amounts to a guest-host secure communication problem. Let's decompose this problem into its necessary subcomponents:
You may have noticed we have excluded all pod/worker node/kubelet to kube API server communication. Such flows won't be possible anymore, which is quite a breaking change.
Control flow
To ensure perfect air-gapped isolation between the data plane and the control plane, several concepts must be introduced:
The good news is that the whole sequence above is already supported by AMD SEV-SNP technology.
The bad news is that two key ingredients must be designed: the Executive and the Servicer.
The Executive
The Executive is a new micro operating system hosted in the CVM, which acts as a proxy between the worker node and the control plane. It is the custodian of hermeticism: it ensures the communication channel is both safe from intrusion and safe from leakage.
The first outstanding feature of the Executive is that it must be provably secure under CTL model checking (see, for example, https://www.cs.technion.ac.il/users/orna/FMCO05-survey.pdf).
To that end, the Executive must have as small a footprint as possible. I believe that, given the limited number of operations required to operate a worker node, such a criterion will be easy to meet.
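To give an idea of what "provably secure" could mean in practice, here are two illustrative CTL properties. The atomic propositions are hypothetical; the real property set would have to be derived from the Executive's actual specification.

```
AG ( order_received  ->  AF status_returned )
AG ( not payload_forwarded_to_control_plane )
```

The first property states that every accepted order eventually yields a status message; the second states that no customer payload ever crosses the confidential boundary towards the control plane.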
The second outstanding feature of the Executive is the unique way it handles I/Os: recall that the only I/Os it has to deal with are control flow orders. They can be normalized to follow a very simple syntax based on only three terms: a VERB, a NOUN and a UID.
For example, the order "download image ubuntu:20.04 from the registry" can be translated as [ VERB=4506, NOUN="ubuntu:20.04", UID=0x5f6b39201 ]
Here, verb 4506 corresponds to a registry pull. It is highly recommended that the control plane uses the same UID for idempotent requests, because they are cached in Executive memory for a reason that will become obvious in the results logging section.
If the syntax is correct, the Executive runs the Servicer routine for further analysis.
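As an illustration, here is a minimal Go sketch of the Executive's input handling under the above syntax. The field names, the verb table and the status codes are hypothetical placeholders; only the VERB/NOUN/UID shape and the idempotency cache come from the description above.

```go
// Sketch only: a hypothetical normalized control-flow order as handled by the Executive.
package main

import "fmt"

// Order is the only I/O shape the Executive accepts from the control plane.
type Order struct {
	Verb uint16 // operation code, e.g. 4506 = pull image from registry
	Noun string // operand, e.g. "ubuntu:20.04"
	UID  uint64 // request identifier, reused by the control plane for idempotent retries
}

// cache keeps the outcome of already-processed UIDs so that an idempotent
// retry can be answered from Executive memory without touching the worker node.
var cache = map[uint64]uint16{} // UID -> status code

func handle(o Order) uint16 {
	if status, seen := cache[o.UID]; seen {
		return status // idempotent replay: answer from the cache
	}
	// ... syntax checks, then hand over to the Servicer routine ...
	status := uint16(200) // placeholder outcome
	cache[o.UID] = status
	return status
}

func main() {
	o := Order{Verb: 4506, Noun: "ubuntu:20.04", UID: 0x5f6b39201}
	fmt.Printf("order %+v -> status %d\n", o, handle(o))
}
```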
The Servicer
The role of the Servicer is to maintain its own copy of the worker node's state. All orders issued by the control plane to the worker nodes correspond to a transition between valid states in a graph. Upon receiving a VERB/NOUN order, the Servicer checks that the resulting state is valid and, if so, updates its internal representation of the worker node state. It then returns control to the Executive, which instructs the worker node to perform the order.
To keep the Servicer and the worker node from drifting out of sync, an UPLINK from the worker node to the Servicer must be performed at regular intervals. Details of the UPLINK won't be discussed here because they are not on the critical path.
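Here is a minimal, hypothetical Go sketch of the Servicer acting as a guard on such a state-transition graph. The states and verbs are invented for illustration; a real table would be derived from the orchestrator's node and pod lifecycle.

```go
// Sketch only: the Servicer validates each order as a transition in a state graph.
package main

import "fmt"

type State string

// validNext encodes which state a given verb may lead to from a given current state.
var validNext = map[State]map[uint16]State{
	"Idle":        {4506: "ImagePulled"}, // registry pull (verb from the example above)
	"ImagePulled": {4510: "PodRunning"},  // hypothetical "start pod" verb
	"PodRunning":  {4520: "Idle"},        // hypothetical "stop pod" verb
}

// apply checks that the order yields a valid transition; only then does the
// Servicer update its copy of the worker-node state and return control to the Executive.
func apply(current State, verb uint16) (State, error) {
	next, ok := validNext[current][verb]
	if !ok {
		return current, fmt.Errorf("verb %d is not a valid transition from state %q", verb, current)
	}
	return next, nil
}

func main() {
	state := State("Idle")
	state, err := apply(state, 4506)
	fmt.Println(state, err) // ImagePulled <nil>
	_, err = apply(state, 9999)
	fmt.Println(err) // rejected: invalid transition
}
```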
Results logging
When the worker node executes an order transmitted by the Executive, the outcome must be communicated back to the control plane.
To make it 100% leakage free, the Executive uses APIC emulation and the following straightforward syntax to convey the outcome to the control plane: [ STATUSCODE, UID ]
The UID plays a crucial role here: it is what lets the control plane correlate the bare status code with the original [ VERB, NOUN, UID ] order, without any payload crossing the confidential boundary.
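For illustration, here is a minimal sketch of that fixed-shape result message and of the correlation done on the control-plane side (field names are hypothetical):

```go
// Sketch only: the result message carries no payload, so nothing customer-specific
// can leak; the control plane recovers the context purely from its own UID bookkeeping.
package main

import "fmt"

type Result struct {
	StatusCode uint16 // outcome of the order, e.g. 200 = OK
	UID        uint64 // echoes the UID of the originating [VERB, NOUN, UID] order
}

func main() {
	// pending holds, on the control-plane side, the orders awaiting an answer.
	pending := map[uint64]string{0x5f6b39201: `pull "ubuntu:20.04"`}

	r := Result{StatusCode: 200, UID: 0x5f6b39201}
	fmt.Printf("order %q completed with status %d\n", pending[r.UID], r.StatusCode)
}
```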
Events notifications
When an event is fired by the worker node, it must be channeled all the way to the control plane.
How to make it so that zero information is leaked? We don't have a UID here, because the event is not correlated to a control flow request.
The answer is to define another routine in the Executive: the doorbell routine. The doorbell simply converts events received from the worker node into messages of type [ EVENTSTATUS ].
When the control plane receives such a message, it must then poll the worker node depending on the nature of the event. Say the event is a pod crash [EVENTSTATUS=1008]: to determine exactly which pod has crashed, and assuming 45 pods are running on the worker node, the control plane issues 45 requests of type [VERB, NOUN, UID]. 44 of them return an OK status, and one returns a KO. The UID of that request lets the control plane pinpoint the pod which has actually crashed.
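Here is a hypothetical Go sketch of the doorbell message and of the polling it triggers on the control-plane side. Event codes, verbs and pod names are invented for the example; the point is that the doorbell itself carries nothing but an event class.

```go
// Sketch only: doorbell notification followed by control-plane polling.
package main

import "fmt"

// Doorbell is the only thing the Executive sends when the worker node raises an
// event: an event class, no pod name, no UID, no payload.
type Doorbell struct {
	EventStatus uint16 // e.g. 1008 = "a pod has crashed somewhere"
}

// pollPods is the control-plane side: upon EVENTSTATUS=1008 it probes every pod it
// knows about with a [VERB, NOUN, UID] health-check order and finds the one answering KO.
func pollPods(pods []string, crashed string) string {
	for i, pod := range pods {
		uid := uint64(i + 1)    // one UID per probe request
		ok := pod != crashed    // stand-in for the real probe answer
		if !ok {
			return fmt.Sprintf("pod %q (probe UID %d) reported KO", pod, uid)
		}
	}
	return "no crash found"
}

func main() {
	db := Doorbell{EventStatus: 1008}
	fmt.Printf("doorbell received: EVENTSTATUS=%d\n", db.EventStatus)
	pods := []string{"billing-7f", "frontend-2a", "etl-9c"}
	fmt.Println(pollPods(pods, "frontend-2a"))
}
```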
Last point I would like to share: the origins of the Executive and the Servicer. I think the problem at hand is quite similar to the kind of challenges NASA engineers had to meet when they launched the Apollo missions to the Moon. They had to run a low-footprint, highly optimized and resilient piece of software called the AGC (Apollo Guidance Computer).
One of the AGC's main responsibilities was to keep the state vector of the spacecraft constantly up to date, thanks to a routine called the Servicer (similarly, we have to keep the state of the worker node accurate at all times). The AGC's operating system was called... the Executive. It accepted inputs of the form VERB/NOUN. No need to reinvent the wheel!
Conclusion
I hope to have helped the community of potential Confidential PaaS customers get a clearer understanding of what is at stake, and how they could get more involved in the target design (since they will be the ones most affected when it becomes generally available!)
I also hope this whitepaper helps chipmakers and providers alike understand what level of security is expected from a production-grade solution, so they do not waste effort on stillborn solutions.
Some requirements will have deep consequences on the architecture of container orchestrators and schedulers, so the sooner they are addressed, the better. It is also an opportunity to make some of these orchestrators truly multi-tenant, but this is another topic!
If you find any error or if you would like to suggest improvements, feel free to do so in the comments section.
Thank you to the courageous people who had the patience to read until the end. :)
Enjoyed this. Thanks, Christophe Parisel
Thank you for sharing this! Those are very interesting ideas. I see quite some overlap with what we are building with Constellation: https://github.com/edgelesssys/constellation Constellation is not a PaaS, and the big difference is that the control plane itself is running inside CVMs, so only the customer has access to it. Exactly that aspect, the control plane not running in CVMs, is definitely the biggest challenge I see in your approach. How does attestation work from a customer perspective? Do customers verify each worker node of their clusters? How do they securely deploy their workloads to these nodes? The model you described in "Control Flow" essentially goes one way: Preventing information leakage from the worker nodes to the control plane via the "Executive" and "Servicer". How do you protect the other direction: Supplying potentially confidential information to your workloads and isolating that from the control plane? If you're interested in having a short conversation discussing this further, let me know :-)
Thank you for this - detailed *and* thought-provoking.