The nested cloud

Now is the perfect time to approach Cloud security through the interplay between data planes and control planes—a perspective I call "planeswalking."

The reasons are:

  1. PaaS is becoming increasingly complex and integrated: the distance from PaaS to SaaS is shrinking, and so are the distances between the mutually exclusive areas of the Cloud Shared Responsibility Model. This makes cross-tenant boundary violations increasingly difficult to pinpoint and to prevent by design (one cannot harden what one does not see).
  2. For customers, the data plane is the new frontier. But... what are we talking about?
  3. Another, less obvious reason is that Cloud providers are under pressure from Cloud sovereignists to adopt a zeroOps approach, meaning IaaS and PaaS control planes are limited to the bare essentials. How does this simplification impact security?

Planeswalking isn't merely about distinguishing between 'the' data plane and 'the' control plane. The strong driving forces of service integration imply that service planes are often nested or intertwined: if we peel off the layers, their overlaps, ambiguities, and perils are revealed.

Let's start from the obvious planes at the top and move to the nested ones at the bottom.


Peeling off the layers

Service level responsibilities

Starting from any service from an Azure or AWS service catalog, the separation between the control and data planes is straightforward:

  • the service control plane (SCP) is fully managed by your Cloud provider, and it is customized by you, the tenant, through a set of APIs;
  • the service data plane (SDP) is fully managed by you within the limits of what the service is capable of in terms of features (set by the provider) and constraints (set by yourself via said APIs).

The APIs delineate the responsibility domains of the provider and its tenants. This aligns nicely with the Cloud shared responsibility model that is widely advertised by all mainstream Cloud players. Regardless of which end of the APIs you stand on, it is pretty clear that:

For all Cloud services, an isolation breach can happen between a tenant and the provider in the SCP.

But the control/data plane divide also tells another story:

For compute and network services, an isolation breach can happen between a tenant and the provider in the SDP.

So an important takeaway is that tenant violations happen vertically (from one customer to the next) and horizontally (from the provider to its customer).


Backend level responsibilities

If you wear the hat of the Cloud provider feature team in charge of a Cloud service for a minute, you will acknowledge that the SCP is further broken down into two planes:

  • the backend control plane (BCP) operates orchestration and scheduling activities for the service (and ancillary services) as a whole, in a centralized way;
  • the backend data plane (BDP) encompasses a set of local agents and proxies in charge of meeting BCP demands and reporting metrics back to the BCP. This locality may be geographical, customer-centric, data-centric, or a mix of these. In any case, as opposed to the BCP, the BDP is highly decentralized and scalable.

Being aware of this vertical separation between BCP and BDP is actually quite useful for you, the customer: it lets you understand how and to what extent you are actually shielded from arbitrary co-residents, from an angle that is almost never taken.

For compute services, an isolation breach can happen between two tenants in the BDP.

So that's a third tenant violation boundary you should feel concerned about.
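The three boundaries identified so far can be written down as a small lookup table. The Python sketch below is only an illustration of that bookkeeping: the class names, the field names, and the "storage" service category are assumptions made for this example, not terminology from any Cloud provider.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Plane(Enum):
    SCP = "service control plane"
    SDP = "service data plane"
    BCP = "backend control plane"
    BDP = "backend data plane"


@dataclass(frozen=True)
class Boundary:
    """One place where tenant isolation can break (hypothetical model)."""
    plane: Plane
    parties: tuple[str, str]
    service_kinds: frozenset[str]  # an empty set stands for "all services"

    def applies_to(self, service_kind: str) -> bool:
        return not self.service_kinds or service_kind in self.service_kinds


# The three boundaries named in the text so far.
BOUNDARIES = (
    Boundary(Plane.SCP, ("tenant", "provider"), frozenset()),
    Boundary(Plane.SDP, ("tenant", "provider"), frozenset({"compute", "network"})),
    Boundary(Plane.BDP, ("tenant", "tenant"), frozenset({"compute"})),
)


def boundaries_for(service_kind: str) -> list[Boundary]:
    """List the isolation boundaries relevant to one kind of service."""
    return [b for b in BOUNDARIES if b.applies_to(service_kind)]


if __name__ == "__main__":
    # "storage" is an assumed example of a service that is neither compute nor network.
    for kind in ("compute", "network", "storage"):
        planes = [b.plane.name for b in boundaries_for(kind)]
        print(f"{kind}: {planes}")
```

Read this way, a compute service carries all three boundaries, while a service that is neither compute nor network only carries the tenant/provider boundary in the SCP.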


Frontend level responsibilities

Due to automation (everything-as-code), the PaaS platform will be constantly spinning up all types of workloads, so it is likely that a constellation of your feature teams will operate following a loosely coupled DevOps modus operandi.

In this picture, a handful of feature teams will manage your landing zone (a set of transversal, reusable assets, including IT security and architecture pattern enforcements, that will be used to groom all other zones), while the majority of feature teams will spend most of their time adding business value in the form of continuous deployments into your application zones:

  • the customer control plane (CCP) orchestrates and schedules common enforcements in the groomed zones, in a centralized way from the landing zone;
  • the customer data plane (CDP) is the application zone of a given feature team, where the workloads are actually executed according to the grooming rules set by the CCP.

As far as responsibilities are concerned, tenants have to enforce separation of concerns between the central feature teams and the satellite teams:

For all services, an isolation breach can happen between landing and app zones, or between two app zones.
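As a rough illustration of the zone boundaries just described, the Python sketch below labels a flow between two zones. The `Zone` and `Flow` types and the `classify` function are hypothetical, and the rule that landing-to-app traffic is legitimate grooming is an assumption of this example; a real tenant would derive such rules from its own landing-zone policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    kind: str  # "landing" or "app"


@dataclass(frozen=True)
class Flow:
    source: Zone
    target: Zone


def classify(flow: Flow) -> str:
    """Label a flow as allowed or as one of the breaches named in the text."""
    src, dst = flow.source, flow.target
    if src == dst:
        return "allowed: same zone"
    if src.kind == "landing" and dst.kind == "app":
        # Assumption: grooming goes from the landing zone (CCP) to app zones (CDP).
        return "allowed: grooming"
    if src.kind == "app" and dst.kind == "landing":
        return "breach: app zone to landing zone"
    if src.kind == "app" and dst.kind == "app":
        return "breach: app zone to app zone"
    return "unknown"


if __name__ == "__main__":
    landing = Zone("landing", "landing")
    billing = Zone("billing", "app")
    catalog = Zone("catalog", "app")
    for flow in (Flow(landing, billing), Flow(billing, landing), Flow(billing, catalog)):
        print(f"{flow.source.name} -> {flow.target.name}: {classify(flow)}")
```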

Nothing specific to the Cloud here, it's just standard DevOps. But the CDP is where most customer risks hide. Plenty of observables are available from PaaS runtimes (pods, functions, worker nodes), and deciding what to do with the many false positives, how not to miss true positives, and how to prevent runtime harm from happening are some of the biggest challenges in cybersecurity today!

A typical Customer Data Plane situation.


Control flow in quasi-SaaS

Finally, an analysis of the control flow of sophisticated PaaS (a "quasi-SaaS") should typically raise questions about how far off its design is from the Cloud Shared Responsibility Model promise.

A few months ago, I illustrated this concern using a particularly extreme example: confidential computing in PaaS pods (these are now called peer pods).


Control flow in CCC (Cloud Confidential Containers)

The flow must pass through a series of control plane components until it reaches your SDP (almost) without touching your CDP: quite a challenge!

Crossing the tenant boundary (the vertical dashed green line) in a confidential computing scenario must meet very strong guarantees, otherwise it's pretty useless, right?

  • flow direction: a diode pattern must be established so that the direction is reversed at the boundary... We are not quite there.
  • zero-knowledge contents: information passing must not lead to customer data leakage... We are not quite there.
  • provider's takeover resistance: customer pods must withstand an assault from a fully compromised control plane... We are not quite there.


Now that we've explored the various layers, let me provide some ways to deal with the risk.


Planeswalking security

Assessing planeswalking risks: a concrete example

Whenever a new vulnerability is discovered, I perform a planeswalker assessment. The diagram below shows how I reason about plane traversal using a couple of real-life examples: the Azure AutoWarp vulnerability and AWS SuperGlue.


Planeswalking assessments.



Preventing planeswalking risks

At this stage, it is clear that customers cannot prevent all planeswalking risks, far from it. Still, a useful preventive approach for Cloud customers is to follow the PEACH framework designed by Amitai Cohen at Wiz, to which I contributed.


Measuring planeswalking risks

When a potential tenant breach is found in the Cloud, the typical response is to resort to CVSS to measure its severity. However, when it comes to the Cloud, CVSS has several important limitations:

  1. It is "mono-tenant": only you, the target, are part of the impact. A Cloud breach usually involves at least two parties, as we've seen above.
  2. It flattens risk perception: triggering some CVSS metrics will change the severity from low to high, without much nuance.
  3. In CVSS 4, the rating doesn't care much about intermediate components in the kill chain: only the "vulnerable components", i.e. the last mile, are taken into account.

These are the reasons why I created the Piercing Index in 2022: it's a risk-assessment metric to better account for data plane and control plane traversals. While it isn't meant to replace CVSS, it helps to put cloud vulnerabilities into a more useful perspective.

Like the logarithmic Richter scale, the Piercing Index employs a tiered approach to assess the severity of security boundary violations: each tier is an order of magnitude higher than the previous one.
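To make the "order of magnitude per tier" idea concrete, here is a minimal Python sketch. It assumes integer tiers and a base of 10; the function name and the tier numbering are hypothetical and do not reproduce the actual Piercing Index scoring, which is defined in its published methodology.

```python
def relative_magnitude(tier: int, reference_tier: int = 0) -> float:
    """Ratio between two tiers on a base-10 logarithmic scale.

    Assumption: each tier is exactly 10 times the previous one, as on a
    Richter-like scale. Real tier definitions are not modeled here.
    """
    return 10.0 ** (tier - reference_tier)


if __name__ == "__main__":
    # Two tiers apart means two orders of magnitude apart.
    print(relative_magnitude(3, reference_tier=1))
```

The point of such a scale is that moving up a single tier is never a small step: the gap between adjacent tiers is as large as everything below them combined, and more.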

The Piercing Index is implemented by Wiz in their Cloud Vulnerability Database, and the methodology is publicly available. Take a look at the repository if you want to know more!



Note: parts of this newsletter edition draw on "key teleportation in Azure and AWS", an article I wrote in 2020.
