An overview of security postures in Azure and AWS
When we read the many takeaways about compliance and risk management in public Cloud environments, we might be tempted to believe there is only one approach to fending off our risks in the Cloud. But the compliance and security landscape has changed so much that there are now at least three.
Let's explore them today in order of acquaintance.
The classical way
Each time you want to deploy or modify a Cloud service, your orders are issued through your Cloud provider's control plane; they trigger a series of events in the orchestration layers of the provider's backend. The problem is that, because the whole platform is shared and must honor global SLAs (not only yours), you have little control over the many default parameters and integration options the orchestrator assumes and applies to your specific service execution.
To make sure that the resulting deployment complies with your custom policies and does not deviate over time, you have to schedule regular, out-of-band supervision controls (the black stars in the picture below) that probe the backend to detect anomalies and fix them.
Pain points
This is very inefficient for many reasons:
- you have to pay for every single run of a control;
- controls are point-in-time: the state of your deployment might still deviate between two runs;
- many resources are wastefully deployed only to be destroyed after the fact, before they are ever used;
- you have to maintain (or customize, if you rely on an open-source code base like Cloud Custodian) an increasingly large set of controls;
- when the provider changes something, your controls are likely to break. Even if the service API specs do not change, the controls must at least be thoroughly and painstakingly re-tested;
- customers following this path are not developing business value.
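To make the first three pain points concrete, here is a minimal Python sketch of an out-of-band control. Everything in it is hypothetical: the `Resource` type, the `https_only` flag and the in-memory `inventory` merely stand in for a real provider backend, and no actual Cloud API is called.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical stand-in for a deployed Cloud resource."""
    name: str
    https_only: bool


def run_control(inventory):
    """One out-of-band run: probe every resource, then fix what deviates.

    Returns the names of the resources that had to be remediated.
    """
    fixed = []
    for resource in inventory:
        if not resource.https_only:      # detect the anomaly...
            resource.https_only = True   # ...and fix it after the fact
            fixed.append(resource.name)
    return fixed


# Hypothetical timeline: the control runs twice, some time apart.
inventory = [Resource("store-a", https_only=True)]
first_run = run_control(inventory)       # nothing to fix on the first run

# Between the two runs a non-compliant resource is deployed; nothing stops it.
inventory.append(Resource("store-b", https_only=False))
exposed_between_runs = [r.name for r in inventory if not r.https_only]

second_run = run_control(inventory)      # the deviation is caught only now
print(first_run, exposed_between_runs, second_run)
```

Each call to `run_control` is a billable run, `store-b` stays exposed until the next run, and it was fully deployed before anyone could object: issues 1 to 3 in a nutshell.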
The mainstream way
Fully aware of these shortcomings, AWS and Azure have both developed many out-of-band compliance capabilities over the past couple of years. This is what I call the 'mainstream way': here, controls are not fundamentally different from classical ones; the main difference is that the build and run stages are now outsourced by the customers to their Cloud providers.
Outsourced controls are based on industry requirements and benchmarks, so they might not be to everybody's liking, but they make for a large and useful baseline nonetheless.
Each time you manage to uplift a control from the classical way to the mainstream way, you do yourself a favor by halving your operating issues.
Halving? With this approach, only the first three out of six issues remain:
- you still have to pay for every single run;
- controls are still point in time;
- many resources are still wastefully deployed.
The provider's out-of-band controls can not only detect anomalies but also auto-mitigate them on your behalf. Azure Policy was designed with built-in auto-mitigation capabilities, whereas AWS Config implemented them only very recently (5 September for the GA).
Currently, the mainstream approach is the true IT security enabler that lets big corporations speed up their move to the Public Cloud in a highly regulated environment.
The native way
If you are in Azure, there is a third option.
In 2015, Société Générale CIB and Microsoft Azure architects engaged in a series of brainstorming sessions to sketch a model that would fit the bill without breaking Cloud foundations.
The ambition of this model was, and still is, to eliminate as many of issues 1 to 6 as possible.
The recipe is simple:
Tenants should be able to perform their controls in-band by leveraging hooks into the provider's pipeline.
The hooks pepper the pipeline from admission control to actual service launch (customer controls depicted as black stars again):
This native, fully integrated model is already a reality in Azure Policy. Rules are evaluated on the provider side and may affect a customer resource at any stage of its lifecycle: creation, update, or even steady state. Evaluation is context-dependent, scope-dependent, and stage-dependent. Global re-evaluations are still necessary, but the approach has nothing in common with a blind, point-in-time run over all the assets.
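The in-band idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Pipeline` class, the `admission` and `launch` stage names and the `require_https` control are all hypothetical, and they do not describe how Azure actually implements its pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A hook inspects a request and returns a denial reason, or None to let it pass.
Hook = Callable[[dict], Optional[str]]

# Hypothetical stage names, in the order the pipeline walks through them.
STAGES = ("admission", "launch")


@dataclass
class Pipeline:
    """Hypothetical provider pipeline with tenant hooks at each stage."""
    hooks: Dict[str, List[Hook]] = field(default_factory=dict)
    deployed: List[dict] = field(default_factory=list)

    def register(self, stage: str, hook: Hook) -> None:
        self.hooks.setdefault(stage, []).append(hook)

    def deploy(self, request: dict) -> Optional[str]:
        """Run every stage in order and stop at the first denial."""
        for stage in STAGES:
            for hook in self.hooks.get(stage, []):
                reason = hook(request)
                if reason is not None:
                    return f"denied at {stage}: {reason}"
        self.deployed.append(request)    # only compliant requests get here
        return None


def require_https(request: dict) -> Optional[str]:
    """Tenant control: refuse storage that accepts plain HTTP."""
    if not request.get("https_only", False):
        return "https_only must be true"
    return None


pipeline = Pipeline()
pipeline.register("admission", require_https)

bad = pipeline.deploy({"name": "store-b", "https_only": False})
good = pipeline.deploy({"name": "store-c", "https_only": True})
print(bad)
print(good, [r["name"] for r in pipeline.deployed])
```

Because the control runs inside the deployment path, the non-compliant request never becomes a resource: there is nothing to scan for later, nothing to destroy after the fact, and no window during which it is exposed.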
In-band is not perfect
There is a large family of controls that do not lend themselves well to the in-band approach: controls that perform detection. Still, there are solid native ways to handle them: AWS Control Tower, Azure Sentinel. With the advent of Machine Learning, they are going to become even more relevant.
Also, as of today, out-of-band runs cannot be completely excluded. Issue #2 (point-in-time) is probably, and by far, the most complex to handle satisfactorily, yet it stands at the core of the problem. (It may be the subject of another article?)
Get prepared today for tomorrow
We've seen a long-time endeavor start to bear fruit when Azure Policy released in-band hooks, and that's quite thrilling!
Since last year, Azure has considerably enlarged the number of controls customers can perform in-band. For example, there are thousands of resource aliases for the network resource provider alone (note, however, that only a small fraction can be referenced by policies), and counting.
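To show where an alias fits, here is a small sketch of a rule written in the `if`/`then` shape that Azure Policy definitions use, expressed as a Python dictionary. The evaluator below is a toy of my own, not Azure's engine: it supports only `allOf`, `field`, `equals` and `notEquals`, it treats a resource as a flat dictionary, and the `allow` result is a hypothetical placeholder for "no effect applied". The alias and the SKU value are for illustration only and should be checked against the aliases the provider actually exposes.

```python
# Illustrative alias; verify it against the provider's published alias list.
ALIAS = "Microsoft.Network/virtualNetworkGateways/sku.name"

# Rule in the shape of an Azure Policy "policyRule" (if/then with an effect).
policy_rule = {
    "if": {
        "allOf": [
            {"field": "type", "equals": "Microsoft.Network/virtualNetworkGateways"},
            {"field": ALIAS, "notEquals": "VpnGw1"},
        ]
    },
    "then": {"effect": "deny"},
}


def matches(condition: dict, resource: dict) -> bool:
    """Toy evaluator: supports only allOf, field, equals and notEquals."""
    if "allOf" in condition:
        return all(matches(c, resource) for c in condition["allOf"])
    value = resource.get(condition["field"])
    if "equals" in condition:
        return value == condition["equals"]
    if "notEquals" in condition:
        return value != condition["notEquals"]
    raise ValueError(f"unsupported condition: {condition}")


def evaluate(rule: dict, resource: dict) -> str:
    """Return the rule's effect if the resource matches, else 'allow'."""
    return rule["then"]["effect"] if matches(rule["if"], resource) else "allow"


# Hypothetical resources, flattened to "type" plus alias values.
basic_gateway = {"type": "Microsoft.Network/virtualNetworkGateways", ALIAS: "Basic"}
approved_gateway = {"type": "Microsoft.Network/virtualNetworkGateways", ALIAS: "VpnGw1"}

print(evaluate(policy_rule, basic_gateway))
print(evaluate(policy_rule, approved_gateway))
```

The point is that a control can only be expressed in-band if the property it checks is reachable through an alias, which is why the growth in aliases matters so much.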
For now, my recommended stance if you are in Azure would be to progressively weed out out-of-band controls in favor of native controls, and to develop custom, out-of-band controls only as a last resort (i.e. mostly for detection).
I very much hope 'in-band' will become the new standard for preventive controls, but there is still a long way to go. That should not deter anyone from (a) learning that there is a native way and (b) embracing it right now!
In summation
You now have plenty of options to adopt a proportional risk-management posture in the Cloud. Depending on your requirements, budget and regulatory pressure, you may start with the classical way or jump straight to mainstream or native.
If you haven't already, leverage the amazing capabilities that are already available off the shelf, and consider tuning your risk assessment methodology and your posture to the mainstream and native models.