PlatformOps in a Microsoft Enterprise-scale landing zone
The "Contoso" Enterprise-scale reference implementation

The main goals of this blog are to illustrate:

  1. PlatformOps: How you build a datacenter in Azure with the new opinionated, prescriptive and code-based Enterprise-scale Landing Zone option(s) in Microsoft's Cloud Adoption Framework (CAF).
  2. "Compliance-as-Code": How you can use Enterprise-scale to implement a compliant Azure platform with guardrails and policies in "code".

The blog will also illustrate how you can implement an advanced "DevOps" pipeline with GitHub and GitHub Actions to build, test, and deploy your Enterprise-scale platform and application landing zones to Azure.

However, it is NOT a goal to make this look easy!

Building a foundation for a full data center in Azure is a very complex task, but the goal here is to make it "as simple as possible, but not simpler".

"... as simple as possible, but not simpler!"

Expectations of the reader

This blog assumes that you have a broad understanding of generic Cloud and DevOps terms like Infrastructure-as-Code, landing zones, pipelines etc.

The first part will focus on compliance and it is an advantage to have knowledge of and/or interest in Cloud compliance, maybe including knowledge about security and risk assessment standards like FedRAMP, NIST, ISO 27001 etc.

The second part is more technical and you should be familiar with key Azure constructs and services as well as a basic understanding of and/or interest in GitHub.

Finally, it is assumed that you are either already familiar with Microsoft's new Enterprise-scale architecture or are prepared to use the many links in this blog to learn more about the Enterprise-scale platform while you are reading.

Enterprise-scale Landing Zones: The North Star

The June edition of Microsoft's Cloud Adoption Framework (CAF) includes a new recommended implementation option - the Enterprise-scale Landing Zone (ESLZ).

In short, the goal of Enterprise-scale is to provide detailed guidance on how to build a complete datacenter in Azure, not 'just' an individual landing zone.

Note that an Azure 'landing zone' is never 'done', as the underlying Azure platform will keep changing by adding new capabilities to drive the innovation that you and your business want to benefit from. In other words, you should see Enterprise-scale as a direction - a "North Star" - not the end-state.

You will find additional details about Enterprise-scale Architecture as well as reference implementations in GitHub here.

As you can see in the picture, Enterprise-scale is part of the "Ready" phase in CAF, but Enterprise-scale will also have huge impact on how you handle "Adopt", "Govern" and "Manage".

Enterprise-scale is …

  • Prescriptive - with very detailed guidance and recommendations in a "Where do I start on Monday"-style
  • Opinionated - recommendations based on experiences from the numerous customer engagements over the last couple of years.
  • Code-based - with ARM (Azure Resource Manager) as "Management & Control Plane" and with reference implementations, ready to deploy in your environment. Refer to Part 2b below for many more details.

As with all architectural decisions, there are trade-offs.

In order to be Prescriptive, you can't boil the ocean, but have to focus on what you think is most important. You should expect that Enterprise-scale will get smarter over time as we learn from real-world experiences. See how you can contribute here and the roadmap here.

Being Opinionated is simply giving your best recommendations on the most important architectural decisions, and I know that many have been looking for exactly that. It is one of the Enterprise-scale principles to suggest an Azure native approach, where possible. Some will have strong arguments for taking a different architectural decision - and that is absolutely fine - but it comes with a price: The more you deviate, the less value you will get from current and future versions of Enterprise-scale.

Last, it is one of the Enterprise-scale principles to be "Cloud Native" and to use ARM in our Code-based approach. Some may decide to use Terraform or other "Management/Control Planes", typically to support use of several Public Clouds, and again, it is absolutely fine.

Important: I am convinced that most major organizations over time will benefit from using the innovation from several Public Clouds, but I strongly recommend that you start with ONE and "go native" with each Cloud. I will argue that very few organizations will have the capability (or resources) to implement more than one Cloud at the same time. Please see my Cloud Strategy blog for more details about Multi-Cloud considerations.

Start with ONE and Go Native!

Metropolis

Using an analogy, an Enterprise-scale platform is similar to how city utilities such as water, gas, and electricity are accessible before new houses are constructed.

In this context, the network, IAM, policies, management, and monitoring are shared 'utility' services that must be readily available to help streamline the application migration and innovation process.

An Enterprise-scale platform consists of the two areas:

  • The Platform: The general "plumbing"; e.g. the general identity, security, governance, networking, monitoring … services, to be used by all workloads
  • The Landing Zones (*): The application specific "plumbing"; e.g. everything needed by the specific application archetype on top of what is already provided by the Platform

(*) In Enterprise-scale the landing zone is implemented as an "Azure subscription".

Compliance-as-Code

Today, all organizations have very high compliance and security requirements. In regulated industries like Public Sector, Finance, Pharma etc., you often have to document compliance formally. However, this documentation process is often manual, very time-consuming and has a significant "after-the-fact" focus; e.g. describing what you have done to avoid non-compliance and how you will identify potential breaches if/when they happen.

In this blog, I will illustrate how you can use Enterprise-scale to implement most (all?) of these requirements "as code"; i.e. "Compliance-as-Code". See also here.

In other words, instead of sending documents to your auditors, you can now not only document what you wanted to do, but also that it is actually implemented, by sending your "code" - or by giving read-only rights to your GitHub :)

Azure policies

"Compliance-as-Code" is based on the core Azure policy concept.

As Azure is software-defined, you can code your compliance requirements, like in the simple "Allowed locations" example here, that will restrict ("Deny") any use of resources outside a specific location/Azure region, here "westus2". You will see further examples of "Compliance-as-Code" later in this blog.
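
To make the "Allowed locations" idea a bit more concrete, here is a minimal Python sketch. The dictionary mirrors the general JSON shape of an Azure policy rule, simplified compared to the built-in definition, and the parameter name "listOfAllowedLocations" is written from memory, so treat it as an assumption. The evaluate function is purely a toy of my own to show the "Deny" logic; it is not how Azure evaluates policies internally.

```python
import json
from typing import Dict, List

# Simplified "Allowed locations" rule in the JSON shape used by Azure Policy.
# Assumption: parameter name and structure are illustrative, not copied from Azure.
ALLOWED_LOCATIONS_POLICY = {
    "mode": "Indexed",
    "parameters": {"listOfAllowedLocations": {"type": "Array"}},
    "policyRule": {
        "if": {
            "not": {
                "field": "location",
                "in": "[parameters('listOfAllowedLocations')]",
            }
        },
        "then": {"effect": "deny"},
    },
}


def evaluate(resource: Dict, allowed_locations: List[str]) -> str:
    """Toy evaluator: return 'deny' when the resource location is not allowed."""
    if resource.get("location") not in allowed_locations:
        return "deny"
    return "allow"


if __name__ == "__main__":
    print(json.dumps(ALLOWED_LOCATIONS_POLICY["policyRule"], indent=2))
    # Only "westus2" is allowed, as in the example above.
    print(evaluate({"location": "westus2"}, ["westus2"]))
    print(evaluate({"location": "eastus"}, ["westus2"]))
```

The point is that the compliance requirement ("only deploy in approved regions") is now a small, reviewable piece of data rather than a paragraph in a document.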

Subscription democratization & Policy-driven governance

The Enterprise-scale architecture is based on five design principles. The first two (Subscription democratization and Policy-driven governance) represent a fundamental shift in how we (IT/Ops) offer services to the "business" today.

With Subscription (or Cloud) democratization, we want to make it as simple as possible for the "business" to use Cloud; e.g. no back-level "golden images" or ticket/approval systems to get access to "shared services". It should all be "self-service" and app teams can create the necessary services (through portal or Infrastructure-as-Code) on demand, maybe as part of a DevOps process.

However, you can still be in full control and be compliant, for example by using Azure Policy effects to:

  • Deny unapproved services or applications (Public IP, locations ...) and enforce encryption, BYOK etc.
  • Deploy the components we require (patching, backup, monitoring etc.)
  • Create Audit logs if workloads are non-compliant, as especially legacy workloads may not always run in a fully compliant environment
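
The three effects above can be sketched as a tiny rule engine in Python. This is only a mental model under my own assumptions: the policy names, the resource properties ("encrypted", "monitoringAgent") and the Outcome structure are all hypothetical and do not correspond to an Azure API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

PUBLIC_IP = "Microsoft.Network/publicIPAddresses"


@dataclass
class Policy:
    name: str
    effect: str  # "deny", "audit" or "deployIfNotExists"
    violates: Callable[[Dict], bool]  # True when the resource breaks the rule


@dataclass
class Outcome:
    allowed: bool = True
    audit_log: List[str] = field(default_factory=list)
    remediations: List[str] = field(default_factory=list)


def apply_policies(resource: Dict, policies: List[Policy]) -> Outcome:
    """Apply each violated policy according to its effect."""
    outcome = Outcome()
    for policy in policies:
        if not policy.violates(resource):
            continue
        if policy.effect == "deny":
            outcome.allowed = False  # the request is rejected
        elif policy.effect == "audit":
            outcome.audit_log.append(policy.name)  # allowed, but flagged
        elif policy.effect == "deployIfNotExists":
            outcome.remediations.append(policy.name)  # allowed, extra component deployed
    return outcome


# Hypothetical policies, one per effect.
POLICIES = [
    Policy("deny-public-ip", "deny", lambda r: r.get("type") == PUBLIC_IP),
    Policy("audit-unencrypted", "audit", lambda r: not r.get("encrypted", False)),
    Policy("deploy-monitoring", "deployIfNotExists",
           lambda r: not r.get("monitoringAgent", False)),
]

if __name__ == "__main__":
    vm = {"type": "Microsoft.Compute/virtualMachines", "encrypted": False}
    print(apply_policies(vm, POLICIES))
```

Note how only "deny" blocks the app team; the other two effects keep the self-service experience intact while still giving the platform team control and visibility.
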

Azure Blueprints

Azure has already defined a number of "blueprints" for the guardrails/policies needed to comply with international standards like FedRAMP, NIST, ISO 27001 etc.

In Enterprise-scale we will not use Azure Blueprints directly because, as a guiding principle, we want to use Azure Resource Manager (ARM) as our "Management & Control Plane".

However, you can absolutely benefit from using the Azure Blueprints as inspiration for the Azure policies you will need to implement and assign in "code".

Part 1: From Portal to PlatformOps & Compliance-as-Code

In this section, I will walk you through a typical Azure maturity journey and illustrate how you can transform your Azure foundation from using the Azure Portal to building "Compliance-as-Code" in an Open Source community.

Step 1: The Azure Portal

Almost everyone will start their Azure journey by using the Azure Portal, typically by building a specific LZ for a specific workload by setting up AAD, RBAC, network etc. from scratch in your subscription.

This may work for your first workload, but experience shows it does not scale if you later want to add more workloads. Building LZ's for a simple sandbox environment, a "Lift & Shift" migration or an SAP implementation are very different tasks.

Step 2: Infrastructure-as-Code (IaC)

The next natural step is to reuse ARM templates, generated by the Portal.

This obviously speeds up the process of generating a 'copy' of existing Azure resources, but it is still a very manual process with lots of potential human errors.

As an example, you can execute your modified ARM template from a VS Code PowerShell terminal, once you have connected to your Azure tenant using the "Connect-AzAccount" command.
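
As a rough illustration of what such a template contains, the Python sketch below builds a minimal ARM template for a single storage account. The resource type, the apiVersion value and the account name are my own choices for illustration and are not taken from the reference implementation.

```python
import json
from typing import Dict


def minimal_storage_template(account_name: str, location: str) -> Dict:
    """Build a minimal ARM template (as a dict) with one storage account."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2019-06-01",  # assumption: any supported version will do
                "name": account_name,  # hypothetical name, must be globally unique
                "location": location,
                "sku": {"name": "Standard_LRS"},
                "kind": "StorageV2",
            }
        ],
    }


if __name__ == "__main__":
    template = minimal_storage_template("contosodemo001", "westeurope")
    # Save the output as e.g. azuredeploy.json. After Connect-AzAccount, a cmdlet
    # such as New-AzResourceGroupDeployment with -TemplateFile can deploy it.
    print(json.dumps(template, indent=2))
```

Even in this tiny example you can see why copying and hand-editing templates is error-prone: the name, the location and the version are all easy to get wrong when they are changed manually for each new workload.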

Step 3: Introduce Platform development teams - and GitHub

Going from reusing templates to introducing a Platform development team is a major step.

In the Enterprise-scale reference implementation, we will use GitHub to build, test, and deploy the Platform code.

We will use GitHub Actions to deploy to Azure.

As many will still continue to use the Portal for some deployments, we can also use GitHub Actions to synchronize current Azure state to GitHub.

This allows the Platform developers - the PlatformOps team - to be in full control, including being able to manage and code review the platform (including version control), to commit or rollback changes, and with the option to make branches to test new functionality.

Step 4: Compliance-as-Code

With this setup, you are now ready to bring in the compliance experts to define the requirements you will need to meet to be compliant.

You will obviously need people who can turn the requirements into code; e.g. xOps people who know "Infrastructure-as-Code" (IaC), but trust me: It is the easy part!

IaC is "easy", defining compliance is hard

The real challenge is to define what you need to be compliant, including

  • Which people need to be involved (Compliance, Legal, Security, Ops …)
  • Which standards do you want to lean on (ISO, NIST, FedRAMP, Government …)
  • What specific compliance requirements does your organization have?

Step 5: Adding the Application LZ's

When the Platform "plumbing" is ready, it is now time for the application LZ work to start.

The application teams will naturally be able to use the same development environment, including GitHub.

As you see in the ESLZ Roadmap, it is planned to build LZ's for specific application archetypes, initially AKS, Windows Virtual Desktop (WVD), SAP, HPC and Analytics.

Step 6: "Compliance-as-Code" as Open Source

GitHub is today hosting the largest Open Source community in the world and it is a natural step to build on that to extend Enterprise-scale into communities, both the platform itself and over time even application archetype LZ's.

Microsoft already invites individuals and organizations to contribute directly to the development of Enterprise-scale.

However, we also know that many organizations can benefit from collaborating directly. It could be Public Sector government organizations in countries with similar compliance requirements, co-developing a common "Compliance-as-Code" Platform, while still allowing the individual organizations to implement their own functionality where needed.

This is exactly what GitHub is built for and what it does every day!

Step 7: SaaS-as-Code

Many organizations are today using - or expecting to use - the innovation in the growing SaaS market without the overhead of local management and governance.

However, I see two areas where traditional SaaS vendors may be challenged in the near future:

  1. Compliance requirements keep getting more and more advanced and demanding, including who has access to which data in which situations and from where. As a Hyperscale Cloud provider, we have these complex dialogs every day and I expect SaaS vendors to have to meet the same expectations, if not today, then soon.
  2. Exclusive ownership of your own data seems like a natural ask, but it is far from simple in a (traditional) SaaS world. In the new data-driven world, all your data must be in your own data lake, including raw/telemetry data. "Data has gravity" and for latency reasons your new innovative apps and AI models must be very close to the data you want to use for testing, training and production.

I foresee that innovative SaaS vendors will be able to build on the mature enterprise-scale platform and on the advanced community features in GitHub and give their customers a "SaaS-like" experience with both an evergreen platform and application within the customer's own environment.

And next: It is all about the "pipeline"!

Part 2a: The Enterprise-scale pipeline - overview

Here you will get a high-level overview of how you can use Enterprise-scale to develop and maintain your Azure platform in a PlatformOps team, utilizing GitHub and GitHub Actions.

As you can see below, you will be able to ...

  1. Import the current state from your Azure environment, including management groups and assigned policies
  2. Deploy changes from the PlatformOps team to Azure.

Step 1: From Azure - import current Azure state

  1. Initiate the workflow using GitHub CLI
  2. This will spin up a Pull Request, and GitHub Actions will export the current state to a new GitHub branch "System"
  3. Merge "System" into your "main" branch in your "Org" GitHub repo.
  4. Synchronize the changes back to your local "main" branch, using "git pull".
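
The import steps above can be sketched as a list of commands. The small Python helper below only prints them (a dry run) and executes nothing. I assume here that the workflow is triggered through a repository_dispatch event via "gh api"; the owner, repo and event type values are hypothetical placeholders, so check the "Getting Started" guide for the real ones.

```python
import shlex
from typing import List


def import_state_commands(owner: str, repo: str, event_type: str) -> List[List[str]]:
    """Return the commands for importing the current Azure state (dry run only)."""
    return [
        # 1. Trigger the GitHub Actions workflow from the GitHub CLI.
        ["gh", "api", "-X", "POST", f"repos/{owner}/{repo}/dispatches",
         "-f", f"event_type={event_type}"],
        # 2. Happens on GitHub: the workflow exports the state to the
        #    "System" branch and opens a Pull Request.
        # 3. Merge the "System" Pull Request into "main".
        ["gh", "pr", "merge", "System", "--merge"],
        # 4. Synchronize the local "main" branch.
        ["git", "checkout", "main"],
        ["git", "pull"],
    ]


if __name__ == "__main__":
    # All three arguments are hypothetical placeholders.
    for command in import_state_commands("my-org", "my-platform", "sync-azure-state"):
        print(shlex.join(command))
```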

Step 2: To Azure - deploy changes to Azure

  1. Create a new branch in your local environment, make your changes and commit the branch.
  2. Push the changes to GitHub Org level
  3. Create a Pull Request and kick off a GitHub Actions workflow to deploy your changes to Azure
  4. If the deployment is successful, merge the new Azure state into your "main" branch
  5. Synchronize the changes back to your local "main" branch, using "git pull".
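
In the same dry-run spirit, the deployment steps above could look roughly like the Python sketch below. The branch name, file name and commit message are hypothetical, and I assume the Pull Request is created and merged with the GitHub CLI, although the GitHub website works just as well.

```python
import shlex
from typing import List


def deploy_change_commands(branch: str, files: List[str], message: str) -> List[List[str]]:
    """Return the commands for deploying a local change to Azure (dry run only)."""
    return [
        # 1. Create a local branch, stage and commit the change.
        ["git", "checkout", "-b", branch],
        ["git", "add", *files],
        ["git", "commit", "-m", message],
        # 2. Push the branch to the Org repo.
        ["git", "push", "-u", "origin", branch],
        # 3. Create the Pull Request that kicks off the deployment workflow.
        ["gh", "pr", "create", "--base", "main", "--head", branch,
         "--title", message, "--body", f"Deploy {branch} to Azure"],
        # 4. After a successful deployment, merge into "main".
        ["gh", "pr", "merge", branch, "--merge"],
        # 5. Synchronize the local "main" branch.
        ["git", "checkout", "main"],
        ["git", "pull"],
    ]


if __name__ == "__main__":
    # Hypothetical branch, file and message, for illustration only.
    steps = deploy_change_commands(
        "my-change", ["platform.parameters.json"], "Assign new policy")
    for command in steps:
        print(shlex.join(command))
```

Keeping step 4 separate from step 3 matters: the merge should only happen once the workflow has reported a successful deployment, so that "main" always reflects what is actually running in Azure.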

Part 2b: The Enterprise-scale pipeline - under the hood

In this section, you will see how you can use the Enterprise-scale reference implementation as a starting point to get to a production ready Enterprise-scale Azure platform.

Note: Please refer to the "Getting Started" section for more details on how to set this up in your own environment.

I will go through these 4 steps:

  1. Deploy the Enterprise-scale reference implementation to an Azure environment, here based on two MSDN subscriptions
  2. Synchronize the state of this Azure environment to the Org GitHub repo (2a) and then "pull" to the local platform team, with local developer GitHub repos (2b)
  3. Make changes to the Azure state in the Azure portal (3a) and then repeat Step 2; i.e. synchronize Azure state to the Org GitHub repo (3b) and "pull" to the local environment (3c)
  4. Make changes to the Azure state in "code" in the local environment, commit and push these changes to the Org GitHub (4a). Now create a new PR for the change to kick off a workflow to deploy to Azure (4b) and finally merge changes back to Org GitHub (4c) and "pull" to the local environment (4d)

Note: This process will utilize a script provided by Enterprise-scale ("AzOps") to handle the integration between GitHub and Azure.

Step 1: The Enterprise-scale reference implementation

The best way of learning how to benefit from Enterprise-scale, is to see it in action. As mentioned earlier, Enterprise-scale contains a "Contoso" reference implementation that can be deployed directly to your Azure environment in a One-Click experience.

The reference implementation will deploy ...

  • The management group hierarchy - see the picture below.
  • Approx. 100 "custom" policies

In the Azure Portal, you can now see both management groups and the new custom policies

Step 2: Extract current Azure state to GitHub

We will start this process by using GitHub CLI from a command line to kick off a GitHub Actions workflow (2a), utilizing the AzOps script.

You can follow the progress in the "Actions" part of your GitHub website (2b).

The workflow will eventually create a new "System" branch and a Pull Request (PR) that you can merge into your "main" branch (2c).

Last, you can use "git pull" to synchronize the changes to your local GitHub repo and you now have a synchronized representation of the state of your Azure management groups and policies in your local environment (2d).

Step 3: Change Azure State using the Azure Portal

In this step, we will start by using the Azure portal to make a number of changes to the Azure state:

Assign policies to management groups

  • Two policies ("DenyPublic-IP" and "Allowed locations", restricted to "northeurope" and "westeurope") to management group "AB-management"
  • One policy ("Allowed locations", restricted to "northeurope") to management group "AB-sandboxes"

Assign subscriptions (LZs) to management groups

  • The subscription "Platform LZ" to management group "AB-management"
  • The subscription "Sandbox LZ" to management group "AB-sandboxes"

The management group hierarchy now looks like this …

The state should be like this in the Azure portal.

If we now synchronize the Azure state again by repeating Step 2 above, we can now see the changes in our local code …

Compliance-as-Code in action

As expected, you will now be able to create a Public IP in "Sandbox LZ", but not in "Platform LZ".

However, if you try to deploy the public IP in "westeurope" in the "Sandbox LZ", it will be denied, as the only allowed location in management group "AB-sandboxes" is "northeurope".
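
The behaviour described here follows from how policy assignments are inherited down the management group hierarchy. The Python sketch below is a toy model of that inheritance using the assignments from Step 3. The root group name "AB" is my assumption, and the model only covers the two policies used in this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PUBLIC_IP = "Microsoft.Network/publicIPAddresses"


@dataclass
class ManagementGroup:
    name: str
    parent: Optional["ManagementGroup"] = None
    denied_types: List[str] = field(default_factory=list)
    allowed_locations: Optional[List[str]] = None  # None means no restriction


def scope_chain(group: Optional[ManagementGroup]) -> List[ManagementGroup]:
    """Return the group and all its ancestors; policies apply from every level."""
    chain = []
    while group is not None:
        chain.append(group)
        group = group.parent
    return chain


def can_deploy(subscriptions: Dict[str, ManagementGroup], subscription: str,
               resource_type: str, location: str) -> bool:
    """True when no 'Deny' policy in the scope chain blocks the deployment."""
    for group in scope_chain(subscriptions[subscription]):
        if resource_type in group.denied_types:
            return False
        if group.allowed_locations is not None and location not in group.allowed_locations:
            return False
    return True


root = ManagementGroup("AB")  # assumption: hypothetical root group name
management = ManagementGroup("AB-management", root, denied_types=[PUBLIC_IP],
                             allowed_locations=["northeurope", "westeurope"])
sandboxes = ManagementGroup("AB-sandboxes", root, allowed_locations=["northeurope"])
SUBSCRIPTIONS = {"Platform LZ": management, "Sandbox LZ": sandboxes}

if __name__ == "__main__":
    for sub, loc in [("Sandbox LZ", "northeurope"), ("Platform LZ", "northeurope"),
                     ("Sandbox LZ", "westeurope")]:
        print(sub, loc, can_deploy(SUBSCRIPTIONS, sub, PUBLIC_IP, loc))
```

Because a subscription simply inherits whatever is assigned above it, moving a landing zone to another management group is enough to change its guardrails.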

Step 4: Change my Azure State from code

The last step is to demonstrate how you deploy your own code into Azure from your local VSCode with a local GitHub repo. You can read more about how artifacts are deployed in Enterprise-scale here.

In this example, we will assign a new policy "Deploy-Log-Analytics" to the "AB-management" management group.

Note that the "Deploy-Log-Analytics" policy has a "DeployIfNotExists" effect; i.e. it will deploy a Log Analytics workspace to any subscription in the "AB-management" management group if one does not already exist. In order to do this, a "DeployIfNotExists" policy needs a "managed identity" and, as you will see, this will be created automatically through this process. The two sample policies we have used so far, DenyPublic-IP and Allowed locations, both have a "Deny" effect and do not need a managed identity!
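
The difference between the effects can be illustrated with a small Python sketch that builds a policy assignment as a dictionary and only adds an identity when the effect needs one. The overall shape follows the ARM "policyAssignments" resource type as I understand it; the definition ID is a hypothetical placeholder and the exact file layout used by AzOps may differ.

```python
import json
from typing import Dict

# Effects that change or deploy resources need an identity to act with.
EFFECTS_NEEDING_IDENTITY = {"deployifnotexists", "modify"}


def needs_managed_identity(effect: str) -> bool:
    """True for effects that deploy or modify resources on your behalf."""
    return effect.lower() in EFFECTS_NEEDING_IDENTITY


def policy_assignment(name: str, definition_id: str, effect: str, location: str) -> Dict:
    """Build a policy assignment; add identity and location only when required."""
    assignment = {
        "type": "Microsoft.Authorization/policyAssignments",
        "name": name,
        "properties": {"policyDefinitionId": definition_id},
    }
    if needs_managed_identity(effect):
        # Azure fills in the identity details (e.g. principal ID) on deployment.
        assignment["identity"] = {"type": "SystemAssigned"}
        assignment["location"] = location
    return assignment


if __name__ == "__main__":
    # Hypothetical placeholder, not a real definition ID.
    definition_id = "<policy-definition-id-for-Deploy-Log-Analytics>"
    print(json.dumps(policy_assignment(
        "Deploy-Log-Analytics", definition_id, "DeployIfNotExists", "northeurope"), indent=2))
```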

We will start the process by creating a new GitHub branch - "deployLA" (4a), add the sample code to the "policyAssignments" section in the "Management-managementGroups_AB-management.parameters.json" file (4b), commit and push the new branch to the "origin" GitHub (4c).

We can now create a new PR from the new branch on the GitHub website for the "origin" repo (4d). This will kick off a "GitHub Actions" workflow that will deploy our changes to Azure, again using the "AzOps" script (4e). When the AzOps script is finished, we can merge any resulting changes into our "main" branch (4f).

In our local environment (here VSCode), we can now "git pull" the changes to our local GitHub repo (4g). As you can see here (4h), this process has actually changed part of the code we provided, adding information to the "Identity" section; i.e. the managed identity.

Last, but not least, we can also see the new policy assignment (4i) and the managed identity (4j) in the Azure portal.

A little history ...

Three years ago, in May 2017, I published an article with high-level guidance on how to build a production-ready Azure platform - Azure Onboarding AKA 'The House'.

I hope you will agree that we have come far since then :)

Want to hear more - or have feedback/suggestions?

As always, I am very interested in your feedback. Please feel free to add a comment to this blog or reach out to me ([email protected]).

Other Cloud related blogs

Cloud Strategy:

Digital Transformation Delivered

Cloud Economics series
