Policy as Code : Deep Dive
What is Policy?
A policy is typically a document that outlines specific requirements or rules that must be met. In the information/network security realm, policies are usually point-specific, covering a single area.
Policy is a nebulous term and can span many different categories:
- Compliance Policies : These policies ensure compliance with external standards such as PCI-DSS, SOC, or GDPR. External industry working groups or government agencies establish and mandate these standards.
- Security Policies : Security policies adopted internally protect data privacy and infrastructure integrity. For example, ensuring only certain applications run on public networks or expose specific ports to the Internet.
- Operational Excellence : These policies prevent service outages or degradation. For example, a policy may mandate at least two service instances, or validation of new configurations.
What is Policy as Code?
- Policy as code is the idea of writing code in a high-level language to manage and automate policies.
- For example, writing infrastructure policy in a high-level programming language helps automate and enforce best practices. When policies are written as code, you can apply software development practices such as testing, automated deployment, and version control.
- Policy as code uses codified policies and automated enforcement. You can use pre-built policies or write your own custom policies using a simple domain-specific language (DSL), a scripting language, YAML files, and so on.
- To ensure consistency between policies and to increase clarity, a policy template can be created. For example, a policy template lists the data sources from which to retrieve data, the validations that apply to that data, and the actions to be taken when a violation occurs or is resolved.
- Using code to write your policies enables you to create a policy template that can be versioned using standard source code control versioning tools. Writing your own policy template also makes it possible to reuse and share code snippets across policies for the purposes of defining common data sources or common validations.
- By representing policies as code in YAML files, proven software development best practices can be adopted such as version control, automated testing, and automated deployment.
- Using Policy as Code, users can express business or security rules as functions that are executed against resources in their stacks.
- By contrast, cloud providers typically offer a GUI to create policies, and such policies must be tested against a live system, which means using an existing system or configuring and deploying an ephemeral version.
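The template idea above (a data source, a validation, and an action on violation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Policy` class, the `fake_inventory` data source, and the resource shape are hypothetical and do not come from any specific policy tool.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical resource shape used only for this illustration.
Resource = dict


@dataclass
class Policy:
    name: str
    data_source: Callable[[], Iterable[Resource]]  # where the data comes from
    validation: Callable[[Resource], bool]         # True when the resource complies
    on_violation: Callable[[Resource], str]        # action taken for a violation

    def evaluate(self) -> List[str]:
        """Run the validation against every resource and collect the actions."""
        return [
            self.on_violation(resource)
            for resource in self.data_source()
            if not self.validation(resource)
        ]


def fake_inventory() -> List[Resource]:
    # Stand-in for a real data source such as a cloud API or a deployment plan.
    return [
        {"name": "web", "instances": 3},
        {"name": "batch", "instances": 1},
    ]


min_instances = Policy(
    name="at-least-two-instances",
    data_source=fake_inventory,
    validation=lambda r: r["instances"] >= 2,
    on_violation=lambda r: f"{r['name']}: needs at least 2 instances, has {r['instances']}",
)

if __name__ == "__main__":
    for message in min_instances.evaluate():
        print(message)
```

Because the policy is ordinary code, it can live in version control and be exercised by a test without touching a live system.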
Common use cases of policies in infrastructure include:
- Which users can access which resources.
- Which subnets egress traffic is allowed to.
- Which clusters a workload must be deployed to.
- Which registries binaries can be downloaded from.
- Which OS capabilities a container can execute with.
- Which times of day the system can be accessed at.
Policy as Code examples
1. Policy that finds all the resources in a deployment and calculates the total monthly cost. The policy sets the monthly cost permitted and sends a warning if the deployment exceeds it.
2. A watchdog serverless function with a policy that cleans up unused resources.
3. Policy as code to enforce tagging of resources when they are created to enable cost tracking by project, stack, and cost center.
4. Policy as code to track development instances that may remain inadvertently available long after they have been used for development.
5. Controlling ingress and egress to and from resources and not exposing them on the Internet
6. Policies that run at the level of the organization can mitigate the likelihood of unauthorized access to resources or even data breaches.
7. A policy that checks the GKE version and prevents disallowed versions from being deployed.
8. Out-of-compliance resources are blocked from being created or modified by the policy.
9. Policies can also be run against a stack as a whole, i.e., after all resources are registered; such policies do not prevent out-of-compliance resources from being created.
10. Tag your resources for cost tracking, automation, and organization. Automate applying your tags in a consistent way across all of your projects and resources.
11. Restricting Access to a Specific HTTP Referer
12. Limiting Access to Specific IP Addresses
13. Granting Read-Only Permission to an Anonymous User
14. Granting Permissions to Multiple Accounts with Added Conditions
15. Prevent any production resources from being deployed outside EU region
16. Enable logging for all web app services deployed in a given subscription
17. Allow container images from whitelisted registries only
18. Restrict availability zones
19. Disallow 0.0.0.0/0 CIDR blocks
20. Restrict instance types of EC2 instances
21. Require VPCs to be tagged and have DNS hostnames enabled
22. Enforce owner allow list on aws_ami data source
23. Restrict VM images
24. Restrict the size of Azure VMs
25. Enforce limits on AKS clusters
26. Enforce limits on a GKE cluster
27. Restrict machine type of Virtual Machine instances
28. Restrict VM CPU count and memory
29. Enforce NFS 4.1 and Kerberos
30. Require Storage DRS to be enabled
31. Restrict virtual disk size
32. Granting Permission to an Amazon CloudFront OAI
33. Adding a Bucket Policy to Require MFA
34. Granting Cross-Account Permissions to Upload Objects While Ensuring the Bucket Owner Has Full Control
35. Granting Permissions for Amazon S3 Inventory and Amazon S3 Analytics
36. Bucket Policies for VPC Endpoints for Amazon S3
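Several of the examples above (disallowing 0.0.0.0/0 CIDR blocks, restricting instance types, enforcing tags) reduce to small checks over a resource description. The following is a minimal sketch, assuming a hypothetical resource dictionary with `ingress_cidrs`, `instance_type`, and `tags` keys; the allow list and required tag keys are made up for illustration.

```python
import ipaddress
from typing import List

ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small"}  # hypothetical allow list
REQUIRED_TAGS = {"project", "cost-center"}         # hypothetical tag keys


def check_resource(resource: dict) -> List[str]:
    """Return one message per policy violation found on the resource."""
    name = resource["name"]
    violations = []

    # Disallow ingress rules that are open to every address (0.0.0.0/0 or ::/0).
    for cidr in resource.get("ingress_cidrs", []):
        if ipaddress.ip_network(cidr, strict=False).prefixlen == 0:
            violations.append(f"{name}: ingress open to the world ({cidr})")

    # Restrict instance types to an allow list.
    instance_type = resource.get("instance_type")
    if instance_type is not None and instance_type not in ALLOWED_INSTANCE_TYPES:
        violations.append(f"{name}: instance type {instance_type} is not allowed")

    # Enforce tagging so costs can be tracked by project and cost center.
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        violations.append(f"{name}: missing tags {sorted(missing)}")

    return violations


if __name__ == "__main__":
    example = {
        "name": "db",
        "ingress_cidrs": ["0.0.0.0/0", "10.0.0.0/8"],
        "instance_type": "m5.large",
        "tags": {"project": "x"},
    }
    for message in check_resource(example):
        print(message)
```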
Why do we need Policy as Code?
Policy as Code provides a number of benefits.
Organizations can realize cost savings, improved compliance, efficient deployments, fine-grained control over infrastructure, and better use of cloud provider native resources.
Cost Saving : One way to control costs is to set policies based on pricing. With pricing data online, we can calculate the cost of a resource ahead of time and create a policy that limits the amount spent to deploy it.
Sandboxing : Policies provide the guardrails for other automated systems. As the number of automated systems grows, there is also a growing need to protect those automated systems from performing dangerous actions. Manual verification is too slow; policies need to be represented as code to keep up with other automated systems.
Codification : By representing policy logic as code, the information and logic about a policy is directly represented in code and can be augmented with comments rather than relying on oral tradition to learn about the reason for policies.
Version Control : Policies are encouraged to be stored as simple text files managed by a version control system (VCS). This lets you gain all the benefits of a modern VCS such as history, diffs, pull requests, and more.
Testing : Policies are just code. Their syntax and behavior can be easily validated. This also encourages automated testing, such as through a CI system. Paired with a VCS, this allows a pull request workflow to verify that a policy keeps the system behavior as expected before merging.
Automation : With all policies stored as code in simple text files or a DSL, various automation tools can be used. For example, it is trivial to create tools to automatically deploy the policies into a system.
In terms of repeatability, versioning, and testing, Policy as Code benefits developers and operators directly. The benefits extend beyond DevOps and into the success of an organization. Policy as Code provides the following organizational benefits: a means for automated cost control, compliance that avoids downtime by securing resources, validation of infrastructure before resources are created (another cost-saving measure), encoding of best practices for resource stacks, and working with cloud provider native resources to provide best-of-breed security and granular control.
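The cost-saving and testing benefits can be illustrated together. The sketch below estimates the monthly cost of a deployment from a price table and warns when it exceeds a limit, and it ships with a small test that a CI job could run on every pull request. The price table, resource kinds, and function names are hypothetical; real prices would come from the provider's pricing data.

```python
from typing import List, Optional

# Hypothetical monthly prices in USD, for illustration only.
MONTHLY_PRICE_USD = {"small-vm": 15.0, "large-vm": 140.0, "bucket": 2.5}


def monthly_cost(resources: List[dict]) -> float:
    """Sum the estimated monthly price of every resource in the deployment."""
    return sum(MONTHLY_PRICE_USD[r["kind"]] * r.get("count", 1) for r in resources)


def check_budget(resources: List[dict], limit_usd: float) -> Optional[str]:
    """Return a warning when the estimate exceeds the limit, otherwise None."""
    total = monthly_cost(resources)
    if total > limit_usd:
        return f"warning: estimated ${total:.2f}/month exceeds the ${limit_usd:.2f} limit"
    return None


def test_budget_policy() -> None:
    # The policy is plain code, so it can be tested without a live system.
    stack = [{"kind": "small-vm", "count": 2}, {"kind": "large-vm"}]
    assert monthly_cost(stack) == 170.0
    assert check_budget(stack, limit_usd=200.0) is None
    assert check_budget(stack, limit_usd=150.0) is not None


if __name__ == "__main__":
    test_budget_policy()
    print("policy tests passed")
```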
How to do Policy as Code?
Tools for Policy as Code
1. Open Policy Agent (policy-based control for cloud native environments)
- Open Policy Agent (OPA) is an open-source, unified toolset and framework for policy as code across the cloud native stack, capable of policy enforcement across different technologies and systems such as microservices, CI/CD pipelines, gateways, and Kubernetes.
- OPA provides a single authoring language (Rego) and policy runtime, which can be used to enforce policy in a wide variety of environments including Kubernetes, Envoy, Terraform, Kafka, SQL, Linux, and Pulumi.
- The above image shows the architecture of OPA. It exposes APIs that any service needing an authorization or policy decision can call (a policy query). OPA evaluates the query against the Rego code for the policy and returns a decision to the service, which then processes the request accordingly. Enforcement is done by the service itself; OPA is responsible only for making the decision. This is how OPA becomes a general-purpose policy engine and supports a large number of services.
- Whether for one service or for all your services, use OPA to decouple policy from the service's code so you can release, analyze, and review policies (security and compliance) without sacrificing availability or performance.
- You can use OPA to control access to its internal API resources and can enforce access control and authorization.
- Chef uses OPA to provide IAM capabilities in their end-user products, to enforce policies on platforms (like Kubernetes clusters)
- OPA provides a number of APIs that make it easy to inject new policies, check the version and status of existing ones, or collect audit and log data.
- OPA rules are declarative: policy is expressed in a high-level, declarative language (Rego) that promotes safe, performant, fine-grained controls.
- The Rego language is easy to use and expressive, and it integrates well with tools like Visual Studio Code.
- OPA can integrate with many modern-day systems and platforms like Kubernetes, Kafka, SQL, and Terraform.
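The split between decision and enforcement can be sketched from the calling service's side. The example assumes an OPA server reachable at a hypothetical `http://localhost:8181` and a hypothetical policy path `httpapi/authz/allow`; it uses OPA's Data API convention of posting a JSON body with an `input` key and reading the decision from `result`. Treating a missing `result` as a denial is a design choice of this sketch, not something OPA mandates.

```python
import json
import urllib.request


def build_request(base_url: str, policy_path: str, input_doc: dict) -> urllib.request.Request:
    """Build the policy query: POST /v1/data/<policy path> with {"input": ...}."""
    url = f"{base_url.rstrip('/')}/v1/data/{policy_path.strip('/')}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(response_body: bytes) -> bool:
    """Allow only when the policy explicitly returned true."""
    document = json.loads(response_body)
    return document.get("result") is True


def is_allowed(base_url: str, policy_path: str, input_doc: dict, timeout: float = 2.0) -> bool:
    """Ask OPA for a decision; the calling service enforces the answer itself."""
    request = build_request(base_url, policy_path, input_doc)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_decision(response.read())


if __name__ == "__main__":
    # Requires a running OPA server with a matching policy loaded.
    query = {"user": "alice", "method": "GET", "path": ["finance", "salary", "alice"]}
    print(is_allowed("http://localhost:8181", "httpapi/authz/allow", query))
```

The service only builds the input document and acts on the boolean it gets back, so the policy itself can change without redeploying the service.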
2. Azure Policy
- Azure Policy is a governance tool that gives users the ability to audit and manage their Azure environment at scale. Azure Policy provides the ability to place guardrails on Azure resources to ensure they are compliant with assigned policy rules. It allows users to perform audit, real-time enforcement, and remediation of their Azure environment. The results of audits performed by policy will be available to users in a compliance dashboard where they will be able to see a drilldown of which resources and components are compliant and which are not.
- Azure Policy is a service that offers both built-in and user-defined policies across categories mapping the various Azure services such as Compute, Storage or even AKS.
- These policies can be defined on the Azure Portal and assigned to one or more subscriptions/resource groups (referred to as the scope).
Azure Policy assignment structure
- Policy assignments are used by Azure Policy to define which resources are assigned which policies or initiatives.
- You can use JSON to create a policy assignment. The policy assignment contains elements for display name, description, metadata, enforcement mode, excluded scopes, policy definition, and parameters.
- Example : https://docs.microsoft.com/en-us/azure/governance/policy/concepts/assignment-structure
Azure Policy definition structure
- Policy definitions describe resource compliance conditions and the effect to take if a condition is met.
- By defining conventions, you can control costs and more easily manage your resources. For example, you can specify that only certain types of virtual machines are allowed. Or, you can require that resources have a particular tag. Policy assignments are inherited by child resources. If a policy assignment is applied to a resource group, it's applicable to all the resources in that resource group.
- The policy definition policyRule schema is found here: https://schema.management.azure.com/schemas/2019-09-01/policyDefinition.json
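To make the "condition and effect" structure concrete, the sketch below builds a policy definition as a Python dictionary and prints it as JSON. It follows the general if/then shape of an Azure Policy rule with an `allowedLocations` parameter and a `deny` effect. The display name and the helper function are hypothetical; check the schema linked above before relying on any field.

```python
import json


def allowed_locations_definition() -> dict:
    """Return a policy definition that denies resources outside allowed locations."""
    return {
        "properties": {
            "displayName": "Allowed locations (example)",  # hypothetical name
            "policyType": "Custom",
            "mode": "Indexed",
            "parameters": {
                "allowedLocations": {
                    "type": "Array",
                    "metadata": {"displayName": "Allowed locations"},
                }
            },
            "policyRule": {
                # Condition: the resource location is not in the allowed list.
                "if": {
                    "not": {
                        "field": "location",
                        "in": "[parameters('allowedLocations')]",
                    }
                },
                # Effect taken when the condition is met.
                "then": {"effect": "deny"},
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(allowed_locations_definition(), indent=2))
```

Generating the JSON from code keeps the definition reviewable and lets the same rule be reused with different parameter values per environment.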
Programmatically create Azure policies
- Azure Policy definitions enforce different rules and effects over your resources. Enforcement makes sure that resources stay compliant with your corporate standards and service level agreements.
- https://docs.microsoft.com/en-us/azure/governance/policy/how-to/programmatically-create
Below are some of the use cases for creating Azure policies programmatically
Get compliance data of Azure resources
- https://docs.microsoft.com/en-us/azure/governance/policy/how-to/get-compliance-data
Determine causes of non-compliance
- https://docs.microsoft.com/en-us/azure/governance/policy/how-to/determine-non-compliance
Remediate non-compliant resources with Azure Policy
- https://docs.microsoft.com/en-us/azure/governance/policy/how-to/remediate-resources
Integrate Azure Key Vault with Azure Policy
- https://docs.microsoft.com/en-us/azure/key-vault/general/azure-policy
3. AWS Policy Generator
- The AWS Policy Generator is a tool that enables you to create policies that control access to Amazon Web Services (AWS) products and resources.
- https://awspolicygen.s3.amazonaws.com/policygen.html
4. Sentinel
- Sentinel is HashiCorp’s framework for the implementation of Policy as Code (PaC). It integrates with Infrastructure as Code (IaC), and allows teams/organizations to be proactive from a compliance/risk standpoint.
- Sentinel is a language and framework for policy built to be embedded in existing software. It allows granular, logic-based policy decisions and can read information from external sources to derive a decision. A policy describes under what circumstances certain behaviors are allowed.
- It can be used with the following HashiCorp products: Terraform, Vault, Consul, and Nomad.
- Sentinel provides a workflow for building policy across any system that embeds Sentinel.
- Sentinel also provides a local CLI for developing and testing Sentinel policies.
- Sentinel provides a simple policy-oriented language to write policies, and integrates with tools like Terraform Enterprise and Nomad Enterprise to enforce them.
- Rules : Rules form the basis of a policy by representing behavior that is either passing or failing (true or false). Rules are a first class language construct in Sentinel. A policy can and should be broken down into rules to aid with readability, testability, and performance.
- Assume we have a Sentinel policy that prevents users from deploying infrastructure resources to a public cloud environment without at least one tag. The terraform plan output is evaluated against that Sentinel policy. If each resource contains at least one tag, as defined in the policy, the user is allowed to execute terraform apply. Otherwise the plan is rejected and the user must make the specified changes so that the plan passes the policy check. In simple terms, Sentinel prevents users from performing actions deemed “unapproved” by the policy authors. Traditionally, policy authors are platform administrators and/or information security staff.
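The tag check described above can be sketched outside Sentinel to show the logic. The Python below is not Sentinel code; it mimics the idea of a passing-or-failing main rule evaluated against a plan. It assumes the plan has been exported as JSON with a `resource_changes` list whose entries carry `address`, `change.actions`, and `change.after`, and the function names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def planned_resources(plan: dict) -> Iterator[Tuple[str, Dict]]:
    """Yield (address, planned attributes) for resources being created or updated."""
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if "create" in actions or "update" in actions:
            yield change["address"], change["change"].get("after") or {}


def untagged(plan: dict) -> List[str]:
    """List the addresses of planned resources that carry no tags."""
    return [address for address, attrs in planned_resources(plan) if not attrs.get("tags")]


def main_rule(plan: dict) -> bool:
    """The policy passes only when every planned resource has at least one tag."""
    return not untagged(plan)


if __name__ == "__main__":
    example_plan = {
        "resource_changes": [
            {"address": "aws_instance.web",
             "change": {"actions": ["create"], "after": {"tags": {"owner": "team-a"}}}},
            {"address": "aws_s3_bucket.logs",
             "change": {"actions": ["create"], "after": {"tags": None}}},
        ]
    }
    print("pass" if main_rule(example_plan) else f"fail: {untagged(example_plan)}")
```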
5. PacBot
- Policy as Code Bot (PacBot) is a platform for continuous compliance monitoring, compliance reporting and security automation for the cloud.
- https://github.com/tmobile/pacbot
6. MagTape
- MagTape is a Policy-as-Code tool for Kubernetes that allows for evaluating Kubernetes resources against a set of defined policies to inform and enforce best practice configurations. MagTape includes variable policy enforcement, notifications, and targeted metrics.
- MagTape builds on the Kubernetes Admission Webhook concept and uses Open Policy Agent (OPA) for its generic policy language and engine.
- https://github.com/tmobile/magtape
7. Intercept
- Intercept is a command-line scanner that leverages the power of the fastest multi-line search tool to scan your codebase. It can be used as a linter, guard rail control or simple data collector and inspector.
- Intercept merges environment flags, policies YAML and optional exceptions YAML to generate a global config. It recursively scans a target path for policy breaches against your code and generates a human-readable detailed output of the findings.
- https://github.com/xfhg/intercept
8. Cloudpatrol
- Cloudpatrol is Policy as Code for the Cloud Development Kit.
- Cloudpatrol lets you define common policies with remediation strategies for your AWS CDK stacks and enforce them across your CDK stacks / applications.
- https://github.com/skorfmann/cloudpatrol
9. pulumi-policy
- Pulumi Policy SDK : Define and manage policy for cloud resources deployed through Pulumi.
- Policy rules run during pulumi preview and pulumi up, asserting that cloud resource definitions comply with the policy immediately before they are created or updated.
- During preview, every rule is run on every resource, and policy violations are batched up into a final report. During the update, the first policy violation will halt the deployment.
- Policy violations can have enforcement levels that are advisory, which results in a printed warning, or mandatory, which results in an error after pulumi preview or pulumi up completes.
- Policies can be written in TypeScript/JavaScript (Node.js) or Python and can be applied to Pulumi stacks written in any language.
- https://github.com/pulumi/pulumi-policy
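The preview and update behaviour described above can be modelled in plain Python. This sketch deliberately does not use the Pulumi Policy SDK; the `Level` enum, the rule tuples, and the `preview` and `update` functions are hypothetical stand-ins. It reads "the first policy violation will halt the deployment" as applying to mandatory violations, with advisory ones only printing a warning, which is an interpretation rather than a statement of the SDK's exact behaviour.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Level(Enum):
    ADVISORY = "advisory"
    MANDATORY = "mandatory"


# A rule is (level, check); check returns a message for a violation, else None.
Rule = Tuple[Level, Callable[[dict], Optional[str]]]


class PolicyViolation(Exception):
    """Raised when a mandatory rule fails during an update."""


def preview(resources: List[dict], rules: List[Rule]) -> List[Tuple[Level, str]]:
    """Run every rule on every resource and batch all violations into one report."""
    report = []
    for resource in resources:
        for level, check in rules:
            message = check(resource)
            if message:
                report.append((level, message))
    return report


def update(resources: List[dict], rules: List[Rule]) -> List[str]:
    """Deploy resources in order, halting at the first mandatory violation."""
    deployed = []
    for resource in resources:
        for level, check in rules:
            message = check(resource)
            if message and level is Level.MANDATORY:
                raise PolicyViolation(message)
            if message:
                print(f"warning: {message}")
        deployed.append(resource["name"])
    return deployed


def no_public(resource: dict) -> Optional[str]:
    return f"{resource['name']} is public" if resource.get("public") else None


def has_tags(resource: dict) -> Optional[str]:
    return None if resource.get("tags") else f"{resource['name']} has no tags"


RULES: List[Rule] = [(Level.ADVISORY, has_tags), (Level.MANDATORY, no_public)]

if __name__ == "__main__":
    stack = [{"name": "a", "tags": {}}, {"name": "b", "public": True, "tags": {"x": "y"}}]
    for level, message in preview(stack, RULES):
        print(level.value, message)
```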
10. CrossGuard
- CrossGuard is Pulumi’s new Policy as Code offering.
- Pulumi CrossGuard is a product that provides gated deployments via Policy as Code.
- CrossGuard empowers you to set guardrails to enforce compliance for resources so developers within an organization can provision their own infrastructure while sticking to best practices and security compliance. Using Policy as Code, you can write flexible business or security policies.
- Using CrossGuard, organization administrators can apply these rules to particular stacks within their organization. When policies are executed as part of your Pulumi deployments, any violation will gate or block that update from proceeding.
- Policies can be written in TypeScript/JavaScript (Node.js) or Python and can be applied to Pulumi stacks written in any language.
- When you run pulumi up, CrossGuard evaluates every resource in the stack against the Policy Pack. CrossGuard works in AWS, Azure, Google Cloud Platform, and Kubernetes.
The CrossGuard preview provides the following key features
- Policy SDK for coding custom policies using TypeScript or JavaScript
- Running a Policy Pack locally to speed up development and testing of policies. Validate infrastructure before deployment.
- AWSGuard : Pulumi CrossGuard policies for AWS
- AWSGuard is a ready-to-apply playbook for enforcing AWS best practices for security, reliability, and cost
- For example, apply a Policy Pack across an organization to validate all the infrastructure deployed.
- Another example is a set of policies that codifies best practices for AWS, which you can adopt and use in a Policy Pack.
- https://www.pulumi.com/docs/guides/crossguard/
Best Practices as Policies
- Policy as Code ensures that you can enforce best practices for cost, compliance, security, and team practices for a single project or across your organization.
- It’s best practice to operate your infrastructure under the principle of least privilege, i.e., only allowing access to resources needed to perform a job.
- A best security practice is not to let unknown devices attach to your network.
- It’s a best practice to control ingress and egress of resources and not expose them on the Internet unless needed.
- A deployment should pin the containers to a specific version to keep the infrastructure consistent.
- Policy Packs : Policy Packs provide a way to group similar policies based on how you manage your infrastructure.
- For example, you can have Kubernetes policies bundled with container registry policies.
- You may have several policies for storage based on how they are tagged.
- You can apply a Policy Pack to a single stack of resources or across multiple stacks.
- Policy Groups : A group of stacks that use the same Policy Pack is a Policy Group.
- A stack can belong to multiple Policy Groups.
- A typical application of Policy Groups is to set policies for environments; for example, you might have a more permissive Policy Group for your development and staging environments and more restrictive one for production.
- You can apply Policy Packs on individual resource stacks or across multiple stacks as Policy Groups, giving you granular control over how and which resources are deployed.
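The relationship between stacks, Policy Groups, and Policy Packs can be pictured as a simple mapping. The sketch below is a plain data model, not a Pulumi API; every group, pack, and stack name in it is hypothetical.

```python
from typing import List

# Each Policy Group names one Policy Pack and the stacks it applies to.
POLICY_GROUPS = {
    "dev-and-staging": {"pack": "permissive", "stacks": {"app-dev", "app-staging"}},
    "production": {"pack": "strict", "stacks": {"app-prod"}},
    "kubernetes": {"pack": "k8s-baseline", "stacks": {"app-staging", "app-prod"}},
}


def packs_for_stack(stack: str) -> List[str]:
    """Return every Policy Pack that applies to a stack through its groups."""
    return sorted(
        group["pack"] for group in POLICY_GROUPS.values() if stack in group["stacks"]
    )


if __name__ == "__main__":
    for stack in ("app-dev", "app-staging", "app-prod"):
        print(stack, packs_for_stack(stack))
```

A stack that belongs to several groups is checked against the packs of all of them, which is how a production stack can receive both a strict baseline and a platform-specific pack.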
Best Practices for Policy as Code with Cloud Native Resources
- Use custom policies to automate governance of costs, operations, security, and compliance so that you can control your cloud.
- Automating your governance controls with a broad set of out-of-the-box policies or by creating custom policies tailored to your needs.
- Eliminating waste with cost policies that alert on cost anomalies and take automated actions on idle and underused resources.
- Avoiding security holes with policies that identify and alert on misconfigured networking options, unsecured data storage, and non-compliant resources.
- Enforcing compliance with policies that uncover missing tags and ensure that appropriate regulations are being met.
- Ensuring operational resilience with policies that check for required configurations, such as backups or failover systems.
The following 6 tips can help you formulate your security policy to support a DevSecOps pipeline.
1. Plan and implement governance
- You need to put in place a clear set of policies and procedures to manage the DevSecOps process. You also need to enable the creation of audit trails, which are necessary for compliance reporting. To ensure the transparency and traceability of the DevSecOps pipeline, you should put in place easy, one-click compliance reporting throughout the software development lifecycle (SDLC).
- Another crucial step for DevSecOps planning is to clearly define the roles and responsibilities of your staff across teams. It is important to consider your security policy as a living organism that can grow and change over time, and respond to the insights gained through continuous monitoring of security events. When in doubt, you can refer to a DevSecOps security checklist.
2. Maintain a single source of truth
- To enforce your policies and evaluate compliance, you need to keep your policy definition in a single repository. You can use a version control system like GitHub, and the policy definitions arranged in a uniform system will become the single source of truth for all teams involved in security. Metadata is useful for making your files intelligible and will make it easier to implement your policies.
3. Encourage collaboration on security
- DevSecOps is dependent on development, operations and security teams having shared objectives. Activities are aligned to business priorities and are measured using uniform metrics. Make sure that your teams are all familiar with their responsibilities and provide a standardized production environment and common language for addressing security issues. This integrated framework will help secure both the application and the pipeline in a comprehensive and thorough manner.
4. Secure your code
- Security as code is a central aspect of the DevSecOps approach. To improve the security of your applications and reduce security debt, you should use secure coding practices, incorporating them into your policy. These include the use of automated testing and security tools while building code, restricting access to the development environment, and threat modeling to identify vulnerable points in your code. The use of techniques such as containerization and cloud infrastructure automation can also facilitate security and compliance auditing.
5. Create a continuous feedback loop
- Feedback allows developers and the machines they use to gain comprehensive insight into system vulnerabilities. It is also essential for informing policies and rule sets that keep security testing tools updated. For example, the threat intelligence collected can shape prioritization and process flow decisions.
- Proactive monitoring provides actionable information, conveyed to security teams via dashboards and automated alerts. Continuous monitoring will help security analysts identify security issues before damage is done. Organizations should arm themselves with real-time, continuous feedback that will allow them to stay on top of the evolving security landscape.
6. Automate recurring tasks
- Automation can help reinforce and elevate your security processes and is a core element of DevSecOps. Recurring tasks can be easily automated to save time, reduce human error and support an integrated workflow. You can automate tests, scans and operational controls to embed security into the development pipeline.
- Operations engineering tasks can be performed automatically in secure containerized or infrastructure-as-code environments. This will be much quicker than human-driven processes and ensure that responses to detected intrusions are instant. You can engineer these response capabilities to automatically freeze nodes, redirect traffic and notify operators or relevant third parties.
- Building security at the code level is a fundamental aspect of the DevSecOps approach, and there are numerous techniques and tools to help you achieve this. However, the security of your application and production environment is only as strong as your governance. For this reason, it is essential to design and implement clear security policies and provide the means to track compliance.
- For a true DevSecOps pipeline, you also need to ensure that your organization as a whole adopts a DevSecOps culture and that your policies are flexible and responsive to evolving security threats. Creating and implementing your DevSecOps policy with an emphasis on secure code will allow your organization to achieve security throughout the development pipeline and after release, both for the application and for the production environment, and you will be able to oversee the process with good governance transparency measures.
- Cost : It’s critical to identify all waste and take action to optimize it.
- Identifying idle and underutilized instances as well as complex scenarios like Reserved Instance planning to reduce wasted spend.
- Unattached volumes, old snapshots, reserved instances, underutilized VMs, scheduled instances
- Use tagging as a foundation for ongoing cost management
- Security : Securing public storage buckets
- Taking control of your security groups
- Monitoring and securing IAM access
- Unsecured Storage, Open security Groups, Disallowed ports, Open IAM Policies
- Compliance : Ensuring a comprehensive tagging strategy
- Enabling you to write custom policies for HIPAA, GDPR, PCI, and more
- Untagged resources, Invalid Tags, Disallowed configurations
- Operational : Reduce waste by scheduling instances to run only when needed
- Implement automatic key rotation to avoid downtime
- Automating everyday IT operations. Running an automated and efficient cloud infrastructure frees up expensive IT resources to deliver on growth initiatives for your enterprise.
- AWS Key Rotation, No Recent Snapshots
- Cloud providers have methods for managing access to resources, such as IAM (Identity and Access Management) policies and ACLs (Access Control Lists). This is one type of policy, but it doesn’t cover the full range of actions available for managing your infrastructure.
- IAM Access Analyzer in conjunction with Policy as Code.
- IAM Access Analyzer validates all resources per-region with a single analyzer. It creates a record for each problem identified and shows which policy is responsible for granting wider access to resources than would be best practice. Policy as Code applies policy packs to stacks of resources anytime infrastructure is deployed.
- Because IAM Access Analyzer works on deployed resources, we have to deploy the infrastructure first before running the analyzer.
- When the scan of a resource is complete, it returns the detailed results that we can pass to the policy for validation.
- In this way, we can use cloud provider native tools in conjunction with Policy as Code to manage and protect our infrastructure.
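Passing analyzer results to a policy can be sketched as a function over a list of findings. The finding shape used here (`resource`, `resourceType`, `isPublic`, `status`) is modelled on IAM Access Analyzer findings but should be treated as an assumption, and the sample data is invented. Fetching the findings from AWS is left out on purpose.

```python
from typing import List


def public_access_violations(findings: List[dict]) -> List[str]:
    """Report active findings that grant public access to a resource."""
    return [
        f"{finding['resourceType']} {finding['resource']} allows public access"
        for finding in findings
        if finding.get("status") == "ACTIVE" and finding.get("isPublic")
    ]


if __name__ == "__main__":
    # Invented sample findings, standing in for the analyzer's scan results.
    sample = [
        {"resource": "arn:aws:s3:::public-assets", "resourceType": "AWS::S3::Bucket",
         "isPublic": True, "status": "ACTIVE"},
        {"resource": "arn:aws:s3:::internal-logs", "resourceType": "AWS::S3::Bucket",
         "isPublic": False, "status": "ACTIVE"},
    ]
    for message in public_access_violations(sample):
        print(message)
```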
Policy as Code with DevOps Pipelines
Policy-as-code with Azure Pipelines
- Azure Policy is a service that offers both built-in and user-defined policies across categories mapping the various Azure services such as Compute, Storage or even AKS
- Azure Pipelines can help treat Azure Policies as code and weave compliance checks into application CI/CD.
- The version-controlled policies are now the single source of truth for teams and individuals accessing and modifying policies.
- Policies could be organized in a way that offers clarity and consistency. One such way is to place each policy in its own folder with the parameter values for dev/qa/prod stages in their own files.
- For instance, the networks allowed for a production resource would mostly differ from those allowed for a dev resource, and resources on a production system may be restricted to certain sizes and regions compared to dev/qa.
- Alternatively, one can use metadata files placed alongside the policy definition files to have finer grained control on the policy creation and assignment process.
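The folder-per-policy layout with per-stage parameter files can be sketched as follows. The file names `policy.json` and `params.<stage>.json`, the folder name, and the simplified file contents are assumptions made for the example, not a layout prescribed by Azure Pipelines.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def load_policy(policy_dir: Path, stage: str) -> dict:
    """Read one policy folder: its definition plus the parameters for a stage."""
    definition = json.loads((policy_dir / "policy.json").read_text())
    parameters = json.loads((policy_dir / f"params.{stage}.json").read_text())
    return {"name": policy_dir.name, "definition": definition, "parameters": parameters}


def load_all(root: Path, stage: str) -> List[dict]:
    """Load every policy folder under root for the given stage (dev, qa, prod)."""
    return [load_policy(d, stage) for d in sorted(Path(root).iterdir()) if d.is_dir()]


def write_example_layout(root: Path) -> None:
    """Create a tiny example repository layout with one policy folder."""
    policy_dir = Path(root) / "allowed-locations"
    policy_dir.mkdir(parents=True)
    (policy_dir / "policy.json").write_text(json.dumps({"effect": "deny"}))
    (policy_dir / "params.dev.json").write_text(
        json.dumps({"allowedLocations": ["westeurope", "eastus"]})
    )
    (policy_dir / "params.prod.json").write_text(
        json.dumps({"allowedLocations": ["westeurope"]})
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_example_layout(Path(tmp))
        for policy in load_all(Path(tmp), "prod"):
            print(policy["name"], policy["parameters"])
```

Keeping the definition separate from the stage parameters means one reviewed rule can be assigned with stricter values in production than in dev.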
Integrating policy compliance with pipeline
- Azure Pipelines can be used to create and assign Azure Policies alongside application code. You can pick an existing template (Azure Policy Deployment) that gives an example of how you can use simple PowerShell commands to create and assign policies quickly.
- The template has two tasks, one to create policies and one to assign them, which create or update compliance for a given scope (management group/resource group). You can modify the PowerShell scripts in these tasks to handle multiple policies at once.
- If policies change and are validated only in production, it’s already too late for the application team to do anything but try to roll back when there is a compliance violation. ‘Shift-left’, that is, making policy evaluation part of the DevOps process right from the start rather than a retrofit, is an important tenet of DevSecOps.
- Consider the release process overview below:
- We have the policies and application code in a version control system (typically in different repositories). At the start of the release process that spans the Dev/QA/Prod environments, we pick the latest policies and apply them to the various environments. Further, between the deployment environments, we add explicit controls that check whether the latest policies are being adhered to, using the Check Azure Policy compliance gate.
- This is what the pipeline would look like:
- Policy violations can happen in two scenarios:
- New policy applied on existing resources — in this case, the resources are deployed but not adhering to the new policies.
- New deployment request that violates the existing policy.
- In both scenarios, we’d like to catch the violation as early in the release cycle as possible. To achieve this, we assign the fetched policies in a pipeline stage before any deployments (to prevent policy violations at deployment time) and enable gates between pipeline stages to evaluate policy compliance before moving ahead (to handle policy violations on existing environments and violations that weren’t captured during deployment time, such as in-guest configuration changes).
- We saw a way to realize policy-as-code using Azure Pipelines and how deploying the latest policies to pre-production environments can help in discovering policy violations much earlier (shift-left).
- Scripts used for batch policy creation and assignment: https://github.com/pingvishal-msft/batch-policy-scripts.
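The release flow described in this section (assign policies first, then deploy stage by stage with a compliance gate in between) can be sketched as a small driver function. This is a model of the ordering only: `assign_policies`, `deploy`, and `is_compliant` are hypothetical callbacks standing in for the pipeline tasks and the compliance gate, not Azure Pipelines APIs.

```python
from typing import Callable, List


def run_release(
    environments: List[str],
    assign_policies: Callable[[str], None],
    deploy: Callable[[str], None],
    is_compliant: Callable[[str], bool],
) -> List[str]:
    """Assign policies everywhere first, then deploy with a gate after each stage."""
    log = []

    # Stage 1: assign the latest policies before any deployment happens.
    for env in environments:
        assign_policies(env)
        log.append(f"assign:{env}")

    # Stage 2: deploy environment by environment, gating on compliance.
    for env in environments:
        deploy(env)
        log.append(f"deploy:{env}")
        if not is_compliant(env):
            log.append(f"gate-failed:{env}")
            break  # do not promote to the next environment
        log.append(f"gate-passed:{env}")

    return log


if __name__ == "__main__":
    steps = run_release(
        ["dev", "qa", "prod"],
        assign_policies=lambda env: None,
        deploy=lambda env: None,
        is_compliant=lambda env: env != "qa",  # pretend QA violates a policy
    )
    print("\n".join(steps))
```

In this run the violation surfaces at the QA gate, so production is never deployed, which is the shift-left effect the section describes.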
References :
- https://www.openpolicyagent.org/docs/latest/
- https://docs.microsoft.com/en-us/azure/governance/policy/
- https://docs.hashicorp.com/sentinel/
- https://github.com/tmobile/pacbot
- https://github.com/tmobile/magtape
- https://github.com/xfhg/intercept
- https://github.com/skorfmann/cloudpatrol
- https://github.com/pulumi/pulumi-policy
- https://www.pulumi.com/docs/guides/crossguard/
- https://docs.microsoft.com/en-us/azure/devops/pipelines/policies/azure-policy?view=azure-devops