Embracing Application-Centric Infrastructure in the Cloud 2

AWS CDK for EKS: Falling Short in Real-World, Multi-Account Kubernetes Deployments

AWS Cloud Development Kit (CDK) aims to simplify cloud infrastructure provisioning using familiar programming languages. While its EKS module promises to streamline Kubernetes cluster creation and management, a closer look reveals significant shortcomings, especially when considering practical, multi-account EKS deployments. This article will delve into these limitations, arguing that AWS CDK's current EKS implementation, particularly the Cluster.addManifest function, is not truly useful for organizations adopting a shared, multi-account EKS strategy.

The Illusion of Simplicity: Cluster.addManifest and its Account Boundaries

The Cluster.addManifest(id: string, ...manifest: Record<string, any>[]): KubernetesManifest function in CDK appears to offer a straightforward way to deploy Kubernetes manifests to an EKS cluster. However, this simplicity is deceptive when considering real-world scenarios where EKS clusters are designed to be shared across multiple AWS accounts.

In practice, a central EKS cluster is often shared by various teams or applications residing in separate AWS accounts. This multi-account approach is crucial for security, isolation, and cost management. However, Cluster.addManifest operates under the implicit assumption of a single account and region deployment.
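To ground the discussion, here is the single-account usage the API is built for. This is a minimal sketch, assuming aws-cdk-lib v2, the @aws-cdk/lambda-layer-kubectl-v29 package, and an illustrative Kubernetes version:

```ts
import { App, Stack, StackProps } from 'aws-cdk-lib';
import * as eks from 'aws-cdk-lib/aws-eks';
import { KubectlV29Layer } from '@aws-cdk/lambda-layer-kubectl-v29';
import { Construct } from 'constructs';

class SameAccountEksStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Cluster and manifest live in the same CDK app, and therefore in the
    // same account/region: exactly the case addManifest is designed for.
    const cluster = new eks.Cluster(this, 'SharedCluster', {
      version: eks.KubernetesVersion.V1_29,
      kubectlLayer: new KubectlV29Layer(this, 'KubectlLayer'),
    });

    // Behind the scenes, CDK provisions a kubectl handler (a Lambda) in
    // this same account to apply the manifest to the cluster.
    cluster.addManifest('TeamANamespace', {
      apiVersion: 'v1',
      kind: 'Namespace',
      metadata: { name: 'team-a' },
    });
  }
}

new SameAccountEksStack(new App(), 'SameAccountEksStack');
```

Everything works here because the kubectl handler, the cluster, and the manifest all share one account. The trouble starts the moment the cluster belongs to someone else.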

Evidence of this Limitation:

  • Implicit Same-Account Assumption in CDK Design: AWS CDK's core constructs and IAM role management are inherently designed for deployments within a single AWS account. While the CDK documentation for KubernetesManifest does not explicitly forbid cross-account deployments, its examples and underlying mechanisms are geared towards single-account use cases.
  • Cross-Account IAM Complexity: Deploying manifests to an EKS cluster in a different account necessitates complex cross-account IAM role configurations. As highlighted in this Stack Overflow discussion on cross-account resource access in CDK, CDK's reliance on CloudFormation creates inherent challenges in managing resources across accounts. Cluster.addManifest does not automatically handle the necessary cross-account IAM role assumption, making it cumbersome to use in shared EKS environments (the sketch after this list shows how much of that wiring falls on the user).
  • AWS Best Practices Advocate Multi-Account EKS: AWS itself recommends a multi-account strategy for EKS, as outlined in their official documentation on Multi Account Strategy for Amazon EKS. This document details how to share VPC subnets and leverage IAM Roles for Service Accounts (IRSA) for secure cross-account access. The stark contrast between these best practices and the limitations of Cluster.addManifest underscores the tool's inadequacy for real-world EKS deployments.
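What a cross-account deployment actually demands is easiest to see in code. The following is a hedged sketch, not a recipe: the cluster name, account ID, and role ARN are hypothetical, and every piece of trust wiring described in the comments must be built by hand before addManifest can succeed:

```ts
import { App, Stack } from 'aws-cdk-lib';
import * as eks from 'aws-cdk-lib/aws-eks';
import { Construct } from 'constructs';

class CrossAccountManifestStack extends Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // The cluster lives in account 111111111111; this stack deploys from a
    // different account. CDK creates NONE of the trust wiring below:
    //  - the kubectl role must already exist in the cluster's account,
    //  - it must trust the deploying account,
    //  - and it must already be mapped into the cluster's aws-auth config.
    const cluster = eks.Cluster.fromClusterAttributes(this, 'CentralCluster', {
      clusterName: 'central-eks', // hypothetical
      kubectlRoleArn: 'arn:aws:iam::111111111111:role/central-eks-kubectl', // hypothetical
    });

    cluster.addManifest('TeamANamespace', {
      apiVersion: 'v1',
      kind: 'Namespace',
      metadata: { name: 'team-a' },
    });
  }
}

new CrossAccountManifestStack(new App(), 'CrossAccountManifestStack');
```

Even this "working" shape assumes the kubectl provider configuration (layers, VPC placement, security groups) has been sorted out separately; none of it is generated for you.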

Ignoring the Network Foundation: A House Without Proper Plumbing

A truly practical EKS solution, especially in multi-account setups, hinges on a robust network foundation. This typically involves:

  • Transit Gateways: To establish secure and scalable connectivity between VPCs across different accounts.
  • VPC Sharing: To allow multiple accounts to share a central VPC and its subnets, often hosting the EKS cluster.
  • Private Subnets: For enhanced security, ensuring that manifest deployments and application workloads operate within private network segments.

However, AWS CDK's EKS implementation, including blueprints like aws-quickstart/cdk-eks-blueprints, often overlooks or simplifies this critical network layer. While these tools may automate EKS cluster creation and even VPC provisioning, they frequently fall short of providing comprehensive, automated solutions for setting up Transit Gateways or VPC sharing as an integral part of the EKS deployment process.

In real-world EKS architectures, the network layer is not an afterthought; it is the foundation upon which a secure, scalable, and multi-account Kubernetes environment is built. CDK's focus on simplifying cluster creation while abstracting away network complexities leads to solutions that are ill-equipped for production-grade, shared EKS deployments.
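To illustrate the gap, here is roughly what a team must hand-wire today. This is a sketch using the raw CloudFormation (L1) resources, since aws-cdk-lib ships no stable high-level construct for Transit Gateway:

```ts
import { Stack } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { Construct } from 'constructs';

class NetworkFoundationStack extends Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    const vpc = new ec2.Vpc(this, 'HubVpc', {
      ipAddresses: ec2.IpAddresses.cidr('10.0.0.0/16'),
    });

    // Teams drop to L1 resources because no stable L2 exists for TGW.
    const tgw = new ec2.CfnTransitGateway(this, 'Tgw', {
      description: 'Hub for multi-account EKS networking',
      autoAcceptSharedAttachments: 'enable', // accept attachments from RAM-shared accounts
    });

    new ec2.CfnTransitGatewayAttachment(this, 'HubAttachment', {
      transitGatewayId: tgw.ref,
      vpcId: vpc.vpcId,
      subnetIds: vpc.selectSubnets({
        subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
      }).subnetIds,
    });

    // Still missing: RAM shares to other accounts, TGW route tables, and
    // per-VPC routes back to the TGW. All manual, and none of it EKS-aware.
  }
}
```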

Token Resolution Failures: CDK's Promise Undermined

CDK's strength lies in its use of tokens – placeholders that are resolved during deployment, allowing for dynamic configurations and resource references. However, Cluster.addManifest fails to properly resolve these tokens, further hindering its practicality.

CDK tokens are designed to be resolved within the scope of a single CDK application and CloudFormation stack. When attempting to use a token from a resource within a manifest deployed to a cluster using Cluster.addManifest, token resolution often breaks down. CDK's default token resolution mechanisms are simply not designed to traverse account boundaries.

This limitation forces users to abandon CDK's elegant token-based approach and resort to manually passing concrete values – such as VPC IDs, subnet IDs, and security group IDs – as context parameters or environment variables to their CDK applications. This manual value passing is not only less elegant but also introduces more opportunities for errors and reduces the overall benefits of using CDK in the first place.
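A typical workaround looks like the following sketch, where the context keys, IDs, and availability zones are all hypothetical values the user must look up and supply by hand:

```ts
import { App, Stack } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';

// Tokens cannot cross the account boundary, so concrete IDs are injected
// from outside the app, e.g.:
//   cdk deploy -c sharedVpcId=vpc-0abc... -c appSgId=sg-0def...
const app = new App();

class WorkloadStack extends Stack {
  constructor(scope: App, id: string) {
    super(scope, id);

    // Plain strings, not tokens: they must be looked up and passed by hand.
    const vpcId: string = this.node.tryGetContext('sharedVpcId');
    const sgId: string = this.node.tryGetContext('appSgId');

    const vpc = ec2.Vpc.fromVpcAttributes(this, 'SharedVpc', {
      vpcId,
      availabilityZones: ['us-east-1a', 'us-east-1b'], // assumed AZs
    });
    const sg = ec2.SecurityGroup.fromSecurityGroupId(this, 'AppSg', sgId);

    // ...workload resources get wired with these hard-coded values, losing
    // the cross-reference safety that CDK tokens normally provide.
  }
}

new WorkloadStack(app, 'WorkloadStack');
```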



The Solution Begins with Networking

Prerequisite: Embracing Application-Centric Infrastructure in the Cloud 1

Environment: all tightly coupled logical resources grouped as a bounded context, i.e. a self-contained vertical slice with every resource needed to deliver a business capability, regardless of physical location or resource type.

Enver: an Environment Version. Different envers are logically and functionally consistent but carry different configuration values. An enver deploys and rolls back as a single unit.

Networking as a service: the network team owns all VPC-related resources (shown in red), managing them through code and a shared library. The networking account runs multiple networking envers; each contains and shares an IPAM, a VPC with a transit gateway, and a NAT gateway with the workload accounts. When workload envers deploy VPCs, they receive a CIDR range from the shared IPAM and use the shared NAT and internal naming system. Each VPC can attach to only one transit gateway, so VPCs (and the resources inside them) attached to the same TGW are connected, while VPCs attached to different TGWs remain physically disconnected (see the workload VPC sketch after this list).

RDS as a service: the DB team owns the database clusters that host databases for other envers, which define/own/control their own DB/schema/role/user resources.

EKS as a service: the k8s team owns the EKS clusters that host container orchestration for other envers, which define/own/control their own k8s manifests and service-account-to-IAM-role mappings.

The same pattern applies to EC2, MSK, OpenSearch, ECS, ElastiCache, Redshift, PrivateLink, and more.
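As an example of the networking-enver contract, a workload enver's VPC might draw its CIDR from the shared IPAM pool. This is a minimal sketch; the pool ID is hypothetical, and the pool itself would be shared to the workload account via AWS RAM:

```ts
import { Stack } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { Construct } from 'constructs';

class WorkloadVpcStack extends Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    new ec2.Vpc(this, 'WorkloadVpc', {
      // CIDR is allocated by the networking enver's IPAM pool rather than
      // hard-coded, which is how IP conflicts across envers are avoided.
      ipAddresses: ec2.IpAddresses.awsIpamAllocation({
        ipv4IpamPoolId: 'ipam-pool-0123456789abcdef0', // hypothetical, RAM-shared
        ipv4NetmaskLength: 24,                          // one /24 slice per enver VPC
        defaultSubnetIpv4NetmaskLength: 26,             // carve /26 subnets from it
      }),
      subnetConfiguration: [
        { name: 'private', subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
      ],
      natGateways: 0, // egress flows through the shared NAT behind the TGW
    });
  }
}
```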


In this diagram:

  • The Networking AWS account runs two isolated envers, NT Enver LE and NT Enver Prod. Taking NT Enver Prod as an example:

1) One transit gateway connecting multiple VPCs across multiple accounts (same region);

2) One NAT gateway sharing internet access with all connected VPCs;

3) One IPAM and CIDR pool for all connected VPCs' subnets, avoiding IP conflicts;

4) Not shown: subnets, routing, security groups, DNS, hosted zones, AWS Organizations, admin delegation, certificates, ...

  • Central VPC as a proxy for deploying resources across VPCs:

1) Lambdas that deploy k8s manifests to EKS clusters in different envers;

2) Lambdas that deploy DB/schema/role/user resources to RDS clusters in different envers.

  • Central VPC as a proxy/hub connecting resources across VPCs: pods in EKS and tasks in ECS connect to databases in different RDS clusters from different envers.

Enver 1: logically declares/controls all resources shown in green, including k8s manifests and database-related resources (DB, schema, roles, ...), deploying or rolling back as a single transaction.

  • The manifest is deployed to EKS Enver 1's cluster; the pod assumes IAM Role 1 (through its service account and OIDC) to access DynamoDB (see the ServiceAccount sketch after this list).
  • The database, schema, and role/user are deployed to RDS Enver Prod; the ECS task inside assumes IAM Role 2 to access the DB through the TGW.
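The IRSA piece of Enver 1 can be pictured as a plain ServiceAccount manifest. A sketch, with hypothetical names and role ARN; the annotation key is the standard IRSA one:

```ts
// ServiceAccount manifest Enver 1 declares and sends to the EKS enver.
const enver1ServiceAccount = {
  apiVersion: 'v1',
  kind: 'ServiceAccount',
  metadata: {
    name: 'enver1-app',   // hypothetical
    namespace: 'enver1',  // hypothetical
    annotations: {
      // Maps this service account to IAM Role 1 via the cluster's OIDC
      // provider; pods mounting this SA receive the role's credentials.
      'eks.amazonaws.com/role-arn':
        'arn:aws:iam::222222222222:role/enver1-dynamodb-access', // hypothetical
    },
  },
};
```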

Enver 2: likewise logically declares/controls all resources shown in purple. After its manifest is deployed to EKS Enver 1's cluster, the pod will:

  1. Assume IAM Role A to access the DB through the transit gateway (no VPC needed in Enver 2!)
  2. Assume IAM Role B to access an S3 bucket for files.

The platform takes care of the deployments, so apps and services focus purely on business logic and function:

  • The k8s manifests declared in Enver 1 and Enver 2 are sent to the EKS cluster through the Central Account's Lambda function in VPC-Prod (see the handler sketch below).
  • The DB schema/role/user resources declared in Enver 1 and Enver 2 are sent to the RDS cluster through the Central Account's Lambda function in VPC-Prod.
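The deployer Lambda itself can be sketched as follows. Assumptions: the '@kubernetes/client-node' package is bundled with the function, and buildEksKubeconfig() is a hypothetical helper that assumes the target cluster's kubectl role and exchanges it for an EKS auth token:

```ts
import * as k8s from '@kubernetes/client-node';

// Hypothetical helper: sts:AssumeRole into the cluster account, then build
// a kubeconfig with an EKS token for the named cluster.
declare function buildEksKubeconfig(clusterName: string): Promise<k8s.KubeConfig>;

interface DeployEvent {
  clusterName: string;                // which EKS enver to target
  manifests: k8s.KubernetesObject[];  // objects declared by the app enver
}

export const handler = async (event: DeployEvent): Promise<void> => {
  const kubeConfig = await buildEksKubeconfig(event.clusterName);
  const client = k8s.KubernetesObjectApi.makeApiClient(kubeConfig);

  for (const spec of event.manifests) {
    try {
      // Create-or-update: read first, patch if it exists, create if not.
      await client.read(spec as any);
      await client.patch(spec);
    } catch {
      await client.create(spec);
    }
  }
};
```

Because this function runs inside the Central VPC (connected to every enver's cluster through the TGW), app envers never need their own network path to the clusters they deploy to.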



<detailed code and deployments are on the way>
