Embracing Application-Centric Infrastructure in the Cloud 2
AWS CDK for EKS: Falling Short in Real-World, Multi-Account Kubernetes Deployments
AWS Cloud Development Kit (CDK) aims to simplify cloud infrastructure provisioning using familiar programming languages. While its EKS module promises to streamline Kubernetes cluster creation and management, a closer look reveals significant shortcomings, especially in practical, multi-account EKS deployments. This article examines these limitations and argues that AWS CDK's current EKS implementation, particularly the Cluster.addManifest function, falls short for organizations adopting a shared, multi-account EKS strategy.
The Illusion of Simplicity: Cluster.addManifest and its Account Boundaries
The Cluster.addManifest(id: string, ...manifest: Record<string, any>[]): KubernetesManifest function in CDK appears to offer a straightforward way to deploy Kubernetes manifests to an EKS cluster. However, this simplicity is deceptive when considering real-world scenarios where EKS clusters are designed to be shared across multiple AWS accounts.
In practice, a central EKS cluster is often shared by various teams or applications residing in separate AWS accounts. This multi-account approach is crucial for security, isolation, and cost management. However, Cluster.addManifest operates under the implicit assumption of a single account and region deployment.
Evidence of this Limitation:
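To make the boundary concrete, here is a minimal sketch (account IDs and resource names are illustrative, not from any real deployment) of the pattern addManifest is built for: the cluster and the manifest live in the same CDK app, pinned to a single account and region.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import * as eks from 'aws-cdk-lib/aws-eks';

const app = new App();

// Everything below is pinned to a single account/region pair.
const stack = new Stack(app, 'SharedEksStack', {
  env: { account: '111111111111', region: 'us-east-1' }, // illustrative values
});

const cluster = new eks.Cluster(stack, 'SharedCluster', {
  version: eks.KubernetesVersion.V1_27,
});

// addManifest only works against a Cluster construct defined (or imported)
// in this same CDK app and environment. A team in account 222222222222 cannot
// call this against the cluster above from its own CDK app without importing
// the cluster and wiring up cross-account kubectl access by hand.
cluster.addManifest('TeamANamespace', {
  apiVersion: 'v1',
  kind: 'Namespace',
  metadata: { name: 'team-a' },
});

app.synth();
```

A team operating from another account has no way to hand its manifest to this addManifest call; it would have to import the cluster into its own app and solve cross-account kubectl access itself.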
Ignoring the Network Foundation: A House Without Proper Plumbing
A truly practical EKS solution, especially in multi-account setups, hinges on a robust network foundation. This typically involves:
1) A Transit Gateway connecting VPCs across accounts;
2) Shared or centrally managed VPCs;
3) Centralized NAT for internet egress;
4) IPAM-managed CIDR allocation to avoid address conflicts.
However, AWS CDK's EKS implementation, including blueprints like aws-quickstart/cdk-eks-blueprints, often overlooks or simplifies this critical network layer. While these tools may automate EKS cluster creation and even VPC provisioning, they frequently fall short of providing comprehensive, automated solutions for setting up Transit Gateways or VPC sharing as an integral part of the EKS deployment process.
In real-world EKS architectures, the network layer is not an afterthought; it is the foundation upon which a secure, scalable, and multi-account Kubernetes environment is built. CDK's focus on simplifying cluster creation while abstracting away network complexities leads to solutions that are ill-equipped for production-grade, shared EKS deployments.
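As a rough sketch of what that missing foundation looks like (resource names, CIDRs, and the organization ARN below are assumptions, not anything CDK or the blueprints generate), the networking layer for a shared EKS footprint has to be modeled and shared across accounts on its own:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ram from 'aws-cdk-lib/aws-ram';

// Runs in a central "networking" account; workload accounts only consume.
export class NetworkFoundationStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Hub Transit Gateway that every workload VPC will attach to.
    const tgw = new ec2.CfnTransitGateway(this, 'HubTgw', {
      description: 'Shared TGW for EKS workload VPCs',
      autoAcceptSharedAttachments: 'enable',
    });

    // Share the TGW to the rest of the AWS Organization via RAM so workload
    // accounts can create attachments without manual invitations.
    new ram.CfnResourceShare(this, 'TgwShare', {
      name: 'hub-tgw-share',
      resourceArns: [
        `arn:aws:ec2:${this.region}:${this.account}:transit-gateway/${tgw.attrId}`,
      ],
      // Illustrative principal; replace with your organization or OU ARN.
      principals: ['arn:aws:organizations::111111111111:organization/o-example'],
      allowExternalPrincipals: false,
    });

    // Egress VPC whose NAT gateway is shared by all attached VPCs; the routes
    // that send workload traffic through the TGW are omitted for brevity.
    new ec2.Vpc(this, 'EgressVpc', {
      ipAddresses: ec2.IpAddresses.cidr('100.64.0.0/22'), // illustrative CIDR
      natGateways: 1,
    });
  }
}
```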
Token Resolution Failures: CDK's Promise Undermined
CDK's strength lies in its use of tokens – placeholders that are resolved during deployment, allowing for dynamic configurations and resource references. However, Cluster.addManifest fails to properly resolve these tokens, further hindering its practicality.
CDK tokens are designed to be resolved within the scope of a single CDK application and CloudFormation stack. When a token that references a resource in one account is used inside a manifest deployed through Cluster.addManifest to a cluster in another account, resolution breaks down. CDK's default token resolution mechanisms are simply not designed to traverse account boundaries.
This limitation forces users to abandon CDK's elegant token-based approach and resort to manually passing concrete values – such as VPC IDs, subnet IDs, and security group IDs – as context parameters or environment variables to their CDK applications. This manual value passing is not only less elegant but also introduces more opportunities for errors and reduces the overall benefits of using CDK in the first place.
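The failure mode looks roughly like this (account IDs and names are hypothetical): a token minted in one account's stack is referenced inside a manifest destined for a cluster owned by another account's stack, and CDK refuses to wire the reference up.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as eks from 'aws-cdk-lib/aws-eks';

const app = new App();

// Workload stack in account 222222222222 owns the VPC.
const workloadStack = new Stack(app, 'WorkloadStack', {
  env: { account: '222222222222', region: 'us-east-1' },
});
const vpc = new ec2.Vpc(workloadStack, 'AppVpc');

// Platform stack in account 111111111111 owns the shared cluster.
const platformStack = new Stack(app, 'PlatformStack', {
  env: { account: '111111111111', region: 'us-east-1' },
});
const cluster = new eks.Cluster(platformStack, 'SharedCluster', {
  version: eks.KubernetesVersion.V1_27,
});

// vpc.vpcId is a CloudFormation token scoped to WorkloadStack. Referencing it
// here creates a cross-stack, cross-ACCOUNT reference that CDK cannot wire up:
// synthesis fails, pushing you to hard-code the value via context or env vars.
cluster.addManifest('AppConfig', {
  apiVersion: 'v1',
  kind: 'ConfigMap',
  metadata: { name: 'app-network', namespace: 'team-a' },
  data: { vpcId: vpc.vpcId },
});

app.synth();
```

The only workaround is to resolve the value out of band and feed it in as a plain string, exactly the manual value passing described above.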
The solution begins with networking:
Environment: all tightly coupled logical resources, treated as a bounded context: a self-contained vertical slice with every resource needed to deliver a business capability, regardless of physical location or type.
Enver: an Environment Version. Different envers of the same environment are logically and functionally consistent but carry different configuration values. An enver deploys and rolls back as a unit.
Networking as a service: the network team owns all VPC-related resources (shown in red) and manages them through code and shared libraries. The networking account runs multiple networking envers; each one contains and shares an IPAM pool, a VPC with a Transit Gateway, and NAT to the workload accounts, which draw a CIDR range from the shared IPAM and reuse the shared NAT and internal naming system when deploying VPCs in workload envers. Each VPC can attach to only one Transit Gateway, so VPCs (and the resources inside them) on the same TGW are connected, while VPCs attached to different TGWs are physically isolated.
RDS as a service: the DB team owns the RDS clusters that host databases for other envers, while those envers define, own, and control their own databases, schemas, roles, and users.
EKS as a service: the k8s team owns the EKS clusters that host container orchestration for other envers, while those envers define, own, and control their own k8s manifests and service-account-to-IAM-role mappings (see the sketch after this list).
The same pattern applies to EC2, MSK, OpenSearch, ECS, ElastiCache, Redshift, PrivateLink, and more.
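To illustrate the EKS-as-a-service contract from the consumer side, here is a hedged sketch (the OIDC provider ARN, account IDs, names, and namespaces are hypothetical) of what a workload enver might declare: an IAM role in its own account trusted by the central cluster's OIDC provider, and the ServiceAccount manifest that binds to it.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import * as iam from 'aws-cdk-lib/aws-iam';

// Published by the EKS enver and registered as an IAM OIDC provider in this
// workload account; both values here are hypothetical.
const oidcProviderArn =
  'arn:aws:iam::222222222222:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE';
const oidcIssuer = 'oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE';

const app = new App();
const stack = new Stack(app, 'OrderServiceEnver', {
  env: { account: '222222222222', region: 'us-east-1' },
});

// The role lives in the workload account but trusts the central cluster's
// OIDC provider, scoped down to one namespace/service account.
const podRole = new iam.Role(stack, 'OrderServicePodRole', {
  assumedBy: new iam.FederatedPrincipal(
    oidcProviderArn,
    {
      StringEquals: {
        [`${oidcIssuer}:sub`]: 'system:serviceaccount:orders:order-service',
      },
    },
    'sts:AssumeRoleWithWebIdentity',
  ),
});

// The ServiceAccount manifest this enver hands to the platform for deployment
// onto the shared cluster.
export const serviceAccountManifest = {
  apiVersion: 'v1',
  kind: 'ServiceAccount',
  metadata: {
    name: 'order-service',
    namespace: 'orders',
    annotations: { 'eks.amazonaws.com/role-arn': podRole.roleArn },
  },
};
```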
In this diagram:
1) One Transit Gateway connecting multiple VPCs across multiple accounts (same region);
2) One NAT gateway sharing internet egress with all connected VPCs;
3) One IPAM and CIDR pool allocating subnets for all connected VPCs to avoid IP conflicts;
4) Not shown: subnets, routing, security groups, DNS, hosted zones, the AWS Organization, admin delegation, certificates, ...
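On the consuming side, a workload enver's VPC might look roughly like the sketch below (the IPAM pool ID and transit gateway ID are assumed to be shared in by the networking enver; the property names for those inputs are hypothetical): it draws its CIDR from the shared IPAM pool and attaches to the shared Transit Gateway instead of running its own NAT.

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';

export interface WorkloadVpcProps extends StackProps {
  /** IPAM pool shared from the networking enver (hypothetical input). */
  readonly ipamPoolId: string;
  /** Transit Gateway shared from the networking enver (hypothetical input). */
  readonly transitGatewayId: string;
}

// Deployed in a workload account: the VPC draws its CIDR from the shared IPAM
// pool (no hard-coded ranges, no conflicts) and attaches to the shared TGW.
export class WorkloadVpcStack extends Stack {
  constructor(scope: Construct, id: string, props: WorkloadVpcProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, 'WorkloadVpc', {
      ipAddresses: ec2.IpAddresses.awsIpamAllocation({
        ipv4IpamPoolId: props.ipamPoolId,
        ipv4NetmaskLength: 24,
        defaultSubnetIpv4NetmaskLength: 26,
      }),
      natGateways: 0, // egress goes through the shared NAT behind the TGW
      subnetConfiguration: [
        { name: 'private', subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
      ],
    });

    // Attach the VPC to the shared Transit Gateway; the route entries sending
    // 0.0.0.0/0 to the TGW for egress are omitted here for brevity.
    new ec2.CfnTransitGatewayAttachment(this, 'TgwAttachment', {
      transitGatewayId: props.transitGatewayId,
      vpcId: vpc.vpcId,
      subnetIds: vpc.isolatedSubnets.map((s) => s.subnetId),
    });
  }
}
```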
The platform then handles cross-account deployments (a sketch of such a deployer follows this list) by:
1) Running Lambdas that deploy k8s manifests from different envers to different EKS clusters;
2) Running Lambdas that deploy databases, schemas, roles, and users from different envers to different RDS clusters.
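A deployer of this kind could look roughly like the following sketch (this is not the platform's published code; the token helper is an assumption and is left unimplemented): a Lambda that resolves the target cluster's endpoint, authenticates, and applies whatever manifest a workload enver declared.

```typescript
import { EKSClient, DescribeClusterCommand } from '@aws-sdk/client-eks';
import * as k8s from '@kubernetes/client-node';

interface DeployEvent {
  clusterName: string;            // central EKS cluster to target
  manifest: k8s.KubernetesObject; // manifest declared by a workload enver
}

// Assumed helper (not a published API): produce a bearer token equivalent to
// `aws eks get-token` (a presigned STS GetCallerIdentity call). Implementation
// is omitted in this sketch.
async function getEksAuthToken(clusterName: string): Promise<string> {
  throw new Error(`token generation for ${clusterName} is not implemented here`);
}

export const handler = async (event: DeployEvent): Promise<void> => {
  // The Lambda's execution role must be granted cluster access by the platform
  // (aws-auth ConfigMap or EKS access entries).
  const eks = new EKSClient({});
  const { cluster } = await eks.send(
    new DescribeClusterCommand({ name: event.clusterName }),
  );

  const kubeConfig = new k8s.KubeConfig();
  kubeConfig.loadFromClusterAndUser(
    {
      name: event.clusterName,
      server: cluster!.endpoint!,
      caData: cluster!.certificateAuthority!.data!,
      skipTLSVerify: false,
    },
    { name: 'deployer', token: await getEksAuthToken(event.clusterName) },
  );

  // Create the declared object, falling back to a patch if it already exists.
  const client = k8s.KubernetesObjectApi.makeApiClient(kubeConfig);
  try {
    await client.create(event.manifest);
  } catch {
    await client.patch(event.manifest);
  }
};
```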
Enver 1: logically declares and controls all the resources inside the green boundary, including k8s manifests and database-related resources (database, schema, roles, ...), and deploys or rolls them back as one transaction.
Enver 2: likewise logically declares and controls all the resources inside the purple boundary; after its manifests are deployed to EKS Enver 1's cluster, the pod will:
The platform takes care of the deployments, so that apps and services just focus on business logic/function:
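As a purely illustrative sketch (every type and property name below is hypothetical, not the platform's published API), an application enver's declaration could group its Kubernetes manifests and database objects into one versioned unit that deploys and rolls back together:

```typescript
// Hypothetical declaration types; the real platform's API is not published yet.
interface DatabaseObjects {
  database: string;
  schemas: string[];
  roles: { name: string; grants: string[] }[];
}

interface EnverDeclaration {
  name: string;
  version: string;                         // envers deploy/roll back as a unit
  targetEksEnver: string;                  // which shared cluster hosts the pods
  targetRdsEnver: string;                  // which shared RDS cluster hosts the DB
  k8sManifests: Record<string, unknown>[]; // owned by this enver, applied by the platform
  databaseObjects: DatabaseObjects;        // owned by this enver, applied by the platform
}

// Example: an enver declares both its workloads and its database objects; the
// platform deploys them against the shared services as one transaction.
export const orderServiceEnver: EnverDeclaration = {
  name: 'order-service',
  version: '1.4.0',
  targetEksEnver: 'eks-prod-us-east-1',
  targetRdsEnver: 'rds-prod-us-east-1',
  k8sManifests: [
    { apiVersion: 'apps/v1', kind: 'Deployment', metadata: { name: 'order-service' } },
  ],
  databaseObjects: {
    database: 'orders',
    schemas: ['public', 'audit'],
    roles: [{ name: 'order_service_rw', grants: ['public'] }],
  },
};
```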
<detailed code and deployments are on the way>