Migrating workloads from Rancher to AWS ECS & Fargate (P.2)
#TechnologyAtTyme | Click to see the Article - Part 1
Why AWS ECS and Fargate?
Amazon ECS is a fully managed container orchestration service that helps us quickly deploy, manage, and scale containerized applications. We can run Docker containers on EC2 instances or on Fargate, a serverless compute option for containers. Specific considerations led us to choose ECS in combination with Fargate:
Architectural Walkthrough
Migrating from Rancher self-hosted to Amazon ECS with a Re-platform strategy requires careful consideration of various factors to meet the demands of the business for high availability, scalability, security, resilience, and performance efficiency. Let's explore the solution design and services that help us achieve these objectives and considerations for a successful migration.
Infra stacks: We decided to migrate all of the infra stacks on Rancher to AWS-managed services. This reduces operational overhead and improves availability, scalability, performance efficiency, and resource utilization, because we no longer spend effort managing those services on EC2.
App Stacks: All of our services, which follow a microservice architecture, will now run on Fargate, the serverless container option managed through AWS ECS. ECS offers built-in scalability features to handle varying workloads. There is no control plane to manage, as there was with the Rancher master; AWS takes care of all of it. Our applications running on ECS Fargate will also pull their Docker images from Elastic Container Registry (ECR), so we no longer need to manage JFrog on EC2.
Load Balancer Stacks: Using an AWS Application Load Balancer (ALB) with ECS lets us distribute traffic across multiple containers, which makes the dynamic nature of microservices easier to manage. Additionally, ECS Service Auto Scaling can adjust the number of running tasks based on predefined scaling policies, keeping resource utilization close to what the traffic requires. We no longer need to maintain the old load balancer stacks running on the Zuul container, or worry about scalability when traffic spikes on Black Friday, Christmas, New Year, etc.
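To make the idea of a predefined scaling policy more concrete, here is a minimal sketch of a CPU-based target tracking policy for an ECS service. It only builds the request payloads in plain Python; the cluster name, service name, task limits, target value, and cooldowns are illustrative assumptions, not values from our production setup.

```python
def build_cpu_target_tracking(cluster, service, min_tasks, max_tasks, target_cpu_percent):
    """Return (scalable_target, scaling_policy) request payloads for an ECS service."""
    # Application Auto Scaling identifies an ECS service by this resource ID format.
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    # Lower and upper bounds for the number of running tasks.
    scalable_target = {**common, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    # Target tracking keeps the average CPU of the service near the target value.
    scaling_policy = {
        **common,
        "PolicyName": f"{service}-cpu-target-tracking",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_cpu_percent),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # seconds; assumed value
            "ScaleInCooldown": 300,   # seconds; assumed value
        },
    }
    return scalable_target, scaling_policy


# Hypothetical cluster and service names, for illustration only.
target, policy = build_cpu_target_tracking("demo-cluster", "demo-service", 2, 20, 60)
```

With boto3, payloads shaped like these could be passed as keyword arguments to the `application-autoscaling` client's `register_scalable_target` and `put_scaling_policy` calls.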
Migration approach: We took advantage of Amazon Route 53's Weighted Routing (as shown in the following diagram). By splitting traffic at the DNS level, Weighted Routing gave us a progressive transition from our existing Rancher cluster to the new ECS cluster with zero downtime. Customers move to the new ECS cluster gradually as their cached DNS records reach their TTL and expire. The split can start with a small share of customers, for example 10% pointed to the new Amazon ECS cluster and 90% still on the old one. Once traffic is confirmed to be working on the new ECS cluster, the share of clients pointed to it can be increased.
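The 10%/90% split described above can be sketched as a Route 53 change batch with two weighted records that share one name. This is a simplified illustration: the record name, the two targets, the CNAME record type, and the 60-second TTL are all assumptions (a real setup might use alias records pointing at the load balancers instead).

```python
def weighted_change_batch(record_name, old_target, new_target, new_percent, ttl=60):
    """Build a Route 53 ChangeBatch that sends new_percent of DNS answers to new_target."""
    if not 0 <= new_percent <= 100:
        raise ValueError("new_percent must be between 0 and 100")

    def record(set_id, target, weight):
        # Each weighted record needs a unique SetIdentifier; Route 53 answers
        # with a record in proportion to its weight over the sum of all weights.
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,
                "Weight": weight,
                "TTL": ttl,  # a short TTL lets a weight change take effect sooner
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Comment": f"Send {new_percent}% of DNS answers to ECS",
        "Changes": [
            record("rancher", old_target, 100 - new_percent),
            record("ecs", new_target, new_percent),
        ],
    }


# Hypothetical hostnames, for illustration only.
batch = weighted_change_batch(
    "api.example.com", "rancher-lb.example.com", "ecs-alb.example.com", 10
)
weights = [c["ResourceRecordSet"]["Weight"] for c in batch["Changes"]]
```

Raising the share is then a matter of rebuilding the batch with a larger `new_percent` and applying it again, for example through boto3's `route53` client and its `change_resource_record_sets` call.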
Deployment
The deployment process was one of the biggest challenges during the migration to ECS. The old process used TeamCity as the CI/CD platform and Docker Compose as the deployment tool, with a rolling update mechanism. To give customers a better experience with less downtime when releasing new versions, we wanted to adopt the Blue/Green deployment mechanism for ECS, which CodeDeploy supports natively. More broadly, our organization's strategy is to use AWS-native solutions both for hosting application workloads on ECS and for CI/CD tooling, through AWS Developer Tools. The goal was seamless integration with the AWS ECS ecosystem from the build stage to the deployment stage.
CodeDeploy ended up doing much of what we wanted out of the box: it provisioned the Green version, let us monitor the stage of the deployment, shifted traffic from the Blue to the Green version with near-zero downtime, and allowed us to quickly roll back code or abort in-progress deployments if anything went wrong.
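For an ECS Blue/Green deployment, CodeDeploy reads an AppSpec file that names the new task definition and the container that should receive load balancer traffic. The sketch below builds a minimal AppSpec as JSON; the task definition ARN, container name, and port are placeholders, and optional sections such as lifecycle hooks are left out.

```python
import json


def build_ecs_appspec(task_definition_arn, container_name, container_port):
    """Build a minimal AppSpec document for a CodeDeploy ECS deployment."""
    return {
        "version": 0.0,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        # The Green task set is created from this task definition.
                        "TaskDefinition": task_definition_arn,
                        # The container and port the load balancer routes to.
                        "LoadBalancerInfo": {
                            "ContainerName": container_name,
                            "ContainerPort": container_port,
                        },
                    },
                }
            }
        ],
    }


# Placeholder ARN, container name, and port, for illustration only.
appspec = build_ecs_appspec(
    "arn:aws:ecs:eu-west-1:111111111111:task-definition/demo-service:2",
    "demo-service",
    8080,
)
appspec_json = json.dumps(appspec, indent=2)
```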
We use CodePipeline to automate our deployment process. CodePipeline provides a continuous delivery pipeline that integrates with our source code repository, builds container images, and orchestrates the deployment to ECS. It can be customized and integrated with other AWS services, which streamlined the migration.
We use AWS CloudFormation to define our CI/CD as code. CloudFormation templates let us create and manage both the CI/CD pipelines and our ECS workloads, such as task definitions, services, and load balancers, in a declarative and version-controlled manner. This ensures the consistency and reproducibility of our infrastructure.
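As a rough illustration of what a declarative, version-controlled definition of an ECS workload looks like, the sketch below generates a small CloudFormation template for a Fargate task definition. It is not our actual template: the family name, image URI, port, CPU and memory sizes, and the `ExecutionRoleArn` parameter are assumptions chosen for the example.

```python
import json


def build_task_definition_template(family, image_uri, container_port, cpu="256", memory="512"):
    """Return a CloudFormation template (as a dict) with one Fargate task definition."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Role that lets ECS pull the image and write logs; supplied at deploy time.
            "ExecutionRoleArn": {"Type": "String"}
        },
        "Resources": {
            "TaskDefinition": {
                "Type": "AWS::ECS::TaskDefinition",
                "Properties": {
                    "Family": family,
                    "RequiresCompatibilities": ["FARGATE"],
                    "NetworkMode": "awsvpc",  # required network mode for Fargate tasks
                    "Cpu": cpu,
                    "Memory": memory,
                    "ExecutionRoleArn": {"Ref": "ExecutionRoleArn"},
                    "ContainerDefinitions": [
                        {
                            "Name": family,
                            "Image": image_uri,
                            "Essential": True,
                            "PortMappings": [{"ContainerPort": container_port}],
                        }
                    ],
                },
            }
        },
    }


# Hypothetical service name and ECR image URI, for illustration only.
template = build_task_definition_template(
    "demo-service",
    "111111111111.dkr.ecr.eu-west-1.amazonaws.com/demo-service:1.0.0",
    8080,
)
template_json = json.dumps(template, indent=2)
```

Because the template is plain data, it can be reviewed in a pull request and diffed between versions like any other source file.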
Building on all of these benefits, we developed TymePipeline, a home-grown CI/CD platform with a CI/CD-as-code mission. It allows our developers to define the CI/CD for their applications easily and automatically provisions the whole infrastructure for their services.
Lessons learned
Proper Planning and Assessment: Before initiating the migration, we conduct a thorough assessment of our application architecture, dependencies, performance, and resource requirements. This evaluation ensures a smooth transition and minimizes potential disruptions. It also identifies the challenges we must address and the benefits we expect to achieve by migrating to ECS.
Plan for resource provisioning: Proper resource provisioning is crucial to ensure optimal performance and cost-effectiveness. Analyze our container workloads' resource requirements and design an ECS architecture for efficient resource utilization. Consider CPU and memory requirements, network bandwidth, storage needs, and scaling capabilities to avoid overprovisioning or underutilization.
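On Fargate, CPU and memory can only be requested in specific combinations, so sizing a task means rounding a measured requirement up to the nearest valid pair. The helper below is a small sketch of that step. The table covers only task sizes up to 4 vCPU and should be checked against the current AWS Fargate documentation; the function name and the example requirements are hypothetical.

```python
# Valid Fargate task sizes up to 4 vCPU: CPU units -> allowed memory values in MiB.
# This is a subset; larger task sizes exist and are not modelled here.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def smallest_fargate_size(cpu_units_needed, memory_mib_needed):
    """Return the smallest (cpu, memory) pair covering the requirement, or None."""
    for cpu in sorted(FARGATE_SIZES):
        if cpu < cpu_units_needed:
            continue
        for memory in FARGATE_SIZES[cpu]:
            if memory >= memory_mib_needed:
                return cpu, memory
    # The requirement does not fit any size in this table.
    return None


# Hypothetical measured requirements, for illustration only.
small = smallest_fargate_size(200, 400)      # fits the smallest task size
medium = smallest_fargate_size(300, 3000)    # CPU rounds up to 512 units
heavy = smallest_fargate_size(1000, 9000)    # memory forces a larger CPU tier
```

The third case shows why CPU and memory have to be planned together: a memory-heavy service may end up on a larger CPU tier than its CPU usage alone would suggest.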
Build a robust migration plan: Develop a comprehensive plan outlining the necessary steps, timelines, and resource allocation for a successful migration. Consider data migration, application dependencies, network configurations, and essential code modifications. Test the migration plan in a controlled environment before executing it in a production environment to minimize risks and downtime.
Leverage automation and infrastructure as code: Use automation tools like AWS CloudFormation and AWS Developer Tools to provision CI/CD pipeline and deploy ECS resources. Infrastructure as code allows for easy replication, versioning, and rollback, reducing the risk of manual errors and ensuring consistency across environments. Automating the deployment process also speeds up the migration and reduces the overall effort required.
Implement thorough testing and validation: Prioritize thorough testing and validation throughout the migration process. Conduct performance tests, security scans, and integration tests to ensure that our containerized applications function as expected in the ECS environment. Establish clear testing criteria and success metrics to validate the migration's success and address any issues before going live.
Establish monitoring and observability: Implement robust monitoring and observability practices to gain insights into the performance and health of our containerized workloads. Using ECS's monitoring features together with Datadog lets Tyme track container performance, identify bottlenecks, and make informed optimizations.
Training and Skill Development: Migrating to ECS may require new skills and knowledge. Organizations should invest in training and skill development to ensure their teams are proficient in managing ECS and taking full advantage of its capabilities.
Foster collaboration and knowledge sharing: Throughout the migration journey, foster collaboration among teams involved, including developers, operations, and security. Encourage knowledge sharing, conduct training sessions, and document best practices to ensure everyone is aligned and prepared to support the new ECS environment. This collaborative approach helps mitigate risks, encourages innovation, and enhances the overall success of the migration.
Summary
In conclusion, migrating from Rancher self-hosted to Amazon ECS provides Tyme with a more advanced and scalable container management solution. By addressing the challenges of Rancher self-hosted, ECS offers improved operational efficiency, enhanced security, increased reliability, optimized performance efficiency, streamlined cost management, and seamless integration with other AWS services. Since migrating to ECS, Tyme has experienced several notable advantages. Firstly, we have been able to easily scale our infrastructure to handle unexpected spikes in workload, ensuring uninterrupted service for our customers. Additionally, ECS has allowed us to quickly roll back code deployments that didn't meet our expectations, enabling us to maintain high reliability and stability. Moreover, ECS has significantly enhanced the overall developer experience at Tyme, allowing our team to focus more on innovation and driving business growth rather than managing infrastructure. The migration journey to ECS has positioned Tyme on a robust and scalable container platform, empowering us to meet the evolving needs of our customers and drive continuous improvement. We are confident that this strategic decision will bring long-term benefits to our organization, supporting our goals of innovation, scalability, and customer satisfaction.