Migrating workloads from Rancher to AWS ECS & Fargate (P.2)

#TechnologyAtTyme | Click to see the Article - Part 1

Why AWS ECS and Fargate?

Amazon ECS is a fully managed container orchestration service that helps us quickly deploy, manage, and scale containerized applications. We can run Docker containers on EC2 instances or on Fargate, a serverless container option. Several considerations led us to choose ECS in combination with Fargate:

  • Enhanced Scalability and Availability: When traffic spiked during seasonal events such as Black Friday, Christmas, or marketing campaigns, we used to provision several EC2 instances, wait for them to start, and then manually scale out containers. Now ECS starts and stops containers automatically as needed, which takes significantly less time. We deploy only approved Docker images hosted in Amazon ECR, so we know we are deploying the code we expect to deploy. AWS handles infrastructure provisioning and scaling, so Tyme no longer manages the underlying infrastructure and teams can focus on deploying and managing their containerized applications.
  • Operational Excellence: Because AWS manages the underlying infrastructure, we no longer carry the operational overhead of self-hosted Rancher, such as monthly OS and security patching and the upkeep of the Rancher Master control plane and data plane, which the Platform team used to handle. By migrating to AWS ECS, Tyme offloads infrastructure management to AWS and reduces the Platform team's workload. This enables us to focus on core business tasks rather than routine maintenance. With features like automated scaling, built-in monitoring with CloudWatch, and integrated CI/CD pipelines, AWS ECS simplifies and streamlines operational tasks.
  • Security: Maintaining a secure container environment is crucial to protect sensitive data and ensure compliance. AWS ECS integrates with the AWS Identity and Access Management (IAM) service, enabling us to manage access control and permissions for containerized workloads. ECS builds on AWS security features such as Amazon Virtual Private Cloud (VPC) networking, security groups, and encryption at rest to strengthen the security posture of containerized applications. Additionally, IAM roles for tasks let us control which AWS resources each ECS task can access, giving us a secure and auditable environment. As a bonus, we use AWS Secrets Manager and SSM Parameter Store for configuration management, which we find more secure and more available than the self-hosted Vault we ran on Rancher.
  • Reliability: In today's digital landscape, applications must be highly available and fault-tolerant. ECS offers built-in reliability features to ensure the continuous availability of containerized workloads. It supports task placement policies that distribute containers across multiple availability zones, providing resilience and fault tolerance. Tyme can define custom health checks to monitor container health and automatically replace unhealthy instances. Moreover, ECS integrates seamlessly with AWS Auto Scaling, allowing Tyme to scale resources dynamically based on demand, optimizing performance and ensuring reliable application delivery.
  • Performance Efficiency: Efficiently utilizing resources is essential for maximizing performance and cost-effectiveness. In combination with AWS Fargate, a serverless compute engine for containers, ECS streamlines resource allocation. Fargate eliminates the need to manage the underlying infrastructure, allowing Tyme to focus solely on running its containerized workloads. ECS offers task placement strategies that enable Tyme to optimize resource allocation based on factors such as instance types, availability zones, and custom constraints. This ensures efficient utilization of computing resources, enhancing application performance and optimizing costs.
  • Cost Optimization: AWS ECS helps optimize resource allocation and utilization. It offers dynamic scaling and auto-scaling capabilities based on CPU/Memory or custom metrics, such as the number of messages in SQS, ensuring that our containers always have the necessary resources. With AWS ECS, we can achieve better cost optimization by paying only for the resources we consume, scaling them up or down as required.
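
The custom-metric scaling mentioned in the last bullet (for example, scaling on the number of messages in SQS) can be illustrated with a backlog-per-task calculation. This is only a minimal sketch, not Tyme's actual policy: the function name `desired_task_count`, the target of 100 messages per task, and the task bounds are assumptions chosen for illustration.

```python
import math


def desired_task_count(queue_depth: int, target_per_task: int,
                       min_tasks: int = 1, max_tasks: int = 20) -> int:
    """Return the task count that keeps the backlog per task at or below the target.

    All values are hypothetical. A real policy would typically read the queue
    depth from a CloudWatch metric such as ApproximateNumberOfMessagesVisible.
    """
    if target_per_task <= 0:
        raise ValueError("target_per_task must be positive")
    needed = math.ceil(queue_depth / target_per_task)
    # Clamp to the configured bounds so the service never drops below its
    # minimum or grows past its budgeted ceiling.
    return max(min_tasks, min(max_tasks, needed))


if __name__ == "__main__":
    for depth in (0, 250, 5000):
        print(depth, "->", desired_task_count(depth, target_per_task=100))
```

Clamping to a minimum and maximum is what keeps "pay only for what we consume" from turning into either an outage at low traffic or an unbounded bill during a spike.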

Architectural Walkthrough

Migrating from Rancher self-hosted to Amazon ECS with a Re-platform strategy requires careful consideration of various factors to meet the demands of the business for high availability, scalability, security, resilience, and performance efficiency. Let's explore the solution design and services that help us achieve these objectives and considerations for a successful migration.

Infra stacks: We decided to migrate all of the Infra Stacks on Rancher to AWS-managed services. This reduces operational overhead and improves availability, scalability, performance efficiency, and resource utilization, and we no longer need to spend effort managing those services on EC2.

  • Service Discovery will be replaced by an internal Application Load Balancer.
  • Secrets Manager and SSM Parameter Store will replace the configuration service that stores the credentials and environment settings each service needs to run.
  • The Rancher networking service, with its painful “No Route to Host” errors, will be replaced by the awsvpc networking mode, where each ECS task has its own IP address.
  • The Storage Service will be replaced by EFS, which is more available and more scalable and offers better performance.
  • The local DNS service from Rancher will be replaced by Route 53, where we host a private hosted zone.
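
To make the awsvpc and secrets replacements above more concrete, here is a minimal sketch of a Fargate task definition payload expressed as a Python dictionary. The field names follow the ECS task definition format; the function name `build_task_definition`, the container port, the environment variable names, and the ARNs in the usage example are hypothetical placeholders, not values from our environment.

```python
import json


def build_task_definition(family: str, image: str,
                          secret_arn: str, parameter_arn: str) -> dict:
    """Build a Fargate task definition payload that uses awsvpc networking."""
    return {
        "family": family,
        "networkMode": "awsvpc",  # each task gets its own network interface and IP
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "256",     # 0.25 vCPU (illustrative size)
        "memory": "512",  # MiB (illustrative size)
        # Note: a task execution role with permission to read the referenced
        # secrets is also required; it is omitted from this sketch.
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
                # ECS resolves these references when the task starts, so the
                # secret values never appear in the task definition itself.
                "secrets": [
                    {"name": "DB_PASSWORD", "valueFrom": secret_arn},
                    {"name": "APP_CONFIG", "valueFrom": parameter_arn},
                ],
            }
        ],
    }


if __name__ == "__main__":
    payload = build_task_definition(
        family="example-service",
        image="ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/example-service:1.0.0",
        secret_arn="arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:example",
        parameter_arn="arn:aws:ssm:REGION:ACCOUNT_ID:parameter/example/config",
    )
    print(json.dumps(payload, indent=2))
```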

App Stacks: All of our services, which follow a microservice architecture, will now run on Fargate, a serverless container option managed through AWS ECS. ECS offers built-in scalability features to handle varying workloads. There is no longer a control plane like the Rancher Master for us to manage; AWS takes care of all of it. Our applications running on ECS Fargate will also pull Docker images hosted in Elastic Container Registry (ECR), so we can stop managing JFrog on EC2.

Load Balancer Stacks: Leveraging AWS Application Load Balancer (ALB) with ECS allows us to distribute traffic across multiple containers, making it easier to manage the dynamic nature of microservices. Additionally, ECS Service Auto Scaling can dynamically scale the number of tasks based on predefined scaling policies, ensuring optimal resource utilization. There is no need to maintain the old Load Balancer stacks running on the Zuul container and worry about scalability when traffic spikes on Black Friday, Christmas, New Year, etc.

Migration approach: We took advantage of Amazon Route 53’s Weighted Routing (as shown in the following diagram). With Weighted Routing, we were able to move progressively from our existing Rancher cluster to the new one with zero downtime by splitting the traffic at the DNS level. Customers are transferred gradually to the new ECS cluster as their cached DNS records reach their TTL and expire. The split can start with a small share of customers, for example 10% pointed to the new Amazon ECS cluster and 90% still on the old one. As soon as traffic is confirmed to be working on the new ECS cluster, the percentage of clients pointed to the new one can be increased.
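
The weighted split described above can be sketched as a Route 53 change batch. This is an illustrative example only: the helper name `weighted_change_batch`, the record names, the set identifiers, and the 60-second TTL are assumptions, and the records are modelled as CNAMEs for simplicity (alias records would be shaped differently).

```python
def weighted_change_batch(record_name: str, old_target: str, new_target: str,
                          new_percent: int, ttl: int = 60) -> dict:
    """Build a change batch that sends `new_percent` of DNS answers to the new target."""
    if not 0 <= new_percent <= 100:
        raise ValueError("new_percent must be between 0 and 100")

    def record(set_id: str, target: str, weight: int) -> dict:
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,  # distinguishes records sharing one name
                "Weight": weight,         # share = weight / sum of all weights
                "TTL": ttl,               # a short TTL makes weight changes take effect sooner
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Comment": f"Shift {new_percent}% of traffic to the new cluster",
        "Changes": [
            record("rancher", old_target, 100 - new_percent),
            record("ecs", new_target, new_percent),
        ],
    }


if __name__ == "__main__":
    # Hypothetical rollout steps: 10%, then 50%, then everything.
    for percent in (10, 50, 100):
        batch = weighted_change_batch("api.example.com.", "rancher-lb.example.com",
                                      "ecs-alb.example.com", percent)
        print([c["ResourceRecordSet"]["Weight"] for c in batch["Changes"]])
```

A dictionary of this shape is what the Route 53 `ChangeResourceRecordSets` API expects as its change batch; raising `new_percent` step by step reproduces the progressive cutover, and setting it back to 0 acts as a rollback.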

Deployment

The deployment process was one of the biggest challenges during the migration to ECS. The old deployment process used TeamCity as the CI/CD platform and Docker Compose as the deployment tool, which provided a rolling update mechanism. To give customers a better experience with less downtime when releasing new versions, we wanted to adopt the Blue/Green deployment mechanism for ECS, which CodeDeploy supports natively. More broadly, our organization's strategy is to use AWS-native solutions: ECS for hosting application workloads and AWS Developer Tools for CI/CD. The aim was seamless integration with the AWS ECS ecosystem from the build stage through to deployment.

CodeDeploy ended up doing much of what we wanted out of the box: it provisioned the Green version, let us monitor each stage of the deployment, shifted traffic from the Blue to the Green version with near-zero downtime, and allowed us to quickly roll back code or abort in-progress deployments if anything went wrong.
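
For an ECS Blue/Green deployment, CodeDeploy reads an AppSpec file that names the task definition to roll out and the container that receives load balancer traffic. The sketch below generates such a file in its JSON form; the function name `build_ecs_appspec` and the ARN, container name, and port in the usage example are hypothetical, and this is not taken from our actual pipeline.

```python
import json


def build_ecs_appspec(task_definition_arn: str, container_name: str,
                      container_port: int) -> str:
    """Return a JSON AppSpec for an ECS blue/green deployment."""
    appspec = {
        "version": 0.0,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        # Task definition revision for the Green task set.
                        "TaskDefinition": task_definition_arn,
                        # Container and port the load balancer routes traffic to.
                        "LoadBalancerInfo": {
                            "ContainerName": container_name,
                            "ContainerPort": container_port,
                        },
                    },
                }
            }
        ],
    }
    return json.dumps(appspec, indent=2)


if __name__ == "__main__":
    print(build_ecs_appspec(
        "arn:aws:ecs:REGION:ACCOUNT_ID:task-definition/example-service:2",
        "example-service",
        8080,
    ))
```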

We used CodePipeline to automate our deployment process. CodePipeline provides a continuous delivery pipeline that integrates with our source code repository, builds container images, and orchestrates the deployment to ECS. It allows for customization and integration with other AWS services, which helped streamline the migration.

We used AWS CloudFormation to define our CI/CD as code. CloudFormation templates let us create and manage both the CI/CD pipelines and our ECS workloads, such as task definitions, services, and load balancers, in a declarative and version-controlled manner. This ensures the consistency and reproducibility of our infrastructure.
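
As a rough illustration of what "declarative and version-controlled" means here, the sketch below builds a small CloudFormation template for a single Fargate service as a Python dictionary. It is an assumption-laden example, not an excerpt from our templates: the function name `build_service_template`, the parameter names, and the logical ID `Service` are all hypothetical.

```python
import json


def build_service_template(service_name: str, desired_count: int) -> dict:
    """Build a minimal CloudFormation template describing one Fargate service."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"ECS Fargate service for {service_name} (illustrative)",
        # Environment-specific values come in as parameters so the same
        # template can be reused across environments.
        "Parameters": {
            "ClusterName": {"Type": "String"},
            "TaskDefinitionArn": {"Type": "String"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
            "SecurityGroupId": {"Type": "AWS::EC2::SecurityGroup::Id"},
        },
        "Resources": {
            "Service": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "ServiceName": service_name,
                    "Cluster": {"Ref": "ClusterName"},
                    "LaunchType": "FARGATE",
                    "DesiredCount": desired_count,
                    "TaskDefinition": {"Ref": "TaskDefinitionArn"},
                    "NetworkConfiguration": {
                        "AwsvpcConfiguration": {
                            "AssignPublicIp": "DISABLED",
                            "Subnets": {"Ref": "SubnetIds"},
                            "SecurityGroups": [{"Ref": "SecurityGroupId"}],
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_service_template("example-service", 2), indent=2))
```

Because the template is plain data, it can be committed, reviewed, and diffed like any other source file, which is where the reproducibility comes from.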

Building on all of these benefits, we developed TymePipeline, a home-grown CI/CD platform with a CI/CD-as-Code mission. It allows our developers to easily define the CI/CD for their applications and automatically provisions the whole infrastructure for their services.

Lessons learned

Proper Planning and Assessment: Before initiating the migration, conduct a thorough assessment of the application architecture, dependencies, performance, and resource requirements. This evaluation ensures a smooth transition and minimizes potential disruptions. It also identifies the challenges we must address and the benefits we expect to achieve by migrating to ECS.

Plan for resource provisioning: Proper resource provisioning is crucial to ensure optimal performance and cost-effectiveness. Analyze our container workloads' resource requirements and design an ECS architecture for efficient resource utilization. Consider CPU and memory requirements, network bandwidth, storage needs, and scaling capabilities to avoid overprovisioning or underutilization.

Build a robust migration plan: Develop a comprehensive plan outlining the necessary steps, timelines, and resource allocation for a successful migration. Consider data migration, application dependencies, network configurations, and essential code modifications. Test the migration plan in a controlled environment before executing it in a production environment to minimize risks and downtime.

Leverage automation and infrastructure as code: Use automation tools like AWS CloudFormation and AWS Developer Tools to provision the CI/CD pipelines and deploy ECS resources. Infrastructure as code allows for easy replication, versioning, and rollback, reducing the risk of manual errors and ensuring consistency across environments. Automating the deployment process also speeds up the migration and reduces the overall effort required.

Implement thorough testing and validation: Prioritize thorough testing and validation throughout the migration process. Conduct performance tests, security scans, and integration tests to ensure that our containerized applications function as expected in the ECS environment. Establish clear testing criteria and success metrics to validate the migration's success and address any issues before going live.

Establish monitoring and observability: Implement robust monitoring and observability practices to gain insights into the performance and health of our containerized workloads. Utilizing ECS's monitoring features and integrating them with DataDog enables Tyme to gain insights into container performance, identify bottlenecks, and make informed optimizations.

Training and Skill Development: Migrating to ECS may require new skills and knowledge. Organizations should invest in training and skill development to ensure their teams are proficient in managing ECS and taking full advantage of its capabilities.

Foster collaboration and knowledge sharing: Throughout the migration journey, foster collaboration among teams involved, including developers, operations, and security. Encourage knowledge sharing, conduct training sessions, and document best practices to ensure everyone is aligned and prepared to support the new ECS environment. This collaborative approach helps mitigate risks, encourages innovation, and enhances the overall success of the migration.

Summary

In conclusion, migrating from Rancher self-hosted to Amazon ECS provides Tyme with a more advanced and scalable container management solution. By addressing the challenges of Rancher self-hosted, ECS offers improved operational efficiency, enhanced security, increased reliability, optimized performance efficiency, streamlined cost management, and seamless integration with other AWS services.

Since migrating to ECS, Tyme has experienced several notable advantages. Firstly, we have been able to easily scale our infrastructure to handle unexpected spikes in workload, ensuring uninterrupted service for our customers. Additionally, ECS has allowed us to quickly roll back code deployments that didn't meet our expectations, enabling us to maintain high reliability and stability. Moreover, ECS has significantly enhanced the overall developer experience at Tyme, allowing our team to focus more on innovation and driving business growth rather than managing infrastructure.

The migration journey to ECS has positioned Tyme on a robust and scalable container platform, empowering us to meet the evolving needs of our customers and drive continuous improvement. We are confident that this strategic decision will bring long-term benefits to our organization, supporting our goals of innovation, scalability, and customer satisfaction.

Written by Phuc Dang - Cloud Architect, Tyme and Tri Tran - Technical Lead, Tyme.

For more TymeLabs - Articles:

#TechnologyAtTyme

#LifeAtTyme
