How To Deploy Serverless Containers For ML Pipelines Using ECS Fargate

"Should we use Kubernetes or go serverless first for new software solutions?"

This is a common question among technology teams across the world. Based on a recent LinkedIn survey, the answer seems to be an even split between the two approaches, with most respondents staying flexible depending on the project.

Common arguments in favor of Kubernetes include portability, scalability, low latency, low cost, open-source support, and DevOps maturity.

Common arguments in favor of serverless include simplicity, maintainability, shorter lead times, developer experience, talent / skill set availability, native integration with other cloud services, and existing commitment to the cloud.

Is there a way to combine the best of both worlds and create cloud-native, serverless container-based solutions?

Let's look at machine learning training pipelines in AWS as an example.

Suppose we are in stage 2 of the ML Model Deployment Lifecycle, where we converted our Jupyter notebook into a microservice-based training pipeline:

[Diagram: serverless ML training pipeline orchestrated with Step Functions, using PySpark Glue jobs, SageMaker, and Lambda functions]

This architecture is fully serverless and AWS-native, leveraging PySpark Glue jobs, SageMaker Lambda functions, and Step Functions for component orchestration and error handling. This workflow is typically triggered by EventBridge Rules in response to drift events from production machine learning models.

What if we wanted to lower the cost of the Data Validator component, leverage open-source libraries (such as TensorFlow Extended), add better DevOps capabilities, and make the component more portable?

[Diagram: updated pipeline with the Data Validator Glue job replaced by an ECS Fargate Task]

We replaced the Glue job with an ECS Fargate Task that uses TensorFlow Extended's Data Validation library. ECS Fargate allows us to build Docker images from our code repo and run them as serverless microservices, either standalone or as part of a larger production workflow.

In the case of ML training pipelines, where we can afford higher-latency batch jobs, Fargate Spot lowers our runtime cost to about 1 cent per vCPU per hour and a tenth of a cent per GB per hour (40x+ lower cost than AWS Glue jobs).

How do we implement this?

Let's start in the SageMaker Studio IDE.

The first step is to create a requirements.txt file to install TFX's Data Validation library in the container, and a Dockerfile to copy the data validation component code into the container:

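As a rough sketch (assuming the component code lives in a hypothetical file named data_validator.py, and pinning whichever library version you have tested), the two files might look like this:

    # requirements.txt -- dependencies installed into the container
    tensorflow-data-validation

    # Dockerfile -- package the data validation component as a container image
    FROM python:3.9-slim
    WORKDIR /app

    # Install TFX's Data Validation library
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Copy the data validation component code (hypothetical filename)
    COPY data_validator.py .

    # Run the component when the container starts
    ENTRYPOINT ["python", "data_validator.py"]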

Next, we git commit and trigger our CI/CD pipeline to containerize and push the Docker image to ECR. The build code is provided in CodeBuild's buildspec.yml file:

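A minimal buildspec.yml sketch might look like this (the account ID, region, and repository name are placeholders):

    version: 0.2

    phases:
      pre_build:
        commands:
          # Authenticate Docker against the ECR registry (placeholder account/region)
          - aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
      build:
        commands:
          # Build and tag the data validator image
          - docker build -t data-validator:latest .
          - docker tag data-validator:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/data-validator:latest
      post_build:
        commands:
          # Push the image to ECR so the deployment can reference its URI
          - docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/data-validator:latest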

Once the build is complete, we pass the ECR image URI into the parameter overrides of the CloudFormation deploy CLI command. Here is the basic CloudFormation template resource definition for ECS Fargate Tasks:

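A sketch of that resource might look like the following (ValidatorImageURI matches the parameter described below; the other parameter names, the CPU/memory sizing, and the IAM roles are illustrative assumptions):

    # Inside the Resources section of the CloudFormation template
    DataValidatorTaskDefinition:
      Type: AWS::ECS::TaskDefinition
      Properties:
        Family: data-validator
        RequiresCompatibilities:
          - FARGATE
        NetworkMode: awsvpc
        Cpu: !Ref ValidatorCpu
        Memory: !Ref ValidatorMemory
        ExecutionRoleArn: !Ref ValidatorExecutionRoleArn
        TaskRoleArn: !Ref ValidatorTaskRoleArn
        ContainerDefinitions:
          - Name: data-validator
            Image: !Ref ValidatorImageURI
            Essential: true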

The ValidatorImageURI and the other referenced variables are defined in the Parameters section of the CloudFormation template. You can define static values, or pass them in dynamically through the CloudFormation deployment (as we did for ValidatorImageURI).
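
For example, the deploy step might look like this (the stack name, template file, and image URI below are placeholders):

    aws cloudformation deploy \
      --template-file ml-training-pipeline.yml \
      --stack-name ml-training-pipeline \
      --capabilities CAPABILITY_IAM \
      --parameter-overrides ValidatorImageURI=123456789012.dkr.ecr.us-east-1.amazonaws.com/data-validator:latest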

Finally, we include this ECS Fargate Task within our Step Function workflow (also defined in the same CloudFormation template):

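A sketch of the relevant state, using the native Step Functions ECS RunTask integration, might look like this (the state names, cluster, subnet, and input paths are assumptions, and the ${...} values are assumed to be substituted into the state machine definition with Fn::Sub; a Fargate Spot capacity provider strategy can be used in place of the plain FARGATE launch type):

    "ValidateData": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Parameters": {
        "LaunchType": "FARGATE",
        "Cluster": "${ClusterArn}",
        "TaskDefinition": "${DataValidatorTaskDefinition}",
        "NetworkConfiguration": {
          "AwsvpcConfiguration": {
            "Subnets": ["${SubnetId}"],
            "AssignPublicIp": "ENABLED"
          }
        },
        "Overrides": {
          "ContainerOverrides": [
            {
              "Name": "data-validator",
              "Environment": [
                {"Name": "RUN_DATE", "Value.$": "$.run_date"},
                {"Name": "RUN_ID", "Value.$": "$.run_id"}
              ]
            }
          ]
        }
      },
      "Next": "ModelTrainer"
    }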

We pass parameters dynamically into the container through environment variables, such as Run Date and Run Id (initialized by the init component of the Step Function). These environment variables can then be read inside the container code through os.environ["RUN_DATE"], etc.
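
Inside the container, the component code might read those values along these lines (the S3 path is hypothetical):

    import os

    import tensorflow_data_validation as tfdv

    # Runtime parameters injected by the Step Function's container overrides
    run_date = os.environ["RUN_DATE"]
    run_id = os.environ["RUN_ID"]

    # Hypothetical input location derived from the run parameters
    data_location = f"s3://ml-pipeline-bucket/raw/{run_date}/{run_id}/data.csv"

    # Compute statistics, infer a schema, and check for anomalies with TFDV
    stats = tfdv.generate_statistics_from_csv(data_location=data_location)
    schema = tfdv.infer_schema(statistics=stats)
    anomalies = tfdv.validate_statistics(statistics=stats, schema=schema)
    print(anomalies)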

Extending or updating this ECS Fargate component is simply a matter of editing the relevant files and/or updating the CloudFormation template resource configuration, followed by git commit > pull request > code review > merge > CI/CD > production.

Given the modular microservice architecture of this ML training pipeline, any changes to this component will not break any other component in the pipeline and vice-versa.

Additional benefits of ECS Fargate Tasks can be discovered through the CloudFormation deployment process:

  • Granular control over deployments
  • Versioning and rollback
  • Inference accelerators
  • Mixed capacity strategies of on-demand and Spot
  • Native integration with AWS Batch for running concurrent batch jobs at scale

Have you built fully serverless microservice architectures in the cloud? What benefits did you obtain? What capabilities did you give up? Let us know in the comments!

Subscribe to our weekly LinkedIn newsletter: Machine Learning In Production

Reach out if you need help:

  • Maximizing the business value of your data to improve core business KPIs
  • Deploying & monetizing your ML models in production
  • Building Well-Architected production ML software solutions
  • Implementing cloud-native MLOps
  • Training your teams to systematically take models from research to production
  • Identifying new DS/ML opportunities in your company or taking existing projects to the next level
  • Anything else we can help you with

Would you like me to speak at your event? Email me at [email protected]

Check out our blog at Gradient Group: https://gradientgroup.ai/blog/

