Microservice Architecture for Machine Learning Solutions in AWS
Why adopt a microservice strategy when building production machine learning solutions?
Suppose your data science team produced an end-to-end Jupyter notebook, culminating in a trained machine learning model. This model meets performance KPIs in a development environment, and the next logical step is to deploy it in a production environment to maximize its business value.
We all go through this transition as we take machine learning projects from research to production. It is typically a hand-off from a data scientist to a machine learning engineer, although on my team the same properly trained, full-stack ML engineer owns both phases.
The most direct approach is to convert all of the Pandas code into PySpark so that it can handle datasets of any size. But the result can still be a single PySpark script, running on an EMR cluster or serverlessly as a Glue ETL job.
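To make that conversion concrete, here is a minimal sketch using a made-up aggregation; the column names and S3 path are illustrative, not from a real pipeline:

```python
# Hypothetical preprocessing step, first in Pandas, then in PySpark.
import pandas as pd
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

def preprocess_pandas(df: pd.DataFrame) -> pd.DataFrame:
    # Mean feature value per customer; fine until the data outgrows one machine.
    return df.groupby("customer_id", as_index=False)["feature"].mean()

def preprocess_spark(spark: SparkSession, path: str):
    # The same logic, distributed; runs on an EMR cluster or in a Glue ETL job.
    df = spark.read.parquet(path)
    return df.groupBy("customer_id").agg(F.mean("feature").alias("feature"))
```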
Inevitably, a machine learning engineer will git commit a change to the script that introduces a bug into the workflow, leading to failed executions. Testing will not catch every single bug, especially in ML pipelines, where bugs are often subtle and insidious.
Given that it's one large script containing the code for each step in the ML workflow, it may not be immediately clear why the error took place. As the code base grows, it becomes increasingly challenging and time-consuming to debug.
Furthermore, if this solution is live in production, the change breaks the entire workload. We now have a production system down while we identify the root cause and fix it.
To ensure changes to a machine learning pipeline are introduced with minimal or no interruption to the existing workload in production, adopt a microservice architecture instead of a monolithic one.
This approach replaces one large resource with multiple small resources and helps reduce the impact of a single failure on the overall workload.
Service-oriented architecture (SOA) is the practice of making software components reusable using service interfaces. Instead of building a monolithic application, where all functionality is contained in a single runtime, the application is instead broken into separate components.
Microservices extend this by making components that are single-purpose and reusable. When building your architecture, divide components along business boundaries or logical domains. For ML training pipelines, these components may include:

- Data ingestion
- Data validation and preprocessing
- Feature engineering
- Model training
- Model evaluation
- Model registration

(One such component is sketched as a standalone handler right after this list.)
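To make the single-purpose idea concrete, here is a hypothetical model-evaluation component written as a Lambda handler; the event fields, bucket, metric name, and threshold are all invented for illustration:

```python
# Hypothetical model-evaluation microservice as a Lambda handler.
import json

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Each component does exactly one job. Here: read the metrics that the
    # training step wrote to S3 and decide if the model clears the quality bar.
    obj = s3.get_object(Bucket=event["bucket"], Key=event["metrics_key"])
    metrics = json.loads(obj["Body"].read())
    passed = metrics["auc"] >= event.get("min_auc", 0.8)
    # The orchestrator (e.g., Step Functions) branches on this output.
    return {"evaluation_passed": passed, "auc": metrics["auc"]}
```

Because this component only evaluates, a bug here fails one step of the pipeline rather than the whole workload.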
A microservice strategy enables distributed development, improves scalability, and enables easier change management. It also enables modular production deployments per individual component through CI/CD versus all-or-nothing deployments of the entire solution every time code changes (see "Modular Deployments in AWS Cross-Account CI/CD Pipelines" to learn more).
Two popular approaches for developing serverless microservices in AWS using Docker containers are Lambda and Fargate. Lambda functions are meant for lightweight components with small to moderate memory requirements and runtimes under Lambda's 15-minute limit. If your containers need more memory or time, run them as ECS Fargate tasks.
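As a rough sketch of the Fargate path, here is how a heavier component might be launched with boto3; the cluster, task definition, and subnet are placeholders:

```python
# Hypothetical: launch a heavy training component as an ECS Fargate task
# when it exceeds Lambda's memory or 15-minute limits.
import boto3

ecs = boto3.client("ecs")

def run_training_task():
    # All resource names below are placeholders for illustration.
    return ecs.run_task(
        cluster="ml-pipeline-cluster",
        taskDefinition="train-model:1",
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0abc1234"],
                "assignPublicIp": "DISABLED",
            }
        },
    )
```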
If you are dealing with big data, but do not want to build and maintain your own Spark containers, Glue is an excellent alternative. You can mix and match services to produce the suite of serverless microservices for your ML solution, then orchestrate their execution using Step Functions.
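As a rough sketch (not our exact pipeline), here is a minimal Step Functions definition that mixes a Glue job with a Lambda function; every ARN, job name, and role below is a placeholder:

```python
# Minimal Step Functions state machine orchestrating hypothetical components.
import json

import boto3

definition = {
    "StartAt": "Preprocess",
    "States": {
        "Preprocess": {
            "Type": "Task",
            # Glue runs the heavy Spark work; .sync waits for the job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "preprocess-job"},
            "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 2}],
            "Next": "TrainModel",
        },
        "TrainModel": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:train-model",
            # A per-state Catch routes failures to targeted handling.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {"Type": "Fail", "Error": "TrainModelFailed"},
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="ml-training-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",
)
```

The Retry and Catch fields are where per-component error handling lives: each state can retry and route its own failures without taking down the rest of the pipeline.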
Here is a sample architecture showing how my team and I build serverless microservice training pipelines, and how changes flow from git commits to production updates:
This architecture gave us the ability to extend, test, and deploy changes to pipeline components individually, frequently, and in small batches. Specialized error handling per component allows us to catch errors at a small scale, diagnose them quickly, and fix them efficiently.
Inference pipelines - the custom software solutions that utilize the trained models - are also built out of microservices. Our culture is serverless first, and we leverage it whenever possible. Managing our own EC2 instances or compute layer is always a last resort for us, but you can definitely build solutions out of traditional, self-managed microservices if needed.
What is your approach to taking a machine learning project from research to production? Do you adopt a microservices architecture for ML pipelines? Let us know in the comments!
Subscribe to our weekly LinkedIn newsletter: Machine Learning In Production
Reach out if you need help:
Would you like me to speak at your event? Email me at [email protected]
Check out our blog at Gradient Group: https://gradientgroup.ai/blog/