MLOps with Cloud: AWS SageMaker

Introduction

This article covers several key features of cloud-powered ML development workflows on AWS that resolve conventional difficulties such as long training times, integration, and deployment.

AWS SageMaker solves a wide range of these problems, providing a simple, robust solution for the complete ML workflow, from development to deployment.

Below is a high-level view of the ML lifecycle. This article focuses only on the AWS cloud perspective for improving that lifecycle.

[Figure: high-level view of the ML lifecycle]

1. Model Development - Training and Testing

  • Development can be done directly in the cloud with JupyterLab or notebook instances, and code can be maintained in a repository such as GitHub or CodeCommit, both of which SageMaker supports.
  • The AWS SDK can be used to work with the platform's features programmatically; SageMaker also ships a dedicated Python SDK.
  • S3 can store large datasets for both training and testing, and AWS Data Wrangler can automatically surface data insights during the exploratory data analysis phase.
  • Data labelling can be automated with the Ground Truth feature, and AWS QuickSight has a built-in visualizer that helps you understand the data and find correlations, dependencies, and other information.
  • The AWS prebuilt Docker image for the XGBoost runtime is used in this example; however, custom images with pre-installed packages can also be used.

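A minimal sketch of that container-selection step, assuming the `sagemaker` Python package; the bucket, prefix, and XGBoost version (`1.5-1`) below are illustrative placeholders, not real resources:

```python
# Sketch only: bucket/prefix names are placeholders, not real resources.

def training_channels(bucket: str, prefix: str) -> dict:
    """Build the S3 locations SageMaker reads train/validation data from."""
    return {
        "train": f"s3://{bucket}/{prefix}/train/",
        "validation": f"s3://{bucket}/{prefix}/validation/",
    }

def resolve_image(region: str = "us-east-1", version: str = "1.5-1") -> str:
    """Look up the AWS-managed XGBoost image URI (requires the sagemaker package)."""
    from sagemaker import image_uris  # imported lazily so the helper above stays stdlib-only
    return image_uris.retrieve(framework="xgboost", region=region, version=version)

print(training_channels("my-ml-bucket", "churn-demo")["train"])  # s3://my-ml-bucket/churn-demo/train/
```

To use a custom image instead, substitute your own ECR image URI wherever the resolved URI would be passed; nothing else downstream changes.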

  • With a few lines of SDK code, you can quickly spin up compute instances with the required resources for training and testing the ML model.

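A hedged sketch of spinning up a training job through the low-level Boto3 API; the job name, role ARN, bucket, hyperparameters, and instance type are all placeholders:

```python
def training_job_request(job_name: str, image_uri: str, role_arn: str,
                         bucket: str, instance_type: str = "ml.m5.xlarge") -> dict:
    """Assemble a CreateTrainingJob request; all names and ARNs are placeholders."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "HyperParameters": {"objective": "binary:logistic", "num_round": "100"},  # values must be strings
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "ContentType": "text/csv",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/train/",
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        "ResourceConfig": {"InstanceType": instance_type, "InstanceCount": 1, "VolumeSizeInGB": 30},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

def start_training(request: dict, region: str = "us-east-1") -> None:
    """Submit the job; the actual call needs AWS credentials and a valid role."""
    import boto3  # imported lazily so the request builder can be used offline
    boto3.client("sagemaker", region_name=region).create_training_job(**request)
```

SageMaker provisions the instance for the duration of the job and tears it down afterwards, so you pay only for the training time itself.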

2. Architecture / Model Integration with the Application

  • Model integration with the application varies with the complexity, architecture, and use case. Some fairly common approaches are listed below.
  • With API Gateway and Lambda, model inference can be deployed as an independent REST API, exposed to the public or private internet depending on the use case.

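A sketch of the Lambda side of that pattern, assuming a hypothetical endpoint name and an API Gateway proxy integration; the CSV payload format is what the managed XGBoost container accepts:

```python
import json

ENDPOINT_NAME = "churn-xgboost-endpoint"  # placeholder endpoint name

def build_payload(features) -> str:
    """The managed XGBoost container accepts one CSV row per inference record."""
    return ",".join(str(f) for f in features)

def lambda_handler(event, context):
    """API Gateway (proxy integration) -> this Lambda -> SageMaker endpoint."""
    import boto3  # imported lazily; the real invocation needs AWS credentials
    body = json.loads(event["body"])
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=build_payload(body["features"]),
    )
    return {"statusCode": 200, "body": response["Body"].read().decode("utf-8")}
```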

  • The flow can be controlled with SageMaker Pipelines, which lets us configure custom DAG (Directed Acyclic Graph) flows before and after the ML invocations; the result can in turn be exposed as an endpoint.
  • The model can be orchestrated with different workflow tools, such as SageMaker Pipelines, Airflow workflows, Kubernetes orchestration, and AWS Step Functions.
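As one illustration of such orchestration, a Step Functions state machine can run a training job synchronously via the `createTrainingJob.sync` service integration. A sketch with placeholder names, expressed as the Python dict you would serialize to the state-machine JSON:

```python
def training_state_machine(job_name: str, image_uri: str, role_arn: str, bucket: str) -> dict:
    """Step Functions definition (as a dict, ready for json.dumps) that starts a
    SageMaker training job and waits for it via the .sync service integration."""
    return {
        "Comment": "Train a model and wait for the job to finish",
        "StartAt": "TrainModel",
        "States": {
            "TrainModel": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": {
                    "TrainingJobName": job_name,
                    "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
                    "RoleArn": role_arn,
                    "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
                    "ResourceConfig": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 30},
                    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
                },
                "End": True,
            }
        },
    }
```

Further states (evaluation, conditional deployment) can be chained after `TrainModel` to build out the full DAG.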

3. Deployment with CI/CD

  • Model deployment can be done in various ways depending on requirements, for example from SageMaker directly or via CloudFormation.
  • From SageMaker, the model can be deployed as an endpoint given the container image, resource configuration, and trained-job information.

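Deploying a trained model as an endpoint takes three Boto3 calls: create the model, then an endpoint configuration, then the endpoint itself. A sketch with placeholder names and an illustrative instance type:

```python
def endpoint_config(config_name: str, model_name: str,
                    instance_type: str = "ml.m5.large", count: int = 1) -> dict:
    """Build the CreateEndpointConfig request; names are placeholders."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": count,
            "InitialVariantWeight": 1.0,
        }],
    }

def deploy_endpoint(model_name: str, image_uri: str, model_data_url: str,
                    role_arn: str, endpoint_name: str, region: str = "us-east-1") -> None:
    """Model -> endpoint config -> endpoint; the real calls need AWS credentials."""
    import boto3  # imported lazily so endpoint_config stays usable offline
    sm = boto3.client("sagemaker", region_name=region)
    sm.create_model(
        ModelName=model_name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    sm.create_endpoint_config(**endpoint_config(f"{endpoint_name}-cfg", model_name))
    sm.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=f"{endpoint_name}-cfg")
```

`model_data_url` points at the `model.tar.gz` artifact the training job wrote to S3.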

  • For monitoring and analysis, CloudWatch can be used to observe the behavior of the deployed model. These metrics can drive further automation, such as creating an alarm that triggers a scale-up or scale-down.
  • For CI/CD, conventional services such as CodeBuild and CodePipeline can be used for deployment; SageMaker Pipelines can serve this purpose as well.
  • For custom images, there are design conventions engineers need to follow so the container works as a cloud-powered ML endpoint.
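The CloudWatch monitoring mentioned above can be sketched as an alarm on the endpoint's invocation metric; the endpoint name, threshold, and evaluation window here are illustrative:

```python
def invocation_alarm(endpoint_name: str, variant: str = "AllTraffic",
                     threshold: float = 1000.0) -> dict:
    """CloudWatch alarm on the endpoint's per-minute invocation count."""
    return {
        "AlarmName": f"{endpoint_name}-high-invocations",
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocations",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant},
        ],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }

def create_alarm(alarm: dict, region: str = "us-east-1") -> None:
    """Register the alarm; the real call needs AWS credentials."""
    import boto3  # imported lazily so invocation_alarm stays usable offline
    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**alarm)
```

Such an alarm can feed an Application Auto Scaling policy on the endpoint variant, closing the scale-up/scale-down loop.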

Advantages over the traditional approach

  • Reduced latency: models can be improved over time and changes deployed without delay, keeping models up to date and pushing changes much faster than a conventional design.
  • Fast and scalable servers: the ML compute instances are optimized for ML inference, and with SageMaker Neo can deliver up to a 25x performance boost compared to traditional compute instances.
  • Flexibility for custom runtimes: AWS provides a wide range of managed containers with runtimes optimized for common needs; custom containers with the required dependencies and runtimes are also supported.
  • AWS integrations: event handling, notifications, dead-letter queues, scheduling, storage, and other features can be seamlessly integrated with SageMaker.
  • Reuse and automation: models are treated as individual entities within whatever architecture your use case requires, so it is easy to reuse the same models across products. Various capabilities, such as hyperparameter tuning, are available out of the box.
