Demystifying AWS SageMaker: Data Transformation and Injecting ML Models


In this blog, we will walk through the following key topics of AWS SageMaker in a concise manner.

  1. What is AWS SageMaker?
  2. What are the data sources of AWS SageMaker?
  3. What is the role of AWS SageMaker in the DevOps scope?


What is AWS SageMaker?

AWS SageMaker is a fully managed machine learning (ML) service provided by Amazon Web Services (AWS). It empowers data scientists and developers to swiftly and confidently build, train, and deploy ML models in a production-ready hosted environment. Unlike traditional ML workflows that often involve complex setup, configuration, and infrastructure management, SageMaker automates many of these processes, making it easier to create scalable ML solutions.

What are the data sources of AWS SageMaker?

Below are some of the popular data sources from which SageMaker reads data to build, train, and deploy ML models.

<A> Amazon S3 (Simple Storage Service):

  • Ease of Use: High. Amazon S3 is widely adopted and straightforward to work with.
  • Performance Characteristics: Scalable, secure, and offers industry-leading performance.
  • Cost: Cost-effective storage.
  • Limitations: None significant.
  • Use Case Example: Suppose you have historical customer behaviour data stored in an S3 bucket. SageMaker can read this data directly for model training.
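As a sketch of how that S3 data gets wired into a training job: the snippet below builds the `InputDataConfig` channel entry that SageMaker's `CreateTrainingJob` API expects for an S3 prefix. The bucket and prefix names are illustrative assumptions, not real resources.

```python
def s3_training_channel(bucket, prefix, channel_name="train"):
    """Build one InputDataConfig channel pointing a training job at an
    S3 prefix. The bucket and prefix here are placeholders."""
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/{prefix}",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": "text/csv",
    }

channel = s3_training_channel("my-ml-bucket", "customer-behaviour/2024/")
```

A training job definition can list several such channels (e.g., `train` and `validation`), each pointing at a different S3 prefix.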

<B> Amazon FSx for Lustre:

  • Ease of Use: Moderate. Requires linking to an existing S3 bucket.
  • Performance Characteristics: High throughput and scalability.
  • Cost: Associated with FSx for Lustre.
  • Limitations: Requires setting up and maintaining an FSx for Lustre file system.
  • Use Case Example: If you have a large dataset of images, FSx for Lustre provides high-throughput access for training.
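A comparable sketch for FSx for Lustre: instead of an `S3DataSource`, the training channel uses the `FileSystemDataSource` structure from the `CreateTrainingJob` API. The file system ID and directory path below are illustrative placeholders.

```python
def fsx_training_channel(file_system_id, directory_path, channel_name="train"):
    """Channel entry for reading training data from FSx for Lustre.
    The file system ID and directory path are illustrative only."""
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": file_system_id,
                "FileSystemType": "FSxLustre",
                "FileSystemAccessMode": "ro",  # read-only suffices for training input
                "DirectoryPath": directory_path,
            }
        },
    }

channel = fsx_training_channel("fs-0123456789abcdef0", "/fsx/image-dataset")
```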

<C> Amazon Elastic File System (Amazon EFS):

  • Ease of Use: Moderate. General-purpose shared file system.
  • Performance Characteristics: Scalable, serverless, and automatically grows/shrinks.
  • Cost: Multiple price tiers.
  • Limitations: None significant.
  • Use Case Example: Imagine maintaining a shared dataset across multiple SageMaker instances; EFS lets them all mount the same file system.

Other sources are as follows:

  • Local Files: You can also read data from local files on your computer.
  • Amazon Redshift: For structured data, SageMaker can access Redshift clusters.
  • AWS Glue Data Catalog via Amazon Athena: Useful for querying data catalog metadata.
  • Salesforce Data Cloud: If you’re dealing with customer data, such as CRM records.
  • Snowflake: For cloud-based data warehousing.
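As an example of the Athena route above, the snippet below assembles the request body for Athena's `StartQueryExecution` API, which a preprocessing step might run to stage a Glue Data Catalog table into S3 where SageMaker can read it. The database, query, and output location are illustrative assumptions.

```python
def athena_query_request(database, query, output_s3):
    """Request body for Athena's StartQueryExecution API. Query results
    land in the given S3 location, where SageMaker can pick them up."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

req = athena_query_request(
    "sales_db",
    "SELECT * FROM orders WHERE order_date > date '2024-01-01'",
    "s3://my-ml-bucket/athena-results/",
)
# A real pipeline step would then submit it with:
# boto3.client("athena").start_query_execution(**req)
```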

What is the role of AWS SageMaker in the DevOps scope?

In a real-world DevOps scenario, consider the following:

  • CI/CD Integration:

  1. DevOps pipelines can automatically fetch data from S3 or other sources.
  2. SageMaker training jobs can be part of your CI/CD workflows.
  3. Example: A recommendation system continuously trains on user behaviour data, and the updated model is deployed automatically.
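To make that CI/CD step concrete, here is a minimal sketch of the request a pipeline stage could submit with boto3's `create_training_job`. Every name, ARN, URI, and instance setting below is a placeholder assumption.

```python
import time

def training_job_request(job_prefix, image_uri, role_arn, s3_input, s3_output):
    """Assemble a CreateTrainingJob request for a CI/CD stage to submit.
    All identifiers are illustrative placeholders."""
    return {
        "TrainingJobName": f"{job_prefix}-{int(time.time())}",  # unique per run
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_input,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": s3_output},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

req = training_job_request(
    "recsys",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/recsys:latest",
    "arn:aws:iam::123456789012:role/SageMakerRole",
    "s3://my-ml-bucket/train/",
    "s3://my-ml-bucket/models/",
)
# The pipeline stage would then call:
# boto3.client("sagemaker").create_training_job(**req)
```

Timestamping the job name keeps each pipeline run distinct, since SageMaker requires training job names to be unique.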


  • Monitoring and Scaling:

  1. The pipeline monitors data sources for changes (for example, via S3 event notifications).
  2. If new data arrives, the pipeline triggers retraining.
  3. Example: As more user interactions occur, the recommendation model adapts.
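One common way to wire up that trigger is an S3 event notification feeding a small Lambda function that starts a SageMaker pipeline run. The pipeline name below is a hypothetical example; the key-extraction helper is split out so it can be exercised without AWS access.

```python
def new_object_keys(event):
    """Pull uploaded object keys out of an S3 event notification payload."""
    return [r["s3"]["object"]["key"] for r in event.get("Records", [])]

def handler(event, context):
    # Illustrative only: kick off a (hypothetical) SageMaker pipeline named
    # "retrain-recommender" whenever new training data lands in S3.
    import boto3  # imported lazily so the sketch stays importable offline
    keys = new_object_keys(event)
    if keys:
        boto3.client("sagemaker").start_pipeline_execution(
            PipelineName="retrain-recommender"
        )
    return {"new_objects": keys}
```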


  • Feedback Loop:

  1. DevOps teams collect feedback (e.g., user ratings, clicks).
  2. SageMaker retrains the model periodically.
  3. Example: If user preferences shift, the recommendation system adjusts accordingly.


Remember, SageMaker’s flexibility allows seamless integration with various data sources, aligning with the DevOps principles of automation, scalability, and continuous improvement.


