How to design ML/AI architectures [in Azure]


As machine learning gains popularity, companies across industries are using it to solve complex problems. However, how an ML use case is implemented can vary with the project's objectives and priorities, such as scaling or cost optimization. Designing an appropriate ML system architecture for the first time may seem challenging, but developments in cloud platforms have made it more accessible than ever.

The cloud has accelerated the development of enterprise-level ML projects, making them increasingly commoditized. By using components, the self-contained pieces of code that perform specific steps in a machine learning pipeline, you can more conveniently construct independently executable workflows. Composing components in this way lets you work towards your business goals systematically.

In this article, I will share my insights into common practices and patterns for architecting the end-to-end ML project lifecycle in Azure, a platform that integrates all the right tools under one roof behind an overarching interface. You can treat this as a foundation for exploring Azure's capabilities and a first step towards creating sophisticated ML project designs.


Now, let’s do a shallow dive into the concepts and have fun.

The design of data flow in ML projects can be generally divided into several parts:

  1. Data Source
  2. Ingestion & Preparation
  3. ML & Analysis
  4. Storage
  5. Model Consumption

Suppose you, as the leader of a data science team, have been told that acquiring new customers is more costly than retaining existing ones. You are therefore planning to build a customer churn prediction model that identifies customers at high risk of leaving the service.

#1 Data Source

Let's start by determining the relevant data sources and choosing appropriate data storage options in Azure. Here are the main choices:

  • Azure Blob Storage: Useful for storing textual or binary data, such as customer interactions on social media platforms, customer surveys, and customer service emails. This is a reliable object store for managing unstructured data.
  • Azure SQL Database: A relational database that ensures data integrity and enables fast data retrieval. It is fully managed and serverless; in other words, we are free of infrastructure management and maintenance responsibilities. This is suited for storing structured data, such as customer purchase history, subscription details, and customer demographics.
  • Other Azure data sources: Including Azure Table Storage, Azure Cosmos DB, and more.

#2 Ingestion & Preparation

Regarding gathering, integrating, and transforming data, there are two options in Azure: Azure Synapse Analytics and Azure Data Factory (ADF). The key differences between the two options are listed below.

Azure Synapse Analytics

  • Data transformation capabilities: Offers the flexibility to write custom code to handle complex business requirements that may evolve.
  • Support for ML: Supports multiple analytics-friendly programming languages, such as Python, giving access to the ecosystem of ML libraries.
  • Pricing: The cost depends mainly on the volume of data stored.

ADF

  • Data transformation capabilities: Relies primarily on no-code features such as data flows, pipelines, and other components. This makes it convenient and simple to orchestrate the data processing flow.
  • Support for ML: Suited for generic Extract, Transform, Load (ETL) workflows and automating built-in data pipelines.
  • Pricing: The pricing is based on the amount of data transferred.

In our case, if the goal is to integrate customer interaction data from multiple sources, such as Facebook, Instagram, X.com, and more, ADF is your go-to choice. Alternatively, if you focus on data analytics tasks, such as imputing missing data and extracting additional features from customer purchase history and demographics, you will find Azure Synapse Analytics a more comprehensive solution.
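To make the preparation step concrete, here is a minimal local sketch of the kind of imputation and feature extraction described above, using pandas on a tiny hypothetical purchase-history table. The column names and values are invented for illustration; in practice this logic would run in a Synapse notebook or an ADF data flow against the real customer tables.

```python
import pandas as pd

# Hypothetical customer purchase history with gaps, standing in for
# data landed from Azure SQL Database or Blob Storage.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "monthly_spend": [42.0, None, 18.5, None],
    "tenure_months": [12, 3, 25, 7],
})

# Impute missing spend with the median, a simple strategy that scales
# to larger tables without extra configuration.
df["monthly_spend"] = df["monthly_spend"].fillna(df["monthly_spend"].median())

# Extract an additional feature: average spend per month of tenure.
df["spend_per_tenure"] = df["monthly_spend"] / df["tenure_months"]

print(df)
```

The same two operations (median imputation, ratio features) are a common first pass before model training, regardless of which Azure service executes them.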

#3 ML & Analysis

After the integration and transformation of multiple data sources, we can build, train, and track machine learning models in an Azure Machine Learning workspace. Below are the various solutions available for model training:

  • Automated ML: This no-code solution automates the time-consuming and iterative tasks involved in machine learning model development.
  • Azure ML designer: A user-friendly drag-and-drop interface, enabling rapid experimentation. This is usually for building prototypes and minimum viable products (MVPs).
  • ML pipeline: A code-first approach, standardizing the best practices for producing a machine learning model in a production environment.

While the project is in the development phase, it is relatively effortless to apply automated ML to quickly find the best-performing model among various algorithms, including logistic regression and random forest. However, ML pipelines are the better option if we emphasize repeatability, shareability, and maintainability of the workflow.

Afterward, we deploy the trained ML models to a managed endpoint; they are stored and version-controlled in the model registry. During inferencing, the model can score new, unseen customers in regular batches, such as at 21:00 every weekday, to determine their churn probability.
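The train-then-batch-score flow can be sketched locally with scikit-learn. The features and threshold below are invented for illustration; in the real architecture, training would run in the Azure ML workspace and scoring against a managed batch endpoint.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training data standing in for prepared customer features
# (e.g. tenure, monthly spend); label 1 = churned.
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# One of the candidate algorithms automated ML would compare.
model = LogisticRegression().fit(X_train, y_train)

# Nightly batch: score new, unseen customers and flag high churn risk.
X_new = rng.normal(size=(5, 2))
churn_prob = model.predict_proba(X_new)[:, 1]
high_risk = churn_prob > 0.7  # hypothetical risk threshold

print(churn_prob.round(3))
print(high_risk)
```

The scored probabilities and flags are exactly the kind of output the next stage, storage, needs to persist.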

#4 Storage

Once the model scoring results are generated, it is essential to have a reliable storage solution to store and process this data. Three Azure services offer centralized repositories: Azure Data Lake Storage Gen 2 (ADLS Gen2), Azure Synapse Analytics, and Azure SQL Database. We can compare their variations in terms of workload and data security.

ADLS Gen2

  • Workload: Designed specifically for high-performance big data analytics workloads on raw data, staged data, and production-ready data (i.e. the bronze, silver, and gold layers, respectively).
  • Data security: Offers basic security features, such as data encryption and alerting on suspicious security threats.

Azure Synapse Analytics

  • Workload: Handles complex analytics workloads by breaking them down into smaller tasks (decoupling) and processing them simultaneously (parallelizing).
  • Data security: Offers advanced security features, such as Azure Active Directory integration, firewall, and virtual network support.

Azure SQL Database

  • Workload: Optimized for transactional workloads, particularly high-frequency, simple read/write operations on smaller databases.
  • Data security: Requires manual effort for database monitoring.

Based on these characteristics, if you are building an ML model for a small to medium-sized customer base and do not have high data load requirements, Azure SQL Database is a suitable choice. It provides sufficient analytical capabilities while maintaining control over costs. As your business grows, you can consider migrating from Azure SQL Database to ADLS Gen2 or an Azure Synapse dedicated SQL pool: ADLS Gen2 is the more cost-effective option, while Synapse gives more advanced analytical and security capabilities.
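As a sketch of what the storage layer holds, here is a minimal scored-results table. Local SQLite stands in for Azure SQL Database purely for illustration; the table name, columns, and threshold are assumptions, but the shape (one row per customer per scoring run, plus a typical high-risk read query) is what matters.

```python
import sqlite3

# In-memory SQLite standing in for Azure SQL Database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE churn_scores (
        customer_id INTEGER PRIMARY KEY,
        churn_probability REAL NOT NULL,
        scored_at TEXT NOT NULL
    )
""")

# Rows produced by the nightly batch scoring run.
rows = [(1, 0.82, "2024-01-15T21:00:00"),
        (2, 0.12, "2024-01-15T21:00:00")]
conn.executemany("INSERT INTO churn_scores VALUES (?, ?, ?)", rows)
conn.commit()

# Typical read path: fetch customers above a risk threshold.
high_risk = conn.execute(
    "SELECT customer_id FROM churn_scores WHERE churn_probability > 0.7"
).fetchall()
print(high_risk)  # [(1,)]
```

This simple read/write pattern is precisely the transactional workload Azure SQL Database is optimized for.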

#5 Model Consumption

The scored results from the machine learning model are consequently consumed in a front-end tool, e.g. Power BI, or the Web Apps feature of Azure App Service.

  • Power BI: Aggregates and transforms the result data into interactive visualizations, which deliver insights to business users and therefore facilitate their decision-making.
  • Web Apps: Provides a user interface or an API endpoint for users to retrieve the transformed model results.

Utilizing the model results for customer churn prediction, we can, for example, leverage Power BI to highlight customers with a high likelihood of churning across customer segments or time periods. This can help answer business questions like "What are the main reasons for churn?".

Now, suppose you have been successfully running the customer churn prediction model with the above foundational design architecture in Azure for several months. One day, your data science team receives a call from the business users: they wish to explore new enhancements and would like you to assess the technical possibilities together.

#1 Add the streaming data as the new data source

For example, your company sells IoT devices equipped with environmental sensors that capture and stream real-time data on temperature, humidity, and more. This kind of data source can be processed using the combination of Azure Event Hubs and Azure Stream Analytics.

Event Hubs itself does not process or analyze the data; it serves as the entry point for ingesting up to millions of event messages per second. We can then create Stream Analytics jobs that read the data stream for further processing or analysis. Before deploying to production, it is advisable to test the transformation query. Once the streaming data is refined, it can be stored in staging areas such as Azure Synapse Analytics or Azure SQL Database.
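The core of a typical Stream Analytics transformation is a windowed aggregate. Here is a plain-Python sketch of a tumbling-window average over simulated sensor readings, standing in for a Stream Analytics job; the event values and 60-second window are invented for illustration.

```python
from collections import defaultdict

# Simulated sensor events as (epoch_seconds, temperature) pairs,
# standing in for messages read from Azure Event Hubs.
events = [(0, 20.0), (10, 22.0), (65, 30.0), (70, 28.0), (130, 25.0)]

WINDOW = 60  # seconds, analogous to a Stream Analytics tumbling window

def tumbling_average(events, window):
    """Group events into fixed, non-overlapping windows and average each."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[ts // window].append(value)
    return {w * window: sum(v) / len(v) for w, v in sorted(buckets.items())}

print(tumbling_average(events, WINDOW))
# {0: 21.0, 60: 29.0, 120: 25.0}
```

Each output row (window start, average) is the refined record that would land in the staging store downstream.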

#2 Monitor the model results comprehensively

For example, the business users want to ensure that, even after the model has been rolled out for a while, resources are still allocated effectively, focusing on retaining high-value customers and optimizing customer retention efforts. The ML and analytics process is more than just model training and inferencing. We require additional model monitoring tools to guarantee the model results are consistently accurate, unbiased, and interpretable.

  • MLflow Tracking: An open-source, sophisticated framework for managing parameter, metric, and model tracking within data science code runs. Since Azure Machine Learning workspaces are MLflow-compatible, we can use MLflow to systematically manage models without hosting additional server instances.
  • Responsible AI Toolbox: Offers a range of dashboards for model assessment along the dimensions of responsible AI (holistically diagnose model errors), error analysis (identify underperforming data cohorts), interpretability (understand model predictions on both local and global scales), and fairness (understand model behavior across sensitive cohorts).

The combination of MLflow Tracking and Responsible AI Toolbox allows both the data science team and business users to proactively address and mitigate errors. If any obvious degradation in accuracy metrics is observed, we may need to consider retraining the model to maintain optimal performance.
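The degradation check at the end of the paragraph above can be sketched as a small helper. This is a minimal stand-in for alerting built on top of tracked metrics; the tolerance value and weekly cadence are assumptions for illustration.

```python
def needs_retraining(baseline_accuracy, recent_accuracies, tolerance=0.05):
    """Flag retraining when the mean of recent accuracy checks drops
    more than `tolerance` below the accuracy logged at training time."""
    recent_mean = sum(recent_accuracies) / len(recent_accuracies)
    return recent_mean < baseline_accuracy - tolerance

# Model logged 0.86 accuracy at training time; last three weekly checks:
print(needs_retraining(0.86, [0.84, 0.83, 0.85]))  # False, within tolerance
print(needs_retraining(0.86, [0.78, 0.76, 0.79]))  # True, clear degradation
```

Averaging several recent checks, rather than reacting to a single bad batch, keeps the alert from firing on ordinary week-to-week noise.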

#3 Run multiple models in parallel for the large-scale customer base

In scenarios like global subscription-based streaming platforms, using a single ML model for customer churn prediction across different countries can be complex and far from ideal. Instead, they may consider creating separate ML models for each country. We can use Spark in Azure Databricks to facilitate the parallel execution of multiple models. There are several key considerations for the implementation.

  • Data preparation: Partitioning the data is necessary so that each dataset comprises all the data for a specific country. Multiple datasets are generated, one per country.
  • Model training: The pipeline identifies and invokes the appropriate model in parallel for each dataset. This is achieved by looking up model tags that include information such as the data partition key and the model version.
  • Scoring: While our initial discussion assumes batch scoring for the prediction model, real-time interaction with customers in the platform can give personalized responses or offers. To address this, we can do real-time scoring using Azure Kubernetes Services (AKS) so that the models are loaded on demand.
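The partition-then-fan-out pattern from the list above can be illustrated at toy scale with a thread pool. Here a trivial per-country "model" (just the mean churn signal) stands in for real training; on Spark in Azure Databricks the same structure would map each partition to an executor. All country names and values are invented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitioned datasets, one per country (the partition key).
partitions = {
    "US": [0.9, 0.8, 0.85],
    "DE": [0.2, 0.3, 0.25],
    "JP": [0.5, 0.6, 0.55],
}

def train_and_score(item):
    """Stand-in for training one per-country model: here just the mean
    churn signal, returned alongside its partition key."""
    country, values = item
    return country, sum(values) / len(values)

# Run the per-partition work in parallel, one task per country.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(train_and_score, partitions.items()))

print(results)
```

Because each country's data is self-contained after partitioning, the tasks share no state and parallelize cleanly, which is exactly what makes the Spark version scale.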


We have explored various enterprise-level ML architectures in Azure that serve as references for developers and architects, especially those who are seeking ideas about common design patterns.

  • Standard design architecture: Consider the appropriate Azure services and components that can be fit into key stages of “Data Source”, “Ingestion & Preparation”, “ML & Analysis”, “Storage”, and “Model Consumption”.
  • Streaming data requirements: Leverage the combination of Azure Event Hubs and Azure Stream Analytics, to effectively process data streams.
  • Continuous model monitoring: Use MLflow Tracking and Responsible AI Toolbox, to comprehensively monitor and analyze the model runs.
  • Parallel execution of multiple models: Use Spark in Azure Databricks to partition the data and to train and infer multiple models in parallel.


Signing off, Marco

