How to design ML/AI architectures [in Azure]
Marco van Hurne
Architect of AI solutions that improve business efficiency and client engagement.
Download my latest 1000+ page book on Machine Learning
As machine learning gains popularity, companies across industries are using it to solve complex problems. However, how an ML use case is implemented varies with the project's objectives and priorities, such as scaling or cost optimization. Being asked to design an ML system architecture for the first time may seem daunting, but developments in cloud platforms have made it more accessible than ever.
The cloud has accelerated the development of enterprise-level ML projects, making them increasingly commoditized. By using components, self-contained pieces of code that each perform a specific step in a machine learning pipeline, you can conveniently construct independently executable workflows. Composing components in this way lets you work toward your business goals systematically.
In this article, I will share my insights into common practices and patterns for architecting the end-to-end ML project lifecycle in Azure, a brilliant platform that integrates all the right tools under one roof with an overarching interface. You can treat this as a foundation for exploring Azure's capabilities and a first step toward designing sophisticated ML projects.
Now, let’s do a shallow dive into the concepts and have fun.
The design of data flow in ML projects can be generally divided into several parts:
Suppose you, as the leader of a data science team, are told that acquiring new customers is more costly than retaining existing ones. You therefore plan to build a customer churn prediction model that identifies customers at high risk of leaving the service.
#1 Data Source
Let's start by determining the relevant data sources and choosing the appropriate data storage options in Azure.
#2 Ingestion & Preparation
When it comes to gathering, integrating, and transforming data, Azure offers two options: Azure Synapse Analytics and Azure Data Factory (ADF). The key differences between them are listed below.
Azure Synapse Analytics
ADF
In our case, if the goal is to integrate customer interaction data from multiple sources, such as Facebook, Instagram, X.com, and more, ADF is your go-to choice. If, instead, you focus on data analytics tasks, such as imputing missing data and extracting additional features from customer purchase history and demographics, you will find Azure Synapse Analytics a more comprehensive solution.
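To make the transformation step concrete, here is a minimal local sketch of the kind of integrate-and-impute work described above, done with pandas rather than Synapse or ADF. The data sources, column names, and values are all hypothetical:

```python
import pandas as pd

# Hypothetical customer interaction data from one source
web = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "web_visits": [10.0, None, 4.0],  # contains a missing value to impute
})
# Hypothetical purchase history from another source
purchases = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [20.0, 35.0, 15.0, 50.0],
})

# Integrate: aggregate purchase history, then join it to interactions
spend = purchases.groupby("customer_id")["amount"].agg(
    total_spend="sum", purchase_count="count"
).reset_index()
features = web.merge(spend, on="customer_id", how="left")

# Transform: impute missing interaction counts with the median
features["web_visits"] = features["web_visits"].fillna(
    features["web_visits"].median()
)
print(features)
```

The same join-aggregate-impute logic would be expressed as a data flow in ADF or as SQL/Spark in Synapse; prototyping it locally first makes the pipeline easier to validate.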
#3 ML & Analysis
After the integration and transformation of multiple data sources, we can build, train, and track machine learning models in an Azure Machine Learning workspace. Below are the various solutions available for model training:
If the project is in the development phase, it is relatively effortless to apply automated ML to quickly find the best-performing model among various algorithms, including logistic regression and random forest. However, ML pipelines are the better option if we value repeatability, shareability, and maintainability of the workflow.
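Conceptually, what automated ML does is try several candidate algorithms and keep the best one. A hand-rolled sketch of that idea with scikit-learn, on synthetic stand-in data rather than real churn features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for churn features and labels
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Score each candidate with cross-validated ROC AUC and keep the best
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

Azure automated ML runs this loop at scale over many more algorithms and hyperparameters, but the selection principle is the same.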
Afterward, we deploy the trained ML models to a managed endpoint; they are stored and version-controlled in the model registry. During inferencing, the model can score new, unseen customers in regular batches, such as at 21:00 every weekday, to determine their churn probability.
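Batch inferencing itself boils down to loading the registered model and scoring a batch of unseen customers. A minimal local sketch on synthetic data (in Azure ML the model would come from the registry, and the 21:00 trigger would live in the pipeline scheduler, not in this code):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a stand-in "registered" model; in Azure ML this would instead
# be loaded from the model registry.
X_train, y_train = make_classification(n_samples=300, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Nightly batch of new, unseen customers (synthetic here)
X_new, _ = make_classification(n_samples=10, n_features=5, random_state=1)

# Churn probability is the positive-class column of predict_proba
churn_probability = model.predict_proba(X_new)[:, 1]
high_risk = int((churn_probability > 0.5).sum())
print(f"{high_risk} of {len(X_new)} customers flagged as high churn risk")
```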
#4 Storage
Once the model scoring results are generated, it is essential to have a reliable storage solution to store and process this data. Three Azure services offer centralized repositories: Azure Data Lake Storage Gen 2 (ADLS Gen2), Azure Synapse Analytics, and Azure SQL Database. We can compare their variations in terms of workload and data security.
ADLS Gen2
Azure Synapse Analytics
Azure SQL Database
Based on these characteristics, if you are building an ML model for a small to medium-sized customer base and do not have high data load requirements, Azure SQL Database is a suitable choice. It provides sufficient analytical capabilities while maintaining control over costs. As your business grows, you can consider migrating from Azure SQL Database to ADLS Gen2 or an Azure Synapse dedicated SQL pool. ADLS Gen2 is the more cost-effective option, while Synapse offers more advanced analytical and security capabilities.
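The scored results are ultimately just rows in a relational table. The sketch below uses SQLite as a local stand-in for Azure SQL Database; the table name and columns are illustrative, not a prescribed schema:

```python
import sqlite3

# Local stand-in for an Azure SQL Database connection
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE churn_scores (
        customer_id INTEGER PRIMARY KEY,
        churn_probability REAL NOT NULL,
        scored_at TEXT NOT NULL
    )
""")

# Persist one batch of model scoring results
rows = [(1, 0.82, "2024-05-01T21:00"), (2, 0.13, "2024-05-01T21:00")]
conn.executemany("INSERT INTO churn_scores VALUES (?, ?, ?)", rows)

# Downstream consumers can then query the high-risk customers
high_risk = conn.execute(
    "SELECT customer_id FROM churn_scores WHERE churn_probability > 0.5"
).fetchall()
print(high_risk)  # [(1,)]
```

Against Azure SQL Database, only the connection line would change (e.g. to an ODBC connection); the schema and queries carry over.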
#5 Model Consumption
The scored results from the machine learning model are then consumed in a front-end tool, e.g. Power BI or the Web Apps feature of Azure App Service.
Using the model results for customer churn prediction, we can, for example, leverage Power BI to highlight the customers with a high likelihood of churning across customer segments or time periods. This helps answer business questions like "What are the main reasons for churn?"
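Before the results reach a Power BI visual, they are typically aggregated by segment. A small pandas sketch of that shaping step, with made-up segment labels and probabilities:

```python
import pandas as pd

# Hypothetical scored customers with their segments
scores = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "segment": ["premium", "basic", "basic", "premium"],
    "churn_probability": [0.8, 0.2, 0.7, 0.1],
})

# Share of high-risk customers per segment, ready for a BI chart
summary = (
    scores.assign(high_risk=scores["churn_probability"] > 0.5)
    .groupby("segment")["high_risk"]
    .mean()
    .rename("high_risk_share")
)
print(summary)
```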
Now, suppose you have been successfully running the customer churn prediction model on the foundational architecture above for several months. One day, your data science team receives a call from the business users: they wish to explore new enhancements and would like you to assess the technical possibilities together.
#1 Add the streaming data as the new data source
For example, your company sells IoT devices equipped with environmental sensors that capture and stream real-time data on temperature, humidity, and more. This kind of data source can be processed with the combination of Azure Event Hubs and Azure Stream Analytics.
While Event Hubs itself does not transform or analyze data, it serves as the entry point, ingesting up to millions of event messages per second. We can then create Stream Analytics jobs that read the data stream for further processing or analysis. Before deploying to production, it is advisable to test the transformation query. Once the streaming data is refined, it can be stored in staging areas such as Azure Synapse Analytics or Azure SQL Database.
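The kind of transformation a Stream Analytics job performs, such as a tumbling-window average over sensor readings, can be prototyped locally before writing the production query. A plain-Python sketch with invented events:

```python
from collections import defaultdict

# Hypothetical sensor events: (timestamp_seconds, device_id, temperature)
events = [
    (0, "dev1", 20.0), (12, "dev1", 22.0),
    (31, "dev1", 24.0), (45, "dev1", 26.0),
]

WINDOW = 30  # tumbling window size in seconds

# Group readings into non-overlapping 30-second windows per device
windows = defaultdict(list)
for ts, device, temp in events:
    windows[(device, ts // WINDOW)].append(temp)

# Average temperature per window, analogous to
# SELECT AVG(temp) ... GROUP BY TumblingWindow(second, 30) in Stream Analytics
averages = {key: sum(vals) / len(vals) for key, vals in windows.items()}
print(averages)
```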
#2 Monitor the model results comprehensively
For example, the business users want to ensure that, even after the model has been rolled out for a while, resources are still allocated effectively: focusing on retaining high-value customers and optimizing customer retention efforts. The ML and analytics process is more than just model training and inferencing; we need additional model monitoring tools to guarantee the model results stay accurate, unbiased, and interpretable.
The combination of MLflow Tracking and the Responsible AI Toolbox allows both the data science team and business users to proactively address and mitigate errors. If an obvious degradation in accuracy metrics is observed, we may need to retrain the model to maintain optimal performance.
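The retraining trigger itself can be as simple as tracking accuracy across scoring batches and flagging a drop beyond a threshold. A sketch with invented numbers (in practice MLflow Tracking would store these metrics per run):

```python
# Accuracy per weekly scoring batch (invented numbers for illustration)
batch_accuracy = [0.91, 0.90, 0.89, 0.84, 0.79]

BASELINE = batch_accuracy[0]
DEGRADATION_THRESHOLD = 0.05  # retrain if accuracy drops more than 5 points

def needs_retraining(history, baseline, threshold):
    """Flag the model once the latest accuracy falls too far below baseline."""
    return baseline - history[-1] > threshold

flag = needs_retraining(batch_accuracy, BASELINE, DEGRADATION_THRESHOLD)
print("retrain" if flag else "ok")  # 0.91 - 0.79 = 0.12 > 0.05, so "retrain"
```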
#3 Run multiple models in parallel for the large-scale customer base
In scenarios like global subscription-based streaming platforms, using a single ML model for customer churn prediction across different countries can be complex and far from ideal. Instead, consider creating a separate ML model for each country. We can use Spark in Azure Databricks to facilitate the parallel execution of multiple models. There are several key considerations for the implementation.
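At its core, this many-models pattern is group-then-train-in-parallel. The sketch below uses a thread pool as a local stand-in for Databricks parallelism; on Spark each country's data would be a partition handled by a grouped pandas UDF. Country names and features are made up:

```python
from concurrent.futures import ThreadPoolExecutor

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic per-country datasets; on Databricks each group would be
# a Spark partition instead of an in-memory dict entry.
countries = ["US", "DE", "JP"]
datasets = {
    c: make_classification(n_samples=200, n_features=5, random_state=i)
    for i, c in enumerate(countries)
}

def train_country_model(country):
    """Train one churn model on a single country's data."""
    X, y = datasets[country]
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return country, model

# Train one model per country in parallel
with ThreadPoolExecutor(max_workers=3) as pool:
    models = dict(pool.map(train_country_model, countries))

print(sorted(models))  # ['DE', 'JP', 'US']
```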
We have explored various enterprise-level ML architectures in Azure that can serve as references for developers and architects, especially those seeking ideas about common design patterns.
Signing off, Marco