Automate your machine learning models to maximise business benefits
Aisha Ekundayo, PhD
Data Analytics Consulting | AI Consultant | Data Product Management
As a data scientist, your working life revolves around making optimal use of data: deriving value from it to improve decision making, helping the product team understand their end users better, or forecasting different aspects of the business to gain a competitive advantage. All in all, working with data and extracting valuable information from it will always be your top priority.
You want the machine learning models you build to be used repeatedly for decision making by various stakeholders. In other words, to maximise return on investment (ROI), the insights from the data should be available not just once but at all times, and they should become a vital component of business planning and strategy formulation.
With the modern problem being more data but less usable information, it is critical to have a strategy for making the information extracted from data available to end users at all times. In other words, you want the delivery of insights from your company’s data to be automated.
Otherwise, the effort and investment put into exploring the data, building models and making predictions may not realise the intended business benefits. This sounds counter-intuitive, but it is the reality of some data science work, where a PowerPoint presentation or a one-off dashboard in Power BI or Tableau is the only outcome. This can change with technologies that are more accessible and easier to deploy, at a reduced cost and in less time.
After spending a considerable amount of time preparing your data, training the models and making predictions, you need to put all these into a pipeline so that the output from the models can be automated for use in applications and other downstream processes.
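As a simple illustration of what "putting it all into a pipeline" can look like in code, here is a minimal sketch using scikit-learn, where preparation and modelling are chained into a single object that can be re-fitted and re-scored as new data arrives. The column names and file path are hypothetical placeholders, not from any particular project.

```python
# Minimal sketch: chaining data preparation and modelling into one reusable pipeline.
# The CSV path and column names are hypothetical placeholders.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("customer_data.csv")              # new data can be dropped in here
X, y = df.drop(columns=["churned"]), df["churned"]

prep = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()), ("scale", StandardScaler())]),
     ["tenure_months", "monthly_spend"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["segment", "region"]),
])

model = Pipeline([("prep", prep), ("clf", GradientBoostingClassifier())])
model.fit(X, y)                                    # retraining is a single call
predictions = model.predict_proba(X)[:, 1]         # scores for downstream use
```

The point is that once everything is wrapped in one pipeline object, retraining on fresh data and producing new predictions becomes a repeatable operation rather than a manual, step-by-step exercise.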
Deploying and automating your data science pipelines using DevOps practices also enables the pipeline to ingest new data automatically as it arrives and incorporate it into the dataset, so the models pick up new trends and patterns for more accurate predictions.
There are a few options on the market: AWS, Google Cloud, Microsoft Azure or IBM Cloud. In this blog, I will be writing about Microsoft Azure. Cloud computing brings numerous advantages, including flexibility, automated software updates, limited capital outlay, version control, collaboration, scalability and high availability.
Before we look at model operationalisation, I would like to outline the main steps involved in a typical data science project after business understanding and ingestion of the relevant datasets.
These are:
- Data Exploration
- Data Cleaning
- Feature Engineering
- Feature Selection
- Model Training
- Model Testing
- Model Scoring
Each of the components listed above will be part of the automated end-to-end (E2E) pipeline.
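One way to wire these steps into an automated E2E pipeline is with the Azure Machine Learning SDK, where each stage becomes a pipeline step running its own script. The sketch below assumes an existing workspace configuration, a compute cluster named "cpu-cluster" and script files such as clean.py, features.py and train.py; all of these names are placeholders for illustration.

```python
# Sketch of an Azure ML (Python SDK v1) pipeline where each project stage is a step.
# The workspace config, compute target name and script file names are assumptions.
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()                      # reads config.json for the workspace
compute = ws.compute_targets["cpu-cluster"]       # assumed existing compute cluster

cleaned = PipelineData("cleaned_data", datastore=ws.get_default_datastore())
features = PipelineData("features", datastore=ws.get_default_datastore())

clean_step = PythonScriptStep(
    name="data_cleaning", script_name="clean.py",
    arguments=["--output", cleaned], outputs=[cleaned], compute_target=compute)

feature_step = PythonScriptStep(
    name="feature_engineering", script_name="features.py",
    arguments=["--input", cleaned, "--output", features],
    inputs=[cleaned], outputs=[features], compute_target=compute)

train_step = PythonScriptStep(
    name="train_and_score", script_name="train.py",
    arguments=["--input", features], inputs=[features], compute_target=compute)

pipeline = Pipeline(workspace=ws, steps=[clean_step, feature_step, train_step])
pipeline.validate()
# Publishing the pipeline exposes a REST endpoint that can be triggered on a schedule.
```

Keeping each stage as its own step makes it easier to rerun only what changed and to monitor each part of the pipeline independently.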
To start building your pipeline, set up an Azure subscription to gain access to all the resources required. In your subscription, spin up an Azure Databricks workspace, an Azure Machine Learning service workspace or a Data Science Virtual Machine. Write all the code for your data science work in notebooks within the provisioned application.
If you are using Databricks, it has a Spark backend and uses parallelisation and a distributed DataFrame structure to reduce execution time. It is particularly useful if you are working with big data and IoT data. The Azure Machine Learning (AML) service also offers this capability and reduces time to production.
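To give a feel for the Spark backend, below is a small PySpark sketch that reads a large dataset into a distributed DataFrame and aggregates it in parallel. The mount paths and column names are placeholders; in a Databricks notebook the spark session already exists, but it is created explicitly here so the snippet is self-contained.

```python
# Sketch of distributed processing with PySpark; paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("iot-prep").getOrCreate()

readings = spark.read.parquet("/mnt/raw/iot_readings")   # large, partitioned source

daily = (readings
         .withColumn("day", F.to_date("event_time"))
         .groupBy("device_id", "day")
         .agg(F.avg("temperature").alias("avg_temp"),
              F.count("*").alias("n_readings")))

daily.write.mode("overwrite").parquet("/mnt/curated/daily_device_stats")
```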
Next, operationalise your machine learning models using Azure Data Factory. It takes care of data ingestion, preparation and transformation by running scripts written in your preferred programming language (Python, Scala, etc.) in your provisioned application, such as Azure ML or Databricks. Azure Data Factory has a drag-and-drop UI, which makes it user-friendly, and it is the main orchestration centre for model operationalisation. It has three main components: Connections, used to connect to your data sources; Datasets and Pipelines, which provide the ETL functionality; and Triggers, used to schedule activities at pre-defined times.
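The same pipeline and trigger can also be defined in code rather than through the UI. The rough sketch below uses the azure-mgmt-datafactory SDK to register a pipeline that runs a Databricks notebook and schedule it daily; the resource group, factory, linked service and notebook path are placeholders, and exact model arguments can differ between SDK versions, so treat this as an outline rather than a drop-in script.

```python
# Rough sketch (azure-mgmt-datafactory SDK): schedule an ADF pipeline that runs a
# Databricks notebook daily. All names, IDs and paths below are placeholders, and
# exact model arguments may vary between SDK versions.
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    DatabricksNotebookActivity, LinkedServiceReference, PipelineReference,
    PipelineResource, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, TriggerResource)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, factory = "my-rg", "my-adf"

run_notebook = DatabricksNotebookActivity(
    name="score_models",
    notebook_path="/Repos/ml/score",
    linked_service_name=LinkedServiceReference(reference_name="DatabricksLS"))

adf.pipelines.create_or_update(rg, factory, "ml_scoring",
                               PipelineResource(activities=[run_notebook]))

trigger = ScheduleTrigger(
    recurrence=ScheduleTriggerRecurrence(
        frequency="Day", interval=1,
        start_time=datetime(2024, 1, 1, 6, tzinfo=timezone.utc)),
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(reference_name="ml_scoring"))])

adf.triggers.create_or_update(rg, factory, "daily_6am",
                              TriggerResource(properties=trigger))
```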
Lastly, use Application Insights to monitor the performance of your pipeline through telemetry, as it reports all exceptions and events occurring across the pipeline. Continuous Integration/Continuous Delivery (CI/CD) can be set up in Azure DevOps to automatically integrate new features committed to your Git repository, so updates to your code flow through to your models without manual intervention.
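To make the monitoring side concrete, here is a small sketch using the opencensus-ext-azure package to push pipeline events and exceptions into Application Insights. The connection string is a placeholder from your App Insights resource, and run_scoring_step is a hypothetical function standing in for one stage of your pipeline.

```python
# Sketch: sending pipeline telemetry to Application Insights via opencensus-ext-azure.
# The connection string is a placeholder; run_scoring_step is a hypothetical stage.
import logging

from opencensus.ext.azure.log_exporter import AzureLogHandler

logger = logging.getLogger("ml_pipeline")
logger.setLevel(logging.INFO)
logger.addHandler(AzureLogHandler(
    connection_string="InstrumentationKey=00000000-0000-0000-0000-000000000000"))

logger.info("scoring step started", extra={"custom_dimensions": {"rows": 125000}})

try:
    run_scoring_step()                              # hypothetical pipeline stage
except Exception:
    logger.exception("scoring step failed")         # shows up as an exception in App Insights
    raise
```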
To summarise, following these steps will enable you to operationalise your models quickly and efficiently, and to scale the solution with ease. Scaling is straightforward because clusters can be configured to scale up or down automatically depending on the compute required.
Technology is always changing and data science is an iterative process, so an agile mindset and methodology are important for the continuous improvement of your applications or solutions. These applications will continue to evolve and improve, but the intuition and principles will largely remain the same.
To share the outputs from your models, publish a REST API through Azure API Management to serve your predictions, or provision an Azure Cosmos DB where analysts and data scientists can query the results from the models.
Enjoy deploying your machine learning models, and please share any hacks or tips that can help other data scientists trying to automate their ML models.
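As a quick illustration of the consumption side, the sketch below first calls a scoring endpoint published behind Azure API Management and then reads stored predictions from Cosmos DB. The endpoint URL, subscription key, database and container names, and the field names in the results are all placeholders.

```python
# Sketch of two ways to consume model output; URLs, keys and names are placeholders.
import requests
from azure.cosmos import CosmosClient

# 1) Call the scoring REST API exposed through Azure API Management
response = requests.post(
    "https://my-apim.azure-api.net/churn/score",
    headers={"Ocp-Apim-Subscription-Key": "<apim-key>"},
    json={"data": [{"tenure_months": 14, "monthly_spend": 52.3}]})
print(response.json())

# 2) Query stored predictions from Cosmos DB for ad-hoc analysis
cosmos = CosmosClient("https://my-cosmos.documents.azure.com:443/",
                      credential="<account-key>")
container = cosmos.get_database_client("ml").get_container_client("predictions")
latest = container.query_items(
    query="SELECT TOP 10 * FROM c ORDER BY c.scored_at DESC",
    enable_cross_partition_query=True)
for item in latest:
    print(item["customer_id"], item["churn_probability"])
```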