ML Systems for Business: A Step-by-Step Guide

Machine learning has rapidly transformed the business world in recent years, giving companies new ways to improve efficiency, streamline operations, and gain a competitive edge. As a result, more and more organizations want custom machine learning systems tailored to their specific business needs. However, building a machine learning system from scratch can be a complex and intimidating process that requires a solid understanding of both the technical and the business aspects of the project. In this article, we will explore the key considerations and steps involved in building a machine learning system from the ground up, to help organizations leverage the full potential of this technology.

CRISP-DM and OSEMN

CRISP-DM and OSEMN are popular frameworks for organizing and conducting data science projects. Both provide a structured, standardized sequence of steps. Their aim is to help data science teams stay organized and focused, so that high-quality solutions that meet the business's needs are delivered on time.

The CRISP-DM (Cross-Industry Standard Process for Data Mining) framework is a six-step process for data science projects:

CRISP-DM Framework. Image Generated by Author

1. Business Understanding

2. Data Understanding

3. Data Preparation

4. Modeling

5. Evaluation

6. Deployment

CRISP-DM focuses on ensuring that a project is well-structured and that the different stages of the project are completed systematically and iteratively.
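The "iterative" part is easiest to see as a set of allowed moves between phases. The toy encoding below is a sketch, assuming the feedback arrows of the commonly reproduced CRISP-DM diagram: back and forth between Business Understanding and Data Understanding, back and forth between Data Preparation and Modeling, Evaluation back to Business Understanding, and an outer loop from Deployment into a new cycle. The names `Phase`, `TRANSITIONS`, and `is_valid_path` are illustrative only.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Forward path plus the feedback loops assumed from the CRISP-DM diagram
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.DATA_PREPARATION, Phase.BUSINESS_UNDERSTANDING},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.EVALUATION, Phase.DATA_PREPARATION},
    Phase.EVALUATION: {Phase.DEPLOYMENT, Phase.BUSINESS_UNDERSTANDING},
    Phase.DEPLOYMENT: {Phase.BUSINESS_UNDERSTANDING},  # outer loop: start a new cycle
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed transition."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))
```

A project history such as "prepare, model, go back to preparation, model again, evaluate" is a valid path, while jumping straight from business understanding to modeling is not.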

The OSEMN framework, on the other hand, is a five-step process:

1. Obtaining

2. Scrubbing

3. Exploring

4. Modeling

5. Interpreting

OSEMN Framework. Image Generated by Author

OSEMN focuses on the technical aspects of data science, such as cleaning and preparing data, building models, and interpreting results.

One key difference between CRISP-DM and OSEMN is that CRISP-DM strongly emphasizes the business understanding and deployment stages, whereas OSEMN concentrates on the technical side of data science. CRISP-DM is better suited to large and complex data science projects, while OSEMN is better suited to smaller, more focused projects.

While CRISP-DM and OSEMN can provide a solid foundation for a general data science project, businesses should assess whether these frameworks suit their specific needs before adopting them. Often, these frameworks alone are not enough to meet the goals and requirements of the business.

Below is my vision, based on experience, of what a general-purpose data science framework should look like.

ML Project from Scratch

Developing a machine learning system from scratch for business needs requires a systematic approach: understanding the business needs, building a proof of concept, developing the model, testing and fine-tuning it, and finally deploying and supporting it.

Below is a step-by-step guide to how I approach building production ML systems.

Step 1. Building a Proof of Concept (PoC)

Step 1. Building a Proof of Concept (PoC). Image Generated by Author

Step 1a. Understanding the Business Needs

Before starting development, it is vital to understand the business requirements and objectives of the machine learning system. This includes identifying the problem to be solved, the data available, and the expected outcomes. Doing so ensures that the right questions are being addressed and that resources are not wasted.

Step 1b. Understanding Data

Understanding data is a crucial aspect of any data science project. It is the foundation of any data-driven decision-making process and helps uncover meaningful insights and patterns. Data understanding helps identify suitable data sources and ensures that the data collected is relevant and accurate. Moreover, it also helps identify any potential biases or outliers that may affect the analysis results.
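Part of this first look can be automated. The sketch below assumes the data fits in a pandas DataFrame and summarizes, per column, the share of missing values, the number of distinct values, and a count of outliers. The 1.5 × IQR (Tukey fence) rule is one common outlier heuristic, not the only option, and the `customers` table is made up for illustration.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column data-quality summary: dtype, share of missing values,
    number of distinct values, and count of IQR-based outliers."""
    rows = []
    for col in df.columns:
        s = df[col]
        n_outliers = 0
        # Tukey fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
        if pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s):
            q1, q3 = s.quantile([0.25, 0.75])
            iqr = q3 - q1
            outside = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
            n_outliers = int(outside.sum())
        rows.append({
            "column": col,
            "dtype": str(s.dtype),
            "missing_share": float(s.isna().mean()),
            "n_unique": int(s.nunique()),  # NaN is not counted as a value
            "iqr_outliers": n_outliers,
        })
    return pd.DataFrame(rows).set_index("column")


# Hypothetical customer extract with one suspicious spend value and some gaps
customers = pd.DataFrame({
    "monthly_spend": [20.0, 22.0, 19.0, 21.0, 500.0, None],
    "region": ["EU", "EU", "US", None, "US", "EU"],
})
print(profile(customers))
```

A flagged value such as the 500.0 spend is a prompt to ask the data owners whether it is an error or a genuine large customer, not a reason to drop it automatically.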

Step 1c. Exploratory Data Analysis (EDA)

Once the business questions are defined, initial data can be collected, roughly cleaned, and analyzed using various statistical and machine learning techniques to extract meaningful insights and patterns. These insights can be used to inform business decisions, optimize processes, and drive growth. It is also important to validate the findings by comparing them with historical data and industry benchmarks.
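A first EDA pass on a binary business outcome (for example, churn) often starts with a few simple questions: how common is the outcome, which numeric columns move with it, and how does it vary by segment? The sketch below assumes a 0/1 target column in a pandas DataFrame; the `sample` data and column names are hypothetical. Correlations are only a screening signal, and comparing the findings against historical data and industry benchmarks remains a manual step.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame, target: str) -> dict:
    """First-pass EDA for a binary (0/1) target: overall target rate,
    correlation of each numeric column with the target, and the target
    rate per category for every non-numeric column."""
    numeric = df.select_dtypes("number")
    report = {
        "target_rate": float(df[target].mean()),
        "correlations": numeric.corr()[target].drop(target).sort_values(),
    }
    for col in df.select_dtypes(exclude="number").columns:
        report[f"rate_by_{col}"] = df.groupby(col)[target].mean()
    return report


# Hypothetical churn sample
sample = pd.DataFrame({
    "tenure_months": [1, 2, 3, 24, 36, 48],
    "plan": ["basic", "basic", "pro", "pro", "pro", "basic"],
    "churned": [1, 1, 0, 0, 0, 0],
})
eda = quick_eda(sample, "churned")
print(eda["correlations"])
print(eda["rate_by_plan"])
```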

Step 1d. Building a Proof of Concept (PoC)

A PoC validates, on a small scale, that the machine learning solution is feasible and can meet the business requirements. This stage is crucial because it helps assess whether the idea is worth pursuing. Experiments are usually run on a subset of the data, trying different algorithms and sometimes tuning their parameters. Success is measured by an appropriate metric that relates directly to the business requirements, and a successful PoC can serve as the basis for further development. Keep in mind that a PoC is only a preliminary evaluation of the solution and may not reflect the final performance of a fully developed product.
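A PoC experiment of this kind can be as small as a cross-validated comparison of a few algorithms on a data sample, always including a trivial baseline that the candidates must beat. The sketch below uses scikit-learn with synthetic data from `make_classification` as a stand-in for a real sample; ROC AUC is only a placeholder for whatever metric the business requirement maps to, and the candidate list is an arbitrary example.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def compare_candidates(X, y, scoring="roc_auc", cv=5):
    """Mean cross-validated score for a few candidate algorithms,
    including a trivial baseline to beat."""
    candidates = {
        "baseline": DummyClassifier(strategy="most_frequent"),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    return {
        name: cross_val_score(model, X, y, scoring=scoring, cv=cv).mean()
        for name, model in candidates.items()
    }


# Synthetic stand-in for a sample of the real business data
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_sample, y_sample = X[:500], y[:500]  # rows are already shuffled by make_classification
scores = compare_candidates(X_sample, y_sample)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The majority-class baseline always scores 0.5 ROC AUC, so it makes the question "is there any signal at all?" explicit before more time is invested.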

Step 2. Building a Minimum Viable Product (MVP)

Step 2. Building a Minimum Viable Product (MVP). Image Generated by Author

Building a Minimum Viable Product (MVP) is a crucial step in developing any data-driven solution. The MVP is a simplified version of the final product that includes only the essential features needed to solve the core problem. The objective of an MVP is to validate the hypothesis and gather feedback from stakeholders: users, customers, partners, and investors.

An MVP differs from a PoC in that it is a market-ready product, while a PoC tests the feasibility of a solution. A PoC typically focuses on evaluating the technical aspects of the solution, such as the accuracy of the algorithms and processing speed. An MVP, in contrast, focuses on solving a business problem and delivering value to end users. While building the MVP, take into account all foreseeable data and ML issues. This way, it can be scaled as the business grows, reducing the risk of technical debt and the need for significant rework in the future.

Step 2a. Data Preparation

The quality of a machine learning system depends on the quality of its data. Teams such as sales, marketing, and customer support can provide valuable insights into customer behavior and preferences. Collecting data from these teams helps in understanding the target audience, their needs, and how the product or service can be improved to meet those needs. After cleaning the data, which often takes more time than expected, comes the fun part: feature engineering. This is where data scientists use their domain knowledge and creativity to extract meaningful information from the data, transforming it into relevant features that can then be used to build models.

Step 2b. Model Development

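In this step, various machine learning algorithms are tested and compared to determine the most suitable one for the business case. The selected algorithm is then further developed and tuned using validation data (or cross-validation on the training set), while a separate test set is held back for the final evaluation so that the reported performance is not inflated by tuning.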
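The separation between tuning data and test data is easy to get wrong, so it helps to make it explicit in code. The sketch below uses scikit-learn on synthetic data: a stratified test split is set aside first, hyperparameters are searched with `GridSearchCV` using cross-validation inside the training set only, and the test set is scored exactly once at the end. The gradient-boosting model and the small parameter grid are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the prepared feature matrix and target
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out a test set that is never used for model selection or tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    scoring="roc_auc",
    cv=5,  # validation folds come from the training set only
)
search.fit(X_train, y_train)

# Single, final evaluation on untouched data
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_auc, 3))
```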

Step 2c. Model Evaluation

Translating the desired business outcomes into specific metrics that can be measured on the machine learning model is crucial. This keeps the model aligned with the goals and objectives of the project and ensures that it will ultimately deliver value to the business. Whatever metric is chosen, it is necessary to establish a baseline measure of performance, both to track progress and to judge the return on increasing the complexity of the modeling solution.
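As an illustration of translating a business outcome into a measurable metric, consider a churn model where missing a churning customer and offering an unnecessary retention discount have different costs. The sketch below assumes hypothetical costs of 100 per false negative and 10 per false positive, and compares the model against a "do nothing" baseline that never intervenes; the labels and predictions are made-up toy data.

```python
def business_cost(y_true, y_pred, cost_fn=100.0, cost_fp=10.0):
    """Total cost of errors under hypothetical per-error costs:
    a missed churner (false negative) vs. an unneeded offer (false positive)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp


y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
baseline_pred = [0] * len(y_true)  # "do nothing" baseline: never intervene
model_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]

baseline_cost = business_cost(y_true, baseline_pred)  # 3 false negatives
model_cost = business_cost(y_true, model_pred)        # 1 false negative + 1 false positive
savings = baseline_cost - model_cost
print(baseline_cost, model_cost, savings)
```

Expressing results as cost relative to a baseline makes it easier to judge whether a more complex model is worth its extra development and maintenance effort.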

Step 2d. Model Deployment

The deployed model should be able to handle incoming data, make predictions, and, in some cases, return results in real time. The deployment process involves integrating the model into business processes, setting up the infrastructure for hosting the model, and testing the deployment. In operation, pipelines are triggered, results are stored, and notifications and alerts are sent.
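At its core, a deployed model is a persisted artifact plus a scoring function that validates incoming data before predicting. The sketch below persists a toy scikit-learn model with `joblib` and loads it once into a `Scorer` that accepts JSON records; the feature schema, file name, and training data are hypothetical. In a real system, `score()` would typically sit behind a web service or a scheduled batch job, which is not shown here.

```python
import json
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical input schema agreed with the consuming application
FEATURES = ["tenure_months", "monthly_spend"]


def train_and_save(path: Path) -> None:
    """Fit a toy model and persist it, standing in for the real training pipeline."""
    X = np.array([[1, 20.0], [2, 25.0], [30, 80.0], [40, 90.0]])
    y = np.array([1, 1, 0, 0])  # 1 = churned
    joblib.dump(LogisticRegression(max_iter=1000).fit(X, y), str(path))


class Scorer:
    """Loads the persisted model once and scores batches of incoming JSON records."""

    def __init__(self, path: Path):
        self.model = joblib.load(str(path))

    def score(self, payload: str) -> list:
        records = json.loads(payload)
        # Reject malformed input early instead of returning silent garbage
        missing = sorted({f for r in records for f in FEATURES if f not in r})
        if missing:
            raise ValueError(f"missing fields: {missing}")
        X = np.array([[r[f] for f in FEATURES] for r in records], dtype=float)
        return self.model.predict_proba(X)[:, 1].round(3).tolist()


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "churn_model.joblib"
    train_and_save(model_path)
    scorer = Scorer(model_path)  # the model is now held in memory

probabilities = scorer.score('[{"tenure_months": 3, "monthly_spend": 22.0}]')
print(probabilities)
```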

Step 3. Next Steps

Step 3. Next steps. Image Generated by Author

The PoC was a success, and the MVP showed good traction. Now what? Let's concentrate on how we can improve the solution.

Step 3a. Monitoring results

Monitoring the performance of a machine learning model helps determine how effective it is. To make informed decisions, track and evaluate the results regularly. This is where tools such as Power BI, Tableau, or even Python visualization packages come into play, providing a clear, easy-to-understand visual representation of the data. Good monitoring can have a significant impact on the success of the project and on overall business outcomes.
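Dashboards show what happened, but it also helps to compute numbers that can trigger an alert automatically. One common choice, swapped in here as an example rather than prescribed by this framework, is the Population Stability Index (PSI), which measures how far the distribution of a feature or model score in production has moved from a reference sample. The sketch below uses NumPy; the 0.25 alert threshold is a widely quoted rule of thumb, not a standard, and the data is simulated.

```python
import numpy as np


def population_stability_index(reference, recent, n_bins=10, eps=1e-6):
    """PSI between a reference sample (e.g. training data) and a recent
    production sample of the same feature or model score."""
    reference = np.asarray(reference, dtype=float)
    recent = np.asarray(recent, dtype=float)
    # Bin edges are quantiles of the reference distribution
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    inner = edges[1:-1]  # values beyond the reference range land in the end bins
    ref_idx = np.searchsorted(inner, reference, side="right")
    new_idx = np.searchsorted(inner, recent, side="right")
    ref_share = np.bincount(ref_idx, minlength=n_bins) / len(reference)
    new_share = np.bincount(new_idx, minlength=n_bins) / len(recent)
    # Avoid log(0) for empty bins
    ref_share = np.clip(ref_share, eps, None)
    new_share = np.clip(new_share, eps, None)
    return float(np.sum((new_share - ref_share) * np.log(new_share / ref_share)))


def drift_alert(reference, recent, threshold=0.25):
    """Flag a significant distribution shift using a rule-of-thumb PSI threshold."""
    psi = population_stability_index(reference, recent)
    return {"psi": round(psi, 4), "alert": psi > threshold}


# Simulated model scores: a stable week and a week where the mean has shifted
rng = np.random.default_rng(0)
training_scores = rng.normal(0.0, 1.0, 10_000)
stable_week = rng.normal(0.0, 1.0, 10_000)
shifted_week = rng.normal(1.0, 1.0, 10_000)
print(drift_alert(training_scores, stable_week))
print(drift_alert(training_scores, shifted_week))
```

Values like these can be written to the same store the dashboards read from, so a chart and an automated alert are based on the same number.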

Step 3b. Solution performance

The model's performance can usually be improved by adding new data or features, changing the algorithm, fine-tuning the parameters, or in other ML-related ways. The improved model is then retrained and evaluated, and the process is repeated until performance is satisfactory. Once an MVP is deployed, you can start iterating right away using agile frameworks. Machine learning engineers usually prioritize sorting out infrastructure issues and keep the first model simple; a reliable pipeline then makes it possible to test more complex models.
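A simple way to structure this loop is a champion/challenger rule: each retrained model challenges the one in production and is promoted only if it wins by a clear margin on the same validation protocol. The sketch below is illustrative; the `Candidate` type, the 0.005 minimum gain, and the experiment names and scores are all hypothetical. Requiring a minimum gain avoids swapping production models for improvements that are within noise.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float  # validation score on a fixed evaluation protocol, higher is better


def promote_if_better(champion: Candidate, challenger: Candidate,
                      min_gain: float = 0.005) -> Candidate:
    """Replace the production model only if the challenger wins by a clear margin."""
    return challenger if challenger.score - champion.score >= min_gain else champion


def iterate(champion: Candidate, experiments: list, min_gain: float = 0.005):
    """Run a sequence of improvement experiments and log which ones were promoted."""
    history = []
    for challenger in experiments:
        winner = promote_if_better(champion, challenger, min_gain)
        history.append((challenger.name, challenger.score, winner is challenger))
        champion = winner
    return champion, history


# Hypothetical experiment log: names and scores are illustrative only
production = Candidate("v1_logistic_regression", 0.800)
experiments = [
    Candidate("v2_more_data", 0.812),
    Candidate("v3_new_features", 0.814),
    Candidate("v4_gradient_boosting", 0.830),
]
final, log = iterate(production, experiments)
for name, score, promoted in log:
    print(f"{name}: {score:.3f} promoted={promoted}")
```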

Step 3c. Model documentation

Documenting the model and the process of developing it is important for future reference, maintenance, and updates. The documentation should include the data used, the algorithm, the parameters, and the performance of the model.
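Documentation is easier to keep current when part of it is generated alongside the model. The sketch below records the items listed above (data, algorithm, parameters, performance) in a small "model card"-style structure and serializes it to JSON so it can be versioned next to the model artifact. The field names and all example values, including the metric, are hypothetical placeholders rather than real results.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class ModelCard:
    name: str
    version: str
    training_data: str
    algorithm: str
    parameters: dict
    metrics: dict
    owner: str
    trained_on: str = field(default_factory=lambda: date.today().isoformat())
    known_limitations: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the card so it can be stored next to the model artifact."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


card = ModelCard(
    name="churn-classifier",
    version="1.0.0",
    training_data="CRM customer snapshot (hypothetical source)",
    algorithm="GradientBoostingClassifier",
    parameters={"n_estimators": 100, "max_depth": 3},
    metrics={"test_roc_auc": 0.87},  # placeholder value, not a real result
    owner="data-science-team",
    known_limitations=["Not validated for customers with less than 1 month of history"],
)
print(card.to_json())
```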

Step 3d. User support and training

Providing user support and training is crucial for the successful implementation of the machine learning system. This includes training users on how to use the system, answering their questions, and offering technical support when issues arise.

Overview

As you may have noticed, fragments of CRISP-DM and OSEMN were combined to create a generalized framework for taking a data science solution from idea to product. The suggested framework can be further customized to meet an organization's specific needs.

Author's ML Framework. Image Generated by Author

I hope this article has provided valuable insights into how to start an ML project, move from PoC to MVP, or keep improving existing ML solutions.

If you have found this information helpful, or have additional ideas, cases, or materials to share, please let me know in the comments section below.

Ivan Reznikov

PhD, Principal Data Scientist || O'Reilly Book Author || TEDx/PyCon/GITEX Speaker || University Lecturer || LangChain, Large Language Models (LLMs) and Generative AI || 30K+ followers
