Part 2 : Translating a Business Problem into a Data Science Project
Ashay Patil
Microservices|Java11|Python|Spring-Reactive|Spring-Cloud|Mysql8|ElasticSearch|NoSQL|RabbitMQ-Kafka|Spark
Hi guys,
It's been a while since I wrote my last article on ML, so here I am :) to continue the same ML blog series.
In this post, I will share details about the process and phases followed in data science/data mining to translate a business problem into a data science project.
An important principle of data science is that data mining follows a standard process with fairly well-understood stages. In collaboration with business stakeholders, data scientists decompose a business problem into sub-tasks. The solutions to the sub-tasks can then be composed to solve the overall problem. Some of these sub-tasks are unique to the particular business problem, but others are common data science tasks.
Data science projects are iterative in nature, because a data mining project matures over a number of iterations. The standard stages of a data mining project, known collectively as CRISP-DM (Cross-Industry Standard Process for Data Mining), are as follows:
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
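To make the iterative nature of these stages concrete, the stages can be modeled as a small state graph. This is a minimal sketch: the `Stage` enum, the `NEXT` table, and `is_valid_path` are hypothetical names, and the feedback loops (Data Understanding back to Business Understanding, Modeling back to Data Preparation, Evaluation or Deployment back to Business Understanding) follow the commonly drawn CRISP-DM diagram as an assumption.

```python
from enum import Enum

class Stage(Enum):
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_UNDERSTANDING = "Data Understanding"
    DATA_PREPARATION = "Data Preparation"
    MODELING = "Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"

# Allowed moves between stages. Besides the forward path, this includes the
# feedback loops usually drawn in the CRISP-DM diagram (an assumption here).
NEXT = {
    Stage.BUSINESS_UNDERSTANDING: {Stage.DATA_UNDERSTANDING},
    Stage.DATA_UNDERSTANDING: {Stage.BUSINESS_UNDERSTANDING, Stage.DATA_PREPARATION},
    Stage.DATA_PREPARATION: {Stage.MODELING},
    Stage.MODELING: {Stage.DATA_PREPARATION, Stage.EVALUATION},
    Stage.EVALUATION: {Stage.BUSINESS_UNDERSTANDING, Stage.DEPLOYMENT},
    Stage.DEPLOYMENT: {Stage.BUSINESS_UNDERSTANDING},
}

def is_valid_path(path):
    """True if each consecutive pair of stages in `path` is an allowed move."""
    return all(b in NEXT[a] for a, b in zip(path, path[1:]))

S = Stage
# A hypothetical project history: modeling sends us back to re-prepare data,
# then evaluation starts a second iteration from business understanding.
history = [S.BUSINESS_UNDERSTANDING, S.DATA_UNDERSTANDING, S.DATA_PREPARATION,
           S.MODELING, S.DATA_PREPARATION, S.MODELING, S.EVALUATION,
           S.BUSINESS_UNDERSTANDING]
print(is_valid_path(history))                                 # True
print(is_valid_path([S.BUSINESS_UNDERSTANDING, S.MODELING]))  # False
```

The point of the graph is that going "backwards" is a normal, expected move rather than a failure of the project.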
Business Understanding :
We must have a clear idea of the problem we are going to solve. We may not get everything in place on the very first attempt, but after multiple iterations we will eventually get there (which is the reason for the cyclic nature of the process). Hence the data analyst should be creative and try to decompose the business problem into multiple data science problems. High-level knowledge of the fundamentals helps creative business analysts see novel formulations.
Data Understanding :
After business understanding, the data analyst should look for the required set of data points (these data points may be raw). It is important to understand the strengths and limitations of the data, because we rarely get data that exactly matches the problem.
The other important part of the data understanding phase is estimating the costs and benefits of each data source and deciding whether further investment is merited (since extra effort is needed to merge these data sources to solve the business problem).
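One practical way to learn a dataset's strengths and limitations is to profile it before investing in it. The sketch below assumes records arrive as a list of Python dicts (a hypothetical format) and reports, per field, how often the value is missing and how many distinct values it takes. The `profile` function and the sample records are illustrative only.

```python
def profile(records):
    """For each field, report the fraction of records where it is missing and its number of distinct values."""
    fields = sorted({key for row in records for key in row})
    summary = {}
    for field in fields:
        # Treat absent keys, None, and empty strings as missing.
        present = [row[field] for row in records if row.get(field) not in (None, "")]
        summary[field] = {
            "missing_rate": 1 - len(present) / len(records),
            "distinct": len(set(present)),
        }
    return summary

# Hypothetical customer records from one candidate data source.
records = [
    {"customer_id": 1, "age": 34, "city": "Pune"},
    {"customer_id": 2, "age": None, "city": "Mumbai"},
    {"customer_id": 3, "age": 51, "city": ""},
    {"customer_id": 4, "age": 29},
]
for field, stats in profile(records).items():
    print(field, stats)
```

Here `city` is missing in half of the records, which is exactly the kind of limitation that feeds the cost/benefit decision about whether a source is worth merging.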
Data Preparation :
In this phase, we process the collected dataset into a form suitable for modeling. Typical examples of data preparation are converting data to tabular format, removing or inferring missing values, and converting data to different types. Standard techniques exist for transforming categorical data (encoding it as numbers) and numerical data (normalization, scaling) into forms that models can use more effectively.
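The three preparation steps mentioned above (inferring missing values, encoding categories, scaling numbers) can be sketched in plain Python. This is a minimal illustration, not a production pipeline: mean imputation and min-max scaling are just one choice each (median imputation or z-score standardization are common alternatives), and the function names are hypothetical.

```python
def impute_mean(values):
    """Infer missing values (None) by replacing them with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale numbers linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Encode categories as 0/1 indicator columns, one column per distinct category (sorted)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

ages = [20, None, 40, 60]
print(min_max_scale(impute_mean(ages)))  # [0.0, 0.5, 0.5, 1.0]
print(one_hot(["red", "blue", "red"]))   # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
```

In practice, libraries such as scikit-learn provide these steps as `SimpleImputer`, `OneHotEncoder`, and `MinMaxScaler`; the statistics (mean, min, max, categories) should be computed on training data only and then reused on new data.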
Modeling :
The modeling stage is the primary place where data mining techniques are applied to the data. The main classes of data mining modeling tasks are as follows:
- Classification and class probability estimation: This class produces a model that, given a new individual, determines which class that individual belongs to (or the probability that it belongs to each class). E.g., is this transaction fraudulent?
- Regression (“value estimation”): This class produces a model that, given a new individual, predicts the numerical value of some variable for that individual. E.g., what is the price of this house?
- Similarity matching: This class produces a model that attempts to identify similar individuals based on data known about them. Similarity matching can be used directly to find similar entities. E.g., recommending similar products.
- Clustering: This class produces a model that attempts to group individuals in a population together by their similarity, without being driven by any specific target. E.g., grouping customers into segments.
- Co-occurrence grouping: This class produces a model that attempts to find associations between entities based on transactions involving them. E.g., what items are commonly purchased together?
- Link prediction: This class produces a model that attempts to predict connections between data items. E.g., suggesting "people you may know" in social networking systems.
- Data reduction: This class produces a model that attempts to take a large set of data and replace it with a smaller set of data that contains much of the important information in the larger set.
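To make a few of these task classes concrete, here is a minimal standard-library sketch of three of them. The specific techniques are illustrative choices, not the only ones: k-nearest neighbors for classification, cosine similarity for similarity matching, and pair counting for co-occurrence grouping. All function names, feature vectors, and baskets are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def knn_classify(point, labeled, k=3):
    """Classification: majority label among the k labeled examples nearest to `point` (Euclidean distance)."""
    nearest = sorted(labeled, key=lambda item: math.dist(point, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def most_similar(target, catalog):
    """Similarity matching: name of the catalog entry most similar to `target`."""
    return max(catalog, key=lambda name: cosine_similarity(target, catalog[name]))

def co_occurring_pairs(baskets, min_count):
    """Co-occurrence grouping: item pairs that appear together in at least `min_count` baskets."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical examples: features are [amount, is_known_device].
transactions = [([10, 1], "legit"), ([12, 1], "legit"), ([11, 1], "legit"),
                ([900, 0], "fraud"), ([950, 0], "fraud")]
print(knn_classify([920, 0], transactions))  # fraud

catalog = {"laptop": [1, 1, 0], "tablet": [1, 0.8, 0.1], "blender": [0, 0, 1]}
print(most_similar([0.9, 1, 0], catalog))  # laptop

baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["milk", "eggs"], ["bread", "butter"]]
print(co_occurring_pairs(baskets, min_count=2))  # {('bread', 'milk'): 2, ('eggs', 'milk'): 2}
```

Note that k-NN on raw features is sensitive to scale (here the amount dominates the distance), which is one reason the scaling step in data preparation matters.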
Evaluation :
The purpose of the evaluation stage is to assess the model results rigorously and to gain confidence that they are valid and reliable before moving on. If we look hard enough at any dataset we will find patterns, but they may not be aligned with the business problem, and they may not generalize beyond the data at hand. We would like to have confidence that the models and patterns extracted from the data are true regularities and not just chance patterns or anomalies of the particular sample.
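A common way to check that a pattern is a true regularity rather than a sample anomaly is to measure performance on held-out data that the model never saw during training, and to compare it against a trivial baseline. The sketch below uses synthetic toy transactions (not real data), a one-feature decision stump as the model, and a majority-class baseline; all names are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: (transaction amount, label). Synthetic, for illustration only.
AMOUNTS = [20, 35, 600, 640, 45, 950, 15, 60, 700, 25,
           90, 75, 820, 30, 40, 880, 510, 65, 85, 50]
ROWS = [(a, "fraud" if a > 500 else "legit") for a in AMOUNTS]

def holdout_split(rows, every=4):
    """Put every `every`-th row in the test set; assumes rows are already in random order."""
    train = [r for i, r in enumerate(rows) if i % every != every - 1]
    test = [r for i, r in enumerate(rows) if i % every == every - 1]
    return train, test

def accuracy(predict, rows):
    """Fraction of rows whose label `predict` gets right."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def fit_majority_baseline(train):
    """Baseline: always predict the most common training label."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: majority

def fit_threshold(train):
    """One-feature decision stump: pick the amount threshold that maximizes training accuracy."""
    candidates = sorted({x for x, _ in train})
    def train_hits(t):
        return sum(("fraud" if x > t else "legit") == y for x, y in train)
    best = max(candidates, key=train_hits)
    return lambda x: "fraud" if x > best else "legit"

train, test = holdout_split(ROWS)
model = fit_threshold(train)
baseline = fit_majority_baseline(train)
print("model accuracy on held-out data:", accuracy(model, test))        # 1.0
print("baseline accuracy on held-out data:", accuracy(baseline, test))  # 0.6
```

The toy data is perfectly separable, so the stump wins easily. With real data, shuffle the rows before splitting (for example with `random.Random(seed).shuffle`) and consider cross-validation, since a single small test set gives a noisy estimate.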
The evaluation stage also serves to help ensure that the model satisfies the original business goals. Recall that the primary goal of data science for business is to support decision making, and that we started the process by focusing on the business problem we would like to solve.
Deployment :
In deployment, the results of data mining (and, increasingly, the data mining techniques themselves) are put into real use in order to realize a return on investment (ROI).
Deploying a model into a production system typically requires that the model be re-coded for the production environment for greater speed or compatibility with an existing system. This may incur substantial expense and investment.
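One common way to "re-code" a model is to separate training from scoring: the data science environment exports the learned parameters in a language-neutral format, and the production system re-implements only the (usually simple) scoring logic. The sketch assumes a logistic-regression-style linear model; the JSON layout, feature names, and weight values are hypothetical. Standard interchange formats such as PMML and ONNX exist for the same purpose.

```python
import json
import math

def export_model(weights, bias, feature_names):
    """Serialize a trained linear model's parameters to a language-neutral JSON document."""
    return json.dumps({"type": "logistic_regression", "features": feature_names,
                       "weights": weights, "bias": bias})

def score(model_json, features):
    """Re-implemented scoring: logistic function of the weighted sum of the named features."""
    model = json.loads(model_json)
    z = model["bias"] + sum(w * features[name]
                            for name, w in zip(model["features"], model["weights"]))
    return 1 / (1 + math.exp(-z))

# Hypothetical parameters, as if produced by a training job.
model_json = export_model([2.0, -1.0], 0.0, ["amount_scaled", "account_age_scaled"])
print(score(model_json, {"amount_scaled": 0.5, "account_age_scaled": 1.0}))  # 0.5
```

Because `score` needs only arithmetic and a JSON parser, the same logic can be ported to, say, a Java service, and the two implementations can be checked against each other on the same inputs before go-live.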
Regardless of whether deployment is successful, the process often returns to the Business Understanding phase. The process of mining data produces a great deal of insight into the business problem and the difficulties of its solution. A second iteration can yield an improved solution.
I hope you liked the content above. In upcoming posts, I will illustrate how to develop solutions around each of these modeling techniques.
Happy Learning!!