A Beginner’s Guide to Industry Standard Process of Data Mining: CRISP-DM
Lekha Priyadarshini Bhan
Generative AI Expert | WIDS Speaker | GHCI Speaker | Data Science specialist | Engineering Management
Data mining is the process of discovering hidden, valuable knowledge by analyzing large amounts of data. Because that data is typically stored across different databases, there is a need for a standard data mining process.
CRISP-DM (Cross-Industry Standard Process for Data Mining) is a robust, well-proven methodology that provides a structured approach to solving virtually any analytics problem in any industry.
It provides anyone, from novices to data mining experts, with a complete blueprint for conducting a data mining project.
Process of CRISP-DM framework:
CRISP-DM breaks down the life cycle of a data mining project into six phases.
1) Business Understanding
“I never failed once. It just happened to be a 2000-step process.”
The first stage of the framework is to develop a business understanding. This involves the following two steps:
A) Determine the business objective
For a data analyst, understanding the business and its specific problems is of utmost importance. You need to understand the problem clearly in order to convert it into a well-defined analytics problem. Only then can you lay out a sound strategy to solve it.
B) Identify the goal of the data analysis
The current situation must be assessed, and from these insights the goals of the analysis must be defined. A plan for how to proceed should then be set up.
2) Data Understanding
“Data! Data! Data! I can’t make bricks without clay!”
The Data Understanding phase of the CRISP-DM framework focuses on collecting, describing and exploring the data.
This stage comprises four key steps to understand the available data and identify new relevant data in order to solve the business problem:
- Collect relevant data: You need to identify and collect the right set of data sets that can be used for analysis.
- Describe data for explicit information: Once you have identified the data set, describe its contents and explore it to better understand the data and its business implications.
- Explore data by plotting graphs: A critical part of data understanding is exploring the data through charts. Plots can reveal the following types of insight: a) outlier values, b) trends in variables (increasing, decreasing, etc.), and c) correlations between variables.
- Verify data quality to remove errors: Once you have understood the data structure, examine the quality of the data and address any problems you find, such as errors or missing values.
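The describe, explore and verify steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the `ad_spend` and `sales` columns are invented sample data, the helper names are hypothetical, and the numeric checks (a 1.5 × IQR rule for outliers, a Pearson correlation) stand in for what you would normally also inspect visually in a chart.

```python
from statistics import mean, median, quantiles

# Hypothetical sample data, invented for illustration: one missing value
# (None) and one suspicious spike in the sales column.
ad_spend = [10, 12, 11, 13, 12, 14, 13, 15]
sales = [100, 118, 109, 131, 120, None, 128, 900]

def describe(values):
    """Describe a column, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
    }

def iqr_outliers(values):
    """Flag values more than 1.5 * IQR beyond the quartiles (a rule of thumb)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in present if v < low or v > high]

def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x, _ in pairs)
    var_y = sum((y - my) ** 2 for _, y in pairs)
    return cov / (var_x * var_y) ** 0.5

summary = describe(sales)                 # describe the data
outliers = iqr_outliers(sales)            # explore: spot outlier values
correlation = pearson(ad_spend, sales)    # explore: relation between variables

print(summary)
print("outliers:", outliers)
print("correlation:", round(correlation, 2))
```

The `missing` count in the summary is a simple data quality check, and the flagged outlier is the kind of value you would investigate before moving on to data preparation.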
3) Data Preparation
“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.”
Data is usually spread across different files. Collating those files and selecting the required rows and columns based on the business understanding is a major step in data preparation. After collating the data set, we address missing values and outliers. Data preparation is often considered the most crucial phase because the model will be built on the data sets created here.
Data preparation tasks are likely to be performed multiple times and not in any prescribed order. Tasks include table, record and attribute selection as well as transformation and cleaning of data for modeling tools.
It consists of the following steps:
- Select relevant data
- Integrate Data
- Clean data
- Construct Data: Derive new features
- Format Data
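The five steps above can be walked through on a toy example. The sketch below uses plain Python dictionaries; the `customers` and `orders` tables, their column names, and the choice of median imputation are all assumptions made for illustration, not part of CRISP-DM itself.

```python
from statistics import median

# Hypothetical tables, as if loaded from two separate files.
customers = [
    {"customer_id": 1, "name": "Asha", "region": "north", "age": 34},
    {"customer_id": 2, "name": "Ben", "region": "South", "age": None},
    {"customer_id": 3, "name": "Chen", "region": "north", "age": 51},
]
orders = [
    {"customer_id": 1, "amount": "120.50"},
    {"customer_id": 1, "amount": "80.00"},
    {"customer_id": 2, "amount": "42.25"},
]

# 1. Select relevant data: keep only the columns the analysis needs.
keep = ("customer_id", "region", "age")
selected = [{k: c[k] for k in keep} for c in customers]

# 2. Integrate data: attach each customer's total order amount.
totals = {}
for order in orders:
    cid = order["customer_id"]
    totals[cid] = totals.get(cid, 0.0) + float(order["amount"])
integrated = [{**c, "total_spend": totals.get(c["customer_id"], 0.0)} for c in selected]

# 3. Clean data: fill missing ages with the median of the known ages.
known_ages = [c["age"] for c in integrated if c["age"] is not None]
fill_age = median(known_ages)
cleaned = [{**c, "age": fill_age if c["age"] is None else c["age"]} for c in integrated]

# 4. Construct data: derive a new feature from existing columns.
constructed = [{**c, "is_buyer": c["total_spend"] > 0} for c in cleaned]

# 5. Format data: normalise text so categories compare consistently.
prepared = [{**c, "region": c["region"].lower()} for c in constructed]

for row in prepared:
    print(row)
```

In practice these steps are revisited several times, so keeping each one as a separate, re-runnable transformation makes it easier to go back and adjust a single step.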
4) Data Modelling
Modelling is the heart of data analytics.
It is performed in the following manner:
- The first step is to select a modelling technique.
- Next, a test design is generated to validate the model’s quality.
- After that, a few more models are built.
- Finally, all the models are assessed to make sure that they fall in line with the business objectives.
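As a rough illustration of these four steps, the sketch below picks a technique (simple least-squares regression), sets up a test design (holding out the last rows), builds a second model for comparison (a mean baseline), and assesses both on the held-out rows. The data and function names are hypothetical, and mean absolute error is just one possible assessment measure.

```python
from statistics import mean

# Hypothetical (ad_spend, sales) pairs, invented for illustration.
data = [(1, 12), (2, 14), (3, 17), (4, 19), (5, 22), (6, 24), (7, 27), (8, 29)]

# Test design: train on the first six rows, validate on the last two.
train, test = data[:6], data[6:]

def fit_mean_baseline(rows):
    """Baseline model: always predict the average target seen in training."""
    avg = mean(y for _, y in rows)
    return lambda x: avg

def fit_linear(rows):
    """Simple linear regression fitted by ordinary least squares."""
    mx = mean(x for x, _ in rows)
    my = mean(y for _, y in rows)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x, _ in rows)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mean_absolute_error(model, rows):
    """Average absolute gap between predictions and actual values."""
    return mean(abs(model(x) - y) for x, y in rows)

# Build more than one model, then assess them all on the same test rows.
models = {"baseline": fit_mean_baseline(train), "linear": fit_linear(train)}
scores = {name: mean_absolute_error(model, test) for name, model in models.items()}
best = min(scores, key=scores.get)

print(scores)
print("best model:", best)
```

Comparing against a trivial baseline is a cheap way to confirm that a more elaborate model is actually adding value before it is judged against the business objectives.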
5) Model Evaluation
“True genius resides in the capacity for evaluation of uncertain, hazardous and conflicting information.”
This is the fifth stage of the framework: model evaluation.
The predictive models are tested to assess their effectiveness in solving the business problem. Modelling and evaluation together form an iterative process in which the models are tweaked until satisfactory evaluation results are obtained.
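The tweak-and-evaluate loop can be sketched as follows. Everything here is assumed for illustration: the validation rows, the 0.85 accuracy target standing in for a business requirement, and the idea that the only thing being tweaked is a decision threshold.

```python
# Hypothetical validation set: (predicted_risk_score, actually_churned).
validation = [
    (0.10, False), (0.25, False), (0.40, False), (0.55, True),
    (0.60, False), (0.70, True), (0.85, True), (0.95, True),
]

TARGET_ACCURACY = 0.85  # assumed stand-in for a business requirement

def accuracy(threshold, rows):
    """Share of rows where 'score >= threshold' matches the true label."""
    correct = sum((score >= threshold) == label for score, label in rows)
    return correct / len(rows)

history = []   # record of every setting tried and its result
chosen = None  # stays None if no setting meets the target
for threshold in [0.3, 0.4, 0.5, 0.6, 0.7]:
    acc = accuracy(threshold, validation)
    history.append((threshold, acc))
    if acc >= TARGET_ACCURACY:
        chosen = threshold
        break  # satisfactory result reached, stop tweaking

print(history)
print("chosen threshold:", chosen)
```

Because the setting is tuned on the validation rows, a real project would normally confirm the final choice on separate data that played no part in the tuning.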
6) Model Deployment
“The goal is to turn data into information, and information into insight.”
This is the last stage of the framework, where the model is translated into a business strategy. Business data is fed into the model and the model results are used to inform business decisions on an on-going basis.
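A minimal sketch of what this can look like in code is shown below. The `churn_model` function is a hand-written stand-in for a trained model, and the field names, the 0.6 threshold and the action labels are all hypothetical; the point is only the flow from new business data, to model output, to a decision.

```python
def churn_model(customer):
    """Stand-in for a trained model: returns a risk score between 0 and 1."""
    score = 0.2
    if customer["months_inactive"] >= 3:
        score += 0.5
    if customer["support_tickets"] >= 2:
        score += 0.2
    return min(score, 1.0)

def decide(customer, threshold=0.6):
    """Translate the model's score into a business action."""
    if churn_model(customer) >= threshold:
        return "offer_retention_discount"
    return "no_action"

# New business data arriving after the model has been deployed.
new_customers = [
    {"id": "A", "months_inactive": 4, "support_tickets": 0},
    {"id": "B", "months_inactive": 0, "support_tickets": 3},
    {"id": "C", "months_inactive": 5, "support_tickets": 2},
]

actions = {c["id"]: decide(c) for c in new_customers}
print(actions)
```

Keeping the scoring step and the decision rule separate means the business can adjust the threshold or the action without retraining the model.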
Advantages of the CRISP-DM framework:
The CRISP-DM framework offers the following advantages:
- It provides a uniform framework for guidelines and experience documentation.
- It is flexible enough to account for different business/agency problems and different data.