Data Scientist Methodologies for Structured Data

Bushra Al Sulayyim

Data Scientist | Cybersecurity Consultant & Researcher

Published: May 27, 2024

Just as scientists have the scientific method, data scientists need a foundational methodology to guide their problem-solving efforts. A methodology is essentially a strategic roadmap that guides the activities within a process to obtain answers or results.

Here are three classic and widely adopted data science methodologies best suited for structured data:

Cross-Industry Standard Process for Data Mining (CRISP-DM)
Knowledge Discovery in Databases (KDD)
Sample, Explore, Modify, Model, Assess (SEMMA)

These methodologies are the cornerstones of data mining and share the following characteristics:

They employ data mining methods.
They are best suited for structured data, that is, data that fits neatly into tables and uses discrete data types such as numbers, short text, and dates (think of well-organized spreadsheets).
They are useful for both descriptive and predictive analytics (akin to predicting the weather for your next beach day).
They involve common activities such as data gathering, data transformation, data modeling, and model evaluation (turning raw data into actionable insights).
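
To make "structured data" concrete, here is a minimal Python sketch. The orders table, its column names, and the `Order` and `load_orders` names are all made up for illustration; the point is only that every row shares the same typed columns (a number, short text, a date), which is what makes simple aggregation straightforward.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date

# Hypothetical table of orders: every row has the same typed columns.
RAW_CSV = """order_id,customer,amount,order_date
1,Ana,19.99,2024-05-01
2,Ben,5.50,2024-05-03
3,Ana,42.00,2024-05-07
"""


@dataclass
class Order:
    order_id: int      # number
    customer: str      # short text
    amount: float      # number
    order_date: date   # date


def load_orders(text: str) -> list[Order]:
    """Parse CSV text into typed rows."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Order(
            order_id=int(row["order_id"]),
            customer=row["customer"],
            amount=float(row["amount"]),
            order_date=date.fromisoformat(row["order_date"]),
        )
        for row in reader
    ]


orders = load_orders(RAW_CSV)

# Because the columns are typed and regular, a descriptive summary is a short loop.
total_by_customer: dict[str, float] = {}
for order in orders:
    total_by_customer[order.customer] = (
        total_by_customer.get(order.customer, 0.0) + order.amount
    )
```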

It’s important to note that these methodologies are not ideal for projects involving unstructured data, such as audio and video files and large text documents.

So, put on your thinking cap (and perhaps enjoy a flat white), and let’s delve deeper into these widely adopted data science methodologies and explore how they can transform you into a data wizard.

1. Cross-Industry Standard Process for Data Mining (CRISP-DM)

CRISP-DM was developed under the European Strategic Program on Research in Information Technology (ESPRIT) initiative. As the name suggests, it can be embraced by any industry looking to structure its data science projects. That makes CRISP-DM a top contender among data science methodologies, resonating with practitioners across various industries.

CRISP-DM has six pivotal phases:

Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment

What sets CRISP-DM apart is its initial focus on Business Understanding, where the spotlight shines on grasping project objectives and requirements from a business perspective, laying the groundwork for solving the data problem at hand.

Now, don’t let the sequence of these phases fool you. CRISP-DM is all about flexibility and iteration. These phases can be revisited and refined to continuously enhance outcomes. Sometimes, you might find yourself bouncing back to earlier stages based on insights gained along the way.
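
The back-and-forth between phases can be sketched in a few lines of Python. This is only an illustration, not part of the CRISP-DM standard: the `run_crisp_dm` function and its `evaluation_passes_on_attempt` parameter are hypothetical stand-ins for a real quality check, and returning to Data Preparation is just one of several possible jumps back.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def run_crisp_dm(evaluation_passes_on_attempt: int, max_steps: int = 50) -> list[Phase]:
    """Walk the six phases in order; a failed evaluation sends us back to Data Preparation.

    `evaluation_passes_on_attempt` is a hypothetical stand-in for a real
    evaluation: the model is accepted on that attempt number.
    """
    order = list(Phase)
    history: list[Phase] = []
    index = 0
    attempts = 0
    while index < len(order) and len(history) < max_steps:
        phase = order[index]
        history.append(phase)
        if phase is Phase.EVALUATION:
            attempts += 1
            if attempts < evaluation_passes_on_attempt:
                # Not good enough yet: revisit an earlier phase.
                index = order.index(Phase.DATA_PREPARATION)
                continue
        index += 1
    return history


history = run_crisp_dm(evaluation_passes_on_attempt=2)
```

With `evaluation_passes_on_attempt=2`, the recorded history passes through Data Preparation, Modeling, and Evaluation twice before reaching Deployment.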

2. Knowledge Discovery in Databases (KDD)

Ever felt like you're digging through a mountain of data, trying to find those golden nuggets of insight? That’s where Knowledge Discovery in Databases (KDD) steps in.

The KDD journey typically unfolds across five key steps:

Selection
Preprocessing
Transformation
Data Mining
Interpretation/Evaluation
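
The five steps above can be sketched as a chain of small Python functions. Everything here is an assumption made for illustration: the customer records, the field names, the min-max scaling, and the "mean spend per region" pattern are toy choices, and a real data mining step would use a proper algorithm.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"customer": "Ana", "region": "north", "spend": 120.0, "notes": "vip"},
    {"customer": "Ben", "region": "south", "spend": None, "notes": ""},
    {"customer": "Cy", "region": "north", "spend": 80.0, "notes": ""},
    {"customer": "Di", "region": "south", "spend": 40.0, "notes": ""},
]


def select(rows):
    """Selection: keep only the fields relevant to the question."""
    return [{"region": r["region"], "spend": r["spend"]} for r in rows]


def preprocess(rows):
    """Preprocessing: drop rows with a missing spend value."""
    return [r for r in rows if r["spend"] is not None]


def transform(rows):
    """Transformation: rescale spend to the 0..1 range (min-max scaling)."""
    values = [r["spend"] for r in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero when all values match
    return [{**r, "spend_scaled": (r["spend"] - low) / span} for r in rows]


def mine(rows):
    """Data mining: a deliberately simple pattern, mean scaled spend per region."""
    by_region = {}
    for r in rows:
        by_region.setdefault(r["region"], []).append(r["spend_scaled"])
    return {region: mean(vals) for region, vals in by_region.items()}


def interpret(patterns, threshold=0.5):
    """Interpretation/Evaluation: report regions whose mean exceeds a threshold."""
    return sorted(region for region, score in patterns.items() if score > threshold)


patterns = mine(transform(preprocess(select(RAW))))
high_spend_regions = interpret(patterns)
```

Keeping each step as its own function makes it easy to rerun the chain when new records arrive.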

By following these steps, businesses can stay current with customer needs and behaviors and anticipate future trends to keep their competitive edge sharp. KDD is iterative: new data can be integrated and transformed as it arrives, leading to fresh insights and more tailored results. This continuous cycle of knowledge acquisition fuels the effectiveness of the KDD process.

However, KDD has its limitations. It might not address the complexities of modern data science projects, such as setting up big data architecture, ethical considerations, or defining roles within a data science team.

3. Sample, Explore, Modify, Model, Assess (SEMMA)

Developed by the SAS Institute, SEMMA is all about mining data effectively, especially focusing on modeling tasks. It's handy for tackling various business challenges like spotting fraud, retaining customers, targeted marketing, boosting loyalty, segmenting markets, and analyzing risks.

SEMMA stands for its five steps:

Sample
Explore
Modify
Model
Assess

SEMMA is iterative. Solving one problem often leads to more questions, uncovering deeper insights along the way.
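
As a rough illustration of the five SEMMA steps, here is a toy fraud-spotting sketch in plain Python. It does not use SAS tooling, and the transaction amounts, the stratified split, the log transform, and the one-feature threshold "model" are all assumptions chosen to keep the example short.

```python
import math
import random
from statistics import mean

# Hypothetical labelled transactions: (amount, is_fraud).
TRANSACTIONS = [(a, False) for a in (10, 15, 22, 30, 35, 41, 55, 60)] + [
    (a, True) for a in (900, 1100, 1300, 1500)
]


def sample(rows, train_fraction=0.75, seed=0):
    """Sample: stratified train/holdout split so both classes appear in each part."""
    rng = random.Random(seed)
    train, holdout = [], []
    for label in (False, True):
        group = [r for r in rows if r[1] is label]
        rng.shuffle(group)
        cut = int(len(group) * train_fraction)
        train += group[:cut]
        holdout += group[cut:]
    return train, holdout


def explore(rows):
    """Explore: mean of the feature for each class."""
    return {label: mean(x for x, y in rows if y is label) for label in (False, True)}


def modify(rows):
    """Modify: log-transform the skewed amount column."""
    return [(math.log10(a), y) for a, y in rows]


def model(rows):
    """Model: a one-feature threshold halfway between the two class means."""
    means = explore(rows)
    return (means[False] + means[True]) / 2


def assess(threshold, rows):
    """Assess: accuracy of the threshold rule on held-out rows."""
    correct = sum((x > threshold) == y for x, y in rows)
    return correct / len(rows)


train, holdout = sample(TRANSACTIONS)
threshold = model(modify(train))
accuracy = assess(threshold, modify(holdout))
```

If the holdout accuracy were poor, the natural next move would be to go back to Explore or Modify and try a different feature or transformation.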
