Data Scientist Methodologies for Structured Data:


Just as scientists have the scientific method, data scientists need a foundational methodology to guide their problem-solving efforts. A methodology is essentially a strategic roadmap that guides the activities within a process to obtain answers or results.

Here are three classic and widely adopted data science methodologies best suited for structured data:

  1. Cross-Industry Standard Process for Data Mining (CRISP-DM)
  2. Knowledge Discovery in Databases (KDD)
  3. Sample, Explore, Modify, Model, Assess (SEMMA)


These methodologies are the cornerstones of data mining and share the following characteristics:

  • They employ data mining methods.
  • They are best suited for structured data, that is, data that fits neatly into tables and uses discrete data types such as numbers, short text, and dates (think of well-organized spreadsheets).
  • They are useful for both descriptive analytics (summarizing what has already happened) and predictive analytics (akin to predicting the weather for your next beach day).
  • They involve common activities such as data gathering, data transformation, data modeling, and model evaluation (turning raw data into actionable insights).
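
To make the difference between descriptive and predictive analytics on structured data concrete, here is a minimal Python sketch. The Sale record, its fields, and the numbers are all hypothetical, invented for illustration, and the "prediction" is only a naive extrapolation, not a recommended forecasting method.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Sale:
    """One row of a structured table: every column has a fixed, discrete type."""
    day: date
    region: str
    units: int


# Hypothetical records, made up purely for this example.
sales = [
    Sale(date(2024, 1, 1), "North", 10),
    Sale(date(2024, 1, 2), "North", 12),
    Sale(date(2024, 1, 3), "North", 14),
    Sale(date(2024, 1, 4), "North", 16),
]


def describe(rows):
    """Descriptive analytics: summarize what has already happened."""
    units = [r.units for r in rows]
    return {"rows": len(units), "mean_units": mean(units), "max_units": max(units)}


def predict_next(rows):
    """Predictive analytics (naive): extend the average day-to-day change."""
    units = [r.units for r in rows]
    avg_change = mean(b - a for a, b in zip(units, units[1:]))
    return units[-1] + avg_change


summary = describe(sales)
forecast = predict_next(sales)
print(summary)
print(forecast)
```

Because every row has the same typed columns, both functions can rely on r.units being a number. That predictability is what makes tabular data a comfortable fit for these methodologies.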


It’s important to note that these methodologies are not ideal for projects involving unstructured data, such as audio and video files and large text documents.

So, put on your thinking cap (and perhaps enjoy a flat white), and let’s delve deeper into these widely adopted data science methodologies and explore how they can transform you into a data wizard.

1. Cross-Industry Standard Process for Data Mining (CRISP-DM)

CRISP-DM was developed under the European Strategic Program on Research in Information Technology (ESPRIT) initiative. As the name suggests, it can be adopted by any industry looking to structure its data science projects. This broad applicability makes CRISP-DM a top contender among data science methodologies, resonating with practitioners across various industries.

CRISP-DM has six pivotal phases:

  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Modeling
  5. Evaluation
  6. Deployment

What sets CRISP-DM apart is its initial focus on Business Understanding, where the spotlight shines on grasping project objectives and requirements from a business perspective, laying the groundwork for solving the data problem at hand.

Now, don’t let the sequence of these phases fool you. CRISP-DM is all about flexibility and iteration. These phases can be revisited and refined to continuously enhance outcomes. Sometimes, you might find yourself bouncing back to earlier stages based on insights gained along the way.
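
The back-and-forth between phases can be sketched in a few lines of Python. This is only an illustration of the idea of iteration, not part of the CRISP-DM standard: the run_project function, the review rule, and the max_steps safety limit are all assumptions made for this example.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six CRISP-DM phases, numbered in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def run_project(needs_rework, max_steps=20):
    """Walk the phases in order, stepping back whenever a review asks for it.

    needs_rework(phase, visit_count) returns an earlier Phase to revisit,
    or None to move on. max_steps is a hypothetical guard against endless loops.
    """
    history = []
    visits = {p: 0 for p in Phase}
    phase = Phase.BUSINESS_UNDERSTANDING
    while len(history) < max_steps:
        history.append(phase)
        visits[phase] += 1
        target = needs_rework(phase, visits[phase])
        if target is not None and target < phase:
            phase = target  # bounce back to an earlier phase
        elif phase is Phase.DEPLOYMENT:
            break  # nothing left to do
        else:
            phase = Phase(phase + 1)  # advance to the next phase
    return history


def review(phase, visit_count):
    """Hypothetical rule: the first evaluation sends us back to data preparation."""
    if phase is Phase.EVALUATION and visit_count == 1:
        return Phase.DATA_PREPARATION
    return None


history = run_project(review)
print(" -> ".join(p.name for p in history))
```

In this run the project reaches Evaluation, returns to Data Preparation once, and only then proceeds to Deployment, which is the kind of loop the paragraph above describes.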


2. Knowledge Discovery in Databases (KDD)

Ever felt like you're digging through a mountain of data, trying to find those golden nuggets of insight? That’s where Knowledge Discovery in Databases (KDD) steps in.

The KDD journey typically unfolds across five key steps:

  1. Selection
  2. Preprocessing
  3. Transformation
  4. Data Mining
  5. Interpretation/Evaluation

By following these steps, businesses can stay current with customer needs and behaviors and anticipate future trends to keep their competitive edge sharp. KDD is iterative: new data can be integrated and transformed as it arrives, leading to fresh insights and more tailored results. This continuous cycle of knowledge acquisition fuels the effectiveness of the KDD process.
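
As a rough illustration, the five KDD steps can be chained as small Python functions. Everything here is an assumption made for the example: the customer records are invented, and the "data mining" step is a deliberately simple comparison of averages rather than a real mining algorithm.

```python
from statistics import mean

# Hypothetical raw customer records, invented for this sketch.
raw = [
    {"customer": "a", "age": 34, "spend": 120.0, "channel": "web"},
    {"customer": "b", "age": None, "spend": 80.0, "channel": "store"},
    {"customer": "c", "age": 51, "spend": 300.0, "channel": "web"},
    {"customer": "d", "age": 29, "spend": 40.0, "channel": "web"},
    {"customer": "e", "age": 45, "spend": 260.0, "channel": "store"},
]


def select(rows):
    """1. Selection: keep only the columns relevant to the question."""
    return [{"age": r["age"], "spend": r["spend"]} for r in rows]


def preprocess(rows):
    """2. Preprocessing: drop rows that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def transform(rows):
    """3. Transformation: rescale spend to the range 0..1."""
    lo = min(r["spend"] for r in rows)
    hi = max(r["spend"] for r in rows)
    span = (hi - lo) or 1.0  # avoid dividing by zero if all values are equal
    return [{**r, "spend_scaled": (r["spend"] - lo) / span} for r in rows]


def mine(rows):
    """4. Data mining: compare the mean age of high and low spenders."""
    high = [r["age"] for r in rows if r["spend_scaled"] >= 0.5]
    low = [r["age"] for r in rows if r["spend_scaled"] < 0.5]
    return {"high_spender_mean_age": mean(high), "low_spender_mean_age": mean(low)}


def interpret(pattern):
    """5. Interpretation/Evaluation: turn the pattern into a readable finding."""
    gap = pattern["high_spender_mean_age"] - pattern["low_spender_mean_age"]
    return f"High spenders are {gap:.1f} years older on average in this data."


clean = preprocess(select(raw))
pattern = mine(transform(clean))
print(interpret(pattern))
```

When new records arrive, the same chain can simply be rerun on the extended list, which mirrors the iterative nature of KDD.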

However, KDD has its limitations. It might not address the complexities of modern data science projects, such as setting up big data architecture, ethical considerations, or defining roles within a data science team.

3. Sample, Explore, Modify, Model, Assess (SEMMA)

Developed by the SAS Institute, SEMMA is all about mining data effectively, especially focusing on modeling tasks. It's handy for tackling various business challenges like spotting fraud, retaining customers, targeted marketing, boosting loyalty, segmenting markets, and analyzing risks.

SEMMA is an acronym for its five steps:

  1. Sample
  2. Explore
  3. Modify
  4. Model
  5. Assess

SEMMA is iterative. Solving one problem often leads to more questions, uncovering deeper insights along the way.
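
Here is a small Python sketch of the five SEMMA steps applied to a toy fraud-spotting task. It is an illustration under stated assumptions, not SAS tooling: the transaction data is synthetic, the sampling is a simple every-fifth-record scheme, and the "model" is just a threshold halfway between the two class averages.

```python
from statistics import mean

# Hypothetical population: 100 transactions; amounts above 800 are labeled fraud.
population = [
    {"amount": 10 * (i + 1), "fraud": 10 * (i + 1) > 800} for i in range(100)
]


def sample(rows, step=5):
    """Sample: take every step-th record (systematic sampling)."""
    return rows[::step]


def explore(rows):
    """Explore: compare the mean amount of legitimate and fraudulent records."""
    return {
        label: mean(r["amount"] for r in rows if r["fraud"] == label)
        for label in (False, True)
    }


def modify(rows, scale):
    """Modify: add a rescaled version of the amount as a new feature."""
    return [{**r, "scaled": r["amount"] / scale} for r in rows]


def model(rows):
    """Model: place a threshold halfway between the two class means."""
    legit = mean(r["scaled"] for r in rows if not r["fraud"])
    fraud = mean(r["scaled"] for r in rows if r["fraud"])
    return (legit + fraud) / 2


def assess(rows, threshold):
    """Assess: accuracy on records that were not used to fit the threshold."""
    hits = sum((r["scaled"] >= threshold) == r["fraud"] for r in rows)
    return hits / len(rows)


drawn = sample(population)
profile = explore(drawn)
scale = max(r["amount"] for r in drawn)
threshold = model(modify(drawn, scale))
holdout = modify([r for r in population if r not in drawn], scale)
accuracy = assess(holdout, threshold)
print(profile, round(threshold, 3), accuracy)
```

The assessment shows that the midpoint threshold misclassifies some legitimate high-value transactions. That result naturally raises the next question (would a different feature or model do better?), which is the iterative loop described above.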
