Data Scientist Methodologies for Structured Data:
Just as scientists have the scientific method, data scientists need a foundational methodology to guide their problem-solving efforts. A methodology is essentially a strategic roadmap that guides the activities within a process to obtain answers or results.
Here are three classic and widely adopted data science methodologies best suited for structured data: CRISP-DM, KDD, and SEMMA.
These methodologies are the cornerstones of data mining and share the following characteristics:
It’s important to note that these methodologies are not ideal for projects involving unstructured data, such as audio and video files and large text documents.
So, put on your thinking cap (and perhaps enjoy a flat white), and let’s delve deeper into these widely adopted data science methodologies and explore how they can transform you into a data wizard.
1. Cross-Industry Standard Process for Data Mining (CRISP-DM)
CRISP-DM was developed under the European Strategic Program on Research in Information Technology (ESPRIT) initiative. As the name suggests, it can be adopted by any industry looking to structure its data science projects, which makes it a top contender among data science methodologies and a favorite of practitioners across many fields.
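CRISP-DM has six pivotal phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.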
What sets CRISP-DM apart is its initial focus on Business Understanding, where the spotlight shines on grasping project objectives and requirements from a business perspective, laying the groundwork for solving the data problem at hand.
Now, don’t let the sequence of these phases fool you. CRISP-DM is all about flexibility and iteration. These phases can be revisited and refined to continuously enhance outcomes. Sometimes, you might find yourself bouncing back to earlier stages based on insights gained along the way.
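To make the "bouncing back" idea concrete, here is a minimal Python sketch of a project that walks through the phases and revisits an earlier one. The phase names follow the commonly cited CRISP-DM formulation; `run_project`, `example_review`, and the scenario itself are hypothetical and only illustrate the control flow, not any official CRISP-DM tooling.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def run_project(review, max_steps=20):
    """Walk the phases in order; `review` may send the project back.

    `review(phase, visit_count)` is a hypothetical callback that returns
    an earlier Phase to revisit, or None to move forward.
    """
    order = list(Phase)
    history = []
    visits = {phase: 0 for phase in order}
    index = 0
    while index < len(order) and len(history) < max_steps:
        phase = order[index]
        history.append(phase)
        visits[phase] += 1
        target = review(phase, visits[phase])
        if target is not None and target.value < phase.value:
            index = order.index(target)  # bounce back to an earlier phase
        else:
            index += 1  # otherwise proceed to the next phase
    return history


def example_review(phase, visit_count):
    # Assumed scenario: the first evaluation shows the data needs more work.
    if phase is Phase.EVALUATION and visit_count == 1:
        return Phase.DATA_PREPARATION
    return None


history = run_project(example_review)
print(" -> ".join(phase.name for phase in history))
```

In this scenario the project reaches Evaluation once, returns to Data Preparation, and only then moves on to Deployment. The `max_steps` cap is a design choice that keeps a careless review callback from looping forever.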
2. Knowledge Discovery in Databases (KDD)
Ever felt like you're digging through a mountain of data, trying to find those golden nuggets of insight? That’s where Knowledge Discovery in Databases (KDD) steps in.
The KDD journey typically unfolds across five key steps: Selection, Preprocessing, Transformation, Data Mining, and Interpretation/Evaluation.
By following these steps, businesses can stay current with customer needs and behaviors and anticipate future trends to keep their competitive edge sharp. KDD is iterative: new data can be integrated and transformed at any point, leading to fresh insights and more tailored results. This continuous cycle of knowledge acquisition fuels the effectiveness of the KDD process.
However, KDD has its limitations. It might not address the complexities of modern data science projects, such as setting up big data architecture, ethical considerations, or defining roles within a data science team.
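To make the KDD steps concrete, here is a minimal Python sketch that chains them on a handful of records. It assumes the commonly cited step names (selection, preprocessing, transformation, data mining, interpretation/evaluation). The records, field names, and helper functions are all invented for illustration, and the "mining" step is deliberately trivial.

```python
from statistics import mean

# Hypothetical raw records; field names and values are invented.
raw = [
    {"customer": "a", "segment": "retail", "monthly_spend": 120.0, "notes": "x"},
    {"customer": "b", "segment": "retail", "monthly_spend": None, "notes": "y"},
    {"customer": "c", "segment": "business", "monthly_spend": 480.0, "notes": "z"},
    {"customer": "d", "segment": "business", "monthly_spend": 520.0, "notes": ""},
    {"customer": "e", "segment": "retail", "monthly_spend": 80.0, "notes": ""},
]


def select(records, fields):
    """Selection: keep only the fields relevant to the question."""
    return [{f: r[f] for f in fields} for r in records]


def preprocess(records):
    """Preprocessing: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records, field):
    """Transformation: min-max scale one numeric field into [0, 1]."""
    values = [r[field] for r in records]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero on constant data
    return [{**r, field: (r[field] - low) / span} for r in records]


def mine(records, group_field, value_field):
    """Data mining: a very simple pattern, the mean value per group."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_field], []).append(r[value_field])
    return {g: mean(vs) for g, vs in groups.items()}


def interpret(pattern):
    """Interpretation/evaluation: rank groups, highest mean first."""
    return sorted(pattern, key=pattern.get, reverse=True)


selected = select(raw, ["segment", "monthly_spend"])
clean = preprocess(selected)
scaled = transform(clean, "monthly_spend")
pattern = mine(scaled, "segment", "monthly_spend")
ranking = interpret(pattern)
print(ranking)  # ['business', 'retail']
```

Because each step is a plain function, new records can be appended to `raw` and the chain rerun, which mirrors the iterative nature of the process.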
3. Sample, Explore, Modify, Model, Assess (SEMMA)
Developed by the SAS Institute, SEMMA is all about mining data effectively, especially focusing on modeling tasks. It's handy for tackling various business challenges like spotting fraud, retaining customers, targeted marketing, boosting loyalty, segmenting markets, and analyzing risks.
SEMMA stands for its five steps: Sample, Explore, Modify, Model, and Assess.
SEMMA is iterative. Solving one problem often leads to more questions, uncovering deeper insights along the way.
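As a rough illustration, the five SEMMA steps can be sketched in plain Python on synthetic data. Everything here is an assumption made for the example: the transaction amounts are randomly generated, the "fraud" label follows an invented rule, and the model is a one-feature threshold rather than anything SAS tooling would produce.

```python
import math
import random
from statistics import mean, median

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: transaction amounts with an invented fraud rule.
population = []
for _ in range(200):
    amount = rng.uniform(0, 1000)
    population.append({"amount": amount, "fraud": amount > 800})

# Sample: draw a manageable subset, then hold part of it out for assessment.
sample = rng.sample(population, 100)
train, test = sample[:70], sample[70:]

# Explore: summary statistics to get a feel for the data.
amounts = [row["amount"] for row in train]
summary = {"mean": mean(amounts), "median": median(amounts), "max": max(amounts)}


# Modify: derive a log-scaled feature to compress large amounts.
def modify(rows):
    return [{**row, "log_amount": math.log1p(row["amount"])} for row in rows]


train, test = modify(train), modify(test)


# Model: a one-feature threshold rule, chosen to maximise training accuracy.
def accuracy(rows, threshold):
    hits = sum((row["log_amount"] > threshold) == row["fraud"] for row in rows)
    return hits / len(rows)


candidates = sorted(row["log_amount"] for row in train)
threshold = max(candidates, key=lambda t: accuracy(train, t))

# Assess: measure the rule on the held-out rows.
test_accuracy = accuracy(test, threshold)
print(summary)
print(f"threshold={threshold:.3f} test_accuracy={test_accuracy:.2f}")
```

If the assessment is disappointing, the natural next move is to loop back, for example by drawing a different sample or deriving other features in the Modify step.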