The Data Science Workflow

Data Science is often misunderstood by students seeking to enter the field, business analysts seeking to add data science as a new skill, and executives seeking to implement a data science practice. This article aims to clear up the mystery behind data science by illustrating the sequence of steps to go from a business problem to generating business value using a data science workflow. Once data science is understood, we can take steps to learn data science skills that will generate the most value and/or better make strategic investments in building a data science practice.

Overview

In this article, you will:

  • Understand what data science is
  • Learn how data science generates value for an organization
  • Learn how to go from business problem to business value
  • Get a special offer to learn data science for business through Business Science University

The Mystery & Confusion

Data Science is a mysterious term to many, but why?

Students tend to see data science as 100% machine learning, which is drastically out of proportion with reality. In practice, Machine Learning (or Modeling) takes up about 5% of your time. The rest of the time is spent:

  • Understanding the Business Problem: Communicating with Domain Experts (20%)
  • Working with Data: Cleaning, Manipulating, Visualizing, Processing, Transforming, and Understanding (60%)
  • Communicating Results: Reporting, Slide Decking, and Building Distributed Applications (Predictive Decision-Making Tools) (15%)

Executives and business professionals see data science as a new technology that could benefit their organization, but the connection between business problem and business value is not well understood. Fortunately, the reality is that large businesses:

  • Have many customers - The customers churn, generate sales, drive forecasts
  • Make many products and/or services - The products are linked to quality, lead time, and inventory
  • Have many suppliers - The suppliers affect lead times and serviceability
  • Have data - The data provides a means to measure business drivers and is the fuel for data science

This combination of business drivers (customers, products, inventory, suppliers, and more) with a wide array of internal and external data makes data science a competitive advantage for organizations that can effectively implement it.

Making Better Decisions Generates Business Value

The goal for any Data Science Practice (Data Science Team) is to enable the rest of the organization to make better, data-driven decisions. Therefore, a Data Science Practice is a support role (similar to IT) that allows the organization to function better. The Data Science team can add a lot of value very quickly - through better decision making.

A simple example illustrates my point: an organization that does $500M in annual revenue but has a customer churn rate of 10% loses out on $50M in revenue/year. If a data science practice can identify the issue, predict which customers are going to churn, and implement strategies that enable the workforce to target those customers with retention strategies, the team can effectively reduce the churn rate by 20% (in relative terms, from 10% to 8%).

In monetary terms, a reduction in churn of 20% equates to an annual savings of $10M. Over 5 years, this is $50M in savings generated from the Data Science Practice working with the decision makers (e.g. Sales, Marketing, Production).
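The arithmetic behind this example can be reproduced with a few lines of Python. This is only a sketch under the same simplification the example makes: revenue lost to churn is taken as annual revenue times the churn rate, and the 20% is treated as a relative reduction of that loss. The function name `churn_savings` and its parameters are hypothetical, introduced here for illustration.

```python
def churn_savings(annual_revenue, churn_rate, relative_reduction, years):
    """Return (annual_loss, annual_savings, total_savings).

    Simplifying assumption: revenue lost to churn equals
    annual_revenue * churn_rate, and the reduction is relative
    (e.g. 0.20 means 20% of the churn loss is recovered).
    """
    annual_loss = annual_revenue * churn_rate
    annual_savings = annual_loss * relative_reduction
    total_savings = annual_savings * years
    return annual_loss, annual_savings, total_savings


# Figures taken from the example in the text.
loss, per_year, five_year = churn_savings(
    annual_revenue=500e6,     # $500M annual revenue
    churn_rate=0.10,          # 10% customer churn
    relative_reduction=0.20,  # churn reduced by 20%
    years=5,
)

print(f"Revenue lost to churn: ${loss / 1e6:.0f}M per year")
print(f"Savings from reduced churn: ${per_year / 1e6:.0f}M per year")
print(f"Savings over 5 years: ${five_year / 1e6:.0f}M")
```

Changing the inputs shows how sensitive the business case is to the churn rate and to how much of it the team can realistically recover.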

How Do We Go From Problem To Value?

The way to go from business problem to business value follows an iterative set of steps that, at Business Science University, we call the Data Science Workflow:

The Data Science Workflow has milestones (blue clouds), stages (dotted lines), and steps (gray shapes).

We begin with a Business Problem (milestone), where the team or organization identifies a problem that is worth solving. Typically this has a specific metric assigned to it that can be measured financially (e.g. 10% of our customers are not re-purchasing each year, this is costing the organization $50M annually).

The organization prioritizes this problem with the data science team, and they step into a project management workflow. Hopefully they follow a systematic approach designed to integrate the business with data science such as the Business Science Problem Framework (we teach the BSPF in our Data Science for Business with R Course).

Back to the Data Science Workflow

There are 3 stages:

  1. Preparation - Data is collected and cleaned. This takes a significant amount of time because most data is unclean, meaning steps need to be taken to improve the quality and develop it into a format that machines can interpret and learn from.
  2. Experimentation - This is where hypotheses are generated, data is visualized, and models are generated. This takes significantly less time than Preparation.
  3. Distribution - Reports are generated documenting results, slide decks are created to present to management, and once management provides the go-ahead, apps are developed to implement decision making systems.

Data scientists call the end of the workflow “production” or “deployment”, and this is where Business Value (milestone) is generated.
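To make the three stages concrete, here is a minimal Python sketch of how they might chain together. It is an illustration only, not the workflow as taught in any course: the records in `RAW_CUSTOMERS` are made up, the function names `prepare`, `experiment`, and `distribute` are hypothetical, and the "experiment" is a simple comparison of churn rates rather than a machine learning model.

```python
# Hypothetical raw records; every value here is made up for illustration.
RAW_CUSTOMERS = [
    {"id": 1, "tenure_months": "24", "churned": "no"},
    {"id": 2, "tenure_months": "3", "churned": "yes"},
    {"id": 3, "tenure_months": None, "churned": "no"},   # missing value
    {"id": 4, "tenure_months": "5", "churned": "YES "},  # inconsistent label
    {"id": 5, "tenure_months": "36", "churned": "no"},
    {"id": 6, "tenure_months": "2", "churned": "no"},
]


def churn_rate(flags):
    """Share of True values in a list of booleans (0.0 for an empty list)."""
    return sum(flags) / len(flags) if flags else 0.0


def prepare(raw):
    """Preparation: drop incomplete rows and convert text to typed values."""
    clean = []
    for row in raw:
        if row["tenure_months"] is None:
            continue  # unclean record: skip rather than guess a value
        clean.append({
            "id": row["id"],
            "tenure_months": int(row["tenure_months"]),
            "churned": row["churned"].strip().lower() == "yes",
        })
    return clean


def experiment(clean, tenure_cutoff=12):
    """Experimentation: check a hypothesis (do newer customers churn more?)."""
    new = [r["churned"] for r in clean if r["tenure_months"] < tenure_cutoff]
    old = [r["churned"] for r in clean if r["tenure_months"] >= tenure_cutoff]
    return {"new": churn_rate(new), "established": churn_rate(old)}


def distribute(findings):
    """Distribution: summarize the findings for decision makers."""
    return (
        f"Churn among new customers: {findings['new']:.0%}\n"
        f"Churn among established customers: {findings['established']:.0%}"
    )


clean_customers = prepare(RAW_CUSTOMERS)
findings = experiment(clean_customers)
report = distribute(findings)
print(report)
```

Even in this toy version, most of the code deals with cleaning and reporting rather than analysis, which mirrors the time split described earlier.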

The best data science teams can iterate through this process going from problem to value very efficiently, spending little time on modeling and maximum time at the ends of the spectrum:

  • Beginning of Workflow: Business Understanding / Domain Expert Communication, Data Understanding, Data Quality, and Feature Engineering are Critical
  • End of Workflow: Communication with Project Stakeholders, Product Delivery are Critical

DS Learning Process & Implementing a Data Science Practice

For those who are interested in (1) implementing a data science practice or (2) learning how to add data science as a skill set to accelerate your career, I have excellent news. The full stack data science skill sets can be learned very quickly through Business Science University.

Here is why Business Science University will work:

  • We cover the entire data science workflow - Data Import, Data Preparation, Data Cleaning, Data Manipulation, Data Visualization, Functional Programming, Advanced Machine Learning, and Web Application Development
  • We teach how to solve problems first - The Data Science tools are secondary to the problem solving process. Therefore a typical course focuses on solving the problem while integrating 10-20 tools, which are the mechanisms for how we arrive at the end product.
  • We integrate cutting-edge tools and resources - We teach what works - High-performance, fast iteration, flexibility, and business value. These include tool sets like H2O Automated Machine Learning, LIME for explanations, frameworks like the BSPF, and referenced resources like the Ultimate R Cheat sheet
  • We provide a community for support - We operate a private Slack Channel with over 300 active, like-minded members along with instructor support (yes, I am in there contributing and communicating every day)

Here’s an idea of what the Data Science Workflow looks like in Business Science University’s R-Track:

Business Science University is different. You learn the entire data science tool chain while you solve business problems.

Summary of Features

To summarize, Business Science University checks all of the boxes.


Special Offer - Business Science University

For a limited time, we are offering 15% OFF the Data Science For Business Bundle, which includes:

  1. Business Analysis with R (DS4B 101-R) - Beginner - Foundational Data Science program teaching 10+ tidyverse packages, 5 hours of machine learning in week 6, business reporting, and a lot more
  2. Data Science For Business With R (DS4B 201-R) - Advanced - Advanced Machine Learning and Business Consulting teaching H2O Automated ML, LIME, Tuning the Model for ROI (Return On Investment)
Get started with Business Science University 15% OFF

About The Author

Matt Dancho is the founder of Business Science and is an Instructor at Business Science University. He is committed to doing everything possible to help students successfully apply data science to business to generate value (ROI).

“I look forward to having you in my courses. I will do everything possible to help you succeed.”

-Matt Dancho, Founder of Business Science
