What's The Difference Between BI Analyst and Data Scientist?

This is still the #1 question I get from many data warehouse and business intelligence folks.  I used to show Figure 1 (the BI Analyst vs. Data Scientist Characteristics chart, which shows the different attitudinal approaches for each) and Figure 2 (Business Intelligence vs. Data Science, which shows the different types of questions that each tries to address) in response to this question.

Figure 1: BI Analyst vs. Data Scientist Characteristics

Figure 2: Business Intelligence vs. Data Science

However, these slides lack the context required to satisfactorily answer the question – I’m never sure the audience really understands the inherent differences between what a BI analyst does and what a data scientist does.  The key is to understand the differences between the BI analyst’s and the data scientist’s goals, tools, techniques and approaches.  Here’s a more detailed explanation.

The Business Intelligence (BI) Analyst Engagement Process

Figure 3 outlines the high-level analytic process that a typical BI Analyst uses when engaging with the business users.

Figure 3:  Business Intelligence Engagement Process

Step 1:  Build the Data Model.  The process starts by building the underlying data model.  Whether you use a data warehouse or data mart or hub-and-spoke approach, or whether you use a star schema, snowflake schema, or third normal form schema, the BI Analyst must go through a formal requirements gathering process with the business users to identify all (or at least the vast majority of) the questions that the business users want to answer.  In this requirements gathering process, the BI analyst must identify the first and second level questions the business users want to address in order to build a robust and scalable data warehouse. For example:

  • 1st level question:  How many patients did we treat last month?
    2nd level question:  How did that compare to the previous month?
    2nd level question:  What were the major DRG types treated?
  • 1st level question:  How many patients came through ER last night?
    2nd level question:  How did that compare to the previous night?
    2nd level question:  What were the top admission reasons?
  • 1st level question: What percentage of beds was used at Hospital X last week?
    2nd level question:  What is the trend of bed utilization over the past year?
    2nd level question:  What departments had the largest increase in bed utilization?

The BI Analyst then works closely with the data warehouse team to define and build the underlying data models that support the questions being asked.
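To make this concrete, here is a minimal sketch – in R, one of the analytic tools mentioned later in this post – of the kind of star schema such a requirements process might produce. All table names, column names and numbers are hypothetical.

```r
# Hypothetical star schema: one fact table with foreign keys into
# three dimension tables (all names and numbers are made up).

dim_date <- data.frame(
  date_key  = 1:2,
  full_date = as.Date(c("2015-06-15", "2015-07-15")),
  month     = c("June", "July")
)

dim_hospital <- data.frame(
  hospital_key = 1:2,
  hospital     = c("Hospital X", "Hospital Y")
)

dim_drg <- data.frame(
  drg_key  = 1:2,
  drg_type = c("Cardiac", "Orthopedic")
)

# Fact table: one row per day/hospital/DRG combination.
fact_admissions <- data.frame(
  date_key     = c(1, 1, 2, 2),
  hospital_key = c(1, 2, 1, 1),
  drg_key      = c(1, 2, 2, 1),
  patients     = c(120, 95, 130, 88)
)

# 1st level question: "How many patients did we treat last month?"
# is answered by joining the fact table to a dimension and aggregating.
treated <- merge(fact_admissions, dim_date, by = "date_key")
aggregate(patients ~ month, data = treated, FUN = sum)
```

The point is that every question captured in the requirements interviews must be answerable by some join-and-aggregate path through this schema, which is why the interviews come first.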

Note:  the data warehouse follows a “schema-on-load” approach because the data schema must be defined and built prior to loading data into the data warehouse.  Without an underlying data model, the BI tools will not work.

Step 2:  Define the Report.  Once the analytic requirements have been transcribed into a data model, step 2 of the process is where the BI Analyst uses a Business Intelligence (BI) product – SAP Business Objects, MicroStrategy, Cognos, Qlikview, Pentaho, etc. – to create the SQL-based query for the desired questions (see Figure 4).

Figure 4:  Business Intelligence (BI) Tools

The BI Analyst will use the BI tool’s graphical user interface (GUI) to create the SQL query by selecting the measures and dimensions; selecting page, row and column descriptors; specifying constraints, subtotals and totals; creating special calculations (mean, moving average, rank, share of); and selecting sort criteria.  The BI GUI hides much of the complexity of creating the SQL.

Step 3: Generate SQL Commands.  Once the BI Analyst or the business user has defined the desired report or query request, the BI tool then creates the SQL commands.  In some cases, the BI Analyst will modify the SQL commands generated by the BI tool to include unique SQL commands that may not be supported by the BI tool.
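As a rough analogue of steps 2 and 3, here is a sketch using R’s dplyr and dbplyr packages, which translate high-level “measures and dimensions” verbs into SQL much the way a BI tool’s GUI does. The table and columns are hypothetical, and an in-memory RSQLite database stands in for the data warehouse.

```r
library(DBI)     # database connectivity
library(dplyr)   # high-level data verbs
library(dbplyr)  # translates dplyr verbs into SQL

# An in-memory SQLite database standing in for the data warehouse.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "er_visits", data.frame(
  visit_date = c("2015-07-01", "2015-07-01", "2015-07-02"),
  reason     = c("Chest pain", "Fracture", "Chest pain")
))

# Pick the dimensions (visit_date, reason) and the measure (a count),
# the way a BI user would in the tool's GUI...
report <- tbl(con, "er_visits") %>%
  group_by(visit_date, reason) %>%
  summarise(visits = n()) %>%
  arrange(desc(visits))

# ...and let the tool generate the SQL command for you.
show_query(report)

dbDisconnect(con)
```

As in a BI tool, the generated SQL can then be inspected and hand-modified when the translator doesn’t support something you need.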

Step 4:  Create Report.  In step 4, the BI tool issues the SQL commands against the data warehouse and creates the corresponding report or dashboard widget.  This is a highly iterative process, where the BI Analyst will tweak the SQL (either through the GUI or by hand-coding the SQL statement) to fine-tune the request.  The BI Analyst can also specify graphical rendering options (bar charts, line charts, pie charts) until they get the exact report and/or graphic that they want (see Figure 5).

Figure 5:  Typical BI Tool Graphic Options
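A minimal sketch of step 4 in base R: take the result set returned by the SQL and render it as one of the standard chart types shown in Figure 5. The department names and utilization numbers are made up.

```r
# Hypothetical result set returned from the data warehouse query.
bed_utilization <- data.frame(
  department  = c("ICU", "ER", "Oncology", "Pediatrics"),
  utilization = c(0.92, 0.81, 0.74, 0.63)
)

# Render the report graphic; a BI tool offers the same choice of
# bar, line or pie renderings over the same result set.
barplot(height    = bed_utilization$utilization,
        names.arg = bed_utilization$department,
        ylab      = "Bed utilization",
        main      = "Bed Utilization by Department")
```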

By the way, this is a good example of the power of schema-on-load.  This traditional schema-on-load approach hides much of the underlying data complexity from the business users, who can then use the GUI-based BI tools to more easily interact with and explore the data (think self-service BI).

In summary, the BI approach leans heavily on the pre-built data warehouse (schema-on-load), which enables users to quickly and easily ask further questions – as long as the data that they need is already in the data warehouse.  If the data is not in the data warehouse, then adding it (and creating all the supporting ETL processes) can take months.

The Data Scientist Engagement Process

Figure 6 lays out the Data Scientist engagement process.

Figure 6:  Data Scientist Engagement Process

Step 1:  Define Hypothesis To Test.  Step 1 of the Data Scientist process starts with the Data Scientist identifying the prediction or hypothesis that they want to test.  Again, this is a result of collaborating with the business users to understand the key sources of business differentiation (e.g., how the organization delivers value) and then brainstorming data and variables that might yield better predictors of performance. This is where a Vision Workshop process can add considerable value in driving the collaboration between the business users and the data scientists to identify data sources that may help improve predictive value (see Figure 7).

Figure 7:  Vision Workshop Data Assessment Matrix

Step 2:  Gather Data.  Step 2 of the Data Science process is where the data scientist gathers relevant and/or interesting data from a multitude of sources – ideally both internal and external to the organization.  The data lake is a great approach for this process, as the data scientist can grab any data they want, test it, ascertain its value given the hypothesis or prediction, and then decide whether to include that data in the predictive model or throw it away.
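In code, the “grab it, test it, keep it or throw it away” loop might look like the following R sketch; the data lake file paths and the external weather feed are purely hypothetical.

```r
# Internal data pulled from the data lake (hypothetical path).
admissions <- read.csv("lake/er_admissions.csv")  # columns: date, admits

# A candidate external source that might sharpen the prediction
# (also hypothetical).
weather <- read.csv("lake/local_weather.csv")     # columns: date, temp_f

# Join the candidate source onto the internal data...
candidate <- merge(admissions, weather, by = "date")

# ...test whether it carries any signal for the hypothesis...
cor(candidate$admits, candidate$temp_f)

# ...then keep it in the modeling data set or throw it away.
```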

Step 3:  Build Data Model.  Step 3 is where the data scientist defines and builds the schema necessary to address the hypothesis being tested.  The data scientist can’t define the schema until they know the hypothesis that they are testing AND know what data sources they are going to be using to build their analytic models.

Note:  this “schema-on-query” process is notably different from the traditional data warehouse “schema-on-load” process.  The data scientist doesn’t spend months integrating all the different data sources into a formal data model first.  Instead, the data scientist defines the schema as needed, based upon the data being used in the analysis.  The data scientist will likely iterate through several different versions of the schema until finding a schema (and analytic model) that sufficiently answers the hypothesis being tested.
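A toy illustration of schema-on-query in R: the raw records sit in the lake untouched, and a schema is imposed only at analysis time, so the next iteration can re-parse the same data into a different schema. The pipe-delimited record format is invented for the example.

```r
# Raw, unmodeled records as they might land in the data lake.
raw <- c("2015-07-01|ER|Chest pain",
         "2015-07-01|ICU|Sepsis",
         "2015-07-02|ER|Fracture")

# Impose a schema only now, at query time, shaped by the hypothesis
# being tested; no months-long upfront integration is required.
fields <- do.call(rbind, strsplit(raw, "|", fixed = TRUE))
visits <- data.frame(visit_date = as.Date(fields[, 1]),
                     department = fields[, 2],
                     reason     = fields[, 3])
str(visits)
```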

Step 4:  Explore The Data.  Step 4 of the Data Science process leverages data visualization tools to uncover correlations and outliers of interest in the data.  Data visualization tools like Tableau, Spotfire, Domo and DataRPM[1] are great for exploring the data and identifying variables that the data scientist might want to test (see Figure 8).

Figure 8:  Sample Data Visualization Tools
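While those are GUI tools, the same kind of exploration can be sketched in R for consistency with the rest of this post: a correlation scan plus a quick outlier check, here on simulated data.

```r
# Simulated modeling data: daily admissions plus candidate drivers.
set.seed(42)
explore_df <- data.frame(
  admits  = rpois(100, 50),
  temp_f  = rnorm(100, 70, 10),
  flu_idx = runif(100, 0, 10)
)

# Scan for correlations among the candidate variables...
round(cor(explore_df), 2)

# ...and eyeball pairwise relationships and outliers worth testing.
pairs(explore_df)
boxplot(explore_df$admits, main = "Daily admissions: outlier check")
```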

Step 5:  Build and Refine Analytic Models.  Step 5 is where the real data science work begins – where the data scientist starts using tools like SAS, SAS Miner, R, Mahout, MADlib, and Alpine Miner to build analytic models.  This is true science, baby!!  At this point, the data scientist will explore different analytic techniques and algorithms to try to create the most predictive models.  As my data scientist friend Wei Lin shared with me, this includes some of the following algorithmic techniques:

Markov chain, genetic algorithm, geo fencing, individualized modeling, propensity analysis, neural network, Bayesian reasoning, principal component analysis, singular value decomposition, optimization, linear programming, non-linear programming and more.

All in the name of trying to quantify cause-and-effect! I don’t suggest trying to win a game of chess against one of these guys.
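Two of the techniques on that list are easy to sketch in base R on simulated data: principal component analysis and a simple predictive (logistic regression) model. In practice the data scientist iterates across many such algorithms; the variables here are invented.

```r
# Simulated patient-level data for a readmission prediction model.
set.seed(7)
n <- 200
patients <- data.frame(
  age       = rnorm(n, 60, 12),
  prior_los = rpois(n, 4),   # prior length of stay, in days
  num_meds  = rpois(n, 6)    # number of active medications
)
patients$readmit <- rbinom(n, 1,
  plogis(-4 + 0.04 * patients$age + 0.2 * patients$num_meds))

# Principal component analysis over the candidate predictors.
pca <- prcomp(patients[, c("age", "prior_los", "num_meds")], scale. = TRUE)
summary(pca)

# A simple predictive model: logistic regression on readmission.
model <- glm(readmit ~ age + prior_los + num_meds,
             data = patients, family = binomial)
summary(model)
```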

Step 6:  Ascertain Goodness of Fit.  Step 6 in the data science process is where the data scientist will try to ascertain the model’s goodness of fit.  The goodness of fit of a statistical model describes how well the model fits a set of observations.  A number of different analytic techniques will be used to determine the goodness of fit, including the Kolmogorov–Smirnov test, Pearson’s chi-squared test, analysis of variance (ANOVA) and the confusion (or error) matrix.
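Each of those tests is a one-liner in base R. Here is a sketch on simulated predictions (the actual outcomes and model scores are made up for illustration):

```r
# Simulated actual outcomes and model scores, for illustration only.
set.seed(11)
actual <- rbinom(200, 1, 0.3)      # observed 0/1 outcomes
score  <- runif(200)               # model's predicted probabilities
pred   <- as.integer(score > 0.5)  # thresholded class predictions

# Kolmogorov-Smirnov: do scores separate the two outcome classes?
ks.test(score[actual == 1], score[actual == 0])

# Pearson's chi-squared test on predicted vs. actual classes.
chisq.test(table(pred, actual))

# Analysis of variance (ANOVA): does the mean score vary by outcome?
summary(aov(score ~ factor(actual)))

# Confusion (error) matrix: where the model is right and wrong.
table(Predicted = pred, Actual = actual)
```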

Summary

The data science process is highly collaborative; the more subject matter experts involved in the process, the better the resulting model.  And maybe even more importantly, involvement of the business users throughout the process ensures that the data scientists focus on uncovering analytic insights that pass the S.A.M. test – Strategic (to the business), Actionable (insights that the organization can actually act on), and Material (where the value of acting on the insights is greater than the cost of acting on them).

[1] Disclaimer:  I serve on DataRPM’s Advisory Board

--------------------

Thanks for taking the time to read my post. I’m fortunate that I spend most of my time with very interesting clients who fuel many of my topics. I hope that you are able to leave a comment or some thoughts about the blog. If you would like to read my regular blogs, please follow me on LinkedIn and/or Twitter.

I am the author of the book “Big Data: Understanding How Data Powers Big Business” and am working on my second book “Big Data MBA: Driving Business Strategies with Data Science” due in December.  I also teach the "Big Data MBA" at the University of San Francisco (USF) School of Management, where I was recently named the first Fellow of the USF School of Management.

Amazing knowledge. Thank you very much, Mr. Schmarzo! I have already incorporated your approaches into lectures for my students.

Like "Data Science for Dummies"! Thanks a lot for this easy-to-read and rich post.

Kanna Dhasan

Technical Manager at Anoud Technologies Pvt Ltd

10y

Both work on pattern analysis and matching techniques. KPIs are more specific to the business, where every customer wants to know those parameter values along with various trend analyses; here the BI analyst uses predefined tools to bring out those results and present them to the customer. The data scientist, on the other hand, generates trend reports based on his own knowledge and analysis, using pattern-matching techniques like z-transformation logic.

Vinay Gupta

Head Data Analytics & Business Excellence at Suzlon Global Services Limited

10y

I think a BI Analyst analyzes the data using pivot tables, OLAP cubes and effective visualization to create reports and outputs as per the business KPIs. A data scientist, on the other hand, takes a deep dive into the statistical methods and the whys and hows of the output results, tweaking algorithms and different parameter values to see the variation in output, carrying out hypothesis testing, comparing various models and then coming out with the inference/conclusion. The visualization tools are comparatively less focused at...

Edward Bobrin

Technology Consulting Executive, Data & AI Leader at Ernst & Young, LLP (EY)

10y

There is a lot of gray area between the definitions and the tools represented. Definitions: I think there are BI Analysts who (while I am not equating them to qualified data scientists with programming and extreme mathematical skills) play the role of a 'scaled-down' data scientist in organizations that have not made the kind of investments in data science. Again, there is a lot more gray out there than the comparison leads you to believe. Overall I agree with the general themes you describe... maybe just adding the caveat that these definitions are not one-size-fits-all does the trick. Tools: For example, what makes Tableau a better data discovery or 'data scientist' tool than Qlik? They both rely on R integration for the heavy data science visualizations (e.g. clustering, etc.). Thanks for sharing - I do like the Data Science Engagement Process graphic overall. :-)
