8 Building Blocks of Statistical Thinking

The fundamentals of statistical thinking are crucial for success and remain the same whether you're dealing with small or big data.

In their article [1], Roger Hoerl, Ronald Snee, and Richard De Veaux discussed 8 building blocks of statistical thinking.

Here's a summary of the practical tips, mistakes, and relevant case studies.

1) Clear Problem Statement

Tip: Clearly define the problem and intended uses of the model upfront. Don't get distracted by the data.

Mistake: Jumping into data analysis without clear objectives. This often leads to models that don't generalize.

Case Studies:

  • Lehman Brothers developed sophisticated default prediction models but didn't foresee their own bankruptcy in 2008.
  • The Netflix Prize competition produced an algorithm that improved movie-rating predictions by 10%, but it was never put into production.

2) Process Understanding

Tip: Thoroughly understand the process that generated the data, including how measurements were obtained.

Mistake: Assuming the data is perfect without evaluating its pedigree. Garbage in, garbage out.

Case Studies:

  • In the Duke cancer trial scandal, faulty gene signature data led to retractions and halted clinical trials.
  • Reinhart and Rogoff's famous study on debt and growth had errors due to selective data exclusion and coding mistakes.

3) Analysis Strategy

Tip: Develop an iterative, phased analysis strategy rather than jumping straight to modeling.

Mistake: Attempting to solve the problem in one pass based on model fit statistics.

Case Studies:

  • The FDA's phased clinical trial process progressively builds safety and efficacy evidence.
  • Steve Jobs iterated the iPhone design through multiple generations to achieve success.
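
To make the phased idea concrete, here is a minimal sketch in Python (the file name, target column, and thresholds are illustrative assumptions, not from the article): summarize the data first, fit a deliberately simple baseline judged on held-out data, and let the residuals shape the next iteration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical data set and target column; adapt to your own problem.
df = pd.read_csv("process_data.csv").dropna()
target = "yield"

# Phase 1: get familiar with the data before any modeling.
print(df.describe())

# Phase 2: fit a deliberately simple baseline and judge it on held-out data.
X = df.drop(columns=[target]).select_dtypes("number")
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
baseline = LinearRegression().fit(X_train, y_train)
print("Baseline MAE:", mean_absolute_error(y_test, baseline.predict(X_test)))

# Phase 3: study the residuals to decide what the next iteration should address
# (new variables, better measurements, a different model form), rather than
# declaring victory on one pass of fit statistics.
residuals = y_test - baseline.predict(X_test)
print(residuals.describe())
```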

4) Variation Sources

Tip: Seek to understand sources of variation. Reducing variation improves processes.

Mistake: Assuming all variation is noise instead of a signal pointing to root causes.

Case Studies:

  • Control charts helped Western Electric identify and fix process issues in the 1920s.
  • W. Edwards Deming's teachings on variation helped transform manufacturing quality in post-war Japan.
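
Control charts, mentioned in the case studies above, are the classic tool for separating special-cause signals from common-cause noise. Below is a minimal sketch of an individuals chart with moving-range limits; the measurement values are invented for illustration, and real charting practice involves more run rules and judgment.

```python
import numpy as np

# Phase I: baseline measurements assumed to reflect stable, common-cause variation.
# (Values are invented for illustration.)
baseline = np.array([10.0, 10.1, 9.9, 10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.0])

center = baseline.mean()
# Estimate short-term sigma from the average moving range (d2 = 1.128 for n = 2),
# as on a standard individuals (I-MR) chart.
sigma_hat = np.abs(np.diff(baseline)).mean() / 1.128

ucl = center + 3 * sigma_hat
lcl = center - 3 * sigma_hat
print(f"Center {center:.2f}, control limits [{lcl:.2f}, {ucl:.2f}]")

# Phase II: monitor new measurements against the baseline limits.
new_points = [10.1, 9.9, 10.0, 11.2, 10.0]
for i, x in enumerate(new_points):
    status = "special cause - investigate" if (x < lcl or x > ucl) else "common cause"
    print(f"Point {i}: {x} -> {status}")
```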

5) Quality Data

Tip: Assess data pedigree including origin, collection methods, and measurement process.

Mistake: Using flawed data and assuming algorithms can compensate.

Case Studies:

  • The Duke cancer trials and Reinhart-Rogoff studies fell apart due to data errors.
  • Dueling Amazon pricing algorithms, with no sanity checks on their input data, drove a biology textbook's price past $23 million.
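
Pedigree is ultimately established by understanding how the data were generated, but a few routine checks catch obvious problems before they reach a model. A minimal sketch, in which the file name, column names, and plausibility limits are hypothetical:

```python
import pandas as pd

# Hypothetical file and columns; adapt to your own data.
df = pd.read_csv("measurements.csv", parse_dates=["timestamp"])

# Basic structural checks before any modeling.
print("Rows, columns:", df.shape)
print("Duplicate rows:", df.duplicated().sum())
print("Missing values per column:")
print(df.isna().sum())

# Plausibility checks against known physical or business limits
# (the limits here are illustrative assumptions).
implausible = df[(df["temperature_c"] < -50) | (df["temperature_c"] > 150)]
print("Implausible temperature readings:", len(implausible))

# Confirm the collection period matches what you were told about the data.
print("Data covers:", df["timestamp"].min(), "to", df["timestamp"].max())
```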

6) Domain Knowledge

Tip: Leverage domain expertise in data selection, variable choice, model interpretation, and more.

Mistake: Ignoring theory and expertise already established, and relying solely on data.

Case Studies:

  • Medicine leverages knowledge from biology, anatomy, physiology, and clinical experience.
  • Modern semiconductor manufacturing integrates engineering discipline with statistical methods.

7) Sequential Approach

Tip: Take an iterative, multi-phase approach to dig into a problem versus one pass at the data.

Mistake: Assuming a problem can be solved fully with one data set and analysis.

Case Studies:

  • Edison reportedly made some 10,000 attempts over several years before arriving at a practical lightbulb.
  • The scientific method iterates through hypothesis, predictions, tests, and revisions.

8) Modeling Process

Tip: Choose a modeling process that aligns with the problem and avoids overfitting the data.

Mistake: Focusing too much on complex algorithms without sound statistical thinking principles.

Case Studies:

  • Apple avoided overengineering and kept the original iPhone design clean and simple.
  • Occam's razor: Simpler explanations are preferable to more complex ones.
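
As a small, self-contained illustration of the overfitting point (synthetic data, not taken from the article or the case studies), a highly flexible model can fit its training data better than a simple one while predicting new data worse:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic data: a simple linear trend plus noise.
x = rng.uniform(0, 10, 40).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0 + rng.normal(0, 2.0, 40)
x_train, x_test, y_train, y_test = x[:25], x[25:], y[:25], y[25:]

for degree in (1, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    test_mse = mean_squared_error(y_test, model.predict(x_test))
    print(f"degree {degree}: train MSE {train_mse:.2f}, test MSE {test_mse:.2f}")

# The degree-9 fit typically shows lower training error but higher test error:
# complexity that chases noise does not generalize.
```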

Successful data analytics integrates sound statistical thinking, domain expertise, and business understanding with modeling techniques.

This holistic approach is more likely to produce actionable insights that stand the test of time. Just throwing algorithms at data is prone to errors, overfitting, and results that cannot be replicated.

Take the time to do thoughtful analysis.

References:

[1] Hoerl, R. W., Snee, R. D., & De Veaux, R. D. (2014). Applying statistical thinking to 'Big Data' problems. WIREs Computational Statistics, 6(4), 222-232.


What’s Next?

If you're an industrial leader and want to know about Statistical Thinking Applications, check out last week's newsletter here.

Subscribe here and join 6,800+ leaders getting actionable advice to solve problems, make informed decisions, and lead with influence - every Friday.

