Understanding Variation: What Part of "You have NO choice!" don't Leaders and Management "Get?"

Can we please stop the obsession with rankings and percentiles?

Don't tell me you're not tempted to look when you spot a magazine cover saying "How does your state rank in [trendy topic du jour]?"

Many of these alleged analyses rank groups on several factors, then compare each group's summed ranks to draw conclusions.

For example, in 2006, I was at a presentation by someone considered a world leader in quality (WLQ) who has been singing Dr. Deming's praises since the late 1980s. He presented the following data as a bar graph from lowest score to highest. 

The data are the summed rankings, across 10 aspects, of each of the 21 counties in a small country's healthcare system (considered on the cutting edge of quality). Lower sums are better: Minimum = 10, Maximum = 210, Average = 10 x 11 = 110.
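The bounds quoted above follow directly from the setup. Here is a quick check, assuming only what the text states (21 counties, each ranked 1 to 21 on each of 10 aspects) plus the assumption that there are no tied ranks:

```python
# Setup taken from the text: 21 counties, each ranked 1..21 on 10 aspects.
# ASSUMPTION: no tied ranks.
N_COUNTIES = 21
N_ASPECTS = 10

min_sum = N_ASPECTS * 1                  # ranked 1st on every aspect
max_sum = N_ASPECTS * N_COUNTIES         # ranked last on every aspect
avg_rank = (1 + N_COUNTIES) / 2          # mean of the ranks 1..21 is 11
avg_sum = N_ASPECTS * avg_rank           # average rank sum across all counties

print(min_sum, max_sum, avg_sum)         # 10 210 110.0
```

Because the ranks on each aspect always add up to the same total, the average of 110 is fixed in advance. Only the spread around it carries any information.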

My antennae went up.  A bar graph?  With absolutely no context of variation for interpretation? And a literal interpretation of the rankings?  

What's wrong with this picture?

I wanted to use one of my favorite techniques, analysis of means (ANOM), to take a systems view of things. When looking at improvement opportunities, the mindset must change from “comparisons of individual performances” to “comparison of individual performances to their inherent ‘system’.”

I wrote to him for the data and he graciously complied.

There is a statistical technique known as the Friedman test where it is legitimate to perform an analysis of variance (ANOVA) using the combined individual sets of rankings (not shown) as the responses. I won’t bore you with the details, but the p-value of this analysis was < 0.001 -- little doubt that there is indeed a difference amongst counties.  Now...which ones?
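For readers who want to see the mechanics, here is a minimal sketch of the Friedman statistic. It uses the textbook chi-square form rather than the ANOVA-on-ranks form described above, and the rankings are HYPOTHETICAL (randomly generated), not the counties' actual data. The function name and variables are illustrative only.

```python
import random

def friedman_statistic(rank_rows):
    """Friedman chi-square statistic for a table of within-row ranks (no ties).

    rank_rows: one row per aspect; each row is a permutation of 1..n,
    giving the rank of each of the n counties on that aspect.
    """
    k = len(rank_rows)            # number of aspects (blocks)
    n = len(rank_rows[0])         # number of counties (treatments)
    # Column totals: each county's summed rank across all aspects.
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    sum_sq = sum(r * r for r in rank_sums)
    q = 12.0 * sum_sq / (k * n * (n + 1)) - 3.0 * k * (n + 1)
    return q, rank_sums

# HYPOTHETICAL data: 10 aspects, 21 counties, ranks assigned purely at random,
# i.e. a "system" with no real differences between counties.
rng = random.Random(0)
chance_rows = [rng.sample(range(1, 22), 21) for _ in range(10)]
q_chance, sums_chance = friedman_statistic(chance_rows)

# Opposite extreme: every aspect ranks the counties identically.
agree_rows = [list(range(1, 22)) for _ in range(10)]
q_agree, sums_agree = friedman_statistic(agree_rows)

# Standard chi-square table value for 20 degrees of freedom at p = 0.001.
CHI2_20_P001 = 45.31

print("chance-only Q:", round(q_chance, 1), "rank sums:", sorted(sums_chance))
print("perfect-agreement Q:", q_agree, "beyond p=0.001 cutoff:", q_agree > CHI2_20_P001)
```

Notice how widely the rank sums spread even when the rankings are pure chance. That spread is the common cause variation any literal reading of a ranking ignores.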

For those interested in the statistics involved:

From the ANOVA, one can calculate what is called the least significant difference (LSD) between any two summed scores due to common cause, which in this case was 51. Because there are 210 potential pairwise comparisons among the 21 counties, one can also calculate a more conservative difference that takes this into account; by that calculation, a difference as high as 91 could be common cause.
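The number of pairwise comparisons is simple counting: 21 counties taken two at a time. The sketch below also shows how thin an overall 5 percent risk gets when spread across every pair. The Bonferroni-style split is an assumption for illustration; the adjustment actually used for the more conservative figure is not stated here.

```python
import math

N_COUNTIES = 21
n_pairs = math.comb(N_COUNTIES, 2)       # distinct county-vs-county comparisons

# ASSUMPTION: a Bonferroni-style split of an overall 5% risk across all pairs.
# The article does not say which adjustment produced its conservative figure.
overall_alpha = 0.05
per_pair_alpha = overall_alpha / n_pairs

print(n_pairs)                           # 210
print(f"{per_pair_alpha:.6f}")           # 0.000238
```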

Given the rankings and results of the two calculations above, suppose there was a subsequent meeting to discuss the rankings, possibly revise them, and then decide on how to take action.  Do you think the "unknown and unknowable" effects of the variation in human perception of variation might affect the discussion and subsequent actions? Ouija board, anyone?  

Do you see the dangerous potential for treating common cause as special cause?  Do they realize that their actions will have serious consequences?

Oh, and those two calculated differences aren't worth very much.

Let's consider these counties as a "system" and use ANOM

The ANOM results are below (overall p = 0.05 and p = 0.01 reference lines drawn in).  Note that the points are not connected and the horizontal axis order has no time element.  I chose to display them from smallest score to largest.

Dr. Deming hated probability limits and would simply use his mentor Walter Shewhart’s recommended limits of "three" standard deviations (as in the red bead experiment comparing workers). In this case, that is 110 +/- 55 (55 to 165): not much different from the ANOM limits, and no change in the conclusions.
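The calculation behind the 110 +/- 55 figure is not shown here. As a rough sanity check only, here is a sketch of three standard deviation limits under a pure-chance baseline, ASSUMING each of a county's 10 ranks is equally likely to be any value from 1 to 21, independently. This is not necessarily how the limits above were derived, and it lands in the same neighborhood rather than on the same number.

```python
import math

N_COUNTIES = 21
N_ASPECTS = 10

# ASSUMPTION: chance-only baseline. Each aspect's rank for a county is treated
# as equally likely to be any of 1..21, independently across the 10 aspects.
var_one_rank = (N_COUNTIES**2 - 1) / 12          # variance of a uniform rank on 1..21
sd_rank_sum = math.sqrt(N_ASPECTS * var_one_rank)

center = N_ASPECTS * (N_COUNTIES + 1) / 2        # 110
lower = center - 3 * sd_rank_sum
upper = center + 3 * sd_rank_sum

print(round(sd_rank_sum, 1), round(lower, 1), round(upper, 1))
```

Under that assumption the limits come out at roughly 53 to 167, so a summed score has to fall a long way from 110 before it signals anything other than noise.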

The statistical interpretation is that there is one outstanding county (#1), one county indeed "below" average in performance (#21), and the other 19 counties are, based on this data...indistinguishable!  

In The New Economics, Dr. Deming shows a similar chart and comments about the performance equivalent of counties 2 to 20: These cannot be ranked!  I once analyzed a similar state ranking. There were two states truly above average, two below average...and states three through 48 were indistinguishable.

Discussions on data like this involve a lot of talk about "quartiles," top or bottom 10 percent, and above- and below-average performances. Sound familiar?  (Healthcare folks:  Press-Ganey reports?)

When I shared this analysis with the WLQ, I was shocked at his response. Our verbatim e-mail correspondence follows. I'll let you use your own judgment of his comments and subsequent actions to decide whether he "got it."

World quality leader:  "A subtle issue you did not tackle is the political-managerial issue of communicating such insights to [the two special cause counties] and the counties that thought they were 'different,' but, statistically, aren't. I wonder what framework one could use to approach that psychological challenge."

Davis:  "As I say to my audiences, 'Hey...I'm just the statistician, Man!'

"I'm going to be very hard on you here, but I think the issue is how people and leaders like you are going to facilitate these difficult conversations...which will be profoundly different...and productive!  This is the leadership that quality gurus keep alluding to...and seems to be in very short supply.

"My job is to keep you all out of the 'data swamp'; however, I would be a very willing participant.  I have a saying, 'I'm the statistician, I know nothing.  You're the [leaders], you know too much.  That makes us a great team!'

"And I would love to pilot some of these types of analyses with you or other leaders -- We need to figure out what this process should be. This is potentially very exciting and could quantum leap the quality improvement movement.

"My point is that this 'language' needs to be a fundamental piece of any improvement process...and led by leaders who understand it and are now promoted into positions of leadership only if they understand it.  If this could become culturally inculcated, then the ongoing daily defensiveness reacting to data stops...PERIOD!

"The discussion will then focus, as it should, on process.

"I am seeing far too much concern about 'hurting people's feelings.' This would change that as well as have conversations leading to appropriate action.

"That's what I've been saying the last few years -- We need new conversations...and this could be a key catalyst."

World quality leader:  "Nope. I don't buy it. Yes, I am a leader and need to carry the message.  But I know you too well to let you off the hook. I'd love to see you try to lead these conversations and experiment with approaches. You're a leader, too."

Davis:  "Give me an opportunity and I will do my best to lead that conversation (and feel that we could begin by co-facilitating it). Have you fathomed the potential of this?"

That last e-mail has never been answered. Here it is, 10 years later, and several gentle follow-up e-mail reminders have been ignored. I'm still waiting for that promised exciting opportunity, but I've given up any hope. And I've had no more luck persuading any other leader to give it a try.

At his insistence, I sent the analysis with explanation to the original executive group who collected and summarized the data.  No reply.

A Serious Consequence for Healthcare

I've done several grand rounds for various groups of doctors. When I explain ANOM and "plot the dots," just about every audience has said, "This makes sense!  If data were presented this way, we would take care of it ourselves."

Doctors and hospitals especially are currently being victimized left and right with inappropriate analyses and rankings by statistical "hacks" (Dr. Deming's term). Many of these have major influence on reimbursement. One common criterion is to penalize anyone falling into the bottom quartile of performance!  Given a set of numbers, aren't 25 percent of them the bottom quartile?
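To make the bottom-quartile point concrete, here is a small simulation. The data are entirely HYPOTHETICAL: 100 providers who all have the same true success rate, so every difference in their observed results is common cause. The provider count, rate, and caseload are made-up values chosen for illustration.

```python
import random

# HYPOTHETICAL simulation: 100 providers with the SAME true success rate,
# so every difference in observed results is common cause (sampling noise).
rng = random.Random(1)
TRUE_RATE = 0.90
N_PROVIDERS = 100
CASES_EACH = 50

observed = []
for provider in range(N_PROVIDERS):
    successes = sum(rng.random() < TRUE_RATE for _ in range(CASES_EACH))
    observed.append((successes / CASES_EACH, provider))

# "Penalize the bottom quartile": sort by observed rate and flag the lowest 25%.
observed.sort()
bottom_quartile = observed[: N_PROVIDERS // 4]

print("providers flagged:", len(bottom_quartile))   # always 25
print("worst observed rate:", bottom_quartile[0][0])
print("best rate among those flagged:", bottom_quartile[-1][0])
```

Twenty-five providers get penalized no matter what, even though by construction not one of them is any different from the other 75.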

I've even seen criteria using one standard deviation (usually calculated incorrectly) to find "those outliers." And then there are the "we feel that you should be able to achieve..." arbitrary goals that treat any deviation as special cause, e.g., healthcare and the goal of zero "never events" (and punishment if you have one).

Is it any wonder that physicians are so angry?

How much variation would be reduced if ANOM could be standardized as an analysis?  A side benefit: rather than focusing on just rank, the exposure of variation in performance could result in non-defensive conversations to reduce inappropriate and unintended variation.

Many of this example's statistical principles are what Dr. Deming demonstrated in his seminars (and, yes, the red bead experiment is an ANOM!). After 30+ years of trying to teach similar things, I am still amazed at the abject cowardice (Yes, cowardice!) I see in (alleged) leaders abdicating responsibility to comprehend the liberating power of a simple understanding of variation.

Deming had zero tolerance for such ignorance (or is it arrogance?). Is that too much to ask of someone making a six- or seven-figure salary whose actions affect the "five-figure salary" folks?  

Amusing note:  My own state of Maine had a panicked headline in the newspaper a couple of weeks ago:  "Maine's ranking drops from 13th to 17th" in something or other, and the explanations and excuses started flying. I wonder on whom the blame finally fell?  Common cause or special?

In how many meetings does this nonsense go on, with its accompanying, staggering "unknown or unknowable" costs? Good news: once you "get it," you now "know" what to do.

Many people who think they "get" Dr. Deming's message don't. To deeply understand the message and its power has taken me over 30 years...and I don't do the red bead experiment.


Update: please see Lourdes Pozueta Fernández's insightful comment below where she suggests a common cause strategy to start a different conversation to gain even more insight. I clarify this in my two responses.


This is one of 10 everyday examples in Chapter 2 of my book Data Sanity. These examples are designed specifically for leaders and managers to give an overview of the awesome power of a basic understanding of variation.

Chapter 7 thoroughly covers Analysis of Means and is one of the very few available resources to do so. As I hope you surmised, ANOM can be a very powerful statistical stratification technique and has become one of my major analyses. Unfortunately, it is woefully underutilized.

Please see my LinkedIn profile for more information and clarifying downloads regarding Data Sanity.

Dr Tony Burns

Q-Skills3D Interactive learning in Continual Improvement for all employees

3 years ago

Great article, Davis. Your subgroups have a size of 1. ANOM tables show a minimum value of n=2 (https://www.qualitydigest.com/inside/quality-insider-column/analysis-experimental-data.html). How do you handle this? Did you imply you used k=10?

David Strobel

Owner and Family Physician at HELPcare Clinic

8 years ago

In my fairly long experience, based largely upon your life-changing teaching, appropriate data analysis, generally abundantly obvious for "any fool to plainly see" from a simple run chart (even devoid of more complicated control limits), can stop even the dumbest people from being even dumber if the data are merely SEEN. How much money is wasted on analysis of nothing?

Matt Hansen

Principal Business Consultant

8 years ago

I never use the ANOM (though will start to now), but I am wondering something about your use of it here. You state how the data by counties is based on 10 aspects in their healthcare system. I would presume those aspects were using continuous measurements to yield the averages used in these original rankings and that the data from those measurements followed a normal distribution. Otherwise, if the individual measurements were using discrete data or if the distributions were non-normal, then the ANOM (or any tool using averages) would probably not be the best tool (or should at least be verified using other tools measuring discrete data or non-normal data). Is that correct? If so, then what do you use for discrete and/or non-normal data? But if that's not correct, then how do you account for using the ANOM with discrete or non-normal data? Thanks!

Matt Hansen

Principal Business Consultant

8 years ago

Thanks for sharing this, Davis! If you don't mind, I'd like to re-post this article in my group https://www.dhirubhai.net/groups/4390601 which is for my https://StatStuff.com website. I had a similar debate with my former company's executive leadership where JD Power presented results showing we were last in performance among our competitors, but when I dug into the data in a similar way as you describe here (I used interval plots instead of ANOM), I proved we were not last at all.

Aaron Spearin, MBB, PMP

Innovation and Transformation Consultant: Profitability-Strategy-Design Thinking-Customer Experience-Lean Six Sigma-Quality-Reliability

8 years ago

It sounds to me the problem is less about the numbers, or how they're presented, and more about the consequences, also something Deming would have a field day with. I've met very few leaders willing to devote the brain bandwidth to understand the particulars of the numbers. Rather than just show the numbers, because there is so much gray area within that +/- 3 Sigma, what should be presented are recommendations. "Based on the data, I recommend we focus our efforts here..." The top-grading and flush the bottom approach ultimately leaves you with an oligopoly. If you always drop the bottom 25%, eventually only 3 players will survive. Nice post. Thank you.




