"Hundreds of Studies Show..." Yeah, right!
Regularly, on various media, you hear experts say "there are hundreds of studies that show..." how important music is for kids, that minimum wage increases have little effect on employment, that self-criticism is neurologically destructive, that creatine supplements affect athletic performance, or that electromagnetic fields have no effect on human cells... This plethora of studies is supposed to quell any doubts you may have about the assertion they support. And hundreds of interviews have shown that it works: this glib statement usually ends the line of questioning.
Ask the follow-up question!
Interviewers rarely ask "Could you please describe one of these studies?" If you independently research the literature on any of these subjects, you may find five or six studies rather than hundreds, often with one or more of the following characteristics:
- They are conducted by organizations with a stake in the results, like studies of the side effects of medications by labs funded by the pharmaceutical companies that make these medications.
- They are based on self-selected, tiny samples from large populations. Individuals who volunteer to participate in surveys are not a representative sample of the population. They may have more time available than others, or may be favorably disposed to the organization doing the survey.
- The data are subjective answers, like ratings on a scale of 1 to 10, as opposed to objective measurements, like weights or dimensions.
- The researchers' narrow conclusions have grown into bold, blanket statements through multiple retellings. A difference rated "at a 5% level of significance" has a 1 in 20 chance of being a meaningless fluke, as the simulation below illustrates. By the first retelling, it has become just a "significant difference"; by the third, correlation has been turned into causation, even when the study's author warned against the confusion.
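To see why a 5% significance level guarantees a steady supply of flukes, you can simulate it. Below is a minimal sketch, in Python with numpy and scipy (my choice of tools; the scenario and all numbers are invented), that runs thousands of comparisons between two groups drawn from the same population, so that no real difference exists:

```python
# Sketch: how often a test at the 5% significance level "finds" a
# difference when none exists. All data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies = 10_000
false_positives = 0
for _ in range(n_studies):
    # Both groups come from the SAME distribution: any "effect" is luck.
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"Studies finding a 'significant' difference: {false_positives / n_studies:.1%}")
```

The printed fraction hovers around 5%: of every twenty studies of a nonexistent effect, one, on average, reports a "significant difference".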
John Bohannon's dark chocolate hoax showed how a study deliberately designed with obvious flaws found its way into publication as a "refereed" paper and its conclusion into the international press, simplified as "dark chocolate makes you lose weight."
Hundreds-of-studies talk should be a red flag, inviting skepticism rather than acquiescence. It's not the number of studies that matters but their quality. Those who actually back up their assertions with data either conduct their own studies or quote specific ones conducted by institutions with a reputation for rigor.
Check out the studies
As a layman, how can you tell whether a study is credible science, or part of the pervasive internet quackery that just confirms beliefs?
Following are two examples of studies:
- One from medicine, about the effect of pets on coronary heart patients, which I find credible.
- Another from social sciences, about the "Hawthorne Effect" in Manufacturing, which I do not find convincing, even though it has been taught for decades at Harvard Business School.
Pets And Coronary Heart Patients
Today, the American Heart Association (AHA) asserts that owning a pet may protect you from heart disease. As usual, the popularized version overstates and distorts the actual findings, which were about chest pains and heart attacks, not all diseases of the heart. It also equates pets with dogs, and assumes that the benefits to heart patients come from the exercise they get from walking their dogs, when the actual study found that pets other than dogs had the same effect, even though cats and goldfish don't make their owners exercise.
A few years earlier, in a radio interview, a researcher explained that, in studying the survival of patients admitted to hospitals for chest pains or heart attacks, they had included possession of a pet among the predictors, just because they easily could, without expecting that it would make much difference. To their great surprise, it did, and it didn't matter whether the pet was a dog or another species.
In a few clicks, Google takes you to a paper entitled "Animal companions and one-year survival of patients after discharge from a coronary care unit," published in 1980 by an arm of the National Institutes of Health, the largest biomedical research organization in the world, founded in 1887 and part of the US government. The institution that sponsored the research is not just a website put up by a consortium of pet food manufacturers.
The study is not recent, but 1980 was not the dark ages in data science. Most of the methods available today already existed, albeit in the form of mathematical theories, rather than the software packages we can now run on laptops. In 1980, you could do multivariate discriminant analysis, but it was not common and it entailed writing Fortran code on a mainframe computer.
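For perspective, that same kind of analysis is now a few lines of code. Below is a minimal sketch using Python and scikit-learn on made-up patient data (the predictors, values, and outcomes are all invented for illustration, not taken from the study):

```python
# Sketch: a two-class linear discriminant analysis, which in 1980 meant
# custom Fortran on a mainframe. All data here are synthetic stand-ins.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = np.column_stack([
    rng.normal(60, 10, size=92),     # age
    rng.normal(5, 2, size=92),       # disease severity score
    rng.integers(0, 2, size=92),     # pet ownership (0 = no, 1 = yes)
])
y = rng.integers(0, 2, size=92)      # one-year survival, random here

lda = LinearDiscriminantAnalysis().fit(X, y)
print("Discriminant weights per predictor:", lda.coef_)
```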
The paper is 6 pages of small print, telling you which specific diseases the 92 patients had -- Myocardial Infarction (MI) or Angina Pectoris (AP) -- and that, after 1 year, 28% of the 39 patients with no pets had died, but only 6% of the 53 pet owners. It is a large difference on a fairly small sample, and they describe it as significant at the .02 level.
Only 10 of the pet owners had animals other than dogs, and they all survived. The researchers also looked for other differences between pet owners and non-pet owners but couldn't find any that could explain the former's higher survival rate.
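You can even check the significance claim yourself. Here is a sketch using death counts back-calculated from the quoted percentages (11 of 39 non-owners, 3 of 53 owners -- my rounding, not figures quoted from the paper) and Fisher's exact test, a standard choice for a small 2x2 table:

```python
# Sketch: testing the survival difference on a 2x2 table whose counts
# are back-calculated from the article's percentages (28% of 39, 6% of 53).
from scipy.stats import fisher_exact

table = [[11, 39 - 11],   # no pet: died, survived
         [3, 53 - 3]]     # pet:    died, survived
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio: {odds_ratio:.1f}, p-value: {p_value:.4f}")
```

The exact numbers will differ from the paper's, since the test and the counts are my reconstruction, but comparing the printed p-value with the reported .02 level is a quick sanity check.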
I see no red flags. It looks like a solid piece of research work, done objectively by qualified people, written up soberly, and with cautious inferences. If I survived a heart attack, my next step would be to adopt a Jack Russell Terrier.
Factory Workers, Lighting, And Productivity
In social sciences, it is a good idea to dig in and check the studies. I heard about the Hawthorne effect from my first boss in Manufacturing. "They experimented with making the lights brighter over an assembly line, and productivity went up," he explained. "Then they dimmed the lights, and productivity went up again. It wasn't the level of light that made a difference, but the change and management attention." He had learned this at Harvard Business School.
It turns out that this belief about all workers worldwide is based on observations of a group of five young women assembling relays at an AT&T plant near Chicago 90 years ago, separated from their colleagues in a special test room.
Data recorded in the Hawthorne test room
The timing matters, for two reasons:
- While human nature has not changed, workplace culture has, particularly the status and education of women. Five young women workers of 2015 might react to the same instructions quite differently than their colleagues of 1925 did, not to mention male workers.
- The discipline of statistical design of experiments didn't exist yet. Ronald Fisher's seminal work on the subject had yet to be published.
One look at the relay assembly room at the Hawthorne plant in 1925 reveals obvious productivity improvement opportunities other than lighting, such as integrating this operation with others in a flow line. Even in 1925, this was not a new concept, and these relays were made by the millions per year.
Relay Assembly at Hawthorne in 1925
This picture also makes you wonder why the measurements were taken only on women when the workers in the actual relay assembly room were a mixed group of men and women.
According to Richard Gillespie, the reason illumination was studied was pressure from electrical manufacturers and utilities that were keen to show artificial lighting as better for production than daylight. At least in the US, they got their way, as the iconic sawtooth factory roof with its windows gave way to flat roofs on windowless buildings, with only artificial light on the shop floor.
My guess is that illumination may have more of an impact on quality than productivity. A well-lighted workplace should allow operators to see products better and notice, for example, color discrepancies they might otherwise miss. And overly bright or overly dim lighting may cause mistake-inducing fatigue at the end of the shift.
Replicate the experiments
The best way to validate a theory is to replicate the experiments that back it up, and it should always be technically possible, if not practically feasible. You may not be able to replicate a study of heart patient survival, but, if you find the evidence of the existence of the Hawthorne effect to be less than compelling, before investing in a new lighting system, you can run your own illumination studies, and it shouldn't take years. If there are Six Sigma Black Belts in your organization, they are supposed to be trained in statistical design of experiments and should be able to do it.
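For the simplest possible design -- one factor, two lighting levels, assignments randomized across shifts -- the analysis itself fits in a few lines. A sketch on invented productivity figures (replace them with your own measurements):

```python
# Sketch: analyzing a minimal one-factor, two-level illumination experiment.
# Units per shift are invented placeholders, not real data.
from scipy import stats

bright = [412, 398, 425, 407, 419, 401]   # output of shifts under bright light
dim    = [405, 392, 410, 399, 408, 403]   # output of shifts under dim light

t_stat, p_value = stats.ttest_ind(bright, dim)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A large p-value would mean no detectable lighting effect in this sample --
# the hard part of such an experiment is shop-floor discipline, not the math.
```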