Lying Effectively With Data Visualization: Part Two
A few weeks back, I began a series on How to Lie with Data Visualization.
Here's the second installment.
How do you lie with a data visualization?
One useful strategy is to (A) include data that isn't relevant, and (B) add lines and shading to keep the audience's eyes where you want them, rather than letting them see anything in the data that might erode your argument. Here's a data visualization from a Harvard meta-study that chose to conclude that discrimination is NOT going down.
The shaded funnel represents studies that the researchers decided "counted" for purposes of their study. If you read the study, the process all sounds very neutral and scientific. But the wording of the study results and the visual presentation of those results have some very interesting characteristics. (I'll deal with the wording elsewhere.)
The area to the left of where the shaded funnel begins is all study data that the researchers chose to put on the graph, despite the fact that it wasn't used in the analysis. However, other excluded studies, the ones that do NOT fall in that left-hand region, do NOT appear on the graph at all. Interesting, yes? (See that cute little 1976-ish study that found there was NO discrimination. ROFL.)
So the researchers have placed this big area off to the left that serves mainly as an open canvas to display their selected trend line. Then there's a ten-year gap with nothing, then three large studies they chose to keep, then another ten-year gap with nothing, and then the studies start in earnest.
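To make those two tricks concrete, here is a minimal sketch in Python (matplotlib and numpy assumed) that mimics the general layout with entirely made-up numbers. The years, effect sizes, and the edges of the shaded region are invented for illustration and are NOT taken from the actual meta-study.

```python
# Hypothetical illustration of (A) keeping irrelevant early points on the
# canvas and (B) adding shading and a flat reference line to anchor the eye.
# All numbers below are invented, not the meta-study's data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Fake "study" estimates: a sparse early cluster, a long gap, then a
# dense run of recent studies with a clearly declining effect size.
early_years = np.array([1972, 1975, 1978])
recent_years = np.arange(1989, 2016)
years = np.concatenate([early_years, recent_years])
effect = 2.0 - 0.03 * (years - 1970) + rng.normal(0, 0.15, years.size)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Left panel: the raw scatter. The downward drift is easy to see.
ax1.scatter(years, effect, color="black", s=20)
ax1.set_title("Raw study estimates")
ax1.set_xlabel("Year")
ax1.set_ylabel("Estimated effect (arbitrary units)")

# Right panel: the same points, plus a shaded "inclusion" wedge and a
# flat reference line that pull the eye toward "no change".
ax2.scatter(years, effect, color="grey", s=20)
ax2.fill_between([1989, 2015], [1.0, 0.6], [2.0, 2.4],
                 color="steelblue", alpha=0.3,
                 label="shaded 'inclusion' region (hypothetical)")
ax2.axhline(effect[3:].mean(), color="red", lw=2,
            label="flat reference line")
ax2.set_title("Same data, with shading and a flat line")
ax2.set_xlabel("Year")
ax2.legend(loc="lower left", fontsize=8)

plt.tight_layout()
plt.show()
```

Nothing in the data changes between the two panels; only the added ink changes, and it is the added ink that suggests "no trend".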
If you put your hand over the left half of the graph, you, like most people, will see the real trend immediately, even with the funnel. If it weren't for the shaded funnel the researchers added, everyone would see this...
And from there, almost everyone will make the VERY short hop to here.
And here...
And here...
Now, I am NOT claiming that "discrimination is dropping precipitously", because I don't think that is exactly what is being measured by these studies. However, it is clear that the thing these studies CALL discrimination is dropping significantly.
However, that apparent drop may be as much an artifact as the study results themselves. In particular, the 2001-2002 study that I have marked in green below has some serious semantic and methodological flaws that I'll be exploring in my "name game" series.
For now, though, it should be enough that anyone can see why the researchers, in order to report "no drop in discrimination", had to do what they did: they had to choose to start with three studies before a huge gap, rather than just doing a "twenty-year study".
Because otherwise, they would have been forced to come to the reverse conclusion.
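To illustrate that last point, here is a small sketch (again with invented numbers, not the study's data) showing how a least-squares trend fitted across the full span, early cluster, long gap and all, can come out nearly flat, while the same fit restricted to a recent twenty-year window shows a clear decline.

```python
# Hypothetical demonstration: which window you fit can flatten or reveal
# the trend. All effect sizes below are made up for illustration.
import numpy as np

rng = np.random.default_rng(1)

early_years = np.array([1972, 1975, 1978])
recent_years = np.arange(1996, 2016)   # a hypothetical "twenty-year study" window
years = np.concatenate([early_years, recent_years])

# Invented effect sizes: a low early cluster, then a clearly declining recent run.
effect = np.where(
    years < 1990,
    1.1 + rng.normal(0, 0.05, years.size),
    2.0 - 0.045 * (years - 1990) + rng.normal(0, 0.05, years.size),
)

# Ordinary least-squares slope over the full span vs. the recent window only.
slope_all, _ = np.polyfit(years, effect, 1)
recent = years >= 1996
slope_recent, _ = np.polyfit(years[recent], effect[recent], 1)

print(f"slope fitted to all studies:       {slope_all:+.4f} per year")
print(f"slope fitted to 1996 onwards only: {slope_recent:+.4f} per year")
```

On these made-up numbers the full-span slope comes out roughly an order of magnitude flatter than the recent-only slope; the choice of which studies anchor the fit is doing the argumentative work.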