How to lie with visualization
Stephen Redmond
AI Visionary | Head of Data Analytics and AI at BearingPoint Ireland. Delivering real business value to our clients by harnessing the transformative power of data and AI.
OK, so I don't really want to teach you how to lie with visualization! I want to discuss those common mistakes that are made in Data Viz, whether accidental or on purpose! Knowing this will help you to protect yourself when looking at charts on TV, in newspapers, and on social media, as well as help you to not make those mistakes yourself.
Let's think for a minute about what the primary goal is (or should be!) with data visualization.
It's pretty simple:
- we gather some data together from one or several sources;
- we “munge” it together in an appropriate way;
- we pick some visual variables – length, color, shapes, direction - in order to represent the data visually;
- with the goal of getting it into a human’s brain so that they can quickly understand what they are seeing.
And that's it. That's the primary goal!
But how much do we really know about these humans that we are talking about?
Pause for a moment and look at the picture above – what do you see? Some people have told me that they see a whale, or a shark, or even a submarine. Others have told me that they see a cat sitting on a whale, smoking a cigarette! But what do you see?
Of course, we are not looking at a shark or a submarine! It is just a collection of water droplets. Psychologists know all about this tendency for people to see “castles in the cloud”, men in the moon, or faces in toast. They have a name for it: “pareidolia”, our tendency to look for patterns in everything we encounter. We match those patterns to our experience and memories, and our brains come up with a suggestion of what we are looking for. We are, in fact, a pattern-matching animal!
Many of us will have come across this paragraph above before as it has been widespread on social media for many years. If you haven’t seen it before, notice that as you read through it, you suddenly get the pattern “rule” and start to understand what you are looking for.
We start with "I cdnoult blveiee" – "what???!!!", but soon we realize that it is smart research from the University of Cambridge and that it seems spelling isn’t important.
Of course, part of this is true, and it is probably why we can find it hard to proofread our own work – we just skip over the misspelled words because their shape fits in the sentence. But it is not always so, and the “rules” in people’s brains are not always 100% correct. We may sometimes struggle to understand items that seem to comply with the rule but are too far away from our experience.
Can you read this sentence above? It fulfils the rules that are stated by the previous paragraph, but seems to be very difficult to parse. It would be quite a challenge to work out that baseball players performing similarly absolutely deserve comparable treatment!
So, actually the order of the letters within the words does matter and if they diverge too much from the pattern that we know, then it becomes impossible to understand. So, how do we derive rules from all of this?
We all know the rules of visualization that are evangelized by many DataViz gurus, don’t we:
Bar charts and line charts are good. We should use more of these because people understand them better. And, of course, pie charts are bad and should never be used!
Interestingly, the idea that people understand bars and lines has not always been so. When William Playfair first published his Commercial and Political Atlas in 1786, he needed to have a segment on how to read line charts as they were not something that had been used before for such representations. And while bar charts seem to be the perfect choice in many situations, Playfair only began using one because he did not have the full set of data for Scotland! Bar charts are a powerful choice, because we have a great ability to match the length of each bar against each of the others, but with that great power comes a great ability to be misused!
My own published research, along with many others, shows that the "rule" about pie charts isn’t really true either, and pie charts can be used freely in their correct context: part-to-whole comparison.
So, if we can't rely on these "rules" to fulfil our primary goal, what can we rely on?
A few years ago, on my QlikTips blog, I published my Fundamental Rules of Data Visualization - “Redmond’s Rules” - and they are as simple as:
- Make sure that you pick a correct visual variable to represent the data – it could be a line, a bar, a box, a segment, a color – whatever works. Understand why it works.
- Make sure that you provide the context, what Alberto Cairo, in his book How Charts Lie, calls "Scaffolding", all of the annotations, labels and titles, and even side text to help explain what the user is looking at.
- Finally, make sure that you are showing the user something that will actually mean something to them. If they are thinking “so what” then you have got it wrong!
Interestingly, it seems that most of the rules that we hear from other sources focus on the first two of these, and many seem to discard the last rule – which is actually the most important! You can spend a whole bunch of time developing the most perfect representations of the data and provide all the right labels and annotations, but if it doesn’t provide a good business answer, your viewers won’t care!
Let’s have a look now at some examples where people have broken even the simple rules in order to use our brains against us.
This first one is a classic example from the Quaker Oats company when they were advertising how well their product does in quickly reducing cholesterol.
At first, it looks like it does great, but as we look more closely, what we see is that the y-axis here is only representing values between 196 and 210. This should always be something that we look out for and cry foul if we see it!
They are using our great ability to pattern match the lengths of the bars against each other to make it look like there is a steep fall-off. The chart on the right shows how the data looks when we correct the axis – not really a steep fall off at all, and not something that you would run an advertising campaign about!
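To put a number on how much a truncated axis can exaggerate a difference, here is a minimal Python sketch. The two values are made up for illustration (they are not the figures from the original advert), and `drawn_height_ratio` is a hypothetical helper, not part of any charting library.

```python
def drawn_height_ratio(first, last, baseline=0.0):
    """Ratio of the two drawn bar heights when the y-axis starts at `baseline`."""
    if baseline >= min(first, last):
        raise ValueError("baseline must be below both values")
    return (first - baseline) / (last - baseline)


# Illustrative values only, not the figures from the original chart.
start, end = 209, 199

honest = drawn_height_ratio(start, end)          # y-axis starts at 0
truncated = drawn_height_ratio(start, end, 196)  # y-axis starts at 196

print(f"zero baseline: first bar is {honest:.2f}x the height of the last")
print(f"196 baseline:  first bar is {truncated:.2f}x the height of the last")
```

With these made-up numbers, a fall of less than 5% is drawn as a bar more than four times taller than its neighbor once the axis starts at 196.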
Here we see that we are only shown six months of data. In fairness to the designer, they have called this out in the sub-text. Although, do you always look at the sub-text?
If we have learned anything so far, we will also note that the y-axis does not start at zero. This is not always as much of a crime in line charts, though we should always be wary that it might, as in this case, enhance the steepness of a climbing or falling value.
The reason for only showing six months becomes clear when we see the whole story. Now, for this company, it may be OK for them to only show the data like this, but if we can’t see a full picture, and the axes look a little wrong, always ask questions!
One of the things that can often happen accidentally is the juxtaposition of two charts of the same type but with different scales. Our brain will sometimes just flow through and take the whole picture in as one, and miss that the other chart is on a different scale.
In this case though, because the measure is the same in both charts, putting them side-by-side like this, with different axes, would seem to be less than accidental. Brewed coffee has significantly more caffeine in it than black tea!
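One way to avoid this when two charts share a measure is to give them a single, common axis. The sketch below is only an illustration under assumptions: the caffeine figures are made up, `shared_axis_max` is a hypothetical helper, and the text bars stand in for whatever charting tool you actually use.

```python
def shared_axis_max(*charts, headroom=0.1):
    """One y-axis maximum for all charts: the largest value plus some headroom."""
    largest = max(value for chart in charts for value in chart.values())
    return largest * (1 + headroom)


# Illustrative values only (mg of caffeine per cup), not the article's data.
coffee = {"Brewed coffee": 95, "Instant coffee": 60}
tea = {"Black tea": 45, "Green tea": 30}

axis_max = shared_axis_max(coffee, tea)

for chart in (coffee, tea):
    for label, value in chart.items():
        # Bar length as a share of the common axis, drawn 40 characters wide.
        bar = "#" * round(40 * value / axis_max)
        print(f"{label:<16}{bar} {value}")
    print()
```

Because every bar is scaled against the same maximum, the tea bars come out visibly shorter than the coffee bars, which is what the numbers actually say.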
I mentioned before that pie charts are a good choice for presenting part-to-whole comparisons. But we really need to understand what is a part and what is the whole!
This is a recreation of a chart that was presented by Fox News in 2012. We can see, of course, that the values presented as percentages do not add up to 100%. This chart is used often as a primary example of this flaw, and it always makes my brain hurt a little!
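A quick sanity check before reaching for a pie chart is to confirm that the parts really do sum to the whole. Here is a minimal Python sketch; the function name `is_part_to_whole`, the rounding tolerance, and both data sets are assumptions made for illustration.

```python
import math


def is_part_to_whole(percentages, tolerance=0.5):
    """True when the shares add up to 100% within a small rounding tolerance."""
    return math.isclose(sum(percentages), 100.0, abs_tol=tolerance)


# Made-up survey where respondents could pick more than one option,
# so the shares overlap and do not form a single whole.
survey = {"Option A": 70, "Option B": 63, "Option C": 60}

# Made-up budget split where each unit is counted exactly once.
budget = {"Rent": 45.0, "Food": 30.0, "Other": 25.0}

for name, data in (("survey", survey), ("budget", budget)):
    verdict = "pie chart is fine" if is_part_to_whole(data.values()) else "not part-to-whole"
    print(f"{name}: total {sum(data.values()):g}% -> {verdict}")
```

When the check fails, as it does for the overlapping survey answers, a bar chart of the individual percentages is usually the safer choice.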
To close, a quick refresh on Redmond’s Rules:
- Use a good visual variable to represent your data - a bar chart, a line chart, a pie even, whatever one suits the data and the story that you are telling.
- Apply all the right labels, axes and additional text – Alberto Cairo's "scaffolding".
- Make sure that you are delivering insight that matters to the business user!
Remember, the last one is the most important – don’t have your users saying "so what!"
For further reading on this topic, I would always recommend Alberto Cairo’s book How Charts Lie, which goes into more depth than I could today – including how not to use a Sharpie on a hurricane map! You can read my review of the book here.
This article is an excerpt from my presentation at QlikWorld 2021. If you are interested in hearing me speak to these charts, and several other examples, that presentation is now available on demand - search for session #486257.
Data/Business Intelligence Technical Consultant
(3 years ago) Great article, and thank you Stephen for sharing. In the case of the two bar charts representing caffeine per cup in coffee and tea, you describe how comparing the charts side by side is very misleading if one doesn't look at the numbers on the y-axis of each chart. What is the best way to represent such charts side by side?
Group Head Content Studio - Irish Times Group
(3 years ago) Great article Stephen!