The biggest misconception in data visualization
Nick Desbarats
Instructor and best-selling author, data visualization and dashboard design | Taught in 15+ countries | Lecturer @ Yale, Columbia | LinkedIn Top Data Visualization Voice
This article is reposted from the Practical Reporting blog. Subscribe to the Practical Reporting email list to be notified of future articles like this (one to three per month).
tl;dr: When designing a chart, most people try to come up with the ‘best way to visualize the data’. This often results in charts that are unobvious or useless to readers, though. Instead, we should try to design charts that best answer a specific question or that best communicate a specific insight about the data, even though such charts don’t answer all questions that readers might have about the data.
Like any field, data visualization has some common misconceptions floating around in it. There’s one, though, that I think has done more damage than any other, which is the assumption that…
“When designing a chart, the goal is to find the overall best way to visualize the data.”
“WTF are you talking about?”
How can that be a misconception? Am I suggesting that your goal should be to find a bad way to visualize the data? Obviously not. What am I saying, then?
Well, have a look at the data in the table below and three potential ways of visualizing it for our company’s CEO. Which of the three graphs do you think is the best way to visualize this data, graph A, B, or C?
The answer, of course, is that any one of these graphs could be ‘the best way to visualize this data’, depending on what, specifically, we need to say about the data:
Is any one of these graphs the ‘overall best way to visualize this data’, or the ‘truest representation of this data’? How would we even go about determining that? All three—and many other possible variations—are potentially ‘the best way to visualize this data’, depending on what, specifically, we need to say about the data. None of them is the ‘overall best way to visualize this data’, or ‘the best representation of this data’. In fact, there’s never a single, ‘overall best way’ to visualize any dataset; there are only ‘best ways to say different things about the data’, such as which regions have the highest or lowest expenses, or which regions are doing a better or worse job of sticking to their budgets.
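To make this concrete, here is a minimal Python sketch of how the same dataset yields different 'best' views depending on the question. The regions and figures are hypothetical, invented for illustration (they are not the numbers from the table above), and the function names `rank_by_expenses` and `rank_by_budget_variance` are my own placeholders:

```python
# Hypothetical data, invented for illustration (not the article's table):
# (region, actual expenses, budget), in thousands of dollars.
EXPENSES = [
    ("North", 420, 400),
    ("South", 310, 350),
    ("East", 510, 520),
    ("West", 280, 220),
]


def rank_by_expenses(rows):
    """Question 1: which regions have the highest or lowest expenses?"""
    pairs = [(region, actual) for region, actual, _budget in rows]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def rank_by_budget_variance(rows):
    """Question 2: which regions are furthest over or under budget?"""
    # Percent variance: positive is over budget, negative is under budget.
    pairs = [
        (region, 100 * (actual - budget) / budget)
        for region, actual, budget in rows
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print("Highest expenses first:")
    for region, actual in rank_by_expenses(EXPENSES):
        print(f"  {region:<6}{actual:>5}")
    print("Most over budget first:")
    for region, variance in rank_by_budget_variance(EXPENSES):
        print(f"  {region:<6}{variance:>+7.1f}%")
```

With these made-up numbers, West comes last when ranked by total expenses but first when ranked by budget overrun. A chart sorted for the first question would bury the answer to the second, and vice versa.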
That’s the harsh reality of data visualization that few people seem to realize: Charts never ‘show the data’, they always just say a few specific things about the data. Different ways of visualizing the same dataset make different insights about that data more obvious, less obvious, and not visible at all. Yes, it would be awesome if we could make charts that ‘just show the data’, i.e., that make all possible insights obvious or that answer all possible questions that readers might have about the data, but those charts don’t exist.
“Why not?”
Well, if we try to create a chart that makes all possible insights obvious or that answers all possible questions that readers might have about the data, we’ll always end up with a ‘spaghetti chart’:
Even this doesn’t answer every question that the CEO might have about this data, though. For example, if the CEO wanted to quickly see what fraction of total expenses each region represents, or how these expenses compare to those of the previous year, we’d need to add even more clutter. Indeed, we’d never stop adding clutter to our chart in a quest to ‘just show the data’ because there’s always a virtually unlimited number of things that we could say about any dataset.
“Why don’t we just use a table, then?”
Well, tables do ‘just show the data’ without saying anything about the data. Indeed, tables don’t make any insights obvious at all. For example, based on the table alone in the scenario above, is it obvious which regions are doing a better or worse job of sticking to their budget? Or what fraction of total expenses each region represents? Sure, the reader can get those insights, but they’re going to have to work for them and possibly do some calculations, and they’re far less likely to notice interesting or unexpected patterns or relationships in a table of numbers than in a graph.
Tables are also many times slower to consume than graphs and require a lot more cognitive effort to process, which substantially increases the risk that readers won’t get the insights they need from a table—or will just skip over it altogether—because it requires too much cognitive effort to consume. In most situations, then, saying a few things about the data (i.e., showing a graph) is far more useful than saying nothing about the data (i.e., showing a table).
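As a rough illustration of the work a table leaves to the reader, here is the arithmetic needed just to answer the 'fraction of total expenses' question. This is a sketch in Python; the figures are hypothetical, invented for illustration, and `share_of_total` is a placeholder name:

```python
# Hypothetical table rows: region -> actual expenses (thousands of dollars).
# Invented for illustration; not the figures from the article's table.
expenses = {"North": 420, "South": 310, "East": 510, "West": 280}


def share_of_total(values):
    """Return each region's expenses as a fraction of the overall total."""
    total = sum(values.values())
    return {region: amount / total for region, amount in values.items()}


shares = share_of_total(expenses)

# Print the regions from largest share to smallest.
for region, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region:<6}{share:6.1%}")
```

A reader looking at the raw table would have to do the equivalent of this in their head for every region, which is exactly the effort that a chart designed for that question removes.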
“So, what does all this mean when it comes to actually designing charts?”
The next time you sit down to create a new chart, instead of asking yourself, “What’s the best way to visualize this data?”, ask yourself, “Do I know why I’m creating this chart?”, i.e., do you know what specific insight or answer you need the chart to communicate about the data? If the answer to that question is “no” (which it will be surprisingly often), you need to step away from the charting software and go find out. Perhaps you’ll need to do some exploratory analysis or speak more with the target audience, but, one way or another, you need to figure out what, specifically, your chart needs to say about the data. If you don’t, many of your design choices (chart type, color palette, etc.) will be quasi-random guesses, and the chances that the audience will get what they need from your chart will be low.
Once you’ve figured out what, specifically, your chart needs to say about the data, the next step is to accept that whatever design you come up with is going to communicate that specific insight or answer that specific question clearly (hopefully, anyway…), but there will be many other potentially interesting questions and insights that won’t be obvious in your chart, or possibly not visible at all. Not only is that O.K., it’s the only way it can work (unless you give your audience a spaghetti chart).
What happens if, try as you might, you can’t find out specifically why the audience needs to see a particular dataset or needs to see a chart? For example, perhaps the CEO has simply asked for “expenses for each department” and you don’t have the opportunity to ask them why they need that information because they’re too busy to meet with you. These are unpleasant situations to be in, but they do happen. In my Practical Charts course, we discuss strategies for increasing the odds that we end up giving the audience something that will be at least somewhat useful to them, but these strategies will have to be a topic for a future article since this one’s already longer than I’d like it to be. The bottom line, though, is that our chart probably won’t be as useful to the audience as it could be if we design it without knowing specifically what it needs to communicate about the data.
“So, are you also saying that…”
No. I want to be clear about a few things that I’m not saying:
Outside of obviously bad ways such as these, though, there are always many ‘best ways’ to visualize any dataset.
“Umm, this seems kind of obvious…”
The fact that there isn’t a single ‘overall best’ way to visualize a given dataset may seem obvious to some when it’s spelled out like this, but getting out of the mindset of ‘trying to find the best way to visualize this data’ and into the mindset of ‘designing the chart that best communicates a specific insight or best answers a specific question’ requires a fundamental shift in thinking that relatively few people seem to have made. I regularly hear even well-known experts discussing which chart design ‘best represents the data’ without even mentioning what, exactly, the chart is supposed to do. As I see it, though, that’s like arguing about whether a hammer or a screwdriver is ‘the best tool’ without ever mentioning if we need to pound in a nail or tighten a screw.
“But is this really the biggest misconception in data visualization?”
I think so, yes…
Let me know your thoughts in the comments, though. Do you have a different take on this idea?
By the way...
If you’re interested in attending my Practical Charts or Practical Dashboards course, here’s a list of my upcoming open-registration workshops.
Director of Policy | City of Cincinnati Councilmember Anna Albi
3 years ago: Very insightful, thanks for sharing!
compliance reporting @ Natixis | spilledgraphics.com
3 years ago: Nick Desbarats, first of all, kudos! I enjoyed this line (and I think it should have been bolded): "...we should try to design charts that best answer a specific question or that best communicate a specific insight about the data, even though such charts don’t answer all questions that readers might have about the data." Regarding this line, "Yes, it would be awesome if we could make charts that ‘just show the data’"... wouldn't it be too boring, though? You'll probably agree. With regard to this one: "In most situations, then, saying a few things about the data (i.e., showing a graph) is far more useful than saying nothing about the data (i.e., showing a table)." Amen! Could this fall into the category of "Less is More", sort of? I say this because, for example, with dashboards we can easily fall into the trap of producing flashy, eye-candy, poppy visuals, thinking they are showing the data, and at some points they are, but very, very ineffectively. From my readings on visual perception, our minds trick us into thinking that these visuals are relevant, but they are not. They are attention-grabbing, but once you start analyzing them or decomposing them, you realize they tell you very little or, at times, nothing. I could continue adding more parts that I liked from this article, but I will leave these two for now. Hopefully lots of business leaders, CFOs, and CEOs hungry for analytics take a look at this article of yours. P.S. I will find an example of a box plot that beats a strip plot.
Supply Chain and Financial Analytics | Microsoft Certified PL-300
3 years ago: I would find a waterfall chart or a simple table (with conditional formatting) much more readable.