
Beyond Bar Charts and Pie Charts

Ismael Chang Ghalimi

CEO @ STOIC

Published: December 26, 2019

As the saying goes, "a picture is worth a thousand words."

But drawing a good picture can be harder than telling a good story. And when the story is about data, drawing a good chart can prove to be quite a challenge. Charts are everywhere, yet most are poorly designed, if not flat-out wrong. And while we are taught to draw charts as early as elementary school, we are rarely told how to design them well.

This sorry state of affairs is regularly chronicled in articles denouncing the evils of prevalent yet poorly understood charts like pie charts. In these diatribes, we are told to use bar charts instead, often without any proper explanation of the unique qualities of pie charts.

In this article, I will outline the main benefits and limitations of both charts.

Then, I will introduce a new kind of chart that combines the best of both.

To support my argument, I will use a very simple dataset: a count of functions by module found in the STOIC Intelligent Spreadsheet. If we plot this dataset on a bar chart, we get something like this:

This bar chart (also known as a frequency chart) is really good at one thing: it makes it easy to compare multiple values. For example, even without looking at the counts displayed at the top of the chart, we can see that Statistics and Probabilities have roughly the same number of functions. Or that Date & Time and Trigonometry have exactly the same number of functions, while Math has fewer of them.

These comparisons are made possible (and easy) because all bars share a common baseline (the horizontal axis), and because the human brain is really good at making such comparisons whenever a common baseline is available (Cf. Graphical Perception).

What this chart is not good at is helping us evaluate the contribution of each value to the whole. For example, how many functions are there in total, and what fraction of this total can be found in the Math module? Or what is the smallest number of modules that would give us at least half of all functions? In other words, a bar chart is great for comparing absolute values, but ineffective for comparing relative values.
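These part-to-whole questions are easy to answer numerically, which shows what a bar chart leaves the reader to work out mentally. The following is a minimal sketch only: the counts are hypothetical placeholders (the real figures appear only on the original charts), and the helper name `modules_to_reach` is introduced here for illustration.

```python
from itertools import accumulate

# Hypothetical counts per module: placeholders, NOT the real STOIC figures.
counts = {
    "Statistics": 40,
    "Probabilities": 38,
    "Text": 30,
    "Date & Time": 20,
    "Trigonometry": 20,
    "Math": 12,
}

# Total number of functions, and each module's share of that total.
total = sum(counts.values())
shares = {module: n / total for module, n in counts.items()}


def modules_to_reach(counts, fraction):
    """Smallest number of modules whose counts sum to at least `fraction` of the total."""
    target = fraction * sum(counts.values())
    ordered = sorted(counts.values(), reverse=True)  # take the largest modules first
    for k, running in enumerate(accumulate(ordered), start=1):
        if running >= target:
            return k
    return len(ordered)


print(total)                          # total number of functions
print(f"{shares['Math']:.1%}")        # share of the Math module
print(modules_to_reach(counts, 0.5))  # modules needed to cover half of the total
```

None of these three answers can be read directly off a bar chart, because each one depends on the sum of all the bars.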

This is why the pie chart was invented.

Here, we have a half-donut chart (also known as arc chart), because it looks a lot better than a pie chart. In fact, it is not only better looking, it is also more accurate from a theoretical standpoint, but we will leave that for future articles (read this if you cannot wait).

Thanks to the 6 inner ticks that visualize the 0%, 20%, 40%, 60%, 80%, and 100% thresholds, we can easily guess that Math accounts for about 5% of all functions, and that Statistics, Probabilities, and Text almost account for half of all functions, but not quite.

What this chart is not good at is helping us compare values with each other: for example, we can see that Date & Time, Trigonometry, and Math have roughly the same number of functions, but we cannot tell whether they are exactly the same. In other words, a pie chart is great for comparing relative values, but ineffective for comparing absolute values.

Is there a chart that could do both? As you have guessed already, there is one, but before we introduce it, we should present a rarely-used alternative to the pie chart, and before we even do that, we should present a more common alternative: the stacked bar chart.

Clearly, the stacked bar chart is nothing more than the Cartesian equivalent of a donut chart. In other words, while the donut chart uses a polar coordinate system, the stacked bar chart uses a Cartesian coordinate system, but the visualized information is exactly the same.
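To make this equivalence concrete, here is a minimal sketch, assuming hypothetical counts and a helper named `stacked_segments` that is not part of any library. The same proportional split yields either the segments of a stacked bar (a length) or the arcs of a half-donut (an angle). Only the extent being divided changes.

```python
def stacked_segments(values, extent):
    """Split `extent` (a bar height or an arc angle) proportionally to `values`.

    Returns one (start, end) pair per value, laid end to end.
    """
    total = sum(values)
    segments, start = [], 0.0
    for v in values:
        end = start + extent * v / total
        segments.append((start, end))
        start = end
    return segments


values = [40, 38, 30, 20, 20, 12]  # hypothetical counts, not the real figures

bar = stacked_segments(values, 1.0)    # Cartesian: fractions of the bar height
arc = stacked_segments(values, 180.0)  # polar: degrees along a half-donut

for (b0, b1), (a0, a1) in zip(bar, arc):
    print(f"bar {b0:.3f}-{b1:.3f}   arc {a0:.1f}-{a1:.1f} degrees")
```

Each arc is simply the matching bar segment multiplied by 180, which is why both charts carry the same information.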

Unfortunately, the stacked bar chart works well only with a limited number of values, and starts deteriorating once values become small, because labels and values become difficult to display. In the previous example, we had to use smaller font sizes to address the issue.

Alternatively, we could offset labels and values with connectors, like we did earlier on the half-donut chart, but these would make the chart more difficult to read. Another option would be to display the stack horizontally instead of vertically, but this would work only with short labels, and some natural languages like German tend to have long ones.

Bottom line: the stacked bar chart is not an attractive option for the dataset at hand. But a less common yet more effective alternative might come to mind: the waterfall chart.

A waterfall chart is constructed from a vertical stacked bar chart, by offsetting the bars alongside the horizontal axis. In order to make the construction even more explicit, an All sum bar can be added to the left or to the right. When added to the left, the bars follow a decreasing curve, and we get something that actually looks like a waterfall. When added to the right (as in the example above), the bars follow an increasing curve.
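The construction can be sketched in a few lines, assuming hypothetical counts and a helper named `waterfall_bars` introduced here for illustration. Each bar starts where the previous one ended, and the All bar spans from zero to the total.

```python
from itertools import accumulate


def waterfall_bars(values):
    """Return a (bottom, top) pair per value, followed by the All sum bar on the right."""
    tops = list(accumulate(values))  # running sums give the top of each bar
    bottoms = [0] + tops[:-1]        # each bar starts where the previous one ended
    total = tops[-1] if tops else 0
    return list(zip(bottoms, tops)) + [(0, total)]


values = [40, 38, 30, 20, 20, 12]  # hypothetical counts, not the real figures
bars = waterfall_bars(values)
for bottom, top in bars:
    print(bottom, top)
```

Placing the All bar on the left instead only changes where the final `(0, total)` pair sits in the list.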

This chart is almost as effective as the pie chart for comparing relative values. Unfortunately, because its bars are much shorter than the ones drawn on a bar chart and do not share a common baseline, it makes comparisons between different values really difficult. For example, we can see that Date & Time, Trigonometry, and Math have roughly the same number of functions, but we cannot tell whether they are exactly the same or not.

By design, the waterfall chart has the exact same benefits and limitations as the pie chart: it is great for comparing relative values, but ineffective for comparing absolute values. Nevertheless, it is a necessary precursor to the chart that will combine the benefits offered by a bar chart and a pie chart.

Meet the K chart:

The K chart is designed by combining a conventional waterfall chart with what is called a level chart. The latter looks like a bar chart, but uses solid ticks instead of solid bars for displaying values. This design decision allows both charts to be combined together without creating too many visual collisions between the bars of the waterfall chart and the ticks of the level chart. On this chart, the left vertical axis visualizes counts of functions per module, while the right vertical axis visualizes sums of counts of functions. In other words, both axes are congruent (they both visualize counts of functions), but use two different scales.

There are many ways to design this chart, but the most effective is to sort the horizontal categories (modules in this dataset) in decreasing order of value (number of functions), and to display the All bar on the right, thereby drawing the bars of the waterfall chart along an increasing curve. This configuration gives the chart its distinctive K shape, which in turn gave it its name (and my youngest daughter is named Kaia).
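The geometry of this configuration can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the counts are hypothetical, the function name `k_chart` is made up, and scaling the left axis to the largest single value is a simplification chosen here. Each category gets a tick on the left scale (its own count) and a waterfall bar on the right scale (running sums), both normalized to the plot height.

```python
def k_chart(counts):
    """Compute normalized K chart geometry for a non-empty mapping of label -> count.

    Each row holds a level tick (left axis, scaled to the largest value) and a
    waterfall bar (right axis, scaled to the total), as fractions of plot height.
    """
    # Sort categories by decreasing value; ties keep their input order.
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(n for _, n in ordered)
    left_max = ordered[0][1]  # assumption: the left axis tops out at the largest value

    rows, running = [], 0
    for label, n in ordered:
        rows.append({
            "label": label,
            "tick": n / left_max,                               # level chart, left scale
            "bar": (running / total, (running + n) / total),    # waterfall, right scale
        })
        running += n

    # The All bar sits on the right and spans the whole right axis.
    rows.append({"label": "All", "tick": None, "bar": (0.0, 1.0)})
    return rows


# Hypothetical counts per module: placeholders, NOT the real STOIC figures.
counts = {
    "Math": 12,
    "Text": 30,
    "Statistics": 40,
    "Date & Time": 20,
    "Probabilities": 38,
    "Trigonometry": 20,
}

rows = k_chart(counts)
for row in rows:
    print(row["label"], row["tick"], row["bar"])
```

With the categories sorted this way, the ticks descend from left to right while the bars climb toward the All bar, which is what produces the K shape.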

In most cases, a K chart will work well with a single color, unlike the pie or donut charts, which require multiple colors or tones. Nevertheless, the use of colors on the K chart should not be totally discouraged, for it can serve an important purpose: when the horizontal dimension is nominal (as is the case with our sample dataset), the use of colors can help communicate this fact and distinguish the chart from one that would be produced for an epochal dimension (like a date for example). This is especially important when one tries to design a chart that can be interpreted within a split second, without having to read the titles of axes (our sample charts do not show any, on purpose).

Furthermore, colors (or tones) can play a critical role in the K chart: they help communicate the fact that individual ticks on the level chart correspond to individual bars on the waterfall chart. When using a single color or tone, this relationship is weakened, and the chart becomes a little bit more difficult to interpret. To make a long story short: colors are not bad on their own, it is their careless application that should be avoided. But with the proposed design, the application of colors is very intentional.

All design options are presented in this follow-on article: Variations on the K chart.

By design, the K chart uses the exact same amount of real estate as the bar chart and waterfall chart that it is made of, but conveys a lot more information than each chart can offer individually. Therefore, it is a great candidate to replace them in most instances.

The K chart also offers some unique benefits over the pie chart:

The display of labels and values is greatly simplified and much more space efficient.
The horizontal axis can be epochal (temporal with an epoch, like dates).
The layout can be rotated by 90° (in order to support a portrait output for example).
There is a clear starting point for reading (left or right depending on language).
There is a natural place to display the sum of values.
There is a natural place to display deltas or rates of growth.
The use of colors is perfectly optional (great for accessibility).
The horizontal dimension can never be confused for a directional variable.

This leaves the venerable pie charts with only two unquestionable benefits:

Pie charts are preferable whenever the visualized dimension is a directional variable.
Humans love round figures (Cf. Why humans love pie charts, Manuel Lima, 2018).

The K chart also offers one major benefit over the bar chart: you cannot use a vertical baseline other than zero. Therefore, you cannot cheat with a K chart like you could with a bar chart. The K chart is a faithful chart. This fact was first observed by my friend Albert L.

The closest known alternative to the K chart is the Pareto chart. Unfortunately, this chart makes use of a line chart to visualize a sum of values for a discrete variable, which is in direct violation of some principles defined by Principia Pictura. Furthermore, the Pareto chart makes it much more difficult to visualize the contribution of each category to the sum. Therefore, the K chart is considered to be a more desirable modern alternative.

The K chart is part of a broad family of charts that I call Univariate Combo Charts. Some of these charts have been drawn for a long time, but they have yet to be studied in a systematic manner by the community of statisticians. We hope to explore some of them through our weekly series of articles on data visualization.

Finally, when evaluating the K chart, one should keep in mind the following:

A bar chart is always simpler than a K chart, yet conveys less information.
A pie chart is always simpler than a K chart, yet conveys less information.
A K chart should be compared to a Pareto chart, not to a single bar chart or pie chart.

References

Graphical Perception (William S. Cleveland, Robert McGill, 1984)
Principia Data: Unified Typology of Statistical Variables (Ismaël Ghalimi, 2017)
Principia Pictura: Unified Grammar of Charts (Ismaël Ghalimi, 2017)
Revising the Pareto Chart (Leland Wilkinson, 2006)
Why humans love pie charts (Manuel Lima, 2018)
