Cell Plots
Welcome to this seventh article in our Designing charts like a stoic series!
Introduction
Today, we will cover cell plots, which are the small plots rendered within the cells of an intelligent spreadsheet, in relation to the univariate bar charts rendered within the spreadsheet's column headers. As such, this article is a natural sequel to our previous one.
Terminology
The following terminology is used in all articles of this series:
- A visual can be either a chart, a map, or a diagram.
- A variable is a column in a spreadsheet or database table.
- A dimension is shorthand for an independent variable.
- A measure is shorthand for a dependent variable.
- A continuable variable is a discrete variable that could be made continuous.
History
Cell plots are a generalization of the sparkline chart anticipated by Laurence Sterne in 1762, then popularized by Edward Tufte in 1983. A more detailed history of this minimalist yet highly effective visualization technique is available in the Wikipedia article on sparklines.
Structure
In its most basic form, a cell plot is a single-valued plot used to visualize a single value within a database table or intelligent spreadsheet. For example, values of the Name Length column are visualized with dot plots.
Optionality
As illustrated in the screenshot above, cell plots should be rendered only when truly useful. Therefore, they should remain entirely optional. For example, values for the Name column do not have any cell plots, because there is no better way to visualize strings than rendering their actual values. Also, because strings can be fairly long, there is usually not much space left for anything else.
Complementarity
In most cases, a cell plot should be rendered right next to its cell value. This ensures that cell plots are properly interpreted, with the highest level of precision. The positioning and alignment of cell values in relation to their cell plots will be reviewed on a case-by-case basis throughout this article.
Vectors and Matrices
If an individual cell contains a full vector of values, a sparkbar or sparkline could be displayed within it. If it contains a full matrix (common with bitemporal modeling), a sparkbar or sparkline should be displayed as well, either by computing some transversal aggregations, or by performing a vector extraction (a future article will cover this topic).
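As a rough illustration of these two cases, the following Python sketch renders a vector as a text sparkline and reduces a matrix to a vector before plotting it. The function names, the use of Unicode block characters, and the choice of a column-wise mean as the aggregation are all assumptions made for this example, not a description of how STOIC renders its cells.

```python
# Illustrative sketch only: names and the aggregation choice are assumptions.
TICKS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(values):
    """Render a non-empty vector of numbers as a one-line text sparkline."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant vector is drawn at the lowest tick.
        return TICKS[0] * len(values)
    top = len(TICKS) - 1
    return "".join(TICKS[round((v - lo) * top / span)] for v in values)


def column_means(matrix):
    """Reduce a matrix (list of equal-length rows) to one value per column."""
    return [sum(column) / len(column) for column in zip(*matrix)]


vector_cell = [1, 2, 3, 4, 5, 6, 7, 8]
matrix_cell = [[1, 2, 3], [3, 4, 5]]

print(sparkline(vector_cell))
print(sparkline(column_means(matrix_cell)))
```

Any other aggregation (sum, maximum, latest row) could be swapped in for `column_means` without changing the plotting step.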
Examples
The following examples show how different cell plots are produced from different datatypes. The set of supported datatypes is defined by Principia Data, and all sample plots have been produced from this reference dataset of countries using STOIC.
Integer: Dot Plot
An integer is best visualized with a dot plot:
On these cell plots, the dark green bars correspond to the bars of the column's histogram. For example, the dark green bar for the first value (11) corresponds to the 4th bar on the histogram, which displays a count of values equal to 10 or 11. This correspondence makes such cell plots easier to interpret. And because numbers in a table should always be aligned to the right, they are displayed to the left of their corresponding cell plots.
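The correspondence between a cell's dot plot and the column's histogram can be sketched as follows. This is a minimal illustration, assuming equal-width bins; the starting value of 4 and the bin width of 2 are hypothetical, chosen so that the values 10 and 11 share the fourth bar as in the example above.

```python
# Illustrative sketch: bin_start, bin_width and bin_count are assumed values.
def bin_index(value, bin_start, bin_width):
    """Return the zero-based index of the histogram bin containing value."""
    if value < bin_start:
        raise ValueError("value lies below the first bin")
    return (value - bin_start) // bin_width


def dot_plot(value, bin_start, bin_width, bin_count):
    """Draw one mark at the horizontal position of the matching histogram bar."""
    index = bin_index(value, bin_start, bin_width)
    if index >= bin_count:
        raise ValueError("value lies beyond the last bin")
    return "." * index + "#" + "." * (bin_count - index - 1)


for length in (11, 4, 7):
    # Right-align the number and place it to the left of its plot.
    print(f"{length:>3} {dot_plot(length, bin_start=4, bin_width=2, bin_count=8)}")
```

Because the mark's position is derived from the same bins as the histogram, the two visuals stay aligned by construction.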
Floating Point Number: Level Plot
A floating point number is best visualized with a level plot. If the range of possible values is bounded (as is the case for latitudes for example), a colored background should be added to visualize the full range of values. And if the number is signed, two different colors should be used to distinguish negative values from positive ones.
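A level plot for a bounded, signed number could be sketched along these lines. The text rendering, the default width, and the color names are assumptions made purely for illustration.

```python
# Illustrative sketch: the text rendering and the color names are assumptions.
def level_plot(value, lower, upper, width=21):
    """Return (track, color) for a bounded, possibly signed number."""
    if not lower <= value <= upper:
        raise ValueError("value lies outside the declared range")
    # Position of the level mark along the background track.
    position = round((value - lower) * (width - 1) / (upper - lower))
    track = "-" * position + "|" + "-" * (width - position - 1)
    # Two colors distinguish negative values from positive ones.
    color = "negative" if value < 0 else "positive"
    return track, color


for latitude in (-33.87, 0.0, 48.86):
    track, color = level_plot(latitude, lower=-90, upper=90)
    print(f"{latitude:>7.2f} [{track}] {color}")
```

Here the track of dashes plays the role of the colored background that shows the full range of possible values.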
Values should be formatted in such a way that all decimal points are horizontally aligned.
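One simple way to obtain this alignment is to format every value with the same number of decimals and right-justify the result. The sketch below assumes a fixed number of decimals for the whole column and a monospaced font; the function name is hypothetical.

```python
# Illustrative sketch: a fixed number of decimals is assumed for the whole column.
def align_decimals(values, decimals=2):
    """Format numbers so that every decimal point falls in the same text column."""
    formatted = [f"{v:.{decimals}f}" for v in values]
    width = max(len(text) for text in formatted)
    # With a fixed number of decimals, right-justifying aligns the decimal points.
    return [text.rjust(width) for text in formatted]


for text in align_decimals([3.5, -12.25, 100.0]):
    print(text)
```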
Period or Duration: Calendar Plot
A period or duration is best visualized with a calendar plot.
On these calendar plots, the dark aquamarine bars correspond to centuries, as do the bars on the temporal histogram. Periods (dates in this particular instance) should be displayed using a monospaced font and with leading zeroes, thereby ensuring that all separators are horizontally aligned with each other.
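The formatting rule and the century binning can be sketched as follows. The year-month-day field order and the function names are assumptions; the point is only that zero-padding every field keeps the separators in the same text columns.

```python
# Illustrative sketch: ISO-style year-month-day order is an assumption.
from datetime import date


def format_period(day):
    """Zero-pad every field so that separators line up in a monospaced font."""
    return f"{day.year:04d}-{day.month:02d}-{day.day:02d}"


def century_of(year):
    """Return the ordinal century of a year, e.g. 1776 falls in the 18th."""
    return (year - 1) // 100 + 1


for day in (date(843, 8, 10), date(1776, 7, 4), date(1991, 12, 25)):
    print(format_period(day), f"century {century_of(day.year)}")
```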
Category: Bit Plot
A category is best visualized with a bit plot.
The horizontal positions of bits on the bit plot correspond to the positions of bars on the frequency chart. Furthermore, if the number of possible values is 10 or fewer, the bits and bars can be colored, and the colors should match. And because text is usually aligned to the left when writing from left to right, category values are displayed to the right of their corresponding bit plots.
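A minimal sketch of this positional correspondence follows. It assumes that the bars of the frequency chart are ordered by decreasing frequency, with ties broken alphabetically; that ordering, the sample values, and the function names are assumptions, not something the article specifies.

```python
# Illustrative sketch: ordering bars by decreasing frequency is an assumption.
from collections import Counter


def bar_order(values):
    """Order categories as bars: most frequent first, ties broken by name."""
    counts = Counter(values)
    return sorted(counts, key=lambda category: (-counts[category], category))


def bit_plot(value, order):
    """Place a single bit at the horizontal position of the value's bar."""
    position = order.index(value)
    return "." * position + "#" + "." * (len(order) - position - 1)


continents = ["Africa", "Europe", "Africa", "Asia", "Europe", "Africa"]
order = bar_order(continents)
for continent in continents:
    # The bit plot comes first; the left-aligned text follows it.
    print(bit_plot(continent, order), continent)
```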
Boolean: Toggle
A boolean is best visualized with a toggle.
In this case, the plot is replaced by a UI control. This is motivated by the fact that we can make the control very "graphic", while making it actionable. If the bar for true values on the frequency chart is rendered on the left, the toggle should be on the left when its state is true. This is unconventional for a toggle (the true state is usually on the right), but the mandatory horizontal correspondence with the frequency chart takes precedence.
Quantile: Bit Plot
A quantile is best visualized with a bit plot (same as category).
Rank: Dot Plot
A rank is best visualized with a dot plot (same as integer).
String: String Value
A string is best visualized with its string value.
This is because strings do not offer any relevant canonical quantification besides their length in characters, and a string's length can be visually approximated by the width of its display area on the viewport. If this length is truly meaningful, string values should be rendered using a monospaced font. Otherwise, a proportional font will facilitate reading.
Identifier or Name: Bit Plot
An identifier or name is best visualized with a bit plot.
A column of identifiers or names is visualized with a frequency-of-frequency chart. With a proper set of identifiers or names, most frequencies are equal to 1 (values are not repeated); therefore, the bit plot should be omitted for such unitary frequencies (as seen for Motto). This ensures that most strings can be displayed using the column's full width.
With such an approach, only frequencies strictly greater than 1 should be visualized with a bit plot similar to the one used for categories. This bit plot should be displayed to the right of cell values, thereby preserving the column's full width for values with unitary frequency and distinguishing this type of column from category columns.
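The two counting steps behind this approach can be sketched as follows: first count how often each value occurs, then count how often each frequency occurs. The sample names and function names are hypothetical.

```python
# Illustrative sketch: sample data and function names are hypothetical.
from collections import Counter


def frequency_of_frequencies(values):
    """Count how many distinct values occur once, twice, and so on."""
    frequencies = Counter(values)
    return Counter(frequencies.values())


def needs_bit_plot(value, frequencies):
    """Only values that are repeated (frequency above 1) get a bit plot."""
    return frequencies[value] > 1


names = ["Georgia", "Chad", "Georgia", "Peru", "Fiji"]
frequencies = Counter(names)
print(dict(frequency_of_frequencies(names)))
for name in names:
    marker = "#" if needs_bit_plot(name, frequencies) else " "
    # The marker sits to the right of the value, leaving the left edge for text.
    print(f"{name:<10}{marker}")
```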
Image: Actual Image and Optional Bit Plot
An image is best visualized with the image itself.
Much like columns of identifiers or names, columns of images are visualized with a frequency-of-frequency chart. Therefore, bit plots could be added for images with frequencies strictly greater than 1.
Geopoint: Actual Geopoint and Level Plot
A geopoint is best visualized with the geopoint itself, complemented by a level plot used to visualize the geopoint's geodetic distance to the geocenter of the related column's geopoints.
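To give a feel for the quantity plotted here, the sketch below approximates the geocenter by averaging unit vectors and measures distances with the haversine formula. Both are spherical approximations standing in for true geodetic (ellipsoidal) computations, and the sample coordinates and function names are assumptions.

```python
# Illustrative sketch: spherical approximations stand in for true geodetic math.
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def geocenter(points):
    """Approximate the center of (lat, lon) points by averaging unit vectors."""
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, lam1 = map(math.radians, a)
    phi2, lam2 = map(math.radians, b)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


points = [(48.86, 2.35), (-33.87, 151.21), (38.90, -77.04)]
center = geocenter(points)
for point in points:
    print(point, round(haversine_km(point, center)), "km")
```

The resulting distances would then feed a level plot, with the largest distance in the column defining the upper bound of the range.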
Geoline: Actual Geoline and Level Plot
A geoline is best visualized with the geoline itself, complemented by a level plot used to visualize the geoline's geodetic length.
Geopolygon: Actual Geopolygon and Level Plot
A geopolygon is best visualized with the geopolygon itself, complemented by a level plot used to visualize the geopolygon's geodetic area (not yet shown on the screenshot below).
Numerical Measures: Ordered Bar Plot
A measure resulting from a PIVOT transformation is best visualized with a dot plot (discrete measures, as seen below) or a level plot (continuous measures). As a result, if table rows are ordered by decreasing values of this measure, the set of cell plots corresponds to the ordered bar plot used to visualize the column, rotated by 90° clockwise.
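The relationship between sorted rows and the ordered bar plot can be sketched as follows. The labels and values are made-up sample data, the function name is hypothetical, and the sketch assumes a non-empty table with positive measures.

```python
# Illustrative sketch: sample rows are made up; measures are assumed positive.
def ordered_bar_rows(rows, width=20):
    """Sort (label, measure) rows by decreasing measure and scale each bar."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    largest = ordered[0][1]
    if largest <= 0:
        raise ValueError("this sketch only handles positive measures")
    return [(label, value, "#" * round(value * width / largest))
            for label, value in ordered]


rows = [("A", 10), ("B", 40), ("C", 20)]
for label, value, bar in ordered_bar_rows(rows):
    # Each row's bar is one bar of the column's ordered bar plot, laid on its side.
    print(f"{label} {value:>3} {bar}")
```

Read from top to bottom, the bars shrink in the same order as the bars of the column's ordered bar plot read from left to right.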
Benefits
Cell plots offer many benefits:
First, they allow individual values to be visualized in the context of other values in the same column, as well as other values across columns. When table rows are properly sorted, some visual patterns might emerge and suggest additional data transformations, which could lead to valuable insights that would have been difficult to discover otherwise.
Second, cell plots and univariate bar charts strongly reinforce each other when combined, making both easier to interpret, especially if mouse-over interactions highlight marks on the cell plots and the matching bars on the univariate bar charts in a symmetrically synchronized manner.
Third, by making effective use of alignments and colors, cell plots strongly emphasize the differences between datatypes (as defined by Principia Data), thereby providing an additional incentive for users to precisely type their table columns.
Conclusion
Cell plots are a perfect complement to univariate bar charts.
The topic of next week's article will be announced soon. Stay tuned...