What makes a good data visualization – a Data Scientist perspective


Data visualization has been the subject of many thousands of books, courses, and blogs. My course on data mining included a one-hour lecture on data visualization. Since you, my reader, don't have an hour to read this blog, here is a KDnuggets-sized summary of what makes a good data visualization from a data scientist's perspective.

This excellent map by French engineer Charles Minard effectively tells the story of the advance and disastrous retreat of Napoleon's army in Russia in 1812.

The map shows several key variables: the size of Napoleon's army (the width of the band, where one millimeter represents 10,000 men), location on the map, temperature (on a second scale), direction of travel, and time.
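The width encoding on Minard's map is a linear mapping from troop count to band width. A minimal Python sketch, assuming only the scale of 1 mm per 10,000 men quoted above; the function names are hypothetical, and the troop counts in the example are illustrative round numbers, not values read off the map:

```python
# Scale stated on Minard's map: 1 mm of band width per 10,000 men.
MEN_PER_MM = 10_000


def band_width_mm(men: int) -> float:
    """Return the band width (in mm) that encodes a given number of men."""
    if men < 0:
        raise ValueError("troop count cannot be negative")
    return men / MEN_PER_MM


def men_from_width(width_mm: float) -> int:
    """Invert the encoding: read a troop count back from a band width."""
    return round(width_mm * MEN_PER_MM)


if __name__ == "__main__":
    # Illustrative counts only, not values measured from the map.
    for men in (400_000, 100_000, 10_000):
        print(f"{men:>7,} men -> {band_width_mm(men):.1f} mm")
```

Because the mapping is linear, a band half as wide always means half as many men, which is why the shrinking of the army reads correctly at a glance.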

Data science is more than building predictive models; it is also about explaining the models and using them to help people understand data and make decisions. Data visualization is an integral part of presenting data in a convincing way.

There is a great deal of research on good data visualization and on how people best perceive information; see, for example, the work of Stephen Few and many others.

Guidelines for improving human perception include:

  • position data along a common scale
  • bars are more effective than circles or squares in communicating size
  • color is more discernible than shape in scatterplots
  • avoid pie charts unless you are showing proportions
  • avoid 3D charts and reduce chartjunk
  • sunburst visualizations are more effective for hierarchical data
  • use small multiples (animation may look cool, but it is less effective for understanding changing data)

For much more detail, see 39 studies about human perception, by a Washington Post graphics editor.
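The first two guidelines can be illustrated even without a plotting library. A minimal Python sketch (the function name, the `#` bar glyph, and the sample values are all hypothetical) that draws every bar from a shared zero baseline with a single units-per-character scale, so bar lengths can be compared directly:

```python
def text_bar_chart(data: dict[str, float], width: int = 40) -> str:
    """Render non-negative values as horizontal text bars on one common scale.

    Every bar starts at zero and uses the same units per character,
    so the lengths of any two bars can be compared directly.
    """
    if not data:
        return ""
    if any(v < 0 for v in data.values()):
        raise ValueError("this sketch only handles non-negative values")
    top = max(data.values())
    # One shared scale: the largest value fills `width` characters.
    scale = width / top if top > 0 else 0.0
    label_w = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value * scale)
        lines.append(f"{label:<{label_w}} | {bar} {value:g}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(text_bar_chart({"North": 30, "South": 45, "East": 15}, width=30))
```

Giving each bar its own scale, or starting some bars above zero, would break exactly the comparison this guideline is meant to protect.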

From a data science point of view, visualization matters because it highlights the key aspects of the data: which variables are most important, what their relative importance is, and what the changes and trends are.

Data visualization should be visually appealing, but not at the expense of loading a chart with unnecessary junk, as in the image on the right.

Prof. Edward Tufte's three principles of graphical excellence (see The Visual Display of Quantitative Information) say:

Give the viewer

  • the greatest number of ideas
  • in the shortest time
  • with the least ink in the smallest space.

There are many examples of misleading data visualizations (see also here and here).

One common error (or misleading tactic) is to change the axis range to exaggerate the size of an effect.

Fig. 2: Misleading Visualization: Same Data, Different Axis.
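Tufte gives a way to quantify this kind of distortion in the same book: the Lie Factor, the size of an effect shown in the graphic divided by the size of the effect in the data, where the size of an effect is the change relative to the first value. A minimal Python sketch for two bars, assuming bar length is measured from wherever the value axis starts; the function names and the values 50, 52, and 48 are hypothetical:

```python
def effect_size(first: float, second: float) -> float:
    """Size of an effect: the change relative to the first value."""
    return abs(second - first) / abs(first)


def lie_factor(a: float, b: float, axis_start: float = 0.0) -> float:
    """Tufte's Lie Factor for two bars drawn up from `axis_start`.

    The drawn bar length is value - axis_start, so a truncated axis
    (axis_start > 0) makes the bars differ more than the data do.
    """
    if a == b:
        raise ValueError("equal values: there is no effect to distort")
    if min(a, b) <= axis_start:
        raise ValueError("both bars must extend above the axis start")
    shown = effect_size(a - axis_start, b - axis_start)  # effect in the graphic
    actual = effect_size(a, b)                            # effect in the data
    return shown / actual


if __name__ == "__main__":
    a, b = 50.0, 52.0  # hypothetical data: a 4% difference
    print(lie_factor(a, b))                            # honest axis from zero
    print(round(lie_factor(a, b, axis_start=48.0), 2))  # truncated axis
```

With the axis starting at zero the Lie Factor is 1.0. Starting it at 48 turns a 4% difference in the data into a 100% difference in bar length, a Lie Factor of about 25; values far from 1 signal a distorted graphic.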


OK, so we have covered how to avoid making a bad visualization.

How do we make a good data visualization?


To do that, choose the right type of chart for your data:

  • Line charts to track changes or trends over time and to show the relationship between two or more variables.
  • Bar charts to compare quantities of different categories.
  • Scatter plots to show the joint variation of two variables.
  • Pie charts to compare parts of a whole. Use them sparingly, since people have a hard time comparing the areas of pie slices.
  • You can show additional variables on a 2-D plot using color, shape, and size.
  • Use interactive dashboards to allow experimenting with key variables.

There are many advanced methods for multidimensional data visualization, such as parallel coordinates, Chernoff faces, and stick figures, but they have not become popular because they are hard for non-experts to interpret.


However, you can go beyond two dimensions by using color, labels, and size to represent additional dimensions.
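When size encodes a variable, it matters which dimension you scale. A minimal Python sketch, assuming circular markers: if the radius grows in proportion to the value, the visible area grows with the square of the value, so a value twice as large is drawn with four times the area. Scaling the area instead keeps the encoding proportional. (In matplotlib, for example, the `s` argument of `scatter` is already a marker area in points squared.) The function names, the sample values, and the 10-unit maximum radius are hypothetical:

```python
import math


def radius_for_value(value: float, max_value: float, max_radius: float) -> float:
    """Radius such that circle AREA is proportional to value (honest encoding)."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)


def naive_radius(value: float, max_value: float, max_radius: float) -> float:
    """Radius proportional to value: the area then grows with value squared."""
    return max_radius * value / max_value


def area(radius: float) -> float:
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2


if __name__ == "__main__":
    small, large = 50.0, 100.0  # hypothetical values; large is 2x small
    for name, fn in (("area-scaled", radius_for_value), ("radius-scaled", naive_radius)):
        ratio = area(fn(large, large, 10.0)) / area(fn(small, large, 10.0))
        print(f"{name}: large circle has {ratio:.1f}x the area of the small one")
```

The area-scaled version draws the larger value with twice the area; the radius-scaled version draws it with four times the area, exaggerating the difference.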

Here is a toy example with US Presidential elections since 1976 (based on this data).

Below is an example of how not to visualize this data.

See the rest on KDnuggets:

What makes a good data visualization – a Data Scientist perspective

https://www.kdnuggets.com/2017/03/what-makes-good-data-visualization.html




More articles by Gregory Piatetsky-Shapiro
