
Truthful Mapping

John Nelson

Maps and UX at Esri

Published: June 29, 2015


The exact same data mapped using three common classification methods. What? Read on...

Divide and Color
What do these thematic maps have in common? They are all mapping the exact same data. Then why are they so different? They each use a different classification method. One of the first questions that pops up when whipping up a thematic data visualization that has discrete range breaks (commonly a choropleth map, but not necessarily) is: how do I bucketize my data for color assignment?
All three are commonly used and justifiable. And picking the "right" one can have a big impact on what you ultimately communicate.

Some Options
There are lots of ways to carve datasets into discrete classes. I’ll go over three of them...

Quantile: Breaks the data into equally filled groups
Standard Deviation: Breaks the data into statistical chunks diverging from the mean
Equal Interval: Breaks the data into equally distant groups

Or you could just eyeball the data and then divide it into range breaks that look good or are easy to read. Or-or your breaks are driven by a conceptual categorization that doesn't really have anything to do with math. Anyways...

Distribution
When choosing a method of classifying your dataset into discrete ranges, there are a couple of things to consider off the bat. First, what does the data distribution look like (if it is dynamic, what does it generally look like)? Is it skewed toward one extreme or the other? Is it relatively normal (bell shaped on a histogram)? Are there outliers to consider? Applying various classification methods can create very different impressions of the data. Any interface is a manifestation of tradeoffs; let’s take a look at some examples...
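One quick way to answer those questions before picking a method is to compare the mean with the median and compute a skewness number. Here is a minimal Python sketch; the function name `describe_distribution` and the sample values are made up for illustration and are not the county data mapped in this post.

```python
import statistics

def describe_distribution(values):
    """Summarize the shape of a dataset: mean, median, spread, and skewness."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    # Skewness as the average cubed z-score: near 0 for symmetric data,
    # positive when a long tail stretches toward high values.
    skew = sum(((v - mean) / stdev) ** 3 for v in values) / n if stdev else 0.0
    return {"mean": mean, "median": median, "stdev": stdev, "skewness": skew}

symmetric = [30, 35, 38, 40, 40, 42, 45, 50]  # made-up, roughly bell shaped
long_tail = [1, 1, 2, 2, 2, 3, 4, 30]         # made-up, long right tail

symmetric_summary = describe_distribution(symmetric)
long_tail_summary = describe_distribution(long_tail)
print(symmetric_summary)
print(long_tail_summary)
```

When the mean sits well above the median and the skewness is clearly positive, as with the second list, you are probably looking at the kind of bulge-plus-long-tail data where the choice of method matters most.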
Normal Data
With relatively normally distributed data, picking a classification method may not make a massive difference in your visualization. Your biggest concern is probably how many breaks to make, and what colors to use. Check out this example of average age per county: nice and normal...

Nice normal data looks relatively similar across the three classification methods. But normal data is pretty rare.

Skewed Data
Now it gets fun. With a data set like the percent of folks who consider themselves multi-ethnic, the distribution is far from normal. In this case, there is a bulge at the lower end and a long tail that eventually pinches off around 30% multi-ethnic. What a difference the classification methods make here!
Does the "Quantile" map below tell the truth? Yes. I can clearly see the locations of higher and lower proportions of multi-ethnic US residents, even regional trends and abrupt shifts. Does the "Equal Interval" map tell the truth? Yes. I get a clear indication that most places in the US are, proportionally, pretty low in multi-ethnic residents.
The two maps look strikingly different because I’m telling the truth about two different things...

Skewed data is a trickier beast. Different classification methods result in very different pictures. It's time to ask yourself what is the content of your message, and proceed while honoring the phenomenon (and your audience).
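To see why the same skewed numbers produce such different pictures, it can help to count how many items land in each class under each method. This is a rough Python sketch on made-up percentages (not the county data behind the maps); `class_counts` is a hypothetical helper, and the quantile cut points come from the standard library's `statistics.quantiles`.

```python
import bisect
import statistics

def class_counts(values, breaks):
    """Count how many values land in each class, given interior break values."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        # bisect_right puts a value equal to a break into the class above it.
        counts[bisect.bisect_right(breaks, v)] += 1
    return counts

# Made-up, right-skewed percentages: a bulge at the low end and a long tail.
pct = [0.5, 0.8, 1.0, 1.1, 1.3, 1.5, 1.8, 2.0, 2.4, 3.0, 4.5, 29.0]
k = 4  # number of classes

# Equal interval: breaks spaced evenly between the lowest and highest value.
lo, hi = min(pct), max(pct)
equal_breaks = [lo + (hi - lo) * i / k for i in range(1, k)]

# Quantile: cut points chosen so the classes hold equal shares of the data.
quantile_breaks = statistics.quantiles(pct, n=k)

equal_counts = class_counts(pct, equal_breaks)
quantile_counts = class_counts(pct, quantile_breaks)
print("equal interval:", equal_counts)   # [11, 0, 0, 1]
print("quantile:      ", quantile_counts)  # [3, 3, 3, 3]
```

Equal interval says "almost everything is low", while quantile spreads the same values evenly across the colors. Both are true; they answer different questions.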

Examples...

For Normal Data
Equal Interval for Normal data
Equal interval slices the data into equally distant range breaks. Some color buckets get more counties than others, but if the distribution is wide, then the visualization will be adequate.
The gist: Evenly spaced, unevenly filled buckets...

Equal Interval for normal data: Evenly spaced, though unevenly filled, buckets.

Quantile for Normal data
Quantile yields a pretty high-contrast map that is reliably good looking. The fact that the data is normally distributed doesn’t really matter: each bucket has the exact same number of counties, but you’ll notice that in order to accommodate that, the ranges have to span varying distances.
The gist: Unevenly spaced, evenly filled buckets...

Quantile for normal data: Unevenly spaced, though evenly filled, buckets.

Standard Deviation for Normal data
Trusty old Standard Deviation. It is going to look alright in most cases, but it really shines when applied to normal datasets. You’ve got the mean in the middle and you chunk it out from there based on standard deviation distances in either direction from the mean. Beautiful. Also, don’t do what I did: you should put actual values in your legend instead of the math nerd standard deviation breaks. And while we’re at it, it’s often a good idea to pick a diverging color scheme for data that is classified by Standard Deviation. Pick a neutral color for the mean (center) range and then transition to one color on the left and another color on the right. ColorBrewer gives some nice background here along with a rocking tool to generate your own cartographic color schemes.
The gist: Evenly spaced (to a statistician), unevenly filled buckets...

Standard Deviation for normal data: evenly spaced (from a mathy point of view), though unevenly filled, buckets.

For Skewed Data
Quantile for Skewed data
Quantile to the rescue. The buckets are defined by an equal number of member counties. The result is a map that, even with a data set packed on one end of the spectrum and virtually empty through the rest of it, still teases out the comparative terrain of a phenomenon, with dynamite contrast guaranteed. It's a bit of statistical/visualization magic, really.

To that point, though, it could be argued that this method implies a false or misleading heterogeneity of the data. While the vast majority of counties have a proportionally tiny multi-ethnic population, this method could imply a greater variance (as compared to the Equal Interval example above) than there really is.
Just remember, when reading a map, read those legends and take the range breaks for what they are worth. Quantile is a good illustration of that.
The gist: Unevenly spaced, evenly filled buckets...

Quantile for skewed data: unevenly spaced, though evenly filled, buckets. Engineered to be handsome, but can be misleading or mask actual disparity.

Standard Deviation for Skewed data
Standard Deviation. It is still providing valuable visual breaks when applied to highly skewed data. But I can never get too cozy with it because it is so hard to explain.
The gist: Evenly spaced (to a statistician), unevenly filled buckets...

Standard Deviation for skewed data: evenly spaced (good luck explaining how), though unevenly filled, buckets. Also handsome (so long as you don't use the color scheme I did).

How-To’s
Equal Interval

Figure out the overall value range (highest value – lowest value) for the data population…
Decide how many classes you want, and divide the overall range value by it, to get the distance value between classes…
Insert a break at every multiple of that distance, counting up from the lowest value.
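The Equal Interval steps above can be sketched in a few lines of Python. The function name `equal_interval_breaks` and the sample ages are made up for illustration; this assumes breaks are counted up from the lowest value in the data.

```python
def equal_interval_breaks(values, num_classes):
    """Return the interior break values for an equal interval classification."""
    lowest, highest = min(values), max(values)
    # Step 1: the overall value range.
    value_range = highest - lowest
    # Step 2: the distance between classes.
    step = value_range / num_classes
    # Step 3: a break at every multiple of that distance above the lowest value.
    return [lowest + step * i for i in range(1, num_classes)]

ages = [28, 33, 36, 38, 40, 41, 43, 48]  # made-up values
breaks = equal_interval_breaks(ages, 4)  # range 20, so each class spans 5
print(breaks)  # [33.0, 38.0, 43.0]
```

Four classes need only three interior breaks; the lowest and highest values close off the two outer classes.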

Quantile

Decide how many classes you want, and divide the overall population by it. This will tell you how many items get lumped into each class…
Chop up the population into buckets of that many.
SQL Server has a nice built-in function for this, called "NTILE".
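If you are not working in SQL Server, the same chop-into-equal-buckets idea can be sketched in Python. This is an illustrative stand-in, not SQL Server's implementation: the `ntile` function and sample values are made up here, and the sketch assumes that when the population does not divide evenly, the first classes take the extra items.

```python
def ntile(values, num_classes):
    """Assign each value a class number 1..num_classes with near-equal class sizes."""
    # Indices of the values, ordered from lowest value to highest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    base, extra = divmod(len(values), num_classes)
    classes = [0] * len(values)
    position = 0
    for class_number in range(1, num_classes + 1):
        # When the population does not divide evenly, the first `extra`
        # classes each take one additional item.
        size = base + (1 if class_number <= extra else 0)
        for i in order[position:position + size]:
            classes[i] = class_number
        position += size
    return classes

pct = [4.5, 0.5, 29.0, 1.0, 2.0, 1.5, 0.8, 3.0, 1.1, 2.4]  # made-up values
classes = ntile(pct, 4)
print(classes)  # [4, 1, 4, 1, 2, 2, 1, 3, 2, 3]
```

Because this buckets by rank rather than by value, identical values can end up in different classes when they straddle a bucket boundary. That is worth checking before you publish a legend.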

Standard Deviation

https://en.wikipedia.org/wiki/Standard_deviation…
Good luck.
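For a slightly less hand-wavy starting point, here is a small Python sketch of standard deviation breaks. It assumes classes that are one standard deviation wide with the mean in the middle of the center class, and it uses the population standard deviation; mapping tools may place their breaks differently. The function name `std_dev_breaks` and the sample values are made up for illustration.

```python
import statistics

def std_dev_breaks(values, classes_per_side=2):
    """Return break values in data units, centered on the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    # Half-step offsets (-1.5, -0.5, +0.5, +1.5 when classes_per_side=2)
    # keep the mean in the middle of the center class.
    offsets = [i + 0.5 for i in range(-classes_per_side, classes_per_side)]
    return [mean + offset * stdev for offset in offsets]

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values: mean 5, standard deviation 2
breaks = std_dev_breaks(values)
print(breaks)  # [2.0, 4.0, 6.0, 8.0]
```

The returned breaks are in the data's own units, which makes them easy to drop into a legend instead of labels like "+1.5 Std. Dev."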

Also, check out the CartoDB section on "quantification."

Truth
It’s one thing to willfully mislead others by the categorization and representation of data (obviously not cool). It’s another to do it by accident and mislead your audience and yourself. Varying classification methods will produce very different results. By gaining a little background about various methods of classification you’ll be in a better position to…

Create better, more effective visualizations
More keenly understand the visualizations of others
Read the legend
Use what you learn for good instead of evil

In any case, the thing to keep in mind is effective and truthful communication; your visualization should enable the data to tell its story. Let us know if we can be of any help!

This post originated from the UX/visualization blog of IDV Solutions: uxblog.idvsolutions.com

@JohnNelsonIDV

David Piles

Data Analyst

9 years ago

How important it is the method of data classification to make sense of a map. I usually like to use as the first choice Jenks's natural breaks algorithm.

Julien Rouaud

Head of Group Insurance @ Eurofins; Board Member at BELRIM - Belgian Risk Management Association

9 years ago

Great stuff and so true...it's all about communicating what, how and to whom. Perception remains a key driver in today's society. Thx

Matthew Hampton

Applied Cartographer, GIS Analyst, Geospatial Storyteller

9 years ago

It's nice to see some "truthful" mapping!
