Avoiding Errors of Interpretation: the case of Selby & Ainsty
Slope graphs of Selby & Ainsty by-election. Left: The Guardian. Right: me.

One thing I often talk about trying to avoid in data science is errors of interpretation, and I always maintain that it is the responsibility of the data scientist producing the output to maximise the clarity of the information presented, so that it is hard to misinterpret. The Guardian this morning (https://www.theguardian.com/politics/2023/jul/21/byelection-results-paint-ominous-picture-tories-despite-uxbridge-win) uses Edward Tufte-style slope graphs to show the results of the three by-elections, starting with this graph for Selby and Ainsty. Here is their graphic:

The Guardian's slope graph for the Selby & Ainsty by-election

My immediate (and incorrect) thought looking at this was that they'd messed up the scale and that this was a misleading visualisation as a result. In fact, they've scaled the data perfectly; but I still think it's a visually misleading visualisation. What's wrong with it?

  • First, and most fundamentally, it doesn't (to me) look visually like a 43% drop for the Tories and an 87% increase for Labour.*
  • One reason it doesn't look right is that there's no clear zero. In fact, the position of the 6.5 label suggests that the zero is lower than it really is: the true zero is above the 6.5 label.
  • Another reason it doesn't look right is that the slopes all look shallow. Obviously the separation of 2019 and 2023 is arbitrary, and technically, the graph is equally accurate whether you shrink or stretch it horizontally. But different scalings create very different impressions.
  • Another thing that makes correct reading harder is the placement of the numbers on the left and right. With direct labelling (i.e., labelling the points with numbers, rather than using a key or a scale on an axis) the numerical values themselves form a more significant part of the visualisation, and here not a single one aligns with its corresponding data point. In the case of the "Green" and "Other" labels on the left, there would obviously be a problem aligning them because they are so close to each other. All the other six points could be properly aligned, but aren't.
  • The label placement is made even worse by the decision to put the party names under the numerical values on the right, pushing them even further out of alignment. The position of the "Con" label on the right suggests it's labelling the Labour value, and only colour (and order, I suppose) are telling you what's really happening.
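
As a rough illustration of the point about horizontal scaling, the sketch below computes the on-screen angle of the Conservative line for a fixed vertical scale and three different widths. The pixel dimensions and the function name `apparent_angle` are assumptions made up for this example; they are not taken from either graphic.

```python
import math


def apparent_angle(change_pp, px_per_pp, width_px):
    """Return the on-screen angle (in degrees) of a slope-graph line.

    change_pp -- change in vote share, in percentage points (negative = fall)
    px_per_pp -- vertical pixels used per percentage point (hypothetical)
    width_px  -- horizontal pixels between the two years (hypothetical)
    """
    rise_px = change_pp * px_per_pp
    return math.degrees(math.atan2(rise_px, width_px))


# Conservative share fell from 60.3% to 34.3% (figures from the article).
con_change = 34.3 - 60.3

# Same vertical scale, three illustrative widths: the data are identical,
# but the line gets steeper as the graph gets narrower.
for width in (600, 300, 100):
    angle = apparent_angle(con_change, px_per_pp=4, width_px=width)
    print(f"width {width:>3}px: {angle:6.1f} degrees")
```

Every one of these renderings is equally accurate; only the impression of how dramatic the change was differs.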

I first replotted this to see where the lines should actually be (believing they must be wrong) and discovered that they were, in fact, in the right place. Here are the Guardian's plot and my redrawing of it side by side, using the same scaling.

The Guardian's slope graph (left) vs. mine (right).

I think you get a significantly better sense of the results from the right-hand graph than the left. But I still think the shallow slopes are visually misleading. (I've not labelled the 6.5% point, because I don't like any of the options for that, but if I did, I'd either move it left or down to fit.) The comparison does, however, make it fairly clear that the Guardian did scale correctly. To ram home the point, here is the Guardian's graph overlaid on my grid and lines, without my labels:

The Guardian's graph with my grid (and, in fact, my lines and points too)

It is true, however, that:

  • slope graphs are normally drawn without grids;
  • I've "wasted" some vertical space by extending the grid to 100%.

I would also choose to make the graph narrower so that the visual impact of the slopes is more in line with the true changes. (This is, of course, somewhat subjective.)

With this in mind, here's what I think would have served the Guardian's readers better for this visualisation, again comparing theirs on the left, and mine on the right.

Guardian Selby & Ainsty slope graph left. Mine right.

Notice:

  • The steep slopes (seem to me to) better convey the scale of the changes
  • I've added a zero line as a more Tufte-ian, minimalist way of showing the baseline
  • All my labels align properly (vertically) with the points they label.
  • The graphic is no bigger. (I've used thinner lines and small points, but obviously, that's just an aesthetic choice.)
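
For anyone wanting to try these choices out, here is a minimal sketch of a narrow slope graph with a zero line and labels centred on their points. It is not the code behind my graphic: it assumes matplotlib is available, it uses only the two parties whose figures are quoted in this article, and the colours, figure size and the names `label_specs` and `draw_slope_graph` are illustrative assumptions.

```python
# Vote shares (%) from the article for the two largest parties only;
# the colours are an illustrative assumption.
RESULTS = {
    "Con": {"shares": (60.3, 34.3), "colour": "tab:blue"},
    "Lab": {"shares": (24.6, 46.0), "colour": "tab:red"},
}


def label_specs(results):
    """Return one label per data point, anchored at the point's own height."""
    specs = []
    for party, info in results.items():
        start, end = info["shares"]
        # Left-hand labels sit just left of x=0, right-hand ones just right of x=1.
        specs.append({"x": -0.05, "y": start, "text": f"{party} {start}",
                      "ha": "right", "colour": info["colour"]})
        specs.append({"x": 1.05, "y": end, "text": f"{end} {party}",
                      "ha": "left", "colour": info["colour"]})
    return specs


def draw_slope_graph(results, path):
    """Draw a narrow slope graph with a zero line and save it to path."""
    # Imported here so label_specs can be used without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(3, 5))  # tall and narrow gives steeper slopes
    for info in results.values():
        ax.plot([0, 1], info["shares"], marker="o", markersize=3,
                linewidth=1, color=info["colour"])
    for spec in label_specs(results):
        # va="center" keeps each label level with the point it describes.
        ax.text(spec["x"], spec["y"], spec["text"], ha=spec["ha"],
                va="center", color=spec["colour"])
    ax.axhline(0, color="black", linewidth=0.8)  # the zero baseline
    ax.set_xlim(-0.6, 1.6)
    ax.set_ylim(-2, 65)
    ax.set_xticks([0, 1])
    ax.set_xticklabels(["2019", "2023"])
    ax.set_yticks([])
    for spine in ax.spines.values():
        spine.set_visible(False)
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
```

Calling `draw_slope_graph(RESULTS, "selby.png")` should write the figure to a file. Points that sit very close together (such as the small parties on the left) would still need their labels nudged apart by hand, which this sketch does not attempt.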


If you're interested in more on errors of interpretation, you might like my talk from Rohan Alexander's 2023 Toronto Workshop on Reproducibility (https://canssiontario.utoronto.ca/event/toronto-workshop-on-reproducibility-2023/). The video of my talk, Errors of Interpretation (a.k.a. Type VI Errors), is available at https://www.youtube.com/watch?v=wV42aZUprDk. "Type VI errors" is a reference to Randall Munroe's xkcd (https://xkcd.com/2303/):

Type IIII error: Mistaking tally marks for Roman numerals

If you want to learn more about Test-Driven Data Analysis (TDDA), and the various kinds of analytical errors with which it is concerned, see the blog at https://www.tdda.info, or perhaps this one-page summary: https://stochasticsolutions.com/pdf/TDDA-One-Pager.pdf.

________________

* When I talk about a 43% drop for the Tories and an 87% increase for Labour, I'm talking about the multiplicative change. The Tory vote went from 60.3% to 34.3%, which is an (additive) drop of 26 percentage points (pp) — 60.3% – 34.3% = 26pp — but a (relative) fall of 26/60.3 = 43.12%. Similarly, Labour's share of the vote increased from 24.6% to 46.0%, which is a 21.4pp increase, but a 21.4/24.6 = 86.99% increase.
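
The arithmetic in this footnote can be checked with a few lines of Python. This is only a sketch; the function name `share_change` is made up for the illustration, and the vote shares are the ones quoted above.

```python
def share_change(before, after):
    """Return (percentage-point change, relative change in %) for a vote share."""
    pp_change = after - before
    relative_change = 100 * pp_change / before
    return pp_change, relative_change


# Vote shares (%) quoted in the footnote: 2019 vs. 2023.
shares = {"Con": (60.3, 34.3), "Lab": (24.6, 46.0)}

for party, (before, after) in shares.items():
    pp, rel = share_change(before, after)
    print(f"{party}: {pp:+.1f}pp, {rel:+.2f}%")

# Prints:
# Con: -26.0pp, -43.12%
# Lab: +21.4pp, +86.99%
```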

Why is the Green share on the right of your graph 8.6% when it’s 5.1% in The Guardian?

Graham Hill (Dr G)

Reminds me a lot of Darrell Huff's 'How to Lie with Statistics': https://archive.org/details/HowToLieWithStatistics_201608/mode/2up Or as Mr Churchill said, "Do not trust any statistics you did not fake yourself." Best regards, Graham. PS. Check the revised green number

We're on a slippery slope here...
