Avoiding Errors of Interpretation: the case of Selby & Ainsty
Nick Radcliffe
CEO, Stochastic Solutions · Behaviour modelling · Data Science · Data Quality · Sustainability · Organizer, PyData Edinburgh.
One thing I often talk about trying to avoid in data science is errors of interpretation, and I always maintain that it is the responsibility of the data scientist producing the output to maximise the clarity of the information presented, so as to make it hard to misinterpret. The Guardian this morning (https://www.theguardian.com/politics/2023/jul/21/byelection-results-paint-ominous-picture-tories-despite-uxbridge-win) uses Edward Tufte-style slope graphs to show the results in the three by-elections, starting with this graph for Selby and Ainsty. Here is their graphic:
My immediate (and incorrect) thought looking at this was that they'd messed up the scale and that this was a misleading visualisation as a result. In fact, they've scaled the data perfectly; but I still think it's a visually misleading visualisation. What's wrong with it?
I first replotted this to see where the lines should actually be (believing they must be wrong) and discovered they were already in the right place. Here's the Guardian's plot and my re-drawing of it side by side, using the same scaling.
I think you get a significantly better sense of the results from the right-hand graph than the left. But I still think the shallow slopes are visually misleading. (I've not labelled the 6.5% point, because I don't like any of the options for that, but if I did, I'd either move it left or down to fit.) The comparison does, however, make it fairly clear that The Guardian did scale correctly. To ram home the point, here is the Guardian's graph overlaid on my grid and lines, without my labels:
It is true, however, that:
I would also choose to make the graph narrower so that the visual impact of the slopes is more in line with the true changes. (This is, of course, somewhat subjective.)
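To make the point about width concrete, here is a minimal sketch of how the on-screen angle of a slope-graph line depends on the shape of the plot area. The function name, the 0 to 70 axis range and the plot dimensions are all assumptions chosen for illustration; they are not measurements of the Guardian's graphic. Only the Conservative shares (60.3% and 34.3%) come from the results discussed here.

```python
import math


def on_screen_angle(y_start, y_end, y_min, y_max, width, height):
    """Angle in degrees (from horizontal) of a line drawn across the full plot width.

    y_start, y_end: data values at the left and right edges.
    y_min, y_max:   limits of the vertical axis (hypothetical here).
    width, height:  physical size of the plot area, in the same units.
    """
    # Convert the change in data units into a physical vertical distance.
    rise = (y_end - y_start) / (y_max - y_min) * height
    return math.degrees(math.atan2(rise, width))


# Conservative share: 60.3% -> 34.3%, on an assumed 0-70 axis.
wide = on_screen_angle(60.3, 34.3, 0, 70, width=6, height=3)    # assumed wide plot
narrow = on_screen_angle(60.3, 34.3, 0, 70, width=2, height=3)  # assumed narrow plot

print(f"wide plot:   {wide:.1f} degrees")
print(f"narrow plot: {narrow:.1f} degrees")
```

The data and the scaling are identical in both cases; only the width changes, and the narrower plot gives a visibly steeper line.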
With this in mind, here's what I think would have served the Guardian's readers better for this visualisation, again comparing theirs on the left, and mine on the right.
Notice:
If you're interested in more on errors of interpretation, you might like my talk from Rohan Alexander's 2023 Toronto Conference on Reproducibility (https://canssiontario.utoronto.ca/event/toronto-workshop-on-reproducibility-2023/). The video of my talk, Errors of Interpretation (a.k.a. Type VI Errors), is available at https://www.youtube.com/watch?v=wV42aZUprDk. "Type VI errors" is a reference to Randall Munroe's XKCD (https://xkcd.com/2303/):
If you want to learn more about Test-Driven Data Analysis (TDDA), and the various kinds of analytical errors with which it is concerned, see the blog at https://tdda.info, or perhaps this one-page summary: https://stochasticsolutions.com/pdf/TDDA-One-Pager.pdf.
________________
* When I talk about a 43% drop for the Tories and an 87% increase for Labour, I'm talking about the multiplicative change. The Tory vote went from 60.3% to 34.3%, which is an (additive) drop of 26 percentage points (pp), i.e. 60.3% - 34.3% = 26pp, but a (relative) fall of 26/60.3 = 43.12%. Similarly, Labour's share of the vote increased from 24.6% to 46.0%, which is a 21.4pp increase, but a relative increase of 21.4/24.6 = 86.99%.
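For anyone who wants to check the arithmetic in this footnote, here is a small sketch that computes both kinds of change from the vote shares quoted above. The function names are my own, chosen for illustration.

```python
def percentage_point_change(before: float, after: float) -> float:
    """Additive change in vote share, in percentage points (pp)."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Multiplicative change, as a percentage of the starting share."""
    return 100.0 * (after - before) / before


# Vote shares (%) quoted in the footnote: (previous, by-election)
shares = {
    "Conservative": (60.3, 34.3),
    "Labour": (24.6, 46.0),
}

for party, (before, after) in shares.items():
    pp = percentage_point_change(before, after)
    rel = relative_change(before, after)
    print(f"{party}: {pp:+.1f}pp, {rel:+.2f}% relative")
```

The two measures answer different questions, which is why it matters to say which one a headline figure refers to.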
Why is the Green share on the right of your graph 8.6% when it’s 5.1% in The Guardian?
Reminds me a lot of Darrell Huff's 'How to Lie with Statistics' https://archive.org/details/HowToLieWithStatistics_201608/mode/2up Or as Mr Churchill said, "Do not trust any statistics you did not fake yourself." Best regards, Graham PS. Check the revised green number
We're on a slippery slope here...