Book Review: Bad Data
Copyright Hachette / Georgina Sturge

Following Rishi Sunak’s statement yesterday that “statistics underpin every job”, I thought it was time to review Bad Data by Georgina Sturge. First brought to my attention via the Today programme, it outlines the dangers of relying on poorly understood or unrepresentative data. As Georgina Sturge works at the House of Commons Library, helping MPs and their researchers (my first job), the focus is on lessons learned for governments, but many of them apply to industry as well.

“Nowadays the need for data is all but baked into the process of government”

The book quickly gets to a key premise: “nowadays the need for data is all but baked into the process of government”. However, one of the central tenets of Bad Data is that it’s better to be realistic about the flaws and gaps in data – especially when planning public policy – than take what is there and make false extrapolations, thereby putting too much emphasis on the ‘bad data’. The book is illustrated with data debacles from recent political history, including some antecedents of Brexit.

How you perceive a problem often defines the data you then choose to collect. As a metric, the gender pay gap, for example, really tells us about the under-representation of women in top posts in firms, rather than overall gender disparities. So this metric, while useful, lacks nuance. In other scenarios, the available data drives how you tackle the problem because of the pressure to take an evidence-based approach. One example given is Beeching’s cuts to the rail network in the sixties, which could be justified (in his view) because rail cost-effectiveness data was centrally available, whereas similar data about roads simply wasn’t at the time.

Some things are inherently hard to measure – the answer depends on what you ask, when you ask, whom you ask and who’s doing the asking. The book gives good coverage of surveys that seek to find out what matters to people. In other cases, Bad Data describes trying to measure domains that lack a fixed definition such as race, which is inconsistently collected and is “becoming more meaningless over time”.

Our national mistrust of ID cards creates a myriad of counting problems – which are not just inconvenient for statisticians but are expensive and hamper the operation of government compared with countries that do have a single national ID system. Of course, we have national IDs in the UK – it’s just that there are more than 20 of them (according to government research) and they offer partial and overlapping coverage.

When is a customer not a customer?

Sturge touches on an issue dear to my heart: when we are asked to count something, what are we counting? This problem crops up everywhere when we talk about defining a metric. A classic case in commerce is trying to count customers – are we counting individuals or simply transactions? Can the individuals be identified as different people or just different email addresses? If an individual hasn’t transacted for 6 months, are they still a customer? And so on. Just as in business, these things can be redefined for a specific political aim. When this happens, there is yet another discontinuity in the historical time-series data, making the track record of successive governments harder to determine. This is compounded by the well-known ‘lies, damned lies and statistics’ problem, which Bad Data also covers well. Even well-intentioned differences in metric definitions make it difficult to compare one country with another – for example, to answer the question: did the UK fare better or worse than other countries during the pandemic? (Actually, the definition of a ‘country’ and how sloppily this is used is my own pet hobby horse, but I’ll leave that in the stable for now.) On the other hand, the external world is always changing, so we constantly need to ask ourselves if we’re measuring the right things in the right way.
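
To make the customer-counting point concrete, here is a minimal sketch of how the same data can yield four different "customer" counts. It is my own illustration rather than an example from the book: the transaction log, the `person_id` field, the `count_customers` function and the 183-day stand-in for "6 months" are all hypothetical assumptions.

```python
from datetime import date, timedelta

# Hypothetical transaction log: (email, person_id, transaction_date).
# person_id assumes we can somehow link two emails to one individual.
TRANSACTIONS = [
    ("ann@example.com", "p1", date(2023, 1, 10)),
    ("ann.work@example.com", "p1", date(2023, 5, 2)),
    ("bob@example.com", "p2", date(2022, 9, 15)),
    ("cat@example.com", "p3", date(2023, 4, 20)),
    ("cat@example.com", "p3", date(2023, 5, 30)),
]


def count_customers(transactions, as_of, active_days=183):
    """Return the 'customer count' under four different definitions."""
    # Anyone whose latest qualifying transaction is older than this
    # cutoff is not counted as an active customer.
    cutoff = as_of - timedelta(days=active_days)
    return {
        "transactions": len(transactions),
        "distinct_emails": len({email for email, _, _ in transactions}),
        "distinct_people": len({person for _, person, _ in transactions}),
        "active_people": len(
            {person for _, person, when in transactions if when >= cutoff}
        ),
    }


counts = count_customers(TRANSACTIONS, as_of=date(2023, 6, 1))
for definition, n in counts.items():
    print(f"{definition}: {n}")
```

With this toy data the answer ranges from 5 (transactions) down to 2 (people active in the last 183 days), so switching definition part-way through a time series would look like a sudden change in the number of customers even though nothing in the underlying data has moved.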

A problem related to metric definition is data that requires human judgement when it is recorded. The book gives fascinating insights into how this applies to crime statistics or healthcare outcomes, particularly when you also consider the motivations of the people doing the recording.

"Derailed by a mutant algorithm"

The book also covers statistical models and – for the layperson – some practical examples of ‘rubbish in, rubbish out’, ‘mutant algorithms’ and ‘publication bias’ (publishing only results that are significant disturbs the balance of findings in favour of positive results).

I’m involved in running the Civil Service Data Challenge and we often urge teams to think about the financial impact of their ideas. Some of the numbers are impressive – one of the teams in the final last year thought they could save £250M in benefit fraud and error connected with self-employed workers in the construction sector. While I’m sure the business case for data sharing in this domain between HMRC and DWP is sound, Bad Data cautions against putting too much faith in these levels of returns because in any scenario like this, there are too many unknowns (the book has a great case study of the Bedroom Tax).

A whole chapter is devoted to how politicians (and pollsters) handle uncertainty in the data – certainly the subtleties of bad data are lost on most of the public, which is one reason why this is an important and accessible book. If I had one criticism, it’s that the themes I mention here crop up across the book. That’s because the book uses case studies to illustrate and many of them exhibit overlapping sets of issues – and Bad Data is not a textbook after all.

Bad Data is not entirely an indictment of government data collection; it is more an exploration of the messiness of recording and exploiting data where flawed human beings are in the mix. The book is not defeatist, however, and ends with a call to do better – pointing out that where there is investment, such as in sports statistics, the level of accuracy and detail is astounding by comparison. It’s a great read for anyone working in Data and Analytics or in government, as well as anyone in my network who would like to be better informed when they hear someone like me preaching the benefits of data-driven decision-making.


Bad Data is available now in hardback, ebook and audiobook
