When data bugs kill people

A review of two COVID-related data bugs

This article was first featured in my AI insiders newsletter. Join it at this link: https://ai-academy.com/aiinsiders/

In Canada, when you become an engineer you take part in the "Ritual of the Calling of an Engineer". This is a ceremony where you receive the "Iron Ring", a symbol of the obligations and ethics associated with the profession.

Whenever a Canadian engineer signs a project, she sees the Iron Ring on her hand and is reminded that a mistake in her design can severely hurt people (imagine she’s designing a bridge or a plane…).

Data analysis is still far from developing a culture of ethics and responsibility like that of engineering. But the repercussions of a data analyst’s mistakes are no smaller than those of an engineer’s.

COVID gave us a few examples of this. In this post, I’ll explain two cases and try to draw some conclusions to keep this from happening in the future.

1. First case - UK: 1,500 deaths due to a Microsoft Excel bug

In October 2020, the Guardian reported that ~16,000 positive COVID cases had been lost by the UK’s National Health Service (NHS) due to an Excel bug.

In November, the Guardian estimated that the glitch led to ~1,500 deaths, considering that the "lost cases" didn’t quarantine and went on to infect more people.

How did that happen? There are a couple of different versions of the story, but they all point to poor usage of an Excel spreadsheet.

The most reliable story reports that the NHS was receiving CSV files from testing labs and put together a system to merge them into a single Excel sheet. Unfortunately, they used an old file format (.xls) to save the merged file. Since an .xls worksheet is limited to 65,536 rows (roughly 65,000), any additional row was simply lost.

This meant that any positive patient whose data was recorded after the 65,000th row was simply forgotten and never traced. The result: ~16,000 COVID-positive patients walking around the UK, infecting many others.
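The kind of guard that would have caught this can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not a reconstruction of the actual NHS pipeline: the function name `merge_csv_reports`, the exception `RowLimitExceeded`, and the idea that all lab files share one header are hypothetical. The only hard fact in it is the 65,536-row limit of the legacy .xls format. The point is that the merge fails loudly instead of quietly dropping rows.

```python
import csv
import io

# Hard row limit of a worksheet in the legacy .xls format.
XLS_MAX_ROWS = 65_536


class RowLimitExceeded(Exception):
    """Raised when the merged data would not fit in the target format."""


def merge_csv_reports(csv_texts, max_rows=XLS_MAX_ROWS):
    """Merge CSV reports (given as strings) that share the same header.

    Hypothetical helper: returns the header followed by all data rows,
    and raises RowLimitExceeded rather than truncating the result.
    """
    header = None
    merged = []
    for text in csv_texts:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # skip empty reports
        if header is None:
            header = rows[0]
        elif rows[0] != header:
            raise ValueError("CSV reports do not share the same header")
        merged.extend(rows[1:])

    if header is None:
        return []

    total_rows = len(merged) + 1  # +1 for the header row
    if total_rows > max_rows:
        raise RowLimitExceeded(
            f"{total_rows} rows do not fit in a sheet limited to {max_rows} rows"
        )
    return [header] + merged
```

A check like this does not fix the choice of file format, but it turns silent data loss into an error that someone has to look at.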

2. Second case - Italy: wrong data leads to unnecessary lockdown

Italy has been hurt badly by COVID, and not just in terms of human lives. The national lockdown that the country had to enforce also had a massive economic cost for businesses.

To contain both the virus spread and the economic hit, since the autumn of 2020 Italy has applied different lockdown regulations to each region, based on their respective virus spread rate.

The spread of the virus is calculated with a parameter known as Rt (effective reproduction number), which depends on the number of patients with symptoms each day. There are valid statistical reasons why Rt is measured solely on patients with symptoms; I won’t dive into them in this post, so trust me on this one or write me for an explanation.

The process, in theory, is pretty simple: every day, Italian regions calculate the number of newly infected people with symptoms, and send the data to the "Istituto Superiore di Sanità" ("Higher Health Institute", ISS). The ISS then calculates the Rt, and based on its value it classifies a region as "yellow", "orange", or "red". Each color implies different regulations, with red being a full lockdown.

In January 2021, Lombardy (the region where Milan is) started raising concerns about its classification as "red", which didn’t seem right.

It turns out that there was another bug.

Lombardy was collecting data about COVID patients on an Excel sheet. The sheet had a column for the date patients started having symptoms, and another column with their current state (asymptomatic, with few symptoms, symptomatic).

After a change in protocol, patients were no longer required to report the disappearance of all symptoms. When they didn’t, the "current state" column was left empty. All these patients seemed to have never recovered, and were always counted as infected.

This faulty data collection process led the ISS to categorise Lombardy as "red" for weeks, causing businesses to shut down and people to be locked in their homes unnecessarily.
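The failure mode described above can be sketched in Python. This is a hypothetical illustration, not the actual Lombardy or ISS logic: the field names `symptom_onset` and `current_state`, and both function names, are my own assumptions. The first function shows how an empty state can end up counted as "still symptomatic"; the second keeps missing values in their own "unknown" bucket so that they show up in the numbers.

```python
from collections import Counter

# Hypothetical set of allowed values for the "current state" column.
KNOWN_STATES = {"asymptomatic", "few symptoms", "symptomatic"}


def naive_symptomatic_count(records):
    """Flawed count: anyone with an onset date who is not explicitly
    marked asymptomatic is treated as a symptomatic case, so an empty
    state is counted as symptomatic forever."""
    return sum(
        1
        for r in records
        if r["symptom_onset"] and r["current_state"] != "asymptomatic"
    )


def audited_counts(records):
    """Count patients per state, keeping missing values visible.

    Empty or None states go into an explicit "unknown" bucket instead
    of being folded into one of the real categories.
    """
    counts = Counter()
    for r in records:
        state = (r["current_state"] or "").strip().lower()
        if not state:
            counts["unknown"] += 1
        elif state in KNOWN_STATES:
            counts[state] += 1
        else:
            raise ValueError(f"unexpected state: {state!r}")
    return counts
```

With an explicit "unknown" count, a sudden jump in missing values after a protocol change becomes something you can see and question, rather than something hidden inside the number of symptomatic patients.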

Any lessons to learn?

Let’s go back to the Canadian engineer and her Iron Ring. She knows that lives and businesses depend on the quality of her work, and she acts accordingly. But the responsibility of doing a good job is not just on her shoulders.

Suppose an engineering company specialised in designing "traditional" steel bridges needs to design a special carbon fibre bridge. There’s no engineer in the company with experience with carbon fibre. What would the company do? It would make sense to hire some experts in carbon fibre, or train current employees on this new material. And this is what a responsible manager would do.

A similar scene happens every day in every organisation dealing with data, but the response is rarely the one I described. Many organisations are facing new challenges with the same knowledge, tools, talents and culture they had before. I don’t blame the engineers for the mistakes they’ve made, I blame the people at the top who didn’t re-train them to be effective in this new world.

Data analysis is a baby compared to an old practice like engineering. Ancient Greeks had some conception of engineering and basic geometry theorems more than 2,000 years ago, while Microsoft Excel is roughly 35 years old. It is understandable that the field needs to grow and become more mature.

But this is not an excuse for sloppy practices. The people who died in the UK and the Lombardy businesses that had to shut down don’t care about excuses. They deserve better.

They deserve their data to be in the hands of organisations where everyone is aware of the responsibility they have in handling this data. Managers need to push a data-driven culture down to every employee of their organisation, so that everyone knows the potential consequences of mistakes, and doesn’t touch a single bit of data without the right tools, skills, and respect.

Both cases (and the hundreds that I’m sure haven’t been discovered yet) could have been avoided by following simple practices and using simple tools.

This saddens me. And it’s the reason I work hard to try to educate people, and change cultures.
