Data Quality basics - When is “Good enough” good enough? - PART 1


Anyone who has studied business statistics to diploma or degree level will know how to calculate the Expected Value of Perfect Information (EVPI). In really simple terms, it’s how much you would be willing to pay for perfect information before making a decision. A similar approach can be used to determine just how good the quality of data needs to be. By extension, we can introduce the notion of materiality to data quality by asking a simple question: does it make a difference?
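To make EVPI concrete, here is a minimal sketch. The decision, the states of the world, the probabilities and the payoffs are all hypothetical. EVPI is the expected payoff you would get if you always knew the true state before choosing, minus the best expected payoff you can get without that knowledge.

```python
from typing import Dict

# HYPOTHETICAL decision: the payoff of each action under each possible state of the world.
PayoffTable = Dict[str, Dict[str, float]]

def expected_value(payoffs: Dict[str, float], probs: Dict[str, float]) -> float:
    """Probability-weighted payoff of a single action."""
    return sum(probs[state] * payoffs[state] for state in probs)

def evpi(table: PayoffTable, probs: Dict[str, float]) -> float:
    """Expected Value of Perfect Information for a discrete decision."""
    # Without further information: commit to the one action with the best expected value.
    ev_without_info = max(expected_value(payoffs, probs) for payoffs in table.values())
    # With perfect information: in each state, pick whichever action pays best in that state.
    ev_with_info = sum(
        probs[state] * max(payoffs[state] for payoffs in table.values())
        for state in probs
    )
    return ev_with_info - ev_without_info

probs = {"demand_high": 0.6, "demand_low": 0.4}
table = {
    "launch": {"demand_high": 100.0, "demand_low": -40.0},
    "hold":   {"demand_high": 0.0,   "demand_low": 0.0},
}
print(evpi(table, probs))  # 16.0
```

In data-quality terms, a figure like this caps what it is worth spending to improve the data behind this decision. If better data could never change the action chosen, EVPI is zero and the extra quality is not material.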

I spent several years working in the pharmaceutical sector, an industry known for strict regulation of processes, procedures and quality. Get it wrong and people die, though not usually immediately. That’s why there are robust Quality Management Systems (QMS), Good Clinical Practice (GCP, to ensure trials are properly conducted), Good Automated Manufacturing Practice (GAMP, to ensure raw materials are handled properly, with no risk of impurities or poor production) and comprehensive traceability (enabling rapid product batch recalls). The lifecycle of molecule development, evaluation, testing and clinical trials, from phase 0 (does it kill the recipient?) through to phase 3 (does it provide sufficient benefit when weighed against the risks?), can take seven or more years before the product is approved by a regulator.


So what can we learn from pharmaceuticals in the wider context of data and quality?


Firstly, the industry definition of “quality” is surprisingly pragmatic and can be expressed as “sufficient to meet the needs”. This is something of an eye-opener when you realise that a product that is better than you need is not, by this definition, a quality product, because it means costs have been incurred that weren’t necessary. Those who argue that they should “exceed customer needs” may want to rethink their mantra accordingly, because what they’re really saying is that they are incurring higher costs than necessary to produce a product or service!


So, let’s turn to some data. This is where people may become a little uncomfortable with the data I’m going to use… because it’s about deaths. It’s a theme I sometimes use to describe the challenges of data quality, because deaths are pretty unarguable. Someone has either died or they haven’t. It’s quite binary and, unless you’re looking at, ahem, “adjusted data”, people don’t generally suddenly undie (zombies don’t count!).


I’m going to use data provided by the UK Health Security Agency (UK HSA) and the UK Office for National Statistics (UK ONS). Links are at the bottom of the article.

It’s when we blend the data from these two agencies together that we need to ask the important question: is this data good enough for what we want to use it for?

This is where some domain knowledge comes in handy! We need to apply this sanity check to the data.

As a starting point, is the overall death data “good enough”? Let’s compare and contrast.

We start by asking a simple question: according to the ONS data, how many people died in England in 2022?


This is where the problems begin, and you realise that there’s a systematic lack of consistency in how deaths are reported across government agencies…

540,047 according to the weekly deaths – Table 1 - Total Deaths England (2022)

504,251 according to the deaths by vaccination status

463,006 according to Deaths by vaccination status – Table 3 – Count of deaths by vaccination status (unvaccinated + vaccinated)
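Before going further, it helps to put a number on that spread. A minimal sketch using only the three totals quoted above; it also shows the difference between the middle figure (the median) and the arithmetic midpoint of the range.

```python
from statistics import median

# The three figures for deaths in England in 2022 quoted above.
totals = {
    "weekly deaths, Table 1": 540_047,
    "deaths by vaccination status (18+)": 504_251,
    "Table 3, unvaccinated + vaccinated": 463_006,
}

values = list(totals.values())
spread = max(values) - min(values)
print(f"spread between highest and lowest: {spread:,}")                # 77,041
print(f"as a share of the lowest figure: {spread / min(values):.1%}")
print(f"median (the middle figure): {median(values):,}")                # 504,251
print(f"midpoint of the range: {(max(values) + min(values)) / 2:,}")    # 501,526.5
```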


Why such a wide range of figures?

One element is that the 504,251 figure only includes deaths of people aged 18+. For some reason, deaths of younger people have been excluded. However, according to the UK ONS weekly deaths report, there were only 4,229 deaths of people under 20 in England AND Wales, which is nowhere near enough to close the gap.
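That point can be checked with the figures already quoted. Treating the 4,229 as if it were the number of missing under-18 deaths in England is generous: it also covers Wales and 18 to 19 year olds, who are already in the 18+ total. So whatever gap remains is a lower bound on what the age cut-off cannot explain.

```python
weekly_total_all_ages = 540_047        # England, all ages (weekly deaths, Table 1)
vaccination_status_18_plus = 504_251   # England, 18+ only
under_20_england_and_wales = 4_229     # weekly deaths report: under 20, England AND Wales

# Adding the under-20 figure over-corrects (it includes Wales and 18-19 year olds),
# so this "best case" total is an upper bound on what the 18+ dataset could reach.
best_case_total = vaccination_status_18_plus + under_20_england_and_wales
still_unexplained = weekly_total_all_ages - best_case_total
print(f"best-case total: {best_case_total:,}")        # 508,480
print(f"still unexplained: {still_unexplained:,}")    # 31,567
```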


To quote a famous line, “Houston, we’ve got a problem”. We have a disparity of roughly 77,000 deaths in a single year between the highest and lowest figures. And that’s material!


Let’s go with the middle figure, though, and work on the basis that 504,251 is “good enough”.


This is where the importance of phase 3 clinical trials comes into play. This is where we determine how much benefit is gained from using a product versus not using it (and what the associated risk profile is).


To understand this, we now need to think of two populations.


The first population we will call Vaxistan, and it has a population of 41,168,164. By a remarkable coincidence, this is exactly the total number of people aged 18+ in England who had received at least one vaccination by the end of 2022.


The second population we will call PlaceboLand, and it has a population of 9,062,564. Fascinatingly, this happens to be exactly the number of people aged 18+ in England who hadn’t received an injection.


What we want to understand is how effective a product is at reducing deaths.


To do this, we compare the relative death rates of the two populations, broken down by who used the product and who didn’t. If you’ve made it this far, stick with it, as the ending is interesting.
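Because the two populations differ so much in size, the comparison has to be made on rates rather than raw counts. A minimal sketch of a crude (unadjusted) rate comparison, using the two population sizes from the text. The death counts sit in the table images, which aren't reproduced here, so the functions simply take them as inputs; the function names are illustrative.

```python
# Population sizes quoted in the text (England, age 18+, end of 2022).
VAXISTAN = 41_168_164      # at least one dose
PLACEBOLAND = 9_062_564    # no dose

def rate_per_100k(deaths: int, population: int) -> float:
    """Crude death rate per 100,000 people (no adjustment for age or anything else)."""
    return deaths / population * 100_000

def rate_ratio(deaths_a: int, pop_a: int, deaths_b: int, pop_b: int) -> float:
    """How many times higher group A's crude death rate is than group B's."""
    return rate_per_100k(deaths_a, pop_a) / rate_per_100k(deaths_b, pop_b)

combined = VAXISTAN + PLACEBOLAND
print(f"combined 18+ population: {combined:,}")    # 50,230,728
print(f"share with at least one dose: {VAXISTAN / combined:.1%}")
```

A call such as `rate_ratio(vaxistan_deaths, VAXISTAN, placeboland_deaths, PLACEBOLAND)` then gives the "how many times more likely" figure for a given cause of death.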


TABLE 1 – Covid19 Deaths by vaccination status, England (2022), age 18+





Now, from a pharmaceuticals perspective, that’s quite worrying. It suggests that, once the difference in population sizes is taken into account, those who were vaccinated against covid19 were more than twice as likely to die from it! That’s a negative efficacy rate. Not good.
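The step from a rate ratio to an efficacy figure uses the standard relative-risk-reduction formula, efficacy = 1 - RR. A minimal sketch; note it applies the formula to crude rates only, whereas published efficacy estimates adjust for factors such as age.

```python
def efficacy(rate_vaccinated: float, rate_unvaccinated: float) -> float:
    """Crude efficacy as relative risk reduction: 1 - (rate ratio).

    1.0 means no deaths among the vaccinated, 0.0 means no difference,
    and anything below 0 means the vaccinated group has the higher rate.
    """
    return 1 - rate_vaccinated / rate_unvaccinated

# A rate ratio of exactly 2 corresponds to an efficacy of -100%.
print(f"{efficacy(2.0, 1.0):.0%}")   # -100%
print(f"{efficacy(0.5, 1.0):.0%}")   # 50%
```

So "more than twice as likely" translates into a crude efficacy below -100%.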


But it gets worse, because that’s looking only at deaths specifically from covid. Take a look at what happens when we look at ALL deaths:


TABLE 2 – ALL Deaths by vaccination status, England (2022), age 18+





The overall death rate of 1% is in the right ballpark and broadly reflects how many people die in England each year as a percentage of the population, which gives some confidence in the results.
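That 1% sanity check can be reproduced from the figures in the text. Which total the table used as its numerator isn't stated, so this sketch assumes it is one of the two adult figures quoted earlier and checks both against the combined 18+ population.

```python
# Combined 18+ population (Vaxistan + PlaceboLand), from the text.
population_18_plus = 41_168_164 + 9_062_564

# Assumption: the table's numerator is one of these two adult death totals.
candidates = [
    ("deaths by vaccination status, 18+", 504_251),
    ("Table 3, unvaccinated + vaccinated", 463_006),
]
for label, deaths in candidates:
    print(f"{label}: {deaths / population_18_plus:.2%}")
```

Both come out at roughly 1% (about 1.00% and 0.92% respectively), so the overall rate is plausible whichever total was used.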


However, the rest of the data shows a dramatically different mortality rate in the vaccinated versus the unvaccinated (the former seemingly 3.5x more likely to die).


Some of this can be explained by age: older people are more likely to be vaccinated and are also, as a general principle, more likely to die. However, the size of the difference warrants further investigation.
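One standard way to carry out that investigation is direct age standardisation. You apply each group's age-specific death rates to a common standard population, so differences in age mix drop out of the comparison. The sketch below uses entirely hypothetical numbers (not ONS data), chosen so that both groups have identical death rates within each age band. It shows how age mix alone can produce a large crude difference. Real analyses use much finer age bands and a published standard; ONS age-standardised rates typically use the 2013 European Standard Population.

```python
from typing import Dict, Tuple

# HYPOTHETICAL data: (deaths, population) per age band. Not ONS figures.
Group = Dict[str, Tuple[int, int]]
vaccinated: Group = {"18-64": (1_200, 600_000), "65+": (16_000, 400_000)}
unvaccinated: Group = {"18-64": (1_800, 900_000), "65+": (4_000, 100_000)}

def crude_rate(group: Group) -> float:
    """All deaths divided by all people, ignoring age mix."""
    deaths = sum(d for d, _ in group.values())
    population = sum(p for _, p in group.values())
    return deaths / population

def standardised_rate(group: Group, standard: Dict[str, int]) -> float:
    """Direct standardisation: weight each age-specific rate by a standard population."""
    total = sum(standard.values())
    return sum(group[band][0] / group[band][1] * standard[band] / total for band in standard)

# Hypothetical standard population: the two groups combined, band by band.
standard = {band: vaccinated[band][1] + unvaccinated[band][1] for band in vaccinated}

crude_ratio = crude_rate(vaccinated) / crude_rate(unvaccinated)
std_ratio = standardised_rate(vaccinated, standard) / standardised_rate(unvaccinated, standard)
print(f"crude rate ratio:       {crude_ratio:.2f}")   # 2.97
print(f"age-standardised ratio: {std_ratio:.2f}")     # 1.00
```

In this made-up example the crude comparison makes one group look almost three times as likely to die, even though the rates within each age band are identical.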


To conclude, then, in answering the question of “good enough”: even with relatively incomplete data, it is possible to identify broad trends and so reach a reasonably informed conclusion.


Hope you enjoyed reading this.


If you’d like to continue the conversation about data, data quality, data governance, accuracy or anything else relating to data analysis, then get in contact!


Comments and likes are welcome. If you think others would benefit from the article, then do repost it. If you’d like to be kept informed of future posts, click on the bell next to my profile!


#DataQuality #Pharmaceuticals #QualitySystems #DataAnalytics #DataAnalysis


Here are the links. Note that I've wrapped them in quotes "" because LinkedIn down-ranks articles that include clickable external links, so you'll need to copy them into your browser's address bar.

1 UK HSA Weekly national Influenza and COVID-19 surveillance report

Week 8 report (up to week 7 data) - 23 February 2023

“https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1138381/Weekly_Flu_and_COVID-19_report_w8-1.pdf”


2 UK ONS Deaths by vaccination status, England

Deaths occurring between 1 April 2021 and 31 December 2022

“https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/deathsbyvaccinationstatusengland/deathsoccurringbetween1april2021and31december2022/referencetablefeb213.xlsx”


3 UK ONS Deaths registered weekly in England and Wales 2022, provisional

“https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/weeklyprovisionalfiguresondeathsregisteredinenglandandwales/2022/publicationfileweek522022.xlsx”

Marisa Murton

Data Science Leader specialising in Insurance and Property data

1 yr

Interesting read. It also highlights quite nicely the dangers of one-way analysis.

Stewart Reeder

Head of Insurance

1 yr

Important article, Gary Nuttall MBCS CITP!
