Making Sense of Data ... Providing Evidence

Providing evidence

We have been through a rough time in the UK, with politicians making grand claims with little evidence to back them up. Over the past few months, academics have not been at the fore, and this has left the stage clear for politicians to hurl unproven statistics one way and then the other.

So, academics are not politicians, but they should use evidence to critically appraise claims, and offer unbiased opinions. If the maths works, and the evidence is provided in the right form, there can be little debate.

As scientists and engineers we must look to present our findings in a form which is based on evidence, and which others can understand. In the UK, around the EU referendum, some have criticised experts for not providing evidence in a form that citizens could make sense of, so perhaps there's a bit of a disconnect between academia and the general public when it comes to analysing data. So I'm going to analyse a few things here, and show some connections, all of which are based on evidence, and are not made up.

All of the analysis is my own, and I only want to show how data can be a powerful tool in presenting evidence. Overall, I'm not claiming anything here, just that evidence, rather than opinion, is a powerful method in advancing our society.

Data is King

We have been through many phases in the growth of the Internet, and we are in the era where virtualisation is king, but the true focus will be on data analytics. The current HPE advert really brings this to the fore:

So, increasingly we don't store our data in relational databases, as we possibly don't know what the relationships are that we want to link. The future is thus focusing on unstructured data which we mine, looking to identify clusters, correlations, or even anomalies. In computer security, a system in a large company might produce billions of alerts, and from this we need to mine the data in order to find the few events that actually identify that a company is being hacked.
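As a rough illustration of picking the few unusual events out of a stream of alerts, here is a minimal sketch that flags values lying far from the mean (a simple z-score test). The function name find_anomalies, the threshold of 2.5 and the hourly counts are all made up for this example, and real security analytics would use far richer methods.

```python
from statistics import mean, stdev

def find_anomalies(counts, threshold=2.5):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    The threshold is an illustrative assumption, not a recommended setting.
    """
    mu = mean(counts)
    sigma = stdev(counts)
    if sigma == 0:
        # Every value is identical, so nothing stands out.
        return []
    return [(i, c) for i, c in enumerate(counts)
            if abs(c - mu) / sigma > threshold]

# Made-up hourly alert counts with one obvious spike (not real data).
hourly_alerts = [102, 98, 110, 95, 105, 99, 101, 970, 103, 97, 100, 104]
print(find_anomalies(hourly_alerts))
```

A single large spike inflates the standard deviation itself, so on small samples a robust measure such as the median absolute deviation is often a better choice.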

So, at the core must be the provision of evidence, in order that we learn and improve things. The architecture that is now common turns the clock back to the times of mainframe computers, where we created batch files. In this way we basically teed up a sequence of operations where the data flows from one batch process to another. This type of approach uses the resources of our computer systems effectively, as background processes can run and harvest unused processing power.

Most of the system architectures I've seen recently have this decoupled approach, where data flows through background processes, and where there is little interaction with user interfaces. Back-end languages such as Python are well suited to this type of approach, as their scripts run directly on the back-end machines, and often perform single tasks, which are then bound together into a complex one.
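To make the idea of single tasks bound together a little more concrete, here is a minimal sketch of a batch-style pipeline in Python. The stage names (parse, filter_valid, summarise), the record format and the sample input are all hypothetical, chosen only to show data flowing from one stage to the next.

```python
from functools import reduce

def parse(lines):
    """Stage 1: turn raw 'name,value' lines into (name, float) records."""
    records = []
    for line in lines:
        name, value = line.strip().split(",")
        records.append((name, float(value)))
    return records

def filter_valid(records):
    """Stage 2: drop records with negative values."""
    return [r for r in records if r[1] >= 0]

def summarise(records):
    """Stage 3: reduce the records to a count and a mean."""
    values = [v for _, v in records]
    mean = sum(values) / len(values) if values else 0.0
    return {"count": len(values), "mean": mean}

def run_pipeline(data, stages):
    """Feed the output of each stage into the next, batch-file style."""
    return reduce(lambda acc, stage: stage(acc), stages, data)

# Made-up input lines, purely for illustration.
raw = ["a,10", "b,-1", "c,20"]
result = run_pipeline(raw, [parse, filter_valid, summarise])
print(result)
```

Each stage knows nothing about the others, so stages can be swapped, reordered or run as separate background jobs without touching the rest.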

A bit of evidence

So let's take a few examples. 

Let's look first at health. It is well known that poverty often leads to poor health, so let's provide the evidence of this, by analysing the US state data on health. First we will look at the average income per state against heart disease, and see a strong correlation between the two (R-squared=0.913):

The approximate straight line fit is:

Heart Disease Death Rate (%) = -0.0014 x Av Income + 239.9
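For anyone wanting to try this kind of fit themselves, here is a minimal sketch of an ordinary least-squares straight line and its R-squared value in plain Python. It assumes the fit is a standard one-variable least-squares line (the article does not say which tool produced the figures), and the sample points are made up; they are not the US state data.

```python
def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) for a least-squares line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared

# Made-up points that fall roughly along a line (not the article's data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r2 = linear_fit(xs, ys)
print(f"y = {slope:.4f} x + {intercept:.4f}, R-squared = {r2:.3f}")
```

An R-squared close to 1 means the line accounts for most of the variation in y. It says nothing on its own about whether one variable causes the other.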

Now we look at mapping two clinical factors together, such as the Stroke Death Rate and the Infant Mortality Rate, and we see there is an even stronger correlation (R-squared = 0.985):

With: Stroke Death Rate = 3.6641 x Infant MR + 14.4091
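Once a line has been fitted, it can be used as a simple predictor. The sketch below just evaluates the two straight lines quoted above. The function names and the input values (an average income of 50,000 and an infant mortality rate of 6.0) are hypothetical, and the units are whatever the underlying data set uses, which the text does not specify.

```python
def heart_disease_death_rate(av_income):
    """Evaluate the fitted line: -0.0014 x Av Income + 239.9."""
    return -0.0014 * av_income + 239.9

def stroke_death_rate(infant_mr):
    """Evaluate the fitted line: 3.6641 x Infant MR + 14.4091."""
    return 3.6641 * infant_mr + 14.4091

# Hypothetical inputs, chosen only to show the arithmetic.
print(f"{heart_disease_death_rate(50_000):.1f}")
print(f"{stroke_death_rate(6.0):.4f}")
```

Predictions like these are only sensible inside the range of the data the line was fitted on; extrapolating well beyond it can give meaningless values.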

Now let's look at education. In this case we will look at US state data of educational attainment, and link it to the average income and unemployment within the state. If we map the percent of High school graduates in each state against unemployment we see a strong negative correlation (R-squared=0.941):

and then against household income per state (R-squared=0.979):

We can see that as the household income increases there is generally an increase in the percentage of high school graduates in the state.

Now let's look at crime. For this we will analyse the US City crime data (for cities over 250,000 citizens). First let's look at mapping Robbery statistics against Violent Crime, and we see there is a strong correlation (R-squared = 0.923):

With this we get: Robbery = 0.3821 x Violent Crime - 24.124

So, if we have high levels of household robberies, we will generally also see an increase in car thefts:

In this case we can see there is a correlation (R-squared=0.767), but it's not quite as strong as the previous crime linkage.
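One way to compare the two crime linkages is to convert each R-squared into the size of the correlation coefficient and into the share of variation the line leaves unexplained. The sketch below assumes both fits are simple one-variable straight lines, in which case R-squared equals the square of the Pearson correlation; the helper name describe is made up for this example.

```python
import math

def describe(r_squared):
    """Return |r| and the unexplained share for a one-variable linear fit."""
    return {
        "abs_r": math.sqrt(r_squared),   # magnitude of the correlation
        "unexplained": 1.0 - r_squared,  # share of variance not captured
    }

# R-squared values quoted in the text above.
fits = [
    ("Robbery vs violent crime", 0.923),
    ("Household robbery vs car theft", 0.767),
]
for label, r2 in fits:
    d = describe(r2)
    print(f"{label}: |r| = {d['abs_r']:.3f}, "
          f"unexplained = {d['unexplained']:.1%}")
```

Seen this way, the second fit leaves roughly three times as much of the variation unexplained as the first.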

Let's look at the recent EU referendum in the UK. For this we can analyse the percent remain vote in each voting region and plot this against the percent of people in the area with 5 GCEs or more:

This gives us a good correlation (R-squared=0.946), and you can see that generally the higher the remain vote was, the higher the percent in the area with 5 GCEs or more. If we analyse the turnout in each region we find the highest turnouts were in places where the leave vote was highest (R-squared=0.964):

So, in Edinburgh, as we head for a heatwave on Tuesday, let's look at a light-hearted example that everyone gets. Every ice-cream vendor knows that as the temperature goes up, so do ice-cream sales. But where is the evidence? Well, we actually have data from an ice-cream seller, and the correlation between ice-cream consumption (IC) and the temperature is strong (R-squared=0.964):

Well, that's enough analysis ... I'm just away to put on some sun screen for the forthcoming weather. Oh ... before I go ... can I predict the maximum temperature in the UK in June based on the minimum temperature ... yes I can ... (R-squared=0.999) ... this data is taken over the last 20 years for June temperatures in different regions of the UK:

Well ... here's my evidence framework ... it's a bit ropy just now, but I've got to start somewhere:

Conclusions

Not really much to say here, apart from saying that academics perhaps need to get more involved in providing evidence to back up opinion. As HPE say ... "the world is changing", and every person on the planet can gain access to data with the click of a mouse. But the key is making sense of that data.

The speed of change is now so much faster, and businesses increasingly need to use data to understand the dynamics of their customers and markets, otherwise they might not see the risks ahead, or take best advantage of opportunities. For little companies, data provides a way to take on large companies and win. The Cloud now gives access to massive computing and storage power that previously only privileged companies could gain access to.

So ... go on ... change the world ... for the better! Hopefully it is a world based on evidence rather than someone's opinion. In that way we will advance, and improve the lives of our citizens.
