Getting the Insights you Need from Big Data: Not Black Magic Anymore!
First of all, many thanks to Ray Greenwood for his input.
If you have been reading my articles, I think you will notice that I’m fascinated with buzzwords and how they shape the perception of emerging technologies. I find it especially interesting that by the time the buzzword associated with a new technology reaches its peak number of mentions, most people have heard of it but don’t quite understand what it does or how it works. To give a one-word example that’s hot at the moment: “Blockchain”.
Anyway, by the time the technology is better understood and its applications are starting to be felt in the real world, the buzzword has faded from the pundit blog entries and Twitter. I am guessing this is because it has ceased to be a way to appear intelligent and mysterious at cocktail parties. Such is the fate of “Big Data”.
First of all, a caveat: I do have some background in Analytics, but only the basics of data science. So if you are interested in this topic, what you will read here is the account of a fellow traveler who is as fascinated by this area as you are, not a leading pundit. With that said, let’s go for a deeper dive.
As recently as 2014, a very popular quote on Big Data was: “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…” (Prof Dan Ariely, Duke University). That was very true back then. Nobody knew what to do with their vast amounts of data, only that they needed to be doing something with it to get deep magical insights, and that only a secret cabal of wizards called “Data Scientists” could unlock the secrets.
Flash forward three years and let’s see where we are. What, teenage sex notwithstanding, is Big Data? I would suggest one aspect of Big Data is “data whose volume is so large that you cannot get insights from it using just the regular slice and dice method”.
Slice and dice has been with us for decades, and there are great tools out there for it. But when the volume of data grows to such an extent that, no matter how much you slice and dice, you cannot be sure you are looking at the right thing or getting the right insight, you may need help.
For example, if you are looking at the retail sales of ten items for a store, you and your brain should find it easy to handle. How about 16,000 items for your 6 million online customers? What if your supermarket chain has to analyse which products your customers are likely to buy in the winter season given their purchasing history, income levels, residence post codes and whether they own more than one car?
Should you even be looking at car ownership in the first place or should you be looking at whether they subscribe to cable? This is where the predictive analytics software and data science tools come in.
What is especially fascinating for me is the concept of machine learning. How this works is that some smart people will come up with an algorithm, and then you ‘train’ the algorithm by feeding it data. This seems to be an abstract concept, but if you think about it, this is exactly how humans learn. By practicing a certain skill or craft many times, our brain develops the ability to detect patterns to optimise our judgement the next time we see a similar situation.
So anyway, after this learning process, if you do it right, the algorithm will be able to make educated guesses about what the outcome would be given a particular set of circumstances. This same method can be used to predict your next e-book purchase, to predict traffic accidents, or to decide whether you get jail time or a warning if you run afoul of the law.
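To make the 'train, then guess' idea a little more concrete, here is a minimal sketch in Python. It is only an illustration: the numbers are invented, the function names train and predict are my own, and the algorithm is a plain straight-line fit (ordinary least squares), far simpler than anything a real predictive product would use.

```python
def train(examples):
    """Learn a straight line y = slope * x + intercept from (x, y) examples."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    # How strongly x and y move together, relative to how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Make an educated guess for a value of x the model has not seen."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: (purchases last season, spend this season).
history = [(1, 12.0), (2, 19.0), (3, 31.0), (4, 38.0), (5, 50.0)]

model = train(history)     # the "learning" step
guess = predict(model, 6)  # the "educated guess" step
print(f"Expected spend for a customer with 6 past purchases: {guess:.1f}")
```

The more (and the better) examples you feed into train, the more trustworthy the guess from predict tends to be, which is the same intuition as practising a skill.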
In the first iterations of predictive analytics, while software tools existed to help you analyse the data and come up with some pretty stunning insights, you needed the wizards of data science to prepare the data for analysis, apply some highly mathematical formulae and algorithms, and interpret the results.
This was great for giving these data scientists an aura of mystery, which no doubt increased the number of cocktail parties they got invited to, but times are changing.
As Big Data hits the mainstream, it cannot afford to be the domain of a select few. But not everybody has the time and inclination to get a master’s in statistics (although a PhD would also come in handy), so companies such as SAP are now focusing on automating the data science part of the process so that business users can focus on the outcome instead of taking night classes in stats.
This new automated approach to predictive analytics takes a lot of the guesswork out of the process. You do not need to tell it which algorithm to use; it will apply an appropriate one. You do not need to set the parameters of the predictive model; it will derive them for you. No, we have not reached the point where Data Scientists are obsolete.
The automation tools will make the predictive process easier for the end user, but they will also allow the Data Scientists to perform a larger number of analyses, build more models, and be much more productive.
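For the curious, the 'it picks the algorithm for you' idea can be sketched in a few lines of Python. To be clear, this is not how SAP's tools work internally (I am not describing their method here); it is just one common approach, assumed for illustration: fit several candidate models, score each one on data it has not seen, and keep the best. All names and numbers below are made up.

```python
def fit_mean(rows):
    """Candidate 1: always guess the average outcome."""
    avg = sum(y for _, y in rows) / len(rows)
    return lambda x: avg


def fit_line(rows):
    """Candidate 2: a straight line fitted by least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_nearest(rows):
    """Candidate 3: copy the outcome of the most similar past example."""
    return lambda x: min(rows, key=lambda row: abs(row[0] - x))[1]


CANDIDATES = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}


def mean_squared_error(model, rows):
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def auto_select(training_rows, holdout_rows):
    """Fit every candidate, score it on unseen data, return the winner."""
    scores = {
        name: mean_squared_error(fit(training_rows), holdout_rows)
        for name, fit in CANDIDATES.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical data: the last two rows are held back for scoring.
history = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.1), (6, 12.9)]
holdout = [(7, 15.0), (8, 17.1)]

best, scores = auto_select(history, holdout)
print("Chosen model:", best)
```

The business user only calls auto_select; the choice of algorithm and its parameters happens behind the scenes. A data scientist can still add better candidates to the list, which is where the productivity gain comes from.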
This can even help people who still prefer to do a lot of slice and dice (I am one of them). Say you are working with a dataset with a hundred variables, and your boss asks you to find out which factor is affecting your profit margin the most.
Sure, you can spend a very happy evening testing lots of different combinations to see which variable correlates the most with profit. Or you can save time, load your data into SAP Lumira or SAP Analytics for Cloud, have it tell you in seconds which variables affect profit the most, build a dashboard based on those variables, and still be home in time for dinner.
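If you want a feel for what the 'happy evening' version looks like when scripted, here is a rough Python sketch that ranks variables by how strongly each one correlates with profit margin. The column names and figures are invented, the helper names pearson and rank_drivers are my own, and a simple Pearson correlation is an assumption on my part; the SAP tools use their own methods, which I am not reproducing here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column cannot explain anything
    return cov / (spread_x * spread_y)


def rank_drivers(columns, target):
    """Sort every other column by the strength of its link to the target."""
    scores = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical dataset: one list per variable, one entry per store.
data = {
    "discount_pct": [0, 5, 10, 15, 20, 25],
    "store_size": [3, 1, 4, 1, 5, 9],
    "staff_count": [8, 7, 9, 8, 10, 9],
    "profit_margin": [30, 27, 25, 21, 18, 16],
}

ranking = rank_drivers(data, "profit_margin")
for name, score in ranking:
    print(f"{name:>14}: {score:+.2f}")
```

With a hundred variables the loop is the same, just longer. Keep in mind that a strong correlation tells you where to look, not what causes what.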
So there you have it. If you have not looked at predictive tools to harness the power of your data because it has always been in the ‘too hard’ basket, now is the time to rethink that. The data is not getting smaller, and the tools are getting easier to use by the day.
I loved this article, Iwan. Spot on: having a grasp on data and the story it tells us isn't beyond our abilities; it means learning to trust the narrative that we can't see in isolation.
Great article! Well explained.