The Big Data Analytics War: IBM Watson v Kaggle?
Bernard Marr
Internationally Best-selling Author, Keynote Speaker, Futurist, and Business, Tech & Strategy Advisor
It’s humans versus the machines.
No, I’m not talking about a new Terminator movie, but rather, two models that seek to help address the shortage of data scientists in our industry.
Because the struggle is real.
As far back as 2012, Gartner estimated that there would be a shortage of 100,000 data scientists by 2020. In a 2016 data science report, CrowdFlower found that 83 percent of respondents felt there weren’t enough data scientists — up from 79 percent the year before.
Part of the problem is that big data exploded so quickly that the industry had no opportunity to ramp up. Another problem is that data science education isn’t necessarily turning out applicants who are ready to step into a data scientist position (though they may be ready for a more entry-level position). In fact, only six universities reviewed by U.S. News and World Report offer data science programs to undergraduates (the remaining 23 programs are available only to graduate students).
Yet the demand isn’t going to decrease any time soon — so what is the solution?
In fact, there are two solutions vying to come out on top, and it all comes down to the humans versus the machines.
Round 1: The Humans
To represent the humans, we have Kaggle. The San Francisco-based business awards cash prizes to its teams of “citizen scientists” who compete to untangle big data challenges of all shapes and sizes.
And it isn’t just businesses that are benefitting: Kaggle’s projects include looking deep into the cosmos for traces of dark matter, furthering research into HIV treatment, and more.
Hal Varian, chief economist at Google (which has itself benefited from Kaggle’s research) and a Kaggle investor, describes it as “a way to organise the brainpower of the world’s most talented data scientists and make it accessible to organizations of every size.”
Even though, as we said, there is a serious shortage of trained data scientists, Kaggle has 150,000 of them, ready to be farmed out to the highest bidder.
As well as charging the companies it works with (including Amazon, Facebook, Microsoft and Wikipedia) up to $300 per hour for consultancy work, the company organises competitions, applying gamification to data science problems.
Of course, like anything new, it isn’t without its critics. In particular, they question how valuable the research it produces actually is. Often, the critics say, the biggest challenges in data analysis revolve around what data is needed and what questions should be asked.
Kaggle’s pre-packaged competitions take this element out of the equation. The crowdsourced data scientists might be working on the solution to a particular problem – but is it the correct one? And might there be more relevant data elsewhere, other than that supplied in the competition package?
This might be a fundamental limitation of the competition model, at least until data collection and distribution evolve to the point where data can be made available to contestants in real time. Even then, of course, there will be serious privacy and data protection issues to overcome.
Round 2: The Machines
You (and your computer) can be your own data scientist, thanks to new machine learning algorithms that can autonomously analyze data, identify patterns, and even interpret the results and produce reports and data visualizations.
Most people can see how certain information would be useful and what sort of insights might be derived from it, but most lack the technical skills to perform the analytics. They also might not have computers powerful enough to carry out the large volume of calculations quickly enough to act on the results.
IBM believes that it can offer a solution to the skills shortage in big data by cutting out the data scientists entirely and replacing (or supplementing) them with its Watson natural language analytics platform.
IBM’s Vice President for Watson Analytics and business intelligence, Marc Altshuller, explained: “With a cognitive system like Watson you just bring your question – or if you don’t have a question you just upload your data and Watson can look at it and infer what you might want to know.”
Watson, and other NLP or cognitive technologies, have the potential to play a huge role in the future of analytics and the education around it. A growing number of people are going to want to be able to extract insights from their data, but they might not want to take three or four years out to learn advanced computer science and statistics. Instead, all that is required might be a brief introduction to NLP technologies.
The winner is…?
Right now, the humans and the machines seem to be tied.
For one thing, companies and organizations need to access data analytics wherever they can get it. Kaggle is not yet large enough, and Watson not yet ubiquitous enough, for every company to simply pick one.
In the short and medium term, a combination will probably be the winner. AI is a nice supplement to the efforts of the data scientists today.
But in the end, I predict AI will win. Assuming that tools like Watson continue to improve as they have in the past, they will become easier and more efficient to use, opening up data science to anyone who can phrase a question.
As always, I would love to hear your thoughts on this topic.
Thank you for reading my post. Here at LinkedIn and at Forbes I regularly write about management, technology and Big Data.
Improving Call Center Agent Productivity with AI | Founder | IBM Alumnus | Lead Data Scientist
7 years ago: The article wonderfully demonstrates the expanding gap between the technologically illiterate people who love such articles and hands-on engineers and data scientists.
Programme / Change Manager - Digital Transformation
8 years ago: With artificially intelligent machines like IBM's Watson, is it really a match of humans versus the machines, or is it more about machines working with humans to provide better data? I believe that eventually AI machines will become more efficient with data than humans and will remove the need for some humans. However, for now we still need to augment machines with data scientists to understand the data, look at patterns and make sure we have good data in and out. Some, such as Elon Musk and Stephen Hawking, fear that AI will be the end of humans. The only way I see this happening is if we allow it.
Financial Services & Fintech & AI expert I Entrepreneur I Business Angel I Advisor I Mentor
8 years ago: Thank you Bernard, great article! Machines and humans together can be the solution. For this reason I recommend the Cognitive Machine Learning of Rulex Analytics; this tool is able to automatically extract intelligible rules from data so that the business person can validate them.
Advisor / Investor at The Crowd's Line
8 years ago: Nice overview of the modern Big Data scenario!
Unify your industrial data | make it visual | make it perform | with a Digital Twin-delivered in 30 days.
8 years ago: Great commentary - keep it coming as this might become a shape changer for our (IT) future(s)