Top Five Data Science Trends in 2020: A Practitioner's Take
Angshuman Bhattacharya
Entrepreneur in the AIML space with a successful exit. Building AIML organizations for two decades. Leader in client consulting for AI solutioning. (views expressed here are personal)
The world of big data, machine learning and predictive analytics is one of the most vivid examples of a VUCA world: volatile, uncertain, complex and ambiguous. Just look at how the industry's name has changed over the last decade and a half: modelling, then analytics, then business analytics, then suddenly machine learning to make it sound more techy; and now even that is passé, and we call it Data Science. I am quite sure that each new name brought a few new values and techniques with it, but doesn't it also signify that the industry is suffering from an identity crisis?
In fact, by my reckoning the industry has changed three times faster in the last five years than it did in the previous ten. Many technological breakthroughs have driven this change, for the better. It started with game-changing cloud computing, which made possible the huge computation that big data solutions need in order to support data science. Then came the explosion of machine learning libraries made available through open source: R and Python became mainstream, and industry leaders started releasing open source libraries such as TensorFlow. The world of data science began changing at a rapid pace, and a plethora of end users and practitioners started reaping the benefits.
As a practitioner and observer who has spent close to two decades in this ever-changing industry, witnessing the rapid change and the value added everywhere has been a delight. Some of the buzzwords once touted as game changers have faded away. New types of consolidation have surfaced, new ideas and use cases have sprouted, and we are probably approaching something of a steady state, as steady as you can get in this ever-changing world. While it is always risky to predict trends, here are my top five takes on what will shape the Data Science world in 2020, and hopefully beyond.
1. The marriage of big data and data science is finally happening! The calf love started at the beginning of the decade, somewhat prematurely I would say. Neither the big data technologies and data engineering were mature, nor were the machine learning algorithms running on big data infrastructure stable. Useful petabyte-sized data sets for building and testing algorithms were scarce. However, that didn't deter the tech enthusiasts (and the sales folks) from touting big data analytics 8-10 years back. Soon everyone realized that there were still miles to go before this would bring real business ROI. So the love story of big data and data science/analytics went behind the curtain and entered a courtship mode, where more intense and forward-looking negotiations took place.
It is only through the relentless hard work and futuristic vision of practitioners, researchers and business users, and the global collaboration amongst them, that real business use cases involving both big data architecture and data science have started emerging as winners. Machine learning algorithms running on top of big data implementations, petabyte-scale streaming data feeding ML processes, and business strategy and action built on these are now becoming a reality. The success of any industrial innovation depends on the real benefits that a set of pioneering leaders reap from it. Thankfully that time has come, and the industry has solemnized the marriage of big data and analytics/data science. We now look forward to a successful union, with many more benefits to come across the globe.
2. Machine learning techniques are going mainstream. Once the regular statistical and econometric algorithms (regression techniques, time series modelling, clustering, decision trees and so on) had stabilized across domains and sectors, the industry was on the lookout for newer techniques that could either handle new types of data (text, image, video, unstructured), give better results on known use cases, or offer new approaches to existing challenges with better results, ROI and actionability. Examples include credit rating for the unorganized subprime market, where financial history is seriously insufficient; the content affinity and stickiness of digital news subscribers; or a faster, more accurate and unbiased car insurance process. The existing algorithms were not sufficient to provide the right results, and the new-age ML algorithms could not yet justify the use cases either. It took time to train the algorithms, customize them for the right use cases, and generate enough data to train and test on. But finally there are uses of new ML algorithms that businesses are accepting, because their results have been proven in practice. Now that the pioneers have accepted them, it is only a question of time before these algorithms become mainstream.
3. New industries, new use cases, new benefits. New digital industries such as e-commerce and digital content have been sitting on top of huge volumes of data generated by user interactions, in the form of server logs, browsing histories and other data points. Video watching behaviour and social media interactions are all sources of deep choice and preference information. As it became increasingly important for operators to monetize this behavioural data, data science practitioners jumped on it a few years back to unearth the value. After a number of experiments and trials with business use cases, the real ROI is being seen now, as these analytics open up new revenue streams (such as selling subscriptions for digital products) or augment existing revenue channels (such as converting single-channel subscribers to multi-channel ones in a print plus digital content business). New-age ML algorithms on top of unstructured data are rapidly creating whole new data science practices across the world.
4. Old industries, new challenges. I am sure many of us got bored with the done-and-dusted BFSI, retail or telecom data science work and stories that have been around for three decades now. Novelty in these areas was more of an aberration than the norm. However, these industries are changing faster too, so data science cannot lag behind. New approaches to old use cases (random forests instead of logistic regression or a single decision tree), new solutions (early credit rating using ML on limited data and alternate data) and brand new use cases (supply chain improvement using IoT data from shipment movements) are coming up faster with real benefits, and they are being adopted faster than ever. Suddenly these old-school industries are starting to look more interesting to practitioners, and the trend will continue for years to come.
5. IoT data is the new super oil for data science. The availability and trustworthiness of data have always made it hard for many business users to try and test analytics. Traditional systems were not capturing enough data, and external sources (public or private, free, syndicated or paid) were inadequate, less trustworthy, too summarised, or at times all three. Important business functions had to do without data and data science. Things started changing quickly with the introduction of IoT devices, and new possibilities emerged. How a car is driven is now known through sensors embedded in it. Where goods in transit actually are is at the logistics manager's fingertips because of GPS traceability. Manufacturing and production systems are now equipped with sensors, even in high-pressure, high-temperature processes, feeding their digital twins with accurate, real-time data. All of this is now available, and data science is wasting no time in harvesting the benefits. The tools that help manage unstructured data, mostly open source libraries, are being ably used. Old and new ML techniques are providing insights that were hitherto unavailable to anyone. Business processes such as logistics, supply chain and demand forecasting are becoming more efficient and predictable.
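To make the GPS traceability point concrete, here is a minimal sketch of how a logistics dashboard might estimate how far a shipment still has to travel from its latest GPS ping. The function names and the sample coordinates are hypothetical and chosen only for illustration; it uses the standard haversine great-circle formula, which gives straight-line distance and not road distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to guard against tiny floating point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def remaining_km(last_ping, destination):
    """Straight-line distance left, given (lat, lon) tuples (hypothetical helper)."""
    return haversine_km(last_ping[0], last_ping[1], destination[0], destination[1])


# Hypothetical example: a truck's last ping versus its destination warehouse
print(round(remaining_km((19.07, 72.87), (28.61, 77.21)), 1))
```

A real system would layer route distance, traffic and dwell times on top of this, but even this simple measure shows the kind of signal that was not available before devices started reporting location.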
A bonus trend: use of alternate data. While the above five are my top picks, I'd place my bet on another one, the use of alternate data. Weather, traffic, GPS, soil science and cross-border trade movement are all huge and important sources of alternate data, and a goldmine for businesses that can see the value in them. Scarcity of data, or the myth of it, has always been a deterrent to adopting analytics in many places. Data science practices have tried to overcome this by using alternate data and looking beyond the enterprise's own data assets. However, external data was scarce or insufficient. Lately a number of new sources have opened up: granular weather data, transaction-level trade movement data across product categories, and traffic and cartography data mapped together. A number of use cases have been devised, such as forecasting agricultural output or pest attacks by analysing satellite images, temperature and moisture in order to predict pesticide demand more accurately; providing optimum micronutrients; and product design, competition tracking and demand prediction for industrial products using trade flows. Encouraged by a number of successful case studies, multiple industries and conglomerates are making it part of their strategy to adopt alternate data and apply data science techniques to it, solving challenges that were hitherto open. It is now becoming mainstream to include these sources in the business data strategy and to build regular analytics and data science on top of them, often marrying them with the enterprise's own data assets.
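As an illustration of marrying alternate data with enterprise data, the sketch below joins a hypothetical daily sales table with a hypothetical weather feed on date and region. All field names and values are invented for the example; a real pipeline would more likely use a dataframe library or a warehouse join, but the idea is the same.

```python
def augment_with_weather(sales_rows, weather_rows):
    """Left-join weather fields onto sales rows by (date, region).

    Rows without a matching weather record keep None for the weather fields.
    """
    weather_by_key = {(w["date"], w["region"]): w for w in weather_rows}
    augmented = []
    for row in sales_rows:
        weather = weather_by_key.get((row["date"], row["region"]), {})
        augmented.append({
            **row,
            "rainfall_mm": weather.get("rainfall_mm"),
            "temp_c": weather.get("temp_c"),
        })
    return augmented


# Hypothetical enterprise data: pesticide units sold per day and region
sales = [
    {"date": "2020-01-05", "region": "north", "units": 120},
    {"date": "2020-01-05", "region": "south", "units": 80},
]
# Hypothetical alternate data: a granular weather feed
weather = [
    {"date": "2020-01-05", "region": "north", "rainfall_mm": 12.5, "temp_c": 18.0},
]

for row in augment_with_weather(sales, weather):
    print(row)
```

The augmented rows can then feed a demand model as extra features, which is where the alternate data starts to pay for itself.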
The above are a few important business aspects that, in my opinion, will shape the data science world. There will be many more such trends to which I have limited exposure as of now. When predicting trends, it is also important to understand and discuss the technology angle. Data science technologies are evolving rapidly and are highly competitive. The open source revolution has happened, and it has played a great role in spreading the word and enabling millions to adopt data science. In the last few years we have seen commercial technology vendors such as Microsoft come up with newer business models, opening up to open source or bringing it under their umbrella. It is going to be a highly interesting and evolving world, no doubt, but we can expect some consolidation and stability, and definitely real value added to business cases by Data Science in the next year and beyond. Let's all be a part of it in our various capacities, and enjoy the journey!
4 years ago: Good insights, Angshuman, from a data science perspective. Many new applications using unstructured data like text, images/video and audio/speech are entering the enterprise space, and a few practitioners refer to these as AI, to differentiate them from traditional predictive analytics, i.e. data science! You alluded to this when you mentioned IoT data, but it goes beyond IoT sensor data. The other data you mentioned is also referred to as Open Data in the industry. Yes, this is catching up fast, whereas syndicated (commercial) data has been in use for decades in the retail, CPG and banking industries.