Data Science Industrialization: What Is Next?
François Rosselet
Data Architect @ Cargill | AI, DataOps, Data Mesh, AWS, Snowflake, Knowledge Graphs, Agentic GenAI
"Data Scientist: the sexiest job of the 21st century". That is how data science was perceived in 2012, and I am writing it again: 2012. Why? Because in 2012, connecting data from ultra-siloed systems was a real out-of-the-box feat, with fantastic potential for adding value to many businesses.
In 2012, the data scientist looked much like the IT guy of the 80's: a single person handling all the IT problems, no frontend, no backend, just IT. So it was with the data scientist of 2012, using inference techniques and machine learning to learn ever more from data, outside the classical tools that were tailored to solve already known problems.
What has happened since then? Data science went through the entire Gartner hype cycle, and like the IT guy of the 80's, the data scientist has been split into a number of distinct roles: machine learning engineer, data engineer, DevOps engineer, BI developer, etc. For agility reasons? Of course not; enterprises are still fond of Taylor's and Ford's model: take a task, divide it into the optimal number of micro-tasks, have each executed within the optimal timeframe, and so on.
Source: iStockphoto
Good: enterprises started to industrialize data science. Some of them eventually built a fully operational data science supply chain in Taylor's style: data engineers taking care of making data flow optimally, machine learning engineers building optimal predictive models, DevOps engineers pushing all of it to production, optimally. Where is the data scientist? As a multi-skilled generalist, he naturally lost a lot of his flavor in the middle of this crowd of hyper-specialists.

What really happened inside enterprises? From what I have seen, data science simply became an additional step in the workflow, or an additional silo inside the enterprise. This is where data science started to slide down the trough of disillusionment of the Gartner cycle.
And now? What needed months to be implemented on-premises now takes a few hours or less to launch in the cloud. Data science and AI are available as highly available, resilient, scalable micro-services, with no need to reinvent the wheel: the GAFAMs have probably gathered the best talents in these fields, and they are making data science available to everybody through powerful micro-services. Recently, the AutoML service from Google Cloud Platform was ranked within the top 5 of a Kaggle competition...
What should we learn from that? That the sexiest job of this century was built by multi-skilled people who thought out of the box, guided by their intuition and following their own vision to build something that did not exist. This is why nobody has ever succeeded in correctly defining what data science is: I have seen hundreds of different Venn diagrams trying to impose a synthesis of what data science is, which is simply absurd in a world where everybody is chasing a single version of the truth. Here is my version:
What to conclude? Maybe we should be careful as we industrialize data science: doing it on-premises makes less and less sense today, unless there is no other choice. We should also be careful as we transfer data science into a supply chain, because:
- Technologies are evolving faster than most implementation processes, so the risk of being outdated by the time you reach production is really high unless you stick to the latest agile and DevOps methods and tools.
- Do not forget that a supply chain is also a cycle. Is your infrastructure agile enough to follow and accommodate the latest technologies in your business? Are your human resources agile enough to adapt to the next paradigm shift? Because one is likely to come...
More generally speaking, we should definitely give more credit to soft skills nowadays. Data science was disruptive; other disruptions are surely coming, and they will have nothing to do with technical skills.