Data Science Keeps Failing
Edward Chenard
Transformational Data, Digital, Product Leader. I transform the way companies do business with an innovative blend of data, digital and product transformation. Built several billion dollar plus products and platforms.
When summer rolls around, there always seems to be a new study on the impact data science teams have on the businesses they support. This summer is no different. This year, thanks to a panel discussion, VentureBeat has posted that 87% of data science projects fail to create any impact on the businesses they support. That’s in the 80-90% failure range I have seen over the past three years. Why is data science such a big failure? There are actually a number of issues in play that I have seen, both in dealing with many clients and in my past experience building these teams in previous roles.
Lack of Collaboration
Many data science teams don’t do well at collaboration. They might talk about engaging a subject matter expert, but that tends to be a check-the-box meeting, not real collaboration where they work together. I have even seen companies that have a separate team communicate with the rest of the org on behalf of the data science team, because the DS team didn’t want to actually talk to other people.
Unfortunately, data science does attract a lot of people who enjoy just sitting at a computer plugging away and not dealing with other people. Yet the nature of a lot of the business cases these teams are asked to solve demands a high level of interaction. In some ways, those companies that have another team doing the interacting for data science are right. Data scientists are often the last people you should send in to do the actual communication piece. That’s a skill set not taught to data scientists. Which leads to point #2.
The Wrong Talent
There has been a strong focus on data science the last few years. Everyone is rebranding themselves as a data scientist, except me. I don’t call myself one, for good reason. A data scientist should focus on the algorithms. What many companies do is hire data scientists, because DS is cool, then load them up with tasks they really shouldn’t be doing. Such as hiring developers for non-DS projects (yes, I saw this). You have an HR team for that, use them! Or having data scientists do data engineering. I know a ton of data scientists who claim to know data engineering; they often don’t. It is a rare breed that has the proficiency to do both engineering and data science as well as a specialist. They will claim it; have them prove it. You will often find that their level of experience is just not good enough for the job at hand. Data engineering has a lot of non-DS aspects that are very important to ensure the business has good data pipelines for everyone, not just for some data science project.
Lack of Clear Understanding
One of the things I find very interesting is just how often what people say and what they want do not match when it comes to the analytics space. Case in point: I was dealing with a very large company as a client, and they kept saying predictive analytics was what they wanted. I asked them to provide a few examples of what they wished to accomplish. From this discussion, I realized they wanted prescriptive analytics. This is endemic: many people say they need data science when in fact they really need data engineering or data strategy or something completely different.
The space has evolved over the last decade. Even back in 2012, data engineer and data scientist were very much their own specialized roles. Yet I constantly run into data scientists who think they are engineers. Not so many engineers who think they are data scientists, which is interesting. Many data scientists will also proclaim to be “business experts,” when in fact that is far from the truth. I do data strategy. I manage the build of algorithms and sometimes build them, but I spend a lot more time on the legal, regulatory, design, customer experience and P&L management of the process. Sometimes I get data scientists telling me that they can do my job, until they actually have to do it. To this day, I have yet to meet one who actually could do my job.
It really is a lack of understanding that true data strategy is just as complex and labor intensive as the development of other aspects of the data stack. In fact, I would rather spend my time engineering a tech stack than doing a patent dive, because designing a tech stack is far easier and more enjoyable. If you really want a successful data practice, recognize that the skills of a data scientist, a data engineer and a data strategist are very different, and you can’t cheap out on any one of them.
Things are Getting Static
Data science is often seen as cutting edge. So, it is rather funny that a lot of data scientists just hate using tools that make life easier. Just yesterday I saw a post ripping on Snowflake Computing for making data science “look easy.” That is kind of the point, to make it easy. Most of the commenters were saying that you should hand code everything! In some situations, sure, but we have to keep in mind that we often have deadlines, and market forces, not data scientists, decide when something goes to market. So, if you need it fast, take the faster route. If you have time, sure, do your hand coding.
But this is the issue with data science: for all its marketing hype around being cutting edge, a lot of data scientists are not. They are more than happy to disrupt the life of the data engineer with all kinds of requests for new tools and data, yet when you want to do the same to the data scientists, you will find they are often the most stubborn group to deal with. But life goes on, progress happens and data science is going to go mainstream. We have seen this over and over again. Remember, 20 years ago you needed someone with specialized skills to code a website, and that skill was often costly. Not today. The same will happen to data science. Most of us don’t need supercomputer-level analytics; we need a basic recommender or pattern recognizer. Something that will become as easy as using Microsoft Word in 10 years. This lack of progress on their part is going to be their own downfall.
Bad Leadership
Finally, the biggest issue is bad leadership. I see data science often thrown in with software development. Sure, there are some similarities, but I can say that about anything. I mean, marketing and finance both use computers, so why not have both report to the CMO? Most people would laugh at that, yet when it comes to data science, we don’t? Data science belongs under the Chief Data Officer or the business leader, and that leader needs to know how to put data science to use.
Far too often I see business leaders shy away from actually leading data science. They think their team will magically come up with answers. The team will have answers, but they will take a long time and cost a lot of money. But don’t blame the team, blame the leader. This is a field where very few really know how to lead, and if you find someone who does, odds are they are a data strategist, because the nature of that job is more in line with leadership than that of the engineer or data scientist. Yet that is often the most overlooked aspect.
Has anything changed from previous years? Not really. The same issues continue, and thus we see the same high rate of failure. Until companies really change, I don’t expect this failure rate to come down. But that doesn’t mean companies are all bad. I have worked on several teams where we had an 80% success rate, because we did things differently. Which would you rather have, 80% failure or 80% success?
Director of Data Science and Analytics | PhD | AI/ML
5 years ago: Great insight! Reliable, accurate measures over time are also critical to success. There is minimal opportunity when a data lake hasn’t been cleaned and properly cared for.
Author of 'Enterprise Architecture Fundamentals', Founder & Owner of Caminao
5 years ago: The terminology is doubly confusing, first with regard to science, then with regard to data. Properly speaking "data science" is not a science but a technology, and as such a very effective one. Then, there is a confusion between data (what can be known or anticipated from environments), information (categories or types of data managed by systems), and knowledge (data and information put to use). https://caminao.blog/knowledge-architecture/ontologies-business-intelligence/
Holistic Management Analysis and Knowledge Representation (Ontology, Taxonomy, Knowledge Graph, Thesaurus/Translator) for Enterprise Architecture, Business Architecture, Zero Trust, Supply Chain, and ML/AI foundation.
5 years ago: If you think of analytics as a sequence, first you begin descriptive analytics by building a domain class or type descriptive model, also called a metamodel, ontology, knowledge model, or type knowledge graph, to provide an abstract base for human and machine deep learning. Then you build the domain instance descriptive model, also called an architecture, knowledge base, and instance knowledge graph. Then you do diagnostic analytics using the descriptive analytics products, to find the deficiencies, overlaps, gaps, and shortcomings in the domain's description. Then you do prescriptive analytics to build and link a change portfolio of strategies, programs, projects, deliverables, and dependencies at the mission, function, and operation level. Then you do predictive analytics to guide decision-making on destinations, path, and pace of change. It seems data science operates as predictive analytics without integrated, dynamic, and adaptive prescriptive, diagnostic, and descriptive analytics and an underlying core descriptive model. So they're focusing on disjointed twigs and leaves of a knowledge tree, while discounting the interconnected branches, trunk, roots, soil, and surrounding vegetation. It is broken from the beginning.
Bootstrapping the Open (Mutual) Learning Commons -" Symmathecist in the medium of software” Augmenting Human InterIntellect on the IndyWeb
5 years ago: We are heaping layers upon layers of accidental complexity on false promises (instead of slaying them). https://www.anchormodeling.com/?p=1214 Data Lakes not delivering, then add a layer of Machine Learning, still no joy, add Knowledge Graphs. The reason these ideas can be sold to the enterprise is that there is a nugget of truth in them. Notice that Knowledge Graphs get traction just when the big boys are ready to capitalize on it (Neptune etc.). Then there are big technology bets to consider, which way should we jump: https://www.dhirubhai.net/feed/update/urn:li:activity:6572076839991414784 Always ask: Is it alchemy or science? https://www.youtube.com/watch?v=gG5NCkMerHU
Data Engineer/ Solution Architect @ LHIND
5 years ago: "Everyone is rebranding themselves to be a data scientist, except me, I don’t call myself one." Edward Chenard "Data Strategy, Data Science, Big Data.." very trustly