The Deployment Pain
Patrick Slavenburg
Entrepreneur. Enterprise software & services. AI & Decision Intelligence. Occasional AI advisor to OECD, EU Commission & DigitalSME. Climate & NatSec hawk.
This article was written for the most part by Rosaria Silipo, and is written in the first person. Rosaria is Principal Data Scientist at KNIME. A smaller part was co-written by myself (Patrick Slavenburg).
In October 2017, I was running the KNIME booth at the ODSC London conference. At the booth, we had the usual conference material to distribute: informative papers, various gadgets, and even Swiss chocolates. Amongst the gadgets we had magnets, more specifically four types of magnets representing abstract nodes in an abstract workflow for data analytics: Read, Transform, Analyze, and Deploy. These magnets were quite popular. Conference attendees would come to play with them, to assemble them in a workflow, or to take a few home to their fridges.
I remember that most people were drawn to the Analyze magnet, while those same people frowned and shrugged in front of Deploy. I even remember clearly one attendee who, when offered the Deploy tile, recoiled and in a joking (?) tone stated: “I do not want Deploy. It physically hurts”.
I was quite surprised (I even tweeted about this). How could a data scientist, who allegedly spent quite some time building and refining Machine Learning models, not want to deploy them, to release them into the real world? That is because deployment is indeed a pain!
Why is deployment so hard? Is it just the disappointment following the natural, likely death of some data science projects? Or are there really some extra obstacles that a data scientist is not equipped to overcome? I have my own opinion on this topic, of course, but I might be biased by my own attitude and past experience. So, Patrick Slavenburg and I asked our fellow colleagues on LinkedIn about the most common causes of deployment failure in a data science project. This article summarizes their answers.
1) Infant Mortality
As everybody knows, a data science project consists of a few mostly standard phases, summarized for example in the CRISP-DM cycle: business case understanding; data exploration; data cleaning and data preparation; model training, model optimization, and model testing; and finally, if everything went well in the previous phases, deployment. “If everything went well” means:
- If we understood the business specs clearly
- If the data quality is sufficiently good for our task
- If we managed to clean and prepare the data properly
- If we trained, optimized, and tested a sufficiently good model
If all of that succeeded, we might proceed with deployment.
The deployment phase is at the end of the food chain. It collects all the garbage produced in the previous stages of the process. That is, the deployment phase is where all previously created and undetected problems might show up to kill the entire project. And here we are at the first cause for deployment failure: infant mortality. Infant mortality is due to a deep undetected problem in any of the previous stages.
Kenneth Longo, for example, reports the misunderstanding of the original business question as one of the most common causes for a Data Science project to drown during deployment. Kenneth describes the problem as follows: “What the Machine Learning model ultimately answers does not align with the original business question, or the business question has shifted during the development process.”
Sometimes, despite all efforts, model performance is still below expectations. In this case, either expectations were too high to start with (Data Science was sold as magic) or the quality of the data used to train the model was not good enough (the training set might have been too small and/or not covering all possible events).
Sometimes it is possible to discover this data quality issue during the data exploration phase, for example with some simple statistical checks like the sketch below, but sometimes the problem flies under the radar and becomes clear only right before deployment.
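As an illustration of what those early statistical checks can look like, here is a minimal sketch in Python with pandas. The file name training_data.csv and the target column churn are purely hypothetical; the point is only the kind of inspection meant here.

```python
# Minimal sketch of early data-quality checks, assuming a pandas DataFrame
# loaded from a hypothetical CSV with a hypothetical binary target "churn".
import pandas as pd

df = pd.read_csv("training_data.csv")

# Fraction of missing values per column: large gaps hint at data quality problems.
print(df.isna().mean().sort_values(ascending=False))

# Class balance of the target: a tiny minority class suggests the training set
# may not cover all the events the model will face in production.
print(df["churn"].value_counts(normalize=True))

# Summary statistics to spot suspicious ranges, constant columns, or outliers.
print(df.describe(include="all"))
```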
A project that dies of infant mortality during deployment is disappointing, but it is not a deployment failure per se; the problem merely surfaces during the deployment phase. I would not count this as a cause of deployment pain, but rather as a disappointing late discovery made during deployment.
2) Missing the IT Skills
Deployment requires some IT skills. Whether you need to write the final results to a database, to schedule the job execution every X hours, to run the application from a remote server with different connection parameters, or to run it as a REST service, you will need some help from your IT department, at the very least to get the server URI and the right credentials to access the resources. And this is, in my experience, the second showstopper for deployment: the lack of collaboration between the data science group and the IT department.
The IT department’s task is to protect the data. The Data Science group’s task is to access the same data. These two opposite tasks might not foster the best collaboration.
In this situation, it might help if a team within the Data Science group acquires the necessary IT skills to communicate with the IT department and takes responsibility for all (or at least some) applications coming from the group and moving into production, for example on a dedicated server.
In cases where extreme protection of the original data is required, the dedicated server can host all intermediate and final results of the data science applications and, maybe, even a copy of the original data. In this way, the general IT and the Data Science IT are almost completely decoupled and the IT team of the data science group can take full responsibility for machines and applications.
Of course, choosing the right tool to deploy, schedule, and execute your application on a remote server might save you a lot of time, sweat, and endless discussions with the IT department. There are a number of tools out there, allowing for server deployment, REST service deployment, scheduling, and final dashboard display on a web browser.
Choose one, the one that best fits your needs and group skills, but seriously choose one! Training a Machine Learning model to a 99.99% accuracy is a great academic achievement. However, throwing the same model into real life is what will allow it to fulfil its purpose. This is a challenge not to underestimate. So, get all the help you can from the right tool!
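To make the REST option mentioned above a bit more concrete, here is a minimal sketch of serving a trained model over HTTP with Flask. This is only an illustration under assumed names (a pickled model.pkl, a /predict endpoint), not a recommendation of any specific tool.

```python
# Minimal sketch of exposing a trained model as a REST service with Flask.
# model.pkl and the /predict endpoint are hypothetical names for this example.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:
    model = pickle.load(f)  # e.g. a scikit-learn estimator trained earlier

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()  # expects {"features": [[...], [...]]}
    predictions = model.predict(payload["features"]).tolist()
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

The same model-loading and scoring logic can also be called by a scheduler (cron, or whatever your chosen deployment tool provides) to score batches every X hours instead of serving single requests.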
3) Life in the Real World is more complicated than Life in the Lab
Real life is a jungle, and throwing your predictor out there means facing a jungle of governance rules and data privacy issues that were not considered during model training.
Your predictor application must be accountable at any moment for the decisions it makes. This means that all predictions made in the real world must be stored, traceable, and archived for the required – often legally required – amount of time.
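As an illustration of such an audit trail, here is a minimal sketch that writes every prediction, with a timestamp and the input that produced it, to a local SQLite table. All names are hypothetical, and in production the store would of course be a central, access-controlled database rather than a local file.

```python
# Minimal sketch of an audit trail for predictions, using a local SQLite file.
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("predictions_audit.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS audit_log (
           ts TEXT, model_version TEXT, input_json TEXT, prediction TEXT
       )"""
)

def log_prediction(model_version, features, prediction):
    """Store every real-world prediction so it stays traceable later."""
    conn.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), model_version,
         json.dumps(features), json.dumps(prediction)),
    )
    conn.commit()

# Hypothetical usage: record one churn prediction together with its input.
log_prediction("churn-model-v1", {"calls_last_month": 3}, "will_churn")
```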
Many countries, especially in Europe, have strong laws about data privacy. Often what looked like a good idea during the initial brainstorming clashes with the legal reality of what you can do with an individual person’s data. A model trained on a large amount of data usually does not expose information that can be traced back to the original data owner. However, if the training set is too small or if the application is designed to track each single person, this might turn into a legal problem.
Lakshmi Krishnamurthy states that “Building, tuning and performance testing with training data in lab constraints is manageable; but integrating ‘live’ data and people issues of quality, governance, enterprise architecture constraints and last but not least getting customer buy-in on what your model is/was supposed to do is not always ‘predictable’. That's where the challenge is!!”
On a similar note, Sam Taha thinks that the integration with the existing data plumbing inside the company might prove to be more difficult than initially thought. “Going from a POC with ideal input and pipelines to a solution that works in the wild and across the solution space involves tight alignment with the business and with the upstream systems so that the model becomes part of the production ‘product’”.
Paige Roberts also agrees and says that “a failure to take into account the data engineering work involved in creating production data pipelines to feed the model is another cause for missed deployment”.
4) Data Leakage
Ruben Ten Cate lists data leakage as another big impediment to deployment. Data leakage is indeed a serious problem, not always easy to detect during training. It happens when the training set contains features that are not available in the real world, for example features that are collected after, or as a consequence of, the event to classify.
To quote Ruben: “To add another cause of missed deployment: a 'data leak'. Not a leak in the sense of a privacy breach but a leak of data into the training data, which is not present in the operational environment. For example: often customers with more customer service interactions have a higher chance of churning. This is a valid correlation and is a result of the fact that unhappy customers often call customer service to complain or to actually stop their contract (churn).
The problem with a model based on this feature is that in real life, we want to predict churning customers much BEFORE they become unhappy and call customer support (to churn), so the higher rate of interactions has not yet happened and is not present in an operational environment. As a result, the model performs bad and is discarded before adding value in production.” (End quote)
A classic example of data leakage is trying to predict a travel delay using the arrival delay at the destination as a feature. Of course, if the means of transportation was delayed x minutes, this qualifies as a travel delay.
But how can we know the arrival delay before we arrive? There are many more such examples that become self-evident in hindsight, but are not so clear during training. Naming and describing your data columns clearly might help prevent this kind of problem, as illustrated in the sketch below.
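Here is a minimal sketch of that precaution, with purely hypothetical file and column names: before training, drop every column that is only known after the event you are trying to predict.

```python
# Minimal sketch of guarding against data leakage: keep only features that are
# actually available at prediction time. All names here are hypothetical.
import pandas as pd

df = pd.read_csv("travel_history.csv")  # hypothetical historical data

# Columns known only AFTER the event we want to predict (the arrival delay,
# post-trip complaints, ...) must not be used as model inputs.
leaky_columns = ["arrival_delay_minutes", "post_trip_complaints"]

features_at_prediction_time = df.drop(columns=leaky_columns + ["is_delayed"])
target = df["is_delayed"]

# features_at_prediction_time and target can now go into model training
# without the model "seeing the future".
```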
5) Design Thinking, Lean Startup and Agile (or lack thereof)
Patrick Slavenburg: Software development - and increasingly other teams, such as marketing & sales - has moved to iterative processes like Design Sprints or Scrum. Agile processes for Data Science are still fairly new but increasingly popular, such as DataOps (see the DataOps manifesto).
User stories do not remain static, however, and can suffer from scope creep, especially if a Data Science project is taking "too much" time. Even more problematic is that businesses often have difficulty articulating which business value they want to derive from AI, and which has the highest priority. It's the perennial Product Manager’s dilemma: “Should we build this?” and “Why?”.
Agile & Design Thinking may not be part of the typical Data Scientist’s background - instead, they follow a longer-term “scientific method”. In the end, they may have solved the problem, but the problem is no longer aligned with business needs. Or it never really was to begin with.
6) Company Politics
Finally, Patrick Slavenburg reports “organizational problems - and tribalism - as a problem for deployment”. It’s not just the usual office politics that we often find in Dilbert comics as well as, unfortunately, in real life.
It’s also the perennial Chinese wall between OT and IT. Whether business departments or the subject matter experts in Operations: all view IT as an enabler of their needs. It also means that if Data Science is introduced through IT, it will stay within IT and never become widely adopted. Yet it is Citizen Data Scientists we need in order to drive that adoption.
Another example is the lack of collaboration between the IT department and the data science group, as reported earlier in this article (in the section about the missing IT skills). In the worst cases this can even evolve into an ugly, unpleasant war.
Management support is also hard to win. This sometimes has to do with management's excessively high expectations of data science applications, or the opposite: the belief that machines can never replace humans. There is sometimes little middle ground to be found.
CONCLUSIONS
Yes, deployment hurts. I have listed here some of the most common causes for the deployment pain. The pain can be cured by not underestimating any of these issues, choosing a tool that simplifies your life, and dedicating enough time, resources, skills, budget, and early planning from the beginning of the project.
If you want to read the original comments in the LinkedIn discussions, you can find them in these two threads:
https://www.dhirubhai.net/feed/update/activity:6499327120198627329/
https://www.dhirubhai.net/feed/update/urn:li:activity:6499534506897211392
I hope I have not killed your enthusiasm for implementing and deploying data science solutions. My goal was to make you aware of the pitfalls, rather than to scare you. I hope you will learn from our experience and keep going with the deployment of even more data science applications.