Turns out, your model isn't all that
At Code Pilot, we're not only seasoned Applied Data Scientists, we also spend a lot of time enabling other data scientists. We recently conducted an extensive survey of Applied Data Scientists from industry, and while we're still processing the data and preparing our final report, one finding stood head and shoulders above the rest, and it was all about models.
Applied Data Scientist?
I first heard the term "Applied Data Scientist" during my last role at Microsoft. Up until that point, I had only heard the terms Data Scientist and Machine Learning Engineer, but the addition of "Applied" just made sense.
Why? Well, mainly because at that time, Microsoft was all in on AI/ML. And the goal was clear: to use AI and ML as a way to add intelligent capabilities to a product. There are of course AI/ML products at Microsoft, but Applied Data Scientists were responsible for embedding into a team, helping them understand where AI/ML could add value, then helping implement it. The Applied part really meant you were a hybrid: part engineer, part data scientist.
There are enough Applied Data Scientists to survey?
Well, sure, only they may not self-identify that way, so it takes a bit of digging. In the end, we took a classic crawl strategy: we started with Applied Data Scientists we knew at companies like Microsoft, Google, AWS, and Facebook, then asked for introductions to other Applied Data Scientists they knew, and it just rolled on from there.
What did you ask them?
The survey was conducted in two parts:
- Exploratory/verbatim: These were informal 30-minute chats to help frame the conversation. We learned a lot about practices, methods, team dynamics, tool chains, tribal knowledge, etc.
- Quant: After we had enough context, we created a survey to get a larger signal on the main points that kept coming up in the interviews. Like good data scientists, we needed to make sure we weren't falling victim to the law of small numbers.
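To make the small-numbers worry concrete, here is a minimal sketch. It is only an illustration, not part of the survey analysis. It assumes a simple random sample and uses the textbook normal approximation for the 95% margin of error of a proportion. The function name, the sample sizes, and the 50% proportion are all made up for illustration; they are not survey results.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion p from n responses.

    Uses the normal approximation z * sqrt(p * (1 - p) / n). With z = 1.96
    this corresponds to roughly a 95% confidence level. It assumes a simple
    random sample, so a referral-based sample would only approximate it.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical sample sizes, not the survey's actual response counts.
for n in (10, 100, 1000):
    print(f"n={n:>4}: +/- {margin_of_error(0.5, n):.1%}")
```

With ten responses, a 50/50 split carries an uncertainty of roughly plus or minus 31 percentage points; with a thousand, it drops to about 3. That gap is why a handful of interviews can frame the questions but can't settle them.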
So what did they say?
A lot, and it was awesome! As an Applied Data Scientist, it was unbelievably fun and completely insightful to understand how some of the best AI companies actually use AI. And not the hand-wavy, math-formulas-that-make-your-eyes-bleed AI, but just solid, practical, applied AI.
And while we're going to publish the final paper asap, there was one consistent theme that emerged...
The model just isn't that important!
"Heresy!" I hear you uncontrollably scream into your can of Red Bull as you gently reach for your framed picture of the good doctor.
Yep, turns out, in today's real world of Applied Data Science, model fixation is just not a thing anymore. When I started out in the field, you could pick any model you liked as long as it was Linear Regression. Then during the halcyon days of ML, data scientists toiled day and night in their garages to come up with the hottest low rider model to take to the local diner and show off. But as it turns out, accuracy, run time in production, and ROI were never factors for AI supremacy, and in the companies that are actually using AI/ML to improve their business and their customers' experiences, model hawtness has given way to value-based AI.
What is Value-Based AI?
I don't know, I literally just made it up. It's the combination of the word "value", which was easily the most mentioned term during the survey, and "based", because "measure" was like the second most mentioned term, but "value-measured" just doesn't have the same ring to it.
To really express it though, I'm going to steal a response from one of our interviews:
"10% of real production AI is model selection, but depending on the question you're trying to answer and the makeup of your data, that might already be decided for you. The other 90% is care and feeding, and that's just good old engineering, with a focus on an AI feature."
It became a recurring theme: when asked to break down the time spent in the key phases of a data science project, respondents overwhelmingly prioritized engineering and AI ops over model selection.
Conclusion
Look, I don't really have one, mainly because we're still going through all the responses and compiling our paper. But it is definitely food for thought: as more organizations move past the hype phase of AI and on to understanding how to adopt AI as part of their business, model selection will converge into a set of widely agreed-on primitives linked to illustrative data models, much like what happened in Computer Science with Data Structures and Algorithms.
Or not, who can say for certain? But if we know one thing, the AI community loves a good schism.
Data Engineering | Data Science | 1x with good habits
5 years ago: "Depending on the question you're trying to answer and the makeup of your data, that might already be decided for you." This is a hard pill to swallow for many, but after a while you learn that if you spend the time to craft and pick the right question, even white-box analytical models work. Turns out doing business is all about opportunity cost, and picking a model is rarely the option that provides the best value.