Why We Code
"Code-first data science makes no sense." I guess that's true if you spend all day listening to vendor hype. If you talk to real data scientists, you know there are reasons they work with code. Here are some of them:
Speed. It takes seconds to write a line of code. “No-code” tools are slow. Click on the menu, the sub-menu, and the sub-sub-menu. Drag an operator to the canvas. Right-click and configure the operator.
Working with “no-code” tools is like living in hell when you know how to code.
Functionality. Code is richer than “no-code” tools, often 5x to 10x richer. Compare any drag-and-drop UI with a code-based API for the same software. Software developers expose new features in APIs before building them into a “no-code” interface. Many features will never make it into the “no-code” version. That’s because there’s a limit to the number of operators you can stuff into a graphical UI.
Flexibility. Since “no-code” tools support fewer features, developers focus on ones they think are essential. That’s great if you do simple analysis; otherwise, you’re out of luck. Code is infinitely flexible. That’s why every commercial “no-code” tool for data science includes a “code node” capability where the user can insert code.
I’ve seen work done in a “no-code” tool where every single node in the workflow is a code node.
Transparency. Code is what it is, and it’s open for inspection. That’s not always true for “no-code” tools. Data scientists are accountable for the accuracy of the work they do. When the analysis is wrong, you can’t blame the tool. The processing pipeline is completely visible when you work with code, from data to insight.
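To make the transparency point concrete, here is a minimal sketch in Python. The data and the function names (`drop_missing`, `center`) are made up for illustration; the point is only that every step between the raw data and the result is written down where a reviewer can read it.

```python
from statistics import mean

# Hypothetical raw measurements; None marks a missing value.
raw = [12.0, None, 15.5, 14.0, None, 13.5]


def drop_missing(values):
    """Step 1: remove missing observations."""
    return [v for v in values if v is not None]


def center(values):
    """Step 2: subtract the mean so the values average to zero."""
    m = mean(values)
    return [v - m for v in values]


# Each intermediate result can be inspected, logged, or checked.
cleaned = drop_missing(raw)
centered = center(cleaned)
print(len(cleaned), centered)
```

If the analysis turns out to be wrong, anyone can read these two functions and see exactly what was done to the data.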
Efficiency. Nobody codes a project from scratch. Data science teams curate and share reusable code components. You can tweak and tune code to improve runtime performance and minimize the impact on computing infrastructure. That’s not possible with “no-code” tools.
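As a rough illustration of a reusable component, consider a small helper that a team might keep in a shared module. The name `clip_outliers` and the sample values are hypothetical; this is a sketch of the idea, not a recommendation for how to treat outliers.

```python
def clip_outliers(values, lower, upper):
    """Clamp each value into the range [lower, upper]."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return [min(max(v, lower), upper) for v in values]


# The same helper serves two unrelated projects with different limits.
ages = clip_outliers([-3, 25, 41, 130], lower=0, upper=100)
incomes = clip_outliers([1_000, 2_500_000], lower=0, upper=1_000_000)
print(ages, incomes)
```

Written once, the helper can be imported wherever it is needed, and a fix or a speedup to it benefits every project that uses it.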
Working with code does not rule out working with innovations like AutoML. Every leading AutoML tool supports a code-based API. The best AutoML tools are extensible; expert data scientists can add code to the algorithm. Many AutoML tools deliver pipelines as code packages that experts can review, modify and tune.
Coding skills are not scarce. The pool of people with Python skills greatly exceeds the pool of experienced data scientists. Of course, knowledge of Python does not make one a data scientist. Data scientists require knowledge and skills that are well beyond programming. But that is precisely the point – coding skill is not the critical bottleneck limiting the supply of data scientists.
"Code-first" data science is fast, functional, flexible, transparent, and efficient. Data scientists working with code can use the most advanced innovations in machine learning and artificial intelligence. Every prospective data scientist already knows at least one programming language; many are multi-lingual.
“No-code” tools are fine for some tasks. Graphics, for example. Dashboards. Data scientists often use these tools together with code-based tools when necessary. “No-code” tools are also great for simple analysis. Many managers and analysts prefer “no-code” tools, and that’s fine.
But don’t confuse managers and analysts with data scientists. If I can screw in a lightbulb, that does not make me a “citizen electrician.” It makes me a homeowner with a lightbulb.
Statistician. Computer scientist. Data scientist.
2 years ago: Good article, and I know I am late to the party. When arguing for code and against no-code at organisations I have worked at, there are two other salient points. 1. Code is easier to test and put into production than no-code. It's understood by engineers and the technology arms of the organisation. 2. No-code tools lock you into vendors completely. How long will the vendor be around and remain relevant?
Data Modeling Aficionado and Senior Technical Consultant at virtual7 GmbH
2 years ago: I get what you’re trying to say but I would argue there are quite a few people that you don’t necessarily want to write code because they will write lots of ugly, redundant, hard-to-maintain code that someone (and you might very well be that someone) has to sort out and clean up later.
Data Scientist | Views expressed on LinkedIn are solely my own.
2 years ago: They both make sense, which is why SAS has been offering both code-based and point-and-click stats software for as long as I can remember, and I've watched heavy users of either (but almost never both) work with them daily in the biological sciences. For me, the code-v-no-code innovation around "no-code" ML (as opposed to point-and-click stats) packages is that they integrate with code-first ML packages. Coding gives you freedom and no-coding gives you abstraction. Having access to both at once is fantastic.
C-Suite Executive | Digital & Data Transformation Leader | Driving Revenue Growth & Innovation | AI & GenAI Expertise | Ex-Microsoft
2 years ago: Does anyone find trend 1 concerning? Having someone who doesn’t understand how to interpret the model outputs create and deploy a model can be dangerous. (And yes, this also is a few years behind!)
GTM AI / Growth Driver / Trusted B2B Advisor / Operator / Perennial $1mm Quota Achiever
2 years ago: Only someone who doesn't know how to code would write that coding makes no sense.