Beware the data science pin factory:
The power of the full-stack data science generalist and the perils of division of labor through function.

[This article is cross-posted from multithreaded and is a slightly modified version of the article I wrote for HBR, originally published here.]

In The Wealth of Nations, Adam Smith demonstrates how the division of labor is the chief source of productivity gains, using the vivid example of a pin factory assembly line: "One person[1] draws out the wire, another straights it, a third cuts it, a fourth points it, a fifth grinds it." With specialization oriented around function, each worker becomes highly skilled in a narrow task, leading to process efficiencies. Output per worker increases manyfold; the factory becomes extremely efficient at producing pins.

This division of labor by function is so ingrained in us even today that we are quick to organize our teams accordingly. Data science is no exception. An end-to-end algorithmic business capability requires many data functions, and companies usually create teams of specialists: research scientists, data engineers, machine learning engineers, causal inference scientists, and so on. Specialists' work is coordinated by a product manager, with hand-offs between the functions in a manner resembling the pin factory: "one person sources the data, another models it, a third implements it, a fourth measures it," and on and on.

Alas, we should not be optimizing our data science teams for productivity gains; that is what you do when you know what it is you're producing—pins or otherwise—and are merely seeking incremental efficiencies. The goal of assembly lines is execution. We know exactly what we want (pins in Smith's example, but it could be any product or service in which the requirements fully describe all aspects of the product and its behavior), and the role of the workers is then to execute on those requirements as efficiently as possible.

But the goal of data science is not to execute. Rather, the goal is to learn and develop profound new business capabilities. Algorithmic products and services like recommendation systems, client engagement bandits, style preference classification, size matching, fashion design systems, logistics optimizers, seasonal trend detection, and more can't be designed up-front. They need to be learned. There are no blueprints to follow; these are novel capabilities with inherent uncertainty. Coefficients, models, model types, hyperparameters, all the elements you'll need must be learned through experimentation, trial and error, and iteration. With pins, the learning and design are done up-front, before you produce them. With data science, you learn as you go, not before you go.
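
To make "learn as you go" concrete, here is a minimal sketch of that iteration loop, assuming a scikit-learn setup with synthetic data standing in for a real problem (none of this is from the original article); the point is that the winning model is discovered by measurement, not specified up-front.

```python
# Illustrative sketch: compare candidate model families by
# cross-validation and let the data, not a blueprint, pick the winner.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real business dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Each run is an experiment; its results inform the next iteration.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```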

In the pin factory, where the learning and design are done up-front, we do not expect, nor do we want, the workers to improvise on any aspect of the product, except to produce it more efficiently. Organizing by function makes sense, since task specialization leads to process efficiencies and production consistency (no variations in the end product).

But when the product is still evolving and the goal is to learn, specialization hinders our goals in several ways:

1. It increases coordination costs. Those are the costs that accrue in time spent communicating, discussing, justifying, and prioritizing the work to be done. These costs scale super-linearly with the number of people involved[2]. When data scientists are organized by function, the many specialists needed at each step, each change, and each handoff make coordination costs high. For example, a data science specialist focused on statistical modeling will have to coordinate with a data engineer any time a dataset needs to be augmented in order to experiment with new features. Similarly, any time new models are trained, the research scientist will have to coordinate with a machine learning engineer to deploy them to production. These coordination costs act as a tax on iteration and can hamper learning.

2. It exacerbates wait-time. Even more nefarious than coordination costs is the time that elapses between work. While coordination costs can typically be measured in hours—the time it takes to hold meetings, discussions, design reviews—wait-times are commonly measured in days, weeks, or even months! Schedules of functional specialists are difficult to align, as each specialist is allocated to several initiatives. A one-hour meeting to discuss changes may take weeks to line up. And, once aligned on the changes, the actual work itself also needs to be scheduled in the context of multiple other projects vying for specialists' time. Work like code changes or research that requires just a few hours or days to complete may still sit undone much longer before the resources become available. Until then, iteration and learning languish.

3. It narrows context. Division of labor can artificially limit learning by rewarding people for staying in their lane. For example, the research scientist who is relegated to stay within her function will focus her energy on experimenting with different types of algorithms: gradient boosting, neural nets, random forests, and so on. To be sure, good algorithm choices could lead to incremental improvements. But there is usually far more to gain from other activities, like integrating new data sources. Similarly, she may develop a model that exhausts every bit of explanatory power inherent to the data. Yet her biggest opportunity may lie in changing the objective function or relaxing certain constraints. This is hard to see or do when her job function is limited. Since the research scientist is specialized in optimizing algorithms, she's far less likely to pursue anything else, even when it carries outsized benefits.

Telling symptoms can surface when data science teams are run like pin factories, for example in simple status updates: "waiting on ETL changes" and "waiting on ML Eng resources" are common blockers. However, I believe the more insidious impact lies in what you don't hear, because you can't lament what you haven't yet learned. Perfect execution on requirements, and the complacency brought on by achieving process efficiencies, can mask a difficult truth: the organization is blissfully unaware of the valuable learning it is missing out on.

The solution to this problem is, of course, to get rid of the pin factory. In order to encourage learning and iteration, data science roles need to be made more general, with broad responsibilities agnostic to technical function. That is, organize the data scientists such that they are optimized to learn. This means hiring "full-stack data scientists"—generalists—who can perform diverse functions: from conception to modeling to implementation to measurement. With fewer[3] people to keep in the loop, coordination costs plummet. The generalist moves fluidly between functions, extending the data pipeline to add more data, trying new features in the model, deploying new versions to production for causal measurement, and repeating the steps as quickly as new ideas come to her. Of course, the generalist performs the different functions sequentially rather than in parallel—she is just one person, after all. However, doing the work typically takes just a fraction of the wait-time it would take for another specialist resource to become available. So, iteration time goes down.

Our generalist may not be as adept as a specialist in any one function. But we are not seeking functional excellence or small incremental improvements. Rather, we seek to learn and discover all-new business capabilities with step-change impact. With full context for the holistic solution, she sees opportunities that a narrow specialist won't. She has more ideas and tries more things. She fails more, too. However, the cost of failure is low and the benefits of learning are high. This asymmetry favors rapid iteration and rewards learning.

It is important to note that this amount of autonomy and diversity in skill granted to the full-stack data scientists depends greatly on the assumption of a solid data platform on which to work. A well-constructed data platform abstracts the data scientists away from the complexities of containerization, distributed processing, automatic failover, and other advanced computer science concepts. In addition to abstraction, a robust data platform can provide seamless hooks into an experimentation infrastructure, automate monitoring and alerting, provide auto-scaling, and enable visualization of debugging output and algorithmic results. These components are designed and built by data platform engineers, but to be clear, there is not a hand-off from the data scientist to a data platform team. It's the data scientist who is responsible for all the code that is deployed to run on top of the platform. And, for the love of everything sacred and holy in the profession, don't hand off ETL to engineers to write.
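
As a sketch of what such an abstraction might feel like to the data scientist, consider the toy code below. The names and API shape are entirely hypothetical (the article does not prescribe any particular interface); a real platform would put containers, scheduling, failover, and monitoring behind something roughly like this.

```python
# Toy sketch of a hypothetical data platform abstraction. A real
# platform would hide containerization, scheduling, failover, and
# monitoring behind an interface shaped roughly like this one.
import functools
from typing import Any, Callable

def platform_job(schedule: str, retries: int = 2) -> Callable:
    """Hypothetical stand-in for a platform team's job API."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # A real platform would also emit metrics and alerts here.
            for attempt in range(retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == retries:
                        raise
        wrapper.schedule = schedule  # read by a (hypothetical) scheduler
        return wrapper
    return decorator

@platform_job(schedule="daily")
def score_client(recency: float, frequency: float) -> float:
    # The data scientist owns this logic end-to-end; the platform
    # only decides how and where it runs.
    return 0.7 * recency + 0.3 * frequency

print(score_client(recency=0.9, frequency=0.4))  # 0.75
```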

I too was once lured to a function-based division of labor by the attraction of process efficiencies. But, through trial and error (is there no better way to learn?), I've found that more generalized roles better facilitate learning and innovation[4] and provide the right kind of scaling: to discover and build many more business capabilities than a specialist approach can. And, while there are some important considerations[5] that may make this approach to organization more or less tenable in some companies (see footnote), I believe the full-stack data scientist model provides a better starting place. Start with generalists, and then consciously (grudgingly) move toward a function-based division of labor only when clearly necessary.

Final thought.

There is a further downside to functional specialization. It can lead to loss of accountability and passion from the workers. Smith himself criticizes the division of labor, suggesting that it leads to the dulling of talent—that workers become ignorant and insular as their roles are confined to a few repetitive tasks.[6] While specialization may provide process efficiencies, it is less likely to inspire workers.

By contrast, generalist roles provide all the things that drive job satisfaction: autonomy, mastery, and purpose.[7] Autonomy in that they are not dependent on someone else for success. Mastery in that they know the business capability from end to end. And purpose in that they have a direct connection to the impact they're making on the business. If we succeed in getting people to be passionate about their work and to make a big impact on the company, then the rest falls into place naturally.

***

Footnotes and References

[1] I took the liberty of modernizing Smith’s use of pronouns.

[2] As J. Richard Hackman taught us, the number of relationships (r) grows as a function of the number of members (n) per this equation: r = (n^2 - n) / 2. And each relationship bears some amount of coordination cost. See: Hackman, J. Richard. Leading Teams: Setting the Stage for Great Performances. Boston, MA: Harvard Business School Press, 2002. Print.
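
To see how quickly those relationships, and the coordination costs they carry, pile up, here is a quick computation from Hackman's equation (team sizes chosen arbitrarily for illustration):

```python
# Hackman's relationship count: r = (n^2 - n) / 2.
def relationships(n: int) -> int:
    return (n * n - n) // 2

for n in [3, 6, 12, 24]:
    print(f"{n:>2} members -> {relationships(n):>3} relationships")
# 3 -> 3, 6 -> 15, 12 -> 66, 24 -> 276: doubling the team size
# roughly quadruples the coordination surface.
```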

[3] It's important to note that I am not suggesting that hiring full-stack data scientists results in fewer people overall. Rather, I am merely suggesting that when organized differently, their incentives are better aligned with learning vs. efficiency gains. Consider the following contrasting department/team structures, each with 3 people. Fractional estimates and summed team sizes are illustrative only.

Specialist Model: organized for functional efficiency. Workers are not dedicated to any one business capability; rather, their time is allocated across many.

Generalist Model: full-stack data scientists optimized for learning. Workers are fully dedicated to a business capability and perform all of the functions.

[4] A more efficient way to learn about this approach to organization than the trial and error I went through is to read Amy C. Edmondson's book, "Teaming: How Organizations Learn, Innovate, and Compete in the Knowledge Economy" (Jossey-Bass, 2014).

[5] This process of iteration assumes a low cost of trial and error. If the cost of error is high, you may want to rethink this approach (e.g., it is not advised for medical or manufacturing applications). In addition, data volume and system availability requirements should also be considered. If you are dealing with petabytes or exabytes of data, specialization in data engineering may be warranted. Similarly, system availability (i.e., uptime) and innovation are a tradeoff; if availability is paramount, functional excellence may trump learning. Finally, the full-stack data science model relies on the assumption of great people. They are not unicorns; they can be found as well as made. But they are in high demand, and it will require certain conditions to attract and retain them (competitive compensation, company values, interesting work, etc.). Be sure your company culture can support this.

[6] Smith, Adam. An Inquiry into the Nature and Causes of the Wealth of Nations. Dublin: Printed for Messrs. Whitestone, 1776. Print. Page 464.

[7] Pink, Daniel H. Drive: The Surprising Truth About What Motivates Us. New York, NY: Riverhead Books, 2009. Print.

