Individualism: salami slicing
The Virtuous Data Scientist, chapter 5.2

Corporate research can resemble grant-funded academic research:

  • A researcher or team writes a project proposal, describing the problem, the data, the method, the expected outcome, and the timeline.
  • Stakeholders (managers, coworkers) peer-review the proposal.
  • The revised proposal is usually accepted, a step that can take anywhere from days to years (gulp!).
  • The team works on the project, periodically updating the stakeholders.
  • The team organizes a wrap-up meeting and posts code, documentation, and other outcomes in accessible repositories.

Variants of this procedure include two extremes:

  • Stakeholders and team members skip most of the write-up in favor of oral communication; often, the manager tells the team what to do, and they get it done.
  • The team is required to write a project brief, a proposal, a comprehensively documented analysis (a.k.a. technical report), self-describing (read: highly verbose) slides, and a wiki page.

The following paragraphs comment on these two extremes.

Because research has so many nuances, it is vital to keep it documented. What was the need for the project? What were the assumptions? Which trade-offs were considered? If addressed in writing, these and many other questions can be revisited in the future, so that other researchers understand the decision-making process and explore alternative paths.

Messy documentation is a pain for everyone: the authors, the reviewers, and future users. To avoid it, I suggest we use a familiar wheel: the classic format of a scientific paper.

  1. Abstract = Brief. Together with your manager, write 150-250 words outlining the why, what, who, when, and how associated with your new project. Include key performance indicators.
  2. Introduction + Methods = Proposal. In the first days of the project, thoroughly familiarize yourself with the business problem and why it requires R&D efforts; write those in the Introduction. In the Methods, state the resources (personnel, data, etc.) required to execute the project, the timeline, and the expected outcome of each sprint.
  3. Results = Documented analysis. Once the project is underway, start updating the Results section, every sprint. Don't wait until everything is done.
  4. Discussion = Final presentation. Summarize the previous sections and focus on the project’s findings and implications. What future work do you recommend? A PowerPoint is fine, but I prefer a file format that can be concatenated with the other sections of the report, such as a PDF.
  5. Conclusion = List of links to data and code repositories. As your project progresses, fill out this section. Make it exceedingly easy to replicate your analysis and access your deliverables.

Placing all sections together, in a wiki, optimizes discoverability and reproducibility. Stakeholders can easily check for progress, prepare for the final meeting, and peruse deliverables.
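As a minimal sketch, the five-part mapping above could be turned into a page skeleton that the team fills in as the project progresses. Everything in this example is hypothetical: the `Section` class, the `render_skeleton` function, and the plain-text layout are illustrative assumptions, not part of any particular wiki tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """One section of the report (all names here are illustrative)."""
    paper_part: str   # section name in a classic scientific paper
    deliverable: str  # corporate deliverable it corresponds to
    prompt: str       # reminder of what belongs in the section


# The mapping follows the numbered list above.
SECTIONS = [
    Section("Abstract", "Brief",
            "150-250 words on the why, what, who, when, and how; include KPIs."),
    Section("Introduction + Methods", "Proposal",
            "Business problem, need for R&D, resources, timeline, sprint outcomes."),
    Section("Results", "Documented analysis",
            "Update every sprint; don't wait until everything is done."),
    Section("Discussion", "Final presentation",
            "Findings, implications, and recommended future work."),
    Section("Conclusion", "Links to data and code repositories",
            "Fill out as the project progresses."),
]


def render_skeleton(project_title: str) -> str:
    """Return a plain-text page skeleton with one entry per section."""
    lines = [project_title, "=" * len(project_title), ""]
    for number, section in enumerate(SECTIONS, start=1):
        lines.append(f"{number}. {section.paper_part} = {section.deliverable}")
        lines.append(f"   TODO: {section.prompt}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_skeleton("Example project"))
```

Starting every project from the same skeleton keeps the sections in one place and makes gaps visible: an empty Results entry after several sprints is hard to miss.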

Conclusion

This proposed strategy of publishing work in five separate installments instead of one may resemble scientific salami slicing. Why not delay the release until the end? The answer is that corporate research requires strong alignment with stakeholders, and incremental releases let them check progress along the way.

Now I have some questions for you:

How much time do you devote to understanding the context of your new project and talking to stakeholders before you dig into the data? Hypothetically, if you got hit by a bus today, how long would it take for someone else to follow your project's paper trail and pick up where you left off? What other documentation protocols would you recommend?
