Having an Impact in Data Science Part II

In Part I we focused on asking how actions would change based on the results of an analysis, prioritizing problems that drove actions, and finally communicating clearly which actions can be taken based on the result. The next level of impact is finding a problem where you can do all of this, but which can also be generalized, i.e. the question needs to be asked over and over again. One of the things that I ask of my data scientists after they close out a project is to reflect on the differences and similarities between their latest analysis and their prior ones, or ones that others in the group have done. The reason for this is to inculcate product-centric thinking in my data scientists and analysts. Because they are the ones working with the data, brainstorming the effects that the analysis can have, interacting with the stakeholders, and dealing with pain points in the infrastructure and the data, I believe that they are also the ones best placed to think of new products and applications that will be useful more generally. One of the reasons that tech companies are able to grow so fast is that the cost to replicate software is essentially zero, and having a data scientist able to codify and generalize their work into a product means that I can scale their expertise and impact without having to burn their time and effort every time the question comes up.

A product-centric mindset means that you're constantly thinking about turning your current analysis into a product. This means trying to consider all of the different variations that someone might ask of your analysis before anyone actually asks, considering how the analysis plugs into a workflow or automated process, and finally working out how to hand things off to an engineering team without someone wishing bodily harm upon you. (As someone who has inherited a 30k-line piece of code crammed into a single function, I can confidently say that even mild-mannered engineers can be brought to a point where they will wish holy retribution upon you.)

In the best case, you as the data scientist have created a new software product/package that can be sold or used internally. In the worst case, you've created a framework that lets you do subsequent analyses much faster than your first one, because you have already considered the variations and developed a system that can be driven primarily via configuration rather than by writing new code. Most importantly, by considering how the analysis will live within a product, you've also thought of ways to steer the conversation so that it aligns with what your code can do: rather than having an aspirational conversation about what a client or stakeholder wants the analytics to do, you can articulate a clear message and limit scope creep.
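As a sketch of what "primarily via configuration rather than new code" might look like in practice (the config fields and function names here are hypothetical, not from any particular framework):

```python
from dataclasses import dataclass, field

@dataclass
class AnalysisConfig:
    """Hypothetical config object: a new request changes fields, not code."""
    table: str
    metric: str
    group_by: list = field(default_factory=list)

def run_analysis(cfg: AnalysisConfig, rows: list) -> dict:
    """Generic group-and-sum over `metric`, a stand-in for a reusable pipeline."""
    out = {}
    for row in rows:
        key = tuple(row[g] for g in cfg.group_by)
        out[key] = out.get(key, 0.0) + row[cfg.metric]
    return out

# A follow-up question becomes a new config, not a rewrite:
rows = [{"region": "EU", "revenue": 10.0},
        {"region": "US", "revenue": 5.0},
        {"region": "EU", "revenue": 2.5}]
cfg = AnalysisConfig(table="sales", metric="revenue", group_by=["region"])
print(run_analysis(cfg, rows))  # {('EU',): 12.5, ('US',): 5.0}
```

The point is not this particular aggregation, but that the question-specific parts live in data (the config), so the fifth variation of the question costs minutes instead of days.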

In either case, you've expanded your ability to answer more questions and deliver more value than if you approached each analysis naively, without trying to leverage what you did in the past. The focus is to decrease the amount of time it takes for you to answer a specific question, so you have time to work on new and more interesting problems. From a purely commercial/OKR view, you can use the simple relationship:

[ Money Saved per Analysis ] × [ # Analyses Done ] − [ Your Cost per Hour ] × [ Hours Spent ]
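The relationship is simple enough to compute directly. A minimal illustration, with entirely made-up numbers, of how reuse shifts the balance: a generalized analysis costs more hours up front but amortizes them over every repeat of the question.

```python
def analysis_roi(savings_per_analysis: float, n_analyses: int,
                 cost_per_hour: float, hours_spent: float) -> float:
    """Net value of an analysis, per the relationship above."""
    return savings_per_analysis * n_analyses - cost_per_hour * hours_spent

# Hypothetical numbers: a one-off vs. the same work generalized and reused.
one_off = analysis_roi(10_000, n_analyses=1, cost_per_hour=150, hours_spent=40)
reused = analysis_roi(10_000, n_analyses=8, cost_per_hour=150, hours_spent=80)
print(one_off, reused)  # 4000.0 vs 68000.0
```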

The two primary skills that you must develop in order to be product-focused are the following:

  • The ability to be mindfully lazy
  • The ability to ship code in a way that others can use

With respect to the first point, I love the following, possibly apocryphal, quote from General Kurt von Hammerstein:

I divide my officers into four classes as follows: the clever, the industrious, the lazy, and the stupid. Each officer always possesses two of these qualities. Those who are clever and industrious I appoint to the General Staff. Use can under certain circumstances be made of those who are stupid and lazy. The man who is clever and lazy qualifies for the highest leadership posts. He has the requisite nerves and the mental clarity for difficult decisions. But whoever is stupid and industrious must be got rid of, for he is too dangerous.

For data scientists, I try to convince people to be mindfully lazy. I want them to spend a few days considering whether they or their colleagues have previously done a similar analysis that they can borrow and repurpose. All too often, data scientists will immediately open their favorite development environment and dive into exploratory data analysis (EDA) without considering that EDA steps like connecting to data, calculating data fill rates, and computing correlation matrices are used over and over again and should probably be "borrowed" from someone or somewhere else; or, if it's the first time you're working with a specific modality, automated on the off chance that you might be asked to do it one more time. Part of this is convincing someone that their time and mental bandwidth are valuable, and that I'd rather they think of something new than grind through the Nth variation of an analysis they've done previously, but also that a few minutes of thought will save hours of coding and meetings to go over results. Finally and most importantly, having a set process that you can kick off by calling one or two functions, rather than re-writing it from scratch every time, means that if you ever catch an error (and that will happen), you can correct it once in the shared code base and stand a good chance that it's not going to bite you in the butt in the future. Having everything in code means, again, that you don't need to waste mental bandwidth thinking about it in the future.
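Those recurring EDA steps really can be reduced to one or two calls. A sketch of what such borrowed helpers might look like, assuming pandas (the function names are mine, not from any existing package):

```python
import pandas as pd

def fill_rates(df: pd.DataFrame) -> pd.Series:
    """Fraction of non-null values per column: the same first EDA step, every time."""
    return df.notna().mean()

def numeric_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise correlation matrix over the numeric columns only."""
    return df.select_dtypes("number").corr()

df = pd.DataFrame({"x": [1.0, 2.0, None, 4.0],
                   "y": [2.0, 4.0, 6.0, 8.0]})
print(fill_rates(df))            # x: 0.75, y: 1.00
print(numeric_correlations(df))  # x and y are perfectly correlated here
```

Two calls replace the boilerplate cell block that otherwise gets retyped at the top of every new notebook, and a bug fixed in `fill_rates` is fixed for every analysis that imports it.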

This mindset is related to product-centric thinking because rather than thinking about a single problem, you are now thinking about how to solve a class of problems; and once you are solving a class of problems, it becomes a small leap to think about how to package it all together into a real product.

However, to take advantage of this, there has to be a focus on writing good code. This means doing a good job separating functionality, writing functions that generalize to a wider range of problems, and knowing how to extend existing code without copying and pasting. There is also the side benefit that if you write good code, you can hand it to someone else in a state where they can understand and extend it for other purposes, and the original work you did for one thing is now percolating through the rest of your organization and field. This also makes it easier to pass your work over to the engineering team; frankly, if the engineers look over your work and see clear organization and good coding practices, they are more likely to advocate for it and help out with things that you don't yet know how to do. And there is nothing better at convincing a product manager/owner than showing them a mostly functioning demo.
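One concrete habit behind "generalizable functions without copy and pasting" is pulling the question-specific parts out as parameters. A small illustrative sketch (names and data are hypothetical): instead of near-duplicate loops for means and medians, the aggregation itself is an argument.

```python
from statistics import mean, median

def summarize_by_group(rows: list, key: str, value: str, agg=mean) -> dict:
    """Generic group-then-aggregate: the question-specific parts
    (grouping key, value column, aggregation) are arguments,
    so a new question does not mean a new copy of the loop."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {k: agg(v) for k, v in groups.items()}

rows = [{"team": "a", "latency": 10},
        {"team": "a", "latency": 30},
        {"team": "b", "latency": 20}]
print(summarize_by_group(rows, "team", "latency"))              # means per team
print(summarize_by_group(rows, "team", "latency", agg=median))  # medians, no new loop
```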

It might feel nice to be the go-to person every time your organization needs a specific task done, and in certain economic environments (like late 2023-2024) there might be a desire to become indispensable in that way. However, I would argue that there is another path to becoming indispensable, and that's by being the person they call every time there is a new problem to be solved; to do that, you can't constantly be supporting old work.

Part of the draw of being a data scientist, for me, is having the time to piece together all of the different questions and problems that have been posed to me, finding connections between the different topics, and having time to explore the data set more fully to see whether there is something in there that someone has missed. The only way to do so is to make sure that I am not on the critical path for some analysis I've done in the past that is currently being used. I've either been able to create a tool that can run without me, or I've been able to get my analyses to a point, and the code clean enough, that the analysis I pieced together in the past can continue being used by others without me.
