Having an Impact in Data Science (Part 1)

One of the biggest factors that leads to burnout for data scientists is the feeling that their analyses are not making a meaningful impact on the organization. One of the most disheartening experiences a data scientist can have is doing an analysis to the best of your ability, only for it to return an answer that runs counter to what the key stakeholders expect or want. You then get into a cycle where the stakeholders tell you to adjust your analysis until it produces the answer they want to hear. At the end of the day, they either ignore the analysis because it still doesn't say what they want, or, just as bad, you've adjusted it with so many fudge factors that it no longer makes sense, and if you're a person of principle, that doesn't sit well with you.

The unfortunate reality is that in many situations where key stakeholders call upon data science, they're looking for "cover" for their decisions, a safety blanket that tells them all is well, or the illusion that they have a quantitative, data-driven understanding of a problem. Any project that is about "generating metrics" or creating dashboards generally falls into this category.

An example of a past project that fell into this trap was a Risk-Based Monitoring platform that my group had worked on. We started by doing user interviews with people we considered domain experts, asking them what "risk" factors existed at a clinical trial site that would lead to poor trial execution. We ended up with a large number of risk factors, aggregated them into a score, and built various ways of drilling down through the data and displaying the risk scores and their components across a clinical trial.
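
To make that aggregation concrete, here is a minimal sketch of the kind of scoring we did. The factor names, weights, and values are hypothetical stand-ins for illustration, not the actual metrics from the platform.

```python
import pandas as pd

# Hypothetical risk factors per clinical trial site; the real platform
# used many more factors elicited from domain-expert interviews.
sites = pd.DataFrame(
    {
        "site_id": ["S001", "S002", "S003"],
        "query_rate": [0.12, 0.45, 0.08],     # data queries per CRF page
        "enrollment_lag_days": [4, 21, 2],    # days behind enrollment plan
        "protocol_deviations": [1, 7, 0],     # deviations reported to date
    }
).set_index("site_id")

# Normalize each factor to [0, 1] so no single scale dominates,
# then combine with (hypothetical) expert-assigned weights.
normalized = (sites - sites.min()) / (sites.max() - sites.min())
weights = {"query_rate": 0.5, "enrollment_lag_days": 0.3, "protocol_deviations": 0.2}
sites["risk_score"] = sum(normalized[col] * w for col, w in weights.items())

# Rank sites from highest to lowest aggregate risk.
print(sites.sort_values("risk_score", ascending=False))
```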

Our platform was fast, it was architecturally elegant, and it was beautiful. It has nevertheless been EOL'd, primarily because we developed a product that was more about providing a security blanket than about driving an appropriate action. Our thinking was that high-risk sites were candidates for more site monitoring, and that the action our tool and analysis were supposed to drive was dispatching a site monitor to a given clinical trial site. The primary problem with external adoption was that clients didn't necessarily trust our metrics, or had orthogonal ones that we had not included, so at the end of the day, despite having a risk score that prioritized sites for monitoring visits, we weren't able to convince external stakeholders to reduce site monitoring.

In hindsight, part of the failure of the tool is that we never answered the fundamental question: "The risk of what?" A clinical trial site could have been high risk because of fraud, timeliness of record keeping, or patient retention, and each of those drives a different action. We never communicated the workflow, the concrete actions that CRAs (Clinical Research Associates) were required to take when a specific risk metric, or the overall risk score, was high. For instance, had we communicated to clients that metric X was indicative of a higher probability of inaccurate record keeping by an investigator, and that above a given threshold a CRA would be required to do 100% Source Data Verification, we would have gotten out of the interminable discussions about whose metric was more appropriate and toward agreement that the metrics were sufficient and would trigger an appropriate action.
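
Had we built that mapping, it might have looked something like the sketch below: each component metric is tied to the specific risk it signals and the action it triggers. The metric names, thresholds, and actions here are illustrative assumptions, not the platform's real rules.

```python
from dataclasses import dataclass

# Illustrative only: metric names, thresholds, and actions are hypothetical
# stand-ins for the mapping we never made explicit in the real platform.
@dataclass
class RiskRule:
    metric: str       # component risk metric, not the aggregate score
    threshold: float  # level above which the rule fires
    risk: str         # the "risk of what?" the metric is indicative of
    action: str       # the concrete step the CRA is required to take

RULES = [
    RiskRule("record_entry_lag_days", 14.0,
             "inaccurate record keeping", "perform 100% Source Data Verification"),
    RiskRule("dropout_rate", 0.25,
             "poor patient retention", "schedule a retention-focused site visit"),
]

def actions_for_site(metrics: dict[str, float]) -> list[str]:
    """Return the required CRA actions for a site's current metric values."""
    return [
        f"{rule.risk}: {rule.action}"
        for rule in RULES
        if metrics.get(rule.metric, 0.0) > rule.threshold
    ]

print(actions_for_site({"record_entry_lag_days": 20.0, "dropout_rate": 0.1}))
# -> ['inaccurate record keeping: perform 100% Source Data Verification']
```

The point of the design is that the conversation with a client shifts from "is your metric the right one?" to "do we agree this action is the right response when this risk fires?", which is a much easier discussion to close.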

To get out of this cycle, the most important question to ask is how actions are going to change based on the results. First, establish whether the actions taken today, absent any analytics or tools, are actually problematic, and whether there is any appetite to change what is currently done. Second, establish whether the stakeholder is amenable to seeing a negative result. If there is initial resistance, your primary job as a data scientist is either to get the stakeholder to agree that a set of actions in the case of a negative result is feasible and desirable, or to communicate that this is not an analytical question that needs to be solved. Generally, the thing that has kept me from burning out is that I'm reasonably successful at getting stakeholders to accept less-than-stellar outcomes, and at bootstrapping those negative results into something that actually does move the needle. (I'll admit that I'm not 100% successful, just successful enough times on big enough projects that I still view my career as a net positive and not an interminable slog.) Finally, the last step is to communicate the results in such a way that the action to be taken is clear. To do this, I encourage all of you to spend more time thinking about what your results mean instead of spending that time trying your nth model.

Lixia Yao

real-world data and real-world evidence (#RWD, #RWE), health outcome research, Natural Language Processing (NLP) and Artificial Intelligence in Biomedicine

1y

Thank you for sharing the insight! Many of the cases you listed in this well-drafted article echo my own experience. Looking forward to reading Part II.

马靖龙

I use the Python programming language, Bayesian statistical methods, and deep learning as my three main tools to solve biological problems.

1y

> the most important question that has to be asked is how actions are going to change based on results

This is a bingo question that absolutely needs to be hit!
