Having an Impact in Data Science (Part 1)

One of the biggest factors that leads to burnout for data scientists is the feeling that their analyses are not making a meaningful impact on the organization. One of the most disheartening experiences a data scientist can have is doing an analysis to the best of your ability, only for it to return an answer that runs counter to what the key stakeholders expect or want. You then get into a cycle where the stakeholders tell you to adjust your analysis until it produces the answer they want to hear. At the end of the day, they either ignore the analysis because it still doesn't say what they want, or, just as bad, you've adjusted it with so many fudge factors that it no longer makes sense, and if you're a person of principle, that doesn't sit well with you.

The unfortunate reality is that in many situations where key stakeholders call upon data science, they're looking for "cover" for their decisions, a safety blanket that tells them all is well, or the illusion that they have a quantitative, data-driven understanding of a problem. Any project that is about "generating metrics" or creating dashboards generally falls into this category.

An example of a past project that fell into this trap was a Risk-Based Monitoring platform that my group had worked on. We started by doing user interviews with people we considered domain experts, asking them what "risk" factors existed at a clinical trial site that would lead to poor trial execution. We ended up with a large number of risk factors, aggregated them into a score, and built various ways of drilling down through the data and displaying the risk scores and their components across a clinical trial.
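
To make that aggregation concrete, here is a minimal sketch of the kind of scoring we did. The factor names, weights, and values are hypothetical stand-ins for illustration, not the actual metrics from the platform.

```python
import pandas as pd

# Hypothetical risk factors per clinical trial site; the real platform
# used many more factors elicited from domain-expert interviews.
sites = pd.DataFrame(
    {
        "site_id": ["S001", "S002", "S003"],
        "query_rate": [0.12, 0.45, 0.08],     # data queries per CRF page
        "enrollment_lag_days": [4, 21, 2],    # days behind enrollment plan
        "protocol_deviations": [1, 7, 0],     # deviations reported to date
    }
).set_index("site_id")

# Normalize each factor to [0, 1] so no single scale dominates,
# then combine with (hypothetical) expert-assigned weights.
normalized = (sites - sites.min()) / (sites.max() - sites.min())
weights = {"query_rate": 0.5, "enrollment_lag_days": 0.3, "protocol_deviations": 0.2}
sites["risk_score"] = sum(normalized[col] * w for col, w in weights.items())

# Rank sites from highest to lowest aggregate risk.
print(sites.sort_values("risk_score", ascending=False))
```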

Our platform was fast, it was architecturally elegant, and it was beautiful. It has nevertheless been EOL'd, primarily because we developed a product that was more about providing a security blanket than about driving an appropriate action. Our thinking was that high-risk sites were candidates for more site monitoring, and that the action our tool and analysis were supposed to drive was dispatching a site monitor to a given clinical trial site. The primary problem with external adoption was that clients didn't necessarily trust our metrics, or had orthogonal ones that we had not included, so at the end of the day, despite having a risk score that prioritized sites for monitoring visits, we weren't able to convince external stakeholders to reduce site monitoring.

In hindsight, part of the failure of the tool is that we never answered the fundamental question: "The risk of what?" A clinical trial site could have been high risk because of fraud, timeliness of record keeping, or patient retention, and each of those drives a different action. We never communicated the workflow, the concrete actions that CRAs (Clinical Research Associates) were required to take when a specific risk metric, or the overall risk score, was high. For instance, had we communicated to clients that metric X was indicative of a higher probability of inaccurate record keeping by an investigator, and that above a given threshold a CRA would be required to do 100% Source Data Verification, we would have gotten out of the interminable discussions about whose metric was more appropriate and toward agreement that the metrics were sufficient and would trigger an appropriate action.
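
Had we built that mapping, it might have looked something like the sketch below: each component metric is tied to the specific risk it signals and the action it triggers. The metric names, thresholds, and actions here are illustrative assumptions, not the platform's real rules.

```python
from dataclasses import dataclass

# Illustrative only: metric names, thresholds, and actions are hypothetical
# stand-ins for the mapping we never made explicit in the real platform.
@dataclass
class RiskRule:
    metric: str       # component risk metric, not the aggregate score
    threshold: float  # level above which the rule fires
    risk: str         # the "risk of what?" the metric is indicative of
    action: str       # the concrete step the CRA is required to take

RULES = [
    RiskRule("record_entry_lag_days", 14.0,
             "inaccurate record keeping", "perform 100% Source Data Verification"),
    RiskRule("dropout_rate", 0.25,
             "poor patient retention", "schedule a retention-focused site visit"),
]

def actions_for_site(metrics: dict[str, float]) -> list[str]:
    """Return the required CRA actions for a site's current metric values."""
    return [
        f"{rule.risk}: {rule.action}"
        for rule in RULES
        if metrics.get(rule.metric, 0.0) > rule.threshold
    ]

print(actions_for_site({"record_entry_lag_days": 20.0, "dropout_rate": 0.1}))
# -> ['inaccurate record keeping: perform 100% Source Data Verification']
```

The point of the design is that the conversation with a client shifts from "is your metric the right one?" to "do we agree this action is the right response when this risk fires?", which is a much easier discussion to close.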

To get out of this cycle, the most important question to ask is how actions are going to change based on the results. First, establish whether the actions taken today, absent any analytics or tools, are actually problematic, and whether there is any appetite to change what is currently done. Second, establish whether the stakeholder is amenable to seeing a negative result. If there is initial resistance, your primary job as a data scientist is either to get the stakeholder to agree that a set of actions in the case of a negative result is feasible and desirable, or to communicate that this is not an analytical question that needs to be solved. Generally, the thing that has kept me from burning out is that I'm reasonably successful at getting stakeholders to accept less-than-stellar outcomes, and at bootstrapping those negative results into something that actually does move the needle. (I'll admit that I'm not 100% successful, just successful enough times on big enough projects that I still view my career as a net positive and not an interminable slog.) Finally, the last step is to communicate the results in such a way that the action to be taken is clear. To do this, I encourage all of you to spend more time thinking about what your results mean instead of spending that time trying your nth model.

Lixia Yao

real-world data and real-world evidence (#RWD, #RWE), health outcome research, Natural Language Processing (NLP) and Artificial Intelligence in Biomedicine

1y

Thank you for sharing the insight! Many of the cases you listed in this well-drafted article echo my own experience. Looking forward to reading Part II.

马靖龙

I use the Python programming language, Bayesian statistical methods, and deep learning as my three main tools to solve biological problems.

1y

> the most important question that has to be asked is how actions are going to change based on results

This is a bingo question that absolutely needs to be hit!
