Patient journey comprehension to aid data harmonization
Introduction
One of Truveta’s core competencies is our ability to harmonize the electronic health record (EHR) data we receive from healthcare systems and make it available to life science researchers, in deidentified form, in a standard schema called the Truveta Data Model (TDM).
The schema explicitly leverages coding of clinical events according to appropriate ontologies, a process we will refer to as “data harmonization”. AI plays a big role in this data harmonization effort, as we discuss below.
Challenges in harmonizing isolated data elements
The standard approach we take to a data harmonization task on a particular data element, such as a term, note, or image, is to look only at the contents of that element. The presumption is that the element itself carries enough signal to allow proper harmonization, and that human annotations of many such elements provide enough supervision for AI to learn the appropriate harmonization function. In practice, however, this approach hits an accuracy ceiling (in both precision and recall) because there is insufficient mutual information between the element and its correct mapping. As an extreme case, how would you normalize the observation value “y”? Does it refer to the yellow color of urine, the answer “yes” to a question about allergies, or something else entirely? In some domains, such as observations or devices, this accuracy ceiling may be too low for our data harmonization needs.
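To make that ceiling concrete, here is a toy Python sketch of why an isolated value like “y” cannot be disambiguated; the candidate concepts and scores are invented purely for illustration.

```python
# Toy sketch: a context-free normalizer can only guess among equally
# plausible concepts for an ambiguous observation value. The candidate
# concepts and scores below are invented for illustration.

CANDIDATES = {
    "y": [
        ("Yellow (urine color)", 0.34),
        ("Yes (allergy questionnaire answer)", 0.33),
        ("Other / unknown", 0.33),
    ],
}

def normalize_without_context(raw_value: str):
    """Map a raw observation value to a concept using only the value itself."""
    candidates = CANDIDATES.get(raw_value.strip().lower(), [])
    # With near-uniform scores, no mapping clears a usable confidence bar:
    # this is the accuracy ceiling imposed by the isolated element.
    return max(candidates, key=lambda c: c[1]) if candidates else None

print(normalize_without_context("y"))  # ('Yellow (urine color)', 0.34), a coin flip
```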
How can we solve this problem? Interestingly, human annotators are not constrained to look only at the specific data element they are annotating; they are free to consult other relevant data elements from the patient’s EHR. Their aim is to provide accurate ground truth, and they see a marked improvement from looking at adjacent or related data elements. This gives us a valuable hint!
Patient journey summarization for data harmonization
Our hypothesis posits that significant additional signal about a data element’s ground truth resides in other parts of the patient journey, and that by exploiting that additional context we can significantly raise the accuracy ceiling mentioned earlier. Moreover, leveraging the contextual information could allow us to reach our target accuracy with less human supervision during training.
[Figure: the patient journey viewed as a time series of clinical events]
We could take an incremental approach to proving and exploiting this hypothesis, say by starting with a specific harmonization task, such as mapping a concept from a diagnosis event, and looking at some handpicked ancillary data elements: other notes, images, or EHR events occurring in its temporal vicinity. We could then estimate the accuracy lift available from the supplementary context, which would help us refine our hypothesis and perhaps bring in even more context or prune some of what we chose. Even if successful, however, this process would likely be laborious and would need to be repeated for every harmonization task, and hence may not scale.
Instead, what if we took a more generalized and automated approach, leveraging AI itself to produce this additional context? Specifically, let us view the entire patient journey as a time series of clinical events, as shown in the figure above. For a specific data harmonization task, such as normalizing a diagnosis and related observation events, we want to extract the relevant information from the entire patient journey while filtering out the noise. This could be done by a summarization agent based on a large language model (LLM). The prompt of that summarization agent would naturally be tuned to the data harmonization task at hand. More importantly, it would also be tuned to the specific data element instance being harmonized.
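A minimal sketch of what such an agent might look like is below. Everything in it is an assumption for illustration, not our actual implementation: the Event record, the prompt wording, and the injected call_llm wrapper standing in for whatever LLM is available.

```python
# Minimal sketch of a journey-summarization agent. Hypothetical pieces:
# the Event type, the prompt wording, and call_llm(prompt) -> str, an
# injected function wrapping whatever LLM is available.

from dataclasses import dataclass

@dataclass
class Event:
    timestamp: str  # ISO date, e.g. "2023-01-15"
    kind: str       # e.g. "diagnosis", "observation", "note"
    content: str    # raw text of the clinical event

def summarize_journey_for_element(journey, raw_element, task, call_llm):
    """Extract the parts of a patient journey relevant to harmonizing
    one data element, filtering out unrelated events as noise."""
    events_text = "\n".join(
        f"[{e.timestamp}] {e.kind}: {e.content}" for e in journey
    )
    # The prompt is tuned to the harmonization task (task) and, more
    # importantly, to the specific element instance (raw_element).
    prompt = (
        f"Harmonization task: {task}\n"
        f"Data element to normalize: {raw_element!r}\n\n"
        f"Patient journey (chronological):\n{events_text}\n\n"
        "Summarize only the events relevant to normalizing this element; "
        "ignore everything unrelated."
    )
    return call_llm(prompt)
```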
Continuing with the example above, suppose that when normalizing the diagnosis, the raw string says “M.N. prst., anap” or “CA PRSTT, ANPLSTC”. This string can be passed in the context of the summarization agent, which would pull out a summary of the events in the patient journey most relevant to normalizing it. In this example, previous (as well as future) events pertinent to high levels of prostate-specific antigen (PSA) and enlargement of the prostate could be very informative, and the AI model would have more confidence in mapping the string to the concept “malignant neoplasm of prostate, anaplastic”. In contrast, the absence of such contextual information would leave it with low confidence, especially if a similar abbreviation was never seen in the training data.
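Continuing the sketch above, a hypothetical invocation for this example might look like the following; the journey events are synthetic and my_llm is a stand-in for any LLM wrapper.

```python
# Worked example using the sketch above. The journey events are
# synthetic and my_llm stands in for any LLM wrapper.
journey = [
    Event("2022-11-02", "observation", "PSA 28 ng/mL, markedly elevated"),
    Event("2022-12-10", "note", "DRE: enlarged, firm prostate"),
    Event("2023-01-15", "diagnosis", "M.N. prst., anap"),
    Event("2023-02-20", "procedure", "prostate biopsy; anaplastic cells seen"),
]
context = summarize_journey_for_element(
    journey,
    raw_element="M.N. prst., anap",
    task="normalize this diagnosis string to a standard ontology concept",
    call_llm=my_llm,
)
# With the PSA and prostate-enlargement events in the summary, a
# downstream harmonizer can map the string to "malignant neoplasm of
# prostate, anaplastic" with much higher confidence.
```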
The future is as informative as the past
Interestingly, the patient journey summary context need not be limited to the past; it can also draw on events that occur after the data element being normalized. After all, this is a data harmonization task: all the events have already happened and are recorded in the patient journey, so there is nothing wrong with looking at subsequent events to gather signals for harmonizing earlier ones. In other words, data harmonization is not a causal task. In contrast, a truly predictive task, such as predicting future disease risk from the past journey, would of course have to be constrained to causal features.
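One simple way to express this non-causal selection, reusing the hypothetical Event type from the earlier sketch, is a context window that extends symmetrically before and after the element; the 180-day width is an arbitrary illustration.

```python
# Sketch: because harmonization is retrospective, the context window can
# extend both before and after the element being normalized. Reuses the
# hypothetical Event type; the 180-day width is arbitrary.

from datetime import date, timedelta

def context_window(journey, index_date: date, days: int = 180):
    """Select events within +/- `days` of the element's date, drawing on
    the future as freely as the past (a non-causal selection)."""
    lo = index_date - timedelta(days=days)
    hi = index_date + timedelta(days=days)
    return [e for e in journey
            if lo <= date.fromisoformat(e.timestamp) <= hi]
```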
Iterative harmonization
An interesting extension to our approach is to make harmonization iterative. Instead of treating the extraction of context from the patient journey as a feed-forward operation, we could implement a feedback loop. The process would begin with an initial, context-free harmonization attempt. This preliminary output would then be used to extract a contextual summary from the broader patient journey, which in turn would be fed back into the harmonizer to generate a refined estimate. The refined output would enable gathering an even more pertinent context, which could be looped back into the system again. By repeating this cycle, we would expect harmonization accuracy to improve with each iteration, ultimately converging on a more accurate and robust result.
The success of this iterative system hinges on carefully managing the information flow to ensure stability, specifically by circulating only “extrinsic information”—that is, information that was not part of the harmonizer’s initial input or immediate outputs. This approach is analogous to iterative decoding techniques such as Turbo decoding used in error correction codes, where iterative refinement and feedback are fundamental. Analytical tools like Extrinsic Information Transfer (EXIT) charts could be employed to model and predict the behavior and stability of such a system.
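A sketch of the loop under the same assumptions as before is shown below. It assumes a hypothetical harmonize(raw, context) -> (concept, confidence) function, and it only loosely approximates the extrinsic-information constraint: the summary is conditioned on the current guess, but the evidence it circulates is drawn from the raw journey rather than from the harmonizer’s own outputs.

```python
# Sketch of the feedback loop, reusing the hypothetical helpers above.
# harmonize(raw, context) -> (concept, confidence) is assumed. The
# extrinsic-information constraint is loosely approximated: the summary
# is conditioned on the current guess, but the evidence it circulates
# comes from the raw journey, not the harmonizer's own outputs.

def iterative_harmonize(journey, raw_element, task, call_llm, harmonize,
                        max_iters: int = 3, tol: float = 0.01):
    # Initial, context-free harmonization attempt.
    concept, conf = harmonize(raw_element, context="")
    for _ in range(max_iters):
        context = summarize_journey_for_element(
            journey, raw_element,
            f"{task}; preliminary mapping: {concept}", call_llm,
        )
        new_concept, new_conf = harmonize(raw_element, context)
        if new_concept == concept and abs(new_conf - conf) < tol:
            break  # converged on a stable mapping
        concept, conf = new_concept, new_conf
    return concept, conf
```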
We also have evidence from the agentic frameworks used in assistive chatbots that this iterative approach works well; indeed, reflection and iteration with message passing across independent agents are essential aspects of such frameworks.
Such an iterative system would not merely couple the summarization and harmonization tasks for a specific data element; it would also couple all the harmonization tasks across the whole journey. As each data element in the patient journey gets adequately normalized, the summarizers of the other data elements benefit: their summaries become more pertinent and less noisy. This is illustrated by the diagram above. This cross-communication between the data harmonizers should give an additional lift in accuracy and reduce reliance on large-scale human annotation.
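Under the same assumptions as the earlier sketches, this cross-coupling could be approximated as a round-robin over all the elements, with each round seeing the other elements’ latest normalized concepts.

```python
# Sketch of coupling harmonizers across the journey: each round
# re-summarizes every element against a journey annotated with the other
# elements' latest normalized concepts, so cleaner neighbors yield less
# noisy summaries. All helpers are the hypothetical ones defined above.

def harmonize_journey(journey, tasks, call_llm, harmonize, rounds: int = 2):
    # tasks: list of (element_index, task_description) pairs
    normalized = {i: None for i, _ in tasks}
    for _ in range(rounds):
        for i, task in tasks:
            annotated = [
                Event(e.timestamp, e.kind,
                      f"{e.content} [normalized: {normalized[j]}]"
                      if j != i and normalized.get(j) else e.content)
                for j, e in enumerate(journey)
            ]
            context = summarize_journey_for_element(
                annotated, journey[i].content, task, call_llm,
            )
            normalized[i], _ = harmonize(journey[i].content, context)
    return normalized
```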
Acknowledgement: I would like to thank Alireza Ghods for his contributions to the writing and review of this article.