How Truveta is Tackling the Cold-Start Problem in Medical Research with Device Data Using AI and LLMs

Truveta's device catalog is a centralized dataset that is crucial for medical research, particularly for studies examining the efficacy of a medical device, comparing similar devices, or for manufacturers wanting to understand how their devices are being used. Within the catalog, each device is represented by a UDI (Unique Device Identification) code and is associated with a list of product attributes. Figure 1 illustrates an example of a UDI and some of its attributes as they are stored in the FDA's GUDID (Global Unique Device Identification Database).


Figure 1: An example UDI and some of its attributes in the GUDID catalog.

Having high-quality, complete, and accurate device attributes for each item is a critical component of first-class datasets for medical research. These attributes enhance the research process by providing:

  • Better Selection: Researchers can easily identify a device or related devices for their studies, confident in accessing a comprehensive list of devices relevant to their research needs.
  • Better Personalization: By analyzing product attributes, Truveta can group devices based on common features, creating tailored product profiles for each customer. This enables the provision of highly relevant and personalized study recommendations based on users' specific interests and requirements.

When a healthcare provider joins Truveta, normalizing their raw data is crucial before integration into our catalog. Given the vast scale of this data, we utilize machine learning (ML) models for normalization. However, developing these models from scratch depends on acquiring a large amount of accurately labeled training data to meet accuracy targets. This scenario exemplifies the "cold-start problem," where a lack of initial data can significantly delay effective model training, as illustrated in Figure 2. Collecting sufficient and relevant data is both time-consuming and resource intensive. This not only slows the development and refinement of the model but also hinders the updating and expansion of the active catalog with new medical device entries. Consequently, the extended development cycle escalates operational costs, requiring more resources over a longer period.


Figure 2: Illustration of the cold-start problem in machine learning.

To address the cold-start problem, we have turned to Large Language Models (LLMs) such as OpenAI's GPT, Google's Gemini, Meta's Llama, and Anthropic's Claude. These deep learning models, trained on extensive publicly available textual data, excel in understanding and generating human language and can perform various tasks with minimal to no labeled data. While LLMs can adapt to different NLP challenges through specific prompts, they lack domain-specific knowledge and may produce inaccurate information when posed with domain-specific questions—a phenomenon known as "hallucination." To prevent this, we implemented Retrieval-Augmented Generation (RAG). This technique enhances LLM output by referencing an authoritative external knowledge base before generating responses, thereby optimizing accuracy. Figure 3 demonstrates how the RAG process functions.
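To make the RAG flow concrete, here is a minimal sketch in Python. The catalog entries and UDI codes are made up for illustration, the retriever is a naive token-overlap ranker rather than the production search over GUDID, and the final LLM call is omitted; only the retrieve-then-ground structure matches the process described above.

```python
def retrieve(query, knowledge_base, k=2):
    """Rank catalog entries by simple token overlap with the query (toy retriever)."""
    q_tokens = set(query.lower().split())
    scored = [
        (len(q_tokens & set(entry["description"].lower().split())), entry)
        for entry in knowledge_base
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:k] if score > 0]

def build_prompt(query, context):
    """Ground the LLM's answer in the retrieved catalog entries."""
    lines = [f"- UDI {e['udi']}: {e['description']}" for e in context]
    return (
        "Using ONLY the catalog entries below, identify the device.\n"
        + "\n".join(lines)
        + f"\n\nQuery: {query}\nAnswer:"
    )

# Hypothetical mini-catalog standing in for GUDID:
catalog = [
    {"udi": "00880304660932", "description": "Persona tibia baseplate size 5"},
    {"udi": "00613994999999", "description": "Hip stem femoral component"},
]
context = retrieve("persona tibia size 5", catalog)
prompt = build_prompt("persona tibia size 5", context)
```

In the real pipeline the grounded `prompt` would then be sent to the LLM, which is constrained to answer from the retrieved context instead of from its (possibly hallucinated) parametric knowledge.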


Figure 3: Retrieval Augmented Generation (RAG) process.

However, the raw data received from healthcare providers are often filled with jargon, abbreviations, and misspellings, making the task of searching for relevant information from our knowledge source (in this case, the GUDID) challenging. For example, consider the query "tib psn np stm 5 deg sz dr zimmer inc." Except for a few terms, we are unable to retrieve relevant information from our knowledge source, as illustrated in Table 1.

Table 1: Classifying item name tokens for a given query.

To mitigate this problem, we leverage LLMs to normalize queries before processing, as illustrated in Figure 4. This approach allows us to refine raw queries into a format that enhances our ability to retrieve relevant information from our knowledge sources.

Figure 4: LLM normalizer model.

Now, with the queries normalized, we can retrieve relevant information with greater accuracy. For instance, the result of the normalizer for the previous query would be "Tibia Persona non-steroidal 5 degree size." As demonstrated in Table 2, we are now able to accurately identify the correct brand name and other pertinent details about the query.
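A sketch of this normalization step, using the example query from the text. The glossary hints and prompt wording are illustrative, and `call_llm` is a stub standing in for a real chat-completion API call; only its canned return value (the normalized string quoted above) comes from the source.

```python
# Hypothetical abbreviation hints that could be injected into the prompt:
GLOSSARY = {"tib": "tibia", "psn": "Persona", "deg": "degree", "sz": "size"}

def build_normalizer_prompt(raw_query):
    """Assemble an instruction asking the LLM to expand device-name jargon."""
    hints = ", ".join(f"{k} = {v}" for k, v in GLOSSARY.items())
    return (
        "Expand the abbreviations in this medical-device item name into plain "
        f"terms (known hints: {hints}). Return only the normalized name.\n"
        f"Item name: {raw_query}"
    )

def call_llm(prompt):
    # Stub: a production system would call an LLM API here. The return value
    # is the normalized result quoted in the text above.
    return "Tibia Persona non-steroidal 5 degree size"

raw = "tib psn np stm 5 deg sz dr zimmer inc"
normalized = call_llm(build_normalizer_prompt(raw))
```

The normalized string, not the raw one, is what gets handed to the retriever, which is why the lookup in Table 2 succeeds where Table 1 failed.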

Table 2: Classifying item name tokens for a given normalized query.

Despite our efforts, normalization might not always be perfect, and because queries frequently omit information, we might not always retrieve the most relevant devices. This can cause the LLM to hallucinate or return incorrect answers. To address this, we introduced a new module to the RAG process that automatically validates the retriever's results. This module, powered by an LLM, checks whether the information retrieved from the knowledge source is valid for each query. If the result is deemed invalid, the system searches again to discover more relevant information. Conversely, if the automated validation process validates the search result, we proceed with the rest of the process.
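The retrieve-validate-retry control flow might be sketched as follows. Both the retriever and the LLM-based validator are stubbed here (the canned results and the keyword check are invented for illustration); the point is the loop structure: re-search on an invalid result, and stop with no answer rather than letting the LLM respond ungrounded.

```python
def retrieve(query, attempt):
    # Stub retriever: a real system would broaden or rewrite the search on
    # each attempt. These results are invented for illustration.
    results = {0: "hip stem femoral component", 1: "Persona tibial baseplate"}
    return results.get(attempt, "")

def is_valid(query, result):
    # Stub for the LLM-based validator: in production, an LLM judges whether
    # the retrieved entry plausibly matches the query.
    return "tibia" in result.lower() or "tibial" in result.lower()

def retrieve_with_validation(query, max_attempts=3):
    """Search, validate, and retry until a result passes or attempts run out."""
    for attempt in range(max_attempts):
        result = retrieve(query, attempt)
        if result and is_valid(query, result):
            return result
    return None  # surface "no valid result" instead of an ungrounded answer

answer = retrieve_with_validation("tibia persona size 5")
```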

Additionally, queries often lack complete information, making the accurate prediction of UDI codes challenging and even leading to disagreements among human experts. To address this issue, we employ a "mixture of experts" approach, where each expert is an LLM. These experts may utilize the same or different models, but it is crucial that they operate under varied prompts. This approach is analogous to having multiple human experts review an input alongside a list of potential answers, each expert tasked with selecting the answer that best describes the input. The collective results from all experts are then processed using a majority rule to determine the final answer. Figure 5 illustrates how this new approach functions.
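The majority-rule step can be sketched in a few lines. Each "expert" below would in practice be an LLM call with its own prompt; here the experts' choices are canned, and the candidate labels (`UDI-A`, `UDI-B`) are placeholders rather than real UDI codes.

```python
from collections import Counter

def majority_vote(picks):
    """Return the candidate chosen by the most experts (ties broken by first seen)."""
    return Counter(picks).most_common(1)[0][0]

# Stubbed choices from three experts (each would be an LLM call with a
# distinct prompt in the real system):
expert_picks = ["UDI-A", "UDI-B", "UDI-A"]
final = majority_vote(expert_picks)
```

Because the experts run under varied prompts, their errors tend to be less correlated than a single model's, so the vote filters out idiosyncratic mistakes on ambiguous inputs.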


Figure 5: Proposed Retrieval Augmented Generation (RAG) process.

Looking into the Future

The incomplete nature of raw data often compromises the accuracy of our search results. We are actively exploring various methods to leverage LLMs to enrich this raw data. By generating additional information, we aim to improve both the efficacy and accuracy of our search processes. These enhancements will allow us to extract more precise and reliable insights from the raw data, thereby facilitating more accurate device identifications and cataloging.

Acknowledgments

Special thanks to Anand Oka, Mahsa Eslamialishah, Sarah Stewart, Saman Zarandioon, and Truveta’s terminology team, who all contributed to making this exciting work happen!
