How Large Language Models are Helping Crack the Code of Electronic Health Records

Imagine a world where doctors have instant access to every detail of a patient's medical history, no matter where the patient has been treated. Health data interoperability is improving, but even once records can move between systems, making sense of the data remains a challenge.

Our new research explores how Large Language Models (LLMs)—the same AI powering those chatbots—can fundamentally change how we extract and understand information from electronic health records (EHRs).

The Challenge of EHR Data

EHRs are a goldmine of medical information, but they're also incredibly complex. Think about it:

  • Different systems: Hospitals and clinics use various EHR systems, each with its own way of storing data. We have gathered data from more than a quarter of a million patients across all sites of care: inpatient and outpatient facilities, specialists, and pharmacies. That data, drawn from about 400 different EHR platforms and more than 250,000 different hospitals, labs, pharmacies, and medical practices, shows that the systems are not only non-standard, but that even the same EHR can be implemented differently at different hospitals. In practice, there are thousands of data models with subtle and not-so-subtle differences in how patient data is stored.
  • Structured vs. Unstructured Data: Some data is neatly organized in tables (like medication lists), while other crucial details are buried in the free text of doctors' notes. Even that doesn't capture everything: a large amount of information is stored in scanned documents and other image files.
  • Medical Jargon: Clinical notes are full of technical terms, abbreviations, and context-specific language that can be hard even for humans to parse. Historically, medical notes have been a proving ground where natural language processing tests the limits of what computers can do.
  • Billing versus Treatment: Medical records were designed to facilitate billing, not clinical care. There has also been long-standing hesitancy to build systems that venture too close to being a medical device, which would subject them to more regulatory oversight. As a result, the data stored in these systems exists first for accounting and only secondarily for care.

The result: the key bits of information that doctors need for their clinical workflow are very difficult to find, often buried deep in notes.

Enter Large Language Models

LLMs, typically variants of the Transformer neural network architecture, have changed natural language processing. Since 2020, xCures has been working to use them to unlock the information in EHR data. Here's our approach:

  1. Data Preprocessing: We take all kinds of EHR data (documents, images, etc.) and clean it up, using techniques like optical character recognition (OCR) to extract text from images.
  2. Semantic Search: We use AI to understand the meaning of the text, not just the words themselves. Storing the data as vectors lets us find relevant documents quickly, even when they use different terminology. Semantic search means users can find information using concepts rather than exact words (although we search for those too).
  3. LLM Extraction: We use LLMs to pull out specific pieces of information. In our cancer work, we focused on gathering details of the cancer diagnosis, including stage and grade, histology and morphology, prognostic and diagnostic biomarkers, and other details that are rarely captured as structured information. In this study, we focused on medication names, dosages, and reasons for discontinuation, information you would expect to be relatively complete in a medical record. As the results below show, that is typically not the case, which means important clinical information is not readily accessible. We therefore give the LLM very specific instructions to ensure it extracts data in a structured format compatible with the other EHR data.
  4. Validation: Trained clinical reviewers evaluate the LLM's performance by manually checking the extracted data against the medical records. A second reviewer verifies the first review, and any disagreements are resolved by a third expert. Performance is measured with standard metrics (accuracy, precision, and recall), and only explicitly stated information counts as valid.
  5. Standardization: We map the extracted information using the same medical coding systems that hospitals use, ensuring conformity to standard healthcare data models (like FHIR and OMOP) so the data can be easily shared and analyzed.
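The semantic-search step (2) can be illustrated with a toy vector search. A real deployment would use learned embeddings from a neural model; here a simple bag-of-words vector is a stand-in so the example stays self-contained, and the document snippets and query are invented for illustration.

```python
import math

def embed(text, vocab):
    """Map text to a count vector over a fixed vocabulary
    (a toy stand-in for a learned embedding model)."""
    tokens = text.lower().split()
    return [tokens.count(term) for term in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical snippets standing in for indexed EHR documents.
docs = [
    "patient started metformin for type 2 diabetes",
    "chest x-ray shows no acute findings",
    "discontinued lisinopril due to persistent cough",
]
vocab = sorted({tok for d in docs for tok in d.lower().split()})
index = [embed(d, vocab) for d in docs]

def search(query):
    """Return documents ranked by vector similarity to the query."""
    q = embed(query, vocab)
    order = sorted(range(len(docs)), key=lambda i: cosine(q, index[i]), reverse=True)
    return [docs[i] for i in order]

print(search("why was lisinopril stopped")[0])
# -> discontinued lisinopril due to persistent cough
```

With learned embeddings instead of word counts, a query like "why was the blood pressure medication stopped" would also surface the lisinopril note, which is the point of searching by concept rather than keyword.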
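Steps 3 and 5 together turn free text into standards-conformant records. The sketch below assumes a hypothetical shape for the LLM's structured output; the field names follow the FHIR R4 MedicationStatement resource, and the RxNorm code passed in is illustrative, standing in for a real terminology lookup.

```python
# Hypothetical output of the LLM extraction step (step 3): the model is
# instructed to return exactly these fields as JSON.
extracted = {
    "drug_name": "metformin",
    "dose": "500 mg",
    "status": "stopped",
    "reason_discontinued": "GI intolerance",
}

# Internal status labels mapped onto FHIR MedicationStatement status codes.
STATUS_MAP = {"active": "active", "stopped": "stopped", "on-hold": "on-hold"}

def to_medication_statement(rec, rxnorm_code):
    """Wrap one extracted record as a minimal FHIR R4 MedicationStatement
    (step 5: standardization). `rxnorm_code` would normally come from a
    terminology service; here the caller supplies it."""
    return {
        "resourceType": "MedicationStatement",
        "status": STATUS_MAP.get(rec["status"], "unknown"),
        "medicationCodeableConcept": {
            "coding": [{
                "system": "http://www.nlm.nih.gov/research/umls/rxnorm",
                "code": rxnorm_code,
                "display": rec["drug_name"],
            }]
        },
        "dosage": [{"text": rec["dose"]}],
        "statusReason": [{"text": rec["reason_discontinued"]}],
    }

stmt = to_medication_statement(extracted, rxnorm_code="6809")  # illustrative code
```

Because the output is a plain FHIR resource, it can sit alongside medications that were already structured in the source EHR, which is what makes the completeness comparisons below possible.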
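The metrics in step 4 come directly from the adjudicated reviewer labels. A minimal sketch, assuming each extracted value ends up labeled a true positive, false positive, or false negative after adjudication; the counts here are invented for illustration.

```python
def precision_recall(tp, fp, fn):
    """Precision: share of extracted values that are correct.
    Recall: share of values present in the record that were extracted."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented adjudicated counts for illustration.
tp, fp, fn = 190, 10, 10
p, r = precision_recall(tp, fp, fn)
print(f"precision={p:.2f} recall={r:.2f}")  # -> precision=0.95 recall=0.95
```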

Extraction pipeline for development, validation, and deployment of LLMs

The Results

We built an LLM-based system to extract medications and associated data and deployed it on a large dataset of more than 11,000 patient records. The results were impressive.

  • Validation: After training the LLM, we used two human experts, with an independent referee to resolve disagreements, to judge the LLM's extractions, achieving ~95% accuracy, precision, and recall.
  • Increased Data: Deploying our LLM-based system on the 11,000+ patients’ records extracted significantly more medication data than was available in the structured EHR data alone—a 27% increase in total medication records and a 31% increase in distinct drug ingredients!
  • Better Completeness: The LLM was able to pull out details that were often missing from the structured data, such as the reason a medication was prescribed or discontinued.
  • Oncology Breakthrough: We saw a particularly dramatic improvement in oncology-related data, with a 60% increase in total oncology medication records.

Think about that for a minute: nearly 40% of the cancer medications in those patients' records were not listed in the medication tables.
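The arithmetic behind that figure: a 60% lift over the structured baseline means the free-text-only records make up 60 out of every 160 total records.

```python
# A 60% increase over the structured baseline means that for every 100
# oncology medications in the tables, extraction surfaced 60 more from text.
baseline = 100
found_only_in_text = 60
total = baseline + found_only_in_text

missing_share = found_only_in_text / total
print(f"{missing_share:.1%}")  # -> 37.5%
```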

Why This Matters

Our work shows that LLMs have the potential to transform healthcare by making EHR data more accessible and usable. This can lead to:

  • Better patient care: Doctors can make more informed decisions with a complete picture of their patients' medical history, using semantic search to find information and automated structuring of unstructured data to fill gaps in the tables of conditions, medications, procedures, allergies, and family medical history, among other areas.
  • More efficient healthcare systems: By automating data extraction, we can free up healthcare professionals to focus on patient care.

The Future of AI in Healthcare

This is just the beginning. We're continuing to refine our approach and explore new ways to use LLMs to improve healthcare. We envision a near future where the information in every medical record - regardless of format - can be instantly searched at the point of care, and where clinicians are presented with precisely what they need in an automated way, tailored to their individual clinical workflows.

We've published our findings and technical details as a preprint on medRxiv, pending acceptance at a peer-reviewed journal.

Key Takeaways:

  • LLMs can significantly improve the extraction of information from electronic health records (EHRs).
  • LLM-extracted data can be mapped to FHIR. Our method increases both the amount and completeness of structured EHR data.
  • This has the potential to improve patient care and make healthcare systems more efficient.


