Current and Future Trends in Health and Medical Informatics
1.0 Large Language Models (LLMs)
Large Language Models (LLMs) are the next step in the evolution of Natural Language Processing (NLP). LLMs are machine learning models that have been trained on extremely large datasets of human language and usually have over a billion parameters (Jeon & Lee, 2023). One could say an LLM is an NLP model on steroids. LLMs work by predicting the next word in a sequence, which lets them produce responses that read as close to human language as possible. Below, I share the landscape of a few LLMs that have made the news and are at the centre of many medical use cases.
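To make "predicting the next word" concrete, here is a minimal sketch of a toy bigram predictor. It is an illustration only: real LLMs use neural networks over sub-word tokens, not word counts, and the corpus and the function names `train_bigram` and `predict_next` are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict:
    """Count, for every word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    words = corpus.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict, word: str):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus, invented for illustration.
corpus = (
    "the patient has a fever the patient has a cough "
    "the doctor sees the patient"
)
model = train_bigram(corpus)
print(predict_next(model, "the"))
```

In this toy corpus "the" is followed by "patient" three times and by "doctor" once, so the predictor picks "patient". An LLM does the same kind of thing at a vastly larger scale, with a learned probability distribution instead of raw counts.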
1.1 ChatGPT
The LLM-based chatbot from OpenAI is called ChatGPT, and it has generated most of the hype in the news and on social media. GPT-4, the model behind the current version of ChatGPT, has been trained using Reinforcement Learning from Human Feedback (RLHF) (OpenAI, 2023). Figure 1 shows how OpenAI trains its LLM in three phases: training a supervised policy, training a reward model, and optimizing the policy against the reward model.
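The reward-model phase can be sketched with a pairwise comparison loss. This is a commonly used formulation for learning from human preference rankings, not something the cited source spells out, so treat it as an assumption. The function name `pairwise_reward_loss` and the reward values are hypothetical.

```python
import math


def pairwise_reward_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Compute -log(sigmoid(reward_preferred - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    answer higher than the rejected one, and large otherwise.
    """
    margin = reward_preferred - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for two candidate answers to one prompt.
print(pairwise_reward_loss(2.0, 0.0))  # preferred answer scored higher
print(pairwise_reward_loss(0.0, 2.0))  # preferred answer scored lower
```

The second call returns the larger loss, which is the signal that would push a reward model towards agreeing with the human ranking.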
1.2 Bard
Bard is the LLM developed by Google using LaMDA (Language Model for Dialogue Applications), which is a Transformer-based model (Google, Meet Bard, 2023). Meena, a conversational agent from Google, comes closest to human interaction based on the Sensibleness and Specificity Average (SSA) metric, as shown in Figure 2. Meena outperforms other open-domain chatbots such as Mitsuku, Cleverbot, and DialoGPT.
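As a rough sketch of how an SSA score can be computed: human raters label each chatbot response as sensible or not, and as specific or not, and SSA is the average of the two rates. The labels below are invented, and the rule that a response counts as specific only if it is also sensible is my understanding of the metric rather than something stated in this article.

```python
def ssa(labels: list) -> float:
    """Average of the sensibleness rate and the specificity rate.

    `labels` is a list of (is_sensible, is_specific) pairs, one per
    chatbot response, as judged by human raters.
    """
    if not labels:
        raise ValueError("need at least one labelled response")
    total = len(labels)
    sensibleness = sum(1 for sensible, _ in labels if sensible) / total
    # Assumption: a response only counts as specific if it is also sensible.
    specificity = sum(1 for sensible, specific in labels if sensible and specific) / total
    return (sensibleness + specificity) / 2


# Hypothetical rater labels for four responses.
ratings = [(True, True), (True, False), (False, False), (True, True)]
print(ssa(ratings))
```

Here three of four responses are sensible (0.75) and two of four are specific (0.5), giving an SSA of 0.625.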
1.3 Med-PaLM
Med-PaLM is another LLM created by Google, built specifically to answer medical questions. The latest version, Med-PaLM 2, scores 85.4% on US Medical Licensing Examination (USMLE) style questions. The USMLE requires the respondent to understand the patient's symptoms and test results and to combine that information with medical knowledge to recommend an appropriate path of treatment (Google, Med-PaLM, 2023). Med-PaLM is shown to outperform other medical-domain LLMs such as PubMed GPT and DRAGON. This comparison is represented visually in Figure 3.
1.4 LLaMA
LLaMA (Large Language Model Meta AI) is the LLM developed by Meta (previously Facebook). LLaMA comes in several sizes: 7B, 13B, 33B and 65B parameters, where the “B” stands for billion. Its training enables LLaMA to answer general, mathematical and medical questions, among others. The LLaMA 33B model is trained on 1.4 trillion tokens, or pieces of words (Meta, 2023). In Figure 4 we see that PaLM by Google, in its largest 540B-parameter version, outperforms other LLMs across Humanities, STEM and Social Sciences. We can also see that the number of parameters correlates with accuracy.
2.0 Use Cases of LLMs
2.1 Communication and Summarization
Since LLMs can be trained on any text, health and medical records can easily be used as input data. LLMs are good at summarizing text, and doctors can use them to quickly analyze any medical report. LLMs can also assist physicians by drafting emails to their patients, thereby reducing their mental burden. “A panel of licensed health care professionals evaluating the responses preferred ChatGPT's answers 79% of the time and found them more empathetic and of higher quality.” (Johns Hopkins University, 2023) Mark Dredze, an associate professor of computer science at Johns Hopkins University's Whiting School of Engineering, found that the sheer number of questions flowing through electronic patient messaging is causing physician burnout. Using LLMs to assist physicians can make doctors more efficient and patients happier and healthier (Johns Hopkins University, 2023). Physicians can use the same summarization ability to rapidly study any new article or research that affects their specialization. Every second diverted from admin tasks towards patient care will lead to a better healthcare system. Because LLMs work by predicting the next word, they can use the surrounding context to suggest text for medical records that physicians would otherwise have to type word by word. Suggested text for medical notes is another way in which LLMs can reduce the physician burden.
A similar approach can be used to communicate with insurance providers, by prompting an LLM to draft an email that conveys the necessity of a medical procedure. Insurance providers, in turn, can use LLMs to analyze and summarize medical records and so adjudicate a large number of cases. LLMs can help speed up the process by suggesting appropriate treatment and reconciling it with the medical procedure submitted for prior authorization.
A real-life use case: Epic, a vendor of electronic health record (EHR) systems, has integrated a voice assistant into its EHR system. This voice assistant can provide information about the patient such as their medication, vital signs and surgical history. The physician can use the voice assistant to dictate clinical notes, manage meeting schedules, and even get help with ICD-10 coding. The voice assistant can also auto-generate clinical notes based on the conversation between the doctor and the patient (Adams, 2023).
2.2 LLM as a Doctor
Specialized LLMs such as Med-PaLM and even general LLMs such as ChatGPT have passed the US Medical Licensing Exam. This could mean that LLMs can potentially act in the capacity of a doctor and diagnose you based on your medical records. LLMs could become first-line doctors, especially in areas where there are not enough physicians. A human physician could be called in to confirm the diagnosis or to improve the LLM's algorithm. The additional help from an LLM could help reduce the physician burnout rate.
LLMs such as GPT-4 can accept medical images as input and can be used to read radiology reports. Similar to the above use case, LLMs can become assistants to radiologists, with the radiologist still confirming the diagnosis. The human-machine combination can reduce errors in diagnosis. A 2021 study that used deep learning rather than an LLM found that “in ophthalmology, AUCs ranged between 0.933 and 1 for diagnosing diabetic retinopathy, age-related macular degeneration and glaucoma on retinal fundus photographs and optical coherence tomography. In respiratory imaging, AUCs ranged between 0.864 and 0.937 for diagnosing lung nodules or lung cancer on chest X-ray or CT scan. For breast imaging, AUCs ranged between 0.868 and 0.909 for diagnosing breast cancer on a mammogram, ultrasound, MRI and digital breast tomosynthesis.” (Aggarwal, et al., 2021) AUC, or Area Under the Curve, ranges from 0 to 1. An AUC of 0 means that the model's predictions are 100% wrong, an AUC of 0.5 corresponds to random guessing, and an AUC of 1 means that the model's predictions are 100% correct (Google, 2022).
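One way to build intuition for AUC is to compute it by hand. The sketch below uses the pairwise interpretation: AUC is the fraction of (diseased, healthy) case pairs in which the model gives the diseased case the higher score, with ties counted as half. The labels and scores are invented for illustration, and the function name `auc` is hypothetical.

```python
def auc(labels: list, scores: list) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a positive (e.g. diseased) case, 0 for a negative case.
    scores: the model's predicted score for each case.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5  # a tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four scans: two healthy (0) and two diseased (1).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))
```

Three of the four diseased-versus-healthy pairs are ranked correctly here, so the AUC is 0.75. A model that ranked every pair correctly would score 1, and one that ranked every pair wrongly would score 0.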
3.0 Limitations of LLMs
LLMs work by predicting the next word. Since the output is a prediction, there can be errors. Additionally, the output is based on training data, and if this data contains bias or misinformation, the LLM cannot tell truth from fiction. Hence, many use cases of LLMs require a human in the loop, especially a subject matter expert (SME), to ensure that the output of the LLM is in accordance with accepted knowledge. Every use case of an LLM should therefore be prefaced with a warning label.
3.1 Carbon Footprint
Training LLMs requires a lot of GPU hours, which in turn require electricity. The electricity used in training LLMs causes high carbon emissions, as shown in Figure 5. We can also see that the larger the number of parameters, the greater the carbon emissions. This should make us question our use of LLMs: even though this is a breakthrough technology taking artificial intelligence (AI) to a whole new level, it has a significant impact on our environment through carbon emissions. Companies training LLMs should take steps to become carbon neutral by using only electricity generated from renewable sources such as solar, wind or water.
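The link between GPU hours and emissions can be sketched as simple arithmetic: energy is GPU hours times power draw, scaled by the data centre's power usage effectiveness (PUE), and emissions are energy times the carbon intensity of the grid. All input values below are illustrative assumptions, not figures taken from Figure 5 or from any vendor, and the function name is hypothetical.

```python
def training_emissions_tonnes(
    gpu_hours: float,
    gpu_power_watts: float,
    pue: float,
    kg_co2_per_kwh: float,
) -> float:
    """Estimate training emissions in tonnes of CO2-equivalent."""
    # Energy drawn by the GPUs, scaled up by data-centre overhead (PUE).
    energy_kwh = gpu_hours * gpu_power_watts / 1000 * pue
    # Convert kilograms of CO2-equivalent to tonnes.
    return energy_kwh * kg_co2_per_kwh / 1000


# Illustrative assumptions: 1 million GPU hours at 400 W per GPU,
# a PUE of 1.1, and a grid intensity of 0.385 kg CO2eq per kWh.
estimate = training_emissions_tonnes(1_000_000, 400, 1.1, 0.385)
print(f"{estimate:.1f} tonnes CO2eq")
```

Under these assumptions the estimate comes to roughly 169 tonnes of CO2-equivalent. Setting the grid intensity close to zero, as with renewable electricity, shrinks the estimate accordingly.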
3.2 Bias and Misinformation
Most publicly available LLMs, such as ChatGPT or the version that powers Microsoft's Bing search, are trained on data from the web. This data is not vetted for biases or misinformation. This means that an LLM trained on biased or nonfactual data will respond to questions with biased and nonfactual answers. There is a possibility that the LLM's answers may contain toxic content and stereotypes. To counter that, it is important that LLMs contain a mechanism to detect bias and misinformation and raise it to the human in the loop. LLMs should also have measures to protect themselves from being retrained on harmful content.
If an LLM is being used in a hospital setting by a doctor, we would not want biased or wrong information to be presented, as the consequences could be dire. This is where specialized LLMs such as Med-PaLM can be utilized. Even with specialized LLMs, the responses should be monitored and moderated by doctors, as this is a new field with low technological maturity.
3.3 Proprietary and Private Information
Another aspect of LLMs is that input into the LLM may be used to improve the model. This ability to learn, either from direct human feedback or from newly parsed documents, is part of what makes LLMs so unique. However, in the business world, which is bound by trade secrets and proprietary knowledge, the fear of a leak will dampen LLM usage. With health records, it is even more crucial that only a limited number of people have access to the information. No one would want their medical records to be searchable on a public LLM. A lot of work needs to be done in this space to ensure data privacy is maintained, especially as we deal with laws such as HIPAA and GDPR.
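One partial mitigation is to strip obvious identifiers from text before it is sent to an LLM. The sketch below is a deliberately simple illustration using regular expressions. The patterns, the placeholder labels and the sample note are all assumptions made for this example, and pattern matching alone would not amount to de-identification under HIPAA or GDPR.

```python
import re

# Hypothetical patterns for a few common identifiers (US-style formats).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Invented sample note; none of these identifiers are real.
note = "Call 555-123-4567 or mail jane.doe@example.com, SSN 123-45-6789"
print(redact(note))
```

Names, addresses, dates and free-text details would slip straight through a filter like this, which is why the privacy question needs more than a pre-processing step.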
3.4 Liability for Incorrect Results
As stated above, using LLMs can lead to output that is biased or contains misinformation. The companies behind these LLMs do not guarantee a correct result, which leaves the user liable for any steps taken based on the output of the LLM. This is one of the areas that needs to be fleshed out before we can see widespread public use of LLMs in high-consequence settings such as health diagnosis based on medical records or insurance.
4.0 Conclusion
LLMs have taken the world by storm, but before their use cases become mainstream, a lot of questions in the areas of privacy, security and energy need to be addressed. LLMs have the potential to affect every job in the world, especially in the healthcare sector, where they can help physicians avoid fatigue by assisting with admin-related tasks. Working with medical records will become more conversational for both the patient and the physician. The patient will gain access to an e-doctor that can assist with some basic questions, and the doctor will gain access to an e-admin that can help read out the medical records and even assist in diagnosis.