Everybody was AI Scribing
Dr Terence Tan
Physician Defector | J-Apac Head of Health & Lifesciences, sustainability & agrifood startups @ AWS | Community founder @ TechBrews | My expressed views are personal and are not the opinions of my employer, AWS
AI scribes, 2024’s hottest new Healthcare AI solution, promise to automate the process of history taking and, in so doing, help solve the administrative burden and clinician burnout problem. So are these AI-powered solutions the best thing since sliced bread? The magic bullet? Let’s take a look at the problem, the proposed solution, the solution landscape and some of the problems with AI scribes.
5 point Executive summary
The Problem
Data suggests that clinicians are burdened by excessive administrative tasks, in particular documentation. They spend 2.3 hours documenting for every 8 hours of clinical time. This workload contributes to burnout in 57% of physicians and poor patient experiences in 67% of patients polled (1,2).
There is a pressing need to reduce administrative tasks. One of the most commonly touted solutions is the automated AI scribe, which helps physicians with the onerous task of history taking in order to improve clinician well-being and patient outcomes.
Solution: AI?
Current evidence suggests that AI scribes reduce documentation time and help prevent burnout.
A trial of an ambient AI tool in Iowa showed a significant reduction in burnout scores, with the Stanford PFI score falling from 4.16 to 3.16 (the validated cutoff for overall burnout is 3.33). Burnout rates also decreased from 69% to 43% (3).
The Permanente group reported in NEJM that AI scribes resulted in a large reduction in documentation outside of 7am to 7pm (with no effect on other metrics such as time in clinical review) (4).
How does it work?
They use automated speech-to-text diarization (automatically partitioning an audio recording into individual speaker segments) and transcription of clinician-patient conversations. This text is then passed through several AI models, usually including extraction of information from unstructured text (named-entity recognition), and finally through a language model to generate a draft structured clinical note for review, editing and finalisation.
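To make that flow concrete, here is a minimal, illustrative sketch of such a pipeline in Python. It is a sketch only: the function names (transcribe_with_diarization, extract_entities, llm_complete) are hypothetical placeholders for whichever speech-to-text, clinical NER and language-model services a vendor actually uses, not any specific product’s API.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str  # "clinician" or "patient", as labelled by diarization
    text: str

def build_note_prompt(segments: list[Segment], entities: dict) -> str:
    """Assemble the drafting prompt from the diarized transcript and NER output."""
    transcript = "\n".join(f"{s.speaker}: {s.text}" for s in segments)
    return (
        "Draft a structured clinical note from the consultation transcript below.\n"
        "Only include information stated in the transcript.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Extracted entities:\n{entities}\n"
    )

# Upstream and downstream steps: hypothetical placeholders for a real
# ASR/diarization service, a clinical NER model, and an LLM client.
def transcribe_with_diarization(audio_path: str) -> list[Segment]:
    raise NotImplementedError("call your speech-to-text + diarization provider")

def extract_entities(segments: list[Segment]) -> dict:
    raise NotImplementedError("call a clinical named-entity recognition model")

def llm_complete(prompt: str) -> str:
    raise NotImplementedError("call your language model provider")

def draft_clinical_note(audio_path: str) -> str:
    segments = transcribe_with_diarization(audio_path)
    entities = extract_entities(segments)
    prompt = build_note_prompt(segments, entities)
    return llm_complete(prompt)  # a draft only: the clinician reviews, edits and signs off
```

The point the sketch makes is that the language model only ever sees the transcript and extracted entities; anything not said aloud during the consultation never reaches the draft note, which foreshadows the capability gap discussed later.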
Hallucination woes
But it’s not all plain sailing. Using general language models not specifically designed for medicine is a risky endeavour due to hallucinations, meaning “incorrect or misleading results that AI models generate” (Google’s definition, 5). For example, researchers found hallucinations in almost all outputs when they used GPT-4o and Llama-3 models to summarise 50 comprehensive medical notes (6).
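For illustration only, here is a toy screen for possible confabulation: it flags sentences in a generated summary or note that share few content words with the source text. This is purely a lexical-overlap heuristic of my own, not a validated safety check and not how the cited study measured hallucinations, but it shows the kind of grounding check the problem invites.

```python
import re

def content_words(text: str) -> set[str]:
    """Lowercased words minus a small stop-word list (toy example only)."""
    stop = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was",
            "for", "with", "on", "no", "has", "had", "he", "she", "they"}
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop}

def flag_possible_confabulations(note_sentences: list[str], source_text: str,
                                 min_overlap: float = 0.5) -> list[str]:
    """Flag note sentences whose content words barely appear in the source."""
    source_vocab = content_words(source_text)
    flagged = []
    for sentence in note_sentences:
        words = content_words(sentence)
        if not words:
            continue
        overlap = len(words & source_vocab) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)  # surface for clinician review
    return flagged

# Toy usage: the second sentence introduces a drug never mentioned in the source.
source = "Patient reports three days of cough and fever. No chest pain."
note = ["Three days of cough and fever.", "Started on amoxicillin last week."]
print(flag_possible_confabulations(note, source))
```

A real system would need something far stronger than word overlap (for example entailment checking against the transcript), but even this toy version illustrates why access to the source text matters.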
Even dedicated AI solutions designed for medicine lack clinical and safety validation: many AI scribes on the market have not published clinical utility/validity and patient safety data in academic journals (7).
Just this September, the Texas Attorney General’s Office reached a settlement with Pieces Technology over “allegations that the company made false, misleading or deceptive claims about the accuracy of its healthcare AI products” (8). Tellingly, this settlement includes “prohibitions against misrepresentations (including independence of an endorser or reviewers of a business product or service)”.
However, this risk is not just theoretical or based on allegedly false claims; it is here and it is present. OpenAI’s Whisper model regularly "creates fabricated text in medical and business settings despite warnings against such use", and OpenAI goes so far as to warn specifically against using Whisper in "high-risk" domains. Yet it is powering Nabla's AI copilot service in 40 health systems in the US (9).
Even Nabla acknowledges Whisper can confabulate, yet it simultaneously deletes the original recording, leaving no avenue for source verification!
On the other hand, we must acknowledge that the current human standard of clinical notes is also imperfect. Although it is difficult to truly measure the error rate in clinical notes, studies have shown that 1 in 5 patients report finding a mistake in their own care notes (10). This could be due to the delivery of information from patient to clinician, or the reception of that information by the clinician. AI scribes may be able to reduce the clinician reception error but not the patient delivery error, and more validation is required for this moving forward.
Author's note: I prefer the term confabulation instead of hallucination, largely because the medical terminology aligns more closely with how the models behave. Quite fitting, actually, that the term used is close but incorrect.
Other non-AI technical woes
Hallucinations are not the only problem AI scribes face.
There are also other errors that affect clinical meaning and accuracy.
Significant data is often missing from the audio record, such as relevant information that is not discussed (e.g. radiology reports, referral letters), nonverbal cues, and data from medical devices. Although one can hardly blame the AI tooling for this, it is nevertheless a significant capability/function gap.
Data handling issues are also very present. The Royal Australian College of GPs, for example, has published specific guidance on AI scribes asking clinicians to consider how patient data is handled before adopting these tools.
Task switching: History taking vs. history checking
Traditionally, clinical documentation has been a synthetic process for clinicians. By forcing recall and integration of data from different sources, this process can be clarifying, analytical and valuable in the clinical thought process (RACGP, the Royal Australian College of General Practitioners).
With AI scribes, the task has switched from <history taking → documentation (a synthetic task with clinical thinking)> to <history taking → checking, editing → then clinical thinking>.
This shift in the documentation/clinical thinking process has several potential implications. First, transitioning from a synthetic process to one of checking & comparing introduces changes whose effects remain unclear. Second, if synthesis is still required, the added step of checking & comparing may impose an additional cognitive load. Research indicates that cognitive load is cumulative, and the increasing demands of secondary tasks can eventually reach a threshold where errors are more likely to emerge (12).
Verification Complexity can also be considered. Defined as “the task complexity of verifying that automation is performing correctly”, high Verification Complexity was shown in the same study to lead to a tendency for humans to become over-reliant on AI outputs and ignore critical clinical details, or to forgo checking the outputs generated by the AI tool (Automation Bias) (12).
Workflow disruptions
Reviewing the AI's output and incorporating additional information into the clinical documentation still takes time, and this may erode the expected time-saving advantages of the AI scribe.
Clinicians may need training to use an AI scribe effectively, requiring institutions to invest time and money. Additionally, this training could require clinicians to take potentially unpaid time away from their practice.
Sergei Polevikov, who has written on this subject (and kindly reviewed my draft), remarked:
"Physicians have mixed opinions about AI scribes. Some point out that, in addition to the cognitive load, AI scribes—especially during the early adoption phase—can lead to more "pajama time." This is because of the learning curve, the need for review and edits (due to concerns about inaccuracies), and integration issues like technical glitches and compatibility problems with existing systems."
Fig leaf of physician review and liability?
In fact, the Royal Australian College of GPs published specific guidance on AI scribes in September this year. Within this guidance, this statement stands out (7):
GPs will be liable for errors within the patient health record even if they are generated by the AI scribe. GPs must ensure that the output prepared by the AI scribe constitutes an accurate record of a patient consultation. They must correct any errors and address omissions before signing off on the documentation and entering it into the patient health record.
For many AI solutions, this liability has been considered a fig leaf, since the solutions are often hawked as having superior performance to human clinicians. For example, in the U.S., doctors are generally held liable for medical decisions even when these are based on outputs from an AI source (11).
The same cannot be said of AI scribes. Due to the nature of the solution and the technical limitations stated above, the clinician must be the final arbiter and remain in the loop to correct any errors and omissions. Furthermore, the verbal portion is but one fraction of the entire consultation, and clinicians are still required to consolidate and synthesise all the inputs: verbal, non-verbal, observation, examination and so on.
However, it is important that systems give users access to the ground truth (such as the original recording or transcript) to enable verification; see the case of Nabla above.
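As a thought experiment, here is a minimal sketch of what retaining ground truth might look like: the transcript is stored alongside the draft note, and each note statement records which transcript segments support it, so unsupported statements can be surfaced for closer review. The data structures and field names are illustrative assumptions, not any vendor’s design.

```python
from dataclasses import dataclass, field

@dataclass
class NoteStatement:
    text: str
    supporting_segments: list[int]   # indices into the retained transcript

@dataclass
class DraftNote:
    transcript: list[str]            # kept verbatim, not deleted after drafting
    statements: list[NoteStatement] = field(default_factory=list)

    def evidence_for(self, statement_idx: int) -> list[str]:
        """Return the transcript lines cited as support for one statement."""
        stmt = self.statements[statement_idx]
        return [self.transcript[i] for i in stmt.supporting_segments]

    def unsupported(self) -> list[NoteStatement]:
        """Statements with no linked transcript evidence: flag for manual review."""
        return [s for s in self.statements if not s.supporting_segments]
```

Keeping this linkage would let a clinician jump from any sentence in the draft back to the words actually spoken, rather than trusting the summary blindly.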
The Current Landscape
So far, I have found 55 companies offering AI scribes through a quick search of available resources such as Pitchbook and Crunchbase.
Currently, most AI scribes offer ambient AI note taking/summarisation, essentially leveraging the power of generative AI language models. However, this does not build a significant competitive advantage, especially as the technology becomes not only more capable but also easier to deploy. Furthermore, AI scribes are not bound by the same strict regulatory requirements as other healthcare solutions such as diagnostic tools.
As demonstrated by Permanente’s switch in vendors from Nabla to Abridge in less than a year (13), the current incumbents appear to be competing on an even footing, with price/performance or EHR (electronic health records) integration serving as a key differentiating factor (own opinion).
What’s next
Well, it’s to be expected that most vendors will move towards securing market share, resulting in consolidation. If generative AI and AI tooling in general move towards simplification and mass-market enablement, it is not inconceivable that the larger institutions or incumbent medical software infrastructure platforms will deploy solutions of their own.
However, it is more likely that the natural progression towards EHR integration will prevail; after all, partnerships have already been drawn up, for example between Abridge and Epic, and Suki and Meditech. Partnerships with large system players will be a priority to enable these solutions to scale. Smaller vendors will likely be deprioritised as too niche or lacking scale (i.e. too long-tail).
One potential strategy for vendors is to build a competitive advantage by developing additional features. One key opportunity I see emerging is personalised outputs, allowing clinicians to specify their preferred style with "degree of detail" and formatting options.
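As a hedged sketch of how such personalisation might be expressed, clinician preferences (degree of detail, note structure, formatting) could be captured as a small settings object and folded into the drafting instructions. The fields and wording below are assumptions for illustration, not a feature of any particular scribe.

```python
from dataclasses import dataclass

@dataclass
class NotePreferences:
    detail: str = "concise"            # e.g. "concise" | "standard" | "comprehensive"
    structure: str = "SOAP"            # e.g. "SOAP" | "problem-oriented"
    bullet_points: bool = True
    include_negatives: bool = False    # explicitly list pertinent negatives

def build_style_instructions(prefs: NotePreferences) -> str:
    """Turn clinician preferences into drafting instructions for the language model."""
    lines = [
        f"Write a {prefs.detail} note using a {prefs.structure} structure.",
        "Use bullet points." if prefs.bullet_points else "Use short paragraphs.",
    ]
    if prefs.include_negatives:
        lines.append("Explicitly list pertinent negatives mentioned in the consultation.")
    return "\n".join(lines)

# Example: a GP who prefers comprehensive, bulleted SOAP notes with negatives listed
prefs = NotePreferences(detail="comprehensive", include_negatives=True)
print(build_style_instructions(prefs))
```

Storing preferences per clinician (or per clinic) would let the same underlying scribe produce notes that match each user's documentation style.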
Further developments would include expanding coverage to additional specialities (most scribes are designed for primary care), automating orders, referrals, coding and claims, and eventually moving towards the regulated space of Clinical Decision Support.
Conclusion
Although preliminary evidence suggests that AI scribes may help reduce administrative burden, more validation is required to determine whether these solutions are practically useful once the first- and second-order effects of workflow disruption, task switching, confabulation/hallucinations and the limited-to-verbal modality of the scribes are accounted for.
-References-
1. https://www.businesswire.com/news/home/20220603005293/en/?While-Interoperability-and-Technology-Have-Made-Significant-Improvements-in-Healthcare-They-Will-Make-an-Even-Bigger-Impact-on-Care-if-Their-Full-Potential-Can-Be-Reached-Physicians-Say-According-to-a-New-athenahealth-Survey
2. https://www.accenture.com/us-en/insights/health/digital-adoption-healthcare-reaction-or-revolution
10. Bell SK, Delbanco T, Elmore JG, Fitzgerald PS, Fossa A, Harcourt K, Leveille SG, Payne TH, Stametz RA, Walker J, DesRoches CM. Frequency and Types of Patient-Reported Errors in Electronic Health Record Ambulatory Care Notes. JAMA Netw Open. 2020 Jun 1;3(6):e205867. doi: 10.1001/jamanetworkopen.2020.5867. PMID: 32515797; PMCID: PMC7284300.
12. Lyell D, Coiera E. Automation bias and verification complexity: a systematic review. J Am Med Inform Assoc. 2017;24(2):423-31.
-Disclaimers-
-Acknowledgements-