Everybody was AI Scribing

AI scribes, 2024’s hottest new healthcare AI solution, promise to automate the process of history taking and, in so doing, help solve the administrative burden and clinician burnout problem. So are these AI-powered solutions the best thing since sliced bread? The magic bullet? Let’s take a look at the problem, the proposed solution, the solution landscape and some of the problems with AI scribes.


5-point executive summary

  1. Administrative Burden: Clinicians spend significant time on documentation (2.3 hours for every 8 hours of clinical work), contributing to burnout and poor patient experiences.
  2. AI Scribes as a Solution: AI-powered scribes aim to reduce documentation time and clinician burnout by automating history-taking and generating draft clinical notes for review.
  3. Challenges with Accuracy: AI scribes using general language models may produce "hallucinations" or inaccuracies, with concerns over clinical validation, data handling, and missing information.
  4. Cognitive Load and Workflow Issues: Shifting from history-taking to editing AI-generated notes can increase clinician cognitive load and introduce errors, despite saving time in some areas.
  5. Market and Future Trends: The AI scribe market is competitive with ongoing developments, including EHR integration and expansion to new specialties, but further validation is needed to confirm their practical value in healthcare.


The Problem

Data suggests that clinicians are burdened by excessive administrative tasks, in particular documentation. They spend 2.3 hours documenting for every 8 hours of clinical time. This workload contributes to burnout in 57% of physicians and poor patient experiences in 67% of patients polled (1,2).

There is a pressing need to reduce administrative tasks. One of the most commonly touted solutions is the automated AI scribe, which helps physicians with the onerous task of history taking, with the aim of improving clinician well-being and patient outcomes.

Solution: AI?

Current evidence suggests that AI scribes reduce documentation time and help prevent burnout.

A trial of an ambient AI tool in Iowa showed a significant reduction in burnout scores, with the Stanford PFI score falling from 4.16 to 3.16 (the validated cutoff for overall burnout is 3.33). Burnout rates also decreased from 69% to 43% (4).

The Permanente group reported in NEJM Catalyst that AI scribes resulted in a large reduction in documentation outside of 7am to 7pm (with no effect on other metrics such as time in clinical review) (3).

How does it work?

AI scribes use automated speech-to-text transcription and diarization (automatically partitioning an audio recording into segments by individual speaker) of clinician-patient conversations. The resulting text is then passed through several AI models, usually including one that extracts information from unstructured text (named-entity recognition), and finally through a language model that generates a draft structured clinical note for review, editing and finalisation.
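To make the stages concrete, here is a minimal Python sketch of that flow. It is a toy under stated assumptions, not how any vendor's product works: the `Segment` class, the `TOY_LEXICON` keyword list and both functions are hypothetical, the diarized transcript is supplied by hand, a keyword lookup stands in for a trained named-entity recognition model, and a fixed template stands in for the language model.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized, transcribed stretch of speech (hypothetical structure)."""
    speaker: str  # e.g. "clinician" or "patient"
    text: str


# Toy vocabulary standing in for a trained named-entity recognition model.
TOY_LEXICON = {
    "symptom": ["headache", "nausea", "cough"],
    "medication": ["paracetamol", "ibuprofen"],
}


def extract_entities(segments):
    """Keyword lookup used as a placeholder for the NER step."""
    found = {label: [] for label in TOY_LEXICON}
    for seg in segments:
        lowered = seg.text.lower()
        for label, terms in TOY_LEXICON.items():
            for term in terms:
                if term in lowered and term not in found[label]:
                    found[label].append(term)
    return found


def draft_note(segments, entities):
    """Template fill used as a placeholder for the language-model step."""
    patient_lines = [s.text for s in segments if s.speaker == "patient"]
    return "\n".join([
        "DRAFT - requires clinician review",
        "Reported by patient: " + " ".join(patient_lines),
        "Symptoms: " + (", ".join(entities["symptom"]) or "none captured"),
        "Medications: " + (", ".join(entities["medication"]) or "none captured"),
    ])


# Hand-written stand-in for the output of speech-to-text plus diarization.
segments = [
    Segment("clinician", "What brings you in today?"),
    Segment("patient", "I have had a headache and some nausea since Monday."),
    Segment("patient", "I took paracetamol but it did not help."),
]
entities = extract_entities(segments)
note = draft_note(segments, entities)
print(note)
```

Even in this toy version, anything the lexicon does not know is silently dropped from the structured fields, which is a small-scale version of the filtering and mis-transcription problems discussed later.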


Hallucination woes

But it’s not all plain sailing. Using general language models not specifically designed for medicine is a risky endeavour because of hallucinations, defined as “incorrect or misleading results that AI models generate” (Google’s definition, 5). For example, there were hallucinations in almost all outputs when researchers used GPT-4o and Llama-3 models to summarise 50 comprehensive medical notes (6).

Even dedicated AI solutions designed for medicine lack clinical and safety validation: many AI scribes on the market have not published clinical utility/validity and patient safety data in academic journals (7).

Just this September, the Texas Attorney General’s Office reached a settlement with Pieces Technology over “allegations that the company made false, misleading or deceptive claims about the accuracy of its healthcare AI products” (9). Tellingly, this settlement includes “prohibitions against misrepresentations (including independence of an endorser or reviewers of a business product or service)”.

However, this risk is not just theoretical or based on allegedly false claims; it is already here. OpenAI’s Whisper model regularly "creates fabricated text in medical and business settings despite warnings against such use", and OpenAI goes so far as to have specific warnings against using Whisper in "high-risk" domains. Yet it is powering Nabla's AI copilot service in 40 health systems in the US (8).

Even Nabla acknowledges that Whisper can confabulate, but it simultaneously deletes the original recording, leaving no avenue for source verification!

On the other hand, we must acknowledge that the current human standard of clinical notes is also imperfect. Although it is difficult to truly measure the error rate in clinical notes, studies have shown that 1 in 5 patients report finding a mistake in their own care notes (10). This could be due to how the patient delivers the information or how the clinician receives it. AI scribes may be able to reduce the clinician's reception error but not the patient's delivery error, and more validation is required to confirm this.

Author's note: I prefer the term confabulation to hallucination, largely because the medical terminology aligns more closely with how the models behave. Quite fitting, actually, that the term in common use is close but incorrect.

Other non-AI technical woes

Hallucinations are not the only problems AI scribes face.

Other errors can also affect clinical meaning and accuracy, such as:

  • Filtering out relevant information (classifying it as irrelevant)
  • ‘Mis-transcription’ of symptoms/medicines/conditions due to accent or use of slang
  • Incorrect data categorising (e.g. confusion of historical vs. current symptoms).

Significant data is also often missing from the audio record, such as relevant information that is not discussed (e.g. radiology reports, referral letters), nonverbal cues or data from medical devices. Although one can hardly blame the AI tooling for this, it is nevertheless a significant capability gap.

Data handling is also a live issue. For example, the Royal Australian College of GPs published specific guidance on AI scribes stating that the clinician should consider (7):

  • if adequate assurances are provided that the solution is compliant with relevant data management & storage legislation
  • where the data collected by the AI scribe is processed and stored (Australia or overseas)
  • if the data collected by the AI scribe can be used for secondary purposes under the terms and conditions determined by the vendor


Task switching: History taking vs. history checking

Traditionally, clinical documentation has been a synthetic process for clinicians. By forcing recall and integration of data from different sources, this process can be clarifying, analytical and valuable in the clinical thought process (RACGP, the Royal Australian College of General Practitioners).

With AI scribes, the task has switched from <history taking → documentation (a synthetic task with clinical thinking)> to <history taking → checking and editing → clinical thinking>.

This shift in the documentation/clinical thinking process has several potential implications. First, transitioning from a synthetic process to one of checking & comparing introduces changes whose effects remain unclear. Second, if synthesis is still required, the added step of checking & comparing may impose an additional cognitive load. Research indicates that cognitive load is cumulative, and the increasing demands of secondary tasks can eventually reach a threshold where errors are more likely to emerge (12).

Verification complexity should also be considered. It is defined as “the task complexity of verifying that automation is performing correctly”, and the same study showed that high verification complexity led to a tendency for humans to become over-reliant on AI outputs, ignoring critical clinical details or forgoing checks of the outputs generated by the AI tool (automation bias) (12).


Workflow disruptions

Reviewing the AI's output and incorporating additional information into clinical documentation still takes time, which may reduce the expected time savings of the AI scribe.

Clinicians may need training to use an AI scribe effectively, which requires provider institutions to invest time and money. Additionally, this training could require clinicians to take potentially unpaid time away from their practice.

Sergei Polevikov, who has written on this subject (and kindly reviewed my draft), remarked that:

"Physicians have mixed opinions about AI scribes. Some point out that, in addition to the cognitive load, AI scribes—especially during the early adoption phase—can lead to more "pajama time." This is because of the learning curve, the need for review and edits (due to concerns about inaccuracies), and integration issues like technical glitches and compatibility problems with existing systems."

Fig leaf of physician review and liability?

In fact, the Royal Australian College of GPs published specific guidance on AI scribes in September this year. Within this guidance, this statement stands out (7):

GPs will be liable for errors within the patient health record even if they are generated by the AI scribe. GPs must ensure that the output prepared by the AI scribe constitutes an accurate record of a patient consultation. They must correct any errors and address omissions before signing off on the documentation and entering it into the patient health record.

In many cases of AI solutions, this liability has been considered a fig leaf for solutions that are often hawked as having superior performance to human clinicians. For example, in the U.S., doctors are generally held liable for medical decisions even when these are based on outputs from an AI source (11).

This cannot be said of AI scribes. Due to the nature of the solution and the technical limitations stated above, the clinician must be the final arbiter and remain in the loop to correct any errors and omissions. Furthermore, the verbal portion is but one fraction of the entire consultation, and clinicians are still required to consolidate and synthesise all the inputs: verbal, non-verbal, observation, examination etc.

However, it is important that the system gives users access to the ground truth (such as the original recording or transcription) to enable verification; see the case of Nabla above.
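One way to picture why retaining the ground truth matters is a draft note in which every statement points back to the transcript lines it was drawn from. The Python sketch below is purely illustrative: `NoteStatement`, `unsupported_statements`, the key-term list and the sample data are all hypothetical and made up for this example, not any vendor's design. The naive substring match is a stand-in for whatever a real system would use, and it ignores things like negation and synonyms.

```python
from dataclasses import dataclass


@dataclass
class NoteStatement:
    """A sentence in the draft note plus the transcript lines it cites (hypothetical)."""
    text: str
    source_ids: list  # indices into the retained transcript


def unsupported_statements(statements, transcript, key_terms):
    """Flag statements with no cited source, or whose key terms are absent from it."""
    flagged = []
    for st in statements:
        cited = " ".join(transcript[i] for i in st.source_ids).lower()
        terms = [t for t in key_terms if t in st.text.lower()]
        if not st.source_ids or any(t not in cited for t in terms):
            flagged.append(st)
    return flagged


# Retained transcript: this check is only possible if it has not been deleted.
transcript = [
    "I have had a headache since Monday.",
    "I took paracetamol.",
]
statements = [
    NoteStatement("Patient reports headache.", [0]),
    NoteStatement("Patient takes ibuprofen daily.", [1]),
    NoteStatement("Patient denies nausea.", []),
]
key_terms = ["headache", "paracetamol", "ibuprofen", "nausea"]

for st in unsupported_statements(statements, transcript, key_terms):
    print("Needs review:", st.text)
```

Here the second and third statements are flagged for the clinician's attention, because one names a medicine that its cited line does not mention and the other cites nothing at all. If the transcript is discarded, neither the software nor the clinician can run this kind of check.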

The Current Landscape

Not comprehensive but a landscape snapshot. N=55.

So far, I found 55 companies offering AI scribes through a quick search of available resources such as Pitchbook and Crunchbase.

Currently, most AI scribes offer ambient AI note taking/summarisation, essentially leveraging the power of generative AI language models. However, this does not build a significant competitive advantage, especially as the technology becomes not only more capable but also easier to deploy. Furthermore, AI scribes are not bound by the same strict regulatory requirements as other solutions in healthcare such as diagnostic tools.

As demonstrated by Permanente’s switch in vendors from Nabla to Abridge in less than a year (13), the current incumbents appear to be competing on an even footing, with price/performance or EHR (electronic health records) integration serving as a key differentiating factor (own opinion).


What’s next

Well, it’s to be expected that most vendors will move towards securing market share, resulting in consolidation. If generative AI and AI tooling in general move towards simplification and mass market enablement, it is not inconceivable that the larger institutions or incumbent medical software infrastructure platforms will deploy solutions of their own.

However, it is more likely that the natural progression towards EHR integration will prevail; after all, partnerships have already been drawn up, for example between Abridge / Epic and Suki / Meditech. Partnerships with large system players will be a priority to enable these solutions to scale. Smaller vendors will likely be deprioritised as too niche or lacking scale (i.e. too long-tail).

One potential strategy for vendors is to build a competitive advantage by developing additional features. One key opportunity I see emerging is personalised outputs, allowing clinicians to specify their preferred style with "degree of detail" and formatting options.

Further developments would include expanding coverage to additional specialities (most scribes are designed for primary care), automating orders, referrals, coding & claims, and eventually moving towards the regulated space of Clinical Decision Support.


Conclusion

Although preliminary evidence suggests that AI Scribes may help reduce administrative burden, more validation is required in order to determine if these solutions are practically useful once the first and second order effects of workflow disruption, task switching, confabulation/hallucinations and the limited-to-verbal modality of the scribes are accounted for.


-References-

1. https://www.businesswire.com/news/home/20220603005293/en/?While-Interoperability-and-Technology-Have-Made-Significant-Improvements-in-Healthcare-They-Will-Make-an-Even-Bigger-Impact-on-Care-if-Their-Full-Potential-Can-Be-Reached-Physicians-Say-According-to-a-New-athenahealth-Survey

2. https://www.accenture.com/us-en/insights/health/digital-adoption-healthcare-reaction-or-revolution

3. https://catalyst.nejm.org/doi/pdf/10.1056/CAT.23.0404

4. https://www.thieme-connect.com/products/ejournals/abstract/10.1055/a-2461-4576

5. https://cloud.google.com/discover/what-are-ai-hallucinations

6. https://openreview.net/pdf?id=6eMIzKFOpJ

7. https://www.racgp.org.au/running-a-practice/technology/business-technology/artificial-intelligence-ai-scribes

8. https://www.wired.com/story/hospitals-ai-transcription-tools-hallucination/

9. https://www.wilmerhale.com/en/insights/blogs/wilmerhale-privacy-and-cybersecurity-law/20241010-texas-attorney-ags-office-reaches-settlement-with-ai-company-over-deceptive-claims

10. Bell SK, Delbanco T, Elmore JG, Fitzgerald PS, Fossa A, Harcourt K, Leveille SG, Payne TH, Stametz RA, Walker J, DesRoches CM. Frequency and Types of Patient-Reported Errors in Electronic Health Record Ambulatory Care Notes. JAMA Netw Open. 2020 Jun 1;3(6):e205867. doi: 10.1001/jamanetworkopen.2020.5867. PMID: 32515797; PMCID: PMC7284300.

11. https://open.substack.com/pub/sergeiai/p/doctors-go-to-jail-engineers-dont

12. Lyell D, Coiera E. Automation bias and verification complexity: a systematic review. J Am Med Inform Assoc. 2017;24(2):423-31.

13. https://www.healthtechnerds.com/p/weekly-health-tech-reads-81824

-Disclaimers-

  • all views are personal and do not reflect any business or institution
  • not investment advice
  • try very hard not to make mistakes, but some may creep in, please let us know of any corrections

-Acknowledgements-
