GPT-4 Significantly Exceeds Doctors In Assessing Eye Problems

"We could realistically deploy AI in triaging patients with eye issues to decide which patients need to be seen by a specialist immediately."
Dr. Arun Thirunavukarasu, Lead Author, Academic Foundation Doctor at Oxford University Hospitals NHS Foundation Trust

This week, researchers at the University of Cambridge School of Clinical Medicine and Oxford University Clinical Academic Graduate School published a paper showing that GPT-4 is much better than non-specialist doctors at assessing eye problems. Lead author Arun Thirunavukarasu, PhD, says that large language models (LLMs) have the potential to improve healthcare by triaging patients with eye problems and deciding which patients need immediate attention. Used as part of the clinical workflow, they could also provide diagnosis and advice in places with limited access to specialist healthcare. The paper, entitled "Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: A head-to-head cross-sectional study," was published in the journal PLOS Digital Health.

Study Highlights

  1. GPT-4 was tested against doctors at different stages in their careers, including unspecialized junior doctors, trainee eye doctors, and expert eye doctors.
  2. Each was presented with a series of 87 patient scenarios involving a specific eye problem and asked to give a diagnosis or advice on treatment by selecting from four options.
  3. GPT-4 scored significantly better on the test than unspecialized junior doctors, who are comparable to general practitioners in their level of specialist eye knowledge.
  4. GPT-4 gained similar scores to trainee and expert eye doctors, but the top performing doctors scored higher.
  5. GPT-4 was as good as expert clinicians at processing eye symptoms and signs to answer more complicated questions.
  6. With further development, LLMs could advise GPs who are struggling to get prompt advice from eye doctors.
  7. Large volumes of clinical text are needed to help fine-tune and develop these models, and work is ongoing around the world to facilitate this.
  8. This study improves on previous studies because it compared AI's abilities with those of practicing doctors, rather than with sets of examination results.
  9. The test included questions about a wide range of eye problems, including extreme light sensitivity, decreased vision, lesions, and itchy and painful eyes.
  10. The test questions were extracted from a textbook used to test trainee eye doctors. This textbook is not freely available on the internet, making it unlikely that its content was included in GPT-4’s training datasets.

Results

1) Examination performance in the 87-question mock exam: The study tested GPT-4, GPT-3.5, PaLM 2, LLaMA, expert ophthalmologists (E1-E5), ophthalmology trainees (T1-T3), and unspecialized junior doctors (J1-J2) with the same set of questions. GPT-4 gave more accurate responses than all of the other LLMs.

  • Expert ophthalmologists (76%)
  • GPT-4 (69%)
  • Ophthalmology trainees (69%)
  • PaLM 2 (56%)
  • GPT-3.5 (48%)
  • Unspecialized junior doctors (43%)
  • LLaMA (32%)
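To put the percentages above in perspective, the following is a minimal sketch that converts each listed score into an approximate number of correct answers out of 87 and compares it with the 25% expected from random guessing on four-option questions. The function and variable names are illustrative assumptions, and the counts are back-calculated from the rounded percentages in this article, so they are approximations rather than figures reported in the paper.

```python
# Illustrative only: scores are the rounded percentages listed in this
# article; the derived question counts are approximations, not study data.
TOTAL_QUESTIONS = 87
NUM_OPTIONS = 4  # each question offered four answer choices

scores_percent = {
    "Expert ophthalmologists": 76,
    "GPT-4": 69,
    "Ophthalmology trainees": 69,
    "PaLM 2": 56,
    "GPT-3.5": 48,
    "Unspecialized junior doctors": 43,
    "LLaMA": 32,
}

# Expected score (in percent) from guessing at random among the options.
chance_percent = 100 / NUM_OPTIONS


def approx_correct(percent, total=TOTAL_QUESTIONS):
    """Round a percentage score to the nearest whole number of questions."""
    return round(percent / 100 * total)


def summarize(scores):
    """Return (name, percent, approx. correct, margin over chance) rows."""
    return [
        (name, pct, approx_correct(pct), pct - chance_percent)
        for name, pct in scores.items()
    ]


if __name__ == "__main__":
    for name, pct, n_correct, margin in summarize(scores_percent):
        print(
            f"{name:30s} {pct:3d}%  ~{n_correct:2d}/{TOTAL_QUESTIONS}"
            f"  {margin:+.0f} points vs. chance"
        )
```

Read this way, the 26-point gap between GPT-4 and the unspecialized junior doctors corresponds to roughly 23 more questions answered correctly out of 87, under the rounding assumption stated above.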

2) Significantly greater performance for GPT-4 is highlighted in green; significantly inferior performance for GPT-4 is highlighted in orange. GPT-4 was superior to all other LLMs and to unspecialized junior doctors, and equivalent to most expert ophthalmologists and all ophthalmology trainees.

3) Question subject and type distributions presented alongside scores attained by LLMs, expert ophthalmologists (E1-E5), ophthalmology trainees (T1-T3), and unspecialized junior doctors (J1-J2).

4) Agreement correlates strongly with overall performance, and stratification analysis found that no particular question type or subject was associated with better performance by LLMs or doctors. This indicates that LLM knowledge and reasoning ability is general across ophthalmology rather than restricted to particular subspecialties or question types.

5) Accuracy and relevance of GPT-3.5 and GPT-4 in response to ophthalmological questions.

Reference

Thirunavukarasu, A J et al: ‘Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: A head-to-head cross-sectional study.’ PLOS Digital Health, April 2024. DOI: 10.1371/journal.pdig.0000341

Copyright © 2024 Margaretta Colangelo. All Rights Reserved.

This article was written by Margaretta Colangelo. Margaretta is a leading AI analyst who tracks significant milestones in AI in healthcare. She consults with AI healthcare companies and writes about some of the companies she consults with. Margaretta serves on the advisory board of the AI Precision Health Institute at the University of Hawaiʻi Cancer Center. @realmargaretta

Chaly Sen

Eh Me and and the Crew Now Too (International Space Station)

10 months ago

Can Someone Help Me Clarify Why This is code region is Missing from the NCBI since this is suppose to be the HUMAN GENOME PROJECT of 1989?

Ken S.

Territory Director at Acera Surgical, Inc

11 months ago

It won’t stop here, either. Generative AI will completely transform the ability to diagnose and even make entire treatment protocols obsolete. Concerning for both medical professionals and big business but ultimately great for the patient.

Jess Spooner, JD, MSW, LSW, INHC

Trauma Healing Coach, Therapist, Social Work Professor, Business Consultant, Writer. I help people and organizations thrive.

11 months ago

Wow thanks for sharing!

Harsh Thaker

Vice Chair, Digital & Integrative Pathology; Vice Chair, Clinical Outreach; Director, Anatomic Pathology; Professor at The University of Texas Medical Branch

11 months ago

Excellent summary of a very nice study. All of medicine is being disrupted at an exponential rate in front of our eyes.
