ChatGPT Outperforms All Human Doctors In Stanford Study
Margaretta Colangelo
Leading AI Analyst | Speaker | Writer | AI Newsletter 57,000+ subscribers
In a recent study at Stanford University, ChatGPT outperformed all human doctors when assessing medical case histories. The objective of the study was to assess whether using LLMs could improve diagnostic reasoning performance among physicians in family medicine, internal medicine, or emergency medicine compared with conventional resources. ChatGPT achieved an average score of 90% when diagnosing a medical condition from a case report and explaining its reasoning. The study, entitled Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial, was published in JAMA Network Open on October 28, 2024.
Highlights
“They were treating ChatGPT like a search engine for directed questions: ‘Is cirrhosis a risk factor for cancer? What are possible diagnoses for eye pain?’ Only a fraction of the doctors realized they could literally copy-paste the entire case history into the chatbot and just ask it to give a comprehensive answer to the entire question. Only a fraction of doctors actually saw the surprisingly smart and comprehensive answers the chatbot was capable of producing.”
Jonathan Chen, study author, physician data scientist, Stanford
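The difference Chen describes, asking one directed lookup question at a time versus handing the model the whole case, comes down to how the prompt is constructed. Below is a minimal sketch of the "paste the entire case history" approach, assuming the OpenAI Python client; the model name, prompt wording, and example case are illustrative assumptions, not the study's actual protocol.

```python
# Minimal sketch: prompting an LLM with a full case history rather than
# one directed question at a time. Assumes the OpenAI Python client
# (openai>=1.0); model, prompt, and case text are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

case_history = """
76-year-old man with fatigue, unintentional weight loss, and new back pain.
Labs: Hgb 9.8, Ca 11.2, Cr 2.1. (Hypothetical example, not from the study.)
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are assisting a physician with diagnostic reasoning."},
        {"role": "user",
         "content": (
             "Here is the full case history:\n" + case_history +
             "\nGive a ranked differential diagnosis, the findings that "
             "support or oppose each item, and the next diagnostic steps."
         )},
    ],
)

print(response.choices[0].message.content)
```

For contrast, the "search engine" pattern the study authors observed would send each directed question ("Is cirrhosis a risk factor for cancer?") as its own short prompt and leave the synthesis entirely to the physician.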
Study Overview
“The doctors didn't listen to AI when AI told them things they didn’t agree with.”
Adam Rodman, study author, Beth Israel Deaconess Medical Center
Study Results
"Provocative result we did NOT expect. We fully expected the Doctor + GPT4 arm to do better than Doctor + "conventional" Internet resources. Flies in the face of the Fundamental Theorem of Informatics (Human + Computer is Better than Either Alone)."
Jonathan Chen, study author, physician data scientist, Stanford
Conclusions
References
Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial, JAMA Network Open, October 28, 2024. doi:10.1001/jamanetworkopen.2024.40969
Authors: Ethan Goh, Robert Gallo, Jason Hom, Eric Strong, Hannah Kerman, Joséphine Cool, Zahir Kanjee, Andrew S. Parsons, Neera Ahuja, Eric Horvitz, Yingjie Weng, Daniel Yang, Arnold Milstein, Andrew Olson, Adam Rodman, Jonathan H. Chen
Subscribe, Comment, Join Group
I'm interested in your feedback - please leave your comments.
To subscribe to the AI in Healthcare Milestones newsletter click here.
To join the AI in Healthcare Milestones Group click here.
Copyright © 2024 Margaretta Colangelo. All Rights Reserved.
This article was written by Margaretta Colangelo. Margaretta is a leading AI analyst who tracks significant milestones in AI in healthcare. She consults with AI healthcare companies and writes about some of the companies she consults with. Margaretta serves on the advisory board of the AI Precision Health Institute at the University of Hawaiʻi Cancer Center @realmargaretta
Senior Frontend Developer
2 months ago: Yes, absolutely. Using ChatGPT or low-code derivatives effectively does increase performance. You can have it single-handedly do work that a software team of 20-30 people might not manage even after months of effort.
Health 4.0 Architect | AI & Healthcare Policy Leader | Independent Board Director | Board Certified Corporate Executive Surgeon - AI, Obesity & Oncology | Private Family Office | US Army Veteran
3 months ago: Just received a “survey” from the AMA. As I made my way through it, it felt like the bias against AI was presented in a way that would lead to very cautionary conclusions about AI. It felt self-serving.
Co-Founder @ Centre of Bioinformatics Research and Technology | specializing in Bioinformatics
3 months ago: Impressive update! Shows how AI can transform healthcare when used effectively.
Executive Coach and Mentor of Leaders
3 months ago: Thanks for sharing, Margaretta Colangelo.
Founder & Principal | HealthTech Executive, Radiology, Medical Imaging & Devices, Strategic Innovations, AI, Regulatory
3 months ago: Thanks, Margaretta, for sharing this study. Over the past 20 years, across the many studies we conducted, Physician + Assisted tool was shown to be more effective than Physician alone when the physician was trained on how to utilize and interpret the information the tool shared. Most times, Physician + tool showed a significant improvement. The training also covered the types of false positives the device could produce. As noted, the current study succeeded, among its various endpoints, in demonstrating that effectiveness and efficiency require some training in using the tool. The expectation, perhaps, was that given LLM capabilities, users would be able to take advantage of them "out of the box." It would be great to see follow-up studies in which physicians are trained to use the tool properly and effectively, perhaps exploring different diagnostic paradigms, such as reader-first vs. reader-second, compared to the standalone tool.