Could GPT-3 pass the Spanish Medical exam?

Julian Isla

Published: January 23, 2023

Could GPT-3 pass the MIR? (the Spanish medical exam for new doctors)

It's a question all of us in AI and medicine have asked ourselves at one time or another. Well, I tried it.

As you know, this past Saturday applicants for public resident physician positions in the national health system sat the dreaded MIR exam. It has 200 questions on any medical subject a recent graduate should know. Now that GPT-3 is so fashionable thanks to ChatGPT, the question everyone asked me was: would it be able to answer these questions correctly?

I took the 14 neurology questions that Mariano Ruiz posted yesterday on Twitter. Mariano is a neurologist and a professor at an academy that prepares candidates for this exam. The tweet is here:

For the test, I used the questions as already published by the Ministry of Health.

I translated them into English with a machine translation tool.

Then I pasted each question into the GPT-3 playground with a very simple prompt: "Select the right answer from the above medical test:". As you can see, it is a very basic prompt with little engineering behind it; I am sure it could be made much more elaborate.
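A minimal sketch of how such a prompt could be assembled from a translated question and its options. The function name `build_prompt` and the numbered-option layout are hypothetical assumptions; only the instruction line comes from the article. The code calls no API: it only produces the text that would be pasted into the playground, with the instruction placed after the test so that "above" makes sense.

```python
def build_prompt(question: str, options: list[str]) -> str:
    """Assemble a plain-text multiple-choice prompt (hypothetical layout)."""
    lines = [question.strip()]
    # Number the options 1, 2, 3, ... (an assumption about the exam layout)
    for i, option in enumerate(options, start=1):
        lines.append(f"{i}. {option.strip()}")
    # Instruction taken from the article; it follows the test it refers to
    lines.append("Select the right answer from the above medical test:")
    return "\n".join(lines)


# Placeholder question, not a real MIR item
prompt = build_prompt(
    "Placeholder question text?",
    ["Option A", "Option B", "Option C", "Option D"],
)
print(prompt)
```

Keeping the instruction as a fixed final line makes it easy to swap in a more elaborate prompt later without touching how questions are formatted.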

The surprise is that GPT-3 gave the correct answer to 11 of the 14 questions (about 79%). It may seem a modest result, but to me it looks promising. Let's see why:
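The "11 out of 14" figure is a simple hit rate. A minimal scoring sketch, assuming both the model's chosen option and the official answer key are available as option numbers per question; `accuracy` is a hypothetical helper, the three-question lists are toy values, and the only real figures used are the article's aggregate counts.

```python
from fractions import Fraction


def accuracy(predicted: list[int], answer_key: list[int]) -> Fraction:
    """Fraction of questions where the chosen option matches the key."""
    if len(predicted) != len(answer_key):
        raise ValueError("predicted and answer_key must have the same length")
    hits = sum(p == k for p, k in zip(predicted, answer_key))
    return Fraction(hits, len(answer_key))


# Toy example: 2 of 3 answers match the key
print(accuracy([1, 2, 3], [1, 2, 4]))  # 2/3

# The article's totals: 11 correct out of 14
rate = Fraction(11, 14)
print(f"{float(rate):.1%}")  # 78.6%
```

Using `Fraction` keeps the exact ratio, so rounding only happens when the result is displayed.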

The MIR questions are not particularly simple from a linguistic point of view. Some are phrased negatively, so you have to find the answer that does not apply.
For example, one question on epilepsy calls epileptic seizures "comitial seizures", a very old term that, as I learned from Dr. Angel Aledo, dates back to the time when the Romans held elections. This is unusual terminology for a language model, but GPT-3 seems quite resilient to these old terms.
Some questions are supported by images. I skipped that information and used only the text.
In questions where you have to complete something, for example suggest a diagnostic test, GPT-3 seems to perform very well. No surprise: the G in GPT stands for generative. The model does not understand concepts; it predicts the next word in a sequence, so when the task itself is a prediction, it is more likely to answer well.
I translated the questions into English. Why? Although GPT-3 works reasonably well in Spanish, it is more accurate in English, at least for medical use. This makes sense, since its training data is mostly in English.
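To make "it predicts the next word in a sequence" concrete, here is a toy sketch: a bigram counter that picks the word most often seen after the current one. This is an illustration only, with hypothetical function names and a made-up corpus. GPT-3 is a large transformer network trained on subword tokens, not a word-count table, but the task framing (given the text so far, produce the most likely continuation) is the same.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count which word follows which in a toy corpus."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "the patient has a seizure the patient has a fever the doctor orders an eeg"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # patient (seen twice, vs. "doctor" once)
```

The model never "knows" what a seizure is; it only reproduces the continuation that was most frequent in its data, which is the point the paragraph above makes about GPT-3.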

GPT-3 still has some way to go before it performs at the level of a qualified professional, but this high hit rate is a sign that something is changing. Yesterday I read that doctors know the most about medicine when they sit the MIR (knowing in the sense of accumulating concepts). Clinical practice then adds experience, which is fundamental. What I wonder is whether it makes sense to teach medicine the same way it was taught 100 years ago, with a system that rewards memorizing concepts.

In my experience with medicine, and having many friends who are doctors, I have seen that the best doctors, for me, are those with skills they did not learn in school: empathy, the ability to recognize what they don't know, and a constant desire to learn new ways of practicing medicine.

Think about this: how many medical errors occur every day around the world that could be avoided if doctors had better tools to support their decisions? I still don't understand how doctors make diagnostic and treatment decisions every day without any help. Would you fly with a pilot who uses no instruments and navigates with only a map, a compass and good eyesight?

GPT-3 and other large language models are proving to be fabulous tools for confronting us with our own inconsistencies.

If you want to know more about the work we are doing at Foundation 29 with large language models for diagnosis, join the DxGPT newsletter.

Luis Gonzalez Garcia

Excellent contribution, Julian Isla. I completely agree that doctors need to rely on computer tools like these to make more accurate diagnoses.

Sacha Arozarena

Could ChatGPT pass the MBA exam? https://www.nbcnews.com/tech/tech-news/chatgpt-passes-mba-exam-wharton-professor-rcna67036

Miguel Rodríguez Rubio, MD

Bridging science and purpose to transform complex data into clear, actionable insights that improve patient care, empower healthcare decisions & deliver meaningful outcomes

These are promising results for GPT-3 but, to me, the key takeaway is that the MIR exam is not a very good way of assessing who is a good doctor and who isn’t. As you’ve mentioned in your post, there are many skills it doesn’t evaluate that are critical in being a good doctor (e.g. emotional skills, practical skills or critical thinking…).

Miguel Raimundo Arana Martin

Leading diverse & multinational teams to get results by executing on robust initiative-led approach

Amen! And we are still scratching the surface of what is possible. As with all new things, the mindset to accept it is fundamental. Reading the article, I remembered when doctors did not wash their hands and got angry at leading colleagues who suggested doing so... Thanks, Julian, for pushing the boundaries.
