Most Recent Version of ChatGPT Passed the US Medical Licensing Exam with Flying Colours & Quickly Identified a Condition Affecting 1 in 100,000 People
In order to test GPT-4, Dr. Isaac Kohane, a Harvard computer scientist and physician, teamed up with two colleagues. Their main objective was to see how well the newest OpenAI artificial intelligence model performed in a medical environment.
In the forthcoming book "The AI Revolution in Medicine," co-written with independent journalist Carey Goldberg and Microsoft vice president of research Peter Lee, he declares, "I'm stunned to say: better than many doctors I've observed." (The authors say that although Microsoft has spent billions of dollars developing OpenAI's technologies, neither OpenAI nor Microsoft required any editorial review of the book.)
Kohane writes in the book that GPT-4, which was made available to paying subscribers in March 2023, answers US medical licensing exam questions correctly more than 90% of the time. It is a far better test-taker than the earlier GPT-3 and GPT-3.5 models behind ChatGPT, and better even than some trained medical professionals.
But GPT-4 is more than just an adept test-taker and fact-finder. It's a fantastic translator as well. In the book, it translates discharge instructions for a Portuguese-speaking patient and condenses complex technical jargon into language a sixth-grader could understand.
GPT-4 can read lengthy reports or studies and summarise them in a flash, as the authors demonstrate with vivid examples. It can also give doctors helpful advice about bedside manner, offering suggestions on how to talk to patients about their conditions in compassionate, clear language. It can even explain its reasoning through problems in a way that appears to require some degree of human-style intelligence.
But if you ask GPT-4 how it manages to do all of this, it will likely reply that its intelligence is still "limited to patterns in the data and does not involve true understanding or intentionality." Yet when the book's authors asked whether it was capable of causal reasoning, GPT-4 responded, "Yes." Despite these limitations, Kohane found, GPT-4 can still mimic how a doctor diagnoses a condition with astonishing, if not perfect, success.
How GPT-4 can make medical diagnoses
In the book, Kohane walks GPT-4 through a clinical thought experiment based on a real-life case involving a newborn he had treated years earlier. He gave the machine a few key details from the baby's physical exam, along with some information from an ultrasound and hormone levels. The machine correctly identified congenital adrenal hyperplasia, a condition that affects one in 100,000 babies, "just as I would, with all my years of study and experience," Kohane wrote.
The doctor was horrified and impressed at the same time.
On the one hand, he wrote, he was having a sophisticated medical conversation with a computational process; on the other, just as mind-blowing was the anxious realization that millions of families would soon have access to this impressive medical expertise, and he could not figure out how anyone could guarantee or certify that GPT-4's advice would be safe or effective.
GPT-4 lacks an ethical compass and isn't always right
The book is chock-full of examples of GPT-4's mistakes, which show that it isn't always reliable. They range from simple slips, such as typing a BMI incorrectly right after the bot had calculated it correctly, to more complex failures, such as incorrectly "solving" a Sudoku puzzle or forgetting to square a term in an equation. The errors are often subtle, and the system tends to defend its position when questioned. It's not hard to envision how a misplaced number or an incorrect weight calculation could lead to serious mistakes in diagnosis or prescription.
Like earlier GPT models, GPT-4 can also "hallucinate," the technical term for when an AI invents responses or ignores instructions.
When questioned about this issue by the book's authors, GPT-4 responded, "I do not intend to deceive or mislead anyone, but I sometimes make mistakes or assumptions based on incomplete or inaccurate data. Additionally, I lack the ethical accountability and clinical judgment of a human doctor or nurse."
One cross-check the authors advise in the book is starting a new session with GPT-4 and having it "read over" and "verify" its own work with a "fresh set of eyes." This strategy sometimes succeeds in exposing mistakes, though GPT-4 is reluctant to admit when it has been wrong. Another suggestion for catching errors is to ask the bot to show its work so a human can check it.
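The authors describe this cross-check only in prose; as a rough illustration, here is a minimal sketch of what a "fresh set of eyes" verification pass might look like using the OpenAI Python client. The model identifier, the prompts, and the helper names (`ask`, `verify_with_fresh_session`) are assumptions for illustration, not details taken from the book.

```python
# Minimal sketch of the "fresh set of eyes" cross-check: ask GPT-4 a question in
# one session, then open a brand-new session (no shared chat history) and ask it
# to verify the first answer. Prompts, model name, and helper names are
# illustrative assumptions, not taken from the book.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str) -> str:
    """First pass: answer the question and show the working."""
    resp = client.chat.completions.create(
        model="gpt-4",  # assumed model identifier
        messages=[
            {"role": "system",
             "content": "You are a careful clinical assistant. Show your work step by step."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content


def verify_with_fresh_session(question: str, draft_answer: str) -> str:
    """Second pass: a separate session that never saw the first conversation
    reviews the draft answer and flags arithmetic or reasoning errors."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are reviewing another assistant's answer with a fresh set of eyes. "
                        "List any errors you find; do not assume the answer is correct."},
            {"role": "user",
             "content": f"Question:\n{question}\n\nDraft answer to verify:\n{draft_answer}"},
        ],
    )
    return resp.choices[0].message.content


if __name__ == "__main__":
    question = "A patient weighs 70 kg and is 1.75 m tall. What is their BMI?"
    draft = ask(question)
    review = verify_with_fresh_session(question, draft)
    print(draft)
    print("--- verification ---")
    print(review)
```

Because the second call carries none of the first session's history, it cannot simply repeat the earlier reasoning, which is the point of the "fresh set of eyes" framing; even so, as the authors note, this only occasionally surfaces mistakes.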
Clearly, GPT-4 has the potential to free up valuable time and resources in the clinic, enabling clinicians to spend more time with patients "instead of their computer screens," the authors write. But, as they put it, "we have to force ourselves to imagine a world with smarter and smarter machines, perhaps surpassing human intelligence in nearly every dimension," and then carefully consider how we want that world to function.