Selling The New o1 'Strawberry' to Healthcare Is a Crime, No Matter What OpenAI Says
Note: The numerical values and diagnostic outcomes displayed on this chart are hypothetical.


I’m honored that this article will also appear in the next edition of Artificial Intelligence Made Simple, a renowned Substack publication that covers everything you need to know about hot topics in AI and the intricacies of machine learning models, written by the respected expert and star author, Devansh. I encourage all of you to visit and subscribe to Artificial Intelligence Made Simple.


4o is confidently misdiagnosing, while o1 is not only confidently misdiagnosing but also cleverly rationalizing the error.


Over the past few days, I’ve been exploring the diagnostic capabilities of the new o1 ‘Strawberry’ model.


My verdict: even the best large language models (LLMs) we have are not ready for prime time in healthcare.


The o1 ‘Strawberry’ model hasn’t just inherited its predecessor 4o’s habit of confidently lying when it misdiagnoses; unlike 4o, it now rationalizes those lies. Whether you call them hallucinations or some other piece of data science jargon, in the context of clinical decision-making this is simply wrong information. With Strawberry’s help, these errors now come perfectly rationalized. When lives are at stake, this is not just a problem; it’s downright dangerous.

I'm not surprised, to be honest. At yesterday's Senate hearing before the Subcommittee on Privacy, Technology, and the Law, OpenAI whistleblower William Saunders testified that the company has "repeatedly prioritized speed of deployment over rigor." While Saunders made this statement specifically regarding the development of Artificial General Intelligence (AGI), his testimony hinted at a broader culture within OpenAI, especially among its executives. (Source: C-SPAN.)

Since its inception, OpenAI has embraced a 'Lean Startup' culture—quickly developing an MVP (Minimum Viable Product), launching it into the market, and hoping something "sticks." The benefit of this approach is clear: it's cost-effective, saves time, and provides invaluable feedback from real customers. However, the downside is just as significant. By releasing a 'half-baked' product, you risk disappointing a large group of customers—many of whom may never come back.


This might be a viable strategy for a fledgling fintech startup, but it's reckless and dangerous for a company like OpenAI, especially now that it promotes applications in critical areas like medical diagnostics.

But let’s start from the beginning…

Here’s a sneak peek at the roadmap for the article:

1. OpenAI o1 ‘Strawberry’ Launch

2. OpenAI Claims Its New o1 ‘Strawberry’ Model Can Handle Medical Diagnosis. Not a Chance!

3. The Sweetness of the Strawberry – Some People Love It

3.1. AgentClinic-MedQA Claims Strawberry is the Top Choice for Medical Diagnostics

3.2. AgentClinic-MedQA Faces a Benchmark Issue

3.3. Healthcare Applications of o1 ‘Strawberry’ by Oscar Health

4. The Sour Side of the Strawberry – OpenAI Hallucinates in Its Own Cherry-Picked Use Case

5. Clinicians to OpenAI: Show Us the Probabilities or Stop Lying to Us! (see the code sketch after this roadmap)

6. Conclusion
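To make point 5 concrete: what clinicians are asking for is calibrated uncertainty, not a bare answer. Below is a minimal sketch of what surfacing a model's uncertainty could look like, assuming the openai Python SDK (v1.x) with its logprobs option and a hypothetical multiple-choice vignette of my own invention; it targets gpt-4o because, as of this writing, the o1 API does not expose token probabilities at all, which is rather the point.

    import math
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Hypothetical single-letter multiple-choice vignette (illustrative only).
    question = (
        "A 55-year-old presents with crushing substernal chest pain "
        "radiating to the left arm. Most likely diagnosis? "
        "A) GERD B) Myocardial infarction C) Costochondritis D) Panic attack. "
        "Answer with a single letter."
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # o1 does not currently accept logprobs
        messages=[{"role": "user", "content": question}],
        max_tokens=1,    # force a one-token A/B/C/D answer
        logprobs=True,   # request token-level log-probabilities
        top_logprobs=4,  # plus the top alternatives for that token
    )

    # Convert the answer token's log-probabilities into percentages, so the
    # output reads like "B: 87.3%, A: 8.1%, ..." rather than one confidently
    # stated diagnosis.
    answer_token = response.choices[0].logprobs.content[0]
    for alt in answer_token.top_logprobs:
        print(f"{alt.token!r}: {math.exp(alt.logprob):.1%}")

Token-level probabilities are not calibrated clinical probabilities, of course, but even this crude signal would give a clinician a ranked, quantified differential instead of a single, confidently rationalized answer.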



Continue reading at sergeiAI.substack.com...


Chris Wixon, MD

Physicist | Vascular Surgeon | Health Tech Enthusiast | Entrepreneur

2 months ago

Agree, Sergei Polevikov, ABD, MBA, MS, MA. LLM knowledge is currently only equivalent to that of a 3rd- or 4th-year medical student, and no one would depend on a 3rd-year medical student’s opinion (except their own mothers!!). LLMs are presently functioning very well as scribes: quietly sitting in the corner, listening to the convo, and recording it in the medical record using the newly learned vernacular. But they are gaining knowledge. Part of the process is making educated guesses and then seeking expert feedback. LLMs are poised for this level of SUPERVISED training, and having the physician in the loop will be the mechanism for it. Ironically, the winners in this race will be the systems with the most user-friendly interface, not the ones with the most robust underlying model. Once again, success will come down to being Available, Affable, and Able.

Wen Profiri

AI Tooling Medicare: Dementia | Caregiver Support | Health Workers Training

2 months ago

I have been testing and comparing the 4 models on chatgpt.com, and I don't see much difference. I think the trick is back to the art of prompting. I use Ethan Mollick's role-play and step-wise methods, and I'm also producing stepped "FAQs" to guide users.

Stefan Bauschard

AI Education Policy Consultant

2 months ago

I don't think anyone is going to use a language model without fine-tuning or training on a medical data set for much, and I don't think anyone will use it for review. These models are helpful, however: the old ChatGPT 3.5 diagnosed my son's condition in 1 minute, while it took 10+ doctors a year to figure it out.

Sergey Mastitsky

Proudly making sense of data | Born @336.14 PPM CO2 - now 422.66 PPM

2 months ago

Not to mention it’s several times more expensive. Not sure who in their right mind would pay more to receive something that isn’t guaranteed to be free of garbage.
