Selling The New o1 'Strawberry' to Healthcare Is a Crime, No Matter What OpenAI Says
Sergei Polevikov, ABD, MBA, MS, MA
2023 Elsevier Author of "Advancing AI in Healthcare: A Comprehensive Review of Best Practices"
I’m honored that this article will also appear in the next edition of Artificial Intelligence Made Simple, a renowned Substack publication that covers everything you need to know about hot topics in AI and the intricacies of machine learning models, written by the respected expert and star author, Devansh. I encourage all of you to visit and subscribe to Artificial Intelligence Made Simple.
4o is confidently misdiagnosing, while o1 is not only confidently misdiagnosing but also cleverly rationalizing the error.
Over the past few days, I’ve been exploring the diagnostic capabilities of the new o1 ‘Strawberry’ model.
My verdict: even the best large language models (LLMs) we have are not ready for prime time in healthcare.
The o1 ‘Strawberry’ model hasn’t just inherited the habit of confidently lying from its predecessor, the 4o model, when it misdiagnoses. Unlike 4o, it now rationalizes those lies. Whether you call them hallucinations or use some other data science jargon, in the context of clinical decision-making this is simply wrong information. With Strawberry, these errors now arrive perfectly rationalized. When lives are at stake, this is not just a problem — it’s downright dangerous.
I'm not surprised, to be honest. At yesterday's Senate hearing before the Subcommittee on Privacy, Technology, and the Law, OpenAI whistleblower William Saunders testified that the company has "repeatedly prioritized speed of deployment over rigor." While Saunders made this statement specifically regarding the development of Artificial General Intelligence (AGI), his testimony hinted at a broader culture within OpenAI, especially among its executives. (Source: C-SPAN.)
Since its inception, OpenAI has embraced a 'Lean Startup' culture—quickly developing an MVP (Minimum Viable Product), launching it into the market, and hoping something "sticks." The benefit of this approach is clear: it's cost-effective, saves time, and provides invaluable feedback from real customers. However, the downside is just as significant. By releasing a 'half-baked' product, you risk disappointing a large group of customers—many of whom may never come back.
This might be a viable strategy for a fledgling fintech startup, but it's reckless and dangerous for a company like OpenAI, especially when they now promote applications in critical areas like medical diagnostics.
But let’s start from the beginning…
Here’s a sneak peek at the roadmap for the article:
1. OpenAI o1 ‘Strawberry’ Launch
2. OpenAI Claims Its New o1 ‘Strawberry’ Model Can Handle Medical Diagnosis. Not a Chance!
3. The Sweetness of the Strawberry – Some People Love It
3.1. AgentClinic-MedQA Claims Strawberry is the Top Choice for Medical Diagnostics
3.2. AgentClinic-MedQA Faces a Benchmark Issue
3.3. Healthcare Applications of o1 ‘Strawberry’ by Oscar Health
4. The Sour Side of the Strawberry – OpenAI Hallucinates in Its Own Cherry-Picked Use Case
5. Clinicians to OpenAI: Show Us the Probabilities or Stop Lying to Us! (see the sketch just after this roadmap)
6. Conclusion
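As a preview of point 5, here is a minimal sketch of what "show us the probabilities" could look like in practice, using the token log-probabilities that OpenAI's chat completions API can already return. Everything specific here is an illustrative assumption: the vignette and answer choices are mine, and gpt-4o stands in because the o1 models did not accept the logprobs option at launch.

```python
# A minimal sketch, assuming the OpenAI Python SDK (pip install openai) and an
# OPENAI_API_KEY in the environment. The vignette and answer choices are
# illustrative, not from the article.
import math
from openai import OpenAI

client = OpenAI()

vignette = (
    "A 55-year-old presents with acute chest pain radiating to the left arm.\n"
    "Which is the most likely diagnosis?\n"
    "A) Myocardial infarction\n"
    "B) Gastroesophageal reflux\n"
    "C) Costochondritis\n"
    "D) Pulmonary embolism\n"
    "Answer with a single letter."
)

resp = client.chat.completions.create(
    model="gpt-4o",      # o1 did not expose logprobs at launch
    messages=[{"role": "user", "content": vignette}],
    max_tokens=1,        # force a one-token answer: A, B, C, or D
    logprobs=True,
    top_logprobs=4,      # surface the runners-up, not just the winner
)

# Convert token log-probabilities into percentages a clinician can weigh,
# instead of a single confident-sounding answer.
for cand in resp.choices[0].logprobs.content[0].top_logprobs:
    print(f"{cand.token}: {math.exp(cand.logprob):.1%}")
```

This is not true calibration, since token probabilities are not diagnostic probabilities, but it surfaces the alternatives and their weights instead of one fluent, confident-sounding answer.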
Continue reading at sergeiAI.substack.com...
freelancer
2 mo · Thanks for sharing your insights on the o1 'Strawberry' model's capabilities. You might also consider secondopinionfromai com for its thorough analysis of medical literature and guidelines.
Physicist | Vascular Surgeon | Health Tech Enthusiast | Entrepreneur
2 mo · Agree, Sergei Polevikov, ABD, MBA, MS, MA. LLM knowledge is currently only equivalent to a 3rd- or 4th-year medical student's. No one would depend on a 3rd-year medical student's opinion (except their own mothers!!). LLMs are presently functioning very well as scribes, quietly sitting in the corner, listening to the convo, and recording it in the medical record using the newly learned vernacular. But they are gaining knowledge. Part of the process is making educated guesses and then seeking expert feedback. LLMs are poised for this level of SUPERVISED training. Having the physician in the loop will be the mechanism for this. Ironically, the winners in this race will be the systems with the most user-friendly interface, not the one with the most robust underlying model. Once again, success will come down to being Available, Affable, and Able.
AI Tooling Medicare: Dementia | Caregiver Support | Health Workers Training
2 mo · I have been testing and comparing the 4 models on chatgpt.com. I don't see much difference. I think the trick is back to the art of prompting. I use Ethan Mollick's role-play and stepwise methods. I'm also producing the stepped "FAQs" to guide users.
AI Education Policy Consultant
2 mo · I don't think anyone is going to use a language model without any fine-tuning or training on a medical dataset for much. And I don't think anyone will use it for review. These are helpful, however. The old ChatGPT-3.5 diagnosed my son's condition in 1 minute. It took 10+ doctors a year to figure it out.
Proudly making sense of data | Born @336.14 PPM CO2 - now 422.66 PPM
2 mo · Not to mention, it’s several times more expensive. Not sure who in their right mind would pay more to receive something that is not guaranteed to be free of garbage.