GPT4 Turbo with Vision on clinical images

GPT4 Turbo with Vision on clinical images

It has been barely 24 hours and OpenAI's servers are overloaded (in some cases unreachable) due to the rush to test this new powerful model - GPT4 Turbo with Vision.

Here is how the model performed on the initial testing on limited clinical test cases.

Spoiler alert: It is insanely powerful but has some limitations.

Case 1: Asking it to read an image with generations of CAR-T cell therapies.

https://www.frontiersin.org/articles/10.3389/fimmu.2022.927153/full

Before we go to the answer, this is a very hard image to read due to three key reasons

1. the domain-specific terminology

2. lack of image caption or description text

3. the words like CAR and TRUCK mean something entirely different here


For benchmarking, LlaVa another powerful multimodal model provided this answer (with just base config):

hallucination of open model

LlaVa is arguably considered one of the most powerful open-access multimodal models. In other words, a direct open-access competitor to GPT4V.

However, the new OpenAI model aced it with the following response:

Case 2a: Asking it to read a clinical workflow and provide suggestions. (Basic)

https://www.researchgate.net/publication/315188390_Clinical_development_of_anti-CD19_chimeric_antigen_receptor_T-cell_therapy_for_B-cell_non-Hodgkin_lymphoma


Prompt: My patient is diagnosed with CRS with ≥40% O2. As per this workflow which treatment should I try next?

GPT4 Turbo with Vision response:


Case 2a: Asking it to read a clinical workflow and provide suggestions. (Advanced)

https://link.springer.com/chapter/10.1007/978-3-030-94353-0_26

Prompt: After I administered Toci on a patient for a grade 2 CRS onset, his condition is not improving. What are my options based on this workflow?

GPT4 Turbo with Vision response:

Case 3a: Asking it to explain the survival curve without any hints.

https://onlinelibrary.wiley.com/doi/full/10.1002/cam4.5067


Prompt: In this survival curve can you calculate median survival for each colored line?

GPT4 Turbo with Vision response:

Case 3b: Asking it to explain the survival curve with hints (image text).

https://onlinelibrary.wiley.com/doi/full/10.1002/cam4.5067

Prompt: Can you use the table at the bottom to calculate the median survival for each colored line?

GPT4 Turbo with Vision response:

Ahh, perhaps we have reached the breaking point for now. Or perhaps adding a math agent would fix the issue seen above.

OpenAI did claim that they plan to release a stable production-ready model in the coming weeks.

Closing thoughts & and general comments

  1. Although it is considered a foundational model, the performance on domain-specific tasks might be already superior to the current domain-specific fine-tuned models.
  2. With the introduction of enterprise products (which don't use your private data) and copyright shield (which protects you against lawsuits), it might be faster, cheaper, and safer to be an OpenAI customer than other alternatives. Although it comes with an extreme level of external dependency and a lack of in-house core AI innovation.
  3. With the introduction of "Custom Models" programs and GPTs, the OpenAI team is targeting a new powerful user group that can help create domain-specific models and agents. This is likely to further reduce their competitor's market share.
  4. It is also a scary trend that is moving toward a world where most companies will not have the resources to build their own models, resulting in AI Research monopolies. And perhaps the only practical choice the companies out there have is to choose a vendor/partner from these big players.
  5. There is a huge, vibrant open-source community in generative AI, building amazing libraries and learning resources. The GPT4V does highlight there is a lot of catching up to do. And it is not going to be easy - especially arranging the computing power to compete with the GPT-like models.


Feel free to share your thoughts and interesting test cases!

要查看或添加评论,请登录

Akshay Chougule的更多文章

社区洞察

其他会员也浏览了