GPT4 Turbo with Vision on clinical images
It has been barely 24 hours since release, and OpenAI's servers are already overloaded (in some cases unreachable) from the rush to test this powerful new model: GPT4 Turbo with Vision.
Here is how the model performed in initial testing on a limited set of clinical test cases.
Spoiler alert: It is insanely powerful but has some limitations.
Case 1: Asking it to read an image with generations of CAR-T cell therapies.
Before we get to the answer, note that this is a very hard image to read, for three key reasons:
1. the domain-specific terminology
2. lack of image caption or description text
3. words like CAR and TRUCK mean something entirely different here
For benchmarking, LLaVA, another powerful multimodal model, provided this answer (with just the base config):
LLaVA is arguably one of the most powerful open-access multimodal models. In other words, it is a direct open-access competitor to GPT4V.
However, the new OpenAI model aced it with the following response:
Case 2a: Asking it to read a clinical workflow and provide suggestions. (Basic)
Prompt: My patient is diagnosed with CRS with ≥40% O2. As per this workflow which treatment should I try next?
GPT4 Turbo with Vision response:
Case 2b: Asking it to read a clinical workflow and provide suggestions. (Advanced)
Prompt: After I administered Toci on a patient for a grade 2 CRS onset, his condition is not improving. What are my options based on this workflow?
GPT4 Turbo with Vision response:
Case 3a: Asking it to explain the survival curve without any hints.
Prompt: In this survival curve can you calculate median survival for each colored line?
GPT4 Turbo with Vision response:
Case 3b: Asking it to explain the survival curve with hints (image text).
Prompt: Can you use the table at the bottom to calculate the median survival for each colored line?
GPT4 Turbo with Vision response:
Ahh, perhaps we have reached the breaking point for now. Or perhaps adding a math agent would fix the issue seen above.
OpenAI has said that it plans to release a stable, production-ready model in the coming weeks.
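For reference, the median survival asked for in Case 3 is conventionally the earliest time at which the survival curve drops to 50% or below. Below is a minimal sketch of that calculation, the kind of step a math agent could run once the curve points have been read off the image or table. The function name and the data points are hypothetical and are not taken from the figure used in the test.

```python
from typing import Optional, Sequence, Tuple


def median_survival(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Return the earliest time at which survival probability is <= 0.5.

    `curve` is a list of (time, survival_probability) pairs sorted by time,
    i.e. the step points of a Kaplan-Meier style curve.
    Returns None when the curve never reaches 50% (median not reached).
    """
    for time, probability in curve:
        if probability <= 0.5:
            return time
    return None


# Hypothetical step points (months, survival probability) for two arms.
arm_a = [(0, 1.0), (6, 0.8), (12, 0.55), (18, 0.45), (24, 0.3)]
arm_b = [(0, 1.0), (6, 0.9), (12, 0.75), (18, 0.6), (24, 0.55)]

print(median_survival(arm_a))  # first point at or below 50%
print(median_survival(arm_b))  # None: the curve stays above 50%
```

The "median not reached" case matters in practice: a curve that never crosses 50% has no median to report, and a model that returns a number anyway is guessing.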
Closing thoughts and general comments
Feel free to share your thoughts and interesting test cases!