Can generative AI produce realistic medical images?
I posed the question above to the graduate students in my Generative AI class at Northeastern, and it became a 3-week hands-on project assignment. I wanted to focus on images of something easily visible to the naked eye and thought that a skin rash could be a good example. My immediate concern was whether the students would be comfortable looking at these images, some of which might cause discomfort, both mentally and physically. Remarkably, all but one or two were not concerned at all; nonetheless, I allowed them to choose from the milder categories if they wished.
If such an image generation application were in place, I envision that physicians could query the system with, for example, "Generate an image of a ringworm rash at the back of the neck of a dark-skinned person," to support an informed decision. Note the variability: there are hundreds of possible combinations of rash types, areas of the body, and skin tones. If we are more adventurous, we could also explore how such an image would progress over time, given the patient's current condition.
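To make the size of that query space concrete, here is a minimal sketch that enumerates prompts from three short lists. The category lists and the `build_prompts` helper are illustrative assumptions of mine, not a clinical taxonomy and not something the students were given.

```python
from itertools import product

# Illustrative category lists only (assumptions, not a clinical taxonomy).
RASH_TYPES = ["ringworm", "eczema", "psoriasis", "hives", "shingles"]
BODY_AREAS = ["back of the neck", "forearm", "scalp", "lower leg", "chest", "palm"]
SKIN_TONES = ["light", "medium", "olive", "brown", "dark"]


def build_prompts(rashes, areas, tones):
    """Return one text prompt per (rash, area, tone) combination."""
    return [
        f"Generate an image of {rash} on the {area} of a person with {tone} skin"
        for rash, area, tone in product(rashes, areas, tones)
    ]


prompts = build_prompts(RASH_TYPES, BODY_AREAS, SKIN_TONES)
print(len(prompts))  # 5 * 6 * 5 = 150
print(prompts[0])
```

Even these deliberately short lists yield 150 distinct prompts, so realistic lists quickly reach the hundreds.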
Many research and commercial text-to-image models generate images from text prompts. Here are some of their core conceptual foundations, drawn from the published literature. A VQ-VAE (Vector Quantized Variational Autoencoder) encodes images into a discretized latent space, over which an autoregressive model learns an expressive prior; in text-to-image systems, that prior combines textual and image semantics. A different approach combines the generative capability of VQGAN (Vector Quantized Generative Adversarial Network) with the discriminative capability of CLIP (Contrastive Language-Image Pre-training). VQGAN employs a first stage with an adversarial and perceptual objective to learn an intermediate codebook-based representation, which is then fed into an autoregressive transformer.
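The "discretized latent space" shared by VQ-VAE and VQGAN comes from snapping each encoder output to its nearest entry in a learned codebook. The following is a minimal NumPy sketch of that lookup step only; the tiny codebook and latent values are made up for illustration, and a real model would learn the codebook jointly with the encoder and decoder.

```python
import numpy as np


def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (L2 distance).

    latents:  array of shape (n, d), e.g. encoder outputs
    codebook: array of shape (k, d), the learned code vectors
    Returns (indices of shape (n,), quantized vectors of shape (n, d)).
    """
    # Squared distances via broadcasting: (n, 1, d) - (1, k, d) -> (n, k)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


# Toy values, chosen only for illustration.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
latents = np.array([[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]])

indices, quantized = quantize(latents, codebook)
print(indices)  # [1 0 2]
```

The resulting integer indices are the discrete tokens over which an autoregressive prior can then be trained.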
A (probabilistic) diffusion model is a parameterized Markov chain trained using variational inference, typically with a U-Net as the denoising network, to produce samples matching the data after a finite number of steps. Diffusion models learn to generate data by reversing a gradual noising process. More recent designs replace the U-Net with a transformer: decoupling image generation from the implicit spatial biases of convolutions has allowed text-to-image models to improve reliably via the well-studied scaling properties of transformers. In latent diffusion architectures, a VAE encoder compresses the image from pixel space into a lower-dimensional latent space that captures the more fundamental semantics of the image.
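The gradual noising process that a diffusion model learns to reverse can be sketched in a few lines. This is a minimal NumPy illustration of the forward (noising) direction only, assuming the linear variance schedule commonly used in the DDPM literature (1000 steps, betas from 1e-4 to 0.02); the function names and the random 8x8 array standing in for an image are my own assumptions, and no denoising network is trained here.

```python
import numpy as np


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from x_0 in one shot:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    """
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise


rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()

# Stand-in for a normalized image; a real pipeline would load pixel data.
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))

x_early, noise_early = add_noise(x0, 10, alpha_bars, rng)   # still close to x0
x_late, noise_late = add_noise(x0, 999, alpha_bars, rng)    # almost pure noise
```

During training, the network is shown `x_t` and the step `t` and is asked to predict the added noise; generation then runs the chain backward from pure noise.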
One question that naturally arises is why not simply use DALL-E or Stable Diffusion zero-shot to generate images from natural-language queries, much along the lines of the students' earlier ChatGPT-based chatbot development assignment. The answer is twofold. First, this is an educational assignment meant to help students understand the underlying models rather than how to use an off-the-shelf application. In fact, some students went for few-shot learning, and others fine-tuned a diffusion model on a small set of training images. Second, the accuracy of DALL-E is below par for this specific use case. When the ringworm rash request is submitted to DALL-E 2, the generated images are anything but ring-like, though the back-of-the-neck location and the dark skin tone are rendered accurately. The Stable Diffusion online application performs similarly.
Alternative phrasings of the request may produce images that satisfy the requirements, but a safety-critical medical setting calls for image generation that faithfully follows the semantics of any valid query. The students' efforts were, of course, hindered by the lack of training data and by limited computing power and time. They floated many novel ideas, such as transfer learning from one rash type to another, or from a rash type on one skin color to the same rash on another. Needless to say, my grading was based largely on whether their thinking was headed in the right direction rather than on a complete working system.
CEO, QuantUniversity | AI Expert | Educator | Author | TedX Speaker |
9 months ago
Subrata Das, I did a research project in the area with another student asking the same question in 2021 (pre-ChatGPT :))! Here is the paper you may find interesting! Lu, Z & Krishnamurthy, S (2021). SkinGAN: Medical image Synthetic data generation using Generative methods. Paper: https://www.slideshare.net/QuantUniversity/zijiasri-skinsynthesizepdf Code: https://github.com/ZijiaLewisLu/SkinGAN