Comparing AI Models: DALL-E and Stable Diffusion
Sonal Agrawal
Technical Project Manager | Stakeholder Engagement | Critical Thinking | Agile Methodologies
DALL-E and DALL-E 2 are deep learning models developed by OpenAI to generate digital images from natural language descriptions called "prompts". DALL-E was revealed by OpenAI in a blog post in January 2021 and uses a version of GPT-3 modified to generate images. In April 2022, OpenAI announced DALL-E 2, a successor designed to generate more realistic images at higher resolutions that "can combine concepts, attributes, and styles". OpenAI has not released source code for either model.
DALL-E can generate imagery in multiple styles, including photorealistic imagery, paintings, and emoji. It can "manipulate and rearrange" objects in its images and can correctly place design elements in novel compositions without explicit instruction. Furthermore, DALL-E exhibits a broad understanding of visual and design trends.
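Although the models themselves have not been released, they are accessible through an API. As a minimal sketch, here is how a single image might be requested with the OpenAI Python SDK; the prompt text is purely illustrative, and the snippet assumes an OPENAI_API_KEY environment variable is set.

# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Request one 512x512 image from DALL-E 2; the prompt is illustrative.
response = client.images.generate(
    model="dall-e-2",
    prompt="A photorealistic lighthouse at dawn, painted in the style of Monet",
    n=1,
    size="512x512",
)

print(response.data[0].url)  # URL of the generated image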
Interactive Image Editing
DALL-E has expanded its functionality beyond generating images from scratch. It now supports interactive image editing, allowing users to modify existing images with natural language instructions. Users can describe desired modifications or visual effects, such as changing the background, adjusting colors, adding or removing objects, or even applying artistic styles, and DALL-E's understanding of these instructions enables seamless and precise edits.
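As a sketch of what such an instruction-driven edit looks like through the OpenAI Python SDK: the images.edit endpoint takes the original picture, a mask whose transparent region marks where the change should apply, and the natural language instruction. The file names and prompt below are illustrative.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# The transparent region of mask.png tells the model where to apply
# the edit described in the prompt; the rest of the image is preserved.
result = client.images.edit(
    image=open("living_room.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="Replace the background with a sunset over the ocean",
    n=1,
    size="512x512",
)

print(result.data[0].url)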
Medical Imaging and Diagnosis
DALL-E has been trained on massive medical imaging datasets and has proven to be a valuable tool for healthcare providers. It can generate detailed and accurate visual representations of medical conditions based on textual descriptions, assisting in the interpretation of complex imaging data. DALL-E's capacity to generate realistic anatomical structures and clinical presentations aids diagnostic accuracy and facilitates communication between doctors and patients.
Limitations of DALL-E
DALL-E 2's language understanding has limits. It is sometimes unable to distinguish prompts that combine the same words differently, for example "A white table and a blue suitcase" from "A blue table and a white suitcase", or "A panda making latte art" from "Latte art of a panda". It also fails to generate the correct images in a variety of circumstances.
Stable Diffusion
Stable Diffusion is a latent text-to-image diffusion model capable of generating photorealistic images from any text input, giving anyone the freedom to produce striking imagery within seconds. It was developed by the start-up Stability AI in collaboration with a number of academic researchers and non-profit organizations. As a latent diffusion model, it is a kind of deep generative neural network. Its code and model weights have been released publicly, and it can run on most consumer hardware equipped with a modest GPU with at least 8 GB of VRAM. This marked a departure from previous proprietary text-to-image models such as DALL-E, which were accessible only via cloud services.
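Because the code and weights are public, Stable Diffusion can be run locally. Below is a minimal sketch using the Hugging Face diffusers library; the checkpoint name is one widely used public release, and half precision keeps memory use within reach of an 8 GB GPU.

# pip install diffusers transformers accelerate torch
import torch
from diffusers import StableDiffusionPipeline

# Load the publicly released weights in half precision so the model
# fits on a consumer GPU with roughly 8 GB of VRAM.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("A photorealistic astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")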
Limitations
In some cases, Stable Diffusion suffers from degradation and errors. The model was initially trained on a dataset of 512×512 images, so the quality of generated images degrades noticeably when user specifications deviate from this "expected" 512×512 resolution. Version 2.0 later introduced the ability to natively generate images at 768×768 resolution.
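As a sketch, the version 2.0 checkpoint published by Stability AI can be asked for its native resolution through the same diffusers pipeline by passing explicit height and width; the prompt below is illustrative.

import torch
from diffusers import StableDiffusionPipeline

# "stabilityai/stable-diffusion-2" is the 768-pixel base checkpoint
# released for version 2.0.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    torch_dtype=torch.float16,
).to("cuda")

# Sampling at the model's native resolution avoids the degradation
# seen when generating far from the training resolution.
image = pipe(
    "A detailed watercolor of a mountain village",
    height=768,
    width=768,
).images[0]
image.save("village_768.png")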
Link to try: https://lastmileai.dev/trials/playground
Generative AI (GenAI): Generative AI is a type of AI system capable of generating text, images, or other media in response to prompts.
Large Language Model (LLM): Large Language Models (LLMs) are foundation models with billions of parameters, trained on large amounts of text data. Given a prompt, LLMs can generate text and perform text-based tasks.
GPT: Generative pre-trained transformer. It's a technical implementation detail of today's state-of-the-art foundation models. Due to the popularity of ChatGPT and GPT-3, it has also become a marketing term signifying an advanced AI model. In the future there will likely be other kinds of foundation models that don't rely on transformers.
Note: Do test these models, compare the images they produce, and share and tag us with what you discover.