Imagen: Out of the Experimental Test Kitchen and Into the Limelight
Note: This article today will be a little longer and more in-depth, which is probably a good thing as I won't be writing another one until Tuesday.
I’ve written about Google’s Imagen before in its experimental form and I loved it. Imagen previously lived in Google Labs. (If you haven’t signed up yet to be a part of Google Labs, I highly recommend you do. It offers you an opportunity to try new products first and give feedback to the teams that build them.)
Google’s research paper, "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding," describes the challenges behind creating images from a prompt that look just like real photos and perfectly match the description. Developed by Google’s Brain Team, Imagen sets a new standard for generating high-quality, realistic images that align closely with the text of the prompt. Or, to put it more simply, the model does a better job of reading and interpreting the prompt.
Now, Imagen has graduated out of Labs and is available for everyone to try. You can use the tool either on ImageFX (Google’s AI Test Kitchen) or through Gemini Advanced. This is the intriguing ImageFX (Test Kitchen) login, which I used for my previous tests of the model.
But since the Gemini approach will likely be used by the vast majority of users, I chose it for this article.
I asked Gemini 1.5 Pro to write a series of prompts that would test the capabilities of Imagen. After we finished our series, I asked Gemini how it thought it did and then what makes it different from DALL-E and Midjourney. So, stay tuned for this interview after the test results.
The Test Prompts
1. A photorealistic image of a cat wearing a tiny crown, sitting on a throne made of books, with a mischievous glint in its eye.
2. A surreal painting of a melting clock dripping over a cityscape, in the style of Salvador Dalí.
3. A pixel art image of a spaceship battling a giant robot in a retro arcade game.
4. A photo of a bustling marketplace in a futuristic cyberpunk city, with neon signs reflecting on wet pavement.
5. A 3D render of a whimsical, colorful creature with features inspired by a hummingbird and a butterfly, hovering over a field of flowers.
By using these prompts and analyzing the resulting images, you can gain valuable insights into Imagen3's strengths and limitations in areas like photorealism, artistic styles, concept blending, and detail rendering. This will allow you to write a comprehensive and informed article about this powerful AI model.
Test Results:
Prompt #1: Photorealistic cat on a throne.
This first result was actually very good, but the eyes seemed off. Imagen asked if I wanted any variations, so I asked for more “mischievous eyes.” This led to a back-and-forth series of prompts with different images that grew progressively weirder. I’ll spare you the full series and include only the last one.
Variation on the Prompt (after several attempts):
I’d envisioned something more like a throne room scene rather than a cat “sitting on the throne.” When I requested such a scene, it devolved into weird floating cat heads.
This reinforced my conclusion that one needs to be VERY specific when prompting Imagen, and that the model cannot follow a conversation as well as DALL-E when generating art.
Fourth Variation:
Imagen had just done well on a prompt where I had gotten very specific, so I asked for the same thing, but in a more photorealistic style. This is a non-specific prompt that builds on the earlier conversation.
Ah... mental note: Imagen does not like non-specific prompts. The result? Floating cat heads.
Second Prompt:
This image render was by far my favorite and one I loved so much that I have made it my desktop screensaver. (I didn’t even know that I liked the art style of Salvador Dalí. But, this is just gorgeous.) I also used it for the cover artwork of this story.
This image sparked a conversation between Gemini and me about how Imagen is different from DALL-E. I have included this conversation below. Oh, and I just realized the name "DALL-E" is a nod to the Spanish surrealist artist Dalí (blended with Pixar's WALL-E). Wow. Sure took me long enough to make that connection.
Third Prompt:
This image immediately made me nostalgic for the video games of the 80s. I love it.
Fourth Prompt:
(Gemini and I had gotten off into a side discussion about pixelated video games, hence the first sentence of this prompt. Gemini had offered to generate more images in this style.)
Fifth Prompt:
Overall Impressions:
Imagen3 creates gorgeous photorealistic images. I will be adding this one to my toolkit for generating images for my newsletter. However, it is important to be specific in writing your prompts.
I’ve gotten lazy in writing my prompts with DALL-E because I use a custom “GPT” to create my newsletter. This GPT functions as a simplified agent and already knows many of my preferences before I even ask for an image. So, I can write a prompt as simple as: “Generate thought-provoking artwork that reflects the themes we discussed in this story.” Sometimes DALL-E produces a great image on the first try, even with such a non-specific prompt. However, when using Imagen, I'll need to up my game and write more specific prompts. It's not a problem. I also have to write clearer prompts when I use Ideogram or Midjourney. Each art gen program likes to be spoken to lovingly in a different way.
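For readers who like to script their prompts, here is a minimal sketch of what "being specific" could look like in practice. It is purely illustrative: `build_prompt` and `revise_prompt` are hypothetical helper names I made up, not part of Imagen, Gemini, or any Google API, and the code only assembles text. The idea is that a follow-up request restates the whole description with one field changed, rather than leaning on the earlier conversation with something like "the same thing, but more photorealistic."

```python
def build_prompt(style, subject, setting="", details=()):
    """Join the parts of an image description into one explicit prompt."""
    parts = [f"A {style} of {subject}"]
    if setting:
        parts.append(setting)
    parts.extend(details)
    return ", ".join(parts) + "."


def revise_prompt(fields, **changes):
    """Return a complete prompt with some fields replaced.

    The original dictionary is left untouched, so every revision
    starts from the same fully specified description.
    """
    updated = {**fields, **changes}
    return build_prompt(**updated)


# Hypothetical breakdown of the first test prompt into named fields.
cat = {
    "style": "photorealistic image",
    "subject": "a cat wearing a tiny crown",
    "setting": "sitting on a throne made of books",
    "details": ("with a mischievous glint in its eye",),
}

print(build_prompt(**cat))
# A follow-up that spells out the new scene instead of saying "same, but different".
print(revise_prompt(cat, setting="seated in a grand throne room"))
```

Whether this would have saved me from the floating cat heads is an open question, but it does force every request to carry the full description.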
To help all of us out, I asked Gemini to create a list of prompt tips for Imagen. I figured Gemini knows best how it wants us to sweet talk it.
Tips for Writing Effective Prompts
To get the best possible results from Imagen3, consider these tips:
*I’m not sure about this last one because “feedback” brought us into the realm of floating cat heads. But, I can see how I was at fault for not getting more specific and descriptive in my modifications. I was treating Imagen too much like DALL-E. And, on that note, Appendix B outlines the difference between these two models.
I'm a retired educator and freelance writer who loves researching AI and sharing what I've learned.
Note: I will be away at a conference for the next five days. Deep Learning Daily will resume on Tuesday, November 29th.
Appendix A: Imagen, Asking the Model How It Did On the Test.
* Editor’s Note: I would disagree with Gemini’s self-assessment of the mischievous cat scenario. While the first image was good, minus the somewhat creepy eyes, the rest of the images went from weird to weirder. However, I will grant that they were “creative” in the sense you call art “creative” when you don’t know what else to call it.
Appendix B:
Question: How is Imagen3 different from DALL-E 3?
1. Architecture and Training Data:
2. Image Quality and Style:
3. Prompt Understanding and Control:
4. Accessibility and Integration:
5. Strengths and Weaknesses:
Ultimately, the best model for you depends on your specific needs and preferences. If you prioritize photorealism and artistic flair, Imagen3 might be a good choice. If you need more control over image details and wider accessibility, DALL-E 3 could be a better fit.
Appendix C: How is Imagen3 Different From Midjourney?
After I commented that one of the images generated reminded me of a “Midjourney-like” style, Gemini and I discussed the differences between Imagen and Midjourney.
Anyone who has used Midjourney will recognize that this is indeed a "Midjourney-ish" style.
Here's a breakdown of their key differences:
1. Access and Interface:
2. Image Style and Aesthetics:
3. Prompting and Control:
4. Community and Collaboration:
5. Strengths and Weaknesses:
Imagen3:
Midjourney:
In essence: Imagen3 is like a user-friendly camera with excellent auto-mode, capturing realistic shots with ease. Midjourney is more like a versatile art studio, offering a wider range of tools and controls for those willing to experiment and refine their artistic vision.
The best choice for you depends on your priorities and preferences. If you value ease of use and photorealism, Imagen3 might be a better fit. If you prefer a more hands-on approach with greater artistic control and a vibrant community, Midjourney could be the way to go.
#Imagen3 #GoogleDeepMind #AIArt #TextToImage #GenerativeAI