Imagen: Out of the Experimental Test Kitchen and Into the Limelight

Note: Today's article is a little longer and more in-depth, which is probably a good thing, as I won't be writing another one until Tuesday.

I’ve written about Google’s Imagen before in its experimental form and I loved it. Imagen previously lived in Google Labs. (If you haven’t signed up yet to be a part of Google Labs, I highly recommend you do. It offers you an opportunity to try new products first and give feedback to the teams that build them.)

Google’s research paper, "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding," describes the challenges of generating images from a prompt that look just like real photos and closely match the description. Developed by Google’s Brain Team, Imagen sets a new standard for generating high-quality, realistic images that align closely with the text. Or, to put it more simply, the model does a better job of reading and interpreting the prompt.

Now, Imagen has graduated out of Labs and is available for everyone to try. You can use the tool either on ImageFX (Google’s AI Test Kitchen) or through Gemini Advanced. This is the intriguing ImageFX (Test Kitchen) login, which I used for my previous tests of the model.

But since the Gemini approach will likely be used by the vast majority of users, I chose it for this article.

I asked Gemini 1.5 Pro to write a series of prompts that would test the capabilities of Imagen. After we finished our series, I asked Gemini how it thought it did and then what makes it different from DALL-E and Midjourney. So, stay tuned for this interview after the test results.


The Test Prompts

1. "A photorealistic image of a cat wearing a tiny crown, sitting on a throne made of books, with a mischievous glint in its eye."

  • Why it's a good test: This prompt tests Imagen3's ability to handle multiple attributes (cat, crown, throne, books, mischievous expression), combine concepts in a novel way, and generate a photorealistic image with high attention to detail.

2. "A surreal painting of a melting clock dripping over a cityscape, in the style of Salvador Dalí."

  • Why it's a good test: This prompt evaluates Imagen3's capacity to understand and reproduce specific artistic styles (in this case, Surrealism and Dalí's unique approach). It also tests its ability to generate images with a dreamlike, abstract quality.

3. "A pixel art image of a spaceship battling a giant robot in a retro arcade game."

  • Why it's a good test: This prompt assesses Imagen3's versatility in generating images across different visual styles, specifically pixel art. It also tests its ability to capture the nostalgic aesthetic of retro arcade games.

4. "A photo of a bustling marketplace in a futuristic cyberpunk city, with neon signs reflecting on wet pavement."

  • Why it's a good test: This prompt challenges Imagen3 to create a complex scene with numerous elements (marketplace, cyberpunk city, neon signs, wet pavement) and to realistically depict lighting and reflections.

5. "A 3D render of a whimsical, colorful creature with features inspired by a hummingbird and a butterfly, hovering over a field of flowers."

  • Why it's a good test: This prompt explores Imagen3's ability to blend disparate concepts (hummingbird and butterfly) into a cohesive and imaginative creature. It also tests its capacity to generate images with a 3D render quality.

By using these prompts and analyzing the resulting images, you can gain valuable insights into Imagen3's strengths and limitations in areas like photorealism, artistic styles, concept blending, and detail rendering. This will allow you to write a comprehensive and informed article about this powerful AI model.


Test Results:

Prompt #1: Photorealistic cat on a throne.

This first result was actually very good, but the eyes seemed off. Imagen asked if I wanted any variations, so I asked for more “mischievous eyes.” This led to a back-and-forth series of prompts, with images that grew progressively weirder. I’ll spare you the full series and include only the last one.


Variation on the Prompt (after several attempts):

I’d envisioned something more like a throne room scene rather than a cat “sitting on the throne.” When I requested such a scene, it devolved into weird floating cat heads.

This reinforced my conclusion that one needs to be VERY specific when prompting Imagen, and that the model cannot follow a conversation as well as DALL-E when generating art.

Fourth Variation:

Imagen had just done well on a prompt where I had gotten very specific, so I asked for the same thing, but in a more photorealistic style. This is a non-specific prompt that builds upon the earlier conversation.

Ah... mental note: Imagen does not like non-specific prompts. The result? Floating cat heads.


Second Prompt:

This image render was by far my favorite, and I loved it so much that I made it my desktop screensaver. (I didn’t even know that I liked the art style of Salvador Dalí. But this is just gorgeous.) I also used it for the cover artwork of this story.


This image sparked a conversation between Gemini and me about how Imagen is different from DALL-E. I have included this conversation below. Oh, and I just realized that the name "DALL-E" is partly a nod to the Spanish surrealist artist Dalí (blended with Pixar's WALL-E). Wow. Sure took me long enough to make that connection.


Third Prompt:


This image immediately made me nostalgic for the video games of the 80s. I love it.


Fourth Prompt:

(Gemini and I had gotten off into a side discussion about pixelated video games, hence the first sentence of this prompt. Gemini had offered to generate more images in this style.)


Fifth Prompt:


Overall Impressions:

Imagen3 creates gorgeous photorealistic images. I will be adding this one to my toolkit for generating images for my newsletter. However, it is important to be specific in writing your prompts.

I’ve gotten lazy in writing my prompts with DALL-E because I use a custom “GPT” to create my newsletter. This GPT functions as a simplified agent and already knows many of my preferences before I even ask for an image. So, I can write a prompt as simple as: “Generate thought-provoking artwork that reflects the themes we discussed in this story.” Sometimes DALL-E produces a great image on the first try, even with such a non-specific prompt. However, when using Imagen, I'll need to up my game and write more specific prompts. It's not a problem. I also have to write clearer prompts when I use Ideogram or Midjourney. Each art generation program likes to be spoken to lovingly in a different way.

To help all of us out, I asked Gemini to create a list of prompt tips for Imagen. I figured Gemini knows best how it wants us to sweet talk it.

Tips for Writing Effective Prompts

To get the best possible results from Imagen3, consider these tips:

  • Be specific and descriptive: The more detail you provide in your prompt, the better Imagen3 can understand your vision.
  • Use clear language: Avoid ambiguity and use precise terms to describe the desired elements and style.
  • Provide context: Include information about the setting, mood, and composition to guide the image generation process.
  • Experiment with different prompts: Don't be afraid to try different variations of your prompt to explore different creative possibilities.
  • Provide feedback: Offer specific feedback on the generated images to help Imagen3 refine its output and better understand your preferences.*

*I’m not sure about this last one because “feedback” brought us into the realm of floating cat heads. But, I can see how I was at fault for not getting more specific and descriptive in my modifications. I was treating Imagen too much like DALL-E. And, on that note, Appendix B outlines the difference between these two models.
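For readers who like to script their prompts, the tips above can be sketched as a tiny prompt builder. This is only a minimal illustration in Python: the `ImagePrompt` class, its field names, and the `missing_details` helper are all hypothetical names I made up for this example. Nothing here calls Imagen or Gemini; it simply assembles a specific, descriptive sentence that you can paste into the chat.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    """Hypothetical container for the details the tips ask for."""
    style: str             # e.g. "photorealistic image", "pixel art image"
    subject: str           # what the image is of
    setting: str = ""      # where the subject is (context)
    mood: str = ""         # the feeling the image should convey
    composition: str = ""  # framing, camera angle, layout

    def render(self) -> str:
        # Start with style and subject, then append only the details supplied.
        parts = [f"A {self.style} of {self.subject}"]
        if self.setting:
            parts.append(self.setting)
        if self.mood:
            parts.append(f"{self.mood} mood")
        if self.composition:
            parts.append(self.composition)
        return ", ".join(parts) + "."


def missing_details(prompt: ImagePrompt) -> list:
    """List the optional details left blank, as a reminder to be more specific."""
    return [name for name in ("setting", "mood", "composition")
            if not getattr(prompt, name)]


cat = ImagePrompt(
    style="photorealistic image",
    subject="a cat wearing a tiny crown",
    setting="sitting on a throne made of books",
    mood="mischievous",
    composition="wide shot of the whole throne room",
)
print(cat.render())
print(missing_details(ImagePrompt(style="pixel art image", subject="a spaceship")))
```

The point of the design is that every follow-up request gets rebuilt as a complete, stand-alone description instead of a vague "same thing, but more photorealistic," which is exactly where my cat heads started floating.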


I'm a retired educator and freelance writer who loves researching AI and sharing what I've learned.

Note: I will be away at a conference for the next five days. Deep Learning Daily will resume on Tuesday, November 29th.


Appendix A: Imagen, Asking the Model How It Did On the Test.


* Editor’s Note: I would disagree with Gemini’s self-assessment of the mischievous cat scenario. While the first image was good, minus the somewhat creepy eyes, the rest of the images went from weird to weirder. However, I will grant that they were “creative” in the sense you call art “creative” when you don’t know what else to call it.


Appendix B:

Question: How is Imagen3 different from DALL-E 3?

1. Architecture and Training Data:

  • Imagen3: Developed by Google DeepMind, Imagen3 is based on a diffusion model architecture. It's trained on a massive dataset of text and image pairs, with a focus on understanding and responding to complex prompts.
  • DALL-E 3: Created by OpenAI, DALL-E 3 also leverages a diffusion model but with architectural variations. Its training data is also vast but emphasizes aligning with human instructions and preferences, resulting in a stronger ability to follow detailed prompts.

2. Image Quality and Style:

  • Imagen3: Known for generating highly realistic and detailed images, often with a more artistic or illustrative flair. It excels at capturing textures, lighting, and complex compositions.
  • DALL-E 3: While capable of photorealism, DALL-E 3 tends to produce images with a slightly more stylized or "painterly" aesthetic. It's particularly adept at generating images that evoke specific moods or emotions.

3. Prompt Understanding and Control:

  • Imagen3: Excels at interpreting nuanced prompts and generating creative variations, even with limited guidance. It offers less precise control over specific elements within the image.
  • DALL-E 3: Designed for more controlled generation, allowing users to specify details like composition, perspective, and individual object attributes. It's better at accurately following detailed instructions.

4. Accessibility and Integration:

  • Imagen3: Primarily accessible through Google products and services like Gemini. Its integration with other tools and platforms is still evolving.
  • DALL-E 3: More widely available through OpenAI's API and platforms like Bing Chat. It has broader integration with third-party apps and workflows.

5. Strengths and Weaknesses:

  • Imagen3:
  • DALL-E 3:

Ultimately, the best model for you depends on your specific needs and preferences. If you prioritize photorealism and artistic flair, Imagen3 might be a good choice. If you need more control over image details and wider accessibility, DALL-E 3 could be a better fit.


Appendix C: How is Imagen3 Different From Midjourney?

After I commented that one of the images generated reminded me of a “Midjourney-like” style, Gemini and I discussed the differences between Imagen and Midjourney.

Anyone who has used Midjourney will recognize that this is indeed a "Midjourney-ish" style.

Here's a breakdown of their key differences:

1. Access and Interface:

  • Imagen3: Currently, access is primarily through Google products like Gemini Advanced. It has a more streamlined and user-friendly interface, designed for ease of use.
  • Midjourney: Accessed through a Discord server, requiring interaction with bots and commands. This can be a steeper learning curve for new users, but offers a more community-driven experience with active feedback and sharing.

2. Image Style and Aesthetics:

  • Imagen3: Excels at generating highly realistic images with a focus on accurate lighting, textures, and details. It often leans towards a more polished and refined aesthetic.
  • Midjourney: Known for its artistic and often dreamlike quality. It frequently produces images with a more stylized, painterly, or abstract feel, giving it a distinct visual signature.

3. Prompting and Control:

  • Imagen3: Designed for more straightforward prompting, where you describe the desired image in natural language. It offers less granular control over specific elements within the image.
  • Midjourney: Uses a more complex system of parameters and commands to refine the generation process. This allows for greater control over aspects like aspect ratios, stylization levels, and specific artistic influences.

4. Community and Collaboration:

  • Imagen3: Being newer, it has a smaller and less established community compared to Midjourney.
  • Midjourney: Boasts a large and active community on Discord, fostering collaboration, feedback, and the sharing of creative prompts and techniques.

5. Strengths and Weaknesses:

Imagen3:

  • Strengths: Photorealism, ease of use, strong understanding of natural language prompts.
  • Weaknesses: Less precise control over details, smaller community, limited access.

Midjourney:

  • Strengths: Unique artistic style, granular control, vibrant community, wider access.
  • Weaknesses: Steeper learning curve, can be less consistent with photorealism.

In essence: Imagen3 is like a user-friendly camera with excellent auto-mode, capturing realistic shots with ease. Midjourney is more like a versatile art studio, offering a wider range of tools and controls for those willing to experiment and refine their artistic vision.

The best choice for you depends on your priorities and preferences. If you value ease of use and photorealism, Imagen3 might be a better fit. If you prefer a more hands-on approach with greater artistic control and a vibrant community, Midjourney could be the way to go.


#Imagen3 #GoogleDeepMind #AIArt #TextToImage #GenerativeAI
