ImageFX: Google's Image Generation Tool
Google Labs' FX home page

A Comprehensive Look at ImageFX

Business Use Case:

Free image generation tools should be very attractive to businesses because of their ability to create non-copyrighted images for use in:

  1. Marketing materials
  2. Websites
  3. Social media postings
  4. Presentations
  5. Educational materials
  6. Company documents


Because of cost savings and convenience, these free tools are hard to argue against. Though getting just the right image can be a little challenging, it becomes easier once you know a tool's capabilities and idiosyncrasies.

To that end, I am writing this article to help introduce ImageFX as a viable free tool for both individuals and companies for their image needs.


ImageFX and Imagen 2

Last week I wrote about Google updating its Bard chatbot (now rebranded as "Gemini") to include the ability to generate images, à la ChatGPT Plus and DALLE-3. While Gemini's image generation resembles that of ChatGPT Plus, ImageFX resembles Microsoft's Image Creator. Both ImageFX and Image Creator are used through a website interface and work in very similar ways.

The image generation in both Gemini and ImageFX is accomplished through Google's image-generating diffusion model, Imagen 2, which was released this past December.

Imagen 2 is also used to create images in some of Google's other offerings, such as Ads, Duet AI, Workspace and Vertex AI.

In Vertex AI (Google's cloud-based offering), Imagen 2 is available through Generative AI Studio. With Generative AI Studio, all of Imagen 2's features and capabilities are intact. But it isn't free. Whether you use Imagen 2 in Generative AI Studio directly or through its API (application programming interface), there is a charge for each image generated.

Gemini and ImageFX are free to use, but some of Imagen 2's most powerful features are unavailable in them: inpainting (changing an aspect of a created image through masking), mask-free editing (through natural language iteration), outpainting (expanding a created image) and upscaling.


Let's take a deeper dive into ImageFX.

In this article I will:

  1. Describe how to create images with ImageFX.
  2. Provide sample images created with ImageFX.
  3. Give comparative images created with Gemini, Stable Diffusion (using Fooocus), DALLE-3 (using ChatGPT Plus) and Midjourney (using version 6).
  4. Describe ImageFX's idiosyncrasies and contrast its features and limitations with those of Gemini.
  5. Finally, for those interested, compare and contrast some of the features of the image generation tools used in this article.


Let's get started.

First, navigate to ImageFX's webpage using the following link: https://aitestkitchen.withgoogle.com/tools/image-fx. If you are not already logged in, you will be prompted to log into your Google account.

ImageFX login screen

Once you are logged in, you will be brought to the main ImageFX screen (screenshot below), where you can begin creating images using natural language prompting. (Note: If you wish to use the pre-filled random prompt displayed in the prompt field, place your cursor inside the prompt area and press the Tab key on your keyboard.)

Screenshot of the ImageFX starting window with "I'm feeling lucky" button.

How to use ImageFX

Before creating sample images with ImageFX, I will take you step-by-step through the controls and some of the interesting limitations and idiosyncrasies.

The screenshot below shows the result of my selecting the option "I'm feeling lucky" (see above image), which generated the random prompt: "Fuzzy polar bear plushie sleeping in a minimalist modern apartment bed."

Images created with ImageFX

  • Generating an image: If you choose to enter your own prompt, enter it in the prompt field and select the "Generate" button (#2 in the image below). ImageFX will generate images based on that prompt and update your original prompt to include common descriptive alternatives. Clicking on the triangle beside the original descriptive word (#1) drops down a list of alternative descriptive words. These are simply helpful suggestions you might want to consider for changing the generated image(s). You are not limited to these suggested alternatives.
  • Iterations: Unlike generating images in Gemini, you cannot iterate on your prompt. If you wish to change the look of the resulting images, you will have to change the initial prompt within the prompt field and resubmit the entire updated prompt.

ImageFX resulting image creation and setting.

  • Additional descriptors: In addition to the dropdown alternative suggestions inside the prompt area, there is a list of additional suggested descriptive words below the prompt field (#3 above). Selecting the "More" button at the beginning of the list will generate even more suggestions.
  • Seed numbers: An image seed number is essentially a starting point for the random number generator that ImageFX uses during the image creation process. You can access the seed number by selecting "Settings" (#4). This will pop up a small window with the seed number of the currently selected image. ImageFX assigns sequential seed numbers to the images it creates in a single generation. If you want an image's aesthetic to be maintained while you make updates to your image(s), ImageFX allows you to lock the number by selecting the padlock icon to the right of the seed number. It also allows you to create and use your own seed numbers. (Note: In most instances, seed numbers are useful only for minor image tweaks, image recreation or maintaining an image's aesthetic in subsequently generated images. Outside of those cases, you can choose to ignore them.)
  • Layout: Generated images can be laid out in a linear or square format by selecting the icon shown above (#6).
  • Image sets: During a single session, you may try different/updated prompts. To move forward or backwards through the image sets you created, you can select the forward or backward arrows found to the left of the image layout icon.
  • Retrieving an image's original prompt: Going backwards or forwards through previously generated images will not show the prior prompt in the prompt field. However, you can always retrieve an image's prompt by mousing over the image, which reveals the prompt that created it (#2 below), and selecting the "Copy prompt" button beneath it (#3).
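
To make the seed idea above a little more concrete, here is a minimal Python sketch. It is not ImageFX's code, and it generates no images. It only uses Python's standard `random` module as a stand-in for the random starting point a diffusion model begins with. The function name `starting_noise` and the seed values are hypothetical, chosen purely for illustration.

```python
import random


def starting_noise(seed, size=4):
    """Return a short list of pseudo-random values derived from `seed`.

    Hypothetical stand-in for the random starting point of an image
    generation; a real diffusion model would use a large noise tensor.
    """
    rng = random.Random(seed)  # independent generator seeded with `seed`
    return [rng.random() for _ in range(size)]


# Hypothetical seeds: sequential numbers for one four-image generation.
base_seed = 12345
batch_seeds = [base_seed + i for i in range(4)]

# A "locked" seed reproduces the same starting point every time...
assert starting_noise(base_seed) == starting_noise(base_seed)

# ...while a different seed gives a different starting point.
assert starting_noise(base_seed) != starting_noise(base_seed + 1)

print(batch_seeds)
```

The point of the sketch is only that the same seed always yields the same starting values. That is why locking a seed helps keep an image's look stable while you edit the prompt.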

Retrieving prompts and sharing, copying and downloading images

  • Sharing, copying, downloading images: ImageFX makes it pretty easy to share, copy or download images. Mousing over an image reveals the "Copy image" button (#5). Clicking on the button copies the image to the clipboard, and it can be pasted anywhere from there. The "Download" (#4) and "Share" (#6) buttons remain visible below the generated images. You'll need to select the image you want to download or share, but from there it is as easy as clicking the associated button. Clicking the download button brings up a pop-up window for you to select the download location, and clicking the share button causes a "share window" to pop up. (Note: This function seems to be buggy. On two separate occasions, selecting the share button wiped clean the prompt and all the images I had generated. So be forewarned that this might happen.)

ImageFX sharing window.

  • ...sharing continued... The above image shows the pop-up "share window" on the left (#1). Selecting the "Copy share link" button on the bottom (#2) copies a link to the image, which can then be shared with others. As indicated at the very bottom (#7), "By sharing this link, anyone can view and remix your creation." I was concerned that someone with a shared link might be able to alter my original image, but that is not the case. Link recipients log into their own Google account, and changing the shared image creates a unique copy to which only they have access. Selecting the "Remix" button on the bottom of the window (#3) simply returns you to the prompt window for further tweaking. Selecting the flip icon (#4) flips back and forth between the image window and a window showing the prompt input and seed number of the image (#5). Oddly, only the seed number has a convenient copy icon (#6), but you can manually copy the prompt by highlighting it, clicking your right mouse button and selecting "copy".
  • Reporting a legal issue: You can report a legal issue by selecting the flag icon anywhere within a window or by selecting the vertical three dots just to the left of your avatar image in the upper right corner of the main ImageFX window (see #1 two images above). Selecting the three dots drops down a menu that lets you report a legal issue, send app feedback to Google, delete your ImageFX data and access ImageFX's frequently asked questions.
  • Creating a new image: If you wish to start afresh on an entirely new image, you must clear the prompt field by deleting it manually or selecting the "Start over" button beside the "Generate" button.


Now let's start creating!

In this section, I'll provide some ImageFX screenshots and generated images that represent common categories of images often created with generative diffusion models. The screenshots of the ImageFX window will show the thumbnail results and I'll provide a full image of the one I feel is the best of the group:

Categories

  1. Realistic photograph of a person.
  2. Fantasy image of AI.
  3. Natural/realistic landscape.
  4. Fantasy landscape.
  5. Style image. (Applying a common style to a subject. Note that there are too many available styles to give a full representation. This is merely to give an example of ImageFX's general ability).
  6. Image with specified text inside it.
  7. Still life.


ImageFX Images

Prompt #1:

A studio photographic image of a female executive.

screenshot of prompt and image set
ImageFX - female executive

Prompt #2:

A fantasy image of a future AI consisting of lights and wires with a nighttime star filled sky as the background.

screenshot of prompt and image set
ImageFX - future AI

Prompt #3:

A beautiful landscape with a lake in the foreground, with a dock, and mountains in the background, in the early morning with sunrise and mist on the lake.

screenshot of prompt and image set
ImageFX - lake

Prompt #4:

A cyberpunk cityscape set in a rain-soaked, alternate reality. towering skyscrapers are cloaked in advertisements, and the streets are gritty.

screenshot of prompt and image set
ImageFX - cyberpunk future

Prompt #5:

An historic illustration of a factory shop floor.

screenshot of prompt and image set
ImageFX - illustration of factory floor

Prompt #6:

A closeup photograph of a Manhattan storefront door of a fashion boutique with a sign on the door which says, " Now Open! "

screenshot of prompt and image set
ImageFX - storefront (including text)

Prompt #7:

A photograph of a beautiful entree on a set dining table in an exclusive restaurant.

screenshot of prompt and image set
ImageFX - entree

Comparing ImageFX to Competitors' Text-to-Image Tools

Before I get into ImageFX's idiosyncrasies, let's do some comparison image generation. I'll be comparing ImageFX images with those from:

1. Gemini (although Gemini and ImageFX use the same diffusion model, Imagen 2, Gemini's natural language prompt and filtering can cause significantly different results),

2. Stable Diffusion (using the webui Fooocus and its generic checkpoint model Juggernaut XL with default styles),

3. DALLE-3 (using ChatGPT Plus), and

4. Midjourney (using the latest version 6).

(Note: These images are not cherry-picked. I ran each prompt only once. Where multiple images were generated with a single prompt, I selected the best of the group, but otherwise no manipulation was used to improve any of the results.)

Comparisons

As you can see in the above comparisons, these models do a very respectable job creating images. ImageFX, Gemini and Stable Diffusion are the only fully free models of the group (though both Gemini and ImageFX require a Google account) and that is impressive, especially considering they tend to do a better job in many instances than DALLE-3... particularly with realism.

Single pass prompting doesn't really do justice to each model's capabilities, but for the sake of comparison, it is the fairest method. Additionally, each tool can generate better results when prompting them in the style best suited for the particular model. But again, for comparison purposes, I felt it best to use simple natural language prompting for consistency and ease.


ImageFX and Gemini

Because ImageFX and Gemini use the same image generating diffusion model, they share some features and limits... but there are also some notable differences.

  1. Both are limited to square formats.
  2. Neither can upscale images.
  3. Neither can inpaint or outpaint.
  4. Gemini is able to use natural language iteration to update a previously generated image, while ImageFX requires a fully updated prompt to be submitted for subsequent changes.
  5. Unpredictably, Gemini and ImageFX may produce between one and four images per generation. I suspect this is due to a cap on tokens per generation. Tokens can be thought of as the currency of compute: more tokens equal more compute. As these tools are free to use, I suspect that under such a cap, more complicated images use up more tokens, which decreases the number of images in a particular generation. BUT... this is purely an educated guess on my part, in lieu of any other obvious reason.


Some Final Thoughts

ImageFX is a very viable option for individuals and companies who wish to generate their own images. Being free to use comes with some limitations in features and capabilities compared with the paid version of Imagen 2. However, these limitations won't affect most use cases and aren't dissimilar to those found in its main rival, DALLE-3.

Additionally, as I've said before, it is always good to use more than one image generation tool, since each can produce different results. It also helps keep your images from all sharing a similar aesthetic, which can happen over time when you use the same tool with similar prompting.

With ImageFX, Google has taken a significant step in catching up to OpenAI's DALLE-3 and, in the process, given us all another very viable tool for our image generation arsenal.


I hope you have enjoyed this article and found it helpful. If you made it down to the bottom here, please like and follow. It helps get me motivated to do more articles like these. Thanks!

#Gemini, #ImageFX, #Stable Diffusion, #Midjourney, #DALL-E 3, #Diffusion Models, #Image Generation, #Imagen 2


The comparison below is provided for those interested in understanding some of the differences and similarities among the tools I used in this article.

Tools compared:

  1. File Formats: Gemini generates images in .jpg, while ImageFX, Stable Diffusion and Midjourney generate their images in .png, and DALLE-3 uses .webp.
  2. Size Formats: ImageFX and Gemini both generate images in 1536 x 1536 square format only. Stable Diffusion allows for the user to select one of 26 formats (square, portrait or landscape). DALLE-3 and Midjourney generate images in 1024 x 1024 square format only.
  3. Upscaling: Only Stable Diffusion and Midjourney can upscale images within their respective tools.
  4. Privacy: Only Stable Diffusion is run locally and is, therefore, fully private. All the other services require prompts to be transferred to the cloud and may use your prompts for product improvement and training purposes.
  5. Cost: ImageFX, Gemini and Stable Diffusion are free, though both ImageFX and Gemini require a Google account. DALLE-3 requires a paid subscription to ChatGPT Plus but can also be used through Copilot with a valid Microsoft account. Midjourney requires a subscription, and its interface is through Discord for all but extreme power users who have early website access.
  6. Filtering: Stable Diffusion is the only one of the five tools that is unfiltered. NSFW (not safe for work) images can be created with Stable Diffusion, so it is important to be aware of this, particularly in a working environment shared with coworkers. In the first week of their release, both ImageFX and Gemini were refusing to create images from very innocuous prompts, responding with: "I can't generate images of that. Try asking me to generate images of something else." This seems to have been resolved, as I didn't experience any of that during the writing of this article.
  7. Iterations: Only Gemini and DALLE-3 allow for natural language prompt iteration, meaning that you can update prompts by requesting changes rather than editing and resubmitting a new complete prompt. This is because their front ends are large language models: Google's Gemini LLM in Gemini's case and GPT-4 in DALLE-3's. All the others need fully updated prompts to be resubmitted.
  8. Text: ImageFX, Gemini and DALLE-3 all did quite well (generally) with text inside of generated images. Although Midjourney has made strides in this area, it can still struggle. But generally, all models need some coaxing to get text right.
  9. Photorealism: DALLE-3 struggles the most with generating photorealistic images. Its images tend toward an illustrative aesthetic, while all the others tend to do quite well in providing photorealistic images.
  10. Repeatability: ImageFX provides the seed numbers it uses to generate images but not the filtered/extended prompt. Gemini provides neither, making it challenging to create repeatable or consistent images. DALLE-3, Stable Diffusion and Midjourney can all provide seed numbers, and DALLE-3 and Stable Diffusion can provide any additional prompt details used when applying styles.
  11. Advanced features: Only Stable Diffusion and Midjourney have truly robust advanced features that allow for both direct editing of created images and more specific control during the creation process. Midjourney and Stable Diffusion both have the ability to inpaint, outpaint, upscale, use image-to-image and apply styles. However, Stable Diffusion stands alone in being able to use checkpoint models, LoRAs, style combinations, face swapping and ControlNet (a positioning/posing tool), as well as many other unique extensions designed to edit and control the generation process itself.

The above list is not exhaustive, but it does give you a sense of how these tools are different and how they are similar.
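
For a rough sense of what the size difference in item 2 means in practice, the short Python sketch below does the pixel arithmetic. The dimensions are simply the ones I reported in the list; the dictionary labels and the `pixel_count` helper are hypothetical names used only for this illustration.

```python
# Output sizes as reported in the comparison list (width, height in pixels).
sizes = {
    "ImageFX / Gemini": (1536, 1536),
    "DALLE-3 / Midjourney": (1024, 1024),
}


def pixel_count(width, height):
    """Total number of pixels in an image of the given dimensions."""
    return width * height


counts = {name: pixel_count(*dims) for name, dims in sizes.items()}
ratio = counts["ImageFX / Gemini"] / counts["DALLE-3 / Midjourney"]

for name, count in counts.items():
    print(f"{name}: {count:,} pixels ({count / 1_000_000:.2f} MP)")
print(f"Ratio: {ratio:.2f}x")
```

By this arithmetic, a 1536 x 1536 image holds 2.25 times as many pixels as a 1024 x 1024 one, which matters if you plan to crop or print the results without upscaling.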

