Exploring the AI-Generated Image Landscape
Imagine three cute baby kākāpō in the New Zealand native bush, 3D cartoon style.


TLDR: In this post I reflect on my early experiences with digitising images, computer vision and AI-generated content (AIGC). I discuss concerns around AIGC and conduct my own experiment across Midjourney and DALL.E 2 to generate images of New Zealand native birds.

When I was 16 years old, I first used a flatbed scanner, which gave me the power to digitise photographs and load images into a computer. This technology set me on a path to study computer science and computer vision at university almost 30 years ago.

The morphing sequence in Michael Jackson's Black or White video in 1991 inspired me to go further. I was soon making my own morphing videos on my home computer, photographing all my friends and then digitising the images. I remember the computer rendering for days to generate the frames for a 5-second video. When I showed the video to my friends and teacher at school, they couldn't believe what they were seeing. This was before the Internet was part of everyday life!

Computer vision then went through a long winter without major advances until the launch of the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2010. ILSVRC encouraged research in computer vision and benchmarked the progress of image recognition algorithms.

ImageNet was conceived in 2006 by Fei-Fei Li, now a professor at Stanford University. The project aimed to create a vast database of labelled images to facilitate the development of computer vision algorithms. Spanning over 14 million annotated images, ImageNet covers a diverse range of objects, scenes, and concepts, organised according to the WordNet hierarchy. The labelling work was crowdsourced to people around the world via Amazon Mechanical Turk.

ILSVRC helped drive the rise of deep learning and rapid progress in image recognition. Five years ago, I wrote a post about this.

We are now entering a new phase in the advancement of AI-Generated Content (AIGC).

The history of Generative AI in CV, NLP and VL.
Statistics of model size and training speed across different models.
The general structure of generative vision language.

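One of the most notable breakthroughs in AI image generation was the development of Generative Adversarial Networks (GANs) by Ian Goodfellow and his team in 2014. GANs consist of two neural networks, the generator and the discriminator, that are trained against each other: the generator creates images, while the discriminator tries to tell them apart from real ones. As the discriminator gets better at spotting fakes, the generator is pushed to produce ever more realistic images.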
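
To make the adversarial idea a little more concrete, here is a minimal sketch in plain Python of the two loss functions a GAN optimises. It assumes the discriminator outputs a probability that an image is real, and uses binary cross-entropy with the "non-saturating" generator loss suggested in the original GAN paper. The function names and the sample scores are hypothetical, and no networks are actually trained; real implementations compute these losses over batches of tensors in a deep learning framework.

```python
import math
from typing import Sequence


def bce(p: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy of one predicted probability p against a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """The discriminator is rewarded for scoring real images near 1 and fakes near 0.

    d_real / d_fake are the discriminator's outputs D(x) for real images and
    D(G(z)) for generated images (assumed to be probabilities in [0, 1]).
    """
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss: the generator wants D(G(z)) close to 1,
    i.e. it is rewarded when its fakes fool the discriminator."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # Hypothetical discriminator outputs, not real model scores.
    confident_real, confident_fake = [0.9, 0.95], [0.1, 0.05]
    print(discriminator_loss(confident_real, confident_fake))  # small: D is winning
    print(generator_loss(confident_fake))                      # large: G must improve

    # At equilibrium D can only guess, outputting 0.5 for everything.
    print(round(discriminator_loss([0.5], [0.5]), 3))  # 1.386 (= 2 ln 2)
    print(round(generator_loss([0.5]), 3))             # 0.693 (= ln 2)
```

When the discriminator is confident, its loss is small and the generator's loss is large, which is the signal that pushes the generator to improve. At the theoretical equilibrium, where the discriminator can do no better than guess, the two losses settle at 2 ln 2 and ln 2 respectively.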

Building on the success of GANs, researchers at NVIDIA developed StyleGAN and its successors, StyleGAN2 and StyleGAN3. These models improved the quality of generated images and introduced new features, such as style mixing and the ability to control different aspects of an image.

Advances in large language models (LLMs) and generative image models have brought about an interesting phenomenon: the creation of images from text prompts. Three of the most popular services built on this technology are OpenAI's DALL.E 2, which is behind Microsoft's new Image Creator, Midjourney, and Stable Diffusion.

As with any new technology, there are concerns around AI-generated image creation:

  1. Ethical concerns - As AI-generated images become more realistic, concerns about ethics and misuse have emerged. One such concern is deepfakes, which involve manipulating images or videos to make it appear as though someone is doing or saying something they did not. These can be used for nefarious purposes, such as spreading misinformation or defaming individuals. Recent examples include the fake Trump arrest photos.
  2. Bias in AI Systems - AI-generated images can inadvertently perpetuate harmful stereotypes or biases. Since AI models learn from existing data, they can absorb and reproduce the biases present in that data. It is crucial for researchers and developers to actively work towards reducing biases in AI-generated images to ensure a more inclusive and diverse representation of people and objects. Jenka Gurfinkel does a great job of describing the influence of the "American smile" on Midjourney.
  3. Environmental Impact - Training large AI models like GANs requires substantial computational resources, which can have a significant environmental impact. As the field advances, it is essential to consider the ecological footprint of these technologies and develop more energy-efficient models.
  4. Copyright and Ownership - As AI-generated content becomes more prevalent, questions of copyright and ownership arise. Determining the intellectual property rights for images generated by AI systems is a complex issue that will require new legal frameworks and guidelines. The same questions are shaking up the music industry, where they have sparked a heated online debate.

Before you can use the Microsoft Azure OpenAI Service, you need to agree to the company's Responsible AI policies. I worked with Natasha Crampton at Microsoft New Zealand for many years. Natasha moved to Redmond in 2018 to take up the role of chief counsel to the AETHER Committee. She is now Microsoft's Chief Responsible AI Officer and posts often about the responsible AI program.

With all this as a backdrop, this morning I conducted my own experiment across Midjourney and DALL.E 2, continuing the theme of native birds from Aotearoa that I started five years ago.

Building an image classifier for NZ native birds.

My first prompt: "three cute baby kākāpō in the New Zealand native bush, 3D Pixar cartoon style"

DALL.E 2: three cute baby kākāpō in the New Zealand native bush, 3D Pixar cartoon style
Midjourney: three cute baby kākāpō in the New Zealand native bush, 3D Pixar cartoon style

Some observations:

  • Midjourney did a better job of imagining the background of NZ native bush.
  • DALL.E put three of the birds up a tree despite the fact that the kākāpō is flightless.

I then turned to a more photorealistic prompt: "kākāpō on the forest floor in the New Zealand native bush."

DALL.E 2: kākāpō on the forest floor in the New Zealand native bush
Midjourney: kākāpō on the forest floor in the New Zealand native bush

Some observations:

  • Again, Midjourney did a better job of imagining the background of NZ native bush including ferns.
  • DALL.E produced a bird that appeared to be a hybrid of many NZ native parrots including the kākāpō, the kea and the kākā.

If you search for images of kākāpō, you often get a photo of a kea instead, with its distinctive hooked beak. This feature seems to have made its way incorrectly into both Midjourney's and DALL.E's models.

Image search results incorrectly including a kea alongside kākāpō.


Incorrect and correct beak styling for the kākāpō

I was very impressed by this imagined image from Midjourney:

Midjourney kākāpō on the forest floor in the New Zealand native bush

That said, if you zoom in, you will see what appears to be jumbled copyright text at the bottom of the image.

What appears to be jumbled copyright text at the bottom of the image.

When I was building my own kākāpō classifier five years ago, I noticed that most of the high-quality photos of kākāpō on the web had copyright notices watermarked into them. A visual image search using the generated image returns copyrighted photos like this one.

The morphing between the kākāpō and the kea got me thinking, so I ran the same prompt for a kea.

DALL.E 2: kea on the forest floor in the New Zealand native bush
Midjourney: kea on the forest floor in the New Zealand native bush

The distinctive curved beak certainly came through in these examples. Again, the images produced by DALL.E 2 were out of proportion and less accurate. The following imagined kea photo is fantastic.

Imagined kea photo

Next, I thought I would try to generate an image from a reference photo.

I picked this reference photo from Rob Pine to "re-imagine".

DALL.E 2 reimagining Rob Pine's photo of two kea playing in the snow
Midjourney reimagining Rob Pine's photo of two kea playing in the snow

Things got really interesting when I used that image as a reference and dialled up the emotion by editing my prompt to include things like expressive human eyes. When I did this, the strangest thing happened: Midjourney imagined what appear to be two bird-like children, focusing on the emotion in their eyes.

Midjourney imagined what appear to be two bird-like children morphed with two kea in the snow, focusing on the emotion in the eyes.

The landscape of AI image generation is evolving rapidly. As researchers, developers, and policymakers navigate it, striking a balance between innovation and responsible development will be key to unlocking the full potential of AI-generated images.

I am only just scratching the surface of what these tools can do. If you are keen to start experimenting for yourself, I recommend the Introduction to Prompt Engineering for Generative AI course on LinkedIn Learning, which is currently free.

Dan Te Whenua Walker

Ngāti Ruanui | Aotea Waka | Proud dad

1 yr

The Fake Trump photo got me for a second

Eisa Q.

Senior Partner Development Manager @ Microsoft | Partner Solutions, Sales

1 yr

What an awesome image

Drew Robbins

Engineering Leader | Driving Innovation and Observability in Generative AI Applications

1 yr

Great article and very interesting experiments that support the earlier points about copyright, etc.

Nimish Rao

Data, Analytics and AI || Driving Business outcomes with Data, Analytics and AI

1 yr

Wow this is so cool Nigel.

Igor Portugal

Technology Innovator | Fractional CxO | AI | Cyber Security | Investor | Author | Empowering Businesses, Enhancing Lives: Uniting technology and human insight for a more prosperous, enjoyable, smarter and safer world.

1 yr

This is a very good overview and you are raising some very interesting questions. Time to think.


More articles by Nigel Parker

  • Birthing a Startup: Our First Trimester

    The last three months at Vivara have been nothing short of exhilarating. Transitioning back into startup mode after…

    16 comments
  • Analysing Data with ChatGPT-4 Vision

    TL/DR Massive potential when interpreting and explaining data, still making lots of mistakes. Our household is an early…

    15 comments
  • In the Wake of SVB's Collapse How Safe are Bank Deposits in NZ?

    By now you will no doubt have heard about the bank run at SVB last week and be wondering what further implications this…

    3 comments
  • The Freedom to Change

    The Saros cycle is an eclipse cycle with a period of 18 years, 11 days and 8 hours. It was discovered by ancient…

    149 comments
  • AI Futures

    During an employee all hands when Ahmed Mazhari made a recent trip to Australia, he was asked what book he was…

    11 comments
  • Te Wiki Hauora Tāne

    A boy born in Aotearoa today will be 20% more likely to die of a heart attack and 30% more likely to get type 2…

    5 comments
  • Me kite, me rongo, me kōrero te reo Māori

    Last Saturday was Juneteenth Microsoft acknowledges this day by encouraging employees to spend the following Monday to…

    3 comments
  • 2020 A Year of Curiosity

    Six years ago my friend Vaughan spoke about being an impossible person he has this theory that we are all capable of…

    19 comments
  • Fear a Black Planet

    As a 13 year old I remember being dropped off at the Logan Campbell Centre in Auckland to attend my first concert with…

    3 comments
  • Privacy & Confidentiality in COVID-19 Responses

    Check out this nice little Q&A from Troy Hunt and Elaine van Bergen on Privacy & confidentiality in COVID-19 responses…

    1 comment
