Exploring the AI-Generated Image Landscape
TLDR: In this post I reflect on my early experiences with digitising images, computer vision and AI-generated content (AIGC). I discuss concerns around AIGC and conduct my own experiment across Midjourney and DALL·E 2 to generate images of New Zealand native birds.
When I was 16 years old, I used a flatbed scanner for the first time; it gave me the power to digitise photographs and load them into a computer. That technology set me on a path to study computer science and computer vision at university almost 30 years ago.
The morphing sequence in Michael Jackson's Black or White video in 1991 inspired me to go further, and I was soon making my own morphing videos on my home computer, photographing all my friends and then digitising the images. I remember the machine rendering for days to generate the frames for a five-second video. When I showed the video to my friends and teacher at school, they couldn't believe what they were seeing. This was before the Internet!
Computer vision then endured a long winter, with few large advances until the launch of the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2010. ILSVRC encouraged research in computer vision and benchmarked the progress of image recognition algorithms.
ImageNet was conceived in 2006 by Fei-Fei Li, a Stanford University professor. The project aimed to create a vast database of labelled images to facilitate the development of computer vision algorithms. Spanning over 14 million annotated images, ImageNet covers a diverse range of objects, scenes, and concepts, organised according to the WordNet hierarchy. The labelling exercise was outsourced to people around the world using Amazon Mechanical Turk.
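To get a feel for that hierarchy, here is a small illustrative sketch using NLTK's WordNet interface; it looks up the kea (a bird that appears later in this post) and prints the chain of broader categories it sits under. This only illustrates the WordNet structure that ImageNet is organised around, not the ImageNet dataset itself.

```python
# pip install nltk
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

# Look up the kea in WordNet and walk up its hypernym chain --
# this is the kind of hierarchy ImageNet categories hang off.
kea = wn.synsets("kea")[0]
print(kea.definition())
for path in kea.hypernym_paths():
    print(" -> ".join(synset.name() for synset in path))
```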
ILSVRC helped spark the deep learning breakthrough and drove rapid progress in image recognition. Five years ago, I wrote a post about this.
We are now entering a new phase in the advancement of AI-Generated Content (AIGC).
One of the most notable breakthroughs in AI image generation was the development of Generative Adversarial Networks (GANs) by Ian Goodfellow and his team in 2014. GANs consist of two neural networks, a generator and a discriminator, trained against each other: the generator learns to produce realistic images while the discriminator learns to tell generated images apart from real ones.
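To make the generator/discriminator idea concrete, here is a minimal, hypothetical sketch of the adversarial training step in PyTorch. The layer sizes, learning rates and 28x28 image shape are illustrative assumptions, not taken from Goodfellow's paper or any production model.

```python
import torch
import torch.nn as nn

latent_dim = 100  # size of the random noise vector the generator starts from

# Generator: noise -> fake image (flattened 28x28, pixels in [-1, 1])
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh(),
)

# Discriminator: image -> probability that the image is real
discriminator = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

loss = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images):
    """One adversarial update. real_images is a (batch, 784) tensor."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1. Train the discriminator to separate real from generated images.
    noise = torch.randn(batch, latent_dim)
    fake_images = generator(noise)
    d_loss = loss(discriminator(real_images), real_labels) + \
             loss(discriminator(fake_images.detach()), fake_labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2. Train the generator to fool the discriminator.
    g_loss = loss(discriminator(fake_images), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Each step pushes the discriminator to spot fakes and the generator to defeat it; that tug-of-war is what eventually yields realistic images.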
Building on the success of GANs, researchers at NVIDIA developed StyleGAN and its successors, StyleGAN2 and StyleGAN3. These models improved the quality of generated images and introduced new features, such as style mixing and the ability to control different aspects of an image.
The advances behind Large Language Models (LLMs) have brought about an interesting phenomenon: the creation of images from text prompts. Three of the most popular services leveraging this technology are OpenAI's DALL·E 2, which powers Microsoft's new Image Creator, along with Midjourney and Stable Diffusion.
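Stable Diffusion is openly available, so you can try text-to-image generation yourself. Here is a minimal sketch using the Hugging Face diffusers library; the model id, sampling settings and the assumption of a CUDA GPU are all illustrative choices, not a recommendation of any particular setup.

```python
# pip install diffusers transformers torch
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (illustrative model id) and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Generate an image from a text prompt and save it to disk.
prompt = "kākāpō on the forest floor in the New Zealand native bush"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("kakapo.png")
```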
Like any new technology, there are concerns around AI-generated image creation.
Before you can use the Microsoft Azure OpenAI Service, you need to agree to the company's Responsible AI policies. I worked with Natasha Crampton at Microsoft New Zealand for many years. Natasha moved to Redmond in 2018 to become chief counsel to the Aether Committee, and she is now Microsoft's Chief Responsible AI Officer, posting often about the responsible AI program.
With all this as a backdrop, this morning I conducted my own experiment across Midjourney and DALL·E 2, carrying on the theme of native birds from Aotearoa that I started five years ago.
My first prompt: "three cute baby kākāpō in the New Zealand native bush, 3D Pixar cartoon style"
Some observations:
I then turned to a more photorealistic prompt: "kākāpō on the forest floor in the New Zealand native bush."
Some observations:
If you search for images of kākāpō, you often get a photo of a kea instead, with its distinctive hooked beak. This feature seems to have incorrectly made its way into both Midjourney's and DALL·E's models.
I was very impressed by this imagined image from Midjourney.
That said, if you zoom in you will see what appears to be jumbled copyright text at the bottom of the image.
When I was building my own kākāpō classifier five years ago, I noticed that most of the high-quality photos of kākāpō on the web had copyright notices watermarked into the images. A visual image search from the generated image links to copyrighted photos like this one.
The morphing between the kākāpō and the kea got me thinking, and I created the same prompt for a kea.
The distinctive curved beak certainly came through in these examples. Again, the images produced by DALL·E 2 were out of proportion and less accurate. The following imagined kea photo is fantastic.
The next thing I thought I would try was generating an image from a reference photo.
I picked this reference photo from Rob Pine to "re-imagine".
Where things got really interesting was when I used that image as a reference and dialled up the emotion by editing my prompt to include things like expressive human eyes. When I did this, the strangest thing happened: Midjourney imagined what appear to be two bird-like children, focusing in on the emotion in the eyes.
The landscape of AI image generation is evolving rapidly. As researchers, developers, and policymakers navigate it, striking a balance between innovation and responsible development will be key to unlocking the full potential of AI-generated images.
I am only just scratching the surface of what these tools can do. If you are keen to start experimenting for yourself, I recommend checking out the Introduction to Prompt Engineering for Generative AI course on LinkedIn Learning, which is currently free.