Exploring the AI-Generated Image Landscape
TLDR: In this post I reflect on my early experiences with digitising images, computer vision and AI-generated content (AIGC). I discuss concerns around AIGC and conduct my own experiment across Midjourney and DALL·E 2 to generate images of New Zealand native birds.
When I was 16 years old, I used a flatbed scanner for the first time; it gave me the power to digitise photographs and load them into a computer. That technology set me on a path to study computer science and computer vision at university almost 30 years ago.
The morphing sequence in Michael Jackson's Black or White video in 1991 inspired me to go further, and I was soon making my own morphing videos on my home computer, photographing all my friends and then digitising the images. I remember the machine rendering for days to generate the frames for a five-second video. When I showed the video to my friends and teacher at school, they couldn't believe what they were seeing. This was before the Internet!
Computer vision then endured a long winter, with few large advances until the launch of the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2010. ILSVRC encouraged research in computer vision and benchmarked the progress of image recognition algorithms.
ImageNet was conceived in 2006 by Fei-Fei Li, a Stanford University professor. The project aimed to create a vast database of labelled images to facilitate the development of computer vision algorithms. Spanning over 14 million annotated images, ImageNet covers a diverse range of objects, scenes, and concepts, organised according to the WordNet hierarchy. The labelling exercise was outsourced to people around the world using Amazon Mechanical Turk.
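To get a feel for that hierarchy, here is a small illustrative sketch using NLTK's WordNet interface; it looks up the kea (a bird that appears later in this post) and prints the chain of broader categories it sits under. This only illustrates the WordNet structure that ImageNet is organised around, not the ImageNet dataset itself.

```python
# pip install nltk
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

# Look up the kea in WordNet and walk up its hypernym chain --
# this is the kind of hierarchy ImageNet categories hang off.
kea = wn.synsets("kea")[0]
print(kea.definition())
for path in kea.hypernym_paths():
    print(" -> ".join(synset.name() for synset in path))
```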
ILSVRC helped spark the deep learning breakthrough and drove rapid progress in image recognition. Five years ago, I wrote a post about this.
We are now entering a new phase in the advancement of AI-Generated Content (AIGC).
One of the most notable breakthroughs in AI image generation was the development of Generative Adversarial Networks (GANs) by Ian Goodfellow and his team in 2014. GANs consist of two neural networks, a generator and a discriminator, trained against each other: the generator learns to produce realistic images while the discriminator learns to tell generated images apart from real ones.
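To make the generator/discriminator idea concrete, here is a minimal, hypothetical sketch of the adversarial training step in PyTorch. The layer sizes, learning rates and 28x28 image shape are illustrative assumptions, not taken from Goodfellow's paper or any production model.

```python
import torch
import torch.nn as nn

latent_dim = 100  # size of the random noise vector the generator starts from

# Generator: noise -> fake image (flattened 28x28, pixels in [-1, 1])
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh(),
)

# Discriminator: image -> probability that the image is real
discriminator = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

loss = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images):
    """One adversarial update. real_images is a (batch, 784) tensor."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1. Train the discriminator to separate real from generated images.
    noise = torch.randn(batch, latent_dim)
    fake_images = generator(noise)
    d_loss = loss(discriminator(real_images), real_labels) + \
             loss(discriminator(fake_images.detach()), fake_labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2. Train the generator to fool the discriminator.
    g_loss = loss(discriminator(fake_images), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Each step pushes the discriminator to spot fakes and the generator to defeat it; that tug-of-war is what eventually yields realistic images.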
Building on the success of GANs, researchers at NVIDIA developed StyleGAN and its successors, StyleGAN2 and StyleGAN3. These models improved the quality of generated images and introduced new features, such as style mixing and the ability to control different aspects of an image.
The advances behind Large Language Models (LLMs) have brought about an interesting phenomenon: the creation of images from text prompts. Three of the most popular services leveraging this technology are OpenAI's DALL·E 2, which powers Microsoft's new Image Creator, along with Midjourney and Stable Diffusion.
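Stable Diffusion is openly available, so you can try text-to-image generation yourself. Here is a minimal sketch using the Hugging Face diffusers library; the model id, sampling settings and the assumption of a CUDA GPU are all illustrative choices, not a recommendation of any particular setup.

```python
# pip install diffusers transformers torch
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (illustrative model id) and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Generate an image from a text prompt and save it to disk.
prompt = "kākāpō on the forest floor in the New Zealand native bush"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("kakapo.png")
```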
Like any new technology, there are concerns around AI-generated image creation.
Before you can use the Microsoft Azure OpenAI Service, you need to agree to the company's Responsible AI policies. I worked with Natasha Crampton at Microsoft New Zealand for many years. Natasha moved to Redmond in 2018 to become chief counsel to the Aether Committee, and she is now Microsoft's Chief Responsible AI Officer, posting often about the responsible AI program.
With all this as a backdrop, this morning I conducted my own experiment across Midjourney and DALL·E 2, carrying on the theme of native birds from Aotearoa that I started five years ago.
My first prompt: "three cute baby kākāpō in the New Zealand native bush, 3D Pixar cartoon style"
Some observations:
I then turned to a more photorealistic prompt: "kākāpō on the forest floor in the New Zealand native bush."
Some observations:
If you search for images of kākāpō, you often get a photo of a kea instead, with its distinctive hooked beak. This feature seems to have incorrectly made its way into both Midjourney's and DALL·E's models.
I was very impressed by this imagined image from Midjourney.
That said, if you zoom in you will see what appears to be jumbled copyright text at the bottom of the image.
When I was building my own kākāpō classifier five years ago, I noticed that most of the high-quality photos of kākāpō on the web had copyright notices watermarked into the images. A visual image search from the generated image links to copyrighted photos like this one.
The morphing between the kākāpō and the kea got me thinking, and I created the same prompt for a kea.
The distinctive curved beak certainly came through in these examples. Again, the images produced by DALL·E 2 were out of proportion and less accurate. The following imagined kea photo is fantastic.
The next thing I thought I would try was generating an image from a reference photo.
I picked this reference photo from Rob Pine to "re-imagine".
Where things got really interesting was when I used that image as a reference and dialled up the emotion by editing my prompt to include things like expressive human eyes. When I did this, the strangest thing happened: Midjourney imagined what appear to be two bird-like children, focusing in on the emotion in the eyes.
The landscape of AI image generation is evolving rapidly. As researchers, developers, and policymakers navigate it, striking a balance between innovation and responsible development will be key to unlocking the full potential of AI-generated images.
I am only just scratching the surface of what these tools can do. If you are keen to start experimenting for yourself, I recommend checking out the Introduction to Prompt Engineering for Generative AI course on LinkedIn Learning, which is currently free.