Artificial Creativity

Andrew Cross, Ph.D

Video. Technology. Innovation.

Published: September 25, 2022

By Andrew Cross, Ph.D

Generating images.

I started programming as a teenager on a Sinclair ZX Spectrum, which had a staggering image resolution of 256x192 pixels; this is how I started learning to code 3D wireframe graphics. This became my lifelong obsession, and over the years I have written programs that have generated billions of images shown on TV screens in most countries in the world. In a sad sign of how long I have loved everything to do with pixels, my daughter - who is now a teenager herself - tells me that coding in C++ automatically adds 40 years to whatever age I claim to be.

Code that generates images generally follows the same basic recipe: take an input and then apply some set of mathematical steps in order to build an image. That output is then taken and “magically” displayed on a monitor for the world to admire.
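As a minimal sketch of this recipe, the Python below treats the pixel coordinates as the input and applies a handful of arithmetic steps to get a color. The function names and the particular color pattern are illustrative assumptions, not code from any real product.

```python
def shade(x, y, width, height):
    """Compute one pixel's color from its coordinates using a few arithmetic steps."""
    r = int(255 * x / (width - 1))    # red ramps from left to right
    g = int(255 * y / (height - 1))   # green ramps from top to bottom
    b = (x ^ y) & 0xFF                # blue is an XOR pattern of the coordinates
    return (r, g, b)


def render(width, height):
    """Build the whole image by running the same steps for every pixel."""
    return [[shade(x, y, width, height) for x in range(width)]
            for y in range(height)]


# 256x192 is the ZX Spectrum resolution mentioned earlier.
image = render(256, 192)
print(len(image), "rows of", len(image[0]), "pixels")
```

Each pixel here depends only on its own coordinates, which is what keeps the per-pixel cost down to a few operations.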

In real-time video processing you might use about 50 mathematical steps to compute the color of each pixel; you cannot use a lot more because you need to keep up with the deluge of new frames coming into the system. If you are really fancy, have a fast computer and want to start doing all sorts of advanced image filtering then you might end up doing 500 operations for each image pixel!
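To get a feel for why the per-pixel budget is so tight, here is a rough calculation in Python. The frame size and frame rate (1920x1080 at 60 frames per second) are assumptions chosen for illustration; only the 50 and 500 operation counts come from the text.

```python
# Assumed video format for illustration; the article does not specify one.
WIDTH, HEIGHT, FPS = 1920, 1080, 60


def ops_per_second(ops_per_pixel):
    """Total operations needed each second to keep up with incoming frames."""
    return WIDTH * HEIGHT * FPS * ops_per_pixel


for ops in (50, 500):
    total = ops_per_second(ops)
    print(f"{ops} ops/pixel -> {total / 1e9:.1f} billion ops/second")
```

Under these assumptions, 50 operations per pixel already means several billion operations every second, and 500 per pixel is ten times that.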

As you start to do far more sophisticated image creation, you move into the realm of 3D rendering techniques like raytracing. The complexity increases pretty quickly as you start to code operations that traverse tree structures, compute indirect lighting, integrate color from volume light sources and run complex surface shaders. It would not be uncommon to need to use 5 million compute operations for every pixel on the image.

Understanding pixels.

As a coder, your mind instinctively learns to understand what each sequence of operations you type into the console will do. It becomes second nature to chain together a few thousand operations and have some relatively predictable image show on a monitor when you run the program. To understand how a programmer sees this, imagine a single piece of paper with a lot of numbers printed on it. One can see how, by adding them all together - especially if there is some kind of pattern to them - and doing this repeatedly for each pixel, an image can be built up.

If you take one step further down the rabbit hole, image generation gets more complex: you start using algorithms that make the color of each pixel dependent not just on the input, but also on all of the pixels that surround it in the image.

As you start making pixels dependent on other pixels, you quickly start burning through many millions of operations to make images. As complex as this might sound, it is well within the grasp of what we can all understand; while it might take a few mind games to wrap your head around how pixels interact with each other, you can normally get a “feel” for what your program is doing after a bit.
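One of the simplest examples of pixels depending on their neighbours is a box blur, sketched below in Python. This is a generic textbook filter picked for illustration, not a specific algorithm the article refers to, and the tiny 5x5 grayscale image is made up.

```python
def box_blur(image):
    """Return a copy where each pixel is the average of its 3x3 neighbourhood."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    # Neighbours that fall off the edge are skipped.
                    if 0 <= ny < height and 0 <= nx < width:
                        total += image[ny][nx]
                        count += 1
            out[y][x] = total / count
    return out


# A single bright pixel in a dark image spreads into its neighbours.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 9.0
blurred = box_blur(image)
for row in blurred:
    print(row)
```

Even this small filter reads up to nine input pixels for every output pixel, and chaining several such passes multiplies the cost quickly.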

If you go back ten years and look at the imagery produced using ray-tracing, fractals or complex image filtering at about the complexity level described, the results are remarkable even today. More remarkably, the millions of operations that those images took to generate, and the hours of processing they required, can now be performed in real time.

Never content, once we know that we can use millions of operations to create a pixel, it is human nature to ask ourselves what happens when we start to take over a billion steps to craft the color of every dot in an image?

As one moves from millions of operations to billions of operations, something changes.

Misunderstanding pixels.

At high precision, most computer systems represent each pixel using a few billion discrete values for each of the red, green and blue channels; that is a lot. Conceptually it feels like something has changed when we are using billions of operations: you are now performing more mathematical operations for each pixel than there are values that can even be represented. Somehow this seems to cross into an uncanny valley in which it becomes almost impossible to instinctively understand how a pixel is being formed.

When a billion operations contribute to each color, it is simply no longer possible for an individual to ever write out or directly conceive of a computer program formed from unique operations in any reasonable way. For the purpose of illustration, imagine that each mathematical operation is represented by a written word. A 400-page book typically contains about 100,000 words. This means that a book of a billion words would need to be about 4 million pages long. If this was a computer program required to determine the color of just a single pixel, it would take a programmer 76 years to type out if they worked 24 hours a day.
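The arithmetic behind these figures can be checked in a few lines of Python. The typing speed is not stated in the text; the last step simply works backwards from the 76-year figure to see what speed it implies.

```python
WORDS_PER_BOOK = 100_000     # from the text: a typical 400-page book
PAGES_PER_BOOK = 400
OPERATIONS = 1_000_000_000   # one "word" per mathematical operation

books = OPERATIONS / WORDS_PER_BOOK   # number of 400-page books needed
pages = books * PAGES_PER_BOOK        # total pages

# Work backwards from 76 years of non-stop typing to the implied speed.
YEARS = 76
minutes = YEARS * 365.25 * 24 * 60
words_per_minute = OPERATIONS / minutes

print(f"{books:,.0f} books, {pages:,.0f} pages")
print(f"implied typing speed: about {words_per_minute:.0f} words per minute")
```

The page count comes out at 4 million, and the 76-year estimate corresponds to typing roughly 25 words every minute without ever stopping.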

This is simply not possible, of course. Instead, what we do is create very simplified programs that chain together billions of mathematical operations in a predictable structure called a “Neural Network”. Because no human could ever contemplate how to design all the parameters that this network would require by hand, what we do is give a computer millions of examples of what we want and then let it work out how best to set these parameters to solve the problem.
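The idea of letting the computer find the parameters from examples can be shown at toy scale. The Python sketch below is a deliberately tiny stand-in: a single linear unit with two parameters, trained by plain gradient descent on five made-up examples, rather than a real neural network with millions of parameters. The data, learning rate and step count are all assumptions chosen for illustration.

```python
# Examples of what we want: inputs paired with desired outputs (here y = 2x + 1).
examples = [(x, 2 * x + 1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

w, b = 0.0, 0.0          # the parameters that nobody sets by hand
learning_rate = 0.1      # assumed value; small enough for this problem to converge

for step in range(2000):
    grad_w = grad_b = 0.0
    for x, target in examples:
        error = (w * x + b) - target             # how wrong the current guess is
        grad_w += 2 * error * x / len(examples)  # gradient of the mean squared error
        grad_b += 2 * error / len(examples)
    # Nudge each parameter in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.3f}, b={b:.3f}")
```

Nothing in the loop knows the rule behind the examples; the parameters drift toward it only because each step reduces the error. Real networks apply the same principle to vastly more parameters and examples.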

This is quite profound, and I wonder how many people understand that humanity has reached a point at which computers can teach themselves to do things that we do not truly understand how to achieve directly ourselves. Today it is relatively easy for a modern neural network to recognize objects in an image as well as any human can. No coder has ever come remotely close to actually achieving this same level of performance with a hand-authored program. Every day, when we use speech recognition on our cell phones, face recognition to unlock a phone or have emails identified as spam - we are using programs for which all of the millions of parameters were discovered by a computer and not by humans.

Somehow it does not seem threatening to us that computers can now program themselves to do things that we could not. We all seem at peace with the fact that humans remain special; after all, we have the ability to improvise and innovate while computers just follow the rules. We can still sleep well at night.

Sleeping.

It did not take very long for smart people working on neural networks to realize that if a computer can run a program that takes an image and tells us what is in it, then surely we could run that same program backwards and do the opposite. One might then be able to tell a computer what we are interested in and it would generate an image of that for us. Amazingly, this process works.

It is not quite as simple as it sounds and took years of iterative improvements by smart engineers for this approach to be refined. It then took many years of computing time and terabytes of image databases, but progress never goes backwards. A few years ago we created and trained the first neural networks that allowed you to describe in detail what you want in an image, and they would create what you want with remarkable fidelity. More profoundly, you can tell it what you want and it will generate an unlimited number of different images that represent what you want.

Dreaming.

I work at a company called Grass Valley where we make live video production tools. As part of our job we build the technology that helps the world’s biggest media companies create most of the TV shows you probably watch. Like most others, our company is working to make our tools so that they can run in Cloud datacenters and let people create an entire TV station at the click of a button.

One afternoon this week, I got bored at work and decided that I wanted to come up with some creative advertising ideas for our products. When I get bored, people who work with me always start to get very worried, because invariably the “clever ideas” end up involving lasers and create chaos for everyone around me. I suppose this is why they often quickly mark themselves “offline”.

Deciding that I should limit the blast radius of this chaos to myself, I fired up the Python programming language and an AI image generator. I gave it the prompt “creative advertising images for cloud live production” and asked it to generate a few hundred images for me. The results are staggering.

Inspired, I tried generating far more general “beautiful creative images” to see what is possible. Below are some of the image ideas that the computer generated. It is worth pausing to reinforce that I am not cheating in any way: every pixel of every image was entirely generated by a computer program that learned from lots of examples of reality. I then simply instructed it “to be creative” as a prompt (with a few image style guides).

If what makes us different from computers is our ability to improvise and innovate, then we might need to think again.

Waking up.

When my daughter (who now codes) was born, one of the most profound things that a friend told me was that I should never look back. She would grow up into being her own person and that I should simply appreciate every day that she was part of our life before she left home to start her own path.

Maybe we need to start feeling the same way about the technologies that we create. If it is not creativity that makes us special, maybe what does is that we can marvel at the technologies we create as they inevitably surpass our own abilities.

Jason Holtkamp

Software Development Engineer @ Amazon

2 years ago

Awesome commentary, Andrew. One thing I often wonder is: who will leverage the creative power of these new models to their highest capacity? Just because these tools are capable of incredible creativity doesn't mean that everyone will be equally proficient at extracting that creativity... these tools still require a human input in the form of a prompt with which to start their creative process. It follows that some individuals and organizations will be more skilled at leveraging the power of these tools than others. In other words, when everyone has a prodigy artist at their side, waiting for a project idea, who walks away with the best painting?

Robert Pray

Multimedia Producer at DVMedia+

2 years ago

Andrew Cross, Ph.D, great read and some stunning images to go with it. My first was a Commodore 64.

Mathieu Woisson

IT and AVoIP expert, Product Manager A/V

2 years ago

Great read, felt very personal. Thanks Andrew for sharing these images, insights and feelings.

Christian Parker

Quality Assurance Team Lead at Docsink

2 years ago

Glad to hear your daughter is teaching you a few things. Good for her and tell her I said hello

Oswaldo Garcia

Customer Support Manager, Full Stack Developer

2 years ago

Awesome article on AI. Thank you for sharing your insight.
