An endless world of images, done by the machine

For some years now I've been loosely following the AI space with great interest, watching how some professional fields are being pushed into a corner by improving AI that will, in time, take over work tasks from people. Some industries already use AI in some capacity, and there will certainly be more. Soon. One of the most talked-about fields has been self-driving cars, and they are well on their way. It's enough to watch the YouTube channel 'AI Driver' to see that.

Previously I took some comfort in the notion that the creative fields would be among the last to be affected by AI. That was before DALL-E showed up, a text-to-image generator from OpenAI. Some of you may have seen images created by DALL-E 2, which are staggeringly accurate and beautiful. But of course, DALL-E 2 is just one system out of several; many companies are working in this area. Google recently announced its 'Imagen' system.

'Midjourney'

So, just to inform myself, I decided to try this thing out. How should I, as a professional designer, approach this? Is there a need for worry, or will this be just one of many tools I use in a few years' time?

The header image for this article was created using 'Midjourney', and the text used to create it was: "eyeball with insect legs, realistic". On top of that there are some 'modifiers' that determine the style of it. In 60 seconds Midjourney spits out four variations, and that's more or less all I have to do. An important thing to remember here is that this is a beta. None of these systems are widely available yet, which means they will improve fast in the coming years.
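
To make the workflow concrete, here is a minimal sketch of what such a prompt-plus-modifiers request could look like in Python. Midjourney itself is driven through a Discord bot rather than a public API, so the endpoint, parameters and response format below are purely illustrative assumptions.

```python
import requests

# Hypothetical endpoint -- Midjourney has no public API like this;
# the URL, parameters and response shape are assumptions for illustration.
API_URL = "https://example.com/v1/text-to-image"

prompt = "eyeball with insect legs, realistic"
modifiers = ["photorealistic", "studio lighting"]  # style modifiers appended to the prompt

payload = {
    "prompt": ", ".join([prompt] + modifiers),
    "variations": 4,  # Midjourney returns four variations per prompt
}

response = requests.post(API_URL, json=payload, timeout=120)
response.raise_for_status()

# Assume the service answers with a list of image URLs to review.
for i, url in enumerate(response.json().get("images", []), start=1):
    print(f"Variation {i}: {url}")
```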

A few examples of the images I generated just after getting access to Midjourney:

So, as you can see, some prompts work better than others. Or rather, some of the AI's interpretations match our own expectations better than others. Abstract images work pretty well and turn out great, while humans and animals work less well, as you can see in this example it made of the Devil.

Modifiers

Many well-known artists have now become 'modifiers' in these systems. I can write "skateboarding worm in the style of Rubens, Michelangelo, Chris Foss, Giger" and so on, and it will generate in that style. So the question arises: who owns that 'modifier', or style? After all, some artists have spent an entire career perfecting a certain style. Even brands like Unreal Engine and ArtStation have become modifiers. Just as some words are banned for sensitive-content reasons, maybe well-known names and brands will come with a fee to use in these systems? (After a comment about this I felt it should be in here, as it's a really important question.)
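
As a small illustration: a style modifier is really just text appended to the prompt. The sketch below builds one prompt per style, using the names from the example above; the exact template phrasing is my own assumption.

```python
# A style 'modifier' is just text appended to the base prompt.
# The template phrasing here is an assumption for illustration.
base_prompt = "skateboarding worm"
style_modifiers = ["Rubens", "Michelangelo", "Chris Foss", "Giger", "Unreal Engine"]

prompts = [f"{base_prompt}, in the style of {style}" for style in style_modifiers]

for p in prompts:
    print(p)
```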

Controls and direction

Also, it seems to favor a straight-on angle to the subject, and this is where the lack of controls starts to become apparent. I tried adding 'three quarter view' as a modifier, but that didn't really do much.

This example also shows a similar problem. I couldn't get this tank to appear in full view, or as a 'wide shot', no matter what prompts I tried; it always cropped it by rendering it too close. It's easy to imagine improvements being made here to gain more control. Or you could just look at it like you're directing someone else. On the other hand, I'm a total noob at this, so there's that.

Another thing it struggles with is relationships. It doesn't really understand 'on', 'inside', 'beside' and things like that, which currently also makes it harder to direct.

While trying prompts for the above image I encountered censorship for the first time. Certain words are banned from the system; in this case it was 'flesh'. This is to prevent obscene and upsetting content from being created, which opens up a whole new can of worms. The authors of these upcoming systems will, in some form or another, dictate what you can and cannot create. Obviously this is an important topic, as you don't want people to be able to create images of well-known people committing criminal acts, for instance.
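
Purely to illustrate the idea of a banned-word list, here is a deliberately naive sketch. How these systems actually filter prompts is not public, so the word list and matching logic below are assumptions.

```python
# Naive prompt filter -- the real moderation logic is not public,
# so this word list and matching approach are assumptions.
BANNED_WORDS = {"flesh"}

def is_prompt_allowed(prompt: str) -> bool:
    """Return False if the prompt contains any banned word."""
    words = prompt.lower().replace(",", " ").split()
    return not any(word in BANNED_WORDS for word in words)

print(is_prompt_allowed("rotting flesh, realistic"))  # False
print(is_prompt_allowed("eyeball with insect legs"))  # True
```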

Some things it just can't get right, though, like this one of a supposed Yeti.

As you can see, the AI creates variations of the prompt you enter, and from there you can iterate on them indefinitely.

Diffusion

Most people's initial reaction to this tech is that it's just photobashing millions of images from across the internet into new ones that follow the prompt the user wrote. That is not what is happening here. The system uses something called 'diffusion', a generative model that starts off with an image of noise and iterates on it until it matches the prompt's description. It's like image tagging in reverse, so there's a language model at work here too, called GPT-3. It's better explained in this video by Cold Fusion TV.
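
For those curious about what 'starting from noise and iterating' means, here is a heavily simplified, conceptual sketch. The 'denoiser' below is just a stand-in; in a real diffusion system a trained neural network, guided by a text encoder, predicts how to remove a little noise at each step.

```python
import numpy as np

def fake_denoiser(image, target):
    """Stand-in for the trained network: a real diffusion model predicts,
    at each step, how to remove a little noise so the image moves toward
    something that matches the text prompt."""
    return target - image

rng = np.random.default_rng(0)

# In a real system the prompt is turned into an embedding by a language model;
# here a random 'target' image stands in for 'what the prompt wants'.
prompt = "eyeball with insect legs, realistic"
target = rng.standard_normal((64, 64, 3))

# Start from pure noise and nudge it toward the target over many small steps.
image = rng.standard_normal((64, 64, 3))
for step in range(50):
    image += 0.1 * fake_denoiser(image, target)

print(f"Mean distance to target after denoising: {np.abs(image - target).mean():.4f}")
```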

While we're on the techy stuff, I can mention that this doesn't require a monster machine to use. I can use it on my phone if I want to, as the prompts are sent to a server and executed there. It then sends the images back to me to review.

Reflections

I must say, though, that it's really fun to just play around with this and see what comes out. I sat up well into the night on my first try, and each image sparks new ideas. As far as concepting goes, this will be unbeatable, I would say, once it's fully ready. The AI almost always creates the unexpected, which can be both frustrating and fun, depending on how you use it.

Another thing I noticed was the lack of ownership I felt over these images. I didn't really care about losing them or saving them, because I didn't really put any effort into them. Had I made the asteroid skull above in ZBrush, for instance, and rendered it in KeyShot, I would have been pretty pleased with it and perhaps even put it in my portfolio. Now it's really not that important to me, which I find interesting.

Naturally, this is already able to create animations as well. And with Nvidia's latest announcement about tech that can create 3D scenes from images, it will soon be able to do 3D content too.

