Imagination is the limit with ControlNet 1.1
Meme swapping is an easy task with ControlNet

I’ve been backlogged on AI articles since so many new products come out every day, so it’s probably time for me to jump from LLM stuff back to generative art! If you want to follow me on Substack as I try to break into AI learning, you can go here.

Last week, ControlNet on Stable Diffusion got updated to 1.1, which boosts the performance and quality of images, while also having models for more specific use cases.

[Image: Full Metal Alchemist character as a real child, showing the web UI screen]

With ControlNet, you can apply a text prompt on top of an existing image while keeping its traits, or pose a 3D model any way you like and describe how you want the final image to look. Sounds confusing? An example would be taking an anime character and turning it into a real-life photo:

You can also go from real life to anime as well, here’s Elon Musk as an anime villain:

[Image: Elon as an anime villain]

Intrigued? If you don’t care about installation instructions yet, you can skip the next section.

To install, you’ll need the ControlNet models from Hugging Face. To download them all at once, make sure you have Git and Git LFS installed, and run:

git lfs install

git clone https://huggingface.co/lllyasviel/ControlNet-v1-1

This will download all the models (around 10GB) to your computer, or you can download them manually here. Make sure the models go into the webui/models/ControlNet folder and NOT the ControlNet-v1-1 folder created by the git clone.

For those needing to install Stable Diffusion from scratch, I’ll create a tutorial someday when I’m not busy keeping up to date with everything else.

In the Stable Diffusion web UI, you can search for sd-webui-controlnet under the Extensions → Available tab, OR, if you already have it, select Extensions → Installed and click “Check for updates”. Once installed, it will appear under the txt2img tab, and you’ll need to expand the ControlNet section near the bottom. Then click “Enable”, and select the preprocessor and model for the corresponding type of image you want.

[Image: you’ll need to go to txt2img and expand this section]

Finally, it helps to use something like Realistic Vision, a photorealism-trained model for Stable Diffusion that you can use with ControlNet.

Alright! Let’s take a look at ControlNet and the latest features. With 1.1, the first thing you’ll notice is that there are WAY MORE preprocessors:

[Image: the expanded list of preprocessors in 1.1]

There are also corresponding models for each one, and ControlNet 1.1 produces much higher-quality images with fewer artifacts than 1.0. Let’s take a look at some of these options:

Canny is for when you want to carry the likeness of one picture over to new generations. So I can take a cute picture of a puppy:

[Image: a cute picture of a puppy]

And generate several other ones:

[Image: several generated variations of the puppy]
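Under the hood, the Canny preprocessor boils the photo down to an edge map, and the model conditions on those edges rather than the pixels themselves. As a rough illustration of the idea, here’s a minimal sketch using a simple gradient threshold rather than full Canny (assuming NumPy is available; the web UI does the real version for you):

```python
import numpy as np

def edge_map(gray: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Crude edge detector: mark pixels where brightness changes sharply."""
    gray = gray.astype(float) / 255.0
    gy, gx = np.gradient(gray)        # vertical / horizontal brightness gradients
    magnitude = np.hypot(gx, gy)      # gradient strength per pixel
    return (magnitude > threshold).astype(np.uint8) * 255

# A white square on a black background yields edges along its border.
img = np.zeros((64, 64), dtype=np.uint8)
img[16:48, 16:48] = 255
edges = edge_map(img)
print(edges.max(), edges.min())  # 255 0
```

The edge map (white lines on black) is exactly the kind of control image you see in the web UI’s preview pane: the generation is free to change colors and textures, but the edges pin down the composition.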

Depth builds a 3D-like depth map of the image, then applies a new prompt on top of it. So, taking this meme:

[Image: the original meme]

And generating a depth map of it means you can then change the character:

[Image: Elon’s face after coming up with a tweet]


[Image: just added Spiderman to the prompt]

Let’s try some architecture. There’s an option called MLSD, which is meant for straight lines and is good for redesigning a room. For example, let’s take Seinfeld’s iconic apartment:

[Image: Seinfeld’s iconic apartment]

Let’s give it a makeover; my only prompt was “a modern family room”:

[Image: the room redesigned as a modern family room]

As you can see, all the lines are in the same places, and it updated the various objects around them.

Another feature, Segmentation, breaks the image into labeled regions and wraps the prompt on top of them. One cool example is taking a city skyline like San Francisco’s:

[Image: the San Francisco skyline]

And wondering what this skyline could look like in 50 years (if the buildings never got any higher):

[Image: the skyline reimagined 50 years from now]

OpenPose is a favorite of mine: give it an image and it will extract everyone’s pose and recreate it according to your prompt:

[Image: first it’s the Avengers…]


[Image: now it’s a K-Pop band]

You can even generate your own stick-figure pose and wrap a new image on top:

[Images: a custom stick-figure pose and the images generated from it]

A new feature is Shuffle, which takes an existing image and moves objects around; it does this by distorting the original image and then generating a new version of it:

[Image: a Shuffle example]

This means you can take an existing photo of a skyline and mix the elements together, or take a comic book character and shuffle around the positions while maintaining a lot of the original detail.

The other new 1.1 options include Lineart, which traces the lines of the original image, then recolors it in stunning detail:

[Image: a Lineart result; the original image was AI too]

There’s even an anime-specific setting for this, with pretty good results; note this was generated from a real picture of Steph Curry:

[Image: Steph Curry in the next anime movie]

Finally, a famous one is Scribble, which returns with multiple variants. You just need to draw a rough picture like this:

[Image: a rough scribble]

Add a prompt like “mountain range” and you’ll get:

[Image: a generated mountain range]

Or is it…a camel’s back?

[Image: a six-legged camel, but it didn’t have much scribble to go on…]

Alright, now that we’ve covered many of the controls you can layer on top of an existing photo, where else can your imagination go?

As it turns out, people have been using ControlNet to generate videos. By taking each frame and applying a control on top, you can turn any existing video into a brand-new scene: turn any movie into an anime, or any anime into a 3D-graphics movie. The sky is the limit here, and other projects are already catching on. The tricky part is keeping the same prompt and style across frames, and also deflickering the final video, because it will have some lighting inconsistencies.
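Conceptually, the video workflow is just a per-frame loop with a fixed prompt. Here’s a minimal sketch of that loop, where restyle_frame is a hypothetical stand-in for a real ControlNet img2img call, not an actual API:

```python
from pathlib import Path
import tempfile

def restyle_frame(frame: bytes, prompt: str) -> bytes:
    # Stand-in for a ControlNet img2img call; a real pipeline would keep
    # the prompt and seed fixed across frames to reduce flicker.
    return frame

def restyle_video_frames(frames_dir: Path, out_dir: Path, prompt: str) -> int:
    """Apply the same prompt to every extracted frame, in order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    frames = sorted(frames_dir.glob("*.png"))
    for frame_path in frames:
        styled = restyle_frame(frame_path.read_bytes(), prompt)
        (out_dir / frame_path.name).write_bytes(styled)
    return len(frames)

# Tiny demo with placeholder frame files standing in for extracted video frames.
work = Path(tempfile.mkdtemp())
src, dst = work / "frames", work / "styled"
src.mkdir()
for i in range(3):
    (src / f"frame_{i:04d}.png").write_bytes(b"fake-frame")
count = restyle_video_frames(src, dst, "anime style")
print(count)  # 3
```

The restyled frames then get reassembled into a video (e.g. with ffmpeg), which is exactly the step where flicker shows up and where the tools below come in.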

Two new tools have just come out to help. While EbSynth has been popular, you can use TemporalKit to stitch together frames that have been superimposed on the existing video, or there is now Mov2mov, where you simply drop in a video, give a prompt, and it outputs a new video using the methods we just talked about. However, I have tried this and it doesn’t work well yet. I am hoping it is not a virus…

That’s all I have time for today, but I hope you’re up to speed on the latest release of ControlNet and some of the video-stitching programs that are coming out. With Runway Gen-2 released today as of writing, there needs to be some open-source competition to keep things moving!
