Imagination is the limit with ControlNet 1.1
I’ve been backlogged on AI articles since so many new products come out every day, so it’s probably time for me to jump from LLM stuff back to generative art stuff! If you want to follow me on Substack as I try to break into AI learning, you can go here.
Last week, ControlNet for Stable Diffusion got updated to 1.1, which boosts performance and image quality while also adding models for more specific use cases.
With ControlNet, you can apply a text prompt on top of an existing image and keep its traits, or tweak a pose on a 3D model into any position and write a description of how you want the final image to look. Sound confusing? An example would be taking an anime character and turning it into a real-life photo:
You can also go the other way, from real life to anime; here’s Elon Musk as an anime villain:
Intrigued? If you don’t care about installation instructions yet, you can skip the next section.
To install, you’ll need the ControlNet models from Hugging Face. To grab them all at once, make sure you have Git and Git LFS installed, then run:
git lfs install
git clone https://huggingface.co/lllyasviel/ControlNet-v1-1
This will download all the models (around 10GB) to your computer. Or you can download them manually here. Make sure the model files end up in the webui/models/ControlNet folder and NOT in the ControlNet-v1-1 folder that you git cloned.
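If you only want one or two of the models instead of the whole multi-gigabyte clone, here’s a rough sketch using the huggingface_hub library (the exact checkpoint filename is my assumption, so check the repo’s file list, and point local_dir at your own webui folder):

from huggingface_hub import hf_hub_download

# Grab just the canny checkpoint from the ControlNet 1.1 repo
path = hf_hub_download(
    repo_id="lllyasviel/ControlNet-v1-1",
    filename="control_v11p_sd15_canny.pth",   # assumed filename, check the repo
    local_dir="webui/models/ControlNet",      # adjust to wherever your webui lives
)
print("Saved to", path)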
For those needing to install Stable Diffusion from scratch, I’ll create a tutorial someday when I’m not busy keeping up to date with everything else.
In the Stable Diffusion web UI, you can search for sd-webui-controlnet under the Extensions → Available tab, OR, if you already have it, select Extensions → Installed and click “Check for updates”. Once installed, ControlNet will show up under the txt2img tab, where you’ll need to expand the ControlNet section near the bottom. Then click “Enable” and select the preprocessor and model for the type of image you want.
Finally, it helps to use something like Realistic Vision, a photorealistic checkpoint trained for Stable Diffusion that you can use together with ControlNet.
Alright! Let’s take a look at ControlNet and the latest features. With 1.1, the immediate thing you’ll notice is that there are WAY MORE preprocessors:
There are also corresponding models for each one, and ControlNet 1.1 produces much higher-quality images with fewer artifacts than 1.0. Let’s take a look at some of these options:
Canny is for when you want to take the likeness of one picture and carry it over to new generations: it traces the edges of the original and uses them as the outline for whatever you prompt. So I can take a cute picture of a puppy:
And generate several other ones:
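If you’d rather script this than click through the webui, here’s a minimal sketch of the same canny idea using the diffusers library; the model IDs, thresholds, and prompt are my own assumptions, not exactly what was used for the images above:

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler

# Preprocessor step: turn the source photo into a Canny edge map
image = np.array(Image.open("puppy.png").convert("RGB"))
edges = cv2.Canny(image, 100, 200)          # low/high thresholds are a guess
edges = np.stack([edges] * 3, axis=-1)      # 1-channel edges -> 3-channel image
control_image = Image.fromarray(edges)

# Load the ControlNet 1.1 canny model plus a base SD 1.5 checkpoint
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# Generate a new puppy that follows the same edges
result = pipe(
    "a corgi puppy sitting in a field of flowers",
    image=control_image,
    num_inference_steps=20,
).images[0]
result.save("puppy_canny.png")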
Depth estimates a depth map of the image, essentially a rough 3D representation of it, and applies that to a new prompt. So taking this meme:
And generating a 3D version of it means you can then change the character:
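Under the hood, the depth preprocessor just runs a monocular depth estimator over the picture before the ControlNet model ever sees it. Here’s a quick sketch of that step using the transformers depth-estimation pipeline (the filenames are placeholders):

from transformers import pipeline
from PIL import Image

# Estimate a depth map from a single image; brighter pixels are closer to the camera
depth_estimator = pipeline("depth-estimation")
depth_map = depth_estimator(Image.open("meme.png"))["depth"]
depth_map.save("meme_depth.png")
# Feed this map to the depth ControlNet model, or let the webui preprocessor do it for you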
Let’s try some architecture stuff. There is an option called MLSD, which is meant for straight lines and is good for redesigning a room. For example, let’s take Seinfeld’s iconic room:
Let’s give it a makeover; my only prompt was “a modern family room”:
As you can see, all the lines are in the same places, and the objects around them got updated.
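If you’re curious what the MLSD preprocessor is actually producing, you can run it on its own with the controlnet_aux package and look at the line map; here’s a sketch (class and repo names assumed from that package), which you’d then pair with the MLSD ControlNet model the same way as the canny example earlier:

from controlnet_aux import MLSDdetector
from PIL import Image

# Extract only the straight lines from the room photo
mlsd = MLSDdetector.from_pretrained("lllyasviel/Annotators")
line_map = mlsd(Image.open("seinfeld_room.png"))
line_map.save("room_lines.png")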
Another feature, Segment, splits the image into color-coded regions (buildings, sky, road, and so on) and wraps the prompt on top of that layout. One cool example is taking a city skyline like San Francisco’s:
And wondering what this skyline could look like in 50 years (if the buildings never got any higher):
OpenPose is a favorite of mine: you give it an image, it extracts all the body poses, and then it recreates them according to your prompt:
You can even generate your own stick figure position and wrap a new image on top:
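Those stick figures are just OpenPose keypoint maps; if you want to generate them outside the webui, controlnet_aux has a detector for that too (again, names assumed from that package):

from controlnet_aux import OpenposeDetector
from PIL import Image

# Pull the body poses out of a photo and render them as a stick-figure image
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_image = openpose(Image.open("group_photo.png"))
pose_image.save("pose.png")   # use this as the conditioning image for the openpose model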
A new feature is Shuffle, which takes an existing image and moves its elements around; it does this by warping the original image and then generating a new version from it:
This means you can take an existing photo of a skyline and mix the elements together. Or you can take a comic book character and shuffle the composition around while keeping a lot of the original detail.
The other new 1.1 options include Lineart, which traces the line work of the original image and then re-renders the colors in stunning detail:
There’s even an anime-specific setting for this with pretty good results; note this was generated from a real picture of Steph Curry:
Finally, a famous one is Scribble, which comes back with multiple preprocessor variants. You just need to draw a rough picture like this:
Add a prompt like “mountain range” and you’ll get:
Or is it…a camel’s back?
Alright, now that we’ve described many of the types of layers you can add to existing photos, where else can your imagination take you?
As it turns out, people have been using ControlNet to generate videos. By taking each frame and applying a layer on top of it, you can turn any existing video into a brand-new scene: turn any movie into an anime, or any anime into a 3D-graphics movie. The sky is the limit here, and other projects are already catching on. The tricky part is keeping the same prompt and style on every frame, and then deflickering the final video, because it will have some inconsistencies in lighting.
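To make the per-frame idea concrete, here’s a naive sketch that reads a video with OpenCV, runs each frame through a hypothetical stylize_frame function (you’d wrap the canny pipeline from earlier in it, with a fixed prompt and seed), and writes the frames back out; it does no deflickering, which is exactly why results like this tend to flicker:

import cv2
import numpy as np
from PIL import Image

def stylize_frame(frame: Image.Image) -> Image.Image:
    # Placeholder: run your ControlNet pipeline here with the same prompt and seed
    # for every frame, so the style stays as consistent as possible
    raise NotImplementedError

cap = cv2.VideoCapture("input.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
writer = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    styled = np.array(stylize_frame(rgb))
    if writer is None:
        h, w = styled.shape[:2]
        writer = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    writer.write(cv2.cvtColor(styled, cv2.COLOR_RGB2BGR))

cap.release()
writer.release()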
There are two new tools that have just come out to help. While EbSynth has been popular, you can use TemporalKit to stitch together frames that have been superimposed on the existing video, or there is now Mov2mov, where you simply drop in a video, give it a prompt, and it outputs a new video using the methods we just talked about. However, I have tried it and it doesn’t work well yet. I am hoping this is not a virus…
That’s all I have time for today, but I hope you’re now up to speed on the latest ControlNet release and some of the video-stitching programs that are coming out. With Runway Gen-2 released today as of writing, there needs to be some open-source competition to keep things moving!