Imagination is the limit with ControlNet 1.1
I’ve been backlogged on AI articles since so many new products come out every day, so it’s probably time for me to jump from LLM stuff back to generative art stuff! If you want to follow me on Substack as I try to break into AI learning, you can go here.
Last week, ControlNet for Stable Diffusion got updated to 1.1, which boosts performance and image quality while also adding models for more specific use cases.
With ControlNet, you can apply a text prompt on top of an existing image and keep its traits, or tweak a pose on a 3D model into any position and write a description of how you want the final image to look. Sound confusing? An example would be taking an anime character and turning it into a real-life photo:
You can also go the other way, from real life to anime; here’s Elon Musk as an anime villain:
Intrigued? If you don’t care about installation instructions yet, you can skip the next section.
To install, you’ll need the ControlNet models from Hugging Face. To grab them all at once, make sure you have Git and Git LFS installed, then run:
git lfs install
git clone https://huggingface.co/lllyasviel/ControlNet-v1-1
This will download all the models (around 10GB) to your computer. Or you can download them manually here. Make sure the model files end up in the webui/models/ControlNet folder and NOT in the ControlNet-v1-1 folder that you git cloned.
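If you only want one or two of the models instead of the whole multi-gigabyte clone, here’s a rough sketch using the huggingface_hub library (the exact checkpoint filename is my assumption, so check the repo’s file list, and point local_dir at your own webui folder):

from huggingface_hub import hf_hub_download

# Grab just the canny checkpoint from the ControlNet 1.1 repo
path = hf_hub_download(
    repo_id="lllyasviel/ControlNet-v1-1",
    filename="control_v11p_sd15_canny.pth",   # assumed filename, check the repo
    local_dir="webui/models/ControlNet",      # adjust to wherever your webui lives
)
print("Saved to", path)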
For those needing to install Stable Diffusion from scratch, I’ll create a tutorial someday when I’m not busy keeping up to date with everything else.
In the Stable Diffusion web UI, you can search for sd-webui-controlnet under the Extensions → Available tab, OR, if you already have it, select Extensions → Installed and click “Check for updates”. Once installed, ControlNet will show up under the txt2img tab, where you’ll need to expand the ControlNet section near the bottom. Then click “Enable” and select the preprocessor and model for the type of image you want.
Finally, it helps to use something like Realistic Vision, a photorealistic checkpoint trained for Stable Diffusion that you can use together with ControlNet.
Alright! Let’s take a look at ControlNet and the latest features. With 1.1, the immediate thing you’ll notice is that there are WAY MORE preprocessors:
There are also corresponding models for each one, and ControlNet 1.1 produces much higher-quality images with fewer artifacts than 1.0. Let’s take a look at some of these options:
Canny is for when you want to take the likeness of one picture and carry it over to new generations: it traces the edges of the original and uses them as the outline for whatever you prompt. So I can take a cute picture of a puppy:
And generate several other ones:
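If you’d rather script this than click through the webui, here’s a minimal sketch of the same canny idea using the diffusers library; the model IDs, thresholds, and prompt are my own assumptions, not exactly what was used for the images above:

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler

# Preprocessor step: turn the source photo into a Canny edge map
image = np.array(Image.open("puppy.png").convert("RGB"))
edges = cv2.Canny(image, 100, 200)          # low/high thresholds are a guess
edges = np.stack([edges] * 3, axis=-1)      # 1-channel edges -> 3-channel image
control_image = Image.fromarray(edges)

# Load the ControlNet 1.1 canny model plus a base SD 1.5 checkpoint
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# Generate a new puppy that follows the same edges
result = pipe(
    "a corgi puppy sitting in a field of flowers",
    image=control_image,
    num_inference_steps=20,
).images[0]
result.save("puppy_canny.png")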
Depth estimates a depth map of the image, essentially a rough 3D representation of it, and applies that to a new prompt. So taking this meme:
And generating a 3D version of it means you can then change the character:
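Under the hood, the depth preprocessor just runs a monocular depth estimator over the picture before the ControlNet model ever sees it. Here’s a quick sketch of that step using the transformers depth-estimation pipeline (the filenames are placeholders):

from transformers import pipeline
from PIL import Image

# Estimate a depth map from a single image; brighter pixels are closer to the camera
depth_estimator = pipeline("depth-estimation")
depth_map = depth_estimator(Image.open("meme.png"))["depth"]
depth_map.save("meme_depth.png")
# Feed this map to the depth ControlNet model, or let the webui preprocessor do it for you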
Let’s try some architecture stuff. There is an option called MLSD, which is meant for straight lines and is good for redesigning a room. For example, let’s take Seinfeld’s iconic room:
Let’s give it a makeover; my only prompt was “a modern family room”:
As you can see, all the lines are in the same places, and the objects around them got updated.
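If you’re curious what the MLSD preprocessor is actually producing, you can run it on its own with the controlnet_aux package and look at the line map; here’s a sketch (class and repo names assumed from that package), which you’d then pair with the MLSD ControlNet model the same way as the canny example earlier:

from controlnet_aux import MLSDdetector
from PIL import Image

# Extract only the straight lines from the room photo
mlsd = MLSDdetector.from_pretrained("lllyasviel/Annotators")
line_map = mlsd(Image.open("seinfeld_room.png"))
line_map.save("room_lines.png")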
Another feature, Segment, splits the image into color-coded regions (buildings, sky, road, and so on) and wraps the prompt on top of that layout. One cool example is taking a city skyline like San Francisco’s:
And wondering what this skyline could look like in 50 years (if the buildings never got any higher):
OpenPose is a favorite of mine: you give it an image, it extracts all the body poses, and then it recreates them according to your prompt:
You can even generate your own stick figure position and wrap a new image on top:
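Those stick figures are just OpenPose keypoint maps; if you want to generate them outside the webui, controlnet_aux has a detector for that too (again, names assumed from that package):

from controlnet_aux import OpenposeDetector
from PIL import Image

# Pull the body poses out of a photo and render them as a stick-figure image
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_image = openpose(Image.open("group_photo.png"))
pose_image.save("pose.png")   # use this as the conditioning image for the openpose model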
A new feature is Shuffle, which takes an existing image and moves its elements around; it does this by warping the original image and then generating a new version from it:
This means you can take an existing photo of a skyline and mix the elements together. Or you can take a comic book character and shuffle the composition around while keeping a lot of the original detail.
The other new 1.1 options include Lineart, which traces the line work of the original image and then re-renders the colors in stunning detail:
There’s even an anime-specific setting for this with pretty good results; note this was generated from a real picture of Steph Curry:
Finally, a famous one is Scribble, which comes back with multiple preprocessor variants. You just need to draw a rough picture like this:
Add a prompt like “mountain range” and you’ll get:
Or is it…a camel’s back?
Alright, now that we’ve described many of the types of layers you can add to existing photos, where else can your imagination take you?
As it turns out, people have been using ControlNet to generate videos. By taking each frame and applying a layer on top of it, you can turn any existing video into a brand-new scene: turn any movie into an anime, or any anime into a 3D-graphics movie. The sky is the limit here, and other projects are already catching on. The tricky part is keeping the same prompt and style on every frame, and then deflickering the final video, because it will have some inconsistencies in lighting.
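To make the per-frame idea concrete, here’s a naive sketch that reads a video with OpenCV, runs each frame through a hypothetical stylize_frame function (you’d wrap the canny pipeline from earlier in it, with a fixed prompt and seed), and writes the frames back out; it does no deflickering, which is exactly why results like this tend to flicker:

import cv2
import numpy as np
from PIL import Image

def stylize_frame(frame: Image.Image) -> Image.Image:
    # Placeholder: run your ControlNet pipeline here with the same prompt and seed
    # for every frame, so the style stays as consistent as possible
    raise NotImplementedError

cap = cv2.VideoCapture("input.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
writer = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    styled = np.array(stylize_frame(rgb))
    if writer is None:
        h, w = styled.shape[:2]
        writer = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    writer.write(cv2.cvtColor(styled, cv2.COLOR_RGB2BGR))

cap.release()
writer.release()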
There are two new tools that have just come out to help. While EbSynth has been popular, you can use TemporalKit to stitch together frames that have been superimposed on the existing video, or there is now Mov2mov, where you simply drop in a video, give it a prompt, and it outputs a new video using the methods we just talked about. However, I have tried it and it doesn’t work well yet. I am hoping this is not a virus…
That’s all I have time for today, but I hope you’re now up to speed on the latest ControlNet release and some of the video-stitching programs that are coming out. With Runway Gen-2 released today as of writing, there needs to be some open-source competition to keep things moving!