How to run a Custom Stable Diffusion pipeline with GPUs: A Python Tutorial
If you're a Software / ML Engineer without access to GPUs and you want to run a custom Stable Diffusion (SD) pipeline with nothing but Python code and a Cloud engine of your choice, this tutorial is meant to help you avoid a deep online rabbit hole.
I’ve implemented an example of a Stable Diffusion pipeline in the GitHub repository below:
This article will also go through how to best get access to GPUs and CUDA environments to run it.
Is this tutorial right for you?
Most guides online talk about how to train these large models, not about running inference. Even if a tutorial is about running the model, it usually points to an online platform (e.g. Dreamstudio), a desktop app (e.g. DrawThings.ai), or an inference endpoint (e.g. Replicate) that does everything for you. But if you want to own the entire Python code used to run inference on the model, there’s not much available, which is why I created this tutorial.
This article came out of a project I did a few weeks ago. My goal was to create a text-guided image-to-image pipeline where you could pass in some images and style them in a particular way based on the prompts you entered.
My rules were the following:
This guide will not explain how the model works. If you are interested, you should check out the Stable Diffusion with 🧨 Diffusers blog post or The Annotated Diffusion Model.
My Setup
Let's get started
We’ll break down the process into 4 steps.
STEP 1: Choose a Cloud GPU Provider and get your environment set up
Acquiring a GPU instance can be a time-consuming and occasionally frustrating process, as most GPU-enabled instances on the major cloud computing providers (GCP, AWS) are already taken.
I found that Lambda Labs had the easiest access to GPUs at a low cost. All the instances also come pre-installed with the Lambda Stack, which contains PyTorch and the NVIDIA libraries such as CUDA and cuDNN.
# to check if you have a GPU
nvidia-smi
# to check if you have CUDA enabled
python -c "import torch; print(torch.cuda.is_available())"
Once you have access to the instance with CUDA-enabled PyTorch installed, we can start to explore the img2img-pipeline GitHub repo.
Even if you don’t have CUDA enabled, that’s fine. The pipeline will still work; it will just run more slowly on the CPU instead of the GPUs.
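As a reference, here is a minimal sketch of the standard PyTorch pattern for falling back to the CPU when no GPU is available (the variable names are illustrative, not taken from the repo):

import torch

# pick the GPU when CUDA is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device}")

# models and tensors are then moved onto the chosen device, e.g.
# pipe = pipe.to(device)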
STEP 2: Set up the img2img-pipeline GitHub repo
Details are all in the README file of the repo. But I’ll outline the steps below anyway.
To get started, download the repo.
git clone https://github.com/zarifaziz/img2img-pipeline.git
Install the requirements by running:
# enter the repo
cd img2img-pipeline
# install requirements
pip install -r requirements.txt
# some extra libraries needing manual install
pip install typer diffusers transformers loguru accelerate xformers
STEP 3: Generate some images
Navigate to the data/input_images folder and upload some images that you want to stylize. Common image formats such as PNG or JPEG should work.
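If you want to sanity-check your inputs before running the pipeline, a quick PIL snippet (illustrative, not part of the repo) confirms that each image loads and converts cleanly:

from pathlib import Path

from PIL import Image

# verify every input image opens and can be normalized to RGB
# (assumes the folder contains only image files)
for path in Path("data/input_images").iterdir():
    img = Image.open(path).convert("RGB")
    print(path.name, img.size)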
You can run the pipeline to make sure it’s all working with:
python -m src.img2img_pipeline.commands.main run_pipeline
The command above processes all the images in the data/input_images directory at once, picking a different model and prompt each time from the lists stored in src/img2img_pipeline/constants.py.
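As an illustrative sketch of what those lists might look like (the list names, model IDs, and prompts below are examples I’ve chosen; check constants.py for the repo’s actual values):

# src/img2img_pipeline/constants.py (illustrative example values)
MODELS = [
    "stabilityai/stable-diffusion-2",
    "runwayml/stable-diffusion-v1-5",
]

PROMPTS = [
    "in the style of picasso",
    "a dreamy watercolor painting",
]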
If you want to run img2img on a single image with more control instead, you can do so with:
python -m src.img2img_pipeline.commands.main \
run_single_image_pipeline example_image.png \
--prompt "in the style of picasso" \
--model "stabilityai/stable-diffusion-2"
Most importantly, feel free to fork the repo and make changes to it as you wish! It’s easy to extend to your own use cases. In the next section, we'll go through some of the images you can generate with this pipeline.
Some of my generations
[Image] The model captured Salvador Dalí’s artistic style very well in this one.
[Image] I loved the fact that it replaced the view of the Three Sisters rock formation perfectly with a castle.
Understanding what's happening in the img2img-pipeline package
The overall project structure of the repo is this:
.
├── README.md
├── data
│   ├── input_images
│   └── output_images
├── metrics.md           # metrics of the pipeline runs, such as time and memory
├── requirements.txt
└── src
    └── img2img_pipeline   # application source code
All the source code sits under src/img2img_pipeline.
High-level command line
Running the entire img2img pipeline over a single image is implemented in src/img2img_pipeline/commands/main.py.
It consists of only ~20 lines of code because the model class Img2ImgModel and the pipeline class DiffusionSingleImagePipeline are abstracted away in src/img2img_pipeline/model.py and src/img2img_pipeline/pipeline.py respectively.
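Since the repo installs Typer for its CLI, the command plausibly looks like the sketch below. The class names come from the repo, but the exact signatures and the pipeline.run helper are my approximation, not the repo’s verbatim code:

# illustrative sketch of src/img2img_pipeline/commands/main.py
import typer

from src.img2img_pipeline.model import Img2ImgModel
from src.img2img_pipeline.pipeline import DiffusionSingleImagePipeline

app = typer.Typer()

@app.command()
def run_pipeline() -> None:
    # batch version: cycles through the models/prompts in constants.py
    # over every image in data/input_images (details omitted here)
    ...

@app.command()
def run_single_image_pipeline(
    image_name: str,
    prompt: str = typer.Option(...),
    model: str = typer.Option("stabilityai/stable-diffusion-2"),
) -> None:
    # load the chosen Stable Diffusion model and run img2img on one image
    sd_model = Img2ImgModel(model)
    pipeline = DiffusionSingleImagePipeline(sd_model)
    pipeline.run(image_name, prompt)

if __name__ == "__main__":
    app()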
Details of the Img2ImgModel in model.py
The code in this class was heavily inspired by the Diffusers library I talked about earlier. I strongly recommend going through their official docs and tutorials at https://huggingface.co/docs/diffusers.
I took lots of tips and tricks from this library to make the pipeline GPU-memory efficient as well as fast. I would suggest reading through the README file in the repo to go through all the features; I won’t double up by talking about them here.
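To give a flavor of the kinds of optimizations involved, here is a minimal sketch using the Diffusers img2img pipeline with a few well-known memory and speed tricks; it illustrates the techniques, not the repo’s exact code:

import torch
from diffusers import StableDiffusionImg2ImgPipeline

# load weights in half precision to roughly halve GPU memory usage
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# compute attention in slices: slightly slower, much lower peak memory
pipe.enable_attention_slicing()

# memory-efficient attention kernels (requires the xformers package)
pipe.enable_xformers_memory_efficient_attention()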
Conclusion and Action Points
In this guide, we walked through a clear path to running a custom Stable Diffusion img2img pipeline using Python and Lambda Labs. You’ve seen how to set up your environment, access GPU resources, and use the img2img-pipeline repository to generate stylized images. It took me nearly two days to figure all this out when I began, so you’re ahead of the curve!
Now, you have the power to create your own image-to-image transformations and tweak the prompts and settings as you go along. Whether you choose to use this on Google Cloud, AWS, or locally, you have the flexibility to deploy it anywhere.
Action points for you:
I hope you found this valuable and inspiring. Happy image styling!
**************************************************************************
This article was originally published on Medium by Zarif Aziz.