Diffusion Model - Exploring Business Use Cases - Part 1
Anindita Desarkar, PhD
PhD in CSE (JU) || Product Owner || Gen AI Practitioner || Director @ LTIMindtree || Dedicated Researcher in Data Science, Gen AI || Mentor || Patents on AI/DS/Gen AI
In the previous two blogs, Diffusion Model - Part 1 and Diffusion Model - Part 2, my friend and colleague Somsuvra Chatterjee explained the diffusion model and its architecture in detail. Here, we will explore a few real-life applications where we can generate 2D and 3D images and videos from text prompts as well as from existing images by applying various diffusion models.
The following scenarios present various capabilities of diffusion models and their corresponding business use cases.
1. Text-to-Image Generation
The input is a text prompt, and a corresponding image is generated as output. It can be used in areas such as marketing, product design, and education, as the use cases below illustrate.
2. Text-to-Video Generation
Diffusion models can be used to generate videos from text, stories, songs, or poems given as input.
3. Image Inpainting
Reconstruction of missing regions in an image, known as image inpainting, can also be achieved with diffusion models. Common use cases include restoring damaged photographs and removing unwanted objects from an image.
4. Image Outpainting
It is a powerful technique that can be used to expand images beyond their original borders. It is commonly used, for example, to extend backgrounds or adapt an image to a new aspect ratio.
5. Text-to-3D
Text-to-3D conversion is also possible with diffusion models, for example to generate 3D assets for games and product prototypes.
6. Image-to-Image
In this approach, we can modify or transform existing images guided by a text prompt, for example to restyle an existing product photo; a minimal sketch is shown right below.
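Here is a minimal sketch of image-to-image generation using the Hugging Face diffusers library. The model id, the input file product_photo.png, and the prompt are illustrative assumptions for this sketch, not assets used later in this article.

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

# Load a Stable Diffusion image-to-image pipeline (model id is an assumption)
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Start from an existing product photo and restyle it with a text prompt
init_image = load_image("product_photo.png").resize((512, 512))
result = pipe(
    prompt="the same t-shirt photographed in a neon-lit studio",
    image=init_image,
    strength=0.6,        # how strongly the original image is altered (0 to 1)
    guidance_scale=7.5,  # how closely the output follows the text prompt
).images[0]
result.save("restyled_product.png")

The strength parameter controls the trade-off between staying close to the original image and following the prompt.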
Now we will explore a few common use cases from the above-mentioned ones in more detail.
Use Case 1: Generating Personalized Images for Marketing (Text-to-Image)
Problem Statement: Personalized images for marketing - a clothing brand example
Imagine you're a clothing brand selling stylish t-shirts. You want to create engaging marketing materials that resonate with your target audience. Instead of using generic static photos, wouldn't it be cool to personalize your visuals? Let's see how diffusion models can make that happen:
Let us install the requisite libraries.
!pip install openai==0.28 requests
import os
import openai
import requests
openai.api_key = os.getenv("OPENAI_API_KEY")  # set your OpenAI API key in the environment
We will use the DALL-E model to generate images from a text prompt. The prompt should be framed carefully, as the quality and relevance of the generated image depend on it. We have created the prompt below based on our requirement.
PROMPT = "A young woman at a rock concert wearing a t-shirt"
response = openai.Image.create(
    prompt=PROMPT,
    n=1,
    size="256x256",
)
Here, prompt takes the input prompt defined above, n defines the number of images we want, and size refers to the dimensions of the generated image.
url = response["data"][0]["url"]
data = requests.get(url).content

# Write the downloaded image bytes to a file on disk
with open('/content/sample_data/t_shirt_img.png', 'wb') as f:
    f.write(data)
In the above code snippet, the Image.create method returns a response containing a URL that points to the generated image. We extract this URL from the response so that the image can be downloaded and stored.
The requests library downloads the image, and the image bytes are then written to a file on disk.
After running it, we get the following stunning image, which can be used directly in a marketing campaign.
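Since the next use cases repeat the same generate, download, and save flow, we can optionally wrap it in a small helper. This is just a convenience sketch built on the openai==0.28 calls shown above; the function name generate_and_save is our own and not part of the OpenAI library.

def generate_and_save(prompt, out_path, size="256x256"):
    """Generate one image with DALL-E and save it to out_path."""
    response = openai.Image.create(prompt=prompt, n=1, size=size)
    url = response["data"][0]["url"]
    # Download the generated image and write the bytes to disk
    with open(out_path, "wb") as f:
        f.write(requests.get(url).content)
    return out_path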
Use Case 2: Product Design and Visualization - Visualizing Product Prototypes before Manufacturing (Text-to-Image)
Let us assume our product is a line of sustainable sneakers: eco-friendly footwear made from recycled materials. To effectively communicate our vision to customers and get stakeholder buy-in, we need compelling visuals that showcase the innovative design and sustainable nature of the product.
Let us try the above DALL-E model with the below prompt.
PROMPT = "Sleek, minimalist sneaker crafted from ocean plastic with vibrant coral reef-inspired patterns."
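Reusing the hypothetical generate_and_save helper sketched above (the output path below is only an example), the call becomes:

generate_and_save(PROMPT, '/content/sample_data/sneaker_img.png')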
After running the model with the above prompt, it produces the following exciting image of our dream product!
Use Case 3: Education and Training - Generating Visuals for Complex Concepts (Text-to-Image)
This is another area where we can apply the above model with an appropriate prompt to generate the right visuals for complex concepts. This is extremely beneficial to learners, as visuals are a key element of interactive learning. One such prompt is given below.
PROMPT = "Large Language Models are one of the biggest discoveries of AI. Explaining this concept through traditional diagrams which will help the students to understand easily and laypeople to grasp."
It generates a diagram that explains the LLM concept nicely. Take a look at the resulting image below!
Use Case 4: Generating an Automated Video from Textual Features for Launching a New Product (Text-to-Video)
Now we will look at the second type, text-to-video generation. One very common and widely used scenario is creating a video for a new product launch to attract more customers.
Let us consider that we are going to launch a smartwatch for health monitoring. We will see below how diffusion models can work their wonders!
Let us install the requisite libraries.
!pip install diffusers transformers accelerate torch
Then we import the relevant modules from each library.
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video
Now we create the pipeline.
pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
We load the text-to-video model provided by ModelScope on Hugging Face into the DiffusionPipeline. The model is based on a UNet3D architecture that generates a video from pure noise through an iterative denoising process. It has about 1.7 billion parameters, as the model name itself indicates.
Moreover, 16-bit floating-point precision is used to reduce GPU memory usage. In addition, model CPU offloading is enabled, which moves submodules that are not currently needed off the GPU during runtime.
The next part is the most exciting one: generating the desired video for our product demo.
prompt = "A woman is wearing a smart watch for health monitoring."
video_frames = pipe(prompt, num_inference_steps=25).frames
video_path = export_to_video(video_frames)
As our product is a smartwatch, the above prompt is selected accordingly. We pass the prompt to the video generation pipeline, which returns a sequence of generated frames. Here we set num_inference_steps to 25 so that the model performs 25 denoising iterations.
A higher value of this parameter can improve video quality but requires more compute and time. Hence, it can be increased based on the computational resources available.
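For example, a higher-quality run might look like the sketch below. Note that num_frames and the fixed random seed are optional arguments added here for illustration; they were not part of the original call.

# Illustrative only: more denoising steps and a fixed seed for reproducibility
generator = torch.Generator(device="cuda").manual_seed(42)
video_frames = pipe(
    prompt,
    num_inference_steps=50,  # more steps: higher quality, longer runtime
    num_frames=24,           # number of frames in the generated clip
    generator=generator,
).frames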
In the last step of the original snippet, the individual frames are combined using the diffusers utility function export_to_video, and the video is saved to disk. The output is shown below.
Please note that it is shown here as a static image; however, the pipeline actually creates a video that is stored on disk.
I hope you have enjoyed these simple experiments with diffusion models.
In Part 2, we will explore more business use cases along the same lines.
Wishing everybody a very Happy New Year!!