Diffusion Model - Exploring Business Use Cases - Part 1
Anindita Desarkar, PhD
PhD in CSE (JU) || Product Owner || Gen AI Practitioner || Director @ LTIMindtree || Dedicated Researcher in Data Science, Gen AI || Mentor || Patents on AI/DS/Gen AI
In the previous two blogs, Diffusion Model - Part 1 and Diffusion Model - Part 2, my friend and colleague Somsuvra Chatterjee explained the diffusion model and its architecture in detail. Here, we will explore a few real-life applications where we can generate 2D and 3D images and videos from text prompts as well as from existing images by applying various diffusion models.
The following scenarios present various capabilities of diffusion models and their corresponding business use cases.
1. Text-to-Image Generation
The input is a text prompt, and a corresponding image is generated as output. It can be used in areas such as marketing, product design, and education, as the use cases below illustrate.
2. Text-to-Video Generation
Diffusion models can be used to generate videos from text, stories, songs, or poems given as input.
3. Image Inpainting
Reconstruction of missing regions in an image, known as image inpainting, can also be achieved with diffusion models. Common use cases include restoring damaged photographs and removing unwanted objects from an image.
4. Image Outpainting
It is a powerful technique that can be used to expand images beyond their original borders. It is commonly used, for example, to extend backgrounds or adapt an image to a new aspect ratio.
5. Text-to-3D
Text-to-3D conversion is also possible with diffusion models, for example to generate 3D assets for games and product prototypes.
6. Image-to-Image
In this approach, we can modify or transform existing images guided by a text prompt, for example to restyle an existing product photo; a minimal sketch is shown right below.
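Here is a minimal sketch of image-to-image generation using the Hugging Face diffusers library. The model id, the input file product_photo.png, and the prompt are illustrative assumptions for this sketch, not assets used later in this article.

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

# Load a Stable Diffusion image-to-image pipeline (model id is an assumption)
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Start from an existing product photo and restyle it with a text prompt
init_image = load_image("product_photo.png").resize((512, 512))
result = pipe(
    prompt="the same t-shirt photographed in a neon-lit studio",
    image=init_image,
    strength=0.6,        # how strongly the original image is altered (0 to 1)
    guidance_scale=7.5,  # how closely the output follows the text prompt
).images[0]
result.save("restyled_product.png")

The strength parameter controls the trade-off between staying close to the original image and following the prompt.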
Now we will explore a few common use cases from the above-mentioned ones in more detail.
Use Case 1: Generating Personalized Images for Marketing (Text-to-Image)
Problem Statement: Personalized images for marketing - a clothing brand example
Imagine you're a clothing brand selling stylish t-shirts. You want to create engaging marketing materials that resonate with your target audience. Instead of using generic static photos, wouldn't it be cool to personalize your visuals? Let's see how diffusion models can make that happen:
Let us install the requisite libraries.
!pip install openai==0.28 requests
import os
import openai
import requests
openai.api_key = os.getenv("OPENAI_API_KEY")  # set your OpenAI API key in the environment
We will use the DALL-E model to generate images from a text prompt. The prompt should be framed carefully, as the quality and relevance of the generated image depend on it. We have created the prompt below based on our requirement.
PROMPT = "A young woman at a rock concert wearing a t-shirt"
response = openai.Image.create(
    prompt=PROMPT,
    n=1,
    size="256x256",
)
Here, prompt takes the input prompt defined above, n defines the number of images we want, and size refers to the dimensions of the generated image.
url = response["data"][0]["url"]
data = requests.get(url).content

# Write the downloaded image bytes to a file on disk
with open('/content/sample_data/t_shirt_img.png', 'wb') as f:
    f.write(data)
In the above code snippet, the Image.create method returns a response containing a URL that points to the generated image. We extract this URL from the response so that the image can be downloaded and stored.
The requests library downloads the image, and the image bytes are then written to a file on disk.
After running it, we get the following stunning image, which can be used directly in a marketing campaign.
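Since the next use cases repeat the same generate, download, and save flow, we can optionally wrap it in a small helper. This is just a convenience sketch built on the openai==0.28 calls shown above; the function name generate_and_save is our own and not part of the OpenAI library.

def generate_and_save(prompt, out_path, size="256x256"):
    """Generate one image with DALL-E and save it to out_path."""
    response = openai.Image.create(prompt=prompt, n=1, size=size)
    url = response["data"][0]["url"]
    # Download the generated image and write the bytes to disk
    with open(out_path, "wb") as f:
        f.write(requests.get(url).content)
    return out_path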
Use Case 2: Product Design and Visualization - Visualizing Product Prototypes before Manufacturing (Text-to-Image)
Let us assume our product is a line of sustainable sneakers: eco-friendly footwear made from recycled materials. To effectively communicate our vision to customers and get stakeholder buy-in, we need compelling visuals that showcase the innovative design and sustainable nature of the product.
Let us try the above DALL-E model with the below prompt.
PROMPT = "Sleek, minimalist sneaker crafted from ocean plastic with vibrant coral reef-inspired patterns."
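Reusing the hypothetical generate_and_save helper sketched above (the output path below is only an example), the call becomes:

generate_and_save(PROMPT, '/content/sample_data/sneaker_img.png')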
After running the model with the above prompt, it produces the following exciting image of our dream product!
Use Case 3: Education and Training - Generating Visuals for Complex Concepts (Text-to-Image)
This is another area where we can apply the above model with an appropriate prompt to generate the right visuals for complex concepts. This is extremely beneficial to learners, as visuals are a key element of interactive learning. One such prompt is given below.
PROMPT = "Large Language Models are one of the biggest discoveries of AI. Explaining this concept through traditional diagrams which will help the students to understand easily and laypeople to grasp."
It generates a diagram that explains the LLM concept nicely. Take a look at the resulting image below!
Use Case 4: Generating an Automated Video from Textual Features for Launching a New Product (Text-to-Video)
Now we will look at the second type, text-to-video generation. One very common and widely used scenario is creating a video for a new product launch to attract more customers.
Let us consider that we are going to launch a smartwatch for health monitoring. We will see below how diffusion models can work their wonders!
Let us install the requisite libraries.
!pip install diffusers transformers accelerate torch
Then we import the relevant modules from each library.
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video
Now we create the pipeline.
pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
We load the text-to-video model provided by ModelScope on Hugging Face into the DiffusionPipeline. The model is based on a UNet3D architecture that generates a video from pure noise through an iterative denoising process. It has about 1.7 billion parameters, as the model name itself indicates.
Moreover, 16-bit floating-point precision is used to reduce GPU memory usage. In addition, model CPU offloading is enabled, which moves submodules that are not currently needed off the GPU during runtime.
The next part is the most exciting one: generating the desired video for our product demo.
prompt = "A woman is wearing a smart watch for health monitoring."
video_frames = pipe(prompt, num_inference_steps=25).frames
video_path = export_to_video(video_frames)
As our product is a smartwatch, the above prompt is selected accordingly. We pass the prompt to the video generation pipeline, which returns a sequence of generated frames. Here we set num_inference_steps to 25 so that the model performs 25 denoising iterations.
A higher value of this parameter can improve video quality but requires more compute and time. Hence, it can be increased based on the computational resources available.
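For example, a higher-quality run might look like the sketch below. Note that num_frames and the fixed random seed are optional arguments added here for illustration; they were not part of the original call.

# Illustrative only: more denoising steps and a fixed seed for reproducibility
generator = torch.Generator(device="cuda").manual_seed(42)
video_frames = pipe(
    prompt,
    num_inference_steps=50,  # more steps: higher quality, longer runtime
    num_frames=24,           # number of frames in the generated clip
    generator=generator,
).frames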
In the last step of the original snippet, the individual frames are combined using the diffusers utility function export_to_video, and the video is saved to disk. The output is shown below.
Please note that it is shown here as a static image; however, the pipeline actually creates a video that is stored on disk.
I hope you have enjoyed these simple experiments with diffusion models.
In Part 2, we will explore more business use cases along the same lines.
Wishing everybody a very Happy New Year!!