What are Diffusion Models?

Diffusion models are one of the hottest topics in machine learning right now. This short post is a brief reminder of what they are, how they emerged, and how they have developed.

Forward diffusion process

Given a data point sampled from a real data distribution, x_0 ~ q(x), let us define a forward diffusion process in which we add a small amount of Gaussian noise to the sample in T steps, producing a sequence of noisy samples x_1, ..., x_T. The step sizes are controlled by a variance schedule β_t:

q(x_t | x_{t-1}) = N(x_t; sqrt(1 - β_t) x_{t-1}, β_t I)
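A nice property of this process is that x_t can be sampled directly from x_0 in closed form using the cumulative products ᾱ_t = Π α_s with α_t = 1 - β_t. Below is a minimal NumPy sketch of that closed-form sampling, assuming the linear schedule mentioned later in this post (1e-4 to 0.02 over T = 1000 steps); the function name `q_sample` is just for illustration:

```python
import numpy as np

# Linear variance schedule beta_t, from 1e-4 to 0.02 over T steps (DDPM choice).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product, alpha-bar_t

def q_sample(x0, t, rng=np.random.default_rng()):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha-bar_t) * x_0 + sqrt(1 - alpha-bar_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = np.ones((8, 8))            # toy "image"
x_noisy = q_sample(x0, t=999)   # near t = T, x_t is almost pure Gaussian noise
```

Because alpha-bar_t shrinks toward zero, the signal is progressively destroyed and the sample converges to an isotropic Gaussian.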


Reverse diffusion process

If we can reverse the above process and sample from q(x_{t-1} | x_t), we will be able to recreate the true sample from a Gaussian noise input, x_T ~ N(0, I).
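A single reverse (denoising) step in the DDPM formulation can be sketched as follows. Note that `predict_noise` here is a placeholder standing in for a trained noise-prediction network ε_θ(x_t, t) (in practice a U-Net or Transformer); the schedule constants match the forward-process sketch above:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x_t, t):
    """Placeholder for a trained noise-prediction network eps_theta(x_t, t)."""
    return np.zeros_like(x_t)

def p_sample(x_t, t, rng=np.random.default_rng()):
    """One DDPM reverse step: compute the posterior mean from the predicted
    noise, then add Gaussian noise scaled by sigma_t = sqrt(beta_t) (for t > 0)."""
    eps = predict_noise(x_t, t)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

# Start from pure Gaussian noise x_T and iterate down to x_0.
x = np.random.default_rng(0).standard_normal((8, 8))
for t in reversed(range(T)):
    x = p_sample(x, t)
```

With a real trained ε_θ, iterating this step from t = T down to t = 0 turns noise into a sample from the learned data distribution.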

Research and development efforts on diffusion models include the following topics:

  1. Parameterization of the training loss (e.g. training a network ε_θ to predict the noise added at each step).
  2. Parameterization of the variance schedule β_t (the forward variances are set to be a sequence of linearly increasing constants from 1e-4 to 0.02).
  3. Parameterization of the reverse process variance.
  4. Conditioned generation (when training generative models on images with conditioning information, such as the ImageNet dataset, it is common to generate samples conditioned on class labels or a piece of descriptive text).
  5. Speeding up diffusion models, a broad topic which includes: a) fewer sampling steps (one simple way is to run a strided sampling schedule, taking a sampling update every T/S steps to reduce the process from T to S steps); b) distillation (distilling trained deterministic samplers into new models; every progressive distillation iteration can halve the number of sampling steps).
  6. Latent variable space (a latent diffusion model runs the diffusion process in the latent space instead of pixel space, making training cheaper and inference faster).
  7. Model architectures.
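The strided sampling schedule from item 5a is simple to state in code: instead of visiting all T timesteps during sampling, visit every (T/S)-th one. A minimal sketch (the variable names are illustrative):

```python
# Strided sampling schedule: reduce T = 1000 reverse steps to S = 50
# by taking a sampling update only every (T // S)-th timestep.
T, S = 1000, 50
stride = T // S
timesteps = list(range(T - 1, -1, -stride))  # descending: 999, 979, ..., 19
```

The reverse-process update is then applied only at these S timesteps, trading some sample quality for a 20x reduction in network evaluations.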

There are two common backbone architecture choices for diffusion models: U-Net and Transformer.
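The defining feature of the U-Net backbone is its encoder/decoder structure with shortcut connections that carry high-resolution features across the bottleneck. A toy single-channel NumPy sketch of that idea (the specific pooling and upsampling choices here are illustrative assumptions, not the actual U-Net blocks):

```python
import numpy as np

def downsample(x):
    """2x average pooling (encoder path)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """2x nearest-neighbor upsampling (decoder path)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(64, dtype=float).reshape(8, 8)
skip = x                    # feature map saved for the shortcut connection
h = downsample(x)           # encoder: reduce spatial resolution
h = upsample(h)             # decoder: restore spatial resolution
out = np.stack([h, skip])   # shortcut: concatenate along the channel axis
```

In a real U-Net each level also applies learned convolutions, but the skip-and-concatenate pattern is exactly what the gray arrows in the figure below depict.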

The U-net architecture. Each blue square is a feature map with the number of channels labeled on top and the height x width dimension labeled on the left bottom side. The gray arrows mark the shortcut connections. (Image source: Ronneberger, 2015)
The Diffusion Transformer (DiT) architecture.(Image source: Peebles & Xie, 2023)

Quick Summary

Pros: Tractability and flexibility are two conflicting objectives in generative modeling. Tractable models can be analytically evaluated and cheaply fit to data (e.g. via a Gaussian or Laplace distribution), but they cannot easily describe the structure in rich datasets. Flexible models can fit arbitrary structures in data, but evaluating, training, or sampling from these models is usually expensive. Diffusion models are both analytically tractable and flexible.

Cons: Diffusion models rely on a long Markov chain of diffusion steps to generate samples, so they can be quite expensive in terms of time and computation. New methods have been proposed to make the process much faster, but sampling is still slower than GANs.


Source: Weng, Lilian. (Jul 2021). What are diffusion models? Lil’Log. https://lilianweng.github.io/posts/2021-07-11-diffusion-models/.
