Generative AI : A Primer

All content and opinions expressed in this article are mine and not those of the organization I work for.

What is GenAI and what are some of the use cases?

In simple terms, Generative AI is a kind of artificial intelligence that generates content: for example, text, code, speech, images, video, or 3D content.

We will come to the question of how it generates content, but first, let's understand some of the use cases. See the diagram below:

If you look at the tasks mentioned on the right-hand side of this image, these are all content-creation tasks, not the regression, classification, or pattern-identification tasks that we accomplish through conventional machine learning.

How is it different from conventional machine learning?

Well, to answer this question, let's understand at a very broad level how GenAI works. We will get into the depth of it later. It is a four-step process:

1) Depending on the content you wish to generate, take an appropriate dataset for training (images, text, etc.). Remember that this training dataset has an underlying probability distribution.

2) Pass this training dataset through an appropriate architecture to learn, or approximate, this underlying distribution.

3) Once the distribution is approximated, start sampling from it.

4) Generate new content.

So we can say that a conventional machine learning model, i.e. a discriminative model, estimates the probability that an observation X belongs to class Y (P(Y|X) : X -> Y), whereas a generative model estimates the probability distribution of the data itself and generates new outputs, typically from random noise Z (P(X) : Z -> X).
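The four steps above can be sketched with a deliberately tiny toy example (a one-dimensional Gaussian standing in for real data; no actual GenAI model is involved): fit the parameters of the underlying distribution, then sample it to "generate" new data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: "training data" drawn from an unknown underlying distribution
# (here we secretly know it is Gaussian with mean 5 and std 2).
train = rng.normal(loc=5.0, scale=2.0, size=10_000)

# Step 2: approximate the underlying distribution by estimating its parameters.
mu_hat, sigma_hat = train.mean(), train.std()

# Steps 3-4: sample the approximated distribution to generate new content.
new_samples = rng.normal(mu_hat, sigma_hat, size=5)

print(mu_hat, sigma_hat)   # close to the true 5.0 and 2.0
print(new_samples)         # new data points, not copies of the training set
```

Real generative models replace the two estimated parameters with millions or billions of network weights θ, but the learn-then-sample principle is the same.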

Deep Dive : How does Generative AI work?

Okay, so far so good. Now, let's take a deep dive to understand how GenAI models generally work by covering a few architectures. We will first focus on some image models and then get into conversational GenAI models.

1) Autoregressive Models : These are generative models that utilize the underlying probability distribution of a given dataset, just as we described in the previous section. The typical steps are:

a) Goal : Find an explicit density function Pθ(X) that estimates Pdata(X) from samples X(1), X(2), X(3), ... where these could be, for example, image samples.

Challenge : These are density functions of large, complex, high-dimensional data. Even a 128x128x3 image lies in a ~50,000-dimensional space.

b) Let's approximate : Learn θ so that Pθ(X) approximates Pdata(X). The aim is to minimize the distance between the two distributions, where, in the case of a neural network for example, θ are the parameters of the network.

c) Minimize : This minimization is typically done using the Kullback–Leibler divergence, denoted DKL(p||q), which measures how one probability distribution p differs from a reference distribution q.
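For discrete distributions, DKL(p||q) = Σ p(x) · log(p(x)/q(x)). A minimal sketch with two made-up three-outcome distributions:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions given as probability vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.3, 0.2]   # the "data" distribution
q = [0.4, 0.4, 0.2]   # the model's current approximation

print(kl_divergence(p, p))  # 0.0 -- identical distributions have zero divergence
print(kl_divergence(p, q))  # > 0 -- the quantity training tries to drive down
```

Note that KL divergence is asymmetric: DKL(p||q) is generally not equal to DKL(q||p), which is why it measures difference "from a reference distribution" rather than being a true distance metric.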

d) Apply the chain rule : Assume X consists of multiple subparts:

X = (X1, X2, X3, X4, ..., Xt)

Pθ(X) = Pθ(X1, X2, X3, X4, ..., Xt)

Pθ(X) = Pθ(X1) Pθ(X2|X1) Pθ(X3|X1, X2) ...

Application: Several autoregressive models have been developed, such as PixelRNN, Row LSTM, and Diagonal BiLSTM, but the basic working principle is the same. Let's take the example of PixelRNN. See below:

An image can be considered an array of pixels. See how the pixels are generated row by row here, using the approximated distribution and the chain rule.

Pixel RNN
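The chain-rule sampling idea can be sketched with a toy character-level autoregressive model (a hypothetical bigram example, far simpler than PixelRNN, where each element conditions only on the previous one): each new element is drawn from the learned conditional distribution given what has been generated so far.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "learned" conditionals P(next | current) over the symbols a, b, <end>.
symbols = ["a", "b", "<end>"]
P_next = {
    "<start>": [0.6, 0.4, 0.0],
    "a":       [0.1, 0.6, 0.3],
    "b":       [0.5, 0.2, 0.3],
}

def generate():
    """Sample a sequence element by element: P(x1) P(x2|x1) P(x3|x1,x2) ..."""
    seq, current = [], "<start>"
    while True:
        nxt = rng.choice(symbols, p=P_next[current])
        if nxt == "<end>":
            return "".join(seq)
        seq.append(str(nxt))
        current = str(nxt)

print(generate())  # a new sequence sampled from the factorized distribution
```

PixelRNN does the same thing at pixel level, except each pixel conditions on all previously generated pixels (via an RNN hidden state) rather than just the last one.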


2) Diffusion Models

a) Basic Construction : A diffusion model is built on principles borrowed from thermodynamics: systematically add noise to an image to destroy its structure, and learn the noise distribution in the process.

b) The forward diffusion process : The forward diffusion process systematically adds Gaussian noise to the image in steps, degrading its structure until only noise remains.

c) The reverse diffusion process : A denoising neural network is then fed the noisy image and tries to predict the noise. Since we added the noise ourselves, we can create a loss function that measures the difference between the actual and predicted noise (pixel-wise MSE) and train the model to become very good at predicting noise and removing it from images!

Diffusion Model

d) Output : From this learning, the model can start with pure noise and generate new images by removing the noise over a certain number of steps.
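The forward process has a convenient closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε with ε ~ N(0, I), so any noising step can be sampled in one jump. A minimal sketch (the linear noise schedule below is illustrative, not taken from any specific paper):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # per-step noise amounts (illustrative)
alphas_bar = np.cumprod(1.0 - betas)    # cumulative signal retention a-bar_t

def forward_diffuse(x0, t):
    """Jump straight to step t of the forward process: mix x0 with Gaussian noise."""
    eps = rng.normal(size=x0.shape)     # the noise the network must learn to predict
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps

x0 = rng.normal(size=(8, 8))            # stand-in for an image
x_early, _ = forward_diffuse(x0, t=10)  # still strongly correlated with x0
x_late, _ = forward_diffuse(x0, t=999)  # essentially pure noise

print(np.corrcoef(x0.ravel(), x_early.ravel())[0, 1])  # high
print(np.corrcoef(x0.ravel(), x_late.ravel())[0, 1])   # near zero
```

Training then amounts to minimizing the MSE between `eps` and the network's prediction of it; generation runs the process in reverse, starting from pure noise.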


3) GAN (Generative Adversarial Network)

a) Basic Construction : Through the interaction between a "Generator" and a "Discriminator", the network learns the probability distribution.

Here is a basic GAN architecture for your reference:

b) The Generator : The generator attempts to convert random noise into images that appear to come from the training dataset.

c) The Discriminator, or the Critic : The discriminator tries to predict whether the images come from the training dataset or are fakes created by the generator.

d) Output : From this learning, the generator can produce new elements that belong to the probability space but are different from the training set. Note that the generator never sees the training set at any point in the process.

Here is a real output from a generator over time, showing how the generated images gradually improve over successive runs.

Image generated by generator over successive runs
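The adversarial setup can be sketched as two functions and the losses that drive them (a toy one-dimensional example with linear "networks"; a real GAN uses deep networks and alternates gradient updates between the two losses):

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, w):
    """Toy generator: maps random noise z to 'fake data' via a linear transform."""
    return w[0] * z + w[1]

def discriminator(x, v):
    """Toy discriminator: probability that x is real (sigmoid of a linear score)."""
    return 1.0 / (1.0 + np.exp(-(v[0] * x + v[1])))

# Real data ~ N(5, 1); the generator starts out producing ~N(0, 1).
real = rng.normal(5.0, 1.0, size=256)
z = rng.normal(size=256)
w = np.array([1.0, 0.0])   # generator parameters
v = np.array([0.5, -1.0])  # discriminator parameters

fake = generator(z, w)

# Discriminator loss: classify real as 1 and fake as 0 (binary cross-entropy).
d_loss = (-np.mean(np.log(discriminator(real, v)))
          - np.mean(np.log(1.0 - discriminator(fake, v))))

# Generator loss: fool the discriminator into calling fakes real.
g_loss = -np.mean(np.log(discriminator(fake, v)))

print(d_loss, g_loss)
```

Training alternates: the discriminator's parameters descend `d_loss`, then the generator's parameters descend `g_loss`, until the fakes become indistinguishable from the real samples.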


Well, I think by now you have a good understanding of the basic working principles of GenAI and a few image-specific models. Now, let's switch our attention to what we call

Generative Conversational AI models:

Generative conversational AI specifically refers to AI models that can generate human-like responses in conversation with users: for example, LLMs, or Large Language Models.

These models are primarily built on a state-of-the-art architecture called the Transformer.

What are Transformers? A Transformer is a model that works based on a mechanism called "attention" to model the relationships between tokens, rather than relying on recurrence like LSTM/RNN-based models. Transformers are typically used where sequential input-output relationships must be derived.

How do they work? A Transformer consists of two parts: an encoder and a decoder. The encoder takes the input and converts it into a matrix representation. The decoder accepts that representation and iteratively generates an output.

Scaled dot-product and multi-head attention : The Transformer uses an attention mechanism to contextualize the input and derive its "contextual" meaning.
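Scaled dot-product attention itself is compact enough to sketch: Attention(Q, K, V) = softmax(QKᵀ / sqrt(d_k)) · V, where each row of the attention weights says how much one token "attends to" every other token.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every query to every key
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                      # e.g. 4 tokens, 8-dimensional head
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
out, weights = scaled_dot_product_attention(Q, K, V)

print(out.shape)             # (4, 8): one contextualized vector per token
print(weights.sum(axis=-1))  # each row of attention weights sums to 1
```

Multi-head attention simply runs several such attention computations in parallel on learned projections of Q, K, and V and concatenates the results.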

Here is a step-by-step, oversimplified explanation of the entire process (please study Transformers in detail for a better understanding):

a) Let's say the user provides a prompt: "What is AI"

b) The prompt is tokenized: 'What', 'is', 'AI', <EOS> (EOS: End Of Sequence)

c) A word embedding for each token is calculated.

d) A positional embedding for each token is calculated using sine and cosine functions.

e) Each word's context with respect to the other words is calculated using a mechanism called self-attention. This is done by generating three vectors, known as the "Query", "Key", and "Value", and combining them to get what are known as self-attention values.

f) The self-attention values are added to the positional-embedding values to create "residual connections", which support parallel processing.

g) The entire prompt is then regenerated word by word, passing each token through a fully connected layer to produce the next word, until the <EOS> token is generated.

h) The self-attention values of the tokens are calculated again to keep track of the relationship between the input and the output, and the residual connections are created once more.

i) Finally, the self-attention value of the <EOS> token is passed through the fully connected layer to generate the output. Output generation continues until the decoder generates the <EOS> token, marking the end of the output.
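Step (d) above can be sketched directly from the sinusoidal formulas PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)), which give every position a unique, smoothly varying signature:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional embeddings: sin on even dims, cos on odd dims."""
    pos = np.arange(seq_len)[:, None]      # token positions 0 .. seq_len-1
    i = np.arange(0, d_model, 2)[None, :]  # the even embedding dimensions
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)            # even dimensions
    pe[:, 1::2] = np.cos(angle)            # odd dimensions
    return pe

pe = positional_encoding(seq_len=4, d_model=8)  # e.g. 'What', 'is', 'AI', <EOS>
print(pe.shape)  # (4, 8)
print(pe[0])     # position 0: all sin terms are 0, all cos terms are 1
```

These vectors are added to the word embeddings, so tokens carry both their meaning and their position into the attention layers.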

How trustworthy is the output? Hallucination!!

An AI hallucination is when an AI model generates incorrect information but presents it as if it were fact. This often happens because of poorly phrased prompts, overfitting, or ambiguous wording.

So be careful and do not trust GenAI output blindly. Verify it continuously!

Here is a great example I found in the context of code generation!

The model was asked to generate code to "make a geodataframe grid within a polygon".

Here is the output :

Incorrect code produced by LLM

Unfortunately, GeoPandas has no attribute gridify!! Look at the line:

grid = gpd.gridify(...)

Recognizing the GenAI Hammer

When machine learning first arrived, data scientists applied it to everything, even when there were far simpler tools. There is a very small subset of business problems that are best solved by machine learning; most of them just need good data and an understanding of what it means. In other words, however cool it might make you look to develop algorithms to find patterns in petabytes of data, simple math or SQL queries are often a smarter approach.

GenAI is often the wrong answer to a range of questions where other approaches may generate better results. The CEO of a data science organization recently suggested that "Small, fast, and cheap-to-run reinforcement learning models handily beat massive hundred-billion-parameter LLMs at all kinds of tasks, from playing games to writing code."

We need to recognize GenAI as a useful tool for solving some problems, not all of them.

Aafreen Mody

AI/ML Computational Science Analyst at Accenture | Data Scientist Analyst | Soft Skills Trainer | Emcee | Orator| Public Speaker | Presenter| Freelance Host

1y

Very Insightful, Saikat Chakraborty

ashis deb

Data Scientist@IBM| Generative AI/ NLP Lead | Seasoned Coder

1y

completely different perspective to see gen ai models architecture, general approach is transformer explanations in 90% of articles, this will surely help both traditional as well as deep learning modelers in unlocking the blackbox.

Mayuri Kumari

University Gold Medalist || BHU || AI/ML Computational Science Analyst || Crafting stories that speak to people's hearts || Data Science || Statistics

1y

Insightful and crisp, Saikat!
