Rise of ChatGPT (Part 1)


There are few events in the history of human evolution that have propelled us into a new era, and the arrival of GPTs (Generative Pre-trained Transformers) is definitely one of them. As an intelligent species, we have long dreamt of creating something of our own that could display at least the same level of intelligence as a human being. Many attempts have been made in the past, but nothing has captured people's imagination and interest like this. In this multi-part article, let us dive into a journey that narrates the story of the evolution of LLMs (Large Language Models). Don't worry even if you don't know anything about deep learning. We will keep it simple!

Attention Is All You Need


The research paper published in 2017 by Ashish Vaswani and colleagues, which paved the path for the arrival of GPTs

A team of Google researchers published a paper proposing a simple network architecture called the Transformer (not the one from the movie series!), revolutionizing the world of NLP (the paper itself focused on language translation). You may be wondering why Google then failed to be a pioneer in this area and to this day continues to play catch-up. A valid question, but not a topic for this article. The million-dollar question, then, is: what does this Transformer look like? This model does not have the attractive looks of our fashion models.


The Transformer architecture as depicted in the 2017 paper!

It is not easy to understand the Transformer model, more so when you do not have a thorough understanding of deep learning. Think of an alien (technologically inferior to us) watching the Mars rover in action on the red planet. She (let us call her Lakshmi) would believe that we have been building such advanced machines since the beginning of our era on Earth. Would it not help her if she got to know that this story started thousands of years ago with carts driven by animals? Knowing about the bullock cart does not explain the technology involved in the rover, but it does help to explain the journey. So let us go back in time...

Sentiment Analysis Puzzle


Words represent our feelings!

Let us play a small game. Lakshmi can understand the English language. She is given the task of classifying reviews of the movie 'Avatar' as positive or negative. Here is a sample data set (made up).


Idealized sample data, just for explanation. Real reviews are different.

Lakshmi does not know deep learning, but she has a logical brain. Take a tea break and think about it. Can you guide her?

She comes up with a very simple but effective plan. She will replace the words in each sentence with numbers. Any word she can associate with a positive feeling will be assigned a number between 1 and 5, and similarly, negative words will be allotted a number between -1 and -5. Neutral words like 'the', 'is', etc. will be marked zero.


Representative example
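
To make this concrete, here is a minimal sketch of Lakshmi's naive scoring scheme in Python; the word-to-score mapping is made up purely for illustration.

```python
# Naive lexicon-based sentiment scoring, as Lakshmi might do it by hand.
# The scores below are illustrative, not a real sentiment lexicon.
word_scores = {
    "amazing": 5, "wonderful": 4, "good": 3,
    "horrible": -5, "shabby": -3, "boring": -3,
    "the": 0, "is": 0, "was": 0,
}

def naive_sentiment(review: str) -> int:
    """Sum the scores of known words; words not in the lexicon count as 0."""
    return sum(word_scores.get(word.strip(".,!").lower(), 0) for word in review.split())

print(naive_sentiment("Movie is amazing"))      # positive total -> positive review
print(naive_sentiment("Theatre was horrible"))  # negative total -> negative review
```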

Eureka! Lakshmi's model is doing a great job. But what if someone posts a review like: 'Movie is amazing. Theatre was horrible. Chairs were shabby and would not go again.' The count of negative words exceeds the positive ones, so the score would likely be negative, yet it is not a negative review of the movie. The gist here is that human languages are not so simple, the meaning of words is context specific, and this naive model is not deployable. Nevertheless, it is a great idea and we can build upon it.

Machines do not understand Text

Essentially, Lakshmi's first step is also the starting point of the mighty Transformer architecture. The first step is to tokenize the sentences. For example, GPT-3 has a vocabulary of more than 50,000 tokens (and around 175 billion parameters, which in deep learning language are called weights and biases). Not every token is a word, but it is convenient to assume that tokens are words (a lie we can live with, given our objective). In the example below, we can see the tokenization process in action.

Tokenization!

When we apply this to the sample movie reviews, the data becomes a matrix of numbers. (I am using Keras.)
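
The original notebook appears only as a screenshot, so here is a minimal sketch of what that Keras tokenization step might look like; the two reviews are stand-ins for the made-up sample data set.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Illustrative reviews standing in for the sample data set
reviews = [
    "movie is amazing and the acting is wonderful",
    "movie is pathetic and very boring",
]

# num_words caps the vocabulary size; oov_token catches words not seen during fitting
tokenizer = Tokenizer(num_words=100, oov_token="<unk>")
tokenizer.fit_on_texts(reviews)

sequences = tokenizer.texts_to_sequences(reviews)  # lists of token indices
matrix = pad_sequences(sequences, padding="post")  # pad to equal length -> matrix of numbers
print(tokenizer.word_index)                        # the word-to-index mapping dictionary
print(matrix)
```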


Yes, like Lakshmi, we have a mapping dictionary as well. Would you like to see it? 'Pathetic' is assigned the number 11, whereas 'very' is 23rd in the sequence. Note that these are mere sequence numbers; unlike in Lakshmi's earlier naive model, the numbers themselves carry no meaning.


0 is reserved for padding ('I have no value' tokens), whereas 1 is for unknown tokens!
This technique of tokenization has been used previously with shallow ML algorithms as well as with deep learning for NLP. What we saw above is also called unigrams (bag of words). We soon realize that words make more sense when we look at them together. For example, 'movie' alone does not say much about whether it is a good or a bad movie, but if we read 'wonderful movie' then it conveys a positive meaning. When we create features from pairs of consecutive tokens, we get bigrams. See the example of bigrams below. People used to apply one-hot encoding after that to feed the features into models. This idea has its limits for various reasons, and we leave it here.


Bigrams
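
The bigram example above is also only an image, so here is a small sketch of how bigram features can be extracted; using scikit-learn's CountVectorizer here is my assumption, not necessarily what the original screenshot used.

```python
from sklearn.feature_extraction.text import CountVectorizer

reviews = ["movie is wonderful", "movie is horrible"]

# ngram_range=(2, 2) keeps only pairs of consecutive words, i.e. bigrams
vectorizer = CountVectorizer(ngram_range=(2, 2))
bigram_counts = vectorizer.fit_transform(reviews)

print(vectorizer.get_feature_names_out())  # ['is horrible', 'is wonderful', 'movie is']
print(bigram_counts.toarray())             # count matrix that a shallow model could consume
```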


Step 2: Embedding


Source: https://jkfran.com/introduction-vector-embedding-databases.md/

What is a word embedding? Let us avoid the trap of an ML definition and go with a layman's explanation. Lakshmi is shown a two-dimensional map of the Earth. She notices continents like Asia, and within Asia she is able to identify the closeness of the South Asian nations. For her, all countries in LATAM sit within a South America cluster. If she is to represent Brazil as a vector on the 2D map, she will probably draw an arrow from the origin towards the South America cluster. What about Argentina? She will draw another vector that lies close to Brazil's, or at least in the same quadrant.

Now apply some magic that erases the map and leaves only the vectors. If you look at the vectors Brazil (x = -12, y = -14) and Argentina (x = -13, y = -16), what do those numbers say? Probably that the two belong to the same cluster, maybe that their people are culturally similar (playing football better), and so on.

So embedding is the conversion of tokens into higher-dimensional vectors (the map above was 2D) that carry semantic meaning.

In the picture above, we see that words with similar meanings (fruits, for example) have embedding vectors pointing in the same direction.

In GPT-3, each token has an embedding of more than 12,000 dimensions (we cannot imagine that, but it is good to know).

Let us do some small math. GPT-3 has a vocabulary of around 50,000 tokens, and each token has an (approximately) 12,000-dimensional embedding, so the embedding table alone holds around 50,000 x 12,000 = 600,000,000 parameters (600 million, which is still a very small portion of the 175 billion).
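
If you want to see that arithmetic fall out of an actual layer, here is a minimal sketch using a Keras Embedding layer with those illustrative sizes.

```python
from tensorflow.keras import layers

# Illustrative sizes mirroring the back-of-envelope math above
vocab_size, embedding_dim = 50_000, 12_000

embedding = layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim)
embedding.build(input_shape=(None,))  # create the weight matrix
print(embedding.count_params())       # 600,000,000 = vocab_size * embedding_dim
```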

doc2vec/GloVe: these are open-source word embeddings available for anyone to use; you can look up the project pages for more details.

At a high level, each word gets a 100-dimensional embedding (in the GloVe variant used here).
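
For reference, pre-trained GloVe vectors ship as a plain text file; here is a minimal sketch of loading them, assuming the glove.6B.100d.txt file from the Stanford GloVe release is on disk.

```python
import numpy as np

# Load pre-trained GloVe vectors into a dictionary: word -> 100-dimensional vector
embeddings = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        word, *values = line.split()
        embeddings[word] = np.asarray(values, dtype="float32")

print(embeddings["movie"].shape)  # (100,)
```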


So we converted each of our sample tokens into a 20-dimensional embedding. Our model has around 8,500 parameters! It is a tiny ant compared with the Mount Everest that is ChatGPT.
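
The article's model itself is shown only in a screenshot; here is a minimal sketch of a sentiment classifier in that spirit, where the vocabulary size and the layer choices are my assumptions, picked so the parameter count lands in the same ballpark.

```python
from tensorflow.keras import layers, models

vocab_size, embedding_dim = 400, 20  # assumed sizes; the original screenshot is not reproduced

model = models.Sequential([
    layers.Input(shape=(None,)),            # variable-length sequences of token indices
    layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
    layers.GlobalAveragePooling1D(),        # average the token embeddings of a review
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability that the review is positive
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()  # roughly 8,400 parameters, in the same ballpark as the article's ~8,500
```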

Can it predict sentiments correctly? Let us validate.

Prompt 1



A positive review, and the model is super confident about it


Prompt 2
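
As a continuation of the earlier sketches, a prediction call for prompts like these might look as follows; it assumes the tokenizer and model defined above, and that the model has already been trained on the labelled reviews with model.fit.

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# `tokenizer` and `model` are the ones from the earlier sketches, after model.fit(...)
prompts = [
    "what a wonderful movie, the acting was amazing",    # expected score close to 1 (positive)
    "pathetic movie, very boring and a shabby theatre",  # expected score close to 0 (negative)
]
sequences = pad_sequences(tokenizer.texts_to_sequences(prompts), padding="post")
print(model.predict(sequences))  # one probability per prompt
```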


We will wind up here. A very small model is able to do a fair job of sentiment analysis. The idea was not to showcase this model but to build a base for understanding the Transformer architecture, which will be the topic of the next article. I extensively used ChatGPT for support while writing this article!

