Rise of ChatGPT (Part 1)
Ravi Prakash
Senior Manager, Planning and Business Systems, Johnson and Johnson, APAC, MedTech
There are few events in the history of human evolution that propelled us into a new era, and the arrival of GPTs (Generative Pre-Trained Transformers) is definitely one of them. As an intelligent species, we have long dreamt of creating something of our own that could display at least the same level of intelligence as human beings. Many attempts have been made in the past, but nothing captured the imagination and interest of people like this. In this multi-part article, let us dive into a journey that narrates the story of the evolution of LLMs (Large Language Models). Don't worry even if you don't know anything about Deep Learning. We will keep it simple!
Attention Is All You Need
A team of Google researchers published a paper proposing a simple network architecture called the Transformer (not the one from the movie series!), revolutionizing the world of NLP (the paper was focused on language translation). You may be wondering why Google then failed to be a pioneer in this area and to this day continues to play a catching-up role. A valid question, but not a topic for this article. The million-dollar question is: what does this Transformer look like? This model does not have the attractive looks of our fashion models.
It is not easy to understand the Transformer model, especially when you don't have a thorough understanding of Deep Learning. Think of an alien (technologically inferior to us) watching the Mars rover in action on the red planet. She (let's call her Lakshmi) would believe that we have been building such advanced machines since the beginning of our era on Earth. Wouldn't it help her to know that this story started thousands of years ago with carts driven by animals? Knowing about the bullock cart does not explain the technology inside the rover, but it does help explain the journey. So let us go back in time...
Sentiment Analysis Puzzle
Let us play a small game. Lakshmi can understand the English language. She is given the task of classifying reviews of the movie 'Avatar' as positive or negative. Here is a sample (made-up) data set.
Lakshmi does not know deep learning, but she has a logical brain. Take a tea break and think about it. Can you guide her?
She comes up with a very simple but effective plan. She will replace the words in each sentence with numbers. Any word she associates with a positive feeling will be assigned a number between 1 and 5; similarly, negative words will be assigned a number between -1 and -5. Neutral words like 'the', 'is', etc. will be marked zero. A sentence's score is then simply the sum of its word scores.
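To make the idea concrete, here is a minimal Python sketch of Lakshmi's plan. The word list and the scores are made up purely for illustration; they are not from the article's data set or any real lexicon.

```python
# A made-up scoring dictionary in the spirit of Lakshmi's plan.
word_scores = {
    "amazing": 5, "wonderful": 4, "great": 3,      # positive words
    "horrible": -5, "pathetic": -4, "shabby": -3,  # negative words
    "the": 0, "is": 0, "was": 0, "movie": 0,       # neutral words
}

def review_score(review: str) -> int:
    """Sum the scores of all known words; unknown words count as neutral (0)."""
    words = (w.strip(".,!?'\"") for w in review.lower().split())
    return sum(word_scores.get(w, 0) for w in words)

def classify(review: str) -> str:
    return "positive" if review_score(review) > 0 else "negative"

print(classify("The movie is amazing"))     # positive
print(classify("The movie was horrible"))   # negative
```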
Eureka! Lakshmi's model is doing a great job. But what if someone posts a review like 'Movie is amazing. Theatre was horrible. Chairs were shabby and I would not go again'? The count of negative words is greater than that of positive ones, so the score would likely come out negative, yet it is not a negative review of the movie. The gist is that human languages are not so simple, the meaning of words is context specific, and this naive model is not deployable. Nevertheless, it is a great idea and we can build upon it.
Machines Do Not Understand Text
Essentially, Lakshmi's first step is also the starting point of the mighty Transformer architecture. The first step is to tokenize the sentences. For example, GPT-3 has a vocabulary of more than 50,000 tokens (and around 175 billion parameters, which in deep learning language are called weights and biases). Tokens are not necessarily whole words, but it is convenient to assume that they are (a lie we can live with, given our objective). In the example below, we can see the tokenization process in action.
When we apply this to the sample movie reviews, each review becomes a row in a matrix of numbers. (I am using Keras.)
Yes, like Lakshmi, we have a mapping dictionary as well. Would you like to see it? 'Pathetic' is assigned the number 11, whereas 'very' is 23rd in the sequence. Note that these are mere indices; the numbers carry no meaning of their own, unlike the scores Lakshmi assigned in the previous naive model.
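For readers who want to see this step in code, here is a hedged sketch using the classic Keras text-preprocessing utilities (the Tokenizer and pad_sequences helpers, which are deprecated in the newest Keras versions but match what the article appears to use). The review texts are made up, so the index numbers will not match the 11 and 23 quoted above.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

reviews = [
    "The movie is amazing",
    "Pathetic movie, very boring",
    "Wonderful story and very good acting",
]

tokenizer = Tokenizer()            # builds a word -> integer index from the corpus
tokenizer.fit_on_texts(reviews)

print(tokenizer.word_index)        # the mapping dictionary, e.g. {'movie': 1, 'very': 2, ...}

sequences = tokenizer.texts_to_sequences(reviews)
padded = pad_sequences(sequences, padding="post")   # reviews become a matrix of numbers
print(padded)
```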
This technique of tokenization has been used with shallow ML algorithms as well as with deep learning for NLP. What we saw above is also called unigrams (a bag of words). We soon realize that words make more sense when we look at them together. For example, 'movie' alone does not tell us whether it is a bad or a good movie, but 'wonderful movie' conveys a positive meaning. When we create features from pairs of adjacent tokens (or, more generally, from up to two tokens at a time), we call them bigrams; see the example below. People then used to apply one-hot encoding before feeding the features into models. This idea has its limits for various reasons, and we will leave it here.
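A tiny sketch of the bigram idea, with a made-up review. Keras does not build n-grams directly, so this uses plain Python (scikit-learn's CountVectorizer with ngram_range=(1, 2) would automate the same thing).

```python
review = "wonderful movie with a horrible theatre"
tokens = review.split()

unigrams = tokens
bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]

print(unigrams)  # ['wonderful', 'movie', 'with', 'a', 'horrible', 'theatre']
print(bigrams)   # ['wonderful movie', 'movie with', 'with a', 'a horrible', 'horrible theatre']
```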
Step 2: Embedding
What is word embedding? Let us avoid the trap of an ML definition and go with a layman's explanation. Lakshmi is shown a 2-dimensional map of the Earth. She notices the continents, like Asia, and within Asia she can see how close the South Asian nations are to each other. For her, all the LATAM countries sit within the South America cluster. If she is to represent Brazil as a vector on this 2-D map, she will probably draw an arrow from the origin towards the South America cluster. What about Argentina? She will draw another vector close to Brazil's, or at least in the same quadrant.
Now apply a bit of magic that erases the map and leaves only the vectors. If you look at the vectors Brazil (x = -12, y = -14) and Argentina (x = -13, y = -16), what do those numbers tell you? Probably that the two belong to the same cluster; maybe the people are culturally similar (and play football better), etc.
So embedding is the process of converting tokens into higher-dimensional vectors (like the 2-D vectors we saw above) that carry semantic meaning.
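This is also how similarity between embeddings is usually measured in practice: vectors pointing in the same direction have a cosine similarity close to 1. A tiny numpy sketch, reusing the made-up map coordinates from the analogy (the Japan vector is also made up, just to show a distant point):

```python
import numpy as np

brazil    = np.array([-12.0, -14.0])
argentina = np.array([-13.0, -16.0])
japan     = np.array([15.0, 8.0])    # a made-up vector far from the South America cluster

def cosine_similarity(a, b):
    """1.0 means the vectors point in the same direction, -1.0 the opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(brazil, argentina))  # close to 1: same cluster
print(cosine_similarity(brazil, japan))      # negative: very different direction
```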
In the picture above, we see that words with similar meanings (fruit names, etc.) have embedding vectors pointing in the same direction.
In GPT-3, each token has an embedding of more than 12,000 dimensions (we cannot visualize that, but it is good to know).
Let us do some small math. GPT-3 has a vocabulary of around 50,000 tokens, and each token has an (approximately) 12,000-dimensional embedding, so we are looking at around 50,000 x 12,000 = 600,000,000 parameters in the embedding layer alone (600 million, which is a very small portion of the 175 billion).
doc2vec/GloVe: These are open-source word embeddings available for use; you can access the links below for more details.
At a high level, each word gets a 100-dimensional embedding (in the GloVe variant referenced here).
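As a hedged sketch of how such pre-trained vectors are typically loaded: the file name below assumes the 100-dimensional variant from the standard glove.6B download, which you would need to fetch and unzip yourself.

```python
# Load pre-trained 100-dimensional GloVe vectors into a dictionary.
import numpy as np

embeddings = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:   # assumed local path
    for line in f:
        parts = line.split()
        word, vector = parts[0], np.asarray(parts[1:], dtype="float32")
        embeddings[word] = vector

print(embeddings["movie"].shape)   # (100,) -- each word is a 100-dimensional vector
```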
So we converted each of our sample tokens into a 20-dimensional embedding. Our model has around 8,500 parameters! It is a tiny ant compared to the Mount Everest that is ChatGPT.
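The article's exact model is not shown here, so the following is only a minimal sketch of such a tiny sentiment model in Keras. The vocabulary size, sequence length, and layers after the embedding are assumptions, so the parameter count will differ somewhat from the ~8,500 quoted above; the point is simply that almost all of the parameters sit in the embedding layer.

```python
import tensorflow as tf

vocab_size, embedding_dim, max_len = 400, 20, 30   # assumed values, not from the article

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(max_len,)),
    tf.keras.layers.Embedding(vocab_size, embedding_dim),   # 400 x 20 = 8,000 parameters
    tf.keras.layers.GlobalAveragePooling1D(),                # average the word vectors in a review
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),          # probability the review is positive
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

Training would then be a call to model.fit on the padded sequences and their 0/1 labels from the tokenization step, and model.predict on a new padded review returns a number between 0 and 1 (closer to 1 meaning positive).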
Can it predict sentiments correctly? Let us validate with a couple of prompts.
Prompt 1
Prompt 2
We will wind up here. A very small model is able to do a fair job of sentiment analysis. The idea was not to show this in itself, but to build a base for understanding the Transformer architecture, which will be the topic of the next article. I extensively used ChatGPT for support while writing this article!