A simplified overview of language models (LMs) for beginners
Hannah Igboke
Chemical Engineering Graduate || Data scientist || Technical Writer || Aspiring NASA engineer
Where do we start? I say we start from the very beginning. What is a language?
A language is a structured system of communication consisting of sounds, words, and symbols, which means there are several languages depending on locality, e.g., English, Igbo, French, Telugu, you name it. With the advent of digital devices and tools came the need for us to communicate with them. This is where natural language processing (NLP) comes in.
NLP is the ability of machines to process, understand, and respond to natural human speech using linguistics, statistics, and machine learning to derive semantic and contextual meanings from what you write or say. This is how we came to have speech-to-text systems. However, these systems don’t just happen; there’s a powerhouse behind them. Language Models!
What’s that?
The term language model (LM) describes a model that captures the relationships between words and sentences in a language. LMs help predict the next most appropriate word in a sequence based on the context of a given text. For example, take a sentence like this: ‘Mother said grandma was ill, so last Sunday, Jane and I visited _____.’
As a human, you can easily fill in the blank because you understand the relationship between ‘mother, grandma, ill, visit, Jane, and I.’ For a language model, the most natural way to complete the sentence, just like for you, is “her,” because “grandma” refers to a female relative and “her” is the pronoun that refers back to her.
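To make next-word prediction concrete, here is a toy sketch of the idea (real LMs use neural networks trained on vast corpora, not simple counts; the tiny corpus below is invented for illustration): a bigram model that tallies which word most often follows each word, then predicts the most frequent follower.

```python
from collections import Counter, defaultdict

# Invented toy corpus, pre-tokenized into lowercase words.
corpus = (
    "jane visited her grandma . mother said grandma was ill . "
    "jane and i visited her last sunday ."
).split()

# Count bigrams: for each word, tally the words that follow it.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word`, or None."""
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("visited"))  # -> her
```

In this corpus, "her" follows "visited" both times it appears, so the model completes "visited" with "her", the same answer you gave above, just from counting instead of understanding.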
But this is only a taste of what these models can do.
Real-world applications of LMs
Language models are used in a variety of NLP tasks. They become even more useful when they are integrated with a user-friendly interface. Case in point, look at the image below.
Here, in WhatsApp, the model is able to predict some words, and even emoji alternatives, that I could use in the next part of my sentence. This is known as auto-suggestion. Gmail does something similar: as you type, it pops up words that you can accept by hitting the Tab key on your keyboard.
Other applications include:
Conversational AI
This includes speech-enabled applications where you can have a natural conversation with your computer, just like talking to a real person. Prominent examples include Apple’s Siri, Amazon’s Alexa, Google Assistant, Erica from Bank of America, and the dozens of virtual assistants and chatbots you come across on websites.
Text generation
OpenAI’s ChatGPT and Google’s Gemini are the quickest examples that come to mind. All you need to do is write a prompt describing what you want, e.g., “Write two paragraphs on the history of the Chanel brand.” It goes further than that: these models can create news articles, poems, stories, scripts, etc.
Sentiment analysis
This is the process of analyzing digital text to determine whether the emotional tone of the message is positive, negative, or neutral. It is especially useful for generating insights into how well (or not) your products or services are doing. Below is a video of this in action.
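For illustration only, here is a toy lexicon-based sentiment scorer. Real sentiment systems are trained on labeled data rather than hand-written word lists; the word sets below are made up for the example.

```python
# Tiny hand-made sentiment lexicons (illustrative, not exhaustive).
POSITIVE = {"great", "love", "excellent", "good", "amazing"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "awful"}

def sentiment(text):
    """Classify text as positive/negative/neutral by counting lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this product and the service was excellent"))  # -> positive
print(sentiment("terrible experience and the support was awful"))      # -> negative
```

A trained model handles negation, sarcasm, and context far better than this counting trick, but the input/output contract, text in, label out, is the same.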
Text summarization
These models are used to automatically condense documents, videos, podcasts, papers, etc. into their most important parts. They do this either by generating summaries in new words that do not repeat the original language of the input (abstractive summarization) or by extracting the most important information from the noise of the data (extractive summarization).
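The extractive variety can be sketched without any machine learning at all: score each sentence by how frequent its words are in the whole document, and keep the top scorers. This is a classic baseline, not how modern LM summarizers work, but it shows the "pick the most important parts" idea.

```python
import re
from collections import Counter

def summarize(text, n_sentences=1):
    """Extractive summary: keep the n highest word-frequency-scored sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
        reverse=True,
    )
    # Emit the chosen sentences in their original document order.
    top = set(scored[:n_sentences])
    return " ".join(s for s in sentences if s in top)

doc = ("Language models power many applications. "
       "Language models predict words. "
       "The weather was sunny yesterday.")
print(summarize(doc))  # -> Language models power many applications.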
Speech-to-text transcription
This includes the ability to take audio or video files and transcribe them into written text. A very practical example is watching a YouTube video with the captions turned on: you will usually see a note saying the captions are auto-generated. The models are simply converting the audio from these videos into readable text on your screen.
This is just a glimpse of what these models can accomplish.
What do we need to build these sorts of models?
First things first, we need to select a machine learning architecture.
Several exist, such as recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and, most prominently today, the transformer architecture discussed below.
Secondly, we need data, lots of it.
Building language models requires tons and tons of data to be fed into the chosen machine learning architecture; a model trained at this scale has come to be known as a large language model (LLM). BERT, for instance, was trained on data from Wikipedia and Google Books.
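Before any of that data reaches an architecture, it has to be turned into numbers. A minimal sketch of that preprocessing step, assuming a simple whitespace tokenizer (real LLMs use learned subword tokenizers like WordPiece or BPE):

```python
from collections import Counter

def build_vocab(texts):
    """Map each word to an integer id, most frequent words first; 0 = unknown."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}

def encode(text, vocab):
    """Turn a sentence into the id sequence an architecture consumes."""
    return [vocab.get(w, 0) for w in text.lower().split()]

texts = ["language models need data", "models need lots of data"]
vocab = build_vocab(texts)
print(encode("models need new data", vocab))  # unseen "new" maps to id 0
```

The architecture never sees words, only these id sequences (and, in practice, learned vector embeddings built from them).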
Thirdly, an approach.
Three approaches exist here:
Autoregression
Regressive means returning to a former or earlier state. Here, models are trained to predict the next word in a sentence based on the previous (former) words. Models built using this approach correspond to the decoder part of the transformer architecture, where a mask is applied to the sentence so that the attention heads can only see the words that came before. These models are ideal for text generation. Examples include GPT-1 and GPT-2.
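The "only see the words that came before" mask is called a causal mask, and it is easy to sketch: for a sequence of n tokens, position i may attend only to positions 0 through i.

```python
def causal_mask(n):
    """mask[i][j] is True when token i is allowed to attend to token j."""
    return [[j <= i for j in range(n)] for i in range(n)]

# Visualize for a 4-token sequence: each row reveals one more token,
# so the model can never peek at future words.
for row in causal_mask(4):
    print(["x" if allowed else "." for allowed in row])
```

In a real transformer this mask is applied to the attention scores (disallowed positions are set to negative infinity before the softmax), but the triangular pattern is exactly this one.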
Autoencoding
Here, the goal is to reconstruct the original sentence from a corrupted version of the input. These models correspond to the encoder part of the transformer. Autoencoding creates a bidirectional representation of the whole sentence, used for sentence- and word-level classification tasks such as sentiment analysis. Unlike in autoregression, no causal mask is applied to the input. An example is BERT, trained with an encoder-only transformer using techniques such as masked language modeling (MLM) and next sentence prediction (NSP).
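The "corrupted version of the input" in masked language modeling can be sketched as follows: replace a random subset of tokens with a [MASK] placeholder and record the originals, which the model must then reconstruct. (BERT masks roughly 15% of tokens; the rate is raised here so the tiny example visibly masks something.)

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace ~mask_rate of tokens with [MASK]; return masked tokens + targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            targets[i] = tok  # the model is trained to predict this token
        else:
            masked.append(tok)
    return masked, targets

tokens = "the cat sat on the mat".split()
masked, targets = mask_tokens(tokens, mask_rate=0.3)
print(masked)   # some tokens replaced by [MASK]
print(targets)  # positions the model must reconstruct
```

Because the rest of the sentence stays visible on both sides of each [MASK], the model learns bidirectional context, unlike the left-to-right-only view of an autoregressive decoder.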
Combination (sequence-to-sequence)
Combining both approaches, with an encoder feeding a decoder, can generate more diverse and context-aware text than purely decoder-based autoregressive models, because the encoder captures additional context from the full input. Examples include T5 and BART.
Conclusion
Language models are integral to many interactive systems, and their underlying architectures are constantly improving. However, despite being trained on extensive textual data, LMs exhibit limitations in tasks requiring human reasoning or decision-making abilities. At the time of writing, they cannot understand abstract concepts like sarcasm, or make accurate inferences based on incomplete information. This, therefore, highlights the vast potential for LMs to evolve and become even more valuable across various domains.