A simplified overview of language models (LMs) for beginners


Where do we start? I say we start from the very beginning. What is a language?

A language is a structured system of communication consisting of sounds, words, and symbols, which means there are many languages depending on where you live, e.g., English, Igbo, French, Telugu, you name it. With the advent of digital devices and tools came the need for us to communicate with these tools and devices. This is where natural language processing (NLP) comes in.

NLP is the ability of machines to process, understand, and respond to natural human speech using linguistics, statistics, and machine learning to derive semantic and contextual meanings from what you write or say. This is how we came to have speech-to-text systems. However, these systems don’t just happen; there’s a powerhouse behind them. Language Models!

What’s that?

The term language model (LM) describes a model that captures the relationships between words and sentences in a language. It helps predict the most appropriate next word in a sequence based on the context of a given text. For example, consider a sentence like this: ‘Mother said grandma was ill, so last Sunday, Jane and I visited _____.’

As a human, you can easily fill in the blank because you understand the relationship between ‘mother, grandma, ill, visit, Jane, and I.’ For a language model, just as for you, the most natural way to complete the sentence is “her,” because “grandma” refers to a female relative and “her” is the pronoun that points back to her.
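To make this concrete, here is a minimal sketch of the idea in Python: a toy bigram model that predicts the next word purely from counts over a small, made-up corpus. Real language models learn far richer statistics with neural networks, but the underlying prediction task is the same.

```python
from collections import Counter, defaultdict

# Toy corpus invented purely for illustration.
corpus = (
    "mother said grandma was ill so jane and i visited her . "
    "last sunday jane visited her grandma ."
).split()

# Count how often each word is followed by each other word (bigram counts).
next_word_counts = defaultdict(Counter)
for current_word, following_word in zip(corpus, corpus[1:]):
    next_word_counts[current_word][following_word] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in the corpus."""
    candidates = next_word_counts.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("visited"))  # -> 'her' in this toy corpus
```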

But this is only a taste of what these models can do.

Real-world applications of LMs

Language models are used in a variety of NLP tasks. They become even more useful when they are integrated with a user-friendly interface. Case in point, look at the image below.

Here, in WhatsApp, the model predicts words and even emoji alternatives I could use in the next part of my sentence. This is known as auto-suggestion. Another example is Gmail, which, as you type, suggests words that you can accept by hitting the Tab key on your keyboard.

Other applications include:

  • Conversational AI

This includes speech-enabled applications where you can have a natural conversation with your computer, just like talking to a real person. Prominent examples of these include Apple’s Siri, Amazon’s Alexa, Google Assistant, Erica from Bank of America, and dozens of virtual assistants and chatbots you come across on websites.

  • Content creation

OpenAI’s ChatGPT and Google’s Gemini are the quickest examples that come to mind. You find that all you need to do is write a prompt on what you want, e.g., “Write two paragraphs of content on the history of the channel brand.” It goes further than that. It can create news articles, poems, stories, scripts, etc.


  • Sentiment analysis

This is the process of analyzing digital text to determine whether the emotional tone of the message is positive, negative, or neutral. This is especially useful for generating insights into how well (or poorly) your products or services are performing.
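For a feel of what this looks like in practice, here is a minimal sketch, assuming the Hugging Face transformers library is installed; the default sentiment pipeline labels each text POSITIVE or NEGATIVE with a confidence score, and the example reviews are made up for illustration.

```python
from transformers import pipeline

# Downloads a default pretrained sentiment model on first run.
sentiment = pipeline("sentiment-analysis")

reviews = [
    "The delivery was fast and the product works perfectly.",
    "Terrible customer service, I want a refund.",
]

for review in reviews:
    result = sentiment(review)[0]
    print(f"{result['label']} ({result['score']:.2f}): {review}")
```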

  • Text summarization

These models are used to automatically shorten documents, videos, podcasts, papers, etc. into their most important parts. They do this either by generating summaries in their own words rather than repeating the original text (abstractive summarization) or by extracting the most important sentences from the noise of the input (extractive summarization).
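A minimal sketch of the abstractive variant, again assuming the Hugging Face transformers library is installed (the first run downloads a pretrained summarization model); the input text here is just a short example.

```python
from transformers import pipeline

summarizer = pipeline("summarization")

article = (
    "Language models capture the relationships between words and sentences "
    "in a language. They power applications such as auto-suggestions, "
    "conversational assistants, content creation tools, sentiment analysis, "
    "summarization, and transcription."
)

summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```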

  • Transcription

This is their ability to take audio or video files and transcribe them into written text. A very practical example is when you watch a YouTube video and turn on the captions: you will often see a note that the captions are auto-generated. The models are simply converting the audio from those videos into readable text on your screen.
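The same library exposes speech recognition, so a minimal sketch looks like the one below, assuming transformers is installed, ffmpeg is available for audio decoding, and "meeting.wav" is only a placeholder for an audio file you supply.

```python
from transformers import pipeline

# Whisper is one widely used open speech-recognition model family.
transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

result = transcriber("meeting.wav")  # placeholder path to your own audio file
print(result["text"])
```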

This is just a glimpse of what these models can accomplish.

What do we need to build these sorts of models?

First things first, we need to select the best machine learning architecture.

There are several that exist, namely:

  • Shallow ANNs: using CBOW, skip-gram, or skip-gram with negative sampling techniques, which produce word-embedding models such as Word2Vec and FastText (GloVe is a related embedding model learned from word co-occurrence statistics). A good architecture for capturing semantic information, but incapable of learning the sequential information in a given text; see the sketch after this list.

  • RNN/LSTM/GRU: Recurrent Neural Networks, Long Short-Term Memory, and Gated Recurrent Units. Better than the above at modeling word order, but they struggle with long-term dependencies in lengthy texts and are slow to train, since they process sequential information one token (word) at a time.

  • Transformers: the best of the three. They are scalable and flexible, train quickly because tokens are processed in parallel, capture long-range dependencies through self-attention, and produce bidirectional contextual embeddings. The architecture consists of two parts: the encoder and the decoder. An example of a model produced using transformers is BERT (trained using an encoder-only transformer).
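As a concrete illustration of the first family, here is a minimal sketch that trains a tiny skip-gram Word2Vec model with the gensim library; the sentences are toy data made up for this example, and a real model would need a far larger corpus.

```python
from gensim.models import Word2Vec

# Toy training sentences, invented for illustration only.
sentences = [
    ["mother", "said", "grandma", "was", "ill"],
    ["jane", "and", "i", "visited", "grandma"],
    ["grandma", "was", "happy", "to", "see", "jane"],
]

# sg=1 selects the skip-gram training technique.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Each word now has a dense vector encoding semantic similarity,
# but the model has no notion of word order beyond the context window.
print(model.wv["grandma"][:5])
print(model.wv.most_similar("grandma", topn=3))
```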

Secondly, we need data, lots of it.

Building language models requires tons and tons of data to be fed into the chosen machine-learning architecture; a model trained on such massive text corpora has come to be known as a large language model (LLM). BERT, for instance, was trained on English Wikipedia and the BooksCorpus dataset.

Thirdly, an approach.

Three approaches exist here:

  • Autoregression

Regressive means returning to a former or earlier state. These models are trained to predict the next word in a sentence based on the previous (former) words in the phrase. Models built using this approach correspond to the decoder part of the transformer architecture, where a mask is applied to the sentence so that the attention heads can only see the words that came before. These models are ideal for text generation. Examples include GPT-1 and GPT-2; see the sketch after this list.

  • Autoencoding

Here, the goal is to reconstruct the original sentence from a corrupted version of the input. These models correspond to the encoder part of the transformer. Autoencoding creates a bidirectional representation of the whole sentence, which is used for sentence- and word-level classification tasks such as sentiment analysis. Unlike in autoregression, no causal mask is applied to the input. An example is BERT, trained with an encoder-only transformer using techniques such as masked language modeling (MLM) and next sentence prediction (NSP).

  • Autoregression-Autoencoding

Combining the two approaches in an encoder-decoder model can generate more diverse and creative text in different contexts than purely decoder-based autoregressive models, because the encoder captures additional context. Examples include T5 and BART.
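To make the distinction between the first two approaches concrete, here is a minimal sketch, assuming the Hugging Face transformers library is installed: GPT-2 illustrates the autoregressive (decoder-only) approach by continuing a prompt left to right, while BERT illustrates the autoencoding (encoder-only) approach by filling in a masked word using context from both sides.

```python
from transformers import pipeline

# Autoregression: predict the next words from the previous ones.
generator = pipeline("text-generation", model="gpt2")
prompt = "Last Sunday, Jane and I visited"
print(generator(prompt, max_new_tokens=10)[0]["generated_text"])

# Autoencoding: reconstruct a corrupted (masked) input.
filler = pipeline("fill-mask", model="bert-base-uncased")
for prediction in filler("Last Sunday, Jane and I visited our [MASK].")[:3]:
    print(prediction["token_str"], round(prediction["score"], 3))
```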


Conclusion

Language models are integral to many interactive systems, and their underlying architectures are constantly improving. However, despite being trained on extensive textual data, LMs exhibit limitations in tasks requiring human reasoning or decision-making abilities. At the time of writing, they cannot understand abstract concepts like sarcasm, or make accurate inferences based on incomplete information. This, therefore, highlights the vast potential for LMs to evolve and become even more valuable across various domains.

