Fundamentals - 4: Lemmatization
Source: Medium


Hello #subscribers #connections #linkedinfamily

MasterNLP extends a warm welcome to everyone. This week, we will be learning about Lemmatization in NLP.

What is Lemmatization?

Lemmatization is a process in NLP that reduces words to their base or dictionary form, known as the lemma. The lemma is identified by taking into account the word's context (such as its part of speech) and its morphology.

What does lemmatization do?

The goal of lemmatization is to reduce inflected words to their base form so that they can be analyzed more easily.

For instance, the words "am," "are," and "is" can all be lemmatized to the base form "be." Similarly, the words "go," "going," and "went" can be lemmatized to the base form "go."

By reducing words to their base form, we shrink the number of unique words in a corpus, which can improve the performance of downstream NLP tasks such as information retrieval, sentiment analysis, and machine translation.
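The vocabulary reduction described above can be illustrated with a minimal sketch. The lemma table here is a hand-written, hypothetical stand-in for a real lemmatizer and covers only the example words from this article:

```python
# Toy lemma table (hypothetical, hand-written for illustration only)
TOY_LEMMAS = {
    "am": "be", "are": "be", "is": "be",
    "going": "go", "went": "go",
}

def toy_lemmatize(word):
    # Fall back to the word itself when it is not in the table
    return TOY_LEMMAS.get(word, word)

tokens = ["am", "are", "is", "go", "going", "went"]
lemmas = [toy_lemmatize(t) for t in tokens]

vocab_before = set(tokens)
vocab_after = set(lemmas)

print(len(vocab_before), "unique words before lemmatization")
print(len(vocab_after), "unique words after lemmatization:", sorted(vocab_after))
```

Six distinct surface forms collapse into two lemmas, which is the kind of reduction a downstream model benefits from.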

How does lemmatization differ from stemming?

Lemmatization is different from stemming, another process used in NLP to reduce words to a root form. Stemming is simpler: it strips suffixes from a word using fixed rules. As a result, stemming may not produce valid words, whereas dictionary-based lemmatization returns valid dictionary forms.

Key Differences between stemming and lemmatization

  • Stemming is faster because it chops off word endings without considering the word's context in the sentence. Lemmatization is slower, but it takes the word's context into account before reducing it.
  • Stemming is less accurate than lemmatization.
  • Stemming may produce a root form that is not a real word. Lemmatization, on the other hand, returns a dictionary form of the word.
  • Stemming is used when the exact meaning of the word is not important for the analysis (example: spam detection), while lemmatization is used when the meaning of the word matters (example: sentiment analysis).
  • Simple example:

Stemming: Studies ------------> Studi

Lemmatization: Studies ------------> Study
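The contrast in this example can be sketched in a few lines. The stemmer below applies only the plural-suffix rules from step 1a of the Porter stemming algorithm (a small fragment, not the full algorithm), and the lemma table is a hypothetical one-entry dictionary:

```python
def toy_stem(word):
    # Plural-suffix rules from Porter step 1a:
    # SSES -> SS, IES -> I, SS -> SS, S -> ""
    word = word.lower()
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word

# Hypothetical dictionary lookup standing in for a real lemmatizer
TOY_LEMMAS = {"studies": "study"}

def toy_lemmatize(word):
    word = word.lower()
    return TOY_LEMMAS.get(word, word)

print(toy_stem("Studies"))       # studi
print(toy_lemmatize("Studies"))  # study
```

The rule-based stemmer has no notion of whether "studi" is a word, while the lookup can only return forms that were put into the dictionary.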


Pythonic Implementation

  • There are several algorithms and tools available for performing lemmatization in NLP. One common tool is the WordNet lemmatizer, which is included in the Natural Language Toolkit (NLTK) library for Python. The WordNetLemmatizer maps a word to its lemma based on the part-of-speech (POS) tag passed to it; if no tag is given, it treats the word as a noun. For example, the word "running" would be lemmatized to "run" if it is tagged as a verb, but to "running" if it is tagged as a noun.
  • Another popular tool for lemmatization is the Stanford CoreNLP toolkit, which provides lemmatization along with other NLP features such as parsing, sentiment analysis, and named entity recognition.
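To make the role of the POS tag concrete, here is a minimal sketch that does not use NLTK. The table keyed by (word, POS) is hypothetical and only mimics the behaviour described above for a handful of words:

```python
# Hypothetical lookup keyed by (word, POS); "v" = verb, "n" = noun
TOY_POS_LEMMAS = {
    ("running", "v"): "run",
    ("running", "n"): "running",
    ("are", "v"): "be",
    ("boys", "n"): "boy",
}

def toy_lemmatize(word, pos="n"):
    # Return the word unchanged when the (word, pos) pair is unknown
    return TOY_POS_LEMMAS.get((word, pos), word)

print(toy_lemmatize("running", pos="v"))  # run
print(toy_lemmatize("running", pos="n"))  # running
print(toy_lemmatize("are"))               # are (no noun entry, so unchanged)
```

The last call shows why supplying the correct POS matters: with the default noun tag, a verb form such as "are" is passed through untouched.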


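import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

# Download the resources used below (newer NLTK releases may instead ask for
# 'punkt_tab' and 'averaged_perceptron_tagger_eng')
nltk.download('wordnet')                     # WordNet corpus
nltk.download('punkt')                       # tokenizer models
nltk.download('averaged_perceptron_tagger')  # POS tagger model

lemmatizer = WordNetLemmatizer()

def get_wordnet_pos(treebank_tag):
    # Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS; default to noun
    tag_map = {'J': wordnet.ADJ, 'V': wordnet.VERB, 'N': wordnet.NOUN, 'R': wordnet.ADV}
    return tag_map.get(treebank_tag[0], wordnet.NOUN)

# Example sentence
sentence = "the boys are going by the car"

# Tokenize sentence into words
words = nltk.word_tokenize(sentence)

# Lemmatize each word using the POS tag assigned by the tagger
lemmatized_words = [lemmatizer.lemmatize(word, pos=get_wordnet_pos(pos_tag)) for word, pos_tag in nltk.pos_tag(words)]

print(lemmatized_words)

# Output: ['the', 'boy', 'be', 'go', 'by', 'the', 'car']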

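We can also remove stopwords before lemmatization. If the remaining words are then lemmatized without POS tags, so that each word is treated as a noun, the output is:

['boy', 'going', 'car']

Here "going" is left unchanged because it is not tagged as a verb.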
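The effect of removing stopwords first can be sketched without NLTK. The stopword set and the noun-only lemma table below are hypothetical and cover just the example sentence; because no verb tag is supplied, "going" passes through unchanged:

```python
# Hypothetical stopword set covering only the example sentence
TOY_STOPWORDS = {"the", "are", "by"}

# Hypothetical noun-only lemma table (no POS information is used)
TOY_NOUN_LEMMAS = {"boys": "boy"}

sentence = "the boys are going by the car"
tokens = sentence.split()

# Step 1: drop stopwords
content_words = [t for t in tokens if t not in TOY_STOPWORDS]

# Step 2: lemmatize what is left, treating every word as a noun
result = [TOY_NOUN_LEMMAS.get(t, t) for t in content_words]

print(content_words)  # ['boys', 'going', 'car']
print(result)         # ['boy', 'going', 'car']
```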

Stopword removal was already discussed in a previous edition. Please check out the stopwords edition (Fundamentals - 3: Stopwords) for more.

Lemmatization can also be customized to specific domains or languages by using domain-specific dictionaries or language-specific algorithms.

For example, the spaCy library for Python provides language-specific lemmatization algorithms for several languages, including English, German, Spanish, and French.
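One way to picture domain-specific customization is as a small override dictionary layered on top of a general one. The sketch below is an assumption about how such layering could look, not how spaCy or NLTK implement it; both tables and the HR-domain example are hypothetical:

```python
from collections import ChainMap

# General-purpose entries (hypothetical)
GENERAL_LEMMAS = {"leaves": "leaf", "studies": "study", "mice": "mouse"}

# Domain overrides for a hypothetical HR corpus, where "leaves" means time off
HR_LEMMAS = {"leaves": "leave"}

# ChainMap looks in the domain table first, then falls back to the general one
hr_lemmas = ChainMap(HR_LEMMAS, GENERAL_LEMMAS)

def toy_lemmatize(word, table):
    # Return the lowercased word itself when no entry is found
    word = word.lower()
    return table.get(word, word)

print(toy_lemmatize("leaves", GENERAL_LEMMAS))  # leaf
print(toy_lemmatize("leaves", hr_lemmas))       # leave
print(toy_lemmatize("studies", hr_lemmas))      # study
```

Keeping the overrides in a separate table means the general dictionary stays untouched and can be shared across domains.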


As discussed in one of the earlier articles, every upcoming article will include one interview question.

Interview Question:

Question: What is NLP, and how does it differ from other areas of AI?

Answer: NLP is a subfield of AI that focuses on enabling computers to understand, interpret, and generate human language. NLP aims to bridge the gap between human language and computer language, enabling machines to understand and respond to natural language inputs such as text and speech.

NLP differs from other areas of AI in its focus on language and communication. While other areas of AI, such as computer vision or robotics, may also deal with natural language input and output, their primary focus is on perception and action in the physical world. NLP, on the other hand, is concerned primarily with the meaning and structure of language, and how it can be processed and analyzed by computers.

Another important aspect that sets NLP apart from other areas of AI is the complexity and variability of natural language. Unlike computer programming languages or other formal languages, natural language is highly ambiguous, context-dependent, and constantly evolving. NLP researchers and practitioners must contend with issues like synonymy (different words with the same meaning), polysemy (the same word with multiple meanings), and homonymy (different words with the same spelling or pronunciation).


Subscribe to MasterNLP for more conceptual content every week.

Best regards,

#MasterNLP Newsletter

Have a great week ahead.


