One Step Ahead to the Natural Language Processing

In this article I would like to share a digestible overview of what I have learned on my Data Science and NLP journey. So let’s get into it!

One of the most revolutionary characteristics of Artificial Intelligence is that it can read, speak, listen and understand. NLP is the part of AI that understands human language and makes decisions based on that information.

What is NLP ?

Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken. NLP is a component of artificial intelligence (AI).

“We liked the name Alphabet because it means a collection of letters that represent language, one of humanity’s most important innovations, and is the core of how we index with Google search!” (Larry Page, co-founder of Google, Google news release, 8/10/2015)

WHY NLP ?

  • Answer questions using the Web
  • Translate documents from one language to another
  • Follow directions given by any user
  • Do library research; summarize
  • Manage messages intelligently
  • Help make informed decisions
  • Listen and give advice
  • Fix your spelling or grammar
  • Grade exams
  • Write poems or novels
  • Estimate public opinion
  • Read everything and make predictions
  • Interactively help people learn
  • Help disabled people
  • Help refugees/disaster victims
  • Document or reinvigorate indigenous languages

Applications in real life

  • Computational linguistics (modeling the human capacity for language computationally)
  • Information extraction, especially “open” IE
  • Chatbots (e.g., Siri, Alexa)
  • Machine translation
  • Opinion and sentiment analysis
  • Social media analysis
  • Fake News Recognition

We have all heard the term text mining, so what is the basic difference between NLP and text mining?

You can find the answer in the image below!


Building an NLP Model

1. Text gathering

Collecting data is not a straightforward task. If you need pictures, you can write a scraper to download them from the internet, but why, and which ones? Millions of images are already available in the ImageNet dataset. For text tasks, you may scrape data from the web, PDF files, books or other sources, but first check whether such data is already available. To collect data from sensors, you may need real people to wear or use those sensors, and you must decide which specific data to collect and what problem it will solve. Depending on your problem domain, you may either use standard datasets collected by others or start collecting your own. If you intend to use neural networks, be aware that your dataset should be large, or else those techniques may not be very useful.

2. Text cleaning

  • Tokenization: splitting a large chunk of text into smaller units (tokens), such as sentences or words.
  • Normalization: converting all text to the same case (upper or lower), removing punctuation, expanding contractions, converting numbers to their word equivalents, and so on.
  • Stemming: the process of eliminating affixes (suffixes, prefixes, infixes, circumfixes) from a word in order to obtain a word stem, e.g. running → run.
  • Lemmatization: related to stemming, but able to capture canonical forms based on a word’s lemma, e.g. better → good.
  • Corpus: a collection of natural language text, usually unstructured, on which NLP tasks are run in an attempt to enable machines to better understand that text. It should not be confused with a lexicon, a structured lexical resource such as a dictionary, in which every entry pairs a word with a gloss that is semantically equivalent to it.
  • Stop Words: very common words (such as “the”, “is” and “and”) that carry little meaning on their own and are often removed before further processing.
  • Parts-of-speech (POS) Tagging: labeling each token with its grammatical category, such as noun, verb or adjective.
  • Bag of Words: a representation model that simplifies the contents of a selection of text by keeping only which words occur and how often, ignoring their order.
  • Regular Expressions: often abbreviated regex or regexp, these are a tried and true method of concisely describing patterns of text.
  • Similarity Measures: scores, such as cosine similarity, that quantify how alike two pieces of text are.
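
The first few cleaning steps above can be sketched with nothing but the Python standard library. This is a minimal illustration, not a production pipeline: the stop-word list, the contraction table and the `toy_stem` suffix stripper are all assumptions made up for this example, and a real project would more likely rely on a library such as NLTK or spaCy for stemming and lemmatization.

```python
import string

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# A few contractions to expand during normalization (illustrative only).
CONTRACTIONS = {"don't": "do not", "it's": "it is", "let's": "let us"}


def normalize(text: str) -> str:
    """Lowercase, expand known contractions, and strip ASCII punctuation."""
    text = text.lower().replace("’", "'")
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return text.translate(str.maketrans("", "", string.punctuation))


def tokenize(text: str) -> list[str]:
    """Split normalized text into word tokens on whitespace."""
    return text.split()


def toy_stem(token: str) -> str:
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Undo consonant doubling, e.g. "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return token


def clean(text: str) -> list[str]:
    """Normalize, tokenize, drop stop words, then stem what remains."""
    tokens = tokenize(normalize(text))
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


print(clean("Let's keep running: the bots are answering questions!"))
# ['let', 'us', 'keep', 'run', 'bot', 'answer', 'question']
```

The order matters here: contractions are expanded before punctuation is removed, because stripping the apostrophe first would turn "let's" into "lets" and the lookup would no longer match.
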

3. Feature generation (bag of words)
4. Embedding and sentence representation (word2vec)
5. Training the model by leveraging neural nets or regression techniques
6. Model evaluation
7. Making adjustments to the model
8. Deployment of the model
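
To make the feature generation step more concrete, here is a small sketch of a bag-of-words representation together with cosine similarity, one common similarity measure. The three example queries and the helper names `bag_of_words` and `cosine_similarity` are invented for illustration; the texts are assumed to be already cleaned and are simply split on whitespace.

```python
import math
from collections import Counter


def bag_of_words(tokens: list[str]) -> Counter:
    """Map each token to the number of times it occurs (word order is discarded)."""
    return Counter(tokens)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical user queries, already lowercased and free of punctuation.
docs = {
    "q1": "what are the opening hours of the library".split(),
    "q2": "library opening hours today".split(),
    "q3": "translate this document to french".split(),
}
bags = {name: bag_of_words(tokens) for name, tokens in docs.items()}

print(bags["q1"].most_common(1))
print(round(cosine_similarity(bags["q1"], bags["q2"]), 3))
print(round(cosine_similarity(bags["q1"], bags["q3"]), 3))
```

Here q1 and q2 share three words, so their score is well above zero, while q1 and q3 share none and score exactly 0. Dense embeddings such as word2vec address the main weakness of this representation, namely that two texts with similar meaning but no words in common look completely unrelated.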

The flow chart below shows how NLP works.


The simple diagram below shows the steps of Natural Language Processing.


NLP is solving significant problems in today’s business world with the help of big data. A common question arises: how can a chatbot understand what the user’s next question might be, or what the relevant answer to a user’s question should be?

Let’s get some idea of how chatbots relate to NLP.

What is a Chatbot?

A chatbot is a piece of software that conducts a conversation via auditory or textual methods. Such programs are often designed to convincingly simulate how a human would behave as a conversational partner, although as of 2019, they are far short of being able to pass the Turing test. (Wikipedia)

According to Oxford Dictionaries, a chatbot is

“A computer program designed to simulate conversation with human users, especially over the Internet.”

Chatbots or Automated Intelligent Agents

  • These are computer programs you can talk to through messaging apps, chat windows or voice calling apps.
  • These are intelligent digital assistants used to resolve customer queries in a cost-effective, quick, and consistent manner.

Why Are Chatbots Essential For Business?

Chatbots play a key role in the shift to digital customer care, where they handle the routine queries that customers ask most frequently.

Chatbots are most useful when customer service requests fall within a specific area and are highly predictable, so that a high volume of similar requests can be managed with automated responses.

How Does A Chatbot Work?

NLP enables machines to process and produce speech and written text automatically.


The chatbot basically needs to recognize the entities and intents in the user’s messages. To do that, we need to build an NLP model for each intent and for each type of entity. For example, we can build an NLP intent model for the chatbot to recognize when a user wants to know the opening hours of a place. We can build an NLP entity model for the chatbot to recognize locations and directions. We can then combine these NLP models so that the chatbot can offer the opening hours of any place, based on the user’s location.

The NLP process is a core part of the chatbot architecture and process, since it is the foundation for translating the natural human language to structured data.
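
As a rough illustration of turning natural language into structured data, the sketch below recognizes an intent and a location entity with simple keyword rules. Everything in it is hypothetical: the intent names, the keyword patterns, the `KNOWN_LOCATIONS` list and the `ParsedMessage` structure are assumptions for this example, and real chatbots typically use trained models rather than hand-written rules.

```python
import re
from dataclasses import dataclass, field

# Hypothetical intents, each triggered by a few keyword patterns.
INTENT_PATTERNS = {
    "opening_hours": re.compile(r"\b(open|opening|close|closing|hours)\b"),
    "directions": re.compile(r"\b(directions|route|how do i get)\b"),
}

# Hypothetical list of known places, standing in for a location entity model.
KNOWN_LOCATIONS = ("city library", "central station", "museum")


@dataclass
class ParsedMessage:
    """Structured result extracted from one free-text message."""
    intent: str
    entities: dict[str, str] = field(default_factory=dict)


def parse(message: str) -> ParsedMessage:
    """Pick the first matching intent and the first known location, if any."""
    text = message.lower()
    intent = "unknown"
    for name, pattern in INTENT_PATTERNS.items():
        if pattern.search(text):
            intent = name
            break
    entities = {}
    for place in KNOWN_LOCATIONS:
        if place in text:
            entities["location"] = place
            break
    return ParsedMessage(intent=intent, entities=entities)


print(parse("When does the City Library open on Sunday?"))
print(parse("How do I get to the museum?"))
```

Once a message is reduced to something like `intent="opening_hours"` and `location="city library"`, the rest of the chatbot can look up the answer with ordinary program logic.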


For example, say we are writing an NLP program to classify movie summaries by genre. The program first cleans up the data. It then extracts features from the text and converts them into vectors. Next, it trains a classifier to categorize the data into genres like thriller, horror, etc. The model is then benchmarked on metrics such as accuracy, precision, and recall. This process is repeated with different combinations of features and classifiers to find the best solution.
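
The genre example can be sketched end to end in a few dozen lines. The version below is only one possible approach, assuming a multinomial Naive Bayes classifier over bag-of-words counts written from scratch; the article does not prescribe a particular classifier. The `NaiveBayes` class, the `precision_recall` helper and the movie summaries are all made up for illustration, and such a tiny dataset says nothing about real-world performance.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Minimal cleaning: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up toy summaries; a real project needs far more data.
train_texts = [
    "a detective hunts a killer through the city",
    "a spy races to stop a bomb plot",
    "a ghost haunts a family in an old house",
    "a demon terrorizes campers in the dark woods",
]
train_labels = ["thriller", "thriller", "horror", "horror"]
test_texts = ["a killer plot unfolds in the city", "a ghost in the dark house"]
test_labels = ["thriller", "horror"]

model = NaiveBayes().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy, precision_recall(test_labels, predictions, "thriller"))
```

Repeating the process with a different feature extractor or classifier only means replacing `tokenize` or `NaiveBayes` while keeping the same evaluation code, which is what makes the comparison between candidates fair.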

WORD CLOUD

With a library such as NLTK we can generate a word cloud (also known as a tag cloud). This is a fun and interesting way to visually represent how prominent certain words are in a text resource.

In the image below I have generated a word cloud using Python. To learn the whole procedure, be sure to follow me; I will publish another detailed article about it.

All the words in the word cloud below are taken from the article you are reading right now :-)


How is a Word Cloud related to NLP?

  • It is an easy visualization tool for understanding which words a text data set uses and how heavily.
  • We can see the top words and their frequencies through a word cloud.
  • It can give a visual impression of the sentiment around a particular event.
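
The numbers behind a word cloud are simply word frequencies. The sketch below counts them with the standard library and maps each count to a font size; the stop-word list, the sample sentence and the linear `font_sizes` scaling are assumptions for illustration, and actual word cloud libraries may size and place words differently.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def word_frequencies(text: str) -> Counter:
    """Count how often each non-stop word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def font_sizes(freqs: Counter, min_size: int = 10, max_size: int = 48) -> dict[str, int]:
    """Scale each word's count linearly so the top word gets max_size."""
    if not freqs:
        return {}
    top = max(freqs.values())
    return {
        word: min_size + round((max_size - min_size) * count / top)
        for word, count in freqs.items()
    }


sample = (
    "NLP is a part of AI. NLP helps a chatbot understand language, "
    "and language is the core of NLP."
)
freqs = word_frequencies(sample)
print(freqs.most_common(3))
print(font_sizes(freqs))
```

In this sample "nlp" appears three times and "language" twice, so they would be drawn largest, which is exactly the "top words" effect described above.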

Conclusion:

NLP makes work easier in many real-world scenarios, but human involvement is still necessary to reach the final solution. It needs substantial development across multiple languages to achieve the maximum business impact. In this modern era, information is the main currency. Understanding the semantics of that information yields valuable insight, which is only possible by working on and developing NLP. Current technical advances in the field of Artificial Intelligence and automation point to a promising future, with huge potential, for Natural Language Processing (NLP).

Here are some links and references for understanding NLP better!

Remember, in the end …

Love is all that matters, and statistics prove it!

I hope you liked the article. Keep in touch, thanks for reading, and please leave your insights and comments for me!


More articles by Buddha Deb Mondal
