Natural Language Processing using Python


This article is the first in a series of articles covering different topics in Natural Language Processing using Python. In these articles, I will share the contents of my talks delivered with WomenWhoCode on March 9, 16, 23, and 30, 2023. During these talks, I discussed code examples from various internet sites and from the official documentation of Python libraries such as NLTK, spaCy, PyTorch, and Hugging Face. I have also used images from the internet to explain the content. All resources are acknowledged in the References section of this article as well as in the Jupyter Notebooks used for my talks. The links to the Jupyter Notebooks containing the code are shared as well.

Session 1:

Introduction to Natural Language Processing (NLP)

A natural language is a language, such as Hindi, English, German, French, or Spanish, that humans use to communicate with one another.

Spoken and written text in human languages is the most abundant type of data on the Internet, and it is unstructured.


This data comes from:

  • Social media interactions of users: Twitter, Facebook, LinkedIn, YouTube, Reddit, Instagram, etc.
  • Large collections of unstructured electronic text for research purposes
  • Data provided for open access, e.g., Google Books, Project Gutenberg, Wikidata, and data on government sites
  • Library databases: data provided by the libraries of academic universities, etc.

Unstructured text data holds valuable information. Applying ML models to this data has led to the development of applications and tools that improve the way we work. Such applications are used in governance, finance, social media, customer engagement, politics, health care, and entertainment; well-known examples include Alexa, Siri, Google Translate, and ChatGPT.

Natural Language Processing (NLP) is a branch of AI that aims to build computer systems that can understand natural language and respond in natural language.

  • Human language is filled with ambiguities. To "understand" written or spoken text in a natural language means to recognize its meaning, intent, and sentiment. NLP uses statistical models, machine learning, and deep learning to understand text.

These ambiguities make understanding natural language very hard. A number of NLP tasks are required to convert text data into a meaningful form. These include:

- Text Preprocessing

1. Tokenization: Splitting a large chunk of text into its constituents, such as sentences, words, and punctuation marks.

2. Normalization: Converting case, removing words that carry little meaning (stopwords like the, a, an, etc.), reducing words to their stem forms (stemming), and replacing words with their dictionary form, or lemma (lemmatization).
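To make the two preprocessing steps above concrete, here is a minimal pure-Python sketch using only the standard library `re` module. It is an illustration under simplifying assumptions, not how NLTK or spaCy implement these steps: the stopword list is a small hand-picked set, sentences are assumed to end with '.', '!' or '?', and `crude_stem` is a toy suffix stripper standing in for a real stemmer such as the Porter stemmer. All function names are hypothetical. Lemmatization is left out because it needs a dictionary of word forms (for example WordNet, which NLTK can use).

```python
import re

# Small hand-picked stopword list (illustrative only; NLTK and spaCy ship fuller lists).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}

def sentence_tokenize(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def word_tokenize(sentence):
    """Split a sentence into word tokens and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def crude_stem(word):
    """Strip a few common English suffixes (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(tokens):
    """Lower-case, drop punctuation and stopwords, then stem what remains."""
    words = [t.lower() for t in tokens if t.isalnum()]
    return [crude_stem(w) for w in words if w not in STOPWORDS]

text = "The cats are playing in the garden. A dog barked!"
sentences = sentence_tokenize(text)
print(sentences)                         # ['The cats are playing in the garden.', 'A dog barked!']
print(word_tokenize(sentences[0]))       # ['The', 'cats', 'are', 'playing', 'in', 'the', 'garden', '.']
print(normalize(word_tokenize(text)))    # ['cat', 'play', 'garden', 'dog', 'bark']
```

Note that the same suffix rule that turns "barked" into "bark" also turns "news" into "new"; mistakes like this are why lemmatization, which looks words up in a dictionary, is often preferred over crude stemming.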

- Converting Text to Numerical Vectors

3. Vector embedding: Converting words and sentences into numeric vectors for use with computational algorithms.
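The simplest way to turn text into numeric vectors is a bag-of-words count vector, sketched below in plain Python. This is a swapped-in illustration of the idea rather than a learned embedding: dense embeddings such as word2vec, GloVe, or transformer outputs are learned by models and capture much more meaning. The sketch assumes the documents are already tokenized, and the helper names are hypothetical. Cosine similarity is included because it is a common way to compare such vectors.

```python
import math
from collections import Counter

def build_vocabulary(documents):
    """Map every distinct token in the corpus to a fixed column index."""
    vocab = sorted({tok for doc in documents for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}

def bag_of_words(doc, vocab):
    """Count vector: entry i is how often vocabulary word i occurs in doc."""
    counts = Counter(doc)
    # The vocab dict iterates in index order because it was built from enumerate()
    return [counts.get(tok, 0) for tok in vocab]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = [["i", "love", "nlp"], ["i", "love", "python"], ["stock", "prices", "fell"]]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
print(list(vocab))                                      # ['fell', 'i', 'love', 'nlp', 'prices', 'python', 'stock']
print(vectors[0])                                       # [0, 1, 1, 1, 0, 0, 0]
print(round(cosine_similarity(vectors[0], vectors[1]), 3))  # 0.667 (two shared words)
print(cosine_similarity(vectors[0], vectors[2]))        # 0.0 (no shared words)
```

Count vectors only see shared words, so "love" and "adore" would look unrelated; learned embeddings address exactly that limitation.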

- Lexical Tasks

4. Part-of-speech tagging: Also called grammatical tagging, this is identifying the grammatical class (part of speech) of a word, e.g., noun or verb, based on its use and context. For example, a tagger should tag ‘make’ as a verb in ‘I can make a paper plane’ and as a noun in ‘What make of car do you own?’

5. Identifying relationships: Recognising relationships among words in the text, e.g., subject-object relationships.

6. Word sense disambiguation: Recognising the intended meaning of a word in context, e.g., ‘make’ in ‘make the grade’ (achieve) vs. ‘make a bet’ (place).

7. Named entity recognition: The task of identifying names of people, geographic locations, organisations, etc. in text.

8. Co-reference resolution: The task of identifying if and when two words refer to the same entity. It includes pronoun resolution (working out which entity a pronoun such as ‘she’ refers to) and can also involve identifying an idiom or metaphor in the text.
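The ‘make’ example in the part-of-speech tagging item can be reproduced with a deliberately tiny rule-based tagger. This is a hypothetical sketch, not how NLTK's `nltk.pos_tag` or spaCy's `token.pos_` work (those learn tagging decisions statistically from annotated corpora): the lexicon, the two context rules, and the Universal Dependencies-style tag names are all assumptions chosen only to show how the previous tag can resolve an ambiguous word.

```python
# Hypothetical toy lexicon: the tags each word may take.
LEXICON = {
    "i": ["PRON"], "can": ["AUX"], "make": ["VERB", "NOUN"], "a": ["DET"],
    "paper": ["NOUN"], "plane": ["NOUN"], "what": ["DET"], "of": ["ADP"],
    "car": ["NOUN"], "do": ["AUX"], "you": ["PRON"], "own": ["VERB"],
}

def tag(tokens):
    """Tag tokens, using the previous tag to resolve words with several possible tags."""
    tags = []
    for tok in tokens:
        options = LEXICON.get(tok.lower(), ["NOUN"])  # unknown words default to NOUN
        prev = tags[-1] if tags else None
        if len(options) > 1 and prev == "AUX" and "VERB" in options:
            tags.append("VERB")      # after an auxiliary/modal such as "can", prefer the verb reading
        elif len(options) > 1 and prev == "DET" and "NOUN" in options:
            tags.append("NOUN")      # after a determiner such as "what", prefer the noun reading
        else:
            tags.append(options[0])  # otherwise take the first listed tag
    return list(zip(tokens, tags))

print(tag("I can make a paper plane".split()))
# [('I', 'PRON'), ('can', 'AUX'), ('make', 'VERB'), ('a', 'DET'), ('paper', 'NOUN'), ('plane', 'NOUN')]
print(tag("What make of car do you own".split()))
# [('What', 'DET'), ('make', 'NOUN'), ('of', 'ADP'), ('car', 'NOUN'), ('do', 'AUX'), ('you', 'PRON'), ('own', 'VERB')]
```

Hand-written rules like these break down quickly on real text, which is why practical taggers are trained on large tagged corpora instead.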

- Higher Level Tasks

9. Sentiment analysis: Identifying subjective attitudes, emotions, etc. expressed in text.

10. Text classification: Assigning text to predefined classes based on its content, for example the presence of certain words.

11. Speech recognition: The task of converting voice data into text data. Differences in speaking speed, punctuation, emphasis, intonation, and accent, along with incorrect grammar and words slurred together, make speech recognition very challenging.

12. Recognizing textual entailment: Textual entailment, or Natural Language Inference (NLI), is the task of recognizing whether one piece of text can be inferred from another piece of text.

13. Natural language generation: The task of putting structured information into human language.

14. Language modelling: A language model defines a probability distribution over sequences of words in one or many languages. Language models are trained on text corpora in those languages. Once trained, these models can generate fluent natural language text (e.g., GPT-3).
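The "probability distribution over sequences of words" in the language modelling item can be illustrated with the smallest useful model, a bigram model. The sketch below estimates P(word | previous word) by counting over a tiny made-up corpus and multiplies these conditional probabilities (the chain rule under a bigram assumption) to score a whole sentence. It is only an illustration: it assumes whitespace tokenization, uses hypothetical `<s>`/`</s>` sentence-boundary markers, and applies no smoothing, so any unseen word pair gets probability zero. Models such as GPT-3 are neural networks trained on vastly larger corpora, but they too output a probability distribution over the next token.

```python
from collections import Counter, defaultdict

def train_bigram_lm(sentences):
    """Estimate P(next word | previous word) by maximum likelihood from bigram counts."""
    counts = defaultdict(Counter)
    for sent in sentences:
        # <s> and </s> are hypothetical sentence-boundary markers
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Normalise each row so the probabilities of words following `prev` sum to 1
    return {
        prev: {word: c / sum(followers.values()) for word, c in followers.items()}
        for prev, followers in counts.items()
    }

def sentence_probability(model, sentence):
    """Chain rule with a bigram assumption: P(w1..wn) = product of P(wi | wi-1)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        prob *= model.get(prev, {}).get(nxt, 0.0)  # unseen pair -> 0.0 (no smoothing)
    return prob

corpus = ["I like NLP", "I like Python", "You like NLP"]
model = train_bigram_lm(corpus)
print({w: round(p, 3) for w, p in model["like"].items()})   # {'nlp': 0.667, 'python': 0.333}
print(round(sentence_probability(model, "I like NLP"), 3))  # 0.444 (= 2/3 * 1 * 2/3 * 1)
print(sentence_probability(model, "NLP like I"))            # 0.0
```

The zero for "NLP like I" shows both the strength (word order matters) and the weakness (no generalisation to unseen pairs) of a raw count-based model.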

NLP use cases

1. Cybersecurity: Spam detection uses text classification to identify spam email. The classifier looks for the presence of certain words and attributes of the text, such as overuse of financial terms, poor grammar, threats, and inappropriate urgency.

2. Machine translation: Translating text from one language to another while preserving the intended meaning.

3. Conversational AI, smart virtual agents, and chatbots: These systems first use speech recognition to capture a voice command from a human user and then use natural language generation to produce a response. Chatbots have proved very effective in customer service for tasks like answering FAQs and booking appointments or tickets.

4. Question-answering systems: These can respond to both well-formed and poorly formed questions.

5. Social media sentiment analysis: Such systems help discover hidden insights, intents, attitudes, and emotions in the data customers post on social media forums, e-commerce sites, etc.

6. Text summarization: These systems parse huge volumes of text and create summaries of them. They employ semantic reasoning and natural language generation.

7. Image captioning: Generating captions for images.

8. Language models: Used for machine translation, OCR, and text generation (e.g., GPT, LaMDA).

9. Autocorrect: Correcting the spelling or grammar of text as it is being typed.

10. Autocomplete: Suggesting sentence completions.

11. Semantic search: Context-aware search, useful for e-commerce.

12. Healthcare: Converting handwritten prescriptions into text or voice, clinical documentation, clinical trial matching, computational phenotyping, etc.

13. Finance and insurance: Stock analysis, credit scoring, insurance claims management, financial reporting, and auditing.

14. HR: Resume screening and evaluation, interview assessment (sentiment analysis, document summarization), video transcription, and employee sentiment analysis.

15. Market intelligence: Applies web scraping and sentiment analysis to detect market trends and optimize a company's current market strategy.
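Spam detection as text classification can be sketched with multinomial Naive Bayes, a classic classifier chosen here purely for illustration; the article does not prescribe a specific algorithm. The four training messages are made up, whitespace tokenization is assumed, and the class and method names are hypothetical. Each class is scored as log P(class) plus the sum of log P(word | class), with add-one (Laplace) smoothing so that a word seen in only one class does not force the other class's probability to zero.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)                   # documents per class, for P(class)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, tokens, c):
        # log P(c) + sum of log P(token | c), smoothed so unseen (class, word) pairs are not zero
        total = sum(self.word_counts[c].values())
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        for tok in tokens:
            score += math.log((self.word_counts[c][tok] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore words never seen in training
        return max(self.classes, key=lambda c: self._log_score(tokens, c))

train_texts = [
    "win money now urgent claim your prize",
    "urgent transfer money to claim cash prize",
    "meeting notes for the nlp session",
    "see you at the python talk tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]
clf = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(clf.predict("claim your cash prize now"))      # spam
print(clf.predict("notes from the python session"))  # ham
```

With realistic data you would need far more examples and a proper tokenizer; libraries such as scikit-learn provide a ready-made `MultinomialNB` classifier for this purpose.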

In the next article, I will cover the installation and features of prominent Python libraries for NLP, with a focus on NLTK and spaCy.


The Jupyter Notebooks for these sessions are in my GitHub repository:

https://github.com/NimritaKoul/NLP_WWC2023

References

1. https://www.ibm.com/in-en/topics/natural-language-processing

2. https://python-course.eu/machine-learning/natural-language-processing-with-python.php

3. https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1184/syllabus.html

4. https://www.nltk.org/book

5. https://www.analyticsvidhya.com/blog/2021/07/nltk-a-beginners-hands-on-guide-to-natural-language-processing/

6. https://web.stanford.edu/~gentzkow/research/text-as-data.pdf

7. https://github.com/n-kostadinov/sentiment-analysis

8. https://notebook.community/DryingPole/sentiment/notebooks/Final%20Project%20Process%20Book

9. https://nbviewer.org/url/norvig.com/ipython/How%20to%20Do%20Things%20with%20Words.ipynb

10. https://towardsdatascience.com/named-entity-recognition-with-nltk-and-spacy-8c4a7d88e7da

11. https://spacy.io/usage/spacy-101


Author: Dr. Nimrita Koul
