Natural Language Processing (NLP)
Prashil Wanjari
Business analyst | Lean Six Sigma | AWS Cloud Practitioner | PSPO - WIP | Digital Transformation | Business Transformation | CRM Implementation
As per Wikipedia, it is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language.
In simple words, it is about giving computers the ability to understand text and spoken words in much the same way human beings can.
There are three important aspects of NLP:
1. Tokenization
2. Stemming
3. Lemmatization
1. Tokenization: It is the process of breaking a piece of text down into smaller units called tokens, such as sentences or words.
Code to tokenize a paragraph into sentences and words (NLTK is a leading platform for building Python programs to work with human language data):
import nltk
nltk.download('punkt')  # tokenizer models required by sent_tokenize and word_tokenize
sentences = nltk.sent_tokenize(para)  # list of sentences in the paragraph
words = nltk.word_tokenize(para)  # list of word and punctuation tokens
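For example, on a short two-sentence paragraph (the sample text below is just an illustration):
para = "NLP is fun. Computers can read text."
print(nltk.sent_tokenize(para))  # ['NLP is fun.', 'Computers can read text.']
print(nltk.word_tokenize(para))  # ['NLP', 'is', 'fun', '.', 'Computers', 'can', 'read', 'text', '.']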
2. Stemming: It is the process of reducing words to their root form, called the stem. Stemmed words sometimes do not carry any meaning.
Example: "Finally", "Final", and "Finalized" are all stemmed to "final", while a word like "studies" is stemmed to "studi", which is not a real word.
Code to stem words:
from nltk.stem import PorterStemmer  # PorterStemmer from the NLTK library is used to stem words
ps = PorterStemmer()
stemmed_words = [ps.stem(w) for w in words]  # stem each token produced by word_tokenize
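A quick check of the stemmer's output (a minimal sketch):
from nltk.stem import PorterStemmer
ps = PorterStemmer()
print([ps.stem(w) for w in ["finally", "final", "finalized", "studies"]])  # ['final', 'final', 'final', 'studi']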
3. Lemmatization: It is the process of reducing words to their base dictionary form, called the lemma. Unlike stems, lemmatized words do carry meaning.
Example: "studies" is lemmatized to "study", which is a real word (unlike the stem "studi").
Code to lemmatize words:
from nltk.stem import WordNetLemmatizer  # WordNetLemmatizer from the NLTK library is used to lemmatize words
nltk.download('wordnet')  # WordNet data required by the lemmatizer
wnl = WordNetLemmatizer()
lemmatized_words = [wnl.lemmatize(w) for w in words]  # lemmatize each token produced by word_tokenize
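A quick check of the lemmatizer's output (a minimal sketch; lemmatize treats words as nouns by default, so pos="v" is passed to handle a verb form):
from nltk.stem import WordNetLemmatizer
wnl = WordNetLemmatizer()
print(wnl.lemmatize("studies"))             # 'study'
print(wnl.lemmatize("finalized", pos="v"))  # 'finalize'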
Here is an example of a problem statement I worked on using NLP.
Problem Statement: Use machine learning to create a model that identifies spam and ham (non-spam) emails.
Solution (a code sketch of the full pipeline follows the steps below):
1. Importing the required functions from the NLTK library (PorterStemmer, stopwords)
2. Importing pandas to read the dataset file
3. Assigning a variable to the PorterStemmer function
4. Looping over the messages to perform data cleaning (removing stopwords and stemming)
5. Importing CountVectorizer from sklearn.feature_extraction.text to perform BoW (Bag of Words)
6. Splitting the data into Train and Test datasets
7. Importing MultinomialNB from sklearn.naive_bayes to train the classifier and predict y_predict
8. Importing confusion_matrix to compare y_test and y_predict
9. Finding the accuracy_score by importing it from sklearn.metrics
Code - Git_Hub_Link
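Below is a minimal sketch of the pipeline described in the steps above. The file name spam.csv, the tab-separated format, the column names label and message, and parameters such as max_features and test_size are assumptions for illustration; the actual values are in the code linked above.

import re
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix, accuracy_score

nltk.download('stopwords')

# Step 2: read the dataset (assumed tab-separated with columns 'label' and 'message')
df = pd.read_csv('spam.csv', sep='\t', names=['label', 'message'])

# Steps 1, 3, 4: data cleaning - keep letters only, lowercase, remove stopwords, stem
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))
corpus = []
for msg in df['message']:
    tokens = re.sub('[^a-zA-Z]', ' ', msg).lower().split()
    tokens = [ps.stem(w) for w in tokens if w not in stop_words]
    corpus.append(' '.join(tokens))

# Step 5: Bag of Words representation
cv = CountVectorizer(max_features=2500)
X = cv.fit_transform(corpus).toarray()
y = (df['label'] == 'spam').astype(int)  # 1 = spam, 0 = ham

# Step 6: train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 7: Multinomial Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train, y_train)
y_predict = model.predict(X_test)

# Steps 8 and 9: evaluation
print(confusion_matrix(y_test, y_predict))
print(accuracy_score(y_test, y_predict))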