Natural Language Processing _ Part 3
ARNAB MUKHERJEE
Automation Specialist (Python & Analytics) at Capgemini || Master's in Data Science || PGDM (Product Management) || Six Sigma Yellow Belt Certified || Certified Google Professional Workspace Administrator
Sentiment Analysis
Sentiment analysis, also called opinion mining, is an NLP technique used to determine whether the opinion expressed in a piece of text is positive, negative, or neutral.
Types of Sentiment Analysis:
1. Standard Sentiment Analysis: Given a piece of writing about a person, place, or thing, standard sentiment analysis determines the opinion it expresses about that topic. The opinion can be positive, negative, or neutral.
E.g.:
"The book 'The complete idiot's guide to statistics' is a fantastic book" - Positive
"I need a free trial of your course to see if it covers all relevant topics or not" - Neutral
"This book on AI ML is so confusing" - Negative
2. Fine-Grained Sentiment Analysis: Here the sentiment is divided into finer sub-categories:
Very Positive
Positive
Neutral
Negative
Very Negative
3. Emotion Detection: This identifies the emotion behind the text. It can be anger, sadness, happiness, or any other emotion.
4. Aspect-Based Sentiment Analysis: When someone reviews a product they have used, aspect-based sentiment analysis identifies which aspect of the product the review is about.
5. Intent Detection: This detects the intent with which a note or comment was written.
E.g.: "My app gets shut down as soon as I try to upload a video. Can you help?"
Here the intent is to ask for assistance.
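The five fine-grained classes listed under type 2 can be pictured as bins over a numeric polarity score. The sketch below is only an illustration: the function name `fine_grained_label`, the assumed score range of -1 to 1, and the cut-off values are all assumptions and do not come from any particular library.

```python
def fine_grained_label(score: float) -> str:
    """Map a polarity score in [-1, 1] to one of five sentiment classes.

    The thresholds are arbitrary illustrative cut-offs (an assumption).
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if score >= 0.6:
        return "Very Positive"
    if score >= 0.2:
        return "Positive"
    if score > -0.2:
        return "Neutral"
    if score > -0.6:
        return "Negative"
    return "Very Negative"


for s in (0.9, 0.3, 0.0, -0.3, -0.9):
    print(s, fine_grained_label(s))
```

In practice the score would come from a trained model, and the cut-offs would be tuned on labelled data rather than fixed by hand.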
Project on Sentiment Analysis using the 'Bag of Words' model
X_train = ["My goal in this chapter is to provide a useful concept of statistics",
           "Here comes your life preserver",
           "Not interpreting statistical information properly can lead to disaster",
           "These decisions can affect our lives in many ways",
           "Today's corporates are making major decisions based on statistical analysis",
           "The field of statistics is not evolving at all",
           "Population surveys appear to be the primary motivation for the historical development of statistics as we know it today"]
y_train = [1,1,0,1,1,0,1]
# 1 - Positive, 0 - Negative
#Each label gives the class of the corresponding sentence: 1 if the sentence is positive, 0 if it is negative.
X_test = ["Statistics is very confusing for me"]
X_train
#Data cleaning
from nltk.tokenize import RegexpTokenizer
#Stemming and stop word removal
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords
#downloading stopwords package
import nltk
nltk.download('stopwords')
#keep only runs of word characters as tokens; they are joined back together after cleaning
tokenizer = RegexpTokenizer(r'\w+')
en_stopwords = set(stopwords.words('english'))
ps = PorterStemmer()
#using clean data function
def getCleanedText(text):
    #converting the text into lowercase
    text = text.lower()
    tokens = tokenizer.tokenize(text)
    #removing stop words from the tokens
    new_tokens = [token for token in tokens if token not in en_stopwords]
    #stemming each remaining token
    stemmed_tokens = [ps.stem(token) for token in new_tokens]
    clean_text = " ".join(stemmed_tokens)
    return clean_text
#display X_test
X_test
#Use clean text to clean our test data and train data
X_clean = [getCleanedText(i) for i in X_train]
Xt_clean = [getCleanedText(i) for i in X_test]
X_clean
#before classification we need to vectorize our text
#import CountVectorizer from scikit-learn's text feature extraction module
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(ngram_range = (1,2))
#vectorize the cleaned training text
X_vec = cv.fit_transform(X_clean).toarray()
X_vec
#each sentence is now a vector with one column per unigram or bigram
#getting feature names (get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out())
print(cv.get_feature_names_out())
#CountVectorizer counts how many times each word or word pair (n-gram) occurs in a sentence
#This kind of model is known as the bag-of-words model.
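To make the counting concrete, here is a minimal pure-Python sketch of what a bag-of-words representation with unigrams and bigrams computes for one cleaned sentence. The helper names `ngrams` and `bag_of_words` are hypothetical, and the sketch skips details of scikit-learn's CountVectorizer such as lowercasing and its default token pattern, so it is an illustration and not a reimplementation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(sentence, ngram_range=(1, 2)):
    """Count every n-gram of the sentence for n in the inclusive ngram_range."""
    tokens = sentence.split()
    low, high = ngram_range
    counts = Counter()
    for n in range(low, high + 1):
        counts.update(ngrams(tokens, n))
    return counts


# A made-up, already cleaned and stemmed sentence
counts = bag_of_words("statist confus statist")
print(counts)
```

Word order is lost apart from what the bigrams capture, which is why the representation is called a "bag" of words.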
#vectorization for test value
Xt_vect = cv.transform(Xt_clean).toarray()
#Classification Task
#In order to perform text classification we use Multinomial Naive Bayes(NB)
#import Multinomial Naive Bayes(NB)
from sklearn.naive_bayes import MultinomialNB
mn = MultinomialNB()
mn.fit(X_vec , y_train)
#Perform Prediction
y_pred = mn.predict(Xt_vect)
#This will give us an array of predicted labels: 1 - Positive class, 0 - Negative class
y_pred
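For readers curious about what the classifier does internally, the following is a rough from-scratch sketch of Multinomial Naive Bayes with add-one (Laplace) smoothing. The function names and the toy token lists are hypothetical, and it works on lists of tokens instead of the `X_vec` matrix, so treat it as an illustration of the idea and not as scikit-learn's implementation.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(labels)):
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        # Prior: fraction of training documents that belong to class c
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for doc in class_docs for tok in doc)
        total = sum(counts.values()) + alpha * len(vocab)
        # Likelihood: smoothed relative frequency of each token within class c
        log_likelihood[c] = {
            tok: math.log((counts[tok] + alpha) / total) for tok in vocab
        }
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood):
    """Return the class with the highest log posterior score; unseen tokens are skipped."""
    scores = {
        c: log_prior[c]
        + sum(log_likelihood[c][tok] for tok in doc if tok in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy, made-up training data: 1 = positive, 0 = negative
docs = [["good", "book"], ["great", "book"], ["confus", "book"]]
labels = [1, 1, 0]
log_prior, log_likelihood = train_multinomial_nb(docs, labels)
print(predict(["confus"], log_prior, log_likelihood))  # 0
print(predict(["great"], log_prior, log_likelihood))   # 1
```

Working in log space avoids multiplying many small probabilities together, and the smoothing term keeps a token that never appeared in a class from forcing that class's score to zero.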