Natural Language Processing _ Part 3

Sentiment Analysis

Sentiment analysis, also called opinion mining, is an NLP technique used to determine whether a piece of text expresses a positive, negative, or neutral opinion.

Types of Sentiment Analysis:

1. Standard Sentiment Analysis: Given a piece of writing about a person, a place, or a thing, standard sentiment analysis determines the opinion that the writing expresses about its topic. The opinion can be positive, negative, or neutral.

E.g.:

"The book 'The Complete Idiot's Guide to Statistics' is a fantastic book" - Positive

"I need a free trial of your course to see if it covers all relevant topics or not" - Neutral

"This book on AI ML is so confusing" - Negative

2. Fine-Grained Sentiment Analysis:

Here the polarity scale is divided more finely, into five classes:

Very Positive

Positive

Neutral

Negative

Very Negative
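As a rough illustration of the five classes above, here is a minimal sketch that maps a polarity score to a fine-grained label. It assumes the score lies between -1.0 and 1.0; the function name `to_fine_grained` and the cut-off values are hypothetical choices made for this example, not a standard.

```python
def to_fine_grained(score: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to one of five sentiment labels.

    The thresholds are illustrative assumptions, not a standard.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    if score <= -0.6:
        return "Very Negative"
    if score <= -0.2:
        return "Negative"
    if score < 0.2:
        return "Neutral"
    if score < 0.6:
        return "Positive"
    return "Very Positive"


labels = [to_fine_grained(s) for s in (-0.9, -0.3, 0.0, 0.4, 0.8)]
print(labels)
# prints: ['Very Negative', 'Negative', 'Neutral', 'Positive', 'Very Positive']
```

In practice the score would come from a trained model, and the thresholds would be tuned on labelled data.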

3. Emotion Detection: This identifies the emotion behind a piece of writing. It can be anger, sadness, happiness, or any other emotion.

4. Aspect-Based Sentiment Analysis: When someone has used a product and writes a review of it, aspect-based sentiment analysis identifies which aspect of the product the review is about.

5. Intent Detection: This identifies the intent with which a note or comment was written.

E.g.: My app gets shut down as soon as I try to upload a video. Can you help?

The intent here is to ask for assistance.


Project on Sentiment Analysis using the 'Bag of Words' model

#X_train

X_train = ["My goal in this chapter is to provide a useful concept of statistics",
           "Here comes your life preserver",
           "Not interpreting statistical information properly can lead to disaster",
           "These decisions can affect our lives in many ways",
           "Today's corporates are making major decisions based on statistical analysis",
           "The field of statistics is not evolving at all",
           "Population surveys appear to be the primary motivation for the historical development of statistics as we know it today"]

y_train = [1,1,0,1,1,0,1]

# 1 - Positive, 0 - Negative

#Each label gives the class of the corresponding sentence: 1 if the sentence is positive, 0 if it is negative.


X_test = ["Statistics is very confusing for me"]


X_train


#Data cleaning

#tokenizer, stemmer, and stop word list from NLTK
from nltk.tokenize import RegexpTokenizer
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords

#downloading stopwords package
import nltk
nltk.download('stopwords')

#keep only runs of word characters (\w+), which drops punctuation
tokenizer = RegexpTokenizer(r'\w+')

en_stopwords = set(stopwords.words('english'))

ps = PorterStemmer()


#function to clean a piece of text

def getCleanedText(text):
    #convert the text to lowercase
    text = text.lower()

    #tokenize
    tokens = tokenizer.tokenize(text)

    #remove stop words from the tokens
    new_tokens = [token for token in tokens if token not in en_stopwords]

    #stemming
    stemmed_tokens = [ps.stem(token) for token in new_tokens]

    #join the stemmed tokens back into one string
    clean_text = " ".join(stemmed_tokens)

    return clean_text
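To see what the cleaning steps do without installing NLTK, here is a dependency-free sketch of the first three of them: lowercasing, tokenizing, and stop word removal. The name `simple_clean` and the tiny `STOPWORDS` set are hypothetical stand-ins made up for this example, and stemming is left out. Because NLTK's English stop word list is much longer (it also contains "very", for instance), the real function will drop more words than this sketch does.

```python
import re

# Hypothetical, tiny stop word list for illustration only;
# NLTK's English list is much longer.
STOPWORDS = {"is", "for", "me", "the", "a", "an", "to", "of", "in", "this"}


def simple_clean(text: str) -> str:
    # Lowercase so that "Statistics" and "statistics" are treated as the same token
    text = text.lower()
    # \w+ matches runs of letters, digits, and underscores, so punctuation is dropped
    tokens = re.findall(r"\w+", text)
    # Keep only the tokens that are not stop words
    kept = [t for t in tokens if t not in STOPWORDS]
    return " ".join(kept)


print(simple_clean("Statistics is very confusing for me!"))
# prints: statistics very confusing
```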


#display X_test

X_test


#Use clean text to clean our test data and train data

X_clean = [getCleanedText(i) for i in X_train]

Xt_clean = [getCleanedText(i) for i in X_test]


X_clean


#vectorize

#before classification we need to vectorize our text

#import CountVectorizer from scikit-learn's feature_extraction.text module

from sklearn.feature_extraction.text import CountVectorizer


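#ngram_range = (1,2) counts single words (unigrams) as well as pairs of adjacent words (bigrams)
cv = CountVectorizer(ngram_range = (1,2))

#vectorize the cleaned training text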

X_vec = cv.fit_transform(X_clean).toarray()


X_vec

#each row of X_vec is the count vector for one training sentence, with one column per unigram or bigram


#getting feature names
#(on scikit-learn versions before 1.0, use cv.get_feature_names() instead)

print(cv.get_feature_names_out())

#The CountVectorizer counts how many times each word or word pair appears in a sentence

#This kind of model is known as the bag of words model.
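To make the counting concrete, here is a small pure-Python sketch of a bag of words built from unigrams and bigrams. It is a simplification and not how CountVectorizer is implemented: it splits on whitespace only, and the helper names `ngrams` and `bag_of_words` as well as the two input strings are made up for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    # All runs of n adjacent tokens, joined with a space
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, ngram_range=(1, 2)):
    lo, hi = ngram_range
    per_doc = []
    for doc in docs:
        tokens = doc.split()
        counts = Counter()
        for n in range(lo, hi + 1):
            counts.update(ngrams(tokens, n))
        per_doc.append(counts)
    # The vocabulary is every n-gram seen in any document, in sorted order
    vocab = sorted(set().union(*per_doc))
    # One row per document, one column per vocabulary entry
    matrix = [[c[term] for term in vocab] for c in per_doc]
    return vocab, matrix


docs = ["field statist evolv", "statist confus"]
vocab, matrix = bag_of_words(docs)
print(vocab)
# prints: ['confus', 'evolv', 'field', 'field statist', 'statist', 'statist confus', 'statist evolv']
print(matrix)
# prints: [[0, 1, 1, 1, 1, 0, 1], [1, 0, 0, 0, 1, 1, 0]]
```

Word order inside a sentence is lost apart from what the bigrams capture, which is why the model is called a "bag" of words.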


#vectorization for test value

Xt_vect = cv.transform(Xt_clean).toarray()


#Classification Task

#To perform text classification we use Multinomial Naive Bayes (NB)

#import Multinomial Naive Bayes (NB)

from sklearn.naive_bayes import MultinomialNB


mn = MultinomialNB()


mn.fit(X_vec , y_train)


#Perform Prediction

y_pred = mn.predict(Xt_vect)


#This gives an array of predicted labels: 1 - Positive class, 0 - Negative class

y_pred
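To make the classification step less of a black box, here is a from-scratch sketch of multinomial Naive Bayes on word counts with add-one (Laplace) smoothing. It is not scikit-learn's implementation; the function names `train_nb` and `predict_nb` and the toy sentences are assumptions made up for this illustration. Words that never appeared in training are skipped at prediction time, much as `cv.transform` ignores words outside the fitted vocabulary.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    # Vocabulary: every distinct word in the training documents
    vocab = sorted({w for d in docs for w in d.split()})
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        word_counts[y].update(d.split())
    model = {}
    for y in class_counts:
        total = sum(word_counts[y].values())
        # log P(class)
        log_prior = math.log(class_counts[y] / len(labels))
        # log P(word | class), smoothed so that unseen pairs never get probability zero
        denom = total + alpha * len(vocab)
        log_lik = {w: math.log((word_counts[y][w] + alpha) / denom) for w in vocab}
        model[y] = (log_prior, log_lik)
    return model


def predict_nb(model, doc):
    scores = {}
    for y, (log_prior, log_lik) in model.items():
        # Sum log probabilities; words outside the training vocabulary are ignored
        scores[y] = log_prior + sum(log_lik[w] for w in doc.split() if w in log_lik)
    # Return the class with the highest score
    return max(scores, key=scores.get)


toy_docs = ["good fantastic book", "useful good guide", "confusing bad book"]
toy_labels = [1, 1, 0]
model = train_nb(toy_docs, toy_labels)
print(predict_nb(model, "bad confusing"))  # prints: 0
print(predict_nb(model, "good guide"))     # prints: 1
```

With so few training sentences, the class prior and the smoothing term have a large influence on the result, so predictions on a training set this small should be read with caution.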
