ML Model Building with the Amazon Alexa Reviews Dataset Using spaCy, NLTK, Logistic Regression, Grid Search CV & XGBoost

The dataset consists of nearly 3,000 Amazon customer reviews (input text), along with the star rating, review date, product variant and feedback for various Amazon Alexa products such as Alexa Echo, Echo Dot and Alexa Firestick. It is intended for learning how to train machine learning models for sentiment analysis.

We use this data to analyze Amazon's Alexa products, discover insights from consumer reviews and support machine learning models. We can train models for sentiment analysis and find out how many customer reviews are positive and how many are negative.

The analysis was performed in the following steps:

  1. Import the dataset, the raw file 'amazon_alexa.tsv', from my GitHub repo.
  2. Perform exploratory data analysis with Python commands such as head, info, describe, shape, isnull and groupby.
  3. Visualize the data with matplotlib and wordcloud to understand the business problem and its solution.
  4. Use the spaCy NLP library for text analysis: load the English model 'en_core_web_md', create our lists of punctuation marks and stopwords, and use the tokenizer, tagger, parser, NER and word vectors.
  5. Create a tokenizer function that lemmatizes each token and converts it to lowercase, and a custom transformer using spaCy to clean the text.
  6. Create training and testing data for a Logistic Regression classifier and build a pipeline of cleaner, vectorizer and classifier.
  7. Logistic Regression Accuracy: 0.9417989417989417
  8. Process the same dataset with NLTK, creating a corpus and a bag of words for further processing.
  9. Apply a min-max scaler and build a RandomForestClassifier, which gives Training Accuracy: 0.9941043083900227, Testing Accuracy: 0.9428571428571428.
  10. Apply k-fold cross-validation, which gives an Accuracy of 0.9365158371040725.
  11. Apply grid search with stratified folds to find the best hyperparameters. Best Parameter Combination: {'bootstrap': True, 'max_depth': 100, 'min_samples_split': 8, 'n_estimators': 300}, Accuracy Score for Test Set: 0.9428571428571428.
  12. Build a model with the XGBoost classifier, which gives Training Accuracy: 0.970521541950113, Testing Accuracy: 0.9407407407407408.
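
Step 6 (a pipeline of cleaner, vectorizer and classifier) could look roughly like the sketch below. It is not the original notebook code: the `TextCleaner` class is a hypothetical stand-in that only strips and lowercases text instead of using spaCy, the toy reviews and labels are made up for illustration, and the split and model parameters are assumptions.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


class TextCleaner(BaseEstimator, TransformerMixin):
    """Hypothetical cleaner: strips whitespace and lowercases each review.

    The article uses a spaCy-based transformer here; this is a simpler stand-in.
    """

    def fit(self, X, y=None):
        # Nothing to learn; the cleaner is stateless.
        return self

    def transform(self, X):
        return [str(text).strip().lower() for text in X]


# Made-up toy data standing in for the review text and the feedback label
# (1 = positive, 0 = negative). Not taken from the real dataset.
reviews = [
    "Love my Echo, works great", "Great sound and easy setup",
    "Amazing device, very helpful", "Works perfectly, love it",
    "Best purchase this year", "Easy to use and fun",
    "Stopped working after a week", "Terrible sound, very disappointed",
    "Does not understand my voice", "Waste of money, returned it",
    "Poor quality and keeps disconnecting", "Bad product, would not recommend",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Stratify so both classes appear in the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.25, random_state=0, stratify=labels
)

# Chain cleaner -> bag-of-words vectorizer -> classifier.
pipe = Pipeline([
    ("cleaner", TextCleaner()),
    ("vectorizer", CountVectorizer(ngram_range=(1, 1))),
    ("classifier", LogisticRegression(max_iter=1000)),
])

pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Logistic Regression Accuracy:", accuracy)
```

With only a dozen toy reviews the printed accuracy is not meaningful; the point is the shape of the pipeline, where raw text goes in and the same cleaning and vectorizing is applied at both fit and predict time.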

Please go through each step, try to understand the accuracy scores and predictions of each model, and interpret the results on your own.
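
As one possible starting point for that interpretation, the short sketch below compares training and testing accuracy for the two models that report both. The scores are copied from the steps above; the helper name `generalization_gap` is made up for this example, and a larger gap is only a rough hint of overfitting, not proof.

```python
# Scores copied from steps 9 and 12 above.
scores = {
    "RandomForestClassifier": {"train": 0.9941043083900227, "test": 0.9428571428571428},
    "XGBoost classifier": {"train": 0.970521541950113, "test": 0.9407407407407408},
}


def generalization_gap(train_acc, test_acc):
    """Difference between training and testing accuracy (hypothetical helper)."""
    return train_acc - test_acc


gaps = {
    name: generalization_gap(s["train"], s["test"]) for name, s in scores.items()
}

for name, gap in gaps.items():
    print(f"{name}: train-test gap = {gap:.4f}")
# RandomForestClassifier: train-test gap = 0.0512
# XGBoost classifier: train-test gap = 0.0298
```

Both models reach almost the same testing accuracy, but the random forest fits the training data more closely, so its gap is larger.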

Thank you! Happy learning!

