Sentiment Analysis using Deep Learning (1-D CNN)
Ankit Agarwal
I wanted to experiment with the recent advances in TensorFlow, including TensorBoard and Keras callback functions, and hence thought of trying to build a Deep Learning CNN model for a clichéd problem in AI: Sentiment Analysis.
Sentiment analysis (also known as opinion mining) is an active research area in natural language processing. The task aims at identifying, extracting, and organizing sentiments from user-generated texts in social networks, blogs, or product reviews. Over the past two decades, many studies in the literature have exploited machine learning approaches to solve sentiment analysis tasks from different perspectives. Since the performance of a machine learner heavily depends on the choice of data representation, many studies are devoted to building powerful feature extractors with domain expertise and careful engineering. In the last few years, deep learning approaches have emerged as powerful computational models that discover intricate semantic representations of texts automatically from data, without feature engineering. These approaches have improved the state of the art in many sentiment analysis tasks, including sentiment classification, opinion extraction, and fine-grained sentiment analysis.
Data
I have used multiple datasets to train the Sentiment Analysis Classifier –
- IMDB dataset taken from Kaggle: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
- Yelp reviews dataset
- Amazon product reviews
- Random list of positive comments
- Random list of negative comments
(Refer to my GitHub repository for all these datasets and the complete Python code: https://github.com/Ankit-DA/CNN_Sentiment_Analysis; a minimal loading sketch follows.)
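As a minimal sketch of loading the IMDB data, assuming the standard Kaggle CSV with 'review' and 'sentiment' columns (the file name is illustrative):

```python
import pandas as pd

# Load the Kaggle IMDB CSV; the file name and the 'review'/'sentiment'
# column names are assumptions based on the standard Kaggle export.
df = pd.read_csv("IMDB Dataset.csv")
df["label"] = (df["sentiment"] == "positive").astype(int)  # 1 = positive, 0 = negative

texts = df["review"].tolist()
labels = df["label"].values
```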
Callback Functions
From the Keras documentation: A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training.
Why Callbacks: The biggest challenge a programmer faces during model training is overfitting of the model. Most of the time we use an arbitrary number of epochs and wait for model training to complete before validating. If the model overfits before that number of epochs is reached, we reduce the number of epochs and train again; otherwise, we increase the number of epochs. This approach is very wasteful and time-consuming. An improved method is to handle this during training itself, by stopping the training when we realize that the validation loss is no longer improving, using the Keras callbacks EarlyStopping and ModelCheckpoint. A callback is an object (a class instance implementing specific methods) that is passed to the model in the call to fit and that is called by the model at various points during training. It has access to all the available data about the state of the model and its performance, and it can perform actions such as interrupting training, saving the model, adjusting the learning rate over time, or otherwise altering the state of the model.
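As a minimal sketch (the file name, patience, and other values are illustrative, not the exact configuration from my repository), the two callbacks can be passed to fit like this:

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Interrupt training once validation loss has stopped improving
    # for 3 consecutive epochs, and roll back to the best weights
    EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True),
    # Persist the best model seen so far after each epoch
    ModelCheckpoint("best_model.h5", monitor="val_loss", save_best_only=True),
]

# history = model.fit(X_train, y_train, validation_split=0.2,
#                     epochs=50, callbacks=callbacks)
```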
Why Convolutional Neural Network (CNN)
In a plain pooling-based neural network, we are only able to use word-level features: when the order of words in a sentence changes, the sentence representation remains unchanged. Traditional statistical models adopt n-gram word features to alleviate this issue, showing improved performance. For neural network models, a convolution layer can be exploited to achieve a similar effect.
Formally, a convolution layer performs nonlinear transformations by traversing a sequential input with a fixed-size local filter.
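To make this concrete, here is a minimal Keras sketch of this kind of 1-D CNN. The vocabulary size and layer widths are assumptions for illustration, while the 50-dimensional embeddings and 500-word input length match the settings mentioned under Further Enhancements below:

```python
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000  # vocabulary size (assumed for illustration)
MAX_LEN = 500       # first 500 words of each review, as noted below
EMBED_DIM = 50      # 50-dimensional embeddings, as noted below

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN),
    # Fixed-size local filter (kernel_size=5) traversing the word sequence
    layers.Conv1D(filters=128, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # positive vs. negative
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```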
TensorBoard
TensorBoard provides an excellent way to visualize the various metrics generated during model training and validation. Some of the capabilities that were immensely useful –
· Tracking and visualizing metrics such as loss and accuracy
· Visualizing the model graph (ops and layers)
· Viewing histograms of weights, biases, or other tensors as they change over time
It can easily be invoked on a Windows 10 machine through a Jupyter notebook with minimal effort, so I thought of leveraging it to monitor some metrics. Here are a couple of samples –
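Hooking TensorBoard into training is itself straightforward; a minimal sketch, where the log-directory layout is an assumption:

```python
import datetime
from tensorflow.keras.callbacks import TensorBoard

# Write logs into a timestamped directory (layout is an assumption)
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tb_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)  # histograms every epoch

# model.fit(..., callbacks=[tb_callback])

# Then, inside the Jupyter notebook (this works on Windows 10 too):
# %load_ext tensorboard
# %tensorboard --logdir logs/fit
```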
Further Enhancements
There are a number of improvements that can be made to this model, including (but not limited to) –
- Improving the word embeddings by increasing their dimensionality from 50 to 100 or more
- Leveraging existing pre-trained word embeddings (e.g., embeddings trained on the Google News dataset of about 100 billion words)
- Training the model on a larger dataset
- Currently the model is trained on only the first 500 words of each review/comment; this could be increased to 1,000 words
- Increasing the number of hidden layers and optimizing other parameters within the CNN model training
- Leveraging an RNN in place of the CNN (a minimal sketch follows this list). The CNN structure uses a fixed-size word window to capture the local composition features around a given position, achieving promising results. However, it ignores the long-distance dependency features that reflect syntactic and semantic information, which are particularly important in understanding natural language sentences. Under the neural setting, these dependency-based features are addressed by recurrent neural networks (RNNs), which have achieved great success.
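A minimal sketch of that RNN variant, reusing the same embedding front end as the CNN sketch above (the LSTM width is an assumption):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Embedding(20000, 50, input_length=500),
    # The LSTM carries a running state across the whole sequence, so it can
    # capture long-distance dependencies that a fixed-size filter misses
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```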
Here is the link to the GitHub repository for the full code and test data: https://github.com/Ankit-DA/CNN_Sentiment_Analysis
Ending with a quote I learned from the movie “Extraction” –
“You drown not by jumping into water but by remaining submerged in it”