An overview of the combined power of Twitter and Python
Meghna Goswami
Manager - Cyber, Risk and Regulatory at PwC | MS-IS graduate - UT Arlington-College of Business | Engineer
A lot of us have heard about the powerful visualizations and calculations that Python can produce, but how many of us have actually seen them up close? This blog takes a practical look at what Twitter and Python can do together for Social Media Analytics.
The following are a few important steps in Social Media Analytics using Twitter data:
Data Collection :
Twython is an actively maintained, pure Python wrapper for the Twitter API. It supports both the standard (REST) and the Streaming Twitter APIs. Here it is used to collect the required tweets by filtering them by keyword and time range.
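A minimal sketch of the collection step follows. It assumes a `twython.Twython` client and access to Twitter's v1.1 standard search endpoint, which Twython's `search()` method wraps. `fetch_tweets` and `filter_by_time` are hypothetical helper names, and the choice of `lang="en"` is an illustrative assumption. The `q` and `until` parameters narrow results by keyword and date, and a local pass then trims the results to an exact time window.

```python
from datetime import datetime, timezone

# Timestamp format of "created_at" in Twitter API v1.1 responses,
# e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def fetch_tweets(client, keywords, until=None, count=100, lang="en"):
    """Run one keyword search through a Twython-style client (hypothetical helper).

    `client` is assumed to be a twython.Twython instance; its search() method
    calls the v1.1 search/tweets endpoint. `until` is a "YYYY-MM-DD" string and
    limits results to tweets created before that date.
    """
    params = {"q": " OR ".join(keywords), "count": count, "lang": lang}
    if until is not None:
        params["until"] = until
    response = client.search(**params)
    return response.get("statuses", [])


def filter_by_time(tweets, start, end):
    """Keep tweets whose created_at lies in [start, end); naive bounds are treated as UTC."""
    if start.tzinfo is None:
        start = start.replace(tzinfo=timezone.utc)
    if end.tzinfo is None:
        end = end.replace(tzinfo=timezone.utc)
    kept = []
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
        if start <= created < end:
            kept.append(tweet)
    return kept
```

With real credentials, the client would be created as `Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)` and passed to `fetch_tweets(client, ["python", "twitter"], until="2018-10-11")`. The standard search index only reaches back about a week, which is why the exact window is applied locally.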
Sentiment Analysis :
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. It tries to make easy things easy and hard things possible. One can generate plots, histograms, power spectra, bar charts, error charts, scatterplots, etc., with just a few lines of code.
In the figures below, TextBlob and Matplotlib are used together to plot the polarity and subjectivity scores of the tweet corpus. Polarity ranges from -1 (negative) to 1 (positive), and subjectivity from 0 (objective) to 1 (subjective):
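One way to draw such plots is sketched below. It assumes the scores have already been computed, for example with TextBlob's `sentiment` property, and shows them as two side-by-side histograms; that layout is an assumption, since the original figures are not reproduced here. `plot_sentiment` is a hypothetical helper name.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt


def plot_sentiment(polarity, subjectivity, bins=20):
    """Plot histograms of polarity and subjectivity scores (hypothetical helper)."""
    fig, (ax_pol, ax_sub) = plt.subplots(1, 2, figsize=(10, 4))

    # Polarity: -1 (negative) to 1 (positive)
    ax_pol.hist(polarity, bins=bins, range=(-1, 1), color="tab:blue")
    ax_pol.set_title("Polarity")
    ax_pol.set_xlabel("polarity score")
    ax_pol.set_ylabel("number of tweets")

    # Subjectivity: 0 (objective) to 1 (subjective)
    ax_sub.hist(subjectivity, bins=bins, range=(0, 1), color="tab:orange")
    ax_sub.set_title("Subjectivity")
    ax_sub.set_xlabel("subjectivity score")

    fig.tight_layout()
    return fig
```

Calling `plot_sentiment(pols, subs).savefig("sentiment.png")` writes the figure to disk. A scatter plot of polarity against subjectivity (`ax.scatter`) is an equally reasonable choice if the relationship between the two scores matters more than their distributions.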
Word cloud :
NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength NLP libraries.
The figure below shows a Word Cloud created by first removing stop words with the NLTK package and then stemming the remaining words with the Porter Stemmer algorithm. The processed words of the tweets were then fed into the Word Cloud module:
Topic Modeling :
After removing stop words and stemming, Non-negative Matrix Factorization (NMF) from scikit-learn and Latent Dirichlet Allocation (LDA) from Gensim are used to discover the main topics discussed in the tweets.