2022 Beijing Winter Olympic Games: Doing a Sentiment Analysis of the tweets


How can we analyze tweets in near real time? Communication during the Olympic Games is fantastic because so many eyes around the world are on this massive event. Let's use sentiment analysis to measure how the Games are being received on Twitter.


Olympic Games Context Introduction

Right now, in Beijing, China, the XXIV Olympic Winter Games are taking place. Under the motto "Together for a Shared Future", 2,871 athletes from 91 nations are competing in 15 disciplines.

Of course, there are people in favor of the Games and people against them, and we won't discuss that. But what we cannot deny is that these are the most politicized Games since the Cold War era: the USA has announced a diplomatic boycott, Tibetan and Uyghur independence groups have held demonstrations, and on top of that there is the COVID-19 situation.

On the other hand, according to the IOC Marketing & Broadcasting report, Tokyo 2020 generated "6.1B engagements (likes, comments, shares and video views on Olympic post) on Olympic social media handles, across 9 social media platforms" between 25 Feb 2020 and 05 Sep 2021 (sources at the end of the article).

These interactions were both positive and negative, but they were interactions nonetheless. We need to remember that a significant part of public opinion was against holding the Tokyo 2020 Olympic Games because of the COVID-19 situation. But once the Games started, opinion changed:

[Image: NTT Data graph of positive vs. negative tweets about Tokyo 2020, before and after the Games started]

As we can see in this graph from NTT Data, the ratio between positive and negative tweets flipped once the Olympic Games started. This shows that data must be analyzed in its context, and that includes when the data was produced: in this case, when each tweet was posted.


Introduction to Sentiment Analysis

In this article, I would like to show that this kind of analysis is not difficult at all, and that we all have the tools at hand.

In this case, we will analyze 30,000 tweets scraped from Twitter today, February 6th, a few hours before I post this, between 9:30 and 18:55. To be accepted, a scraped tweet must fulfill two conditions: it must contain the word 'olympics' and it must be in English.
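The two acceptance conditions boil down to a simple filter. This is a minimal sketch, not the scraper actually used here: it assumes each scraped tweet is a dict with hypothetical `text` and `lang` fields, and matches 'olympics' case-insensitively as a substring.

```python
from typing import Iterable


def accept(tweet: dict) -> bool:
    """Keep a tweet only if it mentions 'olympics' and is tagged as English.

    'text' and 'lang' are hypothetical field names for a scraped record.
    """
    return "olympics" in tweet["text"].lower() and tweet["lang"] == "en"


def filter_tweets(tweets: Iterable[dict]) -> list[dict]:
    """Return only the tweets that satisfy both conditions."""
    return [t for t in tweets if accept(t)]


# Hypothetical scraped records.
scraped = [
    {"text": "Great start to the #Olympics in Beijing!", "lang": "en"},
    {"text": "Los Juegos Olímpicos empiezan hoy", "lang": "es"},
    {"text": "Watching the hockey game tonight", "lang": "en"},
]

kept = filter_tweets(scraped)
print(len(kept))  # 1
```

A substring match (rather than an exact word match) also catches hashtags such as #Olympics or #Beijing2022Olympics.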

Using Python's NLTK library plus a tokenization process, we will:

  • Split the tweets into words: a tweet is nothing more than a group of words (a phrase), so we split each string into a list of words to analyze them individually.
  • Remove stopwords such as articles, prepositions, etc. Search engines are typically programmed to ignore these words, both when indexing entries and when retrieving them as the result of a search query.
  • Count how many times each word is repeated, to build a ranking.
  • Get the sentiment of each tweet by analyzing the positiveness or negativeness of each word (adding positive words and subtracting negative ones). We have a bag of positive words and a bag of negative words: if a word matches one in the positive bag, we add 1; if it matches one in the negative bag, we subtract 1; otherwise, we move on to the next word without changing the score.
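The four steps above can be sketched in plain Python. This is a minimal illustration, not the exact code behind this analysis: a simple regex stands in for NLTK's tokenizers, and the stopword and positive/negative word lists are tiny hypothetical samples (in practice you would use NLTK's English stopword list, `nltk.corpus.stopwords.words("english")`, and a full opinion lexicon).

```python
import re
from collections import Counter

# Hypothetical, tiny word lists for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "to", "for",
             "and", "is", "are", "this", "my", "me"}
POSITIVE = {"great", "amazing", "win", "gold", "love", "thanks", "proud"}
NEGATIVE = {"boycott", "lose", "bad", "sad", "fail", "crash"}


def tokenize(tweet: str) -> list[str]:
    """Split a tweet into lowercase tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", tweet.lower())


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop articles, prepositions and other very common words."""
    return [t for t in tokens if t not in STOPWORDS]


def sentiment_score(tokens: list[str]) -> int:
    """Add 1 per word in the positive bag, subtract 1 per word in the negative bag."""
    score = 0
    for t in tokens:
        if t in POSITIVE:
            score += 1
        elif t in NEGATIVE:
            score -= 1
        # Words in neither bag leave the score unchanged.
    return score


def label(score: int) -> str:
    """Map a numeric score to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "Amazing gold for the team at the Olympics!",
    "Thanks for making me lose 2 hours of my day #olympics",
    "The olympics boycott is sad",
]

word_counts = Counter()  # ranking of words across all tweets
for tweet in tweets:
    tokens = remove_stopwords(tokenize(tweet))
    word_counts.update(tokens)
    print(label(sentiment_score(tokens)), "|", tweet)

print(word_counts.most_common(3))
```

The second sample tweet scores 0 (one positive word, one negative word) and is labeled neutral, which is the irony limitation discussed in the note that follows.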

Note: this is just a rough analysis. We need to take into account that the counts are biased for two main reasons:

  • Bots: you need to measure some key indicators to classify a Twitter account as a bot or not. For example, in terms of Big Data principles, one of the best ways to validate the information you have (so as not to produce fake news) is to validate its author. Even so, we can still find false negatives in this case.
  • Interpretation: irony is not fully addressed. For example, 'Thanks for making me lose 2 hours of my day' is a negative phrase, but the sentiment analysis will take it as neutral because it has one positive word and one negative one.

Note 2: all times in this analysis are given in GMT+1 (Central European Time).
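Twitter's API reports tweet creation times in UTC, so they need shifting to GMT+1 before being compared with a local schedule. A minimal sketch, assuming the scraped timestamps are ISO 8601 strings (the function name is hypothetical), using a fixed +1 hour offset, which matches Central European Time in February when no daylight-saving shift applies.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+1 offset: correct for Central European Time in winter.
GMT_PLUS_1 = timezone(timedelta(hours=1))


def to_gmt_plus_1(created_at: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2022-02-06T08:30:00+00:00'
    and convert it to GMT+1."""
    ts = datetime.fromisoformat(created_at)
    if ts.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(GMT_PLUS_1)


local = to_gmt_plus_1("2022-02-06T08:30:00+00:00")
print(local.strftime("%H:%M"))  # 09:30
```

A fixed offset keeps the sketch dependency-free; for periods that cross a daylight-saving change, a named zone from `zoneinfo` would be the safer choice.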


Analyzing Olympic Tweets

Let's start.

After cleaning the 30,000 tweets, we got these word counts:

[Image: word counts of the cleaned tweets]

On its own, this is not enough to draw any conclusions, so we run a sentiment analysis on each tweet and classify it as positive, neutral, or negative.

[Image: sentiment of the tweets from 9:30 to 18:55]


Between 9:30 and 18:55 there is no negative peak. In fact, the overall sentiment is positive, at least for the period we are analyzing.

[Image: sentiment of the tweets over the analyzed period]


Let's narrow the time range and analyze only the tweets posted between 10:00 and 11:00. The peaks and the clustering are very interesting:
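Finding peaks like these amounts to grouping tweets into fixed time bins and counting each sentiment label per bin. A minimal sketch, assuming every tweet has already been converted to GMT+1 and labeled; the function names, the `bin_minutes` parameter and the sample records are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, time


def bin_start(ts: datetime, bin_minutes: int) -> datetime:
    """Round a timestamp down to the start of its bin (bin_minutes should divide 60)."""
    minute = (ts.minute // bin_minutes) * bin_minutes
    return ts.replace(minute=minute, second=0, microsecond=0)


def sentiment_per_bin(tweets, start: time, end: time, bin_minutes: int) -> dict:
    """Count labels per bin for tweets whose time of day falls in [start, end)."""
    bins = defaultdict(Counter)
    for ts, label in tweets:
        if start <= ts.time() < end:
            bins[bin_start(ts, bin_minutes)][label] += 1
    return dict(sorted(bins.items()))  # chronological order


# Hypothetical (timestamp in GMT+1, label) pairs.
tweets = [
    (datetime(2022, 2, 6, 9, 58), "neutral"),
    (datetime(2022, 2, 6, 10, 3), "positive"),
    (datetime(2022, 2, 6, 10, 4), "positive"),
    (datetime(2022, 2, 6, 10, 17), "negative"),
    (datetime(2022, 2, 6, 10, 46), "positive"),
]

# Zoom into 10:00-11:00 with 5-minute bins.
window = sentiment_per_bin(tweets, time(10, 0), time(11, 0), 5)
for bin_time, counts in window.items():
    print(bin_time.strftime("%H:%M"), dict(counts))
```

The same function with a wider window and `bin_minutes=60` gives an hourly overview of the whole period.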

[Image: tweets posted between 10:00 and 11:00]

Cross-referencing this with the competition schedule, we can see that the Speed Skating finals were taking place at 9:30, while the Ice Hockey game between JPN and CHN started at 9:40. This could explain why there are so many tweets between 10:00 and 10:20. Data with context becomes information... and that is what we are looking for.


Positive or Negative?

Now... overall, were the tweets positive or negative?

[Image: share of positive, neutral, and negative tweets]

It's very interesting to see that positive tweets outnumber negative ones. The number of neutral tweets is understandable, as many news posts are objective and therefore neutral.
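The overall split is just each label's count divided by the total. A minimal sketch, assuming every tweet has already been labeled positive, neutral, or negative; the function name and sample labels are hypothetical and are not this article's results.

```python
from collections import Counter


def label_shares(labels: list[str]) -> dict[str, float]:
    """Return the percentage of tweets carrying each label, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # avoid dividing by zero on an empty input
    return {label: 100 * n / total for label, n in counts.most_common()}


# Hypothetical labels, not the article's actual results.
sample = ["positive", "neutral", "positive", "negative", "neutral", "positive"]
for label, pct in label_shares(sample).items():
    print(f"{label}: {pct:.1f}%")
```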

Word Clouds

Finally, one of the most amazing tools we have in marketing is the word cloud. Which words do we find (along with 'olympics') in the positive tweets?
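A word cloud is essentially a picture of word frequencies, so its input is simply a count of the words in each group of tweets. A minimal sketch, assuming the tweets are already tokenized, stripped of stopwords and labeled (the function name and sample data are hypothetical). The third-party `wordcloud` package can render such counts with `WordCloud().generate_from_frequencies(...)`; it is left out here so the sketch runs with the standard library only.

```python
from collections import Counter


def frequencies_by_label(tweets: list[tuple[list[str], str]]) -> dict[str, Counter]:
    """Aggregate token counts separately for each sentiment label."""
    result: dict[str, Counter] = {}
    for tokens, label in tweets:
        result.setdefault(label, Counter()).update(tokens)
    return result


# Hypothetical (cleaned tokens, label) pairs.
labeled = [
    (["olympics", "gold", "amazing"], "positive"),
    (["olympics", "gold", "proud"], "positive"),
    (["olympics", "boycott", "sad"], "negative"),
]

freqs = frequencies_by_label(labeled)
print(freqs["positive"].most_common(2))
```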

[Image: word cloud of the positive tweets]

And in the negative ones?

[Image: word cloud of the negative tweets]


Fascinating, right?

Thanks for reading!


Sources: https://stillmed.olympics.com/media/Documents/International-Olympic-Committee/IOC-Marketing-And-Broadcasting/Tokyo-2020-External-Communications.pdf

---------------------------------------------------------------------------------------------------------------

Author: Ignacio Ariznabarreta - JIAF Consulting


