
2022 Beijing Winter Olympic Games: Doing a Sentiment Analysis of the tweets

Jose Ignacio Ariznabarreta Fossati

Making data speak...

Published: February 6, 2022

How can we analyze tweets in almost real time? Communication during the Olympic Games is fascinating because so many eyes around the world are fixed on this massive event. Let's use sentiment analysis to measure how the Games are being received on Twitter.

Olympic Games Context Introduction

Right now, in Beijing, China, the XXIV Olympic Winter Games are happening. With the motto "Together for a Shared Future", 2871 athletes from 91 nations will be competing in 15 different sports.

Of course, there are people in favor of the Games and people against them, and we won't discuss that here. What we cannot deny is that these are the most politicized Games since the Cold War era: the USA has announced a diplomatic boycott, Tibetan and Uyghur independence groups have held demonstrations, and on top of that there is the COVID-19 situation.

On the other hand, according to the IOC Marketing & Broadcasting report, Tokyo 2020 generated "6.1B engagements (likes, comments, shares and video views on Olympic post) on Olympic social media handles, across 9 social media platforms" from 25 Feb 2020 to 05 Sep 2021 (source at the end of the article).

These interactions were both positive and negative, but they were interactions all the same. We need to remember that a significant part of public opinion was against holding the Tokyo 2020 Olympic Games because of the COVID situation. But once the Games started, opinion changed:

As we can see in this graph from NTT Data, the ratio between positive and negative tweets flipped once the Olympic Games started. This tells us that it is important to analyze data with an understanding of its context. That includes when the data was produced: in this case, when the tweet was posted.

Introduction to Sentiment Analysis

In this article, I would like to show that doing this analysis is not difficult at all, and that we all have the tools at hand.

In this case, we will analyze 30,000 tweets scraped from Twitter today, February 6th, a few hours before posting this: from 9:30 to 18:55. To be accepted, a scraped tweet must meet two conditions: contain the word 'olympics' in its text and be in English.

Using Python's NLTK library, plus a tokenization process, we will:
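The two acceptance conditions can be sketched as a simple filter. This is only an illustration, not the scraper used for the article: the tweet dictionaries and their `text` and `lang` keys are assumptions made for the example.

```python
# Assumption: each scraped tweet is a dict with hypothetical "text" and "lang" keys.
def is_accepted(tweet):
    """Keep a tweet only if it is in English and mentions 'olympics'."""
    return tweet["lang"] == "en" and "olympics" in tweet["text"].lower()

scraped = [
    {"text": "Watching the Olympics tonight!", "lang": "en"},
    {"text": "Viendo los Olympics esta noche", "lang": "es"},
    {"text": "Great hockey game today", "lang": "en"},
]

accepted = [tweet for tweet in scraped if is_accepted(tweet)]
```

Lowercasing the text before matching means 'Olympics', 'OLYMPICS' and 'olympics' are all accepted.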

Split the tweets into words: a tweet is nothing more than a group of words (a phrase). We split the string into a list of words for individual analysis.
Remove stop words such as articles, prepositions, etc. Search engines are typically programmed to ignore them, both when indexing entries for searching and when retrieving them as the result of a search query.
Count how many times each word is repeated, to build a ranking.
Get the sentiment of each tweet by scoring its words against a bag of positive words and a bag of negative ones. If a word matches the positive bag, we add 1. If it matches the negative bag, we subtract 1. Otherwise, we move on to the next word without changing the score.
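The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the article's actual code: to keep it runnable without downloading any corpus, a regular expression stands in for NLTK's tokenizer, and the stop-word set and both word bags are tiny hand-written assumptions.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word set standing in for NLTK's stopwords corpus.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and", "is", "are", "at"}

# Assumption: toy bags of words; a real analysis would use much larger lexicons.
POSITIVE_WORDS = {"amazing", "great", "love", "proud", "win"}
NEGATIVE_WORDS = {"bad", "boycott", "sad", "terrible"}

def tokenize(tweet):
    """Step 1: lowercase the tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", tweet.lower())

def remove_stop_words(tokens):
    """Step 2: drop articles, prepositions and other stop words."""
    return [token for token in tokens if token not in STOP_WORDS]

def sentiment_score(tokens):
    """Step 4: add 1 per positive word, subtract 1 per negative word."""
    score = 0
    for token in tokens:
        if token in POSITIVE_WORDS:
            score += 1
        elif token in NEGATIVE_WORDS:
            score -= 1
    return score

tweets = [
    "What an amazing win at the Olympics",
    "The boycott of the Olympics is sad",
]

word_counts = Counter()  # Step 3: ranking of repeated words
scores = []
for tweet in tweets:
    tokens = remove_stop_words(tokenize(tweet))
    word_counts.update(tokens)
    scores.append(sentiment_score(tokens))
```

Calling `word_counts.most_common(10)` then gives the word ranking, and each entry in `scores` is the raw sentiment of the corresponding tweet.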

Note: This is just a rough analysis. Keep in mind that the counts are biased for two main reasons:

Bots (you need to measure some key items to classify a Twitter account as a bot or not). For example, in terms of Big Data principles, one of the best ways to validate the information you have (so as not to produce fake news) is to validate its author. Nevertheless, some bots can still slip through as false negatives.
Interpretation (irony is not fully addressed). For example, 'Thanks for making me lose 2 hours of my day' is a negative phrase, but the sentiment analysis will treat it as neutral because it contains one positive word and one negative one.

Note 2: all times in the analysis are in GMT+1 (Central European Time).
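The irony problem is easy to reproduce. In the sketch below, the two word bags are deliberately minimal assumptions ('thanks' as positive, 'lose' as negative), just enough to show how the ironic phrase cancels out to a neutral score.

```python
# Assumption: one-word bags, chosen only to reproduce the example above.
POSITIVE_WORDS = {"thanks"}
NEGATIVE_WORDS = {"lose"}

def bag_of_words_score(text):
    """Add 1 for each positive word and subtract 1 for each negative word."""
    words = text.lower().split()
    return sum((word in POSITIVE_WORDS) - (word in NEGATIVE_WORDS) for word in words)

ironic_tweet = "Thanks for making me lose 2 hours of my day"
score = bag_of_words_score(ironic_tweet)  # one positive and one negative word cancel out
```

A word-by-word count cannot see that the 'thanks' here is sarcastic, which is why the phrase ends up classified as neutral.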
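As a sketch of what working in GMT+1 can look like in code, the snippet below converts a timestamp to that offset. It assumes the tweet timestamps arrive as ISO 8601 strings in UTC, which is an assumption for illustration and not something stated about the scraped data.

```python
from datetime import datetime, timedelta, timezone

# Fixed GMT+1 offset, as used for the analysis.
GMT_PLUS_1 = timezone(timedelta(hours=1), name="GMT+1")

def to_gmt_plus_1(created_at_utc):
    """Parse a UTC timestamp such as '2022-02-06T08:30:00Z' and convert it to GMT+1."""
    utc_time = datetime.fromisoformat(created_at_utc.replace("Z", "+00:00"))
    return utc_time.astimezone(GMT_PLUS_1)

local_time = to_gmt_plus_1("2022-02-06T08:30:00Z")  # 09:30 in GMT+1
```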

Analyzing Olympic Tweets

Let's start.

After cleaning the 30,000 tweets, we got these value counts.

This alone is not enough to reach any conclusion, so we run a sentiment analysis on each tweet and label it positive, neutral, or negative.
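Turning a numeric score into one of the three labels can be sketched as follows. The thresholds (above zero is positive, below zero is negative, exactly zero is neutral) are an assumption consistent with the add-and-subtract scoring described earlier, and the scores are made-up values for illustration.

```python
from collections import Counter

def label(score):
    """Map a bag-of-words score to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical per-tweet scores, not real results.
example_scores = [2, 0, -1, 3, 0]

labels = [label(score) for score in example_scores]
distribution = Counter(labels)  # how many tweets fall under each label
```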


We find that from 9:30 to 18:55 there is no negative peak. In fact, sentiment is positive overall, at least for the period we are analyzing.

Let's narrow the time range and analyze only the tweets from 10:00 to 11:00. It is very interesting to see the peaks and the clustering.

Cross-referencing this with the schedule, we can see that at 9:30 the Speed Skating Finals were happening, while at 9:40 the Ice Hockey game between JPN and CHN started. This could be one reason why we have so many tweets between 10:00 and 10:20. Data with context becomes information... and that is what we are looking for.
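Spotting peaks like this comes down to counting tweets per time bucket. Here is a minimal sketch, assuming 10-minute buckets and a handful of invented timestamps (the real analysis used the scraped tweets, and the bucket size is only an illustrative choice).

```python
from collections import Counter
from datetime import datetime

def bucket_start(timestamp, minutes=10):
    """Round a timestamp down to the start of its bucket."""
    floored_minute = timestamp.minute - timestamp.minute % minutes
    return timestamp.replace(minute=floored_minute, second=0, microsecond=0)

# Hypothetical tweet times, already in GMT+1.
tweet_times = [
    datetime(2022, 2, 6, 10, 3),
    datetime(2022, 2, 6, 10, 7),
    datetime(2022, 2, 6, 10, 15),
    datetime(2022, 2, 6, 10, 42),
    datetime(2022, 2, 6, 11, 5),
]

# Keep only the 10:00 to 11:00 window, then count tweets per bucket.
window_start = datetime(2022, 2, 6, 10, 0)
window_end = datetime(2022, 2, 6, 11, 0)
in_window = [t for t in tweet_times if window_start <= t < window_end]
volume = Counter(bucket_start(t) for t in in_window)
```

Buckets with unusually high counts are the candidates to cross-check against the event schedule.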

Positive or Negative?

Now... the tweets in general, were they positive or negative?

It's very interesting to see how positivity beats negativity. The share of neutral tweets is understandable, as much of the news is objective and therefore neutral.

Word Clouds

Finally, one of the most striking tools used in marketing is the word cloud. Which words do we find (along with 'olympics') in the positive tweets?

And, what are the negative ones?

Fascinating, right?

Thanks for reading!

Sources: https://stillmed.olympics.com/media/Documents/International-Olympic-Committee/IOC-Marketing-And-Broadcasting/Tokyo-2020-External-Communications.pdf

---------------------------------------------------------------------------------------------------------------

Author: Ignacio Ariznabarreta - JIAF Consulting

