Game of Thrones Book Analysis
Obinna Nweke
Growth and Analytics | Marketing & Technology | Data-Driven Insights for Business Growth
A friend of mine who has read the Game of Thrones books often talks about how different they are from the TV series. I have never read the books and don't plan to, but I thought it would be fun to analyze them.
So, welcome to this project. It should be fun!
In this analysis, we’ll discover the top characters mentioned in the books, their sentiment, and their longevity. We’ll use various data processing and analysis techniques to uncover interesting patterns and trends within the text.
Winter is coming soon; let’s dive in.
Data Acquisition and Preparation
We downloaded the “Game of Thrones Boxed” book series in EPUB format from pdfdrive.com and converted the EPUB file to a plain text (TXT) file to make it easier to analyze. We used an external converter, but the Python library EbookLib can do the same job.
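For readers who would rather skip the external converter, here is a minimal sketch that uses only Python's standard library instead of EbookLib. It relies on the fact that an EPUB is a ZIP archive of XHTML files. The function name `epub_to_text` is hypothetical, and reading chapters in file-name order is an assumption: a faithful converter would follow the reading order declared in the book's spine.

```python
import zipfile
from html.parser import HTMLParser

# Tags after which a line break is inserted so paragraphs stay separate.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li"}


class _TextExtractor(HTMLParser):
    """Collects visible text from one XHTML chapter, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def epub_to_text(source):
    """Concatenate the text of every (X)HTML file in an EPUB (path or file object)."""
    chunks = []
    with zipfile.ZipFile(source) as archive:
        # Assumption: file-name order approximates reading order.
        for name in sorted(archive.namelist()):
            if name.endswith((".xhtml", ".html", ".htm")):
                parser = _TextExtractor()
                parser.feed(archive.read(name).decode("utf-8", errors="ignore"))
                parser.close()
                chunks.append("".join(parser.parts))
    return "\n".join(chunks)
```

Calling `epub_to_text("got_boxed.epub")` (a hypothetical file name) returns one string that can be written to a .txt file for the analysis that follows.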
Our main goals for this project are:
1. Identify the most frequently mentioned characters in the book series and compare them with the top characters in the TV series.
2. Analyze the sentiment of the text to understand its emotional tone.
3. Perform a survival analysis to estimate the “lifespan” of characters based on their mentions.
Word Frequency Analysis
To identify the most frequently mentioned characters, we performed a word frequency analysis. We counted capitalized words, which are likely to be character names. Initially, we compiled a list of frequently occurring capitalized words and then refined this list by excluding common sentence-starting words such as “The,” “He,” “She,” etc.
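A minimal sketch of this capitalized-word count, using Python's `re` module and `collections.Counter`. The exclusion set below is illustrative only, since the full list of excluded words isn't given, and `top_capitalized` is a hypothetical helper name.

```python
import re
from collections import Counter

# Illustrative exclusion list of common capitalized, non-name words.
EXCLUDED = {
    "The", "He", "She", "It", "They", "You", "We", "His", "Her",
    "A", "An", "But", "And", "In", "What", "When", "There", "That",
}


def top_capitalized(text, n=10, excluded=EXCLUDED):
    """Return the n most common capitalized words, a rough proxy for names."""
    # A capital letter followed by lowercase letters, as a whole word.
    words = re.findall(r"\b[A-Z][a-z]+\b", text)
    counts = Counter(w for w in words if w not in excluded)
    return counts.most_common(n)
```

Because the pattern matches single words, a multi-word name such as "Jon Snow" is counted as "Jon" and "Snow" separately, and a nickname like Dany is counted apart from Daenerys.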
1. Top 10 Most Mentioned Characters
We then ranked the remaining words by number of mentions to find the top 10 characters, which highlights the central figures of the books' narrative.
Here's a poster of the series adaptation of the book. Do you see faces from our list?
From this, we see the similarity between the top characters in the books and in the TV series. For context, Dany, the 8th name on our list, is Daenerys Targaryen, the Mother of Dragons and Breaker of Chains. I enjoyed this character :)
2. Sentiment Analysis
Next, we analyzed the sentiment of the text using the VADER sentiment analyzer from the NLTK library. VADER reports the proportion of positive, neutral, and negative content in a piece of text along with a single compound score, and we used the compound score to categorize the results.
The text was preprocessed to remove stopwords and non-alphabetic tokens and then lemmatized. Sentiment polarity scores were calculated for each line, and the results were categorized based on the compound score.
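To show what the scoring and categorization steps do, here is a self-contained sketch that mimics them without NLTK. The tiny `LEXICON` and `STOPWORDS` sets are hypothetical stand-ins for VADER's lexicon and NLTK's stopword list, and lemmatization is skipped. The normalization `x / sqrt(x^2 + 15)` and the ±0.05 cutoffs follow VADER's published conventions; with NLTK installed, the real score comes from `SentimentIntensityAnalyzer().polarity_scores(line)["compound"]`.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins: NLTK's stopword list and VADER's lexicon are far larger.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "was", "he", "she", "it", "in"}
LEXICON = {"love": 3.2, "happy": 2.7, "kill": -3.7, "dead": -3.3, "fear": -2.2}


def preprocess(line, stopwords=STOPWORDS):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", line.lower())
    return [t for t in tokens if t not in stopwords]


def compound(tokens, lexicon=LEXICON, alpha=15.0):
    """Sum token valences and squash into [-1, 1] as VADER normalizes its compound score."""
    total = sum(lexicon.get(t, 0.0) for t in tokens)
    return total / math.sqrt(total * total + alpha)


def categorize(score, threshold=0.05):
    """Label a compound score using the commonly cited +/-0.05 cutoffs."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def sentiment_distribution(lines):
    """Count how many lines fall into each sentiment category."""
    return Counter(categorize(compound(preprocess(line))) for line in lines)
```

One design caveat: VADER is built to read raw sentences and uses capitalization, punctuation, and negation words such as "not" (which is in NLTK's English stopword list) as cues. Lowercasing and removing stopwords before scoring discards some of those cues, so comparing against scores on the unprocessed lines is worthwhile.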
Here’s a plot showing the sentiment distribution throughout the book:
Character Sentiment Analysis
Going a level deeper, we ran sentiment analysis on the dialogue of the top characters and calculated the average sentiment score for each character. This revealed differences in how characters come across emotionally, based on their roles and experiences in the story.
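Once each character's dialogue lines have been collected, the per-character average is a short aggregation. This sketch takes the scorer as a parameter, so it works with VADER's compound score or any other. The names `average_sentiment` and `dialogues` are hypothetical, and skipping characters with no dialogue (instead of scoring them neutral) is a design choice, not necessarily what the original analysis did.

```python
from statistics import mean


def average_sentiment(dialogues, score_fn):
    """Average a per-line sentiment score over each character's dialogue.

    dialogues: mapping of character name -> list of dialogue lines.
    score_fn:  callable returning a compound-style score in [-1, 1], e.g.
               lambda s: analyzer.polarity_scores(s)["compound"] with VADER.
    Characters with no dialogue are left out rather than scored as neutral.
    """
    return {
        name: mean(score_fn(line) for line in lines)
        for name, lines in dialogues.items()
        if lines
    }
```

Sorting the result with `sorted(result.items(), key=lambda kv: kv[1])` orders the characters from most negative to most positive, which is a convenient order for a bar plot.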
The bar plot below shows the average sentiment score of each character's dialogue. Bars above zero indicate characters whose dialogue is positive on average, bars below zero indicate negative dialogue, and characters closest to zero are neutral.
3. Character Lifespan and Survival Analysis
To understand how often characters appear in the text and their prominence, we used regular expressions to extract the number of mentions (lifespan) and dialogues for each character. Then, we performed a survival analysis using the Kaplan-Meier estimator to gain insights into the prominence and longevity of characters within the narrative.
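The exact regular expressions used aren't given, so the following is a hypothetical sketch. A mention is a whole-word match of a name, and a line of dialogue is any quoted passage (straight or curly quotes) directly followed by "Name said" or "said Name". Real attribution in the books is messier, since many lines are attributed only by a pronoun or by context.

```python
import re


def count_mentions(text, name):
    """Number of whole-word occurrences of a character name."""
    return len(re.findall(rf"\b{re.escape(name)}\b", text))


def extract_dialogue(text, name):
    """Quoted passages attributed as '"...," Name said' or '"...," said Name'.

    A deliberately simple heuristic: dialogue attributed by pronouns
    ("she said") or by context alone is missed.
    """
    n = re.escape(name)
    verb = r"(?:said|asked)"
    pattern = (
        r'[“"]([^”"]+)[”"]\s*'               # a quoted passage (captured)
        rf"(?:{n}\s+{verb}|{verb}\s+{n})\b"  # followed by "Name said" or "said Name"
    )
    return re.findall(pattern, text)
```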
The Kaplan-Meier plot shows, for each number of mentions, the estimated probability that a character "survives" past it, that is, is mentioned more than that many times. Characters who remain on the curve at high mention counts are mentioned frequently throughout the series, indicating their importance and continued presence in the narrative.
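The Kaplan-Meier estimator multiplies together, at each observed time t_i, the fraction of at-risk subjects that survive past it: S(t) = product over t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number still at risk. Here "time" is a character's mention count. This standard-library sketch is only illustrative; in practice a library such as lifelines (`KaplanMeierFitter`) does the same job. Treating every character as having an observed "event" at their final mention is an assumption unless character deaths are tracked separately.

```python
from collections import Counter


def kaplan_meier(durations, events=None):
    """Kaplan-Meier survival curve as a list of (time, survival probability).

    durations: per-character "lifetime", here the number of mentions.
    events:    1 if the end of the lifetime was observed, 0 if censored.
               Defaults to all observed (an assumption).
    """
    if events is None:
        events = [1] * len(durations)
    deaths = Counter(d for d, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaves the risk set at their duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if deaths[t]:
            # Events at t are counted before censored subjects leave the risk set.
            survival *= 1 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve
```

With no censoring, S(t) reduces to the fraction of characters mentioned more than t times, which is how the Kaplan-Meier plot should be read.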
Overall Insights
Conclusion
This analysis of the “Game of Thrones” book series shows that, by combining text processing and statistical analysis, we can understand the narrative structure and emotional tone of qualitative data and uncover hidden trends and patterns.
Obinna Nweke, 2024.