Barbenheimer - Analyzing the scripts with Gen AI, NLP and Python
In this analysis I will explore the scripts for two 2023 movies, Barbie and Oppenheimer. Using Python, I will parse the PDF of each script, isolating elements such as dialogue lines, character names, action descriptions, and scene headings. Following this, I will use Python, NLP (Natural Language Processing
Using Python, we can read the PDFs of these scripts as plain text and identify the different screenplay elements by taking advantage of standard screenplay formatting. For example, if a line contains the substring “INT.” or “EXT.”, we can tag it as a scene heading. If a line is in all caps, we can tag it as a character name. I’ve built a very basic screenplay parser
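The tagging rules described above can be sketched in a few lines. This is a minimal illustration, not the parser used for this analysis: it assumes the PDF has already been converted to a list of plain-text lines, and the names `tag_line` and `parse_dialogue` are hypothetical. The all-caps rule is a heuristic and will occasionally mistake shouted dialogue or transitions for a character name.

```python
import re

def tag_line(line: str) -> str:
    """Tag one line of script text using simple formatting heuristics."""
    text = line.strip()
    if not text:
        return "blank"
    if "INT." in text or "EXT." in text:
        return "scene_heading"
    # Ignore parentheticals such as "(O.S.)" before testing for all caps.
    letters = [c for c in re.sub(r"\(.*?\)", "", text) if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        return "character"
    return "other"

def parse_dialogue(lines):
    """Return (scene_number, character, text) tuples, one per speech."""
    dialogue = []
    scene, speaker, spoken = 0, None, []
    for line in list(lines) + [""]:  # trailing blank flushes the last speech
        tag = tag_line(line)
        if tag == "other" and speaker is not None:
            spoken.append(line.strip())
            continue
        # Any other kind of line ends the current speech.
        if speaker is not None and spoken:
            dialogue.append((scene, speaker, " ".join(spoken)))
        speaker, spoken = None, []
        if tag == "scene_heading":
            scene += 1
        elif tag == "character":
            speaker = re.sub(r"\(.*?\)", "", line).strip()
    return dialogue
```

Lines tagged "other" that do not follow a character name (action descriptions) are skipped here; a fuller parser would keep them as their own element.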
Quick Look at the Data:
Barbie: 94 characters speak 1,005 total lines of dialogue consisting of 10,868 total words and 3,256 unique words. Barbie Margot has 422 lines, 41.99% of all dialogue lines, and speaks in 51 different scenes.
Oppenheimer: 77 characters speak 1,803 total lines of dialogue consisting of 19,092 total words and 5,051 unique words. Oppenheimer has 590 lines, 32.72% of all dialogue lines, and speaks in 167 different scenes.
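Summary figures like these can be computed from the parsed dialogue with a few counters. The sketch below assumes each speech is stored as a `(scene_number, character, text)` tuple and splits words with a simple regular expression; both the data layout and the function name `dialogue_stats` are assumptions for illustration, so exact counts will depend on how the real parser tokenizes.

```python
import re
from collections import Counter

def dialogue_stats(dialogue):
    """Summarize a list of (scene_number, character, text) tuples."""
    words = [
        w
        for _, _, text in dialogue
        for w in re.findall(r"[a-z0-9']+", text.lower())
    ]
    lines_per_character = Counter(character for _, character, _ in dialogue)
    lead, lead_lines = lines_per_character.most_common(1)[0]
    lead_scenes = {scene for scene, character, _ in dialogue if character == lead}
    return {
        "characters": len(lines_per_character),
        "lines": len(dialogue),
        "words": len(words),
        "unique_words": len(set(words)),
        "lead": lead,
        "lead_lines": lead_lines,
        "lead_share_pct": round(100 * lead_lines / len(dialogue), 2),
        "lead_scenes": len(lead_scenes),
    }
```

The share figures quoted above follow the same arithmetic: 422 of 1,005 lines is 41.99%, and 590 of 1,803 lines is 32.72%.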
As seen in the charts below, both movies feature a single lead performance that dominates the dialogue in a manner fitting the titular role.
Much of the conversation around Barbenheimer last summer involved the perceived gender split in the audiences for these movies. Barbie, directed by Greta Gerwig, was said to be attracting a largely young female audience while Oppenheimer, directed by Christopher Nolan, was said to be attracting an older male audience. This split is certainly reflected when looking at the characters in the scripts: Oppenheimer features speaking parts for 4 female characters out of 77, while the majority of speaking characters and lines of dialogue in Barbie belong to female characters.
Using Generative AI and NLP
Although Oppenheimer certainly does not compare favorably to Barbie in terms of female representation, how does it compare when we analyze the actual text? By using a zero-shot model such as Facebook’s bart-large-mnli, we can classify each line of dialogue as "Feminine" or "Masculine", getting a score for each category.
Is Oppenheimer actually more "Feminine" than Barbie? No. It is not.
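For readers who want to try this, here is a rough sketch of zero-shot scoring with the Hugging Face `transformers` pipeline and the `facebook/bart-large-mnli` checkpoint. The helper names and the choice to compare movies by their mean "Feminine" score are assumptions for illustration, not necessarily how the charts in this article were produced.

```python
LABELS = ["Feminine", "Masculine"]

def load_zero_shot():
    """Load the zero-shot pipeline (downloads the model on first use)."""
    # Imported lazily; requires the `transformers` package and a backend such as torch.
    from transformers import pipeline
    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def score_lines(lines, classifier):
    """Return one {label: score} dict per line of dialogue."""
    scored = []
    for line in lines:
        # For a single string the pipeline returns a dict with
        # "sequence", "labels" and "scores" (labels sorted by score).
        result = classifier(line, candidate_labels=LABELS)
        scored.append(dict(zip(result["labels"], result["scores"])))
    return scored

def mean_score(scored, label):
    """Average score for one label across all scored lines (an assumed summary)."""
    return sum(s[label] for s in scored) / len(scored)
```

Typical use would be `scored = score_lines(dialogue_lines, load_zero_shot())`, where `dialogue_lines` is a list of strings. With the pipeline's default single-label setting, the two scores for each line sum to 1, so sorting by either score gives the most "Masculine" and most "Feminine" lines.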
Below are the most "Masculine" and "Feminine" lines of dialogue from each movie, according to the zero-shot model.
Barbie - Top 3 Most Masculine Lines:
Barbie - Top 3 Most Feminine Lines:
Oppenheimer - Top 3 Most Masculine Lines:
Oppenheimer - Top 3 Most Feminine Lines:
How do the characters Barbie and Oppenheimer compare?
Who is smarter? Barbie or Oppenheimer?
Using the TextBlob Python package, we can calculate the reading level of Barbie's and Oppenheimer's dialogue using the Flesch-Kincaid Grade Level and SMOG Index, and gauge the complexity of their language with the Dale-Chall Readability Formula. As seen below, Oppenheimer has a slightly higher score in all three metrics.
Oppenheimer is smarter.
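To make one of these metrics concrete, here is a sketch of the Flesch-Kincaid Grade Level computed directly from its published formula rather than through a package: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter is a crude vowel-group heuristic written for this example, so scores will differ somewhat from those of a dedicated library.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "time" -> 1, but keep "apple" (ends in "le") and "barbie" (vowel before e).
    if re.search(r"[^aeiouy]e$", word) and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )
```

Passing all of a character's dialogue joined into one string gives a single grade for that character. Short, simple sentences can score below zero, which is expected for this formula.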
Who is Happier? Barbie or Oppenheimer?
We can look at emotion with the roberta-base model, trained on the go_emotions dataset for multi-label classification, available through Hugging Face. The chart below shows the % of each character's total dialogue where the primary emotion is related to happiness ("joy", "happiness", "amusement", "approval").
Barbie is happier than Oppenheimer.
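The percentage calculation can be sketched as follows. The model id `SamLowe/roberta-base-go_emotions` is an assumption (the text above only names a roberta-base model trained on go_emotions), and the helper names are hypothetical. "Primary emotion" is taken here to mean the single top-scoring label for each line.

```python
HAPPY = {"joy", "happiness", "amusement", "approval"}

def load_emotion_classifier():
    """Load an emotion classifier (downloads the model on first use)."""
    # Imported lazily; requires the `transformers` package and a backend such as torch.
    from transformers import pipeline
    # Assumed checkpoint; swap in whichever go_emotions model you prefer.
    return pipeline("text-classification", model="SamLowe/roberta-base-go_emotions")

def primary_emotions(lines, classifier):
    """Top-scoring emotion label for each line of dialogue."""
    # For a single string the pipeline returns a list holding one
    # {"label": ..., "score": ...} dict for the top label.
    return [classifier(line)[0]["label"] for line in lines]

def share_with_emotions(labels, target):
    """Percentage of lines whose primary emotion is in `target`."""
    return 100 * sum(label in target for label in labels) / len(labels)
```

For example, `share_with_emotions(primary_emotions(barbie_lines, clf), HAPPY)` gives the happiness share for one character, where `barbie_lines` is a list of that character's dialogue strings. Passing a different label set, such as `{"confusion", "questioning", "curiosity"}`, reuses the same function for other emotion groups.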
Who is having a bigger existential crisis? Barbie or Oppenheimer?
Using the same method as above, we can isolate lines of dialogue where the primary emotion is related to existential thoughts ("confusion", "questioning", "curiosity").
Barbie is having a bigger existential crisis than Oppenheimer.
Who is the more complex character?
By dividing each movie into tenths, we can look at the number of primary emotions each character is expressing through their dialogue throughout their respective journeys. In the beginning of Oppenheimer, the character of Oppenheimer experiences a spike in the number of unique emotions he expresses, whereas Barbie experiences a spike at the end of her movie.
Overall, Oppenheimer expresses 162 total emotions while Barbie expresses 114.
Oppenheimer is the more complex character.
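One way to reproduce the tenths breakdown is sketched below. It assumes the input is the primary emotion label of each of a character's lines in script order, and that the movie is divided by line position rather than by page or runtime; the function name is hypothetical. Summing the per-tenth counts is one plausible reading of the "total emotions" figure, not a confirmed description of how it was calculated.

```python
def unique_emotions_by_tenth(labels, bins=10):
    """Count distinct primary emotions in each tenth of a character's lines."""
    n = len(labels)
    buckets = [set() for _ in range(bins)]
    for i, label in enumerate(labels):
        # Integer arithmetic maps line i of n to a bucket index in 0..bins-1.
        buckets[i * bins // n].add(label)
    return [len(bucket) for bucket in buckets]
```

Calling `sum(unique_emotions_by_tenth(labels))` then gives a single total per character under that reading, and plotting the returned list shows where the spikes fall.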