Barbenheimer - Analyzing the scripts with Gen AI, NLP and Python
Justine Goode ; NBC News / Universal / Warner Brothers

In this analysis I explore the scripts for two 2023 movies, Barbie and Oppenheimer. Using Python, I parse the PDF of each script, isolating elements such as dialogue lines, character names, action descriptions, and scene headings. I then use Python, NLP (Natural Language Processing), and various large language models to analyze the contents of the scripts. The objective of this study is not to derive definitive conclusions about these scripts, but rather to examine various techniques for analyzing unstructured text.

Using Python, we can read the PDFs for these scripts as plain text and identify the different screenplay elements by taking advantage of standard screenplay formatting. For example, if a line contains the substring “INT.” or “EXT.”, we can tag it as a scene heading. If a line is in all caps, we can tag it as a character name. I’ve built a very basic screenplay parser here, which produces the table on which the rest of this analysis is based. Below is a preview of the first few rows of the table, filtered to isolate lines of dialogue.
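The tagging rules described above can be sketched roughly as follows. This is a minimal illustration, not the parser used for this analysis: the function names `tag_line` and `parse_screenplay`, the row layout, and the rule that a blank line ends a block of dialogue are all assumptions made for the example.

```python
import re

# Scene headings contain "INT." or "EXT." (the rule described in the text).
SCENE_HEADING = re.compile(r"\b(INT\.|EXT\.)")
# Parentheticals such as (CONT'D) or (V.O.) that can follow a character name.
PARENTHETICAL = re.compile(r"\(.*?\)")


def tag_line(line: str) -> str:
    """Tag one line of plain text as blank, scene_heading, character, or other."""
    text = line.strip()
    if not text:
        return "blank"
    if SCENE_HEADING.search(text):
        return "scene_heading"
    name = PARENTHETICAL.sub("", text).strip()
    # An all-caps line with at least one letter is treated as a character name.
    if name and name == name.upper() and any(c.isalpha() for c in name):
        return "character"
    return "other"


def parse_screenplay(raw_text: str) -> list[dict]:
    """Turn screenplay text into rows of {scene, type, speaker, text}."""
    rows = []
    scene = 0
    speaker = None
    for line in raw_text.splitlines():
        text = line.strip()
        tag = tag_line(line)
        if tag == "blank":
            speaker = None  # assumption: a blank line ends the current speech
            continue
        if tag == "scene_heading":
            scene += 1
            speaker = None
        elif tag == "character":
            speaker = PARENTHETICAL.sub("", text).strip()
        elif speaker is not None:
            tag = "dialogue"  # non-caps text directly under a character name
        else:
            tag = "action"
        rows.append({
            "scene": scene,
            "type": tag,
            "speaker": speaker if tag in ("character", "dialogue") else None,
            "text": text,
        })
    return rows
```

Rules this simple will mislabel some lines, for example a line of dialogue shouted entirely in capitals, so a real script usually needs some manual cleanup afterwards.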

Quick Look at the Data:

Barbie: 94 characters speak 1,005 total lines of dialogue consisting of 10,868 total words and 3,256 unique words. Barbie Margot has 422 lines, 41.99% of all dialogue lines, and speaks in 51 different scenes.

Oppenheimer: 77 characters speak 1,803 total lines of dialogue consisting of 19,092 total words and 5,051 unique words. Oppenheimer has 590 lines, 32.72% of all dialogue lines, and speaks in 167 different scenes.
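Summary figures like these can be computed from a list of (speaker, text) pairs. The sketch below is only an illustration: the function name `dialogue_stats` and the word tokenization (lowercased runs of letters, digits, and apostrophes) are assumptions, and a different tokenizer would give somewhat different word counts.

```python
import re
from collections import Counter

# Assumed tokenization: runs of letters, digits, and straight or curly apostrophes.
WORD = re.compile(r"[A-Za-z0-9'’]+")


def dialogue_stats(dialogue: list[tuple[str, str]]) -> dict:
    """Summarize dialogue given as (speaker, text) pairs."""
    words = [w for _, text in dialogue for w in WORD.findall(text.lower())]
    lines_per_speaker = Counter(speaker for speaker, _ in dialogue)
    top_speaker, top_lines = lines_per_speaker.most_common(1)[0]
    return {
        "speakers": len(lines_per_speaker),
        "lines": len(dialogue),
        "words": len(words),
        "unique_words": len(set(words)),
        "top_speaker": top_speaker,
        "top_speaker_lines": top_lines,
        # Share of all dialogue lines spoken by the most frequent speaker.
        "top_speaker_share": round(100 * top_lines / len(dialogue), 2),
    }
```

The percentages quoted above follow the same arithmetic: 422 of 1,005 lines is 41.99%, and 590 of 1,803 lines is 32.72%.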

As seen in the charts below, both movies feature individual lead performances that dominate screen-time in a manner fitting their titular roles.

Much of the conversation around Barbenheimer last summer involved the perceived gender split in the audiences for these movies. Barbie, directed by Greta Gerwig, was said to be attracting a largely young female audience, while Oppenheimer, directed by Christopher Nolan, was said to be attracting an older male audience. This split is certainly reflected in the characters in the scripts: Oppenheimer features speaking parts for 4 female characters out of 77, while in Barbie female characters account for the majority of both speaking characters and lines of dialogue.

Using Generative AI and NLP

Although Oppenheimer certainly does not compare favorably to Barbie in terms of female representation, how does it compare when we analyze the actual text? By using a zero-shot classification model such as Facebook’s bart-large-mnli, we can classify each line of dialogue as "Feminine" or "Masculine", getting a score for each category.
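A rough sketch of this kind of zero-shot scoring, using the Hugging Face `transformers` pipeline with the `facebook/bart-large-mnli` checkpoint, might look like the following. The helper names (`build_zero_shot_classifier`, `score_line`, `top_lines`) are hypothetical, and the exact settings used for this analysis are not stated, so treat this as one plausible setup rather than the original code.

```python
LABELS = ["Feminine", "Masculine"]


def build_zero_shot_classifier():
    """Load the zero-shot pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; requires `transformers`
    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


def score_line(text, classifier, labels=LABELS):
    """Return {label: score} for a single line of dialogue."""
    result = classifier(text, candidate_labels=list(labels))
    # The pipeline returns parallel "labels" and "scores" lists, sorted by score.
    return dict(zip(result["labels"], result["scores"]))


def top_lines(lines, classifier, label, n=3):
    """Return the n (score, line) pairs that score highest for `label`."""
    scored = [(score_line(text, classifier)[label], text) for text in lines]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

With the real model this would be called as `top_lines(dialogue_lines, build_zero_shot_classifier(), "Masculine")`, where `dialogue_lines` is a list of strings. Because the model only sees the two label words, it tends to reward lines that literally mention men, boys, or female names.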

Is Oppenheimer actually more "Feminine" than Barbie? No. It is not.

Below are the most "Masculine" and "Feminine" lines of dialogue from each movie, according to the zero-shot model.

Barbie - Top 3 Most Masculine Lines:

  1. "It’s boy’s night!" - All the Kens - Scene 54
  2. "Men rule the world!" - Ken Ryan Gosling - Scene 26
  3. "It sure has! And please call me Mr. Ken President Prime Minister Man." - Ken Scott - Scene 52

Barbie - Top 3 Most Feminine Lines:

  1. "Follow that Barbie!" - Mattel CEO - Scene 43
  2. "And Barbie Video Girl!" - Gloria - Scene 59
  3. "Get that Barbie!" - Mattel CEO - Scene 40

Oppenheimer - Top 3 Most Masculine Lines:

  1. "Robert. The man of the moment." - Einstein - Scene 242
  2. "This time there was another man." - Oppenheimer - Scene 115
  3. "Pash? You met Colonel Pash?" - Groves - Scene 116

Oppenheimer - Top 3 Most Feminine Lines:

  1. "It’s Kitty." - Garrison - Scene 238
  2. "Kitty?" - Oppenheimer - Scene 52
  3. "(over phone) Kitty? Kitty?" - Charlotte - Scene 171

How do the characters Barbie and Oppenheimer compare?

Who is smarter? Barbie or Oppenheimer?

Using the TextBlob Python package, we can calculate the reading level of Barbie's and Oppenheimer's dialogue using the Flesch-Kincaid Grade Level and SMOG Index, and gauge the complexity of their language with the Dale-Chall Readability Formula. As seen below, Oppenheimer has a slightly higher score in all three metrics.
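To make one of these metrics concrete, the Flesch-Kincaid Grade Level is defined as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below computes it by hand rather than through a package; the syllable counter is a crude vowel-group heuristic chosen for illustration, so its scores will differ slightly from those of a dedicated library and from the figures in this article.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables as vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Heuristic: a final consonant + 'e' is usually silent ("made"), except "-le".
    if re.search(r"[^aeiouy]e$", word) and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level using the standard published coefficients."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Joining all of a character's dialogue into one string and passing it to `flesch_kincaid_grade` gives a single grade-level estimate per character. Short, exclamatory lines pull the score down, while long sentences and polysyllabic words push it up.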

Oppenheimer is smarter.

Who is Happier? Barbie or Oppenheimer?

We can look at emotion with the roberta-base model, trained on the go_emotions dataset for multi-label classification, available through Hugging Face. The chart below shows the percentage of each character's total dialogue where the primary emotion is related to happiness ("joy", "happiness", "amusement", "approval").
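The percentage described above can be sketched as follows. The model identifier `SamLowe/roberta-base-go_emotions` is an assumption (the article names only a roberta-base model trained on go_emotions), and the helper names are hypothetical.

```python
HAPPY = {"joy", "happiness", "amusement", "approval"}


def build_emotion_classifier():
    """Load an emotion classifier (assumed checkpoint; downloads on first use)."""
    from transformers import pipeline  # imported lazily; requires `transformers`
    return pipeline("text-classification", model="SamLowe/roberta-base-go_emotions")


def primary_emotions(lines, classifier):
    """Return the top-scoring emotion label for each line of dialogue."""
    # By default the pipeline returns one {"label", "score"} dict per input.
    return [result["label"] for result in classifier(list(lines))]


def share_with_emotion(labels, targets):
    """Percentage of lines whose primary emotion is in `targets`."""
    if not labels:
        return 0.0
    hits = sum(1 for label in labels if label in targets)
    return 100.0 * hits / len(labels)
```

Calling `share_with_emotion(primary_emotions(lines, clf), HAPPY)` on one character's lines gives that character's happiness percentage. Passing a different set of labels reuses the same calculation for any other group of emotions.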

Barbie is happier than Oppenheimer.

Who is having a bigger existential crisis? Barbie or Oppenheimer?

Using the same method as above, we can isolate lines of dialogue where the primary emotion is related to existential thoughts ("confusion", "questioning", "curiosity").

Barbie is having a bigger existential crisis than Oppenheimer.

Who is the more complex character?

By dividing each movie into tenths, we can look at the number of primary emotions each character is expressing through their dialogue throughout their respective journeys. In the beginning of Oppenheimer, the character of Oppenheimer experiences a spike in the number of unique emotions he expresses, whereas Barbie experiences a spike at the end of her movie.
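One way to do this split is sketched below. It assumes each character's lines are in script order and that the tenths are equal-sized slices of that character's dialogue; splitting by scene number or page would be an equally reasonable reading. The function name `unique_emotions_by_tenth` is hypothetical.

```python
def unique_emotions_by_tenth(labels, bins=10):
    """Count distinct primary emotions in each of `bins` equal slices.

    `labels` holds one primary-emotion label per dialogue line, in script order.
    """
    n = len(labels)
    seen = [set() for _ in range(bins)]
    for i, label in enumerate(labels):
        # Integer arithmetic maps line index 0..n-1 onto bin 0..bins-1.
        seen[i * bins // n].add(label)
    return [len(bucket) for bucket in seen]
```

For example, `unique_emotions_by_tenth(["joy", "joy", "fear", "joy"], bins=2)` returns `[1, 2]`: one distinct emotion in the first half and two in the second.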

Overall, Oppenheimer expresses 162 total emotions while Barbie expresses 114.

Oppenheimer is the more complex character.
