Poor Things - Analyzing the Screenplay with Python, LLMs and NLP

Poor Things - Analyzing the Screenplay with Python, LLMs and NLP

In this analysis I will explore the script of Yorgos Lanthimos’ movie ‘Poor Things,’ which is publicly available here. Using Python, I will parse the PDF of the script, isolating elements such as dialogue lines, character names, action descriptions, and scene headings. Following this, I will use python, NLP (Natural Language Processing), and various Large Language Models to conduct analysis on the contents of the script. The objective of this study is not to derive definitive conclusions about this script, but rather to examine various techniques for analyzing unstructured text.

This version of the script for ‘Poor Things’ is 97 pages long and contains 138 different scene headings. Using python we can read the pdf as plain text and identify the different screenplay elements by taking advantage of standard screenplay formatting. For example, if a line contains the substring, “INT.” or “EXT.” we can tag it as a scene heading. If a line is in all caps then we can tag it as a character name. I’ve built a very basic screenplay parser here, which gives us the table on which the rest of this analysis will be based on. Below is a preview of the first few rows of the table, filtered to isolate lines of dialogue.?

Quick look at the data:

32 characters speak 905 total lines of dialogue consisting of 9,210 total words and 2,692 unique words.

Bella (Emma Stone) has 333 lines, 37% of the total script and speaks in 73 different scenes.

Here’s a look at the top 5 characters in terms of lines of dialogue throughout the movie:

What’s immediately evident from this chart is that we can see a somewhat clear delineation of a three act structure. In act 1, Baxter and Max have a lot of lines and carry much of the expository burden. By act 2 we meet Duncan and Bella has matured to the point where she can carry scenes as the unquestioned lead of this movie. As we move into the end of act 2 and act 3, Bella is the dominating force of this movie, often having scenes without any of the other top 5 characters. In act 3, Baxter and Max come back into the fold and we meet Alfie.?

Is this actually a female led film?

USC’s Annenberg Inclusion Initiative’s research brief for 2023 revealed that only 30% of theatrically released films included a female lead, a significant drop from last year’s record high of 44%.?

While Poor Things certainly has a female lead in Emma Stone as seen in the charts above, is this actually a female dominant film? By grouping the data we can see that there are 32 unique characters who have at least one line of dialogue in this movie, 8 of those are female, 10 are male and 14 are unnamed (ex. Doctor or Doorman).?

After Bella, the female character with the next most lines is Swiney - a character who does not get introduced until Scene Number 95 and appears in only 5 scenes.?

Excluding the lines for unnamed characters, Male characters actually have a slight majority of the lines in this movie compared to female characters: Male - 449 lines of dialogue; Female- 415 lines of dialogue.?

While Poor Things does qualify as having a female lead, a deeper look at the data shows that this is actually a male dominant film in terms of supporting cast and lines of dialogue.?

Sentiment, Emotions and Themes

Now to use NLP and LLMs to look at the actual lines of dialogue, the sentiment, emotions and themes they indicate.?

Sentiment

To get sentiment scores, I fed each line of dialogue to the DistilBERT-base-uncased model by Hugging Face.This model is trained on web-scraped data including Wikipedia and thousands of books. Results for the top 5 characters can be seen below.?

  • The single line with the highest Negative Sentiment score: Steward: “Shit on me, you fucker!”. - Scene 73
  • The single line with the highest Positive Sentiment score: Kitty: “Oh I loved it. A handbag!” - Scene 57

Duncan (Mark Ruffalo) has the highest share of negative sentiment throughout the movie of the top characters. If you’ve seen the movie you’ll know that things turn for the worse for Duncan about halfway through Act II. By looking at average sentiment per scene for Duncan we can see that negative sentiment peaks around scene 75, where notably Bella sleeps with another person. The other significant peak for negative sentiment is when Duncan is getting drunk and losing a bunch of money gambling around scene 88.?

Emotion

A little more granular, we can look at emotion with the roberta-base model, trained on the go_emotions dataset for multi-label classification, available through Hugging Face. Following the same process as sentiment, the output for the top 5 characters is below.

  • Not surprisingly, Duncan is the angriest character in this movie.
  • About a third of Bella’s lines of dialogue are associated with curiosity or confusion which makes sense given the premise of this movie.
  • Max who’s introduced to the marvel that is Bella and subsequently falls in love with her fittingly has the majority of his lines corresponding to curiosity and caring.?

Does Bella get Smarter through the movie??

If you’ve seen the movie the answer is obviously yes. Bella can only babble incoherently in the beginning of the movie and by the end she’s speaking in introspective and assertive monologues.?

One way to measure this is through RIX scores. RIX measures verbal complexity by considering the length and frequency of words in a block of text, while certainly not perfect this measure is more suited for short blocks of text than others.??

By using the textstat python package, I calculated the RIX score for each of Bella’s lines of dialogue - below is a visualization that averages these scores over 30 scene blocks.?

Themes + Characters

A slightly more ambitious task, we can also look at themes present throughout the film, and by using a zero shot model - assign values to how each line of dialogue contributes to these themes to get a sense of the ways in which different characters experience these themes throughout the movie.?

To start, I fed the full pdf of the script to GPT-4 with the OpenAI API and asked it to return the 5 most prevalent themes. The output is below.?

  1. Identity
  2. Public vs Private Spheres
  3. Personal Growth
  4. Ethical and Moral Dilemmas
  5. Belonging

By using a zero-shot model such as Facebook’s bart-large-mnli, we can classify each line of dialogue, getting a score for each of these five themes.

The highest score for each theme:

  • Identity: “You know me? Tell me about myself. Was I nice?” - Bella Scene 128
  • Public vs Private Spheres: “A desperate rationalization Martha. Polite society will destroy you. Or can she not have friends Mr Wedderburn?” - Harry Scene 74
  • Personal Growth: “Swiney was right. I am discovering parts of myself hitherto unknown. The variety of desires being made manifest is fascinating.” - Bella Scene 108
  • Ethical and Moral Dilemmas: “We must get him up to the surgery. He will die if we cannot stop it.” - Bella Scene 135
  • Belonging: “Allonge toi. Aaaaaaaagh! Merci.” - Chapelle Scene 97

While this is certainly not a perfect method for identifying and classifying themes, there are some interesting points when looking at Bella’s dialogue over time such as Personal Growth generally trending upwards and belonging trending downwards while the other 3 are largely consistent throughout the movie.?


Dena Kleinrock, M.A., SPHR, ACPEC

Director, Employee Learning at SAG-AFTRA L&D | HR | Operations Guru | Leadership Communications | Strategy| DiSC Practitioner | Advanced Certified Personal and Executive Coach

1 年

Fez, this is fascinating!

要查看或添加评论,请登录

Fez Shah的更多文章

社区洞察

其他会员也浏览了