Love, Actually
Alice SH Wong
Contract Recruiter-in-Residence for Hire | Data Science Practitioner for 12 Years
It's holiday season again, and again, some movie called Love, Actually seems to be airing all over the drive-ins, floating cinemas and cinemas on the snow. I'm still not sure what it's about, except that it contains a floppy-haired British dude my high school friends seemed to like quite a bit. I'm still not sure what love has to do with the holidays, but then again, that raises the question of what love is all about, so I decided to consult the oracular Jane Austen and my code on word embeddings. Before I demystify the magical workings of love by explaining how word embeddings work, here's a summary of my findings on love and all its suspected corollaries:
1. Love and marriage are, as expected, really rather related after all.
2. Passion has little to do with love or marriage.
3. Happiness is most closely associated with love and marriage, and with passion almost not at all.
In a world of complications, thankfully, 1-3 do not contradict one another at all. However, the tenuous link of passion to happiness and love probably contradicts many a world-view. (Then again, Austen's sensibilities here seem to echo the acclaimed psychiatrist-authored F*** Feelings, which downplays the importance of passion in romantic relationships.) Here's one world-view we can all agree with: happiness and wealth are quite closely related, whew. What may be surprising is that wealth trumps passion for happiness, so next time you receive your bonus, be sure to spend it on a dowry rather than the latest...sports car.
Here are some more lessons from the world of Jane Austen:
- marriage - love = needs/covered/eyeing (Emma), Smith/rude/communicative (Sense)
- marriage - farm = worst/nervous/doubtful (Emma), woods/lives/mistress (Sense)
- love - farm = charge/worse/direction (Emma), charge/difference/worse (Sense)
- happiness - farm = worst/nervous/doubtful (Emma), charge/difference/worse (Sense)
If you are wondering how marriage, love and happiness without a farm (or any farm-talk) can be so dire, leaving us with negative words like 'worse,' 'worst,' 'nervous' and 'doubtful,' the near synonymity (scoring above 0.99) of Smith with marriage without love should provide a gut-check that the algorithm is working. Mrs Smith will disinherit her nephew John Willoughby if he marries Marianne. (Pro tip: www.farmersdatingsite.com is a thing. Disclaimer: I've never used it and can't vouch for its quality.)
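For readers curious what an expression like "marriage - love" means mechanically, here is a minimal sketch. It assumes each word is a vector, subtracts one vector from the other, and ranks the remaining words by cosine similarity to the result. The three-dimensional vectors and the helper names (`toy_vectors`, `closest_to_difference`) are invented for illustration and are not the vectors or code behind the lists above. In 'gensim', `model.wv.most_similar(positive=[...], negative=[...])` performs a comparable lookup on length-normalized vectors, so its scores would not match this sketch exactly.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# Real Word2Vec vectors are learned from text and usually have 100+ dimensions.
toy_vectors = {
    "marriage": [0.9, 0.8, 0.1],
    "love":     [0.8, 0.1, 0.2],
    "farm":     [0.1, 0.2, 0.9],
    "duty":     [0.1, 0.9, 0.0],
    "field":    [0.0, 0.1, 1.0],
}

def cosine(u, v):
    """Cosine similarity: 1 means same direction, 0 unrelated, -1 opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def closest_to_difference(word_a, word_b, vectors, topn=3):
    """Rank every other word by similarity to vector(word_a) - vector(word_b)."""
    target = subtract(vectors[word_a], vectors[word_b])
    candidates = [w for w in vectors if w not in (word_a, word_b)]
    ranked = sorted(candidates, key=lambda w: cosine(target, vectors[w]), reverse=True)
    return [(w, round(cosine(target, vectors[w]), 3)) for w in ranked[:topn]]

result = closest_to_difference("marriage", "love", toy_vectors)
print(result)  # "duty" ranks first with these toy vectors
```

With these made-up numbers, removing the "love" direction from "marriage" leaves a vector that points mostly along the second dimension, which is why "duty" comes out on top. The same logic explains how a name like Smith can surface once the love component is subtracted.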
Using a tweak in methodology on just Emma to find the closest words to love, marriage, happiness, wealth and courage, some of my findings possibly corroborate the above. Words suggesting temperedness rather than headiness such as 'agreeable,' 'companion,' 'moderate' and 'scruple' accompany 'love.' This is rather in keeping with the finding that 'love' and 'passion' are very distantly related. Marriage, as expected, is associated with relations such as 'daughter' and 'husband.' What is a little less expected but which makes sense is its association with facial expression and cognition. 'Countenance' ranked fourth in similarity and 'glance,' tenth, while 'thoughts,' 'opinion' and 'conversation' made the top ten. 'Conversation' appeared again for happiness, topping its list of most closely associated words, so use this holiday season to talk one another's heads off. Here is the full list for happiness, since nearly all of it made a lot of sense:
- conversation
- attention
- husband
- expectation
- comfort
- dependence
- opinion
- knowledge
- last
- succeeded
The above reads almost like a self-help book (that's not Teal Swan's). There's an unabashed emphasis on human interdependence (conversation, attention, dependence, husband - not sure why 'wife' isn't here), and it echoes a lot of scientific research on the primacy of human interaction in happiness.
We saw before that wealth was quite closely related to happiness, but what is surprising in this version of the analysis is the ethereality of its portrayal. The top two most closely associated words to wealth are 'spirit' and 'air.' Nothing like a fine country estate in Jane Austen's world to revive the spirit, I suppose. The appearance of 'error,' 'evil' and 'alarm' just out of the top ten reassures us that we are not heeding an entirely hedonistic guru in Austen.
I've forgotten, till now, that Austen was unmarried all her life and am now slightly skeptical of what she knows about the relationship between marriage and happiness. I might've done better to perform this on Eat, Drink, Remarry: Confessions of a Serial Wife, whose author is much more contemporary and experienced. But I will pass on this challenge to you, my friends, if I have awakened any interest in word embeddings.
Technical Appendix/Considerations
- For the above analysis, I used Word2Vec, which is probably the most commonly used algorithm under the umbrella of word embeddings.
- The canonical flavor of using lists of sentences (with each sentence as a list of words) was used in the latter part of the essay. It did not seem to perform as well as using the whole novel as a single list of words, which is less commonly used but sometimes gives better results. The canonical lists of sentences performed reasonably only on Emma, which is why Sense and Sensibility was not used there.
- Stop words (commonly occurring words with little additional semantic value) were not additionally removed by me as they rarely showed up in the lists of most similar words. Importantly, too, the 'gensim' package helps here: its Word2Vec implementation downsamples very frequent words by default (the 'sample' parameter), which mutes most stop words without my removing them explicitly. However, I did find more stop words using the canonical sentence-level listing than the document-level listing method (see above paragraph). Hence, document-level listing seemed to be working better in all respects.
- Word similarities are commonly displayed on a two-dimensional graph, but in this case, my t-SNE plotting was taking too long to run. Graphical displays of word similarities are still recommended.
- There are many other features of word embeddings that one could explore, with the likelihood of a sentence belonging to the text as an example. It's not easy (for me at least) to replicate Austen's style; any sentence I typed in didn't seem to have as great a probability as some of these word embeddings bloggers' stylistic imitations seem to have. I learned that my greatest hope of emulating Austen's style was to keep it short - to about three words. "She was happy" was my magnum opus. Practice makes perfect. I'll hit four with equal credibility in a week...
- Maybe I haven't watched this Love, Actually because it's not high-def enough on Netflix yet...Let's just say.
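The two corpus shapes compared in the appendix (sentence-level versus document-level lists) can be sketched as follows. This is a rough illustration using only the standard library, with a naive tokenizer and sentence splitter that are my assumptions rather than the exact preprocessing used for the analysis. The function names and the sample text are hypothetical placeholders.

```python
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentence_level_corpus(text):
    """Canonical shape: one token list per sentence (naive split after . ! ?)."""
    raw_sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [tokens for tokens in map(tokenize, raw_sentences) if tokens]

def document_level_corpus(text):
    """Alternative shape: the whole text as one long token list, in an outer list."""
    return [tokenize(text)]

# Placeholder text standing in for a novel; not a quotation from Austen.
sample = "She was happy. Was he happy? He was not."

by_sentence = sentence_level_corpus(sample)
by_document = document_level_corpus(sample)

print(by_sentence)
print(by_document)
```

Either result is a list of token lists, which is the shape the `sentences` argument of gensim's `Word2Vec` accepts. The practical difference is that context windows stop at the end of each inner list, so the document-level shape lets a word's context spill across sentence boundaries. One caveat worth checking against your installed version: the gensim documentation notes that its optimized routines truncate any single text longer than 10,000 words, so a whole novel passed as one list may need to be split into chunks.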