Machine Learning Bruises
Patrick Biltgen
Author | Engineer | Data Scientist | Strategist | Working at the Intersection of Space and AI
One of the songs I keep on continuous repeat is Bruises by Train (featuring Ashley Monroe). I love the simple lyrics and slight southern twang, but I mostly love Bruises for the challenge it represents for machine learning gurus everywhere. Take for example this beautiful line sung by Monroe:
One that's five and one that's three.
What do you think the song is about? Write down your answer, and then continue reading.
Run this one through a word counter: one (2) that's (2) five (1) and (1) three (1). Clearly, this is a song about numbers. Let's try it again with the next line.
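For the curious, here is a minimal sketch of that kind of word counter in Python. The function name `word_counts` and the tokenizing rule (lowercase everything, keep apostrophes inside words) are my own assumptions for illustration, not the tool used above.

```python
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text, pull out runs of letters/apostrophes, and count them."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

line = "One that's five and one that's three."
counts = word_counts(line)

# Print each word with its count, most frequent first.
for word, n in counts.most_common():
    print(f"{word} ({n})")
```

A counter like this sees five distinct tokens and nothing else; nothing in the counts hints at children.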
One that's five and one that's three // Been two years since he left me.
Well, human, you know what that means. Monroe's line cryptically conveys the knowledge: "I am a single mom with two young children."
Bruises is a song about two high school friends who meet up after ten years and recount the emotional "bruises" that "make for better conversation." The song features snippets of conversation between the pair. As a human, it's obvious and clever and cute. But what might an algorithm make of this?
Consider a word cloud of the lyrics, in which each word is sized by how often it appears:
[Figure: word cloud of the Bruises lyrics]
I guess the song's about... bruises? Or how everybody loses? OK, I guess it is about a conversation. I'll give you that. Actually, the larger words are the ones repeated in the refrain.
When I pasted the lyrics into a sentiment analysis tool, the result was "negative" with a confidence of 88.5983. But Bruises isn't negative. It's nostalgic. Wistful. Even a little flirty. Most sentiment analysis tools work by assigning a positive or negative sentiment value to the individual words in a passage. More advanced tools are required to detect nuance or to "understand" the entire passage in context. This problem is further compounded by the fact that Bruises isn't just text. It's a song, designed more for entertainment than for coherent content transmission.
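To make the word-by-word approach concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the `TOY_LEXICON` scores are invented, the function name is hypothetical, and real tools use much larger word lists and more careful scoring. It is not the tool that produced the result above.

```python
import re

# Hypothetical toy lexicon: word -> sentiment score (values invented for illustration).
TOY_LEXICON = {
    "bruises": -1.0,
    "loses": -1.0,
    "left": -0.5,
    "down": -0.5,
    "better": 1.0,
    "good": 1.0,
    "free": 1.0,
}

def word_level_sentiment(text, lexicon=TOY_LEXICON):
    """Sum per-word scores; words missing from the lexicon count as neutral (0)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(word, 0.0) for word in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score

refrain = "These bruises make for better conversation. Everybody loses, we all got bruises."
label, score = word_level_sentiment(refrain)
print(label, score)
```

Because "bruises" and "loses" each carry a negative score and only "better" pulls the other way, the refrain comes out negative. The scorer has no way to notice that the speakers are trading these bruises fondly.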
Bruises highlights the difficulty of encoding contextual information into a computational algorithm. We've all had similar conversations with long-lost acquaintances and friends. We might have had similar life experiences as the song's protagonists. We are also good at deducing the other side of a conversation: "Good to know that you got free. That town [I know] was keeping you down on your knees." (Presumably, Monroe asked Train's Patrick Monahan whether he was still living in his career- and life-limiting hometown.) It's extremely difficult to design an algorithm that would robustly identify the purpose, meaning, and significance of this simple song. This example highlights one reason why I think automated "sensemaking" by computer algorithms is a long way off.
That's not to say this is an impossible mission. A number of algorithms and techniques have been developed over the last sixty years for automated summarization and gisting of text-based information. When I applied this type of technique to Bruises:
These bruises make for better conversation // Everybody loses, we all got bruises
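One of the simplest techniques in this family is frequency-based extractive summarization: score each line by how common its words are across the whole text, then keep the top scorers. The sketch below is only an illustration under stated assumptions, not the tool I used. The stopword list, the function names, and the choice to list the refrain twice (standing in for the fact that it repeats in the song) are all mine, and the input is limited to the lines quoted in this post.

```python
import re
from collections import Counter

# Hypothetical, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "all", "and", "been", "for", "he", "me", "since",
             "that", "that's", "these", "to", "we", "you"}

def content_words(line):
    """Lowercased words of a line with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", line.lower()) if w not in STOPWORDS]

def summarize(lines, n=2):
    """Return the n highest-scoring distinct lines, in their original order."""
    freq = Counter(w for line in lines for w in content_words(line))
    unique = list(dict.fromkeys(lines))  # drop repeated lines, keep first-seen order

    def score(line):
        words = content_words(line)
        # Average corpus frequency of the line's content words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(unique, key=score, reverse=True)[:n]
    return [line for line in unique if line in top]

refrain = ["These bruises make for better conversation",
           "Everybody loses, we all got bruises"]
# Assumption: the refrain appears twice to mimic its repetition in the song.
lines = ["One that's five and one that's three",
         "Been two years since he left me",
         "Good to know that you got free"] + refrain + refrain

summary = summarize(lines)
print(" // ".join(summary))
```

On this small input the refrain wins simply because its words recur. That is exactly why a frequency-driven summary gravitates to the chorus rather than to the line about the children.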
And there you have it. I hope this post also makes for better conversation!
The views in this article are solely those of the author and do not represent the views of a current or previous employer or sponsor.
Emerging Technology Enthusiast | Business Development & Sales
7 years ago
Valuable insight. Short and sweet. As an aside, my written answer to the first was "children". :) Maybe because I have one that's five and one that's three...
Fascinating. Especially as I just formed a team to participate in a fake news detection challenge. I think one idea I take from this is to aim the sensemaking tools at a narrow, specific domain. It seems impossible to use algorithms to infer meaning from songs, poetry, musings from a crazy person, etc. These types of writing, while simple, are not intended to directly and clearly communicate specific ideas, but rather to provoke thought and maybe an emotional reaction. Contrast that with something like Yelp reviews, where the task of using an algorithm to make sense of the content automatically seems less daunting.
Regents Professor & Sikorsky Professor at Georgia Institute of Technology
7 years ago
Very interesting viewpoint!