Using machine learning to predict new movie ratings
Kiran Brahma
Co-Founder/CEO Knighthood - Get the Right Staffing Solutions for Your Business | Entrepreneur, Mentor & Investor | ISB
Before deciding whether a movie is worth watching, most of us look for a general opinion on it so that we are not disappointed in the end. Before a movie is released, it is usually reviewed by critics, who set the tone for its initial rating. As more people see the movie over time, its rating moves toward what will become its aggregate average. After a certain period, the rating stabilizes and changes only slightly; this stable value can be termed the actual rating of the movie. Imagine if we could estimate this rating before the movie's release, so that appropriate plans could be developed to maximize the rating given the quality of the movie. This matters because a movie's initial rating has a major impact on its earnings, and since movies need to earn most of their money in a short period, the initial rating is a powerful influence.
Most of us head over to IMDb to check a movie's rating, and many of us also rate movies we have already watched. Simply put, IMDb relies on crowdsourcing to arrive at a movie's rating.
Crowdsourcing is a specific sourcing model in which individuals or organizations use contributions from Internet users to obtain needed services or ideas (1).
Although the IMDb rating does not affect a movie much in its opening weekend, where marketing has a stronger influence, once the initial euphoria fades this single number surely plays a role in determining the movie's further collections. Many of us can name movies that went on to become blockbusters despite a weak opening, thanks to high ratings from the general public. For example, Titanic, among the highest-grossing movies in cumulative earnings, did not do especially well in its opening weekend: it ranked around 336th for its opening-week collection but 2nd when overall earnings are considered (2).
In this whole equation, IMDb depends on a sufficient number of users seeing a movie before any rating exists; they provide the initial set of observations, after which others decide whether to see it. With the current advances in AI, can we provide a rating for a movie that is close to the rating people will give it over time? Most of us are familiar with recommendation systems, which recommend another product or service based on what you are currently consuming. We can see such systems in full force on Amazon, Netflix, and numerous other sites. However, these systems are unable to recommend new products, because they do not have sufficient data on them. Some may argue that we can use a similar existing product as a stand-in for the new one to arrive at a good guess, but for songs or movies this is not as simple as it is for other products or services.
So how does one crowdsource opinion on a movie before anyone has watched it? The solution I expect to work well is borrowed from Spotify, which uses machine learning to recommend new songs to its subscribers every week (3). One of its approaches converts a song's raw audio into data that is used to train a deep learning model. The model then matches new songs against the songs a listener prefers when making recommendations. This allows Spotify to recommend newly released songs that have not yet attracted a large enough audience for usage-based recommendations to work. For movies, we can adopt a similar approach, in which the new movie is broken down into its raw video and compared with other movies. In our case, however, we need to arrive at a rating: simply put, we compare the new movie with existing movies a user has rated and predict the rating that the same user would give the new movie.
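To make the per-user comparison concrete, here is a minimal content-based sketch. It assumes each movie has already been reduced to a numeric feature vector (extracting features from raw video is the hard part and is not shown; the three features in the example are purely hypothetical), and it predicts a user's rating for a new movie as the similarity-weighted average of that user's ratings for the k most similar movies they have rated. Cosine similarity and nearest-neighbour averaging are my choice for illustration, not Spotify's actual method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_user_rating(new_movie, rated_movies, user_ratings, k=3):
    """Similarity-weighted average of the user's ratings for the k rated
    movies whose (hypothetical) features are closest to the new movie."""
    scored = [
        (cosine_similarity(new_movie, rated_movies[title]), rating)
        for title, rating in user_ratings.items()
    ]
    # Keep the k most similar movies, dropping any with no similarity at all.
    neighbours = [(s, r) for s, r in sorted(scored, reverse=True)[:k] if s > 0]
    if not neighbours:
        # Nothing comparable: fall back to the user's mean rating.
        return sum(user_ratings.values()) / len(user_ratings)
    total_similarity = sum(s for s, _ in neighbours)
    return sum(s * r for s, r in neighbours) / total_similarity


# Hypothetical features: share of action, dialogue and night-time scenes.
catalogue = {
    "Movie A": [0.8, 0.1, 0.1],
    "Movie B": [0.1, 0.8, 0.1],
    "Movie C": [0.7, 0.2, 0.1],
}
ratings = {"Movie A": 9.0, "Movie B": 4.0, "Movie C": 8.0}
print(round(predict_user_rating([0.75, 0.15, 0.1], catalogue, ratings, k=2), 2))
```

Here the new movie resembles the two action-heavy movies the user liked, so the prediction lands between their ratings of 9 and 8 and ignores the dialogue-heavy movie they rated poorly.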
Implementing this idea could be computationally heavy, and I am not sure we have the computing power to run such an analysis for every movie at the individual-user level to predict each user's rating. To reduce the computational load, we can adopt the following structure:
- Group users into segments, so that we focus on how a segment, rather than an individual user, will rate the movie. The segments can be defined by us or derived through clustering, which is commonly used by marketing teams. We can compare the ratings users give across genres to arrive at segments that we consider homogeneous. The final number of segments needs to take current computational capacity into account: with too many segments, predictions will take too long to serve any purpose. We can simplify the process further by defining our own segments, since clustering can throw up segments that are impractical to act on (imagine a segment whose users are spread across 100 different locations).
- IBM Watson has already demonstrated how AI can be used to create a movie trailer (4). Borrowing from this idea, we can train our AI system to evaluate scenes from movies of different genres and learn how the combination of those scenes influences the final rating given by each segment defined above.
- When a new movie is released, we assign it to a genre and draw comparisons with our pre-built datasets for that genre to predict how each segment will rate the movie. We could also skip the genre bucketing and run a general analysis. However, people tend to be fans of specific genres, so it makes little sense to have a segment of comedy fans rate a horror movie, since the initial viewers will mostly be fans of the genre. For a more objective answer, we can run both analyses and see which gives better predictions rather than guessing which is better.
- Finally, we take a weighted average of the ratings predicted for each segment to arrive at the movie's final rating.
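The segmentation and aggregation steps above can be sketched as follows. This is a minimal illustration under several assumptions: each user is described by their average rating per genre (the genres, users, per-segment predictions and viewer shares below are all hypothetical), segments come from a plain k-means with a deterministic farthest-first start, and the final rating weights each segment by its assumed share of early viewers. A production system would more likely use an established clustering library.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means over users' per-genre average ratings."""
    # Farthest-first initialisation: start from the first user, then keep
    # adding the user farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


def aggregate_rating(segment_ratings, segment_weights):
    """Weighted average of the rating predicted for each segment."""
    total = sum(segment_weights.values())
    return sum(segment_ratings[s] * w for s, w in segment_weights.items()) / total


# Hypothetical users: average rating for [comedy, horror, drama].
users = [
    [8.5, 3.0, 6.0],
    [8.0, 2.5, 6.5],
    [3.0, 8.5, 5.0],
    [2.5, 9.0, 5.5],
]
labels, _ = kmeans(users, k=2)
print(labels)

# Hypothetical predictions for a new comedy: segment 0 holds the comedy fans
# in this toy data. Weights are each segment's assumed share of early viewers.
predicted = {0: 7.8, 1: 5.2}
viewer_share = {0: 0.7, 1: 0.3}
print(round(aggregate_rating(predicted, viewer_share), 2))
```

Weighting by expected viewer share, rather than by segment size, reflects the point above that a movie's first viewers come mostly from fans of its genre.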
We will, of course, need to compare our predicted ratings with the actual ratings so that the learning system becomes more robust over time. Specifically, we should compare the following predicted values against actuals:
- The average rating given by each user segment
- The weight of each segment in the final rating. The final aggregate rating depends on which segments actually decided to watch and rate the movie, compared with the segments we predicted early on.
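The two comparisons above could be tracked with something as simple as the following sketch, which assumes predictions and actuals are stored as dictionaries keyed by a segment name (the segment names and numbers in the example are hypothetical). It reports the mean absolute error of the per-segment average ratings, the mean absolute error of the segment weights, and the resulting error in the final aggregate rating. Mean absolute error is one reasonable choice of metric; others, such as root-mean-square error, would work too.

```python
def weighted_rating(ratings, weights):
    """Final rating as a weighted average of per-segment ratings."""
    total = sum(weights.values())
    return sum(ratings[s] * w for s, w in weights.items()) / total


def evaluate(pred_ratings, actual_ratings, pred_weights, actual_weights):
    """Compare predicted and actual values for every segment."""
    segments = list(actual_ratings)
    n = len(segments)
    # Metric 1: error in each segment's average rating.
    rating_mae = sum(abs(pred_ratings[s] - actual_ratings[s]) for s in segments) / n
    # Metric 2: error in each segment's weight (share of actual raters).
    weight_mae = sum(abs(pred_weights[s] - actual_weights[s]) for s in segments) / n
    # Combined effect of both errors on the final aggregate rating.
    final_error = abs(weighted_rating(pred_ratings, pred_weights)
                      - weighted_rating(actual_ratings, actual_weights))
    return {"rating_mae": rating_mae, "weight_mae": weight_mae, "final_error": final_error}


report = evaluate(
    pred_ratings={"comedy_fans": 7.8, "horror_fans": 5.2},
    actual_ratings={"comedy_fans": 7.5, "horror_fans": 6.0},
    pred_weights={"comedy_fans": 0.7, "horror_fans": 0.3},
    actual_weights={"comedy_fans": 0.6, "horror_fans": 0.4},
)
print(report)
```

Keeping the two error sources separate shows whether a miss came from misjudging how a segment would rate the movie or from misjudging who would turn up to watch it.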
The predicted rating can have multiple real-world applications. Some of the ideas I can think of are as follows:
- Understand which user segments will give a higher rating, and focus marketing efforts on those segments.
- Moviemakers can define their own segments based on geography to predict how a movie will fare in different regions. This can help plan where the movie should be screened and where its release should be limited. Most movies have limited time to recoup their money, and if a movie is released in a location where audiences do not take to it, the resulting poor ratings can lead to a poor showing even in locations where it was expected to do well.
- With a limited budget and a limited number of theaters, we can choose the screening locations that give the best possible final rating.
- Moviemakers can learn from prior data what makes a movie work and what makes it fail.
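For the screening-location ideas above, a simple heuristic could look like the sketch below. It assumes we already have a predicted rating and a screening cost for each location (the region names, ratings, costs and thresholds are hypothetical): locations predicted to rate the movie below a minimum are skipped so they do not drag down the overall rating, and the remaining ones are funded in order of predicted rating until the budget runs out. This greedy rule is easy to explain but not guaranteed to be optimal; a knapsack-style optimizer could replace it when budgets are tight.

```python
def choose_locations(predicted, costs, budget, min_rating):
    """Greedy screening plan over hypothetical per-location predictions."""
    # Skip locations expected to pull the movie's rating down.
    candidates = [loc for loc in predicted if predicted[loc] >= min_rating]
    # Fund the best-rated locations first.
    candidates.sort(key=lambda loc: predicted[loc], reverse=True)
    chosen, spent = [], 0.0
    for loc in candidates:
        if spent + costs[loc] <= budget:
            chosen.append(loc)
            spent += costs[loc]
    return chosen


predicted = {"Region A": 7.9, "Region B": 7.2, "Region C": 6.1, "Region D": 7.5}
costs = {"Region A": 50, "Region B": 40, "Region C": 30, "Region D": 20}
print(choose_locations(predicted, costs, budget=80, min_rating=7.0))
```

With these numbers, Region C is excluded for its low predicted rating, and Region B is dropped only because the budget is exhausted after Regions A and D.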
I am sure one can think of numerous other ways to derive value from this process. I would not be surprised if IMDb (an Amazon company) released such a tool for producers and earned a steady stream of revenue from it. In fact, Amazon could use a similar concept to understand its Prime Video users better and deliver better content, locking more users into Prime and tightening its grip on consumers on its way to world domination.
Notes:
1: Crowdsourcing: https://en.wikipedia.org/wiki/Crowdsourcing
2: Titanic box-office earnings: https://www.boxofficemojo.com/movies/?id=titanic.htm
3: Spotify's weekly new-song recommendations: https://hackernoon.com/spotifys-discover-weekly-how-machine-learning-finds-your-new-music-19a41ab76efe
4: IBM Watson used to make a movie trailer: https://www.ibm.com/blogs/think/2016/08/cognitive-movie-trailer/