Bollywood Movie Mania in MATLAB
Gunjan Gupta
Sr. Technical Expert, AI & ML at Volkswagen Group | Honored by PM Modi & Featured in Mann Ki Baat for Tellmate
Hello LinkedIn Friends & Followers.
It has been a long time since my last post, and I am sorry for the delay. You always want to get things done on time, but things come up, and you keep postponing whatever is not monitored or can be delayed without much repercussion. I never expected several comments on my articles, but I thought there would be at least a few. But really, none? With each new article, I get hundreds of new subscribers to my GG’s Journal, and thanks to you, my existing subscribers and readers, I feel I am doing something important, and I want to keep doing it. I will also try to make up for the articles I missed last month with helpful content for you. Even while I was NOT writing these LinkedIn Articles, I was gathering new insights day and night, and I have tons of ideas to implement. I would also request you to comment with your thoughts on this article or suggest what I should write next!
One of my colleagues has been working on plotting the popularity of different film genres in MATLAB. He wanted to do it with a Hollywood database, but he could not find a suitable one until a few days ago. During this period, I found a nice dataset of Bollywood movies from Adrian McMahon on data.world. My colleague still wanted to work with Hollywood movies, considering that it would cover a considerable number of movies and be more liked by college students and peers. Now that he has that dataset and is not working with the Bollywood movies, I am taking this as an opportunity to do something more with the database I found. In this article, I will show you some incredible capabilities of MATLAB with examples that should make sense even to non-programmers!
First, how do you load the dataset in MATLAB? You can download the CSV file from the Author’s Page, put it in your current MATLAB folder, and use the following command:
data = readtable('IMDb Movies India.csv');
We will have a variable containing 15509 rows of Name, Year, Duration, Genre, Rating, Values, Votes, Director, Actor1, Actor2, and Actor3.
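Before doing anything else, it can help to check what was actually loaded. The snippet below is a minimal sketch of such a check. It builds a small hypothetical stand-in table so it runs without the CSV, and it assumes that Year and Duration are read as text and that empty entries count as missing. With the real file, swap the stand-in for the readtable call shown above.

```matlab
% Hypothetical stand-in for the real table, so the snippet runs without the CSV.
% With the real file, use: data = readtable('IMDb Movies India.csv');
data = table({'Movie A'; 'Movie B'; 'Movie C'}, ...
             {'(2019)'; '(2021)'; ''}, ...
             {'109 min'; ''; '147 min'}, ...
             'VariableNames', {'Name', 'Year', 'Duration'});

nRows = height(data);                       % number of movies (rows)
colNames = data.Properties.VariableNames;   % cell array of column names
nMissingYear = sum(ismissing(data.Year));   % rows with an empty Year entry

fprintf('%d rows, columns: %s\n', nRows, strjoin(colNames, ', '));
fprintf('%d rows have no Year\n', nMissingYear);
```

Running the same three lines on the real table should tell you how many rows each later step can lose when missing entries are removed.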
Now we will run some operations on the data. We will start by finding out how many movies were released each year. I will not explain or discuss the MATLAB code in this article, but if you have any queries, please post them in the comments!
%% Movies released every year in Bollywood
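% Keep only rows that have a release year
newData = rmmissing(data,'DataVariables',{'Year'});
% Convert the year text to numbers, then to categories for counting
years = str2num(cell2mat(newData.Year(:)));
years = categorical(years);
% Count how many movies fall in each year
tbl = tabulate(years);
t = cell2table(tbl,'VariableNames', ...
    {'Year','Count','Percent'});
t.Year = categorical(t.Year);
% Full-screen bar chart of movies per year
figure; bar(t.Year,t.Count); set(gcf, 'Position', get(0, 'Screensize'));
xlabel('Year'); ylabel('Number of Movies released')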
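If you do not have the Statistics and Machine Learning Toolbox (which provides tabulate), the same yearly count can be sketched with groupcounts from base MATLAB (R2019a or later). This is an alternative, not the code used above. The mini-dataset is hypothetical, and the '(YYYY)' text format for Year is an assumption.

```matlab
% Hypothetical mini-dataset; Year is assumed to be text in the form '(YYYY)'.
data = table({'A'; 'B'; 'C'; 'D'; 'E'}, ...
             {'(2019)'; '(2021)'; '(2019)'; ''; '(2021)'}, ...
             'VariableNames', {'Name', 'Year'});

% Drop rows without a year
newData = rmmissing(data, 'DataVariables', {'Year'});

% Pull the four digits out of each entry and convert them to numbers
newData.Year = str2double(regexp(newData.Year, '\d{4}', 'match', 'once'));

% One row per year, with its count (GroupCount) and share of the total (Percent)
t = groupcounts(newData, 'Year');
disp(t)

figure; bar(t.Year, t.GroupCount);
xlabel('Year'); ylabel('Number of Movies released')
```

Extracting the digits with regexp also avoids cell2mat, which only works when every year entry has exactly the same length.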
You must have played the Movie Antakshari Game too. Well, I played it a lot in my school and college days. After some regular guesses, we would get stuck and fail to recall movie names. We would start googling so that we could win the game. But guess what, if you use this dataset, you already have a list of Bollywood movies in alphabetical order. Do you want a random movie name based on the letter you mention? Let’s use this code:
%% Movies Antakshari Game
% Keep only rows that have a movie name
newData = rmmissing(data,'DataVariables',{'Name'});
% First character of every name, lower-cased so the match ignores case
movie = char(newData.Name);
firstChar = lower(movie(:,1));
fc = categorical(string(firstChar));
inp = lower(input('Enter first alphabet for movie name:','s'));
% Rows whose name starts with the requested letter
idx = find(fc == inp);
fprintf('There are %d movies starting from alphabet %s\n',length(idx),inp)
if ~isempty(idx)
    % Pick one of the matching movies at random
    randomIndex = randi(length(idx));
    randomMovie = newData.Name(idx(randomIndex));
    fprintf('Randomly Guessed Bollywood Movie from %s: %s\n',inp,string(randomMovie));
end
Here’s a trial of the game:
Enter first alphabet for movie name:Q
There are 52 movies starting from alphabet q
Randomly Guessed Bollywood Movie from q: Qurbaniyaan
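The letter lookup can also be sketched without the character matrix and categorical conversion, using startsWith on a string array. This is an alternative under stated assumptions, not the code used above: the list of names is a hypothetical stand-in (with the real data you could use string(newData.Name)), and the letter is fixed in a variable instead of being read with input.

```matlab
% Hypothetical list of names; with the real data use names = string(newData.Name).
names = ["Qurbani"; "Queen"; "Sholay"; "Swades"; "Lagaan"];
letter = 'q';   % the letter to search for (fixed here instead of using input)

% Names that start with the letter, ignoring upper/lower case
matches = names(startsWith(names, letter, 'IgnoreCase', true));
count = numel(matches);

if count == 0
    fprintf('No movies start with %s\n', letter);
else
    % Pick one of the matching names at random
    movie = matches(randi(count));
    fprintf('%d movies start with %s, for example: %s\n', count, letter, movie);
end
```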
Did you love it? Now, let us try one more example before I give you some tasks of your own to take this article forward. We will count the movies in different ranges of duration and see which duration range has the maximum number of films. Here is the code:
%% Movie Duration
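% Keep only rows that have a duration
newData = rmmissing(data,'DataVariables',{'Duration'});
% Strip the ' min' suffix and convert the text to numbers
duration = replace(string(newData.Duration),' min',"");
duration = str2double(duration);
% Histogram of movie durations in minutes
figure; histogram(duration)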
After removing missing data entries, you will notice that we are left with only 7240 movies from the database. The tallest bar has 683 movies, in the range of 135 to 140 minutes. The surprising thing is that 29 movies are longer than 4 hours. Who would survive watching such long movies? Have you seen any Bollywood movie longer than 4 hours?
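Numbers like these can be read off the histogram, but they can also be computed directly. Here is a minimal sketch using histcounts. The duration values are hypothetical, and the 5-minute bin width is an assumption chosen to match the 135 to 140 minute range mentioned above. With the real data, use the duration vector from the previous code.

```matlab
% Hypothetical durations in minutes; with the real data use the 'duration' vector.
duration = [95 120 136 138 139 150 165 245 NaN];

% 5-minute bins (an assumed bin width) covering all durations
edges = 0:5:(max(duration) + 5);
counts = histcounts(duration, edges);   % NaN values are not counted in any bin

% Bin with the most movies
[maxCount, k] = max(counts);
fprintf('Busiest bin: %d to %d min with %d movies\n', edges(k), edges(k+1), maxCount);

% Movies longer than 4 hours (240 minutes)
fprintf('Movies longer than 4 hours: %d\n', sum(duration > 240));
```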
We can do several more things with this database. One of them is finding each genre’s popularity across the years, which you can expect as a YouTube #Shorts video with a Hollywood database in the coming few days. Now, let me give you some homework or tasks, whatever you want to call them:
I am looking forward to you solving the above problems. If you can do them, share your code in the comments, and if you get stuck at any stage, post your error or problem in the comments. Let's enjoy MATLAB while doing some fun stuff. Happy MATLABing!