End to End Movie Recommendation System with Flask app

Introduction

In this blog post, we will build an end-to-end machine learning project: a movie recommendation system. We will start by acquiring a dataset from Kaggle, then preprocess the data and compute similarity scores between movies. Finally, we will deploy the recommender as a web application using Flask.


Getting Data from Kaggle

Kaggle is a popular platform that hosts various datasets for machine learning. To get data from Kaggle, we first need to create an account on Kaggle and join a competition or find a dataset of interest.


Once we have found a dataset, we can download it directly from Kaggle. However, some datasets might require us to accept a competition rule or agreement first. After downloading the dataset, we can extract the files and load the data into our project.


Preprocessing Data

Before we can train a machine learning model, we need to preprocess the data. Preprocessing involves tasks such as cleaning the data, handling missing values, scaling the data, and encoding categorical features.


In our project, we will use pandas for data preprocessing. Pandas is a powerful library that provides data structures and functions for data analysis. We will load the data into a pandas dataframe and perform various preprocessing tasks on the dataframe.
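As a rough illustration of these preprocessing steps, the sketch below builds a tiny DataFrame by hand, drops rows with missing values, and turns a stringified list of dictionaries into a plain list of names. The two rows and the helper name `names_from_json` are made up for this example and are not taken from the dataset.

```python
import ast

import pandas as pd

# Toy data standing in for the real CSV files (values are invented for illustration).
df = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "overview": ["A short plot summary.", None],
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
        '[{"id": 35, "name": "Comedy"}]',
    ],
})


def names_from_json(text):
    """Parse a stringified list of dicts and keep only each 'name' value."""
    return [item["name"] for item in ast.literal_eval(text)]


# Drop rows that have a missing value in any column.
clean = df.dropna().copy()

# Replace the raw string with a list of genre names.
clean["genres"] = clean["genres"].apply(names_from_json)

print(clean["genres"].tolist())
```

Calling `.copy()` after `dropna()` gives an independent DataFrame, so the later column assignment does not touch the original `df`.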


Training Machine Learning Model

Once we have preprocessed the data, we can train a machine learning model. In our project, we will use the scikit-learn library for machine learning. Scikit-learn is a popular library that provides various machine learning algorithms and tools for model selection, evaluation, and preprocessing.


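We will use cosine similarity for our project. It measures how similar two vectors are as the cosine of the angle between them: for word-count vectors, a score of 1 means the vectors point in the same direction and a score of 0 means they share no terms. A higher score therefore means two movies are more alike.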
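To make the idea concrete, here is a small sketch that computes cosine similarity by hand with NumPy and then builds the full pairwise matrix with scikit-learn's `cosine_similarity`. The three count vectors and the helper name `cosine` are invented for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Word-count vectors for three hypothetical movies (values are made up).
a = np.array([1, 1, 0, 2])
b = np.array([2, 2, 0, 4])   # same direction as a, just longer
c = np.array([0, 0, 3, 0])   # shares no terms with a


def cosine(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


print(cosine(a, b))  # close to 1.0: the vectors point in the same direction
print(cosine(a, c))  # 0.0: no overlap

# scikit-learn computes the full pairwise matrix in one call.
matrix = cosine_similarity(np.vstack([a, b, c]))
print(matrix.shape)
```

Because the score depends only on direction, a long overview and a short one can still be rated as very similar if they use the same words in similar proportions.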


After computing the similarity matrix, we will serialize it, along with the processed movie table, using the pickle library. Pickle is a standard-library module that saves Python objects in a binary format. We will save both objects as files so that we can load them later in our Flask application.
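A minimal sketch of the save-and-load round trip is shown below. The small array stands in for the real similarity matrix, and the temporary directory is an assumption made so the example does not write into the working directory.

```python
import os
import pickle
import tempfile

import numpy as np

# A small stand-in for the real similarity matrix (values are made up).
similarity = np.array([[1.0, 0.5], [0.5, 1.0]])

# Hypothetical file location used only for this example.
path = os.path.join(tempfile.mkdtemp(), "similarity.pkl")

# "with" closes the file even if dump() raises.
with open(path, "wb") as f:
    pickle.dump(similarity, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(np.array_equal(similarity, restored))
```

Only unpickle files you created or otherwise trust, since loading a pickle can execute arbitrary code.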


Building Flask Application

Flask is a popular web framework for Python that allows us to build web applications quickly and easily. We will use Flask to build a web application that takes a movie selected by the user and recommends similar movies using the precomputed similarity matrix.


Our Flask application will have two routes: a home route and a recommendation route. The home route will display a simple HTML page with a form for selecting a movie. The recommendation route will take the selected movie, look up the most similar movies, and fetch their posters from the TMDB API. The recommendations will be displayed on the same page, below the form.
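A stripped-down sketch of a two-route layout with a form that posts a selected movie is shown below. It is not the application from this post: the inline template, the `FAKE_RECOMMENDATIONS` dictionary and the placeholder titles are assumptions used to keep the example self-contained. Flask's built-in test client exercises both routes without starting a server.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Hard-coded stand-in for the real recommender (titles are placeholders).
FAKE_RECOMMENDATIONS = {"Movie A": ["Movie B", "Movie C"]}

PAGE = """
<form action="/recommend" method="POST">
  <select name="selected_movie">
    {% for movie in movie_list %}<option value="{{ movie }}">{{ movie }}</option>{% endfor %}
  </select>
  <button type="submit">Get Recommendations</button>
</form>
{% for title in recommendations %}<p>{{ title }}</p>{% endfor %}
"""


@app.route("/")
def home():
    # Home route: show the form only.
    return render_template_string(
        PAGE, movie_list=list(FAKE_RECOMMENDATIONS), recommendations=[]
    )


@app.route("/recommend", methods=["POST"])
def recommend():
    # Recommendation route: read the selected movie from the submitted form.
    selected = request.form["selected_movie"]
    return render_template_string(
        PAGE,
        movie_list=list(FAKE_RECOMMENDATIONS),
        recommendations=FAKE_RECOMMENDATIONS.get(selected, []),
    )


# Exercise both routes with the test client.
client = app.test_client()
home_response = client.get("/")
recommend_response = client.post("/recommend", data={"selected_movie": "Movie A"})
print(home_response.status_code, recommend_response.status_code)
```

Rendering the same template from both routes keeps the form visible after a recommendation is shown.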


Conclusion

In this blog post, we have gone through the process of building an end-to-end machine learning project. We started by acquiring a dataset from Kaggle, then we preprocessed the data and trained a machine learning model. Finally, we deployed the model as a web application using Flask.


The code for the project can be found below.


Notebook Code:

import numpy as np

import pandas as pd

[3]

movie = pd.read_csv('tmdb_5000_movies.csv')

credits = pd.read_csv("tmdb_5000_credits.csv")

[4]

movie.head()

[5]

credits.head()

merging both datasets

[6]

movies = movie.merge(credits, on='title')

[7]

movies.head()
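One thing to watch when merging on `title`: titles are not guaranteed to be unique, and every row with a given title is paired with every matching row on the other side, which can add rows. The toy frames below (invented data) show the effect, along with one possible alternative that joins on ID columns instead. Whether duplicate titles actually matter for this dataset is something to check rather than assume.

```python
import pandas as pd

# Invented data: two different movies share the same title.
movie = pd.DataFrame({"id": [1, 2, 3], "title": ["Remake", "Remake", "Other"]})
credits = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Remake", "Remake", "Other"],
    "cast": ["cast 1", "cast 2", "cast 3"],
})

# Merging on title pairs every "Remake" row with every other "Remake" row.
by_title = movie.merge(credits, on="title")
print(len(by_title))  # 5 rows from 3 movies

# Joining on the ID columns keeps one row per movie.
by_id = movie.merge(
    credits.drop(columns="title"), left_on="id", right_on="movie_id"
)
print(len(by_id))  # 3 rows
```

Comparing `len()` before and after a merge is a quick way to spot this kind of row duplication.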

keeping important columns

[8]

movies.columns

Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',

      'original_title', 'overview', 'popularity', 'production_companies',

      'production_countries', 'release_date', 'revenue', 'runtime',

      'spoken_languages', 'status', 'tagline', 'title', 'vote_average',

      'vote_count', 'movie_id', 'cast', 'crew'],

     dtype='object')

[9]

movies = movies[['id','title','overview','keywords','genres','cast','crew']]

[10]

movies

checking null vals

[11]

movies.isnull().sum()

id         0

title      0

overview   3

keywords   0

genres     0

cast       0

crew       0

dtype: int64

[12]

movies.dropna(inplace=True)

working with overview

[13]

movies.iloc[0].overview

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.'

[14]

# it is a string, so convert it into a list of words

movies['overview'] = movies['overview'].apply(lambda x:x.split())

[15]

movies['overview'][0]

['In',

 'the',

 '22nd',

 'century,',

 'a',

 'paraplegic',

 'Marine',

 'is',

 'dispatched',

 'to',

 'the',

 'moon',

 'Pandora',

 'on',

 'a',

 'unique',

 'mission,',

 'but',

 'becomes',

 'torn',

 'between',

 'following',

 'orders',

 'and',

 'protecting',

 'an',

 'alien',

 'civilization.']

working with keywords

[16]

movies['keywords'][0]

'[{"id": 1463, "name": "culture clash"}, {"id": 2964, "name": "future"}, {"id": 3386, "name": "space war"}, {"id": 3388, "name": "space colony"}, {"id": 3679, "name": "society"}, {"id": 3801, "name": "space travel"}, {"id": 9685, "name": "futuristic"}, {"id": 9840, "name": "romance"}, {"id": 9882, "name": "space"}, {"id": 9951, "name": "alien"}, {"id": 10148, "name": "tribe"}, {"id": 10158, "name": "alien planet"}, {"id": 10987, "name": "cgi"}, {"id": 11399, "name": "marine"}, {"id": 13065, "name": "soldier"}, {"id": 14643, "name": "battle"}, {"id": 14720, "name": "love affair"}, {"id": 165431, "name": "anti war"}, {"id": 193554, "name": "power relations"}, {"id": 206690, "name": "mind and soul"}, {"id": 209714, "name": "3d"}]'

[17]

import ast  # used to parse the stringified list into Python objects

def keywords(obj):

    l = []

    for i in ast.literal_eval(obj):

        l.append(i['name'])

    return l

[18]

movies['keywords'] = movies['keywords'].apply(keywords)

[19]

movies['keywords']

0      [culture clash, future, space war, space colon...

1      [ocean, drug abuse, exotic island, east india ...

2      [spy, based on novel, secret agent, sequel, mi...

3      [dc comics, crime fighter, terrorist, secret i...

4      [based on novel, mars, medallion, space travel...

                             ...                       

4804   [united states–mexico barrier, legs, arms, pap...

4805                                                  []

4806   [date, love at first sight, narration, investi...

4807                                                  []

4808           [obsession, camcorder, crush, dream girl]

Name: keywords, Length: 4806, dtype: object

working with genres

[20]

movies['genres'][0]

'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

[21]

import ast  # used to parse the stringified list into Python objects

def genres(obj):

    l = []

    for i in ast.literal_eval(obj):

        l.append(i['name'])

    return l

[22]

movies['genres'] = movies['genres'].apply(genres)

[23]

movies['genres']

0      [Action, Adventure, Fantasy, Science Fiction]

1                       [Adventure, Fantasy, Action]

2                         [Action, Adventure, Crime]

3                   [Action, Crime, Drama, Thriller]

4               [Action, Adventure, Science Fiction]

                           ...                     

4804                       [Action, Crime, Thriller]

4805                               [Comedy, Romance]

4806              [Comedy, Drama, Romance, TV Movie]

4807                                              []

4808                                   [Documentary]

Name: genres, Length: 4806, dtype: object

working with cast

[24]

movies['cast'][0]

'[{"cast_id": 242, "character": "Jake Sully", "credit_id": "5602a8a7c3a3685532001c9a", "gender": 2, "id": 65731, "name": "Sam Worthington", "order": 0}, {"cast_id": 3, "character": "Neytiri", "credit_id": "52fe48009251416c750ac9cb", "gender": 1, "id": 8691, "name": "Zoe Saldana", "order": 1}, {"cast_id": 25, "character": "Dr. Grace Augustine", "credit_id": "52fe48009251416c750aca39", "gender": 1, "id": 10205, "name": "Sigourney Weaver", "order": 2}, {"cast_id": 4, "character": "Col. Quaritch", "credit_id": "52fe48009251416c750ac9cf", "gender": 2, "id": 32747, "name": "Stephen Lang",

import ast  # used to parse the stringified list into Python objects

def cast(obj):

    l = []

    # interested in the top three cast members only

    count = 0

    for i in ast.literal_eval(obj):

        if count != 3:

            l.append(i['name'])

            count+=1

        else:

            break

    return l

movies['cast'] = movies['cast'].apply(cast)

[26]

movies['cast']

0       [Sam Worthington, Zoe Saldana, Sigourney Weaver]

1          [Johnny Depp, Orlando Bloom, Keira Knightley]

2           [Daniel Craig, Christoph Waltz, Léa Seydoux]

3           [Christian Bale, Michael Caine, Gary Oldman]

4         [Taylor Kitsch, Lynn Collins, Samantha Morton]

                             ...                       

4804   [Carlos Gallardo, Jaime de Hoyos, Peter Marqua...

4805        [Edward Burns, Kerry Bishé, Marsha Dietlein]

4806          [Eric Mabius, Kristin Booth, Crystal Lowe]

4807           [Daniel Henney, Eliza Coupe, Bill Paxton]

4808   [Drew Barrymore, Brian Herzlinger, Corey Feldman]

Name: cast, Length: 4806, dtype: object

working with crew

[27]

movies['crew'][0]

'[{"credit_id": "52fe48009251416c750aca23", "department": "Editing", "gender": 0, "id": 1721, "job": "Editor", "name": "Stephen E. Rivkin"}, {"credit_id": "539c47ecc3a36810e3001f87", "department": "Art", "gender": 2, "id": 496, "job": "Production Design", "name": "Rick Carter"}, {"credit_id": "54491c89c3a3680fb4001cf7", "department": "Sound", "gender": 0, "id": 900, "job":

import ast  # used to parse the stringified list into Python objects

def crew(obj):

    l = []

    # interested only in the director

    for i in ast.literal_eval(obj):

        if i['job'] == 'Director':

            l.append(i['name'])

            break

    return l

[29]

movies['crew'] = movies['crew'].apply(crew)

[30]

movies['crew']

0          [James Cameron]

1         [Gore Verbinski]

2             [Sam Mendes]

3      [Christopher Nolan]

4         [Andrew Stanton]

              ...        

4804    [Robert Rodriguez]

4805        [Edward Burns]

4806         [Scott Smith]

4807         [Daniel Hsia]

4808    [Brian Herzlinger]

Name: crew, Length: 4806, dtype: object

concatenating overview, cast, crew and keywords into one tags column

[31]

movies['tags'] = movies['overview'] + movies['cast'] + movies['crew'] + movies['keywords']

[32]

movies =  movies[['id','title','tags']]

[33]

movies

removing spaces from tags

[34]

# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning

movies = movies.assign(tags=movies['tags'].apply(lambda x: [i.replace(" ", "") for i in x]))

[35]

movies['tags'][0]

['In',

 'the',

 '22nd',

 'century,',

 'a',

 'paraplegic',

 'Marine',

 'is',

 'antiwar',

 'powerrelations',

 'mindandsoul',

 '3d']

applying stemming

[36]

from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()

[37]

movies['tags']

0      [In, the, 22nd, century,, a, paraplegic, Marin...

1      [Captain, Barbossa,, long, believed, to, be, d...

2      [A, cryptic, message, from, Bond’s, past, send...

3      [Following, the, death, of, District, Attorney...

4      [John, Carter, is, a, war-weary,, former, mili...

                             ...                       

4804   [El, Mariachi, just, wants, to, play, his, gui...

4805   [A, newlywed, couple's, honeymoon, is, upended...

4806   ["Signed,, Sealed,, Delivered", introduces, a,...

4807   [When, ambitious, New, York, attorney, Sam, is...

4808   [Ever, since, the, second, grade, when, he, fi...

Name: tags, Length: 4806, dtype: object

[38]

def stemming(text):

    l = []

    for i in text:

        l.append(ps.stem(i))

    return " ".join(l)

[39]

# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning

movies = movies.assign(tags=movies['tags'].apply(stemming))

[40]

movies['tags'][10]

'superman return to discov hi 5-year absenc ha allow lex luthor to walk free, and that those he wa closest too felt abandon and have move on. luthor plot hi ultim reveng that could see million kill and chang the face of the planet forever, as well as rid himself of the man of steel. brandonrouth kevinspacey katebosworth bryansing savingtheworld dccomic invulner sequel superhero basedoncomicbook kryptonit superpow superhumanstrength lexluthor'
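Stemming reduces different forms of a word to a common stem so that the vectorizer later counts them as the same feature. The short sketch below runs NLTK's `PorterStemmer` on three words from the overview shown above. The helper name `stem_tokens` is made up for this example.

```python
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()


def stem_tokens(tokens):
    """Stem each token and join the results into one space-separated string."""
    return " ".join(ps.stem(token) for token in tokens)


# Words taken from the Superman Returns overview.
print(stem_tokens(["abandoned", "millions", "killed"]))  # abandon million kill
```

Stems are not always real words, which is why the output above contains tokens such as "hi" and "wa". That is fine here, because the stems are only used for counting matches.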

Vectorization code

[41]

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(max_features=500, stop_words='english')

[42]

vectors = vectorizer.fit_transform(movies['tags']).toarray()

[43]

vectors

array([[1, 0, 0, ..., 0, 0, 0],

      [0, 0, 0, ..., 0, 0, 0],

      [0, 0, 0, ..., 0, 0, 0],

      ...,

      [0, 0, 0, ..., 0, 0, 0],

      [0, 0, 0, ..., 1, 0, 0],

      [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

 ['3d',

 'accident',

 'act',

 'action',

 'adventur',

 'affair',

 'aftercreditssting',

 'age',

 'agent',

 'alcohol',

 'alien',

 'alway',

 'young',

 'zombi']
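To see what `max_features` and `stop_words` do, here is a small sketch on three invented tag strings. With `max_features=3`, only the three most frequent terms across all documents are kept as columns.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three tiny "tags" strings (invented for illustration).
docs = [
    "space alien battle samworthington",
    "space travel alien battle",
    "romance drama the",
]

# stop_words="english" drops common words such as "the";
# max_features keeps only the most frequent terms across the corpus.
vectorizer = CountVectorizer(max_features=3, stop_words="english")
vectors = vectorizer.fit_transform(docs).toarray()

print(sorted(vectorizer.vocabulary_))  # the terms that were kept
print(vectors.shape)                   # (3 documents, 3 features)
print(vectors)
```

The third document ends up as an all-zero row because none of its words made the cut. A small `max_features` value therefore trades detail for a smaller matrix.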

Calculating similarities

[45]

from sklearn.metrics.pairwise import cosine_similarity

[46]

similarity = cosine_similarity(vectors)

[47]

movies[movies['title']=="Avatar"]

[48]

sorted(list(enumerate(similarity[0])),reverse=True,key=lambda x:x[1])

[(0, 0.9999999999999998),

 (507, 0.50709255283711),

 (151, 0.46188021535170054),

 (1216, 0.44262666813799045),

 (539, 0.38729833462074165),

 (1321, 0.36514837167011066),

 (1920, 0.3544587784792833),

 (305, 0.3464101615137754),

 (2786, 0.3450327796711771),

 (1774, 0.3442651863295481),

 

 ...]

movies.iloc[100].title

'The Curious Case of Benjamin Button'

[50]

def Recommendation_system(movie):

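    # similarity is indexed by row position, while the DataFrame index has gaps after dropna(),

    # so convert the index label into a row position first

    movie_index = movies.index.get_loc(movies[movies['title'] == movie].index[0])

    distances = sorted(list(enumerate(similarity[movie_index])),reverse=True,key=lambda x:x[1])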

    

    for i in distances[1:20]:

        print(movies.iloc[i[0]].title)

[51]

Recommendation_system('Avatar')

Independence Day

Beowulf

Aliens vs Predator: Requiem

Titan A.E.

The Thing

Lifeforce

Treasure Planet

Attack the Block

Martian Child

Edge of Tomorrow

Predators

Meet Dave

Capricorn One

Tears of the Sun

Under the Skin

Independence Daysaster

Lockout

Aliens in the Attic

E.T. the Extra-Terrestrial
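The lookup logic can be checked on a toy example. The sketch below uses invented titles and count vectors, and gives the DataFrame an index with a gap to mimic the state after `dropna()`. The function name `recommend` and its parameter `k` are assumptions for this illustration, not the notebook's own code.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Invented titles and count vectors; the index deliberately skips label 1.
movies = pd.DataFrame({"title": ["A", "B", "C", "D"]}, index=[0, 2, 3, 4])
vectors = np.array([
    [2, 1, 0, 0],
    [0, 0, 1, 1],
    [2, 1, 0, 1],
    [0, 1, 0, 0],
])
similarity = cosine_similarity(vectors)


def recommend(title, k=2):
    """Return the k titles most similar to `title`, excluding the title itself."""
    # Row position (not index label) of the requested movie.
    pos = np.flatnonzero((movies["title"] == title).to_numpy())[0]
    # Sort positions by descending similarity and drop the movie itself.
    order = [i for i in np.argsort(-similarity[pos]) if i != pos]
    return movies["title"].iloc[order[:k]].tolist()


print(recommend("A"))
```

Working with row positions throughout (`flatnonzero` and `iloc`) keeps the DataFrame and the similarity matrix aligned even when index labels are not contiguous.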

Pickling files

[53]

import pickle

pickle.dump(movies, open('model.pkl','wb'))

pickle.dump(similarity, open('similarity.pkl','wb'))











Flask Code:



from flask import Flask, request, render_template

import requests

import pandas as pd

import pickle



app = Flask(__name__)



# loading models

# movies = pd.read_csv('movies.csv')

movies = pickle.load(open('model.pkl', 'rb'))

similarity = pickle.load(open('similarity.pkl', 'rb'))



# function to fetch movie poster

def fetch_poster(movie_id):

    url = "https://api.themoviedb.org/3/movie/{}?api_key=390e76286265f7638bb6b19d86474639&language=en-US".format(movie_id)

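    data = requests.get(url, timeout=10)

    data = data.json()

    # poster_path can be missing or null for some movies; return an empty string in that case

    poster_path = data.get('poster_path')

    if not poster_path:

        return ""

    return "https://image.tmdb.org/t/p/w500/" + poster_path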



# function to get recommended movies

def get_recommendations(movie):

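    # get the row position of the selected movie (the DataFrame index has gaps after dropna(),

    # while the similarity matrix is indexed by position)

    idx = movies.index.get_loc(movies[movies['title'] == movie].index[0])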

    # get pairwise similarity scores of all movies with the selected movie

    sim_scores = list(enumerate(similarity[idx]))

    # sort the movies based on similarity scores in descending order

    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # get top 20 similar movies (excluding the selected movie)

    sim_scores = sim_scores[1:21]

    # get titles and posters of the recommended movies

    movie_indices = [i[0] for i in sim_scores]

    movie_titles = movies['title'].iloc[movie_indices].tolist()

    movie_posters = [fetch_poster(movies['id'].iloc[i]) for i in movie_indices]

    return movie_titles, movie_posters



# home page

@app.route('/')

def home():

    movie_list = movies['title'].tolist()

    return render_template('index.html', movie_list=movie_list)



# recommendation page

@app.route('/recommend', methods=['POST'])

def recommend():

    movie_title = request.form['selected_movie']

    recommended_movie_titles, recommended_movie_posters = get_recommendations(movie_title)

    return render_template('index.html', movie_list=movies['title'].tolist(),

                           recommended_movie_titles=recommended_movie_titles,

                           recommended_movie_posters=recommended_movie_posters)



if __name__ == '__main__':

    app.run(debug=True)






HTML Code:



<!doctype html>

<html>

    <head>

        <title>Movie Recommender</title>

        <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">

    </head>

    <body style="background:#D9F799">

        <div style="color:white; margin-top:15px; border-radius:20px;" class="container my-3 mt-3 bg-dark">

            <h1 class="text-center">Movie Recommendation System</h1>

            <form action="/recommend" method="POST">

                <div class="form-group">

                    <label for="movie-select">Select a movie:</label>

                    <select class="form-control" id="movie-select" name="selected_movie">

                        {% for movie in movie_list %}

                            <option value="{{ movie }}">{{ movie }}</option>

                        {% endfor %}

                    </select>

                </div>

                <button type="submit" class="btn btn-primary">Get Recommendations</button>

            </form>



       {% if recommended_movie_titles %}

    <h2>Recommended Movies:</h2>

    <div class="row">

        {% for i in range(recommended_movie_titles|length) %}

            <div class="col-md-3">

                <div class="card mb-3">

                    <img src="{{ recommended_movie_posters[i] }}" class="card-img-top" alt="...">

                    <div class="card-body">

                        <h5 class="card-title">{{ recommended_movie_titles[i] }}</h5>

                    </div>

                </div>

            </div>

        {% endfor %}

    </div>

{% endif %}



        </div>



        <!-- Optional JavaScript -->

        <!-- jQuery first, then Popper.js, then Bootstrap JS -->

        <script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script>

        <script src="https://cdn.jsdelivr.net/npm/@popperjs/[email protected]/dist/umd/popper.min.js"></script>

        <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js" integrity="sha384-JjSmVgyd0p3pXB1rRibZUAYoIIy6OrQ6VrjIEaFf/nJGzIxFDsf4x0xIM+B07jRM" crossorigin="anonymous"></script>

    </body>

</html>




