End to End Movie Recommendation System with Flask app

AI With Noor

AI Engineer | Genai | RAG | Expertise in AI,ML,NLP,DL | YouTuber | Data Engineer - Python | Computer Vision | AI Tutor

Published: April 25, 2023

Introduction

In this blog post, we will go through the process of building an end-to-end machine learning project. We will start by acquiring a dataset from Kaggle, then we will preprocess the data and train a machine learning model. Finally, we will deploy the model as a web application using Flask.

Getting Data from Kaggle

Kaggle is a popular platform that hosts various datasets for machine learning. To get data from Kaggle, we first need to create an account on Kaggle and join a competition or find a dataset of interest.

Once we have found a dataset, we can download it directly from Kaggle. However, some datasets require us to accept the competition rules or a license agreement first. After downloading the dataset, we can extract the files and load the data into our project.
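As a minimal sketch of the extraction step, assuming the dataset arrives as a zip archive: the archive name `tmdb-movie-metadata.zip`, the `data` folder, and the helper name `extract_csvs` are placeholders chosen for illustration, not names from the original project. Only Python's standard `zipfile` module is used.

```python
import zipfile
from pathlib import Path


def extract_csvs(archive_path, target_dir):
    """Extract a downloaded archive and return the CSV file names it contained."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # List only the CSV files so we know what can be loaded with pandas next.
    return sorted(p.name for p in target.glob("*.csv"))


if __name__ == "__main__":
    # Hypothetical archive name; use whatever file Kaggle gave you.
    archive = Path("tmdb-movie-metadata.zip")
    if archive.exists():
        print(extract_csvs(archive, "data"))
```

The returned file names can then be passed to `pd.read_csv`, as the notebook code further down does.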

Preprocessing Data

Before we can train a machine learning model, we need to preprocess the data. Preprocessing involves tasks such as cleaning the data, handling missing values, scaling the data, and encoding categorical features.

In our project, we will use pandas for data preprocessing. Pandas is a powerful library that provides data structures and functions for data analysis. We will load the data into a pandas dataframe and perform various preprocessing tasks on the dataframe.

Training Machine Learning Model

Once we have preprocessed the data, we can train a machine learning model. In our project, we will use the scikit-learn library for machine learning. Scikit-learn is a popular library that provides various machine learning algorithms and tools for model selection, evaluation, and preprocessing.

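We will use cosine similarity for our project. It measures the cosine of the angle between two vectors, so two movies whose tag vectors point in a similar direction get a score close to 1, while movies with no words in common get a score of 0.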
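To make the measure concrete, here is a small pure-Python sketch of cosine similarity between two vectors. It is an illustration only: the project itself uses scikit-learn's `cosine_similarity`, and the function name `cosine_similarity_pair` and the toy vectors are made up for this example.

```python
import math


def cosine_similarity_pair(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention used here for all-zero vectors
    return dot / (norm_a * norm_b)


# Toy word-count vectors (made up for illustration).
movie_a = [1, 1, 0, 2]
movie_b = [1, 0, 0, 2]
movie_c = [0, 0, 3, 0]

print(cosine_similarity_pair(movie_a, movie_b))  # close to 1: similar tags
print(cosine_similarity_pair(movie_a, movie_c))  # 0.0: no words in common
```

Because word counts are never negative, the scores in this project fall between 0 and 1.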

After computing the similarity matrix, we will serialize it, together with the processed movies DataFrame, using the pickle library. Pickle is a standard-library module that saves Python objects in a binary format. We will save both objects as files so that we can load them later in our Flask application.
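The save-and-load round trip can be sketched as follows. This is a minimal example with a made-up stand-in for the similarity matrix; the helper names `save_object` and `load_object` and the file name `toy_similarity.pkl` are assumptions for illustration. Using `with` blocks ensures the files are closed properly.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Serialize obj to path; the with-block closes the file even on errors."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Load a previously pickled object. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for the real similarity matrix (values are made up).
toy_similarity = [[1.0, 0.5], [0.5, 1.0]]

path = os.path.join(tempfile.gettempdir(), "toy_similarity.pkl")
save_object(toy_similarity, path)
restored = load_object(path)
print(restored == toy_similarity)  # True
```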

Building Flask Application

Flask is a popular web framework for Python that allows us to build web applications quickly and easily. We will use Flask to build a web application that takes a movie title as input and recommends similar movies using the saved similarity matrix.

Our Flask application will have two routes: a home route and a recommendation route. The home route will display a simple HTML page with a form for selecting a movie from a dropdown list. The recommendation route will take the selected title, look up the most similar movies, fetch their posters from the TMDB API, and display the results on the same page.

Conclusion

In this blog post, we have gone through the process of building an end-to-end machine learning project. We started by acquiring a dataset from Kaggle, then we preprocessed the data and trained a machine learning model. Finally, we deployed the model as a web application using Flask.

The code for the project can be found below.

Notebook Code:

import numpy as np

import pandas as pd

[3]

movie = pd.read_csv('tmdb_5000_movies.csv')

credits = pd.read_csv("tmdb_5000_credits.csv")

[4]

movie.head()

[5]

credits.head()

merging both datasets

[6]

movies = movie.merge(credits, on='title')

[7]

movies.head()

keeping important columns

[8]

movies.columns

Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',

'original_title', 'overview', 'popularity', 'production_companies',

'production_countries', 'release_date', 'revenue', 'runtime',

'spoken_languages', 'status', 'tagline', 'title', 'vote_average',

'vote_count', 'movie_id', 'cast', 'crew'],

dtype='object')

[9]

movies = movies[['id','title','overview','keywords','genres','cast','crew']]

[10]

movies

checking null values

[11]

movies.isnull().sum()

id 0

title 0

overview 3

keywords 0

genres 0

cast 0

crew 0

dtype: int64

[12]

movies.dropna(inplace=True)

working with overview

[13]

movies.iloc[0].overview

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.'

[14]

# it is a string, so convert it into a list of words

movies['overview'] = movies['overview'].apply(lambda x:x.split())

[15]

movies['overview'][0]

['In',

'the',

'22nd',

'century,',

'a',

'paraplegic',

'Marine',

'is',

'dispatched',

'to',

'the',

'moon',

'Pandora',

'on',

'a',

'unique',

'mission,',

'but',

'becomes',

'torn',

'between',

'following',

'orders',

'and',

'protecting',

'an',

'alien',

'civilization.']

working with keywords

[16]

movies['keywords'][0]

'[{"id": 1463, "name": "culture clash"}, {"id": 2964, "name": "future"}, {"id": 3386, "name": "space war"}, {"id": 3388, "name": "space colony"}, {"id": 3679, "name": "society"}, {"id": 3801, "name": "space travel"}, {"id": 9685, "name": "futuristic"}, {"id": 9840, "name": "romance"}, {"id": 9882, "name": "space"}, {"id": 9951, "name": "alien"}, {"id": 10148, "name": "tribe"}, {"id": 10158, "name": "alien planet"}, {"id": 10987, "name": "cgi"}, {"id": 11399, "name": "marine"}, {"id": 13065, "name": "soldier"}, {"id": 14643, "name": "battle"}, {"id": 14720, "name": "love affair"}, {"id": 165431, "name": "anti war"}, {"id": 193554, "name": "power relations"}, {"id": 206690, "name": "mind and soul"}, {"id": 209714, "name": "3d"}]'

[17]

import ast  # used to parse the stringified list of dicts into Python objects

def keywords(obj):
    # collect the 'name' field of every keyword entry
    l = []
    for i in ast.literal_eval(obj):
        l.append(i['name'])
    return l
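The helper above relies on `ast.literal_eval` to turn the stored string into a list of dictionaries. The sketch below shows that step in isolation on a shortened sample from the keywords column. The function name `extract_names` and its `limit` parameter are hypothetical, added only to show how one helper could cover both the "all names" and "first few names" cases.

```python
import ast
import json


def extract_names(raw, limit=None):
    """Parse a stringified list of dicts and return the 'name' values."""
    items = ast.literal_eval(raw)  # str -> list of dicts, without running code
    names = [item["name"] for item in items]
    return names if limit is None else names[:limit]


sample = '[{"id": 1463, "name": "culture clash"}, {"id": 2964, "name": "future"}, {"id": 3386, "name": "space war"}]'

print(extract_names(sample))           # ['culture clash', 'future', 'space war']
print(extract_names(sample, limit=2))  # ['culture clash', 'future']

# The sample is also valid JSON, so json.loads gives the same structure.
print(json.loads(sample) == ast.literal_eval(sample))  # True
```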

[18]

movies['keywords'] = movies['keywords'].apply(keywords)

[19]

movies['keywords']

0 [culture clash, future, space war, space colon...

1 [ocean, drug abuse, exotic island, east india ...

2 [spy, based on novel, secret agent, sequel, mi...

3 [dc comics, crime fighter, terrorist, secret i...

4 [based on novel, mars, medallion, space travel...

...

4804 [united states–mexico barrier, legs, arms, pap...

4805 []

4806 [date, love at first sight, narration, investi...

4807 []

4808 [obsession, camcorder, crush, dream girl]

Name: keywords, Length: 4806, dtype: object

working with genres

[20]

movies['genres'][0]

'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

[21]

import ast  # used to parse the stringified list of dicts into Python objects

def genres(obj):
    # collect the 'name' field of every genre entry
    l = []
    for i in ast.literal_eval(obj):
        l.append(i['name'])
    return l

[22]

movies['genres'] = movies['genres'].apply(genres)

[23]

movies['genres']

0 [Action, Adventure, Fantasy, Science Fiction]

1 [Adventure, Fantasy, Action]

2 [Action, Adventure, Crime]

3 [Action, Crime, Drama, Thriller]

4 [Action, Adventure, Science Fiction]

...

4804 [Action, Crime, Thriller]

4805 [Comedy, Romance]

4806 [Comedy, Drama, Romance, TV Movie]

4807 []

4808 [Documentary]

Name: genres, Length: 4806, dtype: object

working with cast

[24]

movies['cast'][0]

'[{"cast_id": 242, "character": "Jake Sully", "credit_id": "5602a8a7c3a3685532001c9a", "gender": 2, "id": 65731, "name": "Sam Worthington", "order": 0}, {"cast_id": 3, "character": "Neytiri", "credit_id": "52fe48009251416c750ac9cb", "gender": 1, "id": 8691, "name": "Zoe Saldana", "order": 1}, {"cast_id": 25, "character": "Dr. Grace Augustine", "credit_id": "52fe48009251416c750aca39", "gender": 1, "id": 10205, "name": "Sigourney Weaver", "order": 2}, {"cast_id": 4, "character": "Col. Quaritch", "credit_id": "52fe48009251416c750ac9cf", "gender": 2, "id": 32747, "name": "Stephen Lang",

import ast  # used to parse the stringified list of dicts into Python objects

def cast(obj):
    l = []
    # interested in the top three cast members only
    count = 0
    for i in ast.literal_eval(obj):
        if count != 3:
            l.append(i['name'])
            count += 1
        else:
            break
    return l

movies['cast'] = movies['cast'].apply(cast)

[26]

movies['cast']

0 [Sam Worthington, Zoe Saldana, Sigourney Weaver]

1 [Johnny Depp, Orlando Bloom, Keira Knightley]

2 [Daniel Craig, Christoph Waltz, Léa Seydoux]

3 [Christian Bale, Michael Caine, Gary Oldman]

4 [Taylor Kitsch, Lynn Collins, Samantha Morton]

...

4804 [Carlos Gallardo, Jaime de Hoyos, Peter Marqua...

4805 [Edward Burns, Kerry Bishé, Marsha Dietlein]

4806 [Eric Mabius, Kristin Booth, Crystal Lowe]

4807 [Daniel Henney, Eliza Coupe, Bill Paxton]

4808 [Drew Barrymore, Brian Herzlinger, Corey Feldman]

Name: cast, Length: 4806, dtype: object

working with crew

[27]

movies['crew'][0]

'[{"credit_id": "52fe48009251416c750aca23", "department": "Editing", "gender": 0, "id": 1721, "job": "Editor", "name": "Stephen E. Rivkin"}, {"credit_id": "539c47ecc3a36810e3001f87", "department": "Art", "gender": 2, "id": 496, "job": "Production Design", "name": "Rick Carter"}, {"credit_id": "54491c89c3a3680fb4001cf7", "department": "Sound", "gender": 0, "id": 900, "job":

import ast  # used to parse the stringified list of dicts into Python objects

def crew(obj):
    l = []
    # interested in the director only
    for i in ast.literal_eval(obj):
        if i['job'] == 'Director':
            l.append(i['name'])
            break
    return l

[29]

movies['crew'] = movies['crew'].apply(crew)

[30]

movies['crew']

0 [James Cameron]

1 [Gore Verbinski]

2 [Sam Mendes]

3 [Christopher Nolan]

4 [Andrew Stanton]

...

4804 [Robert Rodriguez]

4805 [Edward Burns]

4806 [Scott Smith]

4807 [Daniel Hsia]

4808 [Brian Herzlinger]

Name: crew, Length: 4806, dtype: object

concatenating overview, cast, crew and keywords into one tags column

[31]

movies['tags'] = movies['overview'] + movies['cast'] + movies['crew'] + movies['keywords']

[32]

movies = movies[['id','title','tags']]

[33]

movies

removing spaces from tags

[34]

movies['tags'] = movies['tags'].apply(lambda x: [i.replace(" ", "") for i in x])

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

pandas shows this warning because movies was created by selecting columns from another DataFrame; creating it with movies[['id','title','tags']].copy() avoids it.

[35]

movies['tags'][0]

['In',

'the',

'22nd',

'century,',

'a',

'paraplegic',

'Marine',

'is',

'antiwar',

'powerrelations',

'mindandsoul',

'3d']

applying stemming

[36]

from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()

[37]

movies['tags']

0 [In, the, 22nd, century,, a, paraplegic, Marin...

1 [Captain, Barbossa,, long, believed, to, be, d...

2 [A, cryptic, message, from, Bond’s, past, send...

3 [Following, the, death, of, District, Attorney...

4 [John, Carter, is, a, war-weary,, former, mili...

...

4804 [El, Mariachi, just, wants, to, play, his, gui...

4805 [A, newlywed, couple's, honeymoon, is, upended...

4806 ["Signed,, Sealed,, Delivered", introduces, a,...

4807 [When, ambitious, New, York, attorney, Sam, is...

4808 [Ever, since, the, second, grade, when, he, fi...

Name: tags, Length: 4806, dtype: object

[38]

def stemming(text):
    # stem every token, then join the tokens back into one string
    l = []
    for i in text:
        l.append(ps.stem(i))
    return " ".join(l)

[39]

movies['tags'] = movies['tags'].apply(stemming)

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead

[40]

movies['tags'][10]

'superman return to discov hi 5-year absenc ha allow lex luthor to walk free, and that those he wa closest too felt abandon and have move on. luthor plot hi ultim reveng that could see million kill and chang the face of the planet forever, as well as rid himself of the man of steel. brandonrouth kevinspacey katebosworth bryansing savingtheworld dccomic invulner sequel superhero basedoncomicbook kryptonit superpow superhumanstrength lexluthor'

Vectorization code

[41]

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(max_features=500, stop_words='english')

[42]

vectors = vectorizer.fit_transform(movies['tags']).toarray()

[43]

vectors

array([[1, 0, 0, ..., 0, 0, 0],

[0, 0, 0, ..., 0, 0, 0],

...,

[0, 0, 0, ..., 0, 0, 0],

[0, 0, 0, ..., 1, 0, 0],

[0, 0, 0, ..., 0, 0, 0]], dtype=int64)

feature names learned by the vectorizer (truncated output)

['3d',

'accident',

'act',

'action',

'adventur',

'affair',

'aftercreditssting',

'age',

'agent',

'alcohol',

'alien',

'alway',

'young',

'zombi']
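To make the vectorization step less opaque, here is a pure-Python sketch of the bag-of-words idea behind `CountVectorizer`. It is not scikit-learn's implementation: tokenization is a plain whitespace split, no stop words are removed, and the helper names `build_vocabulary` and `to_count_vector` and the toy tags are made up for illustration.

```python
from collections import Counter


def build_vocabulary(documents, max_features):
    """Keep the max_features most frequent words, in alphabetical order."""
    counts = Counter(word for doc in documents for word in doc.split())
    # Sort by descending frequency, then alphabetically, to make ties deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top_words = [word for word, _ in ranked[:max_features]]
    return sorted(top_words)


def to_count_vector(document, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(document.split())
    return [counts[word] for word in vocabulary]


tags = [
    "alien space marin alien",
    "space war futur",
    "pirat ocean ship",
]
vocabulary = build_vocabulary(tags, max_features=4)
vectors = [to_count_vector(doc, vocabulary) for doc in tags]

print(vocabulary)
print(vectors)
```

Each row of `vectors` plays the role of one row in the array above: one movie, one count per vocabulary word. A movie whose words all fall outside the kept vocabulary ends up as an all-zero row.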

Calculating distances

[45]

from sklearn.metrics.pairwise import cosine_similarity

[46]

similarity = cosine_similarity(vectors)

[47]

movies[movies['title']=="Avatar"]

[48]

sorted(list(enumerate(similarity[0])),reverse=True,key=lambda x:x[1])

[(0, 0.9999999999999998),

(507, 0.50709255283711),

(151, 0.46188021535170054),

(1216, 0.44262666813799045),

(539, 0.38729833462074165),

(1321, 0.36514837167011066),

(1920, 0.3544587784792833),

(305, 0.3464101615137754),

(2786, 0.3450327796711771),

(1774, 0.3442651863295481),

...]

movies.iloc[100].title

'The Curious Case of Benjamin Button'

[50]

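def Recommendation_system(movie):
    # row position of the movie; index labels have gaps after dropna, so .index[0] can be off
    movie_index = (movies['title'] == movie).to_numpy().nonzero()[0][0]
    # use the similarity row of the selected movie, not always row 0
    distances = sorted(list(enumerate(similarity[movie_index])),reverse=True,key=lambda x:x[1])
    for i in distances[1:20]:
        print(movies.iloc[i[0]].title)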

[51]

Recommendation_system('Avatar')

Independence Day

Beowulf

Aliens vs Predator: Requiem

Titan A.E.

The Thing

Lifeforce

Treasure Planet

Attack the Block

Martian Child

Edge of Tomorrow

Predators

Meet Dave

Capricorn One

Tears of the Sun

Under the Skin

Independence Daysaster

Lockout

Aliens in the Attic

E.T. the Extra-Terrestrial
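Sorting every score works fine for a few thousand movies. As an alternative sketch, the top matches can be picked with `heapq.nlargest`, excluding the selected movie by its index instead of assuming it is always the first entry after sorting. The function name `top_similar` and the scores are made up for illustration.

```python
import heapq


def top_similar(similarity_row, query_index, k):
    """Return (index, score) pairs for the k highest scores, excluding the query."""
    candidates = (
        (idx, score)
        for idx, score in enumerate(similarity_row)
        if idx != query_index
    )
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


# Made-up similarity scores between movie 0 and five movies (including itself).
row = [1.0, 0.2, 0.5, 0.1, 0.4]
print(top_similar(row, query_index=0, k=3))  # [(2, 0.5), (4, 0.4), (1, 0.2)]
```

Filtering by index keeps the result correct even when another movie has exactly the same tags and therefore the same score as the query.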

Pickling files

[53]

import pickle

pickle.dump(movies, open('model.pkl','wb'))

pickle.dump(similarity, open('similarity.pkl','wb'))

Flask Code:

from flask import Flask, request, render_template

import requests

import pandas as pd

import pickle



app = Flask(__name__)



# loading models

# movies = pd.read_csv('movies.csv')

movies = pickle.load(open('model.pkl', 'rb'))

similarity = pickle.load(open('similarity.pkl', 'rb'))



# function to fetch movie poster

def fetch_poster(movie_id):

    url = "https://api.themoviedb.org/3/movie/{}?api_key=390e76286265f7638bb6b19d86474639&language=en-US".format(movie_id)

    data = requests.get(url)

    data = data.json()

    full_path = "https://image.tmdb.org/t/p/w500/" + data['poster_path']

    return full_path



# function to get recommended movies

def get_recommendations(movie):

    # get the row position of the selected movie (index labels have gaps after dropna, so .index[0] can be off)

    idx = (movies['title'] == movie).to_numpy().nonzero()[0][0]

    # get pairwise similarity scores of all movies with the selected movie

    sim_scores = list(enumerate(similarity[idx]))

    # sort the movies based on similarity scores in descending order

    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # get top 20 similar movies (excluding the selected movie)

    sim_scores = sim_scores[1:21]

    # get titles and posters of the recommended movies

    movie_indices = [i[0] for i in sim_scores]

    movie_titles = movies['title'].iloc[movie_indices].tolist()

    movie_posters = [fetch_poster(movies['id'].iloc[i]) for i in movie_indices]

    return movie_titles, movie_posters



# home page

@app.route('/')

def home():

    movie_list = movies['title'].tolist()

    return render_template('index.html', movie_list=movie_list)



# recommendation page

@app.route('/recommend', methods=['POST'])

def recommend():

    movie_title = request.form['selected_movie']

    recommended_movie_titles, recommended_movie_posters = get_recommendations(movie_title)

    return render_template('index.html', movie_list=movies['title'].tolist(),

                           recommended_movie_titles=recommended_movie_titles,

                           recommended_movie_posters=recommended_movie_posters)



if __name__ == '__main__':

    app.run(debug=True)
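The `fetch_poster` helper above concatenates `data['poster_path']` directly, which fails if a response has no poster. Assuming that field can be missing or null for some movies, a more defensive way to build the URL could look like the sketch below. The name `build_poster_url` and the placeholder image path are hypothetical and not part of the original app.

```python
POSTER_BASE_URL = "https://image.tmdb.org/t/p/w500/"
# Hypothetical fallback image path; replace with a real placeholder in your app.
PLACEHOLDER_URL = "/static/no_poster.png"


def build_poster_url(movie_data):
    """Build a poster URL from an API response dict, with a fallback."""
    poster_path = movie_data.get("poster_path")
    if not poster_path:
        return PLACEHOLDER_URL
    # Avoid a double slash when the path already starts with "/".
    return POSTER_BASE_URL + poster_path.lstrip("/")


print(build_poster_url({"poster_path": "/abc123.jpg"}))
print(build_poster_url({"poster_path": None}))
print(build_poster_url({}))
```

Inside `fetch_poster`, the last two lines could then be replaced by `return build_poster_url(data)`.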

HTML Code:

<!doctype html>

<html>

    <head>

        <title>Movie Recommender</title>

        <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">

    </head>

    <body style="background:#D9F799">

        <div style="color:white; margin-top:15px; border-radius:20px;" class="container my-3 mt-3 bg-dark">

            <h1 class="text-center">Movie Recommendation System</h1>

            <form action="/recommend" method="POST">

                <div class="form-group">

                    <label for="movie-select">Select a movie:</label>

                    <select class="form-control" id="movie-select" name="selected_movie">

                        {% for movie in movie_list %}

                            <option value="{{ movie }}">{{ movie }}</option>

                        {% endfor %}

                    </select>

                </div>

                <button type="submit" class="btn btn-primary">Get Recommendations</button>

            </form>



       {% if recommended_movie_titles %}

    <h2>Recommended Movies:</h2>

    <div class="row">

        {% for i in range(recommended_movie_titles|length) %}

            <div class="col-md-3">

                <div class="card mb-3">

                    <img src="{{ recommended_movie_posters[i] }}" class="card-img-top" alt="...">

                    <div class="card-body">

                        <h5 class="card-title">{{ recommended_movie_titles[i] }}</h5>

                    </div>

                </div>

            </div>

        {% endfor %}

    </div>

{% endif %}



        </div>



        <!-- Optional JavaScript -->

        <!-- jQuery first, then Popper.js, then Bootstrap JS -->

        <script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script>

        <script src="https://cdn.jsdelivr.net/npm/@popperjs/[email protected]/dist/umd/popper.min.js"></script>

        <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js" integrity="sha384-JjSmVgyd0p3pXB1rRibZUAYoIIy6OrQ6VrjIEaFf/nJGzIxFDsf4x0xIM+B07jRM" crossorigin="anonymous"></script>

    </body>

</html>
