The Battle of Neighbourhoods: London's Crime Rate Analysis and Clustering of the Safest Neighbourhoods of London
Vincent Pereira
MBA (Financial & Computer Management) | Interested in Data Analytics, Data Science, Machine Learning and Deep Learning
Introduction
London is one of the most multicultural cities in the world. It is a melting pot of cultures, where one can meet people from all parts of the world and taste the best of world cuisine. It is a major centre for banking and finance, insurance, world trade, media, advertising, tourism, theatre, fashion, the arts and more. Fusing gritty, historic pomp with shimmering modernity, world-class culture and fashion-forward shopping, the UK's capital has it all, and there is something for everyone. The vibrancy of the city extends across all 32 of its boroughs, each of which is home to a plethora of unique neighbourhoods.
Business Problem
The decision to move to a new city, or to a new country altogether, is a harrowing one. But after having decided to move to London, the next challenge is deciding where in London to live. If one looks at a map of London, one finds a haphazard cluster of neighbourhoods and villages, each with its own distinct features and identity. Some of London's best neighbourhoods sit on the typical tourist trail, while others are constantly evolving, taking turns to emerge as the new cool hotspot. The following questions then arise:
And at the top of all these doubts sit the most intriguing questions anyone would face.
All these questions and more plague our minds, and then the quest to find the answers begins.
Objective of the Capstone Project
The objective of this assignment is to give an insight into what some of the safest London neighbourhoods can offer their residents and tourists.
To help uncover the best that London has to offer, this project aims to do the following:
Interested Parties
The objective of this project is to identify and recommend the best and safest neighbourhoods in London to anyone who wants to visit or relocate to the city. The interested parties could include:
Description of Data
1. MPS Ward Level Crime Data for London:
2. List of London Boroughs:
3. London Postcodes:
4. ArcGIS API Data:
- Latitude: Latitude of the Postcode
- Longitude: Longitude of the Postcode
5. Foursquare API Data:
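As a quick first look (a sketch, not part of the original notebook, assuming the two CSV files named in the Methodology below sit in the working directory), the raw datasets can be previewed before any cleaning:
import pandas as pd

# Preview the raw CSV sources used later in the Methodology
crime_raw = pd.read_csv("MPS Ward Level Crime (most recent 24 months).csv")
postcodes_raw = pd.read_csv("London Postcodes.csv", low_memory = False)
print(crime_raw.shape, postcodes_raw.shape)   # number of rows and columns in each file
print(crime_raw.head(3))                      # first few rows of the ward-level crime counts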
Methodology
A] Importing Libraries:
The libraries used in this project are:
# Data handling and HTTP requests
import pandas as pd
import numpy as np
import json
from pandas.io.json import json_normalize
import requests
# Plotting and charting
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
import plotly.express as px
import plotly.graph_objects as go
# Maps and geocoding
import folium
import geocoder
from geopy.geocoders import Nominatim
from arcgis.geocoding import geocode
from arcgis.gis import GIS
# Machine learning (the Intel extension patch_sklearn() speeds up scikit-learn)
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearnex import patch_sklearn
patch_sklearn()
B] Extracting, Scraping, Exploring, Cleaning and Processing the Datasets:
After importing all the required libraries, we extract the data from the different sources and clean it so that it is ready for processing and analysis.
Dataset 1: Metropolitan Police Service Ward Level Crime Data for London
crime_df = pd.read_csv("MPS Ward Level Crime (most recent 24 months).csv")
columns = ["Crime Head", "Crime Sub-Head", "Ward", "Ward Code", "Borough", 201907, 201908, 201909, 201910, 201911, 201912, 202001, 202002, 202003, 202004, 202005, 202006, 202007, 202008, 202009, 202010, 202011, 202012, 202101, 202102, 202103, 202104, 202105, 202106]
crime_df.columns = columns
crime_df = crime_df.reindex(["Ward Code", "Ward", "Borough", "Crime Head", "Crime Sub-Head", 201907, 201908, 201909, 201910, 201911, 201912, 202001, 202002, 202003, 202004, 202005, 202006, 202007, 202008, 202009, 202010, 202011, 202012, 202101, 202102, 202103, 202104, 202105, 202106], axis = 1)
crime_df["Total"] = crime_df.sum(numeric_only = True, axis = 1)
# Two boroughs (Harrow and Sutton) each have a ward named "Belmont"; rename them so the wards stay distinct
crime_df.loc[((crime_df["Ward"] == "Belmont") & (crime_df["Borough"] == "Harrow")), "Ward"] = "Belmont Harrow"
crime_df.loc[((crime_df["Ward"] == "Belmont") & (crime_df["Borough"] == "Sutton")), "Ward"] = "Belmont Sutton"
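As a quick sanity check (a sketch, not part of the original notebook), we can confirm that the new Total column equals the sum of the 24 monthly columns and that the two Belmont wards are now distinguishable:
# The monthly columns were renamed to integer labels above, so they are easy to select
month_cols = [c for c in crime_df.columns if isinstance(c, int)]
print((crime_df[month_cols].sum(axis = 1) == crime_df["Total"]).all())   # should print True
print(crime_df.loc[crime_df["Ward"].str.startswith("Belmont"), ["Ward", "Borough"]].drop_duplicates())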
Dataset 2: List of London Boroughs
london_bor_list_url = "https://en.wikipedia.org/wiki/List_of_London_boroughs"
london_bor_list = pd.read_html(london_bor_list_url)
london_bor_df = london_bor_list[0]
london_bor_df.columns=["Borough", "Inner", "Status", "Local Authority", "Political Control", "Head Quarters", "Area (sq mi)", "Population (2013 estimate)",
"Co-ordinates", "Borough No. on Map"]
# Strip the Wikipedia footnote markers ("note 1" to "note 5") from the scraped table
for note in ["note 1", "note 2", "note 3", "note 4", "note 5"]:
    london_bor_df = london_bor_df.replace(note, "", regex = True)
london_bor_df["Borough"].replace({
    "Barking and Dagenham[]" : "Barking and Dagenham",
    "Greenwich []" : "Greenwich",
    "Hammersmith and Fulham[]" : "Hammersmith and Fulham"
}, inplace = True)
london_bor_df = london_bor_df.drop(["Inner", "Status"], axis = 1)
Dataset 3: Merged Dataset of Dataset 1 and Dataset 2
london_crime_df = pd.merge(crime_df, london_bor_df , on = 'Borough')
london_crime_df = london_crime_df.reindex(["Ward Code", "Ward", "Borough", "Local Authority", "Political Control", "Head Quarters", "Area (sq mi)", "Population (2013 estimate)", "Co-ordinates", "Borough No. on Map", "Crime Head", "Crime Sub-Head", 201907, 201908, 201909, 201910, 201911, 201912, 202001, 202002, 202003, 202004, 202005, 202006, 202007, 202008, 202009, 202010, 202011, 202012, 202101, 202102, 202103, 202104, 202105, 202106, "Total"], axis = 1)
Dataset 4: London Postcodes
london_postcodes_df = pd.read_csv("London Postcodes.csv", low_memory = False)
london_postcodes_df = london_postcodes_df.drop(["County", "Country", "County Code", "Introduced", "Terminated", "Parish", "National Park", "Population", "Households", "Built up area", "Built up sub-division", "Rural/urban", "Region", "Altitude", "Local authority", "Parish Code", "Census output area", "Index of Multiple Deprivation", "Quality", "User Type", "Last updated", "Police force", "Water company", "Plus Code", "Average Income", "Sewage Company", "Travel To Work Area"], axis = 1)
london_postcodes_df = london_postcodes_df[london_postcodes_df["In Use?"] == "Yes"]
london_postcodes_df = london_postcodes_df.drop(["In Use?"], axis = 1)
postcode_cols = ["Postcode Data", "Latitude Data", "Longitude Data", "Easting", "Northing", "Grid Ref", "Borough", "Ward", "Borough Code", "Ward Code", "Constituency", "Lower Layer Super Output Area", "London Zone", "LSOA Code", "MSOA Code", "Middle Layer Super Output Area", "Constituency Code", "Nearest Station", "Distance To Station", "Postcode Area", "Postcode District"]
london_postcodes_df.columns = postcode_cols
postcode_cols_new = ["Postcode Data", "Latitude Data", "Longitude Data", "Nearest Station", "Distance To Station", "Ward Code", "Ward", "Borough Code", "Borough", "Constituency Code", "Constituency", "LSOA Code", "Lower Layer Super Output Area", "MSOA Code", "Middle Layer Super Output Area", "London Zone", "Postcode Area", "Postcode District", "Easting", "Northing", "Grid Ref"]
london_postcodes_df = london_postcodes_df.reindex(postcode_cols_new, axis = 1)
london_postcodes_df.loc[((london_postcodes_df["Ward"] == "Belmont") & (london_postcodes_df["Borough"] == "Harrow")), "Ward"] = "Belmont Harrow"
london_postcodes_df.loc[((london_postcodes_df["Ward"] == "Belmont") & (london_postcodes_df["Borough"] == "Sutton")), "Ward"] = "Belmont Sutton"
C] Understanding the Dataset Using the Groupby Function and Charts:
We then use the groupby function and charts to understand the data better.
(i) Group the Dataframe by “Borough”
bor_crime_df = crime_df.groupby("Borough").sum()
bor_crime_df.sort_values(by = ["Total", "Borough"], inplace = True)
bar_chart = px.bar(
    bor_crime_df,
    title = "Total Crimes Recorded in London Boroughs During the Period July 2019 to June 2021",
    color = "Total",
    color_continuous_scale = [(0, "cyan"), (0.25, "yellow"), (0.5, "red"), (0.75, "red"), (1, "maroon")],
    width = 1000,
    height = 700
)
bar_chart.show()
h_bar_chart = px.bar(
    bor_crime_df,
    title = "Total Crimes Recorded in London Boroughs During the Period July 2019 to June 2021",
    color = "Total",
    orientation = "h",
    color_continuous_scale = [(0, "cyan"), (0.25, "yellow"), (0.5, "red"), (0.75, "red"), (1, "maroon")],
    width = 1000,
    height = 750
)
h_bar_chart.show()
(ii) Group the Dataframe by “Ward”
top_ward_crime_df = crime_df.groupby("Ward").sum()
top_ward_crime_df.sort_values(by = ["Total", "Ward"], ascending = True, inplace = True)
top_ward_crime_df = top_ward_crime_df.head(20)
h_bar_chart = px.bar(
    top_ward_crime_df,
    title = "Total Crimes Recorded in the 20 Safest London Wards During the Period July 2019 to June 2021",
    color = "Total",
    orientation = "h",
    color_continuous_scale = [(0, "cyan"), (0.25, "lightgreen"), (0.5, "yellow"), (0.75, "orange"), (1, "red")]
)
h_bar_chart.show()
worst_ward_crime_df = crime_df.groupby("Ward").sum()
worst_ward_crime_df.sort_values(by = ["Total", "Ward"], ascending = True, inplace = True)
worst_ward_crime_df = worst_ward_crime_df.tail(20)
h_bar_chart = px.bar(
    worst_ward_crime_df,
    title = "Total Crimes Recorded in the 20 Most Dangerous London Wards During the Period July 2019 to June 2021",
    color = "Total",
    orientation = "h",
    color_continuous_scale = [(0, "red"), (0.25, "darkred"), (0.5, "maroon"), (0.75, "maroon"), (1, "indigo")]
)
h_bar_chart.show()
(iii) Group the Dataframe by “Crime Head”
type_crimes_crime_df = crime_df.groupby(["Crime Head"]).sum()
type_crimes_crime_df.sort_values(by = ["Total", "Crime Head"], ascending = True, inplace = True)
type_crimes_crime_bar = px.bar(
    type_crimes_crime_df,
    title = "Types of Crimes Recorded in London During the Period July 2019 to June 2021",
    color = "Total",
    orientation = "h",
    color_continuous_scale = [(0, "cyan"), (0.25, "yellow"), (0.5, "orange"), (0.75, "red"), (1, "maroon")]
)
type_crimes_crime_bar.show()
(iv) Group the Dataframe by “Crime Sub-Head”
type_sub_crimes_crime_df = crime_df.groupby(["Crime Sub-Head"]).sum()
type_sub_crimes_crime_df.sort_values(by = ["Total", "Crime Sub-Head"], ascending = False, inplace = True)
type_sub_crimes_crime_df = type_sub_crimes_crime_df.head(20)
type_sub_crimes_crime_df.sort_values(by = ["Total", "Crime Sub-Head"], ascending = True, inplace = True)
type_sub_crimes_crime_bar = px.bar(
    type_sub_crimes_crime_df,
    title = "Top 20 Crimes Recorded in London During the Period July 2019 to June 2021",
    color = "Total",
    orientation = "h",
    color_continuous_scale = [(0, "cyan"), (0.25, "orange"), (0.5, "red"), (0.75, "maroon"), (1, "purple")]
)
type_sub_crimes_crime_bar.show()
(v) The 10 Safest and 10 Most Dangerous Boroughs of London
T10S_bor_crime_df = bor_crime_df.head(10)
T10S_bor_bar = px.bar(
    T10S_bor_crime_df,
    title = "The 10 Safest Boroughs of London",
    color = "Total",
    color_continuous_scale = [(0, "cyan"), (0.25, "lightgreen"), (0.5, "yellow"), (0.75, "orange"), (1, "red")]
)
T10S_bor_bar.show()
W10D_bor_crime_df = bor_crime_df.sort_values(by = "Total", ascending = False).head(10)
W10D_bor_bar = px.bar(
    W10D_bor_crime_df,
    title = "The 10 Most Dangerous Boroughs of London",
    color = "Total",
    color_continuous_scale = [(0, "yellow"), (0.25, "red"), (0.5, "red"), (0.75, "maroon"), (1, "maroon")]
    #color_continuous_scale = [(0, "orange"), (0.5, "magenta"), (1, "red")]
)
W10D_bor_bar.show()
(vi) Top 5 Crimes in the 5 Safest Boroughs of London, Grouped by "Borough"
# N01SB_T05C_df to N05SB_T05C_df (built in earlier notebook cells) each hold the top 5 crimes for one of the 5 safest boroughs
df = [N01SB_T05C_df, N02SB_T05C_df, N03SB_T05C_df, N04SB_T05C_df, N05SB_T05C_df]
Top05SB_df = pd.DataFrame()
Top05SB_df = Top05SB_df.append(df, ignore_index = True)
Top05SB_df.set_index("Borough", inplace = True)
Top05SB_df_new = Top05SB_df[["Crime Head", "Total"]]
Top05SB_bar = px.bar(
    Top05SB_df_new,
    title = "Top 5 Crimes of the 5 Safest Boroughs of London",
    color = "Crime Head",
    barmode = "group",
    width = 900,
    height = 600
)
Top05SB_bar.show()
(vii) Top 10 Crimes in the 5 Safest Boroughs of London, Grouped by "Crime Head"
T10C_df = pd.DataFrame()
# bor_ch_crime_df (crimes grouped by Borough and Crime Head) and T10S_boroughs (ordered list of the 10 safest boroughs) come from earlier cells
for i in range(5):
    data_df = pd.DataFrame()
    data_df = bor_ch_crime_df[bor_ch_crime_df["Borough"] == T10S_boroughs[i]].sort_values(by = "Total", ascending = False).head(10)
    T10C_df = T10C_df.append(data_df, ignore_index = True)
T10C_Top05SB_df = T10C_df[["Borough", "Crime Head", "Total"]]
T10C_Top05SB_df = T10C_Top05SB_df.reindex(["Crime Head", "Borough", "Total"], axis = 1)
T10C_Top05SB_df.sort_values(["Total"], ascending = False, inplace = True)
T10C_Top05SB_df.sort_values(["Crime Head", "Total", "Borough"], ascending = False, inplace = True)
T10C_bct_df = T10C_df[["Borough", "Crime Head", "Total"]]
T10C_bct_df = T10C_bct_df.groupby("Crime Head").sum()
T10C_bct_df.sort_values(["Total"], ascending = False, inplace = True)
T10C_bct_list = [str(i) for i in list(T10C_bct_df.index)]
T10C_df_new = pd.DataFrame()
for i in range(10):
    data_df_new = pd.DataFrame()
    data_df_new = T10C_Top05SB_df[T10C_Top05SB_df["Crime Head"] == T10C_bct_list[i]].sort_values(by = "Total", ascending = False)
    T10C_df_new = T10C_df_new.append(data_df_new, ignore_index = True)
T10C_df_new.set_index("Crime Head", inplace = True)
T10C_Top05SB_bar = px.bar(
    T10C_df_new,
    title = "Top 10 Crimes of the 5 Safest Boroughs of London",
    color = "Borough",
    barmode = "group",
    width = 1000,
    height = 700
)
T10C_Top05SB_bar.show()
(viii) The 50 Safest Wards in the 5 Safest Boroughs of London
bor_ward_crime_df = crime_df.copy(deep = True)
bor_ward_crime_df = bor_ward_crime_df[["Borough", "Ward", "Total"]]
bor_ward_crime_df = bor_ward_crime_df.groupby(["Borough", "Ward"]).sum()
bor_ward_crime_df.reset_index(inplace = True)
T10W_df = pd.DataFrame()
for i in range(5):
    data_df_updated = pd.DataFrame()
    data_df_updated = bor_ward_crime_df[bor_ward_crime_df["Borough"] == T10S_boroughs[i]].sort_values(by = "Total", ascending = True).head(10)
    T10W_df = T10W_df.append(data_df_updated, ignore_index = True)
T10W_df_bar = px.pie(
    T10W_df,
    title = "The 50 Safest Wards in the 5 Safest Boroughs of London",
    values = "Total",
    names = "Ward",
    color = "Borough",
    width = 900,
    height = 700
)
T10W_df_bar.show()
D] Collecting the Coordinates and Plotting them on the Map of London:
Once we have identified the safest Boroughs and Wards of London, we extract the Postcodes of the different neighbourhoods within them.
(i) Merging the Crime Dataframe of the 50 Safest Wards
all_crime_df = crime_df[["Ward Code", "Ward", "Borough", "Crime Head", "Crime Sub-Head", "Total"]]
Top5_bor_crime_df = pd.DataFrame()
for i in range(5):
    data_df_updated = pd.DataFrame()
    data_df_updated = all_crime_df[all_crime_df["Borough"] == T10S_boroughs[i]]
    Top5_bor_crime_df = Top5_bor_crime_df.append(data_df_updated, ignore_index = True)
T10W_df_updated = T10W_df_new.copy(deep = True)
T10W_df_updated.drop(["Total"], axis = 1, inplace = True)
T10W_crime_df = pd.merge(T10W_df_updated, Top5_bor_crime_df, on = 'Ward')
T10W_crime_df.drop(["Borough_y"], axis = 1, inplace = True)
T10W_crime_df = T10W_crime_df.reindex(["Ward Code", "Ward", "Borough_x", "Crime Head", "Crime Sub-Head", "Total"], axis = 1)
T10W_crime_df.columns = ["Ward Code", "Ward", "Borough", "Crime Head", "Crime Sub-Head", "Total"]
(ii) Extracting the Postcodes of the 50 Safest Wards of London and Selecting the Locations Nearest to a Station
top5_bor_postcode_df = london_postcodes_df.copy(deep = True)
postcode_top5_bor_df = pd.DataFrame()
for i in range(5):
    dataset_new = pd.DataFrame()
    dataset_new = top5_bor_postcode_df[top5_bor_postcode_df["Borough"] == T10S_boroughs[i]]
    postcode_top5_bor_df = postcode_top5_bor_df.append(dataset_new, ignore_index = True)
postcode_top50_ward_df = pd.DataFrame()
for i in range(len(T10W_df_updated)):
    dataset_updated = pd.DataFrame()
    dataset_updated = postcode_top5_bor_df[postcode_top5_bor_df["Ward"] == T10W_df_updated["Ward"][i]]
    postcode_top50_ward_df = postcode_top50_ward_df.append(dataset_updated, ignore_index = True)
min_dist_station_df = postcode_top50_ward_df.groupby(["Nearest Station", "Distance To Station"]).min()
min_dist_station_df.reset_index(inplace = True)
nearest_to_station_df = min_dist_station_df.drop_duplicates(subset = ["Nearest Station"], keep = "first")
nearest_to_station_df.reset_index(inplace = True)
nearest_to_station_df.drop(["index"], axis = 1, inplace = True)
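Because the groupby keys are sorted, keeping the first duplicate per station retains the postcode closest to that station. A quick check (a sketch, not in the original notebook) confirms that each station now appears exactly once:
# Each Nearest Station should now be represented by a single, closest postcode
print(nearest_to_station_df["Nearest Station"].is_unique)    # should print True
print(nearest_to_station_df[["Nearest Station", "Distance To Station", "Postcode Data"]].head())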
(iii) Fetching Coordinates by Using the ArcGIS API
# Connect to ArcGIS; an anonymous connection is shown here - add your own credentials if your ArcGIS account requires them
gis = GIS()

def get_coordinates_uk(address):
    latitude_coordinates = 0
    longitude_coordinates = 0
    g = geocode(address = "{}, London, England, GBR".format(address))[0]
    longitude_coordinates = g["location"]["x"]
    latitude_coordinates = g["location"]["y"]
    return str(latitude_coordinates) + "," + str(longitude_coordinates)
london_postcodes = nearest_to_station_df.loc[ : , "Postcode Data"]
london_postcodes_dfnew = pd.DataFrame(london_postcodes)
post_cols = ["Postcodes"]
london_postcodes_dfnew.columns = post_cols
london_coordinates = []
for i in range(len(london_postcodes)):
    london_coordinates.append(get_coordinates_uk(london_postcodes[i]))
london_latitude = []
for i in range(len(london_coordinates)):
    lat = london_coordinates[i].split(",")[0]
    lat = round(float(lat), 5)
    london_latitude.append(lat)
london_latitude_df = pd.DataFrame(london_latitude)
lat_cols = ["Latitude"]
london_latitude_df.columns = lat_cols
london_longitude = []
for i in range(len(london_coordinates)):
    long = london_coordinates[i].split(",")[1]
    long = round(float(long), 5)
    london_longitude.append(long)
london_longitude_df = pd.DataFrame(london_longitude)
long_cols = ["Longitude"]
london_longitude_df.columns = long_cols
london_pc_df = pd.concat([london_postcodes_dfnew, london_latitude_df, london_longitude_df], axis=1)
nearest_to_station_coordinates_df = pd.concat([nearest_to_station_df, london_pc_df], join = "outer", axis=1)
postcode_cols_new = ["Nearest Station", "Distance To Station", "Postcodes", "Latitude", "Longitude", "Ward Code", "Ward", "Borough Code", "Borough", "Constituency Code", "Constituency", "LSOA Code", "Lower Layer Super Output Area", "MSOA Code", "Middle Layer Super Output Area", "London Zone", "Postcode Area", "Postcode District", "Easting", "Northing", "Grid Ref", "Postcode Data", "Latitude Data", "Longitude Data"]
nearest_to_station_coordinates_df = nearest_to_station_coordinates_df.reindex(postcode_cols_new, axis = 1)
postcode_cols_updated = ["Neighbourhood", "Distance To Station", "Postcodes", "Latitude", "Longitude", "Ward Code", "Ward", "Borough Code", "Borough", "Constituency Code", "Constituency", "LSOA Code", "Lower Layer Super Output Area", "MSOA Code", "Middle Layer Super Output Area", "London Zone", "Postcode Area", "Postcode District", "Easting", "Northing", "Grid Ref", "Postcode Data", "Latitude Data", "Longitude Data"]
nearest_to_station_coordinates_df.columns = postcode_cols_updated
neighbourhood_df = nearest_to_station_coordinates_df.reindex(postcode_cols_updated, axis = 1)
(iv) Plotting All Stations on the Map of London
address = "London, England"
geolocator = Nominatim(user_agent = "london_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print("The coordinates of London are {}, {}.".format(latitude, longitude))
min_dist_all_station_df = london_postcodes_df.groupby(["Nearest Station", "Distance To Station"]).min()
min_dist_all_station_df.reset_index(inplace = True)
min_dist_all_station_df.to_csv("Minimum Distance to All Stations.csv")
station_df = min_dist_all_station_df.drop_duplicates(subset = ["Nearest Station"], keep = "first")
station_df.reset_index(inplace = True)
station_df.drop(["index"], axis = 1, inplace = True)
# Creating the map of London
map_London_all_stations = folium.Map(location = [latitude, longitude], zoom_start = 10)
# Adding markers to map
for latitude, longitude, borough, ward, neighbourhood in zip(station_df["Latitude Data"], station_df["Longitude Data"], station_df["Borough"], station_df["Ward"], station_df["Nearest Station"]):
    label = "{}, {}, {}".format(neighbourhood, ward, borough)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [latitude, longitude],
        radius = 5,
        popup = label,
        color = "red",
        fill = True
    ).add_to(map_London_all_stations)
map_London_all_stations
(v) Plotting Stations in the Safest Wards of London on the Map of London
# Creating the map of London
map_London_safe_neigh = folium.Map(location = [latitude, longitude], zoom_start = 10)
# Adding markers to map
for latitude, longitude, borough, ward, neighbourhood in zip(neighbourhood_df["Latitude"], neighbourhood_df["Longitude"], neighbourhood_df["Borough"], neighbourhood_df["Ward"], neighbourhood_df["Neighbourhood"]):
    label = "{}, {}, {}".format(neighbourhood, ward, borough)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [latitude, longitude],
        radius = 5,
        popup = label,
        color = "blue",
        fill = True
    ).add_to(map_London_safe_neigh)
map_London_safe_neigh
E] Identifying Venues Around the Safest Neighbourhoods of London:
- Neighbourhood: Name of the Neighbourhood
- Neighbourhood Latitude: Latitude of the Neighbourhood
- Neighbourhood Longitude: Longitude of the Neighbourhood
- Venue: Name of the Venue
- Venue Category: Category of the Venue
- Venue Latitude: Latitude of the Venue
- Venue Longitude: Longitude of the Venue
CLIENT_ID = "xxxxxxxxxxxxx" # Enter your Foursquare ID
CLIENT_SECRET = "xxxxxxxxxxxxx" # Enter your Foursquare Secret
VERSION = "20180605" # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
def getNearbyVenues(names, wards, boroughs, latitudes, longitudes, radius = 500):
    venues_list = []
    for name, ward, borough, lat, lng in zip(names, wards, boroughs, latitudes, longitudes):
        print(name)

        # create the API request URL (LIMIT caps the number of venues returned per neighbourhood)
        url = "https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
                    CLIENT_ID,
                    CLIENT_SECRET,
                    VERSION,
                    lat,
                    lng,
                    radius,
                    LIMIT
                    )

        # make the GET request
        results = requests.get(url).json()["response"]["groups"][0]["items"]

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            ward,
            borough,
            lat,
            lng,
            v["venue"]["name"],
            v["venue"]["categories"][0]["name"],
            v["venue"]["location"]["lat"],
            v["venue"]["location"]["lng"]
        ) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ["Neighbourhood", "Ward", "Borough", "Neighbourhood Latitude", "Neighbourhood Longitude", "Venue", "Venue Category", "Venue Latitude", "Venue Longitude"]

    return nearby_venues
venues_london = getNearbyVenues(neighbourhood_df["Neighbourhood"], neighbourhood_df["Ward"], neighbourhood_df["Borough"], neighbourhood_df["Latitude"], neighbourhood_df["Longitude"])
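A quick look at the result (a sketch, not in the original notebook) shows how many venues were returned in total, how many distinct categories they span, and which neighbourhoods are the most venue-dense:
# Size of the venues dataframe, number of distinct categories, and venue counts per neighbourhood
print(venues_london.shape)
print("Unique venue categories:", venues_london["Venue Category"].nunique())
print(venues_london.groupby("Neighbourhood")["Venue"].count().sort_values(ascending = False).head())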
F] Segmenting Neighbourhoods of London by Common Venue Categories:
(i) One Hot Encoding
venues_london_ohe = pd.get_dummies(venues_london[["Venue Category"]], prefix = "", prefix_sep = "")
venues_london_ohe["Neighbourhood"] = venues_london["Neighbourhood"] # This adds the "Neighbourhood" column in the end
# Moving the Neighbourhood Column to the First Column
columns = [venues_london_ohe.columns[-1]] + list(venues_london_ohe.columns[ : -1])
venues_london_ohe = venues_london_ohe[columns]
neighbourhood_group_ohe = venues_london_ohe.groupby("Neighbourhood").sum()
neighbourhood_group_ohe.reset_index(inplace = True)
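After grouping, the one-hot frame has one row per neighbourhood and one column per venue category, each cell holding the number of venues of that category found within the 500 m search radius. A minimal sketch (not in the original notebook) to confirm this:
# One row per neighbourhood, one column per venue category (plus the Neighbourhood column itself)
print(neighbourhood_group_ohe.shape)
print(neighbourhood_group_ohe.iloc[ : 3, : 6])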
(ii) Printing Each Neighbourhood Along with the Top 8 Most Common Venues
num_top_venues = 8
for neigh in neighbourhood_group_ohe["Neighbourhood"]:
    print("---------" + neigh + "---------")
    temp = neighbourhood_group_ohe[neighbourhood_group_ohe["Neighbourhood"] == neigh].T.reset_index()
    temp.columns = ["Venue", "Frequency"]
    temp = temp.iloc[1 : ]
    temp["Frequency"] = temp["Frequency"].astype(float)
    temp = temp.round({"Frequency" : 2})
    print(temp.sort_values("Frequency", ascending = False).reset_index(drop = True).head(num_top_venues))
    print("\n")
(iii) Transferring the Venues into a Pandas Dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1 : ]
    row_categories_sorted = row_categories.sort_values(ascending = False)

    return row_categories_sorted.index.values[0 : num_top_venues]
indicators = ["st", "nd", "rd"]
# Create columns according to number of top Venues
columns = ["Neighbourhood"]
for ind in np.arange(num_top_venues):
    try:
        columns.append("{}{} Most Common Venue".format(ind + 1, indicators[ind]))
    except:
        columns.append("{}th Most Common Venue".format(ind + 1))
# Create a new Dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns = columns)
neighbourhoods_venues_sorted["Neighbourhood"] = neighbourhood_group_ohe["Neighbourhood"]
for ind in np.arange(neighbourhood_group_ohe.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1 : ] = return_most_common_venues(neighbourhood_group_ohe.iloc[ind, : ], num_top_venues)
G] Clustering Neighbourhoods by Common Venues (K-Means Clustering):
(i) Building a Model to Cluster the Neighbourhoods
neighbourhood_group_cluster = neighbourhood_group_ohe.drop(labels = "Neighbourhood", axis = 1)
distortions = []
K = range(1,20)
for k in K:
    kmean = KMeans(init = "k-means++", n_clusters = k, random_state = 0, n_init = 50, max_iter = 500)
    kmean.fit(neighbourhood_group_cluster)
    distortions.append(kmean.inertia_)
plt.figure(figsize = (10, 5))
plt.plot(K, distortions, "bx-")
plt.xlabel("k")
plt.ylabel("Distortion")
plt.title("The Elbow Method")
plt.show()
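The elbow in the curve is not very sharp, so as an optional cross-check (a sketch, not part of the original notebook) the silhouette score can be computed for the same range of k; higher scores indicate better-separated clusters:
from sklearn.metrics import silhouette_score

# Silhouette score for k = 2 to 9 on the same venue-count matrix
for k in range(2, 10):
    km = KMeans(init = "k-means++", n_clusters = k, random_state = 0, n_init = 50, max_iter = 500)
    labels_k = km.fit_predict(neighbourhood_group_cluster)
    print(k, round(silhouette_score(neighbourhood_group_cluster, labels_k), 3))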
# set number of clusters
k_num_clusters = 3
kmeans = KMeans(init = "k-means++", n_clusters = k_num_clusters, random_state = 0)
kmeans.fit(neighbourhood_group_cluster)
# check cluster labels generated for each row in the dataframe
labels = kmeans.labels_[0 : 80]
neighbourhoods_venues_sorted.insert(1, "Cluster Labels", kmeans.labels_)
neighbour_df = neighbourhood_df[["Neighbourhood", "Distance To Station", "Ward", "Borough", "Postcodes", "Latitude", "Longitude"]].copy(deep = True)
london_merged = neighbour_df
london_merged = london_merged.join(neighbourhoods_venues_sorted.set_index("Neighbourhood"), on = "Neighbourhood")
london_merged = london_merged.dropna(subset = ["Cluster Labels"])
(ii) Principal Component Analysis (PCA)
pca = PCA().fit(neighbourhood_group_cluster)
pca_neigh = pca.transform(neighbourhood_group_cluster)
print("Variance Explained by Each Component (%): ")
for i in range(len(pca.explained_variance_ratio_)):
    print("\n", i + 1, ": " + str(round(pca.explained_variance_ratio_[i] * 100, 2)) + "%")
print("\nTotal Sum: " + str(round(sum(pca.explained_variance_ratio_) * 100, 2)) + "%")
print("\nExplained Variance of the First Eight Components, i.e. 10% of the Total Components: " + str(round(sum(pca.explained_variance_ratio_[0 : 8]) * 100, 2)) + "%")
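To see how quickly the explained variance accumulates across components, a cumulative curve can be plotted (a sketch, not part of the original notebook):
# Cumulative explained variance of the principal components
cum_var = np.cumsum(pca.explained_variance_ratio_) * 100
plt.figure(figsize = (8, 4))
plt.plot(range(1, len(cum_var) + 1), cum_var, "b.-")
plt.xlabel("Number of Principal Components")
plt.ylabel("Cumulative Explained Variance (%)")
plt.title("Cumulative Explained Variance - Venue Category PCA")
plt.show()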
c1 = []
c2 = []
c3 = []
for i in range(len(pca_neigh)):
    if kmeans.labels_[i] == 0:
        c1.append(pca_neigh[i])
    if kmeans.labels_[i] == 1:
        c2.append(pca_neigh[i])
    if kmeans.labels_[i] == 2:
        c3.append(pca_neigh[i])
c1 = np.array(c1)
c2 = np.array(c2)
c3 = np.array(c3)
plt.figure(figsize = (10, 8))
plt.scatter(c1[ : , 0], c1[ : , 1], c = "red", label = "Cluster 1")
plt.scatter(c2[ : , 0], c2[ : , 1], c = "blue", label = "Cluster 2")
plt.scatter(c3[ : , 0], c3[ : , 1], c = "green", label = "Cluster 3")
plt.legend()
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Low Dimensional Visualisation (PCA) - Neighbourhoods")
(iii) Visualising the Resulting Clusters on the Map of London
# Create Map of London
map_clusters = folium.Map(location = [latitude, longitude], zoom_start = 10)
# Set color scheme for the clusters
x = np.arange(k_num_clusters)
ys = [i + x + (i * x) ** 2 for i in range(k_num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
for lat, lon, mcv, poi, ward, bor, cluster in zip(london_merged["Latitude"], london_merged["Longitude"], london_merged["1st Most Common Venue"], london_merged["Neighbourhood"], london_merged["Ward"], london_merged["Borough"], london_merged["Cluster Labels"]):
    label = folium.Popup("Cluster " + str(int(cluster) + 1) + ":\n" + str(mcv) + ",\n" + str(poi) + ",\n" + str(ward) + ",\n" + str(bor), parse_html = True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = rainbow[int(cluster)],
        fill = True,
        fill_color = rainbow[int(cluster)],
        fill_opacity = 0.5
    ).add_to(map_clusters)
map_clusters
(iv) Examining the Clusters
cluster_1 = london_merged.loc[london_merged["Cluster Labels"] == 0, london_merged.columns[[0] + [2] + [3] + list(range(7, london_merged.shape[1]))]]
cluster_1.to_csv("Venues in the Neighbourhood of London - Cluster 1.csv")
cluster_1
cluster_2 = london_merged.loc[london_merged["Cluster Labels"] == 1, london_merged.columns[[0] + [2] + [3] + list(range(7, london_merged.shape[1]))]]
cluster_2.to_csv("Venues in the Neighbourhood of London - Cluster 2.csv")
cluster_2
cluster_3 = london_merged.loc[london_merged["Cluster Labels"] == 2, london_merged.columns[[0] + [2] + [3] + list(range(7, london_merged.shape[1]))]]
cluster_3.to_csv("Venues in the Neighbourhood of London - Cluster 3.csv")
cluster_3
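Before discussing the clusters, a short summary (a sketch, not in the original notebook) of how many neighbourhoods each cluster contains and which first venue dominates each cluster helps characterise them:
# Cluster sizes and the most frequent "1st Most Common Venue" per cluster
print(london_merged["Cluster Labels"].value_counts().sort_index())
print(london_merged.groupby("Cluster Labels")["1st Most Common Venue"].agg(lambda s: s.mode().iloc[0]))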
Links to Jupyter Notebook
Note:
If you are unable to view the code / charts properly on GitHub, you may either:
- Click on the "Circle with Horizontal Line" symbol in the top right-hand corner to view the Jupyter Notebook with "nbviewer"
OR
- Click on the "Download" button to download the .ipynb file
Link to Report
Results and Discussion
- This cluster is mostly made up of Hotels, Pubs, Theatres, Art Galleries, Art Museums, Outdoor Sculptures and Plazas
- Thus, this cluster is most suitable for Tourists
- This cluster is mostly made up of Pubs, Coffee Shops, Cafés, Multi-Cultural Restaurants, Bars, Gyms, Sports Clubs, Supermarkets, Grocery Stores, Shopping Plazas, Fast-Food Joints, etc.
- Thus, this cluster is most suitable for young couples and executives
- This is the biggest cluster in our dataset
- It is mostly made up of Supermarkets, Bakeries, Pharmacies, Auto Garages, Parks, Playgrounds, Sports Complexes, Multi-Cultural Restaurants, Ice Cream Parlours, Fish & Chips Shops, Pubs, Train Stations and various stores (grocery, convenience, clothing, furniture, pet, optical, electronics, warehouse, etc.)
- It has almost everything that a family requires
- Thus, this cluster seems to be most suitable for families with children
Conclusion
Through this analysis, anyone planning to visit or relocate to London can now see:
- which are the safest Boroughs, Wards and Neighbourhoods of London
- the most common venues in those neighbourhoods
- the different types of neighbourhoods based on the cluster of venue categories
- which neighbourhoods to choose as per their preference
Thank You