Analyzing US Gun Violence Data

In this post, we will analyze US gun violence data using a data set available on Kaggle, a popular data science site.

Before starting to analyze data, it’s very important to understand why we need to analyze it. In every data analysis use case, understanding the basic aim of the analysis is the most fundamental step.

Every year, the US witnesses a large number of gun violence incidents that are not terrorist activities but cases where people with an unstable mindset (due to numerous reasons) shoot at innocent people. A large number of people are affected by these incidents, so it’s important to study the data to identify common points or patterns that the government can use to provide better security to its citizens. So let’s hit it!

First Step: Import the Data Set

All of the analysis will be done in Python, using pandas and numpy along with modules like plotly for visualizations.

The following code imports the data set into a data frame ‘df’:

#Import pandas library
import pandas as pd
#Import data set
df = pd.read_csv('/Users/toughguy/Downloads/gun-violence-data_01-2013_03-2018.csv')
# Print the first few rows of the data
df.head()

Now that the data frame is available, let’s see what information it contains:

df.info()

The above line of code prints each column’s name, non-null count, and data type.
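
As a complement to df.info(), a per-column count of missing values can show which columns are complete enough to analyze. The sketch below runs on a tiny made-up frame; the column name ‘n_killed’ and all of the values are placeholders for illustration, not taken from the real data set.

```python
import pandas as pd

# Tiny made-up frame standing in for the real data set (hypothetical values)
sample = pd.DataFrame({
    'state': ['Illinois', 'Texas', None, 'Florida'],
    'n_killed': [1, None, 0, 2],
})

# Count the missing values in each column, most incomplete first
missing_counts = sample.isna().sum().sort_values(ascending=False)
print(missing_counts)
```

On the real data, the same call would be df.isna().sum().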

Next Steps: Analyzing data frame

The columns in the data frame can be used to infer a lot of information, but what we infer should be useful. So far, we only know the columns and the types of data we have. For a better understanding, let’s visualize the data on a US map to see the number of incidents reported in each state.

Number of incidents by state

We will use the current data frame ‘df’ to extract the number of gun violence incidents that happened in each US state. The following is the code:

# Count the incident rows for each state (returns a Series indexed by state name)
statewise_numbers = df.groupby('state')['state'].count()
# Name the Series so that the counts are labeled when printed
# (a Series has no columns, so it is renamed as a whole)
statewise_numbers = statewise_numbers.rename('Number_of_incidents')

This produces a Series with one incident count per state, indexed by state name.
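
The same per-state tally can also be obtained with value_counts(), which additionally sorts the states from most to fewest incidents. A minimal sketch on made-up rows (the rows are placeholders, not real figures):

```python
import pandas as pd

# Made-up rows standing in for the incident records (one row per incident)
incidents = pd.DataFrame({'state': ['Texas', 'Ohio', 'Texas', 'Texas', 'Ohio', 'Utah']})

# groupby + count, as used in this post: a Series indexed by state name
by_group = incidents.groupby('state')['state'].count()

# value_counts gives the same numbers, sorted in descending order
by_value_counts = incidents['state'].value_counts()

print(by_group)
print(by_value_counts)
```

Either form works for the later steps, since both return a Series whose index holds the state names.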

So, we now have the total statewise count of gun violence incidents. Let’s plot these on a US map for better visualization.

Visualizing and plotting data on US map

The following code imports the appropriate libraries and prepares the data points for plotting:

# For plotting (the %matplotlib magic only works inside a Jupyter notebook)
from matplotlib import pyplot as plt
import seaborn as sns
%matplotlib inline
# Import the FIPS codes of all US states
import pandas as pd
df_states = pd.read_csv('/Users/admin/my-documents/fips_state.csv')


# Create a new data frame with the statewise numbers
df_new = pd.DataFrame({'state': statewise_numbers.index, 'Number_of_incidents': statewise_numbers.values})
# Merge df_new and df_states on the state name. df_states is expected to hold
# a 'state' column plus the 'post_code' column used for plotting later, so
# no renaming is needed (df_new has no 'post_code' column to rename)
result = pd.merge(df_new, df_states, on='state', how='inner')

We have now prepared the data for visualization. Each row of ‘result’ holds a state, its number of incidents, and the columns from the FIPS file, including ‘post_code’.
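
An inner merge silently drops any state whose name does not match a row in the FIPS file. One way to check for this, sketched here on made-up stand-ins for ‘df_new’ and ‘df_states’ (the names ‘counts’ and ‘codes’ and the counts themselves are placeholders), is a left merge with an indicator column:

```python
import pandas as pd

# Made-up stand-ins for df_new and df_states (counts are placeholders)
counts = pd.DataFrame({
    'state': ['Texas', 'Ohio', 'District of Columbia'],
    'Number_of_incidents': [30, 20, 10],
})
codes = pd.DataFrame({
    'state': ['Texas', 'Ohio'],
    'post_code': ['TX', 'OH'],
})

# A left merge keeps every state from `counts`; the indicator column
# records whether a matching row was found in `codes`
checked = pd.merge(counts, codes, on='state', how='left', indicator=True)

# States that an inner merge would have dropped
unmatched = checked.loc[checked['_merge'] == 'left_only', 'state'].tolist()
print(unmatched)
```

If the list is empty on the real data, the inner merge lost nothing.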

For visualization, use the code below:

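# In plotly 4 and later, the online plotting module lives in the separate
# chart-studio package (on plotly 3.x, use: import plotly.plotly as py)
import chart_studio.plotly as py
import plotly.graph_objs as go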
scl = [
    [0.0, 'rgb(240,230,255)'],
    [0.1, 'rgb(224,204,255)'],
    [0.2, 'rgb(209,179,255)'],
    [0.3, 'rgb(194,153,255)'],
    [0.4, 'rgb(179,128,255)'],
    [0.5, 'rgb(163,102,255)'],
    [0.6, 'rgb(148,77,255)'],
    [0.7, 'rgb(133,51,255)'],
    [0.8, 'rgb(117,26,255)'],
    [0.9, 'rgb(102,0,255)'],
    [1.0, 'rgb(92,0,230)']
]
result['text'] = result['state']
data = [go.Choropleth(
    colorscale = scl,
    autocolorscale = False,
    locations = result['post_code'],
    z = result['Number_of_incidents'].astype(float),
    locationmode = 'USA-states',
    text = result['text'],
    marker = go.choropleth.Marker(
        line = go.choropleth.marker.Line(
            color = 'rgb(255,255,255)',
            width = 2,
        )),
    colorbar = go.choropleth.ColorBar(
        title = "Number of incidents")
)]
layout = go.Layout(
    title = go.layout.Title(
        text = 'Gun Violence by State<br>(Hover for breakdown)'
    ),
    geo = go.layout.Geo(
        scope = 'usa',
        projection = go.layout.geo.Projection(type = 'albers usa'),
        showlakes = True,
        lakecolor = 'rgb(255, 255, 255)'),
)
fig = go.Figure(data = data, layout = layout)
py.iplot(fig, filename = 'd3-cloropleth-map')

The above code generates a map of the US in which darker shades of purple mark the states with more incidents.
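
py.iplot sends the figure to an online account. As an alternative sketch, assuming plotly 4 or later is installed, a similar map can be described as a plain dictionary and rendered locally with plotly.io.show. The helper names ‘build_choropleth_spec’ and ‘render’ and the sample counts are hypothetical, and the colorscale is left at plotly’s default.

```python
def build_choropleth_spec(post_codes, counts, title):
    """Return a plotly figure description as a plain dict (no plotly import needed)."""
    return {
        'data': [{
            'type': 'choropleth',
            'locationmode': 'USA-states',  # locations are two-letter state codes
            'locations': list(post_codes),
            'z': [float(c) for c in counts],
            'colorbar': {'title': {'text': 'Number of incidents'}},
        }],
        'layout': {
            'title': {'text': title},
            'geo': {'scope': 'usa', 'projection': {'type': 'albers usa'}},
        },
    }


def render(spec):
    """Show the figure locally; plotly is imported here so it stays optional."""
    import plotly.io as pio
    pio.show(spec)


# Placeholder values for illustration only
spec = build_choropleth_spec(['TX', 'OH'], [30, 20], 'Gun Violence by State')
# render(spec)  # uncomment in an environment where plotly is installed
```

With the real data, the first two arguments would be result['post_code'] and result['Number_of_incidents'].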

From the map, we can infer that the highest numbers of incidents were reported in Illinois, California, Texas and Florida.
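
The ranking behind this observation can also be read off directly instead of from the map. A minimal sketch, using placeholder counts rather than the real figures (‘result_sample’ is a hypothetical stand-in for ‘result’):

```python
import pandas as pd

# Placeholder counts for illustration only (not the real figures)
result_sample = pd.DataFrame({
    'state': ['Ohio', 'Texas', 'Utah', 'Florida'],
    'Number_of_incidents': [20, 30, 5, 25],
})

# Pick the states with the most incidents
top_states = result_sample.nlargest(2, 'Number_of_incidents')
print(top_states['state'].tolist())
```

On the real data, result.nlargest(4, 'Number_of_incidents') would list the four states with the most incidents.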

The above is just one example. Similarly, we can infer a lot more information from the data. That’s all folks.

Thanks for reading guys. I’ll try to post more frequently with similar data sets and new use cases :)
