
Analyzing US Gun Violence Data

Nishank Arora

Project Manager - Capgemini | Ex-Svam | Microsoft Certified Data Analyst Associate

Published: May 29, 2019

In this post, we will analyze US gun violence data using a data set available on Kaggle.

Before starting to analyze data, it’s very important to understand why we need to analyze it. In every data analysis use case, understanding the basic aim of the analysis is the most fundamental step.

Every year, the US witnesses a large number of gun violence incidents which are not terrorist activities but cases where people with an unstable mindset (due to numerous reasons) shoot at innocent people. A large number of people are affected by these incidents, so it’s important to study the data to identify common points or patterns which the government can use to provide better security to its citizens. So let’s hit it!

First Step: Import the Data Set

All of the analysis will be done in Python, using pandas and numpy along with modules like plotly for visualizations.

Following is the code to import the data set into a data frame ‘df’:

# Import pandas library
import pandas as pd
# Import data set
df = pd.read_csv('/Users/toughguy/Downloads/gun-violence-data_01-2013_03-2018.csv')
# Print the first few rows
df.head()

Now that we have the data frame available, let’s see what information it contains:

df.info()

The above line of code prints out the following:

Next Steps: Analyzing data frame

The columns in the data frame can be used to infer a lot of information, but only some of it will be useful. So far, we just know the columns and the types of data we have. For a better understanding, let’s visualize the data and plot it on a US map to see the number of incidents reported in each state.

Number of incidents by state

We will use the current dataframe ‘df’ to extract the number of gun violence incidents reported in each US state. Following is the code:

# Count incidents per state; the result is a Series indexed by state name
statewise_numbers = df.groupby('state')['state'].count()
# Label the Series and its index so the printed output is readable
statewise_numbers = statewise_numbers.rename('Number_of_incidents')
statewise_numbers.index.name = 'State'

This would generate the following results:

So, we now have the total count of gun violence incidents for each state. Let’s plot these on a US map for better visualization.
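Before plotting, it can help to check which states rank highest. The following is a minimal sketch on a toy data frame (the rows are made up, not taken from the Kaggle file), assuming only that a ‘state’ column exists as in the code above. It uses value_counts(), which gives the same counts as the groupby but sorted in descending order:

```python
import pandas as pd

# Toy stand-in for the real data set; only the 'state' column matters here.
toy = pd.DataFrame(
    {'state': ['Illinois', 'Texas', 'Illinois', 'Ohio', 'Texas', 'Illinois']}
)

# value_counts() counts rows per state and sorts the result in descending order.
top_states = toy['state'].value_counts()

# Show the three states with the most rows.
print(top_states.head(3))
```

On the real data, replacing toy with df should list the states with the most reported incidents first.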

Visualizing and plotting data on US map

Following is the code to import the appropriate libraries and plot data points:

# For plotting
from matplotlib import pyplot as plt
import seaborn as sns
%matplotlib inline
# Import FIPS of all US states
df_states = pd.read_csv('/Users/admin/my-documents/fips_state.csv')

# Create new data frame with statewise numbers
df_new = pd.DataFrame({'state': statewise_numbers.index, 'Number_of_incidents': statewise_numbers.values})
# Merge df_new and df_states on the state name
# (assumes fips_state.csv has 'state' and 'post_code' columns)
result = pd.merge(df_new, df_states, on='state', how='inner')

The data is now prepared for visualization: ‘result’ holds one row per state with its incident count and the columns from the FIPS file.
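One thing to keep in mind is that an inner merge silently drops any state whose name does not match a row in the lookup file. The sketch below shows one way to check for that, using toy frames with made-up numbers. The names counts and lookup are hypothetical stand-ins for df_new and df_states, and the ‘post_code’ column is an assumption about the FIPS file:

```python
import pandas as pd

# Hypothetical stand-ins for df_new and df_states; values are made up.
counts = pd.DataFrame({
    'state': ['Illinois', 'Texas', 'District of Columbia'],
    'Number_of_incidents': [3, 2, 1],
})
lookup = pd.DataFrame({
    'state': ['Illinois', 'Texas'],
    'post_code': ['IL', 'TX'],
})

# A left merge with indicator=True keeps every row of counts and adds a
# '_merge' column recording whether a match was found in lookup.
checked = pd.merge(counts, lookup, on='state', how='left', indicator=True)

# States present in counts but missing from lookup.
unmatched = checked.loc[checked['_merge'] == 'left_only', 'state'].tolist()
print(unmatched)
```

If the list is empty on the real data, the inner merge above did not lose any states.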

For visualization, use the code below:

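# On plotly 4 and later, the online plotting module lives in the separate
# chart_studio package (on plotly 3.x it was plotly.plotly)
import chart_studio.plotly as py
import plotly.graph_objs as go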
scl = [
    [0.0, 'rgb(240,230,255)'],
    [0.1, 'rgb(224,204,255)'],
    [0.2, 'rgb(209,179,255)'],
    [0.3, 'rgb(194,153,255)'],
    [0.4, 'rgb(179,128,255)'],
    [0.5, 'rgb(163,102,255)'],
    [0.6, 'rgb(148,77,255)'],
    [0.7, 'rgb(133,51,255)'],
    [0.8, 'rgb(117,26,255)'],
    [0.9, 'rgb(102,0,255)'],
    [1.0, 'rgb(92,0,230)']
]
result['text'] = result['state']
data = [go.Choropleth(
    colorscale = scl,
    autocolorscale = False,
    locations = result['post_code'],
    z = result['Number_of_incidents'].astype(float),
    locationmode = 'USA-states',
    text = result['text'],
    marker = go.choropleth.Marker(
        line = go.choropleth.marker.Line(
            color = 'rgb(255,255,255)',
            width = 2,
        )),
    colorbar = go.choropleth.ColorBar(
        title = "Number of incidents")
)]
layout = go.Layout(
    title = go.layout.Title(
        text = 'Gun Violence by State<br>(Hover for breakdown)'
    ),
    geo = go.layout.Geo(
        scope = 'usa',
        projection = go.layout.geo.Projection(type = 'albers usa'),
        showlakes = True,
        lakecolor = 'rgb(255, 255, 255)'),
)
fig = go.Figure(data = data, layout = layout)

py.iplot(fig, filename = 'd3-cloropleth-map')

The above code would generate the visualization below:

From the map, we can infer that the highest numbers of incidents were reported in Illinois, California, Texas and Florida.

The above is just one example. Similarly, we can infer a lot more information from the data. That’s all folks.

Thanks for reading guys. I’ll try to post more frequently with similar data sets and new use cases :)
