My Data Journey: 21 Days to Data

Introduction

Data is all around us. We use it regularly in our lives, from monitoring our health on smartwatches to “Googling” a word or phrase. It’s as important in our everyday lives as it is in every business and industry. My background is in mathematics, and I have spent the last 10 years as an educator, but I have found my passion at the intersection of the two: data analytics. I am continuously looking for ways to grow my skills by learning and applying all that I can with data, personally and professionally.

I joined Avery Smith’s #21DaystoDataChallenge to continue upskilling in cleaning, analyzing, and visualizing data effectively. Throughout this challenge I learned about data jargon, careers in data, steps for cleaning raw data, descriptive analytics and statistics, visualization (including maps and dashboards), using Tableau, SQL, and Python to wrangle and visualize data, and how to present insights to stakeholders. The challenge helped me take steps each day to analyze a real data set and create a project to share my insights about crime in New York City.


The Mission

THE PROBLEM

The New York City Police Commissioner was concerned about the crime happening in the city and desired insights to form an actionable, data-driven plan to combat the city’s crime. He wanted to learn more about the types of crime and when and where they are happening.

KEY TAKEAWAYS FROM THE ANALYSIS BELOW

  • Most crime incidents happened between the hours of 3:00 and 6:00 p.m., while 5:00 a.m. had the fewest. Crimes peaked on Fridays and were lowest on Sundays.
  • Misdemeanors were the most frequently reported crimes, led by petit larceny.
  • While Brooklyn showed the highest number of crime occurrences, the Bronx actually had the most crimes per person.
  • The highest percentage of suspects were males aged 25-44, while the highest percentage of victims were females in the same age group.
  • The majority of crimes occurred at the victim’s residence (whether an apartment, house, or community home) or on public streets.

THE DATA

The original New York Crime Data in its entirety can be found here.

A subset of this New York Crime Data can be found on Kaggle.

This data is open source and credible, as it is provided directly by the NYPD. It is also current through 2020; however, when we used this data it was updated only through 2018. There are over 109,000 rows of data with about 35 columns of information including types of crime reported, time and location of the incident, suspect and victim details, and more.


The Process

We began cleaning the data using Google Sheets and OpenRefine. While data cleaning can be tedious and long, it is essential for ensuring data integrity and useful information for analysis. I primarily used the filtering feature in Google Sheets to clean this data set, focusing on removing duplicates, checking for outliers that didn’t make sense, and fixing mis-entries.

I found and fixed outliers that didn't make sense in the Complaint Date column (some years were clearly inaccurate) and in the Victim’s Age column (which contained negative ages). I also cleaned the data by combining like groups, such as some of the crime location categories listed ("BUS (NYC TRANSIT)", "BUS(OTHER)", and "BUS STOP" all became "BUS").

[Image: cleaning the data in Google Sheets]
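The same cleaning steps can be sketched in pandas. This is a minimal illustration only: the column names and rows below are made up, not the actual NYPD export schema.

```python
import pandas as pd

# Tiny stand-in for the raw complaint data; these column names and rows
# are illustrative assumptions, not the real NYPD export.
raw = pd.DataFrame({
    "COMPLAINT_DATE": ["01/05/2018", "01/05/2018", "03/09/2017", "02/14/1015"],
    "VICTIM_AGE": [34, 34, -5, 22],
    "PREMISES": ["BUS (NYC TRANSIT)", "BUS (NYC TRANSIT)", "BUS STOP", "STREET"],
})

clean = raw.drop_duplicates()                               # same complaint entered twice
clean = clean[clean["VICTIM_AGE"].between(0, 120)].copy()   # drop impossible ages

# Parse dates; a year like 1015 is clearly a mis-entry, so coerce it to NaT and drop it.
clean["COMPLAINT_DATE"] = pd.to_datetime(
    clean["COMPLAINT_DATE"], format="%m/%d/%Y", errors="coerce"
)
clean = clean.dropna(subset=["COMPLAINT_DATE"])

# Combine like categories into a single label.
clean["PREMISES"] = clean["PREMISES"].replace(
    {"BUS (NYC TRANSIT)": "BUS", "BUS(OTHER)": "BUS", "BUS STOP": "BUS"}
)
```

Of the four sample rows, only the valid, de-duplicated one survives, with its bus category collapsed to "BUS".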

Not only did I fix data in some columns, but I also thought it useful to add a few columns to the data set including “Hour” and “Day of the Week,” using the formulas below, knowing this would help me better understand when these crimes were occurring.

[Images: spreadsheet formulas for the “Hour” and “Day of the Week” columns]
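For readers following along in Python rather than Google Sheets, the two derived columns can be built the same way with pandas datetime accessors; "COMPLAINT_TIME" is an assumed column name.

```python
import pandas as pd

# Assumed column name; the original used Google Sheets formulas, and this
# is the pandas equivalent for deriving "Hour" and "Day of the Week".
df = pd.DataFrame(
    {"COMPLAINT_TIME": pd.to_datetime(["2018-06-01 17:30", "2018-06-03 05:10"])}
)

df["Hour"] = df["COMPLAINT_TIME"].dt.hour
df["Day of the Week"] = df["COMPLAINT_TIME"].dt.day_name()
```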

Finally, I wanted to organize data by borough to get a better picture of crime information for each area. I calculated the crime count for each borough.

[Image: crime count per borough]
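The per-borough count is essentially a COUNTIF per borough; a pandas sketch (with an assumed "BORO_NM" column and made-up rows) looks like this:

```python
import pandas as pd

# Hypothetical mini data set; "BORO_NM" is an assumed column name.
df = pd.DataFrame(
    {"BORO_NM": ["BROOKLYN", "BROOKLYN", "BRONX", "QUEENS", "BROOKLYN"]}
)

# One number per borough, like a COUNTIF per borough in Google Sheets.
crime_counts = df["BORO_NM"].value_counts()
```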

I also calculated the crime incidents for each date available in the most recent year (2018) for each borough, which will be helpful in seeing changes in crime over time.

[Image: daily crime counts per borough for 2018]
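A date-by-borough table like the one described can be sketched with a cross-tabulation; again the column names and rows are illustrative assumptions.

```python
import pandas as pd

# Illustrative rows; column names are assumptions.
df = pd.DataFrame({
    "DATE": ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02"],
    "BORO_NM": ["BRONX", "BROOKLYN", "BROOKLYN", "BROOKLYN"],
})

# One row per date, one column per borough; each cell is the incident count.
daily = pd.crosstab(df["DATE"], df["BORO_NM"])
```

Combinations with no incidents come out as zeros, which keeps the over-time comparison honest.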


Analysis

Once the data was sufficiently cleaned, we were asked to answer some questions from the NYC Police Commissioner. I began by using Flourish to create a bar chart comparing the number of crimes that occurred in each borough. This graph shows that Brooklyn had the highest number of reported crimes while Staten Island had the lowest.

[Image: bar chart of reported crimes per borough]

Thinking carefully about what could affect crime reporting in each borough, I was also curious to see how the number of crimes reported looked when accounting for population size. The combination bar and line chart below, made in Flourish, shows that the Bronx actually had the highest number of crimes per person while Queens had the fewest for its population size.

[Image: combination chart of crimes per person by borough]
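The per-person comparison is just each borough's raw count divided by its population. A sketch with entirely hypothetical figures (not the real counts or census numbers) shows how a smaller borough can top the per-person ranking:

```python
import pandas as pd

# Hypothetical crime counts and populations, purely to illustrate the
# per-person calculation; these are NOT the real figures.
stats = pd.DataFrame(
    {"crimes": [30_000, 25_000], "population": [2_600_000, 1_400_000]},
    index=["BROOKLYN", "BRONX"],
)

# Crimes per person: raw counts divided by each borough's population.
stats["per_person"] = stats["crimes"] / stats["population"]
```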

I also created a racing line chart in Flourish that you can watch play out here; an image of it is below. It shows changes in reported crime over time for each borough.

[Image: racing line chart of reported crime over time by borough]


Digging Deeper in Analysis

After gaining initial insights from these visualizations and thinking back to the problem the Police Commissioner is looking to solve, we were tasked with taking a deeper look into the analysis using SQL and Python.

Using the cloud-based CSV SQL LIVE website, I uploaded the cleaned data set here to run queries, such as the top offenses occurring in each borough, by count and type.

[Image: SQL query results for top offenses per borough]

I also wanted to compare the highest and lowest reported hours of the day. I discovered that 3:00-6:00 p.m. (shown as 15, 16, 17, and 18 in the image below) has the most reported crimes while 5:00 a.m. has the least reported.

[Images: SQL query results for the highest and lowest reported hours]
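The two queries above can be sketched with Python's built-in sqlite3 as a stand-in for CSV SQL LIVE. The table and column names are assumptions, and the rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for CSV SQL LIVE; schema and
# rows are illustrative assumptions, not the real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crimes (boro TEXT, offense TEXT, hour INTEGER)")
conn.executemany(
    "INSERT INTO crimes VALUES (?, ?, ?)",
    [
        ("BROOKLYN", "PETIT LARCENY", 17),
        ("BROOKLYN", "PETIT LARCENY", 16),
        ("BROOKLYN", "HARRASSMENT 2", 5),
        ("BRONX", "PETIT LARCENY", 15),
        ("QUEENS", "FELONY ASSAULT", 17),
    ],
)

# Top offenses in each borough, by count and type.
top_offenses = conn.execute(
    """
    SELECT boro, offense, COUNT(*) AS n
    FROM crimes
    GROUP BY boro, offense
    ORDER BY boro, n DESC
    """
).fetchall()

# The hour of the day with the most reported crimes.
busiest_hour = conn.execute(
    """
    SELECT hour, COUNT(*) AS n
    FROM crimes
    GROUP BY hour
    ORDER BY n DESC
    LIMIT 1
    """
).fetchone()
```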

Moving into Google Colab to analyze with Python, I discovered more about our data set.

First I imported the pandas library and uploaded the cleaned data set. I used this to read the file and view the data set. I determined which day of the week had the highest reported crimes, which turned out to be Friday, using the code below.

[Image: Python code for finding the day with the most reported crimes]
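The day-of-week lookup can be done with a value count over the column added during cleaning; the rows here are made up for illustration.

```python
import pandas as pd

# "Day of the Week" matches the column derived earlier; rows are made up.
df = pd.DataFrame(
    {"Day of the Week": ["Friday", "Friday", "Sunday", "Monday", "Friday"]}
)

# The day with the most reported crimes.
busiest_day = df["Day of the Week"].value_counts().idxmax()
```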

I also imported the seaborn library to create clean visualizations of the data for the Police Commissioner: a comparison of the number of crimes reported per borough, as well as a map showing where each type of crime occurred.

[Images: seaborn bar chart of crimes per borough and map of crime types]
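A minimal seaborn sketch of the per-borough bar chart might look like the following; "BORO_NM" and the rows are assumptions, and the headless backend just lets the script run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import pandas as pd
import seaborn as sns

# Small stand-in data set; "BORO_NM" is an assumed column name.
df = pd.DataFrame(
    {"BORO_NM": ["BROOKLYN", "BROOKLYN", "BRONX", "QUEENS", "BROOKLYN"]}
)

# One bar per borough, counting reported crimes.
ax = sns.countplot(data=df, x="BORO_NM")
ax.set(xlabel="Borough", ylabel="Reported crimes")
```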

Finally, using Tableau, I created a dashboard to provide a one-stop spot for all of the information the Police Commissioner wished to learn about. It includes:

  • A bar graph of crime incidents per borough.
  • A map with the locations of different types of crime throughout the city.
  • A pie chart showing victim ages as percentages of the total reported.
  • Graphs comparing suspect and victim ages and genders.
  • A tree map comparing the number of incidents reported per location.

[Image: Tableau dashboard]

To view and interact with the dashboard, please click here.


Conclusion

As a result of this analysis, I would advise the Police Commissioner to add more police officers in Brooklyn and the Bronx, particularly around residences and public streets during the late afternoon, to help combat crime in New York City.

Prior to the #21DaystoDataChallenge, I had not posted on LinkedIn nor shared any of my experiences with data analysis. I am so proud to have grown my network and myself professionally with data analytics!

DURING THIS CHALLENGE I LEARNED

  • I truly enjoy working with data! I feel more confident using Google Sheets, Flourish, Tableau, SQL, and Python.
  • Visualizations, and helping others see my insights through them, are my favorite part of the data analysis process.
  • Clean data is a vital component of accurately analyzing and sharing results without bias or misrepresented information.
  • Communicating the story with data and using it to guide decision making is so important!
  • Sharing my learning experiences with others and engaging with their experiences is so valuable!

I am continually working to learn more and expand my data skills. I welcome any suggestions or feedback! Please feel free to connect with me and message me on LinkedIn!

I look forward to sharing more data projects in the future!
