My Data Journey: 21 Days to Data
Ashley Zacharias
Data Analyst @ dentsu || Sharing data insights with Tableau, SQL, Excel
Introduction
Data is all around us. We use it regularly in our lives, from monitoring our health on smartwatches to “Googling” a word or phrase. It’s as important in our everyday lives as it is in every business and industry. My background is in mathematics, and I have spent the last 10 years as an educator, but I have found my passion, and the connection between those two fields, in data analytics. I am continuously looking for ways to grow my skills by learning and applying all that I can with data, personally and professionally.
I joined Avery Smith’s #21DaystoDataChallenge to continue upskilling in cleaning, analyzing, and visualizing data effectively. Throughout this challenge I learned about data jargon, careers in data, steps for cleaning raw data, descriptive analytics and statistics, visualization (including maps and dashboards), using Tableau, SQL, and Python to wrangle and visualize data, and how to present insights to stakeholders. The challenge helped me take steps each day to analyze a real data set and build a project sharing my insights about crime in New York City.
The Mission
THE PROBLEM
The New York City Police Commissioner was concerned about the crime happening in the city and desired insights to form an actionable, data-driven plan to combat the city’s crime. He wanted to learn more about the types of crime and when and where they are happening.
KEY TAKEAWAYS FROM THE ANALYSIS BELOW
THE DATA
The original New York Crime Data in its entirety can be found here.
A subset of this New York Crime Data can be found on Kaggle at the link below.
This data is open source and credible, as it is provided directly by the NYPD. The full data set is maintained through 2020, though at the time of this analysis it was updated only through 2018. There are over 109,000 rows of data with about 35 columns of information, including the types of crime reported, the time and location of each incident, suspect and victim details, and more.
The Process
We began by cleaning the data using Google Sheets and OpenRefine. While data cleaning can be tedious and long, it is essential for ensuring data integrity and useful information for analysis. I primarily used the filtering feature in Google Sheets to clean this data set, focusing on removing duplicates, checking for outliers that didn’t make sense, and fixing any mis-entries.
I found and fixed outliers that didn’t make sense in the Complaint Date column (where some years didn’t all seem accurate) and in the Victim’s Age column (where I found negative ages). I also cleaned the data by combining like groups, such as some of the categorical crime types listed ("BUS (NYC TRANSIT)", "BUS (OTHER)", and "BUS STOP" all became "BUS").
Not only did I fix data in some columns, but I also added a few columns to the data set, including “Hour” and “Day of the Week,” knowing these would help me better understand when the crimes were occurring.
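The article derived these columns with spreadsheet formulas; the same step can be sketched in pandas. The column names `CMPLNT_FR_DT` and `CMPLNT_FR_TM` are assumptions based on the NYPD schema, and the sample rows are illustrative only:

```python
import pandas as pd

# Illustrative sample; the real data set uses the NYPD's complaint
# date/time columns (names assumed here).
df = pd.DataFrame({
    "CMPLNT_FR_DT": ["01/05/2018", "01/06/2018"],
    "CMPLNT_FR_TM": ["15:30:00", "05:10:00"],
})

# Parse date and time together, then derive the two new columns.
dt = pd.to_datetime(df["CMPLNT_FR_DT"] + " " + df["CMPLNT_FR_TM"],
                    format="%m/%d/%Y %H:%M:%S")
df["Hour"] = dt.dt.hour
df["Day of the Week"] = dt.dt.day_name()
print(df[["Hour", "Day of the Week"]])
```

With real data you would also coerce unparseable dates (`errors="coerce"`) so the outliers found earlier surface as `NaT` rather than crashing the parse.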
Finally, I wanted to organize data by borough to get a better picture of crime information for each area. I calculated the crime count for each borough.
I also calculated the crime incidents for each date available in the most recent year (2018) for each borough, which will be helpful in seeing changes in crime over time.
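Both aggregations above reduce to a count and a group-by. A minimal pandas sketch, assuming a `BORO_NM` column and an already-parsed `Date` column (names and values are illustrative):

```python
import pandas as pd

# Tiny illustrative frame; one row per reported crime.
df = pd.DataFrame({
    "BORO_NM": ["BROOKLYN", "BROOKLYN", "BRONX", "QUEENS"],
    "Date": pd.to_datetime(["2018-01-05", "2018-01-05",
                            "2018-01-05", "2018-01-06"]),
})

# Total crime count per borough.
by_boro = df["BORO_NM"].value_counts()

# Incidents per borough per date, for tracking change over time.
by_boro_date = df.groupby(["BORO_NM", "Date"]).size().rename("Incidents")

print(by_boro)
print(by_boro_date)
```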
Analysis
Once the data was sufficiently cleaned, we were asked to answer some questions posed by the NYC Police Commissioner. I began by using Flourish to create a bar chart showing the Police Commissioner a comparison of the number of crimes that occurred in each borough. The chart shows that Brooklyn had the highest number of reported crimes, while Staten Island consistently had the lowest.
Thinking carefully about what could impact crime reporting in each borough, I was also curious to see how the numbers looked when population size was taken into account. The combination bar and line chart below, also made in Flourish, shows that the Bronx actually had the highest number of crimes per person, while Queens had the fewest for its population size.
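The per-capita comparison is a simple normalization: divide each borough's count by its population. A sketch with made-up crime counts and rough, illustrative population figures (not the article's actual numbers):

```python
import pandas as pd

# Illustrative crime counts and approximate borough populations;
# both sets of numbers are placeholders, not the article's data.
crimes = pd.Series({"Brooklyn": 30000, "Bronx": 25000, "Queens": 18000})
population = pd.Series({"Brooklyn": 2_580_000,
                        "Bronx": 1_430_000,
                        "Queens": 2_280_000})

# Crimes per 1,000 residents normalizes raw counts by population size.
per_1000 = (crimes / population * 1000).round(2)
print(per_1000.sort_values(ascending=False))
```

Even with placeholder numbers, the pattern the article describes emerges: a smaller borough with a moderate count (the Bronx) can have the highest per-person rate, while a populous borough (Queens) ranks lowest.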
I also created a Racing Line Chart in Flourish that you can see play out here. You can also view the image below. This shows changes of reported crime over time for each borough.
Digging Deeper in Analysis
After gaining initial insights from these visualizations and thinking back to the problem the Police Commissioner is looking to solve, we were tasked with taking a deeper look into the analysis using SQL and Python.
Using the cloud-based CSV SQL LIVE website, I uploaded the cleaned data set here to run queries, such as the top offenses occurring in each borough, by count and type.
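The shape of that query can be reproduced locally with Python's built-in sqlite3 as a stand-in for the CSV SQL Live upload. Table and column names here are assumptions for illustration:

```python
import sqlite3

# In-memory stand-in for the uploaded CSV; schema is assumed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE crimes (boro TEXT, offense TEXT)")
con.executemany("INSERT INTO crimes VALUES (?, ?)", [
    ("BROOKLYN", "PETIT LARCENY"),
    ("BROOKLYN", "PETIT LARCENY"),
    ("BROOKLYN", "HARRASSMENT 2"),
    ("BRONX", "ASSAULT 3"),
])

# Top offenses in each borough, by count and type.
rows = con.execute("""
    SELECT boro, offense, COUNT(*) AS n
    FROM crimes
    GROUP BY boro, offense
    ORDER BY boro, n DESC
""").fetchall()
for row in rows:
    print(row)
```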
I also wanted to compare the highest and lowest reported hours of the day. I discovered that 3:00-6:00 p.m. (shown as 15, 16, 17, and 18 in the image below) has the most reported crimes while 5:00 a.m. has the least reported.
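The busiest and quietest hours fall out of a frequency count on the derived "Hour" column. A minimal sketch with illustrative values:

```python
import pandas as pd

# Illustrative "Hour" values; the real column has one entry per complaint.
hours = pd.Series([15, 15, 16, 16, 17, 17, 17, 18, 18, 5])

counts = hours.value_counts()
print("busiest hour:", counts.idxmax())   # hour with the most reports
print("quietest hour:", counts.idxmin())  # hour with the fewest reports
```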
Moving into Google Colab to analyze with Python, I discovered more about our data set.
First I imported the pandas library, uploaded the cleaned data set, and read the file in to view the data. I then determined which day of the week had the highest number of reported crimes, which turned out to be Friday.
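That day-of-week question is one line once the "Day of the Week" column exists. A sketch with illustrative values (the real code read the uploaded CSV with `pd.read_csv`):

```python
import pandas as pd

# Illustrative frame standing in for the cleaned data set.
df = pd.DataFrame({"Day of the Week":
                   ["Friday", "Friday", "Monday", "Sunday", "Friday"]})

# Day with the most reported crimes.
busiest_day = df["Day of the Week"].value_counts().idxmax()
print(busiest_day)
```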
I also imported the seaborn library to create clean visualizations of the data for the Police Commissioner, comparing the number of crimes reported per borough, as well as a map showing where each type of crime occurred.
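The borough comparison is a natural fit for seaborn's `countplot`, which draws one bar per category directly from the raw rows. A sketch with an assumed `BORO_NM` column and illustrative data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import pandas as pd
import seaborn as sns

# Illustrative frame; the real one has one row per complaint.
df = pd.DataFrame({"BORO_NM": ["BROOKLYN", "BROOKLYN", "BRONX",
                               "QUEENS", "BRONX", "BROOKLYN"]})

# One bar per borough, counting reported crimes, tallest first.
ax = sns.countplot(data=df, x="BORO_NM",
                   order=df["BORO_NM"].value_counts().index)
ax.set(xlabel="Borough", ylabel="Reported crimes")
ax.figure.savefig("crimes_by_borough.png")
```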
Finally, using Tableau, I created a dashboard to provide a one-stop spot for all of the information the Police Commissioner wished to learn about.
To view and interact with the dashboard, please click here.
Conclusion
As a result of this analysis, I would advise the Police Commissioner to assign more police officers to Brooklyn and the Bronx, particularly around residential homes and streets during the late afternoon, to help combat crime in New York City.
Prior to the #21DaystoDataChallenge, I had not posted on LinkedIn nor shared any of my experiences with data analysis. I am so proud to have grown my network and myself professionally with data analytics!
DURING THIS CHALLENGE I LEARNED
I am continually working to learn more and expand my data skills. I welcome any suggestions or feedback! Please feel free to connect with me and message me on LinkedIn!
I look forward to sharing more data projects in the future!