21 Days to Data Challenge : Complete?

21 Days to Data Challenge : Complete?

No alt text provided for this image

Introduction

It didn’t take long in every new role for me to be tasked with creating or managing spreadsheets. While it wasn’t in my job description, I enjoyed the challenge, and others noticed. Though no one crowned me Queen of Excel, my knack for turning whiteboards into dashboards often sparked, “Who made that?!” moments. I strategically positioned myself as the go-to problem solver in data organization and visualization.

Recently, I participated in the 21 Days to Data Challenge to deepen my skills in analytics, a field where I’ve found both passion and purpose. The experience introduced me to tools like OpenRefine, Tableau, and Python’s seaborn library, while reinforcing my SQL and data wrangling capabilities. Avery Smith’s advice stuck with me:

"The best project is the one you finish..."

That quote evolved into a mindset that propelled me to dig deeper into the data and push past technical challenges.

The challenge provided daily lessons, exercises, and almost-daily LinkedIn shares to complete, and it was an incredible experience.

What I Learned

  • Manipulating data frames in Python with pandas.
  • Creating charts using the seaborn library.
  • Writing SQL queries to extract insights.
  • Exploring tools I hadn’t used before, like OpenRefine, Flourish , CSV SQL LIVE, 谷歌 Colab, and Tableau .

Key Analysis Takeaways

  • Crime trends: There’s a significant dip in crime between January 3rd and 4th (New Year’s recovery, anyone?), but overall, January through March sees the highest crime rates of the year.
  • High-risk groups: Women aged 25–44 are at greater risk of being targeted, while perpetrators are mostly men in the same age range.
  • Timing & location: Fridays see the most incidents, and most crimes occur at residences, which aligns with domestic partner-related incidents.

The Process

No alt text provided for this image

Step 1: Define the Problem or Question

Provide data-backed insights to help the Police Commissioner reduce crime in NYC’s “Data District.”

Step 2: Collect & Examine the Data

The subset of New York City's open-source reported crime data can be found here. Data dictionary for further explanations of terms can be found here.

The dataset contained over 109,000 rows of crime reports, including dates, times, locations, and details about victims and suspects.

Step 3: Clean the Data

After importing the dataset into OpenRefine , I took my time to familiarize and understand the data. This helps me to identify errors and pin-point anything that seems illogical.

With a Text Facet selection in the 'VIC_AGE_GROUP' column, the granular view is shown on the left-hand side.

No alt text provided for this image

Several outliers are seen as negative or triple-digit numbers meant to represent the victim's age. I renamed them to fall in the UNKNOWN category. Although they may have been typos, I cannot assume, so I must take the logical step to mark them as UNKNOWN.

No alt text provided for this image

Next, I checked that the dates were properly formatted and, NOT to my surprise, the CMPLNT_TO_DT had a data type of TEXT. Common mistake, simple fix.

No alt text provided for this image

In the column header, I selected drop down > Edit Cells > Common Transforms > To date. This took care of the entire column.

No alt text provided for this image

Now I'm not going to sit here and give screenshots all day because cleaning is a PROCESS....but one more because, C'MON!!!!

Can you see what the problem is? If not, that's what you have me here for! [hint: Look at the years...EEEEK!]

No alt text provided for this image

I wanted to pull the hour of the day from the timestamps the complaints were reported. To do that I imported the data into 谷歌 Sheets [I'm familiar with Microsoft Excel and we're on an adventure here...let's explore].

I added a column to the right of the CMPLNT_FR_DT and labeled it HOUR_OF_DAY.

=LEFT(F2,FIND(":",F2)-1)        
No alt text provided for this image

After splitting and aggregating, this HOUR_OF_DAY column will eventually provide a count for incident occurrences throughout the day for the timeline of the data.

Another interesting insight could potentially come from the day of the week to include determining weekend, so that required two more new columns and formulas.


=WEEKDAY(D2)        
No alt text provided for this image

=IF(OR(E2=7,E2=1),1,0)         
No alt text provided for this image

Step 4: Analyze & Visualize & Analyze some more...

I started with a racing chart in Flourish to track the daily crime reports per borough. I then wanted to get a visual for the total incident accounts per borough so I threw the stats into a column chart.

No alt text provided for this image
No alt text provided for this image

Next I used SQL queries with the CSV SQL LIVE cloud platform to find specific parameters requested by the Commissioner. The priority was to retrieve the counts of the top 5 borough and offense combinations.

SELECT BORO_NM, OFNS_DESC, COUNT(*) AS incident_number 

FROM "crime_csv" 

GROUP BY BORO_NM, OFNS_DESC 

ORDER BY incident_number DESC 

LIMIT 5        
No alt text provided for this image

Petit Larceny is the most frequent, but the amount in Manhattan is significantly more than Brooklyn who ranked the highest for overall reported incidents. There may be something to look further into there.

Next I imported the dataset into 谷歌 Colab to manipulate and see what I could see with Python and [code snippets above - do not run all at once break up by line to follow my methods].

import?pandas?as?pd

df?=?pd.read_excel('data_district_crime.xlsx')
df.head()
df.describe()
df[['HOUR','BORO_NM']]
df.loc[10:15]
df['BORO_NM'].value_counts()
df[df['BORO_NM']=='BROOKLYN']['HOUR'].count()
df.columns
df.info()
df['MONTH']?=?df['CMPLNT_TO_DT'].dt.month
df.head(10)
df['MONTH'].value_counts()
df['DAY_OF_WEEK']?=?df['CMPLNT_TO_DT'].dt.day_name()
df['DAY_OF_WEEK'].value_counts()        
No alt text provided for this image

Incorporated some charts via the library.

No alt text provided for this image

Finally, I created an interactive dashboard in Tableau . Click image to go to public dash.

No alt text provided for this image

My Tableau Public dashboard includes:

  • Crime reports by borough.
  • Victim demographics (age, race, sex).
  • Offense breakdowns by type.
  • Hourly crime trends.
  • An interactive map with crime details.

Step 5: Interpret and Apply [See Key Analysis Takeaways]

Conclusion

Overall I learned a plethora of new skills and platforms to keep tinkering with. I met a lot of great people that I now feel have formed a tiny data community to hold one another accountable going forward.

The experience pushed me and guided me to new things I may or may not have learned on my own. It was nice to have a a motivated and knowledgeable guide.

Thank you, Avery Smith ! Also, shout out to Catherine | Chris French | Sarah Rajani | Ashley Zacharias | Kimberly Ouimette | James Charest ! We did it!

Hillary Ruby Lani Kisser

Data Analyst & Visualization Engineer - Insights might be difficult to get from data, but I help make them clear and actionable.

2 年

Great work Angela

回复
Kimberly Ouimette

Research Analyst | Master of Science in Data Analytics and Policy

2 年

Awesome work! Love how in-depth you went!????

I love how you organized this! Awesome job!

Rafael Rivera

AWS Cloud Engineer - Containers | DevOps | Kubernetes | 2x AWS Certified | 2x CompTIA | DoD IAM-I | Army Veteran |

2 年

Awesome project Angela.

回复
Chris French

I run a data analytics learning platform geared towards unappreciated and underpaid professionals. Give DataFrenchy a look!

2 年

The details and organization is amazing!! Fantastic job Angela!

要查看或添加评论,请登录

社区洞察

其他会员也浏览了