Train Wreck Project
Caroline J.
Data Analyst | Business Intelligence | I help companies drive data informed decision making | Remote
Introduction
A family friend and her husband were recently involved in a train accident while traveling through Missouri. The train derailed and tumbled down a hillside! They both sustained significant injuries and had to be airlifted to a nearby medical facility. Thankfully, they are now completely healed. But this unanticipated incident was traumatic and got me thinking about the safety of trains.
Working with another data analyst, I used a data set found online to uncover some important information about train accidents in the U.S. over a span of forty-six years.
Data
The original data set can be found on Kaggle here: 1975-2021 Train Wreck Dataset.
Initially, the Excel file contained 239,488 rows and 141 columns. We first worked together to determine what questions we wanted to answer using this data. Then, we decided which columns to keep and which to delete, based on necessity. We also did some pretty extensive data cleaning in Excel to standardize the formatting of all of the columns, decide how to deal with null values, and weed out obvious errors and outliers. It was honestly the dirtiest data set either of us has used, so it was a great opportunity to filter row by row and collaboratively decide how to handle these issues. After data cleaning in Excel, we were left with 63 columns and 235,680 rows.
Aggregating the Data
To access all of the SQL queries we wrote for this project, please reference our shared GitHub repository here: Train Wreck GitHub Repository.
To begin, we identified the 20 nearest stations with the most train collisions, using this query:
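The actual query lives in the repository linked above. As a rough illustration only, a top-20 count of this kind could look like the sketch below. The table name `collisions`, the column `nearest_station`, and the sample rows are all assumptions made up for this example, not the real schema or data.

```python
import sqlite3

# Hypothetical schema: the table and column names are assumptions,
# not the names used in the actual project.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collisions (incident_id INTEGER, nearest_station TEXT)"
)

# Made-up placeholder rows, only here so the sketch runs end to end.
sample_rows = [
    (1, "STATION A"), (2, "STATION A"), (3, "STATION B"),
    (4, "STATION A"), (5, "STATION C"), (6, "STATION B"),
]
conn.executemany("INSERT INTO collisions VALUES (?, ?)", sample_rows)

# Count collisions per nearest station and keep the 20 most frequent.
# The secondary sort on the station name makes ties deterministic.
TOP_STATIONS_SQL = """
    SELECT nearest_station, COUNT(*) AS collision_count
    FROM collisions
    WHERE nearest_station IS NOT NULL
    GROUP BY nearest_station
    ORDER BY collision_count DESC, nearest_station ASC
    LIMIT 20
"""

top_stations = conn.execute(TOP_STATIONS_SQL).fetchall()
for station, collision_count in top_stations:
    print(station, collision_count)
```

The same GROUP BY and COUNT pattern would apply to the vehicle type, highway user position, and weather questions that follow, with a different column in place of `nearest_station`.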
Then, we were curious about which types of vehicles were most often involved in the train wrecks. We wrote this query to investigate this.
Next, we looked into the most frequent position of the highway user when the collision occurred using this query.
Afterwards, we wanted to query what kinds of weather most commonly occur during collisions involving trains.
Then, we investigated the number of train collisions across time from 1975 to 2021 using the following query.
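As a hedged sketch of how a per-year count could be written (again, the real query is in the repository), the example below assumes a hypothetical `collisions` table with an `incident_date` column stored as ISO-formatted text. Both names and the sample dates are invented for illustration.

```python
import sqlite3

# Hypothetical schema: incident_date as ISO text (YYYY-MM-DD) is an
# assumption; the real data set may store the year in its own column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collisions (incident_id INTEGER, incident_date TEXT)"
)

# Made-up placeholder rows, only here so the sketch runs end to end.
sample_rows = [
    (1, "1975-03-14"), (2, "1975-11-02"),
    (3, "1976-06-30"), (4, "2021-01-09"),
]
conn.executemany("INSERT INTO collisions VALUES (?, ?)", sample_rows)

# Extract the year from each date, then count collisions per year.
YEARLY_SQL = """
    SELECT CAST(strftime('%Y', incident_date) AS INTEGER) AS incident_year,
           COUNT(*) AS collision_count
    FROM collisions
    GROUP BY incident_year
    ORDER BY incident_year
"""

collisions_per_year = conn.execute(YEARLY_SQL).fetchall()
for incident_year, collision_count in collisions_per_year:
    print(incident_year, collision_count)
```

Ordering by year keeps the rows ready to plot as a line chart.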
Finally, we wrote this query to understand how view obstruction impacts train wrecks.
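One way to frame this question is as a share of all collisions per obstruction category. The sketch below is only an illustration under assumed names: the `collisions` table, the `view_obstruction` column, and the category labels are hypothetical placeholders rather than the values in the actual data set.

```python
import sqlite3

# Hypothetical schema and labels: these are assumptions for the example,
# not the coding used in the real data set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collisions (incident_id INTEGER, view_obstruction TEXT)"
)

# Made-up placeholder rows, only here so the sketch runs end to end.
labels = ["Not obstructed"] * 6 + ["Vegetation", "Standing railroad equipment"]
conn.executemany(
    "INSERT INTO collisions VALUES (?, ?)",
    list(enumerate(labels, start=1)),
)

# Count collisions per category and express each count as a percentage
# of all rows, so categories can be compared as proportions.
OBSTRUCTION_SQL = """
    SELECT view_obstruction,
           COUNT(*) AS collision_count,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM collisions), 1)
               AS pct_of_collisions
    FROM collisions
    GROUP BY view_obstruction
    ORDER BY collision_count DESC, view_obstruction ASC
"""

obstruction_share = conn.execute(OBSTRUCTION_SQL).fetchall()
for category, collision_count, pct in obstruction_share:
    print(category, collision_count, pct)
```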
Data Visualization
First, we created a bar graph to visualize which of the nearest stations had the most frequent train collisions, noticing that Houston was the “winner” by a landslide!
Next, we investigated highway user frequency, in other words, which types of highway users were most frequently involved in the train wrecks.
Then, we turned our attention to discovering what the weather was like during times when train wrecks occurred. We were surprised to find that the weather was mostly clear!
Afterwards, we investigated the most common highway user position, that is, where the highway user was when the collision occurred. Usually, the user was crossing the tracks.
At this point, we wanted to see the trend over time and discovered an initial upward trend in the mid-1970s, followed by a steady decline in train collisions.
Finally, we created a bar graph to visualize the various types of view obstructions. Interestingly, the vast majority of train collisions were not associated with any kind of obstructed view.
Recap & Recommendations
Luckily, my friend recovered from her collapsed lung, broken collarbone and seven broken ribs. Likewise, her husband's small brain bleed resolved. However, not all victims are so lucky, which is why it is important to determine what can be done to further increase the safety of trains and train travel.
To recap our major findings:
Additionally, we discovered that over the span of forty-six years, from 1975 to 2021, 124 souls lost their lives due to train crashes. During that same period, 7,863 individuals were injured in train wrecks. Both of those statistics include passengers and train personnel combined. And while train collisions have steadily declined over time, we still need to find and implement measures to reduce them even further.
In terms of recommendations, we brainstormed some ideas regarding how to make train travel safer and came up with the following thoughts:
Here's the link to explore the full project in Tableau.
Action!
I thank you for reading and welcome your feedback! Please consider following me or connecting on LinkedIn at Carly Jocson. And please keep me in mind for any remote positions as a data analyst!
Experienced Data Analyst | SQL & Python Enthusiast | Power BI Specialist | Strategic Decision Maker #DataAnalysis #SQL #Python #PowerBI #BusinessIntelligence
(1 year ago) It is an interesting dataset that you have worked on, Carly! With so many columns to choose from, it would have been tough to drill down on which ones to keep. The article is well written. Liked the use of different visuals. I wonder what makes the Houston area so prone to accidents. I'm not able to access the GitHub link. I would like to collaborate on a project if anyone is interested.
I help people land their first data job (even with no experience) | Join 10k+ other analysts & get my newsletter! | Host of The Data Career Podcast
(1 year ago) Love the use of the SQL snippets! So easy to read! For the weather, most accidents occurred when conditions were clear. But conditions are probably usually clear? It would be cool if there was data for number of clear days vs. otherwise. That way we could compare proportions instead of absolutes! The line chart was particularly impactful! Loved seeing it. Overall, great project!!
Director of Research | Author of The Road Map to Data Analytics
(1 year ago) Great job, Caroline! Love the thorough write-up!
Data Analyst who loves Excel | SQL | Tableau | I analyze and interpret data so companies have the information and insights they need to make sound business decisions.
(1 year ago) Interesting project, Caroline, especially with train derailments and safety being in the news lately. I like how you used a variety of SQL techniques and Tableau visualizations.