Train Wreck Project

Introduction

A family friend and her husband were recently involved in a train accident while traveling through Missouri. The train derailed and tumbled down a hillside! They both sustained significant injuries and had to be airlifted to a nearby medical facility. Thankfully, they are now completely healed. But this unanticipated incident was traumatic and got me thinking about the safety of trains.

Working with another data analyst on a data set we found online, we were able to discover some important information about train accidents in the U.S. over a span of forty-six years.

  • There were a total of 235,680 train collisions between 1975 and 2021.
  • More train wrecks happened nearest the Houston train station than any other station: 1,515 of the total collisions.
  • Autos (cars) were most frequently involved in train wrecks at 140,960, followed by trucks at 44,084.
  • Surprisingly, about two-thirds of the collisions, 155,661, occurred during clear conditions!
  • The most frequent highway user position was “moving over the crossing” at 169,448 total.
  • Another surprising fact was that 217,503 collisions occurred where there was no view obstruction.

Data

The original data set can be found on Kaggle here: 1975-2021 Train Wreck Dataset.

Initially, the Excel file contained 239,488 rows and 141 columns. We first worked together to determine what questions we wanted to answer using this data. Then we decided which columns to keep and which to delete, based on necessity. We also did extensive data cleaning in Excel to standardize the formatting of all of the columns, decide how to deal with null values, and weed out obvious errors and outliers. It was honestly the dirtiest data set either of us has used, so it was a great opportunity to filter row by row and collaboratively decide how to handle these issues. After data cleaning in Excel, we were left with 63 columns and 235,680 rows.

Aggregating the Data

To access all of the SQL queries we wrote for this project, please reference our shared GitHub repository here: Train Wreck GitHub Repository.

To begin, we identified the 20 nearest stations with the most train collisions, using this query:

[Image: query screenshot]
Houston 1,515, Springfield 765, Baton Rouge 738, Columbus 720, Detroit 656, Jackson 651, Memphis 618, Monroe 595, Greenville 595, Seattle 582, Marion 558, Green Bay 542, Beaumont 538, Dallas 532, Birmingham 518, Chicago 507, Lafayette 503, Decatur 493, Richmond 492, Gary 466
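
The original query is only available as an image, so here is a minimal sketch of how a count like this could be written. It runs the SQL through Python's built-in sqlite3 module on a handful of toy rows. The table name `collisions` and the column `nearest_station` are assumptions for illustration, not the project's actual schema, and the same GROUP BY pattern would apply to the vehicle type, user position, weather, and view obstruction counts.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions, not the project's own.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collisions (incident_id INTEGER PRIMARY KEY, nearest_station TEXT)"
)

# Toy rows for illustration only; these are not the real data.
toy_rows = [
    ("Houston",), ("Houston",), ("Houston",),
    ("Springfield",), ("Springfield",),
    ("Detroit",),
    (None,),  # a row with a missing station, which the query skips
]
conn.executemany("INSERT INTO collisions (nearest_station) VALUES (?)", toy_rows)

# Count collisions per nearest station and keep the 20 largest groups.
TOP_STATIONS_SQL = """
    SELECT nearest_station, COUNT(*) AS collision_count
    FROM collisions
    WHERE nearest_station IS NOT NULL
    GROUP BY nearest_station
    ORDER BY collision_count DESC, nearest_station ASC
    LIMIT 20
"""

top_stations = conn.execute(TOP_STATIONS_SQL).fetchall()
for station, count in top_stations:
    print(f"{station}: {count}")
```

The second sort key (`nearest_station ASC`) only makes the order of ties predictable, such as Monroe and Greenville at 595 each in the results above.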

Then, we were curious about which types of vehicles were most often involved in the train wrecks. We wrote this query to investigate this.

[Image: query screenshot]
auto 140,960, truck 44,084, truck-trailer 24,006, pick-up truck 8,714, other 6,451, pedestrian 4,827, other motor vehicle 2,615, van 2,197, bus 371, school bus 172

Next, we looked into the most frequent position of the highway user when the collision occurred using this query.

[Image: query screenshot]
moving over crossing 169,448, stopped on crossing 40,223, stalled or stuck 24,723, trapped by traffic 1,154, blocked by gates 132

Afterwards, we wanted to query what kinds of weather most commonly occur during collisions involving trains.

[Image: query screenshot]
clear 155,661, cloudy 49,967, rain 18,548, snow 6,866, fog 4,034, sleet 604
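
To put these counts in proportion, the sketch below converts the reported weather totals into percentages of all collisions. The counts are copied from the results above. The function name `shares` is just a helper made up for this illustration.

```python
# Weather counts as reported in the query results above.
weather_counts = {
    "clear": 155_661,
    "cloudy": 49_967,
    "rain": 18_548,
    "snow": 6_866,
    "fog": 4_034,
    "sleet": 604,
}

def shares(counts):
    """Return each category's percentage of the overall total."""
    overall = sum(counts.values())
    return {name: 100 * n / overall for name, n in counts.items()}

total = sum(weather_counts.values())  # matches the 235,680 cleaned rows
weather_shares = shares(weather_counts)

for name, pct in weather_shares.items():
    print(f"{name}: {pct:.1f}%")
# clear: 66.0%
# cloudy: 21.2%
# rain: 7.9%
# snow: 2.9%
# fog: 1.7%
# sleet: 0.3%
```

These shares describe the collisions only. They do not account for how often each kind of weather occurs, so they cannot say on their own whether clear weather is riskier.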

Then, we investigated the number of train collisions across time from 1975 to 2021 using the following query.

[Image: query screenshot]
Overall downward trend, from 11,712 collisions in 1975 to 324 in 2021.
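
A yearly count like this could be produced along the following lines. This is a sketch on toy rows, again run through Python's sqlite3 module, and it assumes a hypothetical `incident_year` column. The real data may store a full date instead. The `percent_change` helper is also an illustration, applied here to the two endpoint figures reported above.

```python
import sqlite3

# Hypothetical schema: the incident_year column is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collisions (incident_year INTEGER)")

# Toy rows for illustration only; these are not the real data.
conn.executemany(
    "INSERT INTO collisions (incident_year) VALUES (?)",
    [(1975,), (1975,), (1975,), (1976,), (1976,), (2021,)],
)

# One row per year, in chronological order, ready for a line chart.
yearly = conn.execute(
    """
    SELECT incident_year, COUNT(*) AS collision_count
    FROM collisions
    GROUP BY incident_year
    ORDER BY incident_year
    """
).fetchall()
print(yearly)

def percent_change(first, last):
    """Percentage change from the first value to the last."""
    return 100 * (last - first) / first

# Endpoints as reported above: 11,712 collisions in 1975 and 324 in 2021.
decline = percent_change(11_712, 324)
print(f"{decline:.1f}%")  # -97.2%
```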

Finally, we wrote this query to understand how view obstruction impacts train wrecks.

[Image: query screenshot]
not obstructed 217,503, permanent structure 6,094, vegetation 4,259, other 2,167, topography 1,908, standing railroad equipment 1,705, passing train 986, highway vehicles 829

Data Visualization

First, we created a bar graph to visualize which of the nearest stations had the most frequent train collisions, noticing that Houston was the “winner” by a landslide!

[Image: bar graph]
Houston has almost twice as many collisions as any other nearest station.

Next, we investigated highway user frequency: in other words, which types of highway users were most frequently involved in the train wrecks.

[Image: chart]
Autos comprise the most frequent user type, followed by trucks.

Then, we turned our attention to discovering what the weather was like during times when train wrecks occurred. We were surprised to find that the weather was mostly clear!

[Image: chart]
Bad weather did not play as significant a role in collisions as we anticipated.

Afterwards, we investigated the most common highway user position, that is, where the highway user was when the collision occurred. Usually, the user was moving over the crossing.

[Image: chart]
"Moving over crossing" is by far the most common position.

At this point, we wanted to see the trend over time and discovered that there was an initial upward trend in the mid-1970s, followed by a steady decline in train collisions.

[Image: line chart]
Fortunately, train collisions are on the decline.

Finally, we created a bar graph to visualize the various types of view obstructions. Interestingly, the vast majority of train collisions are not associated with any kind of obstructed view.

[Image: bar graph]
Interestingly, most incidents occur in the absence of an obstructed view.

Recap & Recommendations

Luckily, my friend recovered from her collapsed lung, broken collarbone and seven broken ribs. Her husband's small brain bleed also resolved. However, not all victims are so lucky, which is why it is important to determine what can be done to further increase the safety of trains and train travel.

To recap our major findings:

  • Between 1975 and 2021, there were 235,680 train crashes overall.
  • More collisions occurred close to the Houston railway station than any other station: 1,515 in total.
  • At 140,960, automobiles (cars) were most commonly involved in railroad accidents, followed by trucks at 44,084.
  • Surprisingly, 155,661 collisions, about two-thirds of the total, took place in clear weather!
  • With 169,448 total instances, "moving over the crossing" was the most common highway user position.
  • The fact that 217,503 crashes happened without a view obstruction was another unexpected finding.

Additionally, we discovered that over the span of the last forty-six years, 124 souls lost their lives in train crashes. During that same period, from 1975 to 2021, 7,863 individuals were injured in train wrecks. Both of those statistics combine passengers and train personnel. And while train collisions have steadily declined over time, we still need to find and implement measures to reduce them even further.

In terms of recommendations, we brainstormed some ideas regarding how to make train travel safer and came up with the following thoughts:

  • Focus on investigating ways to reduce collisions near the top 20 nearest stations, starting with Houston, which has the greatest number of train collisions by far.
  • Consider running a train safety campaign targeted at automobile drivers, since autos are the type of vehicle most frequently involved in a train wreck.
  • Cloudy and rainy weather account for the largest share of collisions in adverse weather, so extra precautions designed specifically for these conditions should be put in place.
  • Whenever possible, reduce the chances that vehicles and pedestrians can access the tracks while trains are passing by, since most collisions occur when they are moving over the crossing.
  • Since more than 10,000 train collisions were associated with view obstructions from permanent structures and vegetation combined, investigate whether any of those permanent structures could be relocated, and regularly trim the vegetation so that it does not obstruct the view.

Here's the link to explore the full project in Tableau.

Action!

Thank you for reading. I welcome your feedback! Please consider following me or connecting on LinkedIn at Carly Jocson. And please keep me in mind for any remote positions as a data analyst!

Mythily R.

1 yr

It is an interesting dataset that you have worked on, Carly! With so many columns to choose from, it would have been tough to drill down on which ones to keep. The article is well written. Liked the use of different visuals. I wonder what makes the Houston area so prone to accidents. I’m not able to access the GitHub link. I would like to collaborate on a project if anyone is interested.

Avery Smith

1 yr

Love the use of the SQL snippets! So easy to read! For the weather, most accidents occurred when conditions were clear. But conditions are probably usually clear? It would be cool if there was data for number of clear days vs otherwise. That way we could compare proportions instead of absolutes! The line chart was particularly impactful! Loved seeing it. Overall, great project!!

Jonathan Smith

1 yr

Great job, Caroline! Love the thorough write up!

Christy Ehlert-Mackie, MBA, MSBA

1 yr

Interesting project, Caroline, especially with train derailments and safety being in the news lately. I like how you used a variety of SQL techniques and Tableau visualizations.
