CASE STUDY: Exposing Accident Risks Using Open Data and Excel

CASE STUDY: Exposing Accident Risks Using Open Data and Excel

When I kicked off my career in marketing analytics about 10 years ago, I thought I was good at Excel. Hell, I thought I was great.

Like most users, I treated Excel as a spreadsheet tool with a few analytics and visualization functions built in -- one of the key reasons why people downplay its capabilities. It took a few years, but once I got a taste of what the real MVPs were doing, I couldn't get enough. Now I'm consulting full-time, teaching advanced courses with Lynda.com and Udemy, and realizing that many of Excel's perceived limitations are actually user limitations, propagated by those who lack either the skills or the imagination to think beyond rows, columns, and primitive pie charts.

With the right tools and a little creativity, you'd be amazed at the type of analysis and insight that Excel can deliver -- custom visualizations, geo-spatial maps, predictive analysis, real-time data, and more. In this post, I'll walk through a real-world case study that helps to showcase some of these powerful, yet lesser-known capabilities.

THE CASE:

In late 2014, Open Data Nation issued a challenge: analyze traffic accident records from the city of Cambridge to help inform policy through data-driven insights.

The data set contained records of every accident in the city from 2010-2013, with timestamps, objects involved, and exact location coordinates:

(Full data set available for download here)

In true "open source" style, there were no required tools or software, no process guidelines, and no restrictions in terms of delivery. In other words, anything from advanced econometrics to interpretive dance were fair game.

I opted for Excel as my weapon of choice, and chose to kick things off with some data exploration and good 'ole descriptive analytics. In this case, my objective would be to understand when, where, and ideally why accidents were most likely to take place:

WHEN WERE ACCIDENTS HAPPENING?

This was the simplest piece to answer, but an important one nonetheless. Using stats functions, data validation, and custom cell formatting, I could build out a dynamic heat map to visualize accident rates by hour of day (x-axis) and day of week (y-axis), with the option to drill down by season or accident type:

BOOM. Instant insight, without showing a single data point. Clearly, auto accidents tend to happen on weekdays during rush hour, particularly on Fridays.

WHERE WERE ACCIDENTS MOST COMMON?

When it comes to geospatial maps (or chloropleths if you're fancy), many analytics junkies ditch Excel in favor of online tools or shinier data viz platforms like Tableau. But here's the thing: with the Power Map plug-in (available in most recent versions of Excel), you can drop in raw geo data and create any type of map your heart desires. Animated heat maps? Yup. Zip Code-level data? You know it. Real-time satellite video feeds? Of course not, that's ridiculous.

In this case, I used latitude/longitude data to build out geospatial heat maps for each major accident type (auto, bicycle, pedestrian). Not only did this clearly show the hot spots where accident rates are highest, but it also exposed some insightful -- and otherwise invisible -- trends.

Bottom line? Avoid Prospect St and the intersection of Mass Ave and Vassar (unless you like to live dangerously).

WHY DID ACCIDENTS OCCUR?

There's a big difference between understanding which factors correlate with accidents, and which factors cause them. The former is relatively simple, and the latter is typically derived through statistical modeling (and much larger sample sizes). However, despite the pint-sized data set at hand, I had enough inputs at hand to generate some pretty compelling hypotheses.

For instance, it's clear that accident frequency is a function of traffic volume, which in turn is a function of when people tend to be on the road and which routes they tend to take (which could be validated using both heat maps above). I also had some pretty strong evidence that proximity to congested areas like college campuses (MIT) and tourist attractions (Harvard Square) tend to correlate with higher accident frequencies.

But one element was glaringly absent from the data: weather conditions. All else equal, inclement weather is a huge driver behind accident rates -- particularly in New England where old man winter is cruel and heartless fiend.

To account for the impact of weather, I pulled daily precipitation readings from the NOAA, standardized and joined the data using conditional statements, logical operators, and lookup functions, and used basic statistical functions to compare accident risks under a variety of conditions. Turns out that accidents rates were ~10% higher in light rain, ~33% higher in heavy rain, and ~54% higher in heavy snow -- significant and meaningful differences even considering the small sample size.

Lastly, I needed to find a way to capture some of the more qualitative hazards; things like merging lanes, confusing traffic signals, etc. Rather than trying to force a statistical approach, I simply pulled satellite images of the most dangerous locations:

The results were simple, clear, and incredibly insightful; I found intersections with confusing, dangerous merges (see Mass Ave. & Garden St. above), obstructed stop signs, nightmare-inducing rotaries, and a number of other cases which make you wonder why accidents don't happen more often. 

COMMUNICATING THE FINDINGS:

Once I had addressed the what, where, and (to some extent) the why, it was time to focus on communication -- arguably the most important of the 3 stages of exceptional analytics. Rather than compile a static report or traditional readout, I wanted to bring the data to life. I wanted to build something that users could interact with; something that could actually change behaviors and encourage safer travel habits by exposing accident risks in a clear, powerful, and intuitive way. My approach would be threefold:

Step 1) Designing an interactive UX

Contrary to popular belief, Excel can actually serve as a pretty powerful data viz/BI platform. For example, using a few lines of VBA code and some basic charts and formulas, I was able to build an interface that would allow users to filter views by accident type, compare trends, and drill into hotspots on the map to reveal key stats and satellite images:

Step 2) Integrating real-time data for real-time insight

One of the cooler -- yet lesser known -- functions of Excel is the ability to integrate real-time data. In this case I applied formula-based formatting to my heat map, using NOW() and TODAY() functions to highlight the most relevant data. Better yet, I was able to seamlessly integrate real-time weather conditions from Weather Underground using Excel's WEBSERVICE and FILTERXML functions, and plot the expected accident risk on a custom-built gauge chart:

Step 3) Connecting to Google Maps to create personalized alerts

The final phase of the project -- and unquestionably the most challenging -- began with an idea: what if I could create a tool that acts like Google Directions, but allows users to circumvent areas with high accident risk?

Turns out that's a really tough nut to crack, since Google allows users to input waypoints along a route but not areas to explicitly avoid.

That said, what I was able to do is use WEBSERVICE formulas to pull raw turn-by-turn data for any user-generated start/end points directly into Excel, use text formulas to extract individual streets along each route, match those streets against the accident dataset using INDEX and MATCH functions, and assign a relative risk level (low, medium or high) based on the results:

Certainly not the most scientific approach, but a nice opportunity to showcase how relatively simple functions can be combined to create unique and powerful solutions  -- all within the confines of Excel.

NEXT STEPS:

How would you have approached this analysis, and what tools or datasets would you have used? Is there anything that you would have done differently to take this analysis to the next level?

If you'd like to learn how to build the above tools on your own, check out my Excel Workshop series on Lynda.com, including my brand new course: Excel Workshop: Working with Real-Time Data.

For comprehensive, on-demand advanced Excel training, check out my Advanced Formulas & Functions and Advanced Charts & Graphs courses (both 25% off with the code LINKEDIN25).

ubong inokon

Logistics HSE Support at Shell Petroleum Development Company Nigeria Limited

7 年

Hi, Please can someone help me with the accident trend in the oil and gas industry in Nigeria

回复
Kenny McMillan PhD

Sports Performance | Analytics | Technology | Management. gogetfunding.com/raising-money-for-animal-welfare-in-qatar/

8 年

Amazing :-)

回复
Matthew Maloney

Director of Business Development at Sustainable Growth Advisers, LP

8 年

This is wizard level stuff.

要查看或添加评论,请登录

Chris Dutton的更多文章

社区洞察

其他会员也浏览了