Brief Analysis of specific questions on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database
Ramón De L'Hotellerie de Fallois
Mobile App CMS & ConnectedCommerce Specialist
Ramon De L’Hotellerie - Reproducible Research
July 20, 2020
Synopsis
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Data Processing
The data for this assignment comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:
- Storm Data [47Mb]
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
- National Weather Service Storm Data Documentation
- National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database, there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
I start by loading the necessary libraries, downloading the data to be analyzed, and checking the first lines.
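The code for this step is not reproduced in the text, so the following is only a minimal sketch in Python of how a bzip2-compressed CSV file could be read and its first records inspected. The function names `read_storm_rows` and `head` are hypothetical, and the `latin-1` encoding is an assumption about the file, not something stated in the documentation.

```python
import bz2
import csv
from itertools import islice


def read_storm_rows(path):
    """Yield one dict per record from a bzip2-compressed CSV file."""
    # bz2.open in text mode decompresses on the fly; newline="" is what the
    # csv module expects so that quoted line breaks are handled correctly.
    # Assumption: latin-1 is used because it can decode any byte sequence.
    with bz2.open(path, mode="rt", newline="", encoding="latin-1") as handle:
        yield from csv.DictReader(handle)


def head(rows, n=6):
    """Return the first n records, like looking at the top of a table."""
    return list(islice(rows, n))
```

Calling `head(read_storm_rows(path))` on a local copy of the file would return the first six records as dictionaries keyed by column name, without loading the whole file into memory.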
A quick look at the top of the data frame shows that the data set has 902297 rows and 37 columns.
The analysis uses a new data frame built from the following columns:
- EVTYPE
- FATALITIES
- INJURIES
- PROPDMG
- PROPDMGEXP
- CROPDMG
- CROPDMGEXP
- STATE
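As an illustration only, the column selection above could be sketched in Python as follows, assuming each record is a dictionary keyed by column name. The helper name `select_columns` is hypothetical.

```python
# The eight columns listed above.
COLUMNS = ("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
           "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "STATE")


def select_columns(rows, columns=COLUMNS):
    """Keep only the listed columns of each record (one dict per row)."""
    return [{name: row[name] for name in columns} for row in rows]
```

Dropping the other 29 columns early keeps the working data small, which matters for a table with roughly 900,000 rows.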
Finding, across the United States, the most harmful events with respect to population health
To find the most harmful events, the total fatalities and the total injuries are calculated for each event type. First, I calculate the ten event types that caused the most fatalities, along with the total fatalities per event type.
Then, I calculate the ten event types that caused the most injuries, along with the total injuries per event type.
Finally, I add fatalities and injuries together per event type, which gives the order of the most harmful events across the United States.
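The three steps just described (totals per event type, the top ten, and the combined casualty count) could be sketched in Python roughly as below. This is not the original code of the analysis; the function names are hypothetical, and the sample records are made up purely to show the calls.

```python
from collections import Counter


def totals_by_event(rows, column):
    """Sum a numeric column for each EVTYPE value."""
    totals = Counter()
    for row in rows:
        totals[row["EVTYPE"]] += float(row[column])
    return totals


def top_events(totals, n=10):
    """Return the n event types with the largest totals, largest first."""
    return totals.most_common(n)


# Made-up records for illustration; these are not NOAA figures.
rows = [
    {"EVTYPE": "EVENT A", "FATALITIES": "2", "INJURIES": "10"},
    {"EVTYPE": "EVENT B", "FATALITIES": "5", "INJURIES": "1"},
    {"EVTYPE": "EVENT A", "FATALITIES": "1", "INJURIES": "4"},
]
fatalities = totals_by_event(rows, "FATALITIES")
injuries = totals_by_event(rows, "INJURIES")
# Counter addition sums per event type and drops entries whose sum is zero.
casualties = fatalities + injuries
```

Here `top_events(casualties)` ranks event types by fatalities plus injuries, which is the quantity the final step relies on.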
Let’s now visualize how large the difference is between the most harmful events across the United States.
The conclusion is that the tornado is, by far, the most harmful event type in terms of casualties across the United States, with 96,979 total casualties.
Finding, across the United States, the types of events with the greatest economic consequences
To find the types of events with the greatest economic consequences, I use the property damage and crop damage figures.
Before any calculation, some data preparation is needed. Each damage figure is stored in two variables, a numeric amount (“-DMG”) and an alphanumeric unit code (“-DMGEXP”), so the unit codes have to be converted into numeric multipliers before the amounts can be summed. Once the amounts are numeric dollar values, a new variable is created that adds crop and property damage together. Let’s start by checking the types of values.
I found different unit codes, which need to be converted into dollars:
- K / k: thousands of dollars
- M / m: millions of dollars
- B / b: billions of dollars
First, I convert the Property Damage expenses into dollars and print the 5 most current values:
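A minimal sketch of the unit conversion in Python, under one labeled assumption: any code other than K, M or B (for example a blank) leaves the amount unscaled. The name `to_dollars` is hypothetical, and the text does not say how such codes were actually treated.

```python
# Multipliers for the unit codes listed above; lookup is case-insensitive.
MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9}


def to_dollars(amount, unit):
    """Convert an amount and its unit code into dollars."""
    # Assumption: codes outside K/M/B (blank, digits, symbols) mean "no scaling".
    factor = MULTIPLIERS.get(unit.strip().upper(), 1)
    return float(amount) * factor
```

For instance, an amount of 2.5 with unit code `K` becomes 2500.0 dollars. The same function applies unchanged to the crop damage columns.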
Now that we have the list of the most harmful events in terms of property and crop damage, in decreasing order, let’s visualize it.
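One way such a decreasing list could be built is sketched below in Python: property and crop damage are converted to dollars, added per event type, and sorted. The function names are hypothetical, and treating unknown unit codes as "no scaling" is an assumption rather than the method confirmed by the text.

```python
from collections import defaultdict

MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9}


def dollars(amount, unit):
    """Scale an amount by its unit code; unknown codes are left unscaled."""
    return float(amount) * MULTIPLIERS.get(unit.strip().upper(), 1)


def damage_by_event(rows):
    """Total property plus crop damage in dollars for each event type."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["EVTYPE"]] += (
            dollars(row["PROPDMG"], row["PROPDMGEXP"])
            + dollars(row["CROPDMG"], row["CROPDMGEXP"])
        )
    # Largest total first, matching a listing in decreasing order.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

The sorted list of `(event type, dollars)` pairs can then be passed directly to a bar chart, with the first few entries giving the events of greatest economic impact.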
Results
According to my analysis of the given data, the answers to the two research questions are:
- Across the United States, which types of events are most harmful with respect to population health?
  - The 5 most harmful events with respect to population health are Tornado, Excessive Heat, TSTM Wind, Flood and Lightning, in this order.
- Across the United States, which types of events have the greatest economic consequences?
  - The 5 events causing the greatest economic damage are Flood, Hurricane/Typhoon, Tornado, Storm Surge and Hail, in this order, for both Property and Crop Damage.
In other words, according to these figures, tornadoes and floods appear to be the most destructive events across the United States, in terms of both population health and economic damage.