-
Notifications
You must be signed in to change notification settings - Fork 0
/
Readme.Rmd
195 lines (176 loc) · 10.4 KB
/
Readme.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
---
title: "Finding the most harmful weather disasters"
author: "Reza Hosseini"
date: "March 13, 2016"
output: md_document
---
#Synopsis
Storms and other sever weather events can cause health and economic problems. In this report I explore the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database in order to find weather disasters whit the most health and economic problems.
The results show that Tornado has been the most harmfull event both in the case human fatalities and injuries with more than 5600 fatalities and 90 thousand injuries from 1950 to today. After that with a big gap comes severe heat with more than 3000 death and around 10 thousand injuries.
In the sense of economic consequences, the worse event is flood with more than 140 billion dollars following by Hurricane with 77 and Storm surge with 42 billion dollars.
#Data Proccesing
The NOAA storm database contains a variety of information about individual weather disasters including their time and place of occurrence, property damage, number of death and injuries, etc.
The analysis is done in two steps.
-- Downloading and loading the data.
-- Summarizing and tiding up the data.
###Downloading and loading the data
The data will be downloaded directly from the web (if not already in the data folder under the working directory) and loaded into R. This step is quite time consuming due to volume of the data and will be cashed for the sake of efficiency.
```{r cache = TRUE}
if (!dir.exists("data")) dir.create("data")
if (!file.exists("./data/Storm-Data.csv.bz2")) {
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
destfile = "./data/Storm-Data.csv.bz2")
}
storms <- read.csv("./data/Storm-Data.csv.bz2")
```
###Summerizing the data
Among fields of the data set,some columns are the most important for us. "EVTYPE", the type of the event, "FATALITIES", number of fatalities, "INJURIES" number of injuries, "PROPDMG", "PROPDMGEXP", "CROPDMG" and "CROPDMGEXP", respectively, short version numbers of property and cropland damages and the corresponding exponents (for instance 5 "PROPDMG" with "PROPDMGEXP" of "M" means 5 million dollars property damage).
To have the actual damage, exponents needs to be converted to numbers. Unfortunately there is no proper explanation of the units in the documentation, but searching in the web I could conclude that the conversion table is as follows which I saved them in a variable named 'convert'.
```{r}
convert <- data.frame(exp = c(as.character(levels(storms$PROPDMGEXP)),"k"), value = c(0,0,0,10,10,10,10,10,10,10,10,10,10,1000000000,100,100,1000,1000000,100000, 1000))
```
```{r echo = F}
convert
```
To convert these units (exponents) to numbers I write a function named "relable" which takes the exponents field and the converting table and output a vector of numbers corresponding to the input exponents.
```{r}
relable <- function(oldfactor, convert){
newvalue <- as.character(oldfactor)
for (i in unique(newvalue)){
newvalue[newvalue == i] <- convert$value[convert$exp == i]
}
as.numeric(newvalue)
}
```
Using the "relable" function I produce two vectors corresponding to the property and cropland expenses of the events in the data set.
```{r}
propexp <- relable(storms$PROPDMGEXP, convert)
cropexp <- relable(storms$CROPDMGEXP, convert)
```
Using these two vectors I computed the actual property, cropland and total damage of each event.
```{r message=F}
library(plyr)
library(dplyr)
library(data.table)
storms <- as.data.table(storms)
storms[, PropDamage := propexp * PROPDMG]
storms[, CropDamage := cropexp * CROPDMG]
storms[, TotalDamage := PropDamage + CropDamage]
```
Then I subset the data set to events which actually made human or economic problems and summarized the data based on the total damage, fatality and injuries made by each event type.
```{r}
#storm with positive damage ...
harmfullStorms <- storms[TotalDamage > 0 | FATALITIES > 0 | INJURIES > 0,]
#summerize
DamageType <- ddply(harmfullStorms, .(EVTYPE), summarise,
TotalDamage = sum(TotalDamage),
PropDamage = sum(PropDamage),
CropDamage = sum(CropDamage),
TotalFatality = sum(FATALITIES),
TotalInjuries = sum(INJURIES))
```
Summarizing is not finished yet. There are lots of duplicates in the data set. Lots of similar event types are stored separately due to different spelling, spacing and using special characters. Plus some events are saved with a too much detailed name.
First of all I noted that TORNADO once was misspelled as TORNAO and corrected it.
```{r}
DamageType[DamageType$EVTYPE == "TORNAO", 1] <- "TORNADO"
```
To have a more neat data set, I wrote a function which gets a word and merge the events which have that word in their EVETYPE.
```{r}
sumduplicate <- function(pattern, expensetable){
index <- grep(tolower(pattern), tolower(expensetable[,1]))
sum <- apply(expensetable[index,-1],2,sum)
new <- data.frame(pattern, t(sum))
names(new) <- names(expensetable)
rbind(expensetable[-index,],new)
}
```
Exploring the data set I concluded that one should merge the rows containing the following words together.
```{r}
namelist <- c("Flood", "Thunderstorm", "TSTM", "Rain", "Tornado", "Fire", "Squall", "Freez", "Hurricane", "Ice", "Hail", "waterspout", "Heat", "Glaze", "precipitation", "microburst", "Wet", "winter", "Cold", "Tsunami", "Snow", "Stream", "Lightning", "Dust", "Wintry Mix", "Urban", "Wind")
```
I ran this list of words in the "sumdupliacte" function mentioned above. Making a much more compact data set.
```{r}
for ( x in namelist){
DamageType <- sumduplicate(x, expensetable = DamageType)
}
```
I also noted that Thunderstorm sometimes where called TSTM, so I merged the two occurrences.
```{r}
#Thunderstorm and TSTM are the same
index <- which(DamageType$EVTYPE == "Thunderstorm" |
DamageType$EVTYPE == "TSTM")
sum <- apply(DamageType[index,-1],2,sum)
new <- data.frame("Thunderstorm", t(sum))
names(new) <- names(DamageType)
DamageType <- rbind(DamageType[-index,], new)
```
The data is acceptably summarized. As the last step, I capitalized the first letter of the event types names to be shown more nicely in the plots.
```{r message= F}
library(Hmisc)
DamageType$EVTYPE <- capitalize(tolower(DamageType$EVTYPE))
```
#Results
In this section we answer the main questions of the analysis i.e. finding the most harmful weather events to human health and economy and show the results in plots.
####Fatalities and injuries
To have an understandable diagram, I summarized the data by the events having at least 30 fatalities or 100 injuries.
```{r}
#plot injuries and fatalities
FatInj <- subset(DamageType, TotalFatality > 30 | TotalInjuries > 100)
```
Then the remaining data are arranged descendingly by the total fatalities and then total injuries.
```{r}
FatInj <- arrange(FatInj, desc(TotalFatality), desc(TotalInjuries))
```
Finally a bar plot which shows total injuries on top of total fatalities so the total height of the bar is the sum of fatalities and injuries of each event. Tornado by a very big fatality and a huge total number of injuries is showed with separate color at the most right side of the plot with a different scale. So that the other event could be see-able.
```{r fig.width=10, fig.height=5}
par(mar = c(8.4,4,4,4) + .1)
barplot(t(as.matrix(rbind(FatInj[-1,c("TotalFatality", "TotalInjuries")],c(0,0)))),
horiz = F, names.arg = c(FatInj$EVTYPE[-1],FatInj$EVTYPE[1]), las = 2,
col = c("blue", "gray"), beside = F,
legend.text = T, args.legend = c(xjust = 1.5))
barplot(t(as.matrix(FatInj[1,c("TotalFatality", "TotalInjuries")]))/7, add = T,
las =2, names.arg = "", space = dim(FatInj)[1]*1.2 -1,
col = c("red", "gray28"), beside = F)
text(dim(FatInj)[1]*1.2 -.5, FatInj$TotalFatality[1]/10 + 1400,
FatInj$TotalFatality[1],
srt = 90, col = "red")
text(dim(FatInj)[1]*1.2 -.5, FatInj$TotalInjuries[1]/10 + 1200,
FatInj$TotalInjuries[1] + FatInj$TotalFatality[1],
srt = 90, col = "white")
title(main = "Total number of fatalities and injuries cased by weather events 1950-2015")
```
One can see that the most harmful type of weather event has been Tornado and after that with a big difference come heat and flood.
####Econimical expenses
Economical expenses consists of property damages and cropland damages. We draw a barplot showing cropland damages on the top of property damage. Three of the event types make considerably more damage with respect to others, so I plotted them on the right most of the plot with different scale and color. The total damage made by them are written in million dollars inside their corresponding bars.
First of all the data are subset with just the damages over 10 million dollars for the sake of simplicity.
```{r}
SevereDamage <- subset(DamageType, TotalDamage >= 1e7)
SevereDamage <- arrange(SevereDamage, desc(TotalDamage))
```
Then the plot will be drawn. Some computations need to be done to put the outlires on the right with different scale
```{r fig.width=10, fig.height=5}
newdamage <- SevereDamage[,c("PropDamage","CropDamage")]/1e9
newdamage <- as.matrix(rbind(newdamage[-c(1:3),], newdamage[1:3,]))
l <- length(newdamage[,1])
#plotting
par(mar = c(8.4,4,4,4) + .1)
barplot(t(rbind(newdamage[1:(l-3),],matrix(0,nrow = 3,ncol = 2))),
names.arg = c(SevereDamage$EVTYPE[-(1:3)], SevereDamage$EVTYPE[1:3]),
las =2, col = c("blue", "gray"))
barplot(t(tail(newdamage,3))/15, las =2, add = T, col = c("red","gray"),
space = c(l-3+(l-2)*.2,.2,.2), names.arg = c("","",""))
title(main = "Sever damages made by natural disasters",
ylab = "Total Damage (billion $)")
#writing values
for (i in 14:(l-3)){
text(i*1.2-.5, sum(newdamage[i,])+1.300, round(sum(newdamage[i,]), digits = 2), srt = 90)
}
for (i in (l-2):l){
text(i*1.2-.5, 2.000, round(sum(newdamage[i,]), digits = 0), srt = 90)
}
```
As it is seen the most expensive type of weather event has been flood, after it come Hurricane and Storm surge. The fourth expensive event which is Tornado has made very fewer damage with respect to these three.
Plus in all cases except Ice and Heat the amount of property damage is much more that the amount of cropland damage.
#References
Storm database, U.S. National Oceanic and Atmospheric Administration's (NOAA), available online at
https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2