A Beginner’s Guide to Carrying Out Extreme Value Analysis (4) - DDF
Chonghua Yin
Head of Data Science | Climate Risk & Extreme Event Modeling | AI & Geospatial Analytics
Intensity-Duration-Frequency (IDF) and Depth-Duration-Frequency (DDF) curves are among the most common tools used in water resources management. They are derived from historical rainfall records under the assumption of stationarity. Rainfall IDF/DDF curves describe the frequency of extreme rainfall events for a range of durations and intensities (or depths); since intensity is simply depth divided by duration, the two forms are interchangeable.
The following presents a basic procedure for creating DDF curves. This part is also inspired by the tutorial Extreme Rainfall Distribution Estimation with Python.
1. Load precipitation data
This is an hourly precipitation dataset. The data have been converted to annual maximum time series for durations of 1h, 3h, 6h, 12h, and 24h, where the duration is the accumulation time for precipitation.
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 9

# Read the annual maximum series and index it by year ('anno')
data = pd.read_csv('precipitation.txt')
data = data.set_index('anno')
data.head()
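Incidentally, if you start from the raw hourly series, the annual maxima could be derived with pandas rolling sums. This is a minimal sketch; the file name 'hourly.csv' and its 'time'/'prcp' columns are assumptions.
import pandas as pd

# Sketch: build annual maxima from a raw hourly series (file and column
# names are hypothetical). rolling(d) sums d consecutive hourly records,
# which equals a d-hour accumulation when the record has no gaps.
hourly = pd.read_csv('hourly.csv', parse_dates=['time'], index_col='time')
amax = pd.DataFrame({
    '%dh' % d: hourly['prcp'].rolling(d).sum().groupby(hourly.index.year).max()
    for d in [1, 3, 6, 12, 24]
})
amax.index.name = 'anno'  # match the index column used above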
Do a quick visual check of each duration
ax = data.plot(title="Annual Maximum Precipitation (mm)", style="o-")
ax.set_xlabel("Year")
2. Carry out extreme value analysis for each duration
This is the basic step shown earlier (Part 2). For the purpose of illustration, the Gumbel (GUM) distribution is used here.
import numpy as np
import lmoments
from scipy.stats import genextreme, gumbel_r
from numpy import linspace
from statsmodels.distributions.empirical_distribution import ECDF
from collections import OrderedDict
Fit a Gumbel distribution for each duration and check the parameters
gumfit = OrderedDict()
for name in data.columns.values.tolist():
    dx = data[name].dropna()
    # Estimate Gumbel parameters (location, scale) via sample L-moments
    gumfit[name] = lmoments.pelgum(lmoments.samlmu(dx))
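As a quick sanity check, we can compare the empirical CDF of one duration with its fitted Gumbel CDF, reusing the imports above (a sketch; lmoments.pelgum returns the location and scale parameters).
name = data.columns[0]
dx = data[name].dropna()
xi, alpha = gumfit[name]  # Gumbel location and scale from pelgum
x = linspace(dx.min(), dx.max(), 200)
plt.plot(x, ECDF(dx)(x), label="Empirical CDF")
plt.plot(x, gumbel_r.cdf(x, loc=xi, scale=alpha), label="Fitted Gumbel CDF")
plt.legend()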
3. Fit extreme precipitation to a theoretical DDF curve for each return period
Set up some return periods (e.g., 2, 10, 20, 50, 100, and 200 years); the non-exceedance probability for a return period T is q = 1 - 1/T.
q2 = 1 - 1/2.
q10 = 1 - 1/10.
q20 = 1 - 1/20.
q50 = 1 - 1/50.
q100 = 1 - 1/100.
q200 = 1 - 1/200.
q = [q2, q10, q20, q50, q100, q200]
Calculate extreme values for each return period and duration
samprobs = OrderedDict()
for name in gumfit.keys():
    # Gumbel quantiles at the non-exceedance probabilities in q
    samprobs[name] = lmoments.quagum(q, gumfit[name])
pts = pd.DataFrame.from_dict(samprobs, orient='index')
pts.columns = [2, 10, 20, 50, 100, 200]
pts
Convert the index into numerical durations (hours) for the plotting axis
pts.index = [1, 3, 6, 12, 24]
pts
pts.plot(style="o-")
plt.xlim([0,30])
plt.ylim([30,200])
ag = pts.plot(style="o-")
ag.set_yscale('log')
ag.set_xscale('log')
# On log axes the lower x-limit must be positive
plt.xlim([0.7, 30])
plt.ylim([30, 200])
The simplest DDF curve formula is h = a * d^n, where h is the precipitation depth (mm), d is the duration (hours), and a and n are parameters estimated for each return period.
Fit DDF curves based on the above formula
Taking logarithms linearizes the relationship (log h = log a + n * log d), so numpy.polyfit is used to fit a straight line in log-log space for each return period.
ddfparams = OrderedDict()
for name in pts.columns.values.tolist():
    # Straight-line fit in log-log space: log(h) = n*log(d) + log(a)
    ddfparams[name] = np.polyfit(np.log(pts[name].index), np.log(pts[name]), 1)
    # Back-transform the intercept to recover a
    ddfparams[name][1] = np.exp(ddfparams[name][1])
fnl = pd.DataFrame.from_dict(ddfparams, orient='index')
fnl.columns = ["n", "a"]
fnl
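For example (hypothetical usage of the fitted parameters), the design depth for a 2-hour storm with a 100-year return period is:
# a and n for Tr = 100 years; depth = a * duration ** n
a, n = fnl.loc[100, "a"], fnl.loc[100, "n"]
print(a * 2.0 ** n)  # estimated 100-year, 2-hour depth in mm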
Set up some durations for interpolation and plotting (here up to 30 hours as an example)
samdurations = np.linspace(0.1,30,100)
samdurations[-5:]
Use the DDF regression parameters to calculate DDF values
def regddf(duration, intercepta, expn):
    # DDF relationship: depth = a * duration ** n
    return intercepta * duration ** expn
fnlt = fnl.T
ddfvalues = OrderedDict()
for name in pts.columns.values.tolist():
    ddfvalues[name] = regddf(samdurations, fnlt[name]["a"], fnlt[name]["n"])
inh = pd.DataFrame.from_dict(ddfvalues, orient='index').T.set_index(samdurations)
Set understandable column names
inh.columns = ["Tr = 2","Tr = 10","Tr = 20","Tr = 50","Tr = 100", "Tr = 200"]
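As an aside, since intensity is depth divided by duration, an IDF table could be derived from these DDF values directly (a minimal sketch):
# Convert depth (mm) to intensity (mm/h) by dividing each row by its
# duration; inh is indexed by duration in hours
idf = inh.div(inh.index.values, axis=0)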
Do a visual check of the fitted curves against the sample points
ag = inh.plot()
ag = plt.plot(pts[2], "o", color="blue")
ag = plt.plot(pts[10], "o", color="orange")
ag = plt.plot(pts[20], "o", color="green")
ag = plt.plot(pts[50], "o", color="red")
ag = plt.plot(pts[100], "o", color="purple")
ag = plt.plot(pts[200], "o", color="brown")
plt.xlim([0.7, 30])
plt.ylim([50, 200])
plt.xlabel("Duration (Hour)")
plt.ylabel("Depth of Precipitation (mm)")
ag = inh.plot()
ag.set_yscale('log')
ag.set_xscale('log')
ag = plt.plot(pts[2], "o", color="blue")
ag = plt.plot(pts[10], "o", color="orange")
ag = plt.plot(pts[20], "o", color="green")
ag = plt.plot(pts[50], "o", color="red")
ag = plt.plot(pts[100], "o", color="purple")
ag = plt.plot(pts[200], "o", color="brown")
plt.xlim([0.7, 30])
plt.ylim([50, 200])
plt.xlabel("Duration (Hour), Log Scale")
plt.ylabel("Depth of Precipitation (mm)")
End Notes
Through this guide I have tried to give you a basic idea of how to carry out extreme value analysis (EVA) on a time series of interest.
In practice the analysis becomes more complicated, as high-quality data are not always available. Sometimes you have to spend a lot of time cleaning the data, for example filling missing values and checking for outliers.
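A minimal sketch of that kind of cleaning with pandas, assuming a hypothetical raw series raw and arbitrary thresholds:
# Fill short gaps only (up to 3 consecutive missing steps) and flag
# values far above the bulk of the record as potential outliers
filled = raw.interpolate(limit=3)
suspects = filled[filled > filled.mean() + 5 * filled.std()]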
In addition, EVA usually involves fitting several candidate distributions and then using a goodness-of-fit measure (e.g., the Anderson-Darling test) to select the optimal one. The same applies when creating DDF or IDF curves: the Gumbel distribution is used here for demonstration, but an optimal distribution should really be selected for each duration.
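For instance, scipy.stats.anderson supports the Gumbel family, so one such check could look like this (a sketch for a single duration):
from scipy.stats import anderson

# Anderson-Darling statistic and critical values for the Gumbel hypothesis
# ('gumbel_r' is the right-skewed form appropriate for maxima)
res = anderson(data[data.columns[0]].dropna(), dist='gumbel_r')
print(res.statistic, res.critical_values, res.significance_level)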
Moreover, only a simple DDF curve is fitted in this part, while several other curve forms are used in practical applications. In such cases, scipy.optimize.curve_fit can be used to fit them.
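For example, a hypothetical three-parameter form h(d) = a * d / (b + d)^n could be fitted like this (a sketch; the formula and starting values are illustrative, not a recommendation):
from scipy.optimize import curve_fit

def ddf3(d, a, b, n):
    # A hypothetical three-parameter DDF form: h = a * d / (b + d)**n
    return a * d / (b + d) ** n

d_obs = np.asarray(pts.index, dtype=float)           # durations in hours
params, _ = curve_fit(ddf3, d_obs, pts[100].values,  # depths for Tr = 100
                      p0=[50.0, 1.0, 0.5])           # illustrative start values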
This is just a start. You can try more.