A Beginner’s Guide to Carrying out Extreme Value Analysis (4) - DDF

Intensity-Duration-Frequency (IDF) and Depth-Duration-Frequency (DDF) curves are among the most common tools used in water resources management. They are derived from historical rainfall records under the assumption of stationarity. IDF/DDF curves describe the frequency of extreme rainfall events over a variety of durations and intensities.

The following presents a basic procedure for creating DDF curves. This part is also inspired by the tutorial Extreme Rainfall Distribution Estimation with Python.

1. Load precipitation data

This is an hourly precipitation dataset. The data have been converted to annual maxima time series for durations of 1h, 3h, 6h, 12h, and 24h, where the duration is the accumulation time of the precipitation.
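If you only have the raw hourly record, the annual maxima for each duration can be derived with pandas rolling sums. A minimal sketch on a synthetic hourly series (the series and its values are hypothetical stand-ins for the real record):

```python
import numpy as np
import pandas as pd

# Synthetic hourly rainfall for two years (stand-in for the real record)
idx = pd.date_range("2000-01-01", periods=2 * 8760, freq="h")
rain = pd.Series(np.random.default_rng(0).gamma(0.1, 2.0, len(idx)), index=idx)

# For each duration, accumulate with a rolling sum, then take the annual maximum
durations = [1, 3, 6, 12, 24]
annmax = pd.DataFrame({
    "%dh" % d: rain.rolling(d, min_periods=1).sum().groupby(rain.index.year).max()
    for d in durations
})
annmax.index.name = "anno"
```

By construction the 24h maxima can never be smaller than the 1h maxima, which is a quick sanity check on the result.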

import numpy as np
import pandas as pd

%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 9


data = pd.read_csv('precipitation.txt')
data = data.set_index('anno')   # 'anno' = year
data.head()

Do a quick visual check of each duration

ax = data.plot(title="Annual Maximum Precipitation (mm)", style="o-")
ax.set_xlabel("Year")

2. Carry out extreme value analysis for each duration

This step has been shown before (part 2). For the purpose of illustration, the Gumbel (GUM) distribution is used here.
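As a reminder from part 2, the Gumbel CDF has the closed form F(x) = exp(-exp(-(x - loc)/scale)). A quick sketch verifying it against scipy's gumbel_r (the loc/scale/x values here are arbitrary):

```python
import math
from scipy.stats import gumbel_r

loc, scale, x = 50.0, 10.0, 65.0

# Gumbel CDF in closed form: F(x) = exp(-exp(-(x - loc)/scale))
manual = math.exp(-math.exp(-(x - loc) / scale))
print(manual, gumbel_r.cdf(x, loc=loc, scale=scale))
```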

import lmoments
from collections import OrderedDict

Fit a Gumbel distribution to each duration and check the parameters

gumfit = OrderedDict()
for name in data.columns.values.tolist():
    dx = data[name].dropna()
    # Estimate sample L-moments, then fit Gumbel parameters from them
    gumfit[name] = lmoments.pelgum(lmoments.samlmu(dx))

3. Fit extreme precipitation to a theoretical DDF curve for each return period

Set up some return periods (e.g., 2, 10, 20, 50, 100, and 200 years). The non-exceedance probability for a return period T is q = 1 - 1/T.

q2   = 1-1/2.
q10  = 1-1/10.
q20  = 1-1/20.
q50  = 1-1/50.
q100 = 1-1/100.
q200 = 1-1/200.

q = [q2,q10,q20,q50, q100, q200]
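For the Gumbel distribution the quantile (inverse CDF) also has a closed form, x = loc - scale * ln(-ln(q)), so return levels grow roughly linearly in ln(T). A small sketch with arbitrary parameters, checked against scipy:

```python
import math
from scipy.stats import gumbel_r

loc, scale = 50.0, 10.0
levels = []
for T in [2, 10, 20, 50, 100, 200]:
    q = 1 - 1.0 / T
    # Closed-form Gumbel quantile: x = loc - scale * ln(-ln(q))
    x = loc - scale * math.log(-math.log(q))
    assert abs(x - gumbel_r.ppf(q, loc=loc, scale=scale)) < 1e-9
    levels.append(x)
print(levels)
```

Note that the return levels increase monotonically with the return period, as they must.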

Calculate extreme values for each return period and duration

samprobs = OrderedDict()

for name in gumfit.keys():
    # quagum: Gumbel quantile (inverse CDF) for the given probabilities
    samprobs[name] = lmoments.quagum(q, gumfit[name])

pts = pd.DataFrame.from_dict(samprobs, orient='index')
pts.columns=[2,10,20,50, 100, 200]
pts

Convert the index into numerical duration values (hours) for plotting

pts.index = [1,3,6,12,24]
pts
pts.plot(style="o-")
plt.xlim([0, 30])
plt.ylim([30, 200])

ag = pts.plot(style="o-")
ag.set_yscale('log')
ag.set_xscale('log')
plt.xlim([0.7, 30])   # avoid zero on a log axis
plt.ylim([30, 200])

The simplest DDF curve formula is a power law, h = a * d**n, where h is the precipitation depth, d is the duration, and a and n are the parameters to fit.

Fit DDF curves based on the above formula

numpy.polyfit is used to fit the above curve for each return period. Taking logarithms turns the power law into a straight line: log h = n * log d + log a.

ddfparams = OrderedDict()
for name in pts.columns.values.tolist():
    # Linear fit in log-log space: slope = n, intercept = log(a)
    ddfparams[name] = np.polyfit(np.log(pts[name].index), np.log(pts[name]), 1)
    ddfparams[name][1] = np.exp(ddfparams[name][1])   # back-transform intercept to a

fnl = pd.DataFrame.from_dict(ddfparams, orient='index')
fnl.columns=["n","a"]
fnl   
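To see that the log-log polyfit really recovers the power-law parameters, here is a self-contained check with synthetic depths generated from known a and n (the values are arbitrary):

```python
import numpy as np

a_true, n_true = 30.0, 0.35
d = np.array([1.0, 3.0, 6.0, 12.0, 24.0])
h = a_true * d ** n_true            # exact power-law depths

# In log space the power law is a straight line: log h = n * log d + log a
n_fit, log_a = np.polyfit(np.log(d), np.log(h), 1)
a_fit = np.exp(log_a)
print(n_fit, a_fit)
```

Because the synthetic points lie exactly on the power law, the fit recovers the true parameters to floating-point precision.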

Set up some durations for interpolation and plotting (here up to 30 hours as an example)

samdurations = np.linspace(0.1,30,100)
samdurations[-5:]

Use the DDF regression parameters to calculate DDF values

def regddf(duration, intercepta, expn):
    # DDF power law: depth = a * duration**n
    return intercepta * duration**expn

fnlt = fnl.T

ddfvalues = OrderedDict()
for name in pts.columns.values.tolist():     
    ddfvalues[name] = regddf(samdurations,fnlt[name]["a"],fnlt[name]["n"])
    
inh = pd.DataFrame.from_dict(ddfvalues, orient='index').T.set_index(samdurations) 

Set understandable column names

inh.columns = ["Tr = 2","Tr = 10","Tr = 20","Tr = 50","Tr = 100", "Tr = 200"]

Have a quick visual check

ag = inh.plot()

ag = plt.plot(pts[2],  "o", color="blue")
ag = plt.plot(pts[10], "o", color="orange")
ag = plt.plot(pts[20], "o", color="green")
ag = plt.plot(pts[50], "o", color="red")
ag = plt.plot(pts[100],"o", color="purple")
ag = plt.plot(pts[200],"o", color="brown")

plt.xlim([0.7,30])
plt.ylim([50,200])

plt.xlabel("Duration (Hour)")
plt.ylabel("Depth of Precipitation (mm)")
ag = inh.plot()
ag.set_yscale('log')
ag.set_xscale('log')

ag = plt.plot(pts[2],  "o", color="blue")
ag = plt.plot(pts[10], "o", color="orange")
ag = plt.plot(pts[20], "o", color="green")
ag = plt.plot(pts[50], "o", color="red")
ag = plt.plot(pts[100],"o", color="purple")
ag = plt.plot(pts[200],"o", color="brown")

plt.xlim([0.7,30])
plt.ylim([50,200])

plt.xlabel("Duration (Hour), Log Scale")
plt.ylabel("Depth of Precipitation (mm)")

End Notes

Through this guide I have tried to give you a basic idea of how to carry out extreme value analysis (EVA) on a time series of interest.

In practice, the analysis becomes more complicated because high-quality data are not always available. You may have to spend a lot of time cleaning the data, for example, filling missing values and checking for outliers.

In addition, EVA usually involves fitting several candidate distributions, and then a goodness-of-fit measure (e.g., the Anderson-Darling test) is used to select the best one. The same applies when creating DDF or IDF curves. Here the Gumbel distribution was used for demonstration; however, an optimal distribution should be selected for each duration.
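scipy provides the Anderson-Darling test directly for the Gumbel case. A minimal sketch on synthetic data (in practice you would pass each duration's annual maxima instead):

```python
import numpy as np
from scipy.stats import anderson, gumbel_r

# Synthetic annual maxima drawn from a known Gumbel distribution
sample = gumbel_r.rvs(loc=60.0, scale=15.0, size=200,
                      random_state=np.random.default_rng(1))

# Test the right-skewed Gumbel hypothesis; compare the statistic
# against the critical values at the listed significance levels
res = anderson(sample, dist='gumbel_r')
print(res.statistic)
print(res.critical_values, res.significance_level)
```

If the statistic exceeds the critical value at a chosen significance level, the Gumbel hypothesis is rejected for that series.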

Moreover, only a simple DDF curve was fit in this part, whereas several curve forms are used in practical applications. In such cases, scipy.optimize.curve_fit can be used to fit them.
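As one example of a richer form, a Sherman-type curve, depth = a * d / (b + d)**n (the functional form and parameter values here are illustrative, not from the dataset above), can be fitted with curve_fit:

```python
import numpy as np
from scipy.optimize import curve_fit

def ddf_sherman(d, a, b, n):
    # A Sherman-type DDF curve, one of several forms used in practice
    return a * d / (b + d) ** n

d = np.array([1.0, 3.0, 6.0, 12.0, 24.0])
h = ddf_sherman(d, 80.0, 2.0, 0.7)   # synthetic "observed" depths

# Nonlinear least squares; p0 is a rough initial guess for (a, b, n)
params, _ = curve_fit(ddf_sherman, d, h, p0=[50.0, 1.0, 0.5])
print(params)
```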

This is just a start; there is much more you can try.

