A Beginner’s Guide to Carrying Out Extreme Value Analysis (4) - DDF
Chonghua Yin
Head of Data Science | Climate Risk & Extreme Event Modeling | AI & Geospatial Analytics
Intensity-Duration-Frequency (IDF) and Depth-Duration-Frequency (DDF) curves are among the most common tools used in water resources management. They are derived from historical rainfall records under the assumption of stationarity. Rainfall IDF/DDF curves describe the frequency of extreme rainfall events for a range of durations and intensities (or depths); since intensity is simply depth divided by duration, the two forms are interchangeable.
The following presents a basic procedure for creating DDF curves. This part is also inspired by the tutorial Extreme Rainfall Distribution Estimation with Python.
1. Load precipitation data
This is an hourly precipitation dataset. The data have been converted to annual maximum time series for durations of 1h, 3h, 6h, 12h, and 24h, where the duration is the accumulation time for precipitation.
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 9

# Read the annual maximum series and index it by year ('anno')
data = pd.read_csv('precipitation.txt')
data = data.set_index('anno')
data.head()
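Incidentally, if you start from the raw hourly series, the annual maxima could be derived with pandas rolling sums. This is a minimal sketch; the file name 'hourly.csv' and its 'time'/'prcp' columns are assumptions.
import pandas as pd

# Sketch: build annual maxima from a raw hourly series (file and column
# names are hypothetical). rolling(d) sums d consecutive hourly records,
# which equals a d-hour accumulation when the record has no gaps.
hourly = pd.read_csv('hourly.csv', parse_dates=['time'], index_col='time')
amax = pd.DataFrame({
    '%dh' % d: hourly['prcp'].rolling(d).sum().groupby(hourly.index.year).max()
    for d in [1, 3, 6, 12, 24]
})
amax.index.name = 'anno'  # match the index column used above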
Do a quick visual check of each duration
ax = data.plot(title="Annual Maximum Precipitation (mm)", style="o-")
ax.set_xlabel("Year")
2. Carry out extreme value analysis for each duration
This is the basic step shown earlier (Part 2). For the purpose of illustration, the Gumbel (GUM) distribution is used here.
import numpy as np
import lmoments
from scipy.stats import genextreme, gumbel_r
from numpy import linspace
from statsmodels.distributions.empirical_distribution import ECDF
from collections import OrderedDict
Fit a Gumbel distribution for each duration and check the parameters
gumfit = OrderedDict()
for name in data.columns.values.tolist():
    dx = data[name].dropna()
    # Estimate Gumbel parameters (location, scale) via sample L-moments
    gumfit[name] = lmoments.pelgum(lmoments.samlmu(dx))
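As a quick sanity check, we can compare the empirical CDF of one duration with its fitted Gumbel CDF, reusing the imports above (a sketch; lmoments.pelgum returns the location and scale parameters).
name = data.columns[0]
dx = data[name].dropna()
xi, alpha = gumfit[name]  # Gumbel location and scale from pelgum
x = linspace(dx.min(), dx.max(), 200)
plt.plot(x, ECDF(dx)(x), label="Empirical CDF")
plt.plot(x, gumbel_r.cdf(x, loc=xi, scale=alpha), label="Fitted Gumbel CDF")
plt.legend()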
3. Fit extreme precipitation to a theoretical DDF curve for each return period
Set up some return periods (e.g., 2, 10, 20, 50, 100, and 200 years); the non-exceedance probability for a return period T is q = 1 - 1/T.
q2 = 1 - 1/2.
q10 = 1 - 1/10.
q20 = 1 - 1/20.
q50 = 1 - 1/50.
q100 = 1 - 1/100.
q200 = 1 - 1/200.
q = [q2, q10, q20, q50, q100, q200]
Calculate extreme values for each return period and duration
samprobs = OrderedDict()
for name in gumfit.keys():
    # Gumbel quantiles at the non-exceedance probabilities in q
    samprobs[name] = lmoments.quagum(q, gumfit[name])
pts = pd.DataFrame.from_dict(samprobs, orient='index')
pts.columns = [2, 10, 20, 50, 100, 200]
pts
Convert the index into numerical durations (hours) for the plotting axis
pts.index = [1, 3, 6, 12, 24]
pts
pts.plot(style="o-")
plt.xlim([0,30])
plt.ylim([30,200])
ag = pts.plot(style="o-")
ag.set_yscale('log')
ag.set_xscale('log')
# On log axes the lower x-limit must be positive
plt.xlim([0.7, 30])
plt.ylim([30, 200])
The simplest DDF curve formula is h = a * d^n, where h is the precipitation depth (mm), d is the duration (hours), and a and n are parameters estimated for each return period.
Fit DDF curves based on the above formula
Taking logarithms linearizes the relationship (log h = log a + n * log d), so numpy.polyfit is used to fit a straight line in log-log space for each return period.
ddfparams = OrderedDict()
for name in pts.columns.values.tolist():
    # Straight-line fit in log-log space: log(h) = n*log(d) + log(a)
    ddfparams[name] = np.polyfit(np.log(pts[name].index), np.log(pts[name]), 1)
    # Back-transform the intercept to recover a
    ddfparams[name][1] = np.exp(ddfparams[name][1])
fnl = pd.DataFrame.from_dict(ddfparams, orient='index')
fnl.columns = ["n", "a"]
fnl
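For example (hypothetical usage of the fitted parameters), the design depth for a 2-hour storm with a 100-year return period is:
# a and n for Tr = 100 years; depth = a * duration ** n
a, n = fnl.loc[100, "a"], fnl.loc[100, "n"]
print(a * 2.0 ** n)  # estimated 100-year, 2-hour depth in mm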
Set up some durations for interpolation and plotting (here up to 30 hours as an example)
samdurations = np.linspace(0.1,30,100)
samdurations[-5:]
Use the DDF regression parameters to calculate DDF values
def regddf(duration, intercepta, expn):
    # DDF relationship: depth = a * duration ** n
    return intercepta * duration ** expn
fnlt = fnl.T
ddfvalues = OrderedDict()
for name in pts.columns.values.tolist():
    ddfvalues[name] = regddf(samdurations, fnlt[name]["a"], fnlt[name]["n"])
inh = pd.DataFrame.from_dict(ddfvalues, orient='index').T.set_index(samdurations)
Set understandable column names
inh.columns = ["Tr = 2","Tr = 10","Tr = 20","Tr = 50","Tr = 100", "Tr = 200"]
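As an aside, since intensity is depth divided by duration, an IDF table could be derived from these DDF values directly (a minimal sketch):
# Convert depth (mm) to intensity (mm/h) by dividing each row by its
# duration; inh is indexed by duration in hours
idf = inh.div(inh.index.values, axis=0)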
Do a visual check of the fitted curves against the sample points
ag = inh.plot()
ag = plt.plot(pts[2], "o", color="blue")
ag = plt.plot(pts[10], "o", color="orange")
ag = plt.plot(pts[20], "o", color="green")
ag = plt.plot(pts[50], "o", color="red")
ag = plt.plot(pts[100], "o", color="purple")
ag = plt.plot(pts[200], "o", color="brown")
plt.xlim([0.7, 30])
plt.ylim([50, 200])
plt.xlabel("Duration (Hour)")
plt.ylabel("Depth of Precipitation (mm)")
ag = inh.plot()
ag.set_yscale('log')
ag.set_xscale('log')
ag = plt.plot(pts[2], "o", color="blue")
ag = plt.plot(pts[10], "o", color="orange")
ag = plt.plot(pts[20], "o", color="green")
ag = plt.plot(pts[50], "o", color="red")
ag = plt.plot(pts[100], "o", color="purple")
ag = plt.plot(pts[200], "o", color="brown")
plt.xlim([0.7, 30])
plt.ylim([50, 200])
plt.xlabel("Duration (Hour), Log Scale")
plt.ylabel("Depth of Precipitation (mm)")
End Notes
Through this guide I have tried to give you a basic idea of how to carry out extreme value analysis (EVA) on a time series of interest.
In practice the analysis becomes more complicated, as high-quality data are not always available. Sometimes you have to spend a lot of time cleaning the data, for example filling missing values and checking for outliers.
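A minimal sketch of that kind of cleaning with pandas, assuming a hypothetical raw series raw and arbitrary thresholds:
# Fill short gaps only (up to 3 consecutive missing steps) and flag
# values far above the bulk of the record as potential outliers
filled = raw.interpolate(limit=3)
suspects = filled[filled > filled.mean() + 5 * filled.std()]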
In addition, EVA usually involves fitting several candidate distributions and then using a goodness-of-fit measure (e.g., the Anderson-Darling test) to select the optimal one. The same applies when creating DDF or IDF curves: the Gumbel distribution is used here for demonstration, but an optimal distribution should really be selected for each duration.
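For instance, scipy.stats.anderson supports the Gumbel family, so one such check could look like this (a sketch for a single duration):
from scipy.stats import anderson

# Anderson-Darling statistic and critical values for the Gumbel hypothesis
# ('gumbel_r' is the right-skewed form appropriate for maxima)
res = anderson(data[data.columns[0]].dropna(), dist='gumbel_r')
print(res.statistic, res.critical_values, res.significance_level)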
Moreover, only a simple DDF curve is fitted in this part, while several other curve forms are used in practical applications. In such cases, scipy.optimize.curve_fit can be used to fit them.
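For example, a hypothetical three-parameter form h(d) = a * d / (b + d)^n could be fitted like this (a sketch; the formula and starting values are illustrative, not a recommendation):
from scipy.optimize import curve_fit

def ddf3(d, a, b, n):
    # A hypothetical three-parameter DDF form: h = a * d / (b + d)**n
    return a * d / (b + d) ** n

d_obs = np.asarray(pts.index, dtype=float)           # durations in hours
params, _ = curve_fit(ddf3, d_obs, pts[100].values,  # depths for Tr = 100
                      p0=[50.0, 1.0, 0.5])           # illustrative start values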
This is just a start. You can try more.