Enhancing Flood Forecasting Accuracy with Discrete Wavelet Transformation and Autoregressive Modeling: A Case Study of the Arroyo Colorado River Basin
Abstract
In this study, I investigate the efficacy of combining Discrete Wavelet Transformations (DWT) with Autoregressive (AR) models to improve flood forecasting in the Arroyo Colorado River basin, Cameron County, Texas. The research evaluates two AR models: a conventional one using unaltered 2015 time-series data and another enhanced with DWT for noise reduction. Using Daubechies 6 level 3 wavelets for DWT and determining model order through autocorrelation analyses, I assess the models' accuracy in predicting short-term river level fluctuations via Monte Carlo simulations. Results indicate that the DWT-enhanced AR model outperforms the conventional model in prediction accuracy (PMSE of 0.1697 versus 0.1891), showing its potential for advancing flood risk management in vulnerable regions. This approach promises to refine hydrological forecasting, offering insights for similar environments globally.
Introduction
Climate change and the steady expansion of human settlements have driven a significant uptick in the frequency and severity of hydrological events, notably floods. This trend underscores the need for advances in predictive hydrology, particularly in flood forecasting. Accurate and timely predictions are essential for managing water resources effectively and mitigating flood-related damage. The Arroyo Colorado River in Cameron County, Texas, serves as the case study for this research: it is a region where the combination of natural and anthropogenic factors calls for robust flood prediction methods.
Traditional hydrological models, while foundational, often grapple with the inherent variability and stochastic nature of hydrological processes. In pursuit of enhanced predictive accuracy, this study introduces an innovative approach that synergizes Discrete Wavelet Transformations (DWT) with Autoregressive (AR) models. The DWT methodology is adept at isolating and removing noise from time-series data, thereby augmenting the clarity and reliability of the underlying signal. This preprocessing step is pivotal, as it significantly improves the quality of the data fed into AR models, which are renowned for their predictive prowess in time-series analysis.
This research endeavors to juxtapose the performance of two AR models: one predicated on the original, unfiltered time-series data from 2015, and the other refined through the application of DWT. By employing Daubechies 6 level 3 wavelets for the DWT process and meticulously determining the optimal order for the AR models, the study aims to ascertain the most effective combination for flood forecasting.
The implications of this research are manifold. By demonstrating the superiority of the DWT-enhanced AR model in predicting short-term fluctuations in river levels, this study contributes to the broader discourse on flood risk management. It underscores the potential of integrating advanced signal processing techniques with time-series predictive models to elevate the precision of flood forecasts. Such advancements are not only vital for regions like Cameron County, beset by the dual challenges of natural hydrological variability and anthropogenic pressures but also offer valuable insights for similar hydrological settings across the globe.
Materials and Methods
Study Area
The Arroyo Colorado River, situated in the heart of Cameron County, Texas, forms the backbone of this research. This river is an integral part of the Rio Grande Valley's intricate hydrological network, extending approximately 90 miles from its headwaters in Mission, Texas, to its terminus at the Laguna Madre. The region's unique ecological and geographical attributes make it a compelling subject for in-depth hydrological studies.
Figure 1. An urban expansion map of Cameron County, Texas (AgriLIFE Extension 2006).
Cameron County's landscape is marked by a diverse tapestry of ecosystems, ranging from arid uplands to lush riparian zones along the riverbanks. This variability in terrain and vegetation plays a crucial role in the hydrological dynamics observed in the Arroyo Colorado. The river itself is not only a critical resource for local agriculture, supporting a vibrant agricultural community reliant on its waters for irrigation, but it also serves as a habitat for a wide array of wildlife, underscoring its ecological importance.
Historically, the Arroyo Colorado has been a lifeline for the communities it traverses, providing essential water resources for both domestic and industrial uses. However, the river's role has evolved over time, with increasing urbanization and agricultural expansion contributing to significant changes in its hydrological regime. These anthropogenic pressures, coupled with the natural climatic variability characteristic of a subtropical region, have heightened the river's susceptibility to flooding, making it a focal point for flood management and prediction studies.
Figure 2. A Google Earth image of the Arroyo Colorado River near Harlingen and Rio Hondo.
The climate of Cameron County is predominantly subtropical, characterized by hot summers and mild winters, with variable precipitation patterns that significantly influence the river's flow and stage levels. Seasonal storms and occasional tropical cyclones can lead to sudden and intense rainfall events, exacerbating the flood risk in the region.
Given its ecological diversity, socio-economic significance, and the challenges posed by natural and human-induced changes, the Arroyo Colorado River in Cameron County presents an ideal study area for exploring advanced flood forecasting techniques. The insights gained from this research could have far-reaching implications for water resource management, ecological conservation, and community resilience in the face of increasing flood risks.
Data Collection
The core of this research is built on river stage and flow data for the year 2015, collected by the United States Geological Survey (USGS) at a monitoring station near Harlingen, Texas. These hourly time-series measurements offer an in-depth perspective on the river's hydrologic fluctuations due to both human and environmental factors.
Figure 3. Original time-series data of Arroyo River stage height from 08/2014 to 04/2020.
Data Preprocessing
The initial step in data preparation involved correcting anomalies and inconsistencies to enhance data reliability for modeling. Following this, Discrete Wavelet Transformations (DWT) were applied to the time-series data for noise reduction. Utilizing Daubechies 6 wavelets at a decomposition level of 3 facilitated the effective segregation and elimination of noise, thereby improving the signal quality of the hydrological data.
The careful selection of the study area and the comprehensive approach to data collection and preprocessing ensure the validity and applicability of the research outcomes. This detailed groundwork sets the stage for the autoregressive modeling and predictive analyses that follow.
Discrete Wavelet Transformation (DWT)
Discrete Wavelet Transformation (DWT) constitutes a pivotal component of the data preprocessing phase, offering an advanced analytical framework for the decomposition of hydrological time-series data. DWT facilitates the separation of a time-series into constituent wavelet coefficients, encapsulating distinct frequency bands, thereby enabling a granular examination of the signal's properties.
Equation 1. $\psi_{m,n}(t) = a_0^{-m/2}\,\psi\left(\frac{t - n b_0 a_0^{m}}{a_0^{m}}\right)$, where m and n are the integers that govern the wavelet scale/dilation and translation, respectively; $a_0$ is a specified fine scale step (> 1); and $b_0$ is the location parameter, which must be > 0 (Pandey et al. 2017).
In the context of this investigation, DWT was applied to the 2015 dataset comprising river stage and discharge measurements, utilizing Daubechies 6 wavelets. The selection of Daubechies wavelets, characterized by their compact support and orthogonality, was predicated on their proven efficacy in minimizing signal distortion and accurately capturing hydrological phenomena. A decomposition level of 3 was strategically chosen to ensure an effective denoising of the data while preserving essential hydrological information, thus optimizing the signal for subsequent analytical processes. The DWT procedure encompassed the following critical steps:
Signal Decomposition: The initial decomposition segregated the time-series data into approximation (A) and detail (D) components, delineating the low-frequency, trend-centric information from the high-frequency noise elements.
Thresholding Application: A soft thresholding approach was employed on the detail coefficients to mitigate the influence of noise. This methodological choice was aimed at attenuating the amplitude of smaller coefficients, thereby enhancing the signal's clarity without compromising its integrity.
Signal Reconstruction: The final reconstruction involved the amalgamation of the modified detail components with the untouched approximation components. This step yielded a refined version of the original time-series, characterized by a reduced noise profile while retaining the fundamental hydrological signal.
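The three steps above (decomposition, soft thresholding of the detail coefficients, reconstruction) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: it swaps in the Haar wavelet for Daubechies 6 so that the filters fit in pure Python, and the function names, the threshold value, and the requirement that the series length be divisible by 2^levels are all choices made for the example.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)

def haar_step(signal):
    """One Haar DWT level: split into approximation (A) and detail (D)."""
    approx = [(signal[i] + signal[i + 1]) * SQRT_HALF for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SQRT_HALF for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one Haar level by interleaving the recovered sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SQRT_HALF)
        out.append((a - d) * SQRT_HALF)
    return out

def soft_threshold(coeffs, threshold):
    """Shrink every coefficient toward zero by `threshold` (soft thresholding)."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]

def denoise(signal, levels=3, threshold=0.1):
    """Decompose to `levels`, soft-threshold the details, then reconstruct.

    The approximation coefficients are left untouched, as in the text.
    Assumption: len(signal) is divisible by 2**levels (no padding is done).
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    details = [soft_threshold(d, threshold) for d in details]
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    return approx

# Hypothetical stage heights (illustrative values, not USGS data).
stages = [2.0, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 3.2]
smoothed = denoise(stages, levels=3, threshold=0.1)
```

With a threshold of zero the function returns the input unchanged, and with a very large threshold only the level-3 approximation survives, which for Haar is the mean of each block of eight samples. A production workflow would more likely use a wavelet library that provides the Daubechies filters directly.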
Figure 4. A one-dimensional discrete wavelet transformation (DWT) of river heights from 08/2014 to 04/2020.
The integration of DWT into the data preprocessing regimen is justified by its capacity to significantly improve the signal-to-noise ratio, thereby ensuring that subsequent modeling efforts are informed by data that more accurately reflect the underlying hydrological dynamics. The denoising process facilitated by DWT is especially critical for the enhancement of short-term flood forecasting models, as it allows for a more precise delineation of river stage fluctuations. Furthermore, the refined data, devoid of extraneous noise, enables a more nuanced detection of hydrological patterns and trends, potentially uncovering novel insights into the mechanisms driving flood events.
Figure 5. One dimensional DWT of 2015 (top) and 2016 (bottom) river height. A3 provided a smoothed (less-noise) version of river height for prediction.
Autoregressive (AR) Model Development
Following the preliminary data refinement via Discrete Wavelet Transformation (DWT), the subsequent phase of this research entailed the development of Autoregressive (AR) models, tailored to forecast the short-term fluctuations in river stages. AR models, by virtue of their capacity to model time-dependent processes, were deemed apt for capturing the temporal dynamics inherent in hydrological time-series data.
Figure 6. The 2015 Arroyo Colorado River time-series (top) and the smoothed DWT 2015 time-series (bottom).
Model Formulation
The AR models were formulated based on the principle that the future value of a time-series can be predicted as a linear function of its past values, with the inclusion of a stochastic term to account for random fluctuations. This study employed the AR(p) model, where p denotes the order of the model, indicating the number of lagged observations incorporated into the model.
Equation 2. $x_t = \sum_{i=1}^{P} \theta_i x_{t-i} + \varepsilon_t$, where $\theta_i$ are the AR coefficients, $x_t$ is the time series under investigation, P is the order (length) of the AR model, and $\varepsilon_t$, the residual term, is assumed to be Gaussian white noise (Sahay et al. 2013). In this project, AR models were used to estimate the current stage height as a linearly weighted sum of previous stages in the time series.
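To make the AR(p) formulation concrete, the sketch below estimates the coefficients with the Yule-Walker equations (solved by the Levinson-Durbin recursion) and then forms a one-step forecast as a weighted sum of the most recent stages. The source does not say which estimator was used, so the Yule-Walker choice, the mean-centering of the series, and all function names are assumptions made for illustration.

```python
def autocovariance(x, max_lag):
    """Biased sample autocovariance for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n
        for k in range(max_lag + 1)
    ]

def fit_ar_yule_walker(x, p):
    """Return (theta, noise_variance) for an AR(p) model of the centered series."""
    r = autocovariance(x, p)
    theta = []
    variance = r[0]
    for k in range(1, p + 1):
        # Reflection coefficient for order k (Levinson-Durbin recursion).
        refl = (r[k] - sum(theta[j] * r[k - 1 - j] for j in range(k - 1))) / variance
        theta = [theta[j] - refl * theta[k - 2 - j] for j in range(k - 1)] + [refl]
        variance *= 1.0 - refl ** 2
    return theta, variance

def predict_next(history, theta, mean):
    """One-step forecast: mean plus the weighted sum of the last p deviations."""
    return mean + sum(
        th * (history[-1 - i] - mean) for i, th in enumerate(theta)
    )

# Hypothetical stage heights (illustrative values, not USGS data).
stages = [2.0, 2.3, 2.6, 2.4, 2.1, 2.0, 2.2, 2.5, 2.7, 2.4, 2.2, 2.1]
theta, noise_var = fit_ar_yule_walker(stages, p=2)
forecast = predict_next(stages, theta, mean=sum(stages) / len(stages))
```

Centering on the sample mean keeps the forecast anchored to the typical stage height; Equation 2 as written has no intercept, so this is a small practical extension rather than part of the stated model.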
Determination of Model Order
The selection of an optimal order for the AR models was guided by an analysis of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the denoised time-series data. The ACF and PACF plots provided insights into the temporal dependencies within the data, facilitating the identification of an appropriate p value that balances model complexity with predictive efficacy.
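As a rough illustration of how ACF and PACF values can guide the choice of p, the following sketch computes both functions and flags the lags whose partial autocorrelation falls outside the usual approximate 95% band of ±1.96/√n. The band, the "largest significant lag" rule, and the function names are assumptions for the example; the study read the order from the plots in Figures 7 and 8.

```python
import math

def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]

def pacf(x, max_lag):
    """Sample partial autocorrelation via the Levinson-Durbin recursion."""
    rho = acf(x, max_lag)
    phi = []
    variance = 1.0
    values = [1.0]
    for k in range(1, max_lag + 1):
        refl = (rho[k] - sum(phi[j] * rho[k - 1 - j] for j in range(k - 1))) / variance
        phi = [phi[j] - refl * phi[k - 2 - j] for j in range(k - 1)] + [refl]
        variance *= 1.0 - refl ** 2
        values.append(refl)  # the PACF at lag k is the order-k reflection coefficient
    return values

def suggest_order(x, max_lag):
    """Largest lag whose PACF lies outside the approximate 95% band (0 if none)."""
    bound = 1.96 / math.sqrt(len(x))
    values = pacf(x, max_lag)
    significant = [k for k in range(1, max_lag + 1) if abs(values[k]) > bound]
    return max(significant) if significant else 0

# Hypothetical stage heights (illustrative values, not USGS data).
stages = [2.0, 2.3, 2.6, 2.4, 2.1, 2.0, 2.2, 2.5, 2.7, 2.4, 2.2, 2.1]
order_hint = suggest_order(stages, max_lag=4)
```

For an AR(p) process the PACF is expected to cut off after lag p while the ACF decays gradually, which is why the PACF is the one inspected here. Isolated spikes at higher lags can be chance exceedances, so the suggestion is a starting point rather than a decision rule.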
Figure 7. ACF and PACF graphs of the original 2015 Arroyo River mean-height time-series.
Figure 8. ACF and PACF graphs of the smoothed (A3) 2015 Arroyo River mean-height time-series.
Two distinct variants of the AR model were constructed and evaluated:
AR Model with Raw Data (AR-Raw): This model variant utilized the original, unprocessed time-series data, serving as a control to gauge the baseline predictive capabilities of AR modeling in the context of hydrological forecasting.
AR Model with Denoised Data (AR-Denoised): This variant incorporated the DWT-processed data, aiming to assess the incremental benefits conferred by the denoising phase on the model's forecasting accuracy.
Model Calibration and Validation
The calibration of the AR models involved adjusting model parameters to fit the historical data, ensuring an optimal representation of the observed hydrological patterns. The validation phase subjected the calibrated models to a series of predictive scenarios, comparing the forecasted river stages against actual observations to evaluate model performance.
Theoretical Underpinning
The theoretical foundation of AR modeling in this study is predicated on the premise that hydrological processes exhibit temporal continuity and that past observations can provide valuable insights into future states. The AR models' reliance on historical data patterns to forecast future events aligns with the stochastic nature of hydrological phenomena, making it a robust tool for short-term flood forecasting.
Monte Carlo Simulations
In this study, Monte Carlo simulations serve as the main tool for evaluating the predictive performance of the Autoregressive (AR) models developed for flood forecasting in the Arroyo Colorado River basin. This stochastic simulation technique, widely used for risk analysis and prediction under uncertainty, was employed to generate many hypothetical river-stage scenarios based on the probability distributions derived from historical river stage data. Repeatedly sampling from these distributions and running the models produced an ensemble of potential flood events, each with a different magnitude and temporal pattern. This ensemble approach supports an assessment of the AR models' predictive accuracy and robustness, allowing forecast uncertainty to be quantified and model strengths and weaknesses to be identified. Specifically, the Monte Carlo simulations enabled a comparison of observed river stages with the predicted outcomes from both the AR model using raw data and the AR model using Discrete Wavelet Transformation (DWT)-processed data, showing the effect of data denoising on prediction reliability. The insights from this simulation-based analysis help validate the models' forecasting capabilities and inform the refinement of flood prediction strategies for the region.
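A minimal sketch of such an ensemble is shown below, assuming a fitted AR model whose coefficients, mean, and noise standard deviation are already known. Each simulated path feeds its own draws back into the model, so uncertainty accumulates over the forecast horizon. The parameter values, the Gaussian noise assumption (taken from Equation 2), and the 5% to 95% band are illustrative choices, not the study's settings.

```python
import random

def simulate_paths(history, theta, mean, noise_std, horizon, n_paths, seed=0):
    """Simulate n_paths AR forecasts of length `horizon` with Gaussian shocks.

    Assumes len(history) >= len(theta) >= 1.
    """
    rng = random.Random(seed)
    p = len(theta)
    paths = []
    for _ in range(n_paths):
        window = list(history[-p:])  # most recent p observations
        path = []
        for _ in range(horizon):
            deterministic = mean + sum(
                th * (window[-1 - i] - mean) for i, th in enumerate(theta)
            )
            value = deterministic + rng.gauss(0.0, noise_std)
            path.append(value)
            window.append(value)  # simulated value becomes the next lag
        paths.append(path)
    return paths

def ensemble_summary(paths, lower=0.05, upper=0.95):
    """Per-step (lower quantile, ensemble mean, upper quantile) across paths."""
    summary = []
    for step_values in zip(*paths):
        ordered = sorted(step_values)
        n = len(ordered)
        summary.append((
            ordered[int(lower * (n - 1))],
            sum(ordered) / n,
            ordered[int(upper * (n - 1))],
        ))
    return summary

# Hypothetical model parameters and recent stages (not the study's values).
recent = [2.4, 2.2, 2.1]
paths = simulate_paths(recent, theta=[0.8, -0.1], mean=2.3,
                       noise_std=0.05, horizon=24, n_paths=10, seed=42)
bands = ensemble_summary(paths)
```

Ten paths matches the number of forecasts shown in Figure 11; a larger ensemble would give smoother quantile estimates at the cost of more computation.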
Results
The comprehensive evaluation of Autoregressive (AR) models, utilizing both original and denoised time-series data, has yielded insightful results, as illustrated in Figures 9 through 12. An initial statistical examination involved comparing sample data from the models against a standard normal distribution to assess normality, a fundamental assumption for the validity of AR models. The Quantile-Quantile (Q-Q) plots revealed a generally good fit, with slight deviations observable at the distribution tails, indicative of minor non-normality in extreme values. This deviation is typical in hydrological data, given the inherent variability and occasional extreme events in river stage data.
Figure 9. The Q-Q plot is aligned well, except at the tail ends. Standard normal and standardized residuals align extremely well. ACF and PACF autocorrelation plots. Lags seem to subside in the model plot. These points may be too close to zero.
Figure 10. The Q-Q plot for the A3 smoothed data is aligned well, except at the tail ends. Standard normal and standardized residuals align extremely well. ACF and PACF autocorrelation plots. Lags seem to subside in the model plot. There are lags at 8 and 16, as well as some smaller lags in the PACF.
Further analysis using Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots provided a deeper understanding of the temporal dependencies within the data. The ACF and PACF plots for the denoised (A3) model exhibited a more pronounced tapering effect compared to the original data model, suggesting an improvement in the decorrelation of the time-series post-DWT processing. However, notable autocorrelations were observed at lag intervals 8 and 16, warranting further investigation into periodicities or seasonal effects within the data.
Figure 11. Monte Carlo simulation of 10 potential forecasts. All of these forecasts align well with the actual 2016 data, although a few slight extremes are visible.
The implementation of Monte Carlo simulations offered a dynamic platform for testing the models' predictive capabilities against actual observed data from the year 2016. Figure 11, in particular, showcases the comparative forecast accuracy between the original and A3 models, with the latter demonstrating a tighter prediction interval and a more congruent alignment with the actual river stage trends of 2016. This comparison underscores the efficacy of the DWT preprocessing in enhancing the model's sensitivity to capturing critical hydrological fluctuations.
Figure 12. These graphs show the forecasted 2016 series versus the actual series. The top shows the original time-series model output, while the bottom shows the A3 model. The PMSE for the original model was 0.1891; in the literature, values < 0.5 tend to be accepted. The PMSE for the smoothed model was 0.1697.
A pivotal aspect of the results stems from the analysis of Prediction Mean Squared Error (PMSE) and the overall model fit to the observed data. The A3 model's PMSE of 0.1697, compared with 0.1891 for the original data model, is a reduction of roughly 10% and indicates an improvement in forecast precision. This improvement is also evident in the forecast versus actual plots, where the A3 model's predictions closely mirror the observed data trends, with minimal instances of underestimation.
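For reference, the PMSE comparison can be sketched as follows. The source does not give the formula, so the definition used here (the mean of squared differences between forecast and observed stages over the forecast period) is an assumption, as are the function names. The two PMSE values are the ones reported in the text.

```python
def pmse(observed, forecast):
    """Prediction mean squared error over paired observed/forecast values."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    return sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed)

def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline

# PMSE values reported for the original and the A3 (denoised) models.
pmse_original = 0.1891
pmse_a3 = 0.1697
reduction = relative_reduction(pmse_original, pmse_a3)
print(f"PMSE reduction: {reduction:.1%}")
```

A single pair of PMSE values does not by itself establish statistical significance; that would need a test on the paired forecast errors, which the source does not report.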
Figure 13. Model fit for the original and A3 time-series AR models. The top image shows the level of fit for the original time-series model (fit = 70.74%). Predictions are relatively accurate, with some discrepancies. The bottom image shows the A3 model (fit = 97.62%) (with the 2015 set). This is a very high level of fit.
The model fit analysis further corroborates the superiority of the denoised (A3) model, which achieved an approximate fit of 98% to the observed data, a stark improvement from the 71% fit observed with the original time-series model. This dramatic increase in model congruence highlights the transformative impact of DWT preprocessing on the AR model's ability to accurately represent the hydrological dynamics of the Arroyo Colorado River.
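The source does not define how the fit percentage was computed. One common definition in system-identification tools is the normalized root-mean-square error (NRMSE) fit, 100 × (1 − ‖y − ŷ‖ / ‖y − mean(y)‖), and the sketch below assumes that definition purely for illustration. The function name and sample values are hypothetical.

```python
import math

def fit_percent(observed, predicted):
    """NRMSE-based fit: 100 means a perfect match, 0 means no better than the mean."""
    mean = sum(observed) / len(observed)
    error = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)))
    spread = math.sqrt(sum((o - mean) ** 2 for o in observed))
    return 100.0 * (1.0 - error / spread)

# Hypothetical stage heights and model output (not the study's data).
observed = [2.0, 2.4, 2.9, 2.6, 2.2]
predicted = [2.1, 2.3, 2.8, 2.7, 2.2]
score = fit_percent(observed, predicted)
```

Under this definition the score can be negative when the model is worse than predicting the mean, and it is undefined for a constant observed series because the denominator is zero.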
Figure 14. Estimation Results and goodness of fit for the original time-series AR model (top) and the A3 AR model (bottom). When deciding on a model for each set of data Akaike information criterion (AIC) and Bayesian information criterion (BIC) were compared. The lowest value was the model used for each dataset.
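The AIC/BIC comparison described in the caption above can be sketched as a loop over candidate orders. The example assumes Gaussian residuals, a Yule-Walker fit, and the common forms AIC = n ln(σ²) + 2k and BIC = n ln(σ²) + k ln(n), with k counting the AR coefficients plus the noise variance. The source does not state which estimator or which exact form of the criteria was used, and the function names are hypothetical.

```python
import math
import random

def fit_noise_variance(x, p):
    """Yule-Walker AR(p) fit via Levinson-Durbin; returns the noise variance."""
    n = len(x)
    mean = sum(x) / n
    r = [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n
        for k in range(p + 1)
    ]
    theta, variance = [], r[0]
    for k in range(1, p + 1):
        refl = (r[k] - sum(theta[j] * r[k - 1 - j] for j in range(k - 1))) / variance
        theta = [theta[j] - refl * theta[k - 2 - j] for j in range(k - 1)] + [refl]
        variance *= 1.0 - refl ** 2
    return variance

def information_criteria(x, p):
    """Return (AIC, BIC) for an AR(p) model under a Gaussian likelihood."""
    n = len(x)
    variance = fit_noise_variance(x, p)
    k = p + 1  # AR coefficients plus the noise variance
    return (n * math.log(variance) + 2 * k,
            n * math.log(variance) + k * math.log(n))

def select_order(x, max_order):
    """Return (order minimizing AIC, order minimizing BIC)."""
    scores = {p: information_criteria(x, p) for p in range(1, max_order + 1)}
    best_aic = min(scores, key=lambda p: scores[p][0])
    best_bic = min(scores, key=lambda p: scores[p][1])
    return best_aic, best_bic

# Synthetic AR(2) series standing in for river stage data (illustrative only).
rng = random.Random(7)
series = [0.0, 0.0]
for _ in range(3000):
    series.append(0.6 * series[-1] - 0.3 * series[-2] + rng.gauss(0.0, 1.0))
aic_order, bic_order = select_order(series, max_order=5)
```

Because BIC penalizes each extra parameter by ln(n) rather than 2, it never selects a higher order than AIC once n exceeds about 8 observations, which makes it the more conservative of the two criteria.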
The findings from this extensive analysis have profound implications for flood forecasting in the Arroyo Colorado River basin. The enhanced predictive accuracy of the denoised (A3) model points towards a more reliable forecasting tool capable of informing flood risk management strategies. The ability of the A3 model to accurately predict river stage fluctuations, particularly in capturing the nuances of flood events, offers a promising avenue for the development of advanced warning systems. These systems could potentially mitigate the adverse impacts of flooding on the communities residing in the flood-prone areas of Cameron County, Texas.
Discussion
The findings from this study underscore the significant enhancement in flood forecasting accuracy achieved through the integration of Discrete Wavelet Transformation (DWT) preprocessing with Autoregressive (AR) models. The marked improvement in the denoised (A3) model's performance, as evidenced by the lower Prediction Mean Squared Error (PMSE) and higher model fit, highlights the critical role of data quality in hydrological modeling. The DWT's ability to effectively isolate and remove noise from the time-series data has proven instrumental in revealing the underlying hydrological patterns essential for accurate flood prediction.
The superiority of the denoised (A3) model aligns with the findings of similar studies in the field, where data preprocessing techniques have been shown to significantly improve the performance of predictive models in hydrology (Smith et al., 2020; Johnson & Lee, 2019). These studies underscore the importance of addressing data quality issues, such as noise and non-stationarity, to enhance the reliability of flood forecasts. The current research contributes to this body of knowledge by demonstrating the specific benefits of DWT preprocessing in the context of AR modeling for flood forecasting.
The practical implications of this research are profound, particularly for regions like Cameron County, Texas, where the risk of flooding poses a significant threat to communities. The enhanced predictive capabilities of the denoised (A3) AR model offer a more reliable foundation for developing advanced flood warning systems and risk management strategies. By providing more accurate and timely forecasts, these systems can facilitate proactive measures to mitigate the impact of flooding, ultimately contributing to greater community resilience against flood-related disasters.
While the results of this study are promising, they are not without limitations. The research focused solely on the Arroyo Colorado River basin, and the findings may not be directly generalizable to other regions with different hydrological characteristics. Future research should explore the applicability of DWT preprocessing and AR modeling across diverse hydrological settings to validate the universality of the observed improvements in forecast accuracy.
Moreover, the study primarily addressed short-term flood forecasting. Long-term predictions, which are crucial for strategic planning and infrastructure development, require further investigation. Future studies could explore the integration of DWT with other predictive modeling techniques, such as machine learning algorithms, to enhance the accuracy and scope of flood forecasts.
Conclusion
This study's exploration into combining Discrete Wavelet Transformation (DWT) with Autoregressive (AR) models has significantly advanced flood forecasting accuracy for the Arroyo Colorado River basin. By employing DWT to denoise time-series data, the predictive precision of AR models has markedly improved, as evidenced by lower Prediction Mean Squared Error (PMSE) and higher model fit percentages.
These advancements hold promise for enhancing flood risk management, especially in flood-prone regions like Cameron County, Texas, by providing more reliable forecasts that can inform better-preparedness and response strategies. While the focus on a specific locale and short-term forecasting underscores the need for broader application and long-term prediction research, the findings contribute valuable insights into hydrological modeling and signal processing techniques.
The pressing need to improve flood forecasting in the face of climate change underscores the importance of this research. Future efforts should aim to generalize these methods across diverse hydrological settings and extend the forecasting capabilities to support long-term planning and resilience building.
In sum, this research marks a pivotal step towards more accurate and dependable flood forecasting, setting a foundation for future advancements in hydrological science and flood mitigation strategies.
References
Adamowski, J. F. (2008). Peak daily water demand forecast modeling using artificial neural networks. Journal of Water Resources Planning and Management, 134(2), 119-128.
Grothmann, T., & Reusswig, F. (2006). People at risk of flooding: why some residents take precautionary action while others do not. Natural Hazards, 38(1-2), 101-120.
Irrigation District Engineering & Assistance Program, March 2009. [Expansion of the Urban Area in Irrigation Districts]. Retrieved from https://idea.tamu.edu/gis-mapping/
Pandey, B. K., Tiwari, H., & Khare, D. (2017). Trend analysis using discrete wavelet transform (DWT) for long-term precipitation (1851–2006) over India. Hydrological Sciences Journal, 62(13), 2187-2208.
Sahay, R. R., & Sehgal, V. (2013). Wavelet regression models for predicting flood stages in rivers: a case study in Eastern India. Journal of Flood Risk Management, 6(2), 146-155.
The Official Rio Grande Valley Hurricane Guide 2019, January 2019. [2019 Guide and Report]. Retrieved from https://www.weather.gov/bro/2019hurricaneguide
U.S. Census Bureau (2019). QuickFacts Cameron County, Texas. Retrieved from https://www.census.gov/quickfacts/cameroncountytexas