Spatial-temporal Decomposition Methods in Climate Data Analysis
Large datasets are increasingly common in many disciplines, including climate and weather science. The climate system is the result of highly complex interactions between many degrees of freedom, or modes. To gain insight into the dynamical and physical behavior involved, methods are needed that drastically reduce the dimensionality of the data in an interpretable way while preserving most of the information. This has led atmospheric researchers to develop methods that give both a spatial display and a temporal display of large space-time atmospheric datasets.
Among these methods, principal component analysis (PCA), known in atmospheric science as empirical orthogonal function (EOF) analysis, is one of the oldest and most widely used. To overcome some limitations of classical EOF/PCA analysis and make the resulting patterns more physically interpretable, many extensions have been developed, such as rotated EOFs, extended EOFs, and complex EOFs. Reviews of PCA/EOFs can be found in Kutzbach (1967), Hannachi (2004), and Hannachi et al. (2007). A few non-EOF methods are also covered briefly below.
This short note does not attempt to list all EOF-related methods. It is a brief memorandum on where to start looking when a data analysis calls for a spatial-temporal decomposition method.
Methods
· PCA – Principal Component Analysis
PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The earliest literature on PCA dates from Pearson (1901) and Hotelling (1933). PCA can be presented as
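y = v'x,
where x is the vector of (mean-removed) original variables, v is an eigenvector of their covariance matrix, and y is the corresponding principal component.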
Put simply, PCA extracts the important variables (in the form of components) from a large set of variables in a dataset. It derives a low-dimensional set of features from a high-dimensional dataset with the aim of capturing as much information as possible. With fewer variables, visualization also becomes much more meaningful. PCA is most useful when dealing with data of three or more dimensions.
The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.
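The procedure described above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions: the function name `pca`, the synthetic three-variable dataset, and the choice of the covariance (rather than correlation) matrix are all made up for the example and are not prescribed by the text.

```python
import numpy as np

def pca(data, n_components):
    """PCA of an (n_samples, n_variables) array via the covariance eigenproblem."""
    centered = data - data.mean(axis=0)                 # remove the mean of each variable
    cov = centered.T @ centered / (centered.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                   # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    vectors = eigvecs[:, :n_components]
    scores = centered @ vectors                         # y = v'x for every sample
    explained = eigvals / eigvals.sum()                 # fraction of variance per component
    return scores, vectors, explained

# Hypothetical data: three correlated variables driven by one latent factor plus noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 1))
data = latent @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.standard_normal((500, 3))

scores, vectors, explained = pca(data, n_components=2)
print("explained variance fractions:", np.round(explained, 3))
```

With this kind of data, the first component should carry nearly all of the variance, and the component scores are mutually uncorrelated by construction.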
· EOF – Empirical Orthogonal Function
One discipline in which PCA has been widely used is atmospheric science. It was first suggested in that field by Obukhov (1947) and Lorenz (1956) and, uniquely to that discipline, is usually known as empirical orthogonal function (EOF) analysis. EOF analysis is a standard method in the earth and marine sciences for exploring spatio-temporal variation in a variable. The simplicity and the analytic derivation of EOFs are the main reasons for their popularity in atmospheric science. EOF analysis can be written as
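X = VY,
where X is the space-time data matrix (rows are grid points, columns are times), the columns of V are the EOFs (spatial patterns), and the rows of Y are the corresponding principal component time series.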
The original purpose of EOFs was to reduce a large number of variables of the original data to a few variables, but without compromising much of the explained variance. Lately, however, EOF analysis has been used to extract individual modes of variability such as the Arctic Oscillation (AO).
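One common way to obtain the decomposition X = VY is a singular value decomposition of the anomaly matrix. The sketch below assumes a hypothetical `(n_time, n_space)` field with a single standing pattern plus noise; the function name `eof_analysis` and the synthetic data are illustrative only, and area weighting (usually needed on latitude-longitude grids) is deliberately left out.

```python
import numpy as np

def eof_analysis(field, n_modes):
    """EOFs of an (n_time, n_space) field via SVD of the anomaly matrix."""
    anomalies = field - field.mean(axis=0)       # remove the time mean at each grid point
    X = anomalies.T                              # (n_space, n_time), as in X = VY
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    V = U[:, :n_modes]                           # columns are EOFs (spatial patterns)
    Y = V.T @ X                                  # rows are PC time series
    var_frac = (s**2 / np.sum(s**2))[:n_modes]   # explained variance fraction per mode
    return V, Y, var_frac

# Hypothetical field: one standing pattern oscillating in time, plus weak noise.
rng = np.random.default_rng(0)
t = np.arange(200)
x = np.linspace(0.0, np.pi, 30)
pattern = np.sin(x)
field = np.outer(np.cos(2 * np.pi * t / 50), pattern)
field += 0.05 * rng.standard_normal(field.shape)

eofs, pcs, var_frac = eof_analysis(field, n_modes=2)
print("variance fractions of the two leading modes:", np.round(var_frac, 3))
```

The sign of each EOF/PC pair is arbitrary, so a pattern and its negative describe the same mode.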
· MVEOF – Multi-Variate EOF
MVEOF, also called combined EOF, extends conventional EOF analysis by using both spatial and intervariable coherence, which enables a more efficient compaction of multifield data. More importantly, the derived empirical orthogonal functions may reveal dominant patterns in the spatial phase relationships among the various fields. This often leads to physical insight into the interactive processes within a complex system such as the ocean-atmosphere climate system (Wang, 1992; Wang et al. 2008; He et al. 2015).
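A combined EOF can be sketched by stacking several fields along the spatial dimension before the decomposition. The example below is an assumption-laden illustration: the function name `combined_eof`, the two synthetic fields, and in particular the scaling of each field by its own overall standard deviation are choices made here (one of several possible normalizations), not something the cited papers are claimed to prescribe.

```python
import numpy as np

def combined_eof(fields, n_modes):
    """Multivariate EOF of several (n_time, n_space_k) fields on a common time axis."""
    blocks, sizes = [], []
    for f in fields:
        anom = f - f.mean(axis=0)
        blocks.append(anom / anom.std())         # assumed scaling: one factor per field
        sizes.append(f.shape[1])
    X = np.concatenate(blocks, axis=1).T         # (total_space, n_time)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    V = U[:, :n_modes]
    patterns = np.split(V, np.cumsum(sizes)[:-1], axis=0)   # one pattern block per field
    pcs = V.T @ X                                # shared PC time series
    var_frac = (s**2 / np.sum(s**2))[:n_modes]
    return patterns, pcs, var_frac

# Hypothetical fields with very different units, driven by the same signal.
rng = np.random.default_rng(0)
t = np.arange(240)
signal = np.sin(2 * np.pi * t / 40)
field_a = np.outer(signal, np.sin(np.linspace(0, np.pi, 20)))
field_a += 0.1 * rng.standard_normal(field_a.shape)
field_b = 100.0 * np.outer(signal, np.cos(np.linspace(0, np.pi, 15)))
field_b += 10.0 * rng.standard_normal(field_b.shape)

patterns, pcs, var_frac = combined_eof([field_a, field_b], n_modes=2)
print("leading variance fraction:", round(float(var_frac[0]), 3))
```

Each mode then has one spatial pattern per field and a single shared time series, which is what allows the phase relationships among fields to be inspected.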
· EEOF – Extended EOF
EEOFs extend the traditional MVEOF to deal not only with spatial but also with temporal correlations observed in weather/climate data; here the additional variables are lagged versions of the same process. The method was first introduced by Weare and Nasstrom (1982), who applied it to the 300-mb relative vorticity to identify propagating structures.
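The idea of treating lagged copies of a field as additional variables can be sketched as follows. The function name `extended_eof`, the window length of five lags, and the synthetic propagating wave are hypothetical choices for illustration only.

```python
import numpy as np

def extended_eof(field, n_lags, n_modes):
    """Extended EOFs of an (n_time, n_space) field using n_lags consecutive maps."""
    anomalies = field - field.mean(axis=0)
    n_time, n_space = anomalies.shape
    n_windows = n_time - n_lags + 1
    # Row t of the extended matrix holds the maps at t, t+1, ..., t+n_lags-1 side by side.
    extended = np.hstack([anomalies[lag:lag + n_windows] for lag in range(n_lags)])
    X = extended.T                               # (n_lags * n_space, n_windows)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    V = U[:, :n_modes]
    patterns = V.T.reshape(n_modes, n_lags, n_space)   # a sequence of maps per mode
    pcs = V.T @ X
    var_frac = (s**2 / np.sum(s**2))[:n_modes]
    return patterns, pcs, var_frac

# Hypothetical propagating wave cos(kx - wt) with weak noise.
rng = np.random.default_rng(0)
t = np.arange(104)
x = np.linspace(0.0, 2 * np.pi, 24, endpoint=False)
field = np.cos(x[None, :] - 2 * np.pi * t[:, None] / 20)
field += 0.05 * rng.standard_normal(field.shape)

patterns, pcs, var_frac = extended_eof(field, n_lags=5, n_modes=2)
print("variance fractions:", np.round(var_frac, 3))
```

A propagating signal typically shows up as a pair of modes with nearly equal variance whose lagged maps are shifted relative to each other.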
· CVEOF – Complex Vector EOF
Some variables are naturally vectors; the wind, for example, has zonal and meridional components. Such a variable can be written in complex form, e.g., w = u + iv for the wind. CVEOF carries out EOF analysis on the resulting complex matrix just as conventional EOF/PCA does on a real one.
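As a rough illustration of EOF analysis on the complex field w = u + iv, the sketch below applies an SVD to a complex anomaly matrix. The function name `complex_vector_eof` and the synthetic rotating wind field are assumptions made for the example.

```python
import numpy as np

def complex_vector_eof(u, v, n_modes):
    """EOFs of the complex field w = u + iv; u and v are (n_time, n_space) arrays."""
    w = (u - u.mean(axis=0)) + 1j * (v - v.mean(axis=0))
    X = w.T                                      # (n_space, n_time), complex
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    eofs = U[:, :n_modes]                        # complex spatial patterns
    pcs = eofs.conj().T @ X                      # complex PC time series
    var_frac = (s**2 / np.sum(s**2))[:n_modes]
    return eofs, pcs, var_frac

# Hypothetical wind field that rotates in time with a spatially varying amplitude.
rng = np.random.default_rng(0)
t = np.arange(120)
x = np.linspace(0.0, 2 * np.pi, 16, endpoint=False)
amplitude = 1.0 + 0.5 * np.sin(x)
omega = 2 * np.pi / 30
u = np.outer(np.cos(omega * t), amplitude) + 0.05 * rng.standard_normal((120, 16))
v = np.outer(np.sin(omega * t), amplitude) + 0.05 * rng.standard_normal((120, 16))

eofs, pcs, var_frac = complex_vector_eof(u, v, n_modes=2)
print("leading variance fraction:", round(float(var_frac[0]), 3))
```

The modulus of a complex EOF gives the local amplitude of the vector, and its argument gives the local direction relative to the PC phase.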
· JEOF – Joint EOF
JEOF is an extension of EEOF that deals with two variables rather than the single variable of the original EEOF. Because the original variables may have different scales, JEOF should be supplied with normalized versions of them.
· CEOF – Complex EOF
Empirical orthogonal function analysis of data fields is commonly carried out under the assumption that each field can be represented as a spatially fixed pattern of behavior. This method, however, cannot detect propagating features because it lacks phase information. For such cases, the CEOF technique was proposed (e.g., Davis 1976; Horel 1984; Barnett 1985; Preisendorfer 1988; Kaihatu et al. 1998) to provide a separation of variables in space and time. Complex EOF can more effectively capture the structure of non-stationary periodic variations or two orthogonal variables (e.g., zonal and meridional velocity) in fewer modes.
The Complex Empirical Orthogonal Function (CEOF) method was introduced to analyze a set of time series that have phase lags among them. It adds, as imaginary components, the original time series rotated by 90 degrees in the complex plane, which are obtained with the Hilbert transform. This method is close to frequency-domain EOF analysis, but it does not require explicitly converting the time-domain data into the frequency domain.
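The Hilbert-transform step and the subsequent complex decomposition can be sketched with NumPy alone. The helper `analytic_signal` below builds x + iH(x) with an FFT, `complex_eof` is a hypothetical name, and the propagating-wave test field is synthetic; practical issues such as end effects of the Hilbert transform or prior band-pass filtering are not addressed here.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + i*H(x) along axis 0, built in the Fourier domain."""
    n = x.shape[0]
    weights = np.zeros(n)
    weights[0] = 1.0                             # keep the mean term
    if n % 2 == 0:
        weights[n // 2] = 1.0                    # keep the Nyquist term
        weights[1:n // 2] = 2.0                  # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    spectrum = np.fft.fft(x, axis=0)
    return np.fft.ifft(spectrum * weights[:, None], axis=0)

def complex_eof(field, n_modes):
    """Complex (Hilbert) EOFs of a real (n_time, n_space) field."""
    anomalies = field - field.mean(axis=0)
    Z = analytic_signal(anomalies).T             # (n_space, n_time), complex
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    eofs = U[:, :n_modes]
    pcs = eofs.conj().T @ Z
    var_frac = (s**2 / np.sum(s**2))[:n_modes]
    return eofs, pcs, var_frac

# Hypothetical propagating wave cos(kx - wt) with k = 1, plus weak noise.
rng = np.random.default_rng(0)
t = np.arange(200)
x = np.linspace(0.0, 2 * np.pi, 24, endpoint=False)
field = np.cos(x[None, :] - 2 * np.pi * t[:, None] / 20)
field += 0.05 * rng.standard_normal(field.shape)

eofs, pcs, var_frac = complex_eof(field, n_modes=2)
spatial_phase = np.unwrap(np.angle(eofs[:, 0]))  # phase changes steadily along x
print("leading variance fraction:", round(float(var_frac[0]), 3))
```

In this example a single complex mode should capture the propagating wave, whereas a conventional real EOF analysis would split it into a pair of standing patterns in quadrature.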
· POP – Principal Oscillation Pattern
The principal oscillation pattern (POP) analysis is a technique used to simultaneously infer the characteristic patterns and timescales of a vector time series. The POPs may be seen as the normal modes of a linearized system whose system matrix is estimated from data (von Storch et al., 1995; Gehne et al., 2014).
The POP method is not a tool that is useful in all applications. If the analyzed vector time series exhibit a strongly nonlinear behavior, the POPs may fail to identify a useful subsystem. However, if a significant portion of the variability of a nonlinear system is controlled by linear dynamics, the POP analysis may be successful in extracting principal modes of oscillation.
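A minimal sketch of the POP idea: assume the data follow a linear model x(t+1) = A x(t) + noise, estimate A from the lag-1 and lag-0 covariance matrices, and read patterns and time scales from its eigen-decomposition. The function name `pop_analysis` and the synthetic damped rotation used for testing are assumptions for illustration; in practice the data are usually first reduced to a few leading EOFs.

```python
import numpy as np

def pop_analysis(series):
    """POPs of an (n_time, n_dim) series under x(t+1) = A x(t) + noise."""
    x = series - series.mean(axis=0)
    c0 = x[:-1].T @ x[:-1] / (len(x) - 1)        # lag-0 covariance
    c1 = x[1:].T @ x[:-1] / (len(x) - 1)         # lag-1 covariance
    A = c1 @ np.linalg.inv(c0)                   # estimated system matrix
    eigvals, pops = np.linalg.eig(A)             # columns of pops are the POPs
    with np.errstate(divide="ignore"):
        periods = 2 * np.pi / np.abs(np.angle(eigvals))   # inf for real positive eigenvalues
    efold = -1.0 / np.log(np.abs(eigvals))       # e-folding (damping) time in time steps
    return eigvals, pops, periods, efold

# Hypothetical test system: damped rotation with period 20 steps, driven by white noise.
theta, damping = 2 * np.pi / 20, 0.95
A_true = damping * np.array([[np.cos(theta), -np.sin(theta)],
                             [np.sin(theta),  np.cos(theta)]])
rng = np.random.default_rng(0)
series = np.zeros((20000, 2))
for step in range(len(series) - 1):
    series[step + 1] = A_true @ series[step] + rng.standard_normal(2)

eigvals, pops, periods, efold = pop_analysis(series)
print("estimated period:", round(float(periods[0]), 2))
print("estimated e-folding time:", round(float(efold[0]), 2))
```

Complex eigenvalues come in conjugate pairs, and each pair describes one oscillatory mode whose real and imaginary pattern parts alternate in time.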
· ICA – Independent Component Analysis
Independent component analysis (ICA) is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals.
ICA defines a generative model for the observed multivariate data, which is typically given as a large database of samples. In the model, the data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed non-Gaussian and mutually independent, and they are called the independent components of the observed data. These independent components, also called sources or factors, can be found by ICA.
ICA is superficially related to principal component analysis and factor analysis. ICA is sometimes viewed as a method of EOF rotation: starting from an initial EOF solution, rather than rotating the loadings toward simplicity, ICA seeks a rotation matrix that maximizes the independence between the components in the time domain. If the underlying climate signals have independent forcings, one can expect to find loadings with interpretable patterns whose time coefficients have properties that go beyond the simple non-correlation of EOFs.
Often, ICA is more appropriate than PCA to analyze time series, since the extraction of independent components (ICs) involves higher-order statistics whereas PCA only uses the second-order statistics to obtain the principal components (PCs), which are not correlated and are not necessarily independent.
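The "whiten, then rotate toward independence" view can be illustrated with a bare-bones FastICA-style iteration. The text does not name a particular ICA algorithm, so the choice made here (tanh contrast function with symmetric decorrelation) is an assumption, as are the function names and the two synthetic non-Gaussian sources.

```python
import numpy as np

def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so that the rows of W are orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(X, n_iter=200, tol=1e-8, seed=0):
    """FastICA-style estimate for X of shape (n_signals, n_samples)."""
    X = X - X.mean(axis=1, keepdims=True)
    n, m = X.shape
    # Whitening is the PCA/EOF step: uncorrelated components with unit variance.
    d, E = np.linalg.eigh(X @ X.T / m)
    K = (E / np.sqrt(d)).T
    Z = K @ X
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((n, n)))     # random initial rotation
    for _ in range(n_iter):
        G = np.tanh(W @ Z)                                # higher-order statistics enter here
        W_new = G @ Z.T / m - np.diag((1.0 - G**2).mean(axis=1)) @ W
        W_new = _sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W @ Z, W @ K                                   # components, unmixing matrix

# Hypothetical example: two independent non-Gaussian sources, linearly mixed.
rng = np.random.default_rng(1)
n_samples = 5000
sources = np.vstack([rng.uniform(-1, 1, n_samples), rng.laplace(size=n_samples)])
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
observed = mixing @ sources

recovered, unmixing = fast_ica(observed)
cross_corr = np.abs(np.corrcoef(np.vstack([sources, recovered]))[:2, 2:])
print(np.round(cross_corr, 2))
```

The recovered components are defined only up to order, sign, and scale, so they are compared with the true sources through absolute correlations.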
· ST-MEMD – Spatio-Temporal Multivariate Empirical Mode Decomposition
ST-MEMD is a variation of classic Empirical Mode Decomposition (EMD) that takes spatial and temporal information into account simultaneously, whereas the original EMD processes each signal source in isolation. In the motor-imagery classification study of Davies and James (2014), however, ST-MEMD retained the increase in sensitivity and specificity gained from adding spatial data, while the added temporal data made no meaningful difference in performance.
References
Davies, S. R. H., and C. J. James, 2014: Using empirical mode decomposition with spatio-temporal dynamics to classify single-trial motor imagery in BCI. 36th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., Aug. 2014, 4631–4634.
Gehne et al., 2014: Irregularity and decadal variation in ENSO: A simplified model based on Principal Oscillation Patterns. Climate Dynamics, 43, 3327–3350.
Hannachi, A., 2004: A primer for EOF analysis of climate data. University of Reading, 33 pp.
Hannachi, A., I. T. Jolliffe, and D. Stephenson, 2007: Empirical orthogonal functions and related techniques in atmospheric science: A review. Int. J. Climatol., 27, 1119–1152.
He, J., L.-Y. Chang, and H. Chen, 2015: Meridional propagation of the 30- to 60-day variability of precipitation in the East Asian subtropical summer monsoon region: Monitoring and prediction. Atmos.–Ocean, 53, 251–263, doi:10.1080/07055900.2015.1017798.
Hotelling, H., 1933: Analysis of a complex of statistical variables into principal components. J. Educ. Psychol., 24, 417–441, 498–520, doi:10.1037/h0071325.
Kutzbach, J. E., 1967: Empirical eigenvectors of sea-level pressure, surface temperature and precipitation complexes over North America. J. Appl. Meteor., 6, 791–802.
Lorenz, E. N., 1956: Empirical orthogonal functions and statistical weather prediction. Technical report, Statistical Forecast Project Report 1, Dept. of Meteor., MIT, 49 pp.
Obukhov, A. M., 1947: Statistically homogeneous fields on a sphere. Usp. Mat. Nauk, 2, 196–198.
Pearson, K., 1901: On lines and planes of closest fit to systems of points in space. Phil. Mag., 2, 559–572, doi:10.1080/14786440109462720.
von Storch, H., G. Bürger, R. Schnur, and J.-S. von Storch, 1995: Principal Oscillation Patterns: A Review. J. Climate, 8, 377
Wang, B., 1992: The vertical structure and development of the ENSO anomaly mode during 1979–1989. J. Atmos. Sci., 49, 698–712, doi:10.1175/1520-0469(1992)049<0698:TVSADO>2.0.CO;2.
Wang, B., Z. Wu, J. Li, J. Liu, C.-P. Chang, Y. Ding, and G. Wu, 2008: How to measure the strength of the East Asian summer monsoon. J. Climate, 21, 4449–4463, doi:10.1175/2008JCLI2183.1.
Weare, B. C., and J. S. Nasstrom, 1982: Examples of extended empirical orthogonal function analysis. Monthly Weather Review, 110, 481–485.