Correlation, causation and vector autoregressions
Vector autoregression (VAR) should be your first go-to statistical model.
Assume you have observed a vector time series. It is better to consider the vector case immediately, with vector size larger than 2. The one-dimensional case does not allow making the key point: the stratification of dependency ("correlation") and causation between the vector constituents. In the bivariate case, one is always tempted to rush into identifying the independent and dependent elements of the pair, sinking into the swamp of regressions. You are not immune to it even in the general vector case, always at risk of being sucked by peer gravity into the black hole of machine learning, with its equally premature feature engineering and other cool yet embarrassing stuff.
VAR is, abstractly, just AR in matrix/vector form:
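X_t = c + B_1 X_{t-1} + \dots + B_p X_{t-p} + \varepsilon_t

where X_t is the observed vector, the B_i are the autoregressive coefficient matrices, and \varepsilon_t is the vector of innovations.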
One can immediately generalize to VAR(I)MA by introducing another weighted sum, this time over the lagged vector innovations. Such a model is notoriously difficult to identify and estimate in practice. Hence we stick with VAR, in reality even with VAR(1).
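For reference, the VARMA form adds that moving-average sum (using \Theta_j for the MA coefficient matrices):

X_t = c + \sum_{i=1}^{p} B_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \Theta_j \varepsilon_{t-j}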
There are at least two reasons to consider such a model. Firstly, the time series of X's is not a sample yet, i.e. the order of the X's may matter. One needs to prove that it doesn't. The easiest and most practical way to prove it is simply to estimate the VAR. If the estimated B matrices are not statistical zeros, then the chances are that the AR part of the model has pulled enough of the time dependence out of the data and your innovations have no residual serial dependency, i.e. they are a sample and not a time series any more. You can then do historical MC/bootstrap on the sample of innovations even without modelling them any further.
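A minimal sketch of this check, assuming the observations sit in a pandas DataFrame X (an illustrative name) and using statsmodels:

```python
import numpy as np
from statsmodels.tsa.api import VAR

# X: T x k pandas DataFrame holding the observed vector time series
results = VAR(X).fit(1)           # VAR(1), as discussed above

print(results.summary())          # are the B-matrix entries statistical zeros?

# Portmanteau (whiteness) test: any residual serial dependency
# left in the innovations?
print(results.test_whiteness(nlags=10).summary())

# If the innovations pass as an i.i.d. sample, a historical bootstrap
# is just resampling rows of the residuals with replacement.
eps = results.resid.to_numpy()
rng = np.random.default_rng(0)
boot = eps[rng.integers(0, len(eps), size=1000)]
```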
Secondly, the key added value of VAR is the stratification of causation and "correlation". The autoregressive B matrices capture causation in its purest form: the past affects the future. This is not the causation one is warned not to confuse correlation with: that causation is instantaneous. If need be, it can be somewhat introduced at this level via error-correction (ECM) dynamics, which are a bit more challenging to estimate:
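\Delta X_t = c + \Pi X_{t-1} + \Gamma_1 \Delta X_{t-1} + \dots + \Gamma_{p-1} \Delta X_{t-p+1} + \varepsilon_t

in standard vector error-correction notation, where \Pi encodes the equilibrium relations pulling the components together and the \Gamma_i capture the short-run dynamics.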
Going back to basic VAR: once causation is taken out by the B matrices, the cross-component statistical dependency of the innovation vectors is what captures the residual "correlation". That is the correlation not to be confused with causation. If there is no serial dependency left in the innovations, then you can sample from them directly, or you can estimate a parametric distribution from their sample, given that they are now i.i.d. This is helpful if you want to sample a larger number of innovations than observed, if you want a sample different from the observations, or if you want to use some other parametric analytical methods.
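A minimal sketch of the parametric route, assuming a multivariate Gaussian is acceptable (for financial innovations a heavier-tailed choice, e.g. a multivariate Student-t, may well be needed):

```python
import numpy as np

# eps: T x k array of VAR innovations, i.i.d. after the checks above
mu = eps.mean(axis=0)
cov = np.cov(eps, rowvar=False)

# Parametric sampling: draw as many innovation vectors as needed,
# no longer limited to (or equal to) the observed sample.
rng = np.random.default_rng(42)
sample = rng.multivariate_normal(mu, cov, size=10_000)
```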
As mentioned before, modelling the vector innovations directly using the matrix version of MA may be too daunting. A half-way approach is to first deal with them component-wise, before modelling the cross-component dependency if still necessary. In other words, once you have estimated the VAR component-wise, you can analyse the time series of the implied innovations for each component individually. This may be very useful if the components are diverse, e.g. equity returns vs credit spreads. You may need a GARCH-like approach to handle time clustering of variance in one component's time series but not in the others. Only after that do you re-imply the innovation vectors and see whether you still need to do anything with them.
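A component-wise sketch, assuming the arch package for the univariate GARCH step (any univariate GARCH implementation would do):

```python
import pandas as pd
from arch import arch_model

# resid: pandas DataFrame of VAR innovations, one column per component
standardized = {}
for col in resid.columns:
    # GARCH(1,1) on this component's innovations only
    fit = arch_model(resid[col], vol="GARCH", p=1, q=1,
                     mean="Zero", rescale=False).fit(disp="off")
    # Strip the volatility clustering: standardized residuals should be
    # much closer to i.i.d. for this component.
    standardized[col] = fit.resid / fit.conditional_volatility

# Re-imply the innovation vectors and inspect cross-component dependency
std_eps = pd.DataFrame(standardized)
```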
Dependency via causation, modelled by the B matrices, is the "physics" of the model. It may therefore allow an interpretation far beyond the mere ability to partly explain variance. Furthermore, instead of the linear operators that the B matrices are, you can invent whatever non-linear operator you like, or attempt to estimate it non-parametrically using your favourite universal approximator. This has been done in the past, and it is called non-linear (V)AR. Tests have been invented (BDS) to check for non-linearity in time series. Whether all that adds material value in solving practical dynamic and optimal control problems in math finance remains to be seen.
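For the BDS test, statsmodels ships an implementation; a minimal sketch on a single component's innovations (eps_i is an illustrative name):

```python
from statsmodels.tsa.stattools import bds

# eps_i: 1-D array of innovations for one component
stat, pvalue = bds(eps_i, max_dim=2)
# Small p-values hint at remaining (possibly non-linear) structure
# that the linear VAR has not captured.
print(stat, pvalue)
```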
The chart at the head of the presentation shows the PCA of the innovation correlation for the time series of iTraxx Crossover with different maturities. In the one on the left, the four AR(1) processes are estimated independently and then the correlation between the residuals is PCA'ed. In the one on the right, a four-dimensional VAR(1) process is estimated and the correlation between the residuals is likewise PCA'ed. The VAR-based approach uncovers something that your traditional three yield-curve factors do not.
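A sketch of the two pipelines being compared, assuming the four maturity series sit in a pandas DataFrame X (illustrative name):

```python
import numpy as np
from statsmodels.tsa.api import VAR
from statsmodels.tsa.ar_model import AutoReg

def pca_of_corr(resid):
    # Eigen-decomposition of the residual correlation matrix,
    # eigenvalues sorted in descending order
    vals, vecs = np.linalg.eigh(np.corrcoef(resid, rowvar=False))
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Left panel: independent AR(1) per maturity, then PCA of residual correlation
ar_resid = np.column_stack(
    [AutoReg(X[col], lags=1).fit().resid for col in X.columns]
)

# Right panel: joint VAR(1), then PCA of residual correlation
var_resid = VAR(X).fit(1).resid.to_numpy()

print(pca_of_corr(ar_resid)[0])   # factor variances, component-wise AR route
print(pca_of_corr(var_resid)[0])  # factor variances, VAR route
```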