Foundations of Data Analytics
An article written by Davide Finotto and Riccardo Bellemo
Introduction
An in-depth understanding of stochastic processes and their application domain is crucial to build a forecasting model worthy of the name. Indeed, configuring a forecasting model cannot be separated from assimilating the basic methodologies and tools of data analysis (e.g. white noise and non-stationarity) and understanding their most frequent, desirable (or undesirable) outcomes.
More in detail, the scope of the present contribution is to illustrate the role of stationarity, a fundamental feature of a time series when using linear models. These are preferable to non-linear ones since they are easier to use, simpler to interpret, and they come with more statistics that help you assess the goodness of fit of the model. One recurring and meaningful goal is to make them parsimonious, which in this field means minimizing the number of parameters to be estimated while maintaining high predictive accuracy. Despite the name, keep in mind that linear models can still fit several types of curvature in the data, although non-linear regression is generally more flexible in the shapes it can accommodate. Hence, whenever a linear model does not yield an adequate fit to the data, you should switch to a non-linear one.
This article includes the discussion of concrete cases and examples, in order to explain in practice how to identify the presence of stationarity through visual methods (a quick glance at the chart itself, or at the autocorrelation function of the series, can be quite telling) and through the use of formal statistical tests.
When can a time series be defined as stationary?
To make statistical inferences on any stochastic process, such as the price of a stock, we cannot consider only the current price: we need to analyze a series of observations over time. These observations compose a time series, which has the following definition:
A partial realization of a trajectory of the stochastic process {Yt}, i.e. for t = 1, … , n, yt is the sample observation of Yt.
In other words, a time series is a sample drawn from a stochastic (random) process. The latter, denoted by {Yt}, t ∈ Z, is in turn defined as a family of random variables Yt ∈ ℝ^d, d ≥ 1, ordered by a parameter t ∈ Z, such that for all n ≥ 1 and all n-tuples t1, … , tn in Z, the joint probability distribution of (Yt1, … , Ytn) is well defined. Thus, a stochastic process is a random function of time, which can be discrete or continuous depending on the phenomenon under analysis. Simply put, you can think of the daily prices of any asset as a stochastic (random) process: in this case, the time span in which the data is collected constitutes the time series, i.e. the selected sample relevant for the analysis. Yet we can't use just any time series: we need a period of time T large enough to grant the consistency of the estimators and one that, even graphically, looks as stable as possible.
A first clue that a time series is stable can be obtained graphically: when plotted over time, the observations should look as horizontal as possible, that is, deviate as little as possible from the mean. Moreover, their variability should tend to be constant over time. Stability thus defined is a necessary but not sufficient condition for stationarity, which is why graphical analysis allows us to make an initial skimming in the search for the best time series for statistical inference. What we cannot detect from the plot is the autocovariance, which should also remain constant (as a function of the lag only) alongside the mean and the variance of the time series. The simultaneous presence of these three characteristics implies stationarity, meaning that the behavior of the observations does not depend on time.
Expressed in mathematical terms, a stochastic process is characterized as follows:
· Mean function:
μt = E(Yt)
· Autocovariance function:
γt,j = Cov(Yt, Yj)
Since the variance of any linear combination of the Yt's is a combination of these autocovariances and must be non-negative, the autocovariance function is positive semi-definite; in particular, γt,t = Var(Yt) ≥ 0.
· Variance function:
σt^2 = Var(Yt) = γt,t
· Autocorrelation function (which states the interdependence between Yt and Yj):
ρt,j = γt,j / √(γt,t γj,j)
These functions highlight a fundamental limitation of linear models: they cannot be used for non-stationary time series, since they are not built for observations whose distribution depends on time t (so assumptions such as i.i.d. sampling and Gaussianity don't hold). Using non-stationary data in such a model produces unreliable and spurious results, in that it can show a relationship between random variables where none actually exists, leading to poor forecasting: you are effectively computing a single point estimate over the whole time series for a parameter that changes over time.
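To see how misleading this can be, here is a minimal sketch using simulated data (all object names below are hypothetical): it regresses two independent random walks on each other, and the regression will often report a highly "significant" slope even though no relationship exists.
# Spurious regression sketch: two independent random walks
set.seed(123)
n <- 500
x <- cumsum(rnorm(n))   # random walk 1
y <- cumsum(rnorm(n))   # random walk 2, generated independently of x
summary(lm(y ~ x))      # the slope often looks highly "significant": a spurious result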
On the other hand, stationarity can be defined as:
1- Weak (or 2nd order) stationarity: the mean, the variance and the autocovariance functions of the process do not depend on time t:
1. Mean function:
μ = E(Yt) for all t
2. Autocovariance function:
γj = Cov(Yt, Yt-j) for all t
As you can see, γj doesn't depend on time t but only on the lag j between t and t-j.
3. Variance function:
σ^2 = Var(Yt) = γ0
4. Autocorrelation function:
ρj = γj / γ0
Only the moments of order greater than 2 can change over time.
2- Strong (or strict) stationarity: for any n ≥ 1, any times t1, …, tn and any shift h > 0, the vector (Yt1+h, …, Ytn+h) must have the same joint distribution as (Yt1, …, Ytn). Simply put, the joint distribution of the process must be unaffected by any shift in time. Note that a strongly stationary process (with finite second moments) is also a weakly stationary one, but not vice versa.
This allows us to easily estimate the various moments of the process with the classic estimators that we all know so well (namely, the sample mean or sample variance).
Another useful way (though not an unequivocal one) to visually check for stationarity is to plot the autocorrelation function of the process (ACF): for a stationary time series the ACF will drop to zero relatively quickly, while for a non-stationary one the decay will be much slower, meaning that the autocorrelation is persistent over time.
As you can see, the first graph declines at a very slow pace: it was in fact computed from a time series of daily S&P500 data which, like pretty much every financial asset's price series, is non-stationary. The second comes from the same dataset after first differencing, a transformation that will be explained later in the article. Therefore, what you want to see is a quick decay of the ACF, since it implies that, coherently with the definition of stationarity, the observations are uncorrelated over time. To formally verify that autocorrelation is absent, you can perform one of the following tests:
· Box-Pierce test
· Ljung-Box test
· McLeod-Li test
The first two differ only in the way they compute the test statistic, but the results should be essentially the same, as can be seen from the graphs below:
The first example might be a reasonable output for a LB test, even though the p-values of the last 3 lags hover dangerously close to the rejection region. The second is instead what you never want to see from your model: it shows clear and persistent autocorrelation between the lags. You simply want the p-values at all lags to be as high as possible, meaning that you do not reject the null hypothesis that the autocorrelations equal 0.
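In R, both statistics are available through the same function, Box.test. A minimal sketch on a hypothetical series x (here simply simulated white noise standing in for your returns):
# Box-Pierce and Ljung-Box tests on a hypothetical series x, up to lag 10
set.seed(10)
x <- rnorm(250)                              # stand-in for a stationary series of returns
Box.test(x, lag = 10, type = "Box-Pierce")
Box.test(x, lag = 10, type = "Ljung-Box")    # high p-values: do not reject "no autocorrelation"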
The McLeod-Li test is instead significantly different from the others: while in the previous tests the correlation is computed between Xt and Xt-j, in this one it is computed between the squared random variables, i.e. between Xt^2 and Xt-j^2 (as can be seen from the subscript of ρ). This important feature allows us to understand whether there are behaviors in the time series that induce volatility clusters, that is, as Mandelbrot first noted: "large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes". It is, in fact, a test used mainly for heteroskedastic processes (processes in which the variance varies over time) such as the ARCH(q) or GARCH(p,q) models, which are intended to accurately describe this clustering.
Another major difference concerns its graphical output:
Although it might seem identical to the output of the other two tests, they differ in what the p-values are computed on: in this case each p-value comes from a joint test on all the lags up to that point. Consider the p-value at lag 9: it means that the test has been run jointly on the first 9 lags, and not between the current value and the 9th one as would happen in the other tests. Apart from this, the interpretation of the McLeod-Li test is the same as for the other two.
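Since the McLeod-Li statistic is essentially a Ljung-Box test applied to the squared series, a quick way to approximate it without extra packages is the sketch below (the series x is again a simulated placeholder); the TSA package also offers a dedicated McLeod.Li.test function.
# McLeod-Li idea: Ljung-Box test applied to the squared observations
set.seed(11)
x <- rnorm(250)                              # stand-in for demeaned returns
Box.test(x^2, lag = 10, type = "Ljung-Box")  # low p-values would signal volatility clustering (ARCH effects)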
Another useful property of a time series, closely related to stationarity, is ergodicity, which guarantees that the sample statistics of the stochastic process are consistent estimators of the corresponding moments:
Consistency can be defined as follows (given an estimator Tn and denoting by θ the actual, unknown value of the parameter): for every ε > 0, P(|Tn − θ| > ε) → 0 as n → ∞.
This means that the probability that the estimate differs from the actual value by more than any fixed amount tends to 0 as the number of observations n tends to infinity. This property is extremely important: if a process possesses it, by increasing the number of observations the estimators will tend to the actual value of the parameter, thus increasing the accuracy of the forecast. This is why you always want to use a time series long enough to exploit consistency. Also note that an ergodic process is at least 2nd order stationary.
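As a toy illustration on simulated white noise (so every number below is hypothetical), the sample mean gets closer and closer to the true mean μ = 0 as the sample size grows:
# Consistency of the sample mean on simulated white noise with true mean 0
set.seed(1)
e <- rnorm(100000)
sapply(c(100, 1000, 10000, 100000), function(n) mean(e[1:n]))  # estimates shrink towards 0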
Are non-stationary processes completely useless?
As you may have realized throughout the article, it is highly unlikely that a real-world stochastic process will turn out to be stationary. Does this mean that linear models are totally useless in the real world and can only be used in an ideal scenario? Of course not: there are several ways to obtain a stationary process from a non-stationary one.
One example can be transformations of the random variables (e.g. taking the logarithms of the dataset), which can help stabilize the variance of the time series.
Another way to obtain stationarity is differencing, a practice that you will encounter very often in data science. Differencing entails computing the differences between consecutive observations, and it can help stabilize the mean of a time series by removing its level changes. Keep in mind that it may be necessary to repeat the differencing more than once to obtain stationarity. To clarify this concept:
Xt = Yt - Yt-1 = (1 - B)Yt = ∇Yt
These are 3 notations used to express the same differencing operation: you are creating a new process Xt composed of the first differences of the old process Yt. If by doing this Xt becomes stationary, it is possible to make statistical inferences on it and extrapolate the behavior of Yt from the results: by knowing how Yt changes over time (from t-1 to t), inferences on its behavior are feasible.
In particular, B is the backward operator (also known as the lag operator L), for which BYt = Yt-1 and, more generally, B^k Yt = Yt-k.
Lastly, ∇^d indicates that differencing has been applied d times.
A non-stationary stochastic process that becomes stationary after d differencings is said to be integrated of order d, written Yt ~ I(d); that is, ∇^d Yt = (1 - B)^d Yt is stationary.
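In R, differencing is performed with the diff function. A short sketch on a hypothetical series y (simulated here) shows first and second differences, i.e. the transformations you would apply to an I(1) or I(2) process:
# First and second differences of a hypothetical series y
set.seed(2)
y <- cumsum(rnorm(300))                # an I(1) toy series
x1 <- diff(y)                          # first differences: (1 - B) y_t
x2 <- diff(y, differences = 2)         # differencing applied twice: (1 - B)^2 y_t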
Through differencing you can also correct seasonality, i.e. a repetitive and predictable movement along the time series. The correction is obtained by differencing the current value with the value at the seasonal lag:
Xt = Yt - Yt-m
Where m indicates the length of the seasonal lag. By applying this you are removing the component that periodically repeats, leaving only the error term on which the movement of the stochastic process depends.
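The same diff function handles seasonal differencing through its lag argument; for monthly data with a yearly pattern one would use m = 12. The series below is again a simulated placeholder:
# Seasonal differencing of a hypothetical monthly series y: X_t = Y_t - Y_{t-12}
set.seed(3)
y <- ts(rnorm(120) + rep(sin(2 * pi * (1:12) / 12), 10), frequency = 12)
xs <- diff(y, lag = 12)                # removes the repeating yearly pattern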
White Noise
A white noise process is the simplest example of a stationary process and it can be seen in pretty much every model as the error term with mean 0 and fixed variance. In fact, although most financial time series are not stationary (and therefore, not white noise), any series can be decomposed into a predictable and an unpredictable component, where the latter is the fundamental white noise process.
There are three main types of white noise with increasingly restrictive assumptions:
A sequence Y1, Y2, … is said to be a weak white noise process with mean μ and variance σ^2, i.e. Yt ~ WN(μ, σ^2), if:
· E(Yt) = μ for all t
· Var(Yt) = σ^2 for all t
· Cov(Yt, Ys) = 0 for all t ≠ s
Where μ is a finite constant and σ^2 is a positive finite constant for all t. The condition Cov(Yt, Ys) = 0 indicates the lack of correlation between the random variables; a process defined in this way is said to be weakly stationary. The uncorrelatedness of error terms from different time observations is a property that allows us to easily compute most of the estimators, as we have shown previously.
If the random variables are independent and identically distributed (i.i.d.), the process is said to be an "independent white noise process", i.e.:
The independence of the random variables can be written as Yt ⊥ Ys, t ≠ s, where ⊥ indicates independence.
· E(Yt) = μ for all t
· Var(Yt) = σ^2 for all t
· Cov(Yt, Ys) = 0 for all t ≠ s
It is important to notice that an independent white noise process, which is strongly stationary, is by construction also weakly stationary.
If the random variables of the process follow a specific marginal distribution, this can be highlighted in the name. For example, if the sequence of random variables Y1, Y2, ... is i.i.d. Gaussian, the process is called a "Gaussian white noise process".
A White Noise process, if plotted, would look similar to the graph below and, as you can see, the observations move randomly around the mean (which in this case, and as it usually is, equals 0).
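A graph of that kind can be reproduced with a few lines of simulated Gaussian white noise (a sketch with an arbitrary sample size and unit variance):
# Simulate and plot a Gaussian white noise process with mean 0 and variance 1
set.seed(42)
wn <- rnorm(500, mean = 0, sd = 1)
plot.ts(wn, ylab = "value")
abline(h = 0, col = "red")             # the series oscillates randomly around its mean
acf(wn)                                # the sample ACF should be negligible at every lag > 0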
Random Walk
If by first differencing a series you obtain a white noise, then the original series is a random walk:
Yt = Yt-1 + εt,   εt ~ WN(0, σ^2)
This is a particular case of an autoregressive process Yt = φ1Yt-1 + εt in which φ1 = 1.
The name of the process derives from its unpredictability: the current value yt is equal to the past one plus a random error with mean 0. It is in fact used for non-stationary data, particularly financial and economic data, since these show long periods of apparent trends (up or down) and sudden, unpredictable changes of direction. The random walk is, indeed, a stochastic process consisting of the sum of a sequence of uncorrelated changes of a random variable; therefore, there is no pattern in the changes and the variable cannot be predicted.
As just stated, a random walk process is not stationary. Starting from its model specification and proceeding by recursion, we can go back to the initial value of the process Y0 and see how the current value is simply obtained by summing the series of white noise errors to it:
Yt = Yt-1 + εt = Yt-2 + εt-1 + εt = … = Y0 + ε1 + ε2 + … + εt
From this we can comfortably compute the various functions of the random walk:
1. Mean function: E(Yt) = Y0 (since each error has expected value 0)
2. Variance function: Var(Yt) = Var(ε1 + … + εt) = tσ^2
Already from this you can see that the variance function depends on time, hence the non-stationarity of the process.
3. Autocovariance function
Assuming s < t, we can rewrite the process as the sum of 2 components: the first is Ys, which is a random walk up to time s, and the second, Wt, is the sum of the errors from s+1 onwards:
Yt = Ys + Wt,   Wt = εs+1 + … + εt
so that Cov(Yt, Ys) = Cov(Ys + Wt, Ys) = Var(Ys) + Cov(Wt, Ys).
Knowing that, given their uncorrelatedness, Cov(εt, εs) = 0 when t ≠ s, the last term must equal 0:
By exploiting the formula Cov(X, Y) = E(XY) - E(X)E(Y) we can see that:
The first and the third components both equal 0 since the expected value of the error term equals 0 by definition (εt ~ WN(0, σ^2)). The second term is instead Cov(εt, εs), which equals 0 as we just mentioned.
The variance term, as we computed earlier, is equal to sσ^2.
Similarly, for any t ≠ s, γt,s = σ^2 min(t, s).
Therefore, the autocovariance function of a random walk can be expressed as γt,s = σ^2 min(t, s).
It's now evident why this process isn't stationary: the autocovariance function depends on the chosen times t and s, not only on the lag between them.
Therefore, both the variance and the autocovariance functions indicate the non-stationarity of a random walk process.
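This can also be checked by simulation (a sketch with arbitrary settings): by generating many random walks as cumulative sums of white noise, the dispersion across paths visibly widens as t grows, in line with Var(Yt) = tσ^2.
# Simulate 200 random walks of length 500 and look at the cross-sectional variance over time
set.seed(7)
paths <- replicate(200, cumsum(rnorm(500)))   # each column is one random walk
emp_var <- apply(paths, 1, var)               # empirical variance across paths at each time t
plot(emp_var, type = "l", xlab = "t", ylab = "variance")
abline(0, 1, col = "red")                     # theoretical line Var(Y_t) = t * sigma^2 with sigma^2 = 1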
Assuming that an asset's behavior is a random walk suggests that its movements are independent of each other and all have the same distribution. Clearly, this discredits both technical analysis and fundamental analysis: the first is undependable given that analysts would trade a security based on past data (for example when a trend has already randomly formed); the second is undependable due to the generally poor quality of the information collected and used, and the only partial ability to interpret it correctly. On the other hand, this theory supports the efficient market hypothesis, which assumes that stock prices fully reflect all available information and expectations, so that current prices are the best approximation of an asset's intrinsic value. Of course, those who oppose this theory assert that it is possible to outperform the markets.
A practical example
For the following example we will be using RStudio and the libraries required are “quantmod” and “forecast”.
Let's take into consideration a time series of daily data from the S&P500 index (ticker symbol ^GSPC) and analyze a period of time going from 01-01-2016 to 12-31-2019.
require(quantmod)
require(forecast)
getSymbols('^GSPC',from='2016-01-01',to='2019-12-31')
sp500=window(GSPC$GSPC.Close,start='2016-01-01',end='2019-12-31')
plot(sp500)
As you can see, the chosen time series is clearly non-stationary, hence it can't be modeled directly. But as we showed earlier, it is possible to apply some transformations to the process to render it stationary: first we take the log of the time series to scale it differently and try to stabilize the variance (first line of the following part of the code). Then we can first-difference the process to eliminate the level changes and make the observations oscillate around the mean (second line).
lsp500=log(sp500)
dlsp500=diff(lsp500)[-1]
layout(matrix(1:4,nrow=2,ncol=2,byrow=T))
plot(lsp500)
plot(dlsp500)
The plot shows that the first-differenced series is now stable around the mean over time (although there is clear evidence of volatility clustering). We can now check whether the observations are autocorrelated over time by plotting the ACF functions and by applying the Ljung-Box test:
acf(lsp500)
acf(dlsp500)
Box.test(lsp500,lag=5,type='Ljung-Box')
Box.test(dlsp500,lag=5,type='Ljung-Box')
Clearly, the first ACF shows strong autocorrelation persistence among the observations, while the second one shows a quick decay, indicating that the differenced series might be stationary. By applying the LB test we verify that, for whatever lag we choose, the observations of the second time series are not autocorrelated while the ones from the first are. In this case we tested lag 5, but the same results hold at other lags. The fact that the time series becomes stationary after first differencing can be confirmed by using the auto.arima function, which automatically fits an ARIMA(p,d,q) model to the time series. The letters p and q respectively indicate the lags of the autoregressive component (AR(p)) and of the moving average component (MA(q)), while d stands for the order of integration of the time series, that is, how many times it has been differenced before becoming stationary. This model will be explained in detail in the next article.
fitsp=auto.arima(lsp500,max.p=10,max.q=10,seasonal=F,stepwise=F,ic='aic',test='adf')
print(fitsp)
tsdiag(fitsp)
The resulting model is an ARIMA(0,1,1), meaning that there isn't an autoregressive component (AR(0)), there is one lag in the moving average component (MA(1)) and the time series has been differenced once (d=1). You can also see from the Ljung-Box statistics that the lags are not autocorrelated, although the p-values get very close to the critical region from lag 8 onward.
Conclusions
With this article you should be equipped with the basic instruments that are vital in data analytics. In particular, you should have familiarized yourself with a central notion in this field: stationarity, how to test for its presence in the time series you are analyzing, and how to try to obtain it. Moreover, the White Noise and Random Walk processes have been outlined, which are the building blocks of many more complex and interesting models that will be treated in detail in the next articles. Hopefully by now you have built a solid foundation that will be necessary for a clear understanding of the next topics and of what data science entails.
We are Main21
If you are interested in Data Science and how it is applied to face real problems, follow Main21. We publish content about Data Science and Machine Learning and offer our members the opportunity to interact and thrive in a flourishing community of people interested in Technology.
You can apply to be one of our members here