Simple Way to Compute Time Series Rolling Average/Trend Prediction over Time with Laplace Transform
Introduction
In marketing and many other business and industry scenarios, data scientists and analysts need to generate a rolling average or predict the trend of a time series. For example, we may be asked to estimate the average sales at the "current" time point, and the model needs to "forget" the old history. Here I would like to introduce a very interesting way to compute a moving average that can "forget" old data by using the concept of the Laplace transform. Don't be terrified, as the calculation is very simple. It also reduces the dimension to 1 (only the previous average has to be kept) when we regularly need to roll the average over a time span forward to the next time point. This approach also helps explain the meaning of the Laplace transform variable s as the period T. This method has been applied in a few of my projects.
The Formula
To make it simple:
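Reconstructed from the symbol definitions in this section and from the Python code later in the article, the formula can be written as follows (x_i and t_i stand for x[i] and t[i]; the name A(t) for the approximate average evaluated at time t is an assumption for illustration):

```latex
A(t) = T \sum_{t_i \le t} x_i \, e^{-T\,(t - t_i)}
```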
In the formula, x[i] are the data values and t[i] are the time points. T is the period that you want to calculate the average over. We can also understand the formula as a sum of values that decay exponentially over time (i.e. after time 1/T, the original value x[i] becomes x[i]/e). Interestingly, this sum multiplied by T is the approximate moving average (I will demonstrate this later with the Laplace transform).
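As a minimal sketch of the formula described above, assuming the data arrive as two arrays of values and time stamps, the decayed sum can be written as a small helper. The function name `laplace_average` and the argument `now` (the time at which the average is evaluated) are hypothetical names chosen for illustration, not part of the original method description.

```python
import numpy as np

def laplace_average(x, t, T, now):
    """Return T * sum(x[i] * exp(-T * (now - t[i]))) over points with t[i] <= now.

    Hypothetical helper for illustration; `now` is the evaluation time.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    past = t <= now                  # only data already observed contribute
    age = now - t[past]              # how long ago each point occurred
    return T * np.sum(x[past] * np.exp(-T * age))

# A densely sampled constant signal: 100 points per time unit,
# each carrying 1/100 of the value, so the true level is 1.
t = -np.arange(0, 50000) / 100.0
x = np.full(t.shape, 1 / 100)
print(laplace_average(x, t, T=1.0, now=0.0))  # about 1.005
```

Keeping `now` as an explicit argument makes it possible to evaluate the same data at different time points, which is what the roll-over computation in the next section relies on.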
The Good Feature - Simple Rolling Over Computation
The next average can be computed from the previous value, without querying again the values in the time window of T. See below:
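Written out from the description in this section (a reconstruction; the notation A(t) for the decayed sum multiplied by T at time t, and the summation bounds, are assumptions), the roll-over step reads:

```latex
A(t + W) = e^{-T W}\, A(t) + T \sum_{t < t_i \le t + W} x_i \, e^{-T\,(t + W - t_i)}
```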
Here W is the length of the time span, and W does not need to be equal to T. It is a lot easier to understand this as a "sum of values decaying over time": we simply decay the current average by exp(-TW) and add the values in the current time span of W (each decayed from its own time point and multiplied by T).
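The roll-over step can be checked numerically. The sketch below, on synthetic data with irregular time stamps, compares the rolled-over value with a direct evaluation over the whole history. The function names `decayed_average` and `roll_over`, the random data, and the values chosen for T and W are all assumptions made for illustration.

```python
import numpy as np

def decayed_average(x, t, T, now):
    """Direct evaluation: T * sum(x[i] * exp(-T * (now - t[i]))) for t[i] <= now."""
    x, t = np.asarray(x, dtype=float), np.asarray(t, dtype=float)
    past = t <= now
    return T * np.sum(x[past] * np.exp(-T * (now - t[past])))

def roll_over(prev_avg, x_new, t_new, T, W, now):
    """Move the average from time now - W to time now using only the new points."""
    x_new, t_new = np.asarray(x_new, dtype=float), np.asarray(t_new, dtype=float)
    fresh = T * np.sum(x_new * np.exp(-T * (now - t_new)))
    return np.exp(-T * W) * prev_avg + fresh

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, size=2000))   # irregular time stamps
x = rng.normal(5.0, 1.0, size=t.size)             # synthetic values
T, W = 0.2, 7.0                                   # W need not equal T

avg_93 = decayed_average(x, t, T, now=93.0)
window = (t > 93.0) & (t <= 100.0)                # points that arrived during W
avg_100_rolled = roll_over(avg_93, x[window], t[window], T, W, now=100.0)
avg_100_direct = decayed_average(x, t, T, now=100.0)
print(avg_100_rolled, avg_100_direct)             # the two values should agree
```

Only the previous average and the points from the latest span are touched in `roll_over`, so the older history never has to be queried again.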
Code Example
To understand it better, let's try a simple piece of Python code.
```python
import numpy as np

k = 50000
arr = np.ones(k)  # one data point of value 1 per time unit

s1 = 1
s2 = 3
s3 = 6
s4 = 24

# decayed sum of the values, multiplied by s
at1 = s1 * np.sum(arr * np.exp(-np.arange(0, k)*s1))
at2 = s2 * np.sum(arr * np.exp(-np.arange(0, k)*s2))
at3 = s3 * np.sum(arr * np.exp(-np.arange(0, k)*s3))
at4 = s4 * np.sum(arr * np.exp(-np.arange(0, k)*s4))

print('average over 1 day:', at1)
print('average over 3 day:', at2)
print('average over 6 day:', at3)
print('average over 24 day:', at4)

# average over 1 day: 1.5819767068693265
# average over 3 day: 3.1571870894737675
# average over 6 day: 6.014909469941067
# average over 24 day: 24.000000000906034
```
As you can see, this works only when s is large (especially for s4 = 24, where the result is very close to 24). What's the reason? It is because the data we fed into the model are discrete points, but we are trying to compute a smooth average over time (take this as Question A; I will explain it later with the Laplace transform). But if we try another piece of code, we find that it actually works very well, as below:
```python
import numpy as np

k = 50000
arr = np.ones(k)
s1 = 1

# 100 points per time unit, each carrying 1/100 of the value
at1 = s1 * np.sum(arr/100 * np.exp(-np.arange(0, k)/100*s1))

print('average over 1 day:', at1)

# average over 1 day: 1.0050083333194446
```
Why? We need to ask ourselves what a data "POINT" actually is. See the figure below: when a value x[i] is given at a specific time "POINT" t[i], mathematically we can consider it as a Dirac delta function/distribution (a distribution that reaches positive infinity on the Y axis as t -> t[i] but is 0 where t != t[i], and has an integral area of exactly x[i]). When we compute the average of such points, we can imagine that we are using two unit step functions to build a platform that simulates the average. In the figure below, we have 3 Dirac impulses of area 1, and we use two unit step functions, u(t-1)-u(t-3), to represent the flattened platform with a height of 1 and a width of 3.
The Math - Laplace Transform of Dirac Impulses and Unit Step Functions
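A possible reconstruction of the derivation, using only the standard transform pairs for a shifted Dirac impulse and for the unit step (the symbols \tau_i for the age of data point i and h for the height of the platform are assumptions introduced here):

```latex
\begin{aligned}
\mathcal{L}\{x_i\,\delta(\tau - \tau_i)\}(s) &= x_i\, e^{-s \tau_i}, \qquad \tau_i = t - t_i \ge 0, \\
\mathcal{L}\{h\,u(\tau)\}(s) &= \frac{h}{s}, \\
\frac{h}{s} = \sum_i x_i\, e^{-s \tau_i}
\quad &\Longrightarrow \quad
h = s \sum_i x_i\, e^{-s\,(t - t_i)} .
\end{aligned}
```

Matching the transform of the train of impulses with that of a flat platform of height h at the same s gives the decayed sum multiplied by s, which is the formula from the earlier section with s playing the role of T.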
Now one interesting thing is that we have reached a real physical explanation of the Laplace transform variable "s". What does "s" mean? In the Fourier transform, we know the variable is frequency. However, for the Laplace transform, most textbooks do not clearly explain the physical meaning of the variable "s". Here we can find that "s" is actually the time period: if s is T, you get the average of x[i] over the "period" of T.
That's how we can reach the formula with simplicity.
Explanation of Question A:
This concerns the case when T is very small (comparable to the time gap between adjacent data points). In the first code example, when s1 = 1, why is the output 1.582 significantly greater than the "real average" of 1? That's because there is a Dirac impulse exactly at time point 0, and the closest one to it is at time point -1. In the "decaying" model, when the area (of the Dirac impulse) is concentrated entirely at time point 0, it contributes more than the same area spread evenly between 0 and -1 (i.e. a square of 1 x 1). That's why the value from this algorithm is 1.582. In fact, it truly reflects the effect of "decaying over time". When T is greater, this effect is averaged out over a larger time window and the result becomes more immune to local changes.
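The effect of concentrating the area at discrete points can be checked numerically. The sketch below assumes a constant signal of level 1 delivered as equally spaced impulses, each carrying an area equal to the spacing. It compares the decayed sum with the closed form of the geometric series, s * d / (1 - exp(-s * d)) for spacing d. The helper name `decayed_average_regular` and the chosen spacings are illustrative assumptions.

```python
import numpy as np

def decayed_average_regular(spacing, s, n_points=50000):
    """s * sum(area * exp(-s * age)) for impulses of area `spacing` every `spacing` time units."""
    age = np.arange(n_points) * spacing
    return s * np.sum(spacing * np.exp(-s * age))

s = 1.0
for spacing in (1.0, 0.5, 0.1, 0.01):
    numeric = decayed_average_regular(spacing, s)
    closed_form = s * spacing / (1.0 - np.exp(-s * spacing))  # geometric series limit
    print(spacing, numeric, closed_form)
```

With spacing 1 the result is the 1.582 from the first code example, and it moves toward 1 as the same total area is spread over more and more points.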
This method does a very good job in computing the average over time and in trend prediction. Check out the code here for an example. The blue and green dashes almost overlap, where the blue is the result of the current Laplace method, while the green is the moving average (sum of values in the period divided by the period).
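A comparison of this kind can be sketched on synthetic data. The example below is not the code linked above: it assumes a noisy constant signal sampled densely, treats each sample as an impulse whose area is the value times the spacing, and compares against a plain moving average over a window of length 1/T. That window length is an assumption, based on the earlier remark that a value decays to x[i]/e after time 1/T. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
d = 0.01                                   # spacing between samples
t = np.arange(0.0, 200.0, d)
level = 10.0
signal = level + rng.normal(0.0, 1.0, size=t.size)   # synthetic noisy constant
x = signal * d                             # area carried by each sample

T = 0.5                                    # value of T used in the formula
P = 1.0 / T                                # comparison window (assumption)
now = t[-1]

# Laplace method: decayed sum multiplied by T
laplace_avg = T * np.sum(x * np.exp(-T * (now - t)))

# Moving average: sum of values in the window divided by the window length
in_window = (t > now - P) & (t <= now)
moving_avg = np.sum(x[in_window]) / P

print(laplace_avg, moving_avg)             # both should be close to the level of 10
```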
Limitations
However, there are some limitations if you want to compare it to the real moving average when there is a sine wave component in the values. This algorithm seems to amplify the sine wave when T is over 1/2 of the period of the sine wave.
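One possible way to look at this observation is to compare how strongly each kind of average passes a sine wave. The sketch below uses the standard continuous-time amplitude gains of an exponentially decaying average and of a plain moving average. Setting the exponential time constant equal to the moving-average window is an assumption made only for this comparison, and the function names are illustrative.

```python
import numpy as np

def exponential_gain(omega, tau):
    """Amplitude gain of an exponentially decaying average with time constant tau."""
    return 1.0 / np.sqrt(1.0 + (omega * tau) ** 2)

def boxcar_gain(omega, width):
    """Amplitude gain of a plain moving average over a window of the given width."""
    # np.sinc(x) is sin(pi x) / (pi x), so this equals |sin(omega w / 2) / (omega w / 2)|
    return np.abs(np.sinc(omega * width / (2.0 * np.pi)))

period = 10.0                      # period of the sine wave component
omega = 2.0 * np.pi / period
for fraction in (0.25, 0.5, 0.75, 1.0):
    span = fraction * period       # same span used for both averages (assumption)
    print(fraction, exponential_gain(omega, span), boxcar_gain(omega, span))
```

Under these assumptions, the plain moving average cancels the sine wave completely when its window equals the full period, while the exponentially decayed average still lets part of it through, so the sine component looks larger in comparison. At shorter spans the relation between the two gains is different.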