Milestone Project: Time series forecasting in TensorFlow (BitPredict)
Marius Poskus
Cybersecurity Executive @ Fintech | Cybersecurity Leader | Board Advisor | AI Security | mpcybersecurity.co.uk
Time series, you might ask? A time series is data recorded over a period of time. It can be anything from the number of employees over a 10-year period, to computer sales over 5 years, to electricity usage over 50 years.
The timeline can be short (seconds/minutes) or long (years/decades). The problems you might investigate with time series data can usually be broken down into two categories:
What we will cover during this project
Types of Time Series
Note: The frequency at which a time series value is collected is often referred to as seasonality. This is usually measured in number of samples per year. For example, collecting the price of Bitcoin once per day would result in a time series with a seasonality of 365. Time series data collected with different seasonality values often exhibit seasonal patterns (e.g. electricity demand being higher in summer months, for air conditioning, than in winter months).
Creating Test and Training Data Sets
We will also explore the differences between univariate and multivariate data, which is relevant in our case because the Bitcoin halving effect might alter our price predictions, as you can see in the image below:
There is an important problem to address when creating the data sets. Because we are building forecasting models, we can no longer use the random splits we used in computer vision or natural language processing. A random split would let the model see future prices during training. Instead, we have to split the data by date: for example, the first 80% of the price date range becomes the training data and the last 20% becomes the test data. This problem is summed up in the image below:
As mentioned, when we split the data according to time, the split looks like the image below:
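A date-ordered split can be sketched in a few lines. This is a minimal illustration, not the exact code used in the project: the function name `chronological_split`, the `train_fraction` parameter and the toy price values are all assumptions made for the example.

```python
import numpy as np

def chronological_split(timesteps, prices, train_fraction=0.8):
    """Split a time-ordered series into train and test sets without shuffling."""
    if len(timesteps) != len(prices):
        raise ValueError("timesteps and prices must have the same length")
    split_index = int(train_fraction * len(prices))
    # Everything before split_index is the past (train); the rest plays the future (test).
    return (timesteps[:split_index], prices[:split_index],
            timesteps[split_index:], prices[split_index:])

# Toy data standing in for daily prices (hypothetical values, not real Bitcoin prices).
timesteps = np.arange("2021-01-01", "2021-01-11", dtype="datetime64[D]")
prices = np.arange(10, dtype=np.float64) * 100.0

X_train, y_train, X_test, y_test = chronological_split(timesteps, prices)
print(len(y_train), len(y_test))   # sizes of the two sets
print(X_train[-1] < X_test[0])     # every training date precedes every test date
```

The only design choice that matters here is that nothing is shuffled: the test set is always the most recent slice of the series.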
During our modelling experiments we built 11 models in total to try various deep learning architectures and explore different horizon and window sizes. Horizon and window size? It's not what you might think. The horizon is the number of time steps we want to predict into the future, and the window size is the number of past data points we use to predict the horizon. The image below shows all the models we tried:
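To make windows and horizons concrete, here is a rough sketch of how a series might be cut into (window, horizon) pairs. The helper `make_windows` and the values `window_size=7` and `horizon=1` are assumptions chosen for illustration, not necessarily the settings used for any of the 11 models.

```python
import numpy as np

def make_windows(series, window_size=7, horizon=1):
    """Turn a 1D series into (windows, horizons) pairs for supervised learning."""
    series = np.asarray(series)
    n_samples = len(series) - (window_size + horizon) + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window_size and horizon")
    # Row i of idx holds the positions i, i+1, ..., i + window_size + horizon - 1.
    idx = np.arange(window_size + horizon)[None, :] + np.arange(n_samples)[:, None]
    full = series[idx]
    # The first window_size columns are the inputs, the remaining columns are the targets.
    return full[:, :window_size], full[:, window_size:]

series = np.arange(10)  # toy series 0..9 so the slicing is easy to follow
windows, labels = make_windows(series, window_size=7, horizon=1)
print(windows[0], "->", labels[0])   # [0 1 2 3 4 5 6] -> [7]
print(windows.shape, labels.shape)   # (3, 7) (3, 1)
```

With a window of 7 and a horizon of 1, each training example reads as "use the past week to predict tomorrow".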
First of all, let me paint a picture of the naive model. This model does not need any training, as it uses a very simple formula:
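With a horizon of one step, the naive forecast is commonly written as below, where \hat{y}_{t} is the forecast for time step t and y_{t-1} is the value observed one step earlier (the notation is an assumption, chosen for illustration):

```latex
\hat{y}_{t} = y_{t-1}
```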
This basically means that it takes the previous day's value as the forecast for the next day. But as you will see, such a simple model is not that easy to beat with deep learning architectures. Before I showcase the results from each modelling experiment, let's sum up what we used to measure model effectiveness and why we used it:
Mean absolute error (MAE) shows us how far, on average, the model's predictions are from the real data.
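As a small worked example, the sketch below scores a naive forecast with MAE using plain NumPy. The function names and the five toy prices are assumptions for illustration; the project itself may have used a library implementation of the metric.

```python
import numpy as np

def naive_forecast(series):
    """Forecast each step as the previous step's value (horizon of 1)."""
    series = np.asarray(series, dtype=np.float64)
    # The forecast for series[1:] is simply series[:-1], i.e. the series shifted by one step.
    return series[:-1]

def mean_absolute_error(y_true, y_pred):
    """Average absolute distance between the true values and the predictions."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy prices (hypothetical values, not real Bitcoin prices).
prices = np.array([100.0, 102.0, 101.0, 105.0, 104.0])
y_true = prices[1:]               # what actually happened on days 2 to 5
y_pred = naive_forecast(prices)   # yesterday's price used as today's forecast
mae = mean_absolute_error(y_true, y_pred)
print(mae)  # absolute errors are 2, 1, 4, 1, so the mean is 2.0
```

MAE is in the same units as the data, so a value of 2.0 here reads as "off by 2 price units on average".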
After running all 11 models on our data, the results can be found in the table below:
The majority of our deep learning models perform on par or only slightly better than the naive model. And for the turkey model, changing a single data point destroys its performance.
Note: Just because one type of model performs better here doesn't mean it'll perform the best elsewhere (and vice versa, just because one model performs poorly here, doesn't mean it'll perform poorly elsewhere).
As I said at the start, this is not financial advice.
After what we've gone through, you'll now have some of the skills required to call out BS in any future tutorial, blog post or investment sales guide claiming to have a model that is able to predict the future.
Mark Saroufim's Tweet sums this up nicely (stock market forecasting with a machine learning model is just as reliable as palm reading).
As covered by model 10 and the turkey problem, it is almost impossible to predict future prices just by looking at historic market data, because that data does not capture future market conditions or turkey events such as a market crash, which some of you might be aware we are experiencing this week.