Milestone Project: Time series forecasting in TensorFlow (BitPredict)
Time Series Forecasting

What is a time series, you might ask? A time series is data collected over a period of time. It can be anything from the number of employees over a 10-year period, to computer sales over 5 years, or electricity usage over 50 years.

The timeline can be short (seconds/minutes) or long (years/decades), and the problems you might investigate using time series data can usually be broken down into two categories:


Classification vs Forecasting

  • Classification - Anomaly detection, time series identification (where did this time series come from?)
  • Forecasting - Predicting stock market prices, forecasting future demand for a product, stocking inventory requirements

What we will cover during this project

  • Get time series data (the historical price of Bitcoin)
      ◦ Load in time series data using pandas/Python's CSV module
  • Format data for a time series problem
      ◦ Creating training and test sets (the wrong way)
      ◦ Creating training and test sets (the right way)
      ◦ Visualizing time series data
      ◦ Turning time series data into a supervised learning problem (windowing)
      ◦ Preparing univariate and multivariate (more than one variable) data
  • Evaluating a time series forecasting model
  • Setting up a series of deep learning modelling experiments
      ◦ Dense (fully-connected) networks
      ◦ Sequence models (LSTM and 1D CNN)
      ◦ Ensembling (combining multiple models together)
      ◦ Multivariate models
      ◦ Replicating the N-BEATS algorithm using TensorFlow layer subclassing
  • Creating a modelling checkpoint to save the best performing model during training
  • Making predictions (forecasts) with a time series model
  • Creating prediction intervals for time series model forecasts
  • Discussing two different types of uncertainty in machine learning (data uncertainty and model uncertainty)
  • Demonstrating why forecasting in an open system is BS (the turkey problem)
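As a minimal sketch of the first step, here is one way to load date/price pairs with Python's built-in csv module (the list also mentions pandas as an alternative). The column names "Date" and "Closing Price (USD)", the function name load_prices and the inline sample rows are all assumptions for illustration. They are not the project's actual dataset or code.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows; the column names are assumptions, not the real dataset's schema
SAMPLE_CSV = """Date,Closing Price (USD)
2021-01-02,101.5
2021-01-01,100.0
2021-01-03,99.0
"""

def load_prices(f, date_col="Date", price_col="Closing Price (USD)"):
    """Read (timestamp, price) pairs from a CSV file object in chronological order."""
    rows = sorted(
        (datetime.strptime(row[date_col], "%Y-%m-%d"), float(row[price_col]))
        for row in csv.DictReader(f)
    )
    timesteps = [t for t, _ in rows]
    prices = [p for _, p in rows]
    return timesteps, prices

# With a real file you would use: with open(path, newline="") as f: load_prices(f)
timesteps, prices = load_prices(io.StringIO(SAMPLE_CSV))
print(prices)  # [100.0, 101.5, 99.0]
```

Sorting the rows guarantees chronological order. This matters for every later step, because splitting and windowing both assume the data runs forward in time.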

Types of Time Series



Note: The frequency at which a time series value is collected is often referred to as seasonality. This is usually measured in number of samples per year. For example, collecting the price of Bitcoin once per day would result in a time series with a seasonality of 365. Time series data collected with different seasonality values often exhibit seasonal patterns (e.g. electricity demand being higher in summer months, for air conditioning, than in winter months).

Creating Test and Training Data Sets

We will also explore the differences between univariate and multivariate data. This might be useful in our case because the Bitcoin "halvening" (the periodic halving of the reward miners receive per block) might alter our price predictions, as you can see in the image below:


Univariate vs Multivariate
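To make the multivariate idea concrete, here is a rough NumPy sketch that adds the block reward as a second feature next to the price. The halving dates and rewards (50, 25, 12.5 and 6.25 BTC per block) are the publicly known Bitcoin values. The helper names and toy prices are hypothetical, and this is not necessarily how the project built its multivariate dataset.

```python
import numpy as np

# Bitcoin halving dates and the block reward (BTC per block) in effect after each one
HALVINGS = [
    (np.datetime64("2012-11-28"), 25.0),
    (np.datetime64("2016-07-09"), 12.5),
    (np.datetime64("2020-05-11"), 6.25),
]
INITIAL_REWARD = 50.0

def block_reward(dates):
    """Return the block reward in effect on each date (dates: datetime64[D] array)."""
    rewards = np.full(len(dates), INITIAL_REWARD)
    # Halvings are in chronological order, so later ones overwrite earlier ones
    for halving_date, reward in HALVINGS:
        rewards[dates >= halving_date] = reward
    return rewards

def make_multivariate(dates, prices):
    """Stack price and block reward into a (timesteps, 2) feature array."""
    return np.column_stack([prices, block_reward(dates)])

dates = np.array(["2012-01-01", "2013-01-01", "2017-01-01", "2021-01-01"], dtype="datetime64[D]")
prices = np.array([1.0, 2.0, 3.0, 4.0])  # toy values, not real prices
features = make_multivariate(dates, prices)
print(features)  # column 1 holds the block rewards 50, 25, 12.5, 6.25
```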

There is an important problem we needed to address when creating our datasets. Because we are forecasting future values, we can no longer randomly split the data as we did in computer vision or natural language processing. Instead, we have to split it by date. For example, the first 80% of the date range becomes the training data and the last 20% becomes the test data. This problem is summed up perfectly in the image below:


Time Series Data

So, as mentioned, when we split the data according to time, the split looks like the image below:


Test and Training Data Split
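A date-ordered split like the one pictured takes only a few lines. The sketch below assumes the data is already sorted chronologically. The function name time_split and the 80/20 default are illustrative choices.

```python
import numpy as np

def time_split(timesteps, values, train_fraction=0.8):
    """Split a time series chronologically rather than randomly.

    The first `train_fraction` of points becomes the training set and the rest
    the test set, so the model never trains on data from after the test period.
    """
    split = int(len(values) * train_fraction)
    train = (timesteps[:split], values[:split])
    test = (timesteps[split:], values[split:])
    return train, test

timesteps = np.arange(10)                # stand-in for dates
prices = np.linspace(100.0, 109.0, 10)   # made-up prices
(X_train, y_train), (X_test, y_test) = time_split(timesteps, prices)
print(len(X_train), len(X_test))  # 8 2
```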

During our modelling experiments, we built 11 models in total to try various deep learning architectures and explore different horizon and window sizes. Horizon and window sizes? They're not what you might think. The horizon is the number of time steps we want to predict into the future, and the window size is the number of past data points we use to predict that horizon. The image below shows all the models we tried:


Deep Learning model for time series problems
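Windowing is what turns the series into a supervised learning problem. Below is a minimal NumPy sketch, assuming a univariate series. make_windows is a hypothetical helper name, and window_size=7 with horizon=1 are example values only, not necessarily the settings used in every experiment.

```python
import numpy as np

def make_windows(series, window_size=7, horizon=1):
    """Turn a 1-D series into (windows, labels) for supervised learning.

    Each window holds `window_size` consecutive values; its label is the
    `horizon` values that immediately follow it.
    """
    series = np.asarray(series)
    n = len(series) - window_size - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for this window_size and horizon")
    # Row i of idx is [i, i + 1, ..., i + window_size + horizon - 1]
    idx = np.arange(window_size + horizon)[None, :] + np.arange(n)[:, None]
    full = series[idx]
    return full[:, :window_size], full[:, window_size:]

windows, labels = make_windows(np.arange(10), window_size=7, horizon=1)
print(windows[0], labels[0])  # [0 1 2 3 4 5 6] [7]
```

Each row of windows is one training input, and the matching row of labels is its target. With these settings, a 7-day window predicts the following day.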

First, let me paint a picture of the naive model. It doesn't need any training, because it uses a very simple formula:


Naive formula
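The naive forecast can be written as ŷ_t = y_{t-1}: the prediction for time step t is the observed value at step t-1. Here is a minimal NumPy sketch. naive_forecast is a hypothetical name, and the prices are made up.

```python
import numpy as np

def naive_forecast(series):
    """Naive forecast: predict each value as the previous one (y_hat[t] = y[t-1]).

    Returns an array aligned with series[1:], i.e. forecast[i] is the
    prediction for series[i + 1]. No training is involved.
    """
    series = np.asarray(series, dtype=float)
    return series[:-1]

prices = np.array([100.0, 102.0, 101.0, 105.0])  # made-up prices
forecast = naive_forecast(prices)
targets = prices[1:]
print(forecast, targets)
```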

In other words, the naive model takes the previous day's value as the forecast for the next day. Yet, as you will see, such a simple model is surprisingly hard to beat with deep learning architectures. Before I showcase the results of each modelling experiment, let me sum up what we used to measure model performance and why we used it:


Mean absolute error

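Mean absolute error shows us how far, on average, the model's predictions are from the actual values. Because it is expressed in the same units as the data (here, US dollars), it is easy to interpret.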
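In formula form, MAE = (1/n) Σ |y_t - ŷ_t|. The sketch below uses made-up prices and the naive forecast as the predictions. The function name is hypothetical; in TensorFlow you could use the built-in tf.keras.metrics.MeanAbsoluteError instead.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Score a naive forecast (previous value as the prediction) on a toy series
prices = np.array([100.0, 102.0, 101.0, 105.0])
naive_preds = prices[:-1]  # predictions for prices[1:]
mae = mean_absolute_error(prices[1:], naive_preds)
print(mae)  # (2 + 1 + 4) / 3, about 2.33
```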

After running all 11 models on our data, we got the results in the table below:


Time Series

Most of our deep learning models performed on par with, or only slightly better than, the naive model. And for the turkey model, changing a single data point destroyed its performance.
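Here is a small sketch showing how one data point can dominate an error metric. It divides the final price of a made-up series by 100 to simulate a sudden crash, then compares MAE before and after. This uses the naive forecast, not the project's actual model 10, so treat it only as an illustration of the effect.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def naive_mae(series):
    """MAE of the naive forecast (previous value) over the whole series."""
    series = np.asarray(series, dtype=float)
    return mean_absolute_error(series[1:], series[:-1])

prices = np.array([100.0, 102.0, 101.0, 105.0, 104.0])  # made-up prices
turkey_prices = prices.copy()
turkey_prices[-1] = turkey_prices[-1] / 100  # a single "turkey" crash at the end

print(naive_mae(prices))         # (2 + 1 + 4 + 1) / 4 = 2.0
print(naive_mae(turkey_prices))  # (2 + 1 + 4 + 103.96) / 4, about 27.74
```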

Note: Just because one type of model performs better here doesn't mean it'll perform best elsewhere (and vice versa: just because one model performs poorly here doesn't mean it'll perform poorly elsewhere).

As I said at the start, this is not financial advice.

After what we've gone through, you now have some of the skills required to call out BS in any future tutorial, blog post, or investment sales guide that claims to have a model able to predict the future.

Mark Saroufim's Tweet sums this up nicely (stock market forecasting with a machine learning model is just as reliable as palm reading).


Time Series prediction is BS

As covered by model 10 (the turkey problem), looking only at historical market data makes it almost impossible to predict future prices. Such data ignores wider market conditions and "turkey" events such as a market crash, which, as some of you might be aware, we are experiencing this week.

