Predicting the Bitcoin Price using Neural Networks
If you want to enrich your knowledge, you are on the right track; and if you want to fill your virtual pockets, you will probably find here a new methodology to predict the future price of Bitcoin based on neural networks.
In this article, you will find the following topics:
- Time Series
- Datasets used for the predictions
- Preprocessing of datasets
- Use of the TensorFlow library
- Recurrent Neural Networks (RNN)
- Practical example
The logic behind the software implemented in this project is based on the TensorFlow tutorials. You can find it at https://www.tensorflow.org/tutorials/structured_data/time_series
Time Series
A Time Series is a sequence of data points ordered in time; some examples of Time Series are the average temperature in a city per hour, the number of visitors in a store per day, or the price of Bitcoin (BTC) per minute.
As you can see in Figure 1, these are graphical representations of Time Series whose independent variable is time.
Datasets used for Bitcoin prediction
Before diving into the Bitcoin datasets, I'm going to give a brief introduction to Bitcoin. The term refers to a cryptocurrency, perhaps the most recognized worldwide; a cryptocurrency is a virtual currency managed by complex blockchain algorithms that aims to replace current money, and its acceptance has grown over the last years.
Bitcoin appeared in the first days of 2009, and its price (with respect to USD) has changed constantly depending on variables such as supply and demand, as you can see in Figure 3.
I'm going to use a dataset with per-minute granularity, which includes the following info:
- The start time of the time window in Unix time
- The open price in USD at the start of the time window
- The high price in USD within the time window
- The low price in USD within the time window
- The close price in USD at the end of the time window
- The amount of BTC transacted in the time window
- The amount of Currency (USD) transacted in the time window
- The VWAP (volume-weighted average price) in USD for the time window. The VWAP is calculated by adding up the dollars traded for every transaction (price multiplied by the volume traded) and then dividing by the total volume traded.
VWAP = ∑(Price × Volume) / ∑(Volume)
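As a quick sanity check, the formula can be reproduced with a few lines of Pandas; the numbers below are toy values, not taken from the real dataset. Note that in this dataset Volume_(Currency) already equals price times volume summed per window, so the VWAP can also be recovered as Volume_(Currency) / Volume_(BTC).

import pandas as pd

# Toy trades: price in USD and volume in BTC.
trades = pd.DataFrame({'price': [9100.0, 9150.0, 9120.0],
                       'volume': [0.5, 1.2, 0.8]})

# VWAP = sum(price * volume) / sum(volume)
vwap = (trades['price'] * trades['volume']).sum() / trades['volume'].sum()
print(vwap)  # 9130.4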
An example of data found in this dataset:
Preprocessing of Datasets
Based on the objective proposed in the introduction, the current dataset must be adapted into one with the features required to reach the goal. The following tasks are required to change the format of the dataset:
1. Change the field ‘Timestamp’ from Unix time to a standard time format (year, month, day, hour & minute).
2. Delete all the NaN values in the dataset. Reviewing the dataset, I found several NaNs in the fields; these NaNs are generated because at the beginning of Bitcoin there were few transactions, even zero in some time lapses, which produces a NaN in the VWAP field.
3. Choose a time lapse whose data will be used for training.
4. Choose the features that will be used to predict the value of BTC at the close of the following hour.
5. Change the granularity from minutes to hours.
6. Data Normalization
The change of the “Timestamp” info from Unix time to the standard format is implemented with the Pandas library, using the command:
df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')
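Putting steps 1 and 2 together, a minimal sketch could look like this; the file name bitstamp.csv is an assumption, and the column names follow the dataset fields listed above.

import pandas as pd

# Load the per-minute dataset (the file name is an assumption).
df = pd.read_csv('bitstamp.csv')

# Step 1: Unix time -> standard datetime, used as the index for resampling later.
df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')
df = df.set_index('Timestamp')

# Step 2: drop the rows with NaN values (e.g., minutes with zero transactions).
df = df.dropna()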
In Figure 6 you will see a graph of the VWAP from the creation of BTC until May 2020. As you can see, there are three well-defined shapes (time lapses). The first one goes from the creation to January 2017; the second one began in 2017 and finished around November 2018; and the last one starts in January 2019 and runs until now.
Based on these observations, I included the following requirements in the preprocessing phase:
- Include in the training data only information from 01 January 2017 onward.
- Depending on whether the analysis is univariate or multivariate, I delete the columns High, Low, Open, and Close; for the univariate case I also delete the transaction (volume) columns, as sketched below.
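A minimal sketch of these two requirements, assuming the DataFrame df indexed by datetime as built above:

# Keep only data from 01 January 2017 onward.
df = df.loc['2017-01-01':]

# Drop the price columns that are not used as features.
df = df.drop(columns=['Open', 'High', 'Low', 'Close'])

# For the univariate case, also drop the transaction (volume) columns.
univariate = df.drop(columns=['Volume_(BTC)', 'Volume_(Currency)'])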
In order to change the granularity from minutes to hours, I use the resample command from the Pandas library. It is a powerful method that takes a DatetimeIndex and, based on its parameters, is able to change the index from minutes to hours and also to execute a function (max, min, sum, mean) over the data included in each range; an example is the following command.
df_new['Volume_(Currency)'].resample('H').sum()
In this case, I'm resampling from minutes (the original granularity) to hours ('H'), and the data in the column 'Volume_(Currency)' is summed over each hour.
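Each column needs its own aggregation function when changing the granularity; the mapping below is a sketch of one reasonable choice, not necessarily the exact one used in the project.

# Resample every column from minutes to hours with a suitable aggregation.
df_hourly = pd.DataFrame()
df_hourly['Volume_(BTC)'] = df['Volume_(BTC)'].resample('H').sum()
df_hourly['Volume_(Currency)'] = df['Volume_(Currency)'].resample('H').sum()
# Averaging the per-minute VWAP is an approximation; the exact hourly VWAP
# could also be derived as Volume_(Currency) / Volume_(BTC).
df_hourly['Weighted_Price'] = df['Weighted_Price'].resample('H').mean()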
One important step to implement is data normalization: it ensures that the data from the several variables have similar behavior and similar ranges. The normalization is implemented with the following commands.
Note that before implementing these scripts, it is required to convert the info from Pandas DataFrames to NumPy arrays.
uni_train_mean = uni_data[:TRAIN_SPLIT].mean()
uni_train_std = uni_data[:TRAIN_SPLIT].std()
uni_data = (uni_data - uni_train_mean) / uni_train_std
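For context, uni_data is the NumPy array mentioned above and TRAIN_SPLIT marks the boundary between training and validation samples; a sketch of how they could be defined (the 70% split is an assumption):

# Convert the Pandas Series to a NumPy array, as noted above.
uni_data = df_hourly['Weighted_Price'].values

# Reserve, for example, the first 70% of the hourly samples for training;
# the mean and std are computed on this split only, to avoid data leakage.
TRAIN_SPLIT = int(len(uni_data) * 0.7)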
The last step is to split the data to be used in the training stage. The requirement is to predict the close value of BTC using the info from the last 24 hours. So this last step splits the data into windows of 24 hours and sets the VWAP of hour 25 as the value to predict.
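This windowing follows the pattern of the TensorFlow tutorial the article is based on; here is a sketch for the univariate case, using 24 hours of history and the value of hour 25 as the label.

import numpy as np

def univariate_data(dataset, start_index, end_index, history_size, target_size):
    # Split a 1-D series into (past window, future value) pairs.
    data, labels = [], []
    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size
    for i in range(start_index, end_index):
        indices = range(i - history_size, i)
        data.append(np.reshape(dataset[indices], (history_size, 1)))
        labels.append(dataset[i + target_size])
    return np.array(data), np.array(labels)

# 24 hours of history; target_size=0 points at the hour right after the window.
x_train_single, y_train_single = univariate_data(uni_data, 0, TRAIN_SPLIT, 24, 0)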
From Preprocessing to TensorFlow
After ensuring the raw data has been preprocessed according to the requirements, these data must be converted into a format that can be ingested by a TensorFlow model.
The command used in this case is:
train_data_single = tf.data.Dataset.from_tensor_slices((x_train_single, y_train_single))
This command is based on the one proposed on the TensorFlow web page:
dataset = Dataset.from_tensor_slices((batched_features, batched_labels))
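Following the same tutorial, the resulting dataset is then cached, shuffled, and batched before training; the BUFFER_SIZE and BATCH_SIZE values below are assumptions.

BATCH_SIZE = 256     # assumed batch size
BUFFER_SIZE = 10000  # assumed shuffle buffer

# Cache, shuffle, and batch the slices; repeat() lets fit() run by step counts.
train_data_single = train_data_single.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()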
Now, with the pipeline built, the next step is to define a model based on a Recurrent Neural Network, specifically an LSTM, in order to avoid the vanishing gradient problem and enhance performance.
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(24, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
single_step_model.summary()
The summary of this model:
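Before training, the model still has to be compiled and fitted; here is a sketch following the tutorial's single-step setup, where the optimizer, loss, step counts, and the val_data_single pipeline (built the same way from the validation split) are assumptions.

single_step_model.compile(optimizer=tf.keras.optimizers.RMSprop(), loss='mae')

# steps_per_epoch is required because the training dataset repeats indefinitely.
single_step_history = single_step_model.fit(train_data_single,
                                            epochs=10,
                                            steps_per_epoch=200,
                                            validation_data=val_data_single,
                                            validation_steps=50)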
Results
Univariate
I have trained and predicted the Bitcoin time series with only one variable, the Volume Weighted Average Price (VWAP). The following graphics show 5 samples from the validation dataset, with the expected value and the predicted one.
Multivariate
On the other hand, I have used three variables in a multivariate model to predict the next Bitcoin price. The variables used are Volume_(BTC), Volume_(Currency), and the main variable, the VWAP (Weighted_Price).
The following graphics show 5 samples from the validation dataset, with the expected value and the predicted one.
Acid test
The last test is to evaluate the prediction of the BTC price on data the model has never seen.
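A minimal sketch of such a check, assuming x_new holds 24 normalized hourly VWAP values shaped (1, 24, 1); since the model works in normalized space, the output must be de-normalized with the training statistics.

# Predict the next hour from a window the model has never seen.
pred_normalized = single_step_model.predict(x_new)[0, 0]

# Undo the normalization applied during preprocessing.
pred_usd = pred_normalized * uni_train_std + uni_train_mean
print('Predicted VWAP for the next hour: {:.2f} USD'.format(pred_usd))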
Conclusions
The use of RNNs to predict future data in Time Series is a powerful tool; even today they are used in other applications like speech recognition and generative models.
The comparison between the univariate and the multivariate model shows us that the multivariate model performs better, even over several epochs, as can be seen in the following figure.
Anyway, the use of RNNs in areas like trading is not recommended, because there are multiple random variables that are not included in the datasets, such as political decisions, the behavior of other cryptocurrencies, competition, and of course the laws of supply and demand.
Full code in my GitHub account.
https://github.com/ledbagholberton
Resources
https://machinelearningmastery.com/when-to-use-mlp-cnn-and-rnn-neural-networks/
https://machinelearningmastery.com/time-series-data-stationary-python/
https://www.tensorflow.org/tutorials/structured_data/time_series