Clustering the Ethereum daily candlesticks to uncover hidden relationships
Ethereum has been making the news lately for its massive spikes and dips in prices and the seemingly unbounded possibilities that this technology will bring. Many people believe in and are invested in the Ethereum blockchain, myself included.
In my opinion, however, one of the things that is holding the Ethereum blockchain back is the huge amount of speculation that is going on in the markets.
The result is huge volatility in the price of Ether, which could, in extreme cases, drive the price all the way down to zero if a fear-driven chain reaction were to occur. Such swings could make Ether mining profitable one day and unprofitable the next, eventually causing miners to abandon the Ethereum blockchain because of the uncertain landscape.
Uncertainty and information asymmetry are obvious drivers of speculation, so what if we could mitigate them somehow and increase the chances that the Ethereum blockchain survives until we are able to witness its intended possibilities?
What if we had a slightly-better-than-random idea of the next candlestick that is going to form on the Ethereum daily chart? My belief is that this would reduce fear- and greed-based trades and help stabilise prices just a little.
*Disclaimer: The contents, estimations, and predictions in this article are meant solely for research and educational purposes (mostly for my own learning) and should not be taken as investment advice. Neither I nor anyone else distributing this article is liable for any losses that occur from using the information contained here.
Dabbling in Data Analysis and Statistical Learning
What I have attempted to do is find distinct clusters of candlestick formations in the historical price and volume data using the K-Means clustering algorithm. K-Means is an unsupervised method, which means it discovers similarities among data points on its own, without us having to tell the model beforehand what each candlestick is.
One of the advantages of this approach is that we throw as many assumptions out of the window as possible and let the data reveal the relationships among the candlesticks. The downside is that verifying the accuracy of the clusters is difficult, which diminishes the reliability of the model to some extent. The results will also vary depending on a number of factors, such as the features being used, the predetermined number of clusters, and the starting points of the cluster 'centres'.
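To make the idea concrete, here is a minimal sketch of clustering candlesticks by shape. It is not the code from my repository: the three features (signed body, upper wick, and lower wick, each scaled by the day's high-low range), the helper names `candle_features` and `kmeans`, and the four sample candles are all assumptions made for illustration, and volume is left out to keep it short. The `k` and `seed` parameters correspond to the number of clusters and the starting 'centres' mentioned above.

```python
import random

def candle_features(o, h, l, c):
    """Describe one candle by shape only, scaled by its high-low range."""
    rng = (h - l) or 1e-12          # avoid dividing by zero on a flat day
    body = (c - o) / rng            # signed: positive = green, negative = red
    upper = (h - max(o, c)) / rng   # upper wick length
    lower = (min(o, c) - l) / rng   # lower wick length
    return (body, upper, lower)

def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0, iters=100):
    """Plain Lloyd's algorithm; returns (labels, centres)."""
    rnd = random.Random(seed)
    centres = rnd.sample(points, k)  # starting centres depend on the seed
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:     # nothing moved, so we have converged
            break
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:              # keep the old centre if a cluster empties
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres

# Made-up (open, high, low, close) candles: two long red, two long green.
candles = [
    (100, 101, 90, 91),
    (100, 102, 89, 90),
    (100, 111, 99, 110),
    (100, 112, 99, 111),
]
points = [candle_features(*c) for c in candles]
labels, centres = kmeans(points, k=2, seed=1)
print(labels)
```

On real data, changing `k`, the seed, or the feature set can change which candlesticks end up together, which is exactly the sensitivity described above.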
Now, I do not claim to be an expert in data science or market analysis, but this could be a step in the right direction.
We can see the resulting clusters below:
(I have tried my best to name them appropriately.)
Cluster 0 - Big Losses Small Losses
The first distinct cluster is characterised by candlesticks with long, medium, and short red bodies and wicks. In short, a classic move to the downside.
Cluster 1 - Gains and Wicks
The second distinct cluster is characterised by green candlesticks, mostly with long wicks. This represents moves to the upside followed by profit-taking or consolidation, or both.
Cluster 2 - Big Gains Small Gains
A classic move to the upside with good chances of lucrative gains to be made.
Cluster 3 - Losses and Hammers
This cluster contains red candlesticks with mostly short bodies, but interestingly, the candlestick with the longest relative body length is included here. It possibly represents cases where prices are 'bleeding' out over a period of time, with a chance of a major downturn.
Cluster 4 - Volatile Indecision
This cluster is distinguished by long wicks on either or both ends, for both red and green candlesticks. I personally feel that this is representative of indecision in the market.
Cluster 5 - Blood and Indecision
The last cluster is a small group which includes long wicks along with some red long-bodied candlesticks. This cluster also shows indecision in the market but with a higher chance of losses occurring.
The Transition Matrix
Now you must be wondering, 'So what if we have all these fancy clusters? How can we make use of this information?'
Well, what if, on top of there being a hidden relationship between candlesticks, there were a hidden relationship and sequence between the candlestick clusters as well? Some hidden rule that the markets seem to follow?
This sounds a lot like the transition matrix from a Hidden Markov Model. By counting how many times a candlestick from each cluster is followed by a candlestick from each cluster (including its own), and dividing each count by its row total, we arrive at the 6 by 6 probability matrix below:
The rows of this transition matrix represent the 'From' state and the columns represent the 'To' state. The number in each 'tile' represents the probability of transitioning from one state to another (e.g. the probability of transitioning from state 3 to state 1 is 26%).
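The counting step can be sketched in a few lines. This is a minimal illustration rather than the code from my repository: the function name `transition_matrix` and the short label sequence are made up for the example, and it uses 3 states instead of 6 to keep the output readable.

```python
def transition_matrix(labels, n_states):
    """Row i, column j = share of days in state i that were followed by state j."""
    counts = [[0] * n_states for _ in range(n_states)]
    # Walk through consecutive (today, tomorrow) pairs of cluster labels.
    for today, tomorrow in zip(labels, labels[1:]):
        counts[today][tomorrow] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never followed by anything keeps a row of zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Made-up daily cluster labels, oldest first.
labels = [0, 1, 1, 0, 2, 1, 0, 1]
P = transition_matrix(labels, n_states=3)
for row in P:
    print([round(p, 2) for p in row])
```

Each row with at least one observed transition sums to 1, so a row can be read directly as the distribution of tomorrow's cluster given today's.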
How can we use this information?
First let us recall our 6 clusters.
Cluster 0 - Big Losses Small Losses
Cluster 1 - Gains and Wicks
Cluster 2 - Big Gains Small Gains
Cluster 3 - Losses and Hammers
Cluster 4 - Volatile Indecision
Cluster 5 - Blood and Indecision
Quick Example:
Looking at row '3', a candlestick in the 'Losses and Hammers' cluster will most likely be followed by one in the 'Gains and Wicks' cluster (26% chance). Furthermore, there is a 22% chance that the next candlestick will fall in the 'Big Gains Small Gains' cluster, giving us a 48% chance of gains from holding our positions through the day.
Compared with a 29% chance of major losses occurring in the next candlestick, this could be a good signal to enter a position.
Again, I would like to emphasise that no one should base their investment or speculation decisions on this alone; if you do so, it will be at your own risk.
What's Next?
1) I will be exploring the common techniques used in technical analysis to derive buy or sell signals, whilst benchmarking them against this model.
- The big question to ask is: is the knowledge passed down for technical analysis backed by evidence, or is it followed simply because that is the way it has always been passed down?
2) Another interesting direction would be to explore the likelihoods of an n-period state result given an initial state. For example, if we are in state 0 now, what are the probabilities of ending up in state 2 or 3 at the end of 5 days?
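For the second direction, one possible starting point is to raise the transition matrix to the n-th power: if tomorrow's cluster depends only on today's and the probabilities do not change over time (both are assumptions, not something I have verified for Ethereum), then entry [i][j] of the n-th power is the chance of being in state j after n days when starting from state i. The sketch below uses a made-up 2 by 2 matrix, not the real 6 by 6 one, and the helper names `mat_mul` and `n_step` are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_step(P, n):
    """Return P multiplied by itself n times (the n-step transition matrix)."""
    size = len(P)
    # Start from the identity matrix, i.e. "zero days have passed".
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

# Made-up one-day transition probabilities between two states.
P = [
    [0.5, 0.5],
    [0.2, 0.8],
]
P5 = n_step(P, 5)
# Row 0 of P5: where we are likely to be after 5 days, starting in state 0.
print([round(p, 3) for p in P5[0]])
```

With the real matrix, the answer to the question above would be read from row 0, columns 2 and 3, of the 5-step matrix.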
For more details on the work in this article, please feel free to visit my GitHub repository.