Financial Time-Series Predictions and AI Models (Part 2): HTM Models
Eberhard Schoneburg
Artificial Intelligence and Artificial Life Pioneer, Author, Speaker, Investor, Advisor, Lecturer
In my last post about financial time-series forecasting with Deep Learning (DL) models I took a rather critical view of DL and recommended not to use DL models for (financial) time-series forecasting, for various reasons. My main argument was the observation that DL models have no convincing way to represent and process time dependencies in their inputs and will therefore always have problems when used for time-series analysis and prediction.
In this post I take the opposite stance and recommend that you try out, or at least experiment with, a special modern AI model that is better suited to forecasting. As a matter of fact, the model discussed here was specifically designed for learning, remembering and processing temporal sequences. Hence its name: Hierarchical Temporal Memory (HTM).
So far, HTM is known mostly to AI insiders and specialists and barely known to the general public. That is why I explain below in some detail how it works. I will discuss other AI models for financial applications in follow-up posts.
The concept of HTM models was introduced by Jeff Hawkins, the creator of the PalmPilot handheld computing device. In 2004 he published his book "On Intelligence", which lays the groundwork for his HTM model and is still a recommendable read today. He has been working on improving HTM models ever since and is now on the 3rd generation of the model.
With his small team of only around 50 employees, however, Hawkins is a clear underdog in AI when compared to the big players like Google, IBM, Amazon, Facebook, Apple etc., which have thousands of AI staff and engineers. Despite his significant contributions to functional brain modelling, HTM is still mostly ignored by the big players and sometimes even by the academic AI establishment, which criticises that Hawkins rarely publishes peer-reviewed articles about HTM and that HTM lacks the comprehensive underlying mathematical theory that DL has. There is as yet no concise mathematical theory for HTM comparable to back-propagation.
The General Philosophy Underlying the HTM Model
A key claim of Hawkins, and something he is proud of, is that HTM models how our brain is structured and how it functions much more closely than, for example, DL models do. He believes that real machine intelligence is impossible without incorporating into computers structures and functions similar to those we find in our brains. With HTM he tries to reverse-engineer how our brains work and to build models of them in software (he is also currently in discussions with hardware companies about building new computer chips that better support his model).
Even though he claims to have modelled and incorporated the latest insights from neuroscience and functional brain research in HTM, he honestly admits that the work is not yet complete and that his team is still refining and improving the model.
For Hawkins, our brain must and can only be understood as a result of evolution. The key purpose of the brain in our lives and in our evolution is to secure our survival. For survival it is important and advantageous not only that we can recognise well what is happening around us and to our bodies at any time, but also that we can learn from our experience and from the sequences of events we go through.
We especially need to remember and understand the sequences of events in time that we experience, so that we can learn to predict events that may happen to us and avoid (repeated) negative and life-threatening experiences. Processing sequential memories lies at the core of our brain activity. Intelligence occurs when such sequences are structured and organised in hierarchical, more abstract layers of columns of neurons.
These hierarchical layers and the associated pooling processes allow us to form stable, less fluctuating abstractions from the erratic and continuously changing sensory inputs we experience and need to process every second of our conscious lives. They also allow us to form and recognise complex hierarchical patterns in our thinking processes. For Hawkins, our brains are therefore just specialised organs that represent, remember, process and predict complex (hierarchical) sequences of events and experiences in time.
HTM models assume and rely on the fact that our neocortex is surprisingly homogeneously structured everywhere in the brain (with very few local exceptions). Every region of the neocortex looks similar to all others under the microscope. Each region is made up of similar columns and layers of neurons. The many different known functional specialisations of brain regions are caused by the different wirings to and from these regions, not by different structures in the local areas of the neocortex.
Therefore, HTM models assume that only one single processing method is used and needed by the brain to handle all sensory inputs (visual, audio, tactile etc. and our inner body senses) in any given region of the neocortex. This single method, according to Hawkins, is a temporal, hierarchical memory process as described and specified in the HTM model.
This makes HTM models natural candidates to study when trying to intelligently forecast and analyse temporal processes such as financial time-series.
Hierarchies of Layers and Micro Columns in HTM
An HTM model consists of one or more hierarchies of levels of columns of neurons (see picture on right). A neuron is called a cell. When cells are stacked on top of each other they build a vertical micro column of cells. Several neighbouring micro columns form a region. One or several regions together make up one level of the hierarchy. Levels are therefore 3-dimensional structures in HTM models, not 1-dimensional structures as in the much simpler DL networks.
All cells in one vertical micro column within a level receive the same feed-forward input and have the same receptive field.
Each horizontal layer of cells of one level is usually depicted and visualised as a 2-D planar structure of neighbouring cells (usually represented graphically as a square of cells - see the picture below in the section about sparse distributed encoding).
Levels connect to higher and lower levels through dendritic connections of the cells. Cells on the same plane build up dendritic connections to other cells, connections that grow or die off during the learning process. Usually the hierarchies of levels have the shape of a tree and get smaller (have fewer neurons) towards the top, as the higher levels process more convergent and abstract information (supported by a spatial and temporal pooling process). Several such hierarchies can be combined vertically, horizontally or both to build more complex networks of HTM models.
The different levels in an HTM model represent the known horizontal layers of neurons in the neocortex (mostly layer 3 or 4, see picture on right). The vertical micro columns that make up a region of a level represent the biological micro-columns of the neocortex (see also my post "Cracking the Neural Code").
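To make this terminology concrete, here is a minimal, purely illustrative sketch in Python of how such a structure could be represented in code (the class names and dimensions are my own choices for illustration, not Numenta's API): a region is a 2-D grid of micro columns, each micro column is a stack of cells, and a level of the hierarchy consists of one or more regions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Cell:
    """A single neuron; its state is filled in by the network dynamics."""
    active: bool = False
    predictive: bool = False

@dataclass
class MicroColumn:
    """A vertical stack of cells sharing the same feed-forward input."""
    cells: List[Cell]

@dataclass
class Region:
    """A 2-D grid of neighbouring micro columns (one plane per cell row)."""
    columns: List[List[MicroColumn]]

def make_region(width: int, height: int, cells_per_column: int) -> Region:
    return Region(
        columns=[[MicroColumn([Cell() for _ in range(cells_per_column)])
                  for _ in range(width)]
                 for _ in range(height)]
    )

# A level of the hierarchy is one or more regions; higher levels are smaller
# because they process more convergent, abstract (pooled) information.
level_1 = [make_region(width=64, height=32, cells_per_column=32)]
level_2 = [make_region(width=32, height=16, cells_per_column=32)]
hierarchy = [level_1, level_2]   # level_2 receives the pooled output of level_1
```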
The Neuron Model in HTM
HTM models have the most refined and biologically realistic neuron (cell) models of all standard neural network models in the public domain. DL and related network models, and all others, use a very simplified and old (1940s) neuron model as their basic unit. They all assume that a neuron is a simple processing unit that takes in all inputs at its dendritic synapses, sums them up in the cell body, checks whether the accumulated incoming signal exceeds a certain threshold and, if so, fires a signal down its axon (see "A" in picture below).
The latest generation of neurons in the HTM model, however, is much more complex (see "C" in picture on right) and reflects biological neurons and their processing more closely. The major improvement over the classical model is that each neuron does not only have a feed-forward dendritic input channel, as in standard DL networks, but also a context-generating dendritic input channel and an additional feedback input channel, as is otherwise only known from so-called recurrent neural network (RNN) models.
HTM neurons therefore allow for a much more detailed and sophisticated representation of the dendritic input. Activations of segments of dendritic branches and synapses depend on their distance from the cell body. Synapses that lie in close proximity to each other along a dendritic branch can do some pre-processing of the incoming signals and, for example, act as coincidence detectors for the neuron. This can be seen as a mechanism that lets neurons prepare for and "predict" what they need to do next.
Neurons in HTM do not learn by adjusting synaptic weights attached to already existing connections to other neurons (as in DL) but rather by developing and growing dendritic or axonal connections to other neurons during learning. This learning process still follows a kind of Hebbian learning rule.
A neuron builds a connection to another neuron in the same plane if the other neuron was activated in the learning step just before the current one (see picture above). The logic behind this is that a neuron "A" connects to all other neurons that were active just before it, thereby indicating that these other neurons somehow predicted and preceded the activity of neuron A.
The neurons in HTM can take on three distinct states (compared to the classical neuron model, which has only two states - active and inactive). Besides the two common states of active and inactive, HTM neurons can also be in a predictive state, which indicates that the neuron is expected (prepared, conditioned) to become active in one of the following processing cycles (in the picture on the left, the orange cells represent cells in predictive state, compared to the red, active cells). There are often several cells in predictive state in a layer.
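As a hedged sketch of this three-state neuron, the following toy Python class (a strong simplification: one implicit distal segment per cell, binary synapses, and an activation threshold chosen purely for illustration) shows the two mechanisms described above: Hebbian growth of lateral connections to cells that were active just before, and a predictive state that is set when enough of those lateral inputs are currently firing.

```python
class HTMCell:
    """Simplified HTM cell: active / inactive plus a predictive state,
    driven by distal (lateral) connections to previously active cells."""

    def __init__(self, segment_threshold=3):
        self.distal_synapses = set()          # indices of lateral cells this cell listens to
        self.segment_threshold = segment_threshold
        self.active = False
        self.predictive = False

    def learn(self, previously_active):
        # Hebbian-style growth: connect to the cells that fired just before this one did.
        if self.active:
            self.distal_synapses |= set(previously_active)

    def update_prediction(self, currently_active):
        # Enough of my lateral inputs are firing now -> expect to fire in a following cycle.
        overlap = len(self.distal_synapses & set(currently_active))
        self.predictive = overlap >= self.segment_threshold
```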
Learning Schedules in HTM Models
Another major difference between HTM models and DL and most other neural AI models is that HTM models can learn continuously; they don't have to go offline when deployed. They do not need to learn in batch mode but can constantly update the connections between the neurons. As a matter of fact, the HTM learning schedule options are even more flexible: if required, HTM models can also stop learning on demand when deployed.
Moreover, unlike in any DL or related model, the different hierarchical levels of an HTM model can be selectively switched in and out of learning at any time. This means the user can select when and for how long any level of the network continues or stops training!
A typical use-case for such a flexible learning schedule is when one no longer wants to expose the net to new input sequences (to avoid over-fitting effects, for example) but still wants it to continue learning on a higher, more abstract level of the hierarchy. With this approach one can get the HTM model to behave more intelligently without losing the skills of discriminating details in the input layers that it learned before.
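The following minimal sketch shows what such a per-level learning schedule could look like. The Level class and its compute() method are placeholders invented here for illustration, not Numenta's API; real implementations expose a similar per-step learn switch, but the details differ.

```python
class Level:
    """Placeholder for one hierarchical level; a real level would contain
    regions of micro columns as described above."""
    def __init__(self, name):
        self.name = name

    def compute(self, signal, learn):
        # A real implementation would update cell states here and,
        # if learn is True, grow or prune lateral connections.
        return signal

levels = [Level("input_level"), Level("abstract_level")]

def learn_schedule(level, t):
    """Freeze the input level after a warm-up phase (to avoid over-fitting),
    but keep the more abstract level learning indefinitely."""
    if level.name == "input_level":
        return t < 10_000
    return True

def run_stream(levels, input_stream):
    for t, signal in enumerate(input_stream):
        for level in levels:                  # feed forward through the hierarchy
            signal = level.compute(signal, learn=learn_schedule(level, t))
```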
Sparse Distributed Encoding and Associative Memories in HTM
An important feature of HTM models is that they use a so-called "sparse distributed encoding" for their neuron activities. This feature is supposed to mirror the fact that our brains also seem to represent the patterns they process and the sensory input they receive in sparse, distributed patterns of neuron activity.
Even though our brains consist of many billions of neurons, the sparse representation means that at any given time only a very small percentage of the overall number of neurons is active (see picture on left). This, it turns out, is a very powerful mechanism and feature of our brains and of HTM models.
The sparse processing and representation of information reduces the overall amount of energy needed to keep the brain active (our brain is the highest energy consumer of all organs in our body) and at the same time creates an effective and reliable error-minimising and pattern-association mechanism (associative memory).
The corresponding sparse distributed information processing in HTM is achieved by sending top-down inhibitory signals that silence "competing" cells, thereby minimising the number of active cells in a plane.
In a typical HTM implementation the sparsity factor is in the single-digit percentage range (say 2%). For example, if a plane consists of 100 x 100 cells, so 10,000 cells in total, then usually only around 200 cells will be active at any given time. This prevents the HTM model from confusing activity patterns and helps avoid false classifications.
The probability that any given sparse distributed pattern over the 10,000 cells overlaps significantly (say by more than 50%) with any other sparse distributed pattern is extremely low - unless the patterns are very similar. And this is the whole purpose of the sparse encoding: it finds similar patterns and at the same time reduces the classification errors of HTM networks. In practical applications HTM models usually use planes of 2048 cells with around 40 (2%) active cells at any given time.
This also makes the sparse distributed encoding a scheme for implementing associative memory mechanisms within HTM. Because a significant overlap between any two random sparse distributed patterns on a plane of cells is highly unlikely, one can use this fact to match patterns on partial information alone. If a sparse pattern A matches a sparse pattern B by, say, 40%, one can already conclude that the two patterns are very similar and occur in similar circumstances. Hence a new sparse pattern can be identified or classified after seeing just 40% of its cell activities!
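A small sketch of this matching idea, using the 2048-cell / 2% figures mentioned above (the 40% overlap threshold is the illustrative number from the text, not a fixed constant of the model):

```python
import random

N_CELLS = 2048          # cells per plane
N_ACTIVE = 40           # ~2% sparsity

def random_sdr():
    """A random sparse distributed representation: the set of active cell indices."""
    return set(random.sample(range(N_CELLS), N_ACTIVE))

def overlap(a, b):
    """Fraction of pattern a's active cells that are also active in b."""
    return len(a & b) / len(a)

def matches(a, b, threshold=0.4):
    # With only 40 of 2048 cells active, two unrelated patterns almost never
    # share 40% of their bits, so such an overlap already signals 'same pattern'.
    return overlap(a, b) >= threshold

# Two random SDRs overlap only negligibly...
a, b = random_sdr(), random_sdr()
print(overlap(a, b))            # typically around 0.0-0.05

# ...while a noisy, partial copy of a stored pattern still matches it.
noisy_a = set(random.sample(list(a), 20)) | set(random.sample(range(N_CELLS), 20))
print(matches(a, noisy_a))      # True: roughly half of a's bits survive
```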
Context Understanding and Higher-Order Sequences
Another very powerful effect of the sparse distributed encoding is that it allows HTM models to "understand" and represent context-dependent learning and processing. In some contexts an event or sequence of events has to be classified and acted upon differently than in other contexts. For example, if somebody in your office suddenly screams "Fire!", it's usually a good idea to stand up and leave the office. But if this happens in a theatre while you watch a play and an actor screams "Fire!", it is usually just part of the play and no reason to stand up and run.
Time series and sequences can overlap or partially match other sequences. They may have common parts where they are identical or very similar (like seasonality) but also other parts where they diverge. The sparse distributed encoding and the 3-D column structure of the HTM layers make it possible to match time sequences that partially overlap within a certain context.
The picture below shows how the sparse distributed encoding, in combination with the vertical columns in a network layer, allows HTM networks to respond appropriately to such situations. Two subsequent events B and C are encoded differently depending on what the network has seen before during training. If the sequence "ABC" is then detected as input, the network will correctly predict "ABCD" as output rather than "ABCY". This means that the HTM model "understands" that the subsequence "BC" has different "meanings" in different sequences.
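The mechanism can be illustrated with a deliberately simplified sketch (no columns or SDRs; a toy stand-in for HTM's context-dependent choice of cells within a column): each element is represented together with the state of its predecessor, so "B after A" and "B after X" become different internal states and lead to different predictions.

```python
# Each 'state' is (element, state of the previous element) - a toy stand-in for
# HTM picking different cells within the same column depending on context.
transitions = {}

def train(sequence):
    prev_state = None
    for element in sequence:
        state = (element, prev_state)
        if prev_state is not None:
            transitions[prev_state] = state   # remember which state followed which
        prev_state = state

def predict_next(sequence):
    """Follow the learned context-dependent states and return the predicted next element."""
    prev_state = None
    for element in sequence:
        prev_state = (element, prev_state)
    nxt = transitions.get(prev_state)
    return nxt[0] if nxt else None

train("ABCD")
train("XBCY")

print(predict_next("ABC"))   # 'D' - the 'BC' seen after A leads to D
print(predict_next("XBC"))   # 'Y' - the same 'BC' after X leads to Y
```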
Time-Series Applications of HTM
As we have seen, HTM models are specifically designed for memorising, learning and recognising sequential processes and are hence predestined for time-series applications. HTM models have already been used successfully, with good results, in time-related areas such as:
- Temporal anomaly detection
- Analysis of server usage over time
- Stock price movement anomalies
- Rogue behaviour pattern detection
- Geospatial tracking
- Natural language processing.
The context "understanding" feature of HTM models can be used for the difficult problem of anomaly or outlier detection in the analysis of real world time-series and time related events (see picture on left). When does a spike or several spikes in a time series indicate noise or random effects that can be filtered out and when will spikes rather represent a "black swan" event, a special relevant but unexpected situation with significant effect ?
A noise event can be recognised by the HTM network through its effect on the overall cell activity in one layer and in the layers above it in the hierarchy. Noise will usually be cancelled out over time in the hierarchy of levels, as it generally does not build up sparse recurring patterns. "Black swan" events, however, will have a lasting temporal effect in the network and may even change the sparse distributed representation itself by forcing inhibitory feedback from the top layers downwards. This can be detected by the HTM network and used for outlier identification and classification.
The detection of outliers and anomalies in time series is done by inspecting the activities of the neurons in the layers of the network. The HTM model always indicates simultaneously all the cells that are active and all the cells that are in predictive state. The activity of a cell and its related column is hence considered abnormal if its state had not been predicted by the model beforehand. Counting the cells that were predicted and comparing them to the unpredicted cells gives an additional overall anomaly score for a temporal sequence (see picture above). Anomaly detection is therefore one of the best use-cases for HTM models.
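A sketch of such a raw anomaly score (the fraction of currently active columns that were not in predictive state in the previous step) follows the idea described above; to my knowledge Numenta's implementations compute their raw score in essentially this way, while the smoothing helper below is my own simplification.

```python
def anomaly_score(active_columns, previously_predicted_columns):
    """Fraction of currently active columns that the model did NOT predict
    in the previous step: 0.0 = fully expected, 1.0 = fully surprising."""
    active = set(active_columns)
    if not active:
        return 0.0
    unpredicted = active - set(previously_predicted_columns)
    return len(unpredicted) / len(active)

# Example: 40 active columns, 30 of which had been in predictive state.
active = set(range(40))
predicted = set(range(30))
print(anomaly_score(active, predicted))   # 0.25

# In a stream, a sustained run of high scores (rather than a single spike)
# separates a real regime change ('black swan') from one-off noise.
def smoothed(scores, window=10):
    return [sum(scores[max(0, i - window + 1): i + 1]) / min(i + 1, window)
            for i in range(len(scores))]
```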
Numenta has released a free mobile app in the Apple App Store called "HTM for Stocks" that analyses stock time-series based on these anomaly detection features. The app recognises abnormal fluctuations in stock charts in near real time and associates them with Twitter feeds related to the same stock, using the Twitter chatter to try to explain why the abnormalities have occurred. Numenta also offers anomaly detection in time-series as a commercial SaaS service.
The software for the current generation of the HTM model is available for free and as open source at www.numenta.org.
Full disclosure: I have no business ties to Numenta or Jeff Hawkins whatsoever.
Conclusion
One of the key claims of HTM is that it reflects the neural mechanisms and structures in our brains much more closely than any other neural model in AI. I agree with this claim.
However, does it also mean that HTM reflects and represents current neuroscience findings better in absolute terms, not just better relative to other models?
This is disputable. Many recent findings in neuroscience are not yet reflected in HTM models (for example, the role of the glial cells, glial-neural communication, or how the temporal encoding of spike trains would fit into the HTM model). But it would not be fair to demand this of HTM. There are literally thousands of articles about brain research and neuroscience published every month, with new findings about how our brains work. It is impossible for the small HTM team to stay current with all these new results.
One main problem, however, that needs to be mentioned and stressed regarding HTM models is the sparse distributed encoding that is required on all levels of the model, and especially at the input level to the network. Numenta uses and provides so-called "encoders" for this: programs that turn input streams into a sparse distributed representation that the network can use for further processing.
The sparse encoding is an important feature and function of the model, but it is also its Achilles heel. If the sparse encoding is not done well, i.e. if it does not represent the patterns in the input sequences well or even distorts hidden patterns in a statistically significant way, the network is unlikely to generate any useful results (garbage in, garbage out).
Therefore, the effort and the prior detailed data analysis required to come up with a good sparse distributed encoding of the input stream can be very high and will strongly influence the success of the network. In many applications this encoding needs to be manually customised and fine-tuned to the specific use-case requirements.
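As a simple illustration of what such an encoder does, here is a minimal bucketed scalar encoder, loosely modelled on the idea behind Numenta's scalar encoders (the parameters and the bucket logic are simplified choices of mine): nearby values share active bits, distant values do not.

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n_bits=400, n_active=21):
    """Encode a scalar as a sparse bit array: a contiguous block of n_active
    bits whose position depends on the value, so similar values overlap."""
    value = min(max(value, min_val), max_val)              # clip to the encoder range
    span = n_bits - n_active
    start = int(round((value - min_val) / (max_val - min_val) * span))
    bits = [0] * n_bits
    for i in range(start, start + n_active):
        bits[i] = 1
    return bits

a = encode_scalar(50.0)
b = encode_scalar(52.0)   # close value -> large overlap with a
c = encode_scalar(90.0)   # distant value -> no overlap with a

overlap_ab = sum(x & y for x, y in zip(a, b))
overlap_ac = sum(x & y for x, y in zip(a, c))
print(overlap_ab, overlap_ac)   # 14 and 0 with these parameters
```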
Other than that, HTM is a powerful, more modern AI model and approach. It is promising and definitely worth trying for time-series related tasks.
Eberhard Schoneburg
Hong Kong, May 1st, 2017