Time Sensitive Trade Data Processing
People say that when the right time comes, things flourish. Even very smart ideas return little value if they are pursued at the wrong time; likewise, at the right time even a small effort may quickly return extraordinary results. Throughout human history, various experts have tried to correlate the events in our lives with the times at which they happened, and to bring them into one common stream in order to predict the future. Astrology is one such ancient method, which attempts to forecast the future using the symbolic movements of the planets.
The roots of astrology and its internal mechanisms are far beyond the ordinary senses and the thought patterns we generate and consume every day. Statistics and probability is a modern mathematical discipline with which one can predict the future to a reasonable extent using a fair amount of past data. Unlike astrology, which depends mostly on the practitioner and his personal experience, the internal mechanisms of prediction using statistics and probability are clear and well understood.
Whether it is astrology, statistics, machine learning or any other technique for predicting the future, one factor is worth noting regardless of the discipline. Things do not just happen randomly; events have a strong relationship with time. Hence what we refer to as the future is not essentially a random stream of events.
If the future is not just random, can we truly predict it? Yes, some future events we certainly can. We have no doubt that the sun will rise tomorrow as it did today, and we know that when winter comes, spring is hidden within it. The hidden spring cannot pop up at any given moment, but only when the right time comes, when it will certainly bloom in many colours.
Those who trade in stock markets are fortune seekers through numbers. Whether it is equity, forex, cryptocurrency, commodities or complex derivatives, trading is still a kind of sensing of the future, mostly through the figures available today.
If the future of a stock or an asset could be predicted accurately using only numbers, then computers would be the best candidates for the job, because computers are millions of times better than humans at calculation. This movement toward future prediction by machines is called Algorithmic Trading or Automated Trading.
Mature markets such as the United States and Japan are heavy users of automated trading, and emerging markets are quickly following in their footsteps. Even though computers are good at calculation, how could they predict the future of market events and the fortune of the trader?
Trading data is highly time sensitive, and some automated trading algorithms require historical data to be processed through various financial formulas. The purpose of this article is to explain the techniques we used to meet the historical trade data processing requirements of the Direct Fn algorithmic trading applications.
Challenges and Type of Queries
The entire subject of technical analysis rests on the belief that history repeats itself. It offers concepts, techniques and well-defined mathematical formulas about financial markets, discovered by mathematicians who took an interest in those markets. These are used to identify trends in price movements and correlations between various asset classes. Nowadays traders do not only place buy and sell orders; they also resemble miners. Instead of mining the earth, they mine historical trading data to extract the fortune hidden within the data itself.
A single year of trade data amounts to over 10 million records in an emerging market, while busy venues such as NASDAQ or LSE may produce several hundred million records. Automated trading algorithms may issue time-sensitive queries over data sets of this size.
If we iterate through a data set of this size and select matching trades one by one, answering a complex query can take hours or even days. Linear searching must be avoided, if not completely then at least to a reasonable extent.
What these scenarios require is a method of accessing the required data directly from its memory location. For that, we must know the location of the data a query needs. If a mathematical formula can compute that location, the data can be accessed directly and linear searching can be avoided.
Finding the shape of the data
Data cannot exist without a shape, at least in our minds. Some data is inherently tabular and well suited to relational databases. Some data exists as graphs, while other data represents complex hierarchical models. Finding the most natural shape of the data for a given application is a key factor in the success of the application and the early completion of the project.
Historical stock trading data has a few important properties:
1. Time of the trade
2. Date of the trade
3. Symbol of the trade
4. Trade price and quantity
5. Sequence number
The number of seconds in a day is constant: each day has 86,400 seconds, no more and no less. The number of trading symbols at a given trading venue is also constant for a certain period of time, while the number of days grows with the data set. Let's store all other properties of a trade against time, date and symbol.
(time, date, symbol) -> {trade1, trade2, …, tradeN}
This kind of arrangement results in the shape of a cuboid: a three-dimensional data structure.
From this point on it is all simple geometry, as the data has taken a tangible shape in space. Applying formulas to avoid linear searching is now possible.
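As a minimal sketch of what such a formula might look like, assume that every trading day and every symbol has been assigned a zero-based integer index. The class name CuboidIndex, the parameters dayIndex and symbolIndex, and the day-major layout are all assumptions made for illustration, not the layout used in the actual application.

```java
import java.time.LocalTime;

/** Hypothetical helper that maps a (day, symbol, second) cell of the cuboid to a flat slot number. */
final class CuboidIndex {
    static final int SECONDS_PER_DAY = 86_400;

    private final int symbolCount;

    CuboidIndex(int symbolCount) {
        if (symbolCount <= 0) {
            throw new IllegalArgumentException("symbolCount must be positive");
        }
        this.symbolCount = symbolCount;
    }

    /** Converts an ISO time of day such as "09:30:00" to a second-of-day value in [0, 86399]. */
    static int secondOfDay(String isoTime) {
        return LocalTime.parse(isoTime).toSecondOfDay();
    }

    /**
     * Computes the slot of one cell. The assumed layout is day-major, then symbol,
     * then second, so all seconds of one (day, symbol) pair are contiguous.
     */
    long slot(int dayIndex, int symbolIndex, int secondOfDay) {
        if (dayIndex < 0) {
            throw new IllegalArgumentException("dayIndex must not be negative");
        }
        if (symbolIndex < 0 || symbolIndex >= symbolCount) {
            throw new IllegalArgumentException("symbolIndex out of range");
        }
        if (secondOfDay < 0 || secondOfDay >= SECONDS_PER_DAY) {
            throw new IllegalArgumentException("secondOfDay out of range");
        }
        return ((long) dayIndex * symbolCount + symbolIndex) * SECONDS_PER_DAY + secondOfDay;
    }
}
```

With a layout like this, a query for one symbol between two times on a given day becomes a contiguous range of slots, from slot(day, symbol, from) to slot(day, symbol, to), so no scan over unrelated trades is needed.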
Time Series Databases
Like relational databases, time series databases are a kind of database, but one specially designed for answering time-sensitive queries. There are well-tested and popular time series databases on the market, both open source and commercial. Time series databases are growing faster than any other category of database because of the massive growth of the Internet of Things. IoT data is essentially time sensitive, and there are patterns to be discovered over time that help explain why a particular thing happened at a particular moment.
To process historical time-sensitive data, our approach was to implement a simple time series database designed specifically for trade data, rather than adopting one of the advanced general-purpose solutions on the market. The main data structure is called the cube and is conceptualized as a Java interface, listed below.
The implementation of the cube interface can vary with the requirements. It could be a simple file-based implementation or a complete in-memory implementation.
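import java.util.List;

public interface Cube {

    /**
     * Saves a single trade in the time series database.
     *
     * @param trade the trade to store
     */
    void saveTrade(Trade trade);

    /**
     * Saves a list of trades belonging to the same time series in the database.
     * All trades in the list must be in the same time series.
     *
     * @param tradeList the trades to store
     * @param date      the date of the trades
     */
    void saveTradeList(List<Trade> tradeList, String date);

    /**
     * Returns the list of trades matching a particular query.
     *
     * @param tradeQuery the query to answer
     * @return the matching trades
     */
    List<Trade> getTrades(TradeQuery tradeQuery);

    /**
     * Dynamic OLHC calculation.
     *
     * @param tradeQuery    the query selecting the trades to aggregate
     * @param intervalInSec the interval in seconds
     * @return the calculated OLHC values
     */
    List<OLHC> getOlhc(TradeQuery tradeQuery, int intervalInSec);

    /**
     * Returns trade volumes at intervals. The results are used in more
     * advanced trading algorithms such as VWAP.
     *
     * @param symbol        the trading symbol
     * @param fromTime      the start of the time window
     * @param toTime        the end of the time window
     * @param historyDays   the number of history days to include
     * @param intervalInSec the interval in seconds
     * @return the trade volumes at each interval
     */
    List<TimeTradeVolume> getTradeAtInterval(String symbol, String fromTime, String toTime, int historyDays, int intervalInSec);
}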
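To illustrate what a dynamic calculation such as getOlhc might involve, the following is a rough sketch that groups trades into fixed-width intervals and tracks the open, low, high and close price of each. SimpleTrade, SimpleBar and OlhcSketch are hypothetical stand-ins invented for this example; they are not the actual Trade and OLHC types. The sketch also assumes that the trades belong to one symbol and one day and arrive sorted by time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal trade: second of day, price and quantity. */
final class SimpleTrade {
    final int secondOfDay;
    final double price;
    final long quantity;

    SimpleTrade(int secondOfDay, double price, long quantity) {
        this.secondOfDay = secondOfDay;
        this.price = price;
        this.quantity = quantity;
    }
}

/** Hypothetical bar holding open, low, high, close and volume for one interval. */
final class SimpleBar {
    final int startSecond;
    final double open;
    double low;
    double high;
    double close;
    long volume;

    /** Starts a bar from the first trade that falls into the interval. */
    SimpleBar(int startSecond, SimpleTrade first) {
        this.startSecond = startSecond;
        this.open = first.price;
        this.low = first.price;
        this.high = first.price;
        this.close = first.price;
        this.volume = first.quantity;
    }

    /** Folds a later trade of the same interval into the bar. */
    void add(SimpleTrade trade) {
        low = Math.min(low, trade.price);
        high = Math.max(high, trade.price);
        close = trade.price;
        volume += trade.quantity;
    }
}

final class OlhcSketch {
    /** Groups time-sorted trades into bars of intervalInSec seconds; empty intervals produce no bar. */
    static List<SimpleBar> toBars(List<SimpleTrade> trades, int intervalInSec) {
        if (intervalInSec <= 0) {
            throw new IllegalArgumentException("intervalInSec must be positive");
        }
        List<SimpleBar> bars = new ArrayList<>();
        SimpleBar current = null;
        for (SimpleTrade trade : trades) {
            // Integer division snaps the trade time to the start of its interval.
            int start = (trade.secondOfDay / intervalInSec) * intervalInSec;
            if (current == null || current.startSecond != start) {
                current = new SimpleBar(start, trade);
                bars.add(current);
            } else {
                current.add(trade);
            }
        }
        return bars;
    }
}
```

Because the interval start is computed by integer division rather than by searching, this step fits the same idea of locating data with a formula. A real implementation would read the trades from the cube instead of receiving them as a list.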