Machine learning and macro trading strategies

Machine learning can improve macro trading strategies, mainly because it makes them more flexible and adaptable, and generalizes knowledge better than fixed rules or trial-and-error approaches. Within the constraints of pre-set hyperparameters, machine learning continuously and autonomously learns from new data, thereby challenging or refining prevalent beliefs. Machine learning and expert domain knowledge are not rivals but complementary. Domain expertise is critical for the quality of featurization, the choice of hyperparameters, the selection of training and test samples, and the choice of regularization methods. Modern macro strategists may not need to make predictions themselves but could provide great value by helping machine learning algorithms find the best prediction functions.

Find the full post and reference to the underlying material on the Systemic Risk and Systematic Value site.

Key benefits of machine learning for macro trading strategies

Most systematic macro trading strategies are based on fixed rules. Fixed trading rules are often maintained until they evidently break. By contrast, decision-making with machine learning is based on variable rules. Since financial market environments are prone to structural change and instability, this is a critical advantage.

Conventional trading rules are based on trial and error. Their generation is time- and labour-intensive. Moreover, fixed rules do not deal well with uncertainty and unanticipated input, such as unprecedented volatility shocks or negative interest rates. By contrast, machine learning systems generalize knowledge better and are more easily adjustable than conventional rules, as long as they are provided with sufficient data. New experiences automatically become new training data that condition future actions.

Supervised machine learning algorithms propose actions based on a set of training data and a restricted hypothesis space. Training data are pairs of input and output data. The hypothesis space describes the type of prediction functions that the algorithm may consider. Given these restrictions, the algorithm learns from the data on its own, as opposed to merely reverse-engineering expert rules or verifying prior beliefs. In this way machine learning challenges, or at least refines, conventional wisdom by design. Learning becomes smooth and continuous. This lessens obstructions to learning that arise from rigid institutional constraints or personal attachment to specific beliefs.
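
As a minimal sketch, assuming scikit-learn and purely synthetic data, the following illustrates these ingredients: input/output training pairs and a hypothesis space restricted to linear prediction functions.

```python
# Minimal sketch of supervised learning: training pairs (x, y) and a restricted
# hypothesis space (here: linear functions of the inputs). Synthetic data only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))                                   # input data (features)
y_train = X_train @ np.array([0.5, -0.2, 0.0]) + 0.1 * rng.normal(size=500)  # output data (labels)

# The hypothesis space is the set of linear prediction functions f(x) = w'x + b.
model = LinearRegression().fit(X_train, y_train)

x_new = rng.normal(size=(1, 3))
print("prediction for a new input:", model.predict(x_new))
```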

The role of expert knowledge

The rise of machine learning does not devalue expert knowledge in economics and finance. Supervised learning methods require qualified prior beliefs. For example, the data scientist must choose plausible data sets and hyperparameters that control model complexity and model type. These choices require ample experience and domain knowledge.

Inputs into machine learning algorithms can be of a large variety of types, including text and images. Yet all must be translated into a fixed-dimensional vector space before they can be fed into a prediction function. The mapping from raw, unstructured information to a fixed-dimensional vector space is called featurization or feature extraction. This is a very important step that requires domain expertise. The more problems feature extraction solves, the fewer difficulties the machine learning algorithm has to deal with. For financial market practice, this means that the better we are able to structure our input data from the myriad of available information, the easier the application of machine learning becomes. Knowledge of markets and economics therefore remains important for competitive advantage.
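
A rough illustration of featurization, assuming pandas and hypothetical series names (cpi_yoy, pmi) with synthetic values: raw monthly time series are condensed into a fixed-dimensional feature vector of levels, changes and normalized readings.

```python
# Illustrative sketch of featurization: mapping raw macro time series into a
# fixed-dimensional feature vector. Series names and transformations are
# hypothetical examples, not a recommendation.
import numpy as np
import pandas as pd

dates = pd.date_range("2015-01-01", periods=120, freq="MS")
raw = pd.DataFrame({
    "cpi_yoy": np.random.default_rng(1).normal(2.0, 1.0, 120),
    "pmi": np.random.default_rng(2).normal(52.0, 3.0, 120),
}, index=dates)

features = pd.DataFrame(index=dates)
features["cpi_level"] = raw["cpi_yoy"]                     # latest reading
features["cpi_chg_3m"] = raw["cpi_yoy"].diff(3)            # 3-month change
features["pmi_zscore"] = ((raw["pmi"] - raw["pmi"].rolling(24).mean())
                          / raw["pmi"].rolling(24).std())  # normalized level

feature_vector = features.dropna().iloc[-1].to_numpy()     # fixed-dimensional input
print(feature_vector.shape)                                # (3,)
```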

How machine learning supports decision making

Decision theory is about choosing the best actions under various definitions of optimality. Action is also the generic term for the output of a machine learning system, based on a pre-defined action space. The decision function (which is equivalent to a prediction function) takes an input and prescribes an action. This decision function is the key product of machine learning. Actions are evaluated with respect to their outcome, typically by use of a loss function.

For formalization, decision theory refers to three spaces: the input space, the action space and the output space. In the case of macro trading the input space could contain relevant market and economic information (typically multiple real-number time series). The action space could be a proposed trade and the output space could be the return on this trade. The spaces depend on the type of machine learning algorithm that is chosen.

Many problem domains can be formalized as four steps: [1] observe an input, [2] take an action, [3] observe the outcome, and [4] evaluate the action in relation to the outcome. The evaluation of actions is the subject of standard learning theory. This theory is based on the idea that we want to find a decision function that does well on average, i.e. one whose actions produce a small loss. Expected loss is called the risk of an action. Typically, this needs to be estimated based on available data and assumptions about their properties. Empirical risk is the average loss based on available input/output data.
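
The following sketch, based on synthetic data and a hypothetical linear decision rule, makes the terminology concrete: a decision function maps inputs to actions, a loss function scores each action against the observed outcome, and empirical risk is the average loss over the available data.

```python
# Sketch of the formal setup: decision function, loss function and empirical risk.
# All data and the decision rule are synthetic and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.normal(size=(250, 2))                     # input space: market/economic information
outcomes = inputs @ np.array([1.0, -0.5]) + rng.normal(scale=0.5, size=250)  # output space: trade returns

def decision_function(x):
    """A candidate prediction function (here a fixed linear rule)."""
    return x @ np.array([0.8, -0.4])

def squared_loss(action, outcome):
    return (action - outcome) ** 2

actions = decision_function(inputs)                    # action space: proposed positions/forecasts
empirical_risk = np.mean(squared_loss(actions, outcomes))
print("empirical risk:", empirical_risk)
```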

The Bayes decision function is a function that achieves minimal risk among all possible decision functions. Its risk is called the Bayes risk. However, the in-sample optimal decision function, chosen simply on the basis of empirical loss, may be indeterminate and may not be the best out-of-sample. This is where machine learning algorithms come in.

The key qualification of machine learning methods is generalization. Generalization means spreading the information we already have to other training points or other parts of the input space that we have not seen. This requires some “smoothness” in the prediction function, i.e. similar inputs should have similar outputs. Machine learning seeks to constrain prediction functions so that such smoothness is achieved. One approach is called constrained empirical risk minimization. Instead of minimizing empirical risk over all possible decision functions, it constrains those functions to a particular subset, called a hypothesis space. The best function within that constrained space is called the “risk minimizer”.
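
A small illustrative example, assuming scikit-learn and synthetic data: restricting the hypothesis space to low-degree polynomials (a simple form of constrained empirical risk minimization) typically generalizes better than an unconstrained, highly flexible fit.

```python
# Sketch of constrained empirical risk minimization: the hypothesis space is
# restricted to low-degree polynomials, which enforces "smoothness" relative to
# a very flexible fit. Synthetic one-dimensional data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 40)).reshape(-1, 1)
y = np.sin(x).ravel() + 0.3 * rng.normal(size=40)
x_test = np.linspace(-3, 3, 200).reshape(-1, 1)
y_test = np.sin(x_test).ravel()

for degree in (3, 15):                                 # small vs large hypothesis space
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x, y)
    print("degree", degree, "test loss:", mean_squared_error(y_test, model.predict(x_test)))
```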

The train-test principle

Machine learning translates training data into prediction functions or decision functions. These functions deliver predictions or prescribe actions, called labels, for a case based on available features, represented by a feature vector.

The evaluation of a prediction function is typically based on loss, a metric for the gravity of errors. It is calculated with a specific loss function (such as squared errors or absolute errors) on a test set of data that is independent of the training set from which the prediction function was derived. It is important that the test set does not contaminate the training. This means that its information must not influence the choices with respect to the machine learning algorithm or the prediction function. Unfortunately, this is a significant risk with financial time series, because researchers typically know features of the history on which prediction functions are tested. If information about the labels sneaks into the features in a way that would never happen in deployment, this is called leakage.
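
As an illustration of keeping the test set clean, assuming pandas and synthetic return data, the sketch below uses a strictly chronological split and lags the features so that no label information can leak into the inputs.

```python
# Sketch of a chronological train/test split for financial time series: no
# shuffling, so the test period lies strictly after the training period, and
# features are lagged so labels cannot leak into the inputs. Synthetic data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2010-01-01", periods=1000)
returns = pd.Series(rng.normal(0, 0.01, 1000), index=dates)

features = pd.DataFrame({
    "mom_20d": returns.rolling(20).mean().shift(1),    # lagged: uses only information available before the label
})
labels = returns                                       # current-period return as the label

data = pd.concat([features, labels.rename("label")], axis=1).dropna()
cutoff = int(len(data) * 0.8)
train, test = data.iloc[:cutoff], data.iloc[cutoff:]   # test set strictly after training period
print(train.index.max() < test.index.min())            # True: no temporal overlap
```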

The train-and-test principle of machine learning is a simulation of the traditional train-and-deploy practice prevalent in the investment industry. However, it is much cheaper and more efficient.

Train-test evaluation is another area where the domain knowledge of experts is essential. Financial market data sets are prone to non-stationarity, which here refers to a change in the data distribution, typically due to covariate shift (the input distribution changes between training and test) or concept drift (the correct output for a given input changes over time). This can be due to policy changes (e.g. inflation targeting, quantitative easing), market structure changes (e.g. exchange rate regimes shifting from fixed to flexible) or technological changes (e.g. enhanced short-term information efficiency). It is inappropriate to train and test over influential structural changes. This would lead to what is called sample bias.

Test sets must be large enough to be meaningful. This can be an issue for macro trading strategies, as there is only a limited history of financial crises or business cycles. K-fold cross-validation is a standard train-test evaluation method, particularly for smaller samples. This method estimates k prediction functions and their performances based on k different (albeit generally overlapping) training sets and k independent (non-overlapping) test sets. Cross-validation is not concerned with the performance of an individual prediction function, but with the performance of the model-building algorithm. Each algorithm produces a mean and standard deviation of loss measures. Of course, the actual prediction function used for deployment would be based on all the data.
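
A short sketch of k-fold cross-validation with scikit-learn, using Ridge regression on synthetic data as the model-building algorithm: the summary statistics are the mean and standard deviation of loss across folds, and the deployed model is then refit on all data.

```python
# Sketch of k-fold cross-validation: the object of evaluation is the model-building
# algorithm (here Ridge regression), not an individual prediction function.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=300)

cv = KFold(n_splits=5, shuffle=False)                  # 5 overlapping training sets, 5 disjoint test sets
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=cv,
                         scoring="neg_mean_squared_error")
print("mean loss:", -scores.mean(), "std of loss:", scores.std())

deployed_model = Ridge(alpha=1.0).fit(X, y)            # final model uses all available data
```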

For time series, cross-validation is typically done through forward chaining based on an expanding training window. This allows checking whether a specific machine learning algorithm consistently produces good prediction functions across time.
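
For illustration, scikit-learn's TimeSeriesSplit implements this kind of forward chaining; the toy example below simply prints the expanding training windows and the subsequent test blocks.

```python
# Sketch of forward chaining for time series: each fold trains on an expanding
# window of past observations and tests on the subsequent block.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)                       # stand-in for time-ordered features
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    print("train:", train_idx.min(), "-", train_idx.max(),
          "| test:", test_idx.min(), "-", test_idx.max())
```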

If we want to optimize over different learning hyperparameters, we need to divide the data into a training, a validation and a test set. Hyperparameters are chosen by the data scientist in supervised learning to control model complexity, the definition of complexity, the optimization algorithm or the model type. The training data are used to fit a prediction function for a specific set of hyperparameters. The validation data are used for tuning the model’s hyperparameters. And the test data are used for evaluating the algorithm, including the tuning process.
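
A compact sketch of this three-way logic with scikit-learn and synthetic data: time-series folds serve as training and validation sets for tuning the Ridge penalty, while a final held-out block evaluates the whole tuning procedure.

```python
# Sketch of hyperparameter tuning: training folds fit the model, validation folds
# tune the penalty strength, and a final held-out block evaluates the procedure.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=400)

X_dev, X_test = X[:320], X[320:]                       # final test block never touches the tuning
y_dev, y_test = y[:320], y[320:]

search = GridSearchCV(Ridge(),
                      param_grid={"alpha": [0.1, 1.0, 10.0]},  # hyperparameter controlling complexity
                      cv=TimeSeriesSplit(n_splits=4),          # train/validation folds
                      scoring="neg_mean_squared_error")
search.fit(X_dev, y_dev)
print("chosen alpha:", search.best_params_["alpha"])
print("test loss:", mean_squared_error(y_test, search.predict(X_test)))
```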

The overfitting problem

A major practical pitfall in statistical learning is that features (such as predictors of asset returns) are relatively cheap to produce these days. Hence, quantitative researchers have a proclivity for overfitting (view post here). That proclivity increases with the neglect of structural information and expert knowledge. Overfitting translates into large gaps between the training and the test performances of models. Therefore, it is often appropriate to limit model complexity, based on qualified prior judgment, available data, and out-of-sample forecasting results.

Macro and finance is a field with many correlated data series and – when it comes to key macro events – quite limited history. This means we have many candidate predictors but only a limited number of observations of specific events, such as financial crises or currency devaluations. Importantly, complexity requires a sufficiently large amount of data: the ratio of parameters to sample size must be reasonable.

The main defense against overfitting is regularization. Regularization means constraining the level of model complexity so that the model performs better at predicting or generalizing. Regularization produces models that fit the data less well in the training sample, with the intended benefit of fitting the data better out-of-sample. There are two basic types of regularization. The first is to set the maximum complexity as a hyperparameter, which simply constrains empirical risk minimization. This type is called Ivanov regularization. The alternative is penalized empirical risk minimization. Penalizing the measure of complexity means trading off loss against complexity. This “soft constraint” is called Tikhonov regularization. For many machine learning algorithms, including LASSO and Ridge regression, the two forms of regularization are equivalent.
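
In symbols (a rough sketch, writing R_n(f) for empirical risk, Omega(f) for a complexity measure, F for the hypothesis space, and r and lambda for the respective hyperparameters), the two forms can be written as:

```latex
% Ivanov regularization: hard constraint on complexity
\hat{f}_{\text{Ivanov}} = \arg\min_{f \in \mathcal{F}} \hat{R}_n(f)
\quad \text{subject to} \quad \Omega(f) \le r

% Tikhonov regularization: complexity penalty added to the empirical risk
\hat{f}_{\text{Tikhonov}} = \arg\min_{f \in \mathcal{F}} \Big[\, \hat{R}_n(f) + \lambda\,\Omega(f) \,\Big]
```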

For each value of the constraint or penalty parameter, there will be a different fitted model. This sequence is called the regularization path. The point is to find a position on that path between underfitting and overfitting.

Lasso and ridge regression are the major workhorses of modern data science. They use two types of regularization with slightly different properties (see the sketch after the list below).

  • Ridge regression adds the squared magnitude of the coefficients as a penalty term to the loss function. This means that it is based on L2 regularization. Coefficients are generally reduced vis-a-vis unconstrained regression, but regressors are not dropped altogether.
  • Lasso (Least Absolute Shrinkage and Selection Operator) regression adds the absolute values of the coefficients as a penalty term to the loss function. This means that it is based on L1 regularization. Lasso often gives sparse solutions, i.e. a subset of coefficients will have zero values and the dimension of the input vector will be reduced. This helps to make models more interpretable.
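
A minimal comparison on the same synthetic data, assuming scikit-learn: the L2 penalty shrinks all coefficients, while the L1 penalty tends to set some of them exactly to zero.

```python
# Sketch comparing the two penalties on the same synthetic data: Ridge (L2) shrinks
# coefficients towards zero, while Lasso (L1) tends to zero some of them out,
# producing a sparser and more interpretable model.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X @ np.array([1.5, 0.0, -1.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=200)

for name, model in [("OLS", LinearRegression()),
                    ("Ridge (L2)", Ridge(alpha=5.0)),
                    ("Lasso (L1)", Lasso(alpha=0.1))]:
    model.fit(X, y)
    print(name, np.round(model.coef_, 2))
```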

Regularization issues

The application of regularization requires some in-depth understanding of the chosen method and knowledge of the data used. Most problems arise from the use of inputs with similar or even identical information content. In Ridge or Lasso regression, adding many time series with the same information content biases predictions towards that pre-selected type of information. Using time series with different scales but the same information content makes regularization methods prefer the features with a large scale, as they incur a smaller penalty in terms of coefficient size. That is why features should usually be standardized.
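
A small sketch of the scale problem, assuming scikit-learn and two synthetic features that carry the same information but differ in scale by a factor of 100: without standardization the large-scale feature is mechanically favoured by the Lasso penalty, whereas a StandardScaler in the pipeline removes this distortion.

```python
# Sketch of why features are standardized before regularized regression: a feature
# measured in large units needs a smaller coefficient and so incurs less penalty.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
signal = rng.normal(size=300)
X = np.column_stack([signal + 0.1 * rng.normal(size=300),            # feature in "small" units
                     100 * (signal + 0.1 * rng.normal(size=300))])   # same information, scaled by 100
y = signal + 0.2 * rng.normal(size=300)

raw = Lasso(alpha=0.05).fit(X, y)
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.05)).fit(X, y)
print("unscaled features:     ", np.round(raw.coef_, 3))
print("standardized features: ", np.round(scaled[-1].coef_, 3))
```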
