Clustering - a summary of some of the recent research
Source: Clustering (Presentation Slides), Marcos Lopez de Prado, Cornell University - True Positive Technologies

As Marcos Lopez de Prado highlights in his more recent book, Machine Learning for Asset Managers (2020), clustering allows for intuitive interpretation and does not involve a change of basis. Non-linearities can be captured by pairing clustering with an embedding method such as t-SNE, although t-SNE is stochastic (results vary from run to run) and global distances are not preserved.

Principal component analysis (PCA) has long been used in financial markets to isolate factors that explain a market, notwithstanding issues around PC attribution and factor interactions.  Hierarchical PCA addresses some of the issues associated with interpretation beyond the first principal component.  Kernel PCA is a way to capture non-linearities.  
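As a rough illustration of the basic PCA step, the sketch below takes the eigen-decomposition of a correlation matrix and reports how much variance each component explains. The data are synthetic (one common "market" factor plus noise), and the function name and the data-generating process are my own assumptions rather than anything taken from the papers discussed here.

```python
import numpy as np

def pca_explained_variance(returns):
    """Eigen-decompose the correlation matrix of a (T x N) returns array.

    Returns eigenvalues (descending), matching eigenvectors (as columns)
    and the share of total variance explained by each component.
    """
    corr = np.corrcoef(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, eigvals / eigvals.sum()

# Synthetic example (assumption): eight assets driven by one common factor
rng = np.random.default_rng(0)
market = rng.normal(size=(1000, 1))
returns = 0.7 * market + 0.5 * rng.normal(size=(1000, 8))

eigvals, eigvecs, share = pca_explained_variance(returns)
```

With a single dominant factor, the first component loads on every asset with the same sign, which is why it is usually read as "the market". The attribution problem mentioned above tends to start with the second component.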

Many papers explore the interplay between clustering and PCA, for example clustering after dimension reduction, after re-arranging all the features into orthogonal representations, or after normalising for the first eigenvector.

The Research

Clustering (Presentation Slides), Lopez de Prado, 2020. An overview of clustering, including how to determine the optimal number of clusters for partitioning approaches. See also Chapters 2, 3 and 4 of Machine Learning for Asset Managers, which cover detoning (removing the first eigenvector) and a discussion of distance metrics.
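To make detoning and the correlation-based distance more concrete, here is a minimal sketch of my understanding of both: subtract the component of the correlation matrix spanned by the leading eigenvector, rescale back to a unit diagonal, and convert correlations to distances with d = sqrt(0.5 * (1 - rho)). The function names and the toy 4x4 matrix are assumptions for illustration, not code from the book.

```python
import numpy as np

def detone(corr, n_market=1):
    """Remove the leading `n_market` eigen-components from a correlation
    matrix and rescale the result so that its diagonal is one again."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    w_m = eigvecs[:, :n_market]
    market_part = w_m @ np.diag(eigvals[:n_market]) @ w_m.T
    residual = corr - market_part
    scale = np.sqrt(np.diag(residual))
    return residual / np.outer(scale, scale)

def correlation_distance(corr):
    """Map correlations in [-1, 1] to distances in [0, 1]."""
    return np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))

# Toy matrix (assumption): two blocks of two assets, all positively correlated
corr = np.array([
    [1.0, 0.8, 0.4, 0.4],
    [0.8, 1.0, 0.4, 0.4],
    [0.4, 0.4, 1.0, 0.8],
    [0.4, 0.4, 0.8, 1.0],
])
detoned = detone(corr)
dist = correlation_distance(detoned)
```

In this toy case the cross-block correlations turn negative once the common component is removed, so the two blocks sit further apart in distance terms than they do in the raw matrix. Note that a detoned matrix is singular, so it suits clustering better than direct inversion.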

A review of two decades of correlations, hierarchies, networks and clustering in financial markets, Marti, Nielsen, Binkowski and Donnat (2019 v5). A review of various clustering papers over the past two decades.

Reconstructing Emerging and Developed Markets Using Hierarchical Clustering, Journal of Financial Data Science [1], Garvey and Madhavan, 2019. This paper investigates SVD, hierarchical and k-means clustering as means of grouping developed and emerging equity and bond markets, and then compares the results with some of the more widely tracked global benchmarks for those asset classes. It provides a good overview of the approaches, compares the three methods and applies them to relevant macro markets. The authors find substantial differences between cluster and benchmark assignments.
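For readers who want to try the hierarchical leg of such a comparison, the sketch below groups synthetic "markets" with SciPy's agglomerative clustering on a correlation distance. It is not the paper's methodology: the data, the average-linkage choice and the function name are my assumptions, and the SVD and k-means legs are not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def hierarchical_labels(corr, n_clusters, method="average"):
    """Assign each asset to one of at most `n_clusters` groups using
    agglomerative clustering on the distance sqrt(0.5 * (1 - rho))."""
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)                   # guard against rounding noise
    condensed = squareform(dist, checks=False)    # square -> condensed form
    tree = linkage(condensed, method=method)
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Synthetic data (assumption): two groups of three markets, each group
# driven by its own factor
rng = np.random.default_rng(1)
f1 = rng.normal(size=(500, 1))
f2 = rng.normal(size=(500, 1))
returns = np.hstack([
    f1 + 0.5 * rng.normal(size=(500, 3)),
    f2 + 0.5 * rng.normal(size=(500, 3)),
])
corr = np.corrcoef(returns, rowvar=False)
labels = hierarchical_labels(corr, n_clusters=2)
```

Comparing `labels` against a benchmark's own developed/emerging classification (for instance with a simple cross-tabulation) is then the kind of cluster-versus-benchmark check the paper describes.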

In Hierarchical PCA and Applications to Portfolio Management (2019), Marco Avellaneda notes that ‘the identification problem in PCA reflects the uncertainty, or unreliability, of cross-asset correlations’ and that, as the size of the trading universe increases, ‘the correlations of assets which are not economically related … are difficult to quantify and may be noisy’. The suggested approach is to use sector correlation matrices (via Hierarchical PCA, or HPCA).

This latter area is particularly busy. Although not recent, one of the better overviews of hierarchical clustering and asset allocation I have read to date is Hierarchical Clustering Based Asset Allocation, Raffinot, 2018. One caveat is that the paper does not take transaction costs into account, and turnover under this approach is elevated. This point is also made in a more recent work by Kolrep, Lohre, Radatz and Rother, Economic vs Statistical Clustering in Multi-asset Multi Factor Strategies (2020), which compares Hierarchical Risk Parity (HRP) with other diversified risk-based strategies and finds that HRP generates substantial turnover in multi-asset portfolio rebalancing (which the authors address by adding a penalisation term).
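To see where the turnover comes from, it helps to have a small HRP implementation to experiment with. The sketch below is a simplified variant under my own assumptions (single linkage applied directly to the correlation distance, leaf order from the dendrogram, then recursive bisection with inverse-variance cluster weights). It is not the code used in either paper, and the penalisation term from Kolrep et al. is not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def cluster_variance(cov, idx):
    """Variance of a cluster held at inverse-variance weights."""
    sub = cov[np.ix_(idx, idx)]
    ivp = 1.0 / np.diag(sub)
    ivp /= ivp.sum()
    return float(ivp @ sub @ ivp)

def hrp_weights(cov):
    """Simplified Hierarchical Risk Parity weights for a covariance matrix."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="single")
    order = leaves_list(tree)               # quasi-diagonal asset ordering
    weights = np.ones(cov.shape[0])
    stack = [list(order)]
    while stack:                            # recursive bisection
        items = stack.pop()
        if len(items) < 2:
            continue
        half = len(items) // 2
        left, right = items[:half], items[half:]
        var_l = cluster_variance(cov, left)
        var_r = cluster_variance(cov, right)
        alpha = 1.0 - var_l / (var_l + var_r)   # lower variance gets more
        weights[left] *= alpha
        weights[right] *= 1.0 - alpha
        stack += [left, right]
    return weights

def one_way_turnover(w_old, w_new):
    """Half the sum of absolute weight changes between two rebalances."""
    return 0.5 * np.abs(np.asarray(w_new) - np.asarray(w_old)).sum()

# Synthetic example (assumption): re-estimate on two adjacent windows
rng = np.random.default_rng(2)
returns = rng.normal(size=(600, 5)) * np.array([0.5, 1.0, 1.5, 1.0, 2.0])
w_first = hrp_weights(np.cov(returns[:300], rowvar=False))
w_second = hrp_weights(np.cov(returns[300:], rowvar=False))
turnover = one_way_turnover(w_first, w_second)
```

Because the tree and the leaf ordering are re-estimated at every rebalance, small changes in the correlation matrix can reshuffle the bisection and move weights noticeably, which is one plausible source of the turnover the papers report.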

Views and opinions expressed are solely my own and do not express those of my employer. 

[1] Requires subscription.

Gerardo Lemus

Data Science (Financial Industry / Blockchain-DeFI)

4 years ago

I actually implemented a few clustering tools using Python a couple of years ago. You might find the implementation useful (inside the blog I include the code as a Google Colaboratory notebook: https://link.medium.com/k56RBBvku8)
