The Accidental Data Scientists

Vivek Viswanathan

Quant PM

Published: July 12, 2021

If you are on the younger side, you may have chosen to study data science and to become a data scientist. But if you are a bit older like me, you might have studied something else entirely—in my case, economics and finance over the course of a bachelor’s degree, master’s degree, and Ph.D.—and then realized that the methods that data scientists developed were superior to the methods of interrogating data that you had been taught. Circumstance made you a data scientist. You worked in a field that transitioned from humans being cost-effective and first-in-class decision makers to algorithms usurping our spot. You worked in a field where the empirical methods began advancing far faster than the rest of the field.

Ours is an interesting and important story, and I would argue that we occupy a position of strength. Far from being stranded helplessly above a chasm between machine learning and domain expertise, we are a necessary bridge between the two worlds, one that allows us to build stronger models. We are cyborgs, straddling human expertise and machine learning algorithms. We are accidental data scientists, falling into a role out of a necessity born of advancements in empirical methods. I cannot tell everyone’s story, but I can tell my own. And I welcome you to do the same.

If you are like me, you have spent the past few years reading and watching videos on random forests and gradient boosting, transformers and attention, convolutional neural networks, Adam versus Rectified Adam, and swish versus leaky ReLU activation functions. You watched Ken Jee and StatQuest to understand more about the field and its methods. You realized there was a gap between the models you had been taught in school and the current state of the art.

For most of the past fifteen years, I used economic intuition and occasionally structural models as the basis for building strategies and portfolios. I would think carefully about which statistical test I needed to run to ensure that I had corrected for multiple testing or for autocorrelation and heteroskedasticity in the standard errors. If the standard tools failed, bootstrapping standard errors was also an option. All testing was done in-sample, so tremendous effort was expended trying to understand the significance of model parameters.
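For readers who never lived in that world, here is a minimal sketch of what bootstrapping a standard error looks like. It is a plain pairs bootstrap on synthetic data, chosen for brevity; the function names and the simulated series are assumptions for illustration, and this simple version does not account for autocorrelation (a block bootstrap would be the usual choice for that).

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

def bootstrap_slope_se(x, y, n_boot=2000, seed=0):
    """Pairs bootstrap: resample (x, y) rows with replacement, refit, take the std."""
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        slopes[b] = ols_slope(x[idx], y[idx])
    return float(slopes.std(ddof=1))

# Synthetic data with heteroskedastic noise (illustrative only, not real returns).
rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500) * (1.0 + np.abs(x))

slope = ols_slope(x, y)
se = bootstrap_slope_se(x, y)
print(f"slope={slope:.3f}, bootstrap SE={se:.3f}")
```

The point of the exercise is the last line: everything is judged on the same sample the model was fit to, which is exactly the habit the next paragraph leaves behind.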

These days, understanding Newey-West standard errors, Fama-MacBeth regressions, or most asset pricing models is no longer necessary for effective model creation. In-sample fit is a metric used for debugging, not a metric viewed as relevant to expected model performance. Statistical significance of particular model parameters is forgotten. Predictions are judged on cross-validation sets. (Finance journal articles, including those that I write, still largely live in the old world, for generally understandable reasons that I will not discuss now.)
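The gap between in-sample fit and held-out performance is easy to see on toy data. The sketch below is an assumption-laden illustration, not a production setup: it uses plain least squares, one weak synthetic signal buried among noise features, and an expanding-window split in which each block is predicted using only earlier rows. All names and sizes are hypothetical.

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination relative to the mean of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def fit_predict(X_train, y_train, X_test):
    """OLS with intercept via least squares; returns predictions for X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

def walk_forward_r2(X, y, n_folds=5, min_train=100):
    """Expanding-window CV: train on the past only, score on the next block."""
    edges = np.linspace(min_train, len(y), n_folds + 1).astype(int)
    preds, actual = [], []
    for start, stop in zip(edges[:-1], edges[1:]):
        preds.append(fit_predict(X[:start], y[:start], X[start:stop]))
        actual.append(y[start:stop])
    return r2(np.concatenate(actual), np.concatenate(preds))

# One weak true signal among 40 features; the rest are pure noise.
rng = np.random.default_rng(0)
n, p = 300, 40
X = rng.normal(size=(n, p))
y = 0.3 * X[:, 0] + rng.normal(size=n)

in_sample = r2(y, fit_predict(X, y, X))
out_of_sample = walk_forward_r2(X, y)
print(f"in-sample R2={in_sample:.3f}, walk-forward R2={out_of_sample:.3f}")
```

The in-sample number flatters the model because the noise features soak up variance they cannot predict going forward; the walk-forward number is the one that speaks to expected performance.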

The Role of Cyborgs

All is not lost for us cyborgs. In my mind, we hold three great strengths. The first is feature engineering. Practically, in quantitative equity management, we are unlikely to engineer any more financial and technical signals. There are a few hundred of them in total, and machine learning algorithms can already infer whatever combination of those signals we might think provides a sharper signal. Instead, feature engineering now comes primarily from alternative data. You still use the standard signals, and they are still valuable, but new signals will come from new, untapped datasets.

Our comparative advantage is knowing what is likely to be predictive. In equity return prediction, the most predictive measures, in roughly descending order, are smart money flows, analyst forecasts, market data, and profitability metrics. There are many other signals, and those are important too, but these generally offer the biggest bang for the buck. That narrows my search space in a way that someone without domain expertise is unlikely to manage.

Our second strength is all the things beyond the prediction model. In quantitative equity management, currently, covariance estimation and portfolio optimization are not pure machine learning games. There are many domain-specific things to consider like taxation, transaction costs, and liquidity requirements.

Our final strength goes beyond what a typical domain expert can do. Pure domain experts often engineer linear combinations of signals, which will not improve a machine learning model’s predictive power. Pure domain experts are often mystified by the outputs of machine learning models. They may not know how to properly cross-validate tests and parameters. That is where our machine learning understanding comes into play.
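The point about linear combinations can be checked in the simplest possible setting, a linear model, where it holds exactly: a composite built from signals the model already has lies in the same column space, so the fitted values do not move. The sketch below uses synthetic data; the signal names and weights are hypothetical, and more flexible learners are not covered by this exact argument.

```python
import numpy as np

def fitted_values(X, y):
    """Fitted values from OLS with intercept, solved by least squares."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ beta

rng = np.random.default_rng(1)
value = rng.normal(size=400)       # hypothetical "value" signal
momentum = rng.normal(size=400)    # hypothetical "momentum" signal
returns = 0.2 * value + 0.1 * momentum + rng.normal(size=400)

base = np.column_stack([value, momentum])
# Hand-built composite: a linear combination of signals the model already has.
composite = 0.6 * value + 0.4 * momentum
augmented = np.column_stack([base, composite])

fit_base = fitted_values(base, returns)
fit_aug = fitted_values(augmented, returns)
max_diff = float(np.max(np.abs(fit_base - fit_aug)))
print(f"largest change in fitted values: {max_diff:.2e}")
```

Any difference that shows up is floating-point noise. A genuinely new feature has to carry information the existing columns do not span.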

The cyborg is useful. The domain and ML expert combined into one person has a unique value that is not captured by having a separate domain expert and a separate ML expert.

What should you learn to be a quant finance cyborg?

I cannot speak for all domains but imagine you want to go into quantitative investment management. You want to be both a domain expert and ML expert. What should you start with? Machine learning. If you work alongside investment management quants, we can fill you in on all the domain-specific issues that you run into. I would much rather have an ML expert with no investment management background than an investment management expert with no ML background.

ML is somewhat generalizable, whereas quantitative finance is exceedingly specific. If you learn a lot about bonds, that will barely be useful to you if you want to be a quant focused on equities or options or commodity futures. Spend 90% of your time studying the generalizable, technically difficult field. Spend 10% of your time studying the domain. When you start working in the field, those numbers will flip. If you do not already have a good sense of machine learning by then, you will be relegated to the role of the domain expert who does not understand machine learning. That is a fine place to be if you are happy with it, but you are probably here because you want something that embraces both domain expertise and machine learning.

Some Parting Words…

I hope this resonated with my fellow accidental data scientists out there. I spent many of my years of education studying methods that quickly became outdated. In those situations, we have a choice. We can entrench ourselves and convince ourselves and those around us that nothing has changed, and that the old methods passed down from the ancients are still optimal. Or we can get curious and learn about where our understanding falls short. We have chosen the latter.

Thomas Arnold, PhD

Full Stack Data Scientist | Rising Health Risk is Predictable | Let me show you how to do it!

3 years ago

I was thinking about posting a survey on LinkedIn asking data scientists whether they were "accidental data scientists" or "intentional data scientists." I suppose that it would vary by age, since the field is so new. Thanks for sharing your creation story.

Gordon Ross, CFA

Data Analyst, Investment Practitioner

3 years ago

Thank you for your breadth of insight, Vivek. I now aspire to be an accidental data scientist also.

Farshad Saadatmand

Business Analyst @ MTI | PhD, MBA

3 years ago

I think your story greatly sheds light on the current existing gap in today’s investment management industry. Many experts in finance are facing a critical decision now: evolution or extinction. Glad you’re going with the former. Thanks Vivek Viswanathan for sharing this post!

Chethan Pai

Associate Principal, Analytics Research at SimCorp | Columbia University

3 years ago

I can totally resonate with this situation. I am going back to school this fall after working for several years in FI portfolio performance analytics space, and the feature engineering is exactly driving my course selection. Thank you for this post.
