Being systematic, part 3: some good practices
In part 1 and part 2 we discussed the advantages and pitfalls of systematic investment research. This time, we'll cover a few best practices we've picked up over the years and apply at ExtractAlpha.
Practice makes perfect
We all know how to get to Carnegie Hall, right? Practice, practice, practice. As a young music student I learned the importance of repetition in building the muscle memory needed to move beyond pure mechanics. After sufficient practice, a musician can start to think about phrasing, structure, and nuance rather than focusing on simply being able to play the notes.
It's not so different with quant research. We have a set of analytical tools, and through their repeated use we are able to more fluidly answer deep research questions without expending as much time and energy on the mechanics, which in this case involve a lot of rote data analysis. Data manipulation is still a big part of the job - when we get a new data set, we need to line it up and get familiar with it, just like a musician sight-reading a new piece for the first time - but over time and with repetition we get better at it.
Checks and balances
Having a fantastic teacher or mentor helps, of course - in this regard I was fortunate as a musician, but not in quant, where I'm largely an autodidact. And having a handbook or checklist, whether it's written down or just part of your routine, is also really helpful. Here are a few of the things we do as part of our research routine at ExtractAlpha.
We start each research project with a list of hypotheses and ideas. This can be a long list - really anything that's reasonably plausible and pertains to the questions at hand. We'll brainstorm these ideas over the course of several days, often logging them in a task management system. Then it's time for triage: we sort to the top the ideas we definitely want to test and have the resources for - usually meaning we have, or can acquire, the relevant data. The "maybes" go in the middle, and we throw out ideas which seem too outlandish or for which data simply isn't available.
Next, we pay careful attention to the data sets we're analyzing. An often-overlooked step is simply opening the data in a spreadsheet-like format and scrolling through each of the fields (assuming we're talking about structured, tabular data). This simple but crucial step often surfaces things like odd encodings of null values or suspiciously repeating values.
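As a rough sketch of what this first pass might look like in pandas (the file and column names here are hypothetical, purely for illustration):

```python
import pandas as pd

# Hypothetical vendor file; the column names are illustrative only.
df = pd.read_csv("vendor_data.csv", parse_dates=["date"])

# Eyeball the raw rows, much as you would in a spreadsheet.
print(df.head(50))
print(df.dtypes)

# Odd null encodings (e.g., -999 or "N/A") often show up as
# suspiciously common "values" in a simple frequency count.
for col in df.columns:
    print(col)
    print(df[col].value_counts(dropna=False).head())
```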
We also look at the distribution of the fields of interest. Which are well populated? Which categorical variables' values are common versus sparse? Which ones have outliers we need to take into consideration, for example by winsorization?
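Continuing the sketch above (the `score` column is again hypothetical), a few pandas calls answer most of these questions:

```python
import pandas as pd

df = pd.read_csv("vendor_data.csv", parse_dates=["date"])  # as above

# Fraction of non-null values per column: which fields are well populated?
print(df.notna().mean().sort_values())

# Distribution of a numeric field of interest, including tail percentiles.
print(df["score"].describe(percentiles=[0.01, 0.05, 0.95, 0.99]))

# Two-sided winsorization at the 1st/99th percentiles to tame outliers.
lo, hi = df["score"].quantile([0.01, 0.99])
df["score_w"] = df["score"].clip(lower=lo, upper=hi)
```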
Now that we've got an understanding of the look and feel of the data, we can begin our in-sample hypothesis testing. For the majority of our research process we will be in sample - that is, we'll spend our time analyzing a pre-specified testing data set, while the remainder of the data stays out of sample, for verification at the very end of the process. It's vital to be completely strict about in- and out-of-sample testing, however tempting it might be to "peek" out of sample midway through your testing to ease your discomfort. Choosing the split is also quite important: you want your out-of-sample period to be long enough to be meaningful, i.e., to encompass more than one type of market condition, but you also want your in-sample period to be representative of the current time period; things have changed in the last decade, as noted in the previous post. We have a few techniques to address these issues, and they are worth thinking seriously about before beginning the research process.
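A minimal version of that discipline, assuming a single date-indexed data set and a hypothetical cutoff date, might look like this:

```python
import pandas as pd

df = pd.read_csv("vendor_data.csv", parse_dates=["date"])  # as above

CUTOFF = pd.Timestamp("2014-12-31")  # hypothetical split date

in_sample = df[df["date"] <= CUTOFF]
out_of_sample = df[df["date"] > CUTOFF]

# Park the holdout on disk and work only with in_sample from here on;
# the file name is a reminder to resist the temptation to peek.
out_of_sample.to_csv("holdout_do_not_touch.csv", index=False)
```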
Practical matters
Using the right tool for the job is essential. Hypothesis testing doesn't necessarily mean portfolio backtesting: event studies are helpful, and sometimes we're trying to predict something other than returns - for example, models which predict company fundamentals such as revenues can often end up being more robust.
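As one illustration, a bare-bones event study might look like the following sketch. The inputs are hypothetical: `returns` is a date-by-ticker matrix of daily returns, and `events` maps each event to a ticker and date.

```python
import numpy as np
import pandas as pd

def event_study(returns: pd.DataFrame, events: pd.DataFrame,
                window: int = 10) -> pd.Series:
    """Average cumulative return from -window to +window days around events.

    returns: DataFrame indexed by date, one column per ticker.
    events:  DataFrame with 'ticker' and 'date' columns.
    """
    paths = []
    for _, ev in events.iterrows():
        if ev["ticker"] not in returns.columns or ev["date"] not in returns.index:
            continue
        i = returns.index.get_loc(ev["date"])
        if i - window < 0 or i + window >= len(returns):
            continue  # skip events too close to the edge of the sample
        r = returns[ev["ticker"]].iloc[i - window : i + window + 1].to_numpy()
        paths.append(np.cumsum(r))
    return pd.Series(np.mean(paths, axis=0), index=range(-window, window + 1))
```

A fuller study would subtract a market or benchmark return from each path to get abnormal returns; this sketch just shows the mechanics.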
When it comes to backtests, we do a lot of cross-sectional tests. The goal is to compute a score across a wide swath of stocks at each time period (say, each day) and determine whether the high-scoring stocks outperform the low-scoring ones. The advantage of this approach over, say, a trading simulation is that we get a very rich set of information: we can learn how the factor performs across sectors, capitalization ranges, time, and other slices. Furthermore, the results are less sensitive to the particular choice of portfolio construction parameters, and they are more indicative of how the factor might fit into an existing multifactor model.
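A stripped-down version of such a test, assuming a hypothetical long-format panel with per-stock scores and forward returns, could be:

```python
import pandas as pd

def quantile_spread(panel: pd.DataFrame, n_buckets: int = 5) -> pd.Series:
    """Daily return of the top score bucket minus the bottom bucket.

    panel: long-format DataFrame with columns
           ['date', 'ticker', 'score', 'fwd_return'] (names are illustrative).
    """
    panel = panel.copy()
    # Each day, bucket stocks into quantiles by score.
    panel["bucket"] = panel.groupby("date")["score"].transform(
        lambda s: pd.qcut(s, n_buckets, labels=False, duplicates="drop")
    )
    bucket_rets = panel.groupby(["date", "bucket"])["fwd_return"].mean().unstack()
    # Highest-scoring bucket minus lowest-scoring bucket, per date.
    return bucket_rets.iloc[:, -1] - bucket_rets.iloc[:, 0]
```

The per-bucket returns in `bucket_rets` are what make this approach so informative: the same table can be recomputed within any sector, size, or time slice.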
We also look at a factor's exposure to common risk factors, its turnover and autocorrelation, and a plot of the time series of its (in-sample!) returns, before and after transaction cost assumptions. Together, these give us a holistic view of a factor's efficacy, or of a variant of a factor as we try applying different hypotheses. Testing variants also tells us how sensitive the idea is to its formulation - the more robust, the better, lest we find the next butter in Bangladesh.
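Turnover and autocorrelation, at least, are cheap to compute. A sketch, assuming a hypothetical date-by-ticker matrix of factor scores:

```python
import pandas as pd

def factor_turnover(scores: pd.DataFrame) -> pd.Series:
    """Mean absolute day-over-day change in cross-sectional percentile ranks.

    scores: DataFrame indexed by date, one column per ticker.
    """
    ranks = scores.rank(axis=1, pct=True)   # cross-sectional percentile ranks
    return ranks.diff().abs().mean(axis=1)  # higher = faster-turning factor

def factor_autocorrelation(scores: pd.DataFrame, lag: int = 1) -> pd.Series:
    """Per-date cross-sectional correlation of scores with their lagged values."""
    return scores.corrwith(scores.shift(lag), axis=1)
```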
Finally, an often-overlooked note on universe construction. We see a lot of backtests from commercial vendors which include something like 5,000 stocks in the U.S. Even for a retail investor, a small trade can move the price of the 5,000th-most-liquid stock, and for institutional investors these stocks simply aren't tradable at all, even at very long horizons. We use a universe designed to mimic what institutions look at, but we're also always careful to split results by capitalization range, lest we find something that only has value among very small, illiquid stocks.
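One simple way to approximate such a universe, assuming hypothetical date-by-ticker matrices of dollar volume and market cap:

```python
import pandas as pd

def institutional_universe(dollar_volume: pd.DataFrame,
                           n: int = 1500) -> pd.DataFrame:
    """Boolean mask of the top-n stocks by trailing median dollar volume.

    dollar_volume: DataFrame indexed by date, one column per ticker.
    n: a rough institutional-scale universe size - far fewer than 5,000 names.
    """
    adv = dollar_volume.rolling(63).median()      # ~3 months of trading days
    return adv.rank(axis=1, ascending=False) <= n

def cap_buckets(market_cap: pd.DataFrame) -> pd.DataFrame:
    """Cross-sectional size percentile per date, for splitting results by
    capitalization range (e.g., > 0.8 large cap, < 0.2 small cap)."""
    return market_cap.rank(axis=1, pct=True)
```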
Summary
Hopefully some of these pointers were helpful. Much of quant research doesn't take huge resources to do right; it's more about being careful, finding the right tool for the job, and being aware of some common pitfalls. For many of us quants, there are few things more satisfying than finding value in a new data set or anomaly - and it's all the more satisfying if, having followed some of these best practices, we can be more confident that we're right.
Happy researching!