Being systematic, part 3: some good practices
In part 1 and part 2 we discussed the advantages and pitfalls of systematic investment research. This time, we'll cover a few best practices we've picked up over the years and apply at ExtractAlpha.
Practice makes perfect
We all know how to get to Carnegie Hall, right? Practice, practice, practice. As a young music student I learned the importance of repetition in building the muscle memory needed to move beyond pure mechanics. After sufficient practice, a musician can start to think about phrasing, structure, and nuance rather than focusing on simply being able to play the notes.
It's not so different with quant research. We have a set of analytical tools, and through their repeated use we are able to more fluidly answer deep research questions without expending as much time and energy on the mechanics, which in this case involve a lot of rote data analysis. Data manipulation is still a big part of the job - when we get a new data set, we need to line it up and get familiar with it, just like a musician sight-reading a new piece for the first time - but over time and with repetition we get better at it.
Checks and balances
Having a fantastic teacher or mentor helps, of course - in this regard I was fortunate as a musician, but not in quant, where I'm largely an autodidact. And having a handbook or checklist, whether it's written down or just part of your routine, is also really helpful. Here are a few of the things we do as part of our research routine at ExtractAlpha.
We start each research project with a list of hypotheses and ideas. This can be a long list - really anything that's reasonably plausible and pertains to the questions at hand. We'll brainstorm these ideas over the course of several days, often logging them in a task management system. Then it's time for triage: we sort to the top the ideas we definitely want to test and have the resources for - usually meaning we have, or can acquire, the relevant data. The "maybes" go in the middle, and we throw out ideas which seem too outlandish or for which data simply isn't available.
Next, we pay careful attention to the data sets we're analyzing. An often-overlooked step is simply opening the data in a spreadsheet-like format and scrolling through each of the fields (assuming we're talking about structured, tabular data). This simple but crucial step often surfaces things like odd encodings of null values or suspiciously repeating values.
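As a rough sketch of what this first pass might look like in pandas (the file and column names here are hypothetical, purely for illustration):

```python
import pandas as pd

# Hypothetical vendor file; the column names are illustrative only.
df = pd.read_csv("vendor_data.csv", parse_dates=["date"])

# Eyeball the raw rows, much as you would in a spreadsheet.
print(df.head(50))
print(df.dtypes)

# Odd null encodings (e.g., -999 or "N/A") often show up as
# suspiciously common "values" in a simple frequency count.
for col in df.columns:
    print(col)
    print(df[col].value_counts(dropna=False).head())
```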
We also look at the distribution of the fields of interest. Which are well populated? Which categorical variables' values are common versus sparse? Which ones have outliers we need to take into consideration, for example by winsorization?
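Continuing the sketch above (the `score` column is again hypothetical), a few pandas calls answer most of these questions:

```python
import pandas as pd

df = pd.read_csv("vendor_data.csv", parse_dates=["date"])  # as above

# Fraction of non-null values per column: which fields are well populated?
print(df.notna().mean().sort_values())

# Distribution of a numeric field of interest, including tail percentiles.
print(df["score"].describe(percentiles=[0.01, 0.05, 0.95, 0.99]))

# Two-sided winsorization at the 1st/99th percentiles to tame outliers.
lo, hi = df["score"].quantile([0.01, 0.99])
df["score_w"] = df["score"].clip(lower=lo, upper=hi)
```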
Now that we've got an understanding of the look and feel of the data, we can begin our in-sample hypothesis testing. For the majority of our research process we will be in sample - that is, we'll spend our time analyzing a pre-specified testing data set, while the remainder of the data stays out of sample, for verification at the very end of the process. It's vital to be completely strict about in- and out-of-sample testing, however tempting it might be to "peek" out of sample midway through your testing to ease your discomfort. Choosing the split is also quite important: you want your out-of-sample period to be long enough to be meaningful, i.e., to encompass more than one type of market condition, but you also want your in-sample period to be representative of the current time period; things have changed in the last decade, as noted in the previous post. We have a few techniques to address these issues, and they are worth thinking seriously about before beginning the research process.
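A minimal version of that discipline, assuming a single date-indexed data set and a hypothetical cutoff date, might look like this:

```python
import pandas as pd

df = pd.read_csv("vendor_data.csv", parse_dates=["date"])  # as above

CUTOFF = pd.Timestamp("2014-12-31")  # hypothetical split date

in_sample = df[df["date"] <= CUTOFF]
out_of_sample = df[df["date"] > CUTOFF]

# Park the holdout on disk and work only with in_sample from here on;
# the file name is a reminder to resist the temptation to peek.
out_of_sample.to_csv("holdout_do_not_touch.csv", index=False)
```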
Practical matters
Using the right tool for the job is essential. Hypothesis testing doesn't necessarily mean portfolio backtesting: event studies are helpful, and sometimes we're trying to predict something other than returns - for example, models which predict company fundamentals such as revenues can often end up being more robust.
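As one illustration, a bare-bones event study might look like the following sketch. The inputs are hypothetical: `returns` is a date-by-ticker matrix of daily returns, and `events` maps each event to a ticker and date.

```python
import numpy as np
import pandas as pd

def event_study(returns: pd.DataFrame, events: pd.DataFrame,
                window: int = 10) -> pd.Series:
    """Average cumulative return from -window to +window days around events.

    returns: DataFrame indexed by date, one column per ticker.
    events:  DataFrame with 'ticker' and 'date' columns.
    """
    paths = []
    for _, ev in events.iterrows():
        if ev["ticker"] not in returns.columns or ev["date"] not in returns.index:
            continue
        i = returns.index.get_loc(ev["date"])
        if i - window < 0 or i + window >= len(returns):
            continue  # skip events too close to the edge of the sample
        r = returns[ev["ticker"]].iloc[i - window : i + window + 1].to_numpy()
        paths.append(np.cumsum(r))
    return pd.Series(np.mean(paths, axis=0), index=range(-window, window + 1))
```

A fuller study would subtract a market or benchmark return from each path to get abnormal returns; this sketch just shows the mechanics.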
When it comes to backtests, we do a lot of cross-sectional tests. The goal is to compute a score across a wide swath of stocks at each time period (say, each day) and determine whether the high-scoring stocks outperform the low-scoring ones. The advantage of this approach over, say, a trading simulation is that we get a very rich set of information: we can learn how the factor performs across sectors, capitalization ranges, time, and other slices. Furthermore, the results are less sensitive to the particular choice of portfolio construction parameters, and they are more indicative of how the factor might fit into an existing multifactor model.
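A stripped-down version of such a test, assuming a hypothetical long-format panel with per-stock scores and forward returns, could be:

```python
import pandas as pd

def quantile_spread(panel: pd.DataFrame, n_buckets: int = 5) -> pd.Series:
    """Daily return of the top score bucket minus the bottom bucket.

    panel: long-format DataFrame with columns
           ['date', 'ticker', 'score', 'fwd_return'] (names are illustrative).
    """
    panel = panel.copy()
    # Each day, bucket stocks into quantiles by score.
    panel["bucket"] = panel.groupby("date")["score"].transform(
        lambda s: pd.qcut(s, n_buckets, labels=False, duplicates="drop")
    )
    bucket_rets = panel.groupby(["date", "bucket"])["fwd_return"].mean().unstack()
    # Highest-scoring bucket minus lowest-scoring bucket, per date.
    return bucket_rets.iloc[:, -1] - bucket_rets.iloc[:, 0]
```

The per-bucket returns in `bucket_rets` are what make this approach so informative: the same table can be recomputed within any sector, size, or time slice.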
We also look at a factor's exposure to common risk factors, its turnover and autocorrelation, and a plot of the time series of its (in-sample!) returns, before and after transaction cost assumptions. Together, these give us a holistic view of a factor's efficacy, or of a variant of a factor as we try applying different hypotheses. Testing variants also tells us how sensitive the idea is to its formulation - the more robust, the better, lest we find the next butter in Bangladesh.
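Turnover and autocorrelation, at least, are cheap to compute. A sketch, assuming a hypothetical date-by-ticker matrix of factor scores:

```python
import pandas as pd

def factor_turnover(scores: pd.DataFrame) -> pd.Series:
    """Mean absolute day-over-day change in cross-sectional percentile ranks.

    scores: DataFrame indexed by date, one column per ticker.
    """
    ranks = scores.rank(axis=1, pct=True)   # cross-sectional percentile ranks
    return ranks.diff().abs().mean(axis=1)  # higher = faster-turning factor

def factor_autocorrelation(scores: pd.DataFrame, lag: int = 1) -> pd.Series:
    """Per-date cross-sectional correlation of scores with their lagged values."""
    return scores.corrwith(scores.shift(lag), axis=1)
```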
Finally, an often-overlooked note on universe construction. We see a lot of backtests from commercial vendors which include something like 5,000 stocks in the U.S. Even for a retail investor, a small trade can move the price of the 5,000th-most-liquid stock, and for institutional investors these stocks simply aren't tradable at all, even at very long horizons. We use a universe designed to mimic what institutions look at, but we're also always careful to split results by capitalization range, lest we find something that only has value among very small, illiquid stocks.
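One simple way to approximate such a universe, assuming hypothetical date-by-ticker matrices of dollar volume and market cap:

```python
import pandas as pd

def institutional_universe(dollar_volume: pd.DataFrame,
                           n: int = 1500) -> pd.DataFrame:
    """Boolean mask of the top-n stocks by trailing median dollar volume.

    dollar_volume: DataFrame indexed by date, one column per ticker.
    n: a rough institutional-scale universe size - far fewer than 5,000 names.
    """
    adv = dollar_volume.rolling(63).median()      # ~3 months of trading days
    return adv.rank(axis=1, ascending=False) <= n

def cap_buckets(market_cap: pd.DataFrame) -> pd.DataFrame:
    """Cross-sectional size percentile per date, for splitting results by
    capitalization range (e.g., > 0.8 large cap, < 0.2 small cap)."""
    return market_cap.rank(axis=1, pct=True)
```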
Summary
Hopefully some of these pointers were helpful. Much of quant research doesn't take huge resources to do right; it's more about being careful, finding the right tool for the job, and being aware of some common pitfalls. For many of us quants, there are few things more satisfying than finding value in a new data set or anomaly - and it's all the more satisfying if, having followed some of these best practices, we can be more confident that we're right.
Happy researching!