Being systematic, part 3: some good practices

In part 1 and part 2 we talked about the advantages and pitfalls of systematic investment research.  This time, we'll cover a few of the best practices we've picked up over the years and apply at ExtractAlpha.

Practice makes perfect

We all know how to get to Carnegie Hall, right?  Practice, practice, practice.  As a young music student I learned the importance of repetition in building the muscle memory needed to move beyond pure mechanics.  With enough practice, a musician can start to think about phrasing, structure, and nuance rather than focusing on simply being able to play the notes.

It's not so different with quant research.  We have a set of analytical tools, and through their repeated use we are able to more fluidly answer deep research questions without expending as much time and energy on the mechanics, which in this case involve a lot of rote data analysis.  Data manipulation is still a big part of the job - when we get a new data set, we need to line it up and get familiar with it, just like a musician sight-reading a new piece for the first time - but over time and with repetition we get better at it.

Checks and balances

Having a fantastic teacher or mentor helps, of course - in this regard I was fortunate as a musician, but not in quant, where I'm largely an autodidact.  And having a handbook or checklist, whether it's written down or just part of your routine, is also really helpful.  Here are a few of the things we do as part of our research routine at ExtractAlpha.

We start each research project with a list of hypotheses and ideas.  This can be a long list - really anything reasonable that pertains to the questions at hand.  We'll brainstorm these ideas over the course of several days, often logging them in a task management system.  Then it's time for triage: we sort to the top the ideas we definitely want to test and have the resources for - usually meaning we have, or can acquire, the relevant data.  In the middle go the "maybes," and we throw out ideas that seem too outlandish or for which data simply isn't available.

Next, we pay careful attention to the data sets we're analyzing.  An often-overlooked step is simply opening up the data in a spreadsheet-like format and scrolling through each of the fields (assuming we're talking about structured, tabular data).  This simple but crucial step often surfaces things like odd ways of storing null values, or oddly repeating values.
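
To make this concrete, here is a minimal Python/pandas sketch of that kind of first pass over a new data set.  The file name and the specific checks are just placeholders; the point is to look at the raw rows and at how nulls and repeated values actually show up.

```python
import pandas as pd

# Hypothetical vendor file; substitute whatever data set you've just received.
df = pd.read_csv("vendor_data.csv")

# Eyeball the raw rows and types, much as you would by scrolling in a spreadsheet.
print(df.head(20))
print(df.tail(20))
print(df.dtypes)

# How are nulls actually stored?  They often arrive as sentinels like -999,
# "NA", or empty strings rather than true NaNs.
print(df.isna().sum())

# Oddly repeating values tend to jump out of a simple value count.
for col in df.columns:
    print(col)
    print(df[col].value_counts(dropna=False).head(5))
```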

We also look at the distribution of the fields of interest.  Which are well populated?  Which categorical variables' values are common versus sparse?  Which have outliers we need to take into account, for example by winsorizing?
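
A rough sketch of what those checks might look like in pandas; the column names and the 1%/99% winsorization cutoffs are illustrative, not prescriptive.

```python
import pandas as pd

def winsorize(s: pd.Series, lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Clip a numeric series at the given lower and upper quantiles."""
    lo, hi = s.quantile(lower), s.quantile(upper)
    return s.clip(lower=lo, upper=hi)

# Hypothetical data set with a numeric "signal" field and a "category" field.
df = pd.read_csv("vendor_data.csv")

print(df["signal"].describe())            # rough shape of the distribution
print(df["signal"].isna().mean())         # how well populated is it?
print(df["category"].value_counts())      # common versus sparse categorical values

df["signal_w"] = winsorize(df["signal"])  # tame the outlier tails before testing
```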

Now that we've got a feel for the data, we can begin our in-sample hypothesis testing.  For the majority of the research process we will be in sample - that is, we will spend our time analyzing a pre-specified subset of the data, while the remainder stays out of sample for verification at the very end of the process.  It's vital to be completely strict about in-sample and out-of-sample testing, as tempting as it might be to "peek" out of sample midway through your testing to ease your discomfort.  Choosing the in-sample/out-of-sample split is also quite important.  You want your out-of-sample period to be long enough to be meaningful, i.e., to encompass more than one type of market condition, but you also want your in-sample period to be representative of the current time period; things have changed in the last decade, as noted in the previous post.  We have a few techniques for addressing these issues, and they are worth thinking about seriously before the research begins.
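
There is no single right way to choose the split, but the mechanics can be as simple as a strict chronological cutoff fixed before any testing begins.  A sketch, with an assumed file and cutoff date:

```python
import pandas as pd

# Hypothetical panel of daily observations with a "date" column.
data = pd.read_parquet("factor_panel.parquet")
data["date"] = pd.to_datetime(data["date"])

# Chosen up front, before any hypothesis testing, and never revisited.
OOS_CUTOFF = pd.Timestamp("2018-01-01")

in_sample = data[data["date"] < OOS_CUTOFF]
out_of_sample = data[data["date"] >= OOS_CUTOFF]

# All of the research happens on `in_sample`; `out_of_sample` is touched
# exactly once, for verification at the very end of the project.
```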

Practical matters

Using the right tool for the job is essential.  Hypothesis testing doesn't necessarily mean portfolio backtesting.  Event studies are helpful, and sometimes we're trying to predict something other than returns; for example, models that predict company fundamentals such as revenues often end up being more robust.
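
As one illustration, an event study can be as simple as lining up returns around each event date and averaging.  A bare-bones sketch, assuming a wide matrix of daily returns and a table of events (the file names, columns, and window are placeholders):

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: daily returns (rows = dates, columns = tickers) and a
# table of events with columns "ticker" and "event_date".
returns = pd.read_parquet("daily_returns_wide.parquet")
events = pd.read_parquet("events.parquet")

WINDOW = np.arange(-5, 21)  # 5 trading days before the event through 20 after

profiles = []
for ticker, event_date in events[["ticker", "event_date"]].itertuples(index=False):
    if ticker not in returns.columns or event_date not in returns.index:
        continue
    i = returns.index.get_loc(event_date)
    idx = i + WINDOW
    valid = (idx >= 0) & (idx < len(returns))
    r = returns[ticker].iloc[idx[valid]].to_numpy()
    profiles.append(pd.Series(r, index=WINDOW[valid]))

# Average cumulative return around the event, across all events.
print(pd.DataFrame(profiles).mean().cumsum())
```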

When it comes to backtests, we do a lot of cross-sectional tests.  The goal is to come up with a score across a wide swath of stocks at each time period (say, each day), and to determine whether the high-scoring stocks outperform the low-scoring ones.  The advantage of this approach, versus say a trading simulation, is that we get a very rich set of information: we can learn how the factor performs across sectors, capitalization ranges, time, and other slices.  Furthermore, the results are not as sensitive to the particular choice of portfolio construction parameters, and they are more indicative of how the factor might add to an existing multifactor model.
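
In pandas, the core of such a cross-sectional test is only a few lines: rank the scores within each date, bucket them, and compare the forward returns of the top and bottom buckets.  The file and column names below are assumptions about the input panel, not a prescription.

```python
import pandas as pd

# Hypothetical long-format panel with columns: date, ticker, score, fwd_ret,
# where fwd_ret is each stock's return over the following period.
panel = pd.read_parquet("scores_and_returns.parquet")

# Bucket stocks into quintiles by score within each date (i.e., cross-sectionally).
panel["quintile"] = panel.groupby("date")["score"].transform(
    lambda s: pd.qcut(s, 5, labels=False, duplicates="drop")
)

# Average forward return per quintile per date, then the top-minus-bottom spread.
by_quintile = panel.groupby(["date", "quintile"])["fwd_ret"].mean().unstack()
spread = by_quintile.iloc[:, -1] - by_quintile.iloc[:, 0]

print(by_quintile.mean())             # is performance monotonic in the score?
print(spread.mean() / spread.std())   # a crude per-period information ratio
```

Because everything lives in one panel, repeating the exercise within a sector, a capitalization bucket, or a sub-period is just another groupby.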

We also look at a factor's exposure to common risk factors, its turnover and autocorrelation, and a plot of the time series of its (in-sample!) returns, before and after transaction cost assumptions.  All of this together gives us a holistic view of the efficacy of a factor, or of a variant of a factor as we try different hypotheses.  And as we test, it lets us gauge the sensitivity of the idea to various formulations - the more robust, the better, lest we find the next butter in Bangladesh.
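
The turnover and autocorrelation checks are similarly lightweight.  A sketch, assuming a wide matrix of factor scores (rows are dates, columns are tickers):

```python
import pandas as pd

# Hypothetical wide matrix of factor scores: rows = dates, columns = tickers.
scores = pd.read_parquet("factor_scores_wide.parquet")

# Put scores on a comparable footing via cross-sectional percentile ranks each day.
ranks = scores.rank(axis=1, pct=True)

# Rank autocorrelation from one period to the next: high values mean a
# slow-moving factor; low values mean high turnover and higher trading costs.
rank_autocorr = ranks.corrwith(ranks.shift(1), axis=1)
print(rank_autocorr.mean())

# A simple turnover proxy: the average absolute change in rank per period.
turnover = (ranks - ranks.shift(1)).abs().mean(axis=1)
print(turnover.mean())
```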

Finally, an often-overlooked note on universe construction.  We see a lot of backtests from commercial vendors which include something like 5,000 stocks in the U.S.  Even for a retail investor, a small trade can move the price of the 5,000th-most-liquid stock, and for institutional investors these stocks simply aren't tradable at all, even at very long horizons.  We use a universe designed to mimic what institutions look at, but we're also always careful to split results by capitalization range, lest we find something that only has value among very small, illiquid stocks. 
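
One simple way to approximate an institutionally tradable universe is to screen on size and liquidity each day.  The thresholds below are purely illustrative:

```python
import pandas as pd

# Hypothetical panel with columns: date, ticker, mkt_cap, dollar_volume.
panel = pd.read_parquet("security_master.parquet")

TOP_N = 1500            # illustrative: roughly a large/mid-cap universe
MIN_DOLLAR_VOL = 1e6    # illustrative: minimum daily dollar volume

def tradable_names(day: pd.DataFrame) -> pd.DataFrame:
    """Keep the largest TOP_N names that also pass a minimum liquidity screen."""
    liquid = day[day["dollar_volume"] >= MIN_DOLLAR_VOL]
    return liquid.nlargest(TOP_N, "mkt_cap")

universe = panel.groupby("date", group_keys=False).apply(tradable_names)
```

From there, splitting results by capitalization range is straightforward, since the size information is already in the panel.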

Summary

Hopefully some of these pointers were helpful.  Clearly, doing quant research right doesn't require huge amounts of resources; it's more about being careful, finding the right tool for the job, and being aware of some common pitfalls.  For many of us quants, there are few things more satisfying than finding value in a new data set or anomaly - and it's all the more satisfying if, having followed some of these best practices, we can have more confidence that we're right.

Happy researching!
