Factor Investing in Brazil: A Deep Dive into Evidence and Performance

Over the past few months, I have devoted significant time to studying quantitative analysis and financial modeling for investments. It’s exciting to merge my Insper-rooted approach, which means making decisions based on evidence and robust analysis, with practical applications that contribute to informed decision-making. Special thanks go to Brenno, from Varos, who has been instrumental in teaching me the technical content and contributing to this modeling process.

I am thrilled to share my study and results on a Factor Investing model applied to the Brazilian market, particularly over the last decade. This is the first of many projects I intend to pursue, aimed at exploring and refining robust methodologies that deliver solid outcomes and, most importantly, are grounded in statistical evidence.

A significant challenge in conducting such studies in Brazil lies in the sample size and data quality. For my research, I utilized a database that includes fundamental company data from 2011 onward, allowing me to maximize the sample size and identify variables and factors capable of explaining returns effectively. However, it’s crucial to note that data maturity in Brazil lags far behind that of the United States, which boasts a much larger and higher-quality dataset.

Addressing Key Biases

For this model and study, I took special care to address several critical biases that must be considered in such analyses:

  1. Look-Ahead Bias: This occurs when information unavailable at the time of the analysis is inadvertently included in the model. A practical example is treating a quarter’s results as known on the reference date of the financial statements, when they only became public on the later date the company actually filed them with the CVM. To avoid this bias, I ensured that the fundamental data used in the modeling was genuinely available at each historical point in time. This step is crucial, as failing to do so would result in a model that always appears accurate because it is essentially “predicting” the future with data from the future.
  2. Overfitting: Overfitting happens when a model is excessively optimized for past data, leading to near-perfect historical performance but unreliable predictions for the future. To mitigate this, I avoided optimizing on the entire dataset and instead used in-sample and out-of-sample testing. For example, splitting the data into 80% for training and 20% for validation gives a more realistic picture of how the optimized model behaves on data it has not seen. By refraining from overloading the model with specific rules tailored to past scenarios, I aimed to strike a balance between historical performance and future applicability.
  3. Survivorship Bias: This occurs when only surviving companies are considered, excluding those that went bankrupt during the sample period. Including such companies is essential, as a model based solely on surviving firms risks overestimating success rates and ignoring critical failures. Incorporating defunct companies ensures a more comprehensive and unbiased model.
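To make the look-ahead point concrete, here is a minimal sketch of a point-in-time lookup. It is not the code used in the study: the `Filing` record, its field names, the ticker, and the figures are hypothetical, and the only idea it illustrates is selecting data by filing date rather than by reference date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Filing:
    ticker: str
    period_end: date  # reference date of the financial statements
    filed_on: date    # date the document was actually filed (hypothetical field)
    ebitda: float


def latest_known(filings, ticker, as_of):
    """Return the most recent filing for `ticker` that was already public on `as_of`."""
    known = [f for f in filings if f.ticker == ticker and f.filed_on <= as_of]
    # Among the filings that were public, keep the one with the latest reference date.
    return max(known, key=lambda f: f.period_end, default=None)


# Hypothetical data: the Q4 2019 statements were only filed in March 2020.
filings = [
    Filing("ABCD3", date(2019, 9, 30), date(2019, 11, 12), 100.0),
    Filing("ABCD3", date(2019, 12, 31), date(2020, 3, 20), 130.0),
]

# A backtest rebalancing on 2020-01-31 may only see the Q3 2019 numbers.
snapshot = latest_known(filings, "ABCD3", date(2020, 1, 31))
```

Filtering on `period_end` instead of `filed_on` would hand the January rebalance the Q4 figures almost two months before the market could have seen them.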

Analyzing Factors in Brazil

Before building the model and backtesting the strategy, it’s crucial to validate the factors and understand how they behaved in the Brazilian market. To this end, I conducted a backtest and a detailed analysis of each factor.

Methodology

  1. Data Collection: Data was retrieved via an API.
  2. Indicator Calculation: For each factor, relevant indicators were calculated. For instance, for the value factor, I used metrics like EV/EBITDA and P/E. Since each factor can be assessed through various indicators, I calculated multiple metrics to evaluate which performed best for each respective factor.
  3. Risk Premium Calculation: I divided the sample of companies into four quartiles based on each indicator. For example, with EV/EBITDA, a lower value is preferable, so companies were ranked in ascending order. The first quartile consisted of the “best” companies for that period and factor. This allowed me to analyze whether a risk premium existed for investing in low- or high-value companies.
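The quartile step above can be sketched as follows. This is an illustrative version under simple assumptions: equal-weighted buckets, a single period, at least four stocks, and made-up indicator values and returns. The function name and the sample data are hypothetical.

```python
from statistics import mean


def quartile_returns(stocks, ascending=True):
    """Rank stocks by an indicator and return the mean next-period return per quartile.

    `stocks` is a list of (indicator_value, next_period_return) pairs.
    With ascending=True, the first quartile holds the lowest indicator values,
    which suits "lower is better" metrics such as EV/EBITDA.
    """
    ranked = sorted(stocks, key=lambda s: s[0], reverse=not ascending)
    n = len(ranked)
    # Integer slicing keeps the four buckets as even as possible.
    buckets = [ranked[i * n // 4:(i + 1) * n // 4] for i in range(4)]
    return [mean(ret for _, ret in bucket) for bucket in buckets]


# Hypothetical (EV/EBITDA, next-month return) pairs for eight stocks.
sample = [
    (4.0, 0.05), (12.0, -0.01), (6.5, 0.03), (9.0, 0.01),
    (3.2, 0.06), (15.0, -0.02), (7.8, 0.02), (10.5, 0.00),
]

q1, q2, q3, q4 = quartile_returns(sample)
```

For an indicator where a higher value is preferable, passing `ascending=False` puts the highest values in the first quartile instead.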

Below is the result of my preliminary analysis to determine which indicator within the value factor consistently delivered the highest returns over time.



Comparison of the First Quartile Return Between Value Factor Indicators


Following the same approach described above, I analyzed all factors, using as many relevant indicators as possible. This enabled me to select the "champions among champions"—the indicators that best represented each factor for subsequent regression analysis.


Graphing the Best Indicators and Analyzing Combined Factors


Comparison Chart of First Quartile Return Among the Best Indicators of Each Factor


The chart above illustrates the returns of the first quartile for each selected factor. It reveals that the momentum/trend-following factor delivers the highest return within the first quartile. This suggests that investing in companies with stronger absolute performance may explain a substantial portion of the returns.


Risk Premium Analysis

Another critical analysis involves examining the risk premium of each factor. This was done by subtracting the fourth quartile's return from the first quartile's return. In other words, it quantifies the advantage of investing in the "best-selected" companies for each factor compared to the "worst-selected" ones. The results are shown in the chart below.
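As a small worked example of this subtraction, the sketch below compounds each quartile's periodic returns and takes the difference. Compounding before subtracting is an assumption on my part for illustration, and the return series are invented.

```python
from math import prod


def cumulative_return(period_returns):
    """Compound a series of simple periodic returns into one cumulative return."""
    return prod(1.0 + r for r in period_returns) - 1.0


def risk_premium(q1_returns, q4_returns):
    """Cumulative return of the first quartile minus that of the fourth quartile."""
    return cumulative_return(q1_returns) - cumulative_return(q4_returns)


# Hypothetical monthly returns for the best and worst quartiles of one factor.
q1_returns = [0.10, 0.05, -0.02]
q4_returns = [0.02, -0.03, 0.01]

premium = risk_premium(q1_returns, q4_returns)  # roughly 0.13, i.e. about 13 points
```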


Risk Premium Comparison Chart Between Factors


The chart highlights that there is indeed a premium associated with investing in first-quartile companies over those in the fourth quartile. The only factor where this premium was null or negative was market cap. For all other factors, a positive and consistent risk premium was observed. Once again, momentum stood out as the clear winner, with a risk premium exceeding 1,000%. This strongly indicates a significant advantage for investors focusing on companies with stronger absolute performance.


Factor-Specific Descriptive Analysis

Finally, this report includes a dedicated descriptive analysis for each factor. Below is an example for the momentum factor.

Descriptive Analysis of the Momentum Factor


The chart on this page provides a breakdown of how each factor performed over time. The bar chart is especially useful for assessing the consistency of each factor. Ideally, a "staircase" pattern is sought, where the first quartile delivers the highest return, followed by the second, and so on, as observed with the momentum factor. Additionally, it’s essential to analyze the performance of each quartile over time, particularly in rolling 1-year windows, to determine if the factor is consistently robust and valid for regression testing.
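Both checks described above are easy to express in code. The sketch below is illustrative only: the function names are hypothetical, and the rolling window is expressed in number of periods (12 for a 1-year window on monthly data).

```python
def is_staircase(quartile_returns):
    """True when each quartile strictly outperforms the next one (Q1 > Q2 > Q3 > Q4)."""
    return all(a > b for a, b in zip(quartile_returns, quartile_returns[1:]))


def rolling_compound(returns, window):
    """Compounded return over every consecutive `window`-period slice of `returns`."""
    out = []
    for end in range(window, len(returns) + 1):
        growth = 1.0
        for r in returns[end - window:end]:
            growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


# Hypothetical cumulative returns per quartile for one factor.
print(is_staircase([0.20, 0.10, 0.05, -0.01]))  # True: a clean staircase
print(is_staircase([0.20, 0.25, 0.05, 0.00]))   # False: Q2 beats Q1
```

Running `rolling_compound` on the first-quartile series and on the fourth-quartile series gives two aligned lists, so the share of windows in which the first quartile wins is one simple way to gauge consistency.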


Factor Correlation Analysis

An important step in selecting factors for regression analysis is to assess their correlation. In linear regression, it’s critical that independent variables (factors) are not perfectly correlated, as this would violate MLR.3, the assumption of no perfect multicollinearity. High but imperfect correlation does not violate the assumption, but it inflates the standard errors of the coefficients and makes it harder to tell the factors apart. Beyond statistical validity, selecting weakly correlated factors also tends to give a more practical and efficient model, as it is more likely to hold up across different market cycles.

The correlation matrix for the factors is included in the descriptive analysis report referenced above and can be seen below.

Correlation Matrix of the Factors


The correlation matrix reveals an exceptionally high correlation between the value factor and both momentum and quality factors. Consequently, it would be ideal to exclude these from the same linear regression to avoid redundancy and preserve the model's validity.
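A correlation screen of this kind can be sketched in a few lines. The factor return series below are invented, and the 0.8 threshold is an arbitrary illustration rather than the cutoff used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(series):
    """Nested dict of pairwise correlations for a {factor_name: returns} mapping."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}


def highly_correlated(matrix, threshold=0.8):
    """Factor pairs whose absolute correlation reaches the (assumed) threshold."""
    names = list(matrix)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[a][b]) >= threshold
    ]


# Hypothetical first-quartile returns per factor over five periods.
factor_returns = {
    "value": [0.02, -0.01, 0.03, 0.00, 0.01],
    "quality": [0.04, -0.02, 0.06, 0.00, 0.02],
    "size": [0.01, 0.02, -0.01, 0.00, 0.03],
}

matrix = correlation_matrix(factor_returns)
flagged = highly_correlated(matrix)
```

In this toy data only the value and quality pair is flagged, which is the kind of pair one would think twice about placing in the same regression.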


Regression and Statistical Analysis

To determine whether the factors robustly and consistently explain returns, it is essential to go beyond descriptive analysis and employ linear regression, as proposed by Fama and French. This allows for the examination of key statistical metrics related to both the model and the independent variables (factors).

The Fama-French regression follows the equation below:

Fama-French Five-Factor Model
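For reference, the five-factor model of Fama and French (2015) is usually written as below, using the same Rmt and Rft notation as the text. The factor labels (SMB, HML, RMW, CMA) are those of the original U.S. specification, not the Brazilian factors selected in this study.

```latex
R_{it} - R_{ft} = \alpha_i + \beta_i \,(R_{mt} - R_{ft}) + s_i \,\mathrm{SMB}_t + h_i \,\mathrm{HML}_t + r_i \,\mathrm{RMW}_t + c_i \,\mathrm{CMA}_t + \varepsilon_{it}
```

Here SMB is the size factor (small minus big), HML the value factor (high minus low book-to-market), RMW the profitability factor (robust minus weak), and CMA the investment factor (conservative minus aggressive).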

Although this model was originally developed for the U.S. market, it can be adapted to the Brazilian market with some modifications.

An important variable to calculate is the market premium (Rmt - Rft). Ideally, a positive market premium is expected, meaning that, in the long run, the market return should exceed the risk-free rate. In the United States, this value is typically positive, as the S&P 500, for example, often outperforms the risk-free rate. However, in Brazil, this market premium tends to be zero or even negative, depending on the time window analyzed. This presents a challenge for linear regression, as the Beta becomes distorted compared to the original models proposed by Fama and French.

As can be seen in the graph below, over a sample of approximately 13 years, the market premium in Brazil showed a return of -53%.

Cumulative Return of Market Premium (Market Return - CDI Return)


Preparing the Dataset

To perform the regression, I calculated the difference between the average universe return and the CDI rate (Rmt - Rft). But what exactly is the "average universe return"?

Since each factor contains a different number of companies in the dataset due to data maturity in Brazil, I first calculated the average return of each factor and then averaged those returns across all factors. This creates a proxy for the market's overall performance while ensuring consistency across the dataset.
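The two-step average can be sketched as follows for a single period. The factor names, returns, and CDI figure are hypothetical; the point is only that averaging per factor first gives every factor the same weight, regardless of how many companies have data for it.

```python
from statistics import mean


def universe_return(returns_by_factor):
    """Average the per-factor mean returns so each factor weighs the same.

    `returns_by_factor` maps a factor name to the returns of the stocks
    that have data for that factor in the period.
    """
    per_factor = [mean(rets) for rets in returns_by_factor.values() if rets]
    return mean(per_factor)


def excess_return(universe_ret, cdi_ret):
    """Universe return minus the CDI return for the same period (Rmt - Rft)."""
    return universe_ret - cdi_ret


# Hypothetical single-period returns; coverage differs a lot between factors.
example = {
    "value": [0.02, 0.04],
    "momentum": [0.01, 0.03, 0.05, 0.07],
    "size": [0.00],
}

rm = universe_return(example)      # mean of 0.03, 0.04 and 0.00
premium = excess_return(rm, 0.01)  # against a hypothetical 1% CDI return
```

Pooling all seven returns into one average would instead let the momentum sample, which has the most companies, dominate the proxy.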

Additionally, I applied a liquidity filter, considering only companies with an average daily trading volume greater than R$1M. This ensures the analysis excludes illiquid stocks, which may not be practical for real-world investment strategies.
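A filter like this reduces to a one-line check per stock. In the sketch below the threshold comes from the text, while the lookback window (whatever history is passed in) and the sample volumes are assumptions.

```python
from statistics import mean

MIN_ADV_BRL = 1_000_000  # R$1M average daily traded volume, as stated in the text


def passes_liquidity_filter(daily_volumes_brl, min_adv=MIN_ADV_BRL):
    """True when the average daily traded volume (in BRL) exceeds the minimum.

    The lookback is whatever history the caller passes in; its length is an
    assumption of this sketch. A stock with no history fails the filter.
    """
    return bool(daily_volumes_brl) and mean(daily_volumes_brl) > min_adv


# Hypothetical daily financial volumes for two stocks.
print(passes_liquidity_filter([1_500_000, 900_000, 1_200_000]))  # True
print(passes_liquidity_filter([800_000, 1_100_000]))             # False
```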

The independent variables in the regression are the selected factors from the descriptive analysis phase, and the dependent variable is the excess return of the universe over the CDI (Rmt - Rft).


Statistical Considerations

Linear regression minimizes the error terms using Ordinary Least Squares (OLS) to estimate the coefficients. Key considerations for interpreting the results include:

  1. Intercept (Alpha): Ideally, the intercept should be statistically insignificant. A significant alpha suggests the model is failing to fully capture market risk and might not explain returns effectively.
  2. Beta Coefficients: By theory, factor betas should be positive and statistically significant, indicating a positive risk premium for investing in companies within the first quartile of a given factor. A negative beta would contradict financial theory, implying higher returns for the worst-performing companies (fourth quartile).
  3. Market Risk Premium: This represents the long-term return of the market over the risk-free rate. In mature markets like the U.S., this value is positive, reflecting the outperformance of the equity market relative to safer assets. However, in Brazil, this metric can be zero or negative, depending on the timeframe. For this dataset, over a 13-year window, the market risk premium was -53%, reflecting structural challenges in the local market.
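To show where the coefficients, t-statistics, and R² come from, here is a minimal OLS sketch written with the standard library only. In practice a package such as statsmodels reports the same quantities; the hand-written linear algebra below is just meant to expose the arithmetic. The factor returns and excess returns are invented.

```python
from math import sqrt


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting; raises on a singular matrix."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: regressors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(y, factor_rows):
    """Regress y on an intercept plus the factors; one row of factors per period."""
    X = [[1.0] + list(row) for row in factor_rows]
    n, k = len(X), len(X[0])
    Xt = transpose(X)
    xtx_inv = invert(matmul(Xt, X))
    coef = [row[0] for row in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]
    resid = [yi - sum(b * x for b, x in zip(coef, row)) for yi, row in zip(y, X)]
    sse = sum(e * e for e in resid)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = sse / (n - k)  # residual variance with n - k degrees of freedom
    std_err = [sqrt(sigma2 * xtx_inv[j][j]) for j in range(k)]
    t_stats = [b / se for b, se in zip(coef, std_err)]
    return {"coef": coef, "t": t_stats, "r2": 1.0 - sse / sst}


# Hypothetical periodic returns; index 0 of each result list is the intercept (alpha).
momentum = [0.03, -0.01, 0.02, 0.04, -0.02, 0.01, 0.05, 0.00]
quality = [0.01, 0.02, -0.01, 0.00, 0.01, -0.02, 0.02, 0.01]
excess = [0.025, 0.002, 0.010, 0.030, -0.012, 0.001, 0.041, 0.004]

result = ols(excess, list(zip(momentum, quality)))
```

The `invert` step is also where perfect multicollinearity shows up: two perfectly correlated factors make X'X singular, and the function raises instead of returning coefficients.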


First Model: Excluding the Momentum Factor

In the initial regression, I included the Leverage, Value, Quality, and Size factors, excluding Momentum. This allowed me to analyze the model's performance without Momentum and observe its impact when added later.

The results show moderate explanatory power, with an acceptable R² and an F-statistic that indicates the model’s overall significance. However, individual factors displayed only mild statistical significance, highlighting room for improvement.

Statistical Result of the Linear Regression without the Momentum Factor


Second Model: Including the Momentum Factor

In the second regression, I included the Momentum factor while retaining the original factors from the first model.

This adjustment significantly enhanced the model's performance:

  1. Improved R² and F-statistic: The inclusion of Momentum increased the model's explanatory power and overall significance.
  2. Insignificant Alpha: The intercept became statistically insignificant, a desirable result indicating that the factors captured market risk more effectively.
  3. Factor Significance: While most factors became more statistically significant, the Value factor lost its significance in this iteration. This aligns with previous findings suggesting that Momentum dominates other factors in the Brazilian market during the observed period.

Statistical Result of the Linear Regression with the Momentum Factor


Insights from the Regression Analysis

The results demonstrate the critical role of the Momentum factor in explaining returns in the Brazilian equity market. Its inclusion not only enhances the model's statistical robustness but also highlights its dominance over other factors, as seen in the descriptive analysis and backtests.

While the Value factor's insignificance might initially seem concerning, it could indicate cyclical limitations or overlap with other variables, warranting further investigation.

In conclusion, the regression confirms that a multi-factor model, especially one emphasizing Momentum, can provide valuable insights into return drivers in Brazil. This analysis lays the groundwork for refining factor-based strategies tailored to the local market.


Backtest of the Model

After analyzing and running the linear regressions to statistically assess the robustness of the model, it is time to get hands-on and observe in practice how the defined model, with the chosen factors and indicators, performs in real-life conditions, taking into account transaction costs and the practical challenges the market presents.

To implement my proprietary factor model, several assumptions are essential. The first is the liquidity filter, mentioned earlier. The second is the portfolio rebalancing frequency: how often will we reanalyze and adjust the investments? This is crucial both for the backtest and the model. The third is the number of assets held in the portfolio. A key point to highlight is that the model is 100% long-only, meaning it stays fully invested in equities under all market conditions.

An important concept for the practical model is its capacity, i.e., how much capital the model can handle without impacting the prices or liquidity of any specific asset. Several variables determine it: the number of assets in the portfolio, the liquidity filter, how many days we take to buy our stocks (purchases can be spread over one, two, or three days), and the percentage of the traded volume that we set as a limit.

Over these months, I have developed two proprietary models whose results I find relevant, and they have different capacities. The first model, which I call "aggressive," has a smaller capacity as it includes small caps, which tend to have lower liquidity. The second model, which I consider "moderate," has a significantly higher capacity.

For the aggressive model, the minimum capacity would be around 10 million reais, which could increase if we adjust some variables (which would also impact the results). This specific model was designed with a retail investor in mind; from an institutional perspective, additional variables would need to be considered to account for the capacity of a fund, where the assets under management are much larger.
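One simple way to turn the capacity variables above into a number is sketched below. It assumes an equal-weight portfolio in which the least liquid holding is the binding constraint. The tickers, volumes, participation rate, and number of days are all hypothetical and are not the parameters behind the figures quoted in the text.

```python
def strategy_capacity(adv_by_asset, participation, days_to_build):
    """Largest equal-weight portfolio (in BRL) that respects the volume limit.

    adv_by_asset:  average daily traded volume in BRL per ticker
    participation: maximum share of daily volume we accept trading (e.g. 0.10)
    days_to_build: number of days over which purchases are spread
    """
    n_assets = len(adv_by_asset)
    # The least liquid stock caps the size of every equal-weight position.
    max_position = min(
        adv * participation * days_to_build for adv in adv_by_asset.values()
    )
    return n_assets * max_position


# Hypothetical four-stock portfolio.
adv = {
    "AAAA3": 5_000_000,
    "BBBB3": 1_200_000,
    "CCCC3": 20_000_000,
    "DDDD4": 2_500_000,
}

capacity = strategy_capacity(adv, participation=0.10, days_to_build=3)
# About R$1.44 million: 4 positions of R$360k, capped by BBBB3.
```

Raising the liquidity filter, holding more assets, or spreading purchases over more days all increase the result, which matches the trade-offs described above.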


Aggressive Model

First Page of the Aggressive Model Report


Second Page of the Aggressive Model Report


Third Page of the Aggressive Model Report


I am very pleased with my ability to model the factors and variables to achieve such a satisfactory result over these months. This first model would be fully applicable in practice, and in fact, I am currently implementing it in my own portfolio, meaning I have skin in the game with the model I created. As a young investor, I believe the aggressive model could provide a better risk-return profile throughout my journey.

Regarding the results, I do not intend to delve into every detail to avoid being overly technical and repetitive. However, we observed very relevant statistics, such as an annual return of approximately 37%, with an annual volatility of 23%. Clearly, it is a highly aggressive model, with high volatility, but as shown in the trade statistics, mathematically, it is a winning model. Furthermore, over longer time frames, as seen on the third page, it was able to generate alpha in the market throughout the entire period.


Moderate Model

First Page of the Moderate Model Report


Second Page of the Moderate Model Report


Third Page of the Moderate Model Report


The moderate model emerged due to the need to create a strategy for individuals with a more conservative/moderate risk profile. While the aggressive model yielded good results, it’s understood that not everyone has the risk appetite for such a model. Additionally, the moderate model accommodates a much higher capacity than the aggressive one.

This model also pleasantly surprised me. It delivered an annual return of 23%, with much lower volatility than the aggressive model (14% versus 23%) and a more controlled drawdown. I was particularly impressed that, even during sideways market periods like the recent one, it delivered positive results.
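For readers who want to reproduce headline statistics like these from a return series, here is a minimal sketch of the usual definitions. It assumes monthly simple returns (12 periods per year) and is not the code behind the reports. The sample series is invented.

```python
from math import prod, sqrt
from statistics import stdev


def annualized_return(returns, periods_per_year=12):
    """Geometric average return scaled to one year."""
    growth = prod(1.0 + r for r in returns)
    return growth ** (periods_per_year / len(returns)) - 1.0


def annualized_volatility(returns, periods_per_year=12):
    """Sample standard deviation of periodic returns scaled by sqrt(time)."""
    return stdev(returns) * sqrt(periods_per_year)


def max_drawdown(returns):
    """Worst peak-to-trough loss of the compounded equity curve (a negative number)."""
    peak = equity = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


# Hypothetical monthly returns for one year.
monthly = [0.04, -0.02, 0.03, 0.01, -0.05, 0.06, 0.02, -0.01, 0.03, 0.00, 0.02, -0.03]

stats = {
    "return": annualized_return(monthly),
    "volatility": annualized_volatility(monthly),
    "max_drawdown": max_drawdown(monthly),
}
```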

The report provides additional details regarding risk (such as specific events, e.g., the truckers’ strike, 2008 crisis, COVID, etc.) and more charts. However, I felt that it would be too much content to post here, so I selected what I considered most relevant.


Conclusion

The main takeaway I offer in this article is the importance of studying new topics and deeply exploring subjects we deem relevant, regardless of the field. When I began diving into quantitative finance to build factor investing models, I started connecting a lot of what I had learned in college, from statistics and econometrics to finance and even behavioral economics. I believe that’s the power of education: being able to combine various tools to create meaningful studies in your field, at the forefront of knowledge. I know I’m infinitely far from mastering this area, but I also know that today I know much more than I did a year ago, and that’s what matters—the desire and insatiable curiosity to learn about topics that fascinate us.

Furthermore, in factor investing, I think one key insight is that it is possible to create a portfolio/investment model systematically and automatically, free from the cognitive biases that, unfortunately, are present in all of us. Of course, like any analysis, it is subject to flaws and can always be improved. However, the current results are promising for the Brazilian market: the factors outlined above do help explain market returns, which suggests that exposure to the right factors can deliver consistent and efficient returns over time.

Beyond factor investing, I’ve also developed trading models that use technical analysis indicators, such as Hi-Lo, Bollinger Bands, Moving Averages, OBV, and so on. I’ve achieved promising results in some of them, while others have been disappointing. Currently, the trading model uses daily data, and I plan to move to shorter timeframes.

Additionally, with the factor model, I’m eager to explore macroeconomic factors and triggers that might shift the model to invest in CDI or remain invested in equities, thus enhancing the risk-return profile by incorporating macroeconomic factors.

Feel free to connect with me on LinkedIn for any questions or suggestions. I’m happy to discuss the topic further.

Thank you!
