Be Serious with Equity Factor Investing!
While sharing the same objectives, equity indices that aim to provide multiple factor exposures may opt for very different implementation methods, thus reflecting differences in the underlying beliefs about multi-factor investing. This article looks at the conceptual considerations involved in designing different approaches. The key issues that we discuss involve the robustness, consistency and diversification of different approaches when designing multi-factor indices.
Product providers across the board put strong emphasis on the academic grounding of their factor indices. At the same time, they try to differentiate their products using proprietary elements in their strategy, often leading to the creation of products using new factors or novel strategy construction approaches that may or may not be consistent with the broad consensus on empirical asset pricing in the academic literature. As for factor definitions, many factor indices show considerable divergence from academic definitions.
For example, the Fama and French (2012, 2015) factor definitions, which are widely used in academic research, are based on straightforward stock selection criteria such as price-to-book value. However, for most factor or multi-factor offerings, product providers typically favour more complex factor definitions that may reflect a stark disagreement with academic research. For instance, some providers use industry or regional adjustments for certain variables within a given factor score while not using the same adjustments for other variables that make up the same factor score. Moreover, providers often use variables that are quite far removed from the original factor definition, such as change in asset turnover in quality scores. In fact, most of the Quality indices on offer have more to do with the precepts of stock-picking gurus than with the academic literature, where profitability and investment have been identified as asset pricing factors.
While the definitions found in the reference academic research rely on straightforward variables and transparently select a single key metric to produce a factor score for each stock, the proprietary definitions from most providers use different sets of variables and various adjustments, and often consist of complex combinations of several variables.
A mismatch with academic factor definitions creates two problems. The first, already noted above, is that it is difficult to invoke academic evidence to justify a factor offering while departing from the empirical framework of that research by using factor definitions different from those of the researchers cited. The second is that this added complexity and/or creation of ad-hoc proprietary factors is a potential source of data-mining problems.
Selecting proprietary combinations or making proprietary tweaks to variable definitions offers the possibility of improving the performance of a factor index in a backtest. In general, proprietary factor definitions increase the amount of flexibility providers have in testing many variations of factors and thus pose a risk of data-mining. In fact, it appears that providers sometimes explicitly aim to select ad-hoc factor definitions that have performed well over short-term backtests.
The question is whether the improvement of the "enhanced" factor definition will also hold going forward, especially if there is no solid economic foundation for it. There is clearly a risk that one ends up with what academics have termed “lucky factors”. Harvey and Liu (2015) show that by snooping through data on a large number of candidate factors and retaining those with the highest t-stat, one takes the risk of uncovering flukes, which will not repeat out of sample. Perhaps even more importantly, it is unclear what – if anything – factors with extensive proprietary tweaks still have in common with the factors from academic research. Therefore, the empirical evidence in favour of the academic factors and their economic grounding cannot be transposed to such new proprietary factors.
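The "lucky factor" effect can be illustrated with a small simulation. The sketch below is not Harvey and Liu's procedure; it simply assumes candidate factors whose monthly returns are pure noise (true premium of zero) and compares the t-stat of a single tested factor with the best t-stat retained from 50 candidates. The volatility, sample length and number of candidates are arbitrary illustrative assumptions.

```python
import math
import random
import statistics


def t_stat(returns):
    """t-statistic of the mean return against a true mean of zero."""
    std_error = statistics.stdev(returns) / math.sqrt(len(returns))
    return statistics.fmean(returns) / std_error


def best_t_stat(n_candidates, n_months, rng):
    """Highest t-stat among candidate 'factors' that are pure noise."""
    return max(
        t_stat([rng.gauss(0.0, 0.03) for _ in range(n_months)])
        for _ in range(n_candidates)
    )


rng = random.Random(0)  # fixed seed so the run is reproducible
N_TRIALS, N_MONTHS = 100, 120  # hypothetical: 100 repetitions, 10 years of data

# Average reported t-stat when one factor is tested vs. the best of 50.
avg_single = statistics.fmean(
    [best_t_stat(1, N_MONTHS, rng) for _ in range(N_TRIALS)]
)
avg_snooped = statistics.fmean(
    [best_t_stat(50, N_MONTHS, rng) for _ in range(N_TRIALS)]
)

print(f"one candidate tested:  average t-stat {avg_single:.2f}")
print(f"best of 50 candidates: average t-stat {avg_snooped:.2f}")
```

Although no candidate has any true premium, the retained "best" factor typically shows a t-stat that would look significant by conventional standards, which is why the number of variations tested matters as much as the reported result.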
While the selection bias potentially exists for any strategy, there is an additional bias that is specific to so-called composite scoring approaches. These are factor definitions that draw on combinations of multiple variables. Novy-Marx (2015) analyses the bias inherent in backtests of composite scoring approaches. Novy-Marx argues that the use of composite variables in the design and testing of smart beta strategies yields a “particular pernicious form of data-snooping bias”. He shows that creating a composite variable based on the in-sample performance of single variable strategies generates an over-fitting bias. The author concludes that, “combining signals that backtest positively can yield impressive backtested results, even when none of the signals employed to construct the composite signal has real power”.
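The over-fitting bias of composite scores can be sketched in the same spirit. The example below is a simplified illustration, not a replication of Novy-Marx's tests: it assumes 20 signals whose strategy returns are pure noise, signs each signal so that its in-sample backtest is positive, and equal-weights them into a composite. All parameters are hypothetical.

```python
import math
import random
import statistics


def t_stat(returns):
    """t-statistic of the mean return against a true mean of zero."""
    std_error = statistics.stdev(returns) / math.sqrt(len(returns))
    return statistics.fmean(returns) / std_error


def noise_returns(n_months, rng):
    """Monthly strategy returns with no true premium."""
    return [rng.gauss(0.0, 0.03) for _ in range(n_months)]


def composite_t_stats(n_signals, n_months, rng):
    """In-sample and out-of-sample t-stats of an equal-weighted composite."""
    in_sample = [noise_returns(n_months, rng) for _ in range(n_signals)]
    out_sample = [noise_returns(n_months, rng) for _ in range(n_signals)]
    # Sign each signal so that its own in-sample backtest looks positive.
    signs = [1.0 if statistics.fmean(r) >= 0 else -1.0 for r in in_sample]

    def composite(panel):
        return [
            statistics.fmean([s * series[t] for s, series in zip(signs, panel)])
            for t in range(n_months)
        ]

    return t_stat(composite(in_sample)), t_stat(composite(out_sample))


rng = random.Random(0)  # fixed seed so the run is reproducible
results = [composite_t_stats(20, 120, rng) for _ in range(100)]
avg_in = statistics.fmean([r[0] for r in results])
avg_out = statistics.fmean([r[1] for r in results])

print(f"composite t-stat, in sample:     {avg_in:.2f}")
print(f"composite t-stat, out of sample: {avg_out:.2f}")
```

The composite backtests strongly in sample because the noise in each signal has been aligned in the same direction and then diversified; out of sample, where the signs were not chosen with hindsight, nothing remains.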
A simple reason why composite scores may be more prone to generating biased results is that a composite variable requires more inputs and thus increases the number of possible choices. There seems to be wide-ranging awareness that composite strategies, by having more inputs, will lead to increased data-mining risk. Pedersen (2015) argues that, “we should discount backtests more if they have more inputs and have been tweaked or optimised more”. Likewise, Ilmanen (2011) states that analysis involving "tweaks in indicator specification" is "even more vulnerable to data-mining than is identification of the basic regularities".
For investors conducting due diligence on commonly-offered smart beta strategies, it thus appears important to investigate not just the backtested performance but also the underlying data snooping risk, given that both selection bias and over-fitting bias may be present when proprietary composite scores are being used. Moreover, one can argue that backtests of strategies that do not employ complex proprietary scores are naturally more robust and the backtested performance of such strategies needs to be discounted less than that of complex proprietary factor definitions. In the next section, we further investigate the biases stemming from methodological choices, particularly looking at what happens when a consistent approach may be lacking in the index design.
Inconsistencies in methodologies
A major source of potential data-mining bias that may result in overstated backtested performance is the flexibility offered by the testing of many variations in search of the winning one. Such flexibility is obviously increased when a provider allows index methodologies to be inconsistent. Conversely, a very effective mechanism to avoid data-mining is to establish a consistent framework for smart beta index creation. Such a framework can limit ad-hoc choices while still providing the flexibility needed for smart beta index construction. Surprisingly, while most major index providers argue that cap-weighted indices should employ a consistent set of rules across regions to avoid unintended investment outcomes, such consistency is often overlooked for factor indices.
Perhaps the most severe form of inconsistency is inconsistency among index offerings across time. For example, it is commonplace for two multi-factor indices launched at different points in time by the same provider to use different definitions of the Value factor. This may be surprising, especially for the Value component, as Value seems to be among the most standard factors. Just as inconsistencies across factors open up room for a large number of variations in index design, inconsistencies over time further increase such flexibility. Such inconsistency across time is nonetheless widespread among index offerings. Amenc et al. (2015) emphasise that inconsistency over time is practically day-to-day business for index providers.
An important issue that can be easily neglected when constructing a multi-factor index is diversification. Positive exposure to rewarded factors is obviously a strong and useful contributor to expected returns. However, products that aim to capture explicit risk-factor tilts often neglect adequate diversification. This is a serious issue because diversification has been described as the only "free lunch" in finance. Diversification allows a given exposure to be captured with the lowest level of risk required. In contrast, gaining factor exposures exposes investors to additional types of risk, and therefore, such exposures do not constitute a "free lunch". They instead constitute compensation for risk in the form of systematic factor exposures. Such capturing of risk premia associated with systematic factors is attractive for investors who can accept the systematic risk exposure in return for commensurate compensation.
However, factor-tilted strategies, when they are very concentrated, may also take on other non-rewarded risks. Non-rewarded risks come in the form of idiosyncratic or firm-level risk, as well as sector concentration and currency, sovereign or commodity risk exposures. Financial theory does not provide any reason why such risks should be rewarded. Therefore, a sensible approach to factor investing should not only look at obtaining a factor tilt, but also at achieving proper diversification within that factor tilt.
In fact, if the objective were to obtain the most pronounced value tilt, for example, the strategy that corresponds to this objective is to hold 100% in the single stock with the largest value exposure. This clearly shows that the objective of maximising the strength of a factor tilt is not reasonable. While practical implementations of concentrated factor-tilted indices will be less extreme than this example, we can expect problems with high levels of idiosyncratic risk and high levels of turnover whenever index construction focuses too much on concentration and pays too little attention to diversification.
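The corner solution described above follows from the fact that a long-only, fully invested portfolio's factor score is a weighted average of stock scores, so it cannot exceed the highest individual score. The short sketch below uses five hypothetical stocks with made-up value scores and compares the maximum-tilt portfolio with an equal-weighted selection of the three highest-scoring stocks; the effective number of stocks (the inverse of the sum of squared weights) serves as a simple concentration measure.

```python
# Hypothetical value scores for five stocks (illustrative numbers only).
scores = {"A": 0.2, "B": 1.5, "C": -0.3, "D": 0.9, "E": 1.1}


def portfolio_score(weights):
    """Weighted-average factor score of a portfolio."""
    return sum(w * scores[name] for name, w in weights.items())


def effective_n(weights):
    """Effective number of stocks: 1 / sum of squared weights."""
    return 1.0 / sum(w * w for w in weights.values())


# Maximum tilt: everything in the single highest-scoring stock.
best = max(scores, key=scores.get)
max_tilt = {name: (1.0 if name == best else 0.0) for name in scores}

# Broader selection: equal weights in the three highest-scoring stocks.
top3 = sorted(scores, key=scores.get, reverse=True)[:3]
diversified = {name: (1.0 / 3.0 if name in top3 else 0.0) for name in scores}

for label, w in (("max tilt", max_tilt), ("top-3 equal weight", diversified)):
    print(f"{label}: score {portfolio_score(w):.2f}, "
          f"effective stocks {effective_n(w):.1f}")
```

In this toy case the broader portfolio gives up part of the tilt but holds three times as many effective positions, which is the trade-off between tilt strength and diversification discussed in the text.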
One of the possible ways to construct a multi-factor index is to combine different single factor indices. Amenc et al. (2016) show that well-diversified factor indices which pursue a diversification objective through an alternative weighting scheme based on a relatively broad stock selection provide considerable benefits over more concentrated single factor indices. Their results suggest that well-diversified factor portfolios or indices outperform their highly-concentrated counterparts in terms of risk-adjusted performance, because concentrated factors may be highly exposed to unrewarded factors. In addition, they show that factor-tilted portfolios on narrow stock selections present implementation drawbacks such as higher turnover.
Concentration may arise in particular in indices that do not have such a diversification objective, especially in multi-factor indexing methodologies that, rather than combining single factor indices, actually build multi-factor indices from the stock level up.
In addition to concentration, stock level approaches contain further issues that we turn to now.
When using multi-factor scores in portfolio optimisation, it should not be forgotten that the score is ultimately used as a proxy for expected returns. It is well known for example that mean-variance optimisation that integrates expected returns can result in an "error maximisation exercise" since expected returns are hard to estimate at the individual stock level, and since mean-variance optimisers are very sensitive to estimation error for expected returns (Best and Grauer, 1991).
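The sensitivity of mean-variance optimisation to expected return inputs can be seen in a two-asset sketch. The example below is not taken from Best and Grauer; it assumes two hypothetical assets with 20% volatility and a correlation of 0.95, a zero risk-free rate, and fully invested unconstrained weights proportional to the inverse covariance matrix times expected returns. It then changes one expected return by half a percentage point.

```python
def mean_variance_weights(mu, vol, rho):
    """Fully invested weights proportional to inverse(covariance) * mu.

    Two-asset case, written out by hand; assumes a zero risk-free rate.
    """
    var1, var2 = vol[0] ** 2, vol[1] ** 2
    cov = rho * vol[0] * vol[1]
    det = var1 * var2 - cov ** 2
    raw1 = (var2 * mu[0] - cov * mu[1]) / det
    raw2 = (var1 * mu[1] - cov * mu[0]) / det
    total = raw1 + raw2
    return raw1 / total, raw2 / total


VOL = (0.20, 0.20)  # hypothetical annual volatilities
RHO = 0.95          # hypothetical correlation

# Identical expected returns, then a 0.5 point change to the first asset.
base = mean_variance_weights((0.060, 0.060), VOL, RHO)
perturbed = mean_variance_weights((0.065, 0.060), VOL, RHO)

print(f"base weights:      {base[0]:+.2f}, {base[1]:+.2f}")
print(f"perturbed weights: {perturbed[0]:+.2f}, {perturbed[1]:+.2f}")
```

Under these assumptions the optimiser moves from an even split to a leveraged long position in one asset financed by a short position in the other, even though the change in the input is well within any realistic estimation error for stock-level expected returns.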
Achieving high absolute factor scores at the portfolio level by concentrating on picking champion stocks that score highly on all targeted factor dimensions is probably intuitively attractive but it is predicated on a high-precision relationship between factor scores and returns at the stock-level. There is no question that factor investing is motivated by an attempt to capture higher long-term returns through the right risk exposures. However, return estimation at the stock level is notoriously difficult. Black (1993) distinguishes between explaining returns, which is easy because it is really explaining variance, and predicting returns, which is hard. He contends that the accurate estimation of average expected return requires decades of data. For variance, he notes, "We can use daily (or more frequent) data to estimate covariances. Our estimates are accurate enough that we can see the covariances change through time". On estimating expected returns, on the other hand, he writes, "Daily data hardly help at all" and "We need such a long period to estimate the average that we have little hope of seeing changes in expected return".
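Black's point can be checked with back-of-the-envelope arithmetic. The sketch below assumes independent, identically distributed returns and hypothetical figures (20% annual volatility, 5% annual premium). Under that assumption, the standard error of the annualised mean depends only on the length of the sample in years, not on how often returns are observed.

```python
import math


def annualised_mean_std_error(annual_vol, years, obs_per_year):
    """Standard error of the annualised mean return under i.i.d. returns."""
    per_period_vol = annual_vol / math.sqrt(obs_per_year)
    n_obs = years * obs_per_year
    per_period_se = per_period_vol / math.sqrt(n_obs)
    return per_period_se * obs_per_year  # scale the per-period mean to a year


def years_for_t_stat(annual_premium, annual_vol, target_t=2.0):
    """Years of data needed for the premium to reach a given t-stat."""
    return (target_t * annual_vol / annual_premium) ** 2


# Hypothetical inputs: 20% volatility, 10 years of data.
se_monthly = annualised_mean_std_error(0.20, 10, 12)
se_daily = annualised_mean_std_error(0.20, 10, 252)
print(f"std error of mean, monthly data: {se_monthly:.4f}")
print(f"std error of mean, daily data:   {se_daily:.4f}")

# Years needed for a 5% premium to be two standard errors from zero.
print(f"years needed: {years_for_t_stat(0.05, 0.20):.0f}")
```

Moving from monthly to daily observations leaves the standard error of the mean unchanged, and with these inputs several decades of data are needed before the premium is statistically distinguishable from zero.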
The search for champion stocks as measured by their factor scores is a stock-picking exercise that relies implicitly but heavily on the accuracy of expected return predictions. Attempting to improve stock-level return forecasts, even when this is done with the support of a factor model, is a largely futile exercise, reminiscent of traditional stock picking. It may be useful to pause and remember that it is precisely the lack of persistent success in stock picking that has led more institutional investors to shift toward passive strategies. If efforts are to be made to improve the risk-adjusted returns of factor investing, they are better directed at the risk dimension, where we can rely on 60 years of progress in financial econometrics to obtain convergent estimators of volatilities and covariances.
When academics have tested standard factors, they have done so by running portfolio sorts, and assessing return differences at the portfolio level, not by assessing returns at the stock level. For example, they have observed that, on average, value stocks tend to have higher returns than growth stocks over the long-term. If one now tries to design strategies based on very fine distinctions at the stock level, such relations may be drowned in noise. More generally, making very fine distinctions at the stock level is prone to capturing estimation error.
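The contrast between portfolio-level and stock-level evidence can be illustrated with simulated data. The sketch below assumes a hypothetical cross-section of 5,000 stocks in which returns load weakly on a factor score and are otherwise dominated by stock-specific noise; the slope and noise level are arbitrary assumptions. It compares the stock-level correlation between score and return with the average returns of score-sorted quintile portfolios.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is reproducible
N_STOCKS = 5000

# Hypothetical cross-section: small score effect, large stock-specific noise.
scores = [rng.gauss(0.0, 1.0) for _ in range(N_STOCKS)]
returns = [0.02 * s + rng.gauss(0.0, 0.30) for s in scores]


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Portfolio sort: rank stocks by score and average returns within quintiles.
ranked = sorted(zip(scores, returns))
size = N_STOCKS // 5
quintile_means = [
    statistics.fmean([r for _, r in ranked[i * size:(i + 1) * size]])
    for i in range(5)
]

stock_level_corr = pearson(scores, returns)
spread = quintile_means[-1] - quintile_means[0]

print(f"stock-level correlation: {stock_level_corr:.3f}")
print("quintile mean returns:", [f"{m:.3f}" for m in quintile_means])
print(f"top-minus-bottom spread: {spread:.3f}")
```

Averaging within broad portfolios diversifies away much of the stock-specific noise, so the return difference between extreme quintiles is visible even though the score explains very little of any individual stock's return.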
Thus, any stock-level approach needs to be handled with care and one needs to assess whether suitable mechanisms have been built in to achieve robustness.
The offerings in the area of multi-factor indices are multiplying rapidly and investors have to assess how such indices match their investment needs. Given that most products have been launched recently, analysis of risk and performance is mostly limited to backtested data. Therefore, the methodological principles behind index construction should become a key area of attention in the assessment of these indices. Analysing robustness requires an assessment of index design principles and the conceptual considerations underlying index design. Our brief review of offerings aims to shed light on several issues such as complex proprietary factor definitions, potential inconsistencies in methodologies, and concentration issues.
In principle, multi-factor indices aim at a common goal – outperforming cap-weighted benchmarks by providing exposure to multiple rewarded factors. As discussed here, the ways to do this are nonetheless quite diverse. A key consideration for investors is how robust the performance presented in backtests is expected to be. Highly parameterised approaches naturally contain higher risks of overstated backtest performance than more parsimonious index design methods. In particular, since the bottom-up approach is more flexible, it can more easily fall prey to data-mining. It is always possible to find a combination of factor definitions, multi-factor scoring and a weighting scheme that will select the right stocks in sample. In-sample over-fitting, however, would lead to disappointing out-of-sample performance. In terms of due diligence, the bar on innovative bottom-up methods should be set higher than for classic top-down approaches, and investors would be well advised to ask for live track records of a significant length when a provider shows a lot of creativity.
There is no doubt that more elaboration on factor definitions and the use of more granular stock-level information allow the data to be fitted better and help to produce backtests that suggest superior performance, but the ultimate question investors should ask is that of the robustness of the advertised index performance in live conditions.
This article is one of many articles in Investment Management Review (IMR) special edition Autumn 2017 which covers important recent insights into the asset management industry. To access the full edition please click on http://www.imrmagazine.com/edhec-special-edition.php.