Persistence in factor-based supervised learning models

emlyon faculty: Dr. Guillaume Coqueret

In this research project, we document the importance of memory in machine learning (ML)-based asset pricing models that rely on firm characteristics. We reach three empirical conclusions. First, the pure out-of-sample fit of the models can be mediocre: some R^2 measures are negative, especially when training samples are short. Second, poor fit does not necessarily matter from an investment standpoint: what actually counts is cross-sectional accuracy. Third, memory is key: portfolios are most profitable when they are based on models driven by strong persistence. Average realized returns are highest when training samples are large and when the horizon of the predicted variable is long.
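
The first two conclusions can be illustrated with a minimal sketch on synthetic numbers. It assumes a common definition of out-of-sample R^2, namely 1 - SSE/SST benchmarked against a zero forecast; the project's exact definition is not stated here. The function `cross_sectional_hit_ratio` is a hypothetical stand-in for cross-sectional accuracy, not the project's own measure: it is the share of assets placed on the correct side of the cross-sectional median.

```python
import numpy as np

def oos_r2(realized, predicted):
    """Out-of-sample R^2 against a zero-forecast benchmark (assumed definition)."""
    realized = np.asarray(realized, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((realized - predicted) ** 2)  # squared forecast errors
    sst = np.sum(realized ** 2)                # squared realized returns
    return 1.0 - sse / sst

def cross_sectional_hit_ratio(realized, predicted):
    """Hypothetical accuracy proxy: share of assets whose predicted return
    falls on the same side of the cross-sectional median as the realized one."""
    realized = np.asarray(realized, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    above_realized = realized > np.median(realized)
    above_predicted = predicted > np.median(predicted)
    return float(np.mean(above_realized == above_predicted))

# Synthetic one-period cross-section: the forecast orders the assets
# correctly but badly overstates the magnitude of returns.
realized = np.array([-0.02, -0.01, 0.00, 0.01, 0.03])
predicted = 5.0 * realized

print("OOS R^2:", oos_r2(realized, predicted))
print("Hit ratio:", cross_sectional_hit_ratio(realized, predicted))
```

In this toy case the R^2 is strongly negative because the forecasts are mis-scaled, yet the hit ratio equals 1 because the ordering of the assets is preserved. A long-short portfolio sorted on such forecasts would hold the same assets as one sorted on realized returns.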