Forecasting with machine learning methods and multiple large datasets

Staff working papers set out research in progress by our staff, with the aim of encouraging comments and debate.
Published on 28 May 2021

Staff Working Paper No. 923

By Nikoleta Anesti, Eleni Kalamara and George Kapetanios
 
This paper considers the usefulness of machine learning techniques for forecasting macroeconomic variables using multiple large datasets. The predictive content of surveys is compared with that of text-based indicators from newspaper articles and a standard macroeconomic dataset, extending the evidence on the contribution of each dataset to predicting economic activity. Among the linear models, Ridge regression and Partial Least Squares consistently report the largest gains at most forecasting horizons. Among the non-linear machine learning models, Support Vector Regression performs better at shorter horizons, whereas Neural Networks and Random Forest yield more accurate forecasts up to two years ahead. Text-based indicators have similar informational content to surveys, although combining the two datasets provides more accurate forecasts at most forecast horizons. The largest forecasting gains are overwhelmingly concentrated at the shorter horizons for the majority of models and datasets, and they decrease significantly after one year. Non-linear machine learning models appear to be useful mainly during the Great Financial Crisis and perform similarly to their linear counterparts in more normal periods.
 
This version was updated in September 2024. The Staff Working Paper was first published on 28 May 2021 under the title ‘Forecasting UK GDP growth with large survey panels’.

 
