Thursday, September 11, 2008

What to do when you have too much data for a macro forecast?

At first sight, the question may seem odd: it is usually a lack of data, rather than an abundance of it, that forecasters particularly dislike (I plan to devote some time to forecasting with poor data in later posts). After all, when you have lots of data, you can usually do something with it, but that "something" might not be a good forecast at all. In other words, in data-rich forecasting, the forecaster's task is to choose the right methodology and come up with an accurate forecast, and a recent paper throws some light on the issue.

Jan J. J. Groen and George Kapetanios (Revisiting Useful Approaches to Data-Rich Macroeconomic Forecasting, FRBNY Staff Report 327) go over the well-known methods for forecasting with large amounts of data, such as factor models (e.g., principal components) and Bayesian ridge regression, both of which are readily available in most software packages nowadays. They also cover forecast combinations. I was most interested in their description of partial least squares regression, which I had not heard of before, so I now have another tool in my forecasting arsenal. But the paper goes well beyond a literature survey: the authors use Monte Carlo experiments as well as forecasting exercises on key US macro series (unemployment, industrial production, the Fed Funds rate and inflation) to show that the new kid (or rather the lesser-known cousin) performs no worse than, and usually better than, the other two methodologies.
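To make the difference between the approaches concrete, here is a minimal Python sketch of my own (on simulated data, not the authors' code or data) comparing the three methods on a toy factor-driven panel: principal components followed by a regression, ridge regression on all predictors, and partial least squares. All names, the data-generating process, and the tuning choices are purely illustrative.

```python
# A minimal sketch (not the authors' code) of three data-rich forecasting
# approaches on simulated data: principal-components factors, ridge
# regression, and partial least squares (PLS).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
T, N = 200, 100                       # 200 periods, 100 candidate predictors
F = rng.standard_normal((T, 3))       # three latent factors drive everything
X = F @ rng.standard_normal((3, N)) + rng.standard_normal((T, N))
y = F @ np.array([1.0, -0.5, 0.3]) + 0.5 * rng.standard_normal(T)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# 1) Factor model: extract principal components of X, regress y on them
pca = PCA(n_components=3).fit(X_train)
ols = LinearRegression().fit(pca.transform(X_train), y_train)
pred_pc = ols.predict(pca.transform(X_test))

# 2) Ridge regression: keep all N predictors, shrink coefficients toward zero
ridge = Ridge(alpha=10.0).fit(X_train, y_train)
pred_ridge = ridge.predict(X_test)

# 3) PLS: build components by their covariance with y, not just variance of X
pls = PLSRegression(n_components=3).fit(X_train, y_train)
pred_pls = pls.predict(X_test).ravel()

for name, pred in [("PC factors", pred_pc), ("Ridge", pred_ridge), ("PLS", pred_pls)]:
    rmse = np.sqrt(np.mean((y_test - pred) ** 2))
    print(f"{name:10s} out-of-sample RMSE: {rmse:.3f}")
```

The key conceptual point the sketch illustrates: principal components summarize the predictors using only the variance of X, whereas partial least squares extracts components that also covary with the target, which is one intuition for why it can forecast no worse than the other approaches.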
