“Simple” forecasting methods are still better


In 1979, the Greek academic Spyros Makridakis published an article in the Journal of the Royal Statistical Society. The article showed that simple (traditional statistical) methods of forecasting outperformed more complex methods. At the time, the increase in readily accessible computing power had allowed newer, computational, and apparently more accurate methods of forecasting to proliferate in business and academia. Makridakis, who was by this stage a Professor at INSEAD, was sceptical about the true value of these more complex forecasting methods, which prompted him to embark on the extensive empirical study behind the article.

The results of Makridakis’s study were widely criticised at the time, primarily due to the belief that complexity and greater computing power must produce more accurate results. The subsequent disagreement led to the creation of the “M” series of forecasting competitions, which set out to find the most accurate forecasting methods for different types of predictions. There have been four “M-Competitions” since 1982, with the most recent concluding in May of this year. The winner of this competition will be announced in October.

Fast forward 40 years and a similar debate has been sparked by a paper from the same author. In the latest study, which was released in March of this year, Makridakis and his colleagues compared the effectiveness of eight Machine Learning (“ML”) methods with that of eight simpler statistical methods used for time series forecasting.

The most recent study was prompted by the increasing popularity of ML methods in forecasting, despite the lack of sufficient evidence of their superiority over traditional, statistical methods. For this reason, the authors set out to test whether their original assertion, that simple forecasting methods are more accurate than complex computational ones, was still true.

Key Terms

In broad terms, the authors sought to measure forecast performance against two metrics: accuracy and computational requirements. To help define these metrics, the authors use a number of key terms that outline how the measurements were made.

  • Computational Complexity – The time needed to train a given ML model using historic data.
  • Model Fitting – A measure of how well a model “fits” historic data.
  • Measures of Accuracy – Two measures of forecasting performance are used in the study:
    • symmetric Mean Absolute Percentage Error (sMAPE)
    • Mean Absolute Scaled Error (MASE)
  • Data Preprocessing – The work that needs to be done on a data set before it can be used for ML modelling purposes.
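The two accuracy measures can be sketched in a few lines of Python. This is a minimal illustration based on the commonly used definitions of sMAPE and MASE, not code from the paper: the function names, the `season` parameter and the example numbers are all assumptions made for illustration. MASE here scales the forecast error by the in-sample error of a (seasonal) naïve forecast, so a value below 1 means the forecast beat that benchmark on average.

```python
def smape(actual, forecast):
    """Symmetric MAPE, in percent, averaged over the forecast horizon.

    Assumes no pair of actual and forecast values is zero at the same time,
    since that would make the denominator zero.
    """
    terms = [
        2.0 * abs(a - f) / (abs(a) + abs(f))
        for a, f in zip(actual, forecast)
    ]
    return 100.0 * sum(terms) / len(terms)


def mase(actual, forecast, history, season=1):
    """Mean absolute forecast error scaled by the in-sample naive error.

    `season` is the seasonal period (1 for non-seasonal data); the scale is
    the average absolute change between points `season` steps apart.
    """
    mean_abs_error = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [
        abs(history[t] - history[t - season])
        for t in range(season, len(history))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    return mean_abs_error / scale


# Illustrative, made-up numbers (not taken from the study).
history = [100, 110, 105, 115]
actual = [120, 125]
forecast = [115, 115]

print(round(smape(actual, forecast), 2))
print(round(mase(actual, forecast, history), 2))
```

Both measures are scale-independent, which is what allows errors to be averaged across many different time series.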

Key Findings

The paper itself stretches to over 20 pages, with a detailed breakdown and comparison of the accuracy of the different forecasting methods analysed. Some of the key findings of the study are:

1. Simple is Still Better

The research contained in this study showed that the simpler statistical forecasting methods “dominated” ML methods across both accuracy measures used, for all forecasting horizons examined. The simplest method of statistical forecasting was the naïve method, which is essentially a rollover of historical data. The naïve method performed better than all but three of the ML methods for a one-step-ahead forecast, using both measures of accuracy, with far less Computational Complexity.
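To show how little is involved in a naïve forecast, here is a minimal Python sketch. It assumes the usual textbook reading of “rollover”: the plain version repeats the last observed value, and the seasonal version repeats the last full seasonal cycle. The function names are hypothetical, and the exact naïve variant used in the study may differ (for example, it may apply a seasonal adjustment first).

```python
def naive_forecast(history, horizon):
    """Repeat the most recent observation for every future step."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season):
    """Repeat the most recent full seasonal cycle across the horizon."""
    last_cycle = history[-season:]
    return [last_cycle[step % season] for step in range(horizon)]


# Illustrative, made-up series with a period of 3.
series = [1, 2, 3, 4, 5, 6]
print(naive_forecast(series, 2))              # [6, 6]
print(seasonal_naive_forecast(series, 4, 3))  # [4, 5, 6, 4]
```

There is nothing to train here, which is why the computational cost is negligible compared with fitting an ML model.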

2. Computational Complexity

The authors highlight the excessive computational complexity of some of the ML methods as a barrier to their practical application. For these methods to be used in business and other fields, “their computational requirements must be reduced considerably.” The report suggests deseasonalizing data, using simpler models and limiting the number of training iterations as ways to reduce computational complexity.

3. The Best Fitting Models Don’t Produce Best Forecasts

ML forecasting techniques typically “fit” a line or a model to historical data and use this to extrapolate into the future. One measure of the effectiveness of an ML technique is how closely it can fit this model to historical data. This study shows that methods or models that best fit a data set did not necessarily result in more accurate forecasts.
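A toy example makes the gap between fit and forecast concrete. The Python sketch below uses made-up numbers and two deliberately simple models, neither of which comes from the study: one predicts the historical mean, the other “connects the dots”, reproducing every historical point exactly and extrapolating the last observed change. All names are assumptions made for illustration.

```python
def mean_abs_error(actual, predicted):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up noisy series around a stable level, plus two held-out points.
history = [10, 12, 9, 11, 10, 13, 9, 11]
future = [10, 11]

# Model A: forecast the historical mean everywhere.
mean_level = sum(history) / len(history)
mean_fit = [mean_level] * len(history)
mean_forecast = [mean_level] * len(future)

# Model B: reproduce history exactly, then extend the last observed change.
dots_fit = list(history)
last_change = history[-1] - history[-2]
dots_forecast = [history[-1] + last_change * (step + 1) for step in range(len(future))]

mean_in = mean_abs_error(history, mean_fit)
dots_in = mean_abs_error(history, dots_fit)
mean_out = mean_abs_error(future, mean_forecast)
dots_out = mean_abs_error(future, dots_forecast)

print("in-sample error:     mean =", mean_in, " dots =", dots_in)
print("out-of-sample error: mean =", mean_out, " dots =", dots_out)
```

On these numbers the connect-the-dots model has zero in-sample error but the larger out-of-sample error, because it has fitted the noise rather than the underlying level.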

4. Need to Crack Open the Black Box

One of the key suggestions in this paper is that for ML forecasting methods to become useful in practical business applications, the way they work and how they produce results need to be clearer to users. The researchers stated that “obtaining numbers from a black box is not acceptable to practitioners who need to know how forecasts arise and how they can be influenced or adjusted to arrive at workable predictions.”

5. Automate Preprocessing Tasks

The preprocessing of historical data is time-consuming and requires decisions by the user that add to the complexity of the overall forecasting process. The automation of these preprocessing tasks is seen as key to ML forecasting techniques becoming useful to users on a day-to-day basis.

Working towards Specialised Algorithms

While the paper shines a light on the shortcomings of ML forecasting methods, Makridakis and his colleagues do highlight the “great potential of ML for forecasting applications.” The conclusion of this paper reiterates this point and mentions that “specialised algorithms”, unique to forecasting, may be required to justify ML as a viable forecasting technique. For now, however, the most effective methods of time series forecasting are the simple, statistical ones.

Using the Latest Technologies

At CashAnalytics we believe, as is alluded to in the paper, that forecasting is like no other business discipline or task. Unlike many other business processes, the end result is measured in degrees of accuracy (and other factors), rather than as a binary right or wrong.

As a technology company, and as dedicated liquidity forecasting specialists, it is our duty to be at the cutting edge to ensure our clients benefit from the best that technology has to offer. This means thoroughly testing the application of new technologies, and measuring their advantages and disadvantages. Like Makridakis, we believe that empirical analysis is the best measure of effectiveness.