Statistical Forecasts in Action


Last year at about this time, the World Climate Service highlighted a new effort to develop a statistical forecast capability for the subseasonal time frame (weeks in advance), and last summer a new World Climate Service product was released. Since then, the statistical forecast scheme, known as Sub-R, has been delivering week 3-6 temperature forecasts for the contiguous US, Europe, and eastern Asia, with daily updates; these forecasts have been available to WCS customers on our subseasonal climate forecast portal.

Now that a complete winter is “under our belt” with subseasonal statistical forecasting, it’s worth looking at some verification statistics. We’re particularly interested in how Sub-R performed in comparison to the dynamical model forecasts, and especially the leading ECMWF subseasonal model. It was a tough winter for the dynamical models over the US, but in Europe the models reliably anticipated the persistent and remarkable warmth that dominated most of the continent throughout the winter.

Statistical Forecasts – Subseasonal Skill Evaluation

Looking first at the contiguous US (CONUS), the chart below shows that overall the week 3 and week 4 Sub-R statistical forecasts were as good as (week 3) or better than (week 4) the ECMWF. This isn’t saying much, of course, with very low anomaly correlation scores for both, but nevertheless it’s good to see positive skill from Sub-R. And it’s worth highlighting just how bad the ECMWF was: forecasters desperately need independent guidance with demonstrable skill.
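For readers unfamiliar with the anomaly correlation metric used throughout this post, here is a minimal sketch of how it is computed. The function and the toy arrays below are illustrative assumptions, not the actual WCS verification code, which operates on full gridded temperature fields:

```python
import numpy as np

def anomaly_correlation(forecast_anom, observed_anom):
    """Centered anomaly correlation coefficient (ACC) between a forecast
    anomaly field and the verifying observed anomaly field, both passed
    as 1-D arrays over the same set of grid points."""
    f = forecast_anom - forecast_anom.mean()
    o = observed_anom - observed_anom.mean()
    return float(np.sum(f * o) / np.sqrt(np.sum(f ** 2) * np.sum(o ** 2)))

# Toy example: a forecast that captures the broad anomaly pattern
obs = np.array([1.5, 0.8, -0.3, -1.2, 0.4])
fcst = np.array([1.0, 0.5, -0.1, -0.9, 0.2])
print(round(anomaly_correlation(fcst, obs), 2))  # → 0.99
```

A score of +1 means the forecast anomaly pattern matches the observations perfectly (regardless of amplitude), 0 means no relationship, and subseasonal forecasts typically land well below +0.5.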

In Europe the situation was vastly different, with high skill scores from the ECMWF in middle and especially late winter. It’s hard to beat a forecast that is warm everywhere all the time when it turns out to be correct. But Sub-R did very well too, with the notable exception of a couple of bad “drop-out” periods in January. These drop-outs are a topic of keen interest and will be a focus of active research in the coming months, especially given that Sub-R was extremely successful in February and the first half of March.

And here’s a look at eastern Asia (from India and Southeast Asia across China to Japan and far eastern Russia). For this domain, the ECMWF was more often superior to Sub-R, and again we see a few notable drop-outs in Sub-R performance, but nevertheless the Sub-R anomaly correlations of about +0.30 are respectable.

While these results might be described as “encouraging” or “interesting”, perhaps not all readers would regard them as “compelling”. But let’s think about how an independent forecast like Sub-R can be used.

In practice, a long-lead dynamical model forecast like ECMWF always has very modest skill, and it’s often difficult to assess whether the forecast is “believable” or not. Of course, using calibrated probabilities is the best way to assess the model signal (as we’ve argued elsewhere), but the fact remains that the model forecast for weeks 3 and beyond is usually (not always) weak and essentially lacking in confidence.

And that’s where Sub-R comes in. If we look at the forecast verification from winter 2019-2020, we find that both sets of forecasts were more successful when the two independent systems agreed with each other – see the table below. Note that in every instance the ECMWF forecasts were more skillful when the sign of the Sub-R anomaly was the same as that of the ECMWF.

The obvious conclusion is that Sub-R provides an independent way to refine confidence in the dynamical model forecasts. When Sub-R agrees, then the ECMWF forecast is more credible and the forecaster should be more confident; when Sub-R disagrees, then confidence should be lower.
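The sign-agreement stratification described above can be sketched as follows. This is a simplified illustration with synthetic data, not the WCS methodology itself: each entry stands for one forecast case (e.g. an area-averaged week 3 temperature anomaly), and the data-generating assumptions are purely hypothetical:

```python
import numpy as np

def skill_by_agreement(ecmwf_anom, subr_anom, observed_anom):
    """Correlation skill of ECMWF anomalies against observations,
    stratified by whether Sub-R agreed with ECMWF on the sign of
    the anomaly for each forecast case."""
    agree = np.sign(ecmwf_anom) == np.sign(subr_anom)
    def corr(mask):
        return float(np.corrcoef(ecmwf_anom[mask], observed_anom[mask])[0, 1])
    return corr(agree), corr(~agree)

# Hypothetical cases: both systems are noisy, independent views of a
# common predictable signal, so sign agreement tends to flag the cases
# where that signal is strong
rng = np.random.default_rng(0)
signal = rng.normal(size=500)
ecmwf = signal + rng.normal(scale=0.8, size=500)
subr = signal + rng.normal(scale=0.8, size=500)
obs = signal + rng.normal(scale=0.5, size=500)
agree_skill, disagree_skill = skill_by_agreement(ecmwf, subr, obs)
```

In this synthetic setup the agree-subset skill comes out higher than the disagree-subset skill for the same reason argued in the text: agreement between two independent systems is more likely when the underlying atmospheric signal is strong, which is exactly when forecasts verify well.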

These results from winter 2019-2020 are completely consistent with the earlier conclusion we drew from a study of hindcast (out-of-sample) Sub-R performance for 2010-2018. When we looked at a dynamical model blend (multi-model ensemble, MME) and conditioned the skill calculation on whether Sub-R agreed or not, we found a very substantial increase in dynamical model skill when Sub-R showed the same sign of the temperature anomaly. The figure below illustrates this for week 3 CONUS forecasts initialized in May through July, when Sub-R appears to be most skillful.

In summary, the performance of the realtime Sub-R forecasts over winter 2019-2020 demonstrates conclusively that Sub-R has significant value in providing independent confirmation of the subseasonal dynamical model forecasts. We believe the Sub-R prediction system is identifying periods of time when the atmosphere is more predictable than at other times, and long-lead forecasters should not be without this valuable independent guidance.

Sign up today for a World Climate Service trial, and join the growing number of users who have an edge over the rest of the market.