John M. Drake & Pej Rohani
A forecast is a quantitative statement about an event, outcome, or trend that has not yet been observed, conditional on prior data that has been observed.
Adapted from: Lauer, S.A. et al. 2020. Infectious disease forecasting for public health. In Population Biology of Vector-borne Diseases edited by J.M. Drake, M.B. Bonsall, and M.R. Strand. Oxford University Press.
Typically, it is assumed that forecasts are of future events, but this definition allows for “retrospective” forecasts, including nowcasts.
Probabilistic forecasts are an important special case.
Both mechanistic and statistical models may be used for forecasting.
Statistical models typically outperform mechanistic models.
Discussion question: What do you expect to be the strengths and weaknesses of these two kinds of models?
Mechanistic
Statistical
A 2014 scoping review of influenza forecasting found that 17 of 35 studies used compartmental models and 18 of 35 used statistical models.
Chretien J-P, George D, Shaman J, Chitale RA, McKenzie FE 2014. Influenza Forecasting in Human Populations: A Scoping Review. PLoS ONE 9(4):e94130. doi:10.1371/journal.pone.0094130
Targets are the unknown (but verifiable) quantities that forecasts make quantitative statements about (i.e. an event, outcome, or trend).
Examples include:
In probabilistic forecasting, we typically seek a predictive density function \( f(z_t|y_{1:t-k}, t, x_{1:t-k}) \) such that \( \int f(z) dz = 1 \).
Data: Percent weighted influenza-like illness (wILI) from the US Outpatient Influenza-like Illness Surveillance Network
Prediction targets: Epidemic onset, Peak height, Peak week, Duration (how long wILI remains above the 2013-2014 national baseline of 2%)
Bayes' Theorem:
\[ P(A|B) = \frac{P(B|A)P(A)}{P(B)} \]
In Bayesian Statistics, \( P(A) \) is known as the prior.
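As a small worked illustration, the sketch below applies Bayes' theorem, \( P(A|B) = P(B|A)P(A)/P(B) \), to a set of mutually exclusive and exhaustive hypotheses. The function name `bayes_posterior` and all numbers (a 0.3 prior for a "severe" season, and the two likelihoods) are hypothetical values chosen for illustration only; they do not come from any data set.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(A_i | B) for mutually exclusive, exhaustive hypotheses A_i.

    priors[i]      = P(A_i)
    likelihoods[i] = P(B | A_i)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(B), by the law of total probability
    return [j / evidence for j in joint]


# Hypothetical example: A_1 = "severe season", A_2 = "mild season",
# B = "high wILI observed early in the season".
priors = [0.3, 0.7]        # assumed prior probabilities
likelihoods = [0.8, 0.2]   # assumed P(B | A_i)
post = bayes_posterior(priors, likelihoods)
print(post)
```

With these assumed inputs the posterior probability of a severe season is 0.24/0.38, about 0.63, so the observation roughly doubles the prior belief of 0.3.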
Five procedures
Model past seasons' epidemic curves as smoothed versions plus noise. (“quadratic trend filtering”)
Construct prior probability for the current season's epidemic curve by considering “transformations” of past seasons' curves.
Estimate what the wILI values in the recent past will be after their final revisions, using non-final wILI and Google Flu Trends data.
Weight possibilities for the current season's epidemic curve using estimates of final revised wILI.
Calculate forecasting targets for each possibility; report results.
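The weighting and target-calculation steps can be sketched in a few lines. This is a deliberately simplified stand-in, not the procedure of Brooks et al.: it assumes a uniform prior over a handful of candidate curves, independent Gaussian observation noise with a made-up standard deviation `sigma`, and invented wILI values. The function names are hypothetical.

```python
import math


def gaussian_loglik(observed, curve, sigma):
    """Log-likelihood of the observations given a candidate curve,
    assuming independent Gaussian noise (an illustrative assumption)."""
    return sum(
        -0.5 * ((o - c) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        for o, c in zip(observed, curve)  # zip stops at the last observed week
    )


def weight_candidates(observed, candidates, sigma=0.3):
    """Posterior weights for candidate curves under a uniform prior."""
    logw = [gaussian_loglik(observed, c, sigma) for c in candidates]
    m = max(logw)  # subtract the maximum for numerical stability
    w = [math.exp(l - m) for l in logw]
    total = sum(w)
    return [x / total for x in w]


def peak_week_distribution(candidates, weights):
    """Probability of each peak week (0-based index) across the candidates."""
    dist = {}
    for c, w in zip(candidates, weights):
        week = max(range(len(c)), key=lambda i: c[i])
        dist[week] = dist.get(week, 0.0) + w
    return dist


# Invented candidate curves standing in for transformed past seasons.
candidates = [
    [1.0, 1.5, 2.5, 4.0, 3.0, 2.0],
    [1.0, 1.2, 1.8, 2.6, 3.5, 2.8],
    [1.0, 2.0, 3.5, 3.0, 2.0, 1.5],
]
observed = [1.0, 1.4, 2.4]  # invented wILI values for the first three weeks

weights = weight_candidates(observed, candidates)
peak_dist = peak_week_distribution(candidates, weights)
print(weights, peak_dist)
```

The first candidate tracks the observations most closely, so it receives most of the weight and its peak week dominates the resulting distribution for the peak-week target.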
Retrospective empirical Bayes national forecasts for 2013–2014 at different epidemiological weeks, using the final revisions of wILI values.
Brooks LC, Farrow DC, Hyun S, Tibshirani RJ, Rosenfeld R (2015) Flexible modeling of epidemics with an empirical Bayes framework. PLoS Computational Biology 11(8): e1004382.
Data
Prediction targets:
Time series of forecasting targets and range of revisions. Points are the California time series of COVID-19 indicators which our model is designed to forecast in the version of the data used to fit the model. The vertical line range gives the range of values that each observation had in all versions of the data used to fit our model.
Indicators of the time spent at home. Grey points are the original data provided by Google of the amount of time people in California spent in residential areas relative to a pre-COVID-19 period. Black points are an example time series of the derived version which we use as a predictor of exposure in our model. Holiday effects in the original data are retained but weekly periodicity is removed.
COVID-19 vaccination doses administered in California over time.
Flow diagram of differential equations for expected values of state variables.
Local linearization (van Kampen expansion) supplies a differential equation for the covariance in fluctuations
Estimation using extended Kalman filter
Effective reproduction number of the fitted model for California.
Overall performance of short-term forecasts of cases and deaths. Lower scores indicate better performance. The GISST forecast consistently outperforms the COVID-19 Forecast Hub Baseline forecast and performs best for four week ahead death forecasts.
Overall performance of short-term forecasts of hospital admissions. Lower scores indicate better performance. The GISST forecasts score better than the COVIDhub Ensemble forecasts at all horizons.
Ensemble models (commonly used in weather forecasting) combine the forecasts of multiple models into a single forecast, e.g. by averaging the individual model predictions weighted by past performance.
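One simple way to build such an ensemble is a linear pool: a weighted average of the component models' predictive probabilities. The sketch below assumes each model reports probabilities over the same bins and that weights are proportional to the exponential of each model's past mean log score. That weighting rule is only one of many possible choices, and the function names and numbers are hypothetical.

```python
import math


def performance_weights(mean_log_scores):
    """Weights proportional to exp(mean log score), normalized to sum to 1.
    This rule is an illustrative assumption, not a standard prescription."""
    m = max(mean_log_scores)  # subtract the maximum for numerical stability
    raw = [math.exp(s - m) for s in mean_log_scores]
    total = sum(raw)
    return [r / total for r in raw]


def linear_pool(forecasts, weights):
    """Weighted average of binned predictive distributions."""
    n_bins = len(forecasts[0])
    return [
        sum(w * f[b] for w, f in zip(weights, forecasts))
        for b in range(n_bins)
    ]


# Two hypothetical models forecasting the same three bins.
forecasts = [
    [0.1, 0.6, 0.3],
    [0.3, 0.4, 0.3],
]
past_scores = [-1.0, -1.5]  # invented mean log scores; higher is better

weights = performance_weights(past_scores)
ensemble = linear_pool(forecasts, weights)
print(weights, ensemble)
```

Because the weights are non-negative and sum to one, the pooled forecast is itself a valid probability distribution over the bins.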
A forecasting system comprises an online algorithm to estimate \( f \), a target prediction function to transform \( f \) into the desired form for output (e.g. the probability that the target will be within a particular range, referred to as bins), and an estimate of performance.
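The target prediction function can be illustrated with a minimal sketch that converts draws from a predictive distribution into bin probabilities. It assumes the predictive density \( f \) is represented by samples; the function name `bin_probabilities`, the sample values, and the bin edges are all hypothetical.

```python
def bin_probabilities(samples, edges):
    """Fraction of predictive samples falling in each bin.

    Bins are [edges[i], edges[i+1]), except the last bin, which also
    includes its upper edge. Samples outside all bins are not counted,
    so the returned probabilities may sum to less than 1.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for s in samples:
        for i in range(n_bins):
            upper_ok = s < edges[i + 1] or (i == n_bins - 1 and s == edges[-1])
            if edges[i] <= s and upper_ok:
                counts[i] += 1
                break
    return [c / len(samples) for c in counts]


# Invented draws of peak wILI (percent) and invented bin edges.
samples = [1.2, 2.1, 2.4, 3.7, 2.9, 4.0]
edges = [0.0, 2.0, 3.0, 5.0]
probs = bin_probabilities(samples, edges)
print(probs)
```

Here one of six samples falls in the first bin, three in the second, and two in the third, giving probabilities of 1/6, 1/2, and 1/3.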
George, D.B. et al. 2019. Technology to advance infectious disease forecasting for outbreak management. Nature Communications 10:3932 https://doi.org/10.1038/s41467-019-11901
Evaluation metrics should be defined and finite for all conceivable applications.
Forecasts should be evaluated using proper scoring rules.
Forecast accuracy should be evaluated using out-of-sample observations.
Log score: \( LogS = \frac{1}{T} \sum_{t=1}^T \log \hat{p}(z_t) \)
Continuous ranked probability score: \( CRPS = \int_{-\infty}^{\infty} \left( F(z) - \mathcal{H}(z-y) \right)^2 dz \) where \( F(z) \) is the forecasted cumulative distribution function, \( y \) is the observed value, and
\[ \mathcal{H}(z) = \begin{cases} 1,& \text{if } z \geq 0\\ 0, & \text{if } z < 0 \end{cases} \]
is known as the Heaviside function. The CRPS generalizes the concept of the absolute error of a point prediction to the case of the function-valued probabilistic forecast.
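For a forecast represented by samples, the CRPS integral can be evaluated exactly because the empirical CDF is a step function. The sketch below assumes \( F \) is the empirical CDF of the samples and takes the Heaviside step \( \mathcal{H}(z-y) \) to equal 1 when \( z \geq y \) and 0 otherwise; the function name `crps_empirical` and the numbers are hypothetical.

```python
def crps_empirical(samples, y):
    """CRPS of an empirical (sample-based) forecast against observation y.

    Integrates (F(z) - H(z - y))^2 piecewise, where F is the empirical CDF
    and H(z - y) is 1 for z >= y and 0 otherwise. Both functions are
    constant between consecutive breakpoints, so the integral is a finite sum.
    """
    xs = sorted(samples)
    n = len(xs)
    points = sorted(set(xs + [y]))  # breakpoints of F and of the step at y
    total = 0.0
    for left, right in zip(points, points[1:]):
        F = sum(1 for x in xs if x <= left) / n  # value of F on [left, right)
        H = 1.0 if left >= y else 0.0            # value of the step on [left, right)
        total += (F - H) ** 2 * (right - left)
    return total


score = crps_empirical([1.0, 2.0, 3.0], 2.0)  # invented samples and observation
point = crps_empirical([3.0], 5.0)            # a single-sample (point) forecast
print(score, point)
```

For the three-sample forecast the score is 2/9. For the single-sample forecast the score equals the absolute error \( |3 - 5| = 2 \), which shows how the CRPS reduces to a familiar point-forecast error. Lower scores are better.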
To evaluate performance over multiple observations, the scores may be averaged, e.g.
\( \frac{1}{T} \sum_{t=1}^T \log \hat{p}(z_t) \).
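A minimal sketch of the averaged log score follows. It assumes each forecast is summarized by the probability \( \hat p(z_t) \) it assigned to the outcome that was eventually observed. The optional `floor` argument is a hypothetical device, not part of the definition, for keeping the score finite when a forecast assigned zero probability to the observed outcome.

```python
import math


def mean_log_score(probs_assigned, floor=None):
    """Average of log p_hat(z_t) over T forecasts (higher is better).

    probs_assigned[t] is the probability the forecast at time t gave to the
    observed outcome. If `floor` is set, probabilities are truncated from
    below at that value (an illustrative convention, not part of the score).
    """
    scores = []
    for p in probs_assigned:
        if floor is not None:
            p = max(p, floor)
        scores.append(math.log(p) if p > 0 else float("-inf"))
    return sum(scores) / len(scores)


avg = mean_log_score([0.5, 0.25])                    # invented probabilities
unbounded = mean_log_score([0.5, 0.0])               # one forecast missed entirely
bounded = mean_log_score([0.5, 0.0], floor=1e-10)    # same forecasts with a floor
print(avg, unbounded, bounded)
```

Without a floor, a single zero-probability forecast drives the average to negative infinity, which is one reason the requirement that metrics be defined and finite matters in practice.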
Brooks LC, Farrow DC, Hyun S, Tibshirani RJ, Rosenfeld R. 2015. Flexible modeling of epidemics with an empirical Bayes framework. PLoS Computational Biology 11:e1004382.
Gibson GC, Moran KR, Reich NG, Osthus D. 2021. Improving probabilistic infectious disease forecasting through coherence. PLoS Computational Biology 17:e1007623.
O'Dea EB, Drake JM. 2022. A semi-parametric, state-space compartmental model with time-dependent parameters for forecasting COVID-19 cases, hospitalizations and deaths. Journal of the Royal Society Interface 19:20210702.
Licensed under the Creative Commons attribution-noncommercial license, http://creativecommons.org/licenses/by-nc/3.0/. Please share and remix noncommercially, mentioning its origin.