John M. Drake & Pej Rohani

*A forecast is a quantitative statement about an event, outcome, or trend that has not yet been observed, conditional on prior data that has been observed.*

Adapted from: Lauer, S.A. et al. 2020. Infectious disease forecasting for public health. In *Population Biology of Vector-borne Diseases* edited by J.M. Drake, M.B. Bonsall, and M.R. Strand. Oxford University Press.

Typically, it is assumed that forecasts are of *future* events, but this definition allows for “retrospective” forecasts, including nowcasts.

Probabilistic forecasts are an important special case.

Both mechanistic and statistical models may be used for forecasting.

Statistical models typically outperform mechanistic models.

**Discussion question:** What do you expect to be the strengths and weaknesses of these two kinds of models?

Mechanistic

- pro: can predict the effect of intervention
- con: underperform compared with statistical models

Statistical

- pro: high performing, generalizable
- con: assume stationary data-generating process

A 2014 scoping review of influenza forecasting found that 17 of 35 studies used compartmental models and 18 of 35 used statistical models.

Chretien J-P, George D, Shaman J, Chitale RA, McKenzie FE 2014. Influenza Forecasting in Human Populations: A Scoping Review. *PLoS ONE* 9(4):e94130. doi:10.1371/journal.pone.0094130

- System complexity
- Data sparsity
- Behavior change
- Forecasting feedback loop

*Targets* are the unknown (but verifiable) quantities that forecasts make quantitative statements about (i.e. an event, outcome, or trend).

Examples include:

- Disease incidence, hospitalizations, or deaths at specific (future) times
- Peak disease incidence, hospitalizations, or deaths
- The time of peak disease incidence, hospitalizations, or deaths
- A binary indicator for whether incidence, hospitalizations, or deaths exceed a specified threshold
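As a rough illustration of the target types listed above, the following sketch computes a peak, a peak time, and a threshold indicator from a series once it has been fully observed (which is what makes these targets verifiable). The function name `forecast_targets`, the made-up weekly counts, and the reading of the threshold indicator as "exceeded at any time in the series" are assumptions for illustration only.

```python
from typing import Sequence


def forecast_targets(series: Sequence[float], threshold: float) -> dict:
    """Compute example targets from an observed series (hypothetical helper)."""
    if not series:
        raise ValueError("series must be non-empty")
    peak = max(series)
    # Time of peak, using 1-based time indices; ties resolve to the first peak.
    peak_time = series.index(peak) + 1
    return {
        "peak": peak,
        "peak_time": peak_time,
        # Assumption: the indicator asks whether the threshold is ever exceeded.
        "exceeds_threshold": any(value > threshold for value in series),
    }


# Made-up weekly case counts, for illustration only.
weekly_cases = [12, 30, 75, 140, 90, 41]
targets = forecast_targets(weekly_cases, threshold=100)
```

A forecast would make a statement about these quantities before the series is complete; the values computed here are what that statement would later be checked against.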

- Time ranges from 1 to \( T \), typically in regular intervals (days, weeks, etc.), i.e. \( t \in \{1, 2, 3, \ldots, T\} \)
- *Observations* (\( y \)) may be indexed by time, i.e., the *time series* \( \{y_1, y_2, y_3, \ldots, y_T\} \), assumed to be draws from random variables \( Y_1, Y_2, Y_3, \ldots, Y_T \)
- *Covariates* may also be indexed by time, i.e. \( x_1, x_2, x_3, \ldots, x_T \), and may be scalar or vector-valued
- Often we are interested in the predicted value of a target at a future *relative* time, indicated by a *lag* of \( k \), i.e. the value of the target at time \( t+k \) or the \( k \)-step-ahead prediction
- *Targets* may or may not be indexed by time and are represented by random variables \( Z \).
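To make the \( k \)-step-ahead notation concrete, here is a minimal sketch that pairs each observation \( y_t \) with the value \( y_{t+k} \) it would be used to predict. It also includes a naive persistence forecast, which simply carries the last observation forward. The persistence rule is not part of the notes; it is a simple stand-in chosen for illustration, and the function names and example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def k_step_pairs(y: Sequence[float], k: int) -> List[Tuple[float, float]]:
    """Pair each y_t with the value k steps later, y_{t+k}."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # y[:-k] holds y_1..y_{T-k}; y[k:] holds y_{1+k}..y_T.
    return list(zip(y[:-k], y[k:]))


def persistence_forecast(y: Sequence[float], k: int) -> float:
    """Naive k-step-ahead point forecast: predict y_{T+k} as y_T."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not y:
        raise ValueError("y must be non-empty")
    return y[-1]


# Made-up observations y_1..y_4, for illustration only.
observed = [5, 8, 13, 21]
pairs = k_step_pairs(observed, k=2)
forecast = persistence_forecast(observed, k=2)
```

Pairs like these are one common way to evaluate a \( k \)-step-ahead forecasting rule on historical data: the first element is what was known at time \( t \), and the second is the outcome at time \( t+k \).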