Infectious disease forecasting

John M. Drake & Pej Rohani

What is a forecast?

A forecast is a quantitative statement about an event, outcome, or trend that has not yet been observed, conditional on prior data that has been observed.

Adapted from: Lauer, S.A. et al. 2020. Infectious disease forecasting for public health. In Population Biology of Vector-borne Diseases edited by J.M. Drake, M.B. Bonsall, and M.R. Strand. Oxford University Press.

Typically, it is assumed that forecasts are of future events, but this definition allows for “retrospective” forecasts, including nowcasts.

Probabilistic forecasts are an important special case.
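As an illustration of what a probabilistic forecast can look like in practice, the sketch below represents a forecast as a set of samples from a predictive distribution and summarizes it with a mean, a few quantiles, and an exceedance probability. The function names `summarize_forecast` and `exceedance_probability`, the nearest-rank quantile rule, and the simulated Gaussian samples are assumptions made for this example only; they do not come from the source material.

```python
import math
import random
import statistics


def summarize_forecast(samples, levels=(0.05, 0.5, 0.95)):
    """Summarize predictive samples as a mean plus nearest-rank quantiles."""
    ordered = sorted(samples)
    n = len(ordered)
    quantiles = {}
    for p in levels:
        # Nearest-rank rule: the smallest sample value such that at least
        # a fraction p of all samples are less than or equal to it.
        rank = max(1, math.ceil(p * n))
        quantiles[p] = ordered[rank - 1]
    return {"mean": statistics.fmean(samples), "quantiles": quantiles}


def exceedance_probability(samples, threshold):
    """Fraction of predictive samples strictly above the threshold."""
    return sum(value > threshold for value in samples) / len(samples)


# Hypothetical predictive samples for next week's case count (illustrative only).
random.seed(1)
predictive_samples = [max(0.0, random.gauss(120, 25)) for _ in range(1000)]
print(summarize_forecast(predictive_samples))
print(exceedance_probability(predictive_samples, 150))
```

A point forecast would report only a single number such as the mean; the quantiles and the exceedance probability are what make the statement probabilistic.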


Both mechanistic and statistical models may be used for forecasting.

In terms of forecast accuracy, statistical models typically outperform mechanistic models.

Discussion question: What do you expect to be the strengths and weaknesses of these two kinds of models?

Models used for forecasting infectious diseases


Mechanistic models

  • pro: can predict the effect of interventions
  • con: underperform compared with statistical models


Statistical models

  • pro: high performing, generalizable
  • con: assume a stationary data-generating process
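To make the contrast between the two kinds of models concrete, here is a minimal sketch of one model of each kind. The discrete-time SIR simulator and the least-squares AR(1) fit are illustrative choices, and the function names and parameter values are hypothetical; neither is presented as the method used in any particular forecasting study.

```python
def simulate_sir(beta, gamma, s0, i0, steps):
    """Deterministic discrete-time SIR model; returns new infections per step."""
    n = s0 + i0
    s, i = float(s0), float(i0)
    incidence = []
    for _ in range(steps):
        # New infections cannot exceed the remaining susceptibles.
        new_infections = min(s, beta * s * i / n)
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        incidence.append(new_infections)
    return incidence


def fit_ar1(y):
    """Least-squares fit of y_t = a + b * y_{t-1}; returns (a, b)."""
    x, z = y[:-1], y[1:]
    mean_x = sum(x) / len(x)
    mean_z = sum(z) / len(z)
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxz = sum((v - mean_x) * (w - mean_z) for v, w in zip(x, z))
    b = sxz / sxx
    a = mean_z - b * mean_x
    return a, b


# Mechanistic: an intervention is expressed by changing a parameter (here,
# halving the assumed transmission rate beta) and re-running the model.
baseline = simulate_sir(beta=0.5, gamma=0.2, s0=990, i0=10, steps=50)
intervention = simulate_sir(beta=0.25, gamma=0.2, s0=990, i0=10, steps=50)
print(sum(baseline), sum(intervention))

# Statistical: the model is fit to past incidence and extrapolated one step,
# which implicitly assumes that the fitted relationship continues to hold.
a, b = fit_ar1(baseline[:10])
print(a + b * baseline[9])
```

The mechanistic model can answer "what if transmission is reduced?" because the intervention maps onto a parameter, whereas the autoregressive model has no such handle and relies on the past pattern persisting.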

A 2014 scoping review of influenza forecasting studies found that 17/35 studies used compartmental models and 18/35 studies used statistical models.

Chretien J-P, George D, Shaman J, Chitale RA, McKenzie FE. 2014. Influenza Forecasting in Human Populations: A Scoping Review. PLoS ONE 9(4):e94130. doi:10.1371/journal.pone.0094130

Challenges for infectious disease forecasting

  • System complexity
  • Data sparsity
  • Behavior change
  • Forecasting feedback loop

Forecasting targets

Targets are the unknown (but verifiable) quantities that forecasts make quantitative statements about (i.e. an event, outcome, or trend).

Examples include:

  • Disease incidence, hospitalizations, or deaths at specific (future) times
  • Peak disease incidence, hospitalizations, or deaths
  • The time of peak disease incidence, hospitalizations, or deaths
  • A binary indicator for whether incidence, hospitalizations, or deaths exceed a specified threshold
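The targets listed above can each be computed from a single trajectory of counts. The sketch below shows one way to do so; the function name `forecast_targets`, the convention of numbering time steps from 1, and the example numbers are assumptions for illustration.

```python
def forecast_targets(incidence, threshold):
    """Compute example forecasting targets from one trajectory of counts."""
    peak_value = max(incidence)
    # Time steps are numbered from 1; ties resolve to the earliest peak.
    peak_time = incidence.index(peak_value) + 1
    # Binary indicator: does any value strictly exceed the threshold?
    exceeds = any(value > threshold for value in incidence)
    return {
        "peak_incidence": peak_value,
        "peak_time": peak_time,
        "exceeds_threshold": exceeds,
    }


# Hypothetical weekly case counts.
weekly_cases = [3, 8, 15, 15, 6]
print(forecast_targets(weekly_cases, threshold=10))
```

For a probabilistic forecast, the same function could be applied to each sampled trajectory, yielding a distribution over peak size, peak timing, and threshold exceedance.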


  • Time ranges from 1 to \( T \), typically in regular intervals (days, weeks, etc.), i.e. \( t \in \{1, 2, 3, \ldots, T\} \)
  • Observations (\( y \)) may be indexed by time, i.e. the time series \( \{y_1, y_2, y_3, \ldots, y_T\} \), assumed to be draws from random variables \( Y_1, Y_2, Y_3, \ldots, Y_T \)
  • Covariates may also be indexed by time, i.e. \( x_1, x_2, x_3, \ldots, x_T \), and may be scalar or vector-valued
  • Often we are interested in the predicted value of a target at a future time relative to the present, indicated by a horizon of \( k \) steps, i.e. the value of the target at time \( t+k \), or the \( k \)-step-ahead prediction
  • Targets may or may not be indexed by time and are represented by random variables \( Z \)
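As a minimal sketch of a \( k \)-step-ahead prediction, the example below builds empirical predictive samples for \( y_{t+k} \) from an observed series. It assumes a random-walk (persistence) baseline, in which the forecast is the last observation plus a historically observed \( k \)-step change. That baseline and the function name `k_step_ahead_samples` are illustrative assumptions, not a method prescribed by the source.

```python
def k_step_ahead_samples(y, k):
    """Empirical predictive samples for y_{t+k} under a random-walk assumption.

    Each sample is the last observation plus one k-step change seen in the
    observed series y_1, ..., y_t.
    """
    if k < 1 or k >= len(y):
        raise ValueError("k must satisfy 1 <= k < len(y)")
    changes = [y[i + k] - y[i] for i in range(len(y) - k)]
    return [y[-1] + change for change in changes]


# Hypothetical observed series y_1, ..., y_5.
observed = [10, 12, 15, 14, 18]
print(k_step_ahead_samples(observed, k=1))
print(k_step_ahead_samples(observed, k=2))
```

Longer horizons leave fewer historical \( k \)-step changes to draw on, and this simple baseline can produce negative values for count data, so a real application would need a more careful model.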

The goal