Stochastic models

John M. Drake & Pej Rohani

Epidemiological data are variable/noisy

plot of chunk unnamed-chunk-1

Epidemiological data are variable/noisy

Noise is addressed using stochastic models

Two types of noise:

Observation error: the data are probabilistically related to the true state of the system
Process noise: the system progresses probabilistically

We distinguish further between two types of process noise:

Environmental stochasticity: some parameter is a random variable
Demographic stochasticity: individual-level chance events

Discussion question. What are the relationships among the following words/concepts: error, noise, probabilistic, random, stochastic, variability/variation?

The SIR model is a continuum approximation

plot of chunk unnamed-chunk-2

The \( SIR \) model, \( dY/dt = \beta XY/N - \gamma Y \), implies that changes in the states \( X \), \( Y \), and \( Z \) are continuous. But, in reality individuals are either susceptible, infected, or recovered so that \( X \), \( Y \), and \( Z \) are integer-valued and changes in the system state occur as discrete steps. The differential equation is an idealization.

Demographic stochasticity

What we seek is a stochastic model for which the system of ODEs is an appropriate idealization
There are an infinite number of such models, but the simplest one is a continuous-time, discrete-space Markov Chain with propensities given by the various terms in the differential equations
This model may also be interpreted as an event-driven model with state transition probabilities

Propensities

Rate	Transition
\( \beta XY/N \)	\( X \rightarrow X-1 \), \( Y \rightarrow Y+1 \)
\( \gamma Y \)	\( Y \rightarrow Y-1 \), \( Z \rightarrow Z+1 \)

Other kinds of stochastic models

Continuous and discrete time branching processes
Stochastic differential equations may be found as a continuum approximation to a discrete space model or to model environmental stochasticity
Reed-Frost model
Agent-based models, etc.

A stochastic differential equation (SDE):

\[ dX(t) = f(t) X(t) dt + h(t) X(t) dW(t) \]

SDEs require stochastic calculus (Ito or Stratonovich) to solve.

Note: Lots of scientific papers gloss over the mathematical details of the models they report, but best practice is to be explicit.

Writing down the model mathematically

The Master Equation is a system of differential equations for the flow in the probability that he system will be in any given (multi-dimensional) state

\[ dP_k/dt = \sum_l A_{kl}P_l \]

where \( A \) is a matrix of transition propensities.

This approach is only tractable for very simple models (e.g. \( SI \) and \( SIS \) epidemics)

Simulating the model

Exact simulation is straightforward via Gillespie's Direct method. To simulate a sample path:

Initialize
Iteration of a two step process
- Determine time of the next event
- Determine change of state at the next event time
Summarize

Step 1. Time to next event

Given system state \( N \), let \( R(N) \) be the sum of all the propensities for all changes of state and \( G_N(s) \) be the probability that no event occurs in subsequent time interval \( s \) for system state \( N \).

By the Markov assumption

\[ \tiny{ \begin{aligned} G_N(s + \delta s) &= Pr \left\{ \mbox{no event in } (t, t+s+\delta s) \right\} \\ &= Pr \left\{ \mbox{no event in } (t, t+s) \right\} \times Pr \left\{ \mbox{no event in } (t+s, t+s+\delta s) \right\} \\ &= G_N(s) \times \left\{ 1-R(N) \times \delta s \right\} \end{aligned} } \]

Step 1. Time to next event

After rearranging

\[ \begin{align*} \frac{G_N(s + \delta s) - G_N(s)}{\delta s} &= -R(N) \times G_N(s) \end{align*} \]

Letting \( \delta s \to 0 \)

\[ \begin{align*} \frac{d G_N}{d s} &= -R(N) \times G_N(s) \end{align*} \]

With solution

\[ \begin{align*} G_N(s) &= e^{-R(N)s} \end{align*} \]

Thus, the probability the next event occurs in \( (t, t+s) \) is

\[ \begin{align*} F_N(s) &= 1- e^{-R(N)s} \end{align*} \]

Step 1. Time to next event

Given event time distribution \( F_N \), an exponentially distributed random event time \( S \) can be obtained from a uniform random random variate \( U_1 \) by setting

\[ \begin{align*} U_1 = F_N(s) = 1- e^{-R(N)S} \end{align*} \]

and solving to obtain

\[ \begin{align*} S = - \log(U_1) /R(N) \end{align*} \]

Step 2: change of state

Let the propensities of event types \( E_1, E_2, E_3, ... \) be denoted \( R_1, R_2, R_3, .... \) with total rate \( R_{sum} = R(N)=\sum_i R_i \). In the long run, events of each type should occur with relative frequency \( R_i/R(N) \). We can randomly draw event classes with these frequencies by simulating a second uniform random variate \( U_2 \) and assigning event class \( E_i \) if

\[ \begin{align*} R_{sum}^{-1}\sum_{i=1}^{p-1} R_i < U_2 \leq R_{sum}^{-1}\sum_{i=1}^{p} R_i. \end{align*} \]

Gillespie's direct method

Label all possible events \( E_1, E_2, E_3, ... \)
Initialize \( t=0 \) and state \( N \)
Update step
1. Calculate propensities \( R_1, R_2, R_3, ... \)
2. Calculate \( R_{sum} = R(N)=\sum_i R_i \)
3. Generate \( U_1 \) and transform to obtain \( S \)
4. Generate \( U_2 \) and determine event type \( E_i \)
5. Update state based on \( E_i \)
6. Update time \( t=t+S \)
Go to step (3)

Example with SIR model

Events

\( E_1 \): Birth of susceptible individual (\( X \to X+1 \))
\( E_2 \): Infection (\( X \to X-1, Y \to Y+1 \))
\( E_3 \): Recovery (\( Y \to Y-1, Z \to Z+1 \))
\( E_4 \): Death of susceptible individual (\( X \to X-1 \))
\( E_5 \): Death of infected individual (\( Y \to Y-1 \))
\( E_6 \): Death of recovered individual (\( Z \to Z-1 \))

Example with SIR model

Propensities

\( R_1 \): \( \mu (X+Y+Z) \)
\( R_2 \): \( \beta XY/N \)
\( R_3 \): \( \gamma Y \)
\( R_4 \): \( \mu X \)
\( R_5 \): \( \mu y \)
\( R_6 \): \( \mu Z \)

Example R code

Function SIR.onestep performs calculations of each update step

SIR.onestep <- function (x, params) {
  X <- x[2]; Y <- x[3]; Z <- x[4] ;N <- X+Y+Z
  with(
       as.list(params), 
       {
         rates <- c(mu*N, beta*X*Y/N, mu*X, mu*Y, gamma*Y, mu*Z)
         changes <- matrix(c( 1, 0, 0,
                             -1, 1, 0,
                             -1, 0, 0,
                              0,-1, 0,
                              0,-1, 1,
                              0, 0,-1),
                           ncol=3, byrow=TRUE)
         U1<-runif(1); tau<--log(U1)/sum(rates)
         U2<-runif(1)
         m<-min(which(cumsum(rates)>=U2*sum(rates)))
         x<-x[2:4] + changes[m,]
         return(out <- c(tau, x))
       }
       )
}

Example R code

Now we write a function SIR.model that iteratively calls SIR.onestep to simulate an epidemic

SIR.model <- function (x, params, nstep) {  #function to simulate stochastic SIR
  output <- array(dim=c(nstep+1,4))         #set up array to store results
  colnames(output) <- c("time","X","Y","Z") #name variables
  output[1,] <- x                           #first record of output is initial condition
  for (k in 1:nstep) {                      #iterate for nstep steps
    output[k+1,] <- x <- SIR.onestep(x,params)
  }
  output                                    #return output
}

Example R code

Finally, we write a code that calls SIR.model to simulate epidemics

set.seed(38499583)                #set seed
nsims <- 10                       #number of simulations
pop.size <- 10000                   #total population size
Y0 <- 50                           #initial number infected
X0 <- round(0.98*pop.size)        #initial number suscepitlble (~98% of population)
nstep <- 16000                     #number of events to simulate
xstart <- c(time=0,X=X0,Y=Y0,Z=pop.size-X0-Y0) #initial conditions
params <- list(mu=0.00001,beta=60,gamma=365/13) #parameters
data <- vector(mode='list',length=nsims) #initialize list to store the output
for (k in 1:nsims) {              #simulate nsims times
  data[[k]] <- as.data.frame(SIR.model(xstart,params,nstep))
  data[[k]]$cum.time <- cumsum(data[[k]]$time)
}
max.time<-data[[1]]$cum.time[max(which(data[[1]]$Y>0))] #maximum time in first simulation
max.y<-1.8*max(data[[1]]$Y)       #find max infected in run 1 and increase by 80% for plot
plot(Y~cum.time,data=data[[1]],xlab='Time',ylab='Incidence',col=1,xlim=c(0,max.time),ylim=c(0,max.y), type='l', axes=FALSE)
box()
axis(2, cex.axis=0.8, las=2)

for (k in 1:nsims) {              #add multiple epidemics to plot
  lines(Y~cum.time,data=data[[k]],col=k,type='l')
}

plot of chunk unnamed-chunk-5

Example R code

plot of chunk unnamed-chunk-6

Stochastic phenomena

Extinction
The J to U transition

J-U transition illustrated by Nasell (1995) in Epidemic Models: Their Structure and Relation to Data

Stochastic phenomena

Difference between “likely” outcome (median: blue points) and “worst case scenario” (\( 95^{th} \) percentile: red points) compared with deterministic approximation (green line) and \( R_0 \) (black line)

Park et al. 2009. Science 326:726-728

Stochastic phenomena

Critical community size

Ferrari et al. 2013. Philosophical Transactions of the Royal Society B 368:20120141

Stochastic phenomena

Spreading risk in spatially connected populations

Kramer, A.M., J.T. Pulliam, L. Alexander, P. Rohani, A.W. Park & J.M. Drake. 2016. Spatial spread of the West Africa Ebola epidemic. Royal Society Open Science 3:160294

Acknowledgements

Presentations and exercises draw significantly from materials developed with Pej Rohani, Ben Bolker, Matt Ferrari, Aaron King, and Dave Smith used during the 2009-2011 Ecology and Evolution of Infectious Diseases workshops and the 2009-2019 Summer Institutes in Statistics and Modeling of Infectious Diseases.

Licensed under the Creative Commons attribution-noncommercial license, http://creativecommons.org/licenses/bync/3.0/. Please share and remix noncommercially, mentioning its origin.