John M. Drake & Pej Rohani
Noise is addressed using stochastic models
Two types of noise:
We distinguish further between two types of process noise:
Discussion question. What are the relationships among the following words/concepts: error, noise, probabilistic, random, stochastic, variability/variation?
The \( SIR \) model, \( dY/dt = \beta XY/N - \gamma Y \), implies that changes in the states \( X \), \( Y \), and \( Z \) are continuous. But, in reality individuals are either susceptible, infected, or recovered so that \( X \), \( Y \), and \( Z \) are integer-valued and changes in the system state occur as discrete steps. The differential equation is an idealization.
Propensities
Rate | Transition |
---|---|
\( \beta XY/N \) | \( X \rightarrow X-1 \), \( Y \rightarrow Y+1 \) |
\( \gamma Y \) | \( Y \rightarrow Y-1 \), \( Z \rightarrow Z+1 \) |
A stochastic differential equation (SDE):
\[ dX(t) = f(t) X(t) dt + h(t) X(t) dW(t) \]
SDEs require stochastic calculus (Ito or Stratonovich) to solve.
Note: Lots of scientific papers gloss over the mathematical details of the models they report, but best practice is to be explicit.
The Master Equation is a system of differential equations for the flow in the probability that he system will be in any given (multi-dimensional) state
\[ dP_k/dt = \sum_l A_{kl}P_l \]
where \( A \) is a matrix of transition propensities.
This approach is only tractable for very simple models (e.g. \( SI \) and \( SIS \) epidemics)
Exact simulation is straightforward via Gillespie's Direct method. To simulate a sample path:
Given system state \( N \), let \( R(N) \) be the sum of all the propensities for all changes of state and \( G_N(s) \) be the probability that no event occurs in subsequent time interval \( s \) for system state \( N \).
By the Markov assumption
\[ \tiny{ \begin{aligned} G_N(s + \delta s) &= Pr \left\{ \mbox{no event in } (t, t+s+\delta s) \right\} \\ &= Pr \left\{ \mbox{no event in } (t, t+s) \right\} \times Pr \left\{ \mbox{no event in } (t+s, t+s+\delta s) \right\} \\ &= G_N(s) \times \left\{ 1-R(N) \times \delta s \right\} \end{aligned} } \]
After rearranging
\[ \begin{align*} \frac{G_N(s + \delta s) - G_N(s)}{\delta s} &= -R(N) \times G_N(s) \end{align*} \]
Letting \( \delta s \to 0 \)
\[ \begin{align*} \frac{d G_N}{d s} &= -R(N) \times G_N(s) \end{align*} \]
With solution
\[ \begin{align*} G_N(s) &= e^{-R(N)s} \end{align*} \]
Thus, the probability the next event occurs in \( (t, t+s) \) is
\[ \begin{align*} F_N(s) &= 1- e^{-R(N)s} \end{align*} \]
Given event time distribution \( F_N \), an exponentially distributed random event time \( S \) can be obtained from a uniform random random variate \( U_1 \) by setting
\[ \begin{align*} U_1 = F_N(s) = 1- e^{-R(N)S} \end{align*} \]
and solving to obtain
\[ \begin{align*} S = - \log(U_1) /R(N) \end{align*} \]
Let the propensities of event types \( E_1, E_2, E_3, ... \) be denoted \( R_1, R_2, R_3, .... \) with total rate \( R_{sum} = R(N)=\sum_i R_i \). In the long run, events of each type should occur with relative frequency \( R_i/R(N) \). We can randomly draw event classes with these frequencies by simulating a second uniform random variate \( U_2 \) and assigning event class \( E_i \) if
\[ \begin{align*} R_{sum}^{-1}\sum_{i=1}^{p-1} R_i < U_2 \leq R_{sum}^{-1}\sum_{i=1}^{p} R_i. \end{align*} \]
Events
Propensities
Function SIR.onestep
performs calculations of each update step
SIR.onestep <- function (x, params) {
X <- x[2]; Y <- x[3]; Z <- x[4] ;N <- X+Y+Z
with(
as.list(params),
{
rates <- c(mu*N, beta*X*Y/N, mu*X, mu*Y, gamma*Y, mu*Z)
changes <- matrix(c( 1, 0, 0,
-1, 1, 0,
-1, 0, 0,
0,-1, 0,
0,-1, 1,
0, 0,-1),
ncol=3, byrow=TRUE)
U1<-runif(1); tau<--log(U1)/sum(rates)
U2<-runif(1)
m<-min(which(cumsum(rates)>=U2*sum(rates)))
x<-x[2:4] + changes[m,]
return(out <- c(tau, x))
}
)
}
Now we write a function SIR.model
that iteratively calls SIR.onestep
to simulate an epidemic
SIR.model <- function (x, params, nstep) { #function to simulate stochastic SIR
output <- array(dim=c(nstep+1,4)) #set up array to store results
colnames(output) <- c("time","X","Y","Z") #name variables
output[1,] <- x #first record of output is initial condition
for (k in 1:nstep) { #iterate for nstep steps
output[k+1,] <- x <- SIR.onestep(x,params)
}
output #return output
}
Finally, we write a code that calls SIR.model
to simulate epidemics
set.seed(38499583) #set seed
nsims <- 10 #number of simulations
pop.size <- 10000 #total population size
Y0 <- 50 #initial number infected
X0 <- round(0.98*pop.size) #initial number suscepitlble (~98% of population)
nstep <- 16000 #number of events to simulate
xstart <- c(time=0,X=X0,Y=Y0,Z=pop.size-X0-Y0) #initial conditions
params <- list(mu=0.00001,beta=60,gamma=365/13) #parameters
data <- vector(mode='list',length=nsims) #initialize list to store the output
for (k in 1:nsims) { #simulate nsims times
data[[k]] <- as.data.frame(SIR.model(xstart,params,nstep))
data[[k]]$cum.time <- cumsum(data[[k]]$time)
}
max.time<-data[[1]]$cum.time[max(which(data[[1]]$Y>0))] #maximum time in first simulation
max.y<-1.8*max(data[[1]]$Y) #find max infected in run 1 and increase by 80% for plot
plot(Y~cum.time,data=data[[1]],xlab='Time',ylab='Incidence',col=1,xlim=c(0,max.time),ylim=c(0,max.y), type='l', axes=FALSE)
box()
axis(2, cex.axis=0.8, las=2)
for (k in 1:nsims) { #add multiple epidemics to plot
lines(Y~cum.time,data=data[[k]],col=k,type='l')
}
J-U transition illustrated by Nasell (1995) in Epidemic Models: Their Structure and Relation to Data
Park et al. 2009. Science 326:726-728
Ferrari et al. 2013. Philosophical Transactions of the Royal Society B 368:20120141
Kramer, A.M., J.T. Pulliam, L. Alexander, P. Rohani, A.W. Park & J.M. Drake. 2016. Spatial spread of the West Africa Ebola epidemic. Royal Society Open Science 3:160294
Presentations and exercises draw significantly from materials developed with Pej Rohani, Ben Bolker, Matt Ferrari, Aaron King, and Dave Smith used during the 2009-2011 Ecology and Evolution of Infectious Diseases workshops and the 2009-2019 Summer Institutes in Statistics and Modeling of Infectious Diseases.
Licensed under the Creative Commons attribution-noncommercial license, http://creativecommons.org/licenses/bync/3.0/. Please share and remix noncommercially, mentioning its origin.