Looking Forward by Looking Back:
Using Historical Calibration to Improve Forecasts of Human Disease Vector Distributions

Sohanna, A. & Thomas, K., 2015. Looking Forward by Looking Back: Using Historical Calibration to Improve Forecasts of Human Disease Vector Distributions. Vector-Borne and Zoonotic Diseases, 15(3), pp.173–183.

With rising concern about how environmental change affects disease vector distributions, many studies aim to predict future vector distributions under varying climate change scenarios using information available at the present time. Many types of species distribution models can produce highly accurate present-day predictions of disease vector distributions. However, many models used to forecast or ‘hindcast’ species distributions are never validated with independent data on past or separately observed distributions. This review paper (1) describes methods of validation for present-day spatial models, (2) discusses how these models should be projected into the future, and (3) introduces the method of historical calibration for validation. The authors explain three methods of validation for present-day spatial models and their limitations: the commonly used split-data approach (training and test data), independent dataset validation (geographically or temporally distinct data sets for validation), and validation via occurrence of disease in reservoir species. Next, the authors review the use of general circulation models (GCMs) to model future climates and the limitations of such projections, including ignoring biological processes and non-linearities and applying constant increments of environmental change without setting theoretical limits. Lastly, they suggest that historical calibration, a validation method rooted in macroecology, is more temporally transferable in the context of projecting vector distributions and, when coupled with reliable ensemble models, could reduce current shortcomings in forecasting species distributions.
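The difference between the split-data approach and temporally independent validation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the occurrence records, their field names (`site`, `year`, `present`), and the 1990 cutoff are hypothetical and do not come from the review.

```python
import random

def random_split(records, test_fraction=0.25, seed=0):
    """Split-data approach: test records come from the same pool as training records."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def temporal_split(records, cutoff_year):
    """Temporally independent validation: fit on earlier records, test on later ones."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test

# Hypothetical occurrence records, one every five years from 1950 to 2005.
records = [{"site": i, "year": 1950 + 5 * i, "present": i % 2 == 0} for i in range(12)]

train_r, test_r = random_split(records)
train_t, test_t = temporal_split(records, cutoff_year=1990)
```

With the random split, training and test records overlap in time, so the test says little about how well a model transfers to a different period. With the temporal split, every test record is later than every training record, which is the kind of independence that historical calibration relies on.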

 

Vacated niches, competitive release and the community ecology of pathogen eradication

Lloyd-Smith JO. 2013 Vacated niches, competitive release and the community ecology of pathogen eradication. Phil Trans R Soc B 368: 20120150. http://dx.doi.org/10.1098/rstb.2012.0150

This article reviews whether it is sensible to consider the niche left behind when a pathogen is eradicated, and to worry about the risk that this niche will be recolonized by another pathogen causing a similar disease. This topic is highly controversial in the epidemiological literature on the merits of eradication. Lloyd-Smith proposes the term ‘vacated niche’ to describe the pathogen niche left behind following a successful eradication effort and evaluates the evidence for claims that vacated niches can alter the epidemiology of the surrounding community of pathogens. Mechanisms such as competitive release or evolutionary adaptation could elevate the health burden from other pathogens (e.g. through increased incidence of another pathogen). However, he emphasizes that a vacated niche will not necessarily lead to the emergence of a replacement pathogen, nor will any such pathogen necessarily have disease characteristics similar to those of the eliminated one. He concludes that the vacated niche is an opportunity for other pathogens, but many factors will determine whether and how they may capitalize on it. This article expands the ecological discussion of whether empty niches actually exist, and it is interesting to think about how these concepts would apply to invasive species and/or local species extinctions.

Modelling ecological niches with support vector machines

Drake, Randin, & Guisan (2006) tested support vector machines (SVMs) as a method for mapping ecological niches using presence-only data for 106 species of woody plants and trees in a montane environment with nine environmental covariates. The one-class SVMs used here are machine-learning methods designed to model a single class of data: they find statistical patterns and exclude outliers to estimate the support of a high-dimensional distribution. The support of the distribution of a species’ environmental requirements is analogous to Hutchinson’s ecological niche concept. With presence-only data, SVMs are simpler than other methods because they eliminate the requirement for pseudo-absence data. This paper compares three ways of using the SVM approach: (1) applying no pre-processing or data reduction to the nine environmental covariates, (2) pre-processing training data using k-whitening, and (3) restricting covariates by removing highly correlated environmental variables. They found that method 1 resulted in models with the highest recall (the ratio of correctly predicted presences to the total number of observed presences) and the lowest false positive rate. Method 3 performed the worst overall, suggesting that useful information about ecological niches can be obtained by including more environmental variables, even if they are highly correlated. Additionally, they found that the SVM method required approximately the same number of observations as comparable methods and resulted in similar AUC values for prediction. This paper helped to develop a background understanding of the literature on machine-learning techniques for modeling presence-only vs. presence-absence data, and of how these methodological differences determine whether a species’ fundamental or realized niche is being modeled.
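The two evaluation metrics mentioned above can be made concrete with a short sketch. It assumes that test absences are available for computing the false positive rate; the observed and predicted values are hypothetical, and the function name `recall_and_fpr` is illustrative rather than taken from the paper.

```python
def recall_and_fpr(observed, predicted):
    """Return (recall, false positive rate) for paired 0/1 observations and predictions."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # presences predicted present
    fn = sum(1 for o, p in pairs if o and not p)      # presences predicted absent
    fp = sum(1 for o, p in pairs if not o and p)      # absences predicted present
    tn = sum(1 for o, p in pairs if not o and not p)  # absences predicted absent
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical test data: four presence sites followed by five absence sites.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0, 0]

recall, fpr = recall_and_fpr(observed, predicted)
print(recall, fpr)  # 0.75 0.2
```

A model with high recall and a low false positive rate captures most known presences without predicting the species across large areas where it is absent.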

Drake, J.M., Randin, C. & Guisan, A., 2006. Modelling ecological niches with support vector machines. Journal of Applied Ecology, 43(3), pp.424–432.

Modelling the spatial distribution of two important South African plantation forestry pathogens

Van Staden et al. (2004) used a bioclimatic species distribution model to estimate the broad habitat distribution and potential distribution of two fungal pathogens of commercially important tree species, pines and eucalyptus, in South Africa under varying climate change scenarios. The distribution and infectivity of both pathogens are affected by certain climatic parameters (e.g. hail damage, high rainfall, and humidity), and climate change alters these variables. Fungal incidence data for the study consisted of 87 confirmed reports of S. sapinea and 17 reports of C. cubensis; climate data for the area were obtained from existing literature and a digital elevation model for South Africa. Climate data included five variables: altitude, average rainfall of the driest and wettest months, and average temperature of the hottest and coldest months. The bioclimatic model incorporated these five variables, created a multidimensional scatter plot of the variables across the grid cells in South Africa (11,800 total), generated a matrix of covariates for each cell, and then transformed that matrix into a probability of occurrence for each fungus in each cell. Consequently, they were able to identify core-risk regions for both fungi, and found that those regions included major commercial forestry plantations. They report this as the first study to use a bioclimatic model to predict the distribution of economically relevant pathogens for eventual use in decision support systems for forestry management. This study could be improved by more occurrence data for each fungus (more than 100 records each) and by exploring the variation in predictions generated by the model. It would be interesting to explore different combinations of variables or data points and how the predictions would change with each combination.
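To illustrate the general idea of a bioclimatic model, the sketch below fits a simple climate envelope to presence records and scores grid cells against it. It is a generic rectangular-envelope example, not the model used in the paper: it returns a crude suitability score rather than a probability of occurrence, and all values and key names (`altitude`, `rain_driest`, and so on) are hypothetical stand-ins for the five variables described above.

```python
def fit_envelope(presences):
    """Return the (min, max) of each climate variable across the presence records."""
    variables = presences[0].keys()
    return {v: (min(p[v] for p in presences), max(p[v] for p in presences))
            for v in variables}

def suitability(cell, envelope):
    """Fraction of variables for which the cell lies inside the observed range."""
    inside = sum(1 for v, (lo, hi) in envelope.items() if lo <= cell[v] <= hi)
    return inside / len(envelope)

# Hypothetical presence records for one pathogen.
presences = [
    {"altitude": 900, "rain_driest": 10, "rain_wettest": 120, "temp_hottest": 26, "temp_coldest": 8},
    {"altitude": 1200, "rain_driest": 15, "rain_wettest": 150, "temp_hottest": 24, "temp_coldest": 6},
    {"altitude": 1000, "rain_driest": 12, "rain_wettest": 135, "temp_hottest": 25, "temp_coldest": 7},
]
envelope = fit_envelope(presences)

# Two hypothetical grid cells: one inside the envelope, one largely outside it.
cell_a = {"altitude": 1100, "rain_driest": 11, "rain_wettest": 140, "temp_hottest": 25, "temp_coldest": 7}
cell_b = {"altitude": 300, "rain_driest": 2, "rain_wettest": 130, "temp_hottest": 33, "temp_coldest": 7}

print(suitability(cell_a, envelope))  # 1.0
print(suitability(cell_b, envelope))  # 0.4
```

Rerunning `fit_envelope` on different subsets of records or variables is one simple way to explore how sensitive the predicted core-risk regions are to the input data.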

van Staden, V. et al., 2004. Modelling the spatial distribution of two important South African plantation forestry pathogens. Forest Ecology and Management, 187(1), pp.61–73.

Fast and flexible Bayesian species distribution modelling using Gaussian processes

Golding and Purse suggest that Gaussian process (GP) species distribution models (SDMs), fitted in a Bayesian framework with explicit priors, may benefit ecologists who wish to incorporate prior knowledge of their system while retaining the speed and predictive accuracy of other models. Gaussian processes can fit complex statistical models (i.e. models with more terms), but typically require computationally intensive methods (e.g. Markov chain Monte Carlo, MCMC). Consequently, the authors evaluate another way of fitting GP SDMs, comparing its predictive ability and run time with those of other commonly used approaches on a dataset from the North American Breeding Bird Survey, for both presence/absence and presence-only data. Models compared in their study include a GP model, a generalized additive model (GAM), and a boosted regression tree (BRT) model. Instead of fitting GP SDMs with MCMC, they evaluate the efficacy of more efficient deterministic inference procedures: the Laplace approximation and expectation propagation. Deterministic approximations are subject to error that may decrease the accuracy of predictions, but the authors argue that even with these limitations GP models fitted with deterministic inference are a promising method for SDM analyses. They found that the predictive accuracy of GP SDMs fitted by Laplace approximation was higher than that of BRTs, GAMs, and logistic regression for presence/absence data and higher than that of all compared models for presence-only data. Additionally, GP SDMs were just as fast as GAMs. For situations in which data on species occurrence are sparse, such as vector abundance and distribution, but the distributions of hosts (e.g. cattle or humans) are better documented, this method would allow integration of multiple types of prior information.
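The Laplace approximation replaces a posterior distribution with a Gaussian centred at the posterior mode, with variance given by the inverse of the negative second derivative of the log posterior at that mode. The sketch below shows the idea on a deliberately tiny model, which is not a GP and not the authors' implementation: a single logit-scale occurrence parameter `f` with a Gaussian prior and a binomial likelihood. In a GP SDM the same steps would be applied jointly to the latent values at all sites, with the GP covariance as the prior. The function name, the counts, and the prior settings are hypothetical.

```python
import math

def laplace_approximation(presences, sites, prior_mean=0.0, prior_sd=1.0, iterations=25):
    """Gaussian approximation (mean, variance) to the posterior of a logit-scale parameter f.

    Log posterior (up to a constant):
        presences * f - sites * log(1 + exp(f)) - (f - prior_mean)^2 / (2 * prior_sd^2)
    """
    f = prior_mean
    for _ in range(iterations):
        p = 1.0 / (1.0 + math.exp(-f))
        gradient = presences - sites * p - (f - prior_mean) / prior_sd ** 2
        hessian = -sites * p * (1.0 - p) - 1.0 / prior_sd ** 2
        f -= gradient / hessian  # Newton step toward the posterior mode
    # Curvature at the mode gives the variance of the approximating Gaussian.
    p = 1.0 / (1.0 + math.exp(-f))
    hessian = -sites * p * (1.0 - p) - 1.0 / prior_sd ** 2
    return f, -1.0 / hessian

# Hypothetical survey: the species was recorded at 5 of 10 sites.
mode, variance = laplace_approximation(presences=5, sites=10)
```

Because the procedure is a handful of deterministic Newton steps rather than a long MCMC run, it is fast, at the cost of approximation error when the true posterior is far from Gaussian.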

 

Golding, N. & Purse, B.V., 2016. Fast and flexible Bayesian species distribution modelling using Gaussian processes. Methods in Ecology and Evolution. doi: 10.1111/2041-210X.