Model-Based Control of Observer Bias for Presence-Only Data in Ecology – ECOL 8910: Perspectives in Computational Ecology

Warton, D. I., et al. (2013). “Model-based control of observer bias for the analysis of presence-only data in ecology.” PLoS One 8(11): e79168.

Observer bias can be a major problem when building species distribution models (SDMs) from presence-only data. The authors define observer bias as the idea that “a species is more likely to have been recorded as occurring in a place where people are more likely to see and record it.” If we assume that observer bias is consistent across species, one way of addressing it is to use the presence records of other sampled species as pseudo-absences for the focal species. The paper considers two ways of implementing this approach: a point-based approach and a grid-cell-based approach. In the grid-cell version, records are aggregated within each cell, so that a focal-species record in a cell makes that cell a presence record for the focal species, and a non-focal-species record makes it a record for the non-focal species.

This approach may replace observer bias with species-richness bias, because it restructures the question as one of species composition: given that at least one species has been recorded at a point or in a cell, what is the probability that it is the focal species? If observer bias scales the recording rate of every species by the same factor at a given location, that factor cancels out of this conditional probability, but the probability now also depends on how many other species occur and are recorded there. The authors therefore propose an alternative, model-based bias-correction method and compare it with these earlier methods. The method models the intensity of presence records as a function of both environmental variables and “observer bias variables” such as site accessibility, and the two sets of effects are assumed to be additive (on the log scale of the intensity). To control for observer bias at prediction time, all observer bias variables are set to the same constant value at every prediction point or cell. All analyses use Poisson point process regression with a LASSO penalty for variable selection.

The authors first work through an illustrative example with a single species, Eucalyptus apiculata. The model fitted with environmental variables only and the model fitted with both environmental and observer bias variables give relatively similar predictions, with the second providing a better fit to the presence points.
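A minimal sketch of the model-based correction, under illustrative assumptions rather than the authors' code: the point process is approximated by Poisson counts of presence records in equal-area grid cells, the fit is unpenalized (the paper adds a LASSO penalty, omitted here for brevity), and the covariates `env` and `access`, their correlation, and the coefficient values are all hypothetical. It fits one model with the environmental variable alone and one with both variables, then predicts from the second with accessibility held at a single reference value.

```python
import numpy as np


def fit_poisson_glm(X, y, n_iter=100, tol=1e-10):
    """Unpenalized Poisson regression, log E[y] = X @ beta, via Newton-Raphson.

    Assumes the first column of X is an intercept.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())              # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                # fitted expected counts per cell
        score = X.T @ (y - mu)               # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])       # Fisher information (negative Hessian)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(2013)
n_cells = 10000

# Hypothetical standardized covariates: one environmental variable and one
# observer bias variable (accessibility), deliberately correlated (r = 0.6).
env = rng.normal(size=n_cells)
access = 0.6 * env + 0.8 * rng.normal(size=n_cells)

# Simulated presence-record counts per cell: the log-intensity is additive in
# the environmental and observer bias terms (intercept, env, access).
true_beta = np.array([-1.0, 0.5, 1.0])
X_full = np.column_stack([np.ones(n_cells), env, access])
counts = rng.poisson(np.exp(X_full @ true_beta))

# Model ignoring bias vs. model including the observer bias variable.
beta_env_only = fit_poisson_glm(X_full[:, :2], counts)
beta_full = fit_poisson_glm(X_full, counts)

# Bias-corrected prediction: hold accessibility at one common value everywhere,
# so predictions vary only with the environmental variable.
access_ref = 0.0  # hypothetical reference level (mean of the standardized variable)
X_pred = np.column_stack([np.ones(n_cells), env, np.full(n_cells, access_ref)])
lam_corrected = np.exp(X_pred @ beta_full)

print("env-only model:  ", beta_env_only.round(2))
print("env + bias model:", beta_full.round(2))
```

Because the bias term enters the log-intensity additively, the choice of `access_ref` only rescales every prediction by the same factor, `exp(beta_full[2] * access_ref)`, so the relative pattern across cells is unchanged. With `env` and `access` correlated, the environment-only fit absorbs part of the accessibility effect into its `env` coefficient, which is the kind of distortion the observer bias variables are meant to remove.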
The model fitted to both sets of variables and then corrected for observer bias gives a very different predicted distribution, with predicted occurrence extending into areas of low accessibility. To draw broader conclusions, the authors trained and evaluated models using 5-fold cross-validation on a presence-absence data set containing 62 species. For 84% of species, predictions were better with model-based bias correction than when bias was ignored, although the improvements were on average relatively small (95% CI for the increase in AUC: 1.5 ± 1.1). Significantly more species were predicted better by model-based bias correction than by the pseudo-absence approach described above, although some species were fit better by the pseudo-absence approach. Potential pitfalls of the model-based approach include its reliance on the quality of the variables chosen to quantify observer bias; its reliance on being able to estimate the effects of those variables from the available presence records, which makes small numbers of records particularly problematic; and the loss of effectiveness caused by correlations between environmental and observer bias variables, which are likely to be quite common. This method seems relatively effective and is very well grounded conceptually. It would be interesting to see it compared with other common bias-reduction methods beyond the pseudo-absence approach.
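The species-level comparisons above are summarized as changes in AUC. As a hedged illustration of that metric only (not the authors' evaluation pipeline), the sketch below computes the rank-based AUC on held-out presence-absence labels: the probability that a randomly chosen presence site receives a higher prediction than a randomly chosen absence site, with ties counted as one half. The labels and both sets of predictions are made-up toy values.

```python
import numpy as np


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; presence/absence ties count 1/2.

    Requires at least one presence (label 1) and one absence (label 0).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one presence and one absence")
    # Compare every presence prediction with every absence prediction.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)


# Toy held-out survey sites (1 = present, 0 = absent) and two hypothetical
# sets of predictions, e.g. from a bias-ignoring and a bias-corrected model.
labels = [1, 1, 0, 0, 1, 0]
pred_ignore_bias = [0.9, 0.4, 0.35, 0.1, 0.8, 0.5]
pred_corrected = [0.9, 0.6, 0.35, 0.1, 0.8, 0.5]

auc_ignore = auc(pred_ignore_bias, labels)
auc_corrected = auc(pred_corrected, labels)
print(f"AUC ignoring bias: {auc_ignore:.3f}, corrected: {auc_corrected:.3f}, "
      f"increase: {auc_corrected - auc_ignore:.3f}")
```

A comparison like the one described would compute this for each species (and each held-out fold) under both models and then tally how often one model wins. The all-pairs comparison uses memory proportional to the number of presences times the number of absences, which is fine for survey-sized data sets.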