Inference from presence-only data; the ongoing controversy

Hastie, T. & Fithian, W., 2013. Inference from presence-only data; the ongoing controversy. Ecography, 36(8), pp.864–867.

In response to Royle et al. (2012), Hastie & Fithian (2013) question whether the overall species occurrence probability, or prevalence, can be estimated from presence-only data. Their main concern is that Royle et al. (2012) rely on a strict parametric assumption: that the occurrence probability is exactly linear in the covariates on the logistic scale. Under that assumption, maximum likelihood estimation (MLE) can be used to estimate the species' probability of presence, which makes the assumption a convenient simplification for researchers who want to estimate prevalence. For most real-world data, however, the true functional form is almost never exactly linear, and Hastie & Fithian (2013) argue that the assumption is too arbitrary for the resulting estimates to be robust in practical settings. To illustrate the point, they simulate data from a model that is nearly, but not exactly, linear logistic: they draw a large sample of x values (geographic sites, each representing a unit of area) from a uniform distribution, generate presence/absence at each site, and then keep a subset of 1000 x values at which the species was present. A linear logistic model is then fitted to these presence-only data by maximum likelihood. Figure 2 shows the results of three separate simulation runs, with a red line marking the true species occurrence probability. In all cases, the values in the histogram bear no relationship to the true probability of presence. The paper thus sets out the main argument against using presence-only data to estimate species prevalence: the estimate depends on the assumed parametric form rather than on information in the data.
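
The identifiability concern can be illustrated with a minimal sketch. This is not the authors' simulation; it only assumes that sites are sampled uniformly, so that the distribution of x among presence sites is proportional to the occurrence probability psi(x). Under that assumption, psi(x) and c * psi(x) give exactly the same presence-only distribution while implying different prevalence. The curves psi_a and psi_b, the grid xs, and all coefficients below are hypothetical choices made for illustration.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def presence_density(psi, xs):
    """Distribution of x among presence sites, assuming uniform sampling of sites."""
    weights = [psi(x) for x in xs]
    total = sum(weights)
    return [w / total for w in weights]


def prevalence(psi, xs):
    """Average occurrence probability over the landscape grid."""
    return sum(psi(x) for x in xs) / len(xs)


# Hypothetical landscape: an evenly spaced grid of covariate values.
xs = [i / 100.0 for i in range(-300, 301)]


def psi_a(x):
    # Hypothetical occurrence curve that is exactly linear on the logit scale.
    return expit(-1.0 + 1.5 * x)


def psi_b(x):
    # Same shape, half as common; no longer exactly linear on the logit scale.
    return 0.5 * psi_a(x)


dens_a = presence_density(psi_a, xs)
dens_b = presence_density(psi_b, xs)

# Largest pointwise difference between the two presence-only distributions.
max_gap = max(abs(a - b) for a, b in zip(dens_a, dens_b))
prev_a = prevalence(psi_a, xs)
prev_b = prevalence(psi_b, xs)

print("max difference in presence-only distributions:", max_gap)
print("prevalence under psi_a:", prev_a)
print("prevalence under psi_b:", prev_b)
```

The two presence-only distributions agree up to floating-point error, yet the prevalences differ by a factor of two. Only the assumption that psi is exactly linear logistic rules out psi_b, which is why small departures from that form can move the prevalence estimate so far.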