On estimating probability of presence from use–availability or presence–background data.

Phillips, S. J. and Elith, J. (2013), On estimating probability of presence from use–availability or presence–background data. Ecology, 94: 1409–1419. doi:10.1890/12-1520.1

The paper investigates statistical methods (specifically logistic models) that estimate the probability that a species is present at a site, conditional on environmental covariates. It also addresses the disagreement in the literature over whether probability of presence is identifiable from presence-background data alone. Probability of presence is identifiable only if one makes strong assumptions about its structure; some authors regard those assumptions as unrealistic, and departures from them can produce poorly calibrated models. An experiment (outlined below) also demonstrates that an estimate of prevalence is necessary for identifying the probability of presence. The authors suggest that presence-background data must be augmented with an additional datum to reliably estimate absolute probability of presence.

Methods: Seven simulated species were used, with probability of presence defined by seven functions: constant, linear, quadratic, Gaussian, Semi-Logistic, Logistic 1 and Logistic 2 (each bounded by 0 and 1), which represent a variety of shapes of a species' response to its environment. The randomly drawn data consisted of 1000 presence samples and 10000 background samples chosen uniformly (0 to 1). Five maximum-likelihood-based methods (abbreviated EM, SC, SB, L1 and LK) were then used to derive logistic models from the presence-background data. The methods differ in how they make probability of presence identifiable: 1) L1 and LK rely on a strong parametric assumption, and they failed to estimate the species' probability of presence because that assumption does not reflect the species' actual response to the environment; 2) EM, SC and SB require the user to supply an estimate of the species' population prevalence, and this is the approach the paper ultimately recommends.
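The identifiability problem summarized above can be illustrated with a minimal Python sketch. This is not the paper's code: the two linear response functions (`species_a`, `species_b`), the grid approximation, the rejection sampler and all other names are assumptions chosen for illustration. Only the sample sizes (1000 presences, 10000 uniform background points) follow the summary. The sketch shows that two species whose probabilities of presence differ only by a constant factor produce the same distribution of presence records, and that supplying prevalence recovers the absolute probability.

```python
import random

def species_a(x):
    # Hypothetical linear response (an assumption, not one of the paper's species).
    return 0.8 * x

def species_b(x):
    # Same shape as species_a at half the height, so half the prevalence.
    return 0.4 * x

# Midpoints of a fine grid on [0, 1] stand in for the uniform background.
GRID = [(i + 0.5) / 1000 for i in range(1000)]

def prevalence(prob_presence):
    """Mean probability of presence over the uniform background."""
    return sum(prob_presence(x) for x in GRID) / len(GRID)

def presence_density(prob_presence):
    """Density of x among presence records: p(x) / prevalence.

    This is Bayes' rule with a background density of 1 on [0, 1].
    """
    prev = prevalence(prob_presence)
    return [prob_presence(x) / prev for x in GRID]

def sample_presences(prob_presence, n, rng):
    """Rejection sampling: draw x uniformly, keep it with probability p(x)."""
    kept = []
    while len(kept) < n:
        x = rng.random()
        if rng.random() < prob_presence(x):
            kept.append(x)
    return kept

rng = random.Random(0)
background = [rng.random() for _ in range(10000)]
presences_a = sample_presences(species_a, 1000, rng)
presences_b = sample_presences(species_b, 1000, rng)

# The two species imply the same presence density, so presence-background
# data alone cannot tell them apart.
density_a = presence_density(species_a)
density_b = presence_density(species_b)
max_gap = max(abs(a - b) for a, b in zip(density_a, density_b))
print("largest difference between presence densities:", max_gap)
print("mean x of presences:", sum(presences_a) / 1000, sum(presences_b) / 1000)

# With prevalence supplied, the absolute probability is recovered:
# p(x) = prevalence * presence density (background density is 1).
prev_a = prevalence(species_a)
recovered_a = [prev_a * d for d in density_a]
print("prevalence a, b:", prev_a, prevalence(species_b))
```

The prevalence values here are computed from the known response functions, which is only possible in a simulation; with real presence-background data, prevalence is the additional datum that would have to come from elsewhere.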
Based on the paper's results, there is no substitute for collecting good-quality field data (as opposed to making strong assumptions as in (1)), which also underlines the importance of addressing the complexity of species-environment relationships. To me it seemed fairly obvious that one cannot make strong assumptions when determining a species' presence, even though doing so makes the modelling easier. From an ecologist's (or, more specifically, a wildlife manager's) point of view, deciding what information goes into a model is probably the more relevant question.