POISSON POINT PROCESS MODELS SOLVE THE “PSEUDO-ABSENCE PROBLEM” FOR PRESENCE-ONLY DATA IN ECOLOGY

Warton, David I.; Shepherd, Leah C. Poisson point process models solve the “pseudo-absence problem” for presence-only data in ecology. Ann. Appl. Stat. 4 (2010), no. 3, 1383–1402. doi:10.1214/10-AOAS331. http://projecteuclid.org/euclid.aoas/1287409378.

Ecologists commonly use “pseudo-absences” to model species distributions from presence-only data, so that traditional presence/absence regression methods can be applied. However, this approach has three main weaknesses, which relate to model specification, interpretation, and implementation. Warton and Shepherd propose point process models as an appropriate tool for species distribution modeling of presence-only data, given that presence data are in fact a set of point locations. Assuming that the locations of point events are independent, the intensity at a point is modeled as a function of explanatory variables. They also link the point process model to the logistic regression approach, showing that when a logistic regression model is fitted with an increasing number of pseudo-absences, its slope parameters converge to the point process slope estimates. As an illustration, they construct Poisson point process models for the intensity of Angophora costata records as a function of a set of explanatory variables. They summarize how the point process model addresses the three weaknesses of the logistic regression approach:
Specification – A point process is a plausible model of the data-generating mechanism for presence-only data, whereas logistic regression coerces the data to fit the model rather than choosing a model that fits the original data.
Interpretation – The intensity at a point has a natural interpretation as the expected number of presences per unit area, which is not sensitive to the choice of quadrature points.
Implementation – The point process model (PPM) offers a framework for choosing pseudo-absences, which is not available for logistic regression.
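The link between the two approaches can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' analysis or data: the study region is a hypothetical one-dimensional interval [0, 1], the single covariate is the coordinate x itself, the intensity is log-linear, exp(b0 + b1 x), with made-up coefficients, and pseudo-absences are placed on a regular grid. The function names fit_ppm and fit_logistic are illustrative helpers written for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-dimensional "study region" [0, 1] whose single covariate is
# the coordinate x itself; true intensity is exp(B0 + B1 * x) (made-up values).
B0, B1 = np.log(100.0), 2.0

# Simulate presence locations from the inhomogeneous Poisson process:
# draw the count, then place points by inverting the CDF of exp(B1 * x).
expected_n = np.exp(B0) * (np.exp(B1) - 1.0) / B1
n = rng.poisson(expected_n)
x_pres = np.log1p(rng.uniform(size=n) * (np.exp(B1) - 1.0)) / B1


def fit_ppm(x, n_quad=2000, n_iter=50):
    """Maximise the Poisson point process log-likelihood
    sum_i (b0 + b1 * x_i) - integral_0^1 exp(b0 + b1 * u) du
    by Newton's method, using a midpoint rule for the integral."""
    u = (np.arange(n_quad) + 0.5) / n_quad  # quadrature points
    w = 1.0 / n_quad                        # equal quadrature weights
    beta = np.array([np.log(len(x)), 0.0])
    for _ in range(n_iter):
        mu = w * np.exp(beta[0] + beta[1] * u)
        grad = np.array([len(x) - mu.sum(), x.sum() - (u * mu).sum()])
        info = np.array([[mu.sum(), (u * mu).sum()],
                         [(u * mu).sum(), (u * u * mu).sum()]])
        beta = beta + np.linalg.solve(info, grad)
    return beta


def fit_logistic(x, m, n_iter=50):
    """Logistic regression of presences (1) against m pseudo-absences (0)
    placed on a regular grid, fitted by Newton's method (IRLS)."""
    grid = (np.arange(m) + 0.5) / m
    xs = np.concatenate([x, grid])
    y = np.concatenate([np.ones(len(x)), np.zeros(m)])
    X = np.column_stack([np.ones_like(xs), xs])
    beta = np.array([np.log(len(x) / m), 0.0])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, grad)
    return beta


ppm_intercept, ppm_slope = fit_ppm(x_pres)
print(f"PPM slope: {ppm_slope:.4f}")

logit_slopes = {}
for m in (50, 500, 5000, 100000):
    logit_slopes[m] = fit_logistic(x_pres, m)[1]
    print(f"logistic slope with {m:>6} pseudo-absences: {logit_slopes[m]:.4f}")
```

In a run of this sketch, the logistic slope for the largest number of pseudo-absences should sit very close to the point process slope. The logistic intercept, by contrast, is expected to keep shifting as pseudo-absences are added (roughly like the log of their number), which is one way to see why the intensity is the easier quantity to interpret.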
The point process model introduced in this paper directly addresses some key concerns raised by “pseudo-absence” approaches to species distribution modeling. Although the independence of points, the basic assumption of Poisson point process models, may result in some lack of fit for specific data sets, this can be addressed by modeling spatial clustering to account for spatial dependence. It would be great to see an example that employs point process models with systematic consideration of sampling bias, analysis of point independence, model fitting, and model diagnostics.