The effects of model and data complexity on predictions from species distributions models

García-Callejas, D., & Araújo, M. B. (2016). The effects of model and data complexity on predictions from species distributions models. Ecological Modelling, 326, 4-12.

DOI: 10.1016/j.ecolmodel.2015.06.002

 


Species distribution models (SDMs) involve statistical or numerical methods that relate the distribution of a species to layers of environmental data. While tests of SDM performance have concluded that more complex models are generally better than simple ones, performance may be inflated when test data are not independent of training data, such as when data are randomly split into test and training sets. The few studies that have tested the transferability of models to completely new data have found no relationship between complexity and model performance.

Delineating simple from complex models can be difficult. Simple models are typically thought of as easy to comprehend and as performing simple computational operations. Complex models have several layers of complexity that make them difficult to comprehend. First, a complex model may require an algorithm that uses a relatively large amount of computational resources; this is referred to as time or algorithmic complexity. A second source is data complexity. The influence of data complexity on model performance has not been formally explored, though the authors predict that simple data sets are easier to model, so models trained on them should perform better than models trained on complex data sets.

To explore the influence of complexity on model performance, the authors simulated the distributions of three species using a set of environmental covariates. Eight modeling methods (BIOCLIM, GLM, GAM, MARS, MaxEnt, BRT, random forest, SVM) were considered in evaluating model performance under varying complexity. Computation time increased linearly with dataset size, and the slope of that relationship differed by several orders of magnitude across model types. AUC scores were significantly influenced by modeling technique, with MaxEnt and GAM performing best when no transfer was involved. AUC scores were consistently lower under temporal transfer. Without transfer, AUC scores were significantly correlated with data complexity for all models. Under temporal transfer, AUC scores were correlated with data complexity only for MARS, MaxEnt, BRT, and random forest.

Consistent with expectations, data complexity was inversely related to model performance. Contrary to expectations, model complexity was not related to model performance. This study highlights the importance of considering the type of data used to develop a model, particularly the complexity of those data. When complex data are used, model selection is important for ensuring good predictive performance.
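
Because the results above are all reported as AUC scores, a small sketch of how AUC can be computed may help in reading them. This is not the authors' code: the function name `auc` and the toy presence/absence data are hypothetical, and the numbers are invented purely for illustration. The sketch uses the rank-based definition of AUC, the probability that a randomly chosen presence receives a higher predicted suitability than a randomly chosen absence, with ties counted as one half.

```python
def auc(labels, scores):
    """Rank-based AUC for binary labels (1 = presence, 0 = absence).

    Returns the fraction of presence/absence pairs in which the presence
    has the higher score; tied pairs count as half a win.
    """
    presences = [s for y, s in zip(labels, scores) if y == 1]
    absences = [s for y, s in zip(labels, scores) if y == 0]
    if not presences or not absences:
        raise ValueError("need at least one presence and one absence")

    wins = 0.0
    for p in presences:
        for a in absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presences) * len(absences))


# Hypothetical toy data: the same six sites scored by one model,
# evaluated in the training period and in a later period.
labels = [1, 1, 1, 0, 0, 0]
scores_same_period = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
scores_later_period = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]

auc_same = auc(labels, scores_same_period)
auc_later = auc(labels, scores_later_period)
print(f"AUC, same period:  {auc_same:.3f}")
print(f"AUC, later period: {auc_later:.3f}")
```

In this invented example the later-period AUC is lower because one presence is ranked below one absence, which is the kind of drop the summary describes under temporal transfer. The pairwise loop is quadratic in the number of sites; it is chosen here for readability, not efficiency.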