Predicting the conservation status of data-deficient species

Bland, L. M., Collen, B., Orme, C. D. L. and Bielby, J. (2015), Predicting the conservation status of data-deficient species. Conservation Biology, 29: 250–259. doi: 10.1111/cobi.12372


One-sixth of the >65,000 species assessed by the IUCN are classified as data deficient (DD) due to a lack of information on taxonomy, geographic distribution, population status, or threats. Field surveys of all DD species are not feasible, but the large amounts of life history, ecological, and phylogenetic information that are available can be combined in a comparative study of extinction risk based on species trait data.

The authors address the following questions:

  1. What are the relative abilities of 7 different ML methods (classification trees, random forests, boosted trees, k nearest neighbors, support vector machines, neural networks, and decision stumps) to predict extinction risk in terrestrial mammals?

Random forests, boosted trees, support vector machines, and neural networks performed particularly well. Classification trees and k nearest neighbors performed relatively poorly.

  2. How accurately can those methods predict current geographical patterns of extinction risk?

The models were less likely to assign narrow-ranging non-threatened species and wide-ranging threatened species to their correct status.

  3. Using the models obtained, what is the predicted level of extinction risk faced by DD species?

313 of 493 DD species (63.5%) are predicted to be threatened, which increases the global proportion of threatened terrestrial mammals from 22% to 27%.

  4. How do these findings change current geographical patterns of extinction risk for terrestrial mammals?

Not substantially.

Methods: The authors collated a database of 4461 terrestrial mammals classed as non-threatened, threatened, vulnerable, endangered, critically endangered, or data deficient. Life history traits, biogeographic distribution, and habitat suitability were also collected for each mammal. ML models predicting threatened/non-threatened status were developed using all mammals, and separate models for rodents, bats, primates, and carnivores were fitted to explore the taxonomic transferability of ML predictive accuracy. Highly correlated and low-variance variables were removed before fitting any models. The training/testing (75/25) data sets did not include any DD species. All models were tuned to maximize AUC. The Youden index was used to set the probability threshold separating the two classes. Range maps of the species predicted as threatened by the best global ML model were then compared to current global patterns of extinction risk.
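To make the thresholding step concrete, here is a minimal sketch of choosing a probability cutoff with the Youden index, J = sensitivity + specificity - 1. The function name and the toy labels and scores are hypothetical and are not taken from the paper; it only illustrates the general technique.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) that maximizes J = sensitivity + specificity - 1.

    y_true: 1 = threatened, 0 = non-threatened (hypothetical labels).
    scores: predicted probability of being threatened.
    A species is predicted threatened when its score >= threshold.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        true_pos = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        true_neg = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Toy data, invented for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
threshold, j = youden_threshold(y_true, scores)
print(threshold, j)
```

With these toy values the cutoff lands at 0.40, where all four threatened species are detected and three of the four non-threatened species are correctly excluded.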

 

Modelling the oceanic habitats of two pelagic species using recreational fisheries data

Brodie, S., Hobday, A. J., Smith, J. A., Everett, J. D., Taylor, M. D., Gray, C. A., et al. (2015). Modelling the oceanic habitats of two pelagic species using recreational fisheries data. Fisheries Oceanography, 24(5), 463-477.

DOI: 10.1111/fog.12122

Species distribution modeling provides a useful tool for describing the environmental requirements of a species and understanding how a species may respond to a changing environment. Because these models are built on a combination of presence records and environmental covariates, which are logistically difficult to collect for pelagic species, species distribution models are rarely developed for such species. Fishery catch records, which exist for some pelagic species, are no different from typical presence-only data, except that there is typically no way to quantify fishing effort, which makes determining habitat suitability from these data difficult. This paper seeks to develop a species distribution model for two pelagic species using presence-only fishery data. Poisson point process models (PPMs) are presence-only methods that model the intensity of points per unit area as a proxy for relative abundance. Presence data for dolphinfish and kingfish were acquired from the New South Wales Department of Primary Industries catch and release program. Environmental covariates were extracted from the Spatial Dynamics Ocean Data Explorer for presence and pseudo-absence points. A PPM was constructed to predict the distribution of each species as a function of environmental covariates, with the presence and pseudo-absence points acting as the binary response variable. All environmental covariates were retained by the model, which predicted fish intensities for both species reasonably well (AUC 0.80 and 0.81 for dolphinfish and kingfish, respectively). Dolphinfish intensity increased along the coast during the summer and autumn, while kingfish intensity shifted south during the summer and autumn. These results show a strong relationship between pelagic fish distribution and ocean environmental variables, along with seasonal shifts in distribution for these species.
This study successfully implemented species distribution modeling with a novel data source, fishery catch data. This approach can be particularly useful to managers who wish to understand the distribution of the species they are managing as well as the abundance of that species across the region.

Dynamic occupancy models for analyzing species’ range dynamics across large geographic scales

Bled, F., Nichols, J. D., & Altwegg, R. (2013). Dynamic occupancy models for analyzing species’ range dynamics across large geographic scales. Ecology and Evolution, 3(15), 4896-4909.

DOI: 10.1002/ece3.858

Through expanding citizen science efforts, the large-scale biodiversity data required to predict species responses to global climate change are becoming increasingly available. However, drawing inferences from these large-scale data sets can be difficult because the data are often heterogeneous, simply due to difficulties in collecting them in a standardized way. A robust analysis method is needed that can account for variation in observation processes and for spatial correlation. This paper develops a hierarchical occupancy model to analyze bird data collected across southern Africa and illustrates its use with a study of the range dynamics of the hadeda ibis. To monitor bird species in southern Africa, two atlas projects were established in which citizens reported species occurrences to a database through species lists. The hierarchical occupancy model describes occupancy at three levels (distribution at the scale of each atlas project, the yearly occupancy within each project, and the detection/non-detection on yearly use). The model was applied to the hadeda ibis, a large, conspicuous bird that is unlikely to be mistaken for another species. The ibis had high occupancy probabilities across the ranges of both projects, with occupancy increasing from project one to project two. Based on the geographic locations of these projects, this result reflects a range expansion of the ibis. This occupancy model for biodiversity data is conceptually similar to GAM-based species distribution models. As efforts to collect data over larger extents continue, this occupancy model will be useful for analyzing the data and will allow researchers to address larger-scale questions. In short, the paper develops an occupancy model built on citizen science data and demonstrates how it can be used to study the range dynamics of a specific species.
The authors conclude that the model could be used to answer many different macroecological questions and that range dynamics is just one example of how this occupancy model can be implemented.

Species distribution models grounded in ecological theory for decision support in river management

Bennetsen, E., Gobeyn, S., & Goethals, P. L. M. (2016). Species distribution models grounded in ecological theory for decision support in river management. Ecological Modelling, 325, 1-12.

DOI: 10.1016/j.ecolmodel.2015.12.016

River managers must restore their systems with limited budgets and answer to conflicting stakeholders, and as such can be subjected to increased scrutiny. When planning restoration it is important to understand the stressors within the system, and so more effort has gone into developing ecological models for European river systems. These models are often more closely related to ecological quality indices than to actual species distribution models (SDMs). While SDMs are often used to assess the impacts of stressors on a system, the difficulty of modeling multiple species within a system has hindered their application in riverine conservation. The authors present a model in which the distributions of multiple species are used to assess the environmental condition of rivers throughout the landscape. The project comprised six steps: (1) development of the model concept, (2) data exploration and preparation, (3) construction of habitat suitability indices, (4) implementation of the model concept, (5) model selection and assessment, and (6) meta-analysis of model results. The authors built a general model concept in which environmental stressors act as hierarchical environmental filters on the realized species assemblages. The model is structured as abiotic filters, in the form of habitat suitability indices (HSIs), which result in possible species assemblages. The model included 34 environmental variables to explain species assemblages. An HSI was derived for each variable separately from a trapezoid curve describing the species' distribution along the environmental gradient. To predict the presence of a species, the HSIs were combined as limiting abiotic filters: if HSI scores were low for many environmental variables, the probability of presence of the species decreased.
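The trapezoid HSI and the idea of limiting filters can be sketched as follows. This is only an illustration: the breakpoints, variable names, and the aggregation rule (taking the minimum, i.e. the most limiting variable) are assumptions made here, and the paper's exact combination rule may differ.

```python
def trapezoid_hsi(x, lower, opt_low, opt_high, upper):
    """Trapezoid suitability curve: 0 outside [lower, upper], 1 on the
    optimal plateau [opt_low, opt_high], linear ramps in between."""
    if x <= lower or x >= upper:
        return 0.0
    if x < opt_low:
        return (x - lower) / (opt_low - lower)
    if x <= opt_high:
        return 1.0
    return (upper - x) / (upper - opt_high)


def presence_score(site, curves):
    """Combine per-variable HSIs as limiting filters.

    Assumption: the least suitable variable limits the species (minimum).
    """
    return min(trapezoid_hsi(site[var], *curves[var]) for var in curves)


# Hypothetical breakpoints (lower, opt_low, opt_high, upper) for one taxon.
curves = {
    "oxygen_mg_l": (2.0, 6.0, 12.0, 15.0),
    "temperature_c": (0.0, 8.0, 18.0, 26.0),
}
site = {"oxygen_mg_l": 4.0, "temperature_c": 12.0}
score = presence_score(site, curves)
print(score)
```

Here temperature sits on the optimal plateau, but oxygen is halfway up its lower ramp, so oxygen is the limiting filter for this site.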
Four models were constructed to test the effect of model structure on prediction: one with only abiotic factors, one that added a geographic filter, one that added interactions between parameters, and one that included both the geographic filter and the interactions. HSIs were constructed for 92 taxa. The model that included both the geographic filter and the interactions performed best. All models showed high agreement with known species distributions and with ecological knowledge of individual species. Model performance differed strongly between species, and because there was no relationship between how well a model agreed with ecological knowledge and its final performance, improvements to the HSIs are unlikely to improve model performance. Including stakeholders in model development resulted in credible and acceptable models that can handle multiple species at once and aid in predicting species assemblages across river systems. This type of species distribution model relies heavily on prior knowledge of species responses to environmental stressors, and as such differs from modern species distribution approaches. While the model is easily interpreted and may be more readily accepted by stakeholders in the management of these rivers, more modern algorithm-based models may provide better predictions of species distributions in the system.

The effects of model and data complexity on predictions from species distributions models

Garcia-Callejas, D., & Araujo, M. B. (2016). The effects of model and data complexity on predictions from species distributions models. Ecological Modelling, 326, 4-12.

DOI: 10.1016/j.ecolmodel.2015.06.002

 

Species distribution models (SDMs) involve statistical or numerical methods that relate the distribution of a species to layers of environmental data. While tests of SDM performance have concluded that more complex models are generally better than simple models, performance may be inflated when test data are not independent of training data, such as when data are randomly split into test and training sets. The few studies that have tested the transferability of models to completely new data have found no relationship between complexity and model performance. Delineating simple from complex models can be difficult. Simple models are typically thought of as easy to comprehend and as performing simple computational operations. Complex models have several layers of complexity that make them difficult to comprehend. First, complex models may require a complex algorithm that uses a relatively large amount of computational resources; such models are referred to as time-complex or algorithmically complex. A second source of complexity is the data. The influence of data complexity on model performance has not been formally explored, though the authors predict that simple data sets are easier to model and should therefore yield better performance than complex data sets. To explore the influence of complexity on model performance, the authors simulated the distributions of three species using a set of environmental covariates. Eight modeling methods (BIOCLIM, GLM, GAM, MARS, MaxEnt, BRT, random forest, SVM) were considered in evaluating model performance under varying complexity. A linear relationship existed between data set size and computation time, the slope of which differed by several orders of magnitude across model types. AUC scores were significantly influenced by modeling technique, with MaxEnt and GAM performing best when no transfer was involved. AUC scores were consistently lower under temporal transfer.
AUC scores were significantly correlated with data complexity for all models when no transfer was involved. Under temporal transfer, AUC scores were correlated with data complexity only for MARS, MaxEnt, BRT, and random forest. Consistent with expectations, data complexity was inversely related to model performance. Contrary to expectations, model complexity was not related to model performance. This study highlights the importance of considering the type of data used to develop the model, particularly its complexity. When complex data are used, model selection is important for ensuring good predictive performance.

Modeling the spatial distribution of Chagas disease vectors using environmental variables and people´s knowledge

 

Jaime Hernández, Ignacia Núñez, Antonella Bacigalupo, Pedro E Cattan

DOI: 10.1186/1476-072X-12-29

The distributions of two triatomine species, Triatoma infestans and Mepraia spinolai, were modeled across different spatial scales within Chile. Each species is associated with particular niches, and the risk of transmitting Chagas disease to humans varies between them (i.e., it depends on the degree of domiciliation). Hernández et al. use ecological niche modeling (ENM) to address spatial and temporal issues relevant to domestic transmission control and, despite the lack of available data, build a predictive model by extrapolating observed ecological niche data to areas with similar characteristics across different regions of Chile. Regardless of the degree of affiliation with human dwellings (which could undermine the point of using ENM), relevant macro-environmental variables from satellite-based imagery and triatomine presence/absence data were used. The authors used the machine learning algorithm Random Forest to predict the probability of triatomine presence. Random Forest model statistics were used to identify strong predictors at the 10 km, 5 km, and 2.5 km scales, which were then used to build the most suitable predictive model. Although the model ultimately determined the degree of overlap between the vector species and predicted that T. infestans can persist outside of domestic conditions, presence/absence data do not seem like the best choice overall, because some absences could be false. Where data are already scarce, especially true absences, it would be safest (especially in a human disease risk scenario) to use a presence/background-based model instead.

Ecologic niche modeling and potential reservoirs for Chagas disease, Mexico

Peterson, A. Townsend, et al. (2002). Ecologic niche modeling and potential reservoirs for Chagas disease, Mexico.

DOI: 10.3201/eid0807.010454

Peterson et al. apply ecological niche modeling (ENM) to improve the understanding of epidemiologically important vectors and parasite reservoirs of Chagas disease, using the affiliation between Neotoma (pack-rat) and Triatoma species as a study system. The purpose of the study was to determine potential risk areas with ENM, using primary occurrence data for various triatomine species to identify their degree of host affiliation with Neotoma sp. (conventionally known to be a strong vector-host affiliation) within Mexico. The model output was compared with field observations to test the quality of the model. Both ecological niches and potential geographic distributions were generated using the Genetic Algorithm for Rule-Set Prediction (GARP) with environmental/ecologic data coverages (11 conventionally used variables). Triatoma species with small sample sizes were not used in the niche models. Predictions were compared with known distributions of both rodent and vector, using the percentage of overlap as a measure of species association and potential disease transmission. Results: The predicted distributions performed well, and triatomines were indeed found at locations predicted by the model that also overlapped rodent distributions, suggesting a strong affiliation with the host. However, the model failed to predict some species overlap that has been observed in the field, possibly because of the small sample sizes for those species. This result limited the reliability of the model for predicting disease risk.

Regardless, this paper suggests that ecological niche modeling and species distribution prediction with GARP are useful tools for determining potential interactions between disease vectors and reservoir hosts. Depending on the percentage of overlap, the approach can also suggest evolutionary relationships between vector and host, identify species interactions that were not previously known, and reveal potential jumps from sylvatic affiliations to more peridomestic habitat types. Thus, this method can provide a useful supplementary tool for linking current and future Chagas disease risk to host-vector distributions, in addition to detecting shifts in host affiliation and distributional boundaries.

Novel methods for the design and evaluation of marine protected areas in offshore waters

Leathwick, J., et al. (2008). Novel methods for the design and evaluation of marine protected areas in offshore waters. Conservation Letters 1(2): 91-102.

 

Declines in marine biodiversity due to human exploitation, especially in fisheries, pose a serious threat to our oceans. Marine Protected Areas (MPAs) are instituted in some areas in order to reverse these losses. Various methods are used to identify and justify new candidates for MPA status. This paper serves as a proof of concept, using the oceanic waters of New Zealand’s Exclusive Economic Zone (EEZ), for a new method based on species distribution modeling (SDM). Patchy locality data on 96 commonly caught fish species were interpolated across the EEZ using a Boosted Regression Trees (BRT) SDM built on environmental covariates functionally relevant to fish. A total of 17,000 trawls were used for model fitting and 4,314 for model evaluation. Because of the zero-inflated nature of the data, two BRT models were fitted: the first to presence/absence in the trawl and the second to the log of catch size given presence. After evaluation, these models were used to make environment-based predictions of catch per unit effort for each species for 1.59 million 1 km² grid cells. These predictions were used to delineate MPAs with the software Zonation. The software begins with preservation of the entire grid and then progressively removes the cells that cause the smallest marginal loss of conservation value. The implementation used here attempts to retain high-quality core areas for all species. Of the 96 predicted fish distributions, 19 endemics were given higher priority weighting. Neighborhood losses were also assessed based on fish life histories, such that the loss of some proportion of neighboring cells devalued the focal cell. The final component of the Zonation analysis is the cost of preservation.
The authors analyze outcomes under 4 cost scenarios: (1) “no cost constraint”, with equal costs for all cells so that the analysis is driven solely by species; (2) “full cost constraint”, in which costs for grid cells vary with fishing intensity; (3) “modified cost constraint”, in which the costs of grid cells are rescaled from the “full cost constraint”; and (4) “BPA”, in which Zonation was used to assess the cost and value of a recently implemented set of Benthic Protection Areas in the waters around New Zealand.
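The backward cell-removal idea can be sketched in a few lines. This is a heavily simplified illustration and not the Zonation software itself: it ignores neighborhood effects and spatial removal rules, and the function name, cell values, weights, and costs are all hypothetical. The marginal loss used here follows the core-area idea of scoring a cell by the species for which it holds the largest weighted share of the remaining distribution, divided by the cell's cost.

```python
def removal_order(cells, weights, costs):
    """Rank cells by iteratively removing the one with the smallest marginal loss.

    cells:   {cell_id: [predicted value for each species]}
    weights: priority weight per species (e.g. higher for endemics)
    costs:   {cell_id: cost of protecting the cell}
    Returns cell ids from first removed (lowest priority) to last (highest).
    """
    remaining = dict(cells)
    order = []
    n_species = len(weights)
    while remaining:
        # Total of each species' distribution still left in the landscape.
        totals = [sum(v[j] for v in remaining.values()) for j in range(n_species)]

        def marginal_loss(cell):
            values = remaining[cell]
            shares = [
                weights[j] * values[j] / totals[j] if totals[j] > 0 else 0.0
                for j in range(n_species)
            ]
            return max(shares) / costs[cell]

        worst = min(sorted(remaining), key=marginal_loss)
        order.append(worst)
        del remaining[worst]
    return order


# Toy landscape: three cells, two species (values invented for illustration).
cells = {"A": [0.9, 0.1], "B": [0.1, 0.1], "C": [0.0, 0.8]}
equal_costs = {"A": 1.0, "B": 1.0, "C": 1.0}
unweighted = removal_order(cells, [1.0, 1.0], equal_costs)
endemic_weighted = removal_order(cells, [1.0, 2.0], equal_costs)
print(unweighted, endemic_weighted)
```

Giving the second species a higher weight moves cell C, its stronghold, to the end of the removal order, which mirrors how priority weighting for endemics shifts which cells are retained longest. Replacing the equal costs with fishing-intensity-based costs would correspond to the cost-constrained scenarios.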

Depth, temperature, and salinity contributed most to the predictive models. Models showed excellent predictive ability for presence/absence (mean AUC = 0.95, range = 0.86-0.99), but predictive ability for catch size was more variable (mean correlation = 0.534, range = 0.05-0.82). The “no cost constraint” analysis shows that preservation of 10% of the offshore parts of the EEZ would protect on average 27.4% of the geographic range of each of the analyzed fish species (46.4% if 20% is preserved). Use of neighborhood constraints identifies far more clumped groups of cells for protection. The “full cost constraint” analysis shared only two-thirds of its top 10% of cells with the no-constraint analysis, but it would provide only slightly lower conservation value (mean = 23.4% of each species' range protected) with no loss of current fishing activity. The “modified cost constraint” analyses produced a range of intermediates between these two extremes. The BPAs (which comprise 16.6% of the trawlable EEZ) would protect on average 13.4% of species ranges if no fishing were allowed. Clearly, all other scenarios outperform the current implementation of the BPAs.

 

MPA figure

Modeling the spatial distribution of the seagrass Posidonia oceanica along the North African coast: Implications for the assessment of Good Environmental Status

Zucchetta, M., Venier, C., Taji, M. A., Mangin, A., & Pastres, R. (2016). Modelling the spatial distribution of the seagrass Posidonia oceanica along the North African coast: Implications for the assessment of Good Environmental Status. Ecological Indicators, 61, 1011-1023.

DOI: 10.1016/j.ecolind.2015.10.059

Anthropogenic use of marine habitats has the potential to degrade these environments. Identifying regions that are either heavily degraded or relatively pristine is critical to establishing conservation priorities. Ecological indicators, including abiotic and biotic factors, offer methods for determining the health of the ecosystem. While measurements of abiotic factors may be readily available across a large region through remote sensing, biotic data can be scarce. This paper explores the use of a species distribution model for an indicator species as a method for identifying regions of relatively low impact along the North African coast. A binomial generalized linear model was fit using presence/absence data for P. oceanica, a seagrass, and a collection of environmental variables. The models identified coastal regions as having a high probability of suitable habitat, particularly along the Tunisian and Libyan coasts. To assess impact in an area, the potential distribution indicator was developed. This indicator is the ratio of actual observations of the species in the area to its predicted distribution. Areas where the ratio of actual presence to predicted presence is close to one may be considered to have low impact, whereas areas with ratios much lower than one may be experiencing severe human impacts. This study demonstrates a method for using remote sensing data to assess regions of low and high anthropogenic impact. These methods appear particularly applicable for those who wish to assess ecosystem health across a large extent, but because of the assumptions made in deriving the indicator from the species distribution model, ground truthing of environmental health may still be required at the local scale.
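As a rough illustration of the ratio described above, the sketch below computes the share of predicted-suitable cells in which the species is actually observed. The function name, the 0.5 suitability cutoff, and the cell-count formulation are assumptions made for this example and may not match the indicator's exact definition in the paper.

```python
def potential_distribution_indicator(observed, predicted_prob, threshold=0.5):
    """Ratio of observed presence to predicted (potential) presence.

    observed:       1/0 observed presence per grid cell
    predicted_prob: modeled probability of suitable habitat per grid cell
    threshold:      hypothetical cutoff for calling a cell 'suitable'
    Returns None when no cell is predicted suitable.
    """
    suitable = [i for i, p in enumerate(predicted_prob) if p >= threshold]
    if not suitable:
        return None
    occupied = sum(1 for i in suitable if observed[i])
    return occupied / len(suitable)


# Toy coastal stretch of five cells (values invented for illustration).
predicted_prob = [0.9, 0.8, 0.7, 0.2, 0.6]
observed = [1, 1, 0, 0, 0]
ratio = potential_distribution_indicator(observed, predicted_prob)
print(ratio)
```

In this toy stretch only half of the suitable cells are occupied, which under the interpretation above would flag the area as potentially impacted.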

Biotic interactions boost spatial models of species richness

Integrating biotic interactions into the framework of species richness models has been suggested as a way to improve model performance. The authors use biotic variables in two species richness modeling frameworks. Stacked species distribution models (SSDM) fit a separate species distribution model for each species and then simply stack the predicted occurrences to calculate species richness. Macroecological models (MEM) do not use the information provided by species identity and community composition; instead, they estimate the number of species directly from environmental conditions, assuming that species richness is limited by those conditions. Using these two frameworks, three different groups of taxa (vascular plants, bryophytes, and lichens) were used to examine the effect of integrating biotic variables.
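The stacking step of an SSDM can be sketched as below. The species names, probabilities, and the 0.5 cutoff are hypothetical, and thresholding before summing is only one common way to stack predictions; the paper's exact procedure may differ.

```python
def stacked_richness(probabilities, threshold=0.5):
    """Stack single-species predictions into site-level species richness.

    probabilities: {species: [predicted occurrence probability per site]}
    threshold:     hypothetical cutoff for converting probability to presence
    Returns the number of species predicted present at each site.
    """
    n_sites = len(next(iter(probabilities.values())))
    richness = [0] * n_sites
    for probs in probabilities.values():
        for site, p in enumerate(probs):
            if p >= threshold:
                richness[site] += 1
    return richness


# Three hypothetical species predicted at three sites.
probabilities = {
    "sp_a": [0.9, 0.2, 0.6],
    "sp_b": [0.7, 0.4, 0.1],
    "sp_c": [0.3, 0.8, 0.55],
}
richness = stacked_richness(probabilities)
print(richness)
```

A macroecological model would instead regress observed richness per site directly on the environmental predictors, without ever predicting individual species.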

When models using biotic interactions were compared with models using only climatic and other abiotic predictors, the biotic models performed consistently better. With biotic interactions included, both modeling frameworks and all taxonomic groups showed lower bias and increased predictive power. These results highlight the importance of using biotic predictor variables not only in species richness models but also in single-species distribution models.

Mod, H. K., le Roux, P. C., Guisan, A., & Luoto, M. (2015). Biotic interactions boost spatial models of species richness. Ecography, 38, 913–921.