Habitat, environment and niche: what are we modelling?

Kearney, M. (2006), Habitat, environment and niche: what are we modelling? Oikos, 115: 186–191. doi: 10.1111/j.2006.0030-1299.14908.x

The words habitat, environment, and niche are often used interchangeably when discussing a species' distribution. These words have been considered “plagued by loose and inconsistent applications of the concepts in describing different methodological approaches”, and this paper implies that there is overall confusion about what is actually being modelled. Generally, the habitat concept is associated with descriptive/correlative analyses of the environments of organisms and is useful in statistical models of species distributions and abundances, while the niche concept is reserved for mechanistic analyses of how the environmental factors in an organism’s habitat interact with the organism itself. While distributions can potentially be predicted by modelling descriptions of, or correlations between, organisms and habitat components, the paper suggests modelling an organism’s niche mechanistically to fully understand and explain distributional limits. This is particularly true when trying to predict an organism’s distribution under changing conditions, such as climate change. ‘Environment’ refers to the biotic and abiotic phenomena surrounding and potentially interacting with an organism. There are two ways to model species distributions that take these “confusingly interchangeable words” into account: 1) the correlative approach (use of the habitat concept), in which distribution data and GIS habitat data are associated statistically, often in the form of a regression model, and then interpolated across regions for which spatial data are available to predict areas of high probability of occurrence; and 2) the mechanistic approach (use of the niche concept), in which the interactions between the properties of the organism and the environmental conditions surrounding it are mechanistically modelled and mapped onto the landscape.
Although this paper may seem to deal only with basic definitions, these words are often used interchangeably across published papers, which can be confusing for readers who are just grasping the concepts. The paper therefore calls for more consistent and universal use of these words, with more attention paid to definition and context, to avoid confusion.

 

Not as good as they seem: the importance of concepts in species distribution modelling

Alberto Jiménez-Valverde, Jorge M. Lobo and Joaquín Hortal
Diversity and Distributions, (Diversity Distrib.) (2008) 14, 885–890
doi: 10.1111/j.1472-4642.2008.00496.x

Three important concepts must be kept in mind when modelling species distributions: 1. the distinction between potential and realized distribution; 2. the effect of the relative occurrence area of the species on the results of the evaluation of model performance; and 3. the general inaccuracy of the predictions of the realized distribution provided by species distribution modelling methods. The paper notes that different modelling techniques should be used depending on whether the purpose is to determine the potential or the realized distribution (Figure 1). Based on previous literature on a variety of species distribution models, the authors ask whether complex models tend to be more accurate than simpler techniques. This question matters when considering a species' potential vs. realized distribution, because the realized distribution is more context dependent and may require higher resolution data and therefore more complex models. Determining the potential vs. realized distribution can also call for different types of data: absence data in addition to presence data, for example, can help to infer that a species is excluded because of an interaction with another species. It is also suspected that models of rarer species are more accurate because their presence data have more specific environmental associations. Thus, data type can also influence whether complex models appear to provide better results than simpler ones. Another issue arises in the evaluation of model performance, which can be biased in favour of complex techniques because of their potential to overfit the training data. Model performance should be evaluated by examining errors of omission and commission separately and by taking into account the ratio between the extent of occurrence and the whole extent of the study region.
Overall, choosing which models to use requires a clear understanding of the question at hand, the type of data that is available or feasible to collect, and the species' biology.
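
To make the evaluation advice concrete, here is a minimal sketch of how omission and commission errors could be reported separately, alongside the share of the study region that the species occupies. The function name, the toy presence/absence vectors, and the use of "proportion of occupied cells" as a stand-in for relative occurrence area are all assumptions for illustration, not the authors' procedure.

```python
def evaluate_presence_absence(observed, predicted):
    """Report omission and commission rates separately for binary maps.

    observed / predicted: equal-length sequences of 0 (absent) and 1 (present),
    one entry per grid cell. Hypothetical helper, not from the paper.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # presences predicted present
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # presences missed (omission)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # absences predicted present (commission)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # absences predicted absent

    presences = tp + fn
    absences = fp + tn
    return {
        "omission": fn / presences if presences else float("nan"),
        "commission": fp / absences if absences else float("nan"),
        # Assumed proxy for relative occurrence area: share of occupied cells.
        "prevalence": presences / len(pairs),
    }


# Toy data: 3 occupied cells out of 10 (made up for illustration).
observed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
result = evaluate_presence_absence(observed, predicted)
print(result)
```

Keeping the two error rates apart, rather than folding them into one score, makes it easier to see whether a model over-predicts (more like a potential distribution) or under-predicts relative to the observed records.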

Fig. 1. An overview of suggested models depending on whether potential or realized species distribution is desired.

Drought sensitivity shapes species distribution patterns in tropical forests

Engelbrecht, BJ, Comita, LS, Condit, R, Kursar, TA, Tyree, MT, Turner, BL, & Hubbell, SP 2007, ‘Drought sensitivity shapes species distribution patterns in tropical forests’, Nature, vol. 447, no. 7140, pp. 80-82

DOI: 10.1038/nature05747

This paper investigates the mechanisms behind tree species distributions along an environmental gradient. It finds that differential drought sensitivity shapes plant distributions in tropical forests at both regional and local scales. The local and regional distributions of 48 species were evaluated within a network of 122 inventory sites spanning a rainfall gradient across central Panama. The results suggest that niche differentiation with respect to soil water availability is a direct determinant of both the local and regional scale distributions of tropical trees. However, global climate change and forest fragmentation can alter soil moisture availability, causing changes in tree species distributions that will be important to monitor in future studies. At the regional scale, presence/absence and density of species were collected from sampling plots situated on both the wet (Caribbean) and dry (Pacific) sides of the Isthmus of Panama. An index of dry season response for 44 of the species was based on the fitted probability of occurrence toward the dry end and toward the wet end of the gradient. Drought sensitivity was a significant predictor of the probability of occurrence of the species on the dry relative to the wet side. At the local scale, species density was collected from sampling plots on wet and dry slopes within Barro Colorado Island, and the local associations of the tree species with dry and wet habitats were analyzed. The paper further addresses interactions between environmental conditions that may or may not indirectly influence drought resistance (i.e. water availability), which is suspected to be a major driver of tree species distributions. Linear regressions were used to examine whether species' reactions to drought or light were significant predictors of their densities in dry vs. wet sites. Knowledge of these more specific influential factors can help predict the impacts that climate change and deforestation may have on the future distributions of the species.
Instead of using linear regression to detect significant predictor variables and determine species distributions that way, this project could also have benefitted from an ecological niche model (ENM) based on a machine-learning algorithm. Because all of the elements were available, such as species presence/absence and environmental data, it would have been interesting to compare their results with an ENM and determine which method works best.

Geographic distribution and ecology of potential malaria vectors in the Republic of Korea.

Foley DH, Klein TA, Kim HC, Sames WJ, Wilkerson RC, Rueda LM. J Med Entomol. 2009, 46: 680-692. DOI: 10.1603/033.046.0336

Larval and adult mosquito collection data were used to develop ecological niche models of the potential geographic distribution of eight anopheline species in the Republic of Korea. The intention was to compare the mosquito distributions with areas of suspected malaria transmission in order to understand current and potential malaria risk and to determine whether a particular species has been responsible for malaria resurgence. Ecological requirements for each species were studied using occurrence-only data. Mosquito species with between 9 and 106 occurrence points were used; to avoid spatial autocorrelation and ensure data independence, only localities at least 5 km apart were retained. Environmental data were downloaded from the WorldClim dataset and included monthly temperature, precipitation, and the 19 “bioclimatic layers”. Topographic, historic (?) land use, and NDVI layers were also used. Both GARP and MaxEnt were used to develop the ENMs because both have been used in previous mosquito distribution models and have overall been well received. GARP uses an iterative process of rule selection, evaluation, testing, and incorporation or rejection with the intent to “evolve” rules that maximize predictive accuracy, while MaxEnt also uses training and testing data, with a decision threshold to determine presence or absence of a species given the environmental data layers. Prediction success of the distribution models was better than random, except for the GARP models for 2 mosquito species and the MaxEnt model for one mosquito species. Poor predictions may have stemmed from a spatial resolution that is too coarse to match the smaller scale environmental needs of some mosquito species. Comparing the species distributions with records of malaria outbreaks suggested that all species occurred where malaria outbreaks occurred, but some species occurred less frequently.
Furthermore, the models were able to identify which environmental variables or landscape characteristics are most influential on the distribution of a species. Although these models proved useful in determining geographic distributions across large scales, they could be improved by using absence data as well (instead of presence only), more spatially separated data (instead of data ‘clumped’ around certain locations), and higher resolution, more up-to-date environmental and land use data (to find whether certain types of disturbance may be associated with a greater abundance of a certain type of mosquito?). Another way to improve these models would be to incorporate some kind of interaction variable between species, especially during larval stages.
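
The 5 km spacing rule mentioned above can be sketched as a simple greedy filter. This is only an illustration under my own assumptions: the paper's exact thinning procedure is not described in these notes, the coordinates are made up, and the function names are hypothetical. Note that a greedy filter depends on the order of the input points.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, adequate for a rough distance check


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in decimal degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin_localities(points, min_km=5.0):
    """Keep a point only if it is at least min_km from every point already kept."""
    kept = []
    for point in points:
        if all(haversine_km(point, other) >= min_km for other in kept):
            kept.append(point)
    return kept


# Made-up localities: the second is roughly 1 km from the first and gets dropped.
points = [(37.50, 127.00), (37.51, 127.00), (37.60, 127.00), (37.50, 127.10)]
thinned = thin_localities(points)
print(thinned)
```

A filter like this also addresses the ‘clumping’ concern, since dense clusters of records collapse to a few well-separated localities.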

Fine-Scale Predictions of Distributions of Chagas Disease Vectors in the State of Guanajuato, Mexico

http://dx.doi.org/10.1093/jmedent/42.6.1068

Many species distribution models (specifically models for triatomine vectors) are built at large geographical scales, whereas smaller local scales may be more useful for understanding the significant drivers of specific triatomine species distributions and thus a more localized disease risk. López-Cárdenas et al. assess fine-scale distributions (“a landscape view”) for 5 triatomine species: Triatoma mexicana, T. longipennis, T. pallidipennis, T. barberi, and T. dimidiata, by geo-referencing collection localities and using ecological niche modeling with an evolutionary-computing approach. Triatomines were collected in the field from 201 communities within Mexico. Risk of disease transmission was also assessed from the niche mapping results. Predictor variables were multi-temporal, remotely sensed environmental data sets used as surrogates for climate data; these permit fine-scale predictions across landscapes, since climatic monitoring stations are too sparse to permit development of fine-scale climatic maps. Occurrence points for each triatomine species were processed in GARP to determine the species' ecological niche. Data were separated into training and testing sets, and rules were developed through an iterative selection process. 100 models were generated with the same process and the best models were chosen based on their omission and commission errors. A chi-squared test was used to compare observed success in predicting the distributions of test points with that expected under random models. The GARP results suggested which species pose a greater risk to human health based on their distribution patterns in Mexico. In retrospect, the quality of the data they collected seemed ideal for any vector-presence study; I’m just curious why they would use an occurrence-only model such as GARP when they also have a good idea of species absences.
Perhaps presence-absence models would be better in this situation: given the quality and credibility of their data, the absences they recorded are more likely to be true absences.
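
As a rough illustration of the chi-squared comparison against a random model, the sketch below assumes that the random expectation for correctly predicted test points is the number of test points multiplied by the fraction of the study area predicted present. That assumption, the function name, and the example numbers are mine and are not taken from the paper.

```python
import math


def chi_square_vs_random(n_test, n_correct, predicted_area_fraction):
    """One-degree-of-freedom chi-squared test of test-point success vs. random.

    n_test: number of independent test points
    n_correct: test points falling in cells predicted present
    predicted_area_fraction: share of the study area predicted present (0 < f < 1)
    """
    if not 0.0 < predicted_area_fraction < 1.0:
        raise ValueError("predicted_area_fraction must be strictly between 0 and 1")

    expected_correct = n_test * predicted_area_fraction
    expected_wrong = n_test - expected_correct
    observed_wrong = n_test - n_correct

    chi2 = ((n_correct - expected_correct) ** 2 / expected_correct
            + (observed_wrong - expected_wrong) ** 2 / expected_wrong)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical numbers: 32 of 40 test points predicted, model covers half the area.
chi2, p_value = chi_square_vs_random(n_test=40, n_correct=32, predicted_area_fraction=0.5)
print(round(chi2, 2), p_value)
```

A model that predicts most of the study area as suitable will score many test points by chance alone, which is why the expected count is tied to the predicted area.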

 

 

 

The impact of climate change on the geographic distribution of two vectors of Chagas disease: implications for the force of infection.

Medone, Paula, et al. “The impact of climate change on the geographical distribution of two vectors of Chagas disease: implications for the force of infection.” Phil. Trans. R. Soc. B 370.1665 (2015): 20130560.

Medone et al. (2015) model the climatic niche distribution of two Chagas disease vectors, Rhodnius prolixus and Triatoma infestans, in Venezuela and Argentina, respectively. ENMs were used with the WorldClim environmental data set to determine the current distribution and the potential future distribution under a climate change scenario, to determine which variables are significant drivers of triatomine distribution, and to evaluate potential shifts in disease risk in areas known to have high disease prevalence. The authors used presence-only data for the two triatomine species generated from range maps (which may have contained pseudo-absences). To prevent over-fitting of the niche models, 5% of the data were randomly selected as presence points for each species from the complete distribution range. MaxEnt was used to predict the climatic niche of both species under current and future conditions, and the partial area under the curve (pAUC) was used to evaluate the goodness-of-fit of the models' predictions. An interesting follow-up would be to understand why AUC has been criticized and pAUC considered favorable (since this was the first time I had heard of pAUC). To evaluate climate suitability and transmission risk to humans, two approaches were used: 1) directly relating suitability to the rate of acquiring the disease (force of infection, FOI) for specific regions in the countries and 2) relating suitability to household vector density and then vector density to FOI. The two approaches likely come with a slew of assumptions, such as the population demographics associated with infection risk in rural villages, and also rely heavily on health data that may be biased or erroneous (since this is considered a neglected disease). However, there probably is no alternative.
Results suggest that climate change will have a different impact on the distribution and transmission risk of each triatomine, depending on its biology, although the take-home message was a general decrease in disease risk under 2050 climate conditions. Although the paper's methods and approach were useful, it seemed too general to draw any solid conclusions from. Instead of answering questions, it raises more about which factors (environmental, climatic, microhabitat, etc.) are actually important in the distribution of triatomines, how we can gather that data, and how to use it in future projects.

Geographic distribution of Chagas disease vectors in Brazil based on ecological niche modeling.

Gurgel-Gonçalves, Rodrigo, et al. “Geographic distribution of Chagas disease vectors in Brazil based on ecological niche modeling.” Journal of tropical medicine 2012 (2012)

http://dx.doi.org/10.1155/2012/705326

Despite efforts to eradicate domestic triatomine vectors, Chagas disease transmission to humans still poses a risk in Latin American countries. To determine risk in areas where sampling and data are insufficient or non-existent, Gurgel-Gonçalves et al. turned to ENM. In Brazil, poor sampling of triatomines with synanthropic tendencies has resulted in low data resolution. ENM was applied here to explore both geographic and ecological phenomena based on known occurrences of various species. They used 3563 occurrence records from within Brazil covering 62 species and retained only species with at least 20 unique occurrence points (leaving only 17 species to conduct ENM on). These points were analyzed with NDVI environmental data and the 19 bioclimatic WorldClim variables (conventionally used layers). MaxEnt was chosen because of the nature of the study: no extrapolation would be involved, since the models were not applied to, or used to speculate about, distributions in other locations. The models for each species were then combined into an overlay of risk areas within Brazil. To provide a view of species responses to environmental variation across Brazil, they plotted 1000 random points across the country and determined: 1. the presence/absence of each species and 2. the values of the first two principal components of the bioclimatic data set. MaxEnt allowed them to determine that some triatomine species are associated with specific biomes, while others occur more generally. Determining high risk areas was one of the main objectives of this paper; however, because triatomines exist throughout Brazil, the high risk areas could easily have been just the locations of rural villages. Beyond that purpose, the ENM seemed most useful for determining which environmental characteristics drive species distributions at the geographic level.
Furthermore, having data on 62 species but only being able to conduct ENM on 17 of them shows that the amount and type of data are crucial. This method is therefore not very useful for studying rare or elusive triatomine species, which would overall be more interesting and probably most important.
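
The record threshold that cut the species list down can be sketched as a small filter over occurrence records. The record layout (species, latitude, longitude), the function name, and the toy records below are assumptions for illustration only; only the cut-off of 20 unique points comes from the notes above.

```python
from collections import defaultdict


def species_with_enough_records(records, min_unique=20):
    """Return species that have at least min_unique distinct occurrence points.

    records: iterable of (species, lat, lon) tuples; duplicates of the same
    coordinate for a species count once.
    """
    unique_points = defaultdict(set)
    for species, lat, lon in records:
        unique_points[species].add((lat, lon))
    return sorted(name for name, pts in unique_points.items() if len(pts) >= min_unique)


# Toy records (made up): species_b has many records but only one unique locality.
records = (
    [("species_a", round(-15.0 + 0.1 * i, 1), -47.0) for i in range(25)]
    + [("species_b", -10.0, -40.0)] * 30
    + [("species_c", round(-20.0 + 0.1 * i, 1), -45.0) for i in range(5)]
)
modelled = species_with_enough_records(records)
print(modelled)
```

The example also shows why raw record counts can mislead: a heavily sampled single locality still contributes only one unique point.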

Modeling the spatial distribution of Chagas disease vectors using environmental variables and people´s knowledge

 

Jaime Hernández, Ignacia Núñez, Antonella Bacigalupo, Pedro E Cattan

DOI: 10.1186/1476-072X-12-29

The distributions of two triatomine species, Triatoma infestans and Mepraia spinolai, were modeled across different spatial scales within Chile. Each species is associated with particular niches, and their risk of transmitting Chagas disease to humans varies (i.e. depends on the degree of domiciliation). Hernández et al. use ENM to address spatial and temporal issues relevant to domestic transmission control and, despite the lack of available data, build a predictive model by extrapolating actual data on the ecological niches to areas with similar characteristics across different regions of Chile. Regardless of the degree of affiliation with human dwellings (which could defeat the whole point of using ENM), relevant macro-environmental variables from satellite-based imagery and triatomine presence/absence data were used. The authors used the machine learning algorithm Random Forest to predict the probability of triatomine presence. Random Forest generated model statistics identifying strong predictors across 10 km, 5 km, and 2.5 km scales, which were used to build the most suitable predictive model. Although the model ultimately determined the degree of overlap between the vector species and predicted that T. infestans can persist outside of domestic conditions, the type of data (presence/absence) does not seem like the best choice overall, because there could be false absences. Instead, where data are already scarce, especially true absences, it would be safest (especially in a human disease risk scenario) to use a presence/background based model.

Ecologic niche modeling and potential reservoirs for Chagas disease, Mexico.

Peterson, A. Townsend, et al. “Ecologic niche modeling and potential reservoirs for Chagas disease, Mexico.” (2002).

doi:  10.3201/eid0807.010454

Peterson et al. apply ecological niche modeling to improve the understanding of epidemiologically important vectors and parasite reservoirs of Chagas disease, using the affiliation between Neotoma (pack rat) and Triatoma species as a study system. The purpose of the study was to determine potential risk areas with ENM, using primary occurrence data for various triatomine species to identify their degree of host affiliation with Neotoma sp. (conventionally known to be a strong vector-host affiliation) within Mexico. The model output was compared with field observations to test the quality of the model. Both ecological niches and potential geographic distributions were generated using the Genetic Algorithm for Rule-Set Prediction (GARP) with environmental/ecologic data coverages (11 conventionally used variables). Triatoma species with small sample sizes were not used in the niche models. Predictions were compared with the known distributions of both rodents and vectors, using the percentage of overlap as a measure of species association and potential disease transmission. Results: the predictions performed well, and triatomines were indeed found at locations predicted by the model, which also overlapped the rodent distributions, suggesting a strong affiliation with the host. However, the model failed to predict some species overlap that has been observed in the field, which may be due to the small sample sizes for those species. This result limited the reliability of the model for predicting disease risk.

Regardless, this paper suggests that ecological niche modeling and species distribution prediction with GARP is a useful tool for determining potential interactions between disease vectors and reservoir hosts. It can also suggest evolutionary relationships between vector and host depending on the percentage of overlap, identify species interactions that were not previously known, and reveal potential jumps from sylvatic affiliations to more peridomestic habitat types. Thus, this method can provide a useful supplementary tool for linking current and future Chagas disease risk to host-vector distributions, in addition to detecting shifts in host affiliation and distributional boundaries.
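
The percentage-of-overlap idea can be sketched with two binary prediction grids. How the paper actually defined the percentage is not recorded in these notes, so the choice of denominator (cells predicted for the vector), the function name, and the toy grids are all assumptions.

```python
def percent_overlap(vector_map, host_map):
    """Percentage of cells predicted for the vector that are also predicted for the host.

    Both maps are equally sized grids (lists of rows) of 0/1 values.
    """
    vector_cells = {
        (r, c) for r, row in enumerate(vector_map) for c, value in enumerate(row) if value
    }
    host_cells = {
        (r, c) for r, row in enumerate(host_map) for c, value in enumerate(row) if value
    }
    if not vector_cells:
        return 0.0  # nothing predicted for the vector, so no overlap to report
    return 100.0 * len(vector_cells & host_cells) / len(vector_cells)


# Toy 3x3 grids (made up): 2 of the 3 vector cells fall inside the host prediction.
vector_map = [[1, 1, 0],
              [0, 1, 0],
              [0, 0, 0]]
host_map = [[1, 1, 1],
            [0, 0, 1],
            [0, 0, 0]]
overlap = percent_overlap(vector_map, host_map)
print(round(overlap, 1))
```

Using the vector's predicted area as the denominator answers "how much of the vector's range lies within the host's range"; swapping the arguments answers the reverse question, and the two values generally differ.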

Modelling species distributions in Britain: a hierarchical integration of climate and land-cover data

Pearson, Richard G., Terence P. Dawson, and Canran Liu. “Modelling species distributions in Britain: a hierarchical integration of climate and land‐cover data.” Ecography 27.3 (2004): 285-298.

DOI: 10.1111/j.0906-7590.2004.03740.x

Pearson et al. argue that a hierarchical framework for modeling species distributions improves understanding of the unique roles and combined effects of climate change and landscape disturbance in determining species distributions. The authors address the interaction between climate and land use change as determinants of species distribution by integrating fine-scale land cover data with coarser-scale climate data. Incorporating climate and land cover data at different spatial scales allows for the possibility that different environmental factors affect species differently depending on the scale. METHOD: They used presence-absence data for four plant species in Britain (which represent a range of habitat associations, life-forms, and distribution characteristics). The fine-scale suitability surface was generated using the bioclimatic model SPECIES, which uses an Artificial Neural Network to first identify suitability at the European extent (continental scale, climate driven), and is then trained at the regional scale (Great Britain) at 10 km and then 1 km resolutions (climate and land cover driven). It is believed that these environmental factors are most apparent at these scales. Climate suitability was ultimately refined based on correlations between land cover type and observed distributions at 1 km and 10 km resolution. In order to match resolution between the continental and regional scales, it was necessary to artificially aggregate the suitability of cells. The hierarchical methodology was tested against a non-hierarchical method (see text), and model performance was evaluated using the kappa (K) statistic and AUC. Three threshold values were chosen (the appropriate threshold will ultimately depend on the management situation for the species of interest).
Incorporating land cover data improved model performance for some species, suggesting that the importance of different environmental variables to a species' distribution depends on the species' requirements. Neither the hierarchical nor the non-hierarchical method performed better than the other when modeling current species distributions, and the same held across spatial resolutions (10 km vs 1 km). Ultimately, for predicting future species distributions, it is important to first determine whether the decline of the species is driven by land cover or by climatic variables. Theoretically, integrating hierarchical data seems like the ideal way to model species distributions, but of course there are data limitations that make this method less feasible. It would be very interesting to apply this approach to a vertebrate or invertebrate species and compare conclusions.
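
Since kappa depends on the threshold used to turn a suitability surface into presence/absence, a small sketch may help show the calculation. It assumes the K statistic here is Cohen's kappa; the three thresholds, the suitability scores, and the function name are made up for illustration and are not the values used in the paper.

```python
def cohen_kappa(observed, predicted):
    """Cohen's kappa for two equal-length sequences of 0/1 values."""
    n = len(observed)
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)

    agreement = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    if chance == 1.0:
        return 0.0  # degenerate case: all cells in one class for both maps
    return (agreement - chance) / (1.0 - chance)


# Toy data: observed presence/absence and modelled suitability per cell (made up).
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
suitability = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]

# Hypothetical thresholds; each converts suitability into a binary prediction.
kappas = {
    threshold: cohen_kappa(observed, [1 if s >= threshold else 0 for s in suitability])
    for threshold in (0.25, 0.5, 0.75)
}
print(kappas)
```

In this toy case the lowest threshold gives the highest kappa, but it does so by accepting more false presences, which is the kind of trade-off that would be weighed against the management situation.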