SDMdata: A Web-Based Software Tool for Collecting Species Occurrence Records

Obtaining data dynamically and programmatically is necessary for reproducible research. This is a blanket statement. What I mean specifically is that the ability to access data programmatically from a version-controlled source allows for consistent use of the data. Currently, many databases are accessible through web-based interfaces but have no API or other method for accessing the data programmatically. This matters because any subsequent analysis is then based only on a manually downloaded snapshot of a potentially dynamic database. Ideally, a complete workflow would include pulling the data from a database, cleaning it, analyzing it, and outputting results. This paper introduces a tool to download and clean species occurrence data from GBIF (the Global Biodiversity Information Facility). The tool is web-based and written in Python; it takes a list of species names and outputs occurrence data from GBIF. The authors argue that the current _R_ implementation (`rgbif`) is flawed because of memory limitations (which is a pretty facile argument). I do like that `SDMdata` has an error-checking feature that flags suspected errors. However, the proliferation of tools to query databases tends to “muddy the waters,” in my opinion. Several resources already exist for programmatic data acquisition from GBIF in R, SQL, and Python. Perhaps this tool adds something novel; perhaps we should focus on making existing tools better.
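To make the "pull, then clean" part of that workflow concrete, here is a minimal Python sketch against GBIF's public occurrence search endpoint. It is not how `SDMdata` works internally, which I have not inspected. The function names, the page size of 300, and the three coordinate checks are my own assumptions, chosen only to illustrate the idea of flagging suspect records.

```python
import json
import urllib.parse
import urllib.request

# Public GBIF occurrence search endpoint (v1 REST API).
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_url(species, limit=300, offset=0):
    """Build a search URL for one page of georeferenced records."""
    query = urllib.parse.urlencode({
        "scientificName": species,
        "hasCoordinate": "true",
        "limit": limit,    # assumed page size; adjust if the API rejects it
        "offset": offset,
    })
    return f"{GBIF_SEARCH}?{query}"


def fetch_occurrences(species, max_records=300):
    """Page through results until max_records or the end is reached."""
    records, offset = [], 0
    while len(records) < max_records:
        with urllib.request.urlopen(build_url(species, offset=offset)) as resp:
            page = json.load(resp)
        records.extend(page["results"])
        if page["endOfRecords"] or not page["results"]:
            break
        offset += len(page["results"])
    return records[:max_records]


def flag_suspect(record):
    """Return a reason string for a suspect record, or None if it passes.

    These checks are illustrative only, not SDMdata's actual rules.
    """
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        return "missing coordinates"
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return "coordinates out of range"
    if lat == 0 and lon == 0:
        return "zero coordinates"
    return None


def clean(records):
    """Keep only records that pass every check."""
    return [r for r in records if flag_suspect(r) is None]
```

With network access, something like `clean(fetch_occurrences("Puma concolor", max_records=50))` would pull and filter records in one step. Because the query is code, it can be rerun later against the live database, which is the point about snapshots made above.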

 

Link to paper

Link to software