Taxonomy alignment as a key to enhance reproducibility in biodiversity research: a case study of Magnolia
Author: Yi-Yun Cheng (Jessica), University of Illinois at Urbana-Champaign
Mentors: Dr. Bertram Ludaescher, UIUC ; Dr. Nico Franz, ASU
In biodiversity research, we often expect the scientific names of species to act as unique identifiers, but they may not. Why is that?
(1) The scientific names can vary over time
(2) The names stay the same, but the semantics of the names change
Other complicating issues:
(1) Different people may have different views on the taxonomy of the same group
(2) Species distribution datasets often include only a species ‘name’, without crediting the authorship of the taxonomy behind it
This is why there is a pressing need to align different taxonomies that address the same group: not only to make the names more interoperable, but also to pave the way for further use of the datasets.
Step 1: Decide which species (or genus) to examine
Step 2: Domain experts provide a mapping table for the taxonomies used over time for that particular species
Step 3: The researcher transposes the domain experts’ table into an Euler/X or LeanEuler input file
Step 4: Gather species distribution datasets from biodiversity portals
Step 5: Map concepts across the taxonomies and create new datasets based on the different taxonomies
Step 6: Data cleaning - geocode missing lat-long information
Step 7: Visualizing species co-occurrence distribution & synthesized taxonomy alignment distribution
Step 8: Niche modeling and further analysis
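To make Step 3 more concrete, here is a minimal Python sketch of turning rows of an expert mapping table into bracketed articulation lines. It is not the script used in this project. The `Articulation` type, the `render_articulations` helper, the `T1`/`T2` taxonomy labels, and the `Taxon_*` names are all hypothetical placeholders. The relation keywords and the bracketed line layout are assumptions modeled on RCC-5 style alignments, so check them against the Euler/X or LeanEuler documentation before use.

```python
from typing import NamedTuple

# The five RCC-5 relations; the exact keyword spellings are an assumption.
RCC5 = {"equals", "includes", "is_included_in", "overlaps", "disjoint"}


class Articulation(NamedTuple):
    """One row of the expert mapping table (hypothetical structure)."""
    left: str       # concept in the first taxonomy, e.g. "T1.Taxon_A"
    relation: str   # one of the RCC5 keywords above
    right: str      # concept in the second taxonomy, e.g. "T2.Taxon_A"


def render_articulations(rows):
    """Render each row as one bracketed line (assumed layout)."""
    lines = []
    for art in rows:
        if art.relation not in RCC5:
            raise ValueError(f"unknown relation: {art.relation!r}")
        lines.append(f"[{art.left} {art.relation} {art.right}]")
    return "\n".join(lines)


# Placeholder rows, not real Magnolia data.
table = [
    Articulation("T1.Taxon_A", "equals", "T2.Taxon_A"),
    Articulation("T1.Taxon_B", "includes", "T2.Taxon_C"),
]
print(render_articulations(table))
```

Rejecting unknown relation keywords early keeps a typo in the mapping table from silently producing an input file that the reasoner would refuse or misread.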
-
Step 5: Concept mapping process. Refer to this notebook.
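As a rough illustration of the idea behind concept mapping (not the notebook's actual code), the Python sketch below relabels occurrence records from one taxonomy to another. A record is relabeled only when its source concept equals, or is fully included in, exactly one target concept. Everything else is set aside for manual review. The record layout, the `mapping` structure, the `relabel` function, the relation labels, and the `Taxon_*` names are hypothetical.

```python
# Relations under which every record of the source concept
# also belongs to the target concept (labels are assumptions).
SAFE_RELATIONS = {"equals", "is_included_in"}


def relabel(records, mapping):
    """Split records into (relabeled, unresolved) under the target taxonomy.

    mapping: source name -> list of (relation, target name) pairs.
    """
    relabeled, unresolved = [], []
    for rec in records:
        targets = [
            target
            for relation, target in mapping.get(rec["name"], [])
            if relation in SAFE_RELATIONS
        ]
        if len(targets) == 1:
            # Copy the record so the original dataset is left untouched.
            relabeled.append({**rec, "name": targets[0]})
        else:
            unresolved.append(rec)
    return relabeled, unresolved


# Placeholder data, not real Magnolia records.
mapping = {
    "Taxon_A": [("equals", "Taxon_A")],
    "Taxon_B": [("is_included_in", "Taxon_C")],
    "Taxon_D": [("overlaps", "Taxon_C"), ("overlaps", "Taxon_E")],
}
records = [
    {"id": 1, "name": "Taxon_A"},
    {"id": 2, "name": "Taxon_B"},
    {"id": 3, "name": "Taxon_D"},
]
new_records, needs_review = relabel(records, mapping)
```

Records whose concept merely overlaps a target concept cannot be assigned from the name alone, which is why they are returned separately rather than guessed.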
-
Step 6: Filling in missing geo-location information. Clone this repository and run geocode.py along with testgeocode.py.
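The general shape of this cleaning step can be sketched in Python as below. This is not the contents of geocode.py, whose interface is not described here. It assumes each record has `locality`, `lat`, and `lon` fields. It uses a hypothetical in-memory `gazetteer` dictionary in place of a real geocoding service. The coordinates are made-up placeholders.

```python
def fill_missing_coords(records, gazetteer):
    """Return copies of records with missing lat/lon filled from gazetteer.

    gazetteer: locality string -> (lat, lon) tuple (hypothetical lookup).
    Filled records get a "geocoded" flag so they can be told apart
    from records that shipped with coordinates.
    """
    filled = []
    for rec in records:
        rec = dict(rec)  # work on a copy, keep the input unchanged
        if rec.get("lat") is None or rec.get("lon") is None:
            coords = gazetteer.get(rec.get("locality"))
            if coords is not None:
                rec["lat"], rec["lon"] = coords
                rec["geocoded"] = True
        filled.append(rec)
    return filled


# Placeholder data; the coordinates are illustrative only.
gazetteer = {"Locality X": (10.0, 20.0)}
records = [
    {"id": 1, "locality": "Locality X", "lat": None, "lon": None},
    {"id": 2, "locality": "Locality Y", "lat": 35.5, "lon": -80.3},
    {"id": 3, "locality": "Locality Z", "lat": None, "lon": None},
]
cleaned = fill_missing_coords(records, gazetteer)
```

Record 3 stays without coordinates because its locality is not in the lookup. Leaving such gaps visible is safer than guessing a position.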
-
Step 7: Species co-occurrence distribution visualization. Refer to this notebook.
- Alternatively, run plotdata.py to execute the code directly.
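Before plotting, co-occurrence has to be defined somehow. One simple option, shown in the Python sketch below, is to bin records into latitude-longitude grid cells and keep the cells that contain more than one taxon. This is an assumed approach for illustration, not necessarily what the notebook or plotdata.py does. The function names, the one-degree cell size, and the sample records are hypothetical.

```python
import math
from collections import defaultdict


def cell_of(lat, lon, size=1.0):
    """Index of the grid cell (size in degrees) containing a point."""
    return (math.floor(lat / size), math.floor(lon / size))


def taxa_by_cell(records, size=1.0):
    """Map each occupied grid cell to the set of taxon names found in it."""
    cells = defaultdict(set)
    for rec in records:
        cells[cell_of(rec["lat"], rec["lon"], size)].add(rec["name"])
    return dict(cells)


def cooccurring_cells(records, size=1.0):
    """Keep only the cells where at least two different taxa occur."""
    return {
        cell: names
        for cell, names in taxa_by_cell(records, size).items()
        if len(names) > 1
    }


# Placeholder records, not real occurrence data.
records = [
    {"name": "Taxon_A", "lat": 10.2, "lon": 20.7},
    {"name": "Taxon_B", "lat": 10.9, "lon": 20.1},
    {"name": "Taxon_A", "lat": 35.5, "lon": -80.3},
]
shared = cooccurring_cells(records)
```

The resulting cells can then be passed to any plotting library. Running the same function on datasets relabeled under different taxonomies shows how the co-occurrence picture shifts with the taxonomy chosen.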