
Interim taxonomy patch feature

Jonathan A Rees edited this page May 24, 2017 · 11 revisions

The patch facility permits custom additions and repairs to the Open Tree reference taxonomy (OTT).

Introduction to patch files
Adding / editing taxa in OTT
Format of the patch file
Modifying a patch file on your local machine

Introduction to patch files

The taxonomies produced by the taxonomy build process are riddled with errors. Some of these errors are inherited from the input taxonomies, while others are artifacts of merging, such as a taxon in a lower-priority input taxonomy B being paraphyletic relative to a higher-priority input taxonomy A.

This page describes the 'interim patch feature', which is based on tab-delimited tables of patches. We are also experimenting with a more script-like patch language, described here.

Patches reside in a series of patch files, which are kept in the 'reference-taxonomy' git repository (see https://github.com/OpenTreeOfLife/reference-taxonomy/tree/master/feed/ott/edits/). The order in which the patch files are processed is not defined, but within each patch file the patches are processed from top to bottom.

In the process of constructing a new version of the taxonomy (OTT), patches are applied after the source taxonomies (NCBI, GBIF, ...) are algorithmically combined, but before unique taxon identifiers (OTT ids) are assigned (or reassigned) to taxa. Patches can therefore refer to parts of any of the input taxonomies, or to details of the way in which they were combined.

Adding / editing taxa in OTT

In order to add or modify taxa in OTT, you must add the required operation to a text file (a patch file) that is processed at the end of OTT assembly. The simplest way to modify a patch file is directly on github.com (i.e. this site). There are instructions at the bottom of this page for editing the files locally and uploading them to GitHub.

Names of patch files should end in '.tsv' (for tab-separated values).

  1. If you don't already have one, create a GitHub account (use the green Sign up button at the top right; there are more instructions if you need them). You will also need to ask Rick, Karen, Mark or Stephen to add you to the OpenTreeOfLife organization in order to edit files; just send us your GitHub username.
  2. Log into GitHub.
  3. Click on one of the files in the list of patch files. Use ott_edits.tsv if your changes fit into one of the existing categories; otherwise create a new file. (Please do not rename or delete ott_edits.tsv.) To create a new file, use the '+' button in the GitHub user interface or follow the instructions for working locally. Local editing is preferable, because it lets you create a branch and submit a pull request. (If you're not a project member, this happens automatically.)
  4. Click the Edit button near the top of the file. If it is greyed out, you probably aren't logged in.
  5. Add a new line to the file for every change that you want to make. Each line contains six columns, separated by tabs (not spaces!). The format section describes what should go in each column.
  6. Add one or more comment lines (lines that start with #) to document why you are making the change.
  7. Click the green Commit changes button at the bottom to save your changes. You can add an optional message describing the change (and it is a good idea to do so!).
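Getting the tabs right is the most common stumbling block when editing by hand. As a hedged illustration, the following Python sketch builds one patch line from its six column values and refuses values that would break the tab-delimited layout. The helper `format_patch_row` is hypothetical, not part of the OTT tooling.

```python
def format_patch_row(command: str, name1: str, rank: str,
                     name2: str, context: str, source_info: str) -> str:
    """Join the six patch columns with tabs (hypothetical helper)."""
    fields = [command, name1, rank, name2, context, source_info]
    for value in fields:
        # A tab or line break inside a value would shift or split the columns.
        if "\t" in value or "\n" in value or "\r" in value:
            raise ValueError(f"column value contains a tab or newline: {value!r}")
    # Unused columns stay empty but still need their separating tab.
    return "\t".join(fields)
```

Blank columns, such as name2 for prune, still occupy a slot, so every patch line has exactly five tabs.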

If you create a new file, make sure that its name ends in .tsv and consists only of letters, digits, hyphens (-), underscores (_), and dots (.).
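The naming rule can be checked mechanically before committing. A minimal sketch, assuming that "letters" means ASCII letters (the page does not say whether accented letters are allowed); `is_valid_patch_filename` is a hypothetical name.

```python
import re

# Letters, digits, hyphen, underscore or dot, ending in ".tsv".
# Assumption: only ASCII letters count as "letters".
PATCH_FILENAME = re.compile(r"[A-Za-z0-9._-]+\.tsv")


def is_valid_patch_filename(name: str) -> bool:
    """Return True if `name` follows the patch-file naming rule."""
    return PATCH_FILENAME.fullmatch(name) is not None
```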

Format of the patch file

This is not a very tasteful or flexible notation; that is why this is the interim patch feature, not the patch feature. We hope to replace this mechanism with something better in the future.

Each patch file is a plain (UTF-8) text file containing a tab-delimited table. Blank rows and rows beginning with '#' are ignored. Please make liberal use of comment lines '#' to explain the reason for the change.
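A minimal Python sketch of reading a patch file under these rules: decode it as UTF-8, skip blank lines and lines beginning with '#', and split the rest on tabs. It assumes that a line containing only whitespace counts as blank; `read_patch_rows` and `read_patch_file` are hypothetical helpers, not the build's actual reader.

```python
def read_patch_rows(text: str) -> list[list[str]]:
    """Split patch-file text into rows of tab-separated fields."""
    rows = []
    for line in text.splitlines():
        # Blank lines (assumed to include whitespace-only lines) and
        # lines starting with '#' are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        rows.append(line.split("\t"))
    return rows


def read_patch_file(path: str) -> list[list[str]]:
    # Patch files are plain UTF-8 text.
    with open(path, encoding="utf-8") as f:
        return read_patch_rows(f.read())
```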

The columns of a patch file are as follows:

  • command - what kind of operation to perform (includes add, synonym, move, prune and elide)
  • name1 - the name of a taxon (call it taxon1)
  • rank - rank to be associated with taxon1
  • name2 - the name of another taxon (call it taxon2)
  • context - the name of a taxon already in OTT that is an ancestor of taxon1 and taxon2, used to disambiguate homonyms. A kingdom-level name is typically enough, since names are usually (though not always) unique within a nomenclatural code.
  • sourceInfo - information about the source of this taxon, see below
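The six columns map naturally onto a record type. In this hedged sketch the `PatchRow` class is hypothetical: it insists on exactly six fields, as the editing instructions require, and leaves the command unchecked, because the list of commands above is not necessarily exhaustive.

```python
from dataclasses import dataclass


@dataclass
class PatchRow:
    command: str     # add, synonym, move, prune, elide, ...
    name1: str       # names taxon1
    rank: str        # rank for taxon1 (may be blank)
    name2: str       # names taxon2 (may be blank)
    context: str     # ancestor name used for homonym disambiguation
    sourceInfo: str  # URL or CURIE giving provenance

    @classmethod
    def from_fields(cls, fields: list[str]) -> "PatchRow":
        # Every patch line must have all six columns, even if some are blank.
        if len(fields) != 6:
            raise ValueError(f"expected 6 tab-separated columns, got {len(fields)}")
        return cls(*fields)
```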

If neither name1 nor name2 is a homonym (of some other name) then context can be 'life'. Otherwise context should be some non-homonym name with the property that the taxon it names has at most one descendant named name1 (or name2).
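The context rule amounts to a lookup: find the one taxon named by context, then search beneath it for exactly one taxon named name1 (or name2). Below is a minimal sketch on a toy taxonomy stored in hypothetical `names` (id to name) and `children` (id to child ids) tables; the real build uses its own data structures. The usage example relies on the genuine homonym Morus, which names both a plant genus (mulberries) and a bird genus (gannets); the surrounding taxa and ids are illustrative only.

```python
from collections import deque


def resolve(name: str, context: str,
            names: dict[str, str], children: dict[str, list[str]]) -> str:
    """Return the id of the unique taxon called `name` below `context`."""
    # The context name itself must not be a homonym.
    ctx_ids = [tid for tid, n in names.items() if n == context]
    if len(ctx_ids) != 1:
        raise ValueError(f"context {context!r} must name exactly one taxon")
    # Breadth-first search over the descendants of the context taxon.
    found = []
    queue = deque(children.get(ctx_ids[0], []))
    while queue:
        tid = queue.popleft()
        if names[tid] == name:
            found.append(tid)
        queue.extend(children.get(tid, []))
    if len(found) != 1:
        raise ValueError(f"{name!r} matches {len(found)} taxa under {context!r}")
    return found[0]


# Toy data: "Morus" appears twice, once under plants and once under birds.
names = {"1": "life", "2": "Plantae", "3": "Animalia",
         "4": "Aves", "5": "Morus", "6": "Morus"}
children = {"1": ["2", "3"], "2": ["5"], "3": ["4"], "4": ["6"]}
```

With this data, resolve("Morus", "Plantae", names, children) returns "5" and resolve("Morus", "Aves", names, children) returns "6", while a context of "life" is rejected because it has two descendants named Morus.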

The commands are:

add - taxon1 is a (probably new) child of taxon2

  • taxon2 should already exist in the taxonomy.
  • If taxon1 already exists, no action is taken. The command is flagged as an error if taxon1's parent is not taxon2, or if its rank differs from the one given.
  • Otherwise taxon1 is added to the taxonomy as a child of taxon2, with the specified rank.
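The add rules can be sketched in Python against a toy in-memory tree. Assumptions: taxon names are unique in the toy `index` (so the context column is not consulted), and problems are returned as messages rather than stopping the run. `Taxon` and `apply_add` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare taxa by identity, not by field values
class Taxon:
    name: str
    rank: str = ""
    parent: Optional["Taxon"] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)


def apply_add(index: dict, name1: str, rank: str, name2: str) -> list:
    """Apply one 'add' row; return error messages (empty list if none)."""
    taxon2 = index.get(name2)
    if taxon2 is None:
        return [f"add: {name2!r} should already exist"]
    taxon1 = index.get(name1)
    if taxon1 is not None:
        # taxon1 already exists: take no action, but flag mismatches.
        errors = []
        if taxon1.parent is not taxon2:
            errors.append(f"add: parent of {name1!r} is not {name2!r}")
        if taxon1.rank != rank:
            errors.append(f"add: rank of {name1!r} is {taxon1.rank!r}, not {rank!r}")
        return errors
    # Otherwise create taxon1 as a child of taxon2 with the given rank.
    taxon1 = Taxon(name1, rank, parent=taxon2)
    taxon2.children.append(taxon1)
    index[name1] = taxon1
    return []
```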

synonym - name1 is a name for taxon2

  • taxon2 should already exist in the taxonomy.
  • If name1 is already a name for taxon2, no action is taken.
  • Otherwise name1 is added as a name for taxon2 (i.e. a synonym of name2).
  • The rank field should be blank.
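A corresponding sketch of synonym, again on a hypothetical toy structure in which `index` is keyed by each taxon's primary name only. The page does not say what happens if name1 already names a different taxon, so this sketch does not check for that case.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Taxon:
    name: str
    synonyms: set = field(default_factory=set)


def apply_synonym(index: dict, name1: str, rank: str, name2: str) -> list:
    """Apply one 'synonym' row; return error messages (empty list if none)."""
    if rank:
        return [f"synonym: rank should be blank, got {rank!r}"]
    taxon2 = index.get(name2)
    if taxon2 is None:
        return [f"synonym: {name2!r} should already exist"]
    if name1 == taxon2.name or name1 in taxon2.synonyms:
        return []  # name1 is already a name for taxon2: no action
    taxon2.synonyms.add(name1)
    return []
```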

move - change taxon1's parent

  • taxon1 and taxon2 should already exist.
  • taxon2 is made to be the parent of taxon1.
  • The rank of taxon1 is set to the rank given in the rank column.
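A sketch of move on the same kind of toy tree (names assumed unique; `Taxon` and `apply_move` are hypothetical). The cycle check is an extra safety measure that this page does not mention: without it, moving a taxon under its own descendant would detach the subtree from the rest of the tree.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Taxon:
    name: str
    rank: str = ""
    parent: Optional["Taxon"] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)


def apply_move(index: dict, name1: str, rank: str, name2: str) -> list:
    """Apply one 'move' row; return error messages (empty list if none)."""
    taxon1, taxon2 = index.get(name1), index.get(name2)
    if taxon1 is None or taxon2 is None:
        return [f"move: {name1!r} and {name2!r} should both exist"]
    # Extra check (assumption): taxon2 must not be taxon1 or below it.
    node = taxon2
    while node is not None:
        if node is taxon1:
            return [f"move: {name2!r} is {name1!r} or one of its descendants"]
        node = node.parent
    # Detach taxon1 from its old parent and attach it under taxon2.
    if taxon1.parent is not None:
        taxon1.parent.children.remove(taxon1)
    taxon1.parent = taxon2
    taxon2.children.append(taxon1)
    taxon1.rank = rank
    return []
```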

prune - delete a taxon and its descendants

  • taxon1 should exist; if it doesn't, no action is taken.
  • name2 should be blank.
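A sketch of prune on a hypothetical toy tree whose `index` maps unique names to taxa: the whole subtree is forgotten and then detached from its parent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Taxon:
    name: str
    parent: Optional["Taxon"] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)


def apply_prune(index: dict, name1: str, name2: str) -> list:
    """Apply one 'prune' row; return error messages (empty list if none)."""
    if name2:
        return [f"prune: name2 should be blank, got {name2!r}"]
    taxon1 = index.get(name1)
    if taxon1 is None:
        return []  # taxon1 does not exist: no action
    # Remove taxon1 and every descendant from the index.
    stack = [taxon1]
    while stack:
        taxon = stack.pop()
        index.pop(taxon.name, None)
        stack.extend(taxon.children)
    # Detach the pruned subtree from the rest of the tree.
    if taxon1.parent is not None:
        taxon1.parent.children.remove(taxon1)
        taxon1.parent = None
    return []
```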

elide - delete a taxon but not its descendants

The children of taxon1 become children of taxon1's parent.

  • taxon1 should exist; if it doesn't, no action is taken.
  • name2 should be blank.
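A sketch of elide on the same kind of hypothetical toy tree. The children are spliced into the parent's child list where taxon1 used to be. The page does not say what eliding the root should do; treating it as an error is an assumption here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Taxon:
    name: str
    parent: Optional["Taxon"] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)


def apply_elide(index: dict, name1: str, name2: str) -> list:
    """Apply one 'elide' row; return error messages (empty list if none)."""
    if name2:
        return [f"elide: name2 should be blank, got {name2!r}"]
    taxon1 = index.get(name1)
    if taxon1 is None:
        return []  # taxon1 does not exist: no action
    parent = taxon1.parent
    if parent is None:
        # Assumption: eliding the root is reported as an error.
        return [f"elide: {name1!r} has no parent"]
    # Replace taxon1 in its parent's child list with taxon1's children.
    position = parent.children.index(taxon1)
    for child in taxon1.children:
        child.parent = parent
    parent.children[position:position + 1] = taxon1.children
    taxon1.children = []
    taxon1.parent = None
    del index[name1]
    return []
```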

sourceInfo field

This is used to provide source or provenance information for newly added taxa. If you are adding a new taxon, under no circumstances should this field be left empty.

  • The value should be either a URI or a CURIE.
  • This should be a reference to an accession in an established taxonomic database.
  • If no such reference is available, it should be a URL that refers to a published description of the taxon in question.
  • If no such reference is available, it should be the DOI for the article reporting the study, assuming it explains the new name (and it should!), written in the form of a URL (http://dx.doi.org/10.something).
  • If there is no DOI for the article, use the best (most stable) possible URL for the article you can get ahold of.
  • If there is no provenance other than that it's your own unpublished opinion, put the URL for your ORCID.
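Only the first requirement (a URI or a CURIE) can be checked mechanically; choosing the best available reference still takes judgment. A rough sketch, assuming that a URL means http or https with a host, and that a CURIE is a prefix, a colon, and a non-empty local part not starting with '/'. `classify_source_info` is hypothetical.

```python
import re
from urllib.parse import urlparse

# prefix ":" local-part, e.g. ncbi:9606 (assumed shape, checked loosely)
CURIE = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*:[^\s/]\S*")


def classify_source_info(value: str) -> str:
    """Return 'url' or 'curie'; raise ValueError for anything else."""
    value = value.strip()
    if not value:
        raise ValueError("sourceInfo must not be empty for a newly added taxon")
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if CURIE.fullmatch(value):
        return "curie"
    raise ValueError(f"not a URL or CURIE: {value!r}")
```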

Examples

The system currently (Sept 2013) knows about four source prefixes: ncbi:, gbif:, IF:, and MB:. Other source prefixes can be added on request; submit the request as an issue in the reference-taxonomy issue tracker (https://github.com/OpenTreeOfLife/reference-taxonomy/issues). Absent any other kind of provenance, provide a URL that goes to an explanation of the name.
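A small check that a CURIE uses one of the prefixes listed above. This is a sketch: the prefix set reflects this page as of Sept 2013 and may be out of date, and matching the prefix case-sensitively, exactly as written, is an assumption. `has_known_prefix` is hypothetical.

```python
# Prefixes listed on this page (Sept 2013); others can be added on request.
KNOWN_PREFIXES = {"ncbi", "gbif", "IF", "MB"}


def has_known_prefix(curie: str) -> bool:
    """True if `curie` looks like prefix:local with a recognized prefix."""
    prefix, sep, local = curie.partition(":")
    # Case-sensitive comparison against the prefixes as written (assumption).
    return bool(sep) and bool(local) and prefix in KNOWN_PREFIXES
```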

Modifying a patch file on your local machine

Instructions for this are coming soon. Reading up on git is a good first step. Here are a few places to start: