Taxon flags
NOTE: the flags documentation has moved into the repository, here. The following text is retained for no very good reason.
The last column in the taxonomy.tsv file in the interim taxonomy file format is "flags". The flags entry is a comma-separated list of flags or markers. Usually these are generated by taxonomy synthesis and are used to decide whether a taxon is to be suppressed in downstream processing. For example, if there's a not_otu flag then the name may not correspond to anything taxon-like and it may be desirable to suppress the name.
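As a minimal sketch of reading that field, assuming the flags entry has already been split out of its row as a plain string (the helper name parse_flags is hypothetical and not part of any Open Tree tool):

```python
def parse_flags(field):
    """Split a comma-separated flags entry into a set of flag names."""
    # An empty or whitespace-only entry yields an empty set (no flags).
    return {flag.strip() for flag in field.split(",") if flag.strip()}


flags = parse_flags("not_otu,environmental_inherited")
if "not_otu" in flags:
    # The name may not correspond to anything taxon-like.
    print("consider suppressing this name")
```

Returning a set makes membership tests cheap and ignores the order in which flags were written.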
The possible values in that field are:
- 'Incertae sedis'-like flags
  - incertae_sedis - in source taxonomy, was a member of an "incertae sedis" container (also "unallocated", "unclassified", "mitosporic")
  - incertae_sedis_inherited - descends from a node flagged incertae_sedis
  - major_rank_conflict - in source taxonomy, there is a gap, skipping a Linnaean rank, between the node's rank and its parent's rank, while there is a sibling not showing such a gap. For example: a genus in an order that has a sibling that is a family. This flag is only applied in certain sources, e.g. GBIF, which happen to represent "incertae sedis" in this way. Does not apply to NCBI. Processed the same as incertae_sedis
  - major_rank_conflict_inherited - descends from a node flagged major_rank_conflict
  - unplaced (new in OTT 2.9) - equivalent to incertae_sedis. The node's parent is inconsistent with OTT, i.e. does not fit into the hierarchy, so the node has been made to be a child of the MRCA of the children of the inconsistent taxon
  - unplaced_inherited - descends from a node flagged unplaced
  - environmental - child of a node whose name contains the strings "environmental samples" or "mycorrhizal samples". Equivalent to incertae_sedis
  - environmental_inherited - descends from a node flagged environmental
  - sibling_higher - has a sibling with a higher rank, where major_rank_conflict does not apply. For example: a subfamily with a sibling that's a family. Similar to major_rank_conflict, but treatment as incertae sedis is not definitely warranted. Currently this only serves as a warning to a human browsing the taxonomy; it has no effect on assembly.
  - inconsistent (new in OTT 2.9) - a placeholder or "tombstone" for a taxon that has been removed due to its being inconsistent with higher priority taxa (judged to be not a clade). Does not have children, and can generally be ignored.
  - merged (OTT 2.9) - similar to inconsistent, but the children were directly placed in a larger taxon
- Other flags
  - barren - there are only higher taxa at and below this node, no species or unranked tips
  - extinct - node is annotated as extinct (usually but not always by IRMNG)
  - extinct_inherited - descends from a node flagged extinct
  - hidden - marked hidden due to Open Tree curatorial decision (e.g. microbes from GBIF)
  - hidden_inherited - descends from a node flagged hidden
  - hybrid - taxon name contains "hybrid" or " x " indicating that it is a hybrid. Also, any node descended from such a node.
  - infraspecific - descends from a node with rank "species"
  - not_otu - the name suggests that this is not a taxon. Keywords interpreted this way include "uncultured", "unclassified", "unidentified", "unknown", "metagenome", "other sequences", "artificial", "libraries", "tranposons", and a few others. Also "sp." when at the end of a name. Also, any node descended from such a node. This flag is applied to NCBI taxa but not to SILVA taxa.
  - viral - the taxon name suggests that it has something to do with viruses. Also, any node descended from such a node.
  - was_container - this node used to be a container pseudo-taxon (incertae sedis, environmental samples, etc.) but its children have all been flagged and moved to the node's parent
- Deprecated flags (occur in old versions of OTT but not current ones)
  - major_rank_conflict_direct - superseded by was_container
  - unclassified - this is NCBI's way of saying incertae sedis
  - unclassified_inherited - descends from a node flagged unclassified
  - sibling_lower (deprecated as of OTT 2.9)
  - tattered (deprecated as of OTT 2.9 in favor of was_container)
  - tattered_inherited (deprecated as of OTT 2.9 in favor of unplaced and unplaced_inherited)
  - edited - the taxon has been subject to an ad hoc edit ("patch")
  - forced_visible - not currently used
  - extinct_direct - superseded by was_container
For more detail see the taxomachine source code and the smasher source code.
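The _inherited flags listed above share one pattern: a node carries X_inherited when one of its ancestors carries X. The following is a rough sketch of that propagation, not the smasher implementation. The function name, the dictionary-based tree layout, the node names, and the set of flags treated as inheritable are all assumptions made for illustration.

```python
# Flags assumed to have an "_inherited" counterpart (taken from the list above).
INHERITABLE = {"incertae_sedis", "major_rank_conflict", "unplaced",
               "environmental", "extinct", "hidden"}


def propagate_inherited(children, flags, root):
    """Add X_inherited to every descendant of a node flagged X.

    children: dict mapping node id -> list of child ids
    flags:    dict mapping node id -> set of flag names (modified in place)
    """
    # Each stack entry is (node, inheritable flags seen on its ancestors).
    stack = [(root, set())]
    while stack:
        node, from_above = stack.pop()
        node_flags = flags.setdefault(node, set())
        node_flags |= {f + "_inherited" for f in from_above}
        # Descendants see this node's direct flags plus those of its ancestors.
        passed_down = from_above | (node_flags & INHERITABLE)
        for child in children.get(node, []):
            stack.append((child, passed_down))
    return flags


# Hypothetical three-level tree: only genus_a is directly flagged.
children = {"family": ["genus_a", "genus_b"], "genus_a": ["species_1"]}
flags = {"genus_a": {"extinct"}}
propagate_inherited(children, flags, "family")
print(sorted(flags["species_1"]))  # ['extinct_inherited']
```

The traversal uses an explicit stack rather than recursion so that deep taxonomies do not hit Python's recursion limit.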
Synthesis (treemachine and future methods) and taxomachine are guided by the presence of these flags; each has its own list of flags that it uses as criteria for deciding whether to include an OTT entity in processing. For taxomachine, the flags affect which names are offered via the TNRS. For synthesis, the flags determine whether a node is to be included in the tree.
Taxon flags influence the behavior of the taxonomic name resolution services. If a taxon has any of the following flags, it is suppressed for TNRS purposes (i.e. not offered in TNRS results):
* not_otu
* environmental
* environmental_inherited
* viral
* hidden
* hidden_inherited
* was_container
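The suppression rule above amounts to a set-intersection test. Here is a minimal sketch, assuming the flags entry is available as the comma-separated string found in taxonomy.tsv. The names TNRS_SUPPRESS_FLAGS and suppressed_for_tnrs are hypothetical and do not come from taxomachine.

```python
# Flags that suppress a taxon for TNRS purposes, copied from the list above.
TNRS_SUPPRESS_FLAGS = frozenset({
    "not_otu", "environmental", "environmental_inherited", "viral",
    "hidden", "hidden_inherited", "was_container",
})


def suppressed_for_tnrs(flags_field):
    """Return True if a comma-separated flags entry has any suppressing flag."""
    flags = {f.strip() for f in flags_field.split(",") if f.strip()}
    return not flags.isdisjoint(TNRS_SUPPRESS_FLAGS)


print(suppressed_for_tnrs("extinct,hidden_inherited"))  # True
print(suppressed_for_tnrs("extinct,barren"))            # False
```

Keeping the suppressing flags in a single constant makes it easy to maintain a different list for synthesis, which the text says uses its own criteria.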