List of available fields from the ENA portal API here
Initially we should use the following fields:
ENA field name | DwC | Suggested formatting | Comments |
---|---|---|---|
accession | occurrenceID | This is the primary key | |
accession | associatedSequences | https://www.ebi.ac.uk/ena/browser/api/embl/value | |
accession | references | https://www.ebi.ac.uk/ena/browser/view/value | |
location | decimalLatitude, decimalLongitude | Contains both lat and lon, must be split | |
country | country, locality | Has format `<country>:<locality>`, must be split | |
identified_by | identifiedBy | | |
collected_by | recordedBy | | |
collection_date | eventDate | | |
specimen_voucher | catalogNumber | | |
specimen_voucher | basisOfRecord | `<value> IS NOT NULL ? "PreservedSpecimen" : "MaterialSample"` | |
sequence_md5 | taxonID | `ASV:<value>` | As proposed here. Allows searching for identical sequence variants |
scientific_name | scientificName | | |
tax_id | taxonConceptID | https://www.ebi.ac.uk/ena/browser/view/Taxon:value | Initially we should see how far we get by just supplying scientificName, but we may need a subsequent call to their taxonApi to retrieve higher taxonomic ranks |
altitude | minimumElevationInMeters, maximumElevationInMeters | Should we populate both max and min? | |
sex | sex | | |
description | occurrenceRemarks | | |
host | associatedTaxa | `"host":"Lutra lutra"` (Lutra lutra is the name of the host in this example) | |
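
The formatting rules in the table can be sketched in Python as follows. This is only an illustration, not a settled implementation: the function names (`split_location`, `split_country`, `ena_to_dwc`) are hypothetical, and the coordinate pattern assumes that `location` arrives as an INSDC-style string such as `59.91 N 10.75 E`, which should be verified against real API output. Only the fields that need reformatting are shown.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: `location` looks like "59.91 N 10.75 E" (INSDC lat_lon style).
_LAT_LON = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$"
)


def split_location(location: str) -> Tuple[Optional[float], Optional[float]]:
    """Split the combined ENA location into signed decimal lat/lon."""
    match = _LAT_LON.match(location or "")
    if not match:
        return None, None
    lat = float(match.group(1)) * (1 if match.group(2) == "N" else -1)
    lon = float(match.group(3)) * (1 if match.group(4) == "E" else -1)
    return lat, lon


def split_country(value: str) -> Tuple[str, str]:
    """Split "<country>:<locality>" on the first colon."""
    country, _, locality = (value or "").partition(":")
    return country.strip(), locality.strip()


def ena_to_dwc(rec: dict) -> dict:
    """Map one ENA record (a dict of field -> string) to DwC terms."""
    accession = rec["accession"]
    lat, lon = split_location(rec.get("location", ""))
    country, locality = split_country(rec.get("country", ""))
    voucher = rec.get("specimen_voucher") or None
    md5 = rec.get("sequence_md5") or None
    return {
        "occurrenceID": accession,
        "associatedSequences":
            f"https://www.ebi.ac.uk/ena/browser/api/embl/{accession}",
        "references": f"https://www.ebi.ac.uk/ena/browser/view/{accession}",
        "decimalLatitude": lat,
        "decimalLongitude": lon,
        "country": country,
        "locality": locality,
        "catalogNumber": voucher,
        # A voucher implies a preserved specimen, otherwise a material sample.
        "basisOfRecord": "PreservedSpecimen" if voucher else "MaterialSample",
        "taxonID": f"ASV:{md5}" if md5 else None,
    }
```

Records whose `location` does not match the assumed pattern get empty coordinates rather than raising an error, so the pattern can be widened later without touching the rest of the mapping.
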
Get data for the first 100 sequences that have coordinates:
Get the first 100 sequences that have information in the country field:
Get the first 100 sequences that have a catalogNumber:
Get the first 100 sequences that have information in the identified_by field:
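
As a rough sketch, the four requests above could be assembled like this. Everything here is an assumption to be checked against the ENA portal API documentation: the search endpoint and its `result`, `query`, `fields`, `limit` and `format` parameters, the `geo_box1(...)` bounding-box filter used to mean "has coordinates", and the `field="*"` wildcard form used to mean "field is populated". The helper name `build_search_url` is hypothetical. The code only builds URLs and makes no network calls.

```python
from urllib.parse import urlencode

# ASSUMPTION: ENA portal API search endpoint.
ENA_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"

# ENA field names taken from the mapping table.
FIELDS = [
    "accession", "location", "country", "identified_by", "collected_by",
    "collection_date", "specimen_voucher", "sequence_md5", "scientific_name",
    "tax_id", "altitude", "sex", "description", "host",
]

# ASSUMPTION: query expressions; verify the syntax before relying on them.
QUERIES = {
    "coordinates": "geo_box1(-90,-180,90,180)",  # whole-world bounding box
    "country": 'country="*"',
    "catalogNumber": 'specimen_voucher="*"',
    "identifiedBy": 'identified_by="*"',
}


def build_search_url(query: str, limit: int = 100) -> str:
    """Return a URL-encoded search request for the first `limit` sequences."""
    params = {
        "result": "sequence",
        "query": query,
        "fields": ",".join(FIELDS),
        "limit": limit,
        "format": "json",
    }
    return f"{ENA_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    for label, query in QUERIES.items():
        print(f"{label}: {build_search_url(query)}")
```

Keeping the query expressions in one dictionary makes it easy to correct a filter once the actual syntax has been confirmed.
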