ingest: Switch to NCBI Datasets CLI to fetch data
Replace our custom fetch scripts, which use the NCBI Virus API, with NCBI Datasets CLI commands. NCBI `datasets` downloads a virus dataset ZIP file that includes a metadata JSON Lines file and a sequences FASTA file. To maintain a record of the single NDJSON file on S3, extract the sequences FASTA file and format the metadata into a TSV file, then parse both into a single NDJSON file using `augur curate passthru`. The metadata TSV is created with the NCBI `dataformat` command so that we do not have to parse the nested JSON Lines file ourselves, and its header fields are renamed to match the fields we previously used for NCBI Virus. The NDJSON file created here no longer includes equivalent fields for "title" or "publication".
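To illustrate the shape of the combined output, here is a minimal Python sketch of joining a metadata TSV and a sequences FASTA into NDJSON records. It is not how `augur curate passthru` is implemented. The function names, the `sequence` field name, the choice of `genbank_accession_rev` as the join column, and the sample record are all assumptions made for illustration.

```python
import csv
import io
import json


def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from a FASTA file handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited token of the header.
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def join_to_ndjson(metadata_tsv, fasta, id_column="genbank_accession_rev", seq_field="sequence"):
    """Return one JSON object per metadata row, with its sequence attached."""
    sequences = dict(read_fasta(fasta))
    reader = csv.DictReader(metadata_tsv, delimiter="\t")
    lines = []
    for row in reader:
        record = dict(row)
        # Rows without a matching FASTA entry get an empty sequence in this sketch.
        record[seq_field] = sequences.get(row[id_column], "")
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Made-up sample data, not real NCBI records.
metadata = io.StringIO("genbank_accession_rev\tstrain\nAB000001.1\texample-strain\n")
fasta = io.StringIO(">AB000001.1 example description\nACGT\nTTGA\n")
ndjson = join_to_ndjson(metadata, fasta)
print(ndjson)
```

Each output line is a standalone JSON object, so the whole dataset can be stored and streamed as a single file.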
1 parent b766498, commit 82ace30
Showing 2 changed files with 117 additions and 4 deletions.
@@ -0,0 +1,17 @@

key | value |
---|---|
Accession | genbank_accession_rev |
Source database | database |
Isolate Lineage | strain |
Geographic Region | region |
Geographic Location | location |
Isolate Collection date | collected |
Release date | submitted |
Update date | updated |
Length | length |
Host Name | host |
Isolate Lineage source | isolation_source |
BioProjects | bioproject_accession |
BioSample accession | biosample_accession |
SRA Accessions | sra_accession |
Submitter Names | authors |
Submitter Affiliation | submitting_organization |
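As a rough illustration of how a key/value map like this could be applied, the sketch below renames the header row of a metadata TSV and leaves the data rows untouched. This is an assumption about usage, not the commit's actual rename step. The helper names, the three-entry excerpt of the map, and the sample row are hypothetical.

```python
import csv
import io


def load_field_map(handle):
    """Read a two-column key/value TSV into a dict of old name -> new name."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["key"]: row["value"] for row in reader}


def rename_header(tsv_text, field_map):
    """Rename the columns in the first line of a TSV; data rows pass through unchanged."""
    header, sep, body = tsv_text.partition("\n")
    # Columns missing from the map keep their original name.
    renamed = [field_map.get(name, name) for name in header.split("\t")]
    return "\t".join(renamed) + sep + body


# Excerpt of the field map above, tab-separated as in the TSV file.
field_map = load_field_map(io.StringIO(
    "key\tvalue\n"
    "Accession\tgenbank_accession_rev\n"
    "Isolate Collection date\tcollected\n"
    "Host Name\thost\n"
))

# Made-up metadata row, not a real NCBI record.
metadata = "Accession\tIsolate Collection date\tHost Name\nAB000001.1\t2020-01-01\tHomo sapiens\n"
renamed_metadata = rename_header(metadata, field_map)
print(renamed_metadata)
```

Keeping the mapping in a separate TSV file means the column names can be adjusted without touching the code that applies them.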