MEASLES.md

File metadata and controls

70 lines (52 loc) · 2.01 KB

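Download data from GenBank

  • GenBank search URL

  • Search term: measles[title] AND viruses[filter] AND ("5000"[SLEN] : "20000"[SLEN]), i.e. records with "measles" in the title, restricted to viral sequences, with a sequence length of 5,000 to 20,000 nt

  • Send to > Complete Record > File, with Format set to Accession List

  • This downloads the file sequence.seq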

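  • Remove the version suffixes (.1, .2, etc.) from the accession numbers in sequence.seq:

    # Escape the dot and match one or more digits, so .3, .10, etc. are removed too.
    # `-i ''` is BSD/macOS sed syntax; with GNU sed, use `sed -i -E ...` instead.
    sed -i '' -E 's/\.[0-9]+$//' sequence.seq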
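To see what the substitution does before editing sequence.seq in place, a minimal sketch can pipe a few accessions through it without `-i`. The accessions and the `strip_versions` name below are illustrative only, not taken from the real download. Escaping the dot matters: in a pattern like `s/.1$//` the `.` matches any character, so an unversioned accession ending in 1 would lose its last two characters, and versions such as .3 would be left in place.

```shell
# Hypothetical helper: strip a trailing version suffix (".1", ".12", ...) from
# each accession read on stdin. "\." matches only a literal dot, and [0-9]+
# covers multi-digit versions. No -i, so nothing is modified on disk.
strip_versions() {
  sed -E 's/\.[0-9]+$//'
}

# Dry run on illustrative accessions:
printf 'AB123456.1\nXY654321.12\nCD111111\n' | strip_versions
# AB123456
# XY654321
# CD111111
```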
    

Upload to fauna

python3 vdb/measles_upload.py \
  -db vdb \
  -v measles \
  --ftype accession \
  --source genbank \
  --locus genome \
  --fname sequence.seq
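Before uploading, a quick check can flag lines in sequence.seq that do not look like bare accessions: blank lines, stray whitespace (including Windows line endings), or leftover version suffixes. This is a minimal sketch; `check_accessions` is a hypothetical helper, and these three rules are assumptions about what a clean accession list looks like, not checks performed by vdb/measles_upload.py.

```shell
# Hypothetical pre-upload check: report and fail on any line that is empty,
# contains whitespace (e.g. a trailing \r), or still ends in ".<digits>".
check_accessions() {
  # "|| true" keeps a no-match grep (exit 1) from aborting under `set -e`.
  bad="$(grep -nE '(^$|[[:space:]]|\.[0-9]+$)' "$1" || true)"
  if [ -n "$bad" ]; then
    printf 'Problem lines in %s:\n%s\n' "$1" "$bad" >&2
    return 1
  fi
  echo "$1: no problem lines found"
}

# Usage, before running vdb/measles_upload.py:
# check_accessions sequence.seq
```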

Download from fauna

python3 vdb/download.py \
  --database vdb \
  --virus measles \
  --fasta_fields strain virus accession collection_date region country division location source locus authors url title journal puburl \
  --resolve_method choose_genbank \
  --fstem measles

This results in the file data/measles.fasta, whose FASTA headers contain the fields listed in --fasta_fields, in that order.
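augur parse assigns the names given to --fields by position, so the fauna field list and the parse field list need to line up one-to-one (collection_date becomes date, location becomes city, source becomes db, locus becomes segment, puburl becomes paper_url). A minimal sketch for eyeballing that pairing on the first record, assuming the headers are `|`-delimited (augur parse's default separator) and that no value itself contains `|`; `FAUNA_FIELDS`, `AUGUR_FIELDS` and `show_header_fields` are hypothetical names.

```shell
# Field lists copied from the download and parse commands, in the same order.
FAUNA_FIELDS="strain virus accession collection_date region country division location source locus authors url title journal puburl"
AUGUR_FIELDS="strain virus accession date region country division city db segment authors url title journal paper_url"

# Hypothetical helper: print "fauna_name <TAB> augur_name <TAB> value" for each
# "|"-separated value in the first FASTA header of the given file.
show_header_fields() {
  grep -m 1 '^>' "$1" | awk -F'|' -v fauna="$FAUNA_FIELDS" -v augur="$AUGUR_FIELDS" '
    {
      sub(/^>/, "")               # drop the leading ">"; fields are re-split
      n = split(fauna, f, " ")    # fauna names, in --fasta_fields order
      split(augur, a, " ")        # augur names, in --fields order
      for (i = 1; i <= n; i++)
        printf "%s\t%s\t%s\n", f[i], a[i], $i
    }'
}

# Usage:
# show_header_fields data/measles.fasta
```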

Parse

augur parse \
  --sequences data/measles.fasta \
  --output-sequences data/sequences.fasta \
  --output-metadata data/metadata.tsv \
  --fields strain virus accession date region country division city db segment authors url title journal paper_url \
  --prettify-fields region country division city

This results in the files data/sequences.fasta and data/metadata.tsv.
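After parsing, each FASTA record should have exactly one metadata row. A minimal sketch of that check, assuming metadata.tsv has a single header line followed by one line per record; `check_counts` is a hypothetical helper name.

```shell
# Hypothetical consistency check: compare the number of FASTA records with the
# number of metadata rows (total lines minus the header line).
check_counts() {
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" tolerates that.
  n_seq="$(grep -c '^>' "$1" || true)"
  n_meta="$(($(wc -l < "$2") - 1))"
  echo "sequences: $n_seq, metadata rows: $n_meta"
  [ "$n_seq" -eq "$n_meta" ]
}

# Usage:
# check_counts data/sequences.fasta data/metadata.tsv
```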

Compress

zstd -T0 data/sequences.fasta
zstd -T0 data/metadata.tsv

This results in the files data/sequences.fasta.zst and data/metadata.tsv.zst.
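zstd keeps its input files by default, so the uncompressed originals are still on disk after this step and can be compared byte-for-byte with the decompressed output. A sketch of that check, assuming zstd and cmp are on PATH; `verify_zst` is a hypothetical helper name. When re-running the step, zstd will not silently replace an existing .zst file; `-f` forces an overwrite.

```shell
# Hypothetical helper: for each original file, test the integrity of
# "<file>.zst" and confirm it decompresses to exactly the original bytes.
verify_zst() {
  for f in "$@"; do
    zstd -q -t "$f.zst" || return 1                # frame/checksum integrity
    zstd -q -dc "$f.zst" | cmp -s - "$f" || return 1  # byte-for-byte match
  done
}

# Usage:
# verify_zst data/sequences.fasta data/metadata.tsv && echo "compressed files OK"
```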

Push to S3

nextstrain remote upload s3://nextstrain-data/files/measles/ data/sequences.fasta.zst data/metadata.tsv.zst

This uploads the files to S3, making them available at https://data.nextstrain.org/files/measles/sequences.fasta.zst and https://data.nextstrain.org/files/measles/metadata.tsv.zst.
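To confirm the upload, a sketch can send an HTTP HEAD request to each public URL. It assumes curl is installed and that the files are served under https://data.nextstrain.org/files/measles/ once the upload finishes; `public_url` and `check_published` are hypothetical helpers, and `-L` follows any redirect the server may issue.

```shell
# Base URL for the published measles files.
BASE_URL="https://data.nextstrain.org/files/measles"

# Hypothetical helper: build the public URL for a file name.
public_url() {
  printf '%s/%s\n' "$BASE_URL" "$1"
}

# Hypothetical helper: HEAD-request each file; -f fails on HTTP errors,
# -sS hides progress but shows errors, -I sends HEAD, -L follows redirects.
check_published() {
  for name in "$@"; do
    if curl -fsSIL "$(public_url "$name")" >/dev/null; then
      echo "ok: $name"
    else
      echo "missing: $name" >&2
      return 1
    fi
  done
}

# Usage:
# check_published sequences.fasta.zst metadata.tsv.zst
```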

Run measles workflow

See instructions at https://github.com/nextstrain/measles.