- Download sequences and meta information from GISAID
- In EPIFLU, select for H9N2, H7N9, or H5 sequences, select HA as required segment, select Submission Date >= last upload date to vdb
- Download at most 5000 isolates at a time; you may have to split downloads by submission date
- Download Isolates as XLS with YYYY-MM-DD date format
- Download Isolates as "Sequences (DNA) as FASTA"
- Select all DNA, except HE and PE
- Fasta Header as 0: DNA Accession no., 1: Isolate name, 2: Isolate ID, 3: Segment, 4: Passage details/history, 5: Submitting lab
DNA Accession no. | Isolate name | Isolate ID | Segment | Passage details/history | DNA INSDC
- Move files to fauna/data as gisaid_epiflu.xls and gisaid_epiflu.fasta.
- Upload to vdb database
python3 vdb/avian_flu_upload.py -db vdb -v avian_flu --data_source gisaid --source gisaid --fname gisaid_epiflu
- Recommend running with --preview to confirm strain names and locations are correctly parsed before uploading
- Can add to geo_synonyms file and flu_fix_location_label file to fix some of the formatting.
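Before uploading, it can help to confirm that every header in the downloaded FASTA has the six pipe-separated fields configured above. The following is a minimal sketch, not part of fauna: `parse_header` and `check_fasta_headers` are hypothetical helper names, and the sketch assumes fields are separated by `|` exactly as in the header template. The field names follow the numbered list (positions 0 to 5).

```python
# Field names follow the numbered Fasta Header list above (positions 0-5).
# Assumption: fields are separated by "|" as in the header template.
EXPECTED_FIELDS = [
    "DNA Accession no.",
    "Isolate name",
    "Isolate ID",
    "Segment",
    "Passage details/history",
    "Submitting lab",
]


def parse_header(line):
    """Split one FASTA header line on '|' and map it to the expected fields."""
    parts = [part.strip() for part in line.lstrip(">").split("|")]
    if len(parts) != len(EXPECTED_FIELDS):
        raise ValueError(
            f"expected {len(EXPECTED_FIELDS)} fields, got {len(parts)}: {line!r}"
        )
    return dict(zip(EXPECTED_FIELDS, parts))


def check_fasta_headers(lines):
    """Return (line_number, message) for every malformed header line."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            try:
                parse_header(line.rstrip("\n"))
            except ValueError as err:
                problems.append((number, str(err)))
    return problems
```

Passing the open file handle for fauna/data/gisaid_epiflu.fasta to `check_fasta_headers` returns an empty list when every header matches the expected layout.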
- Download sequences from IRD
- Search for Sequences and strains
- Select Data Type as Strain
- Enter either "H5N1" or "H7N9" under Subtype
- Click Search
- Click download all ...
- Download "Segment FASTA" as GenomicFastaResults.fasta. Select "Custom format", select all and add.
- Move file to fauna/data as GenomicFastaResults.fasta.
- Upload to vdb database
python3 vdb/avian_flu_upload.py -db vdb -v avian_flu --data_source ird --source ird --fname GenomicFastaResults.fasta
- Recommend running with --preview to confirm strain names and locations are correctly parsed before uploading
- Can add to geo_synonyms file and flu_fix_location_label file to fix some of the formatting.
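The preview-then-upload routine for both sources can be sketched as a small command builder. This is only an illustration: `upload_command` is a hypothetical helper, and it assumes `--preview` can simply be appended to the upload commands shown above. It prints the commands rather than running them.

```python
import shlex


def upload_command(source, fname, preview=True):
    """Build the avian_flu_upload.py argument list for one data source."""
    cmd = [
        "python3", "vdb/avian_flu_upload.py",
        "-db", "vdb",
        "-v", "avian_flu",
        "--data_source", source,
        "--source", source,
        "--fname", fname,
    ]
    if preview:
        # Assumption: --preview is a plain flag that can be appended last.
        cmd.append("--preview")
    return cmd


# Source/file-name pairs taken from the two upload commands above.
SOURCES = [("gisaid", "gisaid_epiflu"), ("ird", "GenomicFastaResults.fasta")]

for source, fname in SOURCES:
    # Print the preview command first, then the real upload command.
    print(shlex.join(upload_command(source, fname, preview=True)))
    print(shlex.join(upload_command(source, fname, preview=False)))
```

Each printed line can be pasted into a shell from the fauna directory. Run the real upload only after the preview output looks correct.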
python3 vdb/avian_flu_download.py -db vdb -v avian_flu --select locus:HA subtype:h7n9 --fstem h7n9_ha