Avian flu pipeline notes

VDB

In EPIFLU, select H9N2, H7N9, or H5 sequences, select HA as the required segment, and set Submission Date >= the date of the last upload to vdb
Download at most 5000 isolates at a time; you may have to split downloads by submission date
Download Isolates as XLS with YYYY-MM-DD date format
Download Isolates as "Sequences (DNA) as FASTA"
- Select all DNA, except HE and PE
- Fasta Header as 0: DNA Accession no., 1: Isolate name, 2: Isolate ID, 3: Segment, 4: Passage details/history, 5: Submitting lab
- DNA Accession no. | Isolate name | Isolate ID | Segment | Passage details/history | DNA INSDC
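
For reference, a minimal sketch of how a header in this pipe-delimited layout can be split into named fields. This is not the parser used by vdb. The key names and the example header are made up for illustration, and the last field is kept generic because it depends on what was selected in EPIFLU.

```python
# Illustrative key names in the 0..5 order of the header setting above.
# These names are assumptions, not necessarily the ones vdb uses.
FIELDS = ["accession", "strain", "isolate_id", "locus", "passage", "field_5"]


def parse_header(header):
    """Split a '>a | b | c | d | e | f' FASTA header into a dict."""
    parts = [part.strip() for part in header.lstrip(">").split("|")]
    if len(parts) != len(FIELDS):
        raise ValueError(
            "expected %d fields, got %d: %r" % (len(FIELDS), len(parts), header)
        )
    return dict(zip(FIELDS, parts))


# Hypothetical header, not a real GISAID record.
EXAMPLE = ">EPI000000 | A/chicken/Example/1/2016 | EPI_ISL_000000 | HA | E1 | Example Lab"

record = parse_header(EXAMPLE)
for key in FIELDS:
    print(key, "=", record[key])
```

A header with the wrong number of fields raises an error instead of shifting values into the wrong columns, which is usually the first sign that the header settings were not applied to a download.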

python3 vdb/avian_flu_upload.py -db vdb -v avian_flu --data_source gisaid --source gisaid --fname gisaid_epiflu
Recommend running with --preview first to confirm that strain names and locations are parsed correctly before uploading
- Add entries to the geo_synonyms and flu_fix_location_label files to fix some of the formatting.

In IRD, search for Sequences and Strains
Select Data Type as Strain
Enter either "H5N1" or "H7N9" under Subtype
Click Search
Click download all ...
Download "Segment FASTA" as GenomicFastaResults.fasta. Select "Custom format", select all and add.

python3 vdb/avian_flu_upload.py -db vdb -v avian_flu --data_source ird --source ird --fname GenomicFastaResults.fasta
Recommend running with --preview first to confirm that strain names and locations are parsed correctly before uploading
- Add entries to the geo_synonyms and flu_fix_location_label files to fix some of the formatting.
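
The preview-then-upload routine can be sketched as a small helper that builds the same command line with or without --preview. It only assembles and prints the command; the function name is hypothetical, and the flags are the ones already shown in these notes.

```python
import shlex


def upload_command(source, fname, preview=True):
    """Build the avian_flu_upload.py command for one data source."""
    cmd = [
        "python3", "vdb/avian_flu_upload.py",
        "-db", "vdb",
        "-v", "avian_flu",
        "--data_source", source,
        "--source", source,
        "--fname", fname,
    ]
    if preview:
        # Preview first, then rerun without this flag once the parsing looks right.
        cmd.append("--preview")
    return cmd


preview_cmd = upload_command("ird", "GenomicFastaResults.fasta", preview=True)
upload_cmd = upload_command("ird", "GenomicFastaResults.fasta", preview=False)

print(shlex.join(preview_cmd))
print(shlex.join(upload_cmd))
```

Pass either list to subprocess.run from the repository root to actually execute it.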

To download H7N9 HA sequences from vdb:

python3 vdb/avian_flu_download.py -db vdb -v avian_flu --select locus:HA subtype:h7n9 --fstem h7n9_ha
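
The --select arguments are field:value pairs. As a rough illustration of that format only (how avian_flu_download.py actually parses them is not shown in these notes, and parse_select is a hypothetical name), the pairs can be read into a dictionary like this:

```python
def parse_select(pairs):
    """Turn ['locus:HA', 'subtype:h7n9'] into {'locus': 'HA', 'subtype': 'h7n9'}."""
    selection = {}
    for pair in pairs:
        # Split on the first colon only, so a value may itself contain colons.
        key, sep, value = pair.partition(":")
        if not sep or not key or not value:
            raise ValueError("expected field:value, got %r" % pair)
        selection[key] = value
    return selection


selection = parse_select(["locus:HA", "subtype:h7n9"])
print(selection)
```

A pair without a colon, or with an empty field or value, is rejected rather than silently ignored.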