Fitness dynamics of SARS-CoV-2 and seasonal influenza H3N2

blab/fitness-flux

Latest commit: aa34f59 · Nov 12, 2024

Comparing fitness dynamics across SARS-CoV-2, influenza H3 and influenza H1

Provision metadata locally

mkdir data
cd data

For SARS-CoV-2

aws s3 cp s3://nextstrain-ncov-private/metadata.tsv.zst sarscov2_metadata.tsv.zst
zstd -c -d sarscov2_metadata.tsv.zst \
   | tsv-select -H -f strain,date,country,clade_nextstrain,Nextclade_pango,QC_overall_status \
   | zstd -c > sarscov2_subset_metadata.tsv.zst

Then move sarscov2_subset_metadata.tsv.zst to fitness-dynamics/data/.

For H3N2, clone https://github.com/blab/flu-geo-fitness and, from within that repository, run

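nextstrain build . data/h3n2/metadata_with_nextclade.tsv
cd data/h3n2/
# Rename the seqName column to strain
sed -i -e 's/\tseqName\t/\tstrain\t/' metadata_with_nextclade.tsv
# Normalize the country label Usa to USA
sed -i -e 's/\tUsa\t/\tUSA\t/g' metadata_with_nextclade.tsv
# Append an inclusion column whose value is global for every data row
sed -i -e '1s/$/\tinclusion/; 2,$s/$/\tglobal/' metadata_with_nextclade.tsv
# Keep only the listed columns
tsv-select -H -f strain,date,region,country,inclusion,subclade,qc.overallStatus metadata_with_nextclade.tsv > metadata_selected.tsv
# Drop rows with an unassigned subclade and keep dates that sort after 2000-01-01
tsv-filter -H --str-ne subclade:unassigned --str-gt date:2000-01-01 metadata_selected.tsv > metadata_filtered.tsv
# Compress the filtered metadata
zstd -c metadata_filtered.tsv > h3n2_subset_metadata.tsv.zst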

Then move h3n2_subset_metadata.tsv.zst to fitness-dynamics/data/.

Workflow

Once metadata is provisioned locally, run the entire workflow with

nextstrain build . all_mlr_estimates

Sequence counts

Data for the project consists of daily sequence counts of clades of SARS-CoV-2, influenza H3 and influenza H1. Sequence counts are provisioned to the sequence-counts/ directory. From the top-level directory, run

nextstrain build . all_sequence_counts

to produce the sequence count files, e.g.

sequence-counts/sarscov2_clades_2020/prepared_seq_counts.tsv

Currently, clade counts are provisioned for just the USA.
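
As a rough illustration of what "daily sequence counts of clades" means, the sketch below tallies rows of a metadata table by date and clade for a single country. It is not the workflow's own code. The input columns (date, country, clade_nextstrain) come from the tsv-select command above; the function name, the output column names (location, variant, sequences), the choice to skip incomplete dates, and the toy rows are all assumptions made for the example.

```python
import csv
import io
from collections import Counter


def daily_clade_counts(metadata_tsv, country="USA", clade_column="clade_nextstrain"):
    """Tally sequences per (date, clade) for one country.

    Hypothetical helper: the real workflow may filter and name columns differently.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t"):
        if row["country"] != country:
            continue
        date = row["date"]
        # Assumption: skip incomplete dates such as "2021-03-XX" or "2021".
        if len(date) != 10 or "X" in date:
            continue
        counts[(date, row[clade_column])] += 1
    # One output row per (date, clade), sorted by date and then clade.
    return [
        {"date": d, "location": country, "variant": c, "sequences": n}
        for (d, c), n in sorted(counts.items())
    ]


# Toy metadata (made-up strains) with one ambiguous date and one non-USA row.
toy_metadata = (
    "strain\tdate\tcountry\tclade_nextstrain\n"
    "s1\t2021-03-01\tUSA\t20I\n"
    "s2\t2021-03-01\tUSA\t20I\n"
    "s3\t2021-03-01\tUSA\t21A\n"
    "s4\t2021-03-XX\tUSA\t21A\n"
    "s5\t2021-03-01\tCanada\t20I\n"
)
rows = daily_clade_counts(toy_metadata)
```

Here rows holds two entries for 2021-03-01: two sequences of 20I and one of 21A. The ambiguous-date row and the Canada row are dropped.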

MLR estimates

Multinomial logistic regression (MLR) models are fit using the evofr package. Run the models with

nextstrain build . all_mlr_estimates

to produce the MLR output JSON files, e.g.

mlr-estimates/sarscov2_clades_2020/mlr_results.json
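
For readers unfamiliar with MLR, the sketch below shows the general form of the model: each variant gets an intercept and a growth-rate coefficient, and variant frequencies at time t are the softmax of intercept plus rate times t. It does not call evofr, and evofr's actual parameterization may differ (for example, in how time is scaled). The function and argument names are hypothetical.

```python
import math


def mlr_frequencies(intercepts, growth_rates, t):
    """Variant frequencies at time t under a multinomial logistic model.

    Hypothetical helper for illustration; not part of evofr.
    """
    logits = {v: intercepts[v] + growth_rates[v] * t for v in intercepts}
    # Subtract the maximum logit before exponentiating for numerical stability.
    top = max(logits.values())
    weights = {v: math.exp(x - top) for v, x in logits.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


intercepts = {"A": 0.0, "B": 0.0}
# Variant A as the pivot (rate fixed at 0), B growing faster by 0.1 per day.
pivot_a = mlr_frequencies(intercepts, {"A": 0.0, "B": 0.1}, t=10)
# Variant B as the pivot instead: same difference in rates, same frequencies.
pivot_b = mlr_frequencies(intercepts, {"A": -0.1, "B": 0.0}, t=10)
```

Only differences between growth rates affect the frequencies, so pivot_a and pivot_b agree. This is why one variant's rate can be fixed at zero as a pivot, and why fitnesses estimated with different pivots are not directly comparable.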

Scaffolded fitnesses

Fitnesses within each timepoint are measured relative to an arbitrary pivot variant. The Mathematica notebook fitness-flux-analysis/fitness-flux.nb combines the mlr_results.json files across timepoints into a single scaffolded-fitness/sarscov2_clades_scaffolded_fitness.tsv. The notebook needs to be run separately for each combination: virus sarscov2 with classification clades, virus sarscov2 with classification lineages, and virus h3n2 with classification clades.
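
One way to put pivot-relative fitnesses from different timepoints on a common scale is to chain adjacent timepoints through the variants they share. The sketch below illustrates that idea only; it is not a port of the Mathematica notebook, whose method may differ. It assumes fitnesses are additive (for example, log growth advantages) and that adjacent timepoints share at least one variant. The function name and toy values are hypothetical.

```python
def scaffold(windows):
    """Shift each timepoint's fitnesses onto the scale of the first timepoint.

    `windows` is a list of {variant: fitness} dicts, one per timepoint, each
    measured relative to its own pivot. Hypothetical helper for illustration.
    """
    scaffolded = [dict(windows[0])]
    for current in windows[1:]:
        previous = scaffolded[-1]
        shared = sorted(set(previous) & set(current))
        if not shared:
            raise ValueError("adjacent timepoints share no variant")
        # Average offset that aligns the shared variants with the previous scale.
        shift = sum(previous[v] - current[v] for v in shared) / len(shared)
        scaffolded.append({v: f + shift for v, f in current.items()})
    return scaffolded


# Toy values: timepoint 1 uses A as pivot, timepoint 2 uses B as pivot.
timepoints = [
    {"A": 0.0, "B": 0.5},
    {"B": 0.0, "C": 0.25},
]
combined = scaffold(timepoints)
```

In this toy case the second timepoint is shifted by 0.5, so B stays at 0.5 and C lands at 0.75, both now expressed relative to A.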
