Merge pull request #3 from nextstrain/make-divergence-phylogeny
Make divergence phylogeny
kimandrews authored Aug 12, 2024
2 parents 443b122 + 66db35a commit 363b120
Showing 13 changed files with 724 additions and 90 deletions.
9 changes: 3 additions & 6 deletions CHANGELOG.md
@@ -1,7 +1,4 @@
# CHANGELOG

We use this CHANGELOG to document breaking changes, new features, bug fixes,
and config value changes that may affect both the usage of the workflows and
the outputs of the workflows. See the [changelog for the ncov
repository](https://github.com/nextstrain/ncov/blob/HEAD/docs/src/reference/change_log.md)
for an example of formatting.
* 12 August 2024: Create a full genome phylogeny for rabies [PR#3](https://github.com/nextstrain/rabies/pull/3)
* 25 July 2024: Add CI GH Action workflow to test the ingest workflow [PR#6](https://github.com/nextstrain/rabies/pull/6)
* 15 July 2024: Make rabies-specific modifications to the ingest directory (which originated from the pathogen-repo-guide) [PR#2](https://github.com/nextstrain/rabies/pull/2)
24 changes: 23 additions & 1 deletion README.md
@@ -1,3 +1,25 @@
# Nextstrain repository for rabies virus
This repository contains two workflows for the analysis of rabies virus data:

This repo is under development.
- [`ingest/`](./ingest) - Download data from GenBank, clean and curate it and upload it to S3
- [`phylogenetic/`](./phylogenetic) - Filter sequences, align, construct phylogeny and export for visualization

Each folder contains a README.md with more information. The results of running both workflows are publicly visible at [nextstrain.org/rabies](https://nextstrain.org/rabies).

## Installation

Follow the [standard installation instructions](https://docs.nextstrain.org/en/latest/install.html) for Nextstrain's suite of software tools.

## Quickstart

Run the default phylogenetic workflow via:
```
cd phylogenetic/
nextstrain build .
nextstrain view .
```

## Documentation

- [Running a pathogen workflow](https://docs.nextstrain.org/en/latest/tutorials/running-a-workflow.html)
- [Contributor documentation](./CONTRIBUTING.md)
2 changes: 1 addition & 1 deletion phylogenetic/Snakefile
@@ -17,7 +17,7 @@ configfile: "defaults/config.yaml"
 rule all:
     input:
         # Fill in path(s) to the final exported Auspice JSON(s)
-        auspice_json="",
+        auspice_json="auspice/rabies.json",


# These rules are imported in the order that they are expected to run.
55 changes: 55 additions & 0 deletions phylogenetic/defaults/auspice_config.json
@@ -0,0 +1,55 @@
{
  "title": "Real-time tracking of rabies full genome virus evolution",
  "maintainers": [
    {"name": "Kim Andrews", "url": "https://bedford.io/team/kim-andrews/"},
    {"name": "the Nextstrain team", "url": "https://nextstrain.org/team"}
  ],
  "data_provenance": [
    {
      "name": "GenBank",
      "url": "https://www.ncbi.nlm.nih.gov/genbank/"
    }
  ],
  "build_url": "https://github.com/nextstrain/rabies",
  "colorings": [
    {
      "key": "gt",
      "title": "Genotype",
      "type": "categorical"
    },
    {
      "key": "region",
      "title": "Region",
      "type": "categorical"
    },
    {
      "key": "country",
      "title": "Country",
      "type": "categorical"
    },
    {
      "key": "host",
      "title": "Host",
      "type": "categorical"
    }
  ],
  "geo_resolutions": [
    "country",
    "region"
  ],
  "display_defaults": {
    "map_triplicate": true,
    "color_by": "region"
  },
  "filters": [
    "region",
    "country",
    "author"
  ],
  "metadata_columns": [
    "author",
    "strain",
    "division",
    "location"
  ]
}
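As an illustrative consistency check on the Auspice config above (not code from this repository), the Python sketch below confirms that coloring keys are unique and that the default `color_by` refers to one of the declared colorings. The function name `check_auspice_config` is hypothetical, and treating an undeclared `color_by` as an error is an assumption of this sketch rather than a documented Auspice rule.

```python
import json

# Trimmed copy of the relevant fields from auspice_config.json above.
AUSPICE_CONFIG = """
{
  "colorings": [
    {"key": "gt", "title": "Genotype", "type": "categorical"},
    {"key": "region", "title": "Region", "type": "categorical"},
    {"key": "country", "title": "Country", "type": "categorical"},
    {"key": "host", "title": "Host", "type": "categorical"}
  ],
  "geo_resolutions": ["country", "region"],
  "display_defaults": {"map_triplicate": true, "color_by": "region"}
}
"""


def check_auspice_config(config: dict) -> list:
    """Return a list of problems found (empty if none). Hypothetical helper."""
    problems = []
    keys = [c["key"] for c in config.get("colorings", [])]
    # Duplicate coloring keys would make the color-by menu ambiguous.
    duplicates = {k for k in keys if keys.count(k) > 1}
    if duplicates:
        problems.append(f"duplicate coloring keys: {sorted(duplicates)}")
    # Assumption: the default color-by should be one of the declared colorings.
    color_by = config.get("display_defaults", {}).get("color_by")
    if color_by is not None and color_by not in keys:
        problems.append(f"color_by {color_by!r} is not a declared coloring")
    return problems


if __name__ == "__main__":
    print(check_auspice_config(json.loads(AUSPICE_CONFIG)) or "no problems found")
```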
7 changes: 7 additions & 0 deletions phylogenetic/defaults/colors.tsv
@@ -0,0 +1,7 @@
# Regions
region Asia #447CCD
region Oceania #5EA9A1
region Africa #8ABB6A
region Europe #BEBB48
region South America #E29E39
region North America #E2562B
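Each non-comment line of `colors.tsv` pairs a trait (`region`) and one of its values with a hex color. Assuming the columns are tab-separated, as the `.tsv` extension implies (the tabs appear as spaces in this view), a minimal Python sketch of reading the file into a nested mapping might look like this; `parse_colors_tsv` is a hypothetical name, not a function from this repository or from augur.

```python
# Contents of defaults/colors.tsv, written with explicit tab separators.
COLORS_TSV = (
    "# Regions\n"
    "region\tAsia\t#447CCD\n"
    "region\tOceania\t#5EA9A1\n"
    "region\tAfrica\t#8ABB6A\n"
    "region\tEurope\t#BEBB48\n"
    "region\tSouth America\t#E29E39\n"
    "region\tNorth America\t#E2562B\n"
)


def parse_colors_tsv(text: str) -> dict:
    """Map each trait (e.g. "region") to {value: hex color}."""
    colors = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# Regions".
        if not line or line.startswith("#"):
            continue
        # Tab-splitting keeps multi-word values like "South America" intact;
        # a line without exactly three columns raises ValueError.
        trait, value, hex_color = line.split("\t")
        colors.setdefault(trait, {})[value] = hex_color
    return colors
```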
19 changes: 14 additions & 5 deletions phylogenetic/defaults/config.yaml
@@ -1,5 +1,14 @@
# This configuration file should contain all required configuration parameters
# for the phylogenetic workflow to run to completion.
#
# Define optional config parameters with their default values here so that users
# do not have to dig through the workflows to figure out the default values
strain_id_field: "accession"
files:
  exclude: "defaults/dropped_strains.txt"
  reference: "defaults/rabies_reference.gb"
  colors: "defaults/colors.tsv"
  auspice_config: "defaults/auspice_config.json"
  description: "defaults/description.md"
filter:
  group_by: "country year"
  sequences_per_group: 20
  min_date: 1950
  min_length: 5000
ancestral:
  inference: "joint"
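The `filter` block above configures grouped subsampling. Presumably these values are passed to `augur filter` as `--group-by`, `--sequences-per-group`, `--min-date` and `--min-length`, although the rule that consumes them is not shown in this diff. The plain-Python sketch below only illustrates the idea (drop early or short sequences, then keep at most 20 per country-year group); it is not augur's implementation, and `Record` and `filter_and_subsample` are hypothetical names.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical minimal record; real metadata has many more columns."""
    accession: str
    country: str
    year: int
    sequence: str


def filter_and_subsample(records, group_by=("country", "year"),
                         sequences_per_group=20, min_date=1950,
                         min_length=5000, seed=0):
    """Apply date/length thresholds, then cap each group at sequences_per_group."""
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    groups = defaultdict(list)
    for rec in records:
        # Drop records sampled before min_date (compared by year in this sketch).
        if rec.year < min_date:
            continue
        # Assumption: only unambiguous A/C/G/T bases count toward the length.
        n_bases = sum(rec.sequence.upper().count(base) for base in "ACGT")
        if n_bases < min_length:
            continue
        key = tuple(getattr(rec, field) for field in group_by)
        groups[key].append(rec)

    kept = []
    for key in sorted(groups):
        members = groups[key]
        if len(members) <= sequences_per_group:
            kept.extend(members)  # small group: keep everything
        else:
            kept.extend(rng.sample(members, sequences_per_group))  # random subset
    return kept
```

Random sampling within each group, rather than taking the first N records, avoids biasing the tree toward whichever sequences happen to appear first in the input.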
17 changes: 17 additions & 0 deletions phylogenetic/defaults/description.md
@@ -0,0 +1,17 @@
We gratefully acknowledge the authors, originating and submitting laboratories of the genetic sequences and metadata for sharing their work. Please note that although data generators have generously shared data in an open fashion, that does not mean there should be free license to publish on this data. Data generators should be cited where possible and collaborations should be sought in some circumstances. Please try to avoid scooping someone else's work. Reach out if uncertain.


#### Analysis
Our bioinformatic processing workflow can be found at [github.com/nextstrain/rabies](https://github.com/nextstrain/rabies) and includes:
- sequence alignment by [augur align](https://docs.nextstrain.org/projects/augur/en/stable/usage/cli/align.html)
- phylogenetic reconstruction using [IQTREE-2](http://www.iqtree.org/)
- ancestral state reconstruction and temporal inference using [TreeTime](https://github.com/neherlab/treetime)

#### Underlying data
We curate sequence data and metadata from NCBI as the starting point for our analyses. Curated sequences and metadata are available as flat files at:
- [data.nextstrain.org/files/workflows/rabies/sequences.fasta.zst](https://data.nextstrain.org/files/workflows/rabies/sequences.fasta.zst)
- [data.nextstrain.org/files/workflows/rabies/metadata.tsv.zst](https://data.nextstrain.org/files/workflows/rabies/metadata.tsv.zst)

---

Screenshots may be used under a [CC-BY-4.0 license](https://creativecommons.org/licenses/by/4.0/) and attribution to nextstrain.org must be provided.
6 changes: 6 additions & 0 deletions phylogenetic/defaults/dropped_strains.txt
@@ -0,0 +1,6 @@
# Incorrect virus species assignment:
MF197744 # Lyssavirus bokeloh
MF197745 # Lyssavirus bokeloh
#
# Large number of mutations:
MK920923 # Sample from kinkajou host, Rocha et al. 2020: https://www.tandfonline.com/doi/full/10.1080/22221751.2020.1759380
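Because `strain_id_field` is set to `"accession"`, the entries in `dropped_strains.txt` are GenBank accessions, each optionally followed by a `#` comment explaining why it is excluded. A minimal Python sketch of reading such a file, assuming everything after `#` on a line is a comment, is shown below; `parse_exclude_list` is a hypothetical helper, not augur's own reader.

```python
# Contents of defaults/dropped_strains.txt from this commit.
DROPPED_STRAINS = """\
# Incorrect virus species assignment:
MF197744 # Lyssavirus bokeloh
MF197745 # Lyssavirus bokeloh
#
# Large number of mutations:
MK920923 # Sample from kinkajou host, Rocha et al. 2020: https://www.tandfonline.com/doi/full/10.1080/22221751.2020.1759380
"""


def parse_exclude_list(text: str) -> set:
    """Return the set of IDs listed, ignoring '#' comments and blank lines."""
    ids = set()
    for line in text.splitlines():
        # Everything after the first '#' is a comment, including trailing notes.
        entry = line.split("#", 1)[0].strip()
        if entry:
            ids.add(entry)
    return ids
```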