Add pre commit #10

Merged: 8 commits, May 29, 2024
15 changes: 15 additions & 0 deletions .github/workflows/pre-commit.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
name: pre-commit

on:
  - push
  - pull_request

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: pre-commit/action@v3.0.1
41 changes: 41 additions & 0 deletions .pre-commit-config.yaml
genehack marked this conversation as resolved.
@@ -0,0 +1,41 @@
default_language_version:
  python: python3
exclude: '\.(tsv|fasta|gb)$|^ingest/vendored/'
repos:
  - repo: https://github.com/pre-commit/sync-pre-commit-deps
    rev: v0.0.1
    hooks:
      - id: sync-pre-commit-deps
  - repo: https://github.com/shellcheck-py/shellcheck-py
    rev: v0.10.0.1
    hooks:
      - id: shellcheck
  - repo: https://github.com/rhysd/actionlint
    rev: v1.6.27
    hooks:
      - id: actionlint
        entry: env SHELLCHECK_OPTS='--exclude=SC2027' actionlint
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: check-ast
      - id: check-case-conflict
      - id: check-docstring-first
      - id: check-json
      - id: check-executables-have-shebangs
      - id: check-merge-conflict
      - id: check-shebang-scripts-are-executable
      - id: check-symlinks
      - id: check-toml
      - id: check-yaml
      - id: destroyed-symlinks
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: fix-byte-order-marker
  - repo: https://github.com/astral-sh/ruff-pre-commit
    # Ruff version.
    rev: v0.4.6
    hooks:
      # Run the linter.
      - id: ruff
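Note on the `exclude` key: pre-commit treats it as a Python regular expression searched against each repo-relative file path. The sketch below is only an illustration of which paths the pattern above would skip; the helper name `is_excluded` and the sample paths are assumptions chosen for the example, and this is not pre-commit's own code.

```python
import re

# Pattern copied verbatim from the `exclude` key in .pre-commit-config.yaml.
EXCLUDE = re.compile(r'\.(tsv|fasta|gb)$|^ingest/vendored/')


def is_excluded(path: str) -> bool:
    """Return True if a repo-relative path matches the exclude pattern.

    Hypothetical helper for illustration; it uses re.search, so the
    pattern may match anywhere in the path unless anchored with ^ or $.
    """
    return EXCLUDE.search(path) is not None


if __name__ == "__main__":
    # Sample paths taken from the files touched in this diff.
    for path in [
        "ingest/config/nl63/annotations.tsv",
        "phylogenetic/config/229e/reference.gb",
        "ingest/vendored/transform-strain-names",
        "ingest/rules/curate.smk",
        "README.md",
    ]:
        print(f"{path}: {'excluded' if is_excluded(path) else 'checked'}")
```

The first alternative is anchored at the end of the path (file extension), the second at the start (directory prefix), so a file such as `phylogenetic/config/229e/dropped_strains.txt` is still checked.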
13 changes: 13 additions & 0 deletions README.md
@@ -5,3 +5,16 @@ Phylogenies of human seasonal coronaviruses.
Built starting from the [Nextstrain pathogen repository template][].

[Nextstrain pathogen repository template]: https://github.com/nextstrain/pathogen-repo-guide/

## Working on this repo

This repo is configured to use [pre-commit](https://pre-commit.com).
If you will be writing new code or otherwise working within this repo,
please do the following to get started:

1. install `pre-commit` by running either `python -m pip install
   pre-commit` or `brew install pre-commit`, depending on your
   preferred package management solution
2. install the local git hooks by running `pre-commit install` from
   the root of the repo
3. get to coding!
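A quick way to confirm that step 2 took effect is to look for the hook file that `pre-commit install` writes. The following is a minimal sketch, assuming the default hooks directory (`.git/hooks`, with no `core.hooksPath` override and a regular clone rather than a linked worktree); the function name is hypothetical and not part of pre-commit.

```python
from pathlib import Path


def pre_commit_hook_installed(repo_root: str = ".") -> bool:
    """Return True if a pre-commit hook file exists under .git/hooks.

    Assumption: the repository uses git's default hooks location.
    This only checks that the file exists, not who wrote it.
    """
    return (Path(repo_root) / ".git" / "hooks" / "pre-commit").is_file()


if __name__ == "__main__":
    if pre_commit_hook_installed():
        print("pre-commit hook found")
    else:
        print("no hook found; run `pre-commit install` from the repo root")
```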
12 changes: 6 additions & 6 deletions ingest/config/hku1/annotations.tsv
@@ -1,6 +1,6 @@
# Manually curated annotations TSV file
# The TSV should not have a header and should have exactly three columns:
"# id to match existing metadata, field name, and field value"
"# If there are multiple annotations for the same id and field, then the last value is used"
# Lines starting with '#' are treated as comments
# Any '#' after the field value are treated as comments.
# Manually curated annotations TSV file
# The TSV should not have a header and should have exactly three columns:
# "id to match existing metadata, field name, and field value"
# "If there are multiple annotations for the same id and field, then the last value is used"
# Lines starting with '#' are treated as comments
# Any '#' after the field value are treated as comments.
110 changes: 55 additions & 55 deletions ingest/config/nl63/annotations.tsv
@@ -1,55 +1,55 @@
# Manually curated annotations TSV file
# The TSV should not have a header and should have exactly three columns:
"# id to match existing metadata, field name, and field value"
"# If there are multiple annotations for the same id and field, then the last value is used"
# Lines starting with '#' are treated as comments
# Any '#' after the field value are treated as comments.
KF530114.1 country USA
KF530113.1 country USA
KF530112.1 country USA
KF530111.1 country USA
KF530110.1 country USA
KF530109.1 country USA
KF530108.1 country USA
KF530107.1 country USA
KF530106.1 country USA
KF530105.1 country USA
KF530104.1 country USA
JQ900260.1 country USA
JQ900259.1 country USA
JQ900258.1 country USA
JQ900257.1 country USA
JQ900256.1 country USA
JQ900255.1 country USA
JQ771060.1 country USA
JQ771059.1 country USA
JQ771058.1 country USA
JQ771057.1 country USA
JQ771056.1 country USA
JQ771055.1 country USA
JQ765575.1 country USA
JQ765574.1 country USA
JQ765573.1 country USA
JQ765572.1 country USA
JQ765571.1 country USA
JQ765570.1 country USA
JQ765569.1 country USA
JQ765568.1 country USA
JQ765567.1 country USA
JQ765566.1 country USA
JQ765565.1 country USA
JQ765564.1 country USA
JQ765563.1 country USA
LC756668.1 country Japan
LC720428.1 country Japan
LC687402.1 country Japan
LC687401.1 country Japan
LC687397.1 country Japan
LC687396.1 country Japan
LC687395.1 country Japan
LC687394.1 country Japan
LC654455.1 country Japan
LC488390.2 country Japan
LC488389.2 country Japan
JX104161.1 country China
OR266947.1 country Russia
# Manually curated annotations TSV file
# The TSV should not have a header and should have exactly three columns:
# "id to match existing metadata, field name, and field value"
# "If there are multiple annotations for the same id and field, then the last value is used"
# Lines starting with '#' are treated as comments
# Any '#' after the field value are treated as comments.
KF530114.1 country USA
KF530113.1 country USA
KF530112.1 country USA
KF530111.1 country USA
KF530110.1 country USA
KF530109.1 country USA
KF530108.1 country USA
KF530107.1 country USA
KF530106.1 country USA
KF530105.1 country USA
KF530104.1 country USA
JQ900260.1 country USA
JQ900259.1 country USA
JQ900258.1 country USA
JQ900257.1 country USA
JQ900256.1 country USA
JQ900255.1 country USA
JQ771060.1 country USA
JQ771059.1 country USA
JQ771058.1 country USA
JQ771057.1 country USA
JQ771056.1 country USA
JQ771055.1 country USA
JQ765575.1 country USA
JQ765574.1 country USA
JQ765573.1 country USA
JQ765572.1 country USA
JQ765571.1 country USA
JQ765570.1 country USA
JQ765569.1 country USA
JQ765568.1 country USA
JQ765567.1 country USA
JQ765566.1 country USA
JQ765565.1 country USA
JQ765564.1 country USA
JQ765563.1 country USA
LC756668.1 country Japan
LC720428.1 country Japan
LC687402.1 country Japan
LC687401.1 country Japan
LC687397.1 country Japan
LC687396.1 country Japan
LC687395.1 country Japan
LC687394.1 country Japan
LC654455.1 country Japan
LC488390.2 country Japan
LC488389.2 country Japan
JX104161.1 country China
OR266947.1 country Russia
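The rules stated in the header comments of these annotation files (no header row, exactly three tab-separated columns, `#` comments, last value wins) can be sketched as a small parser. This is an illustration under stated assumptions, not the code the ingest workflow actually uses: the function name `parse_annotations` is hypothetical, columns are assumed to be tab-separated, rows without exactly three columns are silently skipped, and the `EXAMPLE.1` rows are made up to show the "last value is used" rule.

```python
def parse_annotations(text: str) -> dict[tuple[str, str], str]:
    """Parse annotation rows into {(id, field): value}.

    Hypothetical sketch: later rows for the same (id, field) pair
    overwrite earlier ones, as described in the file header.
    """
    annotations: dict[tuple[str, str], str] = {}
    for line in text.splitlines():
        # Skip blank lines and lines that start with '#'.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # Drop anything after a '#' that follows the field value.
        line = line.split("#", 1)[0]
        columns = [column.strip() for column in line.split("\t")]
        if len(columns) != 3:
            continue  # assumption: ignore malformed rows
        record_id, field, value = columns
        annotations[(record_id, field)] = value
    return annotations


sample = "\n".join(
    [
        "# Lines starting with '#' are treated as comments",
        "KF530114.1\tcountry\tUSA",
        "LC756668.1\tcountry\tJapan # trailing comment",
        "EXAMPLE.1\tcountry\tChina",  # made-up id, overwritten below
        "EXAMPLE.1\tcountry\tRussia",
    ]
)

if __name__ == "__main__":
    for key, value in parse_annotations(sample).items():
        print(key, value)
```

Note that under this sketch a comment line wrapped in double quotes, as in the old version of the header, does not start with `#` and so is not recognised as a comment line.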
12 changes: 6 additions & 6 deletions ingest/config/oc43/annotations.tsv
@@ -1,6 +1,6 @@
# Manually curated annotations TSV file
# The TSV should not have a header and should have exactly three columns:
"# id to match existing metadata, field name, and field value"
"# If there are multiple annotations for the same id and field, then the last value is used"
# Lines starting with '#' are treated as comments
# Any '#' after the field value are treated as comments.
# Manually curated annotations TSV file
# The TSV should not have a header and should have exactly three columns:
# "id to match existing metadata, field name, and field value"
# "If there are multiple annotations for the same id and field, then the last value is used"
# Lines starting with '#' are treated as comments
# Any '#' after the field value are treated as comments.
3 changes: 2 additions & 1 deletion ingest/rules/curate.smk
@@ -46,12 +46,13 @@ rule concat_geolocation_rules:
"""


def format_field_map(field_map: dict[str,str]) -> str:
def format_field_map(field_map: dict[str, str]) -> str:
    """
    Format dict to `"key1"="value1" "key2=value2"...` for use in shell commands.
    """
    return " ".join([f'"{key}"="{value}"' for key, value in field_map.items()])


# This curate pipeline is based on existing pipelines for pathogen repos using NCBI data.
# You may want to add and/or remove steps from the pipeline for custom metadata
# curation for your pathogen. Note that the curate pipeline is streaming NDJSON
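For reference, here is what `format_field_map` produces for a small input. The function body is copied from the hunk above; the field names in `field_map` are hypothetical and only serve to show the output shape.

```python
def format_field_map(field_map: dict[str, str]) -> str:
    """
    Format dict to `"key1"="value1" "key2=value2"...` for use in shell commands.
    """
    return " ".join([f'"{key}"="{value}"' for key, value in field_map.items()])


# Hypothetical mapping of source field names to output field names.
field_map = {"accession": "genbank_accession", "isolate-lineage": "strain"}

if __name__ == "__main__":
    print(format_field_map(field_map))
    # "accession"="genbank_accession" "isolate-lineage"="strain"
```

Keys and values are wrapped in double quotes without escaping, so a value that itself contains a double quote would break the quoting in the resulting shell command.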
1 change: 1 addition & 0 deletions ingest/rules/fetch_from_ncbi.smk
@@ -4,6 +4,7 @@ and outputs them as a single NDJSON file that can be directly fed into the
curation pipeline.
"""


rule fetch_ncbi_dataset_package:
    params:
        ncbi_taxon_id=lambda wildcards: config[wildcards.virus]["ncbi_taxon_id"],
1 change: 1 addition & 0 deletions ingest/vendored/transform-genbank-location
@@ -6,6 +6,7 @@ GenBank by verifying that the 'database' field has a value of "GenBank" or "RefS

Outputs the modified record to stdout.
"""

import json
from sys import stdin, stdout

1 change: 1 addition & 0 deletions ingest/vendored/transform-strain-names
@@ -5,6 +5,7 @@ stdin. Adds a 'strain' field to the record if it does not already exist.

Outputs the modified records to stdout.
"""

import argparse
import json
import re
Empty file modified phylogenetic/Snakefile
100755 → 100644
Empty file.
2 changes: 1 addition & 1 deletion phylogenetic/config/229e/dropped_strains.txt
@@ -1,3 +1,3 @@
OK662398.1
OK625404.1
MZ712010.1
MZ712010.1
2 changes: 1 addition & 1 deletion phylogenetic/config/229e/reference.fasta
@@ -454,4 +454,4 @@ ATCTAGTCTTATACACAATGGTAAGCCAGTGGTAGTAAAGGTATAAGAAATTTGCTACTA
TGTTACTGAACCTAGGTGAACGCTAGTATAACTCATTACAAATGTGCTGGAGTAATCAAA
GATCGCATTGACGAGCCAACAATGGAAGAGCCAGTCATTTGTCTTGAGACCTATCTAGTT
AGTAACTGCTAATGGAACGGTTTCGATATGGATACACAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAA
4 changes: 2 additions & 2 deletions phylogenetic/config/229e/reference.gb
@@ -439,7 +439,7 @@ FEATURES Location/Qualifiers
AGVVANGVKAKGYPQFAELVPSTAAMLFDSHIVSKESGNTVVLTFTTRVTVPKDHPHL
GKFLEELNAFTREMQQHPLLNPSALEFNPSQTSPATAEPVRDEVSIETDIIDEVN"
3'UTR 26856..27317
ORIGIN
ORIGIN
1 acttaagtac cttatctatc tacagataga aaagttgctt tttagacttt gtgtctactt
61 ttctcaacta aacgaaattt ttgctatggc cggcatcttt gatgctggag tcgtagtgta
121 attgaaattt catttgggtt gcaacagttt ggaagcaagt gctgtgtgtc ctagtctaag
@@ -896,4 +896,4 @@ ORIGIN
27181 gatcgcattg acgagccaac aatggaagag ccagtcattt gtcttgagac ctatctagtt
27241 agtaactgct aatggaacgg tttcgatatg gatacacaaa aaaaaaaaaa aaaaaaaaaa
27301 aaaaaaaaaa aaaaaaa
//
//
2 changes: 1 addition & 1 deletion phylogenetic/config/hku1/reference.fasta
@@ -497,4 +497,4 @@ CTTAATGAGAATGAATCCTAATTCGACACTAGGTGGTAACCCCTCGCTATTATTCGGAAT
AGGACACTCTCTATCAGAATGAATTCTTGCTGTAATAACAGATAGAGTAGGTTGTTACAG
ACTATATATTAATTAGTAGAAATTTTATATTTAGACATTTGATTGTTAGAGTAGTTATAA
GGTTTAGCTGTAGTATAAACGCCTCCGGGAAGAGCTATCAATTGTAGTGTTTAATATATA
TATTAGTATATGATTGAAATTAATTATAGCCTTTTGGAGGAATTAC
TATTAGTATATGATTGAAATTAATTATAGCCTTTTGGAGGAATTAC
6 changes: 3 additions & 3 deletions phylogenetic/config/hku1/reference.gb
@@ -41,7 +41,7 @@ REFERENCE 4 (bases 1 to 29926)
Hong Kong, China
COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The
reference sequence was derived from AY597011.

On Jan 23, 2006 this sequence version replaced NC_006577.1.
The mature peptides were designated nsp1 through nsp16 following
the nomenclature used in the MHV-A59 and SARS-CoV RefSeqs
@@ -608,7 +608,7 @@ FEATURES Location/Qualifiers
QLMVNKSSCYRDGISTISVPAHMPMHPMVNPSKGSSGLLITKLTLLLPPMFRQGILLL
KKLSLLGFRLVRFCLKAIMLKAQEGLLLIVDQVHVLNHVDPIIVH"
3'UTR 28960..29926
ORIGIN
ORIGIN
1 gagtttgagc gattgacgtt cgtaccgtct atcagcttac gatctcttgt cagatctcat
61 taaatctaaa ctttttaaac aagattccct gttatccatg cttgtgagtg tggtttaatc
121 ataatcttgt attttacttt ccacactttt catctctctg ccagtgacgt gttggttgtc
@@ -1108,4 +1108,4 @@ ORIGIN
29761 actatatatt aattagtaga aattttatat ttagacattt gattgttaga gtagttataa
29821 ggtttagctg tagtataaac gcctccggga agagctatca attgtagtgt ttaatatata
29881 tattagtata tgattgaaat taattatagc cttttggagg aattac
//
//
2 changes: 1 addition & 1 deletion phylogenetic/config/nl63/reference.fasta
@@ -458,4 +458,4 @@ ATTAGTTGCAACCCCATGCGTTTAGCGCATGATAAGGGTTTAGTCTTACACACAATGGTA
GGCCAGTGATAGTAAAGTGTAAGTAATTTGCTATCATATTAACATGTCTAGAGGAAAGTC
AGAACTTTTTCTGTTTGTGTTGTTGGAGTACTTAAAGATCGCATAGGCGCGCCAACAATG
GAAGAGCCAACAACATATCTAAAAATGTTTTGTCTGGTACTTGTTAATGATATTGTTTTT
GATATGGATACAC
GATATGGATACAC
6 changes: 3 additions & 3 deletions phylogenetic/config/nl63/reference.gb
@@ -40,7 +40,7 @@ REFERENCE 4 (bases 1 to 27553)
Amsterdam 1105 AZ, The Netherlands
COMMENT PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The reference sequence is identical to AY567487.

On Jun 24, 2004 this sequence version replaced NC_005831.1.
COMPLETENESS: full length.
FEATURES Location/Qualifiers
@@ -385,7 +385,7 @@ FEATURES Location/Qualifiers
3'UTR 27267..27553
/inference="non-experimental evidence, no additional
details recorded"
ORIGIN
ORIGIN
1 cttaaagaat ttttctatct atagatagag aattttctta tttagacttt gtgtctactc
61 ttctcaacta aacgaaattt ttctagtgct gtcatttgtt atggcagtcc tagtgtaatt
121 gaaatttcgt caagtttgta aactggttag gcaagtgttg tattttctgt gtctaagcac
@@ -846,4 +846,4 @@ ORIGIN
27421 agaacttttt ctgtttgtgt tgttggagta cttaaagatc gcataggcgc gccaacaatg
27481 gaagagccaa caacatatct aaaaatgttt tgtctggtac ttgttaatga tattgttttt
27541 gatatggata cac
//
//
2 changes: 1 addition & 1 deletion phylogenetic/config/oc43/reference.fasta
@@ -511,4 +511,4 @@ TAACCCCTCGCAGAAAAGTCGAGATAAGGCACTCTCTATCAGAATGGATGTCTTGCTGCT
ATAATAGATAGAGAAGGTTATAGCAGACTATAGATTAATTAGTTGAAAGTTTTGTGTTGT
AATGTATAGTGTTGGAGAAAGTGAAAGACTTGCGGAAGTAATTGCCGACAAGTGCCCAAG
GGAAGAGCCAGCATGTTAAGTTACCACCCAGTAATTAGTAAATGAATGAAGTTAATTATG
GCCAATTGGAAGAATCAC
GCCAATTGGAAGAATCAC
4 changes: 2 additions & 2 deletions phylogenetic/config/oc43/reference.gb
100755 → 100644
@@ -298,7 +298,7 @@ FEATURES Location/Qualifiers
/translation="MATSVNCCHDGIFTIWEQDRMLKTSTAPILTESTGSLATRLMSI
PRLTLSIGTQVAMRLFRLGFRLARYSLRVTILKAQEGLLLIPDLLRAHPAEPLVQDRV
VEPILAIEPLPLV"
ORIGIN
ORIGIN
1 gattgtgagc gatttgcgtg cgtgcatccc gcttcactga tctcttgtta gatctttttg
61 taatctaaac tttataaaaa catccactcc ctgtaatcta tgcttgtggg cgtagatttt
121 tcatagtggt gtttatattc atttctgctg ttaacagctt tcagccaggg acgtgttgta
@@ -812,4 +812,4 @@ ORIGIN
30601 aatgtatagt gttggagaaa gtgaaagact tgcggaagta attgccgaca agtgcccaag
30661 ggaagagcca gcatgttaag ttaccaccca gtaattagta aatgaatgaa gttaattatg
30721 gccaattgga agaatcac
//
//
2 changes: 1 addition & 1 deletion phylogenetic/rules/annotate_phylogeny.smk
@@ -1,6 +1,6 @@
"""
This part of the workflow runs ancestral reconstruction and
creates additonal annotations for the phylogenetic tree. It expects a
creates additional annotations for the phylogenetic tree. It expects a
single Newick tree and any additional files needed to create the
annotations such as the aligned FASTA and metadata file.
