Adjust input length on nextclade runs
Only run nextclade on sequences at least 1000 nt long to avoid
misclassifying short sequences. The dengue E gene is approximately
1400 nt long.
j23414 committed Jan 9, 2024
1 parent 67ed945 commit 9b9a10d
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions ingest/workflow/snakemake_rules/nextclade.smk
@@ -8,13 +8,16 @@ rule nextclade_all:
     output:
         "data/nextclade_results/nextclade_all.tsv",
     threads: 4
+    params:
+        min_length=1000,  # E gene length is approximately 1400
     shell:
         """
         nextclade run \
             --input-dataset {input.dataset} \
             -j {threads} \
             --output-tsv {output} \
             --min-match-rate 0.01 \
+            --min-length {params.min_length} \
             --silent \
             {input.sequences}
         """