Update max-sequences filtering parameter
It makes more sense to specify this as a filtering parameter. We could
continue using a value which can't be changed according to wildcards
(e.g. `target_sequences_per_tree: 3000`); however, by using the
"*/*/*: 3000" syntax we make it clearer that it's possible to make
this specific to certain builds.
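
To make the "*/*/*" idea concrete, here is a minimal sketch of how a wildcard-keyed config value could be resolved. It is not the workflow's actual `resolve_config_value`: the function name `resolve_value_sketch`, the assumption that patterns are ordered `subtype/segment/time`, the "fewest `*` wins" tie-breaking rule, and the `h5n1-cattle-outbreak/*/*` override are all hypothetical and chosen only for illustration.

```python
def resolve_value_sketch(config, keys, subtype, segment, time):
    """Hypothetical stand-in for the workflow's resolver, for illustration only."""
    # Walk down the nested config, e.g. ['filter', 'target_sequences_per_tree'].
    value = config
    for key in keys:
        value = value[key]

    # A scalar (number, string, boolean) applies to every build.
    if not isinstance(value, dict):
        return value

    target = (subtype, segment, time)
    matches = []
    for pattern, candidate in value.items():
        parts = pattern.split("/")
        if len(parts) != 3:
            continue  # ignore keys that are not three-part patterns
        # Each part must be '*' or equal the corresponding wildcard value.
        if all(p == "*" or p == t for p, t in zip(parts, target)):
            matches.append((parts.count("*"), candidate))

    if not matches:
        raise KeyError(f"no pattern under {'/'.join(keys)} matches {'/'.join(target)}")

    # Assumption: the most specific pattern (fewest '*') takes precedence.
    return min(matches, key=lambda m: m[0])[1]


example_config = {
    "same_strains_per_segment": False,  # scalar: constant for all builds
    "filter": {
        "target_sequences_per_tree": {
            "*/*/*": 3000,
            "h5n1-cattle-outbreak/*/*": 10_000,  # illustrative override, not part of this commit
        },
    },
}

key_path = ["filter", "target_sequences_per_tree"]
print(resolve_value_sketch(example_config, key_path, "h5n1", "ha", "all-time"))
print(resolve_value_sketch(example_config, key_path, "h5n1-cattle-outbreak", "pb2", "default"))
print(resolve_value_sketch(example_config, ["same_strains_per_segment"], "h9n2", "na", "all-time"))
```

With a lookup of this shape, a build-specific limit only needs an extra pattern in the config; the Snakefile call itself stays the same.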

The new syntax makes this trivial to implement using a
jameshadfield committed Dec 2, 2024
1 parent 2a19000 commit 2aaf006
Showing 3 changed files with 9 additions and 8 deletions.
2 changes: 1 addition & 1 deletion Snakefile
@@ -407,7 +407,7 @@ def _filter_params(wildcards, input, output, threads, resources):
     group_by_value = resolve_config_value(['filter', 'group_by'], wildcards)
     cmd += f" --group-by {group_by_value}" if group_by_value else ""

-    cmd += f" --subsample-max-sequences {config['target_sequences_per_tree']}"
+    cmd += f" --subsample-max-sequences {resolve_config_value(['filter', 'target_sequences_per_tree'], wildcards)}"
     cmd += f" --min-date {resolve_config_value(['filter', 'min_date'], wildcards)}"
     cmd += f" --include {input.include}"
     cmd += f" --exclude-where {exclude_where}"
4 changes: 3 additions & 1 deletion gisaid/config.yaml
@@ -46,7 +46,6 @@ subtype_lookup:
   h9n2: ['h9n2']

 #### Parameters which control large overarching aspects of the build
-target_sequences_per_tree: 3000
 same_strains_per_segment: false


@@ -68,6 +67,9 @@ description: config/description_gisaid.md
 # There's one exception: If a config value is constant for any and all builds then you
 # can just use a scalar value (number, string, boolean)
 filter:
+  target_sequences_per_tree:
+    "*/*/*": 3000
+
   min_length:
     "*/pb2/*": 2100
     "*/pb1/*": 2100
11 changes: 5 additions & 6 deletions h5n1-cattle-outbreak/config.yaml
@@ -37,12 +37,6 @@ local_ingest: false
 subtype_lookup:
   h5n1-cattle-outbreak: ['h5n1', 'h5n2', 'h5n3', 'h5n4', 'h5n5', 'h5n6', 'h5n7', 'h5n8', 'h5n9']

-#### Parameters which control large overarching aspects of the build
-# Set a high target_sequences_per_tree to capture all circulating strains, as they will be pruned down
-# as part of the workflow
-target_sequences_per_tree: 10_000
-
-
 #### Config files ####

 reference: config/h5n1-cattle-outbreak/reference_{segment}.gb
@@ -58,6 +52,11 @@ description: config/{subtype}/description_{subtype}.md

 #### Rule-specific parameters ####
 filter:
+  # Set a high target_sequences_per_tree to capture all circulating strains, as they will be pruned down
+  # as part of the workflow
+  target_sequences_per_tree:
+    "*/*/*": 3000
+
   min_length:
     "*/pb2/*": 2100 # Note: could use "h5n1-cattle-outbreak/pb2/default: 2100" if desired
     "*/pb1/*": 2100
