augur subsample command #635
Comments
After our recent conversations internally and with @dpark01 about reducing the complexity of the ncov workflow and improving the portability of the existing workflow with other workflow languages and/or platforms, I'm bumping this here as a higher priority issue and moving it from the "backlog" to the "next up".
Here is my current hack; I would love to replace all of that. It would be nice if a command like this could also emit as output a numeric count of the selected samples in each deme.
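As a rough illustration of the per-deme count requested above, here is a minimal sketch. It assumes a tab-separated metadata file with a `strain` identifier column and a deme column (here `division`); the function name `count_by_deme` and both column names are hypothetical and not part of any existing Augur interface.

```python
import csv
import io
from collections import Counter


def count_by_deme(metadata_tsv, selected, deme_column="division"):
    """Count the selected strains in each deme.

    metadata_tsv: an open text file (or file-like object) of tab-separated metadata.
    selected: a set of strain names that survived subsampling.
    deme_column: assumed name of the column that defines a deme.
    """
    reader = csv.DictReader(metadata_tsv, delimiter="\t")
    return Counter(
        row[deme_column] for row in reader if row["strain"] in selected
    )


# Tiny in-memory metadata table standing in for a real metadata file.
example = io.StringIO(
    "strain\tdivision\n"
    "A\tWashington\n"
    "B\tWashington\n"
    "C\tOregon\n"
    "D\tOregon\n"
)
counts = count_by_deme(example, {"A", "B", "C"})

# Emit one "deme<TAB>count" line per deme, sorted by deme name.
for deme, n in sorted(counts.items()):
    print(f"{deme}\t{n}")
```

A real implementation would presumably report these counts from inside the subsampling step itself rather than re-reading the metadata afterwards.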
PR #762 begins an implementation of
Update: we've had internal discussions considering this again with a different YAML schema and the addition of weighted sampling (#1318).
Tasks
@victorlin to fill this out
Links
augur subsample proposal
Original issue
A common use case is versatile subsampling of datasets to suit a particular research question. The current best example of this is the (wonderful) SARS-CoV-2 pipeline, which leverages an augur filter rule, a script to calculate priorities, and snakemake wizardry to allow versatile, declarative subsampling schemes to be simply and intuitively defined.
This allows a simple-to-reason-with YAML file to result in a very bespoke subsampling scheme:
The question arises: how do we do this for a different pathogen?
As the SARS-CoV-2 example leverages snakemake, one solution would be to abstract that logic into an importable snakemake rule. The alternative approach would be a new augur command, augur subsample, which takes a YAML file declaring the desired subsampling settings. Learning from our work on nCoV, this would essentially replace the snakemake-controlled augur filter commands with a single augur subsample command. The YAML file would look similar or identical to the current snakemake implementation. The subcommand would leverage the functions used by augur filter as well as the priorities script from nCoV.
Thoughts?
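To make the idea concrete, here is a minimal sketch of how a declarative scheme could be expanded into one augur filter invocation per named sample. The scheme layout, the key names (`group_by`, `sequences_per_group`, `query`), the file paths, and the helper `filter_commands` are all hypothetical and chosen for illustration only; the scheme is written as a Python dict rather than YAML to keep the sketch dependency-free. The flags used (`--metadata`, `--query`, `--group-by`, `--sequences-per-group`, `--output-strains`) are augur filter options as I understand them in recent Augur releases, so check them against the installed version.

```python
import shlex

# Hypothetical scheme, loosely modeled on the ncov workflow's subsampling config.
SCHEME = {
    "focal": {
        "group_by": ["division", "year", "month"],
        "sequences_per_group": 300,
        "query": "region == 'Europe'",
    },
    "context": {
        "group_by": ["country", "year", "month"],
        "sequences_per_group": 25,
        "query": "region != 'Europe'",
    },
}


def filter_commands(scheme, metadata="metadata.tsv", out_dir="subsamples"):
    """Build one `augur filter` argument list per named sample in the scheme."""
    commands = {}
    for name, rules in scheme.items():
        argv = ["augur", "filter", "--metadata", metadata]
        if "query" in rules:
            argv += ["--query", rules["query"]]
        if "group_by" in rules:
            argv += ["--group-by", *rules["group_by"]]
        if "sequences_per_group" in rules:
            argv += ["--sequences-per-group", str(rules["sequences_per_group"])]
        # Each sample writes its own list of selected strain names.
        argv += ["--output-strains", f"{out_dir}/{name}.txt"]
        commands[name] = argv
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for name, argv in filter_commands(SCHEME).items():
        print(shlex.join(argv))
```

This sketch omits the priorities step (for example, proximity of context samples to the focal set) and the final merge of the per-sample strain lists, both of which a real augur subsample command would need to handle.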
Examples
subsampling.yaml: