-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Snakemake workflow #52
Conversation
lusSTR/workflow/.DS_Store
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope.
No concerns about the Snakefile at the moment. I'd be happy to chat about some streamlining ideas some time, but that would be icing on the cake. |
Alright so I've been trying to get lusSTR into a snakemake workflow so that you can just put in the command
I think I need your assistance with this @standage. No rush and we can definitely talk in person about what I'm trying to do. My goal is to have two workflows: one for STRs and one for SNPs. At this time, I just have a placeholder for the SNPs one and I figured once I can successfully run the STR side, I'll fill that one in. Just trying to learn how to create this atm 😃 |
We've done thin CLI wrappers around Snakemake workflows for several projects now, and I really like it as a strategy. I'd love to take a look at this myself, but if I can't make enough time soon anyone else on my team has experience and should be qualified to help you hash it out! |
Sounds good! |
def main(args): | ||
Path(args.workdir).mkdir(parents=True, exist_ok=True) | ||
final_dest = f"{args.workdir}/config.yaml" | ||
config = resource_filename("lusSTR", "data/config.yaml") | ||
final_config = edit_config(config, args) | ||
with open(final_dest, "w") as file: | ||
yaml.dump(final_config, file) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created the command lusstr config
to create the config file. Settings can be specified as command line arguments. If no arguments are provided, the default settings are used and the user can edit manually. If a working directory is specified, the config file is dumped there, otherwise to the cwd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
lusSTR/cli/snps.py
Outdated
def main(args): | ||
snakefile = resource_filename("lusSTR", "workflows/snps.smk") | ||
pretarget = "annotate" if args.filter else "all" | ||
result = snakemake.snakemake( | ||
snakefile, config=args.config, targets=pretarget, | ||
workdir=args.work_dir | ||
) | ||
if result is not True: | ||
raise SystemError('Snakemake failed') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The snps
command is not ready yet. I figured I'd do another PR after this one (since this one is already quite a lot). This is just a placeholder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python has an Exception class dedicated for this scenario: I'd recommend clearing out—or commenting out—the current code in the main function and replacing it with a raise NotImplementedError('SNP workflow implementation pending')
statement.
## placeholder until I update for snps | ||
|
||
configfile: "config.yaml" | ||
output_name = config["output"] | ||
input_name = config["samp_input"] | ||
software = config["output_type"] | ||
prof = config["profile_type"] | ||
data = config["data_type"] | ||
filter_sep = config["filter_sep"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a placeholder for the snps snake file
lusSTR/wrappers/annot.py
Outdated
if args.separate: | ||
indiv_files(autosomal_final_table, input_name, ".txt") | ||
else: | ||
autosomal_final_table.to_csv(args.out, sep="\t", index=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the separate
argument for the annotate
step. This was really just used when lusSTR files were being fed into LLAMAS which required separate files for each sample. lusSTR however requires a single file with all samples to be fed into the next step, so it's unnecessary.
This is ready for further review @standage. I added a new command ( I also updated all tests and all are passing with the exception of |
I do still need to update the README. |
Nevermind! updated! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks pretty good. A few comments, but hopefully should be quick to resolve.
One thing you might consider is that the various lusstr subcommands give users the flexibility to run the STR analysis workflow step-wise, so having to specify format, annotate, or all seems a bit redundant. Why is the user going to run lusstr strs
if they don't want to run all
? There might be a good reason for this, and I could probably formulate the contours of that reason myself. But I'm curious what you think. If we cut out that argument, that could simplify the interface a bit (I'm not sure it would simplify the workflow implementation at all). If you decide that it's necessary and should stay, how would you feel about making it an option with all
as the default?
setup.py
Outdated
"lusSTR/tests/data/NGS_stutter_test/*", | ||
"lusSTR/workflows/*", | ||
"lusSTR/wrappers/*", | ||
] | ||
}, | ||
include_package_data=True, | ||
install_requires=["pandas>=1.0", "openpyxl>=3.0.6"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New package dependencies need to be declared here at a minimum.
- snakemake
- pyyaml(?)
lusSTR/cli/config.py
Outdated
data["separate"] = True | ||
if args.nocombine: | ||
data["nocombine"] = True | ||
if args.efm_sep: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get an error when I run lusstr config
with no arguments. It points to this line, which doesn't appear to be a perfect match for any option declared in the argument parser. I assume this should be if args.efm
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, old argument that I forgot to remove from that function. There were two different separate
pieces, one for the annotate
step and one for the filter
step. Now only the filter
step argument remains.
def main(args): | ||
Path(args.workdir).mkdir(parents=True, exist_ok=True) | ||
final_dest = f"{args.workdir}/config.yaml" | ||
config = resource_filename("lusSTR", "data/config.yaml") | ||
final_config = edit_config(config, args) | ||
with open(final_dest, "w") as file: | ||
yaml.dump(final_config, file) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
lusSTR/cli/snps.py
Outdated
def main(args): | ||
snakefile = resource_filename("lusSTR", "workflows/snps.smk") | ||
pretarget = "annotate" if args.filter else "all" | ||
result = snakemake.snakemake( | ||
snakefile, config=args.config, targets=pretarget, | ||
workdir=args.work_dir | ||
) | ||
if result is not True: | ||
raise SystemError('Snakemake failed') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python has an Exception class dedicated for this scenario: I'd recommend clearing out—or commenting out—the current code in the main function and replacing it with a raise NotImplementedError('SNP workflow implementation pending')
statement.
lusSTR/tests/test_suite.py
Outdated
def test_snakemake(command, output, format_out, annot_out, all_out, tmp_path): | ||
config = str(tmp_path / "config.yaml") | ||
inputfile = data_file("UAS_bulk_input/Positive Control Sample Details Report 2315.xlsx") | ||
exp_output = data_file(output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running this data through lusstr strs all
on the command line went seamlessly for me.
So do you mean by running |
Ah, gotcha. I saw that lusSTR still had a subcommand interface, but glossed over the fact that the old subcommands aren't supported any more. Carry on! |
I believe this is ready now, @standage! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
It looks like some SNP tests failed as well. If you're saving the SNP workflow for another PR, you should probably disable those tests for now. |
Ok will look into this! |
Ok looks like things are passing now! Everything good for you? |
This PR will create a snakemake workflow to simplify running the entire lusSTR pipeline on either a single file or a set of files.