-
Notifications
You must be signed in to change notification settings - Fork 712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interface to kmer size for pseudoaligners #1144
Conversation
This PR is against the
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
I have one remark that may be beyond the scope of this PR:
For Kallisto 0.48, the help text states that the maximum allowed kmer size is 31. Although I suppose people will rather lower than increase it, there is currently no validation / warning in place.
For some other parameters, we have input validation functions in lib/WorkflowRnaseq.groovy
and I was wondering if there should be a check included for the new parameter ? Something akin to this, maybe?
if (params.pseudo_aligner_kmer_size && !(params.salmon_index || params.kallisto_index) && params.pseudo_aligner_kmer_size % 2 != 0) {
println "It is strongly advised to choose odd kmer sizes."
}
if (params.pseudo_aligner == "kallisto" && !params.kallisto_index && params.pseudo_aligner_kmer_size && params.pseudo_aligner_kmer_size > 31) {
println "Please choose an odd kmer size smaller or equal to 31"
}
Agree this would be a good addition. Do the tools themselves generate an error if these criteria aren't met? |
I'm not actually in favour of this sort of thing, I think there's potential for significant maintenance burden keeping this sort of thing synchronised with tool updates. If larger kmer sizes are a problem, we should rely on the tools to fail with that problem highlighted. I know that for Salmon at least it will error gracefully if you give it an even kmer size (for example) |
I haven't run the tools, just checked that the argument flags match those in the tools' help texts when reviewing this PR.
That is a valid point. I am convinced (and happy that Salmon is that smart). |
Thanks for reviews! |
The default kmer length in both Salmon and Kallisto is too long for short reads (<50bp) found in older libraries and specialist RNA-seq variants.
We can allow users to set the kmer length via custom config, but I think it's a central enough parameter to expose at the workflow level.
There's also a fix here, since --gencode isn't actually a parameer for Kallisto indexing (copy/paste errror).
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).