fq lint module update: exit on failed validation #7000
@@ -29,5 +30,10 @@ process FQ_LINT {
"${task.process}":
fq: \$(echo \$(fq lint --version | sed 's/fq-lint //g'))
END_VERSIONS

if ! tail -n 1 ${prefix}.fq_lint.txt | grep -q 'fq-lint end'; then
We shouldn't change the module. Generally we try to let the tools work as they do, without overlaying additional logic.
We can add some logic to the subworkflow + workflow that would filter out any libraries failing linting. You may already see logic in rnaseq that we use for trimming, strand failures etc, and we can use the same mechanism.
But my understanding is that fq_lint will return a non-zero error code on a failure? That should trigger a stop without this. Am I mistaken?
@pinin4fjords Yeah, kind of. fq lint has several validators, each assigned a different code, and if a FastQ file fails linting, the log shows which code it failed on. You can't grep for the codes themselves, because the log lists the enabled validators by code up front. That's why I had to grep for "fq-lint end" and check whether it is missing. That's the last line you'll see in the linting log if linting was successful, so failed FastQ files won't have it in their log.
So, fq lint doesn't actually put out an error code itself. If we left the current module alone as is, then that process will successfully complete on any FastQ file, even corrupt ones, which defeats the purpose of adding the linting into the pipeline.
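A minimal sketch of the check described above, written as a standalone shell function. The marker text 'fq-lint end' comes from this thread; the function name `lint_log_passed`, the file names, and the demo log contents are made up for illustration and are not real fq output.

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) only if the last line of the log contains the
# completion marker. The function name is hypothetical.
lint_log_passed() {
    log_file="$1"
    # A missing or empty log is treated as a failure.
    [ -s "$log_file" ] || return 1
    tail -n 1 "$log_file" | grep -q 'fq-lint end'
}

# Demo with two invented logs (placeholder contents, not real fq output).
tmp_dir="$(mktemp -d)"
printf 'placeholder line\nfq-lint end\n' > "$tmp_dir/good.fq_lint.txt"
printf 'placeholder line\nplaceholder error\n' > "$tmp_dir/bad.fq_lint.txt"

for log in "$tmp_dir/good.fq_lint.txt" "$tmp_dir/bad.fq_lint.txt"; do
    if lint_log_passed "$log"; then
        echo "PASS $(basename "$log")"
    else
        echo "FAIL $(basename "$log")"
    fi
done
```

Inside a process script, the failing branch would call `exit 1` instead of printing, so the task is reported as failed.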
I am definitely open to trying out some different logic to filter out samples that failed linting. Are you thinking those samples would just be filtered out, and the pipeline would keep going with the good samples, or that the pipeline would exit if it finds a sample that failed linting? We find that our users often want the pipeline to stop so that they can try reuploading (or redownloading from source and then reuploading) the offending FastQ file, which has worked in a lot of cases. Maybe another conditional could be added to let a user choose whether the pipeline exits when a sample fails linting or continues with the successful samples?
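The two behaviours floated here (stop on the first failure, or carry on with the passing samples) could be sketched in plain shell as below. This is only an illustration of the idea, not how rnaseq or nf-core implement filtering: the variable `FAIL_ON_LINT_ERROR`, the function `filter_lint_logs`, and the demo files are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical toggle: "true" stops at the first failed log; any other value
# skips failed samples and continues with the rest.
FAIL_ON_LINT_ERROR="${FAIL_ON_LINT_ERROR:-true}"

# Prints the logs that passed linting to stdout, one per line.
# Returns 1 at the first failed log when the toggle is "true".
filter_lint_logs() {
    for log_file in "$@"; do
        if tail -n 1 "$log_file" 2>/dev/null | grep -q 'fq-lint end'; then
            echo "$log_file"
        elif [ "$FAIL_ON_LINT_ERROR" = "true" ]; then
            echo "Linting failed: $log_file" >&2
            return 1
        else
            echo "Skipping failed sample: $log_file" >&2
        fi
    done
}

# Demo with invented logs (placeholder contents, not real fq output).
demo_dir="$(mktemp -d)"
printf 'placeholder line\nfq-lint end\n' > "$demo_dir/sample_a.fq_lint.txt"
printf 'placeholder line\nplaceholder error\n' > "$demo_dir/sample_b.fq_lint.txt"

# Lenient mode: only sample_a is printed to stdout.
FAIL_ON_LINT_ERROR=false
filter_lint_logs "$demo_dir/sample_b.fq_lint.txt" "$demo_dir/sample_a.fq_lint.txt"
```

With `FAIL_ON_LINT_ERROR=true`, the same call returns 1 as soon as it reaches `sample_b`, which matches the "stop so the user can reupload" case.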
@pinin4fjords - sorry, I accidentally clicked "Resolve conversation" on the topic of adding the module output to the subworkflow! I added that into my last commit, if that is what you meant/had in mind?
I am proposing a change to the nf-core fq lint module so that the process exits if a FastQ file fails validation. This will allow us to add this module to nf-core pipelines like rnaseq (see relevant PR here) and validate FastQ files early in the workflow, preventing pipelines from continuing and running more computationally heavy steps on corrupted FastQ files.
PR checklist
Closes #XXX
versions.yml
file.label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda