Fragment_length_analysis step crashes on low-quality/low-abundance samples #117

DennisSchmitz · 2020-02-06T09:18:21Z

See Nextseq run 24, 2019, sample 18 for a test case.

Bug submitted by @jeroencremer. In default mode, the fragment length analysis step crashes on low-quality/low-abundance samples. The reason: because of the stringency settings, filtering removes every scaffold, so the filtered scaffold file is empty, and the next step (fragment length analysis) crashes on that empty input. This is a situation I didn't account for in the code. The good news is that if a sample crashes for this reason, it is probably bad/useless anyway and can safely be removed from the analysis before the run is restarted.
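To illustrate how a stringency setting can leave nothing behind, here is a minimal Python sketch of a minimum-length scaffold filter. It assumes, based only on the `_ge500nt` suffix of the output file name, that the filter keeps scaffolds of at least 500 nt; `filter_scaffolds` is a hypothetical helper, not Jovian's actual filtering code. When every assembled scaffold is shorter than the cutoff, the result is empty, which is exactly the input that later trips up `bwa mem`.

```python
from typing import Dict


def filter_scaffolds(scaffolds: Dict[str, str], min_len: int = 500) -> Dict[str, str]:
    """Keep only scaffolds whose sequence is at least min_len nt long (hypothetical helper)."""
    return {name: seq for name, seq in scaffolds.items() if len(seq) >= min_len}


# A low-abundance sample tends to assemble into a few short fragments.
short_assembly = {"NODE_1": "A" * 320, "NODE_2": "C" * 410, "NODE_3": "G" * 499}
kept = filter_scaffolds(short_assembly)
print(len(kept))  # 0: nothing survives, so the filtered FASTA would be written empty
```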

I do have to update the code to catch and handle this situation, either by touching empty files or by removing the sample in its entirety from the analysis.
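Both options can be sketched in a few lines of Python. This is a hedged illustration, not the fix that was eventually applied: `fasta_has_records`, `should_map_sample`, and `samples_to_keep` are hypothetical helpers, and "non-empty" is assumed to mean "contains at least one `>` header line". Option 1 touches 0-byte placeholder outputs and skips mapping; option 2 filters the sample list before the rule is ever scheduled.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def fasta_has_records(path: Path) -> bool:
    """Return True if the FASTA file exists and contains at least one '>' header line."""
    if not path.exists() or path.stat().st_size == 0:
        return False
    with path.open() as handle:
        return any(line.startswith(">") for line in handle)


def touch_empty_outputs(outputs: Iterable[Path]) -> None:
    """Option 1: create 0-byte placeholder outputs so the workflow can continue."""
    for out in outputs:
        out.parent.mkdir(parents=True, exist_ok=True)
        out.touch()


def should_map_sample(sample: str, scaffolds: Path, outputs: Iterable[Path]) -> bool:
    """Return True if bwa mem should run; otherwise write placeholders and return False."""
    if fasta_has_records(scaffolds):
        return True
    print(f"WARNING: {sample}: no scaffolds in {scaffolds}, skipping fragment length analysis")
    touch_empty_outputs(outputs)
    return False


def samples_to_keep(scaffolds_by_sample: Dict[str, Path]) -> List[str]:
    """Option 2: drop samples whose filtered scaffold file is empty."""
    return [s for s, fasta in scaffolds_by_sample.items() if fasta_has_records(fasta)]
```

In a Snakefile, a check like `should_map_sample` could be called from a `run:` block before invoking the shell commands. With option 1, every downstream step that reads these outputs must tolerate 0-byte files; option 2 avoids placeholders, but the dropped samples then silently disappear from the results, so it may be worth reporting them explicitly.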

cat logs/Fragment_length_analysis_RUN24-18_S175.log 
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.00 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta
[main] Real time: 0.074 sec; CPU: 0.002 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 35584 sequences (3717313 bp)...

cat logs/drmaa/179773.out 
Sender: LSF System <XXX>
Subject: Job 179773: <Jovian_Fragment_length_analysis.jobid402> in cluster <XXX> Exited

Job <Jovian_Fragment_length_analysis.jobid402> was submitted from host <XXX> by user <XXX> in cluster <XXX> at Thu Jan 30 11:53:42 2020.
Job was executed on host(s) <XXX>, in queue <XXX>, as user <XXX> in cluster <XXX> at Thu Jan 30 11:53:43 2020.
<XXX> was used as the home directory.
<XXX> was used as the working directory.
Started at Thu Jan 30 11:53:43 2020.
Terminated at Thu Jan 30 11:54:17 2020.
Results reported at Thu Jan 30 11:54:17 2020.

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
XXX/Nextseq_RUN24/.snakemake/tmp.i83d3146/Jovian_Fragment_length_analysis.jobid402
------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   5.26 sec.
    Max Memory :                                 34 MB
    Average Memory :                             24.20 MB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              7
    Max Threads :                                10
    Run time :                                   38 sec.
    Turnaround time :                            35 sec.

The output (if any) follows:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	Fragment_length_analysis
	1

[Thu Jan 30 11:54:04 2020]
rule Fragment_length_analysis:
    input: data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta, data/cleaned_fastq/RUN24-18_S175_pR1.fq, data/cleaned_fastq/RUN24-18_S175_pR2.fq
    output: data/scaffolds_filtered/RUN24-18_S175_sorted.bam, data/scaffolds_filtered/RUN24-18_S175_sorted.bam.bai, data/scaffolds_filtered/RUN24-18_S175_insert_size_metrics.txt, data/scaffolds_filtered/RUN24-18_S175_insert_size_histogram.pdf
    log: logs/Fragment_length_analysis_RUN24-18_S175.log
    jobid: 0
    benchmark: logs/benchmark/Fragment_length_analysis_RUN24-18_S175.txt
    wildcards: sample=RUN24-18_S175
    threads: 4

Activating conda environment: XXX/Nextseq_RUN24/.snakemake/conda/d953bd61
/bin/bash: line 1: 11697 Segmentation fault      (core dumped) bwa mem -t 4 data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta data/cleaned_fastq/RUN24-18_S175_pR1.fq data/cleaned_fastq/RUN24-18_S175_pR2.fq 2>> logs/Fragment_length_analysis_RUN24-18_S175.log
     11698 Done                    | samtools view -@ 4 -uS - 2>> logs/Fragment_length_analysis_RUN24-18_S175.log
     11699 Done                    | samtools sort -@ 4 - -o data/scaffolds_filtered/RUN24-18_S175_sorted.bam >> logs/Fragment_length_analysis_RUN24-18_S175.log 2>&1
[Thu Jan 30 11:54:17 2020]
Error in rule Fragment_length_analysis:
    jobid: 0
    output: data/scaffolds_filtered/RUN24-18_S175_sorted.bam, data/scaffolds_filtered/RUN24-18_S175_sorted.bam.bai, data/scaffolds_filtered/RUN24-18_S175_insert_size_metrics.txt, data/scaffolds_filtered/RUN24-18_S175_insert_size_histogram.pdf
    log: logs/Fragment_length_analysis_RUN24-18_S175.log
    conda-env: XXX/Nextseq_RUN24/.snakemake/conda/d953bd61

RuleException:
CalledProcessError in line 336 of XXX/Nextseq_RUN24/Snakefile:
Command 'source /mnt/miniconda/bin/activate 'XXX/Nextseq_RUN24/.snakemake/conda/d953bd61'; set -euo pipefail;  bwa index data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta > logs/Fragment_length_analysis_RUN24-18_S175.log 2>&1
bwa mem -t 4 data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta data/cleaned_fastq/RUN24-18_S175_pR1.fq data/cleaned_fastq/RUN24-18_S175_pR2.fq 2>> logs/Fragment_length_analysis_RUN24-18_S175.log |samtools view -@ 4 -uS - 2>> logs/Fragment_length_analysis_RUN24-18_S175.log |samtools sort -@ 4 - -o data/scaffolds_filtered/RUN24-18_S175_sorted.bam >> logs/Fragment_length_analysis_RUN24-18_S175.log 2>&1
samtools index -@ 4 data/scaffolds_filtered/RUN24-18_S175_sorted.bam >> logs/Fragment_length_analysis_RUN24-18_S175.log 2>&1
picard -Dpicard.useLegacyParser=false CollectInsertSizeMetrics -I data/scaffolds_filtered/RUN24-18_S175_sorted.bam -O data/scaffolds_filtered/RUN24-18_S175_insert_size_metrics.txt -H data/scaffolds_filtered/RUN24-18_S175_insert_size_histogram.pdf >> logs/Fragment_length_analysis_RUN24-18_S175.log 2>&1' returned non-zero exit status 139.
  File "/XXX/Nextseq_RUN24/Snakefile", line 336, in __rule_Fragment_length_analysis
  File "XXX/.conda/envs/Jovian_master/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job Fragment_length_analysis since they might be corrupted:
data/scaffolds_filtered/RUN24-18_S175_sorted.bam
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message


PS:

Unable to read stderr data from stderr buffer file; your job was probably aborted prematurely.
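The `returned non-zero exit status 139` line and the `Segmentation fault (core dumped)` message on the `bwa mem` process describe the same event. By bash convention, a process killed by signal N is reported with exit status 128 + N. With `set -euo pipefail`, a pipeline's status is that of the last (rightmost) command to exit non-zero, so the segfault in `bwa mem` fails the whole rule even though both `samtools` processes report `Done`. The short Python sketch below decodes such a status; `decode_shell_status` is a hypothetical helper, and signal numbers are platform-specific (SIGSEGV is 11 on Linux).

```python
import signal
from typing import Optional


def decode_shell_status(status: int) -> Optional[str]:
    """Map a bash-style exit status (128 + N) to the name of signal N, if any."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return None  # not a known signal number on this platform
    return None  # an ordinary exit code, not a signal


print(decode_shell_status(139))  # SIGSEGV
print(decode_shell_status(1))    # None
```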

Only about 10 MB of reads are left after filtering:

ls -lah ./data/cleaned_fastq/RUN24-18_S175_*
4.4M Jan 28 16:37 ./data/cleaned_fastq/RUN24-18_S175_pR1.fq
4.4M Jan 28 16:37 ./data/cleaned_fastq/RUN24-18_S175_pR2.fq
2.2M Jan 28 16:36 ./data/cleaned_fastq/RUN24-18_S175_unpaired.fq
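File sizes are only a rough proxy for how much data is left. A more direct pre-assembly check would count the read pairs in the cleaned FASTQ files and flag samples below some minimum. The sketch below assumes plain (uncompressed) FASTQ with exactly four lines per record, as the `.fq` files above appear to be; `count_fastq_records`, `is_low_abundance`, and the `min_pairs` default are hypothetical and would need tuning per assay.

```python
from pathlib import Path


def count_fastq_records(path: Path) -> int:
    """Count records in an uncompressed FASTQ file, assuming exactly 4 lines per record."""
    with path.open() as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4; file may be truncated")
    return n_lines // 4


def is_low_abundance(r1: Path, r2: Path, min_pairs: int = 10_000) -> bool:
    """Flag a sample whose number of read pairs falls below a hypothetical minimum."""
    pairs_r1, pairs_r2 = count_fastq_records(r1), count_fastq_records(r2)
    if pairs_r1 != pairs_r2:
        raise ValueError(f"R1 has {pairs_r1} reads but R2 has {pairs_r2}; pairs are out of sync")
    return pairs_r1 < min_pairs
```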

The filtered scaffold file is empty:

ls -lah ./data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta
0 Jan 28 17:32 ./data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta
wc -l ./data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta
0 ./data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta


DennisSchmitz self-assigned this Feb 6, 2020

DennisSchmitz added the "bug" label (Something isn't working) Feb 6, 2020

DennisSchmitz closed this as completed Dec 16, 2024
