Error in `Stage median_cov_target` #25

ifiddes-10x-zz · 2018-01-31T23:14:32Z

I am trying to use the STRetch_wgs_bam_pipeline.groovy pipeline with a previously generated BAM against GRCh38, targeting a specific set of STRs I have identified in a BED file. After taking an hour to walk the BAM (why does it not use random access with the index?), it crashes with an error about the reference:

=========================== Stage median_cov_target (NA24143_longranger) ===========================
2018/01/31 15:01:40 NA24143.bam
chromosome: STR-A not found in NA24143.bam
chromosome: STR-AAAAAC not found in NA24143.bam
chromosome: STR-AAAAAG not found in NA24143.bam
chromosome: STR-AAAAAT not found in NA24143.bam
chromosome: STR-AAAAC not found in NA24143.bam

... <removed lots of lines>

panic: runtime error: index out of range
goroutine 1 [running]:
github.com/brentp/goleft/covmed.BamInsertSizes(0xc42009a320, 0x186a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/brentp/go/src/github.com/brentp/goleft/covmed/main.go:106 +0x6c5
github.com/brentp/goleft/covmed.Main()
        /home/brentp/go/src/github.com/brentp/goleft/covmed/main.go:153 +0x5d6
main.main()
        /home/brentp/go/src/github.com/brentp/goleft/main.go:65 +0x191

Am I misunderstanding how to use this pipeline? Do I need to be mapped against a special reference?

The text was updated successfully, but these errors were encountered:

hdashnow · 2018-01-31T23:24:27Z

Hi @ifiddes-10x,

It looks like the error might be occurring within the goleft tool.

Could you please tell me a little more about how you have set up the pipeline?
What bed file did you provide to the pipeline? If your original bam was on hg38, it is expecting the hg38 positions of all STRs in the genome.

The messages "chromosome: STR-AAAAAC not found in NA24143.bam" etc don't necessarily indicate a problem, if for example there are no AAAAAC expansions in your genome. The message is generated by bedtools, but I could probably suppress it.

ifiddes-10x-zz · 2018-01-31T23:27:55Z

I am using a bed file of only a few specific STRs that we are interested in. Could that be the problem? Should I find a genome wide one?

Other than that, I am doing all default parameters, with my custom STR BED file, on group of hg38 BAMs like so, in a subfolder I made to reduce clutter in the main folder: ../tools/bin/bpipe run ../pipelines/STRetch_wgs_bam_pipeline.groovy ../../strs.bed ../NA*bam

hdashnow · 2018-01-31T23:35:22Z

STRetch works by remapping reads to a custom reference genome with STR decoys. It works on the premise that these reads were likely to have been mis-mapped to the standard reference genome, because the expanded sequence is no present there. The first stage in the pipeline is extracting all reads that are likely to contain STRs. So, all reads mapping to any STR locus, flanking region, and unmapped reads.
So you need to provide a bed file containing the locations of all STRs for it to work effectively. STRetch will then report on all loci, but you could filter the results to loci of interest at the end.
I've uploaded the hg19 bed file. I'll create the hg38 bed file for you and post the link here in a few hours.

ifiddes-10x-zz · 2018-02-01T00:22:05Z

I see, I misunderstood. Thanks for the STR BED file!

hdashnow · 2018-02-01T01:35:47Z

@ifiddes-10x The hg38 bed file is here:
https://figshare.com/s/198f7a51bd95ce16b3db

Usage:
bpipe run STRetch/pipelines/STRetch_wgs_bam_pipeline.groovy hg38.simpleRepeat_period1-6.dedup.sorted.bed sample1.bam sample2.bam

At the moment STRetch ships with hg19, so it will remap against the hg19 + STR decoys, then give you results in terms of hg19. It is definitely time for me to generate and upload a hg38 reference, but it might take me a while to create and test appropriately.

ifiddes-10x-zz · 2018-02-01T01:44:11Z

Sorry, where did you upload the hg19 BED file?

On Wed, Jan 31, 2018 at 5:35 PM Harriet Dashnow ***@***.***> wrote: @ifiddes-10x <https://github.com/ifiddes-10x> The hg38 bed file is here: https://figshare.com/s/198f7a51bd95ce16b3db Usage: bpipe run STRetch/pipelines/STRetch_wgs_bam_pipeline.groovy hg38.simpleRepeat_period1-6.dedup.sorted.bed sample1.bam sample2.bam At the moment STRetch ships with hg19, so it will remap against the hg19 + STR decoys, then give you results in terms of hg19. It is definitely time for me to generate and upload a hg38 reference, but it might take me a while to create and test appropriately. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#25 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AhnBndFi6Ec6NBeWOxW28Wi_E2el9Ib3ks5tQRT0gaJpZM4R0zXt> .

--

hdashnow · 2018-02-01T01:55:36Z

Your input data is in hg38 so you will need to use the file linked above. Once reads are extracted the pipeline will switch to hg19. You should already have those reference files from the installation script

ifiddes-10x-zz · 2018-02-01T01:58:57Z

I have some inputs that are mapped to hg38, some to hg19. Is the hg19 STR file one of the ones that the install script placed in the reference-data folder?

On Wed, Jan 31, 2018 at 5:55 PM Harriet Dashnow ***@***.***> wrote: Your input data is in hg38 so you will need to use the file linked above. Once reads are extracted the pipeline will switch to hg19. You should already have those reference files from the installation script — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#25 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AhnBnR5gxwLywaZWJ9BVNm2THvO7pfW-ks5tQRmYgaJpZM4R0zXt> .

--

hdashnow · 2018-02-01T01:59:42Z

Yep

ifiddes-10x-zz · 2018-02-01T02:02:12Z

Alright thanks!

On Wed, Jan 31, 2018 at 5:59 PM Harriet Dashnow ***@***.***> wrote: Yep — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#25 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AhnBndlxBoEF5eN6daWEj2A-V9wzNaM1ks5tQRqPgaJpZM4R0zXt> .

--

ifiddes-10x-zz · 2018-02-01T16:38:59Z

I am still getting the same error in goleft with the correct bed file. Any ideas?

panic: runtime error: index out of range
goroutine 1 [running]:
github.com/brentp/goleft/covmed.BamInsertSizes(0xc42009a370, 0x186a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/brentp/go/src/github.com/brentp/goleft/covmed/main.go:106 +0x6c5
github.com/brentp/goleft/covmed.Main()
        /home/brentp/go/src/github.com/brentp/goleft/covmed/main.go:153 +0x5d6
main.main()
        /home/brentp/go/src/github.com/brentp/goleft/main.go:65 +0x191

============================= Stage STR_coverage (NA24385_longranger) ==============================

=========================== Stage STR_locus_counts (NA24385_longranger) ============================
Traceback (most recent call last):
  File "/mnt/home/ian/yard/STR/STRetch/scripts/identify_locus.py", line 165, in <module>
    main()
  File "/mnt/home/ian/yard/STR/STRetch/scripts/identify_locus.py", line 93, in main
    readlen, count_noCIGAR = detect_readlen(bamfile)
TypeError: 'NoneType' object is not iterable
Cleaned up file NA24385.locus_counts to .bpipe/trash/NA24385.locus_counts
ERROR: Command failed with exit status = 1 :

STRPATH=$PATH; PATH=/mnt/home/ian/yard/STR/STRetch/tools/bin:$PATH; /mnt/home/ian/yard/STR/STRetch/tools/bin/python /mnt/home/ian/yard/STR/STRetch/scripts/identify_locus.py --bam NA24385.bam --bed /mnt/home/ian/yard/STR/STRetch/reference-data/hg19.simpleRepeat_period1-6_dedup.sorted.bed --output NA24385.locus_counts ;PATH=$STRPATH

hdashnow · 2018-02-01T22:37:50Z

That's an error I haven't seen before.
You can see which commands it was trying to run incommandlog.txt in the working directory.
I looks like goleft is complaining about reads with zero insert sizes. And later the STR_locus_counts is complaining about a NoneType object when reading the bamfile.
Any chance the intermediate bam file is empty?
Check the bam file feeding into that stage has reads in it, and that they have pairs.

ifiddes-10x-zz · 2018-02-01T22:49:52Z

That appears to be the case. The run folder ends up with 128 smaller BAMs from the `samtools collate` step, but the output FASTQ are empty. So something is wrong with this step `cat <( /mnt/home/ian/yard/STR/STRetch/tools/bin/samtools view -hu -L combined_strs.slop.bed ../NA24385_longranger.GRCh37.bam ) <( /mnt/home/ian/yard/STR/STRetch/tools/bin/samtools view -u -f 4 ../NA24385_longranger.GRCh37.bam ) | /mnt/home/ian/yard/STR/STRetch/tools/bin/samtools collate -Ou -n 128 - NA24385_L001_R1.fastq | /mnt/home/ian/yard/STR/STRetch/tools/bin/bedtools bamtofastq -i - -fq >(gzip -c > NA24385_L001_R1.fastq.gz) -fq2 >(gzip -c > NA24385_L001_R2.fastq.gz)`. I can check any of the 128 smaller BAMs and see that they contain valid, paired reads. I am running the above step by hand to see if I can see what happens.

On Thu, Feb 1, 2018 at 2:37 PM Harriet Dashnow ***@***.***> wrote: That's an error I haven't seen before. You can see which commands it was trying to run incommandlog.txt in the working directory. I looks like goleft is complaining about reads with zero insert sizes. And later the STR_locus_counts is complaining about a NoneType object when reading the bamfile. Any chance the intermediate bam file is empty? Check the bam file feeding into that stage has reads in it, and that they have pairs. — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub <#25 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AhnBnSxrJ7EAA_oARtvPcu253bc0jmnHks5tQjy_gaJpZM4R0zXt> .

--

hdashnow · 2018-02-01T22:57:27Z

Hi Ian,

Okay, that makes sense. If that step is failing, no reads are getting extracted so the later stages fail. I've had the same error reported by another user, also using hg38. I'm taking a look at his data today, so hopefully I can nut it out.
Those small bam files are a by-product of the samtools collate sorting algorithm, so probably aren't much help. It does tell us the stage got up to sorting.

hdashnow · 2018-02-02T00:27:46Z

Hey Ian,

I have an idea, and if this is the problem I'm going to feel a bit silly! Are you using the pipeline on a cluster where jobs are given limited walltime?

I suspect the walltime for that job is set to the default, 2 hours. So it's getting cut off mid-way through. I previously only tested the STRetch WGS pipelines on a cluster that doesn't restrict walltime.

If you wouldn't mind editing the following:

In STRetch/pipelines/pipeline_stages.groovy, add , "bwamem" to the end of the triple quotes in extract_reads_region. It should be line 175 if you have the version that's currently in master. The stage should now look like this:

extract_reads_region = {

    doc "Extract reads from bam region + unaligned"

    def fastaname = get_fname(REF)

    produce(branch.sample + '_L001_R1.fastq.gz', branch.sample + '_L001_R2.fastq.gz') {
        exec """
            cat <( $samtools view -hu -L $input.bed $input.bam )
                <( $samtools view -u -f 4 $input.bam ) |
            $samtools collate -Ou -n 128 - $output.prefix |
            $bedtools bamtofastq -i - -fq >(gzip -c > $output1.gz) -fq2 >(gzip -c > $output2.gz)
        """,  "bwamem"
    }
}

And just to be on the safe side, let's increase the walltime to 48 hours. In STRetch/pipelines/bpipe.config, change the walltime for bwa, so it looks like this:

    bwamem {
        walltime="48:00:00"
        procs=8
    }```

I'm testing this too, but obviously it will take a day or two to get a result!

ifiddes-10x-zz · 2018-02-02T02:35:26Z

I am running it on a local machine. I will try your changes. I ran the command from the command log by hand and got 'Error reading input file': ``` bespin1 Thu Feb 01 14:48 test $cat <( /mnt/home/ian/yard/STR/STRetch/tools/bin/samtools view -hu -L combined_strs.slop.bed ../NA24385_longranger.GRCh37.bam ) <( /mnt/home/ian/yard/STR/STRetch/tools/bin/samtools view -u -f 4 ../NA24385_longranger.GRCh37.bam ) | /mnt/home/ian/yard/STR/STRetch/tools/bin/samtools collate -Ou -n 128 - NA24385_L001_R1.fastq | /mnt/home/ian/yard/STR/STRetch/tools/bin/bedtools bamtofastq -i - -fq >(gzip -c > NA24385_L001_R1.fastq.gz) -fq2 >(gzip -c > NA24385_L001_R2.fastq.gz) Error reading input file ```

On Thu, Feb 1, 2018 at 4:27 PM Harriet Dashnow ***@***.***> wrote: Hey Ian, I have an idea, and if this is the problem I'm going to feel a bit silly! Are you using the pipeline on a cluster where jobs are given limited walltime? I suspect the walltime for that job is set to the default, 2 hours. So it's getting cut off mid-way through. I previously only tested the STRetch WGS pipelines on a cluster that doesn't restrict walltime. If you wouldn't mind editing the following: In STRetch/pipelines/pipeline_stages.groovy, add , "bwamem" to the end of the triple quotes in extract_reads_region. It should be line 175 if you have the version that's currently in master. The stage should now look like this: extract_reads_region = { doc "Extract reads from bam region + unaligned" def fastaname = get_fname(REF) produce(branch.sample + '_L001_R1.fastq.gz', branch.sample + '_L001_R2.fastq.gz') { exec """ cat <( $samtools view -hu -L $input.bed $input.bam ) <( $samtools view -u -f 4 $input.bam ) | $samtools collate -Ou -n 128 - $output.prefix | $bedtools bamtofastq -i - -fq >(gzip -c > $output1.gz) -fq2 >(gzip -c > $output2.gz) """, "bwamem" } } And just to be on the safe side, let's increase the walltime to 48 hours. In STRetch/pipelines/bpipe.config, change the walltime for bwa, so it looks like this: bwamem { walltime="48:00:00" procs=8 }``` I'm testing this too, but obviously it will take a day or two to get a result! — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub <#25 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AhnBnbUc2P3WYgZcy0K1GmQc5QKywnVTks5tQlaCgaJpZM4R0zXt> .

--

ifiddes-10x-zz · 2018-02-05T18:57:33Z

Those changes did not work. In the end, I am still seeing lots of small (~30mb) bam files, showing that `samtools collate` is doing its job properly and shuffling the reads, but the `bedtools bamtofastq` step doesn't work -- I end up with two empty .fastq.gz files. `goleft` then fails on these empty files.

hdashnow · 2018-02-06T04:04:38Z

Thanks, Ian. I've been able to replicate the error now. I'll let you know when I have a fix.

…- hopefully addresses #25

hdashnow · 2018-07-22T04:27:00Z

I have made some changes to the code to try to address this issue. Could you please update STRetch, re-install it (including the reference data) and the try again?

From the STRetch directory:

# Get the latest version
git pull 
# Delete old installation and reference data
rm -rf tools reference-data
# Reinstall STRetch (including reference-data)
./install.sh

Note that pipeline_config.groovy will be overwritten by the install script. If you have made any changes to this (such as setting the control file) you will need to edit it again.

Let me know if you continue to have issues.

ifiddes-10x-zz closed this as completed Feb 1, 2018

ifiddes-10x-zz reopened this Feb 1, 2018

hdashnow added a commit that referenced this issue Feb 15, 2018

split read extraction stage into three small stages to reduce piping …

23b50b7

…- hopefully addresses #25

hdashnow mentioned this issue Jul 19, 2018

Using Bazam to extract reads #30

Merged

hdashnow closed this as completed Jul 22, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error in `Stage median_cov_target` #25

Error in `Stage median_cov_target` #25

ifiddes-10x-zz commented Jan 31, 2018

hdashnow commented Jan 31, 2018

ifiddes-10x-zz commented Jan 31, 2018 •

edited

Loading

hdashnow commented Jan 31, 2018

ifiddes-10x-zz commented Feb 1, 2018

hdashnow commented Feb 1, 2018

ifiddes-10x-zz commented Feb 1, 2018 via email •

edited

Loading

hdashnow commented Feb 1, 2018

ifiddes-10x-zz commented Feb 1, 2018 via email •

edited

Loading

hdashnow commented Feb 1, 2018

ifiddes-10x-zz commented Feb 1, 2018 via email •

edited

Loading

ifiddes-10x-zz commented Feb 1, 2018

hdashnow commented Feb 1, 2018

ifiddes-10x-zz commented Feb 1, 2018 via email •

edited

Loading

hdashnow commented Feb 1, 2018

hdashnow commented Feb 2, 2018

ifiddes-10x-zz commented Feb 2, 2018 via email •

edited

Loading

ifiddes-10x-zz commented Feb 5, 2018 via email •

edited

Loading

hdashnow commented Feb 6, 2018

hdashnow commented Jul 22, 2018

Error in Stage median_cov_target #25

Error in Stage median_cov_target #25

Comments

ifiddes-10x-zz commented Jan 31, 2018

hdashnow commented Jan 31, 2018

ifiddes-10x-zz commented Jan 31, 2018 • edited Loading

hdashnow commented Jan 31, 2018

ifiddes-10x-zz commented Feb 1, 2018

hdashnow commented Feb 1, 2018

ifiddes-10x-zz commented Feb 1, 2018 via email • edited Loading

hdashnow commented Feb 1, 2018

ifiddes-10x-zz commented Feb 1, 2018 via email • edited Loading

hdashnow commented Feb 1, 2018

ifiddes-10x-zz commented Feb 1, 2018 via email • edited Loading

ifiddes-10x-zz commented Feb 1, 2018

hdashnow commented Feb 1, 2018

ifiddes-10x-zz commented Feb 1, 2018 via email • edited Loading

hdashnow commented Feb 1, 2018

hdashnow commented Feb 2, 2018

ifiddes-10x-zz commented Feb 2, 2018 via email • edited Loading

ifiddes-10x-zz commented Feb 5, 2018 via email • edited Loading

hdashnow commented Feb 6, 2018

hdashnow commented Jul 22, 2018

Error in `Stage median_cov_target` #25

Error in `Stage median_cov_target` #25

ifiddes-10x-zz commented Jan 31, 2018 •

edited

Loading

ifiddes-10x-zz commented Feb 1, 2018 via email •

edited

Loading

ifiddes-10x-zz commented Feb 1, 2018 via email •

edited

Loading

ifiddes-10x-zz commented Feb 1, 2018 via email •

edited

Loading

ifiddes-10x-zz commented Feb 1, 2018 via email •

edited

Loading

ifiddes-10x-zz commented Feb 2, 2018 via email •

edited

Loading

ifiddes-10x-zz commented Feb 5, 2018 via email •

edited

Loading