Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in Stage median_cov_target #25

Closed
ifiddes-10x-zz opened this issue Jan 31, 2018 · 19 comments
Closed

Error in Stage median_cov_target #25

ifiddes-10x-zz opened this issue Jan 31, 2018 · 19 comments

Comments

@ifiddes-10x-zz
Copy link

I am trying to use the STRetch_wgs_bam_pipeline.groovy pipeline with a previously generated BAM against GRCh38, targeting a specific set of STRs I have identified in a BED file. After taking an hour to walk the BAM (why does it not use random access with the index?), it crashes with an error about the reference:

=========================== Stage median_cov_target (NA24143_longranger) ===========================
2018/01/31 15:01:40 NA24143.bam
chromosome: STR-A not found in NA24143.bam
chromosome: STR-AAAAAC not found in NA24143.bam
chromosome: STR-AAAAAG not found in NA24143.bam
chromosome: STR-AAAAAT not found in NA24143.bam
chromosome: STR-AAAAC not found in NA24143.bam

... <removed lots of lines>

panic: runtime error: index out of range
goroutine 1 [running]:
github.com/brentp/goleft/covmed.BamInsertSizes(0xc42009a320, 0x186a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/brentp/go/src/github.com/brentp/goleft/covmed/main.go:106 +0x6c5
github.com/brentp/goleft/covmed.Main()
        /home/brentp/go/src/github.com/brentp/goleft/covmed/main.go:153 +0x5d6
main.main()
        /home/brentp/go/src/github.com/brentp/goleft/main.go:65 +0x191


Am I misunderstanding how to use this pipeline? Do I need to be mapped against a special reference?

@hdashnow
Copy link
Collaborator

Hi @ifiddes-10x,

It looks like the error might be occurring within the goleft tool.

Could you please tell me a little more about how you have set up the pipeline?
What bed file did you provide to the pipeline? If your original bam was on hg38, it is expecting the hg38 positions of all STRs in the genome.

The messages "chromosome: STR-AAAAAC not found in NA24143.bam" etc don't necessarily indicate a problem, if for example there are no AAAAAC expansions in your genome. The message is generated by bedtools, but I could probably suppress it.

@ifiddes-10x-zz
Copy link
Author

ifiddes-10x-zz commented Jan 31, 2018

I am using a bed file of only a few specific STRs that we are interested in. Could that be the problem? Should I find a genome wide one?

Other than that, I am doing all default parameters, with my custom STR BED file, on group of hg38 BAMs like so, in a subfolder I made to reduce clutter in the main folder: ../tools/bin/bpipe run ../pipelines/STRetch_wgs_bam_pipeline.groovy ../../strs.bed ../NA*bam

@hdashnow
Copy link
Collaborator

STRetch works by remapping reads to a custom reference genome with STR decoys. It works on the premise that these reads were likely to have been mis-mapped to the standard reference genome, because the expanded sequence is no present there. The first stage in the pipeline is extracting all reads that are likely to contain STRs. So, all reads mapping to any STR locus, flanking region, and unmapped reads.
So you need to provide a bed file containing the locations of all STRs for it to work effectively. STRetch will then report on all loci, but you could filter the results to loci of interest at the end.
I've uploaded the hg19 bed file. I'll create the hg38 bed file for you and post the link here in a few hours.

@ifiddes-10x-zz
Copy link
Author

I see, I misunderstood. Thanks for the STR BED file!

@hdashnow
Copy link
Collaborator

hdashnow commented Feb 1, 2018

@ifiddes-10x The hg38 bed file is here:
https://figshare.com/s/198f7a51bd95ce16b3db

Usage:
bpipe run STRetch/pipelines/STRetch_wgs_bam_pipeline.groovy hg38.simpleRepeat_period1-6.dedup.sorted.bed sample1.bam sample2.bam

At the moment STRetch ships with hg19, so it will remap against the hg19 + STR decoys, then give you results in terms of hg19. It is definitely time for me to generate and upload a hg38 reference, but it might take me a while to create and test appropriately.

@ifiddes-10x-zz
Copy link
Author

ifiddes-10x-zz commented Feb 1, 2018 via email

@hdashnow
Copy link
Collaborator

hdashnow commented Feb 1, 2018

Your input data is in hg38 so you will need to use the file linked above. Once reads are extracted the pipeline will switch to hg19. You should already have those reference files from the installation script

@ifiddes-10x-zz
Copy link
Author

ifiddes-10x-zz commented Feb 1, 2018 via email

@hdashnow
Copy link
Collaborator

hdashnow commented Feb 1, 2018

Yep

@ifiddes-10x-zz
Copy link
Author

ifiddes-10x-zz commented Feb 1, 2018 via email

@ifiddes-10x-zz ifiddes-10x-zz reopened this Feb 1, 2018
@ifiddes-10x-zz
Copy link
Author

I am still getting the same error in goleft with the correct bed file. Any ideas?

panic: runtime error: index out of range
goroutine 1 [running]:
github.com/brentp/goleft/covmed.BamInsertSizes(0xc42009a370, 0x186a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/brentp/go/src/github.com/brentp/goleft/covmed/main.go:106 +0x6c5
github.com/brentp/goleft/covmed.Main()
        /home/brentp/go/src/github.com/brentp/goleft/covmed/main.go:153 +0x5d6
main.main()
        /home/brentp/go/src/github.com/brentp/goleft/main.go:65 +0x191

============================= Stage STR_coverage (NA24385_longranger) ==============================

=========================== Stage STR_locus_counts (NA24385_longranger) ============================
Traceback (most recent call last):
  File "/mnt/home/ian/yard/STR/STRetch/scripts/identify_locus.py", line 165, in <module>
    main()
  File "/mnt/home/ian/yard/STR/STRetch/scripts/identify_locus.py", line 93, in main
    readlen, count_noCIGAR = detect_readlen(bamfile)
TypeError: 'NoneType' object is not iterable
Cleaned up file NA24385.locus_counts to .bpipe/trash/NA24385.locus_counts
ERROR: Command failed with exit status = 1 :

STRPATH=$PATH; PATH=/mnt/home/ian/yard/STR/STRetch/tools/bin:$PATH; /mnt/home/ian/yard/STR/STRetch/tools/bin/python /mnt/home/ian/yard/STR/STRetch/scripts/identify_locus.py --bam NA24385.bam --bed /mnt/home/ian/yard/STR/STRetch/reference-data/hg19.simpleRepeat_period1-6_dedup.sorted.bed --output NA24385.locus_counts ;PATH=$STRPATH

@hdashnow
Copy link
Collaborator

hdashnow commented Feb 1, 2018

That's an error I haven't seen before.
You can see which commands it was trying to run incommandlog.txt in the working directory.
I looks like goleft is complaining about reads with zero insert sizes. And later the STR_locus_counts is complaining about a NoneType object when reading the bamfile.
Any chance the intermediate bam file is empty?
Check the bam file feeding into that stage has reads in it, and that they have pairs.

@ifiddes-10x-zz
Copy link
Author

ifiddes-10x-zz commented Feb 1, 2018 via email

@hdashnow
Copy link
Collaborator

hdashnow commented Feb 1, 2018

Hi Ian,

Okay, that makes sense. If that step is failing, no reads are getting extracted so the later stages fail. I've had the same error reported by another user, also using hg38. I'm taking a look at his data today, so hopefully I can nut it out.
Those small bam files are a by-product of the samtools collate sorting algorithm, so probably aren't much help. It does tell us the stage got up to sorting.

@hdashnow
Copy link
Collaborator

hdashnow commented Feb 2, 2018

Hey Ian,

I have an idea, and if this is the problem I'm going to feel a bit silly! Are you using the pipeline on a cluster where jobs are given limited walltime?

I suspect the walltime for that job is set to the default, 2 hours. So it's getting cut off mid-way through. I previously only tested the STRetch WGS pipelines on a cluster that doesn't restrict walltime.

If you wouldn't mind editing the following:

In STRetch/pipelines/pipeline_stages.groovy, add , "bwamem" to the end of the triple quotes in extract_reads_region. It should be line 175 if you have the version that's currently in master. The stage should now look like this:

extract_reads_region = {

    doc "Extract reads from bam region + unaligned"

    def fastaname = get_fname(REF)

    produce(branch.sample + '_L001_R1.fastq.gz', branch.sample + '_L001_R2.fastq.gz') {
        exec """
            cat <( $samtools view -hu -L $input.bed $input.bam )
                <( $samtools view -u -f 4 $input.bam ) |
            $samtools collate -Ou -n 128 - $output.prefix |
            $bedtools bamtofastq -i - -fq >(gzip -c > $output1.gz) -fq2 >(gzip -c > $output2.gz)
        """,  "bwamem"
    }
}

And just to be on the safe side, let's increase the walltime to 48 hours. In STRetch/pipelines/bpipe.config, change the walltime for bwa, so it looks like this:

    bwamem {
        walltime="48:00:00"
        procs=8
    }```

I'm testing this too, but obviously it will take a day or two to get a result!

@ifiddes-10x-zz
Copy link
Author

ifiddes-10x-zz commented Feb 2, 2018 via email

@ifiddes-10x-zz
Copy link
Author

ifiddes-10x-zz commented Feb 5, 2018 via email

@hdashnow
Copy link
Collaborator

hdashnow commented Feb 6, 2018

Thanks, Ian. I've been able to replicate the error now. I'll let you know when I have a fix.

@hdashnow
Copy link
Collaborator

I have made some changes to the code to try to address this issue. Could you please update STRetch, re-install it (including the reference data) and the try again?

From the STRetch directory:

# Get the latest version
git pull 
# Delete old installation and reference data
rm -rf tools reference-data
# Reinstall STRetch (including reference-data)
./install.sh

 
Note that pipeline_config.groovy will be overwritten by the install script. If you have made any changes to this (such as setting the control file) you will need to edit it again.
 
Let me know if you continue to have issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants