csi indexing of *.bam.sv.bam - invalid file pointer #520
Comments
Can you validate that the index itself is corrupt? If it is, then it looks like I can't rely on htsjdk-generated .csi index files :( |
I'm waiting on a reoccurrence. |
This is reproducible on this data set, fails on 37, fine on 38. |
I now get this error. I am using:
|
FYI, I'm seeing this while attempting to run our CI pipeline on large-scale data with |
Additional info. Within the container, I attempted to count reads in the region where the failure occurred:
No error, but also no reads. Accessing the BAM+CSI file with samtools 1.13, I get an appropriate response:
It appears here that the CSI is valid for samtools 1.13 but not samtools 1.10. Went back into the container:
I'm not sure what is reading the CSI index within gridss, but it appears that samtools 1.13 can read it, but not samtools 1.10. |
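For anyone wanting to repeat the comparison above, here is a minimal sketch of the check: count reads in one region with two samtools builds against the same BAM and index. The function names, binary paths, BAM path and region are hypothetical placeholders, not taken from the gridss scripts.

```shell
#!/bin/sh
# Count reads overlapping a region using a specific samtools binary.
# All arguments are placeholders: samtools path, BAM path, region string.
count_region_reads() {
    samtools_bin=$1
    bam=$2
    region=$3
    # `view -c` prints only the number of matching records; a region
    # query makes samtools use the index stored next to the BAM.
    "$samtools_bin" view -c "$bam" "$region"
}

# Run the same region query with two samtools builds.
# Prints both counts and returns non-zero if they differ.
compare_region_counts() {
    old_bin=$1
    new_bin=$2
    bam=$3
    region=$4
    old_count=$(count_region_reads "$old_bin" "$bam" "$region") || return 2
    new_count=$(count_region_reads "$new_bin" "$bam" "$region") || return 2
    echo "old=$old_count new=$new_count"
    [ "$old_count" = "$new_count" ]
}
```

A call such as `compare_region_counts /path/to/samtools-1.10 /path/to/samtools-1.13 sample.sv.bam 1:1000-2000` (all values illustrative) would flag the situation described here, where one build returns zero reads without reporting an error.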
I think I've isolated it to the one place where the gridss shell script generates CSI indexes instead of BAI indexes (during sort). Currently building and testing this patch:
diff --git a/scripts/gridss b/scripts/gridss
index 7ffec1a5..9f3e6a58 100644
--- a/scripts/gridss
+++ b/scripts/gridss
@@ -801,7 +801,7 @@ if [[ $do_preprocess == true ]] ; then
| $timecmd $samtools_sort \
-T $tmp_prefix.coordinate-tmp \
-Obam \
- -o $tmp_prefix.coordinate.bam \
+ -o $tmp_prefix.coordinate.bam##idx##$tmp_prefix.coordinate.bam.bai \
$preprocess_sort_args \
/dev/stdin \
; } 1>&2 2>> $logfile
@@ -809,8 +809,8 @@ if [[ $do_preprocess == true ]] ; then
if [[ $skipsoftcliprealignment == "true" ]] ; then
write_status "Skipping SoftClipsToSplitReads $f"
mv $tmp_prefix.coordinate.bam $prefix.sv.bam
- mv $tmp_prefix.coordinate.bam.csi $prefix.sv.bam.csi
- touch $prefix.sv.bam.csi # make sure the index file is older so we don't get htsjdk WARNING spam
+ mv $tmp_prefix.coordinate.bam.bai $prefix.sv.bam.bai
+ touch $prefix.sv.bam.bai # make sure the index file is older so we don't get htsjdk WARNING spam
else
write_status "Running SoftClipsToSplitReads $f"
rm -f $tmp_prefix.sc2sr.suppsorted.sv-tmp*
From samtools docs:
|
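The `##idx##` suffix used in the patch above is the htslib convention for naming the index written alongside an output file. The sketch below only builds that output string; the helper name is hypothetical, and it assumes the sort command is also given `--write-index`, which the diff itself does not show.

```shell
#!/bin/sh
# Build an output spec of the form OUT##idx##INDEX so that the index
# is written as OUT.bai instead of the default index name.
# The function name is illustrative, not part of gridss or samtools.
bam_with_bai_spec() {
    out_bam=$1
    printf '%s##idx##%s.bai\n' "$out_bam" "$out_bam"
}

# Example use (assumes samtools >= 1.10 and that --write-index is passed):
#   samtools sort --write-index -Obam \
#       -o "$(bam_with_bai_spec sample.coordinate.bam)" input.bam
```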
@d-cameron I've taken a branch at the v2.12.2 tag and the docker image fails to build before I've made changes:
More detail on the failed tests:
|
Thanks for the feedback. Going with updated samtools version requirements |
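A version requirement like this can be checked up front. The following is a rough sketch of such a guard, not the check gridss actually performs; the minimum version is left as a parameter because the thread does not state the exact requirement, and the comparison relies on `sort -V` being available (GNU coreutils).

```shell
#!/bin/sh
# Print the version from the first line of `samtools --version`,
# which has the form "samtools 1.13".
samtools_version() {
    "$1" --version | head -n 1 | awk '{print $2}'
}

# Succeed if version $1 is greater than or equal to version $2.
# Uses version-aware sorting, so 1.10 is treated as newer than 1.9.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Fail with a message if the given samtools binary is older than $2.
require_samtools() {
    bin=$1
    min=$2
    found=$(samtools_version "$bin")
    [ -n "$found" ] || { echo "could not determine samtools version" >&2; return 2; }
    if version_ge "$found" "$min"; then
        echo "samtools $found satisfies >= $min"
    else
        echo "samtools $found is older than required $min" >&2
        return 1
    fi
}
```

For example, `require_samtools "$(command -v samtools)" 1.13` would reject the 1.10 build discussed earlier; the value 1.13 is used here only because it is the version reported to read the index correctly.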
@d-cameron I just checked the latest gridss docker image and it doesn't look like the samtools version was properly updated. If you look:
|
See #549 |
Hi @d-cameron , I tried rebuilding the docker image on 12/07/21 and again today, 12/13/21, using "latest" and "v2.13.0" tags and still get the samtools version error when running the Assembly step. Here's an excerpt from .err:
My HPC support folks tell me:
How can I work around this? |
I'm finding that the csi index generated for the *.bam.sv.bam file in some instances results in an invalid file pointer:
Rerunning gives the same result; however, if I reindex the file with the samtools 1.10 included in the docker image, processing can be resumed.
Under GRCh38 this works fine, but under GRCh37 it fails on the same sample. Just logging this here in case others encounter it.
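The reindexing workaround might look roughly like the sketch below. The function name and arguments are placeholders, and the index format used for the original reindex is not stated, so it is left as a parameter that defaults to samtools' own default (BAI).

```shell
#!/bin/sh
# Rebuild the index for a BAM with a chosen samtools binary.
# $1: samtools path, $2: BAM path, $3: index format flag (assumption):
#     -b writes a BAI index (samtools default), -c writes a CSI index.
reindex_bam() {
    samtools_bin=$1
    bam=$2
    index_flag=${3:--b}
    "$samtools_bin" index "$index_flag" "$bam"
}

# Example (illustrative file name), run inside the container:
#   reindex_bam samtools sample.bam.sv.bam -c
```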
Example command: