Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CrosscheckFingerprints NIO streaming of BAMs hangs tool indefinitely #1175

Closed
1 of 2 tasks
sooheelee opened this issue May 21, 2018 · 19 comments · Fixed by #1348
Closed
1 of 2 tasks

CrosscheckFingerprints NIO streaming of BAMs hangs tool indefinitely #1175

sooheelee opened this issue May 21, 2018 · 19 comments · Fixed by #1348
Assignees

Comments

@sooheelee
Copy link
Contributor

sooheelee commented May 21, 2018

Bug Report

DavidR says to bring this to your attention @yfarjoun.

Affected tool(s)

CrosscheckFingerprints called from GATK4.0.4.0

Affected version(s)

  • Latest public release version [version?]
  • Latest development/master branch as of [date of test?]

Description

Steps to reproduce

Leaving this to Picard folks to try to reproduce.

Expected behavior

Looks like either (i) the tool isn't compatible with NIO and should produce an error or (ii) there is a bug in the NIO process for this tool that should be fixed.

@sooheelee
Copy link
Contributor Author

Should reply back to the researcher about whether the tool cannot use NIO or that this is a bug that is being fixed.

@ruslanafr
Copy link

Any estimate on when this patch might come out? Our team really needs it as soon as possible. Thanks!

@yfarjoun
Copy link
Contributor

I'm a little confused about this, as we most certainly use CrosscheckFingerprints in production using the NIO capabilities. Could you try using the Picard cloudJar for this just to minimized one additional variable?

Also, please make sure that the index for the bam is available and readable.

@sooheelee
Copy link
Contributor Author

Looks like the Picard cloudjar has been a feature starting with release v2.18.1 (March 26, 2018). The standalone jars are downloadable from the normal Picard release page at https://github.com/broadinstitute/picard/releases. If this works for you @ruslanafr, then I would recommend continuing to use this jar until we figure out what is going on with calling Picard from GATK.

@sooheelee
Copy link
Contributor Author

@yfarjoun, @ruslanafr is using the tool with two bams as input:

-I gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-TTP1-A-1-1-D-A553-36/REBC-AC8T-TTP1-A-1-1-D-A553-36.bam \
-I gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-NB1-A-1-0-D-A525-36/REBC-AC8T-NB1-A-1-0-D-A525-36.bam \

Would this make a difference?

@yfarjoun
Copy link
Contributor

That shouldn't make any difference.

@lbergelson
Copy link
Member

lbergelson commented May 26, 2018 via email

@sooheelee
Copy link
Contributor Author

@lbergelson, @ruslanafr is a researcher at the Broad. From the forum post:

/usr/local/jre1.8.0_73/bin/java -Xmx14g -jar /gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar CrosscheckFingerprints \
-I gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-TTP1-A-1-1-D-A553-36/REBC-AC8T-TTP1-A-1-1-D-A553-36.bam \
-I gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-NB1-A-1-0-D-A525-36/REBC-AC8T-NB1-A-1-0-D-A525-36.bam \
-H /cromwell_root/firecloud-tcga-open-access/tutorial/reference/Homo_sapiens_assembly19.haplotype_database.txt \
--QUIET false --EXIT_CODE_WHEN_MISMATCH 0 \
--OUTPUT crosscheck.stats.txt \
--VALIDATION_STRINGENCY LENIENT
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/fc-4aacc2d6-4017-4fc2-b95a-fb892a3562b9/70dc8591-32c2-4de1-b2f8-2af887f2a8e3/Clinical_Workflow/6794a873-5bbb-46ef-ae2c-a10e77dab94e/call-CrossCheckLaneFingerprints_Task/attempt-2/tmp.197aeb50
17:12:16.387 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu May 17 17:12:16 UTC 2018] CrosscheckFingerprints  --INPUT gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-TTP1-A-1-1-D-A553-36/REBC-AC8T-TTP1-A-1-1-D-A553-36.bam --INPUT gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-NB1-A-1-0-D-A525-36/REBC-AC8T-NB1-A-1-0-D-A525-36.bam --OUTPUT crosscheck.stats.txt --HAPLOTYPE_MAP /cromwell_root/firecloud-tcga-open-access/tutorial/reference/Homo_sapiens_assembly19.haplotype_database.txt --EXIT_CODE_WHEN_MISMATCH 0 --QUIET false --VALIDATION_STRINGENCY LENIENT  --CROSSCHECK_MODE CHECK_SAME_SAMPLE --LOD_THRESHOLD 0.0 --CROSSCHECK_BY READGROUP --NUM_THREADS 1 --CALCULATE_TUMOR_AWARE_RESULTS true --ALLOW_DUPLICATE_READS false --GENOTYPING_ERROR_RATE 0.01 --OUTPUT_ERRORS_ONLY false --LOSS_OF_HET_RATE 0.5 --EXPECT_ALL_GROUPS_TO_MATCH false --VERBOSITY INFO --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Thu May 17 17:12:16 UTC 2018] Executing as root@45dc4fe55a48 on Linux 4.9.0-0.bpo.6-amd64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.0.4.0
INFO    2018-05-17 17:12:18 CrosscheckFingerprints  Fingerprinting 2 INPUT files.

But then the task just gets stuck without printing out anything else.

@ruslanafr, is it possible for you to access the machine you are running this on and try what @lbergelson suggests above with jstack?

@ruslanafr
Copy link

ruslanafr commented May 26, 2018 via email

@kcibul
Copy link
Contributor

kcibul commented May 28, 2018

I ran into this same problem myself today running from both local as well as a GCE node. Credentials/ADC is fine, and I can gsutil the files and run them locally no problem so I don't think it's a credentials issue.

I also looked at the production/best practice pipeline, and it doesn't look like it's using NIO. Here's the snipped from the task which is localizing files

task CrossCheckFingerprints {
  Array[File] input_bams
  Array[File] input_bam_indexes

Finally, in case it helps, I took a threaddump while it was running (30+ minutes).

"pool-2-thread-1" #14 prio=5 os_prio=0 tid=0x00007f5d58ee9800 nid=0x2c7 runnable [0x00007f5d23b80000]
java.lang.Thread.State: RUNNABLE
at com.sun.crypto.provider.GaloisCounterMode.doLastBlock(GaloisCounterMode.java:394)
at com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:552)
at com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1046)
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:983)
at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:491)
at javax.crypto.Cipher.doFinal(Cipher.java:2377)
at sun.security.ssl.CipherBox.decrypt(CipherBox.java:461)
at sun.security.ssl.InputRecord.decrypt(InputRecord.java:172)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1015)
- locked <0x00000000c3e44e28> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
- locked <0x00000000c3e44f18> (a sun.security.ssl.AppInputStream)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked <0x00000000a58c6430> (a java.io.BufferedInputStream)
at sun.net.www.MeteredStream.read(MeteredStream.java:134)
- locked <0x00000000a58d1470> (a sun.net.www.http.KeepAliveStream)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3375)
at shaded.cloud_nio.com.google.api.client.http.javanet.NetHttpResponse$SizeValidatingInputStream.read(NetHttpResponse.java:169)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at shaded.cloud_nio.com.google.api.client.util.ByteStreams.copy(ByteStreams.java:51)
at shaded.cloud_nio.com.google.api.client.util.IOUtils.copy(IOUtils.java:94)
at shaded.cloud_nio.com.google.api.client.util.IOUtils.copy(IOUtils.java:63)
at shaded.cloud_nio.com.google.api.client.http.HttpResponse.download(HttpResponse.java:421)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.read(HttpStorageRpc.java:510)
at com.google.cloud.storage.BlobReadChannel$1.call(BlobReadChannel.java:127)
at com.google.cloud.storage.BlobReadChannel$1.call(BlobReadChannel.java:124)
at shaded.cloud_nio.com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:94)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:54)
at com.google.cloud.storage.BlobReadChannel.read(BlobReadChannel.java:124)
at com.google.cloud.storage.contrib.nio.CloudStorageReadChannel.read(CloudStorageReadChannel.java:114)
- locked <0x00000000c3e45058> (a com.google.cloud.storage.contrib.nio.CloudStorageReadChannel)
at htsjdk.samtools.seekablestream.SeekablePathStream.read(SeekablePathStream.java:86)
at htsjdk.samtools.AbstractBAMFileIndex$IndexStreamBuffer.readFully(AbstractBAMFileIndex.java:660)
at htsjdk.samtools.AbstractBAMFileIndex$IndexStreamBuffer.readInteger(AbstractBAMFileIndex.java:695)
at htsjdk.samtools.AbstractBAMFileIndex.readInteger(AbstractBAMFileIndex.java:442)
at htsjdk.samtools.AbstractBAMFileIndex.skipToSequence(AbstractBAMFileIndex.java:423)
at htsjdk.samtools.AbstractBAMFileIndex.query(AbstractBAMFileIndex.java:277)
at htsjdk.samtools.CachingBAMFileIndex.getQueryResults(CachingBAMFileIndex.java:163)
at htsjdk.samtools.CachingBAMFileIndex.getSpanOverlapping(CachingBAMFileIndex.java:66)
at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:898)
at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:915)
at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:575)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:528)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
at htsjdk.samtools.util.SamRecordIntervalIteratorFactory.makeSamRecordIntervalIterator(SamRecordIntervalIteratorFactory.java:76)
at htsjdk.samtools.util.AbstractLocusIterator.iterator(AbstractLocusIterator.java:192)
at picard.fingerprint.FingerprintChecker.fingerprintSamFile(FingerprintChecker.java:441)
at picard.fingerprint.FingerprintChecker.lambda$fingerprintFiles$4(FingerprintChecker.java:635)
at picard.fingerprint.FingerprintChecker$$Lambda$30/631766751.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

@yfarjoun
Copy link
Contributor

interesting.

The place I know that we use crosscheck with NIO is in the joint-calling pipeline, but there we are comparing vcfs against vcfs, not bams against bams...so I'm wondering if there's a problem with the BAM implementation somehow....anyhow, good information will attempt to reproduce.

@ruslanafr
Copy link

I downloaded picardcloud.jar and ran it locally. This didn't work either. The printout was exactly the same as before. My credentials are ok since I can use gsutil to copy the files to my machine.

This is the threaddump from jstack for this run:

"pool-2-thread-1" #15 prio=5 os_prio=31 tid=0x00007fe445249000 nid=0x1407 runnable [0x000070000d6e0000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
- locked <0x00000006c08e4208> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
- locked <0x00000006c08e4a78> (a sun.security.ssl.AppInputStream)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked <0x000000076c1a4458> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
- locked <0x000000076c19f410> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
- locked <0x000000076c19f410> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347)
at shaded.cloud_nio.com.google.api.client.http.javanet.NetHttpResponse.(NetHttpResponse.java:37)
at shaded.cloud_nio.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at shaded.cloud_nio.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972)
at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeMedia(AbstractGoogleClientRequest.java:380)
at shaded.cloud_nio.com.google.api.services.storage.Storage$Objects$Get.executeMedia(Storage.java:6133)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.read(HttpStorageRpc.java:494)
at com.google.cloud.storage.BlobReadChannel$1.call(BlobReadChannel.java:127)
at com.google.cloud.storage.BlobReadChannel$1.call(BlobReadChannel.java:124)
at shaded.cloud_nio.com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:94)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:54)
at com.google.cloud.storage.BlobReadChannel.read(BlobReadChannel.java:124)
at com.google.cloud.storage.contrib.nio.CloudStorageReadChannel.read(CloudStorageReadChannel.java:114)
- locked <0x00000006c06badb0> (a com.google.cloud.storage.contrib.nio.CloudStorageReadChannel)
at htsjdk.samtools.seekablestream.SeekablePathStream.read(SeekablePathStream.java:86)
at htsjdk.samtools.AbstractBAMFileIndex$IndexStreamBuffer.readFully(AbstractBAMFileIndex.java:660)
at htsjdk.samtools.AbstractBAMFileIndex$IndexStreamBuffer.readInteger(AbstractBAMFileIndex.java:695)
at htsjdk.samtools.AbstractBAMFileIndex.readInteger(AbstractBAMFileIndex.java:442)
at htsjdk.samtools.AbstractBAMFileIndex.skipToSequence(AbstractBAMFileIndex.java:423)
at htsjdk.samtools.AbstractBAMFileIndex.query(AbstractBAMFileIndex.java:277)
at htsjdk.samtools.CachingBAMFileIndex.getQueryResults(CachingBAMFileIndex.java:163)
at htsjdk.samtools.CachingBAMFileIndex.getSpanOverlapping(CachingBAMFileIndex.java:66)
at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:898)
at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:915)
at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:575)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:528)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
at htsjdk.samtools.util.SamRecordIntervalIteratorFactory.makeSamRecordIntervalIterator(SamRecordIntervalIteratorFactory.java:76)
at htsjdk.samtools.util.AbstractLocusIterator.iterator(AbstractLocusIterator.java:192)
at picard.fingerprint.FingerprintChecker.fingerprintSamFile(FingerprintChecker.java:441)
at picard.fingerprint.FingerprintChecker.lambda$fingerprintFiles$4(FingerprintChecker.java:635)
at picard.fingerprint.FingerprintChecker$$Lambda$12/1966355106.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

This looks very similar to what @kcibul printed, so I think this is probably the same issue.

@ruslanafr
Copy link

I'm not sure if this issue is connected in any way, but I also tried to run the CleanSam tool (using both GATK4.0.4.0 and picardcloud.jar) with no success.
The error message I got in both cases is:

Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: file:///Users/rfrazer/Projects/moap/testing_staging_area/gs:/5aa919de-0aa0-43ec-9ec3-288481102b6d/tcga/OV/WGA_RepliG/WXS/BI/ILLUMINA/TCGA_MC3.TCGA-04-1335-11A-01W-0489-09.bam
at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:426)
at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:413)
at picard.sam.CleanSam.doWork(CleanSam.java:76)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

It's clear that the tool tries to access the given path from my current directory and not GCS.

The above stacktrace does indicate that CrosscheckFingerprints tries to access GCS, but I thought this is worth mentioning in case the two issues are somehow connected.

@sooheelee
Copy link
Contributor Author

sooheelee commented May 29, 2018

Fyi, I can use NIO on CRAMs and BAMs with GATKv4.0.4.0 tools (PrintReads and CNV tool), so the engine itself does support successful NIO.

@lbergelson
Copy link
Member

@ruslanafr That error makes sense most picard tools don't support GCS at all.

@sooheelee
Copy link
Contributor Author

@lbergelson, can you clarify further please? When a Picard tool is hooked up to the GATK engine, and is a walker or other such type of tool that should theoretically be able to use NIO, then can such tools perform NIO via the GATK engine? Or, do the Picard tools individually have to support NIO to work within GATK?

@lbergelson
Copy link
Member

Most picard tools are imported directly into GATK without modification. There isn't any way to automatically upgrade them to use NIO. Each one needs to be individually modified and as far as I know most have not been.

@ruslanafr
Copy link

Any progress on this bug?

@yfarjoun
Copy link
Contributor

yfarjoun commented Jul 5, 2018

Hi @ruslanafr

Thanks for reporting this. It seems that there has been a regression and that in HTSJDK reading many small intervals of a bam file over NIO is now unacceptably slow.

I've submitted an htsjdk issue: samtools/htsjdk#1161 and hope that it will get resolved soon.

yfarjoun pushed a commit that referenced this issue Jun 20, 2019
…eading the index over a network possible, fixes #1175
yfarjoun pushed a commit that referenced this issue Aug 6, 2019
* - until htsjdk adds a buffer to the index stream, this change makes reading the index over a network possible, fixes #1175

* - responding to review comments and attempting to fix what seems to be an unrelated failure...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants