input streams opened from Paths fail when the underlying file is a pipe #1084
Comments
We should also file a bug with java. |
* re-enable reading bam files from pipes
  fixing an issue that prevented reading a bam file from an unseekable file, ex: a unix pipe
  fixes #1083 which was introduced by #1077
  this only fixes the case where the bam is being opened as file, a similar issue exists for Paths (#1084)
  support for reading pipes as paths has never worked
@lbergelson I just hit this issue when upgrading from 2.13.0 to 2.14.3. Not supporting |
I can also confirm that reading from stdin works in 2.14.1 and not in 2.14.2, with the latter including #1077 and on. |
@nh13 & @lbergelson I've opened PR #1118 which cleans up the test for named pipes a little bit, and more importantly makes it so that it now executes twice, once where the input resource is a @magicDGS since it sounds like it was your PR that affected this, could you take a look please? It looks to me like your implementation of available() is causing |
@nh13 Did reading from Stdin as a path work before? I was under the impression that it had never worked. There's a bug in the java implementation of
|
@lbergelson it fails in 2.14.2 and works in 2.14.1: fulcrumgenomics/fgbio#404 |
If it worked before, it worked with the very nasty caveat that if you were reading something GZIP'd from a pipe you could have random corruption of the data you were reading. See broadinstitute/gatk#4224 The issue is that the GzipInputStream uses |
That's concerning with respect to |
@lbergelson The regression we're seeing is when reading BAM specifically, which uses I think part of the problem is that we have too many code-paths for opening SAM/BAM/CRAM. Am I right in saying that all |
Also, if the JDK |
Ah, you're right about it for bam. I was thinking of the tribble case. We definitely have too many ways to open things. I would love to get rid of Having our own |
I can't look at it soon because I have some technical problems with my computer which are slowing down my development. I definitely agree that this is a regression that should be fixed, and I will think about it, but I usually don't work with pipes in Java. Maybe wrapping the stream in a buffered one will help. On the other hand, I guess that htsjdk3 should be implemented on java.nio.Path, and forget about File...
Opening a unix pipe using `Files.newInputStream()` results in an `InputStream`. However, if you call `available()` on that input stream it crashes with an error in native code due to an illegal seek. The implementation of `available()` is unsafe when using an unseekable backing stream. Unfortunately, this is a bug in the Java standard library and we can't easily change it.

This is problematic because `BufferedInputStream` uses `available` as part of its call to `read()`.

There are some potential solutions.
This is related to #1083 but this problem existed before #1077
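
To make the failure mode above concrete, here is a minimal sketch of one possible workaround, assuming the failure surfaces as an `IOException` thrown from `available()`. The class `SafeAvailableInputStream` and its `open` helper are hypothetical names used for illustration; they are not part of htsjdk and are not necessarily how the project addressed this issue.

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical wrapper (not part of htsjdk): reports 0 from available()
 * when the wrapped stream cannot answer, e.g. because it is backed by a pipe.
 */
class SafeAvailableInputStream extends FilterInputStream {

    SafeAvailableInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int available() {
        try {
            return in.available();
        } catch (IOException e) {
            // Assumption: the "illegal seek" failure arrives as an IOException.
            // Returning 0 is a legal answer; it only means that no bytes are
            // guaranteed to be readable without blocking.
            return 0;
        }
    }

    /** Opens a path so that BufferedInputStream never sees available() fail. */
    static InputStream open(Path path) throws IOException {
        return new BufferedInputStream(
                new SafeAvailableInputStream(Files.newInputStream(path)));
    }
}
```

A caller would use `SafeAvailableInputStream.open(path)` in place of `Files.newInputStream(path)`. The caveat is that this is only safe when downstream code does not interpret `available() == 0` as end of stream; any reader that does would need a different approach.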