Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PrintBGZFBlockInformation: remove file extension check so that we can accept, eg., bams #5801

Merged
merged 2 commits into from
Mar 27, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,8 @@

/**
* A diagnostic tool that prints information about the compressed blocks in a BGZF format file,
* such as a .vcf.gz file. This tool can detect various kinds of BGZF file corruption such as
* premature BGZF terminator blocks, truncated files, and files that were regular-GZIPPED by
* such as a .vcf.gz file or a .bam file. This tool can detect various kinds of BGZF file corruption
* such as premature BGZF terminator blocks, truncated files, and files that were regular-GZIPPED by
* accident.
* <p>
* The output looks like this:
Expand Down Expand Up @@ -71,15 +71,10 @@ protected void onStartup() {
throw new UserException.CouldNotReadInputFile("File " + bgzfPathString + " does not exist");
}

if ( ! IOUtil.hasBlockCompressedExtension(bgzfPathString) ) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you should amend the below statement "File is not a valid BGZF file. Could possibly be a regular GZIP file?" to better reflect that the user may have provided any file under the sun.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also explicitly add a reference to the javadocs for this tool that mentions the bam use case as it sounds incredibly useful to people.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and done

throw new UserException.CouldNotReadInputFile("File " + bgzfPathString + " does not end in a recognized BGZF file extension (" +
StringUtils.join(IOUtil.BLOCK_COMPRESSED_EXTENSIONS, ",") + ")");
}

try {
// Check that the file is in BGZF format. This catches the "regular GZIP" case as well:
if ( ! IOUtil.isBlockCompressed(bgzfPath) ) {
throw new UserException.CouldNotReadInputFile(bgzfPath, "File is not a valid BGZF file. Could possibly be a regular GZIP file?");
throw new UserException.CouldNotReadInputFile(bgzfPath, "File is not a valid BGZF file. Could be a regular GZIP file, or some other non-BGZF format.");
}
}
catch ( IOException e ) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -100,4 +100,33 @@ public void testRegularGzipFile() throws IOException {
};
runCommandLine(args);
}

/* We should get an exception for other non-BGZF formats as well */
@Test(expectedExceptions= UserException.CouldNotReadInputFile.class)
public void testNonBGZFFile() throws IOException{
final File input = new File(dbsnp_138_b37_1_65M_vcf);
final File actualOutput = createTempFile("PrintBGZFBlockInformationIntegrationTest_testNonBGZFFile", ".out");

final String[] args = {
"--bgzf-file", input.getAbsolutePath(),
"--" + StandardArgumentDefinitions.OUTPUT_LONG_NAME, actualOutput.getAbsolutePath()
};
runCommandLine(args);
}

/* Make sure that we can handle a standard BAM file */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a test to ensure there is a nice exception when the user provides a non-BGZF file (maybe a VCF).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Test
public void testBamFile() throws IOException {
final File input = new File(packageRootTestDir + "engine/reads_data_source_test1.bam");
final File actualOutput = createTempFile("PrintBGZFBlockInformationIntegrationTest_testBamFile", ".out");
final File expectedOutput = new File(toolsTestDir + "PrintBGZFBlockInformation/expected_PrintBGZFBlockInformationIntegrationTest_testBamFile.out");

final String[] args = {
"--bgzf-file", input.getAbsolutePath(),
"--" + StandardArgumentDefinitions.OUTPUT_LONG_NAME, actualOutput.getAbsolutePath()
};
runCommandLine(args);

IntegrationTestSpec.assertEqualTextFiles(actualOutput, expectedOutput);
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
BGZF block information for file: reads_data_source_test1.bam

Block #1 at file offset 0
- compressed size: 380
- uncompressed size: 1971

Block #2 at file offset 380
- compressed size: 28
- uncompressed size: 0

***************************************************************************
Final BGZF 0-byte terminator block FOUND as expected at block number 2
***************************************************************************