-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding --sort-order option to SortSamSpark #4545
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4545 +/- ##
===============================================
+ Coverage 80.355% 80.642% +0.287%
- Complexity 17714 18478 +764
===============================================
Files 1088 1089 +1
Lines 63975 66116 +2141
Branches 10313 10913 +600
===============================================
+ Hits 51407 53317 +1910
- Misses 8555 8661 +106
- Partials 4013 4138 +125
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review complete, back to @lbergelson for changes. There is an issue with the tool unconditionally using ReadCoordinateComparator
, and the tests not being valid due to using the input as the expected output.
protected String outputFile; | ||
private String outputFile; | ||
|
||
@Argument(doc="the order to sort the file into", fullName = SORT_ORDER_LONG_NAME, optional = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the order to sort the file into" -> "sort order of output file" (the latter seems more grammatical to me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -27,17 +29,29 @@ | |||
public final class SortSamSpark extends GATKSparkTool { | |||
private static final long serialVersionUID = 1L; | |||
|
|||
public static final String SORT_ORDER_LONG_NAME = "sort-order"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a blank line after this declaration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
@Override | ||
public List<ReadFilter> getDefaultReadFilters() { | ||
return Collections.singletonList(ReadFilterLibrary.ALLOW_ALL_READS); | ||
} | ||
|
||
@Override | ||
protected void onStartup() { | ||
if( sortOrder.getComparatorInstance() == null){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unbalanced paren spacing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. I'm sure intellij will find a different place to unbalance though
@Override | ||
protected void onStartup() { | ||
if( sortOrder.getComparatorInstance() == null){ | ||
throw new UserException.BadInput("Cannot sort a file in " + sortOrder + " order. That ordering doesnt define a valid comparator. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"That ordering doesnt define a valid comparator" -> "There is no comparator defined for that ordering"
|
||
@Override | ||
public List<ReadFilter> getDefaultReadFilters() { | ||
return Collections.singletonList(ReadFilterLibrary.ALLOW_ALL_READS); | ||
} | ||
|
||
@Override | ||
protected void onStartup() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call super.onStartup()
in the first line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On startup is documented as having an empty default implementation. I can add it anyway though to future proof.
} | ||
args.add("--num-reducers"); args.add("1"); | ||
args.addArgument(GATKSparkTool.NUM_REDUCERS_LONG_NAME, "1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does num reducers need to be 1 for this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know... Any number should be fine I think.
// {"count_reads.bam", "count_reads.bam", null, ".bam", "queryname"}, | ||
// {"count_reads.cram", "count_reads.cram", "count_reads.fasta", ".cram", "queryname"}, | ||
{"count_reads.bam", "count_reads.bam", null, ".bam", "queryname"}, | ||
{"count_reads.cram", "count_reads.cram", "count_reads.fasta", ".cram", "queryname"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a valid test? The inputs and outputs are the same! I'm not convinced that queryname sorting is actually working in this branch, since you use a ReadCoordinateComparator
in the tool no matter what.
args.add("--num-reducers"); args.add("1"); | ||
args.addInput(unsortedBam); | ||
args.addOutput(outputBam); | ||
args.addArgument(GATKSparkTool.NUM_REDUCERS_LONG_NAME, "1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, why set the number of reducers to 1?
@@ -61,13 +59,32 @@ public void test() throws Exception { | |||
final File sortedBam = new File(getTestDataDir(), "count_reads_sorted.bam"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test method needs a better name than just test()
. Also, why do we need this test case in addition to testSortBAMs()
? Can they be unified?
@@ -7,7 +7,7 @@ | |||
@SQ SN:chr6 LN:101 | |||
@SQ SN:chr7 LN:404 | |||
@SQ SN:chr8 LN:202 | |||
@RG ID:0 SM:Hi,Mom! | |||
@RG ID:0 SM:Hi,Mom! PL:ILLUMINA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you need to make this change?
74bcb52
to
9f8c7c5
Compare
adding a --sort-order option to SortSamSpark to let users specify the what order to sort in enabling disabled tests fixing the tests which weren't actually asserting anything closes #1260 work in progress in progress refactoring using new SparkUtils method
0722c0f
to
4d4265b
Compare
@lbergelson looks like the tests passed this time around. We should open a ticket in Hadoop-Bam to fix the issue |
@jamesemery I've opened a ticket in hadoop bam. |
|
||
if( splits.stream().allMatch(split -> split instanceof FileVirtualSplit || split instanceof FileSplit)) { | ||
splits.sort(Comparator.comparing(split -> { | ||
if (split instanceof FileVirtualSplit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mix the virtual and non-virtual splits here. This makes me uncomfortable as I don't know that its guaranteed either is going to return the same thing for getPath(). Could you use the filename instead? As that seems to be what is being sorted here anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
adding a --sort-order option to SortSamSpark to let users specify the what order to sort in
enabling disabled tests
fixing the tests which weren't actually asserting anything
closes #1260