-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Yf provide arbtrary inputs to cmm #1337
Conversation
…ics! - tested on InsertSizeMetrics
…ics! - tested on InsertSizeMetrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments, and one alternative approach to consider. It might be cleaner to just have the various makeInstance
implementations construct a complete command line string or argument array out of the appropriate CMM args, plus the EXTRA_ARGS that are specific to that program, and call into a common method that allows the CLP to populate the entire program object the normal way. That would eliminate the need for the distributed mix of direct and CLP-based arg population; eliminate any validation weirdness or clobbering that might arise due to the fact that the CLP only sees a subset of the actual args, and simplify the implementation of doWork
.
public boolean INCLUDE_UNPAIRED = false; | ||
|
||
@Argument(doc="extra arguments to the various tools can be specified using the following format:" + | ||
"<PROGRAM>::<ARGUMENT_AND_VALUE> where <PROGRAM> is one of the programs specified in PROGRAM, " + | ||
"and <ARGUMENT_AND_VALUE> are the argument and value that you'd like to specify as you would on the commandline." + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commandline
-> command line
"<PROGRAM>::<ARGUMENT_AND_VALUE> where <PROGRAM> is one of the programs specified in PROGRAM, " + | ||
"and <ARGUMENT_AND_VALUE> are the argument and value that you'd like to specify as you would on the commandline." + | ||
"For example, to change the HISTOGRAM_WIDTH in CollectInsertSizeMetrics to 200 using the legacy parser, use:\n " + | ||
"EXTRA_ARGUMENT=CollectInsertSizeMetrics::HISTOGRAM_WIDTH=200\n " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, the only public place where this doc gets published is with the GATK doc, where it will be incorrect, and the reference to legacy parser is irrelevant.
} | ||
final String argumentAndValue = matcher.group("argumentAndValue"); | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra nl.
for (String str : arguments) { | ||
final Matcher matcher = pattern.matcher(str); | ||
if (!matcher.matches()) { | ||
throw new IllegalArgumentException("couldn't understand EXTRA_ARGUMENT " + str + " it doesn't conform to the regular expression given by " + pattern.pattern()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CommandLineException ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it doesn't hurt, but I'm not sure displaying the regular expression is useful to the user.
try { | ||
program = Program.valueOf(programName); | ||
} catch (IllegalArgumentException e) { | ||
throw new IllegalArgumentException("Couldn't find program with value " + programName, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CommandLineException
instance.setDefaultHeaders(getDefaultHeaders()); | ||
programs.add(instance); | ||
} | ||
if(!additionalArguments.isEmpty()){ | ||
throw new IllegalArgumentException("EXTRA_ARGUMENT values were provided, but appropriate program wasn't requested:" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make more sense to throw CommandLineException ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appropriate -> corresponding ?
Object[][] extraArgumentValue() { | ||
List<Object[]> tests = new ArrayList<>(); | ||
|
||
// this is actually legal after convertion to non-legacy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convertion -> conversion
"and <ARGUMENT_AND_VALUE> are the argument and value that you'd like to specify as you would on the commandline." + | ||
"For example, to change the HISTOGRAM_WIDTH in CollectInsertSizeMetrics to 200 using the legacy parser, use:\n " + | ||
"EXTRA_ARGUMENT=CollectInsertSizeMetrics::HISTOGRAM_WIDTH=200\n " + | ||
"Note that the following arguments cannot be modified on a per-program level: INPUT, REFERENCE_SEQUENCE, ASSUME_SORTED, and STOP_AFTER", optional = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "cannot be modified" mean in this case ? Will it fail, or is it just not advised ? Either way there should be code to specifically detect and reject these in the code ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the user changes the METRIC_ACCUMULATION_LEVEL at the program level using this ?
private static String PROPERTY_USE_LEGACY_PARSER = "picard.useLegacyParser"; | ||
private static String PROPERTY_CONVERT_LEGACY_COMMAND_LINE = "picard.convertCommandLine"; | ||
public static String PROPERTY_USE_LEGACY_PARSER = "picard.useLegacyParser"; | ||
public static String PROPERTY_CONVERT_LEGACY_COMMAND_LINE = "picard.convertCommandLine"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do these need to be public ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover from a different attempt...
final String optionalValue = matcher.group("optionalValue"); | ||
if (optionalValue!=null && !optionalValue.equals("")) { | ||
map.get(program).add(optionalValue); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code path needs a test case, preferably one where another arg appears in the EXTRA_ARGS list after the optional one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a test case...when there's a space after the "=" in
"EXTRA_ARGUMENT=CollectInsertSizeMetrics::HISTOGRAM_WIDTH= 58",
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yfarjoun I guess I misunderstood what the purpose of this code path is then. I interpreted "optional" to mean that this value was truly optional, but this is just to tolerate the space ? Although the test case passes as written (in that 58 gets propagated), this syntax won't work from the command line this way (without quoting at least - it does work with quoting).
But more importantly, although the tests seem to pass with both parsers (because args is already parsed in to an array of strings ?), I've tried a few variations manually and I haven't yet succeeded in passing EXTRA_ARGS using the Barclay parser (-Dpicard.useLegacyParser=false) from the command line at all. What is the right/intended syntax to use for extra args in that case ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you'd need to quote the argument and the value together:
java -Dpicard.useLegacyParser=false -jar build/libs/picard.jar CollectMultipleMetrics \
-INPUT /Users/farjoun/Picard-public/testdata/picard/sam/summary_alignment_stats_test.sam \
-OUTPUT /var/folders/tc/hy9lszxd1dg9cf4bky51mrrd9k3s6g/T/alignmentMetrics6690349237641187401 \
-REFERENCE_SEQUENCE /Users/farjoun/Picard-public/testdata/picard/sam/summary_alignment_stats_test.fasta \
-METRIC_ACCUMULATION_LEVEL ALL_READS \
-PROGRAM null \
-PROGRAM CollectAlignmentSummaryMetrics \
-PROGRAM CollectInsertSizeMetrics \
-EXTRA_ARGUMENT "CollectInsertSizeMetrics::--HISTOGRAM_WIDTH 58" \
-EXTRA_ARGUMENT "CollectInsertSizeMetrics::--METRIC_ACCUMULATION_LEVEL LIBRARY" \
-EXTRA_ARGUMENT "CollectInsertSizeMetrics::--METRIC_ACCUMULATION_LEVEL READ_GROUP"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'll update the docs)
Thanks for the review @cmnbroad. I think I responded to all your points. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -59,8 +73,8 @@ | |||
* Includes a method for determining whether or not a Program explicitly needs a reference sequence (i.e. cannot be null) | |||
*/ | |||
|
|||
static final String USAGE_SUMMARY ="Collect multiple classes of metrics. "; | |||
static final String USAGE_DETAILS ="This 'meta-metrics' tool runs one or more of the metrics collection modules at the same" + | |||
static final String USAGE_SUMMARY = "Collect multiple classes of metrics. "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing white space.
public boolean INCLUDE_UNPAIRED = false; | ||
|
||
@Argument(doc="extra arguments to the various tools can be specified using the following format:" + | ||
"<PROGRAM>::<ARGUMENT_AND_VALUE> where <PROGRAM> is one of the programs specified in PROGRAM, " + | ||
"and <ARGUMENT_AND_VALUE> are the argument and value that you'd like to specify as you would on the command line." + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add space (after the period) to end of this line for readability.
instance.setDefaultHeaders(getDefaultHeaders()); | ||
programs.add(instance); | ||
} | ||
if(!additionalArguments.isEmpty()){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space between if and '(' and ')' and ' {'
"or, in the new parser:" + | ||
"--EXTRA_ARGUMENT \"CollectInsertSizeMetrics::--HISTOGRAM_WIDTH 200\"\n " + | ||
"(Quotes are required to avoid the shell from separating this into two arguments.) " + | ||
"Note that the following arguments cannot be modified on a per-program level: INPUT, REFERENCE_SEQUENCE, ASSUME_SORTED, and STOP_AFTER." + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trailing space here too.
map.computeIfAbsent(program, (k) -> new ArrayList<>()).add(argumentAndValue); | ||
|
||
final String optionalValue = matcher.group("optionalValue"); | ||
if (optionalValue != null && !optionalValue.equals("")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use !StringUtils.isEmpty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
- added the ability to specify extra arguments in CollectMultipleMetrics! - tested on InsertSizeMetrics - added some shenanigans for nonLegacy conversion... - some more work to make EXTRA_ARGUMENTS work on both legacy and new parser
Adding the ability to specify individual arguments to programs in CollectMultipleMetrics.
This has been requested several times. for example #1333
@cmnbroad you might get a kick from this PR...:-)
Checklist (never delete this)
Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.
Content
Review
For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests