Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yf provide arbtrary inputs to cmm #1337

Merged
merged 8 commits into from
Jun 18, 2019
Merged

Conversation

yfarjoun
Copy link
Contributor

@yfarjoun yfarjoun commented May 24, 2019

Adding the ability to specify individual arguments to programs in CollectMultipleMetrics.

This has been requested several times. for example #1333

@cmnbroad you might get a kick from this PR...:-)


Checklist (never delete this)

Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.

Content

  • Added or modified tests to cover changes and any new functionality
  • Edited the README / documentation (if applicable)
  • All tests passing on Travis

Review

  • Final thumbs-up from reviewer
  • Rebase, squash and reword as applicable

For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests

@yfarjoun yfarjoun requested a review from cmnbroad May 24, 2019 21:53
@coveralls
Copy link

coveralls commented May 28, 2019

Coverage Status

Coverage increased (+0.5%) to 82.186% when pulling 7786318 on yf_provide_arbtrary_inputs_to_CMM into 0b95405 on master.

Copy link
Contributor

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, and one alternative approach to consider. It might be cleaner to just have the various makeInstance implementations construct a complete command line string or argument array out of the appropriate CMM args, plus the EXTRA_ARGS that are specific to that program, and call into a common method that allows the CLP to populate the entire program object the normal way. That would eliminate the need for the distributed mix of direct and CLP-based arg population; eliminate any validation weirdness or clobbering that might arise due to the fact that the CLP only sees a subset of the actual args, and simplify the implementation of doWork.

public boolean INCLUDE_UNPAIRED = false;

@Argument(doc="extra arguments to the various tools can be specified using the following format:" +
"<PROGRAM>::<ARGUMENT_AND_VALUE> where <PROGRAM> is one of the programs specified in PROGRAM, " +
"and <ARGUMENT_AND_VALUE> are the argument and value that you'd like to specify as you would on the commandline." +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commandline -> command line

"<PROGRAM>::<ARGUMENT_AND_VALUE> where <PROGRAM> is one of the programs specified in PROGRAM, " +
"and <ARGUMENT_AND_VALUE> are the argument and value that you'd like to specify as you would on the commandline." +
"For example, to change the HISTOGRAM_WIDTH in CollectInsertSizeMetrics to 200 using the legacy parser, use:\n " +
"EXTRA_ARGUMENT=CollectInsertSizeMetrics::HISTOGRAM_WIDTH=200\n " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, the only public place where this doc gets published is with the GATK doc, where it will be incorrect, and the reference to legacy parser is irrelevant.

}
final String argumentAndValue = matcher.group("argumentAndValue");


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra nl.

for (String str : arguments) {
final Matcher matcher = pattern.matcher(str);
if (!matcher.matches()) {
throw new IllegalArgumentException("couldn't understand EXTRA_ARGUMENT " + str + " it doesn't conform to the regular expression given by " + pattern.pattern());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CommandLineException ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it doesn't hurt, but I'm not sure displaying the regular expression is useful to the user.

try {
program = Program.valueOf(programName);
} catch (IllegalArgumentException e) {
throw new IllegalArgumentException("Couldn't find program with value " + programName, e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CommandLineException

instance.setDefaultHeaders(getDefaultHeaders());
programs.add(instance);
}
if(!additionalArguments.isEmpty()){
throw new IllegalArgumentException("EXTRA_ARGUMENT values were provided, but appropriate program wasn't requested:" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make more sense to throw CommandLineException ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

appropriate -> corresponding ?

Object[][] extraArgumentValue() {
List<Object[]> tests = new ArrayList<>();

// this is actually legal after convertion to non-legacy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convertion -> conversion

"and <ARGUMENT_AND_VALUE> are the argument and value that you'd like to specify as you would on the commandline." +
"For example, to change the HISTOGRAM_WIDTH in CollectInsertSizeMetrics to 200 using the legacy parser, use:\n " +
"EXTRA_ARGUMENT=CollectInsertSizeMetrics::HISTOGRAM_WIDTH=200\n " +
"Note that the following arguments cannot be modified on a per-program level: INPUT, REFERENCE_SEQUENCE, ASSUME_SORTED, and STOP_AFTER", optional = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "cannot be modified" mean in this case ? Will it fail, or is it just not advised ? Either way there should be code to specifically detect and reject these in the code ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the user changes the METRIC_ACCUMULATION_LEVEL at the program level using this ?

private static String PROPERTY_USE_LEGACY_PARSER = "picard.useLegacyParser";
private static String PROPERTY_CONVERT_LEGACY_COMMAND_LINE = "picard.convertCommandLine";
public static String PROPERTY_USE_LEGACY_PARSER = "picard.useLegacyParser";
public static String PROPERTY_CONVERT_LEGACY_COMMAND_LINE = "picard.convertCommandLine";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these need to be public ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover from a different attempt...

final String optionalValue = matcher.group("optionalValue");
if (optionalValue!=null && !optionalValue.equals("")) {
map.get(program).add(optionalValue);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code path needs a test case, preferably one where another arg appears in the EXTRA_ARGS list after the optional one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a test case...when there's a space after the "=" in

      "EXTRA_ARGUMENT=CollectInsertSizeMetrics::HISTOGRAM_WIDTH= 58",
          

Copy link
Contributor

@cmnbroad cmnbroad Jun 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yfarjoun I guess I misunderstood what the purpose of this code path is then. I interpreted "optional" to mean that this value was truly optional, but this is just to tolerate the space ? Although the test case passes as written (in that 58 gets propagated), this syntax won't work from the command line this way (without quoting at least - it does work with quoting).

But more importantly, although the tests seem to pass with both parsers (because args is already parsed in to an array of strings ?), I've tried a few variations manually and I haven't yet succeeded in passing EXTRA_ARGS using the Barclay parser (-Dpicard.useLegacyParser=false) from the command line at all. What is the right/intended syntax to use for extra args in that case ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'd need to quote the argument and the value together:

java -Dpicard.useLegacyParser=false -jar build/libs/picard.jar CollectMultipleMetrics \
-INPUT /Users/farjoun/Picard-public/testdata/picard/sam/summary_alignment_stats_test.sam \
-OUTPUT /var/folders/tc/hy9lszxd1dg9cf4bky51mrrd9k3s6g/T/alignmentMetrics6690349237641187401 \
-REFERENCE_SEQUENCE /Users/farjoun/Picard-public/testdata/picard/sam/summary_alignment_stats_test.fasta \
-METRIC_ACCUMULATION_LEVEL ALL_READS \
-PROGRAM null \
-PROGRAM CollectAlignmentSummaryMetrics \
-PROGRAM CollectInsertSizeMetrics \
-EXTRA_ARGUMENT "CollectInsertSizeMetrics::--HISTOGRAM_WIDTH 58" \
-EXTRA_ARGUMENT "CollectInsertSizeMetrics::--METRIC_ACCUMULATION_LEVEL LIBRARY" \
-EXTRA_ARGUMENT "CollectInsertSizeMetrics::--METRIC_ACCUMULATION_LEVEL READ_GROUP"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'll update the docs)

@yfarjoun
Copy link
Contributor Author

Thanks for the review @cmnbroad. I think I responded to all your points.

@yfarjoun
Copy link
Contributor Author

@gbggrant

Copy link
Contributor

@gbggrant gbggrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -59,8 +73,8 @@
* Includes a method for determining whether or not a Program explicitly needs a reference sequence (i.e. cannot be null)
*/

static final String USAGE_SUMMARY ="Collect multiple classes of metrics. ";
static final String USAGE_DETAILS ="This 'meta-metrics' tool runs one or more of the metrics collection modules at the same" +
static final String USAGE_SUMMARY = "Collect multiple classes of metrics. ";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing white space.

public boolean INCLUDE_UNPAIRED = false;

@Argument(doc="extra arguments to the various tools can be specified using the following format:" +
"<PROGRAM>::<ARGUMENT_AND_VALUE> where <PROGRAM> is one of the programs specified in PROGRAM, " +
"and <ARGUMENT_AND_VALUE> are the argument and value that you'd like to specify as you would on the command line." +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add space (after the period) to end of this line for readability.

instance.setDefaultHeaders(getDefaultHeaders());
programs.add(instance);
}
if(!additionalArguments.isEmpty()){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space between if and '(' and ')' and ' {'

"or, in the new parser:" +
"--EXTRA_ARGUMENT \"CollectInsertSizeMetrics::--HISTOGRAM_WIDTH 200\"\n " +
"(Quotes are required to avoid the shell from separating this into two arguments.) " +
"Note that the following arguments cannot be modified on a per-program level: INPUT, REFERENCE_SEQUENCE, ASSUME_SORTED, and STOP_AFTER." +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trailing space here too.

map.computeIfAbsent(program, (k) -> new ArrayList<>()).add(argumentAndValue);

final String optionalValue = matcher.group("optionalValue");
if (optionalValue != null && !optionalValue.equals("")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use !StringUtils.isEmpty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@yfarjoun yfarjoun merged commit ce50809 into master Jun 18, 2019
@yfarjoun yfarjoun deleted the yf_provide_arbtrary_inputs_to_CMM branch June 18, 2019 19:28
mjhipp pushed a commit to mjhipp/picard that referenced this pull request Aug 1, 2019
- added the ability to specify extra arguments in CollectMultipleMetrics!
- tested on InsertSizeMetrics
- added some shenanigans for nonLegacy conversion...
- some more work to make EXTRA_ARGUMENTS work on both legacy and new parser
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants