Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete a bunch of unused methods #6107

Merged
merged 10 commits into from
Aug 26, 2019
Merged

Delete a bunch of unused methods #6107

merged 10 commits into from
Aug 26, 2019

Conversation

davidbenjamin
Copy link
Contributor

@samuelklee Would you mind looking at this cruft-removal PR?

Copy link
Contributor

@samuelklee samuelklee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks as always for being a good boy scout, @davidbenjamin! Mostly minor comments except for one rather serious (IMO) issue. Determinism of production code aside, I think we need a thorough audit of test determinism.

@@ -37,7 +31,6 @@
public static final double LOG10_ONE_THIRD = -Math.log10(3.0);
public static final double LOG_ONE_THIRD = -Math.log(3.0);
public static final double INV_LOG_2 = 1.0 / Math.log(2.0);
public static final double INV_LOG_10 = 1.0 / Math.log(10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LN_1_M_EXP_THRESHOLD appears to be unused. There are also some inconsistencies in these names (LOG vs LN, use of OF, etc.) that might be worth cleaning up while you're here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return IntStream.range(0, matrix.getRowDimension())
.mapToDouble(r -> sum(matrix.getRow(r))).toArray();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like sum(final double[][][] array) and at least a few more methods in MathUtils are only called by test code. Some of these can probably be removed; others are used for checking test results, but I'm not sure if they need to be in production code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did a bit more deleting

@@ -292,7 +292,7 @@ private void takeEstep(final double[] artifactPriors) {
private static double computeLogPosterior(final int altDepth, final int altF1R2Depth, final int depth,
final double statePrior, final BetaDistributionShape afPseudoCounts,
final BetaDistributionShape f1r2PseudoCounts){
Utils.validateArg(MathUtils.isAProbability(statePrior), String.format("statePrior must be a probability but got %f", statePrior));
Utils.validateArg(MathUtils.goodProbability(statePrior), String.format("statePrior must be a probability but got %f", statePrior));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think isValidProbability and isValidLog10Probability are probably better names for these methods, but I'll leave it up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

final double[] correctedNoiseBySample = MathUtils.rowStdDevs(correctedCoverage);
Arrays.stream(correctedNoiseBySample).forEach(x -> Assert.assertTrue(x < NON_GC_BIAS_NOISE_LEVEL * MEAN_READ_DEPTH));
Utils.nonNull(correctedCoverage);
final Variance varianceEvaluator = new Variance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should just use StandardDeviation (although this actually uses Variance under the hood as well).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

final double mean = MathUtils.mean(dresults, 0, dresults.length);
final double std = new StandardDeviation().evaluate(dresults);
final double mean = Arrays.stream(results).average().getAsDouble();
final double std = new StandardDeviation().evaluate(Arrays.stream(results).mapToDouble(i->i).toArray());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, is there some reason we don't typically use asDoubleStream in such code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No good reason.

@@ -1,6 +1,7 @@
package org.broadinstitute.hellbender.utils.locusiterator;

import org.broadinstitute.hellbender.utils.MathUtils;
import org.broadinstitute.hellbender.utils.Utils;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove MathUtils import (probably should just optimize imports across the board).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -114,7 +115,7 @@ private void makeReads() {
int alignmentStart = 1;

for ( int readsThisStack : readCountsPerAlignmentStart ) {
ArrayList<GATKRead> stackReads = new ArrayList<>(ArtificialReadUtils.createIdenticalArtificialReads(readsThisStack, header, "foo", 0, alignmentStart, MathUtils.randomIntegerInRange(50, 100)));
ArrayList<GATKRead> stackReads = new ArrayList<>(ArtificialReadUtils.createIdenticalArtificialReads(readsThisStack, header, "foo", 0, alignmentStart, Utils.getRandomGenerator().nextInt(51) + 50));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was/is this deterministic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, just a cursory examination shows a number of other tests that do not reset the seed. @lbergelson @jamesemery considering your recent foray into checking determinism, perhaps it's time for a thorough audit? Any reason why we need to use a global generator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps go ahead and extract a variable for the read length while we're here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed lots of missing finals in this class, up to you if you want to fix those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a bunch of stuff

@davidbenjamin
Copy link
Contributor Author

Back to @samuelklee.

Copy link
Contributor

@samuelklee samuelklee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making changes, @davidbenjamin! A few comments on your refactoring of PerSampleReadStateManagerUnitTest. OK to merge once addressed, but please double check that behavior is unchanged from the original test. Probably worth it to explicitly mention this refactoring in the commit messages, too.

}

@Test(dataProvider = "PerSampleReadStateManagerTestDataProvider")
public void runPerSampleReadStateManagerTest( PerSampleReadStateManagerTester test ) {
logger.warn("Running test: " + test);

Utils.resetRandomGenerator();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove logger.warn line above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resetting the seed here non-trivially changes the set of test cases. Is this desired?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also put the resetting of the seed as close as possible to the use of the random generator to make the behavior more obvious. I would put the generation of lengths in createPerSampleReadStateManagerTests and reset the seed at the beginning of that method. This essentially encapsulates the use of the generator in the data provider.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I can confirm that this restores the original set of test cases, so I'm assuming the rest of your refactoring is good!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done done done

private void makeReads() {
int alignmentStart = 1;

private void makeReads(List<GATKRead> reads, List<ArrayList<AlignmentStateMachine>> recordStatesByAlignmentStart) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameters should be final, but I think you should get rid of this method as discussed above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true

PerSampleReadStateManager perSampleReadStateManager = new PerSampleReadStateManager(LocusIteratorByState.NO_DOWNSAMPLING);
final List<GATKRead> reads = new ArrayList<>();
final List<ArrayList<AlignmentStateMachine>> recordStatesByAlignmentStart = new ArrayList<>();
makeReads(reads, recordStatesByAlignmentStart);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that a makeReads method that modifies its parameters is unnecessary, but I admit that I haven't looked at the code in detail. I think it would be more intuitive to generate and directly assign the nested list of GATKReads and then flatten and map to the nested list of AlignmentStateMachines as needed. Probably a couple of stream statements would suffice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// The read we get back should be literally the same read in memory as we put in
Assert.assertTrue(originalRead == readFromPerSampleReadStateManager);
Assert.assertTrue(originalReadsIterator.hasNext());
Assert.assertTrue(originalReadsIterator.next() == stateIterator.next().getRead());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing state in the assert statement doesn't seem necessary. A developer might reasonably assume that deleting assert statements should remove checks of behavior without changing behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if ( readCount % removalInterval == 0 ) {
originalRead = originalReadsIterator.next(); // advance to next read, since the previous one should have been discarded
readCount++;
Iterator<AlignmentStateMachine> secondPassStateIterator = perSampleReadStateManager.iterator();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be final.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// The read we get back should be literally the same read in memory as we put in (after accounting for removals)
Assert.assertTrue(originalRead == readFromPerSampleReadStateManager);
Assert.assertTrue(secondPassStateIterator.hasNext());
Assert.assertTrue(reads.get(readIndex) == secondPassStateIterator.next().getRead());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

new PerSampleReadStateManagerTester(thisTestReadStateCounts, removalInterval);
}
}
final List<Integer> removeIntervals = Arrays.asList(0, 2, 3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think removalIntervals is more consistent. Might be worth changing names everywhere to indicate that these are interval indices, but I leave that up to you.

@davidbenjamin davidbenjamin dismissed samuelklee’s stale review August 26, 2019 11:13

Reviewer said to merge once changes made

@davidbenjamin
Copy link
Contributor Author

@samuelklee Did everything, need your approving review to merge.

@samuelklee
Copy link
Contributor

Right, sorry about that. Thanks for making the changes!

@davidbenjamin davidbenjamin merged commit bbba2e0 into master Aug 26, 2019
@davidbenjamin davidbenjamin deleted the db_tidying branch August 26, 2019 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants