-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes for HG38 exception and better logging. #4563
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4563 +/- ##
===============================================
+ Coverage 79.857% 79.932% +0.075%
- Complexity 17054 17347 +293
===============================================
Files 1067 1067
Lines 62031 63140 +1109
Branches 10039 10311 +272
===============================================
+ Hits 49536 50469 +933
- Misses 8582 8696 +114
- Partials 3913 3975 +62
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some changes and questions
@@ -397,10 +397,13 @@ public Object onTraversalSuccess() { | |||
public void closeTool() { | |||
|
|||
for(final DataSourceFuncotationFactory factory : dataSourceFactories) { | |||
factory.close(); | |||
if ( factory != null ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be better as Closeable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can change DataSourceFuncotationFactory
to be Closeable
.
} | ||
|
||
@VisibleForTesting | ||
static boolean validateVersionInformation(final int major, final int minor, final int year, final int month, final int day) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for final
on primitives for method parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to leave it as it has semantic meaning as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except that the codebase does not do this elsewhere.
logger.info("Data sources alternate source: " + alternateSource); | ||
} | ||
} | ||
catch (final Exception ex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please log information about the exception. Generally, you should not be catching Exception
, but rather a subclass of Exception
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
I want to catch any exceptions that happen there and continue so that it doesn't actually fail. This is an optional check for logging purposes anyway.
|
||
boolean dataSourcesPathIsAcceptable = true; | ||
|
||
final Path readmePath = dataSourcesPath.resolve(IOUtils.getPath(README_FILE_NAME)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is documentation about the format of the README file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add in an explanation of what is required in the README for these purposes.
arguments.addVCF(new File(inputFile)); | ||
arguments.addReference(new File("/Users/jonn/Development/references/Homo_sapiens_assembly19.fasta")); | ||
|
||
arguments.addArgument(FuncotatorArgumentDefinitions.DATA_SOURCES_PATH_LONG_NAME, "/Users/jonn/Development/funcotator_dataSources.v1.1.20180208"); | ||
arguments.addArgument(FuncotatorArgumentDefinitions.DATA_SOURCES_PATH_LONG_NAME, "/Users/jonn/Development/funcotator_dataSources_latestjnkjjk"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pathname can't possibly be correct. Must be something that would be on anyone's dev machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is intermediate and planned to be phased out.
It is checked in so that I can run my local test data and not have to worry about rewriting tests every time I need to debug something.
I added the flag doDebugTests
to control for this and a unit test to ensure that doDebugTests
is always set to false for the tests to pass.
I want to leave these in for now so I can continue to test locally with different settings easily.
@@ -275,10 +277,12 @@ public void spotCheck2() throws IOException { | |||
|
|||
final ArgumentsBuilder arguments = new ArgumentsBuilder(); | |||
|
|||
arguments.addArgument("verbosity", "INFO"); | |||
|
|||
arguments.addVCF(new File(inputFile)); | |||
arguments.addReference(new File("/Users/jonn/Development/references/Homo_sapiens_assembly19.fasta")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See next comment.
@@ -235,6 +235,8 @@ public void spotCheck() throws IOException { | |||
|
|||
final ArgumentsBuilder arguments = new ArgumentsBuilder(); | |||
|
|||
arguments.addArgument("verbosity", "INFO"); | |||
|
|||
arguments.addVCF(new File("/Users/jonn/Development/M2_01115161-TA1-filtered.vcf")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove all references to /Users/jonn/Development/
etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is intermediate and planned to be phased out.
It is checked in so that I can run my local test data and not have to worry about rewriting tests every time I need to debug something.
I added the flag doDebugTests to control for this and a unit test to ensure that doDebugTests is always set to false for the tests to pass.
I want to leave these in for now so I can continue to test locally with different settings easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why can't you put these tests into the source tree and then remove if/when no longer needed?
You are asking to put these into master by issuing this PR and we cannot cut a release with these tests -- and master needs to always be release-ready.
There should be an alternative way of doing this.
final Path p = IOUtils.getPath(pathString); | ||
if ( !isValidDirectory(p) ) { | ||
logger.warn("WARNING: Given path is not a valid directory: " + p.toUri().toString()); | ||
logger.error("ERROR: Given data source path is not a valid directory: " + p.toUri().toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice that you have a lot of error messages that do not throw exceptions. Is this correct behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an error to have an invalid data source path.
You're right - I'll throw an exception here.
String line = reader.readLine(); | ||
while ((line != null) && ((version == null) || (source == null) || (alternateSource == null))) { | ||
|
||
if (version == null && line.startsWith(README_VERSION_LINE_START)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have a README and a config file? Can't we merge those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LeeTL1220 I don't understand... there is no config file for the data sources directory - only a README file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok... I thought this code was being run for each datasource. Not the datasources as a whole...
version = versionMajor + "." + versionMinor + "." + versionYear + "" + versionMonth + "" + versionDay; | ||
} | ||
else { | ||
logger.warn("README file has improperly formatted version string: " + line); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
README files typically have minimal to zero formatting. This would indicate that we have a stricter format. Is there a good reason not to merge with the config file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LeeTL1220 There is no config file for a data sources directory - only a README file.
I guess we can call it a config file, but then no one is likely to read it and it provides information to anyone who would download / use the data sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave w/ README. Again, I thought this was being run for each datasource. Not once for all datasources.
@jonn-smith Back to you. |
OK, @LeeTL1220 - First round comments addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonn-smith I do not see how we can leave in hardocded paths in master branch. Let's discuss alternatives.
String line = reader.readLine(); | ||
while ((line != null) && ((version == null) || (source == null) || (alternateSource == null))) { | ||
|
||
if (version == null && line.startsWith(README_VERSION_LINE_START)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok... I thought this code was being run for each datasource. Not the datasources as a whole...
version = versionMajor + "." + versionMinor + "." + versionYear + "" + versionMonth + "" + versionDay; | ||
} | ||
else { | ||
logger.warn("README file has improperly formatted version string: " + line); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave w/ README. Again, I thought this was being run for each datasource. Not once for all datasources.
@@ -235,6 +235,8 @@ public void spotCheck() throws IOException { | |||
|
|||
final ArgumentsBuilder arguments = new ArgumentsBuilder(); | |||
|
|||
arguments.addArgument("verbosity", "INFO"); | |||
|
|||
arguments.addVCF(new File("/Users/jonn/Development/M2_01115161-TA1-filtered.vcf")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why can't you put these tests into the source tree and then remove if/when no longer needed?
You are asking to put these into master by issuing this PR and we cannot cut a release with these tests -- and master needs to always be release-ready.
There should be an alternative way of doing this.
@LeeTL1220 that's totally fair. There are a few problems with putting in the tests, though. I don't know that I can publicly release the data that are being tested because of their provenance (pretty sure I can't). The second reason is that the whole reference sequences are not checked into git lfs. Without it I can't annotate on a set of variants from around the whole genome, and the whole reference is pretty big. The last thing I can think of is a data source problem - I don't have all the data sources checked into git lfs (for similar reasons, especially now with dbSNP in there). Also, these have already made their way into master. So if this is really a problem, we should get them out of there ASAP. |
@jonn-smith Let's discuss offline. Yes, let's get these out of master. |
1da4c11
to
0134757
Compare
Fixed NullPointerException when onTraversalStart throws. Now logs version number information for data sources. Now checks minimum data sources version and requires that the user upgrade if the version is too old.
0134757
to
5d85137
Compare
Now dbSNP uses the `All` datasource, not the `common` data source. Refactored largeDataValidationTest to use an environment variable to point to the test data files.
funcotatorValidation tests no longer run by default. Need to enable these tests in jenkins.
Had to change the expected data because of new mappings from data source fields to MAF output fields.
Added more logging functionality to better inform users. Fixed NullPointerException when onTraversalStart throws. Now logs version number information for data sources. Now checks minimum data sources version and requires that the user upgrade if the version is too old. Fixed a bug that caused __UNKNOWN__ annotations. Refactored the spot checks into largeDataValidationTest. Now dbSNP uses the `All` datasource, not the `common` data source. Refactored largeDataValidationTest to use an environment variable to point to the test data files. Moved funcotatorValidation tests into their own group; funcotatorValidation tests no longer run by default.
Added more logging functionality to better inform users.
Fixed NullPointerException when onTraversalStart throws.
Now logs version number information for data sources.
Now checks minimum data sources version and requires that the user
upgrade if the version is too old.
Fixes #4521 in a roundabout way - data sources need to be re-released for this version to run at all, but these new data sources contain a fix for the HG38 issue.