Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNNScoreVariants changes #4540

Closed
cmnbroad opened this issue Mar 19, 2018 · 2 comments
Closed

CNNScoreVariants changes #4540

cmnbroad opened this issue Mar 19, 2018 · 2 comments
Assignees
Labels

Comments

@cmnbroad
Copy link
Collaborator

Minor things that should be fixed in CNNScoreVariants:

  • the header should include the lines returned from getDefaultToolVCFHeaderLines
  • the code currently (unnecessarily?) adds standard VQSR header lines via addVQSRStandardHeaderLines
  • we should consider writing the reads data as separate lines in the FIFO for the 2d case to eliminate the need to create and then parse very long lines in python to find variant and read boundaries. longer term we may want to use a more structured format such as protocol buffers
  • writeOutputVCFWithScores does a second (post-traversal) traversal, but should use whatever mechanism results from Support Java/Python bidirectional data streaming #4316
@cmnbroad
Copy link
Collaborator Author

cmnbroad commented Oct 11, 2018

@ldgauthier @lucidtronix I'm updating this with a proposed list of CNNScoreVariants issues I think need to be resolved before we can remove the @Beta tag (actually is currently marked @Experimental). Let me know what you think:

  • vqsr_cnn python package needs a more detailed code review #4538 (Python factoring/PEP-8/code review)
  • factor python args handling (minimally factor out the inference args)
  • there is only one 2D test, which I think has no reads overlapping any of the variants
  • we should add a test that specifies one or more intervals
  • the tool currently adds standard VQSR header lines via addVQSRStandardHeaderLines, which is unnecssary
  • integrate read downsampling
  • determine/handle the failure mode when the user supplies a mix (of mismatched) 1D/2D arch and weights inputs

Other (not necessarily blockers):

  • establish all defaults (weights/arch/etc) in Java code
  • default arch is 1D - should this change to 2D ?
  • see if we can remove the artificially small inference/batch sizes (1) used in the tests. I think we added these due to timeouts which should no longer be an issue.
  • remove the newExpectations code paths in integration tests

@cmnbroad
Copy link
Collaborator Author

Closed via #5548.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants