
Spark script improvements #5815

Merged: 2 commits, Mar 25, 2019


tomwhite (Contributor)

  • Increase the master node's disk size
  • Use Dataproc 1.3
  • Use gzipped (.gz) known-sites files
  • Add a script for running the genome dataset on GCS
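The first two bullets correspond to cluster-creation settings. Below is a minimal sketch of how they might be expressed, assuming the cluster is created with `gcloud dataproc clusters create`. `create_cluster`, the cluster name, project, region and the 500GB disk size are hypothetical placeholders, not values from this PR; `--max-age 3h` matches the value visible in the script excerpt in this PR. On older gcloud releases `--max-age` may only be available under `gcloud beta dataproc clusters create`, and Dataproc 1.3 images may no longer be offered for new clusters.

```shell
# Hypothetical helper: build (and optionally run) a Dataproc cluster-creation command.
# Name, project, region and disk size are placeholders, not values taken from the PR.
create_cluster() {
  local name="$1" project="$2" region="$3"
  local cmd=(gcloud dataproc clusters create "$name")
  cmd+=(--region "$region")
  cmd+=(--image-version 1.3)              # "Use Dataproc 1.3"
  cmd+=(--master-boot-disk-size 500GB)    # "Increase master disk size" (500GB is an assumed value)
  cmd+=(--max-age 3h)                     # cluster is deleted automatically after 3 hours
  cmd+=(--project "$project")

  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the command instead of creating a (billable) cluster
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (hypothetical names):
#   DRY_RUN=1 create_cluster my-spark-cluster my-project us-central1
```

Building the command as a bash array keeps each flag a separate argument, so values containing spaces are passed through intact, and the `DRY_RUN` switch lets you inspect the command before paying for a cluster.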

Fixes #5152

codecov-io commented Mar 19, 2019

Codecov Report

Merging #5815 into master will increase coverage by 0.003%.
The diff coverage is n/a.

```diff
@@               Coverage Diff               @@
##              master     #5815       +/-   ##
===============================================
+ Coverage     87.029%   87.033%   +0.003%
- Complexity     32104     32109        +5
===============================================
  Files           1972      1972
  Lines         147187    147194        +7
  Branches       16201     16201
===============================================
+ Hits          128096    128107       +11
+ Misses         13184     13181        -3
+ Partials        5907      5906        -1
```
| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| .../funcotator/FilterFuncotationsIntegrationTest.java | 100% <0%> (ø) | 5% <0%> (ø) | ⬇️ |
| ...ithwaterman/SmithWatermanIntelAlignerUnitTest.java | 60% <0%> (ø) | 2% <0%> (ø) | ⬇️ |
| ...nder/tools/funcotator/FuncotatorUtilsUnitTest.java | 92.6% <0%> (+0.024%) | 88% <0%> (+2%) | ⬆️ |
| ...e/hellbender/tools/funcotator/FuncotatorUtils.java | 88.83% <0%> (+0.06%) | 194% <0%> (+1%) | ⬆️ |
| ...nder/utils/runtime/StreamingProcessController.java | 67.773% <0%> (+0.474%) | 33% <0%> (ø) | ⬇️ |
| ...utils/smithwaterman/SmithWatermanIntelAligner.java | 80% <0%> (+30%) | 3% <0%> (+2%) | ⬆️ |
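The headline percentages can be recomputed from the counts in the coverage diff, assuming Codecov's definition of coverage as hits divided by all tracked lines (hits + misses + partials, which here sums exactly to the Lines row: 147187 on master, 147194 for #5815). A small sketch; `coverage_pct` is a hypothetical helper name.

```shell
# Hypothetical helper: coverage (%) = 100 * hits / (hits + misses + partials).
coverage_pct() {
  local hits="$1" misses="$2" partials="$3"
  # Bash integer arithmetic cannot produce decimals, so let awk do the division
  awk -v h="$hits" -v m="$misses" -v p="$partials" \
    'BEGIN { printf "%.3f\n", 100 * h / (h + m + p) }'
}

coverage_pct 128096 13184 5907   # master: prints 87.029
coverage_pct 128107 13181 5906   # #5815:  prints 87.033
```

Unrounded, the two values are about 87.0294% and 87.0328%; their difference of roughly 0.0033 percentage points is the "+0.003%" that Codecov reports.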

jamesemery (Collaborator) left a comment

These changes are mostly just a better accounting of what you need to run your own Spark tests, so I don't have much to say. I am a little concerned, though, that changing the scripts to leave the cluster alive is risky for people who don't know what they are doing, since a cluster that is left running keeps incurring costs. Something should probably be done to soften that blow.

```shell
--max-age 3h \
--project broad-gatk-collab

# Run scripts
for script in "$@"
do
  SCRIPT_NAME="$script"
  source "$script"
  eval "$script" || exit $?
```
Collaborator

The script should loudly inform the user that it has chosen to keep their cluster alive. Otherwise this change could be costly for some developers. Could you perhaps start the cluster with a time to live?

Contributor Author

Thanks @jamesemery. The cluster already has a max age of 3 hours. I've added a message to say the cluster won't be deleted immediately.
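One way to make that message hard to miss is sketched below. It is a minimal sketch under stated assumptions, not the script merged in this PR: `run_scripts` and `warn_cluster_kept` are hypothetical names, and each script is run in a child `bash` process rather than being sourced as in the excerpt above. The loop stops at the first failing script, and the warning is printed to stderr whether or not the scripts succeeded, because the cluster is kept either way.

```shell
# Hypothetical helpers: run each script in turn, stop at the first failure,
# and always warn that the cluster has been left running.
warn_cluster_kept() {
  local cluster="$1" max_age="$2"
  echo "WARNING: cluster '$cluster' has NOT been deleted and is still running." >&2
  echo "WARNING: it will be deleted automatically once it reaches its max age ($max_age)." >&2
}

run_scripts() {
  local cluster="$1" max_age="$2"
  shift 2
  local script status=0
  for script in "$@"; do
    # Record the first failing script's exit status and skip the rest
    bash "$script" || { status=$?; break; }
  done
  # Warn regardless of success: the cluster is kept in both cases
  warn_cluster_kept "$cluster" "$max_age"
  return "$status"
}

# Usage (hypothetical names):
#   run_scripts my-spark-cluster 3h run_genome.sh
```

Returning the failing script's status rather than calling `exit` keeps the helper usable from an interactive shell while still letting a calling script fail fast with `run_scripts ... || exit $?`.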

jamesemery (Collaborator) left a comment

Looks good 👍

tomwhite merged commit 9a22c1f into master on Mar 25, 2019
tomwhite deleted the tw_update_spark_scripts branch on March 25, 2019 09:28

3 participants