Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Requester Pays support #5140

Merged
merged 2 commits into from
Sep 4, 2018
Merged

Enable Requester Pays support #5140

merged 2 commits into from
Sep 4, 2018

Conversation

jean-philippe-martin
Copy link
Contributor

Google Cloud Storage has "requester pays" buckets. When reading from
those buckets, the requester is billed for the bandwidth (normally, it's
the bucket owner who is billed).

This pull request enables this feature with GATK. By default it is
turned off (so there are no unexpected charges). To turn it on, use the
command line argument "--project-for-requester-pays" (or
"--requester-project") to indicate which project to bill.

Example:

$ ./gatk PrintReads --input $INPUT --output=/tmp/reads.bam
fails with: com.google.cloud.storage.StorageException: Bucket is requester pays bucket but no user project provided.

$ ./gatk PrintReads --input $INPUT --output=/tmp/reads.bam --requester-project=$PROJECT
works

This PR also removes the argumentless version of
setGlobalNIODefaultOptions() because it was confusing (it uses the
default values, not those indicated by the user - usually we'd expect
the user's values to be taken into account).

Google Cloud Storage has "requester pays" buckets. When reading from
those buckets, the requester is billed for the bandwidth (normally, it's
the bucket owner who is billed).

This pull request enables this feature with GATK. By default it is
turned off (so there are no unexpected charges). To turn it on, use the
command line argument: "--project-for-requester-pays" (or
"--requester-project").

Example:

$ ./gatk PrintReads --input $INPUT --output=/tmp/reads.bam
fails with: com.google.cloud.storage.StorageException: Bucket is requester pays bucket but no user project provided.

$ ./gatk PrintReads --input $INPUT --output=/tmp/reads.bam --requester-project=$PROJECT
works

This PR also removes the argumentless version of
setGlobalNIODefaultOptions() because it was confusing (it uses the
default values, not those indicated by the user - usually we'd expect
the user's values to be taken into account).
@jean-philippe-martin
Copy link
Contributor Author

This will address issue #4828.

@codecov-io
Copy link

codecov-io commented Aug 28, 2018

Codecov Report

Merging #5140 into master will increase coverage by 1.247%.
The diff coverage is 90.909%.

@@               Coverage Diff               @@
##              master     #5140       +/-   ##
===============================================
+ Coverage     85.568%   86.814%   +1.247%     
- Complexity     28983     29935      +952     
===============================================
  Files           1810      1813        +3     
  Lines         135251    137671     +2420     
  Branches       15000     15393      +393     
===============================================
+ Hits          115731    119518     +3787     
+ Misses         14170     12682     -1488     
- Partials        5350      5471      +121
Impacted Files Coverage Δ Complexity Δ
...ellbender/cmdline/StandardArgumentDefinitions.java 0% <ø> (ø) 0 <0> (ø) ⬇️
...oadinstitute/hellbender/utils/gcs/BucketUtils.java 80.519% <100%> (+0.519%) 40 <3> (ø) ⬇️
...nstitute/hellbender/utils/gcs/BucketUtilsTest.java 59.406% <100%> (+3.492%) 12 <2> (+1) ⬆️
...stitute/hellbender/cmdline/CommandLineProgram.java 83.117% <71.429%> (+1.238%) 43 <0> (+1) ⬆️
...pynumber/CallCopyRatioSegmentsIntegrationTest.java 95.455% <0%> (-4.545%) 4% <0%> (ø)
...ncotator/mafOutput/MafOutputRendererConstants.java 98.182% <0%> (-0.838%) 2% <0%> (+1%)
...r/tools/walkers/mutect/Mutect2IntegrationTest.java 91.429% <0%> (-0.428%) 63% <0%> (+3%)
...rs/variantutils/SelectVariantsIntegrationTest.java 100% <0%> (ø) 85% <0%> (+19%) ⬆️
...uncotator/dataSources/DataSourceUtilsUnitTest.java 100% <0%> (ø) 11% <0%> (+4%) ⬆️
...dorientation/CollectF1R2CountsIntegrationTest.java 100% <0%> (ø) 18% <0%> (+2%) ⬆️
... and 65 more

@droazen droazen self-requested a review August 28, 2018 15:56
@droazen droazen self-assigned this Aug 28, 2018
Copy link
Collaborator

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review complete, back to @jean-philippe-martin with some comments.

@@ -435,7 +438,7 @@ protected void printSettings() {
final boolean usingIntelInflater = (BlockGunzipper.getDefaultInflaterFactory() instanceof IntelInflaterFactory && ((IntelInflaterFactory)BlockGunzipper.getDefaultInflaterFactory()).usingIntelInflater());
logger.info("Inflater: " + (usingIntelInflater ? "IntelInflater": "JdkInflater"));

logger.info("GCS max retries/reopens: " + BucketUtils.getCloudStorageConfiguration(NIO_MAX_REOPENS).maxChannelReopens());
logger.info("GCS max retries/reopens: " + BucketUtils.getCloudStorageConfiguration(NIO_MAX_REOPENS, "").maxChannelReopens());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a second logger message here indicating whether requester pays is enabled/disabled, and if it's enabled the project that will be billed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -92,4 +92,6 @@ private StandardArgumentDefinitions(){}
public static final String USE_JDK_INFLATER_SHORT_NAME = "jdk-inflater";
public static final String NIO_MAX_REOPENS_LONG_NAME = "gcs-max-retries";
public static final String NIO_MAX_REOPENS_SHORT_NAME = "gcs-retries";
public static final String NIO_PROJECT_FOR_REQUESTER_PAYS_LONG_NAME = "project-for-requester-pays";
public static final String NIO_PROJECT_FOR_REQUESTER_PAYS_SHORT_NAME = "requester-project";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would eliminate the short name, and just always require the long name for this argument. I think it's important for the word "pays" to always be present :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return String.join("/", Arrays.copyOfRange(split, 3, split.length));

}

/**
* Sets NIO_MAX_REOPENS and generous timeouts as the global default.
* Sets max_reopens, requester_pays, and generous timeouts as the global default.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add javadoc for the maxReopens and requesterProject args.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

.maxChannelReopens(maxReopens);
if (!Strings.isNullOrEmpty(requesterProject)) {
// enable requester pays and indicate who pays
builder = builder.autoDetectRequesterPays(true).userProject(requesterProject);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will a sensible exception be thrown if the requesterProject is invalid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not there, no. Only when trying to access the bucket.

@@ -374,11 +370,15 @@ public static void setGlobalNIODefaultOptions(int maxReopens) {
}

/** The config we want to use. **/
public static CloudStorageConfiguration getCloudStorageConfiguration(int maxReopens) {
return CloudStorageConfiguration.builder()
public static CloudStorageConfiguration getCloudStorageConfiguration(int maxReopens, String requesterProject) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadoc for the two method args

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -17,7 +17,7 @@
public final class BucketUtilsTest extends GATKBaseTest {

static {
BucketUtils.setGlobalNIODefaultOptions();
BucketUtils.setGlobalNIODefaultOptions(20,"");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you get the defaults from the GATKConfig instead of hardcoding them here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

@@ -92,4 +92,6 @@ private StandardArgumentDefinitions(){}
public static final String USE_JDK_INFLATER_SHORT_NAME = "jdk-inflater";
public static final String NIO_MAX_REOPENS_LONG_NAME = "gcs-max-retries";
public static final String NIO_MAX_REOPENS_SHORT_NAME = "gcs-retries";
public static final String NIO_PROJECT_FOR_REQUESTER_PAYS_LONG_NAME = "project-for-requester-pays";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think gcs-project-for-requester-pays would be better, and more consistent with the existing retry args.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done.

@@ -17,7 +17,7 @@
public final class BucketUtilsTest extends GATKBaseTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some sensible way to unit test the requester-pays support (perhaps using a mock object)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well what we have here is parsing the command-line arguments, and setting the cloud storage configuration. The former I pretty much already trust. The latter I added a little test for.

Copy link
Collaborator

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looks good, thanks @jean-philippe-martin !

@droazen droazen merged commit e6f1c6b into master Sep 4, 2018
@droazen droazen deleted the jp_requester_pays branch September 4, 2018 18:05
@jean-philippe-martin
Copy link
Contributor Author

Thank you @droazen for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants