Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TSPS-306 getResult returns signed urls for all outputs #120

Merged
merged 12 commits into from
Sep 10, 2024

Conversation

mmorgantaylor
Copy link
Collaborator

@mmorgantaylor mmorgantaylor commented Sep 6, 2024

Description

We now return signed GET (read-only) urls for pipeline run outputs, when the user calls get pipeline run result.

Added logic to redact the signature string from the signed url when we log its creation (for both PUT and GET signed urls).

To do this, we had to add two steps to the GCP flight:

  • FetchOutputsFromDataTableStep (brand new step, GCP correlary of the Azure FetchOutputsFromWdsStep)
  • CompletePipelineRunStep (reused existing step)

Jira Ticket

https://broadworkbench.atlassian.net/browse/TSPS-306

@mmorgantaylor mmorgantaylor marked this pull request as ready for review September 6, 2024 19:52
Copy link
Collaborator

@jsotobroad jsotobroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some questions/comments

}
return blobHttpUrl.substring(
blobHttpUrl.indexOf(workspaceId.toString()) + workspaceId.toString().length() + 1);
return blobHttpUrl.substring(blobHttpUrl.indexOf(delimiter) + delimiter.length() + 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think delimiter here is a little misleading cuz we're only using the first instance of it to substring the original string as opposed to every instance of it. maybe it can be called substringStart or blobNameStart? i dont love any of these :(

* @return blobName
*/
public static String getBlobNameFromTerraWorkspaceStorageHttpUrl(
String blobHttpUrl, UUID workspaceId) {
if (!blobHttpUrl.contains(workspaceId.toString())) {
String blobHttpUrl, String delimiter) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably rename blobHttpUrl to blobUrl

@@ -70,7 +70,48 @@ public URL generatePutObjectSignedUrl(String projectId, String bucketName, Strin
Storage.SignUrlOption.withExtHeaders(extensionHeaders),
Storage.SignUrlOption.withV4Signature()));

logger.info("Generated PUT signed URL: {}", url);
// remove the signature from the URL before logging
String cleanUrl = url.toString().split("X-Goog-Signature=")[0] + "X-Goog-Signature=REDACTED";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to add the rest of elements from the split() to get the full string or is X-Goog-Signature always guaranteed to be the last element in the url? If not you probably have to split the url on & and then ignore the element that has X-Goog-Signature in it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah I was just going by the signed url that was returned by the service, so it does currently seem to be consistently the last element, but there's no guarantee that it'll always be that way. i can make this resilient to different orders of elements.

@@ -408,6 +409,17 @@ public ApiPipelineRunOutputs formatPipelineRunOutputs(PipelineRun pipelineRun) {
pipelineRunOutputsAsMap(
pipelineOutputsRepository.findPipelineOutputsByJobId(pipelineRun.getId()).getOutputs());

// currently all outputs are paths that will need a SAS token
String workspaceStorageContainerName = pipelineRun.getWorkspaceStorageContainerName();
outputsMap.replaceAll(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is neat

@@ -408,6 +409,17 @@ public ApiPipelineRunOutputs formatPipelineRunOutputs(PipelineRun pipelineRun) {
pipelineRunOutputsAsMap(
pipelineOutputsRepository.findPipelineOutputsByJobId(pipelineRun.getId()).getOutputs());

// currently all outputs are paths that will need a SAS token
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SAS token -> signed url

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!


addStep(
new FetchOutputsFromDataTableStep(
flightBeanBag.getRawlsService(), flightBeanBag.getSamService()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh hmm do we want a different retry rule for external services. Ive jsut been using thes ame dbRetryRule for all these steps and maybe thats incorrect?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah thanks, i noticed this too but then forgot about it. added an "externalServiceRetryRule" (copied from WSM)

Copy link
Collaborator

@jsotobroad jsotobroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome this looks great, just that one comment that we discussed on slack about sanity checks since these are the things we'd be returnign back to the user

* @param signedUrl
* @return
*/
public static String cleanSignedUrl(URL signedUrl) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice this looks great

* Use for a short exponential backoff retry, for operations that should be completable within a
* few seconds.
*/
private final RetryRule externalServiceRetryRule =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!

for (PipelineOutputDefinition outputDefinition : outputDefinitions) {
String keyName = outputDefinition.getName();
String wdlVariableName = outputDefinition.getWdlVariableName();
outputs.put(keyName, entity.getAttributes().get(wdlVariableName).toString());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we chatted on slack, it may be worth validating the outputs exist and are in fact not empty

Copy link
Collaborator

@jsotobroad jsotobroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome this looks great. Had a random thought about possible refactoring in the future but thats way into the future

* @param entity
* @return a map of pipeline outputs
*/
public Map<String, String> extractPipelineOutputsFromEntity(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a random thought i had about the future. I wonder if it would make sense to put these input/output focused function into its own service if this one starts getting too bloated

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it felt a little odd to put this into PipelineRuns service. I'd gladly move some of these methods into their own service/utility in future

Copy link

sonarcloud bot commented Sep 10, 2024

@mmorgantaylor mmorgantaylor merged commit d1285a6 into main Sep 10, 2024
12 checks passed
@mmorgantaylor mmorgantaylor deleted the TSPS-306_mma_signed_get_urls branch September 10, 2024 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants