Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: adds specific exception handling for the download operation #208

Merged
merged 5 commits into from
Feb 19, 2024
Merged

refactor: adds specific exception handling for the download operation #208

merged 5 commits into from
Feb 19, 2024

Conversation

yurimssilva
Copy link
Contributor

@yurimssilva yurimssilva commented Jan 25, 2024

What this PR changes/adds

  • Enhanced exception handling and incorporated explicit error messages for both download and upload operations, during the transfer process.

Why it does that

  • The previous implementation lacked clarity in distinguishing errors between fetching and pushing operations, potentially leading to confusion during troubleshooting.

Linked Issue(s)

Closes #200

@yurimssilva yurimssilva self-assigned this Jan 26, 2024
@yurimssilva yurimssilva added the enhancement New feature or request label Jan 26, 2024
Copy link
Member

@ndr-brt ndr-brt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to refactor the data sink a little, fetch and upload are too coupled, my suggestion would be to reach something like this:

for (var part : parts) {
    try (var input = part.openStream()) {
        // fetch logic
    } (catch Exception e) {
        return fetchFailure();
    }

    try {
        // upload logic
    } catch (Exception e) {
        return uploadFailure(e);
    }
}

that will clearly distinguish between different kind of exception without catching and retrowing exceptions.

An additional question would be: is it correct that a multi-parts upload should be stopped at the first failure? maybe yes, but that should be eventually documented with a test that explains why

@yurimssilva yurimssilva requested a review from ndr-brt January 31, 2024 11:41
@yurimssilva
Copy link
Contributor Author

yurimssilva commented Jan 31, 2024

I'd suggest to refactor the data sink a little, fetch and upload are too coupled, my suggestion would be to reach something like this:

for (var part : parts) {
    try (var input = part.openStream()) {
        // fetch logic
    } (catch Exception e) {
        return fetchFailure();
    }

    try {
        // upload logic
    } catch (Exception e) {
        return uploadFailure(e);
    }
}

that will clearly distinguish between different kind of exception without catching and retrowing exceptions.

An additional question would be: is it correct that a multi-parts upload should be stopped at the first failure? maybe yes, but that should be eventually documented with a test that explains why

The coupling between DataSource and DataSink complicates the neat separation of download and upload logic. The input resource, required outside the download scope, must be closed in any situation. I addressed this by just isolating the upload logic to handle errors within that block differently.

Regarding stopping multi-parts upload at the first failure, I think considering a potential retry strategy might be beneficial. Should we open a new issue to discuss and address this?

Copy link
Member

@ndr-brt ndr-brt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to refactor the data sink a little, fetch and upload are too coupled, my suggestion would be to reach something like this:

for (var part : parts) {
    try (var input = part.openStream()) {
        // fetch logic
    } (catch Exception e) {
        return fetchFailure();
    }

    try {
        // upload logic
    } catch (Exception e) {
        return uploadFailure(e);
    }
}

that will clearly distinguish between different kind of exception without catching and retrowing exceptions.
An additional question would be: is it correct that a multi-parts upload should be stopped at the first failure? maybe yes, but that should be eventually documented with a test that explains why

The coupling between DataSource and DataSink complicates the neat separation of download and upload logic. The input resource, required outside the download scope, must be closed in any situation. I addressed this by just isolating the upload logic to handle errors within that block differently.

the input can be transformed in a list of UploadPartRequest or RequestBody, then it can be closed before any of the upload call is made (createMultipartUpload, uploadPart, completeMultipartUpload), so the two parts can be logically split.

maybe that's not really efficient, but, in alternative, a custom exception SourceException or so could be defined and be thrown by the openStream method, so it can caught separately from all the others in the transferParts method, that will avoid the nested try catch block.

Please note that, an exception thrown by the input.readNBytes will be considered as uploadFailure (also in your code), to avoid that, the S3Part could return a InputStream decorator that catch and throws the SourceException I mentioned also for the readNBytes method.

Regarding stopping multi-parts upload at the first failure, I think considering a potential retry strategy might be beneficial. Should we open a new issue to discuss and address this?

yes, that makes sense

@yurimssilva yurimssilva changed the title feat: adds specific error messages for datasink and datasource refactor: adds specific exception handling for the download operation Feb 6, 2024
Copy link
Member

@ndr-brt ndr-brt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some cleanup needed, good otherwise

@yurimssilva yurimssilva requested a review from ndr-brt February 8, 2024 14:50
Copy link
Member

@ndr-brt ndr-brt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a missing test, please fix dependencies and then we can merge this

@paullatzelsperger paullatzelsperger removed their request for review February 15, 2024 06:20
@yurimssilva yurimssilva requested a review from ndr-brt February 15, 2024 10:29
Copy link
Member

@ndr-brt ndr-brt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, could you please fix the dependency check?

@ndr-brt ndr-brt merged commit f636401 into eclipse-edc:main Feb 19, 2024
15 checks passed
@yurimssilva yurimssilva deleted the refactor/datasink_exception_handling branch February 19, 2024 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inaccurate Error Message in S3DataSink's transferParts Method
3 participants