Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update conditions on error for delete operation #175

Merged
merged 4 commits into from
Aug 31, 2021

Conversation

surajkota
Copy link
Member

@surajkota surajkota commented Aug 31, 2021

Description of changes:

  • Update conditions on delete in case of error. Right now the errors from delete are not reflected to the user
    • I have added a check to treat all errors as recoverable if the resource is being deleted as I am not sure what is the user experience if the resource hits a terminal condition after its deleted. Please advise

Testing:
Manually introduced an error from sdkDelete and verified it gets updated in conditions

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@@ -212,7 +212,7 @@ func (rm *resourceManager) updateConditions (
}
}

if rm.terminalAWSError(err) || err == ackerr.SecretTypeNotSupported || err == ackerr.SecretNotFound {
if !r.IsBeingDeleted() && (rm.terminalAWSError(err) || err == ackerr.SecretTypeNotSupported || err == ackerr.SecretNotFound) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is fine. But, would it be safer to just let the Terminal Condition be set on the resource even if it's being deleted?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's safe to set Terminal Condition here because if deletion encounters an error, then wouldn't the controller stop reconciling without deleting the finalizer?

Copy link
Member Author

@surajkota surajkota Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left this as-is for now, until we have another strategy. If a customer hits this(should be a corner case), the worse case is controller retires with back off

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's safe to set Terminal Condition here because if deletion encounters an error, then wouldn't the controller stop reconciling without deleting the finalizer?

Yes, and that is what the controller should do I think. If the delete operation failed on the backend AWS service API call, then the resource will not be deleted from the AWS service, so we should keep the resource around on the k8s side, but in show a terminal condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and that is what the controller should do I think. If the delete operation failed on the backend AWS service API call, then the resource will not be deleted from the AWS service, so we should keep the resource around on the k8s side, but in show a terminal condition.

Does this mean the only way to delete that resource (after resolving any errors) is for users to manually remove the finalizer via kubectl edit ? That seems weird cx or am I missing some details

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I tried the following,

  1. simulated terminal condition true on delete after which the controller stopped reconciling
  2. made a change in spec, got a warning
Warning: Detected changes to resource xgboost-model-7fmglbwd7t189edvny which is currently being deleted.
model.sagemaker.services.k8s.aws/xgboost-model-7fmglbwd7t189edvny configured

but the controller reconciled and deleted the resource from k8s and aws successfully

Since terminal codes should only be related to those which require change in spec. I think its fine to remove this.

@jaypipes I have updated the PR, please take a look

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and that is what the controller should do I think. If the delete operation failed on the backend AWS service API call, then the resource will not be deleted from the AWS service, so we should keep the resource around on the k8s side, but in show a terminal condition.

Does this mean the only way to delete that resource (after resolving any errors) is for users to manually remove the finalizer via kubectl edit ? That seems weird cx or am I missing some details

The correct user experience for manually removing a resource that is in a stuck state: using kubectl delete --force

templates/pkg/resource/sdk.go.tpl Outdated Show resolved Hide resolved
templates/pkg/resource/sdk.go.tpl Outdated Show resolved Hide resolved
@surajkota
Copy link
Member Author

@jaypipes @brycahta thanks for review

I have moved the unrelated change to separate PR, please take a look

Copy link
Collaborator

@jaypipes jaypipes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jaypipes
Copy link
Collaborator

/lgtm

@ack-bot ack-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 31, 2021
@ack-bot
Copy link
Collaborator

ack-bot commented Aug 31, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: A-Hilaly, jaypipes, surajkota

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ack-bot ack-bot merged commit 6f22b7b into aws-controllers-k8s:main Aug 31, 2021
ack-bot pushed a commit to aws-controllers-k8s/sagemaker-controller that referenced this pull request Sep 1, 2021
Description of changes:
- bring in changes from aws-controllers-k8s/code-generator#175, aws-controllers-k8s/code-generator#176
- corresponding unit and integ test changes
- fix some erroneous unit tests like model package group and endpoint config
- fix invalid name in unit test
- fix assertion in notebook instance test since PR build was failing and add it to missing canary cleanup

Testing:
unit test
tested manually, rely on PR build, later we can add more unit tests for delete failed scenario


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants