Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBInstance does not retry Create for invalid spec values #1139

Closed
RedbackThomson opened this issue Jan 20, 2022 · 13 comments
Closed

DBInstance does not retry Create for invalid spec values #1139

RedbackThomson opened this issue Jan 20, 2022 · 13 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@RedbackThomson
Copy link
Contributor

Describe the bug
If a password is not specified when first creating an RDS DBInstance CR, the Create* call fails and sets the CR into a terminal condition. If you update the password field after this, the call is not re-attempted and the instance is forever stuck in a terminal state.

Steps to reproduce

  1. Create a DBInstance with an invalid masterUserPassword value
  2. Watch for the DBInstance to become terminal
  3. Update the masterUserPassword value to be valid
  4. See that the DBInstance never changes

Expected outcome
The DBInstance should be created with the new masterUserPassword if the initial Create failed.

Environment

  • Kubernetes version
  • Using EKS (yes/no), if so version?
  • AWS service targeted (S3, RDS, etc.) RDS
@RedbackThomson RedbackThomson added kind/bug Categorizes issue or PR as related to a bug. RDS labels Jan 20, 2022
@RedbackThomson
Copy link
Contributor Author

From our Slack conversation regarding this issue:

When we generate the RDS controller, we set which error responses should be classified as terminal. I can see that InvalidParameterValue is one of those. It might be worth it for us to remove that one from the list, since that blocks your use case.

@ack-bot
Copy link
Collaborator

ack-bot commented Apr 20, 2022

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle stale

@ack-bot ack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 20, 2022
@vijtrip2
Copy link
Contributor

/lifecycle frozen

@ack-bot ack-bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 21, 2022
@Samze
Copy link

Samze commented Jun 2, 2022

I also bumped into this. This is especially problematic when applying the DBInstance & Secret at the same time, it introduces a race condition. If the Secret is created first the DBInstance will succeed, if the DBInstance is created first then it will fail and never reconcile. Ideally it would periodically re-reconcile on an exponential backoff.

@jkatz jkatz added this to RDS Roadmap Jun 3, 2022
@jkatz jkatz moved this to In Progress in RDS Roadmap Jun 3, 2022
@RedbackThomson
Copy link
Contributor Author

@Samze I see that in our logic for retrieving fields from Secret, we will throw an error if the Secret is not found - https://github.com/aws-controllers-k8s/runtime/blob/main/pkg/runtime/reconciler.go#L121 . This should cause the reconciler to requeue (without any backoff) until it succeeds. Is this the scenario you've been facing? Or have you been supplying an empty or invalid value inside the Secret, which is what's being used to create the instance?

@Samze
Copy link

Samze commented Jun 3, 2022

@RedbackThomson thanks for your response. I'm pointing DBInstance to a non-existing secret. Interesting, I'm not seeing any requeue and I see DBInstance get a " terminal condition" condition, which I assume means it won't retry.

This is what I'm doing:

  1. Create DBInstance with reference to non-existing secret.
  2. Check status, it shows.
  conditions:
  - message: kubernetes secret not found
    status: "True"
    type: ACK.Terminal
  - lastTransitionTime: "2022-06-03T18:25:10Z"
    message: Resource synced successfully
    reason: resource is in terminal condition
    status: "True"
    type: ACK.ResourceSynced
  1. Apply secret
  2. Wait for changes. Nothing changes

For reference I'm using 0.0.24 of the ACK RDS Operator.

Also didn't mean to hijack this thread, happy to make another issue if needed.

@RedbackThomson
Copy link
Contributor Author

Oh yeah this should not be a terminal condition. We definitely want to retry when we see that. If you could please put this into a separate issue, this is no longer just an RDS bug - it's a bug with the general way we handle secrets (and errors). I see the problem in the code, though!

@Samze
Copy link

Samze commented Jun 3, 2022

Thanks, created #1318

@jkatz jkatz removed RDS lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. labels Jun 28, 2022
@brucegucode
Copy link
Member

issue-1318 is closed, we can close this one too.
#1318

@brucegucode
Copy link
Member

/close

@ack-bot
Copy link
Collaborator

ack-bot commented Jul 6, 2022

@brucegucode: You can't close an active issue/PR unless you authored it or you are a collaborator.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jkatz
Copy link
Contributor

jkatz commented Jul 7, 2022

/close

@ack-bot
Copy link
Collaborator

ack-bot commented Jul 7, 2022

@jkatz: You can't close an active issue/PR unless you authored it or you are a collaborator.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jkatz jkatz closed this as completed Jul 7, 2022
Repository owner moved this from In Progress to Done in RDS Roadmap Jul 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
No open projects
Status: Done
Development

No branches or pull requests

6 participants