Resource updates during reconciliation causes status update error #1198
Comments
Which version of the SDK are you using? Does your CRD define the |
The status is part of the resource's revision. A reconcile loop reconciles one revision and should update the status of that same revision. If the spec changes in the meantime, a conflict occurs; the reconciler then loops on the new revision and should update the status successfully.
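The conflict described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `Store`, `Resource`, and `ConflictException` are hypothetical stand-ins for the API server and its optimistic locking on `resourceVersion`; they are not the SDK's or the fabric8 client's API.

```java
// Hypothetical in-memory stand-in for an API server; not the SDK or fabric8 API.
final class ConflictSketch {
    // Immutable snapshot of a resource: spec, status, and the revision it belongs to.
    record Resource(String spec, String status, long resourceVersion) {}

    static final class ConflictException extends RuntimeException {
        ConflictException(String message) { super(message); }
    }

    static final class Store {
        private Resource current = new Resource("v1", "", 1);

        synchronized Resource get() { return current; }

        // Optimistic locking: reject the write unless it is based on the latest revision.
        synchronized Resource update(Resource desired) {
            if (desired.resourceVersion() != current.resourceVersion()) {
                throw new ConflictException("stale resourceVersion " + desired.resourceVersion());
            }
            current = new Resource(desired.spec(), desired.status(), current.resourceVersion() + 1);
            return current;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Resource seenByReconciler = store.get(); // reconciliation starts on revision 1

        // A user changes the spec while the reconciliation is still running.
        Resource latest = store.get();
        store.update(new Resource("v2", latest.status(), latest.resourceVersion()));

        // The reconciler now tries to write the status based on the old revision.
        try {
            store.update(new Resource(seenByReconciler.spec(), "DEPLOYED",
                    seenByReconciler.resourceVersion()));
        } catch (ConflictException e) {
            System.out.println("status update rejected: " + e.getMessage());
        }
    }
}
```

In this model the status write is rejected because it is based on revision 1 while the store already holds revision 2, which mirrors the error reported in this issue.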
I understand the underlying logic here, but unfortunately our operator performs some quite heavy operations that can often take minutes. Some of these cannot be retried under certain circumstances and, if their outcome is not persisted to the status, require manual user intervention. In these cases we would much prefer to simply update the status on whatever the latest revision is instead of failing, because failing is much worse from the user's perspective.
You should be able to update the status during the reconcile loop. Just reuse the resource returned by the fabric8 client for subsequent updates, and use it in the UpdateControl return result.
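A minimal sketch of that suggestion, again using a hypothetical in-memory `Store` rather than the real client: each write returns a copy carrying the new `resourceVersion`, and the next write is based on that returned copy, so the reconciler does not conflict with its own earlier update.

```java
// Hypothetical in-memory model; the names below are illustrative, not a real client API.
final class ReuseReturnedSketch {
    record Resource(String status, long resourceVersion) {}

    static final class Store {
        private Resource current = new Resource("", 1);

        synchronized Resource get() { return current; }

        // Rejects writes based on a stale revision, then bumps the revision.
        synchronized Resource updateStatus(Resource desired) {
            if (desired.resourceVersion() != current.resourceVersion()) {
                throw new IllegalStateException(
                        "conflict: stale resourceVersion " + desired.resourceVersion());
            }
            current = new Resource(desired.status(), current.resourceVersion() + 1);
            return current;
        }
    }

    // Performs two status writes in one reconciliation, chaining the returned copies.
    static Resource reconcile(Store store) {
        Resource resource = store.get();
        resource = store.updateStatus(new Resource("UPGRADING", resource.resourceVersion()));
        // ... long-running work would happen here ...
        resource = store.updateStatus(new Resource("READY", resource.resourceVersion()));
        return resource; // this is the copy one would hand back to the framework
    }

    public static void main(String[] args) {
        System.out.println(reconcile(new Store()));
    }
}
```

Note that this only avoids self-inflicted conflicts. A concurrent spec change by a user would still cause the second write to be rejected in this model.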
Interesting, I will take a look, maybe for our use-case it is better to update the status manually to avoid these errors and simply return |
Please update to the latest version, 2.1.4.
Please see this part: https://javaoperatorsdk.io/docs/patterns-best-practices#managing-state Note that in v3 we now handle the optimistic locking part differently: the update is retried in the background with an up-to-date resource version, so the error you see here would normally pass. There are discussions about this, as mentioned by @scrocquesel. We also support patch in v3, which does not do optimistic locking and won't in the future. Note that neither way is incorrect, in the sense that eventual consistency is achieved in both cases; one approach may simply be more optimal than the other for a given use case. There is also a technical reason why the optimistic locking approach has some advantages, see this comment: fabric8io/kubernetes-client#3943 (comment) Regarding the long reconciliations: out of curiosity, is this something that could be avoided? Of course, if some API calls are slow, there is nothing to be done about it. But a generic rule of thumb is that the actual state is cached (See |
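The two approaches mentioned in this comment can be contrasted in a small model. This is a sketch under assumptions, not how v3 implements it: `updateStatusWithRetry` and `patchStatus` are hypothetical names, and `Store` is an in-memory stand-in for the API server.

```java
// Hypothetical in-memory model contrasting "retry on conflict" with "patch without locking".
final class RetrySketch {
    record Resource(String spec, String status, long resourceVersion) {}

    static final class ConflictException extends RuntimeException {}

    static final class Store {
        private Resource current = new Resource("v1", "", 1);

        synchronized Resource get() { return current; }

        // Update with optimistic locking: fails on a stale revision.
        synchronized Resource update(Resource desired) {
            if (desired.resourceVersion() != current.resourceVersion()) {
                throw new ConflictException();
            }
            current = new Resource(desired.spec(), desired.status(), current.resourceVersion() + 1);
            return current;
        }

        // Patch-like write: changes only the status and performs no version check.
        synchronized Resource patchStatus(String status) {
            current = new Resource(current.spec(), status, current.resourceVersion() + 1);
            return current;
        }
    }

    // On conflict, re-read the latest revision and re-apply the same status to it.
    static Resource updateStatusWithRetry(Store store, Resource base, String status, int maxAttempts) {
        Resource attempt = base;
        for (int i = 1; ; i++) {
            try {
                return store.update(new Resource(attempt.spec(), status, attempt.resourceVersion()));
            } catch (ConflictException e) {
                if (i >= maxAttempts) {
                    throw e;
                }
                attempt = store.get(); // refresh to the up-to-date revision
            }
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Resource seenByReconciler = store.get();

        // A user changes the spec during the reconciliation.
        store.update(new Resource("v2", "", seenByReconciler.resourceVersion()));

        Resource retried = updateStatusWithRetry(store, seenByReconciler, "DEPLOYED", 3);
        System.out.println("after retry: " + retried);

        Resource patched = store.patchStatus("READY");
        System.out.println("after patch: " + patched);
    }
}
```

In both cases the user's spec change survives and the status ends up written. The difference is that the retrying update notices the intervening change, while the patch-like write never checks for one.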
Attached a PR to this issue. From v3 there will also be patch (without optimistic locking in the So I would generally advise putting the state about the external resource into a |
Maybe we should add some API support for this external state storage? |
Yep, we can think about it; it probably makes a lot of sense for dependent resources.
Bug Report
What did you do?
While running the Flink Kubernetes Operator we started seeing the following errors in the log:
The managed resources never interact with each other, and they only ever update the status (never the spec) of the managed resource.
We could reproduce the issue by sending an update using kubectl apply during a longer reconciliation step.
What did you expect to see?
I would expect status updates to go through even if the user applied a spec change in the meantime, as the status has nothing to do with what the user has changed.
These errors can make the operator logic much more fragile.
Environment
Kubernetes cluster type: minikube
$ kubectl version: 1.23.3