-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use PATCH without optimistick locking to remove finalizer this way will avoid conflicts #1233
Comments
Hello. |
Hi @popdaniel942 , yes this is related to that. Will investigate that, it should not happen in general (or just rarely when the resource is updated while the reconciliation running), so this issue will mitigate that or rather eliminate that problem in all cases. But will investigate why is that happening anyway. |
@popdaniel942 there is integration test on our side, that checks this, so it does not happen at least in normal flows: If you could tell more in what situation this heppens to you would be very helpful. Isn't you code open source? thx |
@csviri unfortunately the code isn't open source...I tried the simplest way possible to use the operator and I still get problem. I used version 3.0.0. The crds are generated from Java code. The reconciliation works, but the cleanup fails with the above error and then retries and it works. private static final Logger LOGGER = LoggerFactory.getLogger(ResourceController.class); @OverRide
} @OverRide
} @OverRide
} Resource:
} |
apparently with 3.0.1 this example works |
@popdaniel942 thank you! we were able to reproduce this on an other project too, I'm on it! :) |
In our full code it still doesn't work with 3.0.1. But we don't change anything in the resource In the cleanup. All we do is call some rest methods. And when DeleteControl.defaultDelete() is called it fails. And after the retry it works. |
Yes, this happens because the framework tried to remove the finalizer during a default delete. And the resource it sees in cache might have a different version than on the server. This is strange because once a resource is marked for deletion it cannot be changed, only finalizer removed. So if there are no more finalizers used, it's weird. But also since the cleanup is triggered is means that the resource (in memory) is already marked for deletion. Looking into it.. |
@popdaniel942 could you pls turn on debug level logs, with MDC values for resource name+namespace+version, and thread id present, something like this:
the styles are not important :D and send us the log files pls? thx! Also isn't some controller on the cluster that adds an additional finalizer? |
@popdaniel942 are you using |
Right now we are using UpdateControl.updateStatus but we got the error with both. What is the difference between them? Also I see that when the cleanup is made there is a foregroungDeletion finalizer being added and that the resourceVersion gets changed. Also in some cases, if we run the code in debug mode with breakpoints on the cleanup method we get the error, but if we run normally without debug mode, we don't get the error |
It seems to be that I found the problem. It is actually not a bug, but this issue (using patch for finalizer removal) will fix it, see details here: |
its described here what can go wrong with the patch and why: #1245 (comment) In case of
In this case it probably happened also occasionally before, since the finalizer removal is using optimistic locking (OL) ATM (this issues will change that)
This is probably caused because if you debug (it takes some time) to other finalizer is removed, or the resource is changed in other way. |
If interested more in depth an see why is this possible pls check this: |
Ok so the issue should be solved by this PR: Feel free to try ti out already with a local build from that branch. thx! At the end it won't be patch, but the update will be retried, see the reasons in the related issue: Hopefully the PR will be merged soon, and a new patch version released. |
Fix for the mentioned issue, is not part of mentioned issues and released under v3.0.2, will close this issue. |
For remove this makes sense in all cases.
(For add this could cause some issues, its hard to avoid "double adding" without optimistic locking" atm)
The text was updated successfully, but these errors were encountered: