-
Notifications
You must be signed in to change notification settings - Fork 409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-27551] Update status manually instead of relying on updatecontrol #199
Conversation
...src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java
Show resolved
Hide resolved
I want to confirm what is the long term solution. Do we want to keep the manual status update or leverage the java-operator-sdk v3 to avoid the conflicts? The manual status update has a downside that the CR status could be inconsistent the real status. I think that is why we need the |
Thanks for raising this @wangyang0918 I think the we need both features in the long run:
V3 of the operator sdk will definitely solve 1. and they might actually solve 2. eventually for us. There is a larger design change that we should do after the release to help with 2: Instead of storing non-user-facing information into the status we could use an extra configmap per FlinkDeployment. This is the supported approach by the operator framework for storing peristent states for the reconciliation logic. This would already work today so we don't need to wait for any additional feature. Then we could simply move out things like lastReconciledSpec, reconciliationStatus etc that are only important for the operator and update them on the fly. So to conclude my thoughts: Long term solution would be a combination of moving to v3 and coming up with something nicer for the on-the-fly status updates which are necessary to keep some things consistent and clean. |
@gyfora Thanks for the detailed explanation. Using the java-operator-sdk v3 for robust status updates and an extra ConfigMap per FlinkDeployment for the on-the-fly status updates makes a lot of sense to me. Given that the robust status updates could be covered re-trigger a reconciliation. We could upgrade the java-operator-sdk version later. Why do we not introduce the extra ConfigMap now for the on-the-fly status updates? |
@wangyang0918 I would not rush the ConfigMap change for the following reasons:
I think this requires a FLIP and some careful consideration/feedback. We can introduce this from v1beta1 -> v1 if we want in a backward compatible way. |
I get your point and will finish the review today. Let's keep an eye on the java-operator-sdk to see whether on-the-fly status updates could also be supported :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice PR. I have no more comments.
+1 for merging.
This PR reworks how status updates of Flink resources are updated in kubernetes. This is necessary due to operator-framework/java-operator-sdk#1198
Based on offline discussion with the JOSDK team this seems to be the safest short term solution.
The caching of the latest status is necessary to avoid race condition between spec updates and status patches that are made by us.
Tests have been updated with some init logic to enable patching them through the kubernetes client + removed the reliance of mutating status.