MGMT-8571: Deploy the image service as a stateful set #3067
/hold committed some stuff I shouldn't have |
/test ? |
Rebased out what I shouldn't have committed, but leaving the hold as the upgrade doesn't work. |
Unsure if we want to wait to merge this before sorting out #3067 (comment), so I'll leave the hold for now. |
Tested this case out and it doesn't look like it's going to work unless we want to delete and re-create the entire statefulset when the storage information is updated. Specifically I got this error when I changed the CR to include storage information after an upgrade:
@mhrivnak any advice here? Is there some PVC spec that would be safe for us to default to? I could base the size on the number of |
The statefulset has to be deleted and recreated to change the storage details. That's something the operator can automate. Then we can just expose settings like size on our CR. Maybe we can extrapolate from our previous recommendations what might be a sensible default or recommended size? Is it important to migrate data in this case? Or can the new volume start empty? |
Okay, so it makes sense to recreate the stateful set 👍
If we're going to rebuild the whole stateful set then I don't think we need to change how we ask users to specify the storage details. We could validate that the size is over some reasonable minimum, but I think that's probably a different conversation.
No, the data itself will be recreated. |
@carbonin could you elaborate (maybe in the PR description) on the choice of using `StatefulSet` vs `Deployment`?
Updated with a bit more detail @flaper87. Let me know if something still isn't clear. |
Marking this as WIP. With this I was able to deploy the statefulset and update the agentserviceconfig CR to add storage and have the statefulset recreate successfully. This should mean that we can keep the storage value in the CR optional which will unstick upgrade. Now I need to test a full upgrade from an older (master) version of the operator to the one in this branch which will move us from a deployment to a statefulset. |
/override ci/prow/e2e-metal-assisted-operator-ztp This is broken everywhere |
@carbonin: Overrode contexts on behalf of carbonin: ci/prow/e2e-metal-assisted-operator-ztp |
/retest Errors don't look related to this PR. |
/test e2e-ai-operator-ztp-ipv4v6-3masters-ocp-49 |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: carbonin, filanov, mhrivnak |
* MGMT-8571: add optional image storage to agent service config
* MGMT-8571: add image storage to operator deploy script and docs
* MGMT-8571: Refactor monitoring into a separate method
  This also stops trying to monitor the image service as a deployment and reduces the number of `Get` calls by only fetching each deployment once per loop rather than re-fetching it for each condition we want to check.
* MGMT-8571: Deploy the image service as a stateful set
  Previously, when deploying the image service as a deployment, we were writing the template images to the pod filesystem. This isn't good for performance or the health of the node we're running on.
  Because of this the image service now will run as a stateful set and request a PV for the data directory if information for a new PVC has been provided in the agentserviceconfig. If no PVC information was provided, an emptydir volume is used.
  Changing the storage template of a stateful set requires that we delete and recreate the entire statefulset. This means that, in this case, we can't use controllerutil.CreateOrUpdate.
  This commit creates a separate function to reconcile just this statefulset which has a section very similar to create or update but with some additional logic around when we can do a normal update and when we need to recreate the statefulset.
  This also adds some logic to remove the old image service deployment. This is required for upgrade. Note the specific error handling which ensures that we only delete the deployment if we successfully reconciled (created, most likely) the statefulset to replace it.
  A finalizer is added to the statefulset to ensure we remove the PVC, as we need to support resizing the volume, which will mean claiming a new PV.
  A finalizer is also added to the agentserviceconfig object to ensure we clean up the image-service PVCs in the case that the agentserviceconfig is deleted directly.
  https://issues.redhat.com/browse/MGMT-8571
With openshift/assisted-service#3067 we have introduced an `imageStorage` configuration option that needs to be set in the AgentServiceConfig for the Infrastructure Operator. This PR adds the configuration, so that the `make assisted` target can deploy the operator seamlessly.
Description
Previously, when deploying the image service as a deployment, we were
writing the template images to the pod filesystem. This isn't good for
performance or the health of the node we're running on.
Because of this the image service now will run as a stateful set and
request a PV for the data directory if information for a new PVC has
been provided in the agentserviceconfig. If no PVC information was
provided, an emptydir volume is used.
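A minimal sketch of that fallback, using plain Go structs rather than the real Kubernetes types. `StorageSpec`, `Volume`, and `dataVolume` are hypothetical names chosen for illustration and are not taken from the controller code.

```go
package main

import "fmt"

// StorageSpec is a hypothetical stand-in for the PVC details on the CR.
type StorageSpec struct {
	Size string // requested capacity, e.g. "10Gi"
}

// Volume is a simplified description of what backs the data directory.
type Volume struct {
	Kind string // "pvc-template" or "emptyDir"
	Size string // only set for PVC-backed volumes
}

// dataVolume returns a PVC-backed volume when storage details were
// provided and falls back to an emptyDir volume otherwise.
func dataVolume(storage *StorageSpec) Volume {
	if storage == nil {
		return Volume{Kind: "emptyDir"}
	}
	return Volume{Kind: "pvc-template", Size: storage.Size}
}

func main() {
	fmt.Println(dataVolume(nil).Kind)
	fmt.Println(dataVolume(&StorageSpec{Size: "10Gi"}).Kind)
}
```

Keeping the storage field optional in this way is what allows an existing agentserviceconfig without storage details to keep working after an upgrade.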
Changing the storage template of a stateful set requires that we delete
and recreate the entire statefulset. This means that, in this case, we
can't use controllerutil.CreateOrUpdate.
This commit creates a separate function to reconcile just this
statefulset which has a section very similar to create or update but with
some additional logic around when we can do a normal update and when we
need to recreate the statefulset.
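The decision between a normal update and a recreate could be sketched roughly as follows. This is an illustration under simplifying assumptions, not the function added in this PR: `StatefulSet` here is a two-field stand-in for the real object, and `decide` and the `Action` values are hypothetical names.

```go
package main

import "fmt"

// StatefulSet is a simplified stand-in for the Kubernetes object; only
// the fields relevant to the decision are modelled.
type StatefulSet struct {
	Image       string // part of the pod template, can be updated in place
	StorageSize string // part of the volume claim template, cannot be updated
}

// Action names what the reconciler should do next.
type Action string

const (
	Create    Action = "create"
	Update    Action = "update"
	Recreate  Action = "recreate"
	Unchanged Action = "unchanged"
)

// decide compares the existing object (nil if absent) with the desired one.
func decide(existing *StatefulSet, desired StatefulSet) Action {
	switch {
	case existing == nil:
		return Create
	case existing.StorageSize != desired.StorageSize:
		// The storage template changed: an in-place update would be
		// rejected, so the object has to be deleted and created again.
		return Recreate
	case *existing != desired:
		return Update
	default:
		return Unchanged
	}
}

func main() {
	current := &StatefulSet{Image: "image-service:v1", StorageSize: "10Gi"}
	resized := StatefulSet{Image: "image-service:v1", StorageSize: "20Gi"}
	fmt.Println(decide(current, resized))
}
```

The first and third branches mirror what a create-or-update helper already does; only the storage comparison is extra.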
This also adds some logic to remove the old image service deployment.
This is required for upgrade. Note the specific error handling which
ensures that we only delete the deployment if we successfully
reconciled (created, most likely) the statefulset to replace it.
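The ordering matters more than the individual calls, so here is a small sketch of it against an in-memory stand-in for the cluster. The `cluster` type and its methods are hypothetical and exist only to make the example runnable.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a hypothetical in-memory stand-in for the API server.
type cluster struct {
	hasDeployment  bool
	hasStatefulSet bool
	failCreate     bool // simulates a failed statefulset reconcile
}

func (c *cluster) reconcileStatefulSet() error {
	if c.failCreate {
		return errors.New("statefulset reconcile failed")
	}
	c.hasStatefulSet = true
	return nil
}

func (c *cluster) deleteOldDeployment() { c.hasDeployment = false }

// upgrade removes the old deployment only after the statefulset that
// replaces it has been reconciled without error.
func upgrade(c *cluster) error {
	if err := c.reconcileStatefulSet(); err != nil {
		return err // keep the deployment and retry on the next loop
	}
	c.deleteOldDeployment()
	return nil
}

func main() {
	c := &cluster{hasDeployment: true, failCreate: true}
	fmt.Println(upgrade(c), c.hasDeployment)
}
```

Returning early on error means a failed reconcile never leaves the cluster with neither workload running.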
A finalizer is added to the statefulset to ensure we remove the PVC, as
we need to support resizing the volume, which will mean claiming a new
PV.
A finalizer is also added to the agentserviceconfig object to ensure we
clean up the image-service PVCs in the case that the agentserviceconfig
is deleted directly.
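For readers less familiar with finalizers, the general pattern looks something like the sketch below. It models object metadata with a plain struct; the finalizer name, the `object` type, and the `deletePVCs` callback are assumptions made for the example and do not reflect the names used in the controller.

```go
package main

import (
	"errors"
	"fmt"
)

// pvcFinalizer is a hypothetical finalizer name.
const pvcFinalizer = "example.io/image-service-pvc-cleanup"

// errCleanup simulates a PVC deletion that did not succeed.
var errCleanup = errors.New("pvc cleanup failed")

// object is a simplified stand-in for a Kubernetes object's metadata.
type object struct {
	finalizers []string
	deleting   bool // true once deletion has been requested
}

func (o *object) has(name string) bool {
	for _, f := range o.finalizers {
		if f == name {
			return true
		}
	}
	return false
}

func (o *object) remove(name string) {
	kept := make([]string, 0, len(o.finalizers))
	for _, f := range o.finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	o.finalizers = kept
}

// reconcile adds the finalizer while the object is live and, once
// deletion has been requested, runs cleanup before releasing it.
func reconcile(o *object, deletePVCs func() error) error {
	if !o.deleting {
		if !o.has(pvcFinalizer) {
			o.finalizers = append(o.finalizers, pvcFinalizer)
		}
		return nil
	}
	if o.has(pvcFinalizer) {
		if err := deletePVCs(); err != nil {
			return err // finalizer stays, so the object is not released yet
		}
		o.remove(pvcFinalizer)
	}
	return nil
}

func main() {
	o := &object{}
	_ = reconcile(o, func() error { return nil })
	fmt.Println(o.finalizers)
}
```

Because the finalizer is only removed after cleanup succeeds, a failed PVC deletion is retried on the next reconcile instead of leaking the volume claim.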
List all the issues related to this PR
https://issues.redhat.com/browse/MGMT-8571
What environments does this code impact?
How was this code tested?
Deployed the operator locally to a crc cluster using `operator-sdk run bundle`.
Saw the image service created as a stateful set with the PV attached properly.
Created an infra-env and downloaded the image.
Assignees
/cc @avishayt
/cc @pawanpinjarkar