update troubleshooting items and how to patch along with the new 2.8 release #354

Merged
merged 1 commit into from
Jun 27, 2023
66 changes: 64 additions & 2 deletions docs/patching_subscription_image.md
@@ -1,6 +1,68 @@
# Patching ACM hub and managed clusters with a different subscription container image

## Patching hub cluster
## Patching hub cluster and managed clusters together (ACM >= 2.8)

To patch the subscription image with the image below, follow these steps:

`quay.io/xiangjingli/multicloud-operators-subscription@sha256:51f12144c277e33b34c18295468a7f375a2261eafc124b1f427253d3924c4867`

- On the hub, get the namespace and name of the MCH resource
```
% oc get mch -A
NAMESPACE NAME STATUS AGE
open-cluster-management multiclusterhub Running 16h
```

- Create a ConfigMap to reference the images provided in the hotfix
```
$ oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: redhat-acm-hotfix-mintls12
  namespace: open-cluster-management # this is the MCH namespace
  labels:
    operator.multicluster.openshift.io/hotfix: redhat-acm-hotfix-mintls12
data:
  manifest.json: |-
    [
      {
        "image-remote": "quay.io/xiangjingli",
        "image-key": "multicluster_operators_subscription",
        "image-name": "multicloud-operators-subscription",
        "image-digest": "sha256:51f12144c277e33b34c18295468a7f375a2261eafc124b1f427253d3924c4867"
      }
    ]
EOF
```

- Activate the hotfix by annotating the MCH resource so that it overrides the images listed in the ConfigMap
```
$ oc -n open-cluster-management annotate mch multiclusterhub --overwrite mch-imageOverridesCM=redhat-acm-hotfix-mintls12
```

- The following hub subscription pods are expected to restart and run with the new hotfix image
```
% oc get pods -n open-cluster-management |grep subscription
multicluster-operators-hub-subscription-5cfdf4bb84-xcc9z 1/1 Running 0 50m
multicluster-operators-standalone-subscription-5467dcdbcc-2w8l2 1/1 Running 0 50m
multicluster-operators-subscription-report-57b776ccf9-ktvph 1/1 Running 0 50m
```

- (Optional) Restart the RHACM operator on the hub.
If the hub subscription pods have not restarted after a while, restart the RHACM operator pods so that the operator picks up the hotfix configuration.
```
$ oc -n open-cluster-management scale deployment multiclusterhub-operator --replicas=0
$ oc -n open-cluster-management scale deployment multiclusterhub-operator --replicas=1
```

- On each managed cluster, make sure the following application-manager pod has restarted and is running with the new hotfix image. This may take a while.
```
% oc get pods -n open-cluster-management-agent-addon |grep application-manager
application-manager-bd4f7c5db-zvsvx 1/1 Running 0 50m
```
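The `manifest.json` entry in the ConfigMap step above is the image reference split into its parts. As a minimal sketch, the hypothetical helper below (`image_ref_to_manifest_entry` is an illustrative name, not part of ACM) derives `image-remote`, `image-name`, and `image-digest` from a `<remote>/<name>@<digest>` reference. The `image-key` cannot be derived from the reference, so it is passed in explicitly.

```shell
# Hypothetical helper, not part of ACM: print one manifest.json entry for a
# digest-pinned image reference of the form <remote>/<name>@<digest>.
# $1: image reference, $2: image-key expected by the MCH operator.
image_ref_to_manifest_entry() {
  local ref key repo digest remote name
  ref="$1"
  key="$2"
  case "$ref" in
    */*@*) ;;  # needs at least one '/' and an '@<digest>' suffix
    *) echo "expected <remote>/<name>@<digest>, got: $ref" >&2; return 1 ;;
  esac
  repo="${ref%@*}"      # everything before '@'
  digest="${ref##*@}"   # everything after '@'
  remote="${repo%/*}"   # registry and organization
  name="${repo##*/}"    # image name
  printf '{\n  "image-remote": "%s",\n  "image-key": "%s",\n  "image-name": "%s",\n  "image-digest": "%s"\n}\n' \
    "$remote" "$key" "$name" "$digest"
}
```

Called with the image reference shown at the top of this section and the key `multicluster_operators_subscription`, it prints the same four fields used in the ConfigMap.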

## Patching hub cluster (ACM <= 2.4)

In the `open-cluster-management` namespace on the ACM hub cluster, edit the advanced-cluster-management.v2.3.0 CSV (or the 2.3.2 CSV).

@@ -10,7 +72,7 @@ oc edit csv advanced-cluster-management.v2.3.0 -n open-cluster-management

Look for containers **multicluster-operators-standalone-subscription** and **multicluster-operators-hub-subscription** and update their images to `quay.io/open-cluster-management/multicluster-operators-subscription:TAG` (it is recommended you note the current **SHA** tag if you want to revert the change). Replace `TAG` with the actual image tag (use `latest` to get the latest upstream version). This will recreate `multicluster-operators-standalone-subscription-xxxxxxx` and `multicluster-operators-hub-subscription-xxxxxxx` pods in `open-cluster-management` namespace. Check that the new pods are running with the new container image.
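Before editing the CSV, it can help to record the images currently in use so the change can be reverted. The following is a minimal sketch, not an official procedure: `list_subscription_images` is a hypothetical helper name, and the JSONPath assumes the usual CSV layout in which containers sit under `.spec.install.spec.deployments[*].spec.template.spec.containers[*]`.

```shell
# Hypothetical helper: print "<container name><TAB><image>" for the two
# subscription containers named above, read from the given CSV.
# $1: CSV name, $2: namespace (defaults to open-cluster-management).
list_subscription_images() {
  local csv ns
  csv="$1"
  ns="${2:-open-cluster-management}"
  # Assumption: containers live at this path in the CSV.
  oc get csv "$csv" -n "$ns" \
    -o jsonpath='{range .spec.install.spec.deployments[*].spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}' \
    | grep -E '^multicluster-operators-(standalone|hub)-subscription[[:space:]]'
}
```

For example, `list_subscription_images advanced-cluster-management.v2.3.0 > images-before-patch.txt` saves the current values to a local file (the file name is arbitrary).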

## Patching managed clusters
## Patching managed clusters (ACM <= 2.4)

If you are patching `local-cluster` managed cluster, which is the ACM hub cluster itself, run this command.

111 changes: 79 additions & 32 deletions docs/troubleshooting_guidence.md
@@ -50,50 +50,47 @@ I0207 22:59:38.422603 1 mcmhub_controller.go:726] subscription-hub-reconci
I0207 22:59:38.422618 1 mcmhub_controller.go:518] subscription-hub-reconciler/secondsub/second-level-sub "msg"="exit Hub Reconciling
...
```
### Set up log level for the hub subscription pod

- Open the ACM csv, append the log level to 1, save the csv
### Set up log level for the hub subscription pod (ACM >=2.7)
- On the hub, pause the MCH operator
```
% oc annotate mch -n open-cluster-management multiclusterhub mch-pause=true --overwrite=true
```

- Open the hub subscription pod, set the log level to 1, save the pod
```
% oc edit csv -n open-cluster-management advanced-cluster-management.v2.5.0
% oc edit pods -n open-cluster-management multicluster-operators-hub-subscription-5cfdf4bb84-xcc9z

- name: multicluster-operators-hub-subscription
containers:
- command:
- /usr/local/bin/multicluster-operators-subscription
- --sync-interval=60
- --v=1
containers:
- command:
- /usr/local/bin/multicluster-operators-subscription
- --sync-interval=60
- --v=1
```

- Make sure the hub subscription pod is restarted to run.
- Make sure the hub subscription pod is restarted and running.
- Check more details from the hub subscription pod log
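With the log level raised, the hub subscription pod log becomes noisy. As a minimal sketch, the hypothetical filter below (`filter_subscription_logs` is an illustrative name) keeps only the reconciler lines for one subscription. It assumes the `subscription-hub-reconciler/<namespace>/<name> ` prefix seen in the sample log lines earlier in this guide, including the space that follows the name.

```shell
# Hypothetical helper: read log lines on stdin and keep those emitted by the
# hub reconciler for one subscription.
# $1: subscription as "<namespace>/<name>", e.g. secondsub/second-level-sub.
filter_subscription_logs() {
  # -F treats the pattern as a fixed string; the trailing space avoids
  # matching other subscriptions whose names merely start with the same text.
  grep -F "subscription-hub-reconciler/$1 "
}
```

For example, `oc logs -n open-cluster-management multicluster-operators-hub-subscription-5cfdf4bb84-xcc9z | filter_subscription_logs secondsub/second-level-sub`, substituting the actual pod name on your hub.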

### Set up memory limit for the hub subscription pod

- Open the ACM csv, search the `multicluster-operators-hub-subscription` container, update the memory limit, save the csv

### Set up memory limit for the hub subscription pod (ACM >=2.7)
- On the hub, pause the MCH operator
```
% oc annotate mch -n open-cluster-management multiclusterhub mch-pause=true --overwrite=true
```
% oc edit csv -n open-cluster-management advanced-cluster-management.v2.5.0

- name: multicluster-operators-hub-subscription
spec:
replicas: 1
selector:
matchLabels:
app: multicluster-operators-hub-subscription
......
- Open the hub subscription pod, update the memory limit, save the pod
```
% oc edit pods -n open-cluster-management multicluster-operators-hub-subscription-5cfdf4bb84-xcc9z

resources:
limits:
cpu: 750m
memory: 2Gi ================> this is the hub subscription pod memory limit, update it to 4Gi for example.
requests:
cpu: 150m
memory: 128Mi
resources:
limits:
cpu: 750m
memory: 2Gi ================> this is the hub subscription pod memory limit, update it to 4Gi for example.
requests:
cpu: 150m
memory: 128Mi

```
- verify the hub subscription pod should be restarted with the new memory limit. It could take a while for OLM to be reconciled to do so.

- Verify that the hub subscription pod has restarted and is running with the new memory limit.
```
% oc get pods -n open-cluster-management |grep hub-sub
multicluster-operators-hub-subscription-58858c488f-c52zt 1/1 Running 2 (28h ago) 27d
@@ -229,7 +226,7 @@ search for Deployment. Set spec.replicas to 0:
% oc get pods -n open-cluster-management-agent-addon |grep klusterlet-addon-appmgr
klusterlet-addon-appmgr-794d76bcbf-tbsn5 1/1 Running 0 14s
```
### Set up memory limit for the managed subscription pod (ACM >= 2.5)
### Set up memory limit for the managed subscription pod (ACM in 2.5 and 2.6)

- On the hub cluster, pause mch reconcile
```
@@ -254,6 +251,56 @@ search for Deployment. Set spec.replicas to 0
% oc get pods -n open-cluster-management-agent-addon |grep application-manager
```

### Set up memory limit for the managed subscription pod (ACM >= 2.7)
- Enable addondeploymentconfigs to be used in the application-manager addon on all managed clusters
```
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ClusterManagementAddOn
metadata:
  name: application-manager
spec:
  addOnMeta:
    description: Processes events and other requests to managed resources.
    displayName: Application Manager
  supportedConfigs:
  - group: addon.open-cluster-management.io
    resource: addondeploymentconfigs
```

- Specify the memory request and memory limit in an AddOnDeploymentConfig created in the managed cluster namespace, e.g. `cluster1`
```
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: AddOnDeploymentConfig
metadata:
  name: deploy-config
  namespace: cluster1
spec:
  customizedVariables:
  - name: RequestMemory
    value: 512Mi
  - name: LimitsMemory
    value: 4Gi
```

- Link the AddOnDeploymentConfig CR to the application-manager ManagedClusterAddOn in the same managed cluster namespace, e.g. `cluster1`
```
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ManagedClusterAddOn
metadata:
  name: application-manager
  namespace: cluster1
spec:
  installNamespace: open-cluster-management-agent-addon
  configs:
  - group: addon.open-cluster-management.io
    resource: addondeploymentconfigs
    namespace: cluster1
    name: deploy-config
```

As a result, the new memory limit and memory request are applied to the application-manager pod on `cluster1`.
Because the AddOnDeploymentConfig is created per managed cluster namespace, each managed cluster can use different memory limits for its application-manager pod.
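Since the AddOnDeploymentConfig differs only in its namespace and memory values from one managed cluster to the next, it can be generated rather than hand-edited. The following is a minimal sketch: `render_addon_deployment_config` is a hypothetical helper name, and the resource name `deploy-config` is simply reused from the example above.

```shell
# Hypothetical helper: print an AddOnDeploymentConfig manifest for one
# managed cluster namespace.
# $1: managed cluster namespace, $2: memory request, $3: memory limit.
render_addon_deployment_config() {
  if [ "$#" -ne 3 ]; then
    echo "usage: render_addon_deployment_config <cluster-ns> <request> <limit>" >&2
    return 1
  fi
  cat <<EOF
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: AddOnDeploymentConfig
metadata:
  name: deploy-config
  namespace: $1
spec:
  customizedVariables:
  - name: RequestMemory
    value: $2
  - name: LimitsMemory
    value: $3
EOF
}
```

For example, `render_addon_deployment_config cluster1 512Mi 4Gi | oc apply -f -` would apply the same manifest as shown above. The ManagedClusterAddOn in that namespace still has to reference the config as described in the last step.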

### Set up new image for the managed subscription pod (ACM >= 2.5)

Since ACM 2.5, there is no klusterlet-addon-operator any more. The app addon pod (application-manager) running on the managed cluster is deployed by the hub subscription pod.