From cc87f7498c9118bac05b98204db757be055d20b6 Mon Sep 17 00:00:00 2001 From: Christian Muirhead Date: Thu, 10 Jun 2021 12:44:30 +1200 Subject: [PATCH] Document the current process for testing new operator bundle versions (#1544) * Add a doc explaining how to test the operator bundle In the future this should be automated but I wanted to capture the steps before trying to do that. * Remove the manifests directory before regenerating bundle Otherwise if generation has been run before and after a release you can end up with two cluster service version files in the bundle which will prevent it from installing correctly. * Updated with review feedback Thanks @matthchr and @theunrepentantgeek! --- Makefile | 2 +- docs/operator-bundle/testing-and-releasing.md | 207 ++++++++++++++++++ 2 files changed, 208 insertions(+), 1 deletion(-) create mode 100644 docs/operator-bundle/testing-and-releasing.md diff --git a/Makefile b/Makefile index 5337158d23f..dd4535e8e2a 100644 --- a/Makefile +++ b/Makefile @@ -352,7 +352,7 @@ generate-operator-bundle: LATEST_TAG := $(shell curl -sL https://api.github.com/ generate-operator-bundle: manifests @echo "Latest released tag is $(LATEST_TAG)" @echo "Previous bundle version is $(PREVIOUS_BUNDLE_VERSION)" - rm -rf "bundle/$(LATEST_TAG)" + rm -rf "bundle/manifests" kustomize build config/operator-bundle | operator-sdk generate bundle --version $(LATEST_TAG) --channels stable --default-channel stable --overwrite --kustomize-dir config/operator-bundle # This is only needed until CRD conversion support is released in OpenShift 4.6.x/Operator Lifecycle Manager 0.16.x scripts/add-openshift-cert-handling.sh diff --git a/docs/operator-bundle/testing-and-releasing.md b/docs/operator-bundle/testing-and-releasing.md new file mode 100644 index 00000000000..95a0db359b9 --- /dev/null +++ b/docs/operator-bundle/testing-and-releasing.md @@ -0,0 +1,207 @@ +# Testing and releasing the operatorhub bundle for ASO + +This is a guide to producing and testing the operator bundle after an Azure Service Operator (ASO) release. +It's primarily intended as a reference for the ASO development team. + +## Test upgrading from the old bundle to the new one + +Creating a bundle for a new version of the operator requires specifying a `replaces` field that indicates to [OLM](https://olm.operatorframework.io/) which version of the bundle it should be installed over. +When creating a PR to the community-operators repo, one of the items on the checklist asks whether upgrading the bundle from the previous version has been tested. +This section explains how to be able to check that box in good conscience. + +### Before you begin + +You will need: + +* A previous operator hub bundle (currently this is `quay.io/operatorhubio/azure-service-operator:v0.37.0`). +This is the latest released operator bundle, which might not correspond to the last version of the operator image (if some versions have been skipped). +* A newly published operator image that needs packaging into an operator bundle (for example `mcr.microsoft.com/k8s/azureserviceoperator:1.0.23956`) + +### Set up your cluster + +1. Create an [AKS cluster with ACR integration](https://docs.microsoft.com/en-us/azure/aks/cluster-container-registry-integration#create-a-new-aks-cluster-with-acr-integration). +(In theory these steps could work with a `kind` cluster, but the registry management will need some changing.) +2. Install OLM: + ```sh + curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.18.1/install.sh | bash -s v0.18.1 + ``` +3. Install cert-manager: + ```sh + k apply -f https://github.com/jetstack/cert-manager/releases/download/v1.3.1/cert-manager.yaml + ``` +4. Install the `opm` tool to build the index image: + * Follow [these instructions](https://docs.openshift.com/container-platform/4.5/operators/admin/olm-managing-custom-catalogs.html#olm-installing-opm_olm-managing-custom-catalogs) to get the `opm` binary. + * Use docker rather than podman to log into the redhat registry - the path for auth credentials will be `~/.docker/config.json` if it's installed from the docker package archive (as opposed to from a snap). + * The `oc image extract` command is convenient for getting the binary out of the right image layer. If you don't have/want oc installed you can find it manually using `docker image save` and `tar` to find and extract it. +5. Log in to your ACR: + ```sh + az login + az acr login --name + ``` + +### Make the bundle image and index + +An operator bundle is a docker image containing the yaml of the CRDs and a [ClusterServiceVersion](https://docs.openshift.com/container-platform/4.7/rest_api/operatorhub_apis/clusterserviceversion-operators-coreos-com-v1alpha1.html) that defines the operator webhooks and deployment. +The bundle index is a docker image that contains a registry of bundles (potentially in different versions) - for our testing we'll be making a minimal one that just contains the current version and the new version of the ASO bundle. + +0. Update the `PREVIOUS_BUNDLE_VERSION` value in the Makefile if it's not right (and commit the change). +1. Run `make generate-operator-bundle` - this will produce the source files for a new bundle with the same version as the new operator version. +2. Set these environment variables to enable copying commands from the steps below: + ```sh + export REGISTRY_FQDN= + export PREVIOUS_BUNDLE_VERSION= + export NEW_BUNDLE_VERSION= + ``` +3. Build the new bundle image: + ```sh + docker build -f bundle.Dockerfile --tag ${REGISTRY_FQDN}/operatorhubio/azure-service-operator:v${NEW_BUNDLE_VERSION} . + ``` +4. Push the new bundle to the registry: + ```sh + docker push ${REGISTRY_FQDN}/operatorhubio/azure-service-operator:v${NEW_BUNDLE_VERSION} + ``` +5. Pull the previous bundle version from the `quay.io` registry: + ```sh + docker pull quay.io/operatorhubio/azure-service-operator:v${PREVIOUS_BUNDLE_VERSION} + ``` +6. Tag the previous bundle for your test registry: + ```sh + docker tag quay.io/operatorhubio/azure-service-operator:v${PREVIOUS_BUNDLE_VERSION} ${REGISTRY_FQDN}/operatorhubio/azure-service-operator:v${PREVIOUS_BUNDLE_VERSION} + ``` +7. Push that bundle to your registry as well: + ```sh + docker push ${REGISTRY_FQDN}/operatorhubio/azure-service-operator:v${PREVIOUS_BUNDLE_VERSION} + ``` +8. Make the index image that points to the two bundles: + ```sh + opm -c docker index add \ + --bundles ${REGISTRY_FQDN}/operatorhubio/azure-service-operator:v${PREVIOUS_BUNDLE_VERSION},${REGISTRY_FQDN}/operatorhubio/azure-service-operator:v${NEW_BUNDLE_VERSION} \ + --tag ${REGISTRY_FQDN}/operatorhubio/azure-service-operator-index:test1 + ``` +9. Push the index to the registry: + ```sh + docker push ${REGISTRY_FQDN}/operatorhubio/azure-service-operator-index:test1 + ``` + +### Create your test [CatalogSource](https://docs.openshift.com/container-platform/4.7/rest_api/operatorhub_apis/catalogsource-operators-coreos-com-v1alpha1.html) + +The catalog source will point to your new index image: +```yaml +apiVersion: operators.coreos.com/v1alpha1 +kind: CatalogSource +metadata: + name: aso-test + namespace: olm +spec: + displayName: Source for testing ASO + image: /operatorhubio/azure-service-operator-index:test1 + publisher: aso + sourceType: grpc +``` + +Replace the registry name with yours, and apply with `k apply -f catalogsource.yaml`. + +You should see that the catalog operator has started a pod from the index image. You can watch it interacting with the index pod using: +```sh +k logs -n olm deploy/catalog-operator -f +``` + +### Create your [Subscription](https://docs.openshift.com/container-platform/4.7/rest_api/operatorhub_apis/subscription-operators-coreos-com-v1alpha1.html) + +This will trigger the OLM to install the operator from the test catalog, starting with the previous version. +```yaml +apiVersion: operators.coreos.com/v1alpha1 +kind: Subscription +metadata: + labels: + operators.coreos.com/azure-service-operator.operators: "" + name: my-azure-service-operator + namespace: operators +spec: + channel: stable + name: azure-service-operator + source: aso-test + sourceNamespace: olm + installPlanApproval: Manual + startingCSV: azure-service-operator.v +``` + +Adjust for your starting version and create this with `k apply -f subscription.yaml`. + +The `installPlanApproval: Manual` setting tells OLM to create an unapproved [InstallPlan](https://docs.openshift.com/container-platform/4.7/rest_api/operatorhub_apis/installplan-operators-coreos-com-v1alpha1.html) that needs to be approved manually before that version of the operator will be installed. +This plus the `startingCSV` field pointing at the previous version of the operator will let you see the upgrade process at each step. + +### Approve installation of the old version + +You should see an unapproved InstallPlan created in the `operators` namespace for the old version of the operator bundle. +```sh +k get -n operators installplans +``` + +Edit or patch it to set `spec.approved` to `true`. +```sh +k edit -n operators installplans install- +``` + +The previous version of the operator will install and start up. You can check progress by looking at the yaml for the install plan and [ClusterServiceVersion](https://docs.openshift.com/container-platform/4.7/rest_api/operatorhub_apis/installplan-operators-coreos-com-v1alpha1.html), the logs from the catalog operator, and eventually the state of the pods in the `operators` namespace. + +### Approve installation of the new version + +Once the old version of the operator is installed, the OLM will create another install plan to upgrade it to the new version. +Approve that plan in the same way as the previous one and watch progress. + +### Troubleshooting + +If there's an issue with the new bundle, it will probably show up as a condition on the install plan (as well as logging by the catalog operator). +Problems encountered in the past include: + +* Validation errors in the CSV (particularly the `webhookdefinitions` section). +* Validation errors when applying the CRDs (eg. incorrect conversion fields after the v1beta1 -> v1 switch) +* Incorrect container image reference in deployment - this manifested as the install plan stuck in Installing (no errors) but the deployed pods stuck failing to pull the image. + +To back out the installed operator and retry: + +1. Fix the error and push a new bundle image as above - be sure to push the bundle image to the same version tag since that's what the index refers to. +2. Delete the subscription (will also remove the install plans): + ```sh + k delete -n operators subscriptions my-azure-service-operator + ``` +3. Delete the CSVs (this will also remove any deployments/services/pods etc.): + ```sh + k delete -n operators csv azure-service-operator.v${PREVIOUS_BUNDLE_VERSION} azure-service-operator.v${NEW_BUNDLE_VERSION} + ``` +4. Remove the CRDs: + ```sh + k api-resources --api-group azure.microsoft.com -o name | xargs kubectl delete customresourcedefinition + ``` +5. The catalog operator creates a configmap of each bundle image to load the CSV and CRDs from. +This won't be recreated from the updated image if it already exists (even though the content is out of date), so you need to delete it. +They're named with hashes but you can look at the `olm.sourceImage` annotation to find the right one to delete. + ```sh + k get -n olm configmap -o jsonpath='{range .items[*]}{@.metadata.name} {@.metadata.annotations.olm\.sourceImage}{"\n"}{end}' + k delete -n olm configmap + ``` +6. Then recreating the subscription as above will restart the install/upgrade process. + +## Creating PRs to add the new version to the [community-operators](https://github.com/operator-framework/community-operators) repo + +The embedded OpenShift operator store and [OperatorHub](https://operatorhub.io) are both populated from the community-operators repository. +The OpenShift store bundles are under the [`community-operators`](https://github.com/operator-framework/community-operators/tree/master/community-operators) directory, while OperatorHub ones are under [`upstream-community-operators`](https://github.com/operator-framework/community-operators/tree/master/upstream-community-operators). + +Once the bundle has been tested, separate PRs should be created for each of the two destination directories. + +1. Create a fork and local clone of the [`community-operators`](https://github.com/operator-framework/community-operators) repo if you haven't already got one. +2. Create a new branch, for example `upstream-azure-service-operator-${NEW_BUNDLE_VERSION}`. +3. Copy the manifests directory to the destination folder, renaming it with the new version number. + ```sh + cp -r bundle/manifests ~/dev/community-operators/upstream-community-operators/azure-service-operator/${NEW_BUNDLE_VERSION} + ``` + 4. Edit the package file `community-operators/upstream-community-operators/azure-service-operator/azure-service-operator.package.yaml` to point to the new bundle version. + ```yaml + channels: + - currentCSV: azure-service-operator.v + name: stable + defaultChannel: stable + packageName: azure-service-operator + ``` +5. Commit the changes, push them and create a PR.