
bump katib-operators version 0.15.0 -> 0.16-rc.1 for release 1.8 #122

Closed
orfeas-k opened this issue Aug 22, 2023 · 7 comments · Fixed by #123
Labels
Kubeflow 1.8 This issue affects the Charmed Kubeflow 1.8 release

Comments

@orfeas-k commented Aug 22, 2023

This issue tracks the process of bumping the katib-operators version from 0.15.0 to 0.16.0-rc.1. For the process, we're following our internal release handbook document, which has a section on manifest file upgrades and image upgrades.

The changes that this process will introduce should match the upstream ones in kubeflow/manifests#2506; they are expected to be template updates and image version bumps.

@orfeas-k commented Aug 22, 2023

As the manifests source for release 1.8, we used the kubeflow/manifests repo and ran

kustomize build apps/katib/upstream/installs/katib-with-kubeflow >> katib1.8.yaml

The manifests produced above were also used as the source for image versions.
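
For reference, the image versions can be cross-checked against the built manifests with a short script. A minimal sketch, assuming PyYAML is installed and katib1.8.yaml is the kustomize output above:

    # Sketch: list the container images referenced in the kustomize output.
    # Assumes PyYAML and the katib1.8.yaml file produced above.
    import yaml

    with open("katib1.8.yaml") as f:
        docs = [doc for doc in yaml.safe_load_all(f) if doc]

    images = set()
    for doc in docs:
        # Workload manifests keep containers under spec.template.spec.containers.
        containers = (
            doc.get("spec", {})
            .get("template", {})
            .get("spec", {})
            .get("containers", [])
        )
        for container in containers:
            images.add(container["image"])

    for image in sorted(images):
        print(image)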

@orfeas-k added the Kubeflow 1.8 label Aug 22, 2023
@orfeas-k commented Aug 22, 2023

Regarding this label change, we didn't apply the label katib.kubeflow.org/metrics-collector-injection: disabled, since this is a podspec charm and we can't add labels directly to the deployment. Note that the metrics-collector injection controlled by this label adds a sidecar container to a pod for collecting metrics; we don't need that for the controller, so disabling it would make sense, but skipping this shouldn't create a problem.

@orfeas-k commented Aug 22, 2023

Regarding this volume change, which introduces a volume of type configMap and mounts it into the deployment, this was translated to a volumeConfig in the container (the final volumeConfig is shown in a later comment below).

@orfeas-k commented Aug 22, 2023

Regarding the webhook failurePolicy removal, I went ahead and removed the failurePolicy from the applied webhooks as well, but note that the failurePolicy removed was different from the one removed upstream (Fail as opposed to Ignore).
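
For illustration, the shape of this change on the webhook object, as a Python fragment; the webhook name and fields below are illustrative, not copied from the charm:

    # Illustrative webhook entry; name and fields are assumptions.
    # Kubernetes defaults failurePolicy to "Fail" for
    # admissionregistration.k8s.io/v1, so dropping an explicit "Fail"
    # keeps the same behaviour, while upstream's drop of "Ignore" changes it.
    webhook = {
        "name": "validator.experiment.katib.kubeflow.org",  # illustrative
        # "failurePolicy": "Fail",  # removed by this change
        "sideEffects": "None",
        "admissionReviewVersions": ["v1"],
    }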

@orfeas-k commented Aug 22, 2023

Regarding RBAC changes, the katib-controller charm is still a podspec charm and its roles are defined here. As we can see, we haven't been defining roles strictly until now; instead we're just using verbs: ["*"] for all resources. Thus, we will add secrets following the same pattern (a sketch of the pattern follows). Strictly defined RBAC policies should be introduced during the charm's rewrite to sidecar.
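
For context, a minimal sketch of the pattern described above in podspec form; the resource names below are illustrative, not the charm's actual rule set:

    # Illustrative podspec v3 serviceAccount fragment showing the existing
    # broad-verbs pattern with "secrets" added alongside other resources.
    pod_spec_fragment = {
        "serviceAccount": {
            "roles": [
                {
                    "global": True,
                    "rules": [
                        {
                            "apiGroups": ["*"],
                            # "secrets" added here; the other resources
                            # are placeholders, not the charm's real list.
                            "resources": ["experiments", "trials", "secrets"],
                            "verbs": ["*"],
                        },
                    ],
                },
            ],
        },
    }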

@orfeas-k commented Aug 22, 2023

Regarding the gRPC health check change, upstream Katib changed the way they perform health checks, moving from the grpc_health_probe binary with a Kubernetes exec probe to the native Kubernetes gRPC health probe. This change was introduced in kubeflow/katib#2189.

Unfortunately, Pebble doesn't support gRPC health checks at the moment, and the recommendation is to configure the check with an exec command instead. Although the above PR also removes the grpc_health_probe binary from the Dockerfile, we could overcome that since we build our own rock for this image. Thus, we will not introduce any change to the health checks performed by katib-controller.
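
For reference, if the charm is rewritten to sidecar, an exec-based Pebble check could look roughly like the sketch below; the check name, period, binary path, and address are assumptions, not actual charm configuration:

    # Hypothetical Pebble layer fragment using an exec check in place of
    # the (currently unsupported) gRPC check type; all values illustrative.
    pebble_layer = {
        "checks": {
            "katib-grpc-health": {
                "override": "replace",
                "period": "30s",
                "exec": {
                    "command": "/bin/grpc_health_probe -addr=localhost:6789",
                },
            },
        },
    }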

@orfeas-k commented Aug 22, 2023

Some key points about the volume change from the previous comment and the args field for the container:

  • In order to properly mount the volume in our charm, we had to use the content field, as we do in the seldon-core charm and as indicated in https://discourse.charmhub.io/t/writing-a-kubernetes-charm/159.
  • Just by doing that and following the upstream manifests changes (meaning that we used katib-config.yaml as the mountPath in the volumeConfig and passed "--katib-config=/katib-config.yaml" in args), we ended up with the container error below:
Defaulted container "katib-controller" out of: katib-controller, juju-pod-init (init)
{"level":"error","ts":"2023-08-23T13:31:50Z","logger":"entrypoint","msg":
"Failed to get KatibConfig","error":"failed to parse katib-config.yaml in ConfigMap: 
katib-config: read /katib-config.yaml: is a directory","stacktrace":"main.main
\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1beta1/main.go:71
\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}

In order to overcome this (the volume is mounted as a directory, so /katib-config.yaml ended up being a directory containing the file rather than the file itself), we changed the volume config to

    {
        "name": "katib-config",
        "mountPath": "/katib-config",
        "files": [
            {
                "path": "katib-config.yaml",
                "content": render_template(
                    "src/templates/katib-config.yaml.j2",
                    self.katib_config_context,
                ),
            }
        ],
    },

and args to

                        "args": ["--katib-config=/katib-config/katib-config.yaml"],
