
Pass SecurityContext in all the AppRepo Controller's jobs #6605

Closed

Conversation

antgamdia
Contributor

@antgamdia antgamdia commented Aug 10, 2023

Description of the change

Even though the sync jobs already support a security context (by means of each AppRepo CRD), this information was not available for cleanup jobs. Note that those jobs are spun up once a NotFound error is thrown when fetching an AppRepo.

The approach applied here is simply passing this information via command-line arguments. However, to maximize the usefulness of the feature, it is also taken into account in the sync jobs (if a config is passed via the AppRepo CRD, it will override the default one).
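
To make the approach concrete, here is a minimal sketch (illustrative only, not the actual controller code) of how the JSON-encoded flags could be parsed on the controller side; the flag names are the ones added in this PR (see the diff below), while the package layout, variable names and error handling are assumptions:

```go
// Sketch only: parse the controller-wide security context defaults passed as
// JSON via command-line arguments (flag names as introduced in this PR).
package main

import (
	"encoding/json"
	"flag"
	"log"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	var containerSCJSON, podSCJSON string
	flag.StringVar(&containerSCJSON, "default-container-security-context", "",
		"JSON-encoded default container securityContext for sync/cleanup jobs")
	flag.StringVar(&podSCJSON, "default-pod-security-context", "",
		"JSON-encoded default pod securityContext for sync/cleanup jobs")
	flag.Parse()

	var defaultContainerSC corev1.SecurityContext
	var defaultPodSC corev1.PodSecurityContext

	// json.Unmarshal ignores unknown keys, so the chart-level "enabled" toggle
	// present in the flag values does not break decoding into the k8s types.
	if containerSCJSON != "" {
		if err := json.Unmarshal([]byte(containerSCJSON), &defaultContainerSC); err != nil {
			log.Fatalf("invalid --default-container-security-context: %v", err)
		}
	}
	if podSCJSON != "" {
		if err := json.Unmarshal([]byte(podSCJSON), &defaultPodSC); err != nil {
			log.Fatalf("invalid --default-pod-security-context: %v", err)
		}
	}

	log.Printf("default container securityContext: %+v", defaultContainerSC)
	log.Printf("default pod securityContext: %+v", defaultPodSC)
}
```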

Besides, this PR is adding some minor fixes, namely:

  • fixing a duplicated key.
  • adding some recommended security context params (more or less, the ones used in other Bitnami charts, though Trivy still complains). I have since removed those changes because CI was failing... so I think they deserve their own PR.
  • fixing a missing conditional handling of the CronJob API (v1 vs v1beta1).
  • passing the whole security context in the AppRepo (as opposed to passing just a single field).
  • using podSecurityContext and containerSecurityContext where each actually applies.

Benefits

Users will be able to set a securityContext for all the jobs created by Kubeapps.

Possible drawbacks

N/A (let's see what CI thinks :P). Edit: it seems there are some issues during the chart installation; I will look at it next week. Update: after reverting some changes in the actual security context, CI is passing again.

Applicable issues

Additional information

Here is a diff of the helm template before/after the changes in this PR:

Details
diff --git a/out.yaml b/out.yaml
index a1cd13a03..b2218e54f 100644
--- a/out.yaml
+++ b/out.yaml
@@ -717,6 +717,9 @@ spec:
           image: docker.io/kubeapps/apprepository-controller:latest
           imagePullPolicy: "IfNotPresent"
           securityContext:
             runAsNonRoot: true
             runAsUser: 1001
           command:
@@ -733,6 +736,8 @@ spec:
             - --database-user=postgres
             - --database-name=assets
             - --repos-per-namespace=true
+            - --default-container-security-context='{"enabled":true,"runAsNonRoot":true,"runAsUser":1001}'
+            - --default-pod-security-context='{"enabled":true,"fsGroup":1001}'
           env:
             - name: REPO_SYNC_IMAGE
               value: docker.io/kubeapps/asset-syncer:latest
@@ -801,6 +806,9 @@ spec:
           image: docker.io/kubeapps/dashboard:latest
           imagePullPolicy: "IfNotPresent"
           securityContext:
             runAsNonRoot: true
             runAsUser: 1001
           env:
@@ -943,6 +951,9 @@ spec:
           image: docker.io/bitnami/nginx:1.25.1-debian-11-r37
           imagePullPolicy: "IfNotPresent"
           securityContext:
             runAsNonRoot: true
             runAsUser: 1001
           env:
@@ -1039,6 +1050,9 @@ spec:
           image: docker.io/kubeapps/kubeapps-apis:latest
           imagePullPolicy: "IfNotPresent"
           securityContext:
             runAsNonRoot: true
             runAsUser: 1001
           command:
@@ -1081,22 +1095,20 @@ spec:
               containerPort: 50051
           livenessProbe:
             failureThreshold: 6
-            initialDelaySeconds: 60
+            initialDelaySeconds: 10
             periodSeconds: 10
             successThreshold: 1
             timeoutSeconds: 5
             exec:
               command: ["grpc_health_probe", "-addr=:50051"]
-            initialDelaySeconds: 10
           readinessProbe:
             failureThreshold: 6
-            initialDelaySeconds: 0
+            initialDelaySeconds: 5
             periodSeconds: 10
             successThreshold: 1
             timeoutSeconds: 5
             exec:
               command: ["grpc_health_probe", "-addr=:50051"]
-            initialDelaySeconds: 5
           resources:
             limits:
               cpu: 250m
@@ -1286,5 +1298,13 @@ spec:
   url: https://charts.bitnami.com/bitnami
   syncJobPodTemplate:
     spec:
+      containers:
+        -
+          securityContext:
+            runAsNonRoot: true
+            runAsUser: 1001
       securityContext:
-        runAsUser: 1001
+        fsGroup: 1001

@netlify

netlify bot commented Aug 10, 2023

Deploy Preview for kubeapps-dev canceled.

Built without sensitive environment variables

🔨 Latest commit: 5f83dd8
🔍 Latest deploy log: https://app.netlify.com/sites/kubeapps-dev/deploys/64da3bbb5a74f40008df90e9

Contributor

@absoludity absoludity left a comment


Ouch - sorry it wasn't the easy fix I'd originally thought.

Even though the sync jobs already support a security context (by means of each AppRepo CRD), this information was not available for cleanup jobs. Note that those jobs are spun up once a NotFound error is thrown when fetching an AppRepo.

Right, that signals that it's too late to be running the clean-up job at that point. Kubernetes has a solution for that (see below).

The approach applied here is simply passing this information via command-line arguments. However, to maximize the usefulness of the feature, it is also taken into account in the sync jobs (if a config is passed via the AppRepo CRD, it will override the default one).

Why would we provide AppRepo CRD-specific security context, if the controller is going to run with a security context to use by default anyway?

See what you think, but I reckon the native way to solve this would be for us to instead use finalizers for the AppRepo CRDs (which allow the controller to run clean-up tasks before the resource is deleted).
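
For illustration, a rough, hypothetical sketch of that flow (none of these helper names come from the Kubeapps codebase): the controller adds a finalizer while the AppRepo is live, and once a deletion timestamp appears it runs the cleanup, with the full AppRepo still available, before removing the finalizer so Kubernetes can finish the delete.

```go
// Hypothetical sketch of a finalizer-based cleanup flow (not Kubeapps code).
package apprepo

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// cleanupFinalizer is an illustrative finalizer name, not an actual Kubeapps one.
const cleanupFinalizer = "kubeapps.com/apprepo-cleanup"

func hasFinalizer(obj metav1.Object) bool {
	for _, f := range obj.GetFinalizers() {
		if f == cleanupFinalizer {
			return true
		}
	}
	return false
}

// reconcileFinalizer adds the finalizer while the resource is live; when the
// resource is being deleted, it runs cleanup (the AppRepo, including its
// syncJobPodTemplate/securityContext, is still available at that point) and
// only then removes the finalizer so the deletion can complete.
func reconcileFinalizer(ctx context.Context, obj metav1.Object,
	cleanup func(context.Context) error,
	update func(context.Context, metav1.Object) error) error {

	if obj.GetDeletionTimestamp().IsZero() {
		if !hasFinalizer(obj) {
			obj.SetFinalizers(append(obj.GetFinalizers(), cleanupFinalizer))
			return update(ctx, obj)
		}
		return nil
	}
	if hasFinalizer(obj) {
		if err := cleanup(ctx); err != nil {
			return err // retried on the next reconcile; the finalizer keeps the CR around
		}
		var remaining []string
		for _, f := range obj.GetFinalizers() {
			if f != cleanupFinalizer {
				remaining = append(remaining, f)
			}
		}
		obj.SetFinalizers(remaining)
		return update(ctx, obj)
	}
	return nil
}
```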

@antgamdia
Contributor Author

Why would we provide AppRepo CRD-specific security context, if the controller is going to run with a security context to use by default anyway?

It does not really make much sense, I agree. The point is that, currently, we do support customization of the syncJobPodTemplate, as it is part of our CRD. Before this PR, the only way to pass this information was by setting the pod template in the CRD... which is precisely adding a per-repo security context (as opposed to having the jobs inherit the controller's context).

Whether or not using a custom security context in each repo actually makes sense... I don't know; I haven't thought about it thoroughly - though I have the impression it could be useful in some contexts where the sync job is run in a certain restricted namespace.

Coming back to this PR, we would now have two different ways of specifying a security context: 1) using the existing CRD's syncJobPodTemplate field and/or 2) setting a default value on the AppRepoController side.
The point is: what should the expected behavior be when upgrading Kubeapps (assuming existing jobs) and setting the controller-wide default?

  • A) If a controller-wide default is found AND a CRD-scoped config is found, then solely use the default -as is- in every job.
  • B) If a controller-wide default is found AND a CRD-scoped config is found, then solely use the CRD's -as is- in every job.
  • C) If a controller-wide default is found AND a CRD-scoped config is found, then use both and, if they conflict, preserve default's
  • D) If a controller-wide default is found AND a CRD-scoped config is found, then use both and, if they conflict, preserve CRD's

In this PR, I opted for option D, as I see it as the least disruptive option, maximizing backward compatibility.
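
To make the intended semantics of option D concrete, here is a minimal sketch (a hypothetical helper, not this PR's code) of a merge where the CRD-scoped context wins on any field it explicitly sets:

```go
// Hypothetical illustration of option D: start from the controller-wide
// default and let the per-AppRepo (CRD-scoped) security context override
// conflicting fields. Not the actual Kubeapps implementation.
package apprepo

import corev1 "k8s.io/api/core/v1"

func mergeContainerSecurityContext(def, repo *corev1.SecurityContext) *corev1.SecurityContext {
	if repo == nil {
		return def.DeepCopy()
	}
	if def == nil {
		return repo.DeepCopy()
	}
	merged := def.DeepCopy()
	// Only a few representative fields are shown; a full merge would cover them all.
	if repo.RunAsUser != nil {
		merged.RunAsUser = repo.RunAsUser
	}
	if repo.RunAsNonRoot != nil {
		merged.RunAsNonRoot = repo.RunAsNonRoot
	}
	if repo.ReadOnlyRootFilesystem != nil {
		merged.ReadOnlyRootFilesystem = repo.ReadOnlyRootFilesystem
	}
	return merged
}
```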

See what you think, but I reckon the native way to solve this would be for us to instead use finalizers for the AppRepo CRDs (which allow the controller to run clean-up tasks before the resource is deleted).

+1000, the current approach based upon syncJobs+cleanupJobs seems like a bad smell given the current myriad of k8s-native solutions... but if we start improving it... we would end up with something like the Flux HelmRepository.
Anyway, we could open a new issue for trying to, at least, move from clean-up jobs to finalizers.

@absoludity
Contributor

Why would we provide AppRepo CRD-specific security context, if the controller is going to run with a security context to use by default anyway?

It does not really make much sense, I agree. The point is that, currently, we do support customization of the syncJobPodTemplate, as it is part of our CRD. Before this PR, the only way to pass this information was by setting the pod template in the CRD... which is precisely adding a per-repo security context (as opposed to having the jobs inherit the controller's context).

Sorry, my thought was more that the AppRepo CRD-specific security context makes sense at the moment, but if we add a default security context then it no longer makes sense (to me?). In a multi-tenant use case, having an AppRepo-specific security context allows different users to utilise different resources. But if they all have access to the default context, then you no longer have multi-tenancy for those resources (nodes, or whatever is protected by the security context).

Whether or not using a custom security context in each repo actually makes sense... I don't know; I haven't thought about it thoroughly - though I have the impression it could be useful in some contexts where the sync job is run in a certain restricted namespace.

Yes.

Coming back to this PR, we would now have two different ways of specifying a security context: 1) using the existing CRD's syncJobPodTemplate field and/or 2) setting a default value on the AppRepoController side. The point is: what should the expected behavior be when upgrading Kubeapps (assuming existing jobs) and setting the controller-wide default?

* A) If a controller-wide default is found AND a CRD-scoped config is found, then solely use the default -as is- in every job.

* B) If a controller-wide default is found AND a CRD-scoped config is found, then solely use the CRD's -as is- in every job.

* C) If a controller-wide default is found AND a CRD-scoped config is found, then use both and, if they conflict, preserve default's

* D) If a controller-wide default is found AND a CRD-scoped config is found, then use both and, if they conflict, preserve CRD's

In this PR, I opted for option D, as I see it as the least disruptive option, maximizing backward compatibility.

Yes, it's less disruptive, but I'm worried it's adding tech-debt in a direction away from our goal, when it's a minor problem that could be solved in a way that moves us towards our goal.

See what you think, but I reckon the native way to solve this would be for us to instead use finalizers for the AppRepo CRDs (which allow the controller to run clean-up tasks before the resource is deleted).

+1000, the current approach based upon syncJobs+cleanupJobs seems like a bad smell given the current myriad of k8s-native solutions... but if we start improving it... we would end up with something like the Flux HelmRepository. Anyway, we could open a new issue for trying to, at least, move from clean-up jobs to finalizers.

Yeah - the question is: is there a positive step forward here that we could take that's simple and doesn't add something moving us away from our goal? Not sure, but one idea that comes to mind: we could instead update to run the cleanup code when the app repo is deleted by Kubeapps - it's not perfect as it won't cover the case where the app repo is deleted by the CLI, but it's a step towards doing the work in the controller triggered (later) by a finalizer. (I don't think it's a huge amount of work, the cleanup).

I'm going to leave a +1 on this, because I know that you'll think about this in more detail than I have, considering this info, and make a good decision based on what you think (either way). I just wanted to be sure you understood my concern about adding something that may hinder us in the future, when there may be other steps forward more aligned with the goal of using a finalizer.

@antgamdia
Contributor Author

Haven't had time to reply yet (will do it tomorrow), but TL;DR: I gave the finalizers approach a try... and it seems to work pretty well :)

apprepos.mp4

@antgamdia
Contributor Author

Yes, it's less disruptive, but I'm worried it's adding tech-debt in a direction away from our goal, when it's a minor problem that could be solved in a way that moves us towards our goal.

Yep, I agree. I was a bit reluctant to add more features related to the CR lifecycle and just went with the simplistic approach. But if we keep doing that, we will never get to fix that tech debt.

is there a positive step forward here that we could take that's simple and doesn't add something moving us away from our goal? Not sure, but one idea that comes to mind: we could instead update to run the cleanup code when the app repo is deleted by Kubeapps

I don't think that fully works, since it could even lead to some inconsistent jobs. I'm not sure that diverging the behavior between what Kubeapps does and what is expected from the controller when interacting with the k8s resources directly is a good idea.

So, in the end, I just added the finalizers thing. It wasn't that difficult to add: #6647 :) Let's move the technical discussion there.

I have also moved the extra changes I performed here to #6646.

Contributor

@absoludity absoludity left a comment


Excellent, thanks @antgamdia. I'll close this one after reviewing your others (so that we don't accidentally land it). Or maybe I'll switch the review instead.

antgamdia added a commit that referenced this pull request Aug 21, 2023
#6646)

### Description of the change

While working on #6605, I noticed I had a bunch of unrelated changes.
Instead of overloading the PR, I have extracted them into this one. The
main changes are:

- Apply the same fix in go Dockerfiles as we did for kubeapps-apis
(avoid downloading linters if `lint=false`). It helps reduce the build
time locally.
- Remove some duplicated keys in the yamls, as we already discussed.
- Add some missing apprepo-controller args in the tests, mainly just for
completeness.
- Fix some tests in the apprepo-controller. Mainly just swapping some
misplaced `get, want`.
- Handle cronjob v1beta1 vs v1 in a missing place.
- Pass the `podSecurityContext` and `containerSecurityContext` in the
pod template properly.
- Update a missing copyright header.
- Fix wrong values.yaml metadata (mainly `ocicatalog/ociCatalog`)

### Benefits

Get the aforementioned issues solved.

### Possible drawbacks

N/A (not adding new logic here)

### Applicable issues

- (partially) related to
#6545

### Additional information

N/A

---------

Signed-off-by: Antonio Gamez Diaz <agamez@vmware.com>
@antgamdia antgamdia closed this Aug 21, 2023
antgamdia added a commit that referenced this pull request Aug 22, 2023
### Description of the change

Even though the sync jobs already support a security context (by means of each
AppRepo CRD), this information was not available for cleanup jobs. This
is mainly due to the fact that those jobs are spun up once a NotFound
error is thrown when fetching an AppRepo.
However, Kubernetes does have a native approach for dealing with these
scenarios: finalizers.

In #6605 we proposed a
simplistic workaround based on adding more params to the controller...
but as suggested in
#6605 (comment),
moving to finalizers is a better long-term solution.


### Benefits

Cleanup jobs are now handled within an existing AppRepo context...
meaning we have the whole syncJobPodTemplate available to be used (i.e.,
securityContext and so on).

### Possible drawbacks

When dealing with finalizers in the past, I have often found it really
annoying when they get stuck and prevent the resource from being deleted. I
wonder if we should add some info to the FAQ on how to manually remove
the finalizers.

Additionally, and this might be something important: for the AppRepo
controller to be able to `update` AppRepos in namespaces other than
kubeapps (to add the finalizer), it now needs extra RBAC. Before, we
were just granting `...-appprepos-read`... but now we would need to grant
`...-write` as well... and I'm not sure we really want to do so.
WDYT, @absoludity?
Another idea is using an admission policy... but again, I'm not sure if we
want to deal with that...

~(I haven't modified the RBAC yet in this PR)~ The changes have finally
been performed.

### Applicable issues

- fixes #6545

### Additional information

This PR is based on top of
#6646, but the main change
to review is
6e70910.
The rest is mostly just moving code into separate files.

Also, I have been taking a look at `kubebuilder` to create a new
controller, based on `sigs.k8s.io/controller-runtime` rather than on the
workqueues we currently have. While it is pretty easy to get started
([see quickstart](https://book.kubebuilder.io/quick-start)), I think it
adds too much complexity (using kustomize, adding RBAC proxies,
Prometheus metrics, etc.).
I also quickly tried the k8s codegen scripts, but ran into some issues
with my setup... though perhaps that's the best option.

IMO, at some point we should start thinking about moving towards a new
state-of-the-art k8s controller boilerplate.

---------

Signed-off-by: Antonio Gamez Diaz <agamez@vmware.com>
@antgamdia antgamdia deleted the 6545-add-seccontext-apprepo branch December 18, 2023 10:14

Successfully merging this pull request may close these issues.

Cannot configure securityContext of sync and cleanup jobs