
Update SCC and RBAC handling for DevWorkspaces #954

Merged (10 commits) on Nov 7, 2022

Conversation

amisevsk
Collaborator

What does this PR do?

  • Update how the controller.devfile.io/scc is handled:
    • DWO no longer updates the SCC itself to add workspace serviceaccounts, removing the need for a finalizer there
    • use permissions are granted to the workspace serviceaccount via role/rolebinding
    • DWO no longer needs get and update permissions on the SCC to configure workspaces to use the SCC
      • An exception to this is if the old way of handling the attribute was used, in which case those permissions are still required to clean up any remaining workspaces with serviceaccount finalizers.
  • Update how workspace RBAC is handled in general
    • Move to new, more understandable names for the role and rolebinding
    • Rather than granting devworkspace roles to all serviceaccounts in a namespace, grant them specifically to workspace serviceaccounts
    • Use one role/rolebinding for default workspace permissions, and another role/rolebinding per SCC used.
    • Clean up roles and rolebindings when all workspaces are deleted
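The rolebinding subject management described above can be sketched as a small model. This is illustrative Go only, not the actual DWO implementation; the `Subject` type and the `addSubject`/`removeSubject` helpers are hypothetical stand-ins for the real rbacv1 types and sync logic:

```go
package main

import "fmt"

// Subject is a simplified stand-in for rbacv1.Subject.
type Subject struct {
	Kind, Name, Namespace string
}

// addSubject returns the subject list with the workspace serviceaccount
// appended, unless it is already present (sync must be idempotent).
func addSubject(subjects []Subject, saName, ns string) []Subject {
	for _, s := range subjects {
		if s.Kind == "ServiceAccount" && s.Name == saName && s.Namespace == ns {
			return subjects
		}
	}
	return append(subjects, Subject{Kind: "ServiceAccount", Name: saName, Namespace: ns})
}

// removeSubject drops a workspace serviceaccount from the subject list.
// When the returned list is empty, the caller deletes the shared role and
// rolebinding entirely, matching the cleanup behavior described above.
func removeSubject(subjects []Subject, saName, ns string) []Subject {
	var filtered []Subject
	for _, s := range subjects {
		if s.Kind != "ServiceAccount" || s.Name != saName || s.Namespace != ns {
			filtered = append(filtered, s)
		}
	}
	return filtered
}

func main() {
	var subjects []Subject
	subjects = addSubject(subjects, "workspace1-sa", "user-ns")
	subjects = addSubject(subjects, "workspace2-sa", "user-ns")
	subjects = removeSubject(subjects, "workspace1-sa", "user-ns")
	fmt.Println(len(subjects)) // one workspace serviceaccount remains
}
```

The same bookkeeping applies to both the default rolebinding and each per-SCC rolebinding: subjects track exactly the workspace serviceaccounts that need the permissions.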

Commits should have explanatory messages to detail what each one does. This PR marks some constants/functions as deprecated:

  • The serviceaccount.controller.devfile.io finalizer is deprecated, along with functionality for clearing that finalizer, as it is no longer used.
  • Old RBAC code is removed but there is a function to clean up old roles and rolebindings when starting a workspace.

In both cases, DWO supports cleaning up resources created/updated prior to this commit. Removal of the deprecated code should be done for DWO 0.19.

This PR changes the requirements for using a custom SCC with new workspaces. Instead of granting the DevWorkspace Operator serviceaccount get and update permissions on the SCC to be used, DWO now requires only the use permission:

cat <<EOF | oc apply -f -         
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole                        
metadata:
  name: dwo-access-to-scc
  labels:
    test: dwo-scc-access
rules:                 
- apiGroups:
    - "security.openshift.io"
  resources:                                    
    - "securitycontextconstraints"                 
  resourceNames:
    - "container-build"      
  verbs:          
    - "use"                                          
---                         
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:         
  name: dwo-access-to-scc
  labels:              
    test: dwo-scc-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: dwo-access-to-scc
subjects:
- kind: ServiceAccount
  name: devworkspace-controller-serviceaccount
  namespace: ${NAMESPACE}
EOF

Note: if any workspaces exist that use the serviceaccount.controller.devfile.io finalizer, the get and update permissions for that SCC are still required by DWO in order to clear the finalizer when the workspace is deleted. To remove the need for get and update permissions, first delete all workspaces that have the serviceaccount finalizer. To see which workspaces use the finalizer, use

oc get dw -A -o custom-columns=Name:metadata.name,Namespace:metadata.namespace,Finalizers:metadata.finalizers

and search for serviceaccount.controller.devfile.io.

What issues does this PR fix or reference?

Part of #884 (see #884 (comment))

Is it tested? How?

As there are significant changes in this PR, there are a number of different things to test. Test cases should cover major functionality. To test migration processes, it is necessary to create + start workspaces using the current main branch commit, then upgrade to the PR changes and trigger a reconcile for those workspaces.

  1. Verify that the old workspace role (workspace) and rolebinding (devworkspacedw) are deleted when starting a workspace with changes from this PR.
  2. Verify that old workspaces using the scc attribute can still be deleted (i.e. that the finalizer is cleared and the workspace serviceaccount is removed from the SCC)
  3. Verify that new roles/rolebindings are cleaned up:
    • If all workspaces are deleted, the role/rolebinding are deleted also
    • If multiple workspaces exist and one is deleted, that workspace's serviceaccount is removed from the shared rolebinding
  4. Verify that SCC support continues to work as before:
    • Role and rolebinding named devworkspace-use-<scc-name> should be created in the namespace
    • Workspaces that request that SCC have their serviceaccount added to the rolebinding's subjects
    • Deletion of workspaces cleans up SCC roles/rolebindings as described in 3.

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

@codecov

codecov bot commented Oct 20, 2022

Codecov Report

Base: 49.52% // Head: 53.53% // Increases project coverage by +4.01% 🎉

Coverage data is based on head (da3ce11) compared to base (f64d213).
Patch coverage: 87.94% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #954      +/-   ##
==========================================
+ Coverage   49.52%   53.53%   +4.01%     
==========================================
  Files          38       43       +5     
  Lines        2417     2699     +282     
==========================================
+ Hits         1197     1445     +248     
- Misses       1095     1123      +28     
- Partials      125      131       +6     
Impacted Files Coverage Δ
pkg/provision/workspace/rbac/common.go 68.75% <68.75%> (ø)
pkg/provision/workspace/rbac/migrate.go 82.97% <82.97%> (ø)
pkg/provision/workspace/rbac/finalize.go 85.71% <85.71%> (ø)
pkg/provision/workspace/rbac/rolebinding.go 88.88% <88.88%> (ø)
pkg/provision/workspace/rbac/role.go 95.12% <95.12%> (ø)


@AObuchow
Collaborator

AObuchow commented Oct 24, 2022

I experienced some issues with the migration steps, though I may have made a mistake somewhere. Here are the steps I followed as well as the results I got:

  1. Checked out the main branch while logged in to an OpenShift 4.11 cluster
  2. Set up my DWO_IMG environment variable to point to my quay.io repo, then did a make docker install
  3. Followed the instructions from the original SCC PR. I made sure that the container-build workspace had the appropriate scc attribute set
  4. Checked out the changes from this PR and did a make docker restart and then waited for the controller manager deployment to have 1 pod

Then I started testing the migration process:

Verify that the old workspace role (workspace) and rolebinding (devworkspacedw) are deleted when starting a workspace with changes from this PR.

I started the theia-next.yaml sample workspace, but the workspace role and devworkspacedw rolebinding were not deleted.

Verify that old workspaces using the scc attribute can still be deleted (i.e. that the finalizer is cleared and the workspace serviceaccount is removed from the SCC)

I then tried deleting the container-build workspace (which uses the scc attribute) and it was deleted successfully. I saw that the finalizer was cleared (not sure if this is the correct way to check, but doing a oc get dw -A -o custom-columns=Name:metadata.name,Namespace:metadata.namespace,Finalizers:metadata.finalizers only listed the rbac.controller.devfile.io finalizer for the theia-next workspace I had started).

Additionally, I tried to check that the serviceaccount was removed from the container-build SCC, which it seemed to have been:

$ oc adm policy who-can use scc container-build
resourceaccessreviewresponse.authorization.openshift.io/<unknown> 

Namespace: devworkspace-controller
Verb:      use
Resource:  securitycontextconstraints.security.openshift.io

Users:  andrew
        system:admin
        system:serviceaccount:openshift-apiserver-operator:openshift-apiserver-operator
        system:serviceaccount:openshift-apiserver:openshift-apiserver-sa
        system:serviceaccount:openshift-authentication-operator:authentication-operator
        system:serviceaccount:openshift-authentication:oauth-openshift
        system:serviceaccount:openshift-cluster-node-tuning-operator:cluster-node-tuning-operator
        system:serviceaccount:openshift-cluster-storage-operator:cluster-storage-operator
        system:serviceaccount:openshift-cluster-storage-operator:csi-snapshot-controller-operator
        system:serviceaccount:openshift-cluster-version:default
        system:serviceaccount:openshift-config-operator:openshift-config-operator
        system:serviceaccount:openshift-controller-manager-operator:openshift-controller-manager-operator
        system:serviceaccount:openshift-etcd-operator:etcd-operator
        system:serviceaccount:openshift-etcd:installer-sa
        system:serviceaccount:openshift-kube-apiserver-operator:kube-apiserver-operator
        system:serviceaccount:openshift-kube-apiserver:installer-sa
        system:serviceaccount:openshift-kube-apiserver:localhost-recovery-client
        system:serviceaccount:openshift-kube-controller-manager-operator:kube-controller-manager-operator
        system:serviceaccount:openshift-kube-controller-manager:installer-sa
        system:serviceaccount:openshift-kube-controller-manager:localhost-recovery-client
        system:serviceaccount:openshift-kube-scheduler-operator:openshift-kube-scheduler-operator
        system:serviceaccount:openshift-kube-scheduler:installer-sa
        system:serviceaccount:openshift-kube-scheduler:localhost-recovery-client
        system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator
        system:serviceaccount:openshift-kube-storage-version-migrator:kube-storage-version-migrator-sa
        system:serviceaccount:openshift-machine-api:cluster-baremetal-operator
        system:serviceaccount:openshift-machine-config-operator:default
        system:serviceaccount:openshift-network-operator:default
        system:serviceaccount:openshift-oauth-apiserver:oauth-apiserver-sa
        system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount
        system:serviceaccount:openshift-service-ca-operator:service-ca-operator
Groups: system:cluster-admins
        system:masters

After doing all of the above, I wanted to see if I could delete the theia-next workspace and make it again. When I tried deleting the workspace, the oc delete dw command got hung up so I checked the DWO logs and saw the following:

{
   "level":"info",
   "ts":1666634551.359788,
   "logger":"controllers.DevWorkspace",
   "msg":"Reconciling Workspace",
   "Request.Namespace":"devworkspace-controller",
   "Request.Name":"theia-next",
   "devworkspace_id":"workspace21d193cb97db4179",
   "resolvedConfig":""
}{
   "level":"info",
   "ts":1666634551.3598063,
   "logger":"controllers.DevWorkspace",
   "msg":"Finalizing DevWorkspace",
   "Request.Namespace":"devworkspace-controller",
   "Request.Name":"theia-next",
   "devworkspace_id":"workspace21d193cb97db4179"
}{
   "level":"info",
   "ts":1666634551.361374,
   "logger":"controllers.DevWorkspace",
   "msg":"failed to delete role devworkspace-default-role in namespace devworkspace-controller: roles.rbac.authorization.k8s.io \"devworkspace-default-role\" is forbidden: User \"system:serviceaccount:devworkspace-controller:devworkspace-controller-serviceaccount\" cannot delete resource \"roles\" in API group \"rbac.authorization.k8s.io\" in the namespace \"devworkspace-controller\"",
   "Request.Namespace":"devworkspace-controller",
   "Request.Name":"theia-next",
   "devworkspace_id":"workspace21d193cb97db4179"
}

Let me know if there are other details that would be useful to know, or if I made a mistake somewhere in following the instructions for this PR.

@amisevsk
Collaborator Author

amisevsk commented Nov 1, 2022

  1. Checked out the main branch while logged in to an OpenShift 4.11 cluster
  2. Set up my DWO_IMG environment variable to point to my quay.io repo, then did a make docker install
  3. Followed the instructions from the original SCC PR. I made sure that the container-build workspace had the appropriate scc attribute set
  4. Checked out the changes from this PR and did a make docker restart and then waited for the controller manager deployment to have 1 pod

As this PR makes changes to the deploy folder (the role for DWO is updated), it's necessary to make install from the PR branch; otherwise you're swapping the PR image into an old installation of DWO, which could cause problems (e.g. the deletion issue you faced).

Did the workspace reach a running state?

The default way to provision roles and rolebindings in DevWorkspace
namespaces was by creating roles and bindings with static names. This
needs to be extended and so the previous way of provisioning needs to be
safely deprecated.

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
Add package pkg/provision/workspace/rbac to handle workspace roles and
rolebindings more carefully. For regular workspaces, each namespace gets
one role and rolebinding. The rolebinding is updated to have all
workspace serviceaccounts in the namespace as subjects, and this list is
updated when workspaces are deleted. If all workspaces in a namespace
are deleted, then the role and rolebinding are deleted as well.

If a workspace uses the 'controller.devfile.io/scc' attribute, an
additional role+binding for that SCC name is created and managed in the
same way.

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
Add functionality to automatically delete old roles and rolebindings
when a workspace is processed. This removes the previously-used
'workspace' role and 'devworkspacedw' rolebinding, which have no normal
mechanism for their removal.

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
Switch main reconcile loop to use new RBAC sync functions and remove old
rbac files.

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
Since SCCs are added to serviceaccounts via roles and rolebindings, it's
no longer necessary to update SCCs to add workspace serviceaccounts as
users. This step is removed, as is functionality for automatically
adding the SA finalizer to workspaces.

However, code related to finalizing workspaces and cleaning up changes
made to SCCs is kept intact in order to ensure any workspaces that
previously used the finalizer can still be deleted cleanly.

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
Update cluster role used by DWO to add verbs 'delete',
'deletecollection' for roles/rolebindings

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
Update the cache filter for roles and rolebindings to check for
appropriate label. As a result, the old workspace role/rolebinding are
invisible to the controller, so the process for cleaning up old
resources needs to be updated as well.

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
Signed-off-by: Angel Misevski <amisevsk@redhat.com>
Signed-off-by: Angel Misevski <amisevsk@redhat.com>
Signed-off-by: Angel Misevski <amisevsk@redhat.com>
@AObuchow
Collaborator

AObuchow commented Nov 1, 2022

As this PR makes changes to the deploy folder (the role for DWO is updated), it's necessary to make install from the PR branch; otherwise you're swapping the PR image into an old installation of DWO, which could cause problems (e.g. the deletion issue you faced).

Okay, thank you for the clarification 👍

My cluster just died while I was still testing, but the workspace role and devworkspacedw rolebinding were not deleted. I will have to try again tomorrow morning to verify this is still happening (and that it wasn't just because the cluster timed out).

One thing I should note:
I am creating another user (who has permissions to use the container-build SCC), with its own namespace (test-project) on the cluster, and creating workspaces from that user, rather than the kube:admin user.

Did the workspace reach a running state?

IIRC all the workspaces I had started during my previous (incorrect) testing had reached a running state. Issues only occurred when trying to delete them.

@amisevsk
Collaborator Author

amisevsk commented Nov 2, 2022

Additional debugging:

  1. Enable the experimental features in the DWOC, in case that surfaces more useful logs, and grep through the logs for the workspace name/namespace
  2. oc auth can-i delete roles --as system:serviceaccount:${NAMESPACE}:devworkspace-controller-serviceaccount, etc.

@AObuchow
Collaborator

AObuchow commented Nov 2, 2022

For some reason, doing a make docker install wasn't resulting in a pull of the new DWO image, so I ended up having to use a different image tag to get the PR image.

Here are some new findings:

  • Before installing the PR image, while logged into kube:admin, I tried checking if the system service account could delete roles in the test-project and devworkspace-controller namespaces. Note that test-project is the namespace created by my secondary user, where the workspace role and devworkspacedw rolebinding were created:
[aobuchow@fedora]$ oc auth can-i delete roles --as system:serviceaccount:${NAMESPACE}:devworkspace-controller-serviceaccount
no
[aobuchow@fedora]$ oc auth can-i delete roles --as system:serviceaccount:test-project:devworkspace-controller-serviceaccount
no
  • After installing the PR image, the command reported that the system service account could delete roles in the devworkspace-controller namespace but not the test-project namespace:
[aobuchow@fedora]$ oc auth can-i delete roles --as system:serviceaccount:test-project:devworkspace-controller-serviceaccount
no
[aobuchow@fedora]$ oc auth can-i delete roles --as system:serviceaccount:${NAMESPACE}:devworkspace-controller-serviceaccount
yes
  • After installing the PR image, the two workspaces I had started, which use the container-build SCC through the SCC attribute, failed. Here is the log for one of the workspaces:
{
   "level":"info",
   "ts":1667411040.449669,
   "logger":"controllers.DevWorkspace",
   "msg":"DevWorkspace failed to start: Error provisioning rbac: roles.rbac.authorization.k8s.io \"devworkspace-use-container-build\" is forbidden: user \"system:serviceaccount:devworkspace-controller:devworkspace-controller-serviceaccount\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:devworkspace-controller\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"security.openshift.io\"], Resources:[\"securitycontextconstraints\"], ResourceNames:[\"container-build\"], Verbs:[\"use\"]}",
   "Request.Namespace":"test-project",
   "Request.Name":"container-build",
   "devworkspace_id":"workspace4dc708fd5c484f53"
}

Note: I hadn't yet added the new ClusterRole and ClusterRoleBinding that this PR mentions (I probably should have, as that seems related to the error with the "use" verb being mentioned)

  • After installing the PR image, I saw that the workspace role and devworkspacedw rolebinding were deleted.

My cluster just timed out, so I'm going to try this again, but will apply the new ClusterRole and ClusterRoleBinding from the PR before I install the new DWO image (from the PR)

@dkwon17
Collaborator

dkwon17 commented Nov 2, 2022

@amisevsk I was able to successfully get everything working (still in the middle of reading the code right now) except for this part:

  1. Verify that old workspaces using the scc attribute can still be deleted (i.e. that the finalizer is cleared and the workspace serviceaccount is removed from the SCC)

After applying the changes from this PR with make install, and after creating a new workspace, the SA finalizer for the old workspace is not cleared; instead, the rbac.controller.devfile.io finalizer is added:

  finalizers:
    - serviceaccount.controller.devfile.io
    - rbac.controller.devfile.io

but I am still able to delete the workspace and see that the SA is removed from the SCC.

Any ideas why it is not deleting? New workspaces don't have the serviceaccount.controller.devfile.io finalizer, but existing workspaces do

EDIT: actually, I think this is expected:

// FinalizeServiceAccount removes the workspace service account from the SCC specified by the controller.devfile.io/scc attribute.
//
// Deprecated: This should no longer be needed as the serviceaccount finalizer is no longer added to workspaces (and workspaces
// do not update SCCs) but is kept here in order to clear finalizers from existing workspaces on deletion.

Collaborator

@AObuchow AObuchow left a comment

tl;dr: I got things to work as expected.

Here are my observations:

  • Initially, I overwrote the old ClusterRole and ClusterRoleBinding (from the SCC PR) with the new one that has the use verb. I had to revert to the old ClusterRole and ClusterRoleBinding since the new one caused the workspaces to fail (to be expected, and mentioned in the PR description):
[aobuchow@fedora]$ oc get dw -n test-project2
NAME                DEVWORKSPACE ID             PHASE    INFO
container-build     workspacef0f6eaa7c319408e   Failed   operator does not have permissions to get the 'container-build' SecurityContextConstraints
container-build-2   workspacec01561da72cf40b0   Failed   operator does not have permissions to get the 'container-build' SecurityContextConstraints
  • After deploying the PR image, I saw that the old workspace role and devworkspacedw rolebinding were deleted:
{
   "level":"info",
   "ts":1667417292.5182207,
   "logger":"controllers.DevWorkspace",
   "msg":"deleted deprecated DevWorkspace Role",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build",
   "devworkspace_id":"workspace127b9a98d3f64ff8"
}
{
   "level":"info",
   "ts":1667417292.5728934,
   "logger":"controllers.DevWorkspace",
   "msg":"deleted deprecated DevWorkspace RoleBinding",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}
  • I then saw that the new Role was created but wasn't ready:
{
   "level":"info",
   "ts":1667417292.6531458,
   "logger":"controllers.DevWorkspace",
   "msg":"Created object",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "kind":"v1.Role",
   "name":"devworkspace-use-container-build"
}{
   "level":"info",
   "ts":1667417292.6531677,
   "logger":"controllers.DevWorkspace",
   "msg":"v1.Role devworkspace-use-container-build is not ready: Created object",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}
  • I then deleted both workspaces (which were initially started on a DWO image built from the main branch) and saw that the service account was successfully cleaned up, as well as the old finalizer and RBAC. The new Role and RoleBinding (devworkspace-use-<SCC>) were also deleted.
{
   "level":"info",
   "ts":1667417594.2655234,
   "logger":"controllers.DevWorkspace",
   "msg":"ServiceAccount clean up successful; clearing finalizer",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
   "level":"info",
   "ts":1667417594.32489,
   "logger":"controllers.DevWorkspace",
   "msg":"Finalizing DevWorkspace",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
   "level":"info",
   "ts":1667417594.3343432,
   "logger":"controllers.DevWorkspace",
   "msg":"deleted role devworkspace-use-container-build in namespace test-project2",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
   "level":"info",
   "ts":1667417594.3680623,
   "logger":"controllers.DevWorkspace",
   "msg":"Reconciling Workspace",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88",
   "resolvedConfig":""
}{
   "level":"info",
   "ts":1667417594.3680837,
   "logger":"controllers.DevWorkspace",
   "msg":"Finalizing DevWorkspace",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
   "level":"info",
   "ts":1667417594.3913825,
   "logger":"controllers.DevWorkspace",
   "msg":"deleted rolebinding devworkspace-use-container-build in namespace test-project2",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
   "level":"info",
   "ts":1667417594.4298122,
   "logger":"controllers.DevWorkspace",
   "msg":"Finalizing DevWorkspace",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
   "level":"info",
   "ts":1667417594.4702287,
   "logger":"controllers.DevWorkspace",
   "msg":"deleted role devworkspace-default-role in namespace test-project2",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
   "level":"info",
   "ts":1667417594.5233288,
   "logger":"controllers.DevWorkspace",
   "msg":"Finalizing DevWorkspace",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
   "level":"info",
   "ts":1667417594.5577114,
   "logger":"controllers.DevWorkspace",
   "msg":"deleted rolebinding devworkspace-default-rolebinding in namespace test-project2",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
   "level":"info",
   "ts":1667417594.6336832,
   "logger":"controllers.DevWorkspace",
   "msg":"Finalizing DevWorkspace",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}{
   "level":"info",
   "ts":1667417594.6337557,
   "logger":"controllers.DevWorkspace",
   "msg":"RBAC cleanup successful; clearing finalizer",
   "Request.Namespace":"test-project2",
   "Request.Name":"container-build-2",
   "devworkspace_id":"workspacebec9f4cdef3b4a88"
}
  • Then I overwrote the old ClusterRole and ClusterRoleBinding with the new one from the PR, and created 2 new workspaces that used the container-build SCC through the SCC workspace attribute
    • Both workspaces started up successfully. Their service accounts were created as well.
    • I saw that the devworkspace-use-container-build role and rolebinding were created
    • Deleting a single workspace resulted in the workspace's service account being removed from the rolebinding's subjects
    • Deleting both workspaces resulted in the role and rolebinding being deleted

\o/

if err := api.Client.Delete(api.Ctx, role); err != nil {
	return err
}
return &RetryError{fmt.Errorf("deleted deprecated DevWorkspace Role")}
Collaborator

I want to ask why this returns a RetryError when deletion succeeds. Is there a reason for restarting the reconciliation instead of continuing it?

Collaborator Author

It's the general flow we follow for most things. Here it's likely safe to continue the reconcile, but in general the operator works by doing one thing at a time (i.e. given a cluster state, what's the next step towards moving it towards the intended state?). In other words, we're treating "no deprecated DevWorkspace Role exists" as a precondition for continuing with creating the new DevWorkspace Role.

The benefit of doing this is that it should be possible to track what each reconcile did to the cluster (e.g. you would see "reconciling DevWorkspace (...) deleted deprecated DevWorkspace Role").

Though, you make a good point. We could probably get away with just logging that we deleted the role here. I'll update the PR.
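The "one thing at a time" flow described above can be sketched roughly as follows. This is an illustrative simplification, not the actual controller code; `RetryError` here is a minimal stand-in and `syncStep` is a hypothetical name:

```go
package main

import "fmt"

// RetryError signals that the reconcile made one change to the cluster and
// should be requeued to re-evaluate state from scratch (simplified).
type RetryError struct{ Err error }

func (e *RetryError) Error() string { return e.Err.Error() }

// syncStep performs at most one cluster mutation per call and returns a
// RetryError so the reconcile restarts from a known state, or nil when the
// desired state is already reached.
func syncStep(deprecatedRoleExists, newRoleExists bool) error {
	if deprecatedRoleExists {
		// Precondition: no deprecated DevWorkspace Role may exist.
		// The real controller would delete it here before returning.
		return &RetryError{Err: fmt.Errorf("deleted deprecated DevWorkspace Role")}
	}
	if !newRoleExists {
		// The real controller would create the new role here.
		return &RetryError{Err: fmt.Errorf("created DevWorkspace Role")}
	}
	return nil // desired state reached; continue the reconcile
}

func main() {
	fmt.Println(syncStep(true, false))        // first pass: remove deprecated role
	fmt.Println(syncStep(false, false))       // second pass: create new role
	fmt.Println(syncStep(false, true) == nil) // third pass: nothing left to do
}
```

Each requeue thus produces exactly one logged action, which is what makes the migration traceable in the controller logs.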

Collaborator Author

On further thought -- the tests use this error to verify migration is proceeding as expected, so I think it's better to leave for now. The current plan is to remove code related to cleaning up old resources around the DWO 0.19.0 or 0.20.0 timeframe.

Collaborator

I see thank you @amisevsk , I was just curious, sounds good 👍

@openshift-ci

openshift-ci bot commented Nov 2, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: amisevsk, AObuchow, dkwon17

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@amisevsk
Collaborator Author

amisevsk commented Nov 3, 2022

@AObuchow a few notes:

For some reason, doing a make docker install wasn't resulting in a pull of the new DWO image,

This is expected behavior from Kubernetes -- OpenShift/Kubernetes do not track image registries looking for new images, so it's necessary to run make restart to restart existing containers (which will, at that point, check for a new image). Note that make restart just rolls out a new deployment, which is how one triggers fetching a new image.

Before installing the PR image, while logged into kube:admin, I tried checking if the system service account could delete roles in the test-project and devworkspace-controller namespaces. Note that test-project is the namespace created by my secondary user, where the workspace role and devworkspacedw rolebinding were created:

I think you're misinterpreting the oc auth can-i command. The format is oc auth can-i <verb> <kind> [-n namespace] [--as <user>]. Here, <user> is system:serviceaccount:<namespace>:<sa-name>, where the system:serviceaccount:<namespace> prefix is how you refer to serviceaccounts in a cluster rather than regular users (cf. oc auth can-i delete roles --as aobuchow). To that end, asking whether system:serviceaccount:test-project:devworkspace-controller-serviceaccount can do anything will always return no, since that "user" doesn't exist -- there's no devworkspace-controller-serviceaccount in the test-project namespace. To check if the DWO SA can do something in a specific namespace (rather than the one you're currently in), you have to use the -n parameter:

oc auth can-i delete roles -n test-project --as system:serviceaccount:${NAMESPACE}:devworkspace-controller-serviceaccount

where ${NAMESPACE} is the DWO install namespace.

After installing the PR image, the two workspaces I had started, which use the container-build SCC through the SCC attribute, failed. Here is the log for one of the workspaces:

This is expected, as a precondition for using a specific SCC with your workspace is that the DWO serviceaccount is allowed to work with that SCC. The way Kubernetes RBAC works, even if you are allowed to create Roles/RoleBindings, you cannot use that privilege to grant permissions you do not hold yourself (e.g. I can't create a role that gives me cluster-admin unless I'm already cluster-admin). What's happening in this case is that DWO is attempting to give the DevWorkspace ServiceAccount permission to use the SCC (and thus run the workspace pod with that SCC instead of the default), but this is blocked because DWO itself cannot use that SCC. The ClusterRole/ClusterRoleBinding are required here.
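For context, the per-SCC permissions DWO provisions in the workspace namespace look roughly like the following sketch. This is an illustration, not the operator's exact manifest: the namespace and the workspace serviceaccount name are assumptions, and the role name assumes an SCC named container-build.

```yaml
# Sketch of the per-SCC Role/RoleBinding granting "use" on an SCC to a
# workspace serviceaccount; names and namespace are illustrative assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: devworkspace-use-container-build
  namespace: test-project
rules:
- apiGroups: ["security.openshift.io"]
  resources: ["securitycontextconstraints"]
  resourceNames: ["container-build"]
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: devworkspace-use-container-build
  namespace: test-project
subjects:
- kind: ServiceAccount
  name: workspace-sa          # the workspace's serviceaccount (hypothetical name)
  namespace: test-project
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: devworkspace-use-container-build
```

Creating this Role is exactly the step that fails when the DWO serviceaccount does not itself hold "use" on the SCC, since RBAC forbids granting permissions you do not hold.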

Contributor

@ibuziuk ibuziuk left a comment


@amisevsk not sure if this is expected, but I got the following error on an already-running workspace when updating DWO via make install from main to the version from this PR:

  message: >-
    Error provisioning rbac: roles.rbac.authorization.k8s.io
    "devworkspace-use-container-build" is forbidden: user
    "system:serviceaccount:devworkspace-controller:devworkspace-controller-serviceaccount"
    (groups=["system:serviceaccounts"
    "system:serviceaccounts:devworkspace-controller" "system:authenticated"]) is
    attempting to grant RBAC permissions not currently held:

    {APIGroups:["security.openshift.io"],
    Resources:["securitycontextconstraints"], ResourceNames:["container-build"],
    Verbs:["use"]}
  phase: Failed

Note that the DWO update happened while the workspace was still running.

@ibuziuk
Contributor

ibuziuk commented Nov 3, 2022

After some discussion, it was decided to postpone merging this PR, since it would break the existing implementation of .spec.devEnvironments.disableContainerBuildCapabilities at the Che Operator level.

The proposed merging cadence would be:

  • release 0.17.0 from main in its current state
  • update the SCC ClusterRole that che-operator provisions as part of the CR with the new permissions
  • merge this PR so that it lands in 0.18.0

@amisevsk
Collaborator Author

amisevsk commented Nov 3, 2022

To be specific, the breaking change in this PR is that the ClusterRole granted to the DevWorkspace Operator ServiceAccount is updated:

   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRole
   metadata:
     name: dwo-access-to-scc
     labels:
       test: dwo-scc-access
   rules:
   - apiGroups:
       - "security.openshift.io"
     resources:
       - "securitycontextconstraints"
     resourceNames:
       - "container-build"
     verbs:
      - "get"
    - "update"
+     - "use"

Since this functionality is in use by the Che Operator, we need to update that operator to grant the additional permission first; otherwise DWO 0.17.0 would be incompatible with Che 7.56.0.
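For completeness, the ClusterRole above only takes effect once it is bound to the DWO serviceaccount. A minimal sketch of such a binding follows; the subject's name and namespace are taken from the error message earlier in the thread, while the binding's own name is illustrative.

```yaml
# Sketch: bind the SCC-access ClusterRole to the DWO serviceaccount.
# The binding name here is an assumption.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dwo-access-to-scc
subjects:
- kind: ServiceAccount
  name: devworkspace-controller-serviceaccount
  namespace: devworkspace-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: dwo-access-to-scc
```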

@AObuchow
Collaborator

AObuchow commented Nov 3, 2022

@AObuchow a few notes:

Thank you very much for clearing up those points @amisevsk 🙏 I really should have read the documentation for oc auth can-i before using it, but now I know :D

@amisevsk amisevsk merged commit 25d533b into devfile:main Nov 7, 2022
@amisevsk amisevsk deleted the update-scc-config branch November 7, 2022 21:44
amisevsk added a commit to amisevsk/che-operator that referenced this pull request Nov 7, 2022
Add 'use' permissions in addition to 'get' and 'update' to be added to
the DevWorkspace Operator ServiceAccount when container build
functionality is enabled. This is required due to changes in the
DevWorkspace Operator in devfile/devworkspace-operator#954

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
amisevsk added a commit to eclipse-che/che-operator that referenced this pull request Nov 10, 2022
Add 'use' permissions in addition to 'get' and 'update' to be added to
the DevWorkspace Operator ServiceAccount when container build
functionality is enabled. This is required due to changes in the
DevWorkspace Operator in devfile/devworkspace-operator#954

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
amisevsk added a commit to amisevsk/devworkspace-operator that referenced this pull request Nov 10, 2022
Reapply changes from PR devfile#954 (commits ac944d5..25d533b) that had been
reverted (PR devfile#972)

The original PR was reverted due to required changes in the Che
Operator. As those are now applied, the original PR can be re-applied.

Signed-off-by: Angel Misevski <amisevsk@redhat.com>
amisevsk added a commit that referenced this pull request Nov 15, 2022
Reapply changes from PR #954 (commits ac944d5..25d533b) that had been
reverted (PR #972)

The original PR was reverted due to required changes in the Che
Operator. As those are now applied, the original PR can be re-applied.

Signed-off-by: Angel Misevski <amisevsk@redhat.com>