
Trident should create a unique SCC when deploying itself to OpenShift #374

Closed

nccurry opened this issue Apr 8, 2020 · 16 comments

@nccurry commented Apr 8, 2020

Describe the bug
As of OpenShift 4.3.8 modifying default SCC objects (including adding arbitrary users and service accounts) will block cluster upgrades.

Instead of assigning itself to the default 'privileged' SCC, the Trident installer should create a separate SCC that contains just the permissions Trident needs to function.

https://bugzilla.redhat.com/show_bug.cgi?id=1821905
https://bugzilla.redhat.com/show_bug.cgi?id=1818893
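As a sketch of the requested behavior: rather than appending its service account to the stock privileged SCC, the installer could ship a dedicated SCC of its own. A rough, hypothetical manifest follows; the name, service account, and exact permission set are assumptions for illustration, not what the Trident installer actually generates:

```yaml
# Hypothetical dedicated SCC for Trident -- field values are assumptions.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: trident
allowPrivilegedContainer: true
allowHostDirVolumePlugin: true
allowHostIPC: true
allowHostNetwork: true
allowHostPID: true
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
users:
- system:serviceaccount:trident:trident-csi
volumes:
- '*'
```

Because this object is owned by the installer rather than the platform, the cluster-version operator's checks on the default SCCs would no longer flag it during upgrades.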

Environment
Provide accurate information about the environment to help us reproduce the issue.

  • Trident version: 20.01.1
  • Trident installation flags used: default
  • Container runtime: crio
  • Kubernetes version: 1.16.2
  • Kubernetes orchestrator: OpenShift 4.3.8
  • Kubernetes enabled feature gates: Default
  • OS: Red Hat CoreOS 43
  • NetApp backend types: Azure File, ONTAP Nas
  • Other:

To Reproduce
Deploy Trident into OpenShift 4.3.8 cluster
Attempt to upgrade OpenShift 4.3.8 -> 4.3.9

Expected behavior
OpenShift cluster upgrades

Additional context

@nccurry nccurry added the bug label Apr 8, 2020
@gnarl gnarl added the tracked label Apr 8, 2020
@markandrewj

Hello, I just wanted to say I am currently using OpenShift in a corporate environment, and we are affected by this at the moment.

@nccurry commented Apr 9, 2020

@markandrewj You can work around the issue by temporarily removing the Trident service account from the SCC. Once the upgrade has started, you can add it back.
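The temporary removal described above can be done with oc; a sketch, assuming the service account is trident-csi in the trident namespace (adjust both names to match your install):

```shell
# Remove the Trident service account from the privileged SCC's users list
# (names are assumptions; check with: oc get scc privileged -o yaml)
oc adm policy remove-scc-from-user privileged -z trident-csi -n trident

# ...trigger the upgrade, then restore access once it is underway:
oc adm policy add-scc-to-user privileged -z trident-csi -n trident
```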

@markandrewj

@nccurry We gave what you suggested a try, and the upgrade still wouldn't progress for us unfortunately.

@nccurry commented Apr 9, 2020

Try reinitiating the upgrade through either the web console or oc adm upgrade --to="4.3.9"

@markandrewj

We tried this too, unfortunately.

$ oc adm upgrade --allow-upgrade-with-warnings --to 4.3.9
Updating to 4.3.9

$ oc get clusterversion -o json|jq ".items[0].status.history"
[
  {
    "completionTime": null,
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:f0fada3c8216dc17affdd3375ff845b838ef9f3d67787d3d42a88dcd0f328eea",
    "startedTime": "2020-04-09T20:24:11Z",
    "state": "Partial",
    "verified": false,
    "version": "4.3.9"
  }
]

@gnarl (Contributor) commented Apr 9, 2020

We will have a fix out for this with the Trident 20.04 release due at the end of the month.

@megabreit

@markandrewj When the upgrade does not continue, you may need to wait a little longer and possibly restart it as suggested. Put the Trident user back into the SCC once all three kube-* operators are in the "Updating" state. I did this twice, with 4.3.8->4.3.9 and with 4.3.9->4.3.10; it took about 2-3 minutes after the update was triggered.
Thinking about it: maybe a better workaround would be to create a clone of the privileged SCC, add the Trident user to it, and remove it from the privileged SCC.
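The clone-the-SCC idea could be sketched roughly as follows; the file name, SCC name, service account, and namespace are all assumptions for illustration:

```shell
# Dump the stock SCC as a starting point (hypothetical file name)
oc get scc privileged -o yaml > trident-scc.yaml

# Edit trident-scc.yaml by hand:
#   - rename it (e.g. metadata.name: trident-privileged)
#   - drop server-set fields: resourceVersion, uid, creationTimestamp
#   - reduce the users list to just the Trident service account

oc apply -f trident-scc.yaml

# Finally, take the Trident account out of the default SCC so it is
# unmodified again (names are assumptions; adjust to your install):
oc adm policy remove-scc-from-user privileged -z trident-csi -n trident
```

This leaves the default privileged SCC pristine, which is what the upgrade check cares about.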

@markandrewj

I just wanted to say thanks to everyone for replying so quickly. My colleagues and I will be able to look into this again next week. If what was suggested doesn't work, waiting for the new release would be OK too.

@eparis commented Apr 14, 2020

We (OpenShift) will be working to fix this in 4.3.13 and GREATLY apologize for our screw up. It may still require --force to get to 4.3.13. This is being very actively investigated on our side.

We do suggest that Trident move to using RBAC to access SCCs, but we should not have broken what was working. We greatly appreciate your work to help address our (joint) customers' issues.

@markandrewj

@eparis Although it is unfortunate we hit this issue, I am happy to see active development around Trident. We started using it in OpenShift 3, and I had some concern that it was going to fall by the wayside in OpenShift 4. Thanks for the help, and keep up the good work.

@eparis commented Apr 14, 2020

FYI, you should be able to run:

oc adm upgrade --force --to=4.3.9

If it complains that it's already at 4.3.9, you might need to run:

oc adm upgrade --clear

Then try again with --force.

@markandrewj

Thanks for the information, we will give it a try. Out of curiosity, are there any plans to turn Trident into an Operator? It works pretty well as-is, so I don't know how much value there would be, but Operators seem to be the trend at the moment.

@markandrewj commented Apr 14, 2020

We tried what was suggested this afternoon, and our cluster is upgraded now. Thank you to everyone for the help.

@gnarl (Contributor) commented Apr 14, 2020

@markandrewj We are planning to have an Operator in the 20.04 release. We will also keep the existing installer until we are sure the Operator has all the needed functionality.

@gnarl (Contributor) commented Apr 28, 2020

This was fixed in the Trident 20.04 release.

@gnarl gnarl closed this as completed Apr 28, 2020
@markandrewj

Awesome guys! Thanks.
