Securing the cluster created by Juju #47835
Conversation
Hi @ktsakalozos. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `@k8s-bot ok to test`. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 24f4b2c to edf1478.
````diff
@@ -21,7 +21,8 @@ and then relate the `kubernetes-e2e` charm.
 ```shell
 juju deploy kubernetes-core
 juju deploy cs:~containers/kubernetes-e2e
-juju add-relation kubernetes-e2e kubernetes-master
+juju add-relation kubernetes-my-worker:kube-api-endpoint kubernetes-master:kube-api-endpoint
+juju add-relation kubernetes-my-worker:kube-control kubernetes-master:kube-control
````
`kubernetes-my-worker` -> `kubernetes-worker`? Pretty sure the worker in this example is coming from `juju deploy kubernetes-core`, hence it would get the default name.
Err, backing up -- this should be relating e2e to master, not worker to master
Yeap, my mistake. Thank you
```diff
@@ -44,9 +44,11 @@ def messaging():
 
     missing_services = []
     if not is_state('kubernetes-master.available'):
-        missing_services.append('kubernetes-master')
+        missing_services.append('kubernetes-master(http)')
```
I think HTTP here is the interface name, not the relation name. That's unclear. Perhaps scope it properly in Juju terms: `kubernetes-master:http`?
Done
```python
    if not is_state('certificates.available'):
        missing_services.append('certificates')
    if not is_state('kubeconfig.ready'):
        missing_services.append('kubernetes-master(kube-control)')
```
Same here, change the output to `kubernetes-master:kube-control`.
Done
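To make the final wording concrete, here is a small sketch of the missing-relation check with the Juju-scoped names requested above; the helper shape is assumed, only the scoped strings come from this thread.

```python
# Sketch only: missing-relation reporting with Juju-scoped names, as
# requested in the review comments. The real handler in the charm may differ.
from charms.reactive import is_state


def collect_missing_services():
    missing_services = []
    if not is_state('kubernetes-master.available'):
        missing_services.append('kubernetes-master:http')
    if not is_state('certificates.available'):
        missing_services.append('certificates')
    if not is_state('kubeconfig.ready'):
        missing_services.append('kubernetes-master:kube-control')
    return missing_services
```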
```python
    set_state('authentication.setup')


@when_not('leadership.is_leader')
@when_not('authentication.setup')
def setup_non_leader_authentication():
```
Could removing the `@when_not('authentication.setup')` decorator here result in a 5 minute delay?

- `setup_non_leader_authentication` handler runs
- `password_changed` handler runs, removes `authentication.setup` state
- reactive does not dispatch `setup_non_leader_authentication` handler again until the next hook

I'm having trouble reasoning through the specific outcome of that, but we've had similar problems with delays caused by using `is_state` in combination with state-based dispatch.
That's a good catch! The password update flow should go like this:

- User sets a new password
- Leader grabs the password and propagates it to the non-leaders
- Leader restarts
- Non-leaders get the new password and restart.

You will notice I added the `leadership.is_leader` on `password_changed()` so that only the leader acts on a password change.

To your question: I think the change of leadership data will trigger the reactive framework on the non-leaders in a timely fashion, so there will not be the 5 minute wait.
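To illustrate the timing point, here is a rough sketch (not the charm's actual code) of how leadership data can re-dispatch the non-leader handler without waiting for the next five-minute update-status hook. It assumes the reactive leadership layer's `leadership.changed.<key>` flags, plus a hypothetical `password` config option and `basic-auth-password` leader-data key.

```python
# Sketch only: hypothetical handlers showing how leadership data can
# re-trigger the non-leader side. Key and state names are assumptions.
from charms.reactive import when, when_not, set_state, remove_state
from charmhelpers.core import hookenv


@when('leadership.is_leader')
@when('config.changed.password')            # hypothetical config option
def password_changed():
    # The leader reads the new password and publishes it to the peers.
    hookenv.leader_set({'basic-auth-password': hookenv.config('password')})
    remove_state('authentication.setup')
    remove_state('kubernetes-master.components.started')


@when_not('leadership.is_leader')
@when('leadership.changed.basic-auth-password')   # set by the leadership layer
def setup_non_leader_authentication():
    # Non-leaders are dispatched as soon as the leadership data changes,
    # rather than waiting for the next update-status hook.
    password = hookenv.leader_get('basic-auth-password')
    # ... rewrite the auth files with `password` and restart services ...
    set_state('authentication.setup')
```

The `leadership.changed.*` flag fires on every unit as soon as the leader publishes new data, which is what makes the timely non-leader restart possible.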
```python
    controller_opts.add('service-account-private-key-file', service_key)

    remove_state('kubernetes-master.components.started')
```
Should `setup_leader_authentication` remove this state too?
Yes, we can do that.

On the leader we have two triggers that cause a reconfiguration of the authentication: 1) charm upgrade, 2) password change. In both triggers we do `remove_state('authentication.setup')` and `remove_state('kubernetes-master.components.started')`.

On the non-leaders the reconfiguration of the authentication is triggered after detecting a change on the leader. This is why I had the remove state there.

To make it homogeneous, I added the remove state to `setup_leader_authentication` and removed it from the upgrade and change-password triggers.
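A condensed, hypothetical view of the leader-side handler after that change; the real function also generates keys and token files, which are elided here.

```python
# Hypothetical condensed form of setup_leader_authentication(); only the
# state handling discussed above is shown.
from charms.reactive import when, when_not, set_state, remove_state


@when('leadership.is_leader')
@when_not('authentication.setup')
def setup_leader_authentication():
    # ... write basic_auth.csv, known_tokens.csv and the service-account key ...
    # Forcing the restart from here keeps the leader and non-leader paths
    # uniform, instead of doing it in the upgrade/password-change triggers:
    remove_state('kubernetes-master.components.started')
    set_state('authentication.setup')
```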
```python
    if not db.get(save_salt):
        db.set(save_salt, token)
    else:
        return db.get(save_salt)
```
Given that we're storing passwords in unitdata, what happens if we lose the leader unit? Will a password/token change when we wouldn't want it to?
It would, this isn't getting syndicated.
I wonder if it's better form, then, to instead use leader data to handle these values: keep them as a mutable dict object, mutated by the leader and published with `leader_set()` after each update, so any future leaders get the benefit of having this state tracked for them.
The last commit should address this. I got rid of the local db and everything is read from the token files after they have been synced with the leader.
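For reference, a hedged sketch of the leader-data pattern suggested in the comment above; this is not the code that was merged, and the key name and helper signature are made up for illustration.

```python
# Sketch of the suggested pattern only: the leader keeps generated tokens in
# Juju leader data so a future leader inherits them instead of regenerating.
import json
from charmhelpers.core import hookenv


def get_or_create_token(username, generate_token):
    """Return the token for `username`, creating and publishing it if needed."""
    tokens = json.loads(hookenv.leader_get('auth-tokens') or '{}')
    if username not in tokens:
        # Only the leader is allowed to call leader_set(); non-leaders read.
        tokens[username] = generate_token()
        hookenv.leader_set({'auth-tokens': json.dumps(tokens)})
    return tokens[username]
```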
```diff
       'cni.available', 'kubernetes-worker.restart-needed')
-def start_worker(kube_api, kube_control, cni):
+def start_worker(kube_api, kube_control, auth_control, cni):
```
This surprises me -- are we getting two copies of the kube-control interface here?
We are; each state that's set off of that interface passes the interface into the method. It got me too when I was prototyping.
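A hedged illustration of what is being described: each relation-backed state listed in `@when(...)` injects its interface object into the handler, so two states from `kube-control` yield two arguments referring to the same relation. The state names below are assumptions, not copied from the charm.

```python
# Illustrative only: two states from the kube-control interface each pass
# the interface into the handler, hence both kube_control and auth_control.
from charms.reactive import when


@when('kube-api-endpoint.available',
      'kube-control.dns.available',        # assumed kube-control state
      'kube-control.auth.available',       # assumed kube-control state
      'cni.available',
      'kubernetes-worker.restart-needed')
def start_worker(kube_api, kube_control, auth_control, cni):
    # kube_api, kube_control, auth_control and cni are injected in the order
    # of the relation-backed states above; the plain restart-needed state
    # contributes no argument.
    ...
```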
```diff
     kubelet_opts.add('v', '0')
     kubelet_opts.add('address', '0.0.0.0')
     kubelet_opts.add('port', '10250')
     kubelet_opts.add('cluster-dns', dns['sdn-ip'])
     kubelet_opts.add('cluster-domain', dns['domain'])
-    kubelet_opts.add('anonymous-auth', 'false')
+    kubelet_opts.add('anonymous-auth=false', None)
```
What happened here? If `kubelet_opts.add('anonymous-auth', 'false')` didn't work then we should probably fix that in FlagManager rather than work around it here.
I misunderstood how the snap options work. `anonymous-auth=false` will be ignored because it is not part of the config params. However, if I set `anonymous-auth` to false it will not make it to the arguments list and it will take the default value of true. Opened this issue: https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/314

I am going to remove the anonymous-auth option for now so as not to cause any confusion.
Thanks for the submission, Konstantinos. Overall I think this is a great first-draft submission of the fix, but there are some serious questions posed here relating to state distribution and upgrade paths for existing users who have certificate auth enabled (all of our existing deployments to date).
We'll need to sort those satisfactorily before I'm comfortable acking this change.
The majority of what's here is excellent though and I look forward to giving this a second round review once the requested modifications have been made.
```python
    # Set permissions on the ubuntu users kubeconfig to ensure a consistent UX
    cmd = ['chown', 'ubuntu:ubuntu', kubeconfig_path]
    check_call(cmd)

    messaging()
```
Don't do this. You're making an explicit invocation of a method that is controlled by a decorator.

What I suggest is to move the messaging method into a non-decorated space in the source file, and write a new method declaration that invokes `messaging()` while the `kubernetes.e2e.installed` state is present; you can then do this `messaging()` invocation inline like you have it (for a guarantee of it being invoked when you expect it to be).
Done
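A hedged sketch of that refactor: the body lives in an undecorated helper that can be called directly, while a thin decorated wrapper keeps the reactive dispatch. The status strings and checks are placeholders; only the state name comes from the comment above.

```python
# Sketch of the suggested structure only; messages and checks are placeholders.
from charms.reactive import when, is_state
from charmhelpers.core import hookenv


def messaging():
    """Undecorated helper: safe to call directly, e.g. after writing kubeconfig."""
    missing_services = []
    if not is_state('kubernetes-master.available'):
        missing_services.append('kubernetes-master:http')
    if missing_services:
        hookenv.status_set('blocked',
                           'Missing relations: ' + ', '.join(missing_services))
    else:
        hookenv.status_set('active', 'Ready to test.')


@when('kubernetes.e2e.installed')
def report_status():
    messaging()
```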
```diff
@@ -781,7 +830,6 @@ def configure_master_services():
     api_opts.add('service-cluster-ip-range', service_cidr())
     api_opts.add('min-request-timeout', '300')
     api_opts.add('v', '4')
-    api_opts.add('client-ca-file', ca_cert_path)
```
I agree that we need to remove this, but I don't think this does anything for existing deployments. Does this need to be explicitly checked and removed from the file? (I didn't notice that anywhere else in the code.)
We remove the certificate from the kubeconfig file by un-setting the users section. Is this what you would expect? (https://github.com/kubernetes/kubernetes/pull/47835/files#diff-25f2686437984f527090205ba417c242R158) Do you have something else in mind?
I think the concern is that the `client-ca-file` entry might not be cleared from api_opts when upgrading an older deployment.

Thankfully, all flags get cleared during an upgrade-charm, so we're good: https://github.com/juju-solutions/kubernetes/blob/55525773cec50fa7510a9801c5c763e6ec6ad2ac/cluster/juju/layers/kubernetes-master/reactive/kubernetes_master.py#L139-L141

Don't know why that's in `migrate_from_pre_snaps`; it really doesn't belong there. But hey, at least it's happening!
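For context, a hedged sketch of the kind of flag reset being referred to: clearing the accumulated flag data on `upgrade-charm` so a stale `client-ca-file` entry cannot survive the upgrade. The `FlagManager` import path and its `destroy_all()` method are assumed from the charm's helpers rather than verified here.

```python
# Sketch only: clear persisted flag data during upgrade-charm so options
# removed from the code (such as client-ca-file) do not linger in api_opts.
from charms.reactive import hook, remove_state
from charms.kubernetes.flagmanager import FlagManager  # assumed import path


@hook('upgrade-charm')
def reset_accumulated_flags():
    for component in ('kube-apiserver',
                      'kube-controller-manager',
                      'kube-scheduler'):
        FlagManager(component).destroy_all()  # assumed FlagManager API
    remove_state('kubernetes-master.components.started')
```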
```diff
@@ -463,13 +465,12 @@ def configure_worker_services(api_servers, dns, cluster_cidr):
     kubelet_opts.add('require-kubeconfig', 'true')
     kubelet_opts.add('kubeconfig', kubeconfig_path)
     kubelet_opts.add('network-plugin', 'cni')
-    kubelet_opts.add('logtostderr', 'true')
```
I'm not sure why this was removed. Was this intentional?
The default value is already true:

`--logtostderr    log to standard error instead of files (default true)`

Adding it back.
@k8s-bot ok to test

I'm +1 to this, not seeing any problems after the recent changes. @chuckbutler can you give it another look too?
I've been following along and I'm +1 to this. /lgtm
/approve no-issue
Ah, I see there's still some dust settling here with respect to relationship scoping. I'm going to wait until you've signalled this PR is ready for re-review, @ktsakalozos.
@marcoceppi @chuckbutler I had to update this PR to align it with the changes done on the kube-control interface. Retested again and it looks strong. Thank you.
Since we got all comments addressed, I am squashing the commits.
Force-pushed from 866c292 to 0b01cd7.
/assign @Cynerva
@Cynerva: GitHub didn't allow me to assign the following users: cynerva. Note that only kubernetes members can be assigned. In response to this:

> /assign @Cynerva

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cynerva, chuckbutler, ktsakalozos, marcoceppi

Associated issue requirement bypassed by: marcoceppi

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing `/approve` in a comment.
/retest
Automatic merge from submit-queue (batch tested with PRs 47850, 47835, 46197, 47250, 48284)
What this PR does / why we need it: This PR secures the deployments done with Juju master. It works around certain security issues inherent to kubernetes (see, for example, dashboard access).
Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:
Release note: