Canal containers give selinux related error message #1691
Exactly the same thing happened to me after updating a cluster from CentOS 7.6 to 7.7, leading me to believe that something changed in SELinux in the transition (I've checked their release notes and found nothing.) I "fixed" it by changing the network plugin and using plain flannel for the time being (which was... laborious) but, because of this, I still haven't upgraded CentOS on the other clusters. Also, see projectcalico/calico#2704. |
While trying to reproduce the problem using a couple of different cloud providers, I see that
This is causing problems with the install. Running @nheinemans and @carloscarnero could you check if this step resolves your problem? |
I have not upgraded to CentOS 8 yet. Instead, I observed the problem going from 7.6 to 7.7. Thank you @leodotcloud for looking into this! |
@carloscarnero Where are your machines running (cloud/on-prem)? Any steps to reproduce the problem? |
The following is the ---
cluster_name: development
nodes:
  - address: cfdd9f3c.example.com
    user: dockeruser
    role:
      - controlplane
      - etcd
      - worker
  - address: b5833011.example.com
    user: dockeruser
    role:
      - controlplane
      - etcd
      - worker
  - address: 307309d8.example.com
    user: dockeruser
    role:
      - controlplane
      - etcd
      - worker
network:
  plugin: canal
dns:
  provider: coredns
  upstreamnameservers:
    - 8.8.8.8
ingress:
  provider: none
system_images:
  etcd: example.com/rancher/coreos-etcd:v3.3.10-rancher1
  alpine: example.com/rancher/rke-tools:v0.1.50
  nginx_proxy: example.com/rancher/rke-tools:v0.1.50
  cert_downloader: example.com/rancher/rke-tools:v0.1.50
  kubernetes: example.com/rancher/hyperkube:v1.15.5-rancher1
  kubernetes_services_sidecar: example.com/rancher/rke-tools:v0.1.50
  pod_infra_container: example.com/rancher/pause:3.1
  kubedns: example.com/rancher/k8s-dns-kube-dns-amd64:1.15.0
  dnsmasq: example.com/rancher/k8s-dns-dnsmasq-nanny-amd64:1.15.0
  kubedns_sidecar: example.com/rancher/k8s-dns-sidecar-amd64:1.15.0
  kubedns_autoscaler: example.com/rancher/cluster-proportional-autoscaler:1.3.0
  coredns: example.com/rancher/coredns:1.3.1
  coredns_autoscaler: example.com/rancher/cluster-proportional-autoscaler:1.3.0
  flannel: example.com/rancher/coreos-flannel:v0.11.0-rancher1
  flannel_cni: example.com/rancher/coreos-flannel-cni:v0.3.0-rancher5
  calico_node: example.com/rancher/calico-node:v3.7.4
  calico_cni: example.com/rancher/calico-cni:v3.7.4
  calico_controllers: example.com/rancher/calico-kube-controllers:v3.7.4
  calico_ctl: example.com/rancher/calico-ctl:v2.0.0
  canal_node: example.com/rancher/calico-node:v3.7.4
  canal_cni: example.com/rancher/calico-cni:v3.7.4
  canal_flannel: example.com/rancher/coreos-flannel:v0.11.0
  weave_node: example.com/rancher/weave-kube:2.5.2
  weave_cni: example.com/rancher/weave-npc:2.5.2
  ingress: example.com/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
  ingress_backend: example.com/rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: example.com/rancher/metrics-server:v0.3.3

The nodes run CentOS 7.7, fully updated, and the documented requirements were taken into account during basic system configuration. SELinux is fully enabled, of course, and that is what prevents calico/canal from starting. |
Thanks @carloscarnero for sharing the detailed info. I will try to reproduce it on my end. I have one more question. Do you use the stock CentOS ISO to bring up the machines or do you have any other customizations done? |
I'm using the CentOS minimal install option, practically vanilla. Close to no customizations, except that I remove the firewalld service and install iptables, which is a fully supported option (besides, it has worked that way for close to two years). One more thing I have just discovered: I was wrong that this happened during the upgrade from CentOS 7.6 to 7.7. I just checked a non-upgraded cluster and it was already failing (fragment):
The above comes from a seven-node cluster, configured with the same settings as before, and you can see that five pods failed and two are running. I can be 100% certain that the OS settings are the same, as they're managed via Ansible. The logs for the failing pods show exactly the same message as in the opening post of this issue:
Every time the previous message pops up, there's a corresponding one on the SELinux audit log:
|
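For anyone correlating pod failures with the SELinux audit log as described above, here is a minimal sketch of pulling the relevant fields out of an AVC denial record. It assumes the usual `type=AVC ... avc:  denied  { ... }` line layout written by auditd. The function name `parse_avc_denial` and the sample line are hypothetical and made up for illustration; the sample is not the log entry from this issue.

```python
import re

# Matches the fields of an auditd AVC denial record that matter when
# checking which process was denied which permission on which object.
AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}'
    r'.*?\bcomm="(?P<comm>[^"]*)"'
    r'.*?\bscontext=(?P<scontext>\S+)'
    r'\s+tcontext=(?P<tcontext>\S+)'
    r'\s+tclass=(?P<tclass>\S+)'
)


def parse_avc_denial(line):
    """Return a dict describing an AVC denial, or None if the line is not one."""
    match = AVC_RE.search(line)
    if match is None:
        return None
    return {
        "permissions": match.group("perms").split(),
        "comm": match.group("comm"),
        "scontext": match.group("scontext"),
        "tcontext": match.group("tcontext"),
        "tclass": match.group("tclass"),
    }


# Hypothetical sample line, invented for illustration only.
SAMPLE_LINE = (
    'type=AVC msg=audit(1570000000.123:456): avc:  denied  { write } for  '
    'pid=1234 comm="sh" name="net.d" dev="dm-0" ino=5678 '
    'scontext=system_u:system_r:container_t:s0:c1,c2 '
    'tcontext=system_u:object_r:etc_t:s0 tclass=dir permissive=0'
)

if __name__ == "__main__":
    print(parse_avc_denial(SAMPLE_LINE))
```

Feeding each line of the audit log through this function and keeping the non-`None` results gives a quick list of which container processes were denied and on what kind of object.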
From the discussion in projectcalico/calico#2704 it seems that
is needed in order to properly handle SELinux systems. Thus, I edited the running canal daemonset with After saving, the pods immediately reached the running state, and no more errors were logged. Maybe this suggests that those lines are missing in the template? |
@leodotcloud I have tried the fix above in another different cluster, and it seems to work. |
RHEL8 support is tracked in rancher/rancher#23045. To validate the new templates (should show privileged true in the new templates and nothing in the old templates): Canal
Calico
|
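As a rough way to do the "privileged true" check described above without reading the manifest by eye, the sketch below reports, per container, whether `securityContext.privileged` is set in a DaemonSet object. It assumes the object has already been obtained as JSON (for example from `kubectl get daemonset canal -n kube-system -o json`) and loaded into a dict. The function names and the trimmed sample manifest are hypothetical and do not reproduce the actual RKE templates.

```python
def privileged_report(daemonset):
    """Map each container name in a DaemonSet to its privileged flag."""
    pod_spec = daemonset["spec"]["template"]["spec"]
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    return {
        c["name"]: bool((c.get("securityContext") or {}).get("privileged", False))
        for c in containers
    }


def unprivileged(daemonset):
    """List the containers that are NOT running privileged."""
    return [name for name, flag in privileged_report(daemonset).items() if not flag]


# Hypothetical, heavily trimmed manifest used only to exercise the functions.
SAMPLE = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "calico-node", "securityContext": {"privileged": True}},
                    {"name": "install-cni"},
                    {"name": "kube-flannel", "securityContext": {"privileged": False}},
                ]
            }
        }
    }
}

if __name__ == "__main__":
    print(privileged_report(SAMPLE))
    print(unprivileged(SAMPLE))
```

With the real DaemonSet loaded via `json.load`, an empty list from `unprivileged` would correspond to the "new template" state, and a non-empty list to the old one.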
@carloscarnero If you can test this change on some lab machines which are identical to the ones that were exhibiting the problem, that would be appreciated |
Reproduced the issue with RKE version v0.3.2 for Canal network plugin:
Tested with rke version v1.1.0-rc11.
Automation tests were also run on the above setups with Canal network plugin and no issues were found. |
@superseb I'm not clear what I should test. I mean... should I use rke v1.1.0-rc11? If that's the case, should I test against one of that version's supported K8s? EDIT: based on the previous comment, I will test with v1.1.0-rc11 and K8s1.15.10-rancher1-2. The operating system is CentOS 7.7, completely updated, with SELinux enabled and enforcing. This will take some time because all my setups are air-gapped and I have to prime the internal registry. |
Success using v1.1.0-rc11 and K8s1.15.10-rancher1-2 on CentOS 7.7 with enforcing SELinux! Note, however:
Next test is upgrading from 1.15.5 to 1.15.10, and will report back in this very comment to avoid further noise. EDIT: A cluster upgrade into 1.15.10 from 1.15.5 was successful! The canal pods are privileged and running properly. |
Thanks for testing |
RKE version:
0.3.0
Docker version: (docker version, docker info preferred)
Docker daemon.json:
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
NAME="Red Hat Enterprise Linux"
VERSION="8.0 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.0"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.0 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.0:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.0"
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Doesn't matter
cluster.yml file:
Steps to Reproduce:
rke up
When the cluster is built, I see problems with canal pods:
Looking into the cni-install pod, I see this error message:
Results:
The cluster doesn't work properly. Setting SELinux to permissive is not really an option.