Node labels are lost after Deregister/Register Node events #19899

Closed
skynardo opened this issue May 31, 2018 · 4 comments · Fixed by #20663
@skynardo

After stopping and starting a Node (an AWS EC2 instance), the node labels logging-infra-fluentd=true and node-role.kubernetes.io/compute=true are missing once the node re-registers and becomes Ready.

Version

openshift v3.9.0+ba7faec-1
kubernetes v1.9.1+a0ce1bc657

Steps To Reproduce
  1. oc get node ip-177-77-77-100.ec2.internal --show-labels
    NAME STATUS ROLES AGE VERSION LABELS
    ip-177-77-77-100.ec2.internal Ready compute 6d v1.9.1+a0ce1bc657 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-1,failure-domain.beta.kubernetes.io/zone=us-east-1a,kubernetes.io/hostname=ip-177-77-77-100.ec2.internal,logging-infra-fluentd=true,node-role.kubernetes.io/compute=true,region=primary,zone=default
  2. Shut down EC2 Instance
  3. Start up EC2 Instance
  4. Run oc get node ip-177-77-77-100.ec2.internal --show-labels
Current Result

oc get node ip-177-77-77-100.ec2.internal --show-labels
NAME STATUS ROLES AGE VERSION LABELS
ip-177-77-77-100.ec2.internal Ready 1m v1.9.1+a0ce1bc657 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-1,failure-domain.beta.kubernetes.io/zone=us-east-1a,kubernetes.io/hostname=ip-177-77-77-100.ec2.internal,region=primary,zone=default

Expected Result

We expect the labels logging-infra-fluentd=true and node-role.kubernetes.io/compute=true to still be present after the reboot.
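
Until the fix lands, a manual workaround (not part of the original report) is to re-apply the missing labels after the node re-registers, for example:

oc label node ip-177-77-77-100.ec2.internal logging-infra-fluentd=true node-role.kubernetes.io/compute=true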

Additional Information
@jwforres
Member

jwforres commented Jun 7, 2018

@openshift/sig-pod

@DanyC97
Contributor

DanyC97 commented Jun 8, 2018

@skynardo how did you set the node labels in the first place?

@skynardo
Author

skynardo commented Jun 8, 2018

We ran the playbook below after upgrading to version 3.9 to deploy OpenShift logging; these playbooks must set logging-infra-fluentd=true on all nodes.
/usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml

We also ran the upgrade playbook below, which sets node-role.kubernetes.io/compute=true:
/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_9/upgrade.yml

We also have group_vars files to set OpenShift roles for nodes and masters (shown below).
These help the installer set Role=compute for worker nodes and Role=master for master nodes.

cat tag_OpenShift_Role_node

openshift_schedulable: true
openshift_node_labels:
  region: primary
  zone: default
openshift_node_group_name: node-config-node

cat tag_OpenShift_Role_master

openshift_schedulable: true
openshift_node_labels:
  region: master
  zone: default
openshift_node_group_name: node-config-master

@sjenning
Contributor

sjenning commented Jun 28, 2018

This is a known issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1559271

Upstream discussions:
kubernetes/kubernetes#45986
kubernetes/kubernetes#46442

The cause of the label loss is that the cloud node controller deletes the Node object when the instance that backs it is shut down.
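
For context, the deletion described above looks roughly like the sketch below. This is an illustrative Go sketch only, not the actual controller or cloud provider code; the interface and type names are assumptions.

```go
package main

import "fmt"

// cloudInstances is a simplified stand-in for the cloud provider's instances
// API used by the node controller (an assumption for illustration, not the
// real cloudprovider interface).
type cloudInstances interface {
	// InstanceExists reports whether the cloud still knows about the
	// instance backing the given node.
	InstanceExists(nodeName string) (bool, error)
}

// stoppedAWSInstance models the pre-fix AWS behavior: a stopped (not
// terminated) EC2 instance is reported as no longer existing.
type stoppedAWSInstance struct{}

func (stoppedAWSInstance) InstanceExists(nodeName string) (bool, error) {
	return false, nil // pre-fix: stopped looked the same as deleted
}

// monitorNode mimics the decision described above: if the cloud says the
// backing instance is gone, the Node object is deleted, and any labels or
// taints that were added by hand go with it.
func monitorNode(cloud cloudInstances, nodeName string) {
	exists, err := cloud.InstanceExists(nodeName)
	if err != nil {
		fmt.Println("error checking instance:", err)
		return
	}
	if !exists {
		fmt.Printf("deleting Node %s (labels and taints are lost)\n", nodeName)
		return
	}
	fmt.Printf("keeping Node %s\n", nodeName)
}

func main() {
	monitorNode(stoppedAWSInstance{}, "ip-177-77-77-100.ec2.internal")
}
```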

BenTheElder pushed a commit to BenTheElder/kubernetes that referenced this issue Sep 1, 2018
Automatic merge from submit-queue (batch tested with PRs 67571, 67284, 66835, 68096, 68152). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

cloudprovider: aws: return true on existence check for stopped instances

xref https://bugzilla.redhat.com/show_bug.cgi?id=1559271
xref openshift/origin#19899

background kubernetes#45986 (comment)

Basically our customers are hitting this issue where the Node resource is deleted when the AWS instances stop (not terminate).  If the instances restart, the Nodes lose any labeling/taints.

Openstack cloudprovider already made this change kubernetes#59931

fixes kubernetes#45118 for AWS

**Reviewer note**: valid AWS instance states are `pending | running | shutting-down | terminated | stopping | stopped`.  There might be a case for returning `false` for instances in `pending` and/or `terminated` state.  Discuss!

`InstanceID()` changes from kubernetes#45986 credit @rrati 

@derekwaynecarr @smarterclayton @liggitt @justinsb @jsafrane @countspongebob
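
To make the reviewer note above concrete, here is a rough Go sketch of the state-based existence check the PR describes. It is illustrative only; the function name and structure are assumptions, not the actual AWS cloud provider code.

```go
package main

import "fmt"

// instanceExistsForState sketches the post-fix check described in the PR: of
// the valid EC2 states (pending | running | shutting-down | terminated |
// stopping | stopped), only "terminated" is treated as gone, so stopping an
// instance no longer causes its Node object (and labels) to be deleted.
func instanceExistsForState(state string) bool {
	switch state {
	case "terminated":
		return false
	case "pending", "running", "shutting-down", "stopping", "stopped":
		return true
	default:
		// Unknown state: keep the Node (my assumption, not from the PR).
		return true
	}
}

func main() {
	for _, s := range []string{"running", "stopped", "terminated"} {
		fmt.Printf("state=%-13s exists=%v\n", s, instanceExistsForState(s))
	}
}
```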