
Add node status to API object. #2315

Merged: 1 commit, Dec 11, 2014

Conversation

@ddysher (Contributor, Author) commented Nov 12, 2014

@smarterclayton @lavalamp I'm not sure if this is enough to get it to work. Beat me up.

@smarterclayton Should I also take care of v1beta3?

@smarterclayton (Contributor)

Add it to v1beta3 (in the new form) under Node. I would recommend not taking on any more of the v1beta3 changes in this pull - we should do the internal refactor of Node next but that can build on top of this.

@smarterclayton (Contributor)

I think this looks ok otherwise - I haven't followed any other discussions about node conditions. Is the scheduler expected to only schedule on nodes with a certain condition? In another issue we discussed the admin being explicitly able to mark a node as not accepting newly scheduled pods - is that something that would be reflected in Condition, or separate? Seems like you might still schedule onto vanished or unhealthy nodes in some cases.

@ddysher (Contributor, Author) commented Nov 12, 2014

The Condition here is meant to replace the node health check. I don't know which issue you are referring to, but we can add another condition value, or a new field in NodeStatus, to let the admin disable a node.

In what cases would we schedule onto a vanished/unhealthy node? I think at first we can have the apiserver filter out unhealthy/vanished nodes. Then we can move to selector-based lists, so the scheduler can list only healthy nodes, kubectl can list all nodes, etc.?

@smarterclayton (Contributor)

On Nov 11, 2014, at 11:55 PM, Deyuan Deng notifications@github.com wrote:

The Condition here is meant to replace the node health check. I don't know which issue you are referring to, but we can add another condition value, or a new field in NodeStatus, to let the admin disable a node.

It would be part of spec in v1beta3 (set by the user), but I don't think we have to handle it here.

In what cases would we schedule onto a vanished/unhealthy node? I think at first we can have the apiserver filter out unhealthy/vanished nodes. Then we can move to selector-based lists, so the scheduler can list only healthy nodes, kubectl can list all nodes, etc.?

It can. Are we planning on rigorously defining, as part of the API spec, what Vanished means to a client, like we do for pods? Or is it vaguer? The precision of PodCondition is very important, whereas the definition of Vanished seems like it could be very flexible.


@bgrant0607 (Member)

I will take a look at this this morning.

@bgrant0607 bgrant0607 self-assigned this Nov 12, 2014
@bgrant0607 bgrant0607 added the area/api label Nov 12, 2014
// NodeUnhealthy means the Node is running but unhealthy
NodeUnhealthy NodeCondition = "Unhealthy"
// NodeVanished means the Node is not reachable
NodeVanished NodeCondition = "Vanished"
Member:

How do you plan to define "Vanished", and what behavior do you plan to attach to it?

ddysher (Author):

Here "Vanished" is defined as not reachable from the cluster, i.e. connection refused from the node's kubelet port. Ocasional network failure or machine restarts can make a node unreachable, so we'll need a policy surround the definition, like N out of M tries, at most no connection in X min, etc. I plan to enforce this on node-controller.

@dchen1107 (Member)

Besides the questions on Vanished raised by others, should we define a condition for a node in a staging state: healthy, but not ready to provide any services yet? Or does this belong to the kubelet readiness state?

@@ -576,6 +594,8 @@ type Minion struct {
HostIP string `json:"hostIP,omitempty" yaml:"hostIP,omitempty"`
// Resources available on the node
NodeResources NodeResources `json:"resources,omitempty" yaml:"resources,omitempty"`
// Status describes the current status of a Node
Status NodeStatus `json:"status,omitempty" yaml:"status,omitempty"`
Member:

This new status field is fine, but FYI:

  • We're trying to move our internal representation towards v1beta3.
  • There should eventually be just TypeMeta, ObjectMeta, NodeSpec, and NodeStatus in Minion/Node (roughly as sketched below).
  • NodeResources was added to the wrong place in v1beta3 and HostIP was apparently dropped (not sure whether that was intentional or not).
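
For reference, a rough sketch of the eventual shape described in the second point above (only the four top-level members come from the comment; the tags are illustrative):

// Sketch only: the eventual v1beta3-style shape of Minion/Node. TypeMeta,
// ObjectMeta, NodeSpec, and NodeStatus are assumed to be the existing/planned
// types in the same API package.
type Node struct {
	TypeMeta   `json:",inline" yaml:",inline"`
	ObjectMeta `json:"metadata,omitempty" yaml:"metadata,omitempty"`

	// Spec defines the desired configuration of the node.
	Spec NodeSpec `json:"spec,omitempty" yaml:"spec,omitempty"`
	// Status describes the most recently observed status of the node.
	Status NodeStatus `json:"status,omitempty" yaml:"status,omitempty"`
}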

ddysher (Author):

Thanks!

Is there any particular reason we still leave NodeResources here, or is it just not cleaned up yet?

Contributor:

I would assume whoever does the internal refactor should clean it up.

ddysher (Author):

SG. I can investigate this, but I don't fully understand how we organize our API versions. @smarterclayton

For the NodeCondition change, I need to update all 3 versions and the internal version, but for dropping HostIP, it seems we only did that in v1beta3. What's the rationale behind this?

Contributor:

@bgrant0607 Should NodeResources be split into Spec and Status (Capacity in Spec, whatever represents the current state in Status)?

Member:

Short answer is yes.

See resourceCapacitySpec in https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/resources.md. I updated it this week.

Status is more complicated. See Usage Data in the Appendix of resources.md.
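
A very rough sketch of the capacity half of that split, loosely following resources.md (ResourceList is assumed to be the resource-name-to-quantity map described there; usage data under status is deliberately left out):

// Sketch only: capacity belongs under the node's spec.
type NodeSpec struct {
	// Capacity is the total amount of each resource the node makes available
	// to pods (e.g. cpu, memory).
	Capacity ResourceList `json:"capacity,omitempty" yaml:"capacity,omitempty"`
}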

@bgrant0607 (Member)

Sorry, I haven't had time to follow node controller discussions, either.

@smarterclayton

Vanished probably comes from #1366.

We do eventually need a rigorous definition of vanished. There will be a standard definition, though it may be configurable per k8s cluster. As discussed in #1366, once a node is considered vanished, I'd like the node controller to kill the pods on that node. If we need more flexibility, we'd use forgiveness to override the default kill policy.

Schedulers should only schedule on Healthy nodes.

I expect to add more flavors of not schedulable in the future, such as Lame (node controller draining running pods gracefully -- deliberately unschedulable via API call to apiserver and/or to Kubelet), Disabled (deliberately kill pods and make unschedulable via API call to apiserver and/or to Kubelet), etc.

@bgrant0607 (Member)

@dchen1107 Good point. Not sure we want healthy, unhealthy, vanished to be the values of NodeCondition. Will think about this a bit.

@smarterclayton (Contributor)

Seems like something can be both disabled and unhealthy at the same time, e.g. when an ops team wants to take a node out of the scheduling pool and drain pods by deleting them slowly, but a temporary network problem occurs in the middle. Tooling shouldn't be confused about the state of the node if that happens.

@bgrant0607 (Member)

@smarterclayton Agree. For explicit disabling/laming, a NodeSpec field would be required.

@ddysher (Contributor, Author) commented Nov 13, 2014

@dchen1107 I believe we will eventually need such a staging state, especially when Kubernetes can scale nodes dynamically. At that point, we'll definitely need a rigorous definition, as the cutoff between staging and unhealthy can be subtle.

@bgrant0607 Thanks for reviewing, some comments:

  • Agreed with disabling/laming being another NodeSpec field. After moving the health check out of the minion registry, I'd like to implement a watch interface on the minion registry, which should make it easier to watch for any node spec changes.
  • Regarding @dchen1107's suggestion, why do you think healthy, unhealthy, and vanished may not be wanted?
  • The current NodeCondition is basically a rework of the minion health check. Is the Node lifecycle well enough understood in k8s that we can give a rigorous definition? If so, I can get on board with it; if not, can we finish the rework first?

@bgrant0607 (Member)

I think we want to distinguish lifecycle from recently observed behavior.

By analogy with pods, NodeCondition should reflect lifecycle. Proposed conditions could be Pending, Running, Shutdown.

Then similar to ContainerState, we could provide more detailed information, including liveness, readiness, and reachability, each with time of the last transition (e.g., live -> not live).

Any condition (in the general meaning of the word) that could have a fairly open-ended number of causes should have an associated Reason and possibly Message.

@ddysher (Contributor, Author) commented Nov 16, 2014

It's beyond the initial scope of NodeCondition, but I'll extend the idea here since it's desirable to have lifecycle management for nodes.

Pending and Running look good to me, but I'm not sure Shutdown is appropriate here. How would the controller know whether a node is shut down or not? I'd prefer Stopped or Vanished, which cover more possibilities. Also, I think we need an Unhealthy state for node status to be complete, though we could also fold it into the Stopped state. I've included a brief summary, with a rough Go sketch after it.

Pending

Pending means the node has been registered in Kubernetes but is not ready to accept new pods. Either the node-controller registers the node, or the kube admin does (via plugin or CLI). Pending is a transient state: a node in Pending will eventually become Running, Unhealthy, or Stopped.

Transition

  • If a node is just added to k8s, its condition becomes Pending.
  • If a node is re-enabled by the admin, it becomes Pending.
  • If, within X minutes, the node starts responding, it is marked as Running.
  • If, within X minutes, the node starts responding with errors, it is marked as Unhealthy.
  • After X minutes, the node is marked as Stopped.

Meta Data

  • Reason: This should probably be "Node not ready" or "Node being provisioned".

Running

Running means the node is running and ready to accept pods.

Transition
See others

Meta Data

  • StartedAt: The time the node started running.

Disabled

Disabled means the node is explicitly marked by the admin to not accept any pods.

Transition

  • An admin disables the node. When re-enabled, the node enters the Pending state.

Meta Data

  • DisabledAt: The time the node was disabled
  • Reason: The reason for disabling the node
  • Message: A detailed message.

Unhealthy

Unhealthy means the node is responding to health checks, but there is an error on the node.

Transition

  • If a node fails the health check due to an error status (OOM, etc.), it is marked as Unhealthy.
    • TODO: Define errors

Meta Data

  • LastHealthyAt: The latest time the node reported healthy
  • UnhealthyAt: The time the node became unhealthy.
  • Reason: The reason for the unhealthy state.
  • Message: A detailed message.

Stopped (Shutdown, Vanished)

Stopped means the node is not responding to health checks.

Transition

  • If a node fails the health check because it does not respond, it is marked as Stopped.
    • TODO: How about machine restart?

Meta Data

  • StoppedAt: The time the node stopped.
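
For concreteness, a rough Go sketch of the states summarized above (this is just the summary transcribed into code; none of the names or fields are final, and util.Time is assumed to be the API's existing time type):

// Sketch only: the states and per-state metadata from the summary above.
type NodeCondition string

const (
	NodePending   NodeCondition = "Pending"
	NodeRunning   NodeCondition = "Running"
	NodeDisabled  NodeCondition = "Disabled"
	NodeUnhealthy NodeCondition = "Unhealthy"
	NodeStopped   NodeCondition = "Stopped" // a.k.a. Shutdown / Vanished
)

type NodeStatus struct {
	Condition     NodeCondition
	StartedAt     util.Time // Running: when the node started running
	DisabledAt    util.Time // Disabled: when the admin disabled the node
	LastHealthyAt util.Time // Unhealthy: last time the node reported healthy
	UnhealthyAt   util.Time // Unhealthy: when the node became unhealthy
	StoppedAt     util.Time // Stopped: when the node stopped responding
	Reason        string    // short explanation, where applicable
	Message       string    // detailed message, where applicable
}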

@bgrant0607 (Member)

Lifecycle stages should be similar to pods. When created/added to the cluster, they start in a pending state, progress to an active state, and then end in a terminated state when the node is deleted or potentially when it disappears from the host provider (e.g., the VM is deleted from GCE). It shouldn't be possible to go back to a prior stage without re-creating the node object.

Everything else should be current status.

@bgrant0607 (Member)

I think we should have an array of status info, each with a kind of health, starting with Reachable, Live, Ready.

Additionally, in the spec, there should be a Schedulable field and a Runnable field. We'll want to make this fancier in various ways in the future.

Will address the rest in a bit.

@bgrant0607 (Member)

We should also represent Schedulable and Runnable in status, similar to the other properties, recording time of transition, reason, and message.
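
Taken together with the previous comment, a minimal sketch of what that could look like (the field and type names here are assumptions, not an agreed API):

// Sketch only: desired knobs in spec...
type NodeSpec struct {
	Schedulable bool `json:"schedulable,omitempty" yaml:"schedulable,omitempty"`
	Runnable    bool `json:"runnable,omitempty" yaml:"runnable,omitempty"`
}

// ...mirrored in status with transition time, reason, and message, alongside
// the observed properties (Reachable, Live, Ready).
type NodePropertyStatus struct {
	Kind               string // e.g. "Reachable", "Live", "Ready", "Schedulable", "Runnable"
	Value              bool
	LastTransitionTime util.Time
	Reason             string
	Message            string
}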

@bgrant0607 (Member)

Sigh. This discussion makes it clear what a mistake it was to use "Condition" for what is, essentially, a lifecycle phase (definition: a distinct period or stage in a process of change or forming part of something's development; or stage: a point, period, or step in a process or development).

At least we got Status right.

@smarterclayton @markturansky I feel bad for even considering this, but how would you feel about replacing Condition with Phase (which I prefer to Stage because the latter is used in other ways in deployment contexts)? It looks like Condition only appears in a couple dozen lines of code. That would allow us to repurpose Condition for things like Reachable, Live, Ready, Schedulable, Runnable.

@markturansky (Contributor)

@bgrant0607 yes, that PodCondition refactor was easy. I am happy to change it to PodPhase.

I'll have a new pull for that change and I'll be sure to reference this issue.

@ddysher (Contributor, Author) commented Nov 23, 2014

@bgrant0607 I'm not sure how Schedulable/Runnable fit into Reachable/Live/Ready in NodeStatus. We could also include each kind of unhealthy condition in the status info. So my interpretation of your suggestion is something like the following. (I didn't follow the Pod lifecycle discussion, so correct me if this is wrong.)

type NodeStatus struct {
    // NodePhase is the current phase of the Node, one of "Pending", "Running" and "Terminated". It
    // is populated based on the value of NodeCondition.
    Phase     NodePhase
    Condition NodeCondition
}

// NodeCondition is a detailed condition of the Node. Only one of the fields can be set at a given time.
type NodeCondition struct {
    UnReachable *NodeConditionUnReachable
    Reachable   *NodeConditionReachable
    Schedulable *NodeConditionSchedulable
    Live        *NodeConditionLive
    Ready       *NodeConditionReady
    NotReady    *NodeConditionNotReady
    ...
}

type NodeConditionUnReachable struct {
    Reason            string
    LastReachableTime Time
}
...

For example, when the node is unreachable and its LastReachableTime is unset, the node is Pending; otherwise, it's Terminated.
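
As an illustration of that last rule, a tiny sketch under the proposal above (the phase names are assumed, and Time is assumed to behave like time.Time):

// Sketch: derive the phase from the unreachable condition, per the example above.
func derivePhase(c NodeCondition) NodePhase {
	if c.UnReachable != nil {
		if c.UnReachable.LastReachableTime.IsZero() {
			return NodePending // never been reachable: still pending
		}
		return NodeTerminated // was reachable once, now gone
	}
	return NodeRunning
}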

The tricky part is how do we define different node conditions?

@bgrant0607 (Member)

The running phase may be populated based on condition, but the terminated phase should not be.

My proposal:

type NodeStatus struct {
    // NodePhase is the current lifecycle phase of the Node, one of "Pending", "Running" and "Terminated". 
    Phase     NodePhase
    Conditions []NodeCondition
}
type NodeConditionKind string
const (
  NodeReachable   NodeConditionKind = "Reachable"
  NodeLive        NodeConditionKind = "Live"
  NodeReady       NodeConditionKind = "Ready"
  NodeSchedulable NodeConditionKind = "Schedulable"
  NodeRunnable    NodeConditionKind = "Runnable"
)
type NodeCondition struct {
  Kind NodeConditionKind
  Status string // possible values: true, false, unknown
  LastTransitionTime util.Time
  Reason string
  Message string
}
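
As a usage example (not part of the proposal itself), a consumer such as the scheduler might look up a condition in the array form roughly like this:

// Sketch: find one condition kind in the proposed Conditions array.
func getCondition(conditions []NodeCondition, kind NodeConditionKind) *NodeCondition {
	for i := range conditions {
		if conditions[i].Kind == kind {
			return &conditions[i]
		}
	}
	return nil
}

// A scheduler could then consider only nodes whose Ready condition is "true".
func isReady(status NodeStatus) bool {
	c := getCondition(status.Conditions, NodeReady)
	return c != nil && c.Status == "true"
}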

@ddysher (Contributor, Author) commented Nov 25, 2014

OK, I think the idea is the same here: with NodeCondition.Status, unhealthy conditions are also included in the node status.

Currently, kubelet healthz is pretty weak; we cannot gather much information from it. The only information we can get is Reachable/Unreachable (whether the connection succeeds) and Ready/NotReady (whether the status is ok), so we'll start with a subset of the conditions.

I'm not sure what "Live" means here, or in general, the difference between Live/Reachable/Ready and the reason we include all them in node condition.

We'll want to make sure that, at any given time, all of the conditions exist for the node, so I'd say changing the slice to a map would make this easier to implement.

Lastly, I think we still need to populate NodePhase based on NodeCondition, even for the Terminated phase. NodePhase is really just a summary of NodeCondition, wrapped in our policy (which defines which combination of conditions belongs to which phase).

@kubernetes-bot

Can one of the admins verify this patch?

@bgrant0607 (Member)

Discussed IRL. Summary:

NodePhase should not be derived from NodeCondition -- I'm deliberately trying to keep them orthogonal. NodePhase is about provisioning status: Pending means the node has been created/added but not configured. (We need some mechanism to configure nodes on demand, to accommodate things like auto-scaled clusters.) Running means that it has been configured and has the Kubelet running. Terminated means the node has been removed from the cluster (e.g., its VM has been deleted).

I'm fine if we just start with one NodeCondition -- say, "Ready". We can add the others when we have more information.

Re. why array of conditions rather than map, see #2004.

I'm not completely opposed to aggregating default schedulability and runnability from NodeConditions, but as separate fields and not as part of NodePhase. However, such summary fields would become confusing later when we allow exceptions to the default policies, such as forgiveness (#1574).

@ddysher ddysher force-pushed the node-status branch 2 times, most recently from cf0f8b8 to 5d0fc30 on December 3, 2014
@ddysher (Contributor, Author) commented Dec 3, 2014

Updated PR based on discussion. Two changes made after @bgrant0607 's last comment:

  1. For NodeCondition, I start with "Reachable" and "Ready". The definitions are included in the code comments. For 'Reachable', we could probe at a lower level, but if HTTP is not working, the node is not reachable from the master's point of view anyway, so I'm sticking with HTTP for now.
  2. Created a new type for NodeCondition.Status

Now that we want to keep node phase separate from node condition, we definitely won't fold schedulability and runnability into NodePhase. The two properties can (I think) be summarized from NodeConditions, but we need to persist them to support forgiveness, e.g. we need to know when a node becomes unschedulable. I actually like the idea of having a separate field, but that should probably be discussed in another issue.
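
A minimal sketch of the HTTP check described in point 1 (the port, path, and helper names are assumptions, not the PR's actual code): "Reachable" means the kubelet answered at all, "Ready" means it answered /healthz with a 2xx status.

package nodecontroller

import (
	"fmt"
	"net/http"
	"time"
)

// probeNode sketches the HTTP-level check: any answer means Reachable,
// a 2xx answer on /healthz means Ready.
func probeNode(host string, port int) (reachable, ready bool) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(fmt.Sprintf("http://%s:%d/healthz", host, port))
	if err != nil {
		return false, false // connection failed: not reachable from the master
	}
	defer resp.Body.Close()
	return true, resp.StatusCode >= 200 && resp.StatusCode < 300
}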

@ddysher (Contributor, Author) commented Dec 9, 2014

@bgrant0607 What's the status of this? Are we going to merge it, or do we need more discussion?

@bgrant0607 (Member)

@ddysher Was busy all last week. Will try to look it over soon.

type NodeConditionStatus string

// These are valid condition status. "ConditionTrue" means node is in the condition;
// "ConditionFalse" means node is not in the condition; "ConditionUnknown" means kuernetes
Member:

typo: Kubernetes

@bgrant0607 (Member)

Looks pretty good. Thanks! Just a few minor comments.

@ddysher (Contributor, Author) commented Dec 11, 2014

Comments addressed. Thanks!

)

type NodeCondition struct {
Kind NodeConditionKind
Member:

Please add json and description tags to all the fields.

ddysher (Author):

Done.

Labels: area/api, area/nodecontroller
6 participants