
Add apis for machine-class #488

Merged
merged 2 commits into kubernetes-sigs:master
Oct 22, 2018

Conversation

hardikdr
Member

What this PR does / why we need it: Adds APIs for MachineClass. A MachineClass is a way of externalizing the ProviderConfig from the MachineSpec by reference.
This PR is basically re-initiating the work done here: kubernetes-retired/kube-deploy#659

  • I will migrate the review-comments from that PR to here.

Which issue(s) this PR fixes: This PR partially resolves #22.

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:

Added apis for machine-class
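
For context, a minimal Go sketch of the shape this PR proposes, assuming the field names quoted in the review threads below (ProviderConfig, ProviderConfigSource, MachineClassRef, Provider); this is an illustration of the design under discussion, not the exact merged code:

package v1alpha1

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// MachineClass externalizes provider-specific configuration so it can be
// shared across multiple Machines / MachineSets / MachineDeployments.
type MachineClass struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// ProviderConfig holds the opaque, provider-specific configuration
	// that would otherwise be inlined in every MachineSpec.
	ProviderConfig runtime.RawExtension `json:"providerConfig"`
}

// ProviderConfigSource lets a MachineSpec pull its ProviderConfig from a
// MachineClass by reference instead of inlining it.
type ProviderConfigSource struct {
	// +optional
	MachineClass *MachineClassRef `json:"machineClass,omitempty"`
}

// MachineClassRef points a Machine at the MachineClass that sources its
// provider config.
type MachineClassRef struct {
	corev1.ObjectReference `json:",inline"`

	// Provider is the name of the cloud-provider the class is intended for.
	// +optional
	Provider string `json:"provider,omitempty"`
}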

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 29, 2018
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 29, 2018
@@ -39,5 +42,25 @@ type ProviderConfig struct {
// ProviderConfigSource represents a source for the provider-specific
// resource configuration.
type ProviderConfigSource struct {
Member Author
@hardikdr hardikdr Aug 29, 2018

mvladev: https://github.com/kubernetes/kube-deploy/pull/659/files#r177866614

roberthbailey: I don't think it should be necessary to specify the capacity / allocatable on a single machine if you inline the provider config. I think of the class as a separate way to bring in the provider-specific data, not something inherent to the spec of the machine.

Contributor

/cc @mvladev - additional thoughts on this?

Contributor

@roberthbailey does that mean machine class is just a collection of provider specific configuration?
Advantages of machine classes:

  • less configuration lines in the machine objects by keeping common bits in a machine class object

Any other?

Member Author

less configuration lines in the machine objects by keeping common bits in a machine class object

  • more precisely, externalize the providerConfig to separate objects.
  • as discussed in the wg-call, it could also be used for defaulting/versioning via apiserver-webhooks, which would be hard to achieve with only inlined config.

}

// MachineClassRef is a reference to the MachineClass object. Controllers should find the right MachineClass using this reference.
type MachineClassRef struct {
Member Author

mvladev: What about namespace?

roberthbailey: Storage classes aren't namespaced, and I used that same pattern here. But it's worth discussing whether they should be part of a namespace.

Member Author

I have entertained namespaces; they should be helpful, especially keeping in mind the sensitivity of the data in machine-classes and our approach of isolating clusters based on namespaces.


StorageClasses are basically templates. Here, we're using namespaces for isolating actual clusters. Do we need to also isolate machine templates? I don't see a need for a namespace here.

Member Author

As mentioned, machine-classes going forward are expected to carry details specific to machines and in turn clusters, as well as sensitive information about the underlying infra. Separating machine-classes via namespaces will help avoid unintended consumers.

Contributor

I tend to agree with @hardikdr on keeping machine class as a namespaced object.
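
To make the namespacing decision concrete, a hypothetical sketch of how a controller could resolve the reference, defaulting to the Machine's own namespace (resolveClassKey is an illustrative helper using the MachineClassRef sketched above, not code from this PR):

// resolveClassKey returns the namespace/name a controller would look up.
// Defaulting to the namespace of the referencing Machine keeps classes,
// and any sensitive infra data they hold, isolated per namespace.
func resolveClassKey(machineNamespace string, ref *MachineClassRef) (string, string) {
	ns := ref.Namespace // promoted from the embedded ObjectReference
	if ns == "" {
		ns = machineNamespace
	}
	return ns, ref.Name
}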

// parameters is 512, with a cumulative max size of 256K.
// TODO(roberthbailey): Should this be a json-patch?
// +optional
Parameters map[string]string `json:"parameters,omitempty"`
Member Author

krousey: What kinds of things is this meant to override? Capacity? Allocatable? Or stuff in the provider config? All of these seem like they could be deeply nested structures that a map would not be able to fully express.

If it's the ProviderConfig, perhaps we should consider JSON patches?

roberthbailey: My thinking was in the provider config (e.g. parameterize the zone of a class to stamp out identical machine sets in multiple zones).
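
For concreteness, the per-zone stamping being discussed (and ultimately dropped) might have looked roughly like this; the placeholder syntax and values are purely hypothetical:

package main

import (
	"fmt"
	"strings"
)

func main() {
	// A class-level provider config template with a zone placeholder.
	template := "project: \"my-project\"\nzone: \"{{zone}}\""

	// Stamp out otherwise-identical configs across zones by overriding
	// a single parameter per MachineSet.
	for _, zone := range []string{"us-central1-a", "us-central1-b"} {
		fmt.Println(strings.ReplaceAll(template, "{{zone}}", zone))
		fmt.Println("---")
	}
}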

Contributor

I think we should leave this out for now. If we find a need for it, we can figure out how to add it later (templating vs. overlays will be contentious). But until there is a concrete need, less is more.

// this field is consistent with the underlying machine that will
// be provisioned when this class is used, to inform higher level
// automation (e.g. the cluster autoscaler).
Allocatable corev1.ResourceList `json:"allocatable"`
Member Author

krousey: From what I understand, this is the capacity minus the amount that kubelet is going to reserve on the node. I don't think you can know what that reserve is going to be just given a machine class. I know for GKE, we vary the reserve for each release, and possibly by machine size.

Perhaps the best way to represent this would be to officially model the Kubelet reserved resources in the MachineSpec (and therefore MachineSet and MachineDeployment). If we did that, we could drop Allocatable here, and autoscalers of deployments could pick the capacity from machine class and subtract reserved from machine spec.

mvladev: I agree with @krousey. There is no way to calculate in advance the allocated resources - the Machine class doesn't have a knowledge of which kubelet version / container runtime you are going to use and those can affect the kubelet's --kube-reserved and --system-reserved flags.

roberthbailey: The intent is that this would indicate to the cluster autoscaler how much actual capacity would exist once the "node allocatable" overhead was subtracted. The fact that the overhead varies by version makes putting this variable here... difficult, since it would tightly couple a machineclass to a specific k8s version if you wanted to adhere to the warning text.

@krousey - Is your suggestion of putting reserved in the machine spec to put it next to the kubelet version, since it is tightly coupled with it?

// TODO: should this use an api.ObjectReference to a 'MachineTemplate' instead?
// A link to the MachineTemplate that will be used to create provider
// specific configuration for Machines of this class.
// MachineTemplate corev1.ObjectReference `json:"machineTemplate"`
Member Author

mvladev: Why is this needed?

roberthbailey: I added this comment as an alternate design -- one thing we'd thought about was splitting this data across two objects (class + template) and the template could potentially be used directly by machine in addition to machine class. Not sure if that's valuable or not though, so I put this here to foster discussion.

Contributor

+1 for keeping capacity and allocatable as MachineStatus fields. The cluster-api controller would initialize these fields once the link with the kubelet node is established. These are "observed" values and may vary from machine to machine due to discovery failures, hardware failures, etc. Therefore whatever is actually observed at the kubelet node should be updated in MachineStatus.

+1 for keeping taints in MachineClassSpec. Taints are generally used for scheduling-domain isolation and therefore generally apply to a bunch of machines.
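
A hypothetical sketch of that alternative, with the observed values living on MachineStatus rather than on the class (field names illustrative, not the merged API):

import corev1 "k8s.io/api/core/v1"

// The controller fills these in once the link to the kubelet node is
// established, so per-machine variation (discovery or hardware failures)
// is reflected in what was actually observed.
type MachineStatus struct {
	// +optional
	Capacity corev1.ResourceList `json:"capacity,omitempty"`
	// +optional
	Allocatable corev1.ResourceList `json:"allocatable,omitempty"`
}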

@hardikdr hardikdr changed the title [WIP] Add apis for machine-class Add apis for machine-class Aug 30, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2018
@roberthbailey
Contributor

/ok-to-test
/assign @roberthbailey

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 31, 2018
Contributor

@roberthbailey roberthbailey left a comment

It might be worth waiting to merge this until the kubebuilder stuff lands. I know it will mean a significant rebase, but I think the important thing is to agree on the initial api types. Hopefully putting those into the kubebuilder framework will be straightforward.



// MachineClassRef is a reference to the MachineClass object. Controllers should find the right MachineClass using this reference.
type MachineClassRef struct {
// +optional
metav1.ObjectMeta `json:"metadata,omitempty"`
Contributor

Isn't there a "reference" type that we should use instead of objectmeta? Something like LocalReference or ObjectReference?

Member Author
@hardikdr hardikdr Sep 10, 2018

Yes, my bad; I made it an ObjectReference.
I also removed the Parameters for now.

// across multiple Machines / MachineSets / MachineDeployments.
// +k8s:openapi-gen=true
// +resource:path=machineclasses
type MachineClass struct {
Contributor

If we leave out both allocatable and capacity, then machine class == provider config. I think it's important that we address @mvladev's comment above about whether we should have both of those fields on all machines (via an inline class) or only available to the autoscaler on machines that were created via a reference to a machine class.

Member Author

Agreed. I believe these decisions will be easier to make once the cluster-autoscaler requirements are clearer. Shall we evolve the APIs for such fields as and when needed?
See: initial work has started here kubernetes/community#2653

Contributor

I'm not sure if allocatable and capacity are something which should be put in the status of a machine. How can one define allocatable?
At the moment this is calculated by summing kube-reserved, system-reserved and eviction-thresholds https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable
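
The arithmetic in question, as a sketch (single quantities for brevity; the real kubelet logic, linked in a later comment, works over per-resource lists and eviction thresholds):

import "k8s.io/apimachinery/pkg/api/resource"

// allocatable = capacity - kubeReserved - systemReserved - evictionHard.
// Because the reserved terms depend on kubelet version and flags, this
// cannot be known from a machine class alone.
func allocatable(capacity, kubeReserved, systemReserved, evictionHard resource.Quantity) resource.Quantity {
	a := capacity.DeepCopy()
	a.Sub(kubeReserved)
	a.Sub(systemReserved)
	a.Sub(evictionHard)
	return a
}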

Contributor

We are going to chat with some folks that work on the autoscaler tomorrow; maybe that will help clarify what is needed here.

Member
@enxebre enxebre Sep 12, 2018

Code on how allocatable is calculated is here https://github.com/kubernetes/kubernetes/blob/426ef9d349bb3a277c3e8826fb772b6bdb008382/pkg/kubelet/cm/node_container_manager.go#L174:33
We could assume a reasonable margin so allocatable is set based on the given capacity when creating a new machineClass. This would be used for the case of scaling out from zero; otherwise the info from running machines/nodes would be used.

Also, should taints belong to the machineClass?


It seems strange to have allocatable and capacity as part of machineclass. Is this defining how many pods we're going to allow to run on this machine? How can we know that without knowing the size of the OS and additional software stack? Shouldn't we let the control plane calculate that?

Member Author

Yes, these fields are expected to be available only after a certain calculation, but the autoscaler would probably ask the machine-api stack for them. We are in talks with the autoscaler folks to decide the right place/design, and hence I have intentionally kept those fields out of the first cut.

Member Author
@hardikdr hardikdr Sep 13, 2018

@enxebre I am not sure about the specific requirements around taints, but I would expect taints to be available on MachineObjects rather than on the class. The class should be seen more as a representation of a set of machines under a MachineDeployment/MachineSet, where a subset of them may or may not have taints; that is best expressed by taints on specific MachineObjects.

metadata:
  name: small
  namespace: foo
capacity:
Contributor

leave out capacity / allocatable since those are currently commented out.

Member Author

Done. Thanks for pointing out, I missed it.

apiVersion: "gceproviderconfig/v1alpha1"
kind: "GCEProviderConfig"
project: "$GCLOUD_PROJECT"
zone: "${ZONE:-us-central1-f}"
Contributor

since we are going to remove the parameters part, make this just us-central1-f instead of my bash-style substitution syntax.

Member Author

Fixed.


// Provider is the name of the cloud-provider which MachineClass is intended for.
// +optional
Provider string `json:"provider,omitempty"`
Contributor

I don't think that this is needed as the MachineClass should contain all the information for that specific machine.

Contributor

@mvladev - where does the machine class contain that info? Right now it's contained in this string.

@sflxn sflxn Sep 12, 2018

I assume Provider is similar to StorageClass' "provisioner"? If so, shouldn't we also have a name field for the MachineClass? Or is that already built in because of corev1.ObjectReference?

Member Author

Yes, ObjectReference should provide us with the name field in it.


// TODO(roberthbailey): Fill these in later
// The machine class from which the provider config should be sourced.
// +optional
MachineClass *MachineClassRef `json:"machineClass,omitempty"`

I just want to point out that I believe this is different from the way StorageClass and RuntimeClass handle refs. I guess it's ok since we're not handling the class the same way they do. They use just a string: their controller matches the class's string name to the provider. So if we were following the same pattern, we would have some controller that matched the string with a provider known to the controller. It's been a few months since I looked at this, but I believe that is the pattern.
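
For contrast, the StorageClass/RuntimeClass-style pattern described here would reference the class by plain string name and leave matching to the controller; a hypothetical sketch, not what this PR does:

// StorageClass-style alternative: the controller matches this string
// against the classes it knows about.
type MachineSpecAlternative struct {
	// +optional
	MachineClassName string `json:"machineClassName,omitempty"`
}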

providerConfig:
  valueFrom:
    machineClass:
      provider: gcp
@sflxn sflxn Sep 12, 2018

I know this is an example, but can provider be defined in the machine class object above? With the way it was defined in the code above, can provider be provided in either the machine class or machine objects?

Member Author

I have kept provider at the machine-object level, keeping in mind the use case where we might not want to fetch the complete MachineClass when a MachineObject is seen/updated. In future iterations, we can consider whether it needs to be populated explicitly at any other layers as well.

Contributor

@hardikdr

the usecase where we might not want to fetch the complete MachineClass when MachineObject is seen/updated.

I am struggling to understand this use case. Who is "we"? autoscaler?
If it is intended that the provider should use the default provider config for this machine object and should not refer to a machine class, maybe the following will make more sense:

spec:
  providerConfig:
    provider: gcp
    valueFrom:
      machineClass:
        name: ""
        namespace: ""

i.e. if a machine class is mentioned it will be referred to; otherwise it should be left blank for the default.

Member Author

Inline providerConfig will still work as it does currently; classes will only be a parallel mechanism for users to inject the providerConfig via reference. The above comment was specifically about the provider field at the MachineObject layer, not the existence of classes as such :)

@vikaschoudhary16
Contributor

Trying to compare machine classes with storage classes: the motivation behind storage classes was portability across different storage providers. Wondering what the motivation behind machine class is? Can you please link me to a user story describing how the autoscaler will consume machine classes?

@hardikdr
Member Author

@vikaschoudhary16 Though we learn from storage-class, we may have divergent points, as the underlying use case is different; for instance, namespaced machineclasses.
You might want to follow the developments in this WIP PR, where I have described a few points: kubernetes/community#2653

@vikaschoudhary16
Contributor

I think it is a common pattern in kubernetes projects to have a separate commit for generated code. This helps reviewers focus on the actual changes. If it's not too late already, you might want to rearrange the commits to do so.
Thanks!

@roberthbailey roberthbailey added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 4, 2018
@roberthbailey
Contributor

@hardikdr - this needs a rebase (for the generated file).

@k8s-ci-robot k8s-ci-robot removed the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 9, 2018
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 9, 2018
@hardikdr
Member Author

hardikdr commented Oct 9, 2018

@roberthbailey I updated the commits to align with both the comment from @vikaschoudhary16 and the kubebuilder PR; can you please have a look at the changes?

@roberthbailey roberthbailey removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 12, 2018
@roberthbailey
Contributor

/lgtm
/approve

This has been waiting for comments for long enough. Let's get it in, and then we can tweak it going forward if we need to (since the API is still alpha).

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 22, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hardikdr, roberthbailey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 22, 2018
@k8s-ci-robot k8s-ci-robot merged commit 8525414 into kubernetes-sigs:master Oct 22, 2018