Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Update API with ClusterClass MachinePool support #8820

Merged
merged 1 commit into from
Jul 13, 2023

Conversation

willie-yao
Copy link
Contributor

@willie-yao willie-yao commented Jun 7, 2023

What this PR does / why we need it:
This PR extends the current API to add MachinePool support for ClusterClass. This is the first of many PRs to fully implement MachinePool support for ClusterClass.

This PR covers the "Extend APIs" section of the roadmap for MachinePool support in ClusterClass: #5991 (comment)

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes (partially) #5991

Big thanks to @richardcase for driving most of the implementation!!

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 7, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 7, 2023
@willie-yao willie-yao changed the title ✨ Update API with ClusterClass MachinePool support ✨ WIP: Update API with ClusterClass MachinePool support Jun 7, 2023
@willie-yao willie-yao changed the title ✨ WIP: Update API with ClusterClass MachinePool support WIP: ✨ Update API with ClusterClass MachinePool support Jun 7, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2023
@willie-yao willie-yao force-pushed the cc-mp-api branch 2 times, most recently from 8a14bcf to 8d99a49 Compare June 7, 2023 22:01
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 7, 2023
@willie-yao willie-yao changed the title WIP: ✨ Update API with ClusterClass MachinePool support ✨ Update API with ClusterClass MachinePool support Jun 19, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 19, 2023
Copy link
Member

@sbueringer sbueringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much!

Looks pretty good so far. Just some questions. But +/- the nits the PR looks good to me and we don't have to add more fields in this iteration.

One fundamental point we have to decide is if we want to merge this before 1.5. I think the very last possible date is when we create the release branch on 11th July (https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/release/releases/release-1.5.md).

I don't have a strong opinion, but it might be good to delay merging this PR until after we create the release branch so the API + the corresponding implementation can land afterwards. I'm not sure if we can get everything done until the code freeze date. If we would only merge the API until 1.5 this can be confusing to users and it also already "freezes" the API which is a bit unnecessary and more flexibility to be able to change the API until the implementation is finished would be probably good.

api/v1alpha4/conversion.go Show resolved Hide resolved
api/v1beta1/cluster_types.go Outdated Show resolved Hide resolved
api/v1beta1/cluster_types.go Show resolved Hide resolved
api/v1beta1/cluster_types.go Show resolved Hide resolved
api/v1beta1/clusterclass_types.go Outdated Show resolved Hide resolved
api/v1beta1/clusterclass_types.go Show resolved Hide resolved
@sbueringer
Copy link
Member

^^ @fabriziopandini @CecileRobertMichon Opinions?

@willie-yao willie-yao force-pushed the cc-mp-api branch 4 times, most recently from 9bee93b to 7d858c5 Compare June 23, 2023 21:58
Co-authored-by: Richard Case <richard.case@outlook.com>
@sbueringer
Copy link
Member

sbueringer commented Jun 26, 2023

Thank you!

/lgtm

/hold
for #8820 (review)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: b16f11d2477ca89eb1f4fd6a0c1af18382adfb78

Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes lgtm to me, thanks for picking up the work
Q: If I got this right the implementation of MP support in CC is being deferred to 1.6.0, so are we going to defer merging this on main to as soon as we start developing the 1.6.0 release, right?

// This pool of nodes is managed by a MachinePool object whose lifecycle is managed by the Cluster controller.
type MachinePoolTopology struct {
// Metadata is the metadata applied to the MachinePool.
// At runtime this metadata is merged with the corresponding metadata from the ClusterClass.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for a follow-up, when this is merged/implemented we have to update the corresponding documentation in the book

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we actually try to bundle the book related changes within this PR? Or at least have an issue open before we merge this?

Copy link
Member

@sbueringer sbueringer Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an umbrella issue here #5991 (comment) including an update docs task. Are we looking for something specific?

@sbueringer
Copy link
Member

changes lgtm to me, thanks for picking up the work Q: If I got this right the implementation of MP support in CC is being deferred to 1.6.0, so are we going to defer merging this on main to as soon as we start developing the 1.6.0 release, right?

I don't know but that's why I asked above #8820 (review)

@willie-yao
Copy link
Contributor Author

I think it would make sense to defer this to 1.6.0 for the time being as I don't think there's enough time to fully implement and test the feature. However, would there be any chance of cherry-picking this into a patch release since it doesn't touch existing APIs?

@sbueringer
Copy link
Member

I think it would make sense to defer this to 1.6.0 for the time being as I don't think there's enough time to fully implement and test the feature. However, would there be any chance of cherry-picking this into a patch release since it doesn't touch existing APIs?

I think it touches existing APIs, the Cluster and ClusterClass CRDs.

I'm not sure about cherry-picking, but of course we can discuss it. This is the wording in our backporting policy

While we recommend to target following type of changes to the next minor release, CAPI maintainers will also consider
exceptions for backport of following changes only to the latest supported branch:

  • Enhancements or additions to experimental Cluster API features, with the goal of allowing faster adoption and iteration;
    Please note that stability of the branch will always be top priority while evaluating those PRs, and thus approval
    requires /lgtm from at least two maintainers that, on top of checking that the backport is not introducing any breaking
    change for either API or behavior, will evaluate if the impact of those backport is limited and well-scoped e.g.
    by checking that those changes should not touch non-experimental code paths like utils and/or by applying other
    considerations depending on the specific PR.

@CecileRobertMichon
Copy link
Contributor

Are clusterclass and topology not experimental still (behind a feature flag)? If so, and if all the APIs this PR touches are behind a feature flag, I don't think we necessarily should wait for v1.6 (and also this would technically qualify for cherry-pick per "Enhancements or additions to experimental Cluster API features"). I believe these changes don't actually touch the Cluster API, just the topology API that is part of the same file.

Side question: it's strange to me that those APIs are in the api/ folder and not exp/ with other experimental APIs, is there a reason for that?

@fabriziopandini
Copy link
Member

fabriziopandini commented Jul 5, 2023

My humble two cents.
Given that those changes qualify for backport I will prefer to avoid exposing to the users any non-functional intermediate state (API without the implementation); we can merge those API changes right after the release, add the corresponding implementation, and eventually backport both to the 1.5 branch if we feel it is safe doing it.

I believe these changes don't actually touch the Cluster API, just the topology API that is part of the same file.

Technically they are touching also the Cluster API (cluster.spec.topology), but I'm ok with making an exception given that everything below this field is entirely related to CC.

Side question: it's strange to me that those APIs are in the api/ folder and not exp/ with other experimental APIs, is there a reason for that?

I think the main reason was that it was required to change cluster.spec.topology, but frankly speaking, after some time dealing with experiments it is clear to me that this model has some flaws. I think that we should start discussing:

  • add support for different API groups, so we can have sets of resources to evolve at different speed.
  • getting rid of experiments folders (we can give ownership on api subfolders or controllers subfolder)

I will open an issue and bring up this point at the office hours as soon as I have some time for it

@CecileRobertMichon
Copy link
Contributor

we can merge those API changes right after the release, add the corresponding implementation, and eventually backport both to the 1.5 branch if we feel it is safe doing it.

SGTM. To clarify, we don't actually need to wait for the release to merge this PR, just until the RC is cut (and the release branch is created), correct?

@sbueringer
Copy link
Member

sbueringer commented Jul 6, 2023

add support for different API groups, so we can have sets of resources to evolve at different speed.

I think we don't need different API groups for APIs to evolve at different speed. Enforcing this would require grouping APIs in API groups by the speed that they evolve with.
E.g. we already have the infrastructure API group today where we have absolutely different apiVersions across infra providers.

getting rid of experiments folders (we can give ownership on api subfolders or controllers subfolder)

Huge +1. I think the exp folder is not really useful for anyone. Especially not our users (storing our APIs in a certain package is a very weak and implict way to signal to CAPI end users that something is experimental). In my opinion we should have:

  • feature gates for experimental features which are independent of an API
  • otherwise just have APIs in various versions alpha/beta/v1 depending on their state

As far as I'm aware that is also how it works in k/k. I don't think this mixture of apiVersions, exp folders and feature gates to signal the state of an API is ideal.

. I believe these changes don't actually touch the Cluster API, just the topology API that is part of the same file.

Just to clarify. The "topology API" is just a field in the Cluster API type (Cluster.spec.topology). It is not a separate API type.

SGTM. To clarify, we don't actually need to wait for the release to merge this PR, just until the RC is cut (and the release branch is created), correct?

Yes. In general it would be probably also good if someone who is more familiar with MachinePools could review this PR.

@CecileRobertMichon
Copy link
Contributor

Just to clarify. The "topology API" is just a field in the Cluster API type (Cluster.spec.topology). It is not a separate API type.

Right, I should have said "field" not "API". What I meant is only the Cluster.spec.topology is getting touched in this PR in the Cluster API, and that field is marked as experimental and behind a feature flag:

Topology *Topology `json:"topology,omitempty"`

Yes. In general it would be probably also good if someone who is more familiar with MachinePools could review this PR.

/lgtm

@vincepri
Copy link
Member

vincepri commented Jul 6, 2023

/lgtm

@CecileRobertMichon
Copy link
Contributor

/approve
/hold for 1.5 RC

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 6, 2023
@sbueringer
Copy link
Member

Given lgtm's + that we created release-1.5

As mentioned in #8820 (comment). Further steps are documented in #5991 (comment)

Let me know if there are tasks missing on 5991

/hold cancel
/lgtm
/approve

🎉

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 13, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon, sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [CecileRobertMichon,sbueringer]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 46e045c into kubernetes-sigs:main Jul 13, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.5 milestone Jul 13, 2023
@g-gaston
Copy link
Contributor

/area clusterclass

@k8s-ci-robot k8s-ci-robot added the area/clusterclass Issues or PRs related to clusterclass label Oct 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/clusterclass Issues or PRs related to clusterclass cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants