Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support custom instance types with the KWOK provider #1048

Merged

Conversation

drmorr0
Copy link
Contributor

@drmorr0 drmorr0 commented Feb 27, 2024

Fixes #987

Description

This extends the KWOK provider for Karpenter by allowing users to specify custom lists of instance types that will be launched by Karpenter/KWOK.

How was this change tested?
Manual testing in a local kind+kwok cluster; I confirmed that, with the provided kwok/examples/instance_types.json, that an nginx deployment with zonal pod topology spread enabled worked correctly:

> cat test.k8s.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test
  name: test-depl
  namespace: simkube
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - env:
            - name: POD_OWNER
              value: test-depl
          image: nginx:latest
          name: nginx
          resources:
            requests:
              cpu: "1"
      nodeSelector:
        type: virtual
      tolerations:
        - effect: NoSchedule
          key: kwok-provider
          value: "true"
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: "topology.kubernetes.io/zone"
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: test
> kubectl get nodeclaims
NAME                 TYPE         ZONE         NODE                        READY   AGE
kwok-default-9szxv   g5.4xlarge   us-east-1c   strange-swirles-833325525   True    4m41s
kwok-default-twk5m   g5.4xlarge   us-east-1b   strange-joliot-1822046751   True    4m41s
kwok-default-vrcqw   g5.4xlarge   us-east-1a   kind-ganguly-3882149666     True    4m41s
> kubectl get nodes
NAME                        STATUS   ROLES           AGE     VERSION
karpenter-control-plane     Ready    control-plane   15d     v1.29.0
karpenter-worker            Ready    <none>          15d     v1.29.0
kind-ganguly-3882149666     Ready    <none>          5m18s   kwok-v0.4.0
strange-joliot-1822046751   Ready    <none>          5m18s   kwok-v0.4.0
strange-swirles-833325525   Ready    <none>          5m18s   kwok-v0.4.0
> kubectl get pods
NAMESPACE   NAME                             READY   STATUS    RESTARTS   AGE
simkube     test-depl-c5f8ddd84-h5tzw        1/1     Running   0          5m58s
simkube     test-depl-c5f8ddd84-nnz9x        1/1     Running   0          5m58s
simkube     test-depl-c5f8ddd84-xshgx        1/1     Running   0          5m58s

Here are the relevant logs from karpenter:

{"level":"INFO","time":"2024-02-27T23:00:02.648Z","logger":"controller.provisioner","message":"computed new nodeclaim(s) to fit pod(s)","commit":"7b5bd17-dirty","nodeclaims":3,"pods":3}
{"level":"INFO","time":"2024-02-27T23:00:02.666Z","logger":"controller.provisioner","message":"created nodeclaim","commit":"7b5bd17-dirty","nodepool":"kwok-default","nodeclaim":"kwok-default-9szxv","requests":{"cpu":"1","pods":"2"},"instance-types":"g5.4xlarge"}
{"level":"INFO","time":"2024-02-27T23:00:02.667Z","logger":"controller.provisioner","message":"created nodeclaim","commit":"7b5bd17-dirty","nodepool":"kwok-default","nodeclaim":"kwok-default-vrcqw","requests":{"cpu":"1","pods":"2"},"instance-types":"g5.4xlarge"}
{"level":"INFO","time":"2024-02-27T23:00:02.667Z","logger":"controller.provisioner","message":"created nodeclaim","commit":"7b5bd17-dirty","nodepool":"kwok-default","nodeclaim":"kwok-default-twk5m","requests":{"cpu":"1","pods":"2"},"instance-types":"g5.4xlarge"}
{"level":"INFO","time":"2024-02-27T23:00:02.683Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"7b5bd17-dirty","nodeclaim":"kwok-default-vrcqw","provider-id":"kwok://kind-ganguly-3882149666","instance-type":"g5.4xlarge","zone":"us-east-1a","capacity-type":"on-demand","allocatable":{"cpu":"15900m","ephemeral-storage":"600Gi","m
emory":"65526Mi","nvidia.com/gpu":"1","pods":"110"}}
{"level":"INFO","time":"2024-02-27T23:00:02.685Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"7b5bd17-dirty","nodeclaim":"kwok-default-twk5m","provider-id":"kwok://strange-joliot-1822046751","instance-type":"g5.4xlarge","zone":"us-east-1b","capacity-type":"on-demand","allocatable":{"cpu":"15900m","ephemeral-storage":"600Gi",
"memory":"65526Mi","nvidia.com/gpu":"1","pods":"110"}}
{"level":"INFO","time":"2024-02-27T23:00:02.687Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"7b5bd17-dirty","nodeclaim":"kwok-default-9szxv","provider-id":"kwok://strange-swirles-833325525","instance-type":"g5.4xlarge","zone":"us-east-1c","capacity-type":"on-demand","allocatable":{"cpu":"15900m","ephemeral-storage":"600Gi",
"memory":"65526Mi","nvidia.com/gpu":"1","pods":"110"}}

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 27, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 27, 2024
@drmorr0 drmorr0 force-pushed the drmorr/kwok-custom-instance-types branch from c12234e to 8091027 Compare February 27, 2024 23:09
kwok/README.md Show resolved Hide resolved
@coveralls
Copy link

coveralls commented Feb 27, 2024

Pull Request Test Coverage Report for Build 8976607133

Details

  • 0 of 138 (0.0%) changed or added relevant lines in 4 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.3%) to 78.566%

Changes Missing Coverage Covered Lines Changed/Added Lines %
kwok/main.go 0 6 0.0%
kwok/cloudprovider/cloudprovider.go 0 14 0.0%
kwok/cloudprovider/helpers.go 0 52 0.0%
kwok/tools/gen_instance_types.go 0 66 0.0%
Files with Coverage Reduction New Missed Lines %
kwok/cloudprovider/cloudprovider.go 1 0.0%
kwok/cloudprovider/helpers.go 1 0.0%
Totals Coverage Status
Change from base Build 8975908762: -0.3%
Covered Lines: 8350
Relevant Lines: 10628

💛 - Coveralls

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 28, 2024
Copy link
Contributor

@njtran njtran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nice work! Love the simplifications you've made.

Makefile Outdated Show resolved Hide resolved
kwok/README.md Outdated Show resolved Hide resolved
kwok/README.md Outdated Show resolved Hide resolved
kwok/charts/templates/config.yaml Outdated Show resolved Hide resolved
kwok/cloudprovider/cloudprovider.go Show resolved Hide resolved
kwok/cloudprovider/helpers.go Outdated Show resolved Hide resolved
kwok/cloudprovider/helpers.go Outdated Show resolved Hide resolved
kwok/cloudprovider/helpers.go Show resolved Hide resolved
kwok/cloudprovider/helpers.go Show resolved Hide resolved
kwok/main.go Outdated Show resolved Hide resolved
@drmorr0 drmorr0 force-pushed the drmorr/kwok-custom-instance-types branch 2 times, most recently from 47c2326 to 2b4e7f9 Compare March 1, 2024 19:35
@drmorr0
Copy link
Contributor Author

drmorr0 commented Mar 1, 2024

@njtran I've responded to your comments and committed some changes, lmk what you think! If these look good I can rebase them into the stack.

The major update is that, as suggested, we're now just embedding the data into the binary, so I've reverted all the changes to the helm charts.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2024
kwok/README.md Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
kwok/cloudprovider/cloudprovider.go Show resolved Hide resolved
kwok/cloudprovider/cloudprovider.go Outdated Show resolved Hide resolved
kwok/cloudprovider/generic_instance_types.json Outdated Show resolved Hide resolved
kwok/cloudprovider/helpers.go Outdated Show resolved Hide resolved
@drmorr0 drmorr0 force-pushed the drmorr/kwok-custom-instance-types branch from 2b4e7f9 to 6fa29a9 Compare March 20, 2024 22:44
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 20, 2024
@drmorr0 drmorr0 force-pushed the drmorr/kwok-custom-instance-types branch from 6fa29a9 to bf32b35 Compare March 20, 2024 22:45
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 20, 2024
@drmorr0 drmorr0 force-pushed the drmorr/kwok-custom-instance-types branch from bf32b35 to 629b3e7 Compare March 22, 2024 16:44
@drmorr0
Copy link
Contributor Author

drmorr0 commented Mar 22, 2024

@njtran any other concerns or comments about this PR? Did my response about the spot/on-demand logic make sense?

kwok/README.md Outdated Show resolved Hide resolved
kwok/README.md Outdated Show resolved Hide resolved
kwok/README.md Outdated Show resolved Hide resolved
kwok/cloudprovider/cloudprovider.go Show resolved Hide resolved
kwok/cloudprovider/helpers.go Show resolved Hide resolved
@drmorr0 drmorr0 force-pushed the drmorr/kwok-custom-instance-types branch from 629b3e7 to 108919f Compare March 28, 2024 18:45
Copy link
Contributor

@njtran njtran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 5, 2024
@njtran
Copy link
Contributor

njtran commented Apr 5, 2024

/lgtm
/approve
Will wait for @jonathan-innis to take a look

@njtran
Copy link
Contributor

njtran commented Apr 5, 2024

/lgtm cancel

Oops got the command wrong

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 5, 2024
@njtran
Copy link
Contributor

njtran commented Apr 5, 2024

/remove-approve

@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 5, 2024
@drmorr0
Copy link
Contributor Author

drmorr0 commented Apr 5, 2024

Added the boilerplate to gen_instance_types.go

Makefile Show resolved Hide resolved
kwok/cloudprovider/cloudprovider.go Show resolved Hide resolved
kwok/cloudprovider/cloudprovider.go Outdated Show resolved Hide resolved
kwok/tools/gen_instance_types.go Show resolved Hide resolved
kwok/cloudprovider/helpers.go Show resolved Hide resolved
kwok/cloudprovider/helpers.go Outdated Show resolved Hide resolved
kwok/cloudprovider/helpers.go Outdated Show resolved Hide resolved
kwok/cloudprovider/helpers.go Outdated Show resolved Hide resolved
kwok/cloudprovider/helpers.go Show resolved Hide resolved
kwok/main.go Outdated Show resolved Hide resolved
@drmorr0 drmorr0 force-pushed the drmorr/kwok-custom-instance-types branch 2 times, most recently from d6f08b1 to d95b7da Compare April 8, 2024 22:15
@drmorr0
Copy link
Contributor Author

drmorr0 commented Apr 8, 2024

@jonathan-innis OK I think I've addressed the remaining comments in here? I tested this again locally and it seems to work, and tests here pass as well. If there are additional concerns, I'd kinda like to address those in follow-up PRs if that's OK? There's been quite a bit of back-and-forth on this already.

kwok/cloudprovider/helpers.go Outdated Show resolved Hide resolved
kwok/cloudprovider/helpers.go Show resolved Hide resolved
kwok/tools/gen_instance_types.go Outdated Show resolved Hide resolved
@drmorr0 drmorr0 force-pushed the drmorr/kwok-custom-instance-types branch from d3fb908 to e7d6ea3 Compare May 4, 2024 00:19
@drmorr0 drmorr0 force-pushed the drmorr/kwok-custom-instance-types branch from e7d6ea3 to e613da8 Compare May 4, 2024 00:43
@drmorr0 drmorr0 force-pushed the drmorr/kwok-custom-instance-types branch from e613da8 to 2547248 Compare May 6, 2024 22:25
Copy link
Contributor

@njtran njtran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Good work! Thanks for committing this!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 7, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: drmorr0, njtran

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2024
@njtran
Copy link
Contributor

njtran commented May 7, 2024

/remove-hold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 7, 2024
@k8s-ci-robot k8s-ci-robot merged commit 11d9fd2 into kubernetes-sigs:main May 7, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non-hard-coded list of instances for the KWOK provider
5 participants