cloudprovider: add DigitalOcean #2245
Conversation
I've closed the previous PR as it was based on my personal repository. This new PR is the same, but the changes are based on https://github.com/digitalocean/autoscaler
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lgtm
```
minimum number of nodes: 1
maximum number of nodes: 200
```
That is significantly different UX from other providers - I think all (?) existing providers require explicitly setting at least the max limit.
This is the default limit for a given node group right now. It's our upper limit. I can change it to 50 or something else if that makes more sense to you? In the UI (cloud.digitalocean.com) we're going to expose sensible pre-set default values.
I don't have any specific recommendation for the value. I was referring to the fact that there is a default at all, rather than an explicit requirement to specify a value.
I see what you mean. This is just a way of limiting the hard maximum we allow. The user experience will be different and our API will validate requests.
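For illustration, here is a minimal sketch of the defaulting behavior under discussion, using the 1/200 values quoted above; the helper itself is hypothetical and not part of the PR:

```go
// applyDefaultLimits fills in node group limits when the corresponding
// tags are not provided. A value of 0 means "not set" in this sketch.
func applyDefaultLimits(minNodes, maxNodes int) (int, int) {
	const (
		defaultMinNodes = 1   // "minimum number of nodes: 1"
		defaultMaxNodes = 200 // hard upper limit discussed in this thread
	)
	if minNodes == 0 {
		minNodes = defaultMinNodes
	}
	if maxNodes == 0 {
		maxNodes = defaultMaxNodes
	}
	return minNodes, maxNodes
}
```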
Left a few comments, but overall looks quite good. Seems like @andrewsykim is also reviewing this? Happy to approve once he lgtms.
Also - feel free to add a digitalocean/OWNERS file with yourself and/or whoever is interested in maintaining this from the DO side. Core developers have only so much review capacity and as the number of cloudproviders grows we don't want to block internal fixes / improvements on our reviews.
```
make build-binary
```
I recommend building in docker using either `make build-in-docker` or `make release`. Building outside of docker is for developer convenience, but it's best effort only and we know it doesn't work in some environments.
I've just tested it and it seems like our API depends on https://github.com/google/go-querystring. Can I vendor this, or should I also copy it as a subfolder and rewrite the import paths of our API?
@mwielgus what do you think about this one?
So far we have been very strict about adding any cloud-provider-specific dependencies to the global CA vendors. To keep our integrity and not create a precedent (even for a small Google library), we would kindly ask you to contain everything your cloud provider needs in your own directory.
@mwielgus gotcha. I vendored everything locally into the `digitalocean` folder. I re-ran `make build-in-docker` again and now it works without any issue. I've force-pushed the changes as I've rebased and squashed everything.
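For context, with godo living under the provider directory, the provider code would import the in-tree copy instead of an external module; a sketch of the rewritten import, with the exact path assumed from the layout described above:

```go
import (
	// godo is carried as a sub package of the DigitalOcean provider, so no
	// entry is needed in the global cluster-autoscaler vendor directory.
	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/digitalocean/godo"
)
```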
cluster-autoscaler/cloudprovider/digitalocean/digitalocean_cloud_provider.go
```go
}

for _, node := range nodes {
	klog.V(5).Infof("checking node have: %q want: %q", node.Id, nodeID)
```
This will produce an incredible amount of logs in a decently sized cluster. There are some V(5) logs that are sometimes useful for debugging, but this makes V(5) unusable - can you make it V(6) or more?
Made it V(6).
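For reference, the adjusted line would only change the verbosity level; the format string and variables are the same as in the snippet above:

```go
klog.V(6).Infof("checking node have: %q want: %q", node.Id, nodeID)
```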
```go
	rl *cloudprovider.ResourceLimiter,
) cloudprovider.CloudProvider {
	var configFile io.ReadCloser
	if opts.CloudConfig != "" {
```
Shouldn't that error/fatal if CloudConfig == ""?
This resembles the same layout as the other cloud providers. It doesn't have to error, because in the next step `newManager()` will return an error because of the invalid configuration.
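A minimal sketch of that pattern, where the constructor rejects a missing configuration instead of the cloud provider builder; the signature and error message here are assumptions, not the PR's actual code:

```go
// Manager stands in for the provider's manager type (fields omitted).
type Manager struct{}

func newManager(configReader io.Reader) (*Manager, error) {
	// With an empty --cloud-config flag the builder passes a nil reader,
	// so the failure surfaces here rather than in the builder itself.
	if configReader == nil {
		return nil, errors.New("cloud config is required for the DigitalOcean provider")
	}
	// ... decode the config and validate the access token ...
	return &Manager{}, nil
}
```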
```
# Configuration

The `cluster-autoscaler` dynamically runs based on tags associated with node
pools. These are the current valid tags:
```
This is completely different from how you run CA on different providers. Any particular reason for this design?
This lets us dynamically enable/disable the cluster-autoscaler and makes it opt-out by default: to opt in, you need to add those tags. Because our UI is not ready yet, this will allow some customers to enable it by adding the tags themselves. Also, this doesn't require us to fiddle with CA flag arguments. It's easier to maintain and ship the cluster-autoscaler, and it requires zero to minimal interaction once it's deployed. Is there something you believe is wrong with this approach? A concrete example of the tags is sketched below.
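To make the opt-in flow concrete, here is a sketch of the tags a customer would attach to a node pool; the `-enabled` and `-max` tag names appear in the test case quoted later in this conversation, while the `-min` name is assumed by analogy:

```go
// Tags that opt a node pool in to autoscaling. Without the "enabled" tag
// the cluster-autoscaler simply skips the pool.
var nodePoolTags = []string{
	"k8s-cluster-autoscaler-enabled:true",
	"k8s-cluster-autoscaler-min:3", // assumed tag name
	"k8s-cluster-autoscaler-max:10",
}
```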
Oh, I didn't realize you're planning to host the CA pod (or at least start it automatically). I don't think there is anything inherently wrong with your approach, it's just inconsistent with other providers.
Given that autodiscovery already works differently for each provider I don't think it's a blocker for merging this either. Just a suboptimal experience for someone moving between clouds / running multicloud / etc.
```go
min, err := strconv.Atoi(value)
if err != nil {
	return nil, fmt.Errorf("invalid minimum nodes: %q", value)
}
```
Explicitly return an error for min=0? Currently you just silently override it to 1. Many other providers support scale-to-0 so it's possible for someone to assume digitalocean also supports it and just set it.
We don't support `0` yet :/ Hence I return an error here. This is one of the shortcomings on our end; we're working on fixing it.
My comment is that you don't actually return any error. You just treat 0 as 'not set' and silently default to 1.
Interestingly it seems like you can also set max to a negative value as long as you don't set min?
Yeah you're right. Added this test case and it fails:
```go
{
	name: "bad tags - max is set to negative, no min",
	tags: []string{
		"k8s-cluster-autoscaler-enabled:true",
		"k8s-cluster-autoscaler-max:-5",
	},
	wantErr: true,
},
```
Making sure to fix it.
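A sketch of validation that would make the failing case above pass by rejecting both a zero minimum and a maximum below the minimum; this is a hypothetical helper, not the code in the PR:

```go
// validateNodeGroupLimits rejects values the provider cannot honor:
// scale-to-zero is not supported yet, and max must not be below min.
func validateNodeGroupLimits(min, max int) error {
	if min < 1 {
		return fmt.Errorf("minimum nodes must be >= 1, got %d (scale-to-0 is not supported yet)", min)
	}
	if max < min {
		return fmt.Errorf("maximum nodes (%d) must be >= minimum nodes (%d)", max, min)
	}
	return nil
}
```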
```go
	targetSize, delta, updatedNodePool.Count)
}

return nil
```
You seem to cache TargetSize, in which case you should update it after changing it.
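A sketch of what that could look like right before returning from a successful resize; the field names mirror the snippets in this review, but treating `n.nodePool` as the cache is an assumption:

```go
// Keep the cached pool in sync so TargetSize() reflects the new size
// immediately instead of waiting for the next refresh.
n.nodePool.Count = updatedNodePool.Count
return nil
```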
```go
		n.clusterID, n.id, nodeID, err)
	}
}
```
Update cached TargetSize?
```go
	return fmt.Errorf("couldn't increase size to %d (delta: %d). Current size is: %d",
		targetSize, delta, updatedNodePool.Count)
}
```
Update cached TargetSize?
```go
if n.nodePool == nil {
	return nil, errors.New("node pool instance is not created")
}
return toInstances(n.nodePool.Nodes), nil
```
To get proper error handling you should return 'placeholder' instances for any instance that doesn't exist yet (n.nodePool.Count - len(n.nodePool.Nodes)). Unless this is always 0?
It's fine to add it in a separate PR later. A recent example of this is #2235, which could be used as a reference.
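A sketch of the placeholder approach referenced above, padding the result up to the pool's target count; the ID scheme and status values are assumptions based on the cloudprovider package's Instance type:

```go
instances := toInstances(n.nodePool.Nodes)
// Pad with placeholders for nodes that were requested but not created yet,
// so per-instance errors can be reported for them as well.
for i := len(n.nodePool.Nodes); i < n.nodePool.Count; i++ {
	instances = append(instances, cloudprovider.Instance{
		Id: fmt.Sprintf("placeholder-%s-%d", n.id, i), // hypothetical ID scheme
		Status: &cloudprovider.InstanceStatus{
			State: cloudprovider.InstanceCreating,
		},
	})
}
return instances, nil
```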
Left most of my review comments in the previous PR #2227. Overall PR LGTM as well.
@fatih feel free to add me here as well if you need someone in the OWNERS file (note you need to be a Kubernetes org member for this to work).
@andrewsykim sounds good, thanks! I've added you to
case "max": | ||
max, err := strconv.Atoi(value) | ||
if err != nil { | ||
return nil, fmt.Errorf("invalid minimum nodes: %q", value) |
s/minimum/maximum
Fixed
Please squash your commits once you're ready to merge, so that there are just 2: adding the client and the implementation.
@MaciekPytel just did it. The PR now only contains two commits, including the recent fixes around the zero minimum and negative max values. There is only one conversation that is still unresolved for us: #2245 (comment). What should I do here?
@MaciekPytel everything is settled now. I've addressed all your comments (I don't think I missed anything). I also squashed and split all my changes into two commits. The first commit contains the client (including the dependencies) inside the
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mwielgus. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
I'm new to this project, but why is the whole godo source copied into a subdirectory instead of just adding godo as a vendored dependency? This seems to break
@micahhausler I also wanted to use
Hmm, that seems strange, where is that documented?
I'm no longer involved with this project, please ask #sig-autoscaling in Slack or @MaciekPytel.
cloudprovider: add DigitalOcean
This adds a new cluster autoscaler for the DigitalOcean Kubernetes offering. The DigitalOcean API client (`godo`) has been copied into `cloudprovider/digitalocean/godo` and added as a sub package.
For reviewers: I've tried to add each individual commit in a packaged way. The first commit (776bba5) adds our DigitalOcean API package called `godo`. The second and following commits are all related to the actual DigitalOcean cloud provider.
cc @timoreimann @snormore
cc @andrewsykim @MaciekPytel
Closes #254