Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow for specifying subnet type for az #4464

Merged
merged 3 commits into from
Sep 26, 2023

Conversation

davidspek
Copy link
Contributor

@davidspek davidspek commented Aug 25, 2023

What type of PR is this?
/kind feature

What this PR does / why we need it:

Currently when specifying the availability zone(s) for a (managed) machine pool, all the subnets in the zone(s) will be used. By default CAPI creates both a public and a private subnet in each availability zone. So when a machine pool specifies an availability zone, both the public and private subnets for the zone are used for the machine pool. It seems as though the private subnet is preferred in the ASG, however, I haven't been able to find any documentation regarding which subnet an instance gets launched in when an ASG contains both a private and public subnet. This means it's nondeterministic in which subnet an instance of the ASG will be launched in.

To solve for this, a new optional parameter AvailabilityZoneSubnetType is added to the AWSManagedMachinePool and AWSMachinePool specs allowing administrators to choose if the machine pool should use public or private subnets in the requested availability zones. If the parameter isn't specified the current behavior is kept where both public and private subnets in the availability zone(s) are used. This is also related to #2191 where the preferred subnet type is private when no options are specified, but doing this currently isn't possible when specifying availability zone(s) for a machine pool.

Which issue(s) this PR fixes:
Fixes #2991

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

Release note:

Allow for specifying if private or public subnets should be used when availability zones are defined in `AWSManagedMachinePool` and `AWSMachinePool` specs.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Aug 25, 2023
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 25, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @davidspek. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 25, 2023
@AndiDog
Copy link
Contributor

AndiDog commented Aug 29, 2023

Likely, no new feature is needed. Please try this:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachinePool
spec:
  subnets:
  - filters:
      - name: tag:kubernetes.io/cluster/mycluster
        values:
          - shared
          - owned
      - name: tag:sigs.k8s.io/cluster-api-provider-aws/role
        values:
          - private
# [...]

@davidspek
Copy link
Contributor Author

davidspek commented Aug 29, 2023

Likely, no new feature is needed. Please try this:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachinePool
spec:
  subnets:
  - filters:
      - name: tag:kubernetes.io/cluster/mycluster
        values:
          - shared
          - owned
      - name: tag:sigs.k8s.io/cluster-api-provider-aws/role
        values:
          - private
# [...]

We are using AWSManagedMachinePool resources which don't allow for filtering the subnets in that manor.

@davidspek
Copy link
Contributor Author

@AndiDog Also, I don't think the subnet filter works alongside the specified availability zones. We need to be able to specify the type of subnet (private or public) for the specified availability zone(s).

@AndiDog
Copy link
Contributor

AndiDog commented Aug 29, 2023

Code-wise, for AWSManagedMachinePool, it's func (p *defaultSubnetPlacementStrategy) Place(input *placementInput) ([]string, error) which decides the subnets to use. That's also the place you found and improved.

I'd love if EKS types became more consistent with non-EKS. At best, we would implement AWSManagedMachinePool.Spec.Subnets and deprecate AWSManagedMachinePool.Spec.SubnetIDs, while still keeping backward compatibility for a while.

You said in chat

After looking at implementing the Subnets field for the AWSManagedMachinePool I noticed this doesn't actually work alongside the specified availability zones.

Can you explain what that means? Did you hit a roadblock implementing that?

@davidspek
Copy link
Contributor Author

davidspek commented Aug 29, 2023

The Subnets field allows you to filter subnets in the VPC that should be used by a node pool. However, it doesn't take the availability zone(s) set in the spec.availabilityZones into account. Even with implementing Subnets in the machine pool spec, you can't limit a machine pool to a specific availability zone and subnet type (private or public).

So in short Subnets can make it easy to allow you specify creating node pool(s) into all public or private subnets in the VPC.
We want to be able to specify public or private subnets for a specific availability zone.

The spec.availabilityZones is used to make it easier for people to place node pools into specific AZs, but all it actually does is subnet selection. The field is somewhat useless if you can't tell it in which type of subnet you wouldn't want to place it (since you probably don't want public unless you need it for some very specific reason in which case you wouldn't want it to use private). If you can also filter for the AZ in the subnet filter that could be a solution, but then spec.availabilityZones might as well be removed since it doesn't fully work as a helper field.

@richardcase
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 31, 2023
@davidspek
Copy link
Contributor Author

/retest

@davidspek
Copy link
Contributor Author

@richardcase The tests pass now.

@AndiDog
Copy link
Contributor

AndiDog commented Aug 31, 2023

The spec.availabilityZones is used to make it easier for people to place node pools into specific AZs, but all it actually does is subnet selection. The field is somewhat useless if you can't tell it in which type of subnet you wouldn't want to place it (since you probably don't want public unless you need it for some very specific reason in which case you wouldn't want it to use private). If you can also filter for the AZ in the subnet filter that could be a solution, but then spec.availabilityZones might as well be removed since it doesn't fully work as a helper field.

All true! I think on top of the tag sigs.k8s.io/cluster-api-provider-aws/role={public,private}, there should be one for the AZ, such that the Subnets[*].Filter can use them.

@AndiDog
Copy link
Contributor

AndiDog commented Aug 31, 2023

@richardcase The above ideas (not what the PR currently implements) would be breaking changes and probably we can consider that on top of the AWSNetwork CRD idea in order to improve networking.

@@ -52,6 +52,11 @@ type AWSMachinePoolSpec struct {
// AvailabilityZones is an array of availability zones instances can run in
AvailabilityZones []string `json:"availabilityZones,omitempty"`

// AvailabilityZoneSubnetType specifies which type of subnets to use when an availability zone is specified.
// +kubebuilder:validation:Enum:=public;private
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add the existing situation as well explicitly (i.e. all or publicandprivate)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that people can explicitly choose that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can definitely add that. Currently not setting the value does that, but we could add an explicit toggle for it as well in case the default logic changes in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardcase I've rebased and added an option for all along with updated unit tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @davidspek ...i'll take a look

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 1, 2023
Signed-off-by: David van der Spek <vanderspek.david@gmail.com>
Signed-off-by: David van der Spek <vanderspek.david@gmail.com>
Signed-off-by: David van der Spek <vanderspek.david@gmail.com>
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 1, 2023
@davidspek
Copy link
Contributor Author

@richardcase Have you been able to take another look at this?

@davidspek
Copy link
Contributor Author

@AndiDog Maybe you can also have a look here since you've just become a member.

Copy link
Contributor

@AndiDog AndiDog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Looks good. Did you also test this successfully for your use case?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 12, 2023
@davidspek
Copy link
Contributor Author

@AndiDog Yes, I've built an image for the controller and been able to successfully deploy AWSManagedMachinePool resources that specify the AZs as well as the new AvailabilityZoneSubnetType and those node groups are only bound to the private subnet(s).

@davidspek
Copy link
Contributor Author

@AndiDog @richardcase What's left before this can be merged?

@davidspek
Copy link
Contributor Author

Friendly ping @richardcase @AndiDog

@Ankitasw Ankitasw added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 26, 2023
@Ankitasw
Copy link
Member

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-eks

@Ankitasw
Copy link
Member

/approve
/hold

until tests passes. @davidspek feel free to unhold once tests passes

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 26, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ankitasw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 26, 2023
@davidspek
Copy link
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 26, 2023
@k8s-ci-robot k8s-ci-robot merged commit 8188bba into kubernetes-sigs:main Sep 26, 2023
3 checks passed
@davidspek davidspek deleted the allow-private-az branch September 26, 2023 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWSManagedMachinePool with availabilityZones mapped are getting public IP addresses
5 participants