🐛 Fix creation of target groups and listeners in the reconcile loop #5017

Merged
k8s-ci-robot merged 2 commits into kubernetes-sigs:main on Jun 13, 2024

Conversation

r4f4
Contributor

@r4f4 r4f4 commented Jun 12, 2024

What type of PR is this?
/kind bug

What this PR does / why we need it:

Fixes a bug where CAPA creates duplicate target groups and listeners in the reconcile loop.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5015

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emojis
  • adds unit tests
  • adds or updates e2e tests

Release note:

Fixes target group and listener creation for v2 load balancers.

r4f4 added 2 commits June 12, 2024 22:11
We were comparing pointers that were never going to be equal. Let's
check their pointed-to values instead.

Also added a break when the listener is found.

Changed the `createdListeners` list to only include a listener if it was
created. This list is currently not used for anything, so this change
should have no impact.
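
The pointer comparison described in this commit can be illustrated with a minimal, self-contained sketch. The `listener` type and the `sameARN` and `findListener` helpers are hypothetical stand-ins, not the CAPA or AWS SDK types; the sketch only assumes that ARNs are held as `*string` values, as in the diff hunk in this PR.

```go
package main

import "fmt"

// listener is a hypothetical stand-in for the SDK listener type; only the
// fields needed for the comparison are modeled.
type listener struct {
	ARN            *string
	TargetGroupARN *string
}

// sameARN compares the pointed-to strings rather than the addresses.
// A nil pointer on either side never matches.
func sameARN(a, b *string) bool {
	return a != nil && b != nil && *a == *b
}

// findListener returns the first listener that forwards to the given target
// group ARN, or nil if none does. Returning early plays the role of the
// break mentioned in the commit message.
func findListener(listeners []*listener, targetGroupARN *string) *listener {
	for _, l := range listeners {
		if l != nil && sameARN(l.TargetGroupARN, targetGroupARN) {
			return l
		}
	}
	return nil
}

func strPtr(s string) *string { return &s }

func main() {
	existing := []*listener{
		{ARN: strPtr("listener-a"), TargetGroupARN: strPtr("tg-1")},
		{ARN: strPtr("listener-b"), TargetGroupARN: strPtr("tg-2")},
	}
	want := strPtr("tg-2")

	// Same string contents, but two distinct addresses.
	fmt.Println(existing[1].TargetGroupARN == want)        // false
	fmt.Println(sameARN(existing[1].TargetGroupARN, want)) // true

	if l := findListener(existing, want); l != nil {
		fmt.Println("found:", *l.ARN)
	}
}
```

The nil checks in `sameARN` are an addition of this sketch: dereferencing directly would panic if either ARN were unset.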
We cannot compare target groups by name because every time we get the
desired spec, it contains newly-generated random names. Instead, let's
check that the prefixes match and that the port and protocol properties
are the same.
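
A rough sketch of the matching rule described in this commit, under stated assumptions: the prefix constants, the `targetGroup` struct, and the `matches` helper are hypothetical names chosen for illustration, and the actual prefixes and types used by CAPA may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical name prefixes; the real constants in CAPA may be different.
const (
	apiServerPrefix  = "apiserver-target-"
	additionalPrefix = "additional-listener-"
)

// targetGroup is a hypothetical, simplified view of a target group.
type targetGroup struct {
	Name     string
	Port     int64
	Protocol string
}

// sharePrefix reports whether both names start with the same known prefix.
// The random suffix after the prefix is deliberately ignored.
func sharePrefix(a, b string) bool {
	for _, p := range []string{apiServerPrefix, additionalPrefix} {
		if strings.HasPrefix(a, p) && strings.HasPrefix(b, p) {
			return true
		}
	}
	return false
}

// matches treats an existing group as the desired one when the name prefixes
// agree and the port and protocol are identical.
func matches(existing, desired targetGroup) bool {
	return sharePrefix(existing.Name, desired.Name) &&
		existing.Port == desired.Port &&
		existing.Protocol == desired.Protocol
}

func main() {
	existing := targetGroup{Name: apiServerPrefix + "1111", Port: 6443, Protocol: "TCP"}
	desired := targetGroup{Name: apiServerPrefix + "2222", Port: 6443, Protocol: "TCP"}

	fmt.Println(existing.Name == desired.Name) // false: the suffix is regenerated each time
	fmt.Println(matches(existing, desired))    // true: prefix, port, and protocol agree
}
```

As the review discussion notes, this kind of match can also accept a target group that CAPA did not create if its name happens to share the prefix; checking ownership tags would avoid that at the cost of extra API calls.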
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 12, 2024
@k8s-ci-robot k8s-ci-robot requested review from fiunchinho and nrb June 12, 2024 20:25
@k8s-ci-robot k8s-ci-robot added needs-priority needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 12, 2024
@k8s-ci-robot
Contributor

Hi @r4f4. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@nrb
Contributor

nrb commented Jun 12, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 12, 2024
@r4f4
Contributor Author

r4f4 commented Jun 12, 2024

I'm currently running the OpenShift e2e tests on this fix. Will report back when I have results.

@nrb
Contributor

nrb commented Jun 12, 2024

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-blocking
/test pull-cluster-api-provider-aws-e2e-clusterclass

@nrb
Contributor

nrb commented Jun 12, 2024

/retest

@patrickdillon

LGTM

Contributor

@mtulio mtulio left a comment

/lgtm

@k8s-ci-robot
Contributor

@mtulio: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@r4f4
Contributor Author

r4f4 commented Jun 12, 2024

All the OpenShift tests in this PR, except for one, reached the running-cluster stage.

In the job that failed, no duplicate target groups or listeners were created, so the issue seems to be fixed.

@nrb
Contributor

nrb commented Jun 12, 2024

/retest

@nrb
Contributor

nrb commented Jun 12, 2024

/approve
/hold

Holding until it passes e2e. Failures appear to be unrelated to this change, though.

/assign @damdo
for second review.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 12, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nrb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 12, 2024
@damdo
Member

damdo commented Jun 13, 2024

/test pull-cluster-api-provider-aws-e2e

@JoelSpeed JoelSpeed left a comment

Matching the groups based on naming sounds like it could become a little fragile over time, for example if we ever decided to change the names.

Is there any reason not to go straight to checking the port and the type rather than relying on names?

What would happen if a manually created target group matched the spec but wasn't named correctly, would the cluster still function?

@r4f4
Contributor Author

r4f4 commented Jun 13, 2024

> Matching the groups based on naming sounds like it could become a little fragile over time, for example if we ever decided to change the names.

Yes. I tried to move the prefixes in use to consts to help with that.

> Is there any reason not to go straight to checking the port and the type rather than relying on names?

I was discussing this with @mtulio. Ideally we would check the owned tag, but I'm reluctant to add even more API calls (one of the failure reasons we observed was API rate limiting). If it's acceptable that we might match target groups that were not created by CAPA, then we can check just port/type.

> What would happen if a manually created target group matched the spec but wasn't named correctly, would the cluster still function?

Then CAPA would create a new target group. As long as an associated listener doesn't exist for that port, it should work. If there is no listener, then the target group was not associated with the LB and CAPA wouldn't have discovered it in the first place. I haven't tested this though.

@damdo
Member

damdo commented Jun 13, 2024

/test pull-cluster-api-provider-aws-e2e

@JoelSpeed

Having discussed out of band, the current approach is better than what we have in main today, which is broken and has potential for leaking resources.

I would like to see more testing of the scenarios where users bring their own target groups or modify the target groups after the LB is created, but that needn't block this PR.

@@ -1604,8 +1615,9 @@ func (s *Service) reconcileTargetGroupsAndListeners(lbARN string, spec *infrav1.

 	var listener *elbv2.Listener
 	for _, l := range existingListeners.Listeners {
-		if l.DefaultActions != nil && len(l.DefaultActions) > 0 && l.DefaultActions[0].TargetGroupArn == group.TargetGroupArn {
+		if l.DefaultActions != nil && len(l.DefaultActions) > 0 && *l.DefaultActions[0].TargetGroupArn == *group.TargetGroupArn {
Contributor Author

According to https://docs.aws.amazon.com/sdk-for-go/api/service/elbv2/#ELBV2.CreateListener:

> This operation is idempotent, which means that it completes at most one time. If you attempt to create multiple listeners with the same settings, each call succeeds.

So we could probably drop this check for an existing listener and always try createListener.

Member

@damdo damdo left a comment

E2Es passed. The fix LGTM
Thanks @r4f4
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 13, 2024
@r4f4
Contributor Author

r4f4 commented Jun 13, 2024

/hold cancel
e2e passed.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 13, 2024
@k8s-ci-robot k8s-ci-robot merged commit 888c659 into kubernetes-sigs:main Jun 13, 2024
29 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Duplicate target groups and listeners created in the ELBv2 reconcile loop
7 participants