Add externally managed predicate #2383

Merged
merged 1 commit into from
May 25, 2021

Conversation

alexander-demicev
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Add an externally managed predicate. Clusters marked with the "cluster.x-k8s.io/managed-by" annotation should be skipped during reconciliation.

kubernetes-sigs/cluster-api#4135

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

Release note:

Add an externally managed predicate. Clusters marked with the `"cluster.x-k8s.io/managed-by"` annotation should be skipped during reconciliation.
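
To make the idea concrete, here is a minimal standard-library sketch of what such a predicate decides. The `object` type and the `shouldReconcile` helper are hypothetical stand-ins introduced for illustration; the PR itself relies on controller-runtime predicates and Cluster API's `annotations.IsExternallyManaged` helper, neither of which is reproduced here. The sketch assumes that only the presence of the annotation key matters, not its value.

```go
package main

import "fmt"

// managedByAnnotation is the annotation key named in the PR description.
const managedByAnnotation = "cluster.x-k8s.io/managed-by"

// object is a hypothetical stand-in for a Kubernetes object's metadata.
type object struct {
	Name        string
	Annotations map[string]string
}

// isExternallyManaged reports whether the annotation key is present.
// Assumption of this sketch: the value is ignored, only presence counts.
func isExternallyManaged(o object) bool {
	_, ok := o.Annotations[managedByAnnotation]
	return ok
}

// shouldReconcile plays the role of the predicate: externally managed
// objects are filtered out before they reach the reconciler.
func shouldReconcile(o object) bool {
	return !isExternallyManaged(o)
}

func main() {
	managed := object{Name: "managed-cluster"}
	external := object{
		Name:        "external-cluster",
		Annotations: map[string]string{managedByAnnotation: ""},
	}
	fmt.Println(managed.Name, shouldReconcile(managed))
	fmt.Println(external.Name, shouldReconcile(external))
}
```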

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 5, 2021
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 5, 2021
@enxebre
Member

enxebre commented May 10, 2021

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 10, 2021
@@ -343,6 +343,17 @@ func (m *MachineScope) IsEKSManaged() bool {
return m.InfraCluster.InfraCluster().GetObjectKind().GroupVersionKind().Kind == "AWSManagedControlPlane"
}

func (m *MachineScope) IsExternallyManaged() bool {

@enxebre
Member

enxebre commented May 10, 2021

@alexander-demichev can you please include "fix #2356" in the PR description?

Other than this nit https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/2383/files#r629435596 this lgtm overall.
PTAL @JoelSpeed @randomvariable

@@ -173,7 +173,7 @@ func (s *Service) CreateInstance(scope *scope.MachineScope, userData []byte) (*i
}
input.SubnetID = subnetID

-if !scope.IsEKSManaged() && s.scope.Network().APIServerELB.DNSName == "" {
+if !scope.IsExternallyManaged() && !scope.IsEKSManaged() && s.scope.Network().APIServerELB.DNSName == "" {

I was under the impression that it being externally managed means there should be no reconciliation at all? So do we not need something like

if scope.IsExternallyManaged() {
  // Should not reconcile
  return nil, nil
}

Contributor Author

This change is related to instance creation: we are skipping logic that relies on infrastructure created by the managed cluster.

Does this mean that if we are using external but still provide the APIServer DNSName then we still work as expected?

Contributor Author

This logic is only used to check that a load balancer was created.
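
A rough sketch of the guard being discussed, using only the standard library. The `machineScope` struct, the `checkAPIServerELB` helper, and the error value are hypothetical; only the shape of the boolean condition is taken from the diff above. It illustrates that the load balancer check is bypassed for externally managed (and EKS managed) clusters, while instance creation itself still goes ahead.

```go
package main

import (
	"errors"
	"fmt"
)

// machineScope is a hypothetical stand-in for scope.MachineScope.
type machineScope struct {
	externallyManaged bool
	eksManaged        bool
}

func (m machineScope) IsExternallyManaged() bool { return m.externallyManaged }
func (m machineScope) IsEKSManaged() bool        { return m.eksManaged }

// errELBNotAvailable is an illustrative error, not the provider's real one.
var errELBNotAvailable = errors.New("API server ELB not available")

// checkAPIServerELB mirrors the condition from the diff: the check only
// applies when this provider is expected to have created the load balancer.
func checkAPIServerELB(m machineScope, dnsName string) error {
	if !m.IsExternallyManaged() && !m.IsEKSManaged() && dnsName == "" {
		return errELBNotAvailable
	}
	return nil
}

func main() {
	// Provider-managed cluster without an ELB yet: the check fails.
	fmt.Println(checkAPIServerELB(machineScope{}, ""))
	// Externally managed cluster: the check is skipped entirely.
	fmt.Println(checkAPIServerELB(machineScope{externallyManaged: true}, ""))
}
```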

err := r.Get(ctx, key, awsCluster)
if err != nil {
log.V(4).Error(err, "Failed to get AWS cluster")
panic(fmt.Sprintf("Failed to get AWS cluster %T", err))
Member

nit: let's inline the error with the if here.
Also why %T rather than %v?

@randomvariable
Member

Thanks for this. Looks ok from a cursory glance with the comments from @enxebre, but will take a deeper look.

if scope.IsExternallyManaged() {
return nil, nil
}

This seems to have the same effect as the changes on lines 191 right?

Also, do we not still need security groups? Or are they just fetched explicitly from the AWSCluster in this case?

Member

@enxebre enxebre May 13, 2021

> This seems to have the same effect as the changes on lines 191 right?

I think this is called in more places, so we possibly want to drop the other one and keep this logic here within the func.

> Also, do we not still need security groups? Or are they just fetched explicitly from the AWSCluster in this case?

I expect machines with externallyManaged infra to set their desired security groups explicitly through the API contract https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/api/v1alpha4/awsmachine_types.go#L100
and not to make any infra naming-convention assumptions, which would be driven by the infra controller that is not running. In fact, without this change this would fail:

return nil, awserrors.NewFailedDependency(fmt.Sprintf("%s security group not available", sg))
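
The following standard-library sketch illustrates why the early return matters. The `machineScope` struct, its `available` map, and the `coreSecurityGroups` function are hypothetical simplifications and not the provider's actual `GetCoreSecurityGroups`; only the error message format is borrowed from the line above. Without the early return, an externally managed cluster with no provider-created groups would hit the "not available" error.

```go
package main

import "fmt"

// machineScope is a hypothetical stand-in for scope.MachineScope.
type machineScope struct {
	externallyManaged bool
	// available maps a security group role to the ID created by the
	// infra controller; it stays empty when that controller does not run.
	available map[string]string
}

// coreSecurityGroups resolves the conventional security groups for the
// given roles, unless the infrastructure is externally managed.
func coreSecurityGroups(m machineScope, roles []string) ([]string, error) {
	if m.externallyManaged {
		// No naming-convention lookups: the machine is expected to list
		// its security groups explicitly in its own spec.
		return nil, nil
	}
	ids := make([]string, 0, len(roles))
	for _, role := range roles {
		id, ok := m.available[role]
		if !ok {
			return nil, fmt.Errorf("%s security group not available", role)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	roles := []string{"node", "lb"}

	_, err := coreSecurityGroups(machineScope{}, roles)
	fmt.Println("managed, nothing created yet:", err)

	ids, err := coreSecurityGroups(machineScope{externallyManaged: true}, roles)
	fmt.Println("externally managed:", ids, err)
}
```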


if err := r.Get(ctx, key, awsCluster); err != nil {
log.V(4).Error(err, "Failed to get AWS cluster")
panic(fmt.Sprintf("Failed to get AWS cluster %v", err))

I think panicking here is probably not the way to go, as this could happen for example when the API is unavailable or if the Cluster resource has been created before the AWSCluster resource.

IMO, the solution here when there is an error is to log the error (as you've done) and fail closed (i.e. return nil). It means we miss the potential event from the Cluster object, but it's better than panicking IMO.

AFAIK the events from the cluster mapping are a bonus anyway, so should be safe to fail closed here

@sedefsavas
Contributor

Should we also ensure this in this PR?

External infrastructure providers should ensure that the annotation, once set, cannot be removed.

https://github.com/kubernetes-sigs/cluster-api/blob/3065a926259f682f65cb8331b5f2543b270882a8/api/v1alpha4/common_types.go#L92-L94

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 12, 2021
ids, err := s.GetCoreSecurityGroups(scope)
if err != nil {
return nil, err
if !scope.IsExternallyManaged() {
Member

Since we are discriminating here https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/2383/files#r630312878, it seems we can drop this.

@@ -104,6 +105,13 @@ func (r *AWSCluster) ValidateUpdate(old runtime.Object) error {
)
}

if annotations.IsExternallyManaged(oldC) && !annotations.IsExternallyManaged(r) {
Member

Do we want to check the other way around as well?

Contributor Author

My understanding is that the user can add this annotation and disable infrastructure management, am I wrong?

Member

@enxebre enxebre May 17, 2021

Going from managed to externally managed should be allowed. The other way around is not allowed.
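
The one-way rule can be sketched as follows, using plain annotation maps instead of the real webhook types. `validateManagedByTransition` and its error message are hypothetical; the actual check in the diff compares `annotations.IsExternallyManaged(oldC)` with `annotations.IsExternallyManaged(r)` inside `ValidateUpdate`. This sketch again assumes that only the presence of the annotation key matters.

```go
package main

import (
	"errors"
	"fmt"
)

// managedByAnnotation is the annotation key named in the PR description.
const managedByAnnotation = "cluster.x-k8s.io/managed-by"

// validateManagedByTransition allows adding the annotation (managed to
// externally managed) but rejects removing it once it has been set.
func validateManagedByTransition(oldAnnotations, newAnnotations map[string]string) error {
	_, wasExternal := oldAnnotations[managedByAnnotation]
	_, isExternal := newAnnotations[managedByAnnotation]
	if wasExternal && !isExternal {
		// Illustrative message, not the webhook's actual wording.
		return errors.New("the externally managed annotation cannot be removed once set")
	}
	return nil
}

func main() {
	external := map[string]string{managedByAnnotation: ""}

	fmt.Println("managed -> external:", validateManagedByTransition(nil, external))
	fmt.Println("external -> managed:", validateManagedByTransition(external, nil))
}
```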

@enxebre
Member

enxebre commented May 17, 2021

/lgtm
PTAL @JoelSpeed @randomvariable @sedefsavas

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 17, 2021
@JoelSpeed JoelSpeed left a comment

/lgtm

Thanks for working on this @alexander-demichev

@sedefsavas
Contributor

Shall we add documentation for externally managed clusters? A separate issue to track it is fine.
Also, maybe a developer's note for making sure the reconciliation is blocked in the mapping functions too? I remember there was a discussion in the cluster-api channel started by @JoelSpeed about this, but don't recall what the decision was.

@JoelSpeed

+1 @sedefsavas I think we need to make some updates to the developers notes about predicates in general. This isn't just a problem for this predicate but also things like the paused predicate. I don't think we ever really reached a conclusion on this in the thread

@enxebre
Member

enxebre commented May 19, 2021

Created #2412 to track docs.

@enxebre
Member

enxebre commented May 21, 2021

@randomvariable @sedefsavas @alexander-demichev @JoelSpeed any objection to proceeding with this?

@JoelSpeed

I don't see the documentation as a blocker for this, we should prioritise updating that before we make this update for other providers though IMO. Happy to proceed from my perspective.

@sedefsavas
Contributor

/lgtm
cc @randomvariable if you want to have a look again, otherwise will approve by the end of day.

@sedefsavas
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sedefsavas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 25, 2021
@k8s-ci-robot k8s-ci-robot merged commit c53c107 into kubernetes-sigs:main May 25, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
6 participants