Skip to content
This repository has been archived by the owner on Jun 26, 2023. It is now read-only.

Cherrypick: Handle source object overwriting conflict #1115

Merged
merged 5 commits into from
Sep 24, 2020

Conversation

yiqigao217
Copy link
Contributor

See issue at #1076

The fix should include 4 changes, but we only include the first 2 fixes in this cherry-pick into HNC v0.5.

  1. Object webhook (see HNC: Update object validator to avoid overwriting source #1079)
  2. HierarchyConfiguration wehbook
  3. Hncconfig webhook (too complicated to do a cherry-pick, see HNC: Update hncconfig validator to avoid overwriting source #1102)
  4. HNC exceptions (not yet implemented, so irrelevant to v0.5)

Tested by make test

For source objects with a conflicting source in the ancestors, treat
them as propagated objects and overwrite them with the source in the
ancestor. See http://bit.ly/hnc-propagation-conflict

Tested by `make test` with a new integration test case.
Update object webhook to deny creating an object if an object with the
same name and type already exits in the descendants, and the type has
'Propagate' syncMode. Add new a new test case.

Tested manually on GKE cluster and by 'make test'.

The webhook output is like this: Error from server (Conflict): admission webhook "objects.hnc.x-k8s.io" denied the request: Cannot create /v1, Kind=Secret 'my-creds' in namespace 'acme-org' because it would overwrite objects in the following descendant namespace(s): [team-a team-b]. Choose a different name for the object, or remove the conflicting objects from these namespaces.
Update the 5-millisecond sleep to 500-millisecond before detecting
object overwriting for object reconciler test.

This should fix the flaky test on Prow presubmit.

Tested by 'make test'. It was not flaky before and after the change
locally but may help for Prow.
Add an additional rule to make sure there's no potential object
overwriting in hierarchy changes. Format the webhook message to:
"
Cannot update hierarchy because it would overwrite the following object(s):
  * Namespace "team-b": my-creds (/v1, Kind=Secret)
  * Namespace "team-b": my-creds2 (/v1, Kind=Secret)
To fix this, please rename or remove the conflicting objects first.
"

Format the object webhook message to:
"
Cannot create "my-creds2" (/v1, Kind=Secret) in namespace "acme-org" because it would overwrite objects in the following descendant namespace(s):
  * team-a
  * team-b
To fix this, choose a different name for the object, or remove the conflicting objects from the above namespaces.
"

Tested manually on GKE cluster and by 'make test' with new test
cases.
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 17, 2020
@yiqigao217
Copy link
Contributor Author

/assign @adrianludwin
/assign @rjbez17
/cc @GinnyJI

This is the cherry-pick of the Object and HierarchyConfig webhooks to HNC v0.5, to prevent creating conflicting objects and changing parents that would cause this conflict.

@yiqigao217
Copy link
Contributor Author

/retest

@adrianludwin adrianludwin added this to the hnc-v0.5.3 milestone Sep 17, 2020
@adrianludwin
Copy link
Contributor

This has four commits, but I thought it was only supposed to have two. Is that intended?

@yiqigao217
Copy link
Contributor Author

yiqigao217 commented Sep 17, 2020

This has four commits, but I thought it was only supposed to have two. Is that intended?

Yes, besides the object & hc webhook changes, there are also one for object reconciler behavior change to overwriting conflicting objects (this is the first commit), and another one to make the test less flaky with a longer sleep time for the new tests included.

@adrianludwin
Copy link
Contributor

Given that this is a change onto a stable branch, can you please do a bit more testing on it? It would be nice to see the full suite of e2e tests at least. Thanks!

Other than that this lgtm. @rjbez17 , what do you think of checking this into v0.5.3? It solves most of the problem that we identified that you can have silently overridden policies while not being so dangerous that they all get overwritten. The one dangerous part of this change is that if you turn on propagation for a new object type, any conflicting objects will get swept away; we'll fix that in v0.6 but it's too hard to fix in v0.5.x. But given the overall stability of HNC, I think that's an acceptable compromise.

@yiqigao217
Copy link
Contributor Author

Given that this is a change onto a stable branch, can you please do a bit more testing on it? It would be nice to see the full suite of e2e tests at least. Thanks!

Sure, will do. BTW just confirming that we don't have e2e test checked in in v0.5, I will have to manually add the e2e test files on my local branch to test, right? @adrianludwin

@adrianludwin
Copy link
Contributor

adrianludwin commented Sep 18, 2020 via email

@yiqigao217
Copy link
Contributor Author

Update: among 24 e2e tests, 22 passed and 2 failed:

[Fail] Issues [It] Should remove obsolete conditions CannotPropagateObject and CannotUpdateObject - issue #328 
[Fail] Issues [It] Should clear CannotUpdate conditions in descendants when a hierarchy changes - issue #605 

They both failed on webhook internal error on kubectl hns set sub-child --root:

Running:  [kubectl hns set sub-child --root]
Output:  Unsetting the parent of sub-child (was previously child)

Could not update the hierarchical configuration of sub-child.
Reason: Internal error occurred: failed calling webhook "hierarchyconfigurations.hnc.x-k8s.io": Post https://hnc-webhook-service.hnc-system.svc:443/validate-hnc-x-k8s-io-v1alpha1-hierarchyconfigurations?timeout=30s: stream error: stream ID 13139; INTERNAL_ERROR

Please note other HC changes such as kubectl hns set sub-child --parent child all passed. I'm looking into the issue now.

@adrianludwin
Copy link
Contributor

adrianludwin commented Sep 18, 2020 via email

The bug was found when running e2e tests that the webhook got
INTERNAL_ERROR when setting a namespace as root. Fix it by checking if
the new parent is nil. If yes, early exit since it's impossible to
introduce new naming conflicts.

Add a new unit test case for it. The test failed before and passed after
the change.

Tested by make test and manually.
@yiqigao217
Copy link
Contributor Author

Update: the e2e test error is fixed and cherry-picked into this PR too. I reran the e2e tests focusing (FIt()) on the two previously failed tests and it passed.

Copy link
Contributor

@adrianludwin adrianludwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve
/hold
/assign @rjbez17

This is a larger change (both LoC and behaviour) than I'd usually want to put on a branch, but I think it makes sense. Our current behaviour re conflicting objects is clearly broken and this seems like a reasonable approach. Ideally a couple of us will play around with this before we release 0.5.3 to see if any other issues crop up with it.

@rjbez17 , wdyt?

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Sep 21, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrianludwin, yiqigao217

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 21, 2020
@adrianludwin
Copy link
Contributor

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 24, 2020
@k8s-ci-robot k8s-ci-robot merged commit 6f44c1c into kubernetes-retired:hnc-v0.5 Sep 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants