Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change healthchecksl4 sync mechanism #1812

Merged

Conversation

panslava
Copy link
Contributor

@panslava panslava commented Sep 13, 2022

Removed shared global struct, instead only use shared mutex
for fake, return healthchecks instance with independent mutex

I also want to make our package pass go test -race, and current fake approach makes it fail

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 13, 2022
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 13, 2022
@panslava
Copy link
Contributor Author

/assign @kl52752

@@ -70,7 +69,6 @@ func newServiceController(t *testing.T, fakeGCE *gce.Cloud) *L4Controller {
for _, n := range nodes {
ctx.NodeInformer.GetIndexer().Add(n)
}
healthchecksl4.Fake(ctx.Cloud, ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So right now all test for controllers are creating the health checks with shared mutex because it is created in NewHandler function. Is it desired?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, NewHandler creates L4 with "synced" healthchecks, and then in all the tests, after NewHandler(), we override them with "Fake" health checks, which are supposed to be not synced. This approach is still questionable, but to fix it, it needs bigger refactoring

but still, if you look at the current Fake code

func Fake(cloud *gce.Cloud, recorderFactory events.RecorderProducer) *l4HealthChecks {
	instance = &l4HealthChecks{
		cloud:           cloud,
		recorderFactory: recorderFactory,
		hcProvider:      healthchecksprovider.NewHealthChecks(cloud, meta.VersionGA),
	}
	return instance
}

mutex sharedResourcesLock sync.Mutex created automatically, cause it is member of struct by "value", so it is suppose to be "unsynced" and it's own for each of result of Fake()

But, now, if multiple tests running in parallel, it can be a race condition in Fake(), if 2 goroutines execute at first

instance = &l4HealthChecks{
		cloud:           cloud,
		recorderFactory: recorderFactory,
		hcProvider:      healthchecksprovider.NewHealthChecks(cloud, meta.VersionGA),
	}

then return instance will return the same instance for both of them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but in controller test you don't use Fake for health check you just depend on the controller to create it's own l4 hander and it is using default function and here.
ANd in test code you create for every controller different fakeGCE
And for me there is no race condition unless test use the same fakeGCE (It probably is within one test not across the tests which run in parallel).
ANd for me it looks like all test are locked with the same mutex and use different fakeGCE.
Does it make sense to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, just seems that l4controller_test and l4netlbcontroller_test don't use Parallel (cause they check "global" metrics") so it will not help if the mutex of health checks will be not shared

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 27, 2022
@panslava panslava force-pushed the change-healthchecks_l4-mutex branch from 27efd4c to 6c3c4f0 Compare September 28, 2022 21:48
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 28, 2022
@panslava
Copy link
Contributor Author

/assign cezarygerard

@cezarygerard
Copy link
Contributor

see my 2 small comments
but overall LGTM
mu better than my singleton attempt ;-)

recorderFactory: recorderFactory,
hcProvider: healthchecksprovider.NewHealthChecks(cloud, meta.VersionGA),
// FakeNotSynced creates instance of l4HealthChecks with independent lock. Use for test only.
func FakeNotSynced(cloud *gce.Cloud, recorder record.EventRecorder) *l4HealthChecks {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not call this just Fake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I renamed to Fake, difference is not that big,
I just thought that it is so similar to just normal l4HealthChecks, that it is not that clear, what is "fake" here

// instance is a singleton instance, created by Initialize
instance *l4HealthChecks
// sharedLock used to prevent race condition between shared health checks and firewalls.
sharedLock = &sync.Mutex{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this variable should only exist once? I like this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, it is initialised in the very beginning of execution, and then shared between every healthchecksl4

@panslava panslava changed the title Change healthchecksl4 sync method Change healthchecksl4 sync mechanism Oct 6, 2022
Removed shared global struct, instead only use shared mutex
For fake, return healthchecks instance with independent mutex
@panslava panslava force-pushed the change-healthchecks_l4-mutex branch from 6c3c4f0 to 5af9549 Compare October 6, 2022 12:42
@cezarygerard
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cezarygerard, panslava

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cezarygerard,panslava]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 43e9cf6 into kubernetes:master Oct 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants