Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data race: avoid unprotected access to sb.file #408

Merged
merged 1 commit into from
Jun 20, 2024

Conversation

pohly
Copy link

@pohly pohly commented Jun 17, 2024

What this PR does / why we need it:

79575d8 (included in klog v1.130.0) added a small data race with regards to accessing the syncBuffer.file field when calling Sync.

We can eliminate the entire redirectBuffer interface that led to this mistake and instead work directly with syncBuffer for flushing and syncing.

To avoid allocating a slice inside flushAll, fixed-sized struct is returned. However, it still ends up being allocated according to go test -benchmem.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #407

Special notes for your reviewer:

Against my expectations, BenchmarkLogs does the same number of allocations as before:

name     old time/op    new time/op    delta
Logs-36    7.59µs ± 5%    7.40µs ± 4%   ~     (p=0.099 n=9+10)

name     old alloc/op   new alloc/op   delta
Logs-36      748B ± 0%      748B ± 0%   ~     (p=1.000 n=10+10)

name     old allocs/op  new allocs/op  delta
Logs-36      9.00 ± 0%      9.00 ± 0%   ~     (all equal)

Release note:

Fix race condition in v1.130.0 around syncing file data.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 17, 2024
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 17, 2024
@k8s-ci-robot
Copy link

This issue is currently awaiting triage.

If klog contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 17, 2024
@pohly
Copy link
Author

pohly commented Jun 17, 2024

@1978629634: can you review?

@1978629634
Copy link

@1978629634: can you review?

ok

}

// flushAll flushes all the logs
// l.mu is held.
func (l *loggingT) flushAll() []flushSyncWriter {
files := make([]flushSyncWriter, 0, severity.NumSeverity)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'files' slice has been initialized with an initial capacity, preventing it from triggering reallocations during appends and thereby eliminating additional memory allocations.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't reallocate, but it still allocates.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A fixed-capacity slice possesses the same underlying mechanism as an array. On the other hand, for slices with dynamic capacities, invoking the append operation may trigger slice expansion, necessitating a reallocation of memory.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misconstrued this.

//
// The result is the number of files which need to be synced and the pointers to them.
func (l *loggingT) flushAll() fileArray {
var needToSync fileArray

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'needToSync' can be refactored as a 'sync.Map' and embedded within the 'loggingT' structure. When creating 'syncBuffer', it should be appended to 'needToSync'. During 'syncAll', data can be retrieved from 'needToSync'.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting what should be a local variable into the global logging feels like a bad workaround. I'd like to understand better why fileArray is not being copied by value.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map 'l.file' is non-thread-safe, necessitating the creation of a temporary deep copy of 'l.file' to avoid data races. Hence, allocating memory for an array is indispensable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An array has a fixed size. My expectation is that Go therefore can treat fileArray like any other struct and copy it by value instead of doing allocations on the heap.

Is that expectation incorrect?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your expectation is correct.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. BenchmarkLogs calls flushAll/syncAll once for a very large b.N. Whatever difference the changes around memory allocation are, they are insignificant when divided by that b.N. That is why I was seeing 9 allocations for BenchmarkLogs before/after this PR, not because nothing changed.

I double-checked that the current code does indeed not allocate and added a benchmark for it, so I think we are good to merge it.

@1978629634
Copy link

Reply:Against my expectations, BenchmarkLogs does the same number of allocations as before
The allocs/ops average only counts heap allocations, not stack allocations.

Comment on lines 820 to +833
for i := 0; i < b.N; i++ {
Error("error")
Warning("warning")
Info("info")
}
files := logging.flushAll()
logging.syncAll(files)
needToSync := logging.flushAll()
sb, ok := logging.file[severity.InfoLog].(*syncBuffer)
if !ok {
b.Fatal("info wasn't created")
}
if needToSync.num != 1 || needToSync.files[0] != sb.file {
b.Fatalf("Should have received exactly the file from severity.InfoLog for syncing, got instead: %+v", needToSync)
}
logging.syncAll(needToSync)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After placing logging.flushAll() and syncAll(needToSync) snippet inside the for loop, the profiler indicates 10 allocations per operation for slices, one more than for arrays. Arrays utilize stack memory, while slices utilize heap memory, which incurs significantly higher overhead. Thus, transitioning to arrays would be more favorable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've come to the same conclusion 😄

79575d8 (included in klog v1.130.0) added a small data race with regards to
accessing the syncBuffer.file field when calling Sync.

We can eliminate the entire redirectBuffer interface that led to this mistake
and instead work directly with syncBuffer for flushing and syncing.

To avoid allocating a slice inside flushAll, a fixed-sized struct is
returned. A benchmark confirms that flushAll+syncAll can be called without
allocations.
@pohly pohly force-pushed the klog-flush-sync-fix branch from 62d3653 to 2327d4c Compare June 18, 2024 11:54
@1978629634
Copy link

/lgtm

@k8s-ci-robot
Copy link

@1978629634: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@pohly
Copy link
Author

pohly commented Jun 18, 2024

/assign @dims

For final approval.

/hold

In case that @liggitt wants to have a look.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 18, 2024
@liggitt
Copy link
Member

liggitt commented Jun 18, 2024

/hold

In case that @liggitt wants to have a look.

I don't have time, will defer to klog reviewers

@dims
Copy link
Member

dims commented Jun 20, 2024

/hold cancel
/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 20, 2024
@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 1978629634, dims, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 75663bb into kubernetes:main Jun 20, 2024
17 checks passed
renovate bot referenced this pull request in anza-labs/lke-operator Jun 20, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [k8s.io/klog/v2](https://github.com/kubernetes/klog) | `v2.130.0` ->
`v2.130.1` |
[![age](https://developer.mend.io/api/mc/badges/age/go/k8s.io%2fklog%2fv2/v2.130.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/go/k8s.io%2fklog%2fv2/v2.130.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/go/k8s.io%2fklog%2fv2/v2.130.0/v2.130.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/go/k8s.io%2fklog%2fv2/v2.130.0/v2.130.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

### Release Notes

<details>
<summary>kubernetes/klog (k8s.io/klog/v2)</summary>

###
[`v2.130.1`](https://github.com/kubernetes/klog/releases/tag/v2.130.1):
Prepare klog release for Kubernetes v1.31 (Take 2)

[Compare
Source](https://github.com/kubernetes/klog/compare/v2.130.0...v2.130.1)

#### What's Changed

- data race: avoid unprotected access to sb.file by
[@&#8203;pohly](https://github.com/pohly) in
[https://github.com/kubernetes/klog/pull/408](https://github.com/kubernetes/klog/pull/408)

**Full Changelog**:
kubernetes/klog@v2.130.0...v2.130.1

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/anza-labs/lke-operator).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40MTAuMSIsInVwZGF0ZWRJblZlciI6IjM3LjQxMC4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJhcmVhL2RlcGVuZGVuY3kiXX0=-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
MrAlias referenced this pull request in open-telemetry/opentelemetry-go-contrib Jun 20, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [k8s.io/klog/v2](https://github.com/kubernetes/klog) | `v2.130.0` ->
`v2.130.1` |
[![age](https://developer.mend.io/api/mc/badges/age/go/k8s.io%2fklog%2fv2/v2.130.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/go/k8s.io%2fklog%2fv2/v2.130.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/go/k8s.io%2fklog%2fv2/v2.130.0/v2.130.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/go/k8s.io%2fklog%2fv2/v2.130.0/v2.130.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

### Release Notes

<details>
<summary>kubernetes/klog (k8s.io/klog/v2)</summary>

###
[`v2.130.1`](https://github.com/kubernetes/klog/releases/tag/v2.130.1):
Prepare klog release for Kubernetes v1.31 (Take 2)

[Compare
Source](https://github.com/kubernetes/klog/compare/v2.130.0...v2.130.1)

#### What's Changed

- data race: avoid unprotected access to sb.file by
[@&#8203;pohly](https://github.com/pohly) in
[https://github.com/kubernetes/klog/pull/408](https://github.com/kubernetes/klog/pull/408)

**Full Changelog**:
kubernetes/klog@v2.130.0...v2.130.1

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/open-telemetry/opentelemetry-go-contrib).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40MTAuMSIsInVwZGF0ZWRJblZlciI6IjM3LjQxMC4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJTa2lwIENoYW5nZWxvZyIsImRlcGVuZGVuY2llcyJdfQ==-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
mx-psi referenced this pull request in open-telemetry/opentelemetry-collector-contrib Jun 25, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [k8s.io/klog/v2](https://github.com/kubernetes/klog) | `v2.130.0` ->
`v2.130.1` |
[![age](https://developer.mend.io/api/mc/badges/age/go/k8s.io%2fklog%2fv2/v2.130.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/go/k8s.io%2fklog%2fv2/v2.130.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/go/k8s.io%2fklog%2fv2/v2.130.0/v2.130.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/go/k8s.io%2fklog%2fv2/v2.130.0/v2.130.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>kubernetes/klog (k8s.io/klog/v2)</summary>

###
[`v2.130.1`](https://github.com/kubernetes/klog/releases/tag/v2.130.1):
Prepare klog release for Kubernetes v1.31 (Take 2)

[Compare
Source](https://github.com/kubernetes/klog/compare/v2.130.0...v2.130.1)

#### What's Changed

- data race: avoid unprotected access to sb.file by
[@&#8203;pohly](https://github.com/pohly) in
[https://github.com/kubernetes/klog/pull/408](https://github.com/kubernetes/klog/pull/408)

**Full Changelog**:
kubernetes/klog@v2.130.0...v2.130.1

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "on tuesday" (UTC), Automerge - At any
time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/open-telemetry/opentelemetry-collector-contrib).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40MTMuMiIsInVwZGF0ZWRJblZlciI6IjM3LjQxMy4yIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJkZXBlbmRlbmNpZXMiLCJyZW5vdmF0ZWJvdCJdfQ==-->

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: opentelemetrybot <107717825+opentelemetrybot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

klog 1.130.0: race condition during periodic flushing and rotating files
5 participants