Rlimit cache rework for Go 1.23+ #4290

Merged (1 commit) on Jun 1, 2024

Contributor

@kolyshkin kolyshkin commented May 23, 2024

Go 1.23 tightens access to internal symbols, and even puts runc into
the "hall of shame" for using an internal symbol (a use recently added
by commit da68c8e). So, while not impossible, accessing those internal
symbols becomes harder, and it is a bad idea in general.

Since Go 1.23 includes https://go.dev/cl/588076, we can clean the
internal rlimit cache by setting the RLIMIT_NOFILE for ourselves,
essentially disabling the rlimit cache.

Once Go 1.22 is no longer supported, we will remove the go:linkname hack.

@kolyshkin kolyshkin added the backport/1.1-pr A backport PR to release-1.1 label May 23, 2024
@kolyshkin kolyshkin added this to the 1.2.0 milestone May 23, 2024
@kolyshkin kolyshkin requested a review from lifubang May 23, 2024 21:06
@kolyshkin
Contributor Author

This will fail CI because https://go.dev/cl/588076 is not yet in released Go versions.

@thaJeztah
Member

Thanks for working on this! <3

@lifubang
Member

lifubang commented May 23, 2024

NOTE this relies on golang.org/x/sys/unix having https://go.dev/cl/476695,
and Go having https://go.dev/cl/588076.

If we need Go itself to change in order to resolve this, why not choose a simpler approach that also avoids two unnecessary syscalls?

golang/go#66797 (comment)
The core reason is that using Get/Set to clear the internal cache is not an atomic operation. I suggest adding a public method that clears this cache directly, for example runtime.ClearSyscallRlimitCache().

@kolyshkin kolyshkin added backport/1.1-todo A PR in main branch which needs to be backported to release-1.1 and removed backport/1.1-pr A backport PR to release-1.1 labels May 23, 2024
@kolyshkin
Contributor Author

The core reason is that using Get/Set to clear the internal cache is not an atomic operation. I suggest adding a public method that clears this cache directly, for example runtime.ClearSyscallRlimitCache().

We only do Set here (will update PR in a sec).

@lifubang
Member

We only do Set here

golang/go#66797 (comment)

@lifubang
Member

a853a82
Since we already use the procReady message to avoid the race, if we want to remove go:linkname, I recommend going back to the original Get/Set approach to clear the cache. I still strongly suggest that Go add a dedicated public API for this.

@kolyshkin
Contributor Author

TODO: add CI for gotip, similar to how it's implemented here:
https://github.com/google/pprof/blob/main/.github/workflows/ci.yaml

Will do later.

@kolyshkin kolyshkin force-pushed the rlimit-rework branch 2 times, most recently from 76a6a86 to 8860620 Compare May 24, 2024 04:50
@kolyshkin
Contributor Author

Codespell CI failure is unrelated; being fixed in #4291

@kolyshkin kolyshkin force-pushed the rlimit-rework branch 7 times, most recently from b115856 to adae1d2 Compare May 24, 2024 06:05
@kolyshkin
Contributor Author

OK, this PR now has the infrastructure in place to test https://go.dev/cl/588076 once it is merged.

Currently, with Go 1.23, we get failures like this one (because the rlimit cache is not cleaned):

=== RUN   TestExecInUsernsRlimit
    execin_test.go:129: expected rlimit to be 1026, got 512
--- FAIL: TestExecInUsernsRlimit (0.18s)

Marking as draft for now, pending https://go.dev/cl/588076 merge.

@kolyshkin kolyshkin marked this pull request as draft May 24, 2024 06:19
@kolyshkin kolyshkin changed the title Rlimit cache rework Rlimit cache rework for Go 1.23+ May 24, 2024
@kolyshkin
Contributor Author

Now https://go.dev/cl/588076 is merged and everything works as it should, including with Go 1.23.

@thaJeztah
Member

Does that mean this is ready for review? 😅 (I see it's still in draft)

@kolyshkin kolyshkin marked this pull request as ready for review May 30, 2024 19:08
@kolyshkin
Contributor Author

Does that mean this is ready for review? 😅 (I see it's still in draft)

It is now; PTAL.

Member

@rata rata left a comment

LGTM, thanks!

@@ -24,7 +24,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [ubuntu-20.04, ubuntu-22.04, actuated-arm64-6cpu-8gb]
-        go-version: [1.20.x, 1.21.x]
+        go-version: [1.20.x, 1.21.x, tip]
Member

So we have to always test the unreleased Go version in the future? If something were broken in the master branch of the Go repository, we would get CI failures too.

Contributor Author

So we have to always test the unreleased Go version in the future?

No, we don't.

The reason for adding this is simple: the code added is for Go 1.23, there is no other way to test it, and I don't want to push code that is not tested.

I can remove the gotip commit now that we have seen it works.

Contributor Author

Removed; PTAL @lifubang

@kolyshkin
Contributor Author

Let's merge this so we can move forward with https://go-review.googlesource.com/c/go/+/587918

@kolyshkin
Contributor Author

(and with the backport)

Go 1.23 tightens access to internal symbols, and even puts runc into
the "hall of shame" for using an internal symbol (a use recently added
by commit da68c8e). So, while not impossible, accessing those internal
symbols becomes harder, and it is a bad idea in general.

Since Go 1.23 includes https://go.dev/cl/588076, we can clean the
internal rlimit cache by setting the RLIMIT_NOFILE for ourselves,
essentially disabling the rlimit cache.

Once Go 1.22 is no longer supported, we will remove the go:linkname hack.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@lifubang lifubang merged commit 854c4af into opencontainers:main Jun 1, 2024
40 checks passed
@kolyshkin kolyshkin added backport/1.1-done A PR in main branch which has been backported to release-1.1 and removed backport/1.1-todo A PR in main branch which needs to be backported to release-1.1 labels Jun 4, 2024
gopherbot pushed a commit to golang/go that referenced this pull request Jun 7, 2024
Since CL 588076 runc can do fine without the kludge. The code accessing the symbol is now guarded with `go:build !go1.23` in all supported runc branches (main: [1], release-1.1: [2]).

This reverts part of CL 587219.

Updates #67401.

For #66797.

[1]: opencontainers/runc#4290
[2]: opencontainers/runc#4299

Change-Id: I204843a93c36857e21ab9b43bd7aaf046e8b9787
Reviewed-on: https://go-review.googlesource.com/c/go/+/587918
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@lifubang lifubang mentioned this pull request Jun 10, 2024