Default local resources should respect cgroup limits on Linux #3886
I can still reproduce this issue with our large app on CircleCI using Bazel 0.10.0. I configured Circle to use their "large" container, which includes 8 GB of RAM, and set my bazel.rc file to use:
Yes. Bazel's resource management is based on guesstimates as to how much RAM / CPU / IO a certain action will need. If you have a container with just 4GB RAM and no swap space, stuff will horribly die, but so would a physical machine with 4GB RAM, no allowed memory overcommit and no swap space (which is what it looks like CircleCI is doing here). I think the only way to fix this is to limit the parallelism, e.g. try to run with `--jobs=4` and adjust up or down as needed. If you have an idea how Bazel could solve this better, I'd be very happy to hear about it :)
Thanks @philwo, that's a good suggestion. I have added it and will continue to monitor results; so far so good. For resource limitation I am using this in CircleCI (in a container with 8 GB of RAM):
and this in tools/bazel.rc:
It seems that it's possible to query a container's own memory limit via its cgroup. I think there is another open bug about CPU limits in this same context, and those details are also available via the cgroup interface.
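For illustration only (not part of the original comment): a minimal sketch in Java of such a lookup, assuming the standard cgroup v2 and v1 interface files are mounted at their usual paths; the class name `CgroupMemoryLimit` is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Best-effort read of the enclosing container's cgroup memory limit. */
public final class CgroupMemoryLimit {
  // Standard locations of the memory limit under cgroup v2 and v1.
  private static final Path V2_LIMIT = Path.of("/sys/fs/cgroup/memory.max");
  private static final Path V1_LIMIT =
      Path.of("/sys/fs/cgroup/memory/memory.limit_in_bytes");

  /** Returns the limit in bytes, or empty if unlimited or not available. */
  public static Optional<Long> read() {
    for (Path p : new Path[] {V2_LIMIT, V1_LIMIT}) {
      try {
        String value = Files.readString(p).trim();
        if (value.equals("max")) {
          return Optional.empty(); // cgroup v2 spelling of "no limit"
        }
        long bytes = Long.parseLong(value);
        // cgroup v1 reports a huge sentinel value when no limit is set.
        if (bytes > 0 && bytes < Long.MAX_VALUE / 2) {
          return Optional.of(bytes);
        }
      } catch (Exception e) {
        // File missing or unreadable: fall through to the next candidate.
      }
    }
    return Optional.empty();
  }
}
```

Checking v2 first handles hosts where both hierarchies are mounted.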
`local_resources` defaults should respect cgroup memory limits on Linux. Ideally, it'll work with both cgroups v1 and v2.
This issue is still present even in Bazel 4.x.
Bazel 4.x uses OpenJDK 11, which has the container resource detection improvements from https://bugs.openjdk.java.net/browse/JDK-8146115 and thus everything "should" just work out of the box if we'd just use the Java APIs to get the available RAM and CPU. 🤔 However, it looks like we're trying to be too smart and parse `/proc/cpuinfo` and `/proc/meminfo` ourselves. I'll try to find someone who can look into this so we can fix it.
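For reference, the JVM-side view the comment alludes to can be inspected with a couple of standard APIs; this is a minimal sketch, assuming a JDK 10+ runtime with the default `-XX:+UseContainerSupport`, not Bazel's actual detection code.

```java
/** Prints the CPU count and max heap as seen by a container-aware JVM. */
public final class JvmResourceProbe {
  public static void main(String[] args) {
    // Honors cgroup CPU quotas/shares when container support is enabled
    // (the default since JDK 10).
    int cpus = Runtime.getRuntime().availableProcessors();

    // The JVM also derives its default max heap from the cgroup memory limit.
    long maxHeapBytes = Runtime.getRuntime().maxMemory();

    System.out.printf(
        "availableProcessors=%d maxHeapMiB=%d%n", cpus, maxHeapBytes / (1024 * 1024));
  }
}
```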
FYI that might be #5042.
Another interesting thing to investigate is the Bazel server JVM's resource consumption vs. the resources used by actions.
Is this still an issue in Bazel 5+?
Yes, this is still an issue. We run Bazel 5.2.0 on CircleCI at Figma and have encountered this. Here's our workaround:
Still an issue.
No, I do think #16512 fixes this.
@larsrc-google My PR doesn't fix this for RAM limits, only CPU shares.
The VM itself does use the cgroup memory limit for the heap, but obviously that doesn't help us if we can't access that value. We could potentially use the cgroups handling I introduced recently, then take the lower of the available values. It would need a bit of tweaking to accept being in read-only mode, but not too much.
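A rough sketch of the "take the lower of the available values" idea described above; the parameter names and helper signature are hypothetical illustrations, not Bazel's actual code.

```java
import java.util.OptionalLong;

/** Combines the host RAM estimate with an optional cgroup memory limit. */
public final class RamEstimate {
  /**
   * @param hostRamBytes RAM reported for the physical or virtual machine
   * @param cgroupLimitBytes memory limit of the enclosing cgroup, if any
   * @return the effective RAM to plan with, in bytes
   */
  public static long effectiveRam(long hostRamBytes, OptionalLong cgroupLimitBytes) {
    // Inside a container the cgroup limit is usually the binding (smaller) value;
    // outside a container there is no limit and the host figure wins.
    return cgroupLimitBytes.isPresent()
        ? Math.min(hostRamBytes, cgroupLimitBytes.getAsLong())
        : hostRamBytes;
  }
}
```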
As of JDK 14, `OperatingSystemMXBean` provides information about system memory that is container-aware. Outside containers, it uses the same mechanisms as Bazel to determine available RAM (`/proc/meminfo` on Linux, `hw.memsize` on macOS) and can thus be used as a drop-in replacement for the custom implementation. A small caveat is that Bazel's macOS RAM estimate was based on converting bytes to "MB" via a divisor of `1000^2` instead of `1024^2`, resulting in a consistent overestimate compared to an identical Linux machine; this is now corrected. This opportunity was missed in #16512 since `OperatingSystemMXBean` is based on a complete Java implementation of cgroups handling and doesn't go through the `os::total_memory` or `os::physical_memory` Hotspot functions.

RELNOTES[INC]:
* On Linux, Bazel's RAM estimate for the host machine is now aware of container resource limits.
* On macOS, Bazel no longer consistently overestimates the total RAM by ~5% (`1024^2/1000^2`).
* On Windows, Bazel's RAM estimate is now generally more accurate as it is no longer influenced by JVM heuristics.

Fixes #3886
Closes #20435.

Commit 2f3cdc5
PiperOrigin-RevId: 588718034
Change-Id: I2daafa0567740a1b149ca8756ec27f102129283c
Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
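A minimal sketch of the JDK API the commit message refers to, assuming a JDK 14+ runtime (where `getTotalMemorySize()` supersedes the older `getTotalPhysicalMemorySize()`); this is illustrative, not the code from the commit itself.

```java
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

/** Container-aware total-RAM query via the JDK's OperatingSystemMXBean. */
public final class TotalRam {
  public static void main(String[] args) {
    OperatingSystemMXBean os =
        (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();

    // Inside a container this reflects the cgroup memory limit; outside one,
    // the value comes from /proc/meminfo (Linux) or hw.memsize (macOS).
    long totalBytes = os.getTotalMemorySize(); // available since JDK 14

    // Convert with a 1024^2 divisor (MiB), matching the corrected macOS behavior.
    System.out.println("total RAM MiB: " + totalBytes / (1024 * 1024));
  }
}
```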
A fix for this issue has been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!
Please provide the following information. The more we know about your system and use case, the more easily and likely we can help.
Description of the problem / feature request / question:
Bazel, by default, looks at available RAM on the system to set `local_resources` defaults, so as to best use the resources of the machine. Unfortunately, inside a Docker container or other cgroup environment, the system-wide memory statistics (`/proc/meminfo`, the output of `free`, etc.) reflect the memory of the host, not the container.

Bazel should make a best-effort attempt to find the effective cgroup memory controller limits, and use those.
If possible, provide a minimal example to reproduce the problem:
I ran into this in a CircleCI build; all builds would fail with
until I added
`build --local_resources=4096,4,1.0`
to my `.bazelrc`. Circle's build containers report 60 GB of RAM, but are cgroup-limited to 4 GB, so building any large application on Circle ought to reproduce the issue.
Environment info
Operating System:
Linux; Tested on Ubuntu 16.04
Bazel version (output of `bazel info release`):
release 0.5.4
If `bazel info release` returns "development version" or "(@non-git)", please tell us what source tree you compiled Bazel from; git commit hash is appreciated (`git rev-parse HEAD`):
Have you found anything relevant by searching the web?
(e.g. StackOverflow answers, GitHub issues, email threads on the bazel-discuss Google group)
There are a number of reports online of people puzzling over Bazel OOM-ing. It's hard to know how many of them root-cause to this issue, but almost certainly some of them do, since container environments are increasingly popular these days.
Anything else, information or logs or outputs that would be helpful?
(If they are large, please upload as attachment or provide link).
https://fabiokung.com/2014/03/13/memory-inside-linux-containers/ has some notes on how to detect memory availability inside containers.