wasm: fix CPU profiler deadlock #17877
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Not backporting because the new APIs in Seastar were not backported. Requires this vtools change: https://github.com/redpanda-data/vtools/pull/2644
// Disable profiling backtraces inside the VM - at the time of writing
// backtraces lead to segfaults causing deadlock in Seastar's signal
// handlers.
auto _ = ss::internal::scoped_disable_profile_temporarily();
What does the comment on this object in the Seastar tree, "This is not reentrant", mean?
It means that it doesn't handle the case of nesting these RAII objects - when being destructed it doesn't set the flag to the original state, but unconditionally turns the flag off.
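To illustrate the difference, here is a minimal sketch of a non-reentrant guard next to one that saves and restores the previous state. The flag `profiling_disabled` and both struct names are hypothetical stand-ins for illustration, not Seastar's actual implementation.

```cpp
// Hypothetical per-thread flag standing in for the profiler's internal state.
thread_local bool profiling_disabled = false;

// Non-reentrant: the destructor clears the flag unconditionally, so an inner
// guard going out of scope re-enables profiling while an outer guard is still
// alive.
struct non_reentrant_disable {
    non_reentrant_disable() { profiling_disabled = true; }
    ~non_reentrant_disable() { profiling_disabled = false; }
    non_reentrant_disable(const non_reentrant_disable&) = delete;
    non_reentrant_disable& operator=(const non_reentrant_disable&) = delete;
};

// Reentrant variant: remember the state seen at construction and restore it,
// so nested guards compose correctly.
struct reentrant_disable {
    reentrant_disable() : _prev(profiling_disabled) { profiling_disabled = true; }
    ~reentrant_disable() { profiling_disabled = _prev; }
    reentrant_disable(const reentrant_disable&) = delete;
    reentrant_disable& operator=(const reentrant_disable&) = delete;

private:
    bool _prev;
};
```

With the non-reentrant version, nesting is only safe if no outer scope relies on the flag staying set after the inner guard is destroyed.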
FYI - I'll create a followup ticket, but if we want profiling results within the Wasmtime VM there are ways to get backtraces: https://bytecodealliance.zulipchat.com/#narrow/stream/217126-wasmtime/topic/Lower.20level.20backtrace.20API. Those APIs just need to be tied into the C API.
Currently trying to take a backtrace while the CPU profiler is running results in a deadlock, because taking a backtrace segfaults, as we don't write any debug symbols during JIT compilation (which causes other deadlocks in libgcc). To fix this, disable the profiler's backtracing when within Wasm. In the future we should use Wasmtime's profiling APIs to get a stacktrace within the guest program that is running. For more information on that API see: https://docs.wasmtime.dev/api/wasmtime/struct.GuestProfiler.html
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
new failures in https://buildkite.com/redpanda/redpanda/builds/47890#018ee8de-1027-4296-8932-5d9f23cf1287:
oversized alloc
@rockwotj @ballard26 this seems like a big observability hole now that we are rolling out Wasm: if we are using a lot of CPU time in Wasm, it's going to be invisible to the profiler. Getting stacks in the VM would be nice, but a simpler solution that would get us a lot of the way there would be to suppress stacks with a "reason". If a sample would have been taken while suppressed, we record it as a synthetic stack of, say, 2 frames: "suppressed" -> "reason", so it still shows clearly in the profile. This would let us see what % of time is being spent in suppressed code and keep the profiling unbiased overall.
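The synthetic-stack idea could be sketched roughly as follows. Everything here (`sampling_profiler`, `suppress`, `resume`, `on_tick`) is a hypothetical illustration and not the Seastar profiler's API. A real implementation would run in a signal handler and could not allocate the way this sketch does.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A captured stack, outermost frame first (hypothetical representation).
using frames = std::vector<std::string>;

class sampling_profiler {
public:
    // Enter a region where real backtraces are unsafe, tagged with a reason.
    void suppress(std::string reason) {
        _reason = std::move(reason);
        _suppressed = true;
    }

    // Leave the suppressed region.
    void resume() { _suppressed = false; }

    // Called on every sampling tick with the stack that would be captured.
    void on_tick(const frames& real_stack) {
        if (_suppressed) {
            // Record a two-frame synthetic stack instead of dropping the sample.
            _samples.push_back(frames{"suppressed", _reason});
            ++_suppressed_count;
        } else {
            _samples.push_back(real_stack);
        }
    }

    // Share of all samples that landed in suppressed regions.
    double suppressed_fraction() const {
        return _samples.empty()
                 ? 0.0
                 : static_cast<double>(_suppressed_count) / _samples.size();
    }

    const std::vector<frames>& samples() const { return _samples; }

private:
    bool _suppressed = false;
    std::string _reason;
    std::size_t _suppressed_count = 0;
    std::vector<frames> _samples;
};
```

Because every tick still produces a sample, the total sample count stays proportional to CPU time, which is what keeps the profile unbiased.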
It's better than deadlocking the process! Yes, I agree, I can add this to the profiler. My main priority was not terribly breaking things. Wasm does run in its own scheduling group, and there are stats on CPU usage per function, so it's not like we're completely blind.
Do we have a ticket for improving the situation here, or is this happening imminently?
Ticket: https://redpandadata.atlassian.net/browse/CORE-2410. I can look at this early next week.
FIXES: CORE-486
Backports Required
Release Notes
Bug Fixes