wasm: fix CPU profiler deadlock #17877

rockwotj · 2024-04-16T01:46:53Z

Currently trying to take a backtrace while the CPU profiler is running
results in a deadlock, because taking a backtrace segfaults, as we don't
write any debug symbols during JIT compilation (which causes other
deadlocks in libgcc). To fix this, disable the profiler's backtracing
when within Wasm. In the future we should use Wasmtime's profiling APIs
to get a stacktrace within the guest program that is running. For more
information on that API see:
https://docs.wasmtime.dev/api/wasmtime/struct.GuestProfiler.html

FIXES: CORE-486

Backports Required

Release Notes

Bug Fixes

Fixes an issue where using the CPU profiler while Data Transforms are running could cause the process to deadlock.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

rockwotj · 2024-04-16T01:47:39Z

Not backporting because the new APIs in seastar were not backported.

Requires this vtools change: https://github.com/redpanda-data/vtools/pull/2644

dotnwat · 2024-04-16T14:33:39Z

src/v/wasm/wasmtime.cc

+        // Disable profiling backtraces inside the VM - at the time of writing
+        // backtraces lead to segfaults causing deadlock in Seastar's signal
+        // handlers.
+        auto _ = ss::internal::scoped_disable_profile_temporarily();


What does the comment on this object in the Seastar tree mean: "This is not reentrant"?

It means that it doesn't handle nesting of these RAII objects: on destruction it doesn't restore the flag to its original state, but unconditionally turns the flag off.
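
To make the nesting problem concrete, here is a minimal sketch. It does not reproduce Seastar's implementation: `backtraces_disabled`, `non_reentrant_disable`, `restoring_disable`, and `flag_after_inner_scope` are hypothetical names that only model the behavior described above, next to a variant that restores the previous value.

```cpp
#include <cassert>

// Hypothetical stand-in for the profiler's per-thread "backtraces disabled" flag.
thread_local bool backtraces_disabled = false;

// Models the non-reentrant behavior described above: the destructor turns the
// flag off unconditionally instead of restoring the value seen at construction.
struct non_reentrant_disable {
    non_reentrant_disable() { backtraces_disabled = true; }
    ~non_reentrant_disable() { backtraces_disabled = false; }
    non_reentrant_disable(const non_reentrant_disable&) = delete;
    non_reentrant_disable& operator=(const non_reentrant_disable&) = delete;
};

// Nesting-safe variant (an assumption, not an existing API): it remembers the
// previous value and puts it back on destruction.
struct restoring_disable {
    restoring_disable() : _prev(backtraces_disabled) { backtraces_disabled = true; }
    ~restoring_disable() { backtraces_disabled = _prev; }
    restoring_disable(const restoring_disable&) = delete;
    restoring_disable& operator=(const restoring_disable&) = delete;

private:
    bool _prev;
};

// Returns the flag value observed after an inner guard has been destroyed
// while an outer guard of the same type is still alive.
template <typename Guard>
bool flag_after_inner_scope() {
    backtraces_disabled = false;
    Guard outer;
    {
        Guard inner;
        assert(backtraces_disabled); // both guards are active here
    }
    return backtraces_disabled;
}
```

With the non-reentrant guard, `flag_after_inner_scope<non_reentrant_disable>()` returns `false`: backtraces are enabled again even though the outer guard is still in scope. With the restoring variant it returns `true`. That is harmless as long as the guard is never nested, as in the single use in this change.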

rockwotj · 2024-04-16T15:26:24Z

FYI - I'll create a followup ticket, but if we want profiling results within the Wasmtime VM there are ways to get backtraces: https://bytecodealliance.zulipchat.com/#narrow/stream/217126-wasmtime/topic/Lower.20level.20backtrace.20API. Those APIs just need to be exposed through the C API.

vbotbuildovich · 2024-04-16T23:08:44Z

new failures in https://buildkite.com/redpanda/redpanda/builds/47890#018ee8de-1027-4296-8932-5d9f23cf1287:

"rptest.tests.write_caching_fi_test.WriteCachingFailureInjectionTest.test_unavoidable_data_loss"

dotnwat · 2024-04-16T23:14:11Z

oversized alloc

travisdowns · 2024-04-17T13:58:55Z

@rockwotj @ballard26 this seems like a bit of an observability hole now that we are rolling out WASM: if we are using a lot of CPU time in WASM it's going to be invisible to the profiling.

Getting stacks in the VM would be nice, but a simpler solution that would get us a lot of the way there would be to suppress stacks with a "reason": if a sample would have been taken while suppressed, we record it as a synthetic stack of, say, 2 frames: "suppressed" -> "reason", so it still shows clearly in the profile. This would let us see what % of time is being spent in suppressed code and keep the profiling unbiased overall.
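
A rough sketch of the synthetic-stack idea, to show how the accounting could work. Everything here is hypothetical (`sampler`, `suppress_scope`, `frame_list`, and the `"wasm"` reason are illustration-only names, not Seastar or Redpanda APIs), and a real profiler would need signal-safe, preallocated storage rather than `std::map` and `std::string`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using frame_list = std::vector<std::string>;

// Hypothetical sampler: counts how often each stack was observed.
struct sampler {
    const char* suppress_reason = nullptr; // non-null while backtraces are suppressed
    std::map<frame_list, unsigned> samples;

    // Called on each profiler tick; `real` stands in for the unwound stack.
    void on_tick(const frame_list& real) {
        if (suppress_reason != nullptr) {
            // Record a synthetic 2-frame stack instead of dropping the sample.
            samples[frame_list{std::string("suppressed"), std::string(suppress_reason)}]++;
        } else {
            samples[real]++;
        }
    }

    // Fraction of all samples that landed in suppressed regions.
    double suppressed_fraction() const {
        unsigned total = 0;
        unsigned suppressed = 0;
        for (const auto& [frames, count] : samples) {
            total += count;
            if (!frames.empty() && frames.front() == "suppressed") {
                suppressed += count;
            }
        }
        return total == 0 ? 0.0 : static_cast<double>(suppressed) / total;
    }
};

// RAII scope that suppresses backtraces with a reason and restores the
// previous reason on exit, so nested scopes behave as expected.
struct suppress_scope {
    suppress_scope(sampler& s, const char* reason)
      : _s(s), _prev(s.suppress_reason) {
        _s.suppress_reason = reason;
    }
    ~suppress_scope() { _s.suppress_reason = _prev; }
    suppress_scope(const suppress_scope&) = delete;
    suppress_scope& operator=(const suppress_scope&) = delete;

private:
    sampler& _s;
    const char* _prev;
};
```

Because every tick still produces exactly one sample, time spent inside a `suppress_scope` shows up as its own bucket instead of silently vanishing from the profile.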

rockwotj · 2024-04-17T14:10:59Z

> invisible to the profiling.

It's better than deadlocking the process! Yes, I agree, I can add this to the profiler. My main priority was not terribly breaking things. Wasm does run in its own scheduling group, and there are stats on CPU usage per function, so it's not like we're completely blind.

dotnwat · 2024-04-17T16:01:05Z

Do we have a ticket for improving the situation here, or is this happening imminently?

rockwotj · 2024-04-17T16:04:33Z

Ticket: https://redpandadata.atlassian.net/browse/CORE-2410

I can look at this early next week

cmake: upgrade seastar

75a1f2a

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

rockwotj requested review from BenPope and a team as code owners April 16, 2024 01:46

rockwotj requested review from jackietung-redpanda and removed request for a team April 16, 2024 01:46

github-actions bot added area/build area/redpanda area/wasm WASM Data Transforms labels Apr 16, 2024

rockwotj requested review from StephanDollberg and ballard26 April 16, 2024 01:47

rockwotj self-assigned this Apr 16, 2024

dotnwat previously approved these changes Apr 16, 2024

StephanDollberg reviewed Apr 16, 2024

rockwotj dismissed dotnwat’s stale review via d6aa1e2 April 16, 2024 14:28

rockwotj force-pushed the wasm-profiler branch from 6c1f925 to d6aa1e2 Compare April 16, 2024 14:28

rockwotj requested review from StephanDollberg and dotnwat April 16, 2024 14:29

dotnwat previously approved these changes Apr 16, 2024

StephanDollberg previously approved these changes Apr 16, 2024

rockwotj dismissed stale reviews from StephanDollberg and dotnwat via a910869 April 16, 2024 20:05

rockwotj force-pushed the wasm-profiler branch from d6aa1e2 to a910869 Compare April 16, 2024 20:05

dotnwat approved these changes Apr 16, 2024

dotnwat merged commit 13273f3 into redpanda-data:dev Apr 16, 2024
14 of 17 checks passed

rockwotj deleted the wasm-profiler branch April 16, 2024 23:18
