-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
task: add Tracing instrumentation to spawned tasks #2655
Conversation
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
The context shows the currently active task, so we _really_ only need `new` and `close` events. Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
there is no option for open/close only, currently Signed-off-by: Eliza Weisman <eliza@buoyant.io>
tokio/src/macros/cfg.rs
Outdated
macro_rules! cfg_trace { | ||
($($item:item)*) => { | ||
$( | ||
#[cfg(all(feature = "tracing", tokio_unstable))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about this, but I don't think tracing
merits being placed behind the tokio_unstable
unstable flag. @carllerche will need to weigh in on this, but unlike the CancelationToken
, changes to the tracing
output aren't what the Tokio project has historically considered to be a "breaking change". An off-by-default (e.g., excluded from full
) feature flag seems like a reasonable default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at one point, @carllerche suggested that any dependency on a pre-1.0 crate should require the tokio_unstable
flag...but, in this case, we aren't exposing anything from tracing
in Tokio's public APIs. AFAICT, it shouldn't be possible for the tracing
dependency to cause any compile time breakage --- the worst failure mode, if there is eventually a breaking change to the tracing-core
crate that we can't work around, is that the tracing diagnostics might silently fail to be collected.
IMO, you're right, and an off-by-default optional dep should be fine. That would be pretty similar to what we do for parking_lot
, which is also pre-1.0 (and churns versions somewhat frequently!), since it's also not exposed publicly. Of course, if we were to expose tracing
types (such as Span
or Dispatch
) in a public API later on, those APIs should require the unstable flag.
With all of that said: I made the more conservative choice of opening this PR with the tokio_unstable
flag required, because I want to bias towards getting this merged sooner rather than later, and continuing to iterate once it merges. We can always remove the flag in a follow-up, but it's harder to add if these changes ship in a published release...
This is really cool, I'm excited! I'll add to the future that from experience it'd also be really handy to have events for when a resource is notified. |
@jonhoo Yup, the next major thing to do in Tokio itself is probably going to be adding instrumentation to |
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. We can remove the unstable
flag.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
…liza/trace-spawns Signed-off-by: Eliza Weisman <eliza@buoyant.io>
- docs: misc improvements (#2572, #2658, #2663, #2656, #2647, #2630, #2487, #2621, #2624, #2600, #2623, #2622, #2577, #2569, #2589, #2575, #2540, #2564, #2567, #2520, #2521, #2572, #2493) - rt: allow calls to `block_on` inside calls to `block_in_place` that are themselves inside `block_on` (#2645) - net: fix non-portable behavior when dropping `TcpStream` `OwnedWriteHalf` (#2597) - io: improve stack usage by allocating large buffers on directly on the heap (#2634) - io: fix unsound pin projection in `AsyncReadExt::read_buf` and `AsyncWriteExt::write_buf` (#2612) - io: fix unnecessary zeroing for `AsyncRead` implementors (#2525) - io: Fix `BufReader` not correctly forwarding `poll_write_buf` (#2654) - coop: returning `Poll::Pending` no longer decrements the task budget (#2549) - io: little-endian variants of `AsyncReadExt` and `AsyncWriteExt` methods (#1915) - io: fix panic in `AsyncReadExt::read_line` (#2541) - task: add [`tracing`] instrumentation to spawned tasks (#2655) - sync: allow unsized types in `Mutex` and `RwLock` (via `default` constructors) (#2615) - net: add `ToSocketAddrs` implementation for `&[SocketAddr]` (#2604) - fs: add `OpenOptionsExt` for `OpenOptions` (#2515) - fs: add `DirBuilder` (#2524) [`tracing`]: https://crates.io/crates/tracing Signed-off-by: Eliza Weisman <eliza@buoyant.io>
# 0.2.22 (July 2!, 2020) ### Fixes - docs: misc improvements (#2572, #2658, #2663, #2656, #2647, #2630, #2487, #2621, #2624, #2600, #2623, #2622, #2577, #2569, #2589, #2575, #2540, #2564, #2567, #2520, #2521, #2493) - rt: allow calls to `block_on` inside calls to `block_in_place` that are themselves inside `block_on` (#2645) - net: fix non-portable behavior when dropping `TcpStream` `OwnedWriteHalf` (#2597) - io: improve stack usage by allocating large buffers on directly on the heap (#2634) - io: fix unsound pin projection in `AsyncReadExt::read_buf` and `AsyncWriteExt::write_buf` (#2612) - io: fix unnecessary zeroing for `AsyncRead` implementors (#2525) - io: Fix `BufReader` not correctly forwarding `poll_write_buf` (#2654) - io: fix panic in `AsyncReadExt::read_line` (#2541) ### Changes - coop: returning `Poll::Pending` no longer decrements the task budget (#2549) ### Added - io: little-endian variants of `AsyncReadExt` and `AsyncWriteExt` methods (#1915) - task: add [`tracing`] instrumentation to spawned tasks (#2655) - sync: allow unsized types in `Mutex` and `RwLock` (via `default` constructors) (#2615) - net: add `ToSocketAddrs` implementation for `&[SocketAddr]` (#2604) - fs: add `OpenOptionsExt` for `OpenOptions` (#2515) - fs: add `DirBuilder` (#2524) [`tracing`]: https://crates.io/crates/tracing Signed-off-by: Eliza Weisman <eliza@buoyant.io>
## Motivation When debugging proxy issues, it can be useful to inspect the list of currently spawned Tokio tasks and their states. This can be used similarly to the thread or coroutine dumps provided by other languages' runtimes. ## Solution This branch adds a new endpoint to the proxy's admin server, `/tasks`, that returns a dump of all tasks currently spawned on the Tokio runtime, using the new Tracing instrumentation added in tokio-rs/tokio#2655, and a work-in-progress [`tokio-trace`] crate that provides Tokio-specific Tracing layers. Currently, the `/tasks` admin endpoint records the following information about each task: * Whether it is a normal, local, or blocking task (not relevant to us currently, since Linkerd does not use local or blocking tasks...but we might eventually!) * Whether the task is active (currently being polled) or idle (waiting to be polled) * The type of the future that was spawned * The Tracing span context from which the task was spawned * The total number of times the task has been polled * Timing statistics about the task, including: - The time in nanoseconds between when the task was spawned and when it was first polled (essentially, measuring the Tokio scheduler's latency) - The total time in nanoseconds the task has existed - The task's _busy time_ in nanoseconds (time it was actively being polled) - The tasks _idle time_ in nanoseconds (time it was _not_ being polled) In the future, Tokio will likely expose additional Tracing information, which we'll be able to collect as well. The task dump can be accessed either as an HTML table or as JSON. JSON is returned if the request has an `Accept: application/json` header, or whenever the path `/tasks.json` is requested; otherwise, the data is rendered as an HTML table. Like the `/proxy-log-level` endpoint, access is denied to requests coming from sources other than localhost, to help restrict access to authorized users (since a high volume of requests for task dumps could be used to starve the proxy). Example JSON output (in Firefox Dev Edition's extremely nice GUI JSON viewer): ![Screenshot_20200715_121938](https://user-images.githubusercontent.com/2796466/87598059-b9f68380-c6a7-11ea-8f21-842b57793baa.png) Zoomed in on the timing data for a single task: ![Screenshot_20200715_122047](https://user-images.githubusercontent.com/2796466/87598089-c4b11880-c6a7-11ea-93ac-895f7ecee0f0.png) And HTML: ![Screenshot_20200715_143155](https://user-images.githubusercontent.com/2796466/87598414-fe821f00-c6a7-11ea-93b8-d18e4837346c.png) Because the task data is generated from Tracing spans emitted by Tokio, the task spans must be enabled for it to be used. This can be done by setting a trace filter that enables the `trace` level for the target `tokio::task`, e.g.: ``` tokio::task=trace ``` or ``` tokio=trace ``` ## Notes * This branch depends on unreleased code from upstream, including a Tokio change that has merged to master but not been published, and my unreleased work-in-progress [`tokio-trace`] crate. Therefore, I've pinned these upstreams to fixed Git SHAs, to guard against dependencies changing under us unexpectedly. * I considered requiring a build-time feature flag to enable this feature, the way we do for the mock SO_ORIG_DST implementation for testing. However, this would make it harder to use task tracking to debug issues in proxies not built with the flag. I'm happy to change this code to be feature flagged if we think that's the right approach. [`tokio-trace`]: https://github.com/hawkw/tokio-trace Closes linkerd/linkerd2#3803 Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Motivation
When debugging asynchronous systems, it can be very valuable to inspect
what tasks are currently active (see #2510). The
tracing
crate andrelated libraries provide an interface for Rust libraries and
applications to emit and consume structured, contextual, and async-aware
diagnostic information. Because this diagnostic information is
structured and machine-readable, it is a better fit for the
task-tracking use case than textual logging —
tracing
spans can beconsumed to generate metrics ranging from a simple counter of active
tasks to histograms of poll durations, idle durations, and total task
lifetimes. This information is potentially valuable to both Tokio users
and to maintainers.
Additionally,
tracing
is maintained by the Tokio project and isbecoming widely adopted by other libraries in the "Tokio stack", such as
hyper
,h2
, andtonic
and in other parts of the broader Rustecosystem. Therefore, it is suitable for use in Tokio itself.
Solution
This PR is an MVP for instrumenting Tokio with
tracing
spans. When the"tracing" feature flag and the "tokio_unstable" cfg are both enabled,
every spawned future will be instrumented with a
tracing
span. If boththe cfg flag and the "tracing" feature flag are not set,
tokio
willnot depend on
tracing
, and no spans will be generated.The generated spans are at the
TRACE
verbosity level, and have thetarget "tokio::task", which may be used by consumers to filter whether
they should be recorded. They include fields for the type name of the
spawned future and for what kind of task the span corresponds to (a
standard
spawn
ed task, a local task spawned byspawn_local
, or ablocking
task spawned byspawn_blocking
). Becausetracing
hasseparate concepts of "opening/closing" and "entering/exiting" a span, we
enter these spans every time the spawned task is polled. This allows
collecting data such as:
spawn
todrop
therefore, aggregated metrics like histograms or averages of poll
durations)
it was alive but not being polled
spawn
ed and the first pollAs an example, here is the output of a version of the
chat
exampleinstrumented with
tracing
:And, with multiple connections actually sending messages:
I haven't added any
tracing
spans in the example, only converted theexisting
println!
s totracing::info
andtracing::error
forconsistency. The span durations in the above output are generated by
tracing-subscriber
. Of course, Tokio-specific subscriber couldgenerate even more detailed statistics, but that's follow-up work once
basic tracing support has been added.
Note that the
Instrumented
type fromtracing-futures
, which attachesa
tracing
span to a future, was reimplemented inside of Tokio to avoida dependency on that crate.
tracing-futures
has a feature flag thatenables an optional dependency on Tokio, and I believe that if another
crate in a dependency graph enables that feature while Tokio's
tracing
support is also enabled, it would create a circular dependency that
Cargo wouldn't be able to handle. Also, it avoids a dependency for a
very small amount of code that is unlikely to ever change.
There is, of course, room for plenty of future work here. This might
include:
tokio
, such as I/O resources andchannels (possibly via waker instrumentation)
can be inspected
tracing-subscriber
Layer
s to collect and displayTokio-specific data from these traces
track_caller
(when it's stable) to record where a taskwas
spawn
ed fromHowever, this is intended as an MVP to get us started on that path.
Signed-off-by: Eliza Weisman eliza@buoyant.io