Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
core: remove thread local caching of the global default subscriber (#…
…2593) ## Motivation Currently, when a call to `dispatcher::get_default` occurs, `tracing` will check the thread-local default dispatcher first. If a thread-local scoped default is set, it is returned. Otherwise, the thread will then check the global default. If a global default is present, it is then cached in the thread local, so that subsequent calls do not need to check the global default. Unfortunately, this behavior results in issues if the scoped default is accessed (e.g. using `get_default` or creating a new span) *prior* to a global default being set. When `get_default` runs for the first time and there is no global default, a `none` dispatcher is cached as the thread-local default. This means that the thread will now behave as though its default dispatcher is `None` until the scoped default is overridden, even if a global default is then set later. This is quite bad, and results in issues such as #2587, #2436, and #2411. ## Solution This branch makes several changes to remove the use of the thread-local caching of the global default dispatcher, and to lessen the performance impact of doing so. On the `master` (v0.2.x) branch, we track the number of scoped dispatchers currently active, and use it to determine whether or not to check thread-local storage at all. This optimization was introduced in PR #1017. This branch backports a similar change to `v0.1.x`. In addition, #1017 also changes the dispatcher module to represent a `Dispatch` internally using an enum of either an `Arc` in the case where the dispatcher is scoped, or a `&'static dyn Subscriber + Send + Sync` reference when the dispatcher is the global default. This makes cloning and constructing the global default cheaper, and also allows us to change the `None` dispatcher into a static singleton. That means that the use of a `None` dispatcher no longer requires an allocation and arc reference bump, an issue which was previously resolved by locally caching a `None` dispatcher. A side benefit of this change is that *cloning* a `Dispatch` is substantially cheaper when the dispatcher is a global default, as it's just an `&'static` reference and no `Arc` bump is necessary. This will also make cloning a `Span` cheaper when the global default dispatcher is in use. Finally, because the overhead of getting the global default is substantially reduced, we are able to change the scoped default dispatcher's behavior to remove the caching entirely. This means that the category of bugs involving the local cache becoming stale is resolved entirely. Fixes #2587 Fixes #2436 Fixes #2411 Closes #2592 ## Performance Impact This change results in a change in performance characteristics. Running the benchmarks, we observe a significant improvement in performance in most of the benchmarks that use the global default dispatcher, and a noticeable decrease in performance for some benchmarks using the scoped default. In my opinion, this performance change is acceptable, as a global default dispatcher is the common case for most users, and is generally expected to perform better than the scoped default. In addition, resolving the variety of bugs that are caused by the local caching of the default when using the scoped default dispatcher is worth a performance cost when the scoped default is in use. <details> <summary>Benchmark results:</summary> ``` Running benches/baseline.rs (target/release/deps/baseline-9b70733ce49582d2) comparison/relaxed_load time: [456.48 ps 456.55 ps 456.63 ps] change: [+3.0281% +3.3135% +3.5664%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe comparison/acquire_load time: [438.98 ps 439.32 ps 439.76 ps] change: [-0.3725% -0.2092% -0.0614%] (p = 0.01 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) high mild 10 (10.00%) high severe comparison/log time: [227.05 ps 227.14 ps 227.27 ps] change: [+3.1351% +3.2984% +3.4537%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe ``` ``` Running benches/dispatch_get_clone.rs (target/release/deps/dispatch_get_clone-d4d6ca1f9895e432) Dispatch::get_clone/none time: [8.3974 ns 8.4004 ns 8.4039 ns] change: [-22.870% -22.796% -22.728%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe Dispatch::get_clone/scoped time: [15.877 ns 15.959 ns 16.045 ns] change: [+52.358% +52.943% +53.500%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) high mild 14 (14.00%) high severe Dispatch::get_clone/global time: [8.3962 ns 8.4000 ns 8.4054 ns] change: [-19.126% -18.961% -18.817%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 2 (2.00%) low severe 6 (6.00%) high mild 7 (7.00%) high severe ``` ``` Running benches/dispatch_get_ref.rs (target/release/deps/dispatch_get_ref-6ce05749a0b1bf87) Dispatch::get_ref/none time: [1.7551 ns 1.7564 ns 1.7579 ns] change: [-51.858% -51.749% -51.644%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe Dispatch::get_ref/scoped time: [3.6341 ns 3.6365 ns 3.6397 ns] change: [-2.6892% -2.5955% -2.4968%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe Dispatch::get_ref/global time: [1.7668 ns 1.7686 ns 1.7713 ns] change: [-52.697% -52.647% -52.603%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe ``` ``` Running benches/empty_span.rs (target/release/deps/empty_span-745c777d77b8b7ca) empty_span/none time: [227.02 ps 227.10 ps 227.20 ps] change: [-0.1729% -0.0705% +0.0495%] (p = 0.24 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe empty_span/scoped time: [218.51 ps 218.69 ps 218.90 ps] change: [-0.7582% -0.6056% -0.4630%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe empty_span/global time: [217.85 ps 218.15 ps 218.56 ps] change: [-2.6528% -2.4341% -2.1602%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe empty_span/baseline_struct time: [655.54 ps 656.09 ps 656.76 ps] change: [-1.6595% -1.4125% -1.1776%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe ``` ``` Running benches/enter_span.rs (target/release/deps/enter_span-7fc1c2a69c076475) enter_span/none time: [0.0000 ps 0.0000 ps 0.0000 ps] change: [-43.600% +6.5764% +109.38%] (p = 0.86 > 0.05) No change in performance detected. Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) high mild 8 (8.00%) high severe enter_span/scoped time: [2.6513 ns 2.6567 ns 2.6641 ns] change: [+0.3121% +1.9504% +3.4648%] (p = 0.01 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe enter_span/global time: [3.2108 ns 3.2160 ns 3.2220 ns] change: [+25.963% +26.742% +27.434%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild ``` ``` Running benches/event.rs (target/release/deps/event-6742eef6ebe07aa4) event/none time: [227.04 ps 227.18 ps 227.41 ps] change: [-1.6751% -1.5743% -1.4711%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) high mild 7 (7.00%) high severe event/scoped time: [8.3849 ns 8.4335 ns 8.4888 ns] change: [-3.4754% -3.0252% -2.6092%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe event/scoped_recording time: [36.916 ns 37.022 ns 37.194 ns] change: [+8.1054% +18.714% +30.381%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 2 (2.00%) high mild 7 (7.00%) high severe event/global time: [6.9694 ns 7.1677 ns 7.3469 ns] change: [-23.407% -21.940% -20.398%] (p = 0.00 < 0.05) Performance has improved. ``` ``` Running benches/span_fields.rs (target/release/deps/span_fields-96dfd0a8a577dec6) span_fields/none time: [3.5936 ns 3.6008 ns 3.6106 ns] change: [+17.160% +17.776% +18.413%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 8 (8.00%) high mild 6 (6.00%) high severe span_fields/scoped time: [33.751 ns 33.765 ns 33.779 ns] change: [+22.689% +22.873% +23.037%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe span_fields/scoped_recording time: [270.22 ns 270.55 ns 270.91 ns] change: [+10.615% +10.827% +11.028%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 6 (6.00%) high mild span_fields/global time: [28.337 ns 28.428 ns 28.527 ns] change: [+3.0582% +3.3355% +3.6278%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 13 (13.00%) high mild ``` ``` Running benches/span_no_fields.rs (target/release/deps/span_no_fields-f8c7d7a84f720442) span_no_fields/none time: [1.5467 ns 1.5507 ns 1.5553 ns] change: [+12.966% +13.206% +13.434%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 7 (7.00%) high mild 1 (1.00%) high severe span_no_fields/scoped time: [17.796 ns 17.810 ns 17.826 ns] change: [+1.0381% +1.1673% +1.2914%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) high mild 6 (6.00%) high severe span_no_fields/scoped_recording time: [30.397 ns 30.459 ns 30.524 ns] change: [-0.8489% -0.6268% -0.3915%] (p = 0.00 < 0.05) Change within noise threshold. span_no_fields/global time: [12.747 ns 12.791 ns 12.844 ns] change: [-27.930% -27.672% -27.386%] (p = 0.00 < 0.05) Performance has improved. ``` ``` Running benches/span_repeated.rs (target/release/deps/span_repeated-03bfaaf4ecd13d36) span_repeated/none time: [699.28 ns 699.84 ns 700.53 ns] change: [+2.4125% +2.6359% +2.8862%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe span_repeated/scoped time: [2.5029 µs 2.5057 µs 2.5090 µs] change: [+4.5095% +4.6605% +4.8122%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 8 (8.00%) low mild 6 (6.00%) high mild 2 (2.00%) high severe span_repeated/scoped_recording time: [5.0509 µs 5.0535 µs 5.0566 µs] change: [+0.7346% +1.0724% +1.3718%] (p = 0.00 < 0.05) Change within noise threshold. Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) high mild 7 (7.00%) high severe span_repeated/global time: [2.1264 µs 2.1272 µs 2.1282 µs] change: [-11.213% -11.119% -11.031%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe ``` </details>
- Loading branch information