-
Notifications
You must be signed in to change notification settings - Fork 733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: remove thread local caching of the global default subscriber #2593
Conversation
On the `master` (v0.2.x) branch, we track the number of scoped dispatchers currently active, and use it to determine whether or not to check thread-local storage at all. This optimization was introduced in PR #1017. It turns out that this change prevents the incorrect behavior where a `None` dispatcher is cached in the thread local if a scoped dispatcher is used before the global default is set (issue #2587). Therefore, this commit backports the `SCOPED_COUNT` optimization (and *just* the `SCOPED_COUNT` change) from #1017 to the `v0.1.x` branch. Currently, this results in a substantial performance *regression*, especially in the "no dispatcher" case. This is because the `None` dispatcher gets cloned a lot now that it's no longer cached. A subsequent commit will resolve this by backporting more of the changes from #1017, but I wanted to make this in a separate commit to determine if it fixes the tests for #2587. ``` Running benches/dispatch_get_clone.rs (target/release/deps/dispatch_get_clone-d4d6ca1f9895e432) Dispatch::get_clone/none time: [23.525 ns 23.534 ns 23.543 ns] change: [+113.37% +113.59% +113.79%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) low mild 1 (1.00%) high mild Dispatch::get_clone/scoped time: [11.071 ns 11.078 ns 11.084 ns] change: [-0.1603% -0.0360% +0.1300%] (p = 0.67 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low severe 3 (3.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe Dispatch::get_clone/global time: [24.036 ns 24.055 ns 24.075 ns] change: [+118.73% +119.28% +119.84%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe Running benches/dispatch_get_ref.rs (target/release/deps/dispatch_get_ref-6ce05749a0b1bf87) Dispatch::get_ref/none time: [14.326 ns 14.365 ns 14.407 ns] change: [+265.93% +266.48% +267.13%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe Dispatch::get_ref/scoped time: [3.8639 ns 3.8681 ns 3.8725 ns] change: [+0.3381% +0.6460% +0.9565%] (p = 0.00 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low mild 3 (3.00%) high mild 5 (5.00%) high severe Dispatch::get_ref/global time: [14.378 ns 14.472 ns 14.586 ns] change: [+272.95% +274.50% +276.12%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low severe 4 (4.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe Running benches/empty_span.rs (target/release/deps/empty_span-745c777d77b8b7ca) empty_span/none time: [230.31 ps 230.47 ps 230.64 ps] change: [-0.9622% -0.8240% -0.6836%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) low mild 2 (2.00%) high mild 2 (2.00%) high severe empty_span/scoped time: [233.29 ps 233.90 ps 234.48 ps] change: [-0.0987% +0.1388% +0.3820%] (p = 0.28 > 0.05) No change in performance detected. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild empty_span/global time: [234.41 ps 234.60 ps 234.78 ps] change: [-0.4066% -0.2249% -0.0607%] (p = 0.01 < 0.05) Change within noise threshold. Found 19 outliers among 100 measurements (19.00%) 5 (5.00%) low severe 13 (13.00%) low mild 1 (1.00%) high mild empty_span/baseline_struct time: [704.20 ps 704.52 ps 704.86 ps] change: [-0.2031% +0.0627% +0.2709%] (p = 0.64 > 0.05) No change in performance detected. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe Running benches/enter_span.rs (target/release/deps/enter_span-7fc1c2a69c076475) enter_span/none time: [0.0000 ps 0.0000 ps 0.0000 ps] change: [-43.986% +9.0604% +110.03%] (p = 0.82 > 0.05) No change in performance detected. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) high mild 8 (8.00%) high severe enter_span/scoped time: [2.8087 ns 2.8171 ns 2.8263 ns] change: [-6.2503% -5.8437% -5.5244%] (p = 0.00 < 0.05) Performance has improved. Found 17 outliers among 100 measurements (17.00%) 10 (10.00%) low severe 4 (4.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe enter_span/global time: [2.7827 ns 2.7881 ns 2.7942 ns] change: [-7.0005% -6.6984% -6.3814%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild Running benches/event.rs (target/release/deps/event-6742eef6ebe07aa4) event/none time: [233.17 ps 233.70 ps 234.26 ps] change: [+0.8021% +1.2066% +1.7762%] (p = 0.00 < 0.05) Change within noise threshold. Found 21 outliers among 100 measurements (21.00%) 9 (9.00%) low severe 8 (8.00%) high mild 4 (4.00%) high severe event/scoped time: [9.5149 ns 9.5274 ns 9.5408 ns] change: [+4.4596% +4.6910% +4.8995%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild event/scoped_recording time: [33.624 ns 34.086 ns 34.597 ns] change: [-7.7753% +1.4363% +11.588%] (p = 0.80 > 0.05) No change in performance detected. Found 28 outliers among 100 measurements (28.00%) 12 (12.00%) low mild 4 (4.00%) high mild 12 (12.00%) high severe event/global time: [22.544 ns 22.587 ns 22.631 ns] change: [+143.27% +143.94% +144.64%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 6 (6.00%) low severe 1 (1.00%) low mild Running benches/shared.rs (target/release/deps/shared-9c2531bf46089869) running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s Running benches/span_fields.rs (target/release/deps/span_fields-96dfd0a8a577dec6) span_fields/none time: [3.2739 ns 3.2814 ns 3.2900 ns] change: [+0.6402% +0.8871% +1.1473%] (p = 0.00 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe span_fields/scoped time: [29.177 ns 29.237 ns 29.283 ns] change: [-4.7079% -4.4906% -4.2912%] (p = 0.00 < 0.05) Performance has improved. span_fields/scoped_recording time: [269.00 ns 269.32 ns 269.64 ns] change: [+4.7209% +4.8928% +5.0580%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high mild span_fields/global time: [39.700 ns 39.860 ns 40.081 ns] change: [+31.179% +31.920% +33.131%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe Running benches/span_no_fields.rs (target/release/deps/span_no_fields-f8c7d7a84f720442) span_no_fields/none time: [1.3904 ns 1.3914 ns 1.3927 ns] change: [-2.2161% -2.0729% -1.9314%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low mild 5 (5.00%) high mild 5 (5.00%) high severe span_no_fields/scoped time: [18.811 ns 18.818 ns 18.826 ns] change: [+1.9002% +2.1274% +2.3444%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) high mild 6 (6.00%) high severe span_no_fields/scoped_recording time: [32.819 ns 32.844 ns 32.872 ns] change: [-0.2256% -0.1680% -0.1081%] (p = 0.00 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe span_no_fields/global time: [36.148 ns 36.190 ns 36.229 ns] change: [+91.653% +91.930% +92.168%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild Running benches/span_repeated.rs (target/release/deps/span_repeated-03bfaaf4ecd13d36) span_repeated/none time: [730.43 ns 731.10 ns 731.79 ns] change: [+0.8904% +1.1179% +1.3512%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe span_repeated/scoped time: [2.4945 µs 2.4980 µs 2.5017 µs] change: [+3.4807% +3.6733% +3.8473%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 2 (2.00%) high mild span_repeated/scoped_recording time: [4.9626 µs 4.9712 µs 4.9786 µs] change: [-1.2378% -0.8863% -0.5529%] (p = 0.00 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild span_repeated/global time: [3.8247 µs 3.8270 µs 3.8291 µs] change: [+55.811% +55.944% +56.091%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe ```
This backports more of the change from #1017 to `v0.1.x`. This change makes cloning a `&'static` global dispatcher and a `None` dispatcher no longer require an `Arc` clone. `None` dispatchers no longer allocate. This undoes much of the performance regression from the previous commit: ``` Running benches/baseline.rs (target/release/deps/baseline-9b70733ce49582d2)comparison/relaxed_load time: [467.51 ps 468.35 ps 469.23 ps] change: [-0.8492% -0.5683% -0.3377%] (p = 0.00 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild comparison/acquire_load time: [473.86 ps 476.62 ps 480.23 ps] change: [+0.0428% +0.6687% +1.2841%] (p = 0.04 < 0.05) Change within noise threshold. Found 14 outliers among 100 measurements (14.00%) 8 (8.00%) high mild 6 (6.00%) high severe comparison/log time: [235.40 ps 236.35 ps 237.77 ps] change: [-0.3095% +0.3407% +1.2059%] (p = 0.50 > 0.05) No change in performance detected. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) low severe 6 (6.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe Running benches/dispatch_get_clone.rs (target/release/deps/dispatch_get_clone-d4d6ca1f9895e432)Dispatch::get_clone/none time: [8.3471 ns 8.3649 ns 8.3863 ns] change: [-64.375% -64.301% -64.226%] (p = 0.00 < 0.05) Performance has improved. Dispatch::get_clone/scoped time: [17.054 ns 17.081 ns 17.108 ns] change: [+53.653% +53.819% +53.991%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe Dispatch::get_clone/global time: [8.4810 ns 8.4853 ns 8.4901 ns] change: [-64.773% -64.741% -64.710%] (p = 0.00 < 0.05) Performance has improved. Found 16 outliers among 100 measurements (16.00%) 15 (15.00%) high mild 1 (1.00%) high severe Running benches/dispatch_get_ref.rs (target/release/deps/dispatch_get_ref-6ce05749a0b1bf87)Dispatch::get_ref/none time: [1.8803 ns 1.8826 ns 1.8853 ns] change: [-86.888% -86.856% -86.812%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) high mild 4 (4.00%) high severe Dispatch::get_ref/scoped time: [3.8294 ns 3.8422 ns 3.8529 ns] change: [-1.5952% -1.3861% -1.1447%] (p = 0.00 < 0.05) Performance has improved. Dispatch::get_ref/global time: [1.8859 ns 1.8901 ns 1.8936 ns] change: [-87.025% -86.963% -86.907%] (p = 0.00 < 0.05) Performance has improved. Running benches/empty_span.rs (target/release/deps/empty_span-745c777d77b8b7ca)empty_span/none time: [234.18 ps 234.44 ps 234.68 ps] change: [+1.4833% +1.6364% +1.7788%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe empty_span/scoped time: [235.19 ps 236.00 ps 236.71 ps] change: [+0.5261% +0.8586% +1.1517%] (p = 0.00 < 0.05) Change within noise threshold. empty_span/global time: [235.04 ps 235.52 ps 236.15 ps] change: [-0.0215% +0.7651% +2.1704%] (p = 0.17 > 0.05) No change in performance detected. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe empty_span/baseline_struct time: [704.83 ps 705.28 ps 705.80 ps] change: [+0.0246% +0.6522% +1.5423%] (p = 0.10 > 0.05) No change in performance detected. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe Running benches/enter_span.rs (target/release/deps/enter_span-7fc1c2a69c076475)enter_span/none time: [0.0000 ps 0.0000 ps 0.0000 ps] change: [-49.467% +1.5232% +101.49%] (p = 0.98 > 0.05) No change in performance detected. Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) high mild 8 (8.00%) high severe enter_span/scoped time: [3.7519 ns 3.7648 ns 3.7776 ns] change: [+32.984% +33.297% +33.588%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 6 (6.00%) low mild 1 (1.00%) high mild enter_span/global time: [3.7399 ns 3.7532 ns 3.7687 ns] change: [+33.878% +34.287% +34.719%] (p = 0.00 < 0.05) Performance has regressed. Found 19 outliers among 100 measurements (19.00%) 14 (14.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe Running benches/event.rs (target/release/deps/event-6742eef6ebe07aa4)event/none time: [234.52 ps 234.95 ps 235.48 ps] change: [-0.2638% +0.2782% +0.7705%] (p = 0.34 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe event/scoped time: [8.8083 ns 8.8738 ns 8.9531 ns] change: [-7.2455% -6.8819% -6.4900%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe event/scoped_recording time: [32.472 ns 33.287 ns 34.363 ns] change: [-9.3516% -0.4827% +9.1329%] (p = 0.89 > 0.05) No change in performance detected. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe event/global time: [6.7528 ns 6.7610 ns 6.7701 ns] change: [-70.199% -70.111% -70.013%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low mild 8 (8.00%) high mild 2 (2.00%) high severe Running benches/shared.rs (target/release/deps/shared-9c2531bf46089869) running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s Running benches/span_fields.rs (target/release/deps/span_fields-96dfd0a8a577dec6)span_fields/none time: [3.6797 ns 3.6839 ns 3.6888 ns] change: [+12.223% +12.489% +12.735%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low severe 3 (3.00%) high mild span_fields/scoped time: [32.995 ns 33.166 ns 33.349 ns] change: [+13.805% +14.178% +14.526%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe span_fields/scoped_recording time: [280.94 ns 283.54 ns 286.81 ns] change: [+4.7144% +5.4942% +6.4857%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe span_fields/global time: [29.370 ns 29.434 ns 29.505 ns] change: [-27.391% -26.777% -26.346%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild Running benches/span_no_fields.rs (target/release/deps/span_no_fields-f8c7d7a84f720442)span_no_fields/none time: [1.6604 ns 1.6663 ns 1.6725 ns] change: [+18.306% +18.708% +19.155%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low mild 3 (3.00%) high mild 1 (1.00%) high severe span_no_fields/scoped time: [19.079 ns 19.107 ns 19.132 ns] change: [+0.5306% +0.7813% +1.0349%] (p = 0.00 < 0.05) Change within noise threshold. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild span_no_fields/scoped_recording time: [32.628 ns 32.667 ns 32.707 ns] change: [-0.3009% -0.1227% +0.0411%] (p = 0.18 > 0.05) No change in performance detected. span_no_fields/global time: [13.029 ns 13.042 ns 13.055 ns] change: [-64.133% -64.076% -64.014%] (p = 0.00 < 0.05) Performance has improved. Running benches/span_repeated.rs (target/release/deps/span_repeated-03bfaaf4ecd13d36)span_repeated/none time: [253.05 ns 256.87 ns 261.12 ns] change: [-65.419% -65.179% -64.926%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 7 (7.00%) high severe span_repeated/scoped time: [2.2869 µs 2.2926 µs 2.2986 µs] change: [-8.5393% -8.3402% -8.1049%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe span_repeated/scoped_recording time: [4.7143 µs 4.7286 µs 4.7451 µs] change: [-5.0531% -4.7290% -4.3831%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) high mild 3 (3.00%) high severe span_repeated/global time: [2.1195 µs 2.1224 µs 2.1253 µs] change: [-44.708% -44.624% -44.550%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild ```
now that getting the global default is substantially faster, who cares! removing the thread-local caching of the global default totally fixes this whole category of weirdness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me, no red flags. But I'll admit that this isn't an area I'm deeply familar with, so you can decide whether my approval is sufficient.
@hds Thanks for taking a look anyway! I'm pretty aware that I'm the only person who's really familiar with this code, but it's good to get more eyes regardless. |
# 0.1.31 (May 11, 2023) This release of `tracing-core` fixes a bug that caused threads which call `dispatcher::get_default` _before_ a global default subscriber is set to never see the global default once it is set. In addition, it includes improvements for instrumentation performance in some cases, especially when using a global default dispatcher. ### Fixed - Fixed incorrect thread-local caching of `Dispatch::none` if `dispatcher::get_default` is called before `dispatcher::set_global_default` (#2593) ### Changed - Cloning a `Dispatch` that points at a global default subscriber no longer requires an `Arc` reference count increment, improving performance substantially (#2593) - `dispatcher::get_default` no longer attempts to access a thread local if the scoped dispatcher is not in use, improving performance when the default dispatcher is global (#2593) - Added `#[inline]` annotations called by the `event!` and `span!` macros to reduce the size of macro-generated code and improve recording performance (#2555) Thanks to new contributor @ldm0 for contributing to this release!
# 0.1.31 (May 11, 2023) This release of `tracing-core` fixes a bug that caused threads which call `dispatcher::get_default` _before_ a global default subscriber is set to never see the global default once it is set. In addition, it includes improvements for instrumentation performance in some cases, especially when using a global default dispatcher. ### Fixed - Fixed incorrect thread-local caching of `Dispatch::none` if `dispatcher::get_default` is called before `dispatcher::set_global_default` (#2593) ### Changed - Cloning a `Dispatch` that points at a global default subscriber no longer requires an `Arc` reference count increment, improving performance substantially (#2593) - `dispatcher::get_default` no longer attempts to access a thread local if the scoped dispatcher is not in use, improving performance when the default dispatcher is global (#2593) - Added `#[inline]` annotations called by the `event!` and `span!` macros to reduce the size of macro-generated code and improve recording performance (#2555) Thanks to new contributor @ldm0 for contributing to this release!
…okio-rs#2593) ## Motivation Currently, when a call to `dispatcher::get_default` occurs, `tracing` will check the thread-local default dispatcher first. If a thread-local scoped default is set, it is returned. Otherwise, the thread will then check the global default. If a global default is present, it is then cached in the thread local, so that subsequent calls do not need to check the global default. Unfortunately, this behavior results in issues if the scoped default is accessed (e.g. using `get_default` or creating a new span) *prior* to a global default being set. When `get_default` runs for the first time and there is no global default, a `none` dispatcher is cached as the thread-local default. This means that the thread will now behave as though its default dispatcher is `None` until the scoped default is overridden, even if a global default is then set later. This is quite bad, and results in issues such as tokio-rs#2587, tokio-rs#2436, and tokio-rs#2411. ## Solution This branch makes several changes to remove the use of the thread-local caching of the global default dispatcher, and to lessen the performance impact of doing so. On the `master` (v0.2.x) branch, we track the number of scoped dispatchers currently active, and use it to determine whether or not to check thread-local storage at all. This optimization was introduced in PR tokio-rs#1017. This branch backports a similar change to `v0.1.x`. In addition, tokio-rs#1017 also changes the dispatcher module to represent a `Dispatch` internally using an enum of either an `Arc` in the case where the dispatcher is scoped, or a `&'static dyn Subscriber + Send + Sync` reference when the dispatcher is the global default. This makes cloning and constructing the global default cheaper, and also allows us to change the `None` dispatcher into a static singleton. That means that the use of a `None` dispatcher no longer requires an allocation and arc reference bump, an issue which was previously resolved by locally caching a `None` dispatcher. A side benefit of this change is that *cloning* a `Dispatch` is substantially cheaper when the dispatcher is a global default, as it's just an `&'static` reference and no `Arc` bump is necessary. This will also make cloning a `Span` cheaper when the global default dispatcher is in use. Finally, because the overhead of getting the global default is substantially reduced, we are able to change the scoped default dispatcher's behavior to remove the caching entirely. This means that the category of bugs involving the local cache becoming stale is resolved entirely. Fixes tokio-rs#2587 Fixes tokio-rs#2436 Fixes tokio-rs#2411 Closes tokio-rs#2592 ## Performance Impact This change results in a change in performance characteristics. Running the benchmarks, we observe a significant improvement in performance in most of the benchmarks that use the global default dispatcher, and a noticeable decrease in performance for some benchmarks using the scoped default. In my opinion, this performance change is acceptable, as a global default dispatcher is the common case for most users, and is generally expected to perform better than the scoped default. In addition, resolving the variety of bugs that are caused by the local caching of the default when using the scoped default dispatcher is worth a performance cost when the scoped default is in use. <details> <summary>Benchmark results:</summary> ``` Running benches/baseline.rs (target/release/deps/baseline-9b70733ce49582d2) comparison/relaxed_load time: [456.48 ps 456.55 ps 456.63 ps] change: [+3.0281% +3.3135% +3.5664%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe comparison/acquire_load time: [438.98 ps 439.32 ps 439.76 ps] change: [-0.3725% -0.2092% -0.0614%] (p = 0.01 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) high mild 10 (10.00%) high severe comparison/log time: [227.05 ps 227.14 ps 227.27 ps] change: [+3.1351% +3.2984% +3.4537%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe ``` ``` Running benches/dispatch_get_clone.rs (target/release/deps/dispatch_get_clone-d4d6ca1f9895e432) Dispatch::get_clone/none time: [8.3974 ns 8.4004 ns 8.4039 ns] change: [-22.870% -22.796% -22.728%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe Dispatch::get_clone/scoped time: [15.877 ns 15.959 ns 16.045 ns] change: [+52.358% +52.943% +53.500%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) high mild 14 (14.00%) high severe Dispatch::get_clone/global time: [8.3962 ns 8.4000 ns 8.4054 ns] change: [-19.126% -18.961% -18.817%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 2 (2.00%) low severe 6 (6.00%) high mild 7 (7.00%) high severe ``` ``` Running benches/dispatch_get_ref.rs (target/release/deps/dispatch_get_ref-6ce05749a0b1bf87) Dispatch::get_ref/none time: [1.7551 ns 1.7564 ns 1.7579 ns] change: [-51.858% -51.749% -51.644%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe Dispatch::get_ref/scoped time: [3.6341 ns 3.6365 ns 3.6397 ns] change: [-2.6892% -2.5955% -2.4968%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe Dispatch::get_ref/global time: [1.7668 ns 1.7686 ns 1.7713 ns] change: [-52.697% -52.647% -52.603%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe ``` ``` Running benches/empty_span.rs (target/release/deps/empty_span-745c777d77b8b7ca) empty_span/none time: [227.02 ps 227.10 ps 227.20 ps] change: [-0.1729% -0.0705% +0.0495%] (p = 0.24 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe empty_span/scoped time: [218.51 ps 218.69 ps 218.90 ps] change: [-0.7582% -0.6056% -0.4630%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe empty_span/global time: [217.85 ps 218.15 ps 218.56 ps] change: [-2.6528% -2.4341% -2.1602%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe empty_span/baseline_struct time: [655.54 ps 656.09 ps 656.76 ps] change: [-1.6595% -1.4125% -1.1776%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe ``` ``` Running benches/enter_span.rs (target/release/deps/enter_span-7fc1c2a69c076475) enter_span/none time: [0.0000 ps 0.0000 ps 0.0000 ps] change: [-43.600% +6.5764% +109.38%] (p = 0.86 > 0.05) No change in performance detected. Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) high mild 8 (8.00%) high severe enter_span/scoped time: [2.6513 ns 2.6567 ns 2.6641 ns] change: [+0.3121% +1.9504% +3.4648%] (p = 0.01 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe enter_span/global time: [3.2108 ns 3.2160 ns 3.2220 ns] change: [+25.963% +26.742% +27.434%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild ``` ``` Running benches/event.rs (target/release/deps/event-6742eef6ebe07aa4) event/none time: [227.04 ps 227.18 ps 227.41 ps] change: [-1.6751% -1.5743% -1.4711%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) high mild 7 (7.00%) high severe event/scoped time: [8.3849 ns 8.4335 ns 8.4888 ns] change: [-3.4754% -3.0252% -2.6092%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe event/scoped_recording time: [36.916 ns 37.022 ns 37.194 ns] change: [+8.1054% +18.714% +30.381%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 2 (2.00%) high mild 7 (7.00%) high severe event/global time: [6.9694 ns 7.1677 ns 7.3469 ns] change: [-23.407% -21.940% -20.398%] (p = 0.00 < 0.05) Performance has improved. ``` ``` Running benches/span_fields.rs (target/release/deps/span_fields-96dfd0a8a577dec6) span_fields/none time: [3.5936 ns 3.6008 ns 3.6106 ns] change: [+17.160% +17.776% +18.413%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 8 (8.00%) high mild 6 (6.00%) high severe span_fields/scoped time: [33.751 ns 33.765 ns 33.779 ns] change: [+22.689% +22.873% +23.037%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe span_fields/scoped_recording time: [270.22 ns 270.55 ns 270.91 ns] change: [+10.615% +10.827% +11.028%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 6 (6.00%) high mild span_fields/global time: [28.337 ns 28.428 ns 28.527 ns] change: [+3.0582% +3.3355% +3.6278%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 13 (13.00%) high mild ``` ``` Running benches/span_no_fields.rs (target/release/deps/span_no_fields-f8c7d7a84f720442) span_no_fields/none time: [1.5467 ns 1.5507 ns 1.5553 ns] change: [+12.966% +13.206% +13.434%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 7 (7.00%) high mild 1 (1.00%) high severe span_no_fields/scoped time: [17.796 ns 17.810 ns 17.826 ns] change: [+1.0381% +1.1673% +1.2914%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) high mild 6 (6.00%) high severe span_no_fields/scoped_recording time: [30.397 ns 30.459 ns 30.524 ns] change: [-0.8489% -0.6268% -0.3915%] (p = 0.00 < 0.05) Change within noise threshold. span_no_fields/global time: [12.747 ns 12.791 ns 12.844 ns] change: [-27.930% -27.672% -27.386%] (p = 0.00 < 0.05) Performance has improved. ``` ``` Running benches/span_repeated.rs (target/release/deps/span_repeated-03bfaaf4ecd13d36) span_repeated/none time: [699.28 ns 699.84 ns 700.53 ns] change: [+2.4125% +2.6359% +2.8862%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe span_repeated/scoped time: [2.5029 µs 2.5057 µs 2.5090 µs] change: [+4.5095% +4.6605% +4.8122%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 8 (8.00%) low mild 6 (6.00%) high mild 2 (2.00%) high severe span_repeated/scoped_recording time: [5.0509 µs 5.0535 µs 5.0566 µs] change: [+0.7346% +1.0724% +1.3718%] (p = 0.00 < 0.05) Change within noise threshold. Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) high mild 7 (7.00%) high severe span_repeated/global time: [2.1264 µs 2.1272 µs 2.1282 µs] change: [-11.213% -11.119% -11.031%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe ``` </details>
# 0.1.31 (May 11, 2023) This release of `tracing-core` fixes a bug that caused threads which call `dispatcher::get_default` _before_ a global default subscriber is set to never see the global default once it is set. In addition, it includes improvements for instrumentation performance in some cases, especially when using a global default dispatcher. ### Fixed - Fixed incorrect thread-local caching of `Dispatch::none` if `dispatcher::get_default` is called before `dispatcher::set_global_default` (tokio-rs#2593) ### Changed - Cloning a `Dispatch` that points at a global default subscriber no longer requires an `Arc` reference count increment, improving performance substantially (tokio-rs#2593) - `dispatcher::get_default` no longer attempts to access a thread local if the scoped dispatcher is not in use, improving performance when the default dispatcher is global (tokio-rs#2593) - Added `#[inline]` annotations called by the `event!` and `span!` macros to reduce the size of macro-generated code and improve recording performance (tokio-rs#2555) Thanks to new contributor @ldm0 for contributing to this release!
Motivation
Currently, when a call to
dispatcher::get_default
occurs,tracing
will check the thread-local default dispatcher first. If a thread-local
scoped default is set, it is returned. Otherwise, the thread will then
check the global default. If a global default is present, it is then
cached in the thread local, so that subsequent calls do not need to
check the global default.
Unfortunately, this behavior results in issues if the scoped default is
accessed (e.g. using
get_default
or creating a new span) prior to aglobal default being set. When
get_default
runs for the first time andthere is no global default, a
none
dispatcher is cached as thethread-local default. This means that the thread will now behave as
though its default dispatcher is
None
until the scoped default isoverridden, even if a global default is then set later. This is quite
bad, and results in issues such as #2587, #2436, and #2411.
Solution
This branch makes several changes to remove the use of the thread-local
caching of the global default dispatcher, and to lessen the performance
impact of doing so.
On the
master
(v0.2.x) branch, we track the number of scopeddispatchers currently active, and use it to determine whether or not to
check thread-local storage at all. This optimization was introduced in
PR #1017. This branch backports a similar change to
v0.1.x
.In addition, #1017 also changes the dispatcher module to represent a
Dispatch
internally using an enum of either anArc
in the case wherethe dispatcher is scoped, or a
&'static dyn Subscriber + Send + Sync
reference when the dispatcher is the global default. This makes cloning
and constructing the global default cheaper, and also allows us to
change the
None
dispatcher into a static singleton. That means thatthe use of a
None
dispatcher no longer requires an allocation and arcreference bump, an issue which was previously resolved by locally
caching a
None
dispatcher. A side benefit of this change is thatcloning a
Dispatch
is substantially cheaper when the dispatcher is aglobal default, as it's just an
&'static
reference and noArc
bumpis necessary. This will also make cloning a
Span
cheaper when theglobal default dispatcher is in use.
Finally, because the overhead of getting the global default is
substantially reduced, we are able to change the scoped default
dispatcher's behavior to remove the caching entirely. This means that
the category of bugs involving the local cache becoming stale is
resolved entirely.
Fixes #2587
Fixes #2436
Fixes #2411
Closes #2592
Performance Impact
This change results in a change in performance characteristics. Running
the benchmarks, we observe a significant improvement in performance in
most of the benchmarks that use the global default dispatcher, and a
noticeable decrease in performance for some benchmarks using the scoped
default. In my opinion, this performance change is acceptable, as a
global default dispatcher is the common case for most users, and is
generally expected to perform better than the scoped default. In
addition, resolving the variety of bugs that are caused by the local
caching of the default when using the scoped default dispatcher is worth
a performance cost when the scoped default is in use.
Benchmark results: