Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support racy initialization of an Executor's state #108

Merged
merged 6 commits into from
Apr 9, 2024

Conversation

james7132
Copy link
Contributor

@james7132 james7132 commented Apr 1, 2024

Fixes #89. Uses @notgull's suggestion of using a AtomicPtr with a racy initialization instead of a OnceCell.

For the addition of more unsafe, I added the clippy::undocumented_unsafe_blocks lint at a warn, and fixed a few of the remaining open clippy issues (i.e. Waker::clone_from already handling the case where they're equal).

Removing async_lock as a dependency shouldn't be a SemVer breaking change.

@james7132
Copy link
Contributor Author

@kettle11 can you test this out in a wasm environment with atomics enabled? This should fix the issue you were running into.

Copy link
Member

@notgull notgull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM

src/lib.rs Outdated Show resolved Hide resolved
@james7132 james7132 requested a review from notgull April 8, 2024 21:00
Copy link
Member

@notgull notgull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@notgull
Copy link
Member

notgull commented Apr 8, 2024

Rebase this on master and then I will merge

@james7132
Copy link
Contributor Author

I'm not sure if we can relax the atomic orderings on the access or not. There's no strict ordering requirements, per se. The worst that happens if it's improperly ordered is it allocates a few extra states that are immediately deallocated.

@notgull
Copy link
Member

notgull commented Apr 8, 2024

Yeah I think it's fine to only use Acquire ordering to load the pointer. Since there are no side effects to creating a new state allocation and it's cold code anyways, there shouldn't be any issues here.

@james7132
Copy link
Contributor Author

Could we just use Relaxed then? All that is needed is that the operation of assigning the pointer is atomic.

@notgull
Copy link
Member

notgull commented Apr 8, 2024

I know there is some weirdness around Relaxed in the C memory model, especially around ordering.

Let's merge this with Acquire ordering for now. Then we can open a new PR for Relaxed ordering, where we can benchmark if it's actually worth it.

@james7132
Copy link
Contributor Author

james7132 commented Apr 8, 2024

Quick check against the benchmarks seems to show a (mostly) positive improvement over the use of async_lock::OnceCell:

executor::create        time:   [1.1075 µs 1.1082 µs 1.1089 µs]
                        change: [-7.9526% -7.8321% -7.7141%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

single_thread/executor::spawn_one
                        time:   [1.6380 µs 1.6489 µs 1.6654 µs]
                        change: [-1.2081% +0.7322% +3.0809%] (p = 0.51 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe
single_thread/executor::spawn_batch
                        time:   [29.932 µs 31.962 µs 35.088 µs]
                        change: [+5.5198% +15.526% +27.277%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
single_thread/executor::spawn_many_local
                        time:   [5.0103 ms 5.0233 ms 5.0366 ms]
                        change: [-7.9087% -6.4987% -5.2481%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
single_thread/executor::spawn_recursively
                        time:   [30.415 ms 30.888 ms 31.403 ms]
                        change: [-38.044% -36.803% -35.536%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
single_thread/executor::yield_now
                        time:   [5.2423 ms 5.2992 ms 5.3750 ms]
                        change: [-1.9443% -0.7857% +0.6821%] (p = 0.27 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

multi_thread/executor::spawn_one
                        time:   [1.4755 µs 1.5024 µs 1.5404 µs]
                        change: [-11.989% -6.9781% -2.6378%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe
multi_thread/executor::spawn_batch
                        time:   [47.541 µs 54.625 µs 66.057 µs]
                        change: [-16.793% -9.5843% -0.0431%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
multi_thread/executor::spawn_many_local
                        time:   [24.540 ms 24.663 ms 24.792 ms]
                        change: [-9.3981% -6.1580% -3.5130%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking multi_thread/executor::spawn_recursively: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 17.3s, or reduce sample count to 20.
multi_thread/executor::spawn_recursively
                        time:   [171.80 ms 172.19 ms 172.63 ms]
                        change: [-29.251% -25.846% -22.158%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
multi_thread/executor::yield_now
                        time:   [23.836 ms 24.130 ms 24.370 ms]
                        change: [+25.755% +29.244% +32.596%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) low severe
  3 (3.00%) low mild

It may be even faster if we could use a Box instead of an Arc to avoid the costs of atomic increments/decrements, and just be sure to deallcoate all of the tasks before the state when dropping the executor, but we can save that for another PR.

@notgull
Copy link
Member

notgull commented Apr 9, 2024

That makes sense; we're basically doing what OnceCell is doing but doing more operations at once. Once you rebase on master I can merge.

@james7132
Copy link
Contributor Author

Should be ready to go.

@notgull
Copy link
Member

notgull commented Apr 9, 2024

image

I can't merge, and I can't force-push to your branch, please make sure you do this, where canonical is this repository's remote:

$ git fetch canonical master
$ git rebase canonical/master
$ git push origin racy-initialization --force

@james7132 james7132 force-pushed the racy-initialization branch from 83c4c3b to 14fd269 Compare April 9, 2024 02:34
@notgull notgull merged commit 649bdfd into smol-rs:master Apr 9, 2024
8 checks passed
@notgull notgull mentioned this pull request Apr 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Panic on multithreaded browser Wasm
2 participants