-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix first Reactor sleep taking at least 30-50ms #653
Conversation
f3e7916
to
f6f3963
Compare
Ping on this in case you missed it in your review batch @glommer |
I had missed it, indeed. |
Can we add a comment on the code explaining why we're doing this, so nobody else removes that in the future? |
Noticed this problem in unit tests where the very first timer sleep would take a very long time because the membarrier was being registered and tests have more than 1 thread. Initializing the membarrier strategy before the BlockingThreadPool is constructed seems like a good idea because it tries to elide the registration cost if we haven't created any other threads yet. Even if there are threads, the cost is front-loaded eagerly so it's part of the cost of creating the executor rather than appearing as a random delay going to sleep on the io_uring the first time (assuming it could otherwise wake before the 30-80ms cost observed). I believe the cost of registration got worse sometime after Linux 6.6 because tests in my project that do something similar started regularly taking >30ms after upgrading to 6.8 whereas before they were mostly succeeding < 30ms (it was my grace window for how long a 10ms sleep could take). With this change we see that the very first timer now completes within 11ms (I added a grace window of an extra ms in case of CI).
Good note. I've added much more extensive documentation for the subtleties. I also realized that it's placement was incorrect as it needed to preceed the construction of threads in BlockingThreadPool. I also realized it's necessary as an external API (or we can add a dependency on the ctor crate to do this transparently) for 2 reasons:
I don't know the policy on adding dependency on 3p crates so I went with an explicit API that users can call if it's needed for them (most don't) but ctor magic would be more user-friendly although portability may be a concern (I think it's well supported on tier 1 platforms but I'm sure there's always subtleties that can appear with that kind of low-level stuff). If you'd prefer magic, I'm happy to change the PR & add a dependency on ctor. Let me know. |
As I get older, I dislike magic more and more. |
Noticed this problem in unit tests where the very first timer sleep would take a very long time because the membarrier was being registered and tests have more than 1 thread.
Initializing the membarrier strategy on Reactor::new seems like a good idea because it's highly likely any use of the Reactor will involve going to sleep and I can't imagine a use-case where that's not the case unless you're not interacting with anything within Glommio in the first place.
With this change we see that the very first timer now completes within 11ms (I added a grace window of an extra ms in case of CI).
What does this PR do?
The first construction of the IO uring reactor now registers the membarrier.
Motivation
I had tests that were flakily failing because a very short sleep was taking an almost unbounded amount of time (observed up to 60ms even for a 5ms sleep). I think it's cleaner if the membarrier initialization is front-loaded into Reactor construction vs paying that penalty on the very first .await later where it's a bit more non-obvious and surprising.
Related issues
Fixes #652
Additional Notes
I believe the cost of registration when there's > 1 thread running got worse sometime after Linux 6.6 because tests in my project that do something similar started regularly taking >30ms after upgrading to 6.8 whereas before they were mostly succeeding < 30ms (it was my grace window for how long a 10ms sleep could take).
Checklist
[X] I have added unit tests to the code I am submitting
[X] My unit tests cover both failure and success scenarios
[] If applicable, I have discussed my architecture