Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix first Reactor sleep taking at least 30-50ms #653

Merged
merged 3 commits into from
Apr 25, 2024

Conversation

vlovich
Copy link
Contributor

@vlovich vlovich commented Apr 16, 2024

Noticed this problem in unit tests where the very first timer sleep would take a very long time because the membarrier was being registered and tests have more than 1 thread.

Initializing the membarrier strategy on Reactor::new seems like a good idea because it's highly likely any use of the Reactor will involve going to sleep and I can't imagine a use-case where that's not the case unless you're not interacting with anything within Glommio in the first place.

With this change we see that the very first timer now completes within 11ms (I added a grace window of an extra ms in case of CI).

What does this PR do?

The first construction of the IO uring reactor now registers the membarrier.

Motivation

I had tests that were flakily failing because a very short sleep was taking an almost unbounded amount of time (observed up to 60ms even for a 5ms sleep). I think it's cleaner if the membarrier initialization is front-loaded into Reactor construction vs paying that penalty on the very first .await later where it's a bit more non-obvious and surprising.

Related issues

Fixes #652

Additional Notes

I believe the cost of registration when there's > 1 thread running got worse sometime after Linux 6.6 because tests in my project that do something similar started regularly taking >30ms after upgrading to 6.8 whereas before they were mostly succeeding < 30ms (it was my grace window for how long a 10ms sleep could take).

Checklist

[X] I have added unit tests to the code I am submitting
[X] My unit tests cover both failure and success scenarios
[] If applicable, I have discussed my architecture

@vlovich
Copy link
Contributor Author

vlovich commented Apr 24, 2024

Ping on this in case you missed it in your review batch @glommer

@glommer
Copy link
Collaborator

glommer commented Apr 24, 2024

I had missed it, indeed.

@glommer
Copy link
Collaborator

glommer commented Apr 24, 2024

Can we add a comment on the code explaining why we're doing this, so nobody else removes that in the future?

Noticed this problem in unit tests where the very first timer sleep
would take a very long time because the membarrier was being registered
and tests have more than 1 thread.

Initializing the membarrier strategy before the BlockingThreadPool is
constructed seems like a good idea because it tries to elide the
registration cost if we haven't created any other threads yet. Even if
there are threads, the cost is front-loaded eagerly so it's part of the
cost of creating the executor rather than appearing as a random delay
going to sleep on the io_uring the first time (assuming it could
otherwise wake before the 30-80ms cost observed).

I believe the cost of registration got worse sometime after Linux 6.6
because tests in my project that do something similar started regularly
taking >30ms after upgrading to 6.8 whereas before they were mostly
succeeding < 30ms (it was my grace window for how long a 10ms sleep
could take).

With this change we see that the very first timer now completes within
11ms (I added a grace window of an extra ms in case of CI).
@vlovich
Copy link
Contributor Author

vlovich commented Apr 24, 2024

Good note. I've added much more extensive documentation for the subtleties. I also realized that it's placement was incorrect as it needed to preceed the construction of threads in BlockingThreadPool. I also realized it's necessary as an external API (or we can add a dependency on the ctor crate to do this transparently) for 2 reasons:

  1. The user may start their own threads before glommio
  2. The user may sandbox their app before glommio & thus glommio will get a privilege denial for trying to register the membarrier

I don't know the policy on adding dependency on 3p crates so I went with an explicit API that users can call if it's needed for them (most don't) but ctor magic would be more user-friendly although portability may be a concern (I think it's well supported on tier 1 platforms but I'm sure there's always subtleties that can appear with that kind of low-level stuff). If you'd prefer magic, I'm happy to change the PR & add a dependency on ctor. Let me know.

@glommer
Copy link
Collaborator

glommer commented Apr 25, 2024

As I get older, I dislike magic more and more.

@glommer glommer merged commit 2418fa3 into DataDog:master Apr 25, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Timer adds a surprising amount of delay vs thread::sleep
2 participants