Improper memory ordering when reading wakeup flag with SQPOLL #541
Comments
Care to send a patch for this?
A full memory barrier is required between the store to the SQ tail in __io_uring_flush_sq and the load of the flags in sq_ring_needs_enter to prevent a situation where the kernel thread goes to sleep while sq_ring_needs_enter returns false.
Fixes: axboe#541
Signed-off-by: Almog Khaikin <almogkh@gmail.com>
@almogkh Thanks for the explanation about the fix. I wonder if there is another situation that could cause a missed wakeup.
In the situation above, suppose there is no reordering: the application submits an SQ entry and wakes up the kernel thread after the kernel checks the tail but before it calls schedule(). Is this a potential issue, or am I missing something? Thanks!
There is no issue in the situation you describe. The first thing the SQPOLL thread does when it's about to go to sleep, even before it sets the If the SQPOLL thread already checked the SQ and it was empty, then it will call
Got it, my thought was wrong. Thanks again!
This is a followup to #219.
In that discussion it was decided that a relaxed load is sufficient, even though the documentation states that an acquire load is needed. In fact the opposite is true: something stronger than an acquire load is required. Between updating the SQ tail and reading the flags, a full memory barrier (smp_mb) is required. This is even documented in the kernel source code:
This is a case of a read-after-write, which neither acquire nor release semantics deal with. Even on x86, which has a strong memory model, a read-after-write is the only situation where memory operations can be reordered, because of the store buffer.
The same bug is also present in the kernel code. The kernel sets the NEED_WAKEUP flag and then checks one last time whether there are SQEs before going to sleep. This is again a read-after-write, which means a full memory barrier is required, but it is missing in the kernel code.
With the current implementation the following sequence of events is possible:
The result is that there is now an I/O operation sitting in the SQ and not being processed, while the application thinks the operation was submitted. I used x86-specific terminology and referenced the store buffer just to simplify the explanation, but this applies to all architectures. In the C11 memory model, a write followed by a read of a different location can be reordered if there is no full barrier between the two operations.
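The write-then-read pattern on both sides is the classic store-buffering litmus test. A small harness for it, assuming POSIX threads are available, might look like the sketch below. All names here (sb_tail, sb_flag, sb_count_missed_wakeups) are hypothetical. The "application" thread plays the role of the submitter and the "poller" thread plays the role of the SQPOLL thread.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static _Atomic int sb_tail, sb_flag;   /* shared between the two threads */
static int sb_saw_flag, sb_saw_tail;   /* results, read only after join */

static void *sb_app(void *arg)
{
    (void)arg;
    atomic_store_explicit(&sb_tail, 1, memory_order_relaxed); /* submit */
    atomic_thread_fence(memory_order_seq_cst);                /* full barrier */
    sb_saw_flag = atomic_load_explicit(&sb_flag, memory_order_relaxed);
    return NULL;
}

static void *sb_poller(void *arg)
{
    (void)arg;
    atomic_store_explicit(&sb_flag, 1, memory_order_relaxed); /* need wakeup */
    atomic_thread_fence(memory_order_seq_cst);                /* full barrier */
    sb_saw_tail = atomic_load_explicit(&sb_tail, memory_order_relaxed);
    return NULL;
}

/* Counts runs where the app saw no flag AND the poller saw no work.
 * Returns -1 if a thread could not be created. */
static int sb_count_missed_wakeups(int iterations)
{
    int missed = 0;
    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;
        atomic_store(&sb_tail, 0);
        atomic_store(&sb_flag, 0);
        if (pthread_create(&a, NULL, sb_app, NULL) != 0)
            return -1;
        if (pthread_create(&b, NULL, sb_poller, NULL) != 0) {
            pthread_join(a, NULL);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (sb_saw_flag == 0 && sb_saw_tail == 0)
            missed++; /* both sides missed each other: a lost wakeup */
    }
    return missed;
}
```

With both fences in place, the C11 model forbids the outcome where both loads return 0, so the count stays at zero. If either fence is removed, that outcome becomes permitted, although how often it is actually observed depends on the hardware and on timing.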