Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improper memory ordering when reading wakeup flag with SQPOLL #541

Closed
almogkh opened this issue Feb 26, 2022 · 4 comments · Fixed by #542
Closed

Improper memory ordering when reading wakeup flag with SQPOLL #541

almogkh opened this issue Feb 26, 2022 · 4 comments · Fixed by #542

Comments

@almogkh
Copy link
Contributor

almogkh commented Feb 26, 2022

This is a followup to #219.
In that discussion it was decided that a relaxed load is sufficient even though the documentation states that an acquire load is needed but it's actually the opposite. Between updating the sq tail and reading the flags a full memory barrier (smp_mb) is required. This is even documented in the kernel source code:

 * When using the SQ poll thread (IORING_SETUP_SQPOLL), the application
 * needs to check the SQ flags for IORING_SQ_NEED_WAKEUP *after*
 * updating the SQ tail; a full memory barrier smp_mb() is needed
 * between.

This is a case of a read-after-write which neither acquire nor release semantics deal with. Even on x86 which has a strong memory model a read-after-write is the only situation where memory operations can be reordered due to the store buffer.

This same bug is also present in the kernel code. The kernel sets the NEED_WAKEUP flag and then checks if there are SQEs one last time before going to sleep. This is again a read-after-write which means a full memory barrier is required but it's missing in the kernel code.

With the current implementation the following sequence of events is possible:

Kernel sleep timer expires
Kernel sets NEED_WAKEUP flag
                                                         Application submits new work to the SQ (updates the SQ tail)
Kernel checks SQ for new entries and doesn't see the 
update to the SQ tail because it's in the store buffer
                                                         Application checks NEED_WAKEUP flag and doesn't see it because
                                                         the store to the flag is in the store buffer
Kernel thread goes to sleep
                                                         Application moves on without invoking io_uring_enter

The result is there is now an IO operation sitting in the sq and not being processed while the application thinks the operation was submitted. I used x86 specific terminology and referenced the store buffer just to simplify the explanation but this applies to all architectures. In the C11 memory model a WriteRead can be reordered if there is no full barrier between the two operations.

@axboe
Copy link
Owner

axboe commented Mar 5, 2022

Care to send a patch for this?

almogkh added a commit to almogkh/liburing that referenced this issue Mar 5, 2022
A full memory barrier is required between the store to the SQ tail in
__io_uring_flush_sq and the load of the flags in
sq_ring_needs_enter to prevent a situation where the kernel thread goes
to sleep while sq_ring_needs_enter returns false

Fixes: axboe#541
Signed-off-by: Almog Khaikin <almogkh@gmail.com>
almogkh added a commit to almogkh/liburing that referenced this issue Mar 5, 2022
A full memory barrier is required between the store to the SQ tail in
__io_uring_flush_sq and the load of the flags in sq_ring_needs_enter
to prevent a situation where the kernel thread goes to sleep while
sq_ring_needs_enter returns false

Fixes: axboe#541
Signed-off-by: Almog Khaikin <almogkh@gmail.com>
@axboe axboe closed this as completed in 744f415 Mar 10, 2022
@zhangjaycee
Copy link

@almogkh Thanks for the explanation about the fix. I wonder if there is another situation that could cause missed wakeup.

Kernel sleep timer expires
Kernel sets NEED_WAKEUP flag
Kernel checks SQ for new entries and it's empty
                                                         Application submits new work to the SQ (updates the SQ tail)
                                                         Application checks NEED_WAKEUP flag and see it
                                                         Application invokes io_uring_enter
Kernel thread goes to sleep without handling the last SQ work       

In the above situation, suppose we don't have any reordering, the Application submits a SQ work and wakes up kernel between Kernel checks tail and Kernel calls schedule(). Is this a potential issue, or am I missing something? Thanks!

@almogkh
Copy link
Contributor Author

almogkh commented Feb 1, 2024

There is no issue in the situation you describe. The first thing the SQPOLL thread does when it's about to go to sleep, even before it sets the NEED_WAKEUP flag, is call prepare_to_wait. From the moment that call happens, the application can wakeup the SQPOLL thread using io_uring_enter (which calls wake_up inside the kernel). All of this wait/wakeup code is part of the scheduler and not related to io_uring so I'm fairly certain there are no races there.

If the SQPOLL thread already checked the SQ and it was empty, then it will call schedule, but that's fine because after wake_up is called due to io_uring_enter, the SQPOLL thread will not be in a sleeping state and the scheduler will be able to schedule the thread to run again eventually.

@zhangjaycee
Copy link

There is no issue in the situation you describe. The first thing the SQPOLL thread does when it's about to go to sleep, even before it sets the NEED_WAKEUP flag, is call prepare_to_wait. From the moment that call happens, the application can wakeup the SQPOLL thread using io_uring_enter (which calls wake_up inside the kernel). All of this wait/wakeup code is part of the scheduler and not related to io_uring so I'm fairly certain there are no races there.

If the SQPOLL thread already checked the SQ and it was empty, then it will call schedule, but that's fine because after wake_up is called due to io_uring_enter, the SQPOLL thread will not be in a sleeping state and the scheduler will be able to schedule the thread to run again eventually.

Got it, my thought was wrong. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants