stronger memory load acquire when reading the wakeup flag #219

jorangreef · 2020-10-02T11:18:31Z

@axboe, The man page for io_uring_setup.2 recently added the rationale for using memory load acquire semantics when checking sq_ring->flags:

/*
 * Ensure that the wakeup flag is read after the tail pointer
 * has been written. It's important to use memory load acquire
 * semantics for the flags read, as otherwise the application
 * and the kernel might not agree on the consistency of the
 * wakeup flag.
 */
unsigned flags = atomic_load_relaxed(sq_ring->flags);

However, looking closer, just after the comment, the actual semantics being used in the example above are atomic_load_relaxed instead of atomic_load_acquire?

The same might also be the case in queue.c for the implementation of sq_ring_needs_enter(), which uses IO_URING_READ_ONCE(*ring->sq.kflags).

To avoid I/O starvation, maybe a stronger barrier such as acquire is needed here instead of relaxed?

The text was updated successfully, but these errors were encountered:

jorangreef · 2020-10-02T11:19:12Z

If you agree, may I submit a PR to fix the man page and queue.c?

romange · 2020-10-02T12:40:02Z

I think the comment talks about something else. The barrier happens at __io_uring_flush_sq : io_uring_smp_store_release(sq->ktail, ktail);

jorangreef · 2020-10-02T13:39:58Z

Thanks @romange, this is about flags not tail, see: https://github.com/axboe/liburing/blob/master/man/io_uring_setup.2#L81-L110 and also https://github.com/axboe/liburing/blob/master/src/queue.c#L17-L32

jorangreef · 2020-10-02T13:42:58Z

@romange See the commit in #194 where the rationale for the need to guard flags when checking for wakeup is discussed.

romange · 2020-10-03T15:39:31Z

@jorangreef Not trying to argue or anything - I am really not in expert in memory ordering.. Also I do not know io_uring code so I could be easily wrong. I think there is a bigger issue here and it's not related to barrier...

atomic variables are always synchronized, even with relaxed semantics. Barriers are needed to formally define how data is synchronized before and after they are read or write.

2.What confused me here is that this code:

 if (load(*kflags, relaxed) == IORING_SQ_NEED_WAKEUP) 
  do_system_call(IORING_ENTER_SQ_WAKEUP);

should not require any ordering since it does not depend on any data besides kflags. But I also think you are correct and there is a bug here - and some notifications could be missed.

I assume that in the kernel side (and really I do not know since I could not find a clear code snippet that proves my point):
there is a SQ thread that does this:

if (too_much_pointless_spinning) {
 /* () */
*kflags |= IORING_SQ_NEED_WAKEUP; 
 mem_barrier_release();
 suspend_myself()
}

which signals user thread to waken him. However, this could happen:
a. Kernel thread would preempt before changing *kflags but after deciding it needs to go to sleep (at /* ()*/ line)
b. A user thread reads *kflags before kernel thread mutated the flags.
c. A kernel thread goes to sleep and a user thread does not pass IORING_SQ_NEED_WAKEUP.

I am assuming that the kernel can not guarantee atomicity (i.e. transactional semantics) of multiple statements with respect to userland threads but maybe I am wrong. But If I am not wrong then the problem with the interface and not with memory barriers.

There is a wonderful lock-free algorithm that is called event-count that can solve this issue but it requires more that just a bit in an integer.

romange · 2020-10-03T15:56:18Z

Specifically, the question is whether a userland thread can run when ctx->completion_lock is taken.

https://github.com/torvalds/linux/blob/d3d45f8220d60a0b2aaaacf8fb2be4e6ffd9008e/fs/io_uring.c#L6549

if it can - we have a bug that requires a different algorithm than just atomic.set/get

axboe · 2020-10-03T16:12:12Z

The wakeup flag is ordered with the SQ ring kernel tail read, so as far as I can tell the logic should be sound. The kernel checks if it should go to sleep after setting IORING_SQ_NEED_WAKEUP and doesn't if the SQ ring has entries. The app/liburing side flushes entries to the SQ ring, then checks flags for needing to enter.

romange · 2020-10-03T17:41:34Z

You are right, I missed the piece where kernel checks the tail. So the ordering is around the tail which synchronizes both threads.
In that case relaxed load should suffice for kflags.

axboe · 2020-10-03T17:42:33Z

Thanks for double checking.

jorangreef · 2020-10-04T11:08:20Z

This has been an interesting thought exercise.

Is this comment in the man page still correct then?

/*
 * Ensure that the wakeup flag is read after the tail pointer
 * has been written. It's important to use MEMORY LOAD ACQUIRE
 * semantics for the flags read, as otherwise the application
 * and the kernel might not agree on the consistency of the
 * wakeup flag.
 */

I think this should then be changed to MEMORY LOAD RELAXED?

axboe closed this as completed Oct 3, 2020

almogkh mentioned this issue Feb 26, 2022

Improper memory ordering when reading wakeup flag with SQPOLL #541

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stronger memory load acquire when reading the wakeup flag #219

stronger memory load acquire when reading the wakeup flag #219

jorangreef commented Oct 2, 2020

jorangreef commented Oct 2, 2020

romange commented Oct 2, 2020 •

edited

Loading

jorangreef commented Oct 2, 2020

jorangreef commented Oct 2, 2020 •

edited

Loading

romange commented Oct 3, 2020 •

edited

Loading

romange commented Oct 3, 2020 •

edited

Loading

axboe commented Oct 3, 2020

romange commented Oct 3, 2020

axboe commented Oct 3, 2020

jorangreef commented Oct 4, 2020 •

edited

Loading

stronger memory load acquire when reading the wakeup flag #219

stronger memory load acquire when reading the wakeup flag #219

Comments

jorangreef commented Oct 2, 2020

jorangreef commented Oct 2, 2020

romange commented Oct 2, 2020 • edited Loading

jorangreef commented Oct 2, 2020

jorangreef commented Oct 2, 2020 • edited Loading

romange commented Oct 3, 2020 • edited Loading

romange commented Oct 3, 2020 • edited Loading

axboe commented Oct 3, 2020

romange commented Oct 3, 2020

axboe commented Oct 3, 2020

jorangreef commented Oct 4, 2020 • edited Loading

romange commented Oct 2, 2020 •

edited

Loading

jorangreef commented Oct 2, 2020 •

edited

Loading

romange commented Oct 3, 2020 •

edited

Loading

romange commented Oct 3, 2020 •

edited

Loading

jorangreef commented Oct 4, 2020 •

edited

Loading