-
Notifications
You must be signed in to change notification settings - Fork 178
Add support for rw semaphore changes under PREEMPT_RT_FULL #589
Conversation
Testing right now with 4.8.14-rt9, runs fine so far, thanks 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key idea here looks good to me. I particularly like that this is a low risk change in the sense that all non-RT will be entirely unaffected.
Just some style issues and a question about the ASSERT.
#ifdef CONFIG_RWSEM_GENERIC_SPINLOCK | ||
#if defined(CONFIG_PREEMPT_RT_FULL) | ||
#define SPL_RWSEM_SINGLE_READER_VALUE (1) | ||
#define SPL_RWSEM_SINGLE_WRITER_VALUE (0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cstyle] #define followed by space instead of tab
@@ -36,7 +39,9 @@ | |||
#endif | |||
|
|||
/* Linux 3.16 changed activity to count for rwsem-spinlock */ | |||
#if defined(HAVE_RWSEM_ACTIVITY) | |||
#if defined(CONFIG_PREEMPT_RT_FULL) | |||
#define RWSEM_COUNT(sem) sem->read_depth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cstyle] #define followed by space instead of tab
* lock is held. On other platforms the lock is never released during | ||
* the upgrade process. This is necessary under Linux because the kernel | ||
* does not provide an upgrade function. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should have dropped this comment in f58040c. I'm OK with removing it in this patch just make sure to update the commit comment to mention the above commit where it should have been dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split the commit in two.
// the second attempt. Therefore the implementation allows a | ||
// single thread to take a rwsem as read lock multiple times | ||
// tracking that nesting as read_depth counter. | ||
if(rwsem->read_depth <= 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cstyle] Missing space after if
. Should be if (rwsem...)
.
// read lock twice, as the mutex would already be locked on | ||
// the second attempt. Therefore the implementation allows a | ||
// single thread to take a rwsem as read lock multiple times | ||
// tracking that nesting as read_depth counter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cstyle] This should be a C89 style block comment.
/*
* Under the realtime patch series...
* ...
* tracking that nesting as read_depth counter.
*/
if(rwsem->read_depth <= 1) { | ||
// In case, the current thread has not taken the lock more | ||
// than once as read lock, we can allow an upgrade to a write | ||
// lock. rwsem_rt.h implements write locks as read_depth == 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cstyle] C89 comments should be used here and elsewhere.
static int | ||
__rwsem_tryupgrade(struct rw_semaphore *rwsem) | ||
{ | ||
ASSERT(rt_mutex_owner(&rwsem->lock) != current); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you mean ==
? The lock can only be upgraded when your holding it as a reader. Which would be exclusive in this case.
ASSERT(rt_mutex_owner(&rwsem->lock) == current);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is absolutely inverted. Fixed. The owner must be the current thread otherwise it's not legal to call the upgrade method. (Same condition as in the downgrade code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clefru this puzzled me too, initially. But, wasn't the first statement right? If you are trying to upgrade, you are only a reader, hence not a owner.
EDITED
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just reinspected 0124-rt-Add*.patch linked above. I am pretty sure the ASSERT is correct as is. Just search for rt_mt_owner in that file and you see a lot of comments that explain the semantic of that field, in particular that rt_mutex_owner(lock) == current
when current owns the lock.
I also found why my code was wrong. My first try had the same guard BUG_ON(rt_mutex_owner(&rwsem->lock) != current);
as the rt_downgrade_write method, but BUG_ON isn't available in spl, so I changed it to ASSERT(..) forgetting to invert the conditional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somewhat confusing because with the RT kernel there really is no such thing as a read lock. They're all exclusive mutexs and thus have owners even when only held for read. Offhand I can't think of any place in the ZFS code where this change in semantics will result in a problem but it's something we should keep in mind.
@clefru thanks for updating this, the BUG_ON -> ASSERT inversion makes perfect sense.
@@ -207,6 +207,10 @@ splat_rwlock_test1(struct file *file, void *arg) | |||
rw_thr_t rwt[SPLAT_RWLOCK_TEST_COUNT]; | |||
rw_priv_t *rwp; | |||
|
|||
#if defined(CONFIG_PREEMPT_RT_FULL) | |||
// This test will never succeed on PREEMPT_RT_FULL because locks can only be held by a single thread. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[cstyle] 80 character limit
#if defined(CONFIG_PREEMPT_RT_FULL) | ||
// This test will never succeed on PREEMPT_RT_FULL because locks can only be held by a single thread. | ||
return 0; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest turning this in to an #else clause for readability.
ee7fb7e
to
610207d
Compare
I haven't found a way to run cstyle quickly, so I just visually inspected the patch. Pardon me if I have not spotted a mistake. Offtopic: Is there a good way to get emacs to do spl/zfs style? |
FYI on retesting the patch with zpios-sanity.sh, I hit an occasional deadlock. I haven't looked into that yet.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. There are still a few remaining style issues but I can fix those in the merge. We haven't yet had a change to cleanup the SPL for cstyle without automated the checking so it's easy to get wrong.
Commit f58040c should have removed this comment which is no longer relevant. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org> Issue openzfs#589
The main complication from the RT patch set is that the RW semaphore locks change such that read locks on an rwsem can be taken only by a single thread. All other threads are locked out. This single thread can take a read lock multiple times though. The underlying implementation changes to a mutex with an additional read_depth count. The implementation can be best understood by inspecting the RT patch. rwsem_rt.h and rt.c give the best insight into how RT rwsem works. My implementation for rwsem_tryupgrade is basically an inversion of rt_downgrade_write found in rt.c. Please see the comments in the code. Unfortunately, I have to drop SPLAT rwlock test4 completely as this test tries to take multiple locks from different threads, which RT rwsems do not support. Otherwise SPLAT, zconfig.sh, zpios-sanity.sh and zfs-tests.sh pass on my Debian-testing VM with the kernel linux-image-4.8.0-1-rt-amd64. Tested-by: kernelOfTruth <kerneloftruth@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org> Closes openzfs/zfs#5491 Closes openzfs#589 Closes openzfs#308
The main complication from the RT patch set is that the RW semaphore locks change in that read locks on an rwsem can be taken only by a single thread. All other threads are locked out. This single thread can take a read lock multiple times though. The underlying implementation changes to a mutex with an additional read_depth count.
The implementation can be best understood by inspecting the RT patch. rwsem_rt.h and rt.c give the best insight into how RT rwsem works. My implementation for rwsem_tryupgrade is basically an inversion of rt_downgrade_write found in rt.c. Please see the comments in the code.
Unfortunately, I have to drop SPLAT rwlock test4 completely as this test tries to take multiple locks from different threads, which RT rwsems do not support. Otherwise SPLAT, zconfig.sh, zpios-sanity.sh and zfs-tests.sh pass on my Debian-testing VM with linux-image-4.8.0-1-rt-amd64.
Assuming this PR is the right direction, I'll add a test to execute "rw_enter(rwp, RW_READER); rw_enter(rwp, RW_READER); ASSERT(rwsem_tryupgrade(rwp) == -EBUSY);" in the same thread.