This repository has been archived by the owner on Feb 26, 2020. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 179
mutex: force serialization on mutex_exit() to fix races #421
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This was referenced Dec 18, 2014
This looks good to me. |
This was referenced Dec 19, 2014
It is known that mutex in Linux is not safe when using it to synchronize the freeing of object in which the mutex is embedded: http://lwn.net/Articles/575477/ The known places in ZFS which are suspected to suffer from the race condition are zio->io_lock and dbuf->db_mtx: zio uses zio->io_lock and zio->io_cv to synchronize freeing between zio_wait() and zio_done(). dbuf uses dbuf->db_mtx to protect reference counting. Add it's been reported multiple time that zio->io_lock and dbuf->db_mtx suffer from corruption, which is exactly the symptom of the race. openzfs/zfs#2523 openzfs/zfs#2897 This patch fix this kind of race by forcing serializaion on mutex_exit() with a spinlock, making the mutex safer by sacrificing a bit of performance and memory overhead. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Update: add comment. |
@tuxoko Thanks for update (and fix). I've merged this patch which should resolve a whole slew of open issues. I've tried to list most of them in the commit message but I'm confident I missed several (likely many). @ryao @dajhorn @FransUrbo I don't usually recommend cherry picking fixes between tags. But if your going to update your distributions packages I'd strongly suggest cherry picking this fix prior to the 0.6.4 tag. It's small, safe, and address the most commonly reported issue. |
behlendorf
pushed a commit
that referenced
this pull request
Dec 23, 2014
It is known that mutexes in Linux are not safe when using them to synchronize the freeing of object in which the mutex is embedded: http://lwn.net/Articles/575477/ The known places in ZFS which are suspected to suffer from the race condition are zio->io_lock and dbuf->db_mtx. * zio uses zio->io_lock and zio->io_cv to synchronize freeing between zio_wait() and zio_done(). * dbuf uses dbuf->db_mtx to protect reference counting. This patch fixes this kind of race by forcing serialization on mutex_exit() with a spin lock, making the mutex safe by sacrificing a bit of performance and memory overhead. This issue most commonly manifests itself as a deadlock in the zio pipeline caused by a process spinning on the damaged mutex. Similar deadlocks have been reported for the dbuf->db_mtx mutex. And it can also cause a NULL dereference or bad paging request under the right circumstances. This issue any many like it are linked off the openzfs/zfs#2523 issue. Specifically this fix resolves at least the following outstanding issues: openzfs/zfs#401 openzfs/zfs#2523 openzfs/zfs#2679 openzfs/zfs#2684 openzfs/zfs#2704 openzfs/zfs#2708 openzfs/zfs#2517 openzfs/zfs#2827 openzfs/zfs#2850 openzfs/zfs#2891 openzfs/zfs#2897 openzfs/zfs#2247 openzfs/zfs#2939 Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #421
ryao
pushed a commit
to ryao/spl
that referenced
this pull request
Feb 19, 2015
It is known that mutexes in Linux are not safe when using them to synchronize the freeing of object in which the mutex is embedded: http://lwn.net/Articles/575477/ The known places in ZFS which are suspected to suffer from the race condition are zio->io_lock and dbuf->db_mtx. * zio uses zio->io_lock and zio->io_cv to synchronize freeing between zio_wait() and zio_done(). * dbuf uses dbuf->db_mtx to protect reference counting. This patch fixes this kind of race by forcing serialization on mutex_exit() with a spin lock, making the mutex safe by sacrificing a bit of performance and memory overhead. This issue most commonly manifests itself as a deadlock in the zio pipeline caused by a process spinning on the damaged mutex. Similar deadlocks have been reported for the dbuf->db_mtx mutex. And it can also cause a NULL dereference or bad paging request under the right circumstances. This issue any many like it are linked off the openzfs/zfs#2523 issue. Specifically this fix resolves at least the following outstanding issues: openzfs/zfs#401 openzfs/zfs#2523 openzfs/zfs#2679 openzfs/zfs#2684 openzfs/zfs#2704 openzfs/zfs#2708 openzfs/zfs#2517 openzfs/zfs#2827 openzfs/zfs#2850 openzfs/zfs#2891 openzfs/zfs#2897 openzfs/zfs#2247 openzfs/zfs#2939 Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes openzfs#421 Conflicts: include/sys/mutex.h
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
It is known that mutex in Linux is not safe when using it to synchronize
the freeing of object in which the mutex is embedded:
http://lwn.net/Articles/575477/
The known places in ZFS which are suspected to suffer from the race condition
are zio->io_lock and dbuf->db_mtx:
zio uses zio->io_lock and zio->io_cv to synchronize freeing between zio_wait()
and zio_done().
dbuf uses dbuf->db_mtx to protect reference counting.
Add it's been reported multiple time that zio->io_lock and dbuf->db_mtx suffer
from corruption, which is exactly the symptom of the race.
openzfs/zfs#2523
openzfs/zfs#2897
This patch fix this kind of race by forcing serializaion on mutex_exit() with
a spinlock, making the mutex safer by sacrificing a bit of performance and
memory overhead.
Signed-off-by: Chunwei Chen tuxoko@gmail.com