Skip to content
This repository has been archived by the owner on Apr 13, 2019. It is now read-only.

Commit

Permalink
md: fix a potential deadlock of raid5/raid10 reshape
Browse files Browse the repository at this point in the history
There is a potential deadlock if mount/umount happens when
raid5_finish_reshape() tries to grow the size of emulated disk.

How the deadlock happens?
1) The raid5 resync thread finished reshape (expanding array).
2) The mount or umount thread holds VFS sb->s_umount lock and tries to
   write through critical data into raid5 emulated block device. So it
   waits for raid5 kernel thread handling stripes in order to finish it
   I/Os.
3) In the routine of raid5 kernel thread, md_check_recovery() will be
   called first in order to reap the raid5 resync thread. That is,
   raid5_finish_reshape() will be called. In this function, it will try
   to update conf and call VFS revalidate_disk() to grow the raid5
   emulated block device. It will try to acquire VFS sb->s_umount lock.
The raid5 kernel thread cannot continue, so no one can handle mount/
umount I/Os (stripes). Once the write-through I/Os cannot be finished,
mount/umount will not release sb->s_umount lock. The deadlock happens.

The raid5 kernel thread is an emulated block device. It is responible to
handle I/Os (stripes) from upper layers. The emulated block device
should not request any I/Os on itself. That is, it should not call VFS
layer functions. (If it did, it will try to acquire VFS locks to
guarantee the I/Os sequence.) So we have the resync thread to send
resync I/O requests and to wait for the results.

For solving this potential deadlock, we can put the size growth of the
emulated block device as the final step of reshape thread.

2017/12/29:
Thanks to Guoqing Jiang <[email protected]>,
we confirmed that there is the same deadlock issue in raid10. It's
reproducible and can be fixed by this patch. For raid10.c, we can remove
the similar code to prevent deadlock as well since they has been called
before.

Reported-by: Alex Wu <[email protected]>
Reviewed-by: Alex Wu <[email protected]>
Reviewed-by: Chung-Chiang Cheng <[email protected]>
Signed-off-by: BingJing Chang <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
  • Loading branch information
bingjingc authored and Shaohua Li committed Feb 25, 2018
1 parent 43a5212 commit 8876391
Show file tree
Hide file tree
Showing 3 changed files with 15 additions and 14 deletions.
13 changes: 13 additions & 0 deletions drivers/md/md.c
Original file line number Diff line number Diff line change
Expand Up @@ -8569,6 +8569,19 @@ void md_do_sync(struct md_thread *thread)
set_mask_bits(&mddev->sb_flags, 0,
BIT(MD_SB_CHANGE_PENDING) | BIT(MD_SB_CHANGE_DEVS));

if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
mddev->delta_disks > 0 &&
mddev->pers->finish_reshape &&
mddev->pers->size &&
mddev->queue) {
mddev_lock_nointr(mddev);
md_set_array_sectors(mddev, mddev->pers->size(mddev, 0, 0));
mddev_unlock(mddev);
set_capacity(mddev->gendisk, mddev->array_sectors);
revalidate_disk(mddev->gendisk);
}

spin_lock(&mddev->lock);
if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
/* We completed so min/max setting can be forgotten if used. */
Expand Down
8 changes: 1 addition & 7 deletions drivers/md/raid10.c
Original file line number Diff line number Diff line change
Expand Up @@ -4832,17 +4832,11 @@ static void raid10_finish_reshape(struct mddev *mddev)
return;

if (mddev->delta_disks > 0) {
sector_t size = raid10_size(mddev, 0, 0);
md_set_array_sectors(mddev, size);
if (mddev->recovery_cp > mddev->resync_max_sectors) {
mddev->recovery_cp = mddev->resync_max_sectors;
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
}
mddev->resync_max_sectors = size;
if (mddev->queue) {
set_capacity(mddev->gendisk, mddev->array_sectors);
revalidate_disk(mddev->gendisk);
}
mddev->resync_max_sectors = mddev->array_sectors;
} else {
int d;
rcu_read_lock();
Expand Down
8 changes: 1 addition & 7 deletions drivers/md/raid5.c
Original file line number Diff line number Diff line change
Expand Up @@ -8000,13 +8000,7 @@ static void raid5_finish_reshape(struct mddev *mddev)

if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {

if (mddev->delta_disks > 0) {
md_set_array_sectors(mddev, raid5_size(mddev, 0, 0));
if (mddev->queue) {
set_capacity(mddev->gendisk, mddev->array_sectors);
revalidate_disk(mddev->gendisk);
}
} else {
if (mddev->delta_disks <= 0) {
int d;
spin_lock_irq(&conf->device_lock);
mddev->degraded = raid5_calc_degraded(conf);
Expand Down

0 comments on commit 8876391

Please sign in to comment.