path: root/drivers/md
Age Commit message Author
2016-09-22dm array: introduce cursor apiJoe Thornber
More efficient way to iterate an array due to prefetching (makes use of the new dm_btree_cursor_* api). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22dm btree: introduce cursor apiJoe Thornber
This uses prefetching to speed up iteration through a btree. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22dm cache policy smq: distribute entries to random levels when switching to smqJoe Thornber
For smq the 32-bit 'hint' stores the multiqueue level that the entry should be stored in. If a different policy was used previously and the cache is then switched to smq, the hints will be invalid. In that case we used to put all entries in the bottom level of the multiqueue and then redistribute them. Redistribution is faster if we put entries with invalid hints in random levels initially. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
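The placement rule described above can be sketched in userspace C. This is only an illustration under stated assumptions: `initial_level()`, its parameters, and the use of `rand()` are hypothetical and are not taken from the smq policy source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: pick the multiqueue level an entry starts in.
 * A valid hint is honoured (clamped to the top level); an invalid hint
 * gets a random level instead of always landing in level 0.
 */
static unsigned initial_level(uint32_t hint, bool hint_valid, unsigned nr_levels)
{
    assert(nr_levels > 0);

    if (hint_valid)
        return hint < nr_levels ? hint : nr_levels - 1;

    /* Assumption: any cheap pseudo-random source is good enough here. */
    return (unsigned)rand() % nr_levels;
}
```

Spreading entries across levels up front means the later redistribution pass has less to move than if every entry started at the bottom.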
2016-09-22dm cache: speed up writing of the hint arrayJoe Thornber
It's far quicker to always delete the hint array and recreate with dm_array_new() because we avoid the copying caused by mutation. Also simplifies the policy interface, replacing the walk_hints() with the simpler get_hint(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22dm array: add dm_array_new()Joe Thornber
dm_array_new() creates a new, populated array more efficiently than starting with an empty one and resizing. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22block: export bio_free_pages to other modulesGuoqing Jiang
bio_free_pages was introduced in commit 1dfa0f68c040 ("block: add a helper to free bio bounce buffer pages"). Once it is exported, other modules can reuse the function. Cc: Christoph Hellwig <hch@infradead.org> Cc: Jens Axboe <axboe@fb.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Shaohua Li <shli@fb.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21raid5: handle register_shrinker failureShaohua Li
register_shrinker() now can fail. When it happens, shrinker.nr_deferred is null. We use it to determine if unregister_shrinker is required. Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21raid5: fix to detect failure of register_shrinkerChao Yu
register_shrinker() can fail since commit 1d3d4437eae1 ("vmscan: per-node deferred work"). We should detect that failure; otherwise the shrinker may fail to register even though the raid5 configuration was set up successfully. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md: fix a potential deadlockShaohua Li
lockdep reports a potential deadlock. Fix this by dropping the mutex before md_import_device.
[ 1137.126601] ======================================================
[ 1137.127013] [ INFO: possible circular locking dependency detected ]
[ 1137.127013] 4.8.0-rc4+ #538 Not tainted
[ 1137.127013] -------------------------------------------------------
[ 1137.127013] mdadm/16675 is trying to acquire lock:
[ 1137.127013]  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff81243cf3>] __blkdev_get+0x63/0x450
[ 1137.127013] but task is already holding lock:
[ 1137.127013]  (detected_devices_mutex){+.+.+.}, at: [<ffffffff81a5138c>] md_ioctl+0x2ac/0x1f50
[ 1137.127013] which lock already depends on the new lock.
[ 1137.127013] the existing dependency chain (in reverse order) is:
[ 1137.127013] -> #1 (detected_devices_mutex){+.+.+.}:
[ 1137.127013]        [<ffffffff810b6f19>] lock_acquire+0xb9/0x220
[ 1137.127013]        [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0
[ 1137.127013]        [<ffffffff81a4eeaf>] md_autodetect_dev+0x3f/0x90
[ 1137.127013]        [<ffffffff81595be8>] rescan_partitions+0x1a8/0x2c0
[ 1137.127013]        [<ffffffff81590081>] __blkdev_reread_part+0x71/0xb0
[ 1137.127013]        [<ffffffff815900e5>] blkdev_reread_part+0x25/0x40
[ 1137.127013]        [<ffffffff81590c4b>] blkdev_ioctl+0x51b/0xa30
[ 1137.127013]        [<ffffffff81242bf1>] block_ioctl+0x41/0x50
[ 1137.127013]        [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0
[ 1137.127013]        [<ffffffff81215321>] SyS_ioctl+0x41/0x70
[ 1137.127013]        [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 1137.127013] -> #0 (&bdev->bd_mutex){+.+.+.}:
[ 1137.127013]        [<ffffffff810b6af2>] __lock_acquire+0x1662/0x1690
[ 1137.127013]        [<ffffffff810b6f19>] lock_acquire+0xb9/0x220
[ 1137.127013]        [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0
[ 1137.127013]        [<ffffffff81243cf3>] __blkdev_get+0x63/0x450
[ 1137.127013]        [<ffffffff81244307>] blkdev_get+0x227/0x350
[ 1137.127013]        [<ffffffff812444f6>] blkdev_get_by_dev+0x36/0x50
[ 1137.127013]        [<ffffffff81a46d65>] lock_rdev+0x35/0x80
[ 1137.127013]        [<ffffffff81a49bb4>] md_import_device+0xb4/0x1b0
[ 1137.127013]        [<ffffffff81a513d6>] md_ioctl+0x2f6/0x1f50
[ 1137.127013]        [<ffffffff815909b3>] blkdev_ioctl+0x283/0xa30
[ 1137.127013]        [<ffffffff81242bf1>] block_ioctl+0x41/0x50
[ 1137.127013]        [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0
[ 1137.127013]        [<ffffffff81215321>] SyS_ioctl+0x41/0x70
[ 1137.127013]        [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 1137.127013] other info that might help us debug this:
[ 1137.127013]  Possible unsafe locking scenario:
[ 1137.127013]        CPU0                    CPU1
[ 1137.127013]        ----                    ----
[ 1137.127013]   lock(detected_devices_mutex);
[ 1137.127013]                                lock(&bdev->bd_mutex);
[ 1137.127013]                                lock(detected_devices_mutex);
[ 1137.127013]   lock(&bdev->bd_mutex);
[ 1137.127013]  *** DEADLOCK ***
Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Shaohua Li <shli@fb.com>
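The general shape of this kind of fix, releasing the list lock before calling into code that takes a second lock, can be sketched with pthreads. This is a userspace illustration only: `list_lock`, `dev_lock`, `queue_device()`, `import_device()` and `autostart_devices()` are hypothetical stand-ins and do not reproduce the md code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Stand-ins for detected_devices_mutex and bdev->bd_mutex (assumed roles). */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

static int pending[MAX_PENDING];
static size_t nr_pending;
static int imported;

/* Queue a detected device; only list_lock is taken here. */
static void queue_device(int dev)
{
    pthread_mutex_lock(&list_lock);
    assert(nr_pending < MAX_PENDING);
    pending[nr_pending++] = dev;
    pthread_mutex_unlock(&list_lock);
}

/* Importing a device needs dev_lock, like opening a block device. */
static void import_device(int dev)
{
    (void)dev;
    pthread_mutex_lock(&dev_lock);
    imported++;
    pthread_mutex_unlock(&dev_lock);
}

/*
 * Pop each entry under list_lock, then drop list_lock before importing,
 * so dev_lock is never acquired while list_lock is held.
 */
static void autostart_devices(void)
{
    pthread_mutex_lock(&list_lock);
    while (nr_pending > 0) {
        int dev = pending[--nr_pending];

        pthread_mutex_unlock(&list_lock);
        import_device(dev);
        pthread_mutex_lock(&list_lock);
    }
    pthread_mutex_unlock(&list_lock);
}
```

Because the two locks are never held together on this path, the list_lock -> dev_lock ordering disappears, and it can no longer form a cycle with a path that takes dev_lock first and list_lock second.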
2016-09-21md/bitmap: fix wrong cleanupShaohua Li
If bitmap_create fails, the bitmap has already been cleaned up and the returned value is an error number, so we must not do the cleanup again. Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21raid5: allow arbitrary max_hw_sectorsShaohua Li
raid5 splits bios to the proper size internally, so there is no point in using the underlying disks' max_hw_sectors. On my qemu system, without this change raid5 only receives 128k bios, which reduces the chance of merging bios sent to the underlying disks. Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: make resync lock also could be interrupttedGuoqing Jiang
When one node is performing resync or recovery, other nodes can't get the resync lock and may block for a while before acquiring it, so the array can't be stopped immediately in this scenario. To allow the array to be stopped quickly, we check MD_CLOSING in dlm_lock_sync_interruptible so that the lock request can be interrupted. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hangGuoqing Jiang
When a node leaves the cluster, its bitmap needs to be synced by another node, so the "md*_recover" thread is triggered for that purpose. However, with the steps below, tasks can hang on either B or C.
1. Node A creates a resyncing cluster raid1 and assembles it on the other two nodes (B and C).
2. Stop the array on B and C.
3. Stop the array on A.
linux44:~ # ps aux|grep md|grep D
root 5938 0.0 0.1 19852 1964 pts/0 D+ 14:52 0:00 mdadm -S md0
root 5939 0.0 0.0 0 0 ? D 14:52 0:00 [md0_recover]
linux44:~ # cat /proc/5939/stack
[<ffffffffa04cf321>] dlm_lock_sync+0x71/0x90 [md_cluster]
[<ffffffffa04d0705>] recover_bitmaps+0x125/0x220 [md_cluster]
[<ffffffffa052105d>] md_thread+0x16d/0x180 [md_mod]
[<ffffffff8107ad94>] kthread+0xb4/0xc0
[<ffffffff8152a518>] ret_from_fork+0x58/0x90
linux44:~ # cat /proc/5938/stack
[<ffffffff8107afde>] kthread_stop+0x6e/0x120
[<ffffffffa0519da0>] md_unregister_thread+0x40/0x80 [md_mod]
[<ffffffffa04cfd20>] leave+0x70/0x120 [md_cluster]
[<ffffffffa0525e24>] md_cluster_stop+0x14/0x30 [md_mod]
[<ffffffffa05269ab>] bitmap_free+0x14b/0x150 [md_mod]
[<ffffffffa0523f3b>] do_md_stop+0x35b/0x5a0 [md_mod]
[<ffffffffa0524e83>] md_ioctl+0x873/0x1590 [md_mod]
[<ffffffff81288464>] blkdev_ioctl+0x214/0x7d0
[<ffffffff811dd3dd>] block_ioctl+0x3d/0x40
[<ffffffff811b92d4>] do_vfs_ioctl+0x2d4/0x4b0
[<ffffffff811b9538>] SyS_ioctl+0x88/0xa0
[<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
The problem is that recover_bitmaps can't reliably abort when the thread is unregistered. dlm_lock_sync_interruptible is therefore introduced to detect the thread's state and fix the problem. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: convert the completion to wait queueGuoqing Jiang
Previously, we used a completion to synchronize between requesting a dlm lock and sync_ast. However, dlm_lock_sync_interruptible (introduced later) would have to expose completion.wait and completion.done, which is not a common use of a completion, so convert the related code to a wait queue. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: protect md_find_rdev_nr_rcu with rcu lockGuoqing Jiang
We need to use rcu_read_lock/unlock to avoid potential race. Reported-by: Shaohua Li <shli@fb.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: clean related infos of clusterGuoqing Jiang
cluster_info and bitmap_info.nodes also need to be cleared when array is stopped. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md: changes for MD_STILL_CLOSED flagGuoqing Jiang
When stopping a clustered raid while it is pending on resync, the MD_STILL_CLOSED flag could be cleared because a udev rule is triggered to open the mddev. As a result the array can't be stopped promptly and EBUSY is returned.
	mdadm -Ss		md-raid-arrays.rules
  set MD_STILL_CLOSED		md_open()
	... ... ...		clear MD_STILL_CLOSED
	do_md_stop
We make the changes below to resolve this issue:
1. rename MD_STILL_CLOSED to MD_CLOSING, since it is set when stopping the array and it means we are stopping the array.
2. let md_open return early if CLOSING is set, so no other thread will open the array while one thread is trying to close it.
3. no need to clear the CLOSING bit in md_open because 1 has ensured the bit is cleared, so we also don't need to test the CLOSING bit in do_md_stop.
Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
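The "open returns early while closing" rule can be sketched as a small userspace model. Everything here is an assumption made for illustration: `struct array_dev`, the function names, the pthread mutex, and the `-EBUSY` return values are hypothetical and are not the md implementation.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of an array device with an open count and a closing flag. */
struct array_dev {
    pthread_mutex_t open_mutex;
    int openers;
    bool closing;   /* plays the role of the MD_CLOSING bit */
};

/* Refuse new opens once a stop is in progress. */
static int array_open(struct array_dev *d)
{
    int r = 0;

    assert(d != NULL);
    pthread_mutex_lock(&d->open_mutex);
    if (d->closing)
        r = -EBUSY;     /* assumed error code for this sketch */
    else
        d->openers++;
    pthread_mutex_unlock(&d->open_mutex);
    return r;
}

static void array_release(struct array_dev *d)
{
    pthread_mutex_lock(&d->open_mutex);
    assert(d->openers > 0);
    d->openers--;
    pthread_mutex_unlock(&d->open_mutex);
}

/* Stop succeeds only when nobody holds the device open; it then sets closing. */
static int array_stop(struct array_dev *d)
{
    int r = 0;

    pthread_mutex_lock(&d->open_mutex);
    if (d->openers > 0)
        r = -EBUSY;
    else
        d->closing = true;
    pthread_mutex_unlock(&d->open_mutex);
    return r;
}
```

Since the open path never clears the flag, a late opener (such as one triggered by a udev rule) cannot undo the stop request; it simply fails to open.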
2016-09-21md-cluster: remove some unnecessary dlm_unlock_syncGuoqing Jiang
Since DLM_LKF_FORCEUNLOCK is used in lockres_free, we don't need to call dlm_unlock_sync before freeing the lock resource. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: use FORCEUNLOCK in lockres_freeGuoqing Jiang
For dlm_unlock, we need to pass the flag as the third parameter instead of setting res->flags. Also, DLM_LKF_FORCEUNLOCK is more suitable for dlm_unlock since it works even when the lock is on the waiting or convert queue. Acked-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: call md_kick_rdev_from_array once ack failedGuoqing Jiang
new_disk_ack can return failure if WAITING_FOR_NEWDISK is not set, so we need to kick the dev from the array if that happens. We also failed to check err before calling new_disk_ack; otherwise we could kick an rdev which isn't in the array. Thanks to Shaohua for the reminder. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21blk-mq: register device instead of diskMatias Bjørling
Enable devices without a gendisk instance to register itself with blk-mq and expose the associated multi-queue sysfs entries. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-15dm mpath: delay the requeue of blk-mq requests while all paths downMike Snitzer
Return DM_MAPIO_DELAY_REQUEUE from .clone_and_map_rq. Also, return false from .busy, if all paths are down, so that blk-mq requests get mapped via .clone_and_map_rq -- which results in DM_MAPIO_DELAY_REQUEUE being returned to dm-rq. This change allows for a noticeable reduction in cpu utilization (reduced kworker load) while all paths are down, e.g.: system CPU idleness (as measured by fio's --idle-prof=system): before: system: 86.58% after: system: 98.60% Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
2016-09-15dm mpath: use dm_mq_kick_requeue_list()Mike Snitzer
When reinstating a path the blk-mq request_queue's requeue_list should get kicked. It makes sense to kick the requeue_list as part of the existing hook (previously only used by bio-based support). Rename process_queued_bios_list to process_queued_io_list. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
2016-09-15dm rq: introduce dm_mq_kick_requeue_list()Mike Snitzer
Make it possible for a request-based target to kick the DM device's blk-mq request_queue's requeue_list. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
2016-09-15dm rq: reduce arguments passed to map_request() and dm_requeue_original_request()Mike Snitzer
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
2016-09-15blk-mq: remove ->map_queueChristoph Hellwig
All drivers use the default, so provide an inline version of it. If we ever need other queue mapping we can add an optional method back, although supporting will also require major changes to the queue setup code. This provides better code generation, and better debugability as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-15Merge branch 'irq/for-block' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-4.9/msi-irqJens Axboe
2016-09-14dm rq: add DM_MAPIO_DELAY_REQUEUE to delay requeue of blk-mq requestsMike Snitzer
Otherwise blk-mq will immediately dispatch requests that are requeued via a BLK_MQ_RQ_QUEUE_BUSY return from blk_mq_ops .queue_rq. Delayed requeue is implemented using blk_mq_delay_kick_requeue_list() with a delay of 5 secs. In the context of DM multipath (all paths down) it doesn't make any sense to requeue more quickly. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-14dm: convert wait loops to use autoremove_wake_function()Bart Van Assche
Use autoremove_wake_function() instead of default_wake_function() to make the dm wait loops more similar to other wait loops in the kernel. This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-14dm: use signal_pending_state() in dm_wait_for_completion()Bart Van Assche
Use signal_pending_state() instead of open-coding it. This patch does not change any functionality but makes it possible to pass TASK_KILLABLE as the second argument of dm_wait_for_completion(). See also commit 16882c1e962b ("sched: fix TASK_WAKEKILL vs SIGKILL race"). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-14dm: rename task state function argumentsBart Van Assche
Rename 'interruptible' into 'task_state' to make it clear that this argument is a task state instead of a boolean. Also, change type from int to long. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-14dm: add two lockdep_assert_held() statementsBart Van Assche
Document the locking assumptions for the __bind() and __dm_suspend() functions. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-14dm rq: simplify dm_old_stop_queue()Bart Van Assche
This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-14dm mpath: check if path's request_queue is dying in activate_path()Mike Snitzer
If pg_init_retries is set and a request is queued against a multipath device with all underlying block device request_queues in the "dying" state, then an infinite loop is triggered because activate_path() never succeeds and hence never calls pg_init_done(). This change prevents device removal from triggering an infinite loop by failing activate_path(), which causes the "dying" path to be failed. Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2016-09-14dm rq: take request_queue lock while clearing QUEUE_FLAG_STOPPEDMike Snitzer
Every call of queue_flag_clear_unlocked() after block device initialization has finished is wrong if blk_cleanup_queue() can be called concurrently. Convert queue_flag_clear_unlocked() into queue_flag_clear() and protect it by the block layer queue lock. Also, factor out dm_mq_start_queue(). Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2016-09-14dm rq: factor out dm_mq_stop_queue()Bart Van Assche
Also, check that the blk-mq request_queue isn't already stopped. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-14dm: mark request_queue dead before destroying the DM deviceBart Van Assche
This prevents new requests from being queued while __dm_destroy() is in progress. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2016-09-14dm: return correct error code in dm_resume()'s retry loopMinfei Huang
dm_resume() will return success (0) rather than -EINVAL if !dm_suspended_md() upon retry within dm_resume(). Reset the error code at the start of dm_resume()'s retry loop. Also, remove a useless assignment at the end of dm_resume(). Fixes: ffcc393641 ("dm: enhance internal suspend and resume interface") Cc: stable@vger.kernel.org # 3.19+ Signed-off-by: Minfei Huang <mnghuan@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
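Why the error code has to be reset at the top of the retry loop can be shown with a small model. This is a sketch under assumptions: `struct dm_dev_model`, `wait_for_internal_resume()` and `resume_model()` are hypothetical names, and the "another thread resumed the device while we waited" case is simulated with a plain flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; resumed_while_waiting simulates a concurrent resume. */
struct dm_dev_model {
    bool suspended;
    bool suspended_internally;
    bool resumed_while_waiting;
};

/* Pretend to wait for the internal suspend to end; returns 0 on success. */
static int wait_for_internal_resume(struct dm_dev_model *d)
{
    d->suspended_internally = false;
    if (d->resumed_while_waiting)
        d->suspended = false;
    return 0;
}

static int resume_model(struct dm_dev_model *d)
{
    int r;

    assert(d != NULL);
retry:
    /* Reset on every pass: a stale 0 from the wait must not leak out below. */
    r = -EINVAL;
    if (!d->suspended)
        goto out;

    if (d->suspended_internally) {
        r = wait_for_internal_resume(d);
        if (r)
            return r;
        goto retry;
    }

    d->suspended = false;
    r = 0;
out:
    return r;
}
```

If `r = -EINVAL` were set only once before the `retry` label, the second pass would reach `out` with the 0 left over from the wait and report success for a device that was not suspended.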
2016-09-14block, dm-crypt, btrfs: Introduce bio_flags()Bart Van Assche
Introduce the bio_flags() macro. Ensure that the second argument of bio_set_op_attrs() only contains flags and no operation. This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Mike Christie <mchristi@redhat.com> Cc: Chris Mason <clm@fb.com> (maintainer:BTRFS FILE SYSTEM) Cc: Josef Bacik <jbacik@fb.com> (maintainer:BTRFS FILE SYSTEM) Cc: Mike Snitzer <snitzer@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@hgst.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-13Merge tag 'md/4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds
Pull MD fixes from Shaohua Li:
"A few bug fixes for MD:
- Guoqing fixed a bug compiling md-cluster in kernel
- I fixed a potential deadlock in raid5-cache superblock write, a hang in raid5 reshape resume and a race condition introduced in rc4"
* tag 'md/4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  raid5: fix a small race condition
  md-cluster: make md-cluster also can work when compiled into kernel
  raid5: guarantee enough stripes to avoid reshape hang
  raid5-cache: fix a deadlock in superblock write
2016-09-09raid5: fix a small race conditionShaohua Li
Commit 5f9d1fde7d54a5 ("raid5: fix memory leak of bio integrity data") moved bio_reset to bio_endio, but that introduced a small race condition. It does bio_reset after raid5_release_stripe, which could make the stripe reusable and hence allow the bio to be reused just before bio_reset. Moving bio_reset before the call to raid5_release_stripe should fix the race. Reported-and-tested-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-08md-cluster: make md-cluster also can work when compiled into kernelGuoqing Jiang
md-cluster is compiled as a module by default. If it is built into the kernel instead, md-cluster does not work:
[64782.630008] md/raid1:md127: active with 2 out of 2 mirrors
[64782.630528] md-cluster module not found.
[64782.630530] md127: Could not setup cluster service (-2)
Fixes: edb39c9 ("Introduce md_cluster_operations to handle cluster functions") Cc: stable@vger.kernel.org (v4.1+) Reported-by: Marc Smith <marc.smith@mcc.edu> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-06md/raid5: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Neil Brown <neilb@suse.com> Cc: linux-raid@vger.kernel.org Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160818125731.27256-10-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-03Merge tag 'dm-4.8-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dmLinus Torvalds
Pull device mapper fixes from Mike Snitzer:
- a stable fix in both DM crypt and DM log-writes for too large bios (as generated by bcache)
- two other stable fixes for DM log-writes
- a stable fix for a DM crypt bug that could result in freeing pointers from uninitialized memory in the tfm allocation error path
- a DM bufio cleanup to discontinue using create_singlethread_workqueue()
* tag 'dm-4.8-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm bufio: remove use of deprecated create_singlethread_workqueue()
  dm crypt: fix free of bad values after tfm allocation failure
  dm crypt: fix error with too large bios
  dm log writes: fix check of kthread_run() return value
  dm log writes: fix bug with too large bios
  dm log writes: move IO accounting earlier to fix error path
2016-08-31raid5: guarantee enough stripes to avoid reshape hangShaohua Li
If there aren't enough stripes, reshape will hang. We check for this when starting a new reshape, but the check is missing for reshape resume, so a resumed reshape could hang. This patch ensures that enough stripes exist when a reshape resumes. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-08-31raid5-cache: fix a deadlock in superblock writeShaohua Li
There is a potential deadlock in superblock write. Discard could zero data, so before discard we must make sure the superblock is updated to the new log tail. Updating the superblock (either by calling md_update_sb() directly or by depending on the md thread) requires holding the reconfig mutex. On the other hand, raid5_quiesce is called with reconfig_mutex held. The first step of raid5_quiesce() is waiting for all IO to finish, hence waiting for the reclaim thread, while the reclaim thread is calling this function and waiting for the reconfig mutex. So there is a deadlock. We work around this issue with a trylock. The downside of the solution is that we could miss a discard if we can't take the reconfig mutex. But this should happen rarely (mainly when stopping a raid array), so a missed discard shouldn't be a big problem. Cc: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
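The trylock workaround can be illustrated with a pthread sketch. This is not the raid5-cache code: `struct log_model`, its fields, and `update_super_and_discard()` are hypothetical, and the superblock update and the discard are reduced to counters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the log state touched by the reclaim path. */
struct log_model {
    pthread_mutex_t reconfig_mutex;
    unsigned long sb_log_tail;      /* log tail recorded in the superblock */
    unsigned discards_issued;
    unsigned discards_skipped;
};

/*
 * Record the new log tail, then discard. If the mutex is busy (for example
 * a quiesce is holding it while waiting for us), skip the discard instead
 * of blocking, which would complete the deadlock cycle.
 */
static bool update_super_and_discard(struct log_model *log, unsigned long new_tail)
{
    assert(log != NULL);

    if (pthread_mutex_trylock(&log->reconfig_mutex) != 0) {
        log->discards_skipped++;
        return false;
    }
    log->sb_log_tail = new_tail;    /* superblock updated before any discard */
    pthread_mutex_unlock(&log->reconfig_mutex);

    log->discards_issued++;         /* the discard itself would happen here */
    return true;
}
```

The trade-off matches the one named in the commit message: a skipped discard costs a little efficiency, while a blocking lock here could hang both threads.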
2016-08-30dm bufio: remove use of deprecated create_singlethread_workqueue()Bhaktipriya Shridhar
The workqueue "dm_bufio_wq" queues a single work item &dm_bufio_work so it doesn't require execution ordering. Hence, alloc_workqueue() has been used to replace the deprecated create_singlethread_workqueue(). The WQ_MEM_RECLAIM flag has been set since DM requires forward progress under memory pressure. Since there are fixed number of work items, explicit concurrency limit is unnecessary here. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-08-30dm crypt: fix free of bad values after tfm allocation failureEric Biggers
If crypt_alloc_tfms() had to allocate multiple tfms and it failed before the last allocation, then it would call crypt_free_tfms() and could free pointers from uninitialized memory -- due to the crypt_free_tfms() check for non-zero cc->tfms[i]. Fix by allocating zeroed memory. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
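The failure mode and the fix can be reproduced in a few lines of userspace C. This is a sketch, not the dm-crypt code: `struct crypt_model`, `alloc_tfms_model()`, `free_tfms_model()` and the `fail_at` parameter (used to force a mid-loop failure) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-device table of transform handles. */
struct crypt_model {
    void **tfms;
    unsigned nr_tfms;
};

/* Frees every non-NULL slot; only safe if unused slots are really NULL. */
static void free_tfms_model(struct crypt_model *cc)
{
    if (!cc->tfms)
        return;
    for (unsigned i = 0; i < cc->nr_tfms; i++) {
        if (cc->tfms[i]) {
            free(cc->tfms[i]);
            cc->tfms[i] = NULL;
        }
    }
    free(cc->tfms);
    cc->tfms = NULL;
}

/* fail_at simulates an allocation failure at that index (test hook). */
static int alloc_tfms_model(struct crypt_model *cc, unsigned n, unsigned fail_at)
{
    assert(cc != NULL && n > 0);

    /* calloc, not malloc: slots not yet filled are zeroed, so the error
     * path never hands uninitialized pointers to free(). This relies on
     * all-bits-zero being a null pointer, as on mainstream platforms. */
    cc->tfms = calloc(n, sizeof(*cc->tfms));
    if (!cc->tfms)
        return -ENOMEM;
    cc->nr_tfms = n;

    for (unsigned i = 0; i < n; i++) {
        void *tfm = (i == fail_at) ? NULL : malloc(16);

        if (!tfm) {
            free_tfms_model(cc);
            return -ENOMEM;
        }
        cc->tfms[i] = tfm;
    }
    return 0;
}
```

With `malloc()` in place of `calloc()`, the slots after the failing index would hold indeterminate values and the non-NULL check in the free loop would not protect against freeing them.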
2016-08-30dm crypt: fix error with too large biosMikulas Patocka
When dm-crypt processes writes, it allocates a new bio in crypt_alloc_buffer(). The bio is allocated from a bio set and it can have at most BIO_MAX_PAGES vector entries, however the incoming bio can be larger (e.g. if it was allocated by bcache). If the incoming bio is larger, bio_alloc_bioset() fails and an error is returned. To avoid the error, we test for a too large bio in the function crypt_map() and use dm_accept_partial_bio() to split the bio. dm_accept_partial_bio() trims the current bio to the desired size and asks DM core to send another bio with the rest of the data. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.16+
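The size arithmetic behind "accept only part of the bio" can be sketched as follows. The helper name `sectors_to_accept()` and its parameters are hypothetical; the page count and page shift are passed in rather than hard-coded because their values depend on the kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9u     /* 512-byte sectors */

/*
 * Hypothetical helper: how many sectors of an incoming bio can be handled
 * in one go when the clone may hold at most max_pages pages. The caller
 * would accept this many sectors and let the core resubmit the remainder.
 */
static uint32_t sectors_to_accept(uint32_t bio_bytes, uint32_t max_pages,
                                  uint32_t page_shift)
{
    uint32_t limit_bytes;

    assert(page_shift >= SECTOR_SHIFT && page_shift < 32);
    assert(max_pages <= (UINT32_MAX >> page_shift));    /* no overflow */

    limit_bytes = max_pages << page_shift;
    if (bio_bytes > limit_bytes)
        bio_bytes = limit_bytes;
    return bio_bytes >> SECTOR_SHIFT;
}
```

For example, assuming 256 pages of 4 KiB each, a 2 MiB write would be trimmed to 2048 sectors (1 MiB), while a 64 KiB write passes through unchanged as 128 sectors.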
2016-08-30dm log writes: fix check of kthread_run() return valueVladimir Zapolskiy
The kthread_run() function returns either a valid task_struct or an ERR_PTR() value, so checking for NULL is invalid. This change fixes a potential oops, e.g. in an OOM situation. Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
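Why a NULL check misses this kind of failure is easiest to see with a userspace imitation of the error-pointer convention. The three helpers below mimic the kernel's `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` but are re-implemented here for illustration; `start_worker()` and `setup_worker()` are hypothetical stand-ins for the thread creation and its caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(intptr_t error) { return (void *)error; }
static inline intptr_t PTR_ERR(const void *ptr) { return (intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    /* Error codes live in the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_task;

/* Hypothetical stand-in for thread creation: never returns NULL. */
static void *start_worker(bool fail)
{
    return fail ? ERR_PTR(-ENOMEM) : (void *)&dummy_task;
}

static int setup_worker(bool fail)
{
    void *task = start_worker(fail);

    /* A "!task" test would treat the encoded error as a valid pointer. */
    if (IS_ERR(task))
        return (int)PTR_ERR(task);
    return 0;
}
```

An encoded error such as `ERR_PTR(-ENOMEM)` is non-NULL, so code guarded only by a NULL check would go on to dereference it.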