summaryrefslogtreecommitdiff
path: root/drivers/md
AgeCommit message (Collapse)Author
2022-01-06md: drop queue limitation for RAID1 and RAID10Mariusz Tkaczyk
As suggested by Neil Brown[1], this limitation seems to be deprecated. With plugging in use, writes are processed behind the raid thread and conf->pending_count is not increased. This limitation occurs only if caller doesn't use plugs. It can be avoided and often it is (with plugging). There are no reports that queue is growing to enormous size so remove queue limitation for non-plugged IOs too. [1] https://lore.kernel.org/linux-raid/162496301481.7211.18031090130574610495@noble.neil.brown.name Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> Signed-off-by: Song Liu <song@kernel.org>
2022-01-06md/raid5: play nice with PREEMPT_RTDavidlohr Bueso
raid_run_ops() relies on the implicitly disabled preemption for its percpu ops, although this is really about CPU locality. This breaks RT semantics as it can take regular (and thus sleeping) spinlocks, such as stripe_lock. Add a local_lock such that non-RT does not change and continues to be just map to preempt_disable/enable, but makes RT happy as the region will use a per-CPU spinlock and thus be preemptible and still guarantee CPU locality. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Song Liu <songliubraving@fb.com>
2022-01-06dm sysfs: use default_groups in kobj_typeGreg Kroah-Hartman
There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the dm sysfs code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-01-06dm integrity: Use struct_group() to zero struct journal_sectorKees Cook
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add struct_group() to mark region of struct journal_sector that should be initialized to zero. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-01-04dm space map common: add bounds check to sm_ll_lookup_bitmap()Joe Thornber
Corrupted metadata could warrant returning error from sm_ll_lookup_bitmap(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-01-04dm btree: add a defensive bounds check to insert_at()Joe Thornber
Corrupt metadata could trigger an out of bounds write. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-01-04dm btree remove: change a bunch of BUG_ON() calls to proper errorsJoe Thornber
Abuse of BUG_ON() is never appropriate, best to propagate errors to fail gracefully (rather than take the entire system down). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-01-04dm btree spine: eliminate duplicate le32_to_cpu() in node_check()Joe Thornber
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-01-04dm btree spine: remove extra node_check function declarationJoe Thornber
Should have been removed as part of commit f73e2e70ec48 ("dm btree spine: remove paranoid node_check call in node_prep_for_write()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-01-03md/raid1: fix missing bitmap update w/o WriteMostly devicesSong Liu
commit [1] causes missing bitmap updates when there isn't any WriteMostly devices. Detailed steps to reproduce by Norbert (which somehow didn't make to lore): # setup md10 (raid1) with two drives (1 GByte sparse files) dd if=/dev/zero of=disk1 bs=1024k seek=1024 count=0 dd if=/dev/zero of=disk2 bs=1024k seek=1024 count=0 losetup /dev/loop11 disk1 losetup /dev/loop12 disk2 mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/loop11 /dev/loop12 # add bitmap (aka write-intent log) mdadm /dev/md10 --grow --bitmap=internal echo check > /sys/block/md10/md/sync_action root:# cat /sys/block/md10/md/mismatch_cnt 0 root:# # remove member drive disk2 (loop12) mdadm /dev/md10 -f loop12 ; mdadm /dev/md10 -r loop12 # modify degraded md device dd if=/dev/urandom of=/dev/md10 bs=512 count=1 # no blocks recorded as out of sync on the remaining member disk1/loop11 root:# mdadm -X /dev/loop11 | grep Bitmap Bitmap : 16 bits (chunks), 0 dirty (0.0%) root:# # re-add disk2, nothing synced because of empty bitmap mdadm /dev/md10 --re-add /dev/loop12 # check integrity again echo check > /sys/block/md10/md/sync_action # disk1 and disk2 are no longer in sync, reads return differend data root:# cat /sys/block/md10/md/mismatch_cnt 128 root:# # clean up mdadm -S /dev/md10 losetup -d /dev/loop11 losetup -d /dev/loop12 rm disk1 disk2 Fix this by moving the WriteMostly check to the if condition for alloc_behind_master_bio(). [1] commit fd3b6975e9c1 ("md/raid1: only allocate write behind bio for WriteMostly device") Fixes: fd3b6975e9c1 ("md/raid1: only allocate write behind bio for WriteMostly device") Cc: stable@vger.kernel.org # v5.12+ Cc: Guoqing Jiang <guoqing.jiang@linux.dev> Cc: Jens Axboe <axboe@kernel.dk> Reported-by: Norbert Warmuth <nwarmuth@t-online.de> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Song Liu <song@kernel.org>
2021-12-18dax: remove the copy_from_iter and copy_to_iter methodsChristoph Hellwig
These methods indirect the actual DAX read/write path. In the end pmem uses magic flush and mc safe variants and fuse and dcssblk use plain ones while device mapper picks redirects to the underlying device. Add set_dax_nocache() and set_dax_nomc() APIs to control which copy routines are used to remove indirect call from the read/write fast path as well as a lot of boilerplate code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs] Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-18dax: remove the DAXDEV_F_SYNC flagChristoph Hellwig
Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and just let the drivers call set_dax_synchronous directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211215084508.435401-4-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-17Merge tag 'block-5.16-2021-12-17' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Fix for hammering on the delayed run queue timer (me) - bcache regression fix for this merge window (Lin) - Fix a divide-by-zero in the blk-iocost code (Tejun) * tag 'block-5.16-2021-12-17' of git://git.kernel.dk/linux-block: bcache: fix NULL pointer reference in cached_dev_detach_finish block: reduce kblockd_mod_delayed_work_on() CPU consumption iocost: Fix divide-by-zero on donation from low hweight cgroup
2021-12-16Merge tag 'for-5.16/dm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix use after free in DM btree remove's rebalance_children() - Fix DM integrity data corruption, introduced during 5.16 merge, due to improper use of bvec_kmap_local() * tag 'for-5.16/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm integrity: fix data corruption due to improper use of bvec_kmap_local dm btree remove: fix use after free in rebalance_children()
2021-12-15dm integrity: fix data corruption due to improper use of bvec_kmap_localMike Snitzer
Commit 25058d1c725c ("dm integrity: use bvec_kmap_local in __journal_read_write") didn't account for __journal_read_write() later adding the biovec's bv_offset. As such using bvec_kmap_local() caused the start of the biovec to be skipped. Trivial test that illustrates data corruption: # integritysetup format /dev/pmem0 # integritysetup open /dev/pmem0 integrityroot # mkfs.xfs /dev/mapper/integrityroot ... bad magic number bad magic number Metadata corruption detected at xfs_sb block 0x0/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000 releasing dirty buffer (bulk) to free list! Fix this by using kmap_local_page() instead of bvec_kmap_local() in __journal_read_write(). Fixes: 25058d1c725c ("dm integrity: use bvec_kmap_local in __journal_read_write") Reported-by: Tony Asleson <tasleson@redhat.com> Reviewed-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-12-14bcache: fix NULL pointer reference in cached_dev_detach_finishLin Feng
Commit 0259d4498ba4 ("bcache: move calc_cached_dev_sectors to proper place on backing device detach") tries to fix calc_cached_dev_sectors when bcache device detaches, but now we have: cached_dev_detach_finish ... bcache_device_detach(&dc->disk); ... closure_put(&d->c->caching); d->c = NULL; [*explicitly set dc->disk.c to NULL*] list_move(&dc->list, &uncached_devices); calc_cached_dev_sectors(dc->disk.c); [*passing a NULL pointer*] ... Upper codeflows shows how bug happens, this patch fix the problem by caching dc->disk.c beforehand, and cache_set won't be freed under us because c->caching closure at least holds a reference count and closure callback __cache_set_unregister only being called by bch_cache_set_stop which using closure_queue(&c->caching), that means c->caching closure callback for destroying cache_set won't be trigger by previous closure_put(&d->c->caching). So at this stage(while cached_dev_detach_finish is calling) it's safe to access cache_set dc->disk.c. Fixes: 0259d4498ba4 ("bcache: move calc_cached_dev_sectors to proper place on backing device detach") Signed-off-by: Lin Feng <linf@wangsu.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20211112053629.3437-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-10md: fix double free of mddev->private in autorun_array()zhangyue
In driver/md/md.c, if the function autorun_array() is called, the problem of double free may occur. In function autorun_array(), when the function do_md_run() returns an error, the function do_md_stop() will be called. The function do_md_run() called function md_run(), but in function md_run(), the pointer mddev->private may be freed. The function do_md_stop() called the function __md_stop(), but in function __md_stop(), the pointer mddev->private also will be freed without judging null. At this time, the pointer mddev->private will be double free, so it needs to be judged null or not. Signed-off-by: zhangyue <zhangyue1@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
2021-12-10md: fix update super 1.0 on rdev size changeMarkus Hochholdinger
The superblock of version 1.0 doesn't get moved to the new position on a device size change. This leads to a rdev without a superblock on a known position, the raid can't be re-assembled. The line was removed by mistake and is re-added by this patch. Fixes: d9c0fa509eaf ("md: fix max sectors calculation for super 1.0") Cc: stable@vger.kernel.org Signed-off-by: Markus Hochholdinger <markus@hochholdinger.net> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2021-12-04dax: return the partition offset from fs_dax_get_by_bdevChristoph Hellwig
Prepare for the removal of the block_device from the DAX I/O path by returning the partition offset from fs_dax_get_by_bdev so that the file systems have it at hand for use during I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dm-stripe: add a stripe_dax_pgoff helperChristoph Hellwig
Add a helper to perform the entire remapping for DAX accesses. This helper open codes bdev_dax_pgoff given that the alignment checks have already been done by the submitting file system and don't need to be repeated. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-12-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dm-log-writes: add a log_writes_dax_pgoff helperChristoph Hellwig
Add a helper to perform the entire remapping for DAX accesses. This helper open codes bdev_dax_pgoff given that the alignment checks have already been done by the submitting file system and don't need to be repeated. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-11-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dm-linear: add a linear_dax_pgoff helperChristoph Hellwig
Add a helper to perform the entire remapping for DAX accesses. This helper open codes bdev_dax_pgoff given that the alignment checks have already been done by the submitting file system and don't need to be repeated. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-10-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: remove dax_capableChristoph Hellwig
Just open code the block size and dax_dev == NULL checks in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-9-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: simplify the dax_device <-> gendisk associationChristoph Hellwig
Replace the dax_host_hash with an xarray indexed by the pointer value of the gendisk, and require explicitly calls from the block drivers that want to associate their gendisk with a dax_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dm: make the DAX support depend on CONFIG_FS_DAXChristoph Hellwig
The device mapper DAX support is all hanging off a block device and thus can't be used with device dax. Make it depend on CONFIG_FS_DAX instead of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to be built under CONFIG_FS_DAX now. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20211129102203.2243509-3-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dm: fix alloc_dax error handling in alloc_devChristoph Hellwig
Make sure ->dax_dev is NULL on error so that the cleanup path doesn't trip over an ERR_PTR. Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211129102203.2243509-2-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-11-29block: remove the ->rq_disk field in struct requestChristoph Hellwig
Just use the disk attached to the request_queue instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29block: remove GENHD_FL_EXT_DEVTChristoph Hellwig
All modern drivers can support extra partitions using the extended dev_t. In fact except for the ioctl method drivers never even see partitions in normal operation. So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all block devices that do support partitions, and require those that do not support partitions to explicit disallow them using GENHD_FL_NO_PART. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-24dm btree remove: fix use after free in rebalance_children()Joe Thornber
Move dm_tm_unlock() after dm_tm_dec(). Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-09Merge tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block driver updates from Jens Axboe: - Last series adding error handling support for add_disk() in drivers. After this one, and once the SCSI side has been merged, we can finally annotate add_disk() as must_check. (Luis) - bcache fixes (Coly) - zram fixes (Ming) - ataflop locking fix (Tetsuo) - nbd fixes (Ye, Yu) - MD merge via Song - Cleanup (Yang) - sysfs fix (Guoqing) - Misc fixes (Geert, Wu, luo) * tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-block: (34 commits) bcache: Revert "bcache: use bvec_virt" ataflop: Add missing semicolon to return statement floppy: address add_disk() error handling on probe ataflop: address add_disk() error handling on probe block: update __register_blkdev() probe documentation ataflop: remove ataflop_probe_lock mutex mtd/ubi/block: add error handling support for add_disk() block/sunvdc: add error handling support for add_disk() z2ram: add error handling support for add_disk() nvdimm/pmem: use add_disk() error handling nvdimm/pmem: cleanup the disk if pmem_release_disk() is yet assigned nvdimm/blk: add error handling support for add_disk() nvdimm/blk: avoid calling del_gendisk() on early failures nvdimm/btt: add error handling support for add_disk() nvdimm/btt: use goto error labels on btt_blk_init() loop: Remove duplicate assignments drbd: Fix double free problem in drbd_create_device nvdimm/btt: do not call del_gendisk() if not needed bcache: fix use-after-free problem in bcache_device_free() zram: replace fsync_bdev with sync_blockdev ...
2021-11-09Merge tag 'for-5.16/block-2021-11-09' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Set of fixes for the batched tag allocation (Ming, me) - add_disk() error handling fix (Luis) - Nested queue quiesce fixes (Ming) - Shared tags init error handling fix (Ye) - Misc cleanups (Jean, Ming, me) * tag 'for-5.16/block-2021-11-09' of git://git.kernel.dk/linux-block: nvme: wait until quiesce is done scsi: make sure that request queue queiesce and unquiesce balanced scsi: avoid to quiesce sdev->request_queue two times blk-mq: add one API for waiting until quiesce is done blk-mq: don't free tags if the tag_set is used by other device in queue initialztion block: fix device_add_disk() kobject_create_and_add() error handling block: ensure cached plug request matches the current queue block: move queue enter logic into blk_mq_submit_bio() block: make bio_queue_enter() fast-path available inline block: split request allocation components into helpers block: have plug stored requests hold references to the queue blk-mq: update hctx->nr_active in blk_mq_end_request_batch() blk-mq: add RQF_ELV debug entry blk-mq: only try to run plug merge if request has same queue with incoming bio block: move RQF_ELV setting into allocators dm: don't stop request queue after the dm device is suspended block: replace always false argument with 'false' block: assign correct tag before doing prefetch of request blk-mq: fix redundant check of !e expression
2021-11-09Merge tag 'for-5.16/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Add DM core support for emitting audit events through the audit subsystem. Also enhance both the integrity and crypt targets to emit events to via dm-audit. - Various other simple code improvements and cleanups. * tag 'for-5.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm table: log table creation error code dm: make workqueue names device-specific dm writecache: Make use of the helper macro kthread_run() dm crypt: Make use of the helper macro kthread_run() dm verity: use bvec_kmap_local in verity_for_bv_block dm log writes: use memcpy_from_bvec in log_writes_map dm integrity: use bvec_kmap_local in __journal_read_write dm integrity: use bvec_kmap_local in integrity_metadata dm: add add_disk() error handling dm: Remove redundant flush_workqueue() calls dm crypt: log aead integrity violations to audit subsystem dm integrity: log audit events for dm-integrity target dm: introduce audit event module for device mapper
2021-11-08bcache: Revert "bcache: use bvec_virt"Coly Li
This reverts commit 2fd3e5efe791946be0957c8e1eed9560b541fe46. The above commit replaces page_address(bv->bv_page) by bvec_virt(bv) to avoid directly access to bv->bv_page, but in situation bv->bv_offset is not zero and page_address(bv->bv_page) is not equal to bvec_virt(bv). In such case a memory corruption may happen because memory in next page is tainted by following line in do_btree_node_write(), memcpy(bvec_virt(bv), addr, PAGE_SIZE); This patch reverts the mentioned commit to avoid the memory corruption. Fixes: 2fd3e5efe791 ("bcache: use bvec_virt") Signed-off-by: Coly Li <colyli@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org # 5.15 Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211103151041.70516-1-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-03bcache: fix use-after-free problem in bcache_device_free()Coly Li
In bcache_device_free(), pointer disk is referenced still in ida_simple_remove() after blk_cleanup_disk() gets called on this pointer. This may cause a potential panic by use-after-free on the disk pointer. This patch fixes the problem by calling blk_cleanup_disk() after ida_simple_remove(). Fixes: bc70852fd104 ("bcache: convert to blk_alloc_disk/blk_cleanup_disk") Signed-off-by: Coly Li <colyli@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: stable@vger.kernel.org # v5.14+ Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211103064917.67383-1-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-02raid5-ppl: use swap() to make code cleanerYang Guang
Use the macro `swap()` defined in `include/linux/minmax.h` to avoid opencoding it. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Yang Guang <yang.guang5@zte.com.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
2021-11-02md/bitmap: don't set max_write_behind if there is no write mostly deviceGuoqing Jiang
We shouldn't set it since write behind IO should only happen to write mostly device. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <songliubraving@fb.com>
2021-11-02dm: don't stop request queue after the dm device is suspendedMing Lei
For fixing queue quiesce race between driver and block layer(elevator switch, update nr_requests, ...), we need to support concurrent quiesce and unquiesce, which requires the two call to be balanced. __bind() is only called from dm_swap_table() in which dm device has been suspended already, so not necessary to stop queue again. With this way, request queue quiesce and unquiesce can be balanced. Reported-by: Yi Zhang <yi.zhang@redhat.com> Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue quiesce/unquiesce") Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Link: https://lore.kernel.org/r/20211021145918.2691762-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-01dm table: log table creation error codeMichał Mirosław
Help debugging table creation errors by adding the error name in the log. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01dm: make workqueue names device-specificMichał Mirosław
Add device number to kdmflush workqueue name to help debugging CPU usage. Resulting `ps axfu` snippet: root 3791 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:7] root 3792 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kcryptd_io/253:7] root 3793 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kcryptd/253:7] root 3794 0.0 0.0 0 0 ? S paź19 0:00 \_ [dmcrypt_write/253:7] root 3814 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:8] root 3815 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:9] root 3816 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:10] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01dm writecache: Make use of the helper macro kthread_run()Cai Huoqing
Replace kthread_create/wake_up_process() with kthread_run() to simplify the code. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01dm crypt: Make use of the helper macro kthread_run()Cai Huoqing
Replace kthread_create/wake_up_process() with kthread_run() to simplify the code. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01dm verity: use bvec_kmap_local in verity_for_bv_blockChristoph Hellwig
Using local kmaps slightly reduces the chances to stray writes, and the bvec interface cleans up the code a little bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01dm log writes: use memcpy_from_bvec in log_writes_mapChristoph Hellwig
Use memcpy_from_bvec instead of open coding the logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01dm integrity: use bvec_kmap_local in __journal_read_writeChristoph Hellwig
Using local kmaps slightly reduces the chances to stray writes, and the bvec interface cleans up the code a little bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01dm integrity: use bvec_kmap_local in integrity_metadataChristoph Hellwig
Using local kmaps slightly reduces the chances to stray writes, and the bvec interface cleans up the code a little bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01dm: add add_disk() error handlingLuis Chamberlain
We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. There are two calls to dm_setup_md_queue() which can fail then, one on dm_early_create() and we can easily see that the error path there calls dm_destroy in the error path. The other use case is on the ioctl table_load case. If that fails userspace needs to call the DM_DEV_REMOVE_CMD to cleanup the state - similar to any other failure. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01dm: Remove redundant flush_workqueue() callsChristophe JAILLET
destroy_workqueue() already drains the queue before destroying it, so there is no need to flush it explicitly. Remove the redundant flush_workqueue() calls. This was generated with coccinelle: @@ expression E; @@ - flush_workqueue(E); destroy_workqueue(E); Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01Merge tag 'for-5.16/passthrough-flag-2021-10-29' of ↵Linus Torvalds
git://git.kernel.dk/linux-block Pull QUEUE_FLAG_SCSI_PASSTHROUGH removal from Jens Axboe: "This contains a series leading to the removal of the QUEUE_FLAG_SCSI_PASSTHROUGH queue flag" * tag 'for-5.16/passthrough-flag-2021-10-29' of git://git.kernel.dk/linux-block: block: remove blk_{get,put}_request block: remove QUEUE_FLAG_SCSI_PASSTHROUGH block: remove the initialize_rq_fn blk_mq_ops method scsi: add a scsi_alloc_request helper bsg-lib: initialize the bsg_job in bsg_transport_sg_io_fn nfsd/blocklayout: use ->get_unique_id instead of sending SCSI commands sd: implement ->get_unique_id block: add a ->get_unique_id method
2021-11-01Merge tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull bdev size cleanups from Jens Axboe: "Clean up the bdev size handling with new bdev_nr_bytes() helper" * tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block: (34 commits) partitions/ibm: use bdev_nr_sectors instead of open coding it partitions/efi: use bdev_nr_bytes instead of open coding it block/ioctl: use bdev_nr_sectors and bdev_nr_bytes block: cache inode size in bdev udf: use sb_bdev_nr_blocks reiserfs: use sb_bdev_nr_blocks ntfs: use sb_bdev_nr_blocks jfs: use sb_bdev_nr_blocks ext4: use sb_bdev_nr_blocks block: add a sb_bdev_nr_blocks helper block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate squashfs: use bdev_nr_bytes instead of open coding it reiserfs: use bdev_nr_bytes instead of open coding it pstore/blk: use bdev_nr_bytes instead of open coding it ntfs3: use bdev_nr_bytes instead of open coding it nilfs2: use bdev_nr_bytes instead of open coding it nfs/blocklayout: use bdev_nr_bytes instead of open coding it jfs: use bdev_nr_bytes instead of open coding it hfsplus: use bdev_nr_sectors instead of open coding it hfs: use bdev_nr_sectors instead of open coding it ...
2021-11-01Merge tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: - paride driver cleanups (Christoph) - Remove cryptoloop support (Christoph) - null_blk poll support (me) - Now that add_disk() supports proper error handling, add it to various drivers (Luis) - Make ataflop actually work again (Michael) - s390 dasd fixes (Stefan, Heiko) - nbd fixes (Yu, Ye) - Remove redundant wq flush in mtip32xx (Christophe) - NVMe updates - fix a multipath partition scanning deadlock (Hannes Reinecke) - generate uevent once a multipath namespace is operational again (Hannes Reinecke) - support unique discovery controller NQNs (Hannes Reinecke) - fix use-after-free when a port is removed (Israel Rukshin) - clear shadow doorbell memory on resets (Keith Busch) - use struct_size (Len Baker) - add error handling support for add_disk (Luis Chamberlain) - limit the maximal queue size for RDMA controllers (Max Gurtovoy) - use a few more symbolic names (Max Gurtovoy) - fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy) - add support for ->map_queues on FC (Saurav Kashyap) - support the current discovery subsystem entry (Hannes Reinecke) - use flex_array_size and struct_size (Len Baker) - bcache fixes (Christoph, Coly, Chao, Lin, Qing) - MD updates (Christoph, Guoqing, Xiao) - Misc fixes (Dan, Ding, Jiapeng, Shin'ichiro, Ye) * tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block: (117 commits) null_blk: Fix handling of submit_queues and poll_queues attributes block: ataflop: Fix warning comparing pointer to 0 bcache: replace snprintf in show functions with sysfs_emit bcache: move uapi header bcache.h to bcache code directory nvmet: use flex_array_size and struct_size nvmet: register discovery subsystem as 'current' nvmet: switch check for subsystem type nvme: add new discovery log page entry definitions block: ataflop: more blk-mq refactoring fixes block: remove support for cryptoloop and the xor transfer mtd: add add_disk() error handling rnbd: add error handling support for add_disk() um/drivers/ubd_kern: add error handling support for add_disk() m68k/emu/nfblock: add error handling support for add_disk() xen-blkfront: add error handling support for add_disk() bcache: add error handling support for add_disk() dm: add add_disk() error handling block: aoe: fixup coccinelle warnings nvmet: use struct_size over open coded arithmetic nvme: drop scan_lock and always kick requeue list when removing namespaces ...