summaryrefslogtreecommitdiff
path: root/drivers/md
AgeCommit message (Collapse)Author
2022-07-14md/raid1: Use the new blk_opf_t typeBart Van Assche
Improve static type checking by using the new blk_opf_t type for variables that represent request flags. Acked-by: Song Liu <song@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-35-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14md/bcache: Combine two prio_io() argumentsBart Van Assche
Improve uniformity in the kernel of handling of request operation and flags by passing these as a single argument. Cc: Coly Li <colyli@suse.de> Cc: Mingzhe Zou <mingzhe.zou@easystack.cn> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-34-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14md/bcache: Combine two uuid_io() argumentsBart Van Assche
Improve uniformity in the kernel of handling of request operation and flags by passing these as a single argument. Cc: Coly Li <colyli@suse.de> Cc: Mingzhe Zou <mingzhe.zou@easystack.cn> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-33-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14md/core: Combine two sync_page_io() argumentsBart Van Assche
Improve uniformity in the kernel of handling of request operation and flags by passing these as a single argument. Cc: Song Liu <song@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-32-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14dm/dm-zoned: Use the enum req_op typeBart Van Assche
Improve static type checking by using the enum req_op type for arguments that represent a request operation. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-31-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14dm/zone: Use the enum req_op typeBart Van Assche
Use the enum req_op type for request operations instead of unsigned int. This patch fixes a sparse warning that has been introduced by making enum req_op __bitwise. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-30-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14dm-snap: Combine request operation type and flagsBart Van Assche
Pass the request operation and its flags as a single argument to improve kernel code uniformity. Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-29-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14dm mirror log: Use the new blk_opf_t typeBart Van Assche
Improve static type checking by using the new blk_opf_t type for a function argument that represents a request operation type. Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-28-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14dm/dm-integrity: Combine request operation and flagsBart Van Assche
Combine the request operation type and request flags into a single argument. Improve static type checking by using the enum req_op type for variables that represent a request operation and the new blk_opf_t type for variables that represent request flags. Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Eric Biggers <ebiggers@google.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-27-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14dm/dm-flakey: Use the new blk_opf_t typeBart Van Assche
Use the new blk_opf_t type for structure members that represent request flags. Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-26-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14dm/ebs: Change 'int rw' into 'enum req_op op'Bart Van Assche
Improve static type checking by using type 'enum req_op' instead of 'int'. Make the role of the 'rw' arguments more clear by renaming these into 'op' (operation). This patch does not change any functionality since REQ_OP_READ = READ = 0 and REQ_OP_WRITE = WRITE = 1. Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Heinz Mauelshagen <heinzm@redhat.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-25-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14dm/core: Combine request operation type and flagsBart Van Assche
Improve kernel code uniformity by combining the request operation type and flags into a single variable. Change 'int rw' into 'enum req_op op' because the name 'op' is what is used in the block layer to hold a request type. Use the blk_opf_t and enum req_op types where appropriate to improve static type checking. Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-24-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14dm/core: Rename kcopyd_job.rw into kcopyd.opBart Van Assche
The member name 'rw' suggests that this member either has the value 'READ' or 'WRITE' and no other values. Since that member also can have the value REQ_OP_WRITE_ZEROES, rename 'rw' into 'op'. This patch does not change any functionality since REQ_OP_READ = READ = 0 and REQ_OP_WRITE = WRITE = 1. Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-23-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14dm/core: Reduce the size of struct dm_io_requestBart Van Assche
Combine the bi_op and bi_op_flags into the bi_opf member. Use the new blk_opf_t type to improve static type checking. This patch does not change any functionality. Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-22-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14block: Change the type of req_op() and bio_op() into enum req_opBart Van Assche
Improve static type checking by changing the type of the value returned by req_op() and bio_op() from unsigned int into enum req_op. Insert 'default: break;' in switch statements on the enum req_op type to prevent that the compiler warns about these switch statements. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-5-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14treewide: Rename enum req_opf into enum req_opBart Van Assche
The type name enum req_opf is misleading since it suggests that values of this type include both an operation type and flags. Since values of this type represent an operation only, change the type name into enum req_op. Convert the enum req_op documentation into kernel-doc format. Move a few definitions such that the enum req_op documentation occurs just above the enum req_op definition. The name "req_opf" was introduced by commit ef295ecf090d ("block: better op and flags encoding"). Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14block: remove bdevnameChristoph Hellwig
Replace the remaining calls of bdevname with snprintf using the %pg format specifier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220713055317.1888500-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-08dm: verity-loadpin: Use CONFIG_SECURITY_LOADPIN_VERITY for conditional ↵Matthias Kaehlcke
compilation The verity glue for LoadPin is only needed when CONFIG_SECURITY_LOADPIN_VERITY is set, use this option for conditional compilation instead of the combo of CONFIG_DM_VERITY and CONFIG_SECURITY_LOADPIN. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Acked-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/lkml/20220627083512.v7.3.I5aca2dcc3b06de4bf53696cd21329dce8272b8aa@changeid Signed-off-by: Kees Cook <keescook@chromium.org>
2022-07-08dm: Add verity helpers for LoadPinMatthias Kaehlcke
LoadPin limits loading of kernel modules, firmware and certain other files to a 'pinned' file system (typically a read-only rootfs). To provide more flexibility LoadPin is being extended to also allow loading these files from trusted dm-verity devices. For that purpose LoadPin can be provided with a list of verity root digests that it should consider as trusted. Add a bunch of helpers to allow LoadPin to check whether a DM device is a trusted verity device. The new functions broadly fall in two categories: those that need access to verity internals (like the root digest), and the 'glue' between LoadPin and verity. The new file dm-verity-loadpin.c contains the glue functions. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Acked-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/lkml/20220627083512.v7.1.I3e928575a23481121e73286874c4c2bdb403355d@changeid Signed-off-by: Kees Cook <keescook@chromium.org>
2022-07-07dm snapshot: fix typo in snapshot_map() commentZhang Jiaming
Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-07dm raid: remove redundant "the" in parse_raid_params() commentJiang Jian
Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-07dm cache: fix typo in 2 comment blocksSteven Lung
Replace neccessarily with necessarily. Signed-off-by: Steven Lung <1030steven@gmail.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-07dm verity: fix checkpatch close brace errorJeongHyeon Lee
Resolves: ERROR: else should follow close brace '}' Signed-off-by: JeongHyeon Lee <jhs2.lee@samsung.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-07dm table: rename dm_target variable in dm_table_add_target()Mike Snitzer
Rename from "tgt" to "ti" so that all of dm-table.c code uses the same naming for dm_target variables. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-07dm table: audit all dm_table_get_target() callersMike Snitzer
All callers of dm_table_get_target() are expected to do proper bounds checking on the index they pass. Move dm_table_get_target() to dm-core.h to make it extra clear that only DM core code should be using it. Switch it to be inlined while at it. Standardize all DM core callers to use the same for loop pattern and make associated variables as local as possible. Rename some variables (e.g. s/table/t/ and s/tgt/ti/) along the way. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-07dm table: remove dm_table_get_num_targets() wrapperMike Snitzer
More efficient and readable to just access table->num_targets directly. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-07dm: add two stage requeue mechanismMing Lei
Commit 61b6e2e5321d ("dm: fix BLK_STS_DM_REQUEUE handling when dm_io represents split bio") reverted DM core's bio splitting back to using bio_split()+bio_chain() because it was found that otherwise DM's BLK_STS_DM_REQUEUE would trigger a live-lock waiting for bio completion that would never occur. Restore using bio_trim()+bio_inc_remaining(), like was done in commit 7dd76d1feec7 ("dm: improve bio splitting and associated IO accounting"), but this time with proper handling for the above scenario that is covered in more detail in the commit header for 61b6e2e5321d. Solve this issue by adding a two staged dm_io requeue mechanism that uses the new dm_bio_rewind() via dm_io_rewind(): 1) requeue the dm_io into the requeue_list added to struct mapped_device, and schedule it via new added requeue work. This workqueue just clones the dm_io->orig_bio (which DM saves and ensures its end sector isn't modified). dm_io_rewind() uses the sectors and sectors_offset members of the dm_io that are recorded relative to the end of orig_bio: dm_bio_rewind()+bio_trim() are then used to make that cloned bio reflect the subset of the original bio that is represented by the dm_io that is being requeued. 2) the 2nd stage requeue is same with original requeue, but io->orig_bio points to new cloned bio (which matches the requeued dm_io as described above). This allows DM core to shift the need for bio cloning from bio-split time (during IO submission) to the less likely BLK_STS_DM_REQUEUE handling (after IO completes with that error). Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-07dm: add dm_bio_rewind() API to DM coreMing Lei
Commit 7759eb23fd98 ("block: remove bio_rewind_iter()") removed a similar API for the following reasons: ``` It is pointed that bio_rewind_iter() is one very bad API[1]: 1) bio size may not be restored after rewinding 2) it causes some bogus change, such as 5151842b9d8732 (block: reset bi_iter.bi_done after splitting bio) 3) rewinding really makes things complicated wrt. bio splitting 4) unnecessary updating of .bi_done in fast path [1] https://marc.info/?t=153549924200005&r=1&w=2 So this patch takes Kent's suggestion to restore one bio into its original state via saving bio iterator(struct bvec_iter) in bio_integrity_prep(), given now bio_rewind_iter() is only used by bio integrity code. ``` However, saving off a copy of the 32 bytes bio->bi_iter in case rewind needed isn't efficient because it bloats per-bio-data for what is an unlikely case. That suggestion also ignores the need to restore crypto and integrity info. Add dm_bio_rewind() API for a specific use-case that is much more narrow than the previous more generic rewind code that was reverted: 1) most bios have a fixed end sector since bio split is done from front of the bio, if driver just records how many sectors between current bio's start sector and the original bio's end sector, the original position can be restored. Keeping the original bio's end sector fixed is a _hard_ requirement for this interface! 2) if a bio's end sector won't change (usually bio_trim() isn't called, or in the case of DM it preserves original bio), user can restore the original position by storing sector offset from the current ->bi_iter.bi_sector to bio's end sector; together with saving bio size, only 8 bytes is needed to restore to original bio. 3) DM's requeue use case: when BLK_STS_DM_REQUEUE happens, DM core needs to restore to an "original bio" which represents the current dm_io to be requeued (which may be a subset of the original bio). By storing the sector offset from the original bio's end sector and dm_io's size, dm_bio_rewind() can restore such original bio. See commit 7dd76d1feec7 ("dm: improve bio splitting and associated IO accounting") for more details on how DM does this. Leveraging this, allows DM core to shift the need for bio cloning from bio-split time (during IO submission) to the less likely BLK_STS_DM_REQUEUE handling (after IO completes with that error). 4) Unlike the original rewind API, dm_bio_rewind() doesn't add .bi_done to bvec_iter and there is no effect on the fast path. Implement dm_bio_rewind() by factoring out clear helpers that it calls: dm_bio_integrity_rewind, dm_bio_crypt_rewind and dm_bio_rewind_iter. DM is able to ensure that dm_bio_rewind() is used safely but, given the constraint that the bio's end must never change, other hypothetical future callers may not take the same care. So make dm_bio_rewind() and all supporting code local to DM to avoid risk of hypothetical abuse. A "dm_" prefix was added to all functions to avoid any namespace collisions. Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-06block: move zone related fields to struct gendiskChristoph Hellwig
Move the zone related fields that are currently stored in struct request_queue to struct gendisk as these are part of the highlevel block layer API and are only used for non-passthrough I/O that requires the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: remove blk_queue_zone_sectorsChristoph Hellwig
Always use bdev_zone_sectors instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06dm-zoned: cleanup dmz_fixup_devicesChristoph Hellwig
Use the bdev based helpers where applicable and move the zoned_dev into the scope where it is actually used. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: replace blkdev_nr_zones with bdev_nr_zonesChristoph Hellwig
Pass a block_device instead of a request_queue as that is what most callers have at hand. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: use bdev_is_zoned instead of open coding itChristoph Hellwig
Use bdev_is_zoned in all places where a block_device is available instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-03mm: shrinkers: provide shrinkers with namesRoman Gushchin
Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40 [roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-29dm raid: fix KASAN warning in raid5_add_disksMikulas Patocka
There's a KASAN warning in raid5_add_disk when running the LVM testsuite. The warning happens in the test lvconvert-raid-reshape-linear_to_raid6-single-type.sh. We fix the warning by verifying that rdev->saved_raid_disk is within limits. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-29dm raid: fix KASAN warning in raid5_remove_diskMikulas Patocka
There's a KASAN warning in raid5_remove_disk when running the LVM testsuite. We fix this warning by verifying that the "number" variable is within limits. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-29dm: improve BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handlingMing Lei
If either BLK_STS_DM_REQUEUE or BLK_STS_AGAIN is returned for POLLED io, we requeue the original bio into deferred list and kick md->wq to re-submit it to block layer. Improve the handling in the following way: 1) Factor out dm_handle_requeue() for handling dm_io requeue. 2) Unify handling for BLK_STS_DM_REQUEUE and BLK_STS_AGAIN: clear REQ_POLLED for BLK_STS_DM_REQUEUE too, for the sake of simplicity, given BLK_STS_DM_REQUEUE is very unusual. 3) Queue md->wq explicitly in dm_handle_requeue(), so requeue handling becomes more robust. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-29dm: refactor dm_md_mempool allocationChristoph Hellwig
The current split between dm_table_alloc_md_mempools and dm_alloc_md_mempools is rather arbitrary, so merge the two into one easy to follow function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-29dm: unexport dm_get_reserved_rq_based_iosChristoph Hellwig
dm_get_reserved_rq_based_ios is only used in the core dm code, so remove the export. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-28block: remove blk_cleanup_diskChristoph Hellwig
blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27dm raid: fix accesses beyond end of raid member arrayHeinz Mauelshagen
On dm-raid table load (using raid_ctr), dm-raid allocates an array rs->devs[rs->raid_disks] for the raid device members. rs->raid_disks is defined by the number of raid metadata and image tupples passed into the target's constructor. In the case of RAID layout changes being requested, that number can be different from the current number of members for existing raid sets as defined in their superblocks. Example RAID layout changes include: - raid1 legs being added/removed - raid4/5/6/10 number of stripes changed (stripe reshaping) - takeover to higher raid level (e.g. raid5 -> raid6) When accessing array members, rs->raid_disks must be used in control loops instead of the potentially larger value in rs->md.raid_disks. Otherwise it will cause memory access beyond the end of the rs->devs array. Fix this by changing code that is prone to out-of-bounds access. Also fix validate_raid_redundancy() to validate all devices that are added. Also, use braces to help clean up raid_iterate_devices(). The out-of-bounds memory accesses was discovered using KASAN. This commit was verified to pass all LVM2 RAID tests (with KASAN enabled). Cc: stable@vger.kernel.org Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-27dm: open code blk_max_size_offset in max_io_lenChristoph Hellwig
max_io_len always passes an explicitly non-zero chunk_sectors into blk_max_size_offset. That means much of blk_max_size_offset is not needed and can be open coded to simplify the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20220614090934.570632-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-23dm mirror log: clear log bits up to BITS_PER_LONG boundaryMikulas Patocka
Commit 85e123c27d5c ("dm mirror log: round up region bitmap size to BITS_PER_LONG") introduced a regression on 64-bit architectures in the lvm testsuite tests: lvcreate-mirror, mirror-names and vgsplit-operation. If the device is shrunk, we need to clear log bits beyond the end of the device. The code clears bits up to a 32-bit boundary and then calculates lc->sync_count by summing set bits up to a 64-bit boundary (the commit changed that; previously, this boundary was 32-bit too). So, it was using some non-zeroed bits in the calculation and this caused misbehavior. Fix this regression by clearing bits up to BITS_PER_LONG boundary. Fixes: 85e123c27d5c ("dm mirror log: round up region bitmap size to BITS_PER_LONG") Cc: stable@vger.kernel.org Reported-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-23dm: fix BLK_STS_DM_REQUEUE handling when dm_io represents split bioMing Lei
Commit 7dd76d1feec7 ("dm: improve bio splitting and associated IO accounting") removed using cloned bio when dm io splitting is needed. Using bio_trim()+bio_inc_remaining() rather than bio_split()+bio_chain() causes multiple dm_io instances to share the same original bio, and it works fine if IOs are completed successfully. But a regression was caused for the case when BLK_STS_DM_REQUEUE is returned from any one of DM's cloned bios (whose dm_io share the same orig_bio). In this BLK_STS_DM_REQUEUE case only the mapped subset of the original bio for the current exact dm_io needs to be re-submitted. However, since the original bio is shared among all dm_io instances, the ->orig_bio actually only represents the last dm_io instance, so requeue can't work as expected. Also when more than one dm_io is requeued, the same original bio is requeued from all dm_io's completion handler, then race is caused. Fix this issue by still allocating one clone bio for completing io only, then io accounting can rely on ->orig_bio being unmodified. This is needed because the dm_io's sector_offset and sectors members are recorded relative to an unmodified ->orig_bio. In the future, we can go back to using bio_trim()+bio_inc_remaining() for dm's io splitting but then delay needing a bio clone only when handling BLK_STS_DM_REQUEUE, but that approach is a bit complicated (so it needs a development cycle): 1) bio clone needs to be done in task context 2) a block interface for unwinding bio is required Fixes: 7dd76d1feec7 ("dm: improve bio splitting and associated IO accounting") Reported-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-21dm: do not return early from dm_io_complete if BLK_STS_AGAIN without pollingMike Snitzer
Commit 5291984004edf ("dm: fix bio polling to handle possibile BLK_STS_AGAIN") inadvertently introduced an early return from dm_io_complete() without first queueing the bio to DM if BLK_STS_AGAIN occurs and bio-polling is _not_ being used. Fix this by only returning early from dm_io_complete() if the bio has first been properly queued to DM. Otherwise, the bio will never finish via bio_endio. Fixes: 5291984004edf ("dm: fix bio polling to handle possibile BLK_STS_AGAIN") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-21dm era: commit metadata in postsuspend after worker stopsNikos Tsironis
During postsuspend dm-era does the following: 1. Archives the current era 2. Commits the metadata, as part of the RPC call for archiving the current era 3. Stops the worker Until the worker stops, it might write to the metadata again. Moreover, these writes are not flushed to disk immediately, but are cached by the dm-bufio client, which writes them back asynchronously. As a result, the committed metadata of a suspended dm-era device might not be consistent with the in-core metadata. In some cases, this can result in the corruption of the on-disk metadata. Suppose the following sequence of events: 1. Load a new table, e.g. a snapshot-origin table, to a device with a dm-era table 2. Suspend the device 3. dm-era commits its metadata, but the worker does a few more metadata writes until it stops, as part of digesting an archived writeset 4. These writes are cached by the dm-bufio client 5. Load the dm-era table to another device. 6. The new instance of the dm-era target loads the committed, on-disk metadata, which don't include the extra writes done by the worker after the metadata commit. 7. Resume the new device 8. The new dm-era target instance starts using the metadata 9. Resume the original device 10. The destructor of the old dm-era target instance is called and destroys the dm-bufio client, which results in flushing the cached writes to disk 11. These writes might overwrite the writes done by the new dm-era instance, hence corrupting its metadata. Fix this by committing the metadata after the worker stops running. stop_worker uses flush_workqueue to flush the current work. However, the work item may re-queue itself and flush_workqueue doesn't wait for re-queued works to finish. This could result in the worker changing the metadata after they have been committed, or writing to the metadata concurrently with the commit in the postsuspend thread. Use drain_workqueue instead, which waits until the work and all re-queued works finish. Fixes: eec40579d8487 ("dm: add era target") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-17Merge tag 'block-5.19-2022-06-16' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request from Christoph - Quirks, quirks, quirks to work around buggy consumer grade devices (Keith Bush, Ning Wang, Stefan Reiter, Rasheed Hsueh) - Better kernel messages for devices that need quirking (Keith Bush) - Make a kernel message more useful (Thomas Weißschuh) - MD pull request from Song, with a few fixes - blk-mq sysfs locking fixes (Ming) - BFQ stats fix (Bart) - blk-mq offline queue fix (Bart) - blk-mq flush request tag fix (Ming) * tag 'block-5.19-2022-06-16' of git://git.kernel.dk/linux-block: block/bfq: Enable I/O statistics blk-mq: don't clear flush_rq from tags->rqs[] blk-mq: avoid to touch q->elevator without any protection blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none block: Fix handling of offline queues in blk_mq_alloc_request_hctx() md/raid5-ppl: Fix argument order in bio_alloc_bioset() Revert "md: don't unregister sync_thread with reconfig_mutex held" nvme-pci: disable write zeros support on UMIC and Samsung SSDs nvme-pci: avoid the deepest sleep state on ZHITAI TiPro7000 SSDs nvme-pci: sk hynix p31 has bogus namespace ids nvme-pci: smi has bogus namespace ids nvme-pci: phison e12 has bogus namespace ids nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S50 nvme-pci: add trouble shooting steps for timeouts nvme: add bug report info for global duplicate id nvme: add device name to warning in uuid_show()
2022-06-16dm mirror log: round up region bitmap size to BITS_PER_LONGMikulas Patocka
The code in dm-log rounds up bitset_size to 32 bits. It then uses find_next_zero_bit_le on the allocated region. find_next_zero_bit_le accesses the bitmap using unsigned long pointers. So, on 64-bit architectures, it may access 4 bytes beyond the allocated size. Fix this bug by rounding up bitset_size to BITS_PER_LONG. This bug was found by running the lvm2 testsuite with kasan. Fixes: 29121bd0b00e ("[PATCH] dm mirror log: bitset_size fix") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-16dm: fix narrow race for REQ_NOWAIT bios being issued despite no supportMikulas Patocka
Starting with the commit 63a225c9fd20, device mapper has an optimization that it will take cheaper table lock (dm_get_live_table_fast instead of dm_get_live_table) if the bio has REQ_NOWAIT. The bios with REQ_NOWAIT must not block in the target request routine, if they did, we would be blocking while holding rcu_read_lock, which is prohibited. The targets that are suitable for REQ_NOWAIT optimization (and that don't block in the map routine) have the flag DM_TARGET_NOWAIT set. Device mapper will test if all the targets and all the devices in a table support nowait (see the function dm_table_supports_nowait) and it will set or clear the QUEUE_FLAG_NOWAIT flag on its request queue according to this check. There's a test in submit_bio_noacct: "if ((bio->bi_opf & REQ_NOWAIT) && !blk_queue_nowait(q)) goto not_supported" - this will make sure that REQ_NOWAIT bios can't enter a request queue that doesn't support them. This mechanism works to prevent REQ_NOWAIT bios from reaching dm targets that don't support the REQ_NOWAIT flag (and that may block in the map routine) - except that there is a small race condition: submit_bio_noacct checks if the queue has the QUEUE_FLAG_NOWAIT without holding any locks. Immediatelly after this check, the device mapper table may be reloaded with a table that doesn't support REQ_NOWAIT (for example, if we start moving the logical volume or if we activate a snapshot). However the REQ_NOWAIT bio that already passed the check in submit_bio_noacct would be sent to device mapper, where it could be redirected to a dm target that doesn't support REQ_NOWAIT - the result is sleeping while we hold rcu_read_lock. In order to fix this race, we double-check if the target supports REQ_NOWAIT while we hold the table lock (so that the table can't change under us). Fixes: 563a225c9fd2 ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-16dm: fix use-after-free in dm_put_live_table_bioMikulas Patocka
dm_put_live_table_bio is called from the end of dm_submit_bio. However, at this point, the bio may be already finished and the caller may have freed the bio. Consequently, dm_put_live_table_bio accesses the stale "bio" pointer. Fix this bug by loading the bi_opf value and passing it to dm_get_live_table_bio and dm_put_live_table_bio instead of the bio. This bug was found by running the lvm2 testsuite with kasan. Fixes: 563a225c9fd2 ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>