path: root/drivers
Age  Commit message  Author
2025-09-20  ublk: pass q_id and tag to __ublk_check_and_get_req()  (Caleb Sander Mateos)
__ublk_check_and_get_req() only uses its ublk_queue argument to get the q_id and tag. Pass those arguments explicitly to save an access to the ublk_queue. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
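A minimal sketch of the new calling convention (field and helper layout assumed for illustration, not verbatim driver code):

    static struct request *__ublk_check_and_get_req(struct ublk_device *ub,
                                                    u16 q_id, u16 tag,
                                                    size_t offset)
    {
        /* the q_id/tag pair is enough to look the request up; no
         * ublk_queue dereference (and cache miss) is needed. The real
         * function also validates the request before returning it. */
        return blk_mq_tag_to_rq(ub->tag_set.tags[q_id], tag);
    }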
2025-09-20  ublk: don't access ublk_queue in ublk_daemon_register_io_buf()  (Caleb Sander Mateos)
For ublk servers with many ublk queues, accessing the ublk_queue in ublk_daemon_register_io_buf() is a frequent cache miss. Get the flags from the ublk_device instead, which is accessed earlier in ublk_ch_uring_cmd_local(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-20  ublk: don't access ublk_queue in ublk_register_io_buf()  (Caleb Sander Mateos)
For ublk servers with many ublk queues, accessing the ublk_queue in ublk_register_io_buf() is a frequent cache miss. Get the flags from the ublk_device instead, which is accessed earlier in ublk_ch_uring_cmd_local(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-20  ublk: pass ublk_device to ublk_register_io_buf()  (Caleb Sander Mateos)
Avoid repeating the two dereferences to get the ublk_device from the io_uring_cmd by passing it from ublk_ch_uring_cmd_local() to ublk_register_io_buf(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-20  ublk: don't dereference ublk_queue in ublk_check_and_get_req()  (Caleb Sander Mateos)
For ublk servers with many ublk queues, accessing the ublk_queue in ublk_ch_{read,write}_iter() is a frequent cache miss. Get the flags and queue depth from the ublk_device instead, which is accessed just before. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-20  ublk: don't dereference ublk_queue in ublk_ch_uring_cmd_local()  (Caleb Sander Mateos)
For ublk servers with many ublk queues, accessing the ublk_queue to handle a ublk command is a frequent cache miss. Get the queue depth from the ublk_device instead, which is accessed just before. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-20  ublk: add helpers to check ublk_device flags  (Caleb Sander Mateos)
Introduce ublk_device analogues of the ublk_queue flag helpers:
 - ublk_support_zero_copy()    -> ublk_dev_support_zero_copy()
 - ublk_support_auto_buf_reg() -> ublk_dev_support_auto_buf_reg()
 - ublk_support_user_copy()    -> ublk_dev_support_user_copy()
 - ublk_need_map_io()          -> ublk_dev_need_map_io()
 - ublk_need_req_ref()         -> ublk_dev_need_req_ref()
 - ublk_need_get_data()        -> ublk_dev_need_get_data()
These will be used in subsequent changes to avoid accessing the ublk_queue just for the flags, using the ublk_device instead. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
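A plausible shape for the device-level helpers, assuming the feature flags live in ub->dev_info.flags as they do for the queue-level variants:

    static inline bool ublk_dev_support_user_copy(const struct ublk_device *ub)
    {
        return ub->dev_info.flags & UBLK_F_USER_COPY;
    }

    static inline bool ublk_dev_need_get_data(const struct ublk_device *ub)
    {
        return ub->dev_info.flags & UBLK_F_NEED_GET_DATA;
    }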
2025-09-20  ublk: don't pass ublk_queue to __ublk_fail_req()  (Caleb Sander Mateos)
__ublk_fail_req() only uses the ublk_queue to get the ublk_device, which its caller already has. So just pass the ublk_device directly. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-20  ublk: don't pass q_id to ublk_queue_cmd_buf_size()  (Caleb Sander Mateos)
ublk_queue_cmd_buf_size() only needs the queue depth, which is the same for all queues. Get the queue depth from the ublk_device instead so the q_id parameter can be dropped. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-20  ublk: remove ubq check in ublk_check_and_get_req()  (Caleb Sander Mateos)
ublk_get_queue() never returns a NULL pointer, so there's no need to check its return value in ublk_check_and_get_req(). Drop the check. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md/md-llbitmap: Use DIV_ROUND_UP_SECTOR_T  (Nathan Chancellor)
When building for 32-bit platforms, there are several link (if builtin) or modpost (if a module) errors due to dividends of type 'sector_t' in DIV_ROUND_UP:
  arm-linux-gnueabi-ld: drivers/md/md-llbitmap.o: in function `llbitmap_resize':
  drivers/md/md-llbitmap.c:1017:(.text+0xae8): undefined reference to `__aeabi_uldivmod'
  arm-linux-gnueabi-ld: drivers/md/md-llbitmap.c:1020:(.text+0xb10): undefined reference to `__aeabi_uldivmod'
  arm-linux-gnueabi-ld: drivers/md/md-llbitmap.o: in function `llbitmap_end_discard':
  drivers/md/md-llbitmap.c:1114:(.text+0xf14): undefined reference to `__aeabi_uldivmod'
  arm-linux-gnueabi-ld: drivers/md/md-llbitmap.o: in function `llbitmap_start_discard':
  drivers/md/md-llbitmap.c:1097:(.text+0x1808): undefined reference to `__aeabi_uldivmod'
  arm-linux-gnueabi-ld: drivers/md/md-llbitmap.o: in function `llbitmap_read_sb':
  drivers/md/md-llbitmap.c:867:(.text+0x2080): undefined reference to `__aeabi_uldivmod'
  arm-linux-gnueabi-ld: drivers/md/md-llbitmap.o:drivers/md/md-llbitmap.c:895: more undefined references to `__aeabi_uldivmod' follow
Use DIV_ROUND_UP_SECTOR_T instead of DIV_ROUND_UP, which exists to handle this exact situation. Fixes: 5ab829f1971d ("md/md-llbitmap: introduce new lockless bitmap") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
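For illustration, DIV_ROUND_UP_SECTOR_T picks a 64-bit-safe division when sector_t is wider than long; the function name below is hypothetical:

    #include <linux/math.h>   /* DIV_ROUND_UP_SECTOR_T */
    #include <linux/types.h>  /* sector_t */

    static sector_t llbitmap_chunks(sector_t sectors, unsigned int chunksize)
    {
        /* expands to DIV_ROUND_UP_ULL() on 32-bit kernels, avoiding
         * libgcc helpers such as __aeabi_uldivmod */
        return DIV_ROUND_UP_SECTOR_T(sectors, chunksize);
    }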
2025-09-10  ublk: consolidate nr_io_ready and nr_queues_ready  (Caleb Sander Mateos)
ublk_mark_io_ready() tracks whether all the ublk_device's I/Os have been fetched by incrementing ublk_queue's nr_io_ready count and incrementing ublk_device's nr_queues_ready count if the whole queue is ready. Simplify the logic by just tracking the total number of fetched I/Os on each ublk_device. When this count reaches nr_hw_queues * queue_depth, the ublk_device is ready to receive I/O. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
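A sketch of the consolidated readiness check (field names assumed):

    static void ublk_mark_io_ready(struct ublk_device *ub)
    {
        ub->nr_io_ready++;   /* one counter for the whole device */
        if (ub->nr_io_ready == ub->dev_info.nr_hw_queues *
                               ub->dev_info.queue_depth)
            complete_all(&ub->completion);   /* all I/Os fetched */
    }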
2025-09-10  md/raid0: convert raid0_make_request() to use bio_submit_split_bioset()  (Yu Kuai)
Currently, raid0_make_request() will remap the original bio to underlying disks to prevent reordered IO. Now that bio_submit_split_bioset() puts the original bio at the head of current->bio_list, it's safe to convert to this helper, and bios will still be ordered. CC: Jan Kara <jack@suse.cz> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
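The calling convention, sketched under the assumption that the helper returns NULL after failing the original bio on a split error (function name hypothetical):

    static void raid0_split_example(struct mddev *mddev, struct bio *bio,
                                    sector_t sectors)
    {
        bio = bio_submit_split_bioset(bio, sectors, &mddev->bio_set);
        if (!bio)
            return;   /* original bio already failed by the helper */
        /* handle the front split; the remainder was re-queued at the
         * head of current->bio_list, preserving submission order */
    }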
2025-09-10  md/md-linear: convert to use bio_submit_split_bioset()  (Yu Kuai)
Unify bio split code, prepare to fix reordered split IO. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md/raid5: convert to use bio_submit_split_bioset()  (Yu Kuai)
Unify bio split code, prepare to fix ordering of split IO. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md/raid10: convert read/write to use bio_submit_split_bioset()  (Yu Kuai)
Unify bio split code and prepare to fix the ordering of split IO. The error path is modified a bit, but no functional changes are intended:
 - bio_submit_split_bioset() can fail the original bio directly on a split error; set R10BIO_Uptodate in this case to notify raid_end_bio_io() that the original bio has already been returned.
 - Setting R10BIO_Uptodate and then setting the error value to -EIO is useless now; for an r10_bio without R10BIO_Uptodate, -EIO will be returned for the original bio.
Discard is not handled, because discard is only split for an unaligned head and tail; this can be considered a slow path, and the reordering here does not matter much. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md/raid10: add a new r10bio flag R10BIO_Returned  (Yu Kuai)
The new helper bio_submit_split_bioset() can fail the original bio on split errors; prepare to handle this case in raid_end_bio_io(). The flag name follows the corresponding r1bio flag name. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md/raid1: convert to use bio_submit_split_bioset()  (Yu Kuai)
Unify bio split code, and prepare to fix ordering of split IO. Note that bio_submit_split_bioset() can fail the original bio directly on a split error; set R1BIO_Returned in this case to notify raid_end_bio_io() that the original bio has already been returned. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md/raid0: convert raid0_handle_discard() to use bio_submit_split_bioset()  (Yu Kuai)
Unify bio split code, and prepare to fix ordering of split IO. Note that commit 319ff40a5427 ("md/raid0: Fix performance regression for large sequential writes") already fixed the ordering of split IO by remapping the bio to underlying disks before resubmitting it, given that md_submit_bio() has already split it by sectors and raid0_make_request() will split at most once for unaligned IO. This is a bit hacky and we'll convert it to the general solution later. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-10  md: fix missing blktrace bio split events  (Yu Kuai)
If a bio is split by internal handling like chunksize or badblocks, the corresponding trace_block_split() is missing, leaving blktrace unable to catch BIO split events and making it harder to analyze the BIO sequence. Cc: stable@vger.kernel.org Fixes: 4b1faf931650 ("block: Kill bio_pair_split()") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-09  Merge tag 'md-6.18-20250909' of gitolite.kernel.org:pub/scm/linux/kernel/git/mdraid/linux into for-6.18/block  (Jens Axboe)
Pull MD changes from Yu Kuai:
 "Redundant data is used to enhance data fault tolerance, and the storage method for redundant data varies depending on the RAID level. It is important to maintain the consistency of redundant data.
  A bitmap is used to record which data blocks have been synchronized and which ones need to be resynchronized or recovered. Each bit in the bitmap represents a segment of data in the array. When a bit is set, it indicates that the multiple redundant copies of that data segment may not be consistent. Data synchronization can be performed based on the bitmap after a power failure or after re-adding a disk. If there is no bitmap, a full-disk synchronization is required.
  Due to known performance issues with md-bitmap and its unreasonable implementation:
   - self-managed IO submitting like filemap_write_page();
   - a global spin_lock
  I have decided not to continue optimizing based on the current bitmap implementation. This new bitmap is designed without locking in the IO fast path and can be used with fast disks.
  Key features of the new bitmap:
   - The IO fast path is lockless; if the user issues lots of write IO to the same bitmap bit in a short time, only the first write has the additional overhead of updating the bitmap bit, with no additional overhead for the following writes.
   - Support for resyncing or recovering only written data, meaning that when creating a new array or replacing a disk with a new one, there is no need to do a full-disk resync/recovery."
* tag 'md-6.18-20250909' of gitolite.kernel.org:pub/scm/linux/kernel/git/mdraid/linux: (24 commits)
  md/md-llbitmap: introduce new lockless bitmap
  md/md-bitmap: make method bitmap_ops->daemon_work optional
  md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER
  md/md-bitmap: add a new method blocks_synced() in bitmap_operations
  md/md-bitmap: add a new method skip_sync_blocks() in bitmap_operations
  md/md-bitmap: delay registration of bitmap_ops until creating bitmap
  md/md-bitmap: add a new sysfs api bitmap_type
  md: add a new mddev field 'bitmap_id'
  md/md-bitmap: support discard for bitmap ops
  md: factor out a helper raid_is_456()
  md: add a new parameter 'offset' to md_super_write()
  md/md-bitmap: introduce CONFIG_MD_BITMAP
  md: check before referencing mddev->bitmap_ops
  md/dm-raid: check before referencing mddev->bitmap_ops
  md/raid5: check before referencing mddev->bitmap_ops
  md/raid10: check before referencing mddev->bitmap_ops
  md/raid1: check before referencing mddev->bitmap_ops
  md/raid1: check bitmap before behind write
  md/md-bitmap: handle the case bitmap is not enabled before end_sync()
  md/md-bitmap: handle the case bitmap is not enabled before start_sync()
  ...
2025-09-09  blk-map: provide the bdev to bio if one exists  (Keith Busch)
We can now safely provide a block device when extracting user pages for driver and user passthrough commands. Set the bdev so the caller doesn't have to do that later. This has an additional benefit of being able to extract P2P pages in the passthrough path. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-09  blk-mq-dma: bring back p2p request flags  (Keith Busch)
We only need to consider data and metadata dma mapping types separately. The request and bio integrity payload have enough flag bits to internally track the mapping type for each. Use these so the caller doesn't need to track them, and provide separate request and integrity helpers to the common code. This will make it easier to scale new mappings, like the proposed MMIO attribute, without burdening the caller to track such things. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-09  drivers/block: WQ_PERCPU added to alloc_workqueue users  (Marco Crivellari)
Currently, if a user enqueues a work item using schedule_delayed_work(), the wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work(), which uses system_wq, while queue_work() again makes use of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they're needed and reducing noise when CPUs are isolated. This patch adds a new WQ_PERCPU flag to explicitly request per-CPU behavior. Both flags coexist for one release cycle to allow callers to transition their calls. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn't explicitly specify WQ_UNBOUND must now use WQ_PERCPU. All existing users have been updated accordingly. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
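An illustrative before/after for a driver-owned queue (the queue name is made up):

    struct workqueue_struct *wq;

    /* before: per-CPU was the unspoken default */
    wq = alloc_workqueue("mydrv_wq", WQ_MEM_RECLAIM, 0);

    /* after: per-CPU behavior is requested explicitly */
    wq = alloc_workqueue("mydrv_wq", WQ_MEM_RECLAIM | WQ_PERCPU, 0);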
2025-09-09  drivers/block: replace use of system_unbound_wq with system_dfl_wq  (Marco Crivellari)
Currently, if a user enqueues a work item using schedule_delayed_work(), the wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work(), which uses system_wq, while queue_work() again makes use of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. system_unbound_wq should be the default workqueue so as not to enforce locality constraints for random work whenever it's not required. Add system_dfl_wq to encourage its use when unbound work should be used. queue_work() / queue_delayed_work() / mod_delayed_work() will now use the new unbound wq: if the user still uses the old wq, a warning will be printed along with a wq redirect to the new one. The old system_unbound_wq will be kept for a few release cycles. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
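The substitution itself is mechanical; 'mydrv' below is a placeholder:

    /* old: queue_work(system_unbound_wq, &mydrv->work); */
    queue_work(system_dfl_wq, &mydrv->work);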
2025-09-09  drivers/block: replace use of system_wq with system_percpu_wq  (Marco Crivellari)
Currently, if a user enqueues a work item using schedule_delayed_work(), the wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work(), which uses system_wq, while queue_work() again makes use of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. system_unbound_wq should be the default workqueue so as not to enforce locality constraints for random work whenever it's not required. Add system_dfl_wq to encourage its use when unbound work should be used. queue_work() / queue_delayed_work() / mod_delayed_work() will now use the new unbound wq: if the user still uses the old wq, a warning will be printed along with a wq redirect to the new one. The old system_unbound_wq will be kept for a few release cycles. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-09  block: floppy: Replace kmalloc() + copy_from_user() with memdup_user()  (Thorsten Blum)
Replace kmalloc() followed by copy_from_user() with memdup_user() to improve and simplify raw_cmd_copyin(). No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Jens Axboe <axboe@kernel.dk>
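The pattern, sketched against floppy's struct floppy_raw_cmd (helper name hypothetical):

    static struct floppy_raw_cmd *raw_cmd_dup(const void __user *param)
    {
        /* one call replaces kmalloc() + copy_from_user() + error unwind;
         * returns ERR_PTR(-ENOMEM) or ERR_PTR(-EFAULT) on failure */
        return memdup_user(param, sizeof(struct floppy_raw_cmd));
    }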
2025-09-09  block: remove the bi_inline_vecs variable sized array from struct bio  (Christoph Hellwig)
Bios are embedded into other structures, and at least sparse is unhappy about embedding structures with variable sized arrays. There's no real need for the array anyway; we can replace it with a helper pointing to the memory just behind the bio, and with the previous cleanups very few sites do anything special with it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-09  block: add a bio_init_inline helper  (Christoph Hellwig)
Just a simple wrapper around bio_init() for callers that want to initialize a bio with inline bvecs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
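A plausible shape for the pair, matching bio_init()'s signature (sketch, not the exact patch):

    static inline struct bio_vec *bio_inline_vecs(struct bio *bio)
    {
        /* inline bvecs sit in the memory just behind the bio */
        return (struct bio_vec *)(bio + 1);
    }

    static inline void bio_init_inline(struct bio *bio, struct block_device *bdev,
                                       unsigned short max_vecs, blk_opf_t opf)
    {
        bio_init(bio, bdev, bio_inline_vecs(bio), max_vecs, opf);
    }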
2025-09-09  nbd: restrict sockets to TCP and UNIX  (Eric Dumazet)
Recently, syzbot started to abuse NBD with all kinds of sockets. Commit cf1b2326b734 ("nbd: verify socket is supported during setup") made sure the socket supported a shutdown() method. Explicitly accept TCP and UNIX stream sockets. Fixes: cf1b2326b734 ("nbd: verify socket is supported during setup") Reported-by: syzbot+e1cd6bd8493060bd701d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/CANn89iJ+76eE3A_8S_zTpSyW5hvPRn6V57458hCZGY5hbH_bFA@mail.gmail.com/T/#m081036e8747cd7e2626c1da5d78c8b9d1e55b154 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mike Christie <mchristi@redhat.com> Cc: Richard W.M. Jones <rjones@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Yu Kuai <yukuai1@huaweicloud.com> Cc: linux-block@vger.kernel.org Cc: nbd@other.debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
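A sketch of the tightened check (illustrative, not the exact patch):

    static bool nbd_sock_supported(struct socket *sock)
    {
        if (sock->type != SOCK_STREAM)
            return false;
        return sock->sk->sk_family == AF_INET ||
               sock->sk->sk_family == AF_INET6 ||
               sock->sk->sk_family == AF_UNIX;
    }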
2025-09-08  null_blk: Fix the description of the cache_size module argument  (Genjian Zhang)
When executing modinfo null_blk, there is an error in the description of the module parameter mbps, and the output for cache_size is incomplete. The output of modinfo before and after applying this patch is as follows:
Before:
  [...]
  parm: cache_size:ulong
  [...]
  parm: mbps:Cache size in MiB for memory-backed device. Default: 0 (none) (uint)
  [...]
After:
  [...]
  parm: cache_size:Cache size in MiB for memory-backed device. Default: 0 (none) (ulong)
  [...]
  parm: mbps:Limit maximum bandwidth (in MiB/s). Default: 0 (no limit) (uint)
  [...]
Fixes: 058efe000b31 ("null_blk: add module parameters for 4 options") Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
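The fix amounts to attaching each description to the right parameter; a sketch mirroring null_blk's module parameters:

    static unsigned long g_cache_size;
    module_param_named(cache_size, g_cache_size, ulong, 0444);
    MODULE_PARM_DESC(cache_size,
        "Cache size in MiB for memory-backed device. Default: 0 (none)");

    static unsigned int g_mbps;
    module_param_named(mbps, g_mbps, uint, 0444);
    MODULE_PARM_DESC(mbps,
        "Limit maximum bandwidth (in MiB/s). Default: 0 (no limit)");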
2025-09-06  md/md-llbitmap: introduce new lockless bitmap  (Yu Kuai)
Redundant data is used to enhance data fault tolerance, and the storage method for redundant data varies depending on the RAID level. It is important to maintain the consistency of redundant data.
A bitmap is used to record which data blocks have been synchronized and which ones need to be resynchronized or recovered. Each bit in the bitmap represents a segment of data in the array. When a bit is set, it indicates that the multiple redundant copies of that data segment may not be consistent. Data synchronization can be performed based on the bitmap after a power failure or after re-adding a disk. If there is no bitmap, a full-disk synchronization is required.
Due to known performance issues with md-bitmap and its unreasonable implementation:
 - self-managed IO submitting like filemap_write_page();
 - a global spin_lock
I have decided not to continue optimizing based on the current bitmap implementation. This new bitmap is designed without locking in the IO fast path and can be used with fast disks. For designs and details, see the comments in drivers/md/md-llbitmap.c. Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-12-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Li Nan <linan122@huawei.com>
2025-09-06  md/md-bitmap: make method bitmap_ops->daemon_work optional  (Yu Kuai)
daemon_work() will be called by the daemon thread. On the one hand, the daemon thread doesn't have a strict wake-up time; on the other hand, too much work is put on the daemon thread, like handling sync IO, handling failed or special normal IO, handling recovery, and so on. Hence the daemon thread may be too busy to clear dirty bits in time. Make bitmap_ops->daemon_work() optional; following patches will use a separate async work to clear dirty bits for the new bitmap. Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-11-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Li Nan <linan122@huawei.com>
2025-09-06  md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER  (Yu Kuai)
This flag is used by llbitmap in later patches to skip raid456's initial recovery and delay building initial xor data until the first write. Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-10-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com>
2025-09-06  md/md-bitmap: add a new method blocks_synced() in bitmap_operations  (Yu Kuai)
Currently, raid456 must perform a whole-array initial recovery to build initial xor data; only then does IO to the array avoid reading all the blocks in the underlying disks. This behavior hurts IO performance a lot, and nowadays there are huge disks for which the initial recovery can take a long time. Hence llbitmap will support lazy initial recovery in following patches. This method is used to check whether data blocks are synced; if not, IO will still have to read all blocks for raid456. Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-9-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com>
2025-09-06  md/md-bitmap: add a new method skip_sync_blocks() in bitmap_operations  (Yu Kuai)
This method is used to check if blocks can be skipped before calling into pers->sync_request(). llbitmap will use this method to skip resync for unwritten/clean data blocks, and recovery/check/repair for unwritten data blocks. Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-8-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Xiao Ni <xni@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Li Nan <linan122@huawei.com>
2025-09-06  md/md-bitmap: delay registration of bitmap_ops until creating bitmap  (Yu Kuai)
Currently bitmap_ops is registered while allocating the mddev; this is fine when there is only one bitmap_ops. Delay setting bitmap_ops until creating the bitmap, so that the user can choose which bitmap to use before running the array. Link: https://lore.kernel.org/linux-raid/20250721171557.34587-7-yukuai@kernel.org Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Li Nan <linan122@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/md-bitmap: add a new sysfs api bitmap_type  (Yu Kuai)
The API will be used by mdadm to set bitmap_type while creating a new array or assembling an array, in preparation for adding a new bitmap. Currently available options are:
  cat /sys/block/md0/md/bitmap_type
  none [bitmap]
Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-6-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Xiao Ni <xni@redhat.com> Reviewed-by: Li Nan <linan122@huawei.com>
2025-09-06  md: add a new mddev field 'bitmap_id'  (Yu Kuai)
Prepare to store the bitmap id selected by user, also refactor mddev_set_bitmap_ops a bit in case the value is invalid. Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-5-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Li Nan <linan122@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/md-bitmap: support discard for bitmap ops  (Yu Kuai)
Use two new methods {start, end}_discard in bitmap_ops and a new field 'rw' in struct md_io_clone to handle discard IO, in preparation for the new md bitmap. Since all bitmap functions that handle write IO have the same signature, also add a typedef to make the code cleaner. Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-4-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Li Nan <linan122@huawei.com>
2025-09-06  md: factor out a helper raid_is_456()  (Yu Kuai)
There are no functional changes, the helper will be used by llbitmap in following patches. Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-3-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Li Nan <linan122@huawei.com>
2025-09-06  md: add a new parameter 'offset' to md_super_write()  (Yu Kuai)
The parameter is always set to 0 for now; following patches will use this helper to write the llbitmap to underlying disks, allowing dirty sectors to be written instead of the whole page. Also rename md_super_write() to md_write_metadata(), since there is nothing superblock-specific about it. Link: https://lore.kernel.org/linux-raid/20250829080426.1441678-2-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Li Nan <linan122@huawei.com>
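The resulting helper shape (exact parameter order assumed):

    void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
                           sector_t sector, int size, struct page *page,
                           unsigned int offset);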
2025-09-06  md/md-bitmap: introduce CONFIG_MD_BITMAP  (Yu Kuai)
Now that all implementations are internal, it's sensible to add a config option for md-bitmap, and it provides good isolation. Link: https://lore.kernel.org/linux-raid/20250707012711.376844-16-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md: check before referencing mddev->bitmap_ops  (Yu Kuai)
Prepare to introduce CONFIG_MD_BITMAP. Link: https://lore.kernel.org/linux-raid/20250707012711.376844-15-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/dm-raid: check before referencing mddev->bitmap_ops  (Yu Kuai)
Prepare to introduce CONFIG_MD_BITMAP. Link: https://lore.kernel.org/linux-raid/20250707012711.376844-14-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/raid5: check before referencing mddev->bitmap_ops  (Yu Kuai)
Prepare to introduce CONFIG_MD_BITMAP. Link: https://lore.kernel.org/linux-raid/20250707012711.376844-13-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/raid10: check before referencing mddev->bitmap_ops  (Yu Kuai)
Prepare to introduce CONFIG_MD_BITMAP. Link: https://lore.kernel.org/linux-raid/20250707012711.376844-12-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/raid1: check before referencing mddev->bitmap_ops  (Yu Kuai)
Prepare to introduce CONFIG_MD_BITMAP. Link: https://lore.kernel.org/linux-raid/20250707012711.376844-11-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com>
2025-09-06  md/raid1: check bitmap before behind write  (Yu Kuai)
Behind writes rely on the bitmap: the number of in-flight IOs is recorded in bitmap->behind_writes, and callers rely on bitmap_wait_behind_writes() to wait for the IO to be done. However, callers currently don't check whether the bitmap is enabled before calling into the behind methods. Hence if a behind write starts without a bitmap, readers will not wait for the slow write IO to be done, and old data can be read in some corner cases. Link: https://lore.kernel.org/linux-raid/20250707012711.376844-10-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com>
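A sketch of the guard on raid1's write path; md_bitmap_enabled() is assumed as the availability check used by this series:

    /* only take the behind-write path when a bitmap is attached, so
     * bitmap_wait_behind_writes() can actually wait for these IOs */
    if (md_bitmap_enabled(mddev) &&
        test_bit(WriteMostly, &rdev->flags))
        alloc_behind_master_bio(r1_bio, bio);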
2025-09-06  md/md-bitmap: handle the case bitmap is not enabled before end_sync()  (Yu Kuai)
This case can be handled without knowing internal implementation. Prepare to introduce CONFIG_MD_BITMAP. Link: https://lore.kernel.org/linux-raid/20250707012711.376844-9-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com>