summaryrefslogtreecommitdiff
path: root/drivers/block/loop.c
AgeCommit message (Collapse)Author
6 daysMerge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - MD pull request via Yu: - call del_gendisk synchronously (Xiao) - cleanup unused variable (John) - cleanup workqueue flags (Ryo) - fix faulty rdev can't be removed during resync (Qixing) - NVMe pull request via Christoph: - try PCIe function level reset on init failure (Keith Busch) - log TLS handshake failures at error level (Maurizio Lombardi) - pci-epf: do not complete commands twice if nvmet_req_init() fails (Rick Wertenbroek) - misc cleanups (Alok Tiwari) - Removal of the pktcdvd driver This has been more than a decade coming at this point, and some recently revealed breakages that had it causing issues even for cases where it isn't required made me re-pull the trigger on this one. It's known broken and nobody has stepped up to maintain the code - Series for ublk supporting batch commands, enabling the use of multishot where appropriate - Speed up ublk exit handling - Fix for the two-stage elevator fixing which could leak data - Convert NVMe to use the new IOVA based API - Increase default max transfer size to something more reasonable - Series fixing write operations on zoned DM devices - Add tracepoints for zoned block device operations - Prep series working towards improving blk-mq queue management in the presence of isolated CPUs - Don't allow updating of the block size of a loop device that is currently under exclusively ownership/open - Set chunk sectors from stacked device stripe size and use it for the atomic write size limit - Switch to folios in bcache read_super() - Fix for CD-ROM MRW exit flush handling - Various tweaks, fixes, and cleanups * tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits) block: restore two stage elevator switch while running nr_hw_queue update cdrom: Call cdrom_mrw_exit from cdrom_release function sunvdc: Balance device refcount in vdc_port_mpgroup_check nvme-pci: try function level reset on init failure dm: split write BIOs on zone boundaries when zone append is not emulated block: use chunk_sectors when evaluating stacked atomic write limits dm-stripe: limit chunk_sectors to the stripe size md/raid10: set chunk_sectors limit md/raid0: set chunk_sectors limit block: sanitize chunk_sectors for atomic write limits ilog2: add max_pow_of_two_factor() nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails nvme-tcp: log TLS handshake failures at error level docs: nvme: fix grammar in nvme-pci-endpoint-target.rst nvme: fix typo in status code constant for self-test in progress nvmet: remove redundant assignment of error code in nvmet_ns_enable() nvme: fix incorrect variable in io cqes error message nvme: fix multiple spelling and grammar issues in host drivers block: fix blk_zone_append_update_request_bio() kernel-doc md/raid10: fix set but not used variable in sync_request_write() ...
2025-07-16loop: use kiocb helpers to fix lockdep warningMing Lei
The lockdep tool can report a circular lock dependency warning in the loop driver's AIO read/write path: ``` [ 6540.587728] kworker/u96:5/72779 is trying to acquire lock: [ 6540.593856] ff110001b5968440 (sb_writers#9){.+.+}-{0:0}, at: loop_process_work+0x11a/0xf70 [loop] [ 6540.603786] [ 6540.603786] but task is already holding lock: [ 6540.610291] ff110001b5968440 (sb_writers#9){.+.+}-{0:0}, at: loop_process_work+0x11a/0xf70 [loop] [ 6540.620210] [ 6540.620210] other info that might help us debug this: [ 6540.627499] Possible unsafe locking scenario: [ 6540.627499] [ 6540.634110] CPU0 [ 6540.636841] ---- [ 6540.639574] lock(sb_writers#9); [ 6540.643281] lock(sb_writers#9); [ 6540.646988] [ 6540.646988] *** DEADLOCK *** ``` This patch fixes the issue by using the AIO-specific helpers `kiocb_start_write()` and `kiocb_end_write()`. These functions are designed to be used with a `kiocb` and manage write sequencing correctly for asynchronous I/O without introducing the problematic lock dependency. The `kiocb` is already part of the `loop_cmd` struct, so this change also simplifies the completion function `lo_rw_aio_do_completion()` by using the `iocb` from the `cmd` struct directly, instead of retrieving the loop device from the request queue. Fixes: 39d86db34e41 ("loop: add file_start_write() and file_end_write()") Cc: Changhui Zhong <czhong@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250716114808.3159657-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-11loop: Avoid updating block size under exclusive ownerJan Kara
Syzbot came up with a reproducer where a loop device block size is changed underneath a mounted filesystem. This causes a mismatch between the block device block size and the block size stored in the superblock causing confusion in various places such as fs/buffer.c. The particular issue triggered by syzbot was a warning in __getblk_slow() due to requested buffer size not matching block device block size. Fix the problem by getting exclusive hold of the loop device to change its block size. This fails if somebody (such as filesystem) has already an exclusive ownership of the block device and thus prevents modifying the loop device under some exclusive owner which doesn't expect it. Reported-by: syzbot+01ef7a8da81a975e1ccd@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: syzbot+01ef7a8da81a975e1ccd@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20250711163202.19623-2-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-11loop: move lo_set_size() out of queue freezeMing Lei
It isn't necessary to freeze queue for updating disk size given submit_bio() doesn't grab queue usage counter for checking eod. Also many driver won't freeze queue for calling set_capacity_and_notify(). Move lo_set_size() out of queue freeze for fixing many lockdep warning report. Link: https://lore.kernel.org/linux-block/67ea99e0.050a0220.3c3d88.0042.GAE@google.com/ Reported-by: syzbot+9dd7dbb1a4b915dee638@syzkaller.appspotmail.com Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250611084938.108829-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-27loop: add file_start_write() and file_end_write()Ming Lei
file_start_write() and file_end_write() should be added around ->write_iter(). Recently we switch to ->write_iter() from vfs_iter_write(), and the implied file_start_write() and file_end_write() are lost. Also we never add them for dio code path, so add them back for covering both. Cc: Jeff Moyer <jmoyer@redhat.com> Fixes: f2fed441c69b ("loop: stop using vfs_iter_{read,write} for buffered I/O") Fixes: bc07c10a3603 ("block: loop: support DIO & AIO") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250527153405.837216-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20loop: don't require ->write_iter for writable files in loop_configureChristoph Hellwig
Block devices can be opened read-write even if they can't be written to for historic reasons. Remove the check requiring file->f_op->write_iter when the block devices was opened in loop_configure. The call to loop_check_backing_file just below ensures the ->write_iter is present for backing files opened for writing, which is the only check that is actually needed. Fixes: f5c84eff634b ("loop: Add sanity check for read/write_iter") Reported-by: Christian Hesse <mail@eworm.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250520135420.1177312-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-05loop: Add sanity check for read/write_iterLizhi Xu
Some file systems do not support read_iter/write_iter, such as selinuxfs in this issue. So before calling them, first confirm that the interface is supported and then call it. It is releavant in that vfs_iter_read/write have the check, and removal of their used caused szybot to be able to hit this issue. Fixes: f2fed441c69b ("loop: stop using vfs_iter__{read,write} for buffered I/O") Reported-by: syzbot+6af973a3b8dfd2faefdc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6af973a3b8dfd2faefdc Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250428143626.3318717-1-lizhi.xu@windriver.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15loop: stop using vfs_iter_{read,write} for buffered I/OChristoph Hellwig
vfs_iter_{read,write} always perform direct I/O when the file has the O_DIRECT flag set, which breaks disabling direct I/O using the LOOP_SET_STATUS / LOOP_SET_STATUS64 ioctls. This was recenly reported as a regression, but as far as I can tell was only uncovered by better checking for block sizes and has been around since the direct I/O support was added. Fix this by using the existing aio code that calls the raw read/write iter methods instead. Note that despite the comments there is no need for block drivers to ever call flush_dcache_page themselves, and the call is a left-over from prehistoric times. Fixes: ab1cb278bc70 ("block: loop: introduce ioctl command of LOOP_SET_DIRECT_IO") Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20250409130940.3685677-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15loop: LOOP_SET_FD: send uevents for partitionsThomas Weißschuh
Remove the suppression of the uevents before scanning for partitions. The partitions inherit their suppression settings from their parent device, which lead to the uevents being dropped. This is similar to the same changes for LOOP_CONFIGURE done in commit bb430b694226 ("loop: LOOP_CONFIGURE: send uevents for partitions"). Fixes: 498ef5c777d9 ("loop: suppress uevents while reconfiguring the device") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20250415-loop-uevent-changed-v3-1-60ff69ac6088@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15loop: properly send KOBJ_CHANGED uevent for disk deviceThomas Weißschuh
The original commit message and the wording "uncork" in the code comment indicate that it is expected that the suppressed event instances are automatically sent after unsuppressing. This is not the case, instead they are discarded. In effect this means that no "changed" events are emitted on the device itself by default. While each discovered partition does trigger a changed event on the device, devices without partitions don't have any event emitted. This makes udev miss the device creation and prompted workarounds in userspace. See the linked util-linux/losetup bug. Explicitly emit the events and drop the confusingly worded comments. Link: https://github.com/util-linux/util-linux/issues/2434 Fixes: 498ef5c777d9 ("loop: suppress uevents while reconfiguring the device") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://lore.kernel.org/r/20250415-loop-uevent-changed-v2-1-0c4e6a923b2a@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-15loop: aio inherit the ioprio of original requestYunlong Xing
Set cmd->iocb.ki_ioprio to the ioprio of loop device's request. The purpose is to inherit the original request ioprio in the aio flow. Signed-off-by: Yunlong Xing <yunlong.xing@unisoc.com> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250414030159.501180-1-yunlong.xing@unisoc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-18loop: move vfs_fsync() out of loop_update_dio()Ming Lei
If vfs_flush() is called with queue frozen, the queue freeze lock may be connected with FS internal lock, and lockdep warning can be triggered because the queue freeze lock is connected with too many global or sub-system locks. Fix the warning by moving vfs_fsync() out of loop_update_dio(): - vfs_fsync() is only needed when switching to dio - only loop_change_fd() and loop_configure() may switch from buffered IO to direct IO, so call vfs_fsync() directly here. This way is safe because either loop is in unbound, or new file isn't attached - for the other two cases of set_status and set_block_size, direct IO can only become off, so no need to call vfs_fsync() Cc: Christoph Hellwig <hch@infradead.org> Reported-by: Kun Hu <huk23@m.fudan.edu.cn> Reported-by: Jiaji Qin <jjtan24@m.fudan.edu.cn> Closes: https://lore.kernel.org/linux-block/359BC288-B0B1-4815-9F01-3A349B12E816@m.fudan.edu.cn/T/#u Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250318072955.3893805-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-04loop: Remove struct loop_func_tableZhu Yanjun
The struct is introduced in the commit 754d96798fab ("loop: remove loop.h"), but it is not used now. So remove it. No functional changes. Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250227163343.55952-1-yanjun.zhu@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-24loop: take the file system minimum dio alignment into accountChristoph Hellwig
The loop driver currently uses the logical block size of the underlying bdev as the lower bound of the loop device block size. While this works for many cases, it fails for file systems made up of multiple devices with different logical block sizes (e.g. XFS with a RT device that has a larger logical block size), or when the file systems doesn't support direct I/O writes at the sector size granularity (e.g. because it does out of place writes with a file system block size larger than the sector size). Fix this by querying the minimum direct I/O alignment from statx when available. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250131120120.1315125-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-24loop: check in LO_FLAGS_DIRECT_IO in loop_default_blocksizeChristoph Hellwig
We can't go below the minimum direct I/O size no matter if direct I/O is enabled by passing in an O_DIRECT file descriptor or due to the explicit flag. Now that LO_FLAGS_DIRECT_IO is set earlier after assigning a backing file, loop_default_blocksize can check it instead of the O_DIRECT flag to handle both conditions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250131120120.1315125-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-24loop: set LO_FLAGS_DIRECT_IO in loop_assign_backing_fileChristoph Hellwig
Assigning LO_FLAGS_DIRECT_IO from the O_DIRECT flag is related to assigning a new backing file. Move the assignment in preparation of using the flag more and earlier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250131120120.1315125-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-24loop: factor out a loop_assign_backing_file helperChristoph Hellwig
Split the code for setting up a backing file into a helper in preparation of adding more code to this path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250131120120.1315125-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-18Revert "driver: block: release the lo_work_lock before queue_work"Zhaoyang Huang
This reverts commit ad934fc1784802fd1408224474b25ee5289fadfc. loop_queue_work should be strictly serialized to loop_process_work since the lo_worker could be freed without noticing new work has been queued again. Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Link: https://lore.kernel.org/r/20250218065835.19503-1-zhaoyang.huang@unisoc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-11loop: release the lo_work_lock before queue_workZhaoyang Huang
queue_work could spin on wq->cpu_pwq->pool->lock which could lead to concurrent loop_process_work failed on lo_work_lock contention and increase the request latency. Remove this combination by moving the lock release ahead of queue_work. Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Link: https://lore.kernel.org/r/20250207091942.3966756-1-zhaoyang.huang@unisoc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-31block: force noio scope in blk_mq_freeze_queueChristoph Hellwig
When block drivers or the core block code perform allocations with a frozen queue, this could try to recurse into the block device to reclaim memory and deadlock. Thus all allocations done by a process that froze a queue need to be done without __GFP_IO and __GFP_FS. Instead of tying to track all of them down, force a noio scope as part of freezing the queue. Note that nvme is a bit of a mess here due to the non-owner freezes, and they will be addressed separately. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250131120352.1315351-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-27loop: don't clear LO_FLAGS_PARTSCAN on LOOP_SET_STATUS{,64}Christoph Hellwig
LOOP_SET_STATUS{,64} can set a lot more flags than it is supposed to clear (the LOOP_SET_STATUS_CLEARABLE_FLAGS vs LOOP_SET_STATUS_SETTABLE_FLAGS defines should have been a hint..). Fix this by only clearing the bits in LOOP_SET_STATUS_CLEARABLE_FLAGS. Fixes: ae074d07a0e5 ("loop: move updating lo_flag s out of loop_set_status_from_info") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250127143045.538279-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: remove the use_dio field in struct loop_deviceChristoph Hellwig
This field duplicate the LO_FLAGS_DIRECT_IO flag in lo_flags. Remove it to have a single source of truth about using direct I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: don't freeze the queue in loop_update_dioChristoph Hellwig
All callers of loop_update_dio except for loop_configure already have the queue frozen, and loop_configure works on an unbound device. Remove the superfluous recursive freezing in loop_update_dio and add asserts for the locking and freezing state instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: allow loop_set_status to re-enable direct I/OChristoph Hellwig
Unlike all other calls of (__)loop_update_dio, loop_set_status never looks at the O_DIRECT flag of the backing file, and thus doesn't re-enable direct I/O on an O_DIRECT backing file if e.g. the new block size would allow it. Fix that and remove the need for the separate __loop_update_dio flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: open code the direct I/O flag update in loop_set_dioChristoph Hellwig
loop_set_dio is different from the other (__)loop_update_dio callers in that it doesn't take any implicit conditions into account and wants to update the direct I/O flag to the user passed in value and fail if that can't be done. Open code the logic here to prepare for simplifying the other direct I/O flag updates and to make the error handling less convoluted. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: only write back pagecache when starting to to use direct I/OChristoph Hellwig
There is no point in doing an fdatasync to write out pages when switching away from direct I/O, as there won't be any. The writeback is only needed when switching to direct I/O, which would have to invalidate the pagecache less efficiently from the I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: create a lo_can_use_dio helperChristoph Hellwig
Factor out a part of __loop_update_dio in preparation for further refactoring. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: update commands in loop_set_status still referring to transfersChristoph Hellwig
The concept of transfers is gone since commit 47e9624616c8 ("block: remove support for cryptoloop and the xor transfer"). Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: move updating lo_flags out of loop_set_status_from_infoChristoph Hellwig
While loop_configure simplify assigns the flags passed in by userspace, loop_set_status only looks at the two changeable flags, and currently has to do a complicate dance to implement that. Move assign lo->lo_flags out of loop_set_status_from_info into the callers and thus drastically simplify the lo_flags handling in loop_set_status. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: fix queue freeze vs limits lock orderChristoph Hellwig
Match the locking order used by the core block code by only freezing the queue after taking the limits lock using the queue_limits_commit_update_frozen helper and document the callers that do not freeze the queue at all. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250110054726.1499538-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: refactor queue limits updatesChristoph Hellwig
Replace loop_reconfigure_limits with a slightly less encompassing loop_update_limits that expects the caller to acquire and commit the queue limits to prepare for sorting out the freeze vs limits lock ordering. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://lore.kernel.org/r/20250110054726.1499538-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: remove BLK_MQ_F_SHOULD_MERGEChristoph Hellwig
BLK_MQ_F_SHOULD_MERGE is set for all tag_sets except those that purely process passthrough commands (bsg-lib, ufs tmf, various nvme admin queues) and thus don't even check the flag. Remove it to simplify the driver interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241219060214.1928848-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-19loop: Fix ABBA locking raceOGAWA Hirofumi
Current loop calls vfs_statfs() while holding the q->limits_lock. If FS takes some locking in vfs_statfs callback, this may lead to ABBA locking bug (at least, FAT fs has this issue actually). So this patch calls vfs_statfs() outside q->limits_locks instead, because looks like no reason to hold q->limits_locks while getting discord configs. Chain exists of: &sbi->fat_lock --> &q->q_usage_counter(io)#17 --> &q->limits_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&q->limits_lock); lock(&q->q_usage_counter(io)#17); lock(&q->limits_lock); lock(&sbi->fat_lock); *** DEADLOCK *** Reported-by: syzbot+a5d8c609c02f508672cc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a5d8c609c02f508672cc Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-09loop: fix type of block sizeLi Wang
PAGE_SIZE may be 64K, and the max block size can be PAGE_SIZE, so any variable for holding block size can't be defined as 'unsigned short'. Unfortunately commit 473516b36193 ("loop: use the atomic queue limits update API") passes 'bsize' with type of 'unsigned short' to loop_reconfigure_limits(), and causes LTP/ioctl_loop06 test failure: 12 ioctl_loop06.c:76: TINFO: Using LOOP_SET_BLOCK_SIZE with arg > PAGE_SIZE 13 ioctl_loop06.c:59: TFAIL: Set block size succeed unexpectedly ... 18 ioctl_loop06.c:76: TINFO: Using LOOP_CONFIGURE with block_size > PAGE_SIZE 19 ioctl_loop06.c:59: TFAIL: Set block size succeed unexpectedly Fixes the issue by defining 'block size' variable with 'unsigned int', which is aligned with block layer's definition. (improve commit log & add fixes tag) Fixes: 473516b36193 ("loop: use the atomic queue limits update API") Cc: John Garry <john.g.garry@oracle.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Li Wang <liwang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241109022744.1126003-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-01loop: Simplify discard granularity calcJohn Garry
A bdev discard granularity is always at least SECTOR_SIZE, so don't check for a zero value. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241101092215.422428-1-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-30loop: Use bdev limit helpers for configuring discardJohn Garry
Instead of directly looking at the request_queue limits, use the bdev limits helpers, which is preferable. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241030111900.3981223-1-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-15Merge tag 'for-6.11/block-20240710' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - NVMe updates via Keith: - Device initialization memory leak fixes (Keith) - More constants defined (Weiwen) - Target debugfs support (Hannes) - PCIe subsystem reset enhancements (Keith) - Queue-depth multipath policy (Redhat and PureStorage) - Implement get_unique_id (Christoph) - Authentication error fixes (Gaosheng) - MD updates via Song - sync_action fix and refactoring (Yu Kuai) - Various small fixes (Christoph Hellwig, Li Nan, and Ofir Gal, Yu Kuai, Benjamin Marzinski, Christophe JAILLET, Yang Li) - Fix loop detach/open race (Gulam) - Fix lower control limit for blk-throttle (Yu) - Add module descriptions to various drivers (Jeff) - Add support for atomic writes for block devices, and statx reporting for same. Includes SCSI and NVMe (John, Prasad, Alan) - Add IO priority information to block trace points (Dongliang) - Various zone improvements and tweaks (Damien) - mq-deadline tag reservation improvements (Bart) - Ignore direct reclaim swap writes in writeback throttling (Baokun) - Block integrity improvements and fixes (Anuj) - Add basic support for rust based block drivers. Has a dummy null_blk variant for now (Andreas) - Series converting driver settings to queue limits, and cleanups and fixes related to that (Christoph) - Cleanup for poking too deeply into the bvec internals, in preparation for DMA mapping API changes (Christoph) - Various minor tweaks and fixes (Jiapeng, John, Kanchan, Mikulas, Ming, Zhu, Damien, Christophe, Chaitanya) * tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux: (206 commits) floppy: add missing MODULE_DESCRIPTION() macro loop: add missing MODULE_DESCRIPTION() macro ublk_drv: add missing MODULE_DESCRIPTION() macro xen/blkback: add missing MODULE_DESCRIPTION() macro block/rnbd: Constify struct kobj_type block: take offset into account in blk_bvec_map_sg again block: fix get_max_segment_size() warning loop: Don't bother validating blocksize virtio_blk: Don't bother validating blocksize null_blk: Don't bother validating blocksize block: Validate logical block size in blk_validate_limits() virtio_blk: Fix default logical block size fallback nvmet-auth: fix nvmet_auth hash error handling nvme: implement ->get_unique_id block: pass a phys_addr_t to get_max_segment_size block: add a bvec_phys helper blk-lib: check for kill signal in ioctl BLKZEROOUT block: limit the Write Zeroes to manually writing zeroes fallback block: refacto blkdev_issue_zeroout block: move read-only and supported checks into (__)blkdev_issue_zeroout ...
2024-07-10loop: add missing MODULE_DESCRIPTION() macroJeff Johnson
make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/block/loop.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240602-md-block-loop-v1-1-b9b7e2603e72@quicinc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-09loop: Don't bother validating blocksizeJohn Garry
The block queue limits validation does this for us now. The loop_configure() -> WARN_ON_ONCE() call is dropped, as an invalid block size would trigger this now. We don't want userspace to be able to directly trigger WARNs. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20240708091651.177447-6-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-05loop: remove the unused inode variable in loop_configureChristoph Hellwig
Remove the inode variable now that the last user is gone. Fixes: a17ece76bcfe ("loop: regularize upgrading the block size for direct I/O") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240705053114.2042976-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28loop: don't set QUEUE_FLAG_NOMERGESChristoph Hellwig
QUEUE_FLAG_NOMERGES isn't really a driver interface, but a user tunable. There also isn't any good reason to set it in the loop driver. The original commit adding it (5b5e20f421c0b6d "block: loop: set QUEUE_FLAG_NOMERGES for request queue of loop") claims that "It doesn't make sense to enable merge because the I/O submitted to backing file is handled page by page." which of course isn't true for multi-page bvec now, and it never has been for direct I/O, for which commit 40326d8a33d ("block/loop: allow request merge for directio mode") alredy disabled the nomerges flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240627124926.512662-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-27loop: Fix a race between loop detach and loop openGulam Mohamed
1. Userspace sends the command "losetup -d" which uses the open() call to open the device 2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the function loop_clr_fd() 3. If LOOP_CLR_FD is the first command received at the time, then the AUTOCLEAR flag is not set and deletion of the loop device proceeds ahead and scans the partitions (drop/add partitions) if (disk_openers(lo->lo_disk) > 1) { lo->lo_flags |= LO_FLAGS_AUTOCLEAR; loop_global_unlock(lo, true); return 0; } 4. Before scanning partitions, it will check to see if any partition of the loop device is currently opened 5. If any partition is opened, then it will return EBUSY: if (disk->open_partitions) return -EBUSY; 6. So, after receiving the "LOOP_CLR_FD" command and just before the above check for open_partitions, if any other command (like blkid) opens any partition of the loop device, then the partition scan will not proceed and EBUSY is returned as shown in above code 7. But in "__loop_clr_fd()", this EBUSY error is not propagated 8. We have noticed that this is causing the partitions of the loop to remain stale even after the loop device is detached resulting in the IO errors on the partitions Fix: Defer the detach of loop device to release function, which is called when the last close happens, by setting the lo_flags to LO_FLAGS_AUTOCLEAR at the time of detach i.e in loop_clr_fd() function. Test case involves the following two scripts: script1.sh: while [ 1 ]; do losetup -P -f /home/opt/looptest/test10.img blkid /dev/loop0p1 done script2.sh: while [ 1 ]; do losetup -d /dev/loop0 done Without fix, the following IO errors have been observed: kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16) kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0 kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0 kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page read Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240618164042.343777-1-gulam.mohamed@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19block: move the nonrot flag to queue_limitsChristoph Hellwig
Move the nonrot flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Use the chance to switch to defaulting to non-rotational and require the driver to opt into rotational, which matches the polarity of the sysfs interface. For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new rotational flag is not set as they clearly are not rotational despite this being a behavior change. There are some other drivers that unconditionally set the rotational flag to keep the existing behavior as they arguably can be used on rotational devices even if that is probably not their main use today (e.g. virtio_blk and drbd). The flag is automatically inherited in blk_stack_limits matching the existing behavior in dm and md. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19block: move cache control settings out of queue->flagsChristoph Hellwig
Move the cache control settings into the queue_limits so that the flags can be set atomically with the device queue frozen. Add new features and flags field for the driver set flags, and internal (usually sysfs-controlled) flags in the block layer. Note that we'll eventually remove enough field from queue_limits to bring it back to the previous size. The disable flag is inverted compared to the previous meaning, which means it now survives a rescan, similar to the max_sectors and max_discard_sectors user limits. The FLUSH and FUA flags are now inherited by blk_stack_limits, which simplified the code in dm a lot, but also causes a slight behavior change in that dm-switch and dm-unstripe now advertise a write cache despite setting num_flush_bios to 0. The I/O path will handle this gracefully, but as far as I can tell the lack of num_flush_bios and thus flush support is a pre-existing data integrity bug in those targets that really needs fixing, after which a non-zero num_flush_bios should be required in dm for targets that map to underlying devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19loop: fold loop_update_rotational into loop_reconfigure_limitsChristoph Hellwig
This prepares for moving the rotational flag into the queue_limits and also fixes it for the case where the loop device is backed by a block device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240617060532.127975-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19loop: also use the default block size from an underlying block deviceChristoph Hellwig
Fix the code in loop_reconfigure_limits to pick a default block size for O_DIRECT file descriptors to also work when the loop device sits on top of a block device and not just on a regular file on a block device based file system. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240617060532.127975-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19loop: regularize upgrading the block size for direct I/OChristoph Hellwig
The LOOP_CONFIGURE path automatically upgrades the block size to that of the underlying file for O_DIRECT file descriptors, but the LOOP_SET_BLOCK_SIZE path does not. Fix this by lifting the code to pick the block size into common code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240617060532.127975-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19loop: always update discard settings in loop_reconfigure_limitsChristoph Hellwig
Simplify loop_reconfigure_limits by always updating the discard limits. This adds a little more work to loop_set_block_size, but doesn't change the outcome as the discard flag won't change. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240617060532.127975-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19loop: stop using loop_reconfigure_limits in __loop_clr_fdChristoph Hellwig
__loop_clr_fd wants to clear all settings on the device. Prepare for moving more settings into the block limits by open coding loop_reconfigure_limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240617060532.127975-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14loop: Disable fallocate() zero and discard if not supportedCyril Hrubis
If fallcate is implemented but zero and discard operations are not supported by the filesystem the backing file is on we continue to fill dmesg with errors from the blk_mq_end_request() since each time we call fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate() ends up propagated into the block layer. In the end syscall succeeds since the blkdev_issue_zeroout() falls back to writing zeroes which makes the errors even more misleading and confusing. How to reproduce: 1. make sure /tmp is mounted as tmpfs 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100 3. losetup /dev/loop0 /tmp/disk.img 4. mkfs.ext2 /dev/loop0 5. dmesg |tail [710690.898214] operation not supported error, dev loop0, sector 204672 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.898279] operation not supported error, dev loop0, sector 522 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.898603] operation not supported error, dev loop0, sector 16906 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.898917] operation not supported error, dev loop0, sector 32774 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.899218] operation not supported error, dev loop0, sector 49674 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.899484] operation not supported error, dev loop0, sector 65542 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.899743] operation not supported error, dev loop0, sector 82442 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.900015] operation not supported error, dev loop0, sector 98310 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.900276] operation not supported error, dev loop0, sector 115210 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.900546] operation not supported error, dev loop0, sector 131078 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 This patch changes the lo_fallocate() to clear the flags for zero and discard operations if we get EOPNOTSUPP from the backing file fallocate callback, that way we at least stop spewing errors after the first unsuccessful try. CC: Jan Kara <jack@suse.cz> Signed-off-by: Cyril Hrubis <chrubis@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240613163817.22640-1-chrubis@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>