Age | Commit message (Collapse) | Author |
|
Pull block updates from Jens Axboe:
- MD pull request via Yu:
- call del_gendisk synchronously (Xiao)
- cleanup unused variable (John)
- cleanup workqueue flags (Ryo)
- fix faulty rdev can't be removed during resync (Qixing)
- NVMe pull request via Christoph:
- try PCIe function level reset on init failure (Keith Busch)
- log TLS handshake failures at error level (Maurizio Lombardi)
- pci-epf: do not complete commands twice if nvmet_req_init()
fails (Rick Wertenbroek)
- misc cleanups (Alok Tiwari)
- Removal of the pktcdvd driver
This has been more than a decade coming at this point, and some
recently revealed breakages that had it causing issues even for cases
where it isn't required made me re-pull the trigger on this one. It's
known broken and nobody has stepped up to maintain the code
- Series for ublk supporting batch commands, enabling the use of
multishot where appropriate
- Speed up ublk exit handling
- Fix for the two-stage elevator fixing which could leak data
- Convert NVMe to use the new IOVA based API
- Increase default max transfer size to something more reasonable
- Series fixing write operations on zoned DM devices
- Add tracepoints for zoned block device operations
- Prep series working towards improving blk-mq queue management in the
presence of isolated CPUs
- Don't allow updating of the block size of a loop device that is
currently under exclusively ownership/open
- Set chunk sectors from stacked device stripe size and use it for the
atomic write size limit
- Switch to folios in bcache read_super()
- Fix for CD-ROM MRW exit flush handling
- Various tweaks, fixes, and cleanups
* tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits)
block: restore two stage elevator switch while running nr_hw_queue update
cdrom: Call cdrom_mrw_exit from cdrom_release function
sunvdc: Balance device refcount in vdc_port_mpgroup_check
nvme-pci: try function level reset on init failure
dm: split write BIOs on zone boundaries when zone append is not emulated
block: use chunk_sectors when evaluating stacked atomic write limits
dm-stripe: limit chunk_sectors to the stripe size
md/raid10: set chunk_sectors limit
md/raid0: set chunk_sectors limit
block: sanitize chunk_sectors for atomic write limits
ilog2: add max_pow_of_two_factor()
nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails
nvme-tcp: log TLS handshake failures at error level
docs: nvme: fix grammar in nvme-pci-endpoint-target.rst
nvme: fix typo in status code constant for self-test in progress
nvmet: remove redundant assignment of error code in nvmet_ns_enable()
nvme: fix incorrect variable in io cqes error message
nvme: fix multiple spelling and grammar issues in host drivers
block: fix blk_zone_append_update_request_bio() kernel-doc
md/raid10: fix set but not used variable in sync_request_write()
...
|
|
The lockdep tool can report a circular lock dependency warning in the loop
driver's AIO read/write path:
```
[ 6540.587728] kworker/u96:5/72779 is trying to acquire lock:
[ 6540.593856] ff110001b5968440 (sb_writers#9){.+.+}-{0:0}, at: loop_process_work+0x11a/0xf70 [loop]
[ 6540.603786]
[ 6540.603786] but task is already holding lock:
[ 6540.610291] ff110001b5968440 (sb_writers#9){.+.+}-{0:0}, at: loop_process_work+0x11a/0xf70 [loop]
[ 6540.620210]
[ 6540.620210] other info that might help us debug this:
[ 6540.627499] Possible unsafe locking scenario:
[ 6540.627499]
[ 6540.634110] CPU0
[ 6540.636841] ----
[ 6540.639574] lock(sb_writers#9);
[ 6540.643281] lock(sb_writers#9);
[ 6540.646988]
[ 6540.646988] *** DEADLOCK ***
```
This patch fixes the issue by using the AIO-specific helpers
`kiocb_start_write()` and `kiocb_end_write()`. These functions are
designed to be used with a `kiocb` and manage write sequencing
correctly for asynchronous I/O without introducing the problematic
lock dependency.
The `kiocb` is already part of the `loop_cmd` struct, so this change
also simplifies the completion function `lo_rw_aio_do_completion()` by
using the `iocb` from the `cmd` struct directly, instead of retrieving
the loop device from the request queue.
Fixes: 39d86db34e41 ("loop: add file_start_write() and file_end_write()")
Cc: Changhui Zhong <czhong@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250716114808.3159657-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Syzbot came up with a reproducer where a loop device block size is
changed underneath a mounted filesystem. This causes a mismatch between
the block device block size and the block size stored in the superblock
causing confusion in various places such as fs/buffer.c. The particular
issue triggered by syzbot was a warning in __getblk_slow() due to
requested buffer size not matching block device block size.
Fix the problem by getting exclusive hold of the loop device to change
its block size. This fails if somebody (such as filesystem) has already
an exclusive ownership of the block device and thus prevents modifying
the loop device under some exclusive owner which doesn't expect it.
Reported-by: syzbot+01ef7a8da81a975e1ccd@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: syzbot+01ef7a8da81a975e1ccd@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20250711163202.19623-2-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
It isn't necessary to freeze queue for updating disk size given submit_bio()
doesn't grab queue usage counter for checking eod.
Also many driver won't freeze queue for calling set_capacity_and_notify().
Move lo_set_size() out of queue freeze for fixing many lockdep warning
report.
Link: https://lore.kernel.org/linux-block/67ea99e0.050a0220.3c3d88.0042.GAE@google.com/
Reported-by: syzbot+9dd7dbb1a4b915dee638@syzkaller.appspotmail.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250611084938.108829-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
file_start_write() and file_end_write() should be added around ->write_iter().
Recently we switch to ->write_iter() from vfs_iter_write(), and the
implied file_start_write() and file_end_write() are lost.
Also we never add them for dio code path, so add them back for covering
both.
Cc: Jeff Moyer <jmoyer@redhat.com>
Fixes: f2fed441c69b ("loop: stop using vfs_iter_{read,write} for buffered I/O")
Fixes: bc07c10a3603 ("block: loop: support DIO & AIO")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250527153405.837216-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Block devices can be opened read-write even if they can't be written to
for historic reasons. Remove the check requiring file->f_op->write_iter
when the block devices was opened in loop_configure. The call to
loop_check_backing_file just below ensures the ->write_iter is present
for backing files opened for writing, which is the only check that is
actually needed.
Fixes: f5c84eff634b ("loop: Add sanity check for read/write_iter")
Reported-by: Christian Hesse <mail@eworm.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250520135420.1177312-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some file systems do not support read_iter/write_iter, such as selinuxfs
in this issue.
So before calling them, first confirm that the interface is supported and
then call it.
It is releavant in that vfs_iter_read/write have the check, and removal
of their used caused szybot to be able to hit this issue.
Fixes: f2fed441c69b ("loop: stop using vfs_iter__{read,write} for buffered I/O")
Reported-by: syzbot+6af973a3b8dfd2faefdc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6af973a3b8dfd2faefdc
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250428143626.3318717-1-lizhi.xu@windriver.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
vfs_iter_{read,write} always perform direct I/O when the file has the
O_DIRECT flag set, which breaks disabling direct I/O using the
LOOP_SET_STATUS / LOOP_SET_STATUS64 ioctls.
This was recenly reported as a regression, but as far as I can tell
was only uncovered by better checking for block sizes and has been
around since the direct I/O support was added.
Fix this by using the existing aio code that calls the raw read/write
iter methods instead. Note that despite the comments there is no need
for block drivers to ever call flush_dcache_page themselves, and the
call is a left-over from prehistoric times.
Fixes: ab1cb278bc70 ("block: loop: introduce ioctl command of LOOP_SET_DIRECT_IO")
Reported-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20250409130940.3685677-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove the suppression of the uevents before scanning for partitions.
The partitions inherit their suppression settings from their parent device,
which lead to the uevents being dropped.
This is similar to the same changes for LOOP_CONFIGURE done in
commit bb430b694226 ("loop: LOOP_CONFIGURE: send uevents for partitions").
Fixes: 498ef5c777d9 ("loop: suppress uevents while reconfiguring the device")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20250415-loop-uevent-changed-v3-1-60ff69ac6088@linutronix.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The original commit message and the wording "uncork" in the code comment
indicate that it is expected that the suppressed event instances are
automatically sent after unsuppressing.
This is not the case, instead they are discarded.
In effect this means that no "changed" events are emitted on the device
itself by default.
While each discovered partition does trigger a changed event on the
device, devices without partitions don't have any event emitted.
This makes udev miss the device creation and prompted workarounds in
userspace. See the linked util-linux/losetup bug.
Explicitly emit the events and drop the confusingly worded comments.
Link: https://github.com/util-linux/util-linux/issues/2434
Fixes: 498ef5c777d9 ("loop: suppress uevents while reconfiguring the device")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://lore.kernel.org/r/20250415-loop-uevent-changed-v2-1-0c4e6a923b2a@linutronix.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Set cmd->iocb.ki_ioprio to the ioprio of loop device's request.
The purpose is to inherit the original request ioprio in the aio
flow.
Signed-off-by: Yunlong Xing <yunlong.xing@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250414030159.501180-1-yunlong.xing@unisoc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If vfs_flush() is called with queue frozen, the queue freeze lock may be
connected with FS internal lock, and lockdep warning can be triggered
because the queue freeze lock is connected with too many global or
sub-system locks.
Fix the warning by moving vfs_fsync() out of loop_update_dio():
- vfs_fsync() is only needed when switching to dio
- only loop_change_fd() and loop_configure() may switch from buffered
IO to direct IO, so call vfs_fsync() directly here. This way is safe
because either loop is in unbound, or new file isn't attached
- for the other two cases of set_status and set_block_size, direct IO
can only become off, so no need to call vfs_fsync()
Cc: Christoph Hellwig <hch@infradead.org>
Reported-by: Kun Hu <huk23@m.fudan.edu.cn>
Reported-by: Jiaji Qin <jjtan24@m.fudan.edu.cn>
Closes: https://lore.kernel.org/linux-block/359BC288-B0B1-4815-9F01-3A349B12E816@m.fudan.edu.cn/T/#u
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250318072955.3893805-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The struct is introduced in the commit 754d96798fab
("loop: remove loop.h"), but it is not used now.
So remove it.
No functional changes.
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250227163343.55952-1-yanjun.zhu@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The loop driver currently uses the logical block size of the underlying
bdev as the lower bound of the loop device block size. While this works
for many cases, it fails for file systems made up of multiple devices
with different logical block sizes (e.g. XFS with a RT device that has a
larger logical block size), or when the file systems doesn't support
direct I/O writes at the sector size granularity (e.g. because it does
out of place writes with a file system block size larger than the sector
size).
Fix this by querying the minimum direct I/O alignment from statx when
available.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250131120120.1315125-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We can't go below the minimum direct I/O size no matter if direct I/O is
enabled by passing in an O_DIRECT file descriptor or due to the explicit
flag. Now that LO_FLAGS_DIRECT_IO is set earlier after assigning a
backing file, loop_default_blocksize can check it instead of the
O_DIRECT flag to handle both conditions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250131120120.1315125-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Assigning LO_FLAGS_DIRECT_IO from the O_DIRECT flag is related to
assigning a new backing file. Move the assignment in preparation
of using the flag more and earlier.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250131120120.1315125-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Split the code for setting up a backing file into a helper in preparation
of adding more code to this path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250131120120.1315125-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This reverts commit ad934fc1784802fd1408224474b25ee5289fadfc.
loop_queue_work should be strictly serialized to loop_process_work since
the lo_worker could be freed without noticing new work has been queued
again.
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Link: https://lore.kernel.org/r/20250218065835.19503-1-zhaoyang.huang@unisoc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
queue_work could spin on wq->cpu_pwq->pool->lock which could lead to
concurrent loop_process_work failed on lo_work_lock contention and
increase the request latency. Remove this combination by moving the
lock release ahead of queue_work.
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Link: https://lore.kernel.org/r/20250207091942.3966756-1-zhaoyang.huang@unisoc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When block drivers or the core block code perform allocations with a
frozen queue, this could try to recurse into the block device to
reclaim memory and deadlock. Thus all allocations done by a process
that froze a queue need to be done without __GFP_IO and __GFP_FS.
Instead of tying to track all of them down, force a noio scope as
part of freezing the queue.
Note that nvme is a bit of a mess here due to the non-owner freezes,
and they will be addressed separately.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250131120352.1315351-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
LOOP_SET_STATUS{,64} can set a lot more flags than it is supposed to
clear (the LOOP_SET_STATUS_CLEARABLE_FLAGS vs
LOOP_SET_STATUS_SETTABLE_FLAGS defines should have been a hint..).
Fix this by only clearing the bits in LOOP_SET_STATUS_CLEARABLE_FLAGS.
Fixes: ae074d07a0e5 ("loop: move updating lo_flag s out of loop_set_status_from_info")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250127143045.538279-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This field duplicate the LO_FLAGS_DIRECT_IO flag in lo_flags. Remove it
to have a single source of truth about using direct I/O.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
All callers of loop_update_dio except for loop_configure already have the
queue frozen, and loop_configure works on an unbound device. Remove the
superfluous recursive freezing in loop_update_dio and add asserts for the
locking and freezing state instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Unlike all other calls of (__)loop_update_dio, loop_set_status never
looks at the O_DIRECT flag of the backing file, and thus doesn't
re-enable direct I/O on an O_DIRECT backing file if e.g. the new block
size would allow it. Fix that and remove the need for the separate
__loop_update_dio flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
loop_set_dio is different from the other (__)loop_update_dio callers in
that it doesn't take any implicit conditions into account and wants to
update the direct I/O flag to the user passed in value and fail if that
can't be done.
Open code the logic here to prepare for simplifying the other direct I/O
flag updates and to make the error handling less convoluted.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is no point in doing an fdatasync to write out pages when switching
away from direct I/O, as there won't be any. The writeback is only
needed when switching to direct I/O, which would have to invalidate the
pagecache less efficiently from the I/O path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Factor out a part of __loop_update_dio in preparation for further
refactoring.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The concept of transfers is gone since commit 47e9624616c8 ("block:
remove support for cryptoloop and the xor transfer").
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
While loop_configure simplify assigns the flags passed in by userspace,
loop_set_status only looks at the two changeable flags, and currently
has to do a complicate dance to implement that.
Move assign lo->lo_flags out of loop_set_status_from_info into the
callers and thus drastically simplify the lo_flags handling in
loop_set_status.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250110073750.1582447-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Match the locking order used by the core block code by only freezing
the queue after taking the limits lock using the
queue_limits_commit_update_frozen helper and document the callers that
do not freeze the queue at all.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20250110054726.1499538-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Replace loop_reconfigure_limits with a slightly less encompassing
loop_update_limits that expects the caller to acquire and commit the
queue limits to prepare for sorting out the freeze vs limits lock
ordering.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Link: https://lore.kernel.org/r/20250110054726.1499538-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
BLK_MQ_F_SHOULD_MERGE is set for all tag_sets except those that purely
process passthrough commands (bsg-lib, ufs tmf, various nvme admin
queues) and thus don't even check the flag. Remove it to simplify the
driver interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241219060214.1928848-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Current loop calls vfs_statfs() while holding the q->limits_lock. If
FS takes some locking in vfs_statfs callback, this may lead to ABBA
locking bug (at least, FAT fs has this issue actually).
So this patch calls vfs_statfs() outside q->limits_locks instead,
because looks like no reason to hold q->limits_locks while getting
discord configs.
Chain exists of:
&sbi->fat_lock --> &q->q_usage_counter(io)#17 --> &q->limits_lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&q->limits_lock);
lock(&q->q_usage_counter(io)#17);
lock(&q->limits_lock);
lock(&sbi->fat_lock);
*** DEADLOCK ***
Reported-by: syzbot+a5d8c609c02f508672cc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a5d8c609c02f508672cc
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
PAGE_SIZE may be 64K, and the max block size can be PAGE_SIZE, so any
variable for holding block size can't be defined as 'unsigned short'.
Unfortunately commit 473516b36193 ("loop: use the atomic queue limits
update API") passes 'bsize' with type of 'unsigned short' to
loop_reconfigure_limits(), and causes LTP/ioctl_loop06 test failure:
12 ioctl_loop06.c:76: TINFO: Using LOOP_SET_BLOCK_SIZE with arg > PAGE_SIZE
13 ioctl_loop06.c:59: TFAIL: Set block size succeed unexpectedly
...
18 ioctl_loop06.c:76: TINFO: Using LOOP_CONFIGURE with block_size > PAGE_SIZE
19 ioctl_loop06.c:59: TFAIL: Set block size succeed unexpectedly
Fixes the issue by defining 'block size' variable with 'unsigned int', which is
aligned with block layer's definition.
(improve commit log & add fixes tag)
Fixes: 473516b36193 ("loop: use the atomic queue limits update API")
Cc: John Garry <john.g.garry@oracle.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Li Wang <liwang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241109022744.1126003-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A bdev discard granularity is always at least SECTOR_SIZE, so don't check
for a zero value.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241101092215.422428-1-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of directly looking at the request_queue limits, use the bdev
limits helpers, which is preferable.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241030111900.3981223-1-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull block updates from Jens Axboe:
- NVMe updates via Keith:
- Device initialization memory leak fixes (Keith)
- More constants defined (Weiwen)
- Target debugfs support (Hannes)
- PCIe subsystem reset enhancements (Keith)
- Queue-depth multipath policy (Redhat and PureStorage)
- Implement get_unique_id (Christoph)
- Authentication error fixes (Gaosheng)
- MD updates via Song
- sync_action fix and refactoring (Yu Kuai)
- Various small fixes (Christoph Hellwig, Li Nan, and Ofir Gal, Yu
Kuai, Benjamin Marzinski, Christophe JAILLET, Yang Li)
- Fix loop detach/open race (Gulam)
- Fix lower control limit for blk-throttle (Yu)
- Add module descriptions to various drivers (Jeff)
- Add support for atomic writes for block devices, and statx reporting
for same. Includes SCSI and NVMe (John, Prasad, Alan)
- Add IO priority information to block trace points (Dongliang)
- Various zone improvements and tweaks (Damien)
- mq-deadline tag reservation improvements (Bart)
- Ignore direct reclaim swap writes in writeback throttling (Baokun)
- Block integrity improvements and fixes (Anuj)
- Add basic support for rust based block drivers. Has a dummy null_blk
variant for now (Andreas)
- Series converting driver settings to queue limits, and cleanups and
fixes related to that (Christoph)
- Cleanup for poking too deeply into the bvec internals, in preparation
for DMA mapping API changes (Christoph)
- Various minor tweaks and fixes (Jiapeng, John, Kanchan, Mikulas,
Ming, Zhu, Damien, Christophe, Chaitanya)
* tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux: (206 commits)
floppy: add missing MODULE_DESCRIPTION() macro
loop: add missing MODULE_DESCRIPTION() macro
ublk_drv: add missing MODULE_DESCRIPTION() macro
xen/blkback: add missing MODULE_DESCRIPTION() macro
block/rnbd: Constify struct kobj_type
block: take offset into account in blk_bvec_map_sg again
block: fix get_max_segment_size() warning
loop: Don't bother validating blocksize
virtio_blk: Don't bother validating blocksize
null_blk: Don't bother validating blocksize
block: Validate logical block size in blk_validate_limits()
virtio_blk: Fix default logical block size fallback
nvmet-auth: fix nvmet_auth hash error handling
nvme: implement ->get_unique_id
block: pass a phys_addr_t to get_max_segment_size
block: add a bvec_phys helper
blk-lib: check for kill signal in ioctl BLKZEROOUT
block: limit the Write Zeroes to manually writing zeroes fallback
block: refacto blkdev_issue_zeroout
block: move read-only and supported checks into (__)blkdev_issue_zeroout
...
|
|
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/block/loop.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240602-md-block-loop-v1-1-b9b7e2603e72@quicinc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The block queue limits validation does this for us now.
The loop_configure() -> WARN_ON_ONCE() call is dropped, as an invalid
block size would trigger this now. We don't want userspace to be able to
directly trigger WARNs.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20240708091651.177447-6-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove the inode variable now that the last user is gone.
Fixes: a17ece76bcfe ("loop: regularize upgrading the block size for direct I/O")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240705053114.2042976-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
QUEUE_FLAG_NOMERGES isn't really a driver interface, but a user tunable.
There also isn't any good reason to set it in the loop driver.
The original commit adding it (5b5e20f421c0b6d "block: loop: set
QUEUE_FLAG_NOMERGES for request queue of loop") claims that "It doesn't
make sense to enable merge because the I/O submitted to backing file is
handled page by page." which of course isn't true for multi-page bvec
now, and it never has been for direct I/O, for which commit 40326d8a33d
("block/loop: allow request merge for directio mode") alredy disabled
the nomerges flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240627124926.512662-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
1. Userspace sends the command "losetup -d" which uses the open() call
to open the device
2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the
function loop_clr_fd()
3. If LOOP_CLR_FD is the first command received at the time, then the
AUTOCLEAR flag is not set and deletion of the
loop device proceeds ahead and scans the partitions (drop/add
partitions)
if (disk_openers(lo->lo_disk) > 1) {
lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
loop_global_unlock(lo, true);
return 0;
}
4. Before scanning partitions, it will check to see if any partition of
the loop device is currently opened
5. If any partition is opened, then it will return EBUSY:
if (disk->open_partitions)
return -EBUSY;
6. So, after receiving the "LOOP_CLR_FD" command and just before the above
check for open_partitions, if any other command
(like blkid) opens any partition of the loop device, then the partition
scan will not proceed and EBUSY is returned as shown in above code
7. But in "__loop_clr_fd()", this EBUSY error is not propagated
8. We have noticed that this is causing the partitions of the loop to
remain stale even after the loop device is detached resulting in the
IO errors on the partitions
Fix:
Defer the detach of loop device to release function, which is called when
the last close happens, by setting the lo_flags to LO_FLAGS_AUTOCLEAR at
the time of detach i.e in loop_clr_fd() function.
Test case involves the following two scripts:
script1.sh:
while [ 1 ];
do
losetup -P -f /home/opt/looptest/test10.img
blkid /dev/loop0p1
done
script2.sh:
while [ 1 ];
do
losetup -d /dev/loop0
done
Without fix, the following IO errors have been observed:
kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16)
kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700
phys_seg 1 prio class 0
kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0
phys_seg 1 prio class 0
kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page
read
Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240618164042.343777-1-gulam.mohamed@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move the nonrot flag into the queue_limits feature field so that it can
be set atomically with the queue frozen.
Use the chance to switch to defaulting to non-rotational and require
the driver to opt into rotational, which matches the polarity of the
sysfs interface.
For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new
rotational flag is not set as they clearly are not rotational despite
this being a behavior change. There are some other drivers that
unconditionally set the rotational flag to keep the existing behavior
as they arguably can be used on rotational devices even if that is
probably not their main use today (e.g. virtio_blk and drbd).
The flag is automatically inherited in blk_stack_limits matching the
existing behavior in dm and md.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move the cache control settings into the queue_limits so that the flags
can be set atomically with the device queue frozen.
Add new features and flags field for the driver set flags, and internal
(usually sysfs-controlled) flags in the block layer. Note that we'll
eventually remove enough field from queue_limits to bring it back to the
previous size.
The disable flag is inverted compared to the previous meaning, which
means it now survives a rescan, similar to the max_sectors and
max_discard_sectors user limits.
The FLUSH and FUA flags are now inherited by blk_stack_limits, which
simplified the code in dm a lot, but also causes a slight behavior
change in that dm-switch and dm-unstripe now advertise a write cache
despite setting num_flush_bios to 0. The I/O path will handle this
gracefully, but as far as I can tell the lack of num_flush_bios
and thus flush support is a pre-existing data integrity bug in those
targets that really needs fixing, after which a non-zero num_flush_bios
should be required in dm for targets that map to underlying devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This prepares for moving the rotational flag into the queue_limits and
also fixes it for the case where the loop device is backed by a block
device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240617060532.127975-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix the code in loop_reconfigure_limits to pick a default block size for
O_DIRECT file descriptors to also work when the loop device sits on top
of a block device and not just on a regular file on a block device based
file system.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240617060532.127975-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The LOOP_CONFIGURE path automatically upgrades the block size to that
of the underlying file for O_DIRECT file descriptors, but the
LOOP_SET_BLOCK_SIZE path does not. Fix this by lifting the code to
pick the block size into common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240617060532.127975-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Simplify loop_reconfigure_limits by always updating the discard limits.
This adds a little more work to loop_set_block_size, but doesn't change
the outcome as the discard flag won't change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240617060532.127975-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
__loop_clr_fd wants to clear all settings on the device. Prepare for
moving more settings into the block limits by open coding
loop_reconfigure_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240617060532.127975-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If fallcate is implemented but zero and discard operations are not
supported by the filesystem the backing file is on we continue to fill
dmesg with errors from the blk_mq_end_request() since each time we call
fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate()
ends up propagated into the block layer. In the end syscall succeeds
since the blkdev_issue_zeroout() falls back to writing zeroes which
makes the errors even more misleading and confusing.
How to reproduce:
1. make sure /tmp is mounted as tmpfs
2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100
3. losetup /dev/loop0 /tmp/disk.img
4. mkfs.ext2 /dev/loop0
5. dmesg |tail
[710690.898214] operation not supported error, dev loop0, sector 204672 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.898279] operation not supported error, dev loop0, sector 522 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.898603] operation not supported error, dev loop0, sector 16906 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.898917] operation not supported error, dev loop0, sector 32774 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.899218] operation not supported error, dev loop0, sector 49674 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.899484] operation not supported error, dev loop0, sector 65542 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.899743] operation not supported error, dev loop0, sector 82442 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.900015] operation not supported error, dev loop0, sector 98310 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.900276] operation not supported error, dev loop0, sector 115210 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.900546] operation not supported error, dev loop0, sector 131078 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
This patch changes the lo_fallocate() to clear the flags for zero and
discard operations if we get EOPNOTSUPP from the backing file fallocate
callback, that way we at least stop spewing errors after the first
unsuccessful try.
CC: Jan Kara <jack@suse.cz>
Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240613163817.22640-1-chrubis@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|