summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)Author
2023-04-18zram: make zram_bio_discard more self-containedChristoph Hellwig
Derive the index and offset variables inside the function, and complete the bio directly in preparation for cleaning up the I/O path. Link: https://lkml.kernel.org/r/20230411171459.567614-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: remove valid_io_requestChristoph Hellwig
All bios hande to drivers from the block layer are checked against the device size and for logical block alignment already (and have been since long before zram was merged), so don't duplicate those checks. Link: https://lkml.kernel.org/r/20230411171459.567614-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: always compile read_from_bdev_syncChristoph Hellwig
Patch series "zram I/O path cleanups and fixups", v3. This series cleans up the zram I/O path, and fixes the handling of synchronous I/O to the underlying device in the writeback_store function or for > 4K PAGE_SIZE systems. The fixes are at the end, as I could not fully reason about them being safe before untangling the callchain. This patch (of 17): read_from_bdev_sync is currently only compiled for non-4k PAGE_SIZE, which means it won't be built with the most common configurations. Replace the ifdef with a check for the PAGE_SIZE in an if instead. The check uses an extra symbol and IS_ENABLED to allow the compiler to eliminate the dead code, leading to the same generated code size: text data bss dec hex filename 16709 1428 12 18149 46e5 drivers/block/zram/zram_drv.o.old 16709 1428 12 18149 46e5 drivers/block/zram/zram_drv.o.new Link: https://lkml.kernel.org/r/20230411171459.567614-1-hch@lst.de Link: https://lkml.kernel.org/r/20230411171459.567614-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: fix up permission for the hot_add sysfs fileGreg Kroah-Hartman
Commit 75a2d4226b53 ("driver core: class: mark the struct class for sysfs callbacks as constant") changed the attribute to use CLASS_ATTR_RO() which changed the permission from 0400 to 0444. But this atribute is "special" in that reading it modifies the system state, so it MUST be set to 0400 so that only root processes can muck around with it. Fix this all up, AND document this so that I don't change it again in 3-4 years when I stumble across it and wonder why it's an open-coded _ATTR() macro. Reported-by: Denis Efremov <efremov@linux.com> Fixes: 75a2d4226b53 ("driver core: class: mark the struct class for sysfs callbacks as constant") Link: https://lore.kernel.org/r/2023041810-angelic-conical-52d8@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13block: null_blk: make fault-injection dynamically configurable per deviceAkinobu Mita
The null_blk driver has multiple driver-specific fault injection mechanisms. Each fault injection configuration can only be specified by a module parameter and cannot be reconfigured without reloading the driver. Also, each configuration is common to all devices and is initialized every time a new device is added. This change adds the following subdirectories for each null_blk device. /sys/kernel/config/nullb/<disk>/timeout_inject /sys/kernel/config/nullb/<disk>/requeue_inject /sys/kernel/config/nullb/<disk>/init_hctx_fault_inject Each fault injection attribute can be dynamically set per device by a corresponding file in these directories. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Link: https://lore.kernel.org/r/20230327143733.14599-3-akinobu.mita@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-10Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Some last minute fixes - most of them for regressions" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa_sim_net: complete the initialization before register the device vdpa/mlx5: Add and remove debugfs in setup/teardown driver tools/virtio: fix typo in README instructions vhost-scsi: Fix crash during LUN unmapping vhost-scsi: Fix vhost_scsi struct use after free virtio-blk: fix ZBD probe in kernels without ZBD support virtio-blk: fix to match virtio spec
2023-04-06block: ublk: make sure that block size is set correctlyMing Lei
block size is one very key setting for block layer, and bad block size could panic kernel easily. Make sure that block size is set correctly. Meantime if ublk_validate_params() fails, clear ub->params so that disk is prevented from being added. Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Reported-and-tested-by: Breno Leitao <leitao@debian.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-05ublk: read any SQE values upfrontJens Axboe
Since SQE memory is shared with userspace, we should only be reading it once. We cannot read it multiple times, particularly when it's read once for validation and then read again for the actual use. ublk_ch_uring_cmd() is safe when called as a retry operation, as the memory backing is stable at that point. But for normal issue, we want to ensure that we only read ublksrv_io_cmd once. Wrap the function in a helper that reads the value into an on-stack copy of the struct. Cc: stable@vger.kernel.org # 6.0+ Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-05mm, treewide: redefine MAX_ORDER sanelyKirill A. Shutemov
MAX_ORDER currently defined as number of orders page allocator supports: user can ask buddy allocator for page order between 0 and MAX_ORDER-1. This definition is counter-intuitive and lead to number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders user can ask from buddy allocator is 0..MAX_ORDER now. [kirill@shutemov.name: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [akpm@linux-foundation.org: fix another min_t warning] [kirill@shutemov.name: fixups per Zi Yan] Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name [akpm@linux-foundation.org: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05floppy: fix MAX_ORDER usageKirill A. Shutemov
MAX_ORDER is not inclusive: the maximum allocation order buddy allocator can deliver is MAX_ORDER-1. Fix MAX_ORDER usage in floppy code. Also allocation buffer exactly PAGE_SIZE << MAX_ORDER bytes is okay. Fix MAX_LEN check. Link: https://lkml.kernel.org/r/20230315113133.11326-4-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Denis Efremov <efremov@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep()Uladzislau Rezki (Sony)
The kvfree_rcu() macro's single-argument form is deprecated. Therefore switch to the new kvfree_rcu_mightsleep() variant. The goal is to avoid accidental use of the single-argument forms, which can introduce functionality bugs in atomic contexts and latency bugs in non-atomic contexts. Cc: Jens Axboe <axboe@kernel.dk> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Begrudgingly-acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-04-04virtio-blk: fix ZBD probe in kernels without ZBD supportDmitry Fomichev
When the kernel is built without support for zoned block devices, virtio-blk probe needs to error out any host-managed device scans to prevent such devices from appearing in the system as non-zoned. The current virtio-blk code simply bypasses all ZBD checks if CONFIG_BLK_DEV_ZONED is not defined and this leads to host-managed block devices being presented as non-zoned in the OS. This is one of the main problems this patch series is aimed to fix. In this patch, make VIRTIO_BLK_F_ZONED feature defined even when CONFIG_BLK_DEV_ZONED is not. This change makes the code compliant with the voted revision of virtio-blk ZBD spec. Modify the probe code to look at the situation when VIRTIO_BLK_F_ZONED is negotiated in a kernel that is built without ZBD support. In this case, the code checks the zoned model of the device and fails the probe is the device is host-managed. The patch also adds the comment to clarify that the call to perform the zoned device probe is correctly placed after virtio_device ready(). Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Message-Id: <20230330214953.1088216-3-dmitry.fomichev@wdc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-04virtio-blk: fix to match virtio specDmitry Fomichev
The merged patch series to support zoned block devices in virtio-blk is not the most up to date version. The merged patch can be found at https://lore.kernel.org/linux-block/20221016034127.330942-3-dmitry.fomichev@wdc.com/ but the latest and reviewed version is https://lore.kernel.org/linux-block/20221110053952.3378990-3-dmitry.fomichev@wdc.com/ The reason is apparently that the correct mailing lists and maintainers were not copied. The differences between the two are mostly cleanups, but there is one change that is very important in terms of compatibility with the approved virtio-zbd specification. Before it was approved, the OASIS virtio spec had a change in VIRTIO_BLK_T_ZONE_APPEND request layout that is not reflected in the current virtio-blk driver code. In the running code, the status is the first byte of the in-header that is followed by some pad bytes and the u64 that carries the sector at which the data has been written to the zone back to the driver, aka the append sector. This layout turned out to be problematic for implementing in QEMU and the request status byte has been eventually made the last byte of the in-header. The current code doesn't expect that and this causes the append sector value always come as zero to the block layer. This needs to be fixed ASAP. Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Message-Id: <20230330214953.1088216-2-dmitry.fomichev@wdc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-03Merge 6.3-rc5 into driver-core-nextGreg Kroah-Hartman
We need the fixes in here for testing, as well as the driver core changes for documentation updates to build on. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-02block: ublk_drv: cleanup 'struct ublk_map_data'Ming Lei
'struct ublk_map_data' is passed to ublk_copy_user_pages() for copying data between userspace buffer and request pages. Here what matters is userspace buffer address/len and 'struct request', so replace ->io field with user buffer address, and rename max_bytes as len. Meantime remove 'ubq' field from ublk_map_data, since it isn't used any more. Then code becomes more readable. Reviewed-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02block: ublk_drv: clean up several helpersMing Lei
Convert the following pattern in several helpers if (Z) return true return false into: return Z; Reviewed-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02block: ublk_drv: add two helpers to clean up map/unmap requestMing Lei
Add two helpers for checking if map/unmap is needed, since we may have passthrough request which needs map or unmap in future, such as for supporting report zones. Meantime don't mark ublk_copy_user_pages as inline since this function is a bit fat now. Reviewed-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02block: ublk_drv: don't consider flush request in map/unmap ioMing Lei
There isn't data in request of REQ_OP_FLUSH always, so don't consider it in both ublk_map_io() and ublk_unmap_io(). Reviewed-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02block: ublk_drv: add common exit handlingMing Lei
Simplify exit handling a bit, and prepare for supporting fused command. Reviewed-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02pktcdvd: simplify the class_pktcdvd logicGreg Kroah-Hartman
There is no need to dynamically create and destroy the class_pktcdvd structure, just make it static and remove the memory allocation logic which simplifies and cleans up the logic a lot. Cc: linux-block@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20230331164724.319703-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-01drbd: Pass a peer device to the resync and online verify functionsChristoph Böhmwalder
Originally-from: Andreas Grünbacher <agruen@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-01drbd: pass drbd_peer_device to __req_modChristoph Böhmwalder
In preparation to support multiple connections, we need to know which one we need to modify the request state for. Originally-from: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-01drbd: drbd_uuid_compare: pass a peer_deviceChristoph Böhmwalder
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-01drbd: INFO_bm_xfer_stats(): Pass a peer device argumentAndreas Gruenbacher
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-01drbd: Add peer device parameter to whole-bitmap I/O handlersAndreas Gruenbacher
Pass a peer device parameter through the bitmap I/O functions to the I/O handlers. In after_state_ch(), set that parameter when queuing the drbd_send_bitmap operation so that this operation knows where to send the bitmap. Signed-off-by: Andreas Gruenbacher <agruen@kernel.org> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-01drbd: Rip out the ERR_IF_CNT_IS_NEGATIVE macroAndreas Gruenbacher
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-01null_blk: use kmap_local_page() and kunmap_local()Chaitanya Kulkarni
Replace the deprecated API kmap_atomic() and kunmap_atomic() with kmap_local_page() and kunmap_local() in null_flush_cache_page(). Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230330184926.64209-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-01null_blk: use non-deprecated lib functionsChaitanya Kulkarni
Use library helper memcpy_page() to copy source page into destination instead of having duplicate code in copy_to_nullb() & copy_from_nullb(). In copy_from_nullb() also replace the memset call with zero_user(). Use library helper memset_page() to set the buffer to 0xFF instead of having duplicate code. This also removes deprecated API kmap_atomic() from copy_to_nullb() copy_from_nullb() and nullb_fill_pattern(), from :include/linux/highmem.h: "kmap_atomic - Atomically map a page for temporary usage - Deprecated!" Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230330184926.64209-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-29driver core: class: mark the struct class for sysfs callbacks as constantGreg Kroah-Hartman
struct class should never be modified in a sysfs callback as there is nothing in the structure to modify, and frankly, the structure is almost never used in a sysfs callback, so mark it as constant to allow struct class to be moved to read-only memory. While we are touching all class sysfs callbacks also mark the attribute as constant as it can not be modified. The bonding code still uses this structure so it can not be removed from the function callbacks. Cc: "David S. Miller" <davem@davemloft.net> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Bartosz Golaszewski <brgl@bgdev.pl> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Russ Weight <russell.h.weight@intel.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steve French <sfrench@samba.org> Cc: Vignesh Raghavendra <vigneshr@ti.com> Cc: linux-cifs@vger.kernel.org Cc: linux-gpio@vger.kernel.org Cc: linux-mtd@lists.infradead.org Cc: linux-rdma@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: netdev@vger.kernel.org Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20230325084537.3622280-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-27loop: LOOP_CONFIGURE: send uevents for partitionsAlyssa Ross
LOOP_CONFIGURE is, as far as I understand it, supposed to be a way to combine LOOP_SET_FD and LOOP_SET_STATUS64 into a single syscall. When using LOOP_SET_FD+LOOP_SET_STATUS64, a single uevent would be sent for each partition found on the loop device after the second ioctl(), but when using LOOP_CONFIGURE, no such uevent was being sent. In the old setup, uevents are disabled for LOOP_SET_FD, but not for LOOP_SET_STATUS64. This makes sense, as it prevents uevents being sent for a partially configured device during LOOP_SET_FD - they're only sent at the end of LOOP_SET_STATUS64. But for LOOP_CONFIGURE, uevents were disabled for the entire operation, so that final notification was never issued. To fix this, reduce the critical section to exclude the loop_reread_partitions() call, which causes the uevents to be issued, to after uevents are re-enabled, matching the behaviour of the LOOP_SET_FD+LOOP_SET_STATUS64 combination. I noticed this because Busybox's losetup program recently changed from using LOOP_SET_FD+LOOP_SET_STATUS64 to LOOP_CONFIGURE, and this broke my setup, for which I want a notification from the kernel any time a new partition becomes available. Signed-off-by: Alyssa Ross <hi@alyssa.is> [hch: reduced the critical section] Signed-off-by: Christoph Hellwig <hch@lst.de> Fixes: 3448914e8cc5 ("loop: Add LOOP_CONFIGURE ioctl") Link: https://lore.kernel.org/r/20230320125430.55367-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-23driver core: bus: mark the struct bus_type for sysfs callbacks as constantGreg Kroah-Hartman
struct bus_type should never be modified in a sysfs callback as there is nothing in the structure to modify, and frankly, the structure is almost never used in a sysfs callback, so mark it as constant to allow struct bus_type to be moved to read-only memory. Cc: "David S. Miller" <davem@davemloft.net> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Dexuan Cui <decui@microsoft.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hu Haowen <src.res@email.cn> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Stuart Yoder <stuyoder@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Acked-by: Ilya Dryomov <idryomov@gmail.com> # rbd Acked-by: Ira Weiny <ira.weiny@intel.com> # cxl Reviewed-by: Alex Shi <alexs@kernel.org> Acked-by: Iwona Winiarska <iwona.winiarska@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi Link: https://lore.kernel.org/r/20230313182918.1312597-23-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-20block/io_uring: pass in issue_flags for uring_cmd task_work handlingJens Axboe
io_uring_cmd_done() currently assumes that the uring_lock is held when invoked, and while it generally is, this is not guaranteed. Pass in the issue_flags associated with it, so that we have IO_URING_F_UNLOCKED available to be able to lock the CQ ring appropriately when completing events. Cc: stable@vger.kernel.org Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-18block: ublk_drv: mark device as LIVE before adding diskMing Lei
IO can be started before add_disk() returns, such as reading parititon table, then the monitor work should work for making forward progress. So mark device as LIVE before adding disk, meantime change to DEAD if add_disk() fails. Fixed: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Reviewed-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230318141231.55562-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-17driver core: class: remove module * from class_create()Greg Kroah-Hartman
The module pointer in class_create() never actually did anything, and it shouldn't have been requred to be set as a parameter even if it did something. So just remove it and fix up all callers of the function in the kernel tree at the same time. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17drivers: remove struct module * setting from struct classGreg Kroah-Hartman
There is no need to manually set the owner of a struct class, as the registering function does it automatically, so remove all of the explicit settings from various drivers that did so as it is unneeded. This allows us to remove this pointer entirely from this structure going forward. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230313181843.1207845-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-15block: sunvdc: add check for mdesc_grab() returning NULLLiang He
In vdc_port_probe(), we should check the return value of mdesc_grab() as it may return NULL, which can cause potential NPD bug. Fixes: 43fdf27470b2 ("[SPARC64]: Abstract out mdesc accesses for better MD update handling.") Signed-off-by: Liang He <windhl@126.com> Link: https://lore.kernel.org/r/20230315062032.1741692-1-windhl@126.com [axboe: style cleanup] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-15block: null_blk: cleanup null_queue_rq()Damien Le Moal
Use a local struct request pointer variable to avoid having to dereference struct blk_mq_queue_data multiple times. While at it, also fix the function argument indentation and remove a useless "else" after a return. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20230314041106.19173-2-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-15block: null_blk: Fix handling of fake timeout requestDamien Le Moal
When injecting a fake timeout into the null_blk driver using fail_io_timeout, the request timeout handler does not execute blk_mq_complete_request(), so the complete callback is never executed for a timedout request. The null_blk driver also has a driver-specific fake timeout mechanism which does not have this problem. Fix the problem with fail_io_timeout by using the same meachanism as null_blk internal timeout feature, using the fake_timeout field of null_blk commands. Reported-by: Akinobu Mita <akinobu.mita@gmail.com> Fixes: de3510e52b0a ("null_blk: fix command timeout completion handling") Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230314041106.19173-2-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-14loop: Fix use-after-free issuesBart Van Assche
do_req_filebacked() calls blk_mq_complete_request() synchronously or asynchronously when using asynchronous I/O unless memory allocation fails. Hence, modify loop_handle_cmd() such that it does not dereference 'cmd' nor 'rq' after do_req_filebacked() finished unless we are sure that the request has not yet been completed. This patch fixes the following kernel crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000054 Call trace: css_put.42938+0x1c/0x1ac loop_process_work+0xc8c/0xfd4 loop_rootcg_workfn+0x24/0x34 process_one_work+0x244/0x558 worker_thread+0x400/0x8fc kthread+0x16c/0x1e0 ret_from_fork+0x10/0x20 Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Schatzberg <schatzberg.dan@gmail.com> Fixes: c74d40e8b5e2 ("loop: charge i/o to mem and blk cg") Fixes: bc07c10a3603 ("block: loop: support DIO & AIO") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230314182155.80625-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-14nbd: use the structured req attr checkJakub Kicinski
Use the macro for checking presence of required attributes. It has the advantage of reporting to the user which attr was missing in a machine-readable format (extack). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230224021301.1630703-2-kuba@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-14nbd: allow genl access outside init_netJakub Kicinski
NBD doesn't have much to do with networking, allow users outside init_net to access the family. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230224021301.1630703-1-kuba@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-10pktcdvd: Remove CONFIG_CDROM_PKTCDVD_WCACHE from uapi headerThomas Huth
CONFIG_* switches should not be exposed in uapi headers, thus let's get rid of the USE_WCACHING macro here (which was also named way to generic) and integrate the logic directly in the only function that needs it. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-03-03Merge tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - Don't access released socket during error recovery (Akinobu Mita) - Bring back auto-removal of deleted namespaces during sequential scan (Christoph Hellwig) - Fix an error code in nvme_auth_process_dhchap_challenge (Dan Carpenter) - Show well known discovery name (Daniel Wagner) - Add a missing endianess conversion in effects masking (Keith Busch) - Fix for a regression introduced in blk-rq-qos during init in this merge window (Breno) - Reorder a few fields in struct blk_mq_tag_set, eliminating a few holes and shrinking it (Christophe) - Remove redundant bdev_get_queue() NULL checks (Juhyung) - Add sed-opal single user mode support flag (Luca) - Remove SQE128 check in ublk as it isn't needed, saving some memory (Ming) - Op specific segment checking for cloned requests (Uday) - Exclusive open partition scan fixes (Yu) - Loop offset/size checking before assigning them in the device (Zhong) - Bio polling fixes (me) * tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux: blk-mq: enforce op-specific segment limits in blk_insert_cloned_request nvme-fabrics: show well known discovery name nvme-tcp: don't access released socket during error recovery nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge() nvme: bring back auto-removal of deleted namespaces during sequential scan blk-iocost: Pass gendisk to ioc_refresh_params nvme: fix sparse warning on effects masking block: be a bit more careful in checking for NULL bdev while polling block: clear bio->bi_bdev when putting a bio back in the cache loop: loop_set_status_from_info() check before assignment ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd block: remove more NULL checks after bdev_get_queue() blk-mq: Reorder fields in 'struct blk_mq_tag_set' block: fix scan partition for exclusively open device again block: Revert "block: Do not reread partition table on exclusively open device" sed-opal: add support flag for SUM in status ioctl
2023-03-02Merge tag 'ceph-for-6.3-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "Two small fixes from Xiubo and myself, marked for stable" * tag 'ceph-for-6.3-rc1' of https://github.com/ceph/ceph-client: rbd: avoid use-after-free in do_rbd_add() when rbd_dev_create() fails ceph: update the time stamps and try to drop the suid/sgid
2023-02-26rbd: avoid use-after-free in do_rbd_add() when rbd_dev_create() failsIlya Dryomov
If getting an ID or setting up a work queue in rbd_dev_create() fails, use-after-free on rbd_dev->rbd_client, rbd_dev->spec and rbd_dev->opts is triggered in do_rbd_add(). The root cause is that the ownership of these structures is transfered to rbd_dev prematurely and they all end up getting freed when rbd_dev_create() calls rbd_dev_free() prior to returning to do_rbd_add(). Found by Linux Verification Center (linuxtesting.org) with SVACE, an incomplete patch submitted by Natalia Petrova <n.petrova@fintech.ru>. Cc: stable@vger.kernel.org Fixes: 1643dfa4c2c8 ("rbd: introduce a per-device ordered workqueue") Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-02-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - device feature provisioning in ifcvf, mlx5 - new SolidNET driver - support for zoned block device in virtio blk - numa support in virtio pmem - VIRTIO_F_RING_RESET support in vhost-net - more debugfs entries in mlx5 - resume support in vdpa - completion batching in virtio blk - cleanup of dma api use in vdpa - now simulating more features in vdpa-sim - documentation, features, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits) vdpa/mlx5: support device features provisioning vdpa/mlx5: make MTU/STATUS presence conditional on feature bits vdpa: validate device feature provisioning against supported class vdpa: validate provisioned device features against specified attribute vdpa: conditionally read STATUS in config space vdpa: fix improper error message when adding vdpa dev vdpa/mlx5: Initialize CVQ iotlb spinlock vdpa/mlx5: Don't clear mr struct on destroy MR vdpa/mlx5: Directly assign memory key tools/virtio: enable to build with retpoline vringh: fix a typo in comments for vringh_kiov vhost-vdpa: print warning when vhost_vdpa_alloc_domain fails scsi: virtio_scsi: fix handling of kmalloc failure vdpa: Fix a couple of spelling mistakes in some messages vhost-net: support VIRTIO_F_RING_RESET vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit vdpa: mlx5: support per virtqueue dma device vdpa: set dma mask for vDPA device virtio-vdpa: support per vq dma device vdpa: introduce get_vq_dma_device() ...
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-22loop: loop_set_status_from_info() check before assignmentZhong Jinghua
In loop_set_status_from_info(), lo->lo_offset and lo->lo_sizelimit should be checked before reassignment, because if an overflow error occurs, the original correct value will be changed to the wrong value, and it will not be changed back. More, the original patch did not solve the problem, the value was set and ioctl returned an error, but the subsequent io used the value in the loop driver, which still caused an alarm: loop_handle_cmd do_req_filebacked loff_t pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset; lo_rw_aio cmd->iocb.ki_pos = pos Fixes: c490a0b5a4f3 ("loop: Check for overflow while configuring loop") Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230221095027.3656193-1-zhongjinghua@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-22Merge tag 'ata-6.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ATA updates from Damien Le Moal: - Small cleanup of the pata_octeon driver to drop a useless platform callback (Uwe) - Simplify ata_scsi_cmd_error_handler() code using the fact that ap->ops->error_handler is NULL most of the time (Wenchao) - Several patches improving libata error handling. This is in preparation for supporting the command duration limits (CDL) feature. The changes allow handling corner cases of ATA NCQ errors which do not happen with regular drives but will be triggered with CDL drives (Niklas) - Simplify the qc_fill_rtf operation (me) - Improve SCSI command translation for REPORT_SUPPORTED_OPERATION_CODES command (me) - Cleanup of libata FUA handling. This falls short of enabling FUA for ATA drives that support it by default as there were concerns that old drives would break. The series however fixes several issues with the FUA support to ensure that FUA is reported as being supported only for drives that can handle all possible write cases (NCQ and non-NCQ). A check in the block layer is also added to ensure that we never see read FUA commands (current behavior) (me) - Several patches to move the old PARIDE (parallel port IDE) driver to libata as pata_parport. Given that this driver also needs protocol modules, the driver code resides in its own pata_parport directoy under drivers/ata (Ondrej) * tag 'ata-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: pata_parport: Fix ida_alloc return value error check drivers/block: Move PARIDE protocol modules to drivers/ata/pata_parport drivers/block: Remove PARIDE core and high-level protocols ata: pata_parport: add driver (PARIDE replacement) ata: libata: exclude FUA support for known buggy drives ata: libata: Fix FUA handling in ata_build_rw_tf() ata: libata: cleanup fua support detection ata: libata: Rename and cleanup ata_rwcmd_protocol() ata: libata: Introduce ata_ncq_supported() block: add a sanity check for non-write flush/fua bios ata: libata-scsi: improve ata_scsiop_maint_in() ata: libata-scsi: do not overwrite SCSI ML and status bytes ata: libata: move NCQ related ATA_DFLAGs ata: libata: respect successfully completed commands during errors ata: libata: read the shared status for successful NCQ commands once ata: libata: simplify qc_fill_rtf port operation interface ata: scsi: rename flag ATA_QCFLAG_FAILED to ATA_QCFLAG_EH ata: libata-eh: Cleanup ata_scsi_cmd_error_handler() ata: octeon: Drop empty platform remove function
2023-02-21ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmdMing Lei
sizeof(struct ublksrv_io_cmd) is 16bytes, which can be held in 64byte SQE, so not necessary to check IO_URING_F_SQE128. With this change, we get chance to save half SQ ring memory. Fixed: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230220041413.1524335-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>