summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-09-04Merge branch 'rework/misc-cleanups' into for-linusPetr Mladek
2023-09-04Merge branch 'for-6.6-vsprintf-doc' into for-linusPetr Mladek
2023-09-03virtio_ring: fix avail_wrap_counter in virtqueue_add_packedYuan Yao
In current packed virtqueue implementation, the avail_wrap_counter won't flip, in the case when the driver supplies a descriptor chain with a length equals to the queue size; total_sg == vq->packed.vring.num. Let’s assume the following situation: vq->packed.vring.num=4 vq->packed.next_avail_idx: 1 vq->packed.avail_wrap_counter: 0 Then the driver adds a descriptor chain containing 4 descriptors. We expect the following result with avail_wrap_counter flipped: vq->packed.next_avail_idx: 1 vq->packed.avail_wrap_counter: 1 But, the current implementation gives the following result: vq->packed.next_avail_idx: 1 vq->packed.avail_wrap_counter: 0 To reproduce the bug, you can set a packed queue size as small as possible, so that the driver is more likely to provide a descriptor chain with a length equal to the packed queue size. For example, in qemu run following commands: sudo qemu-system-x86_64 \ -enable-kvm \ -nographic \ -kernel "path/to/kernel_image" \ -m 1G \ -drive file="path/to/rootfs",if=none,id=disk \ -device virtio-blk,drive=disk \ -drive file="path/to/disk_image",if=none,id=rwdisk \ -device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\ indirect_desc=off \ -append "console=ttyS0 root=/dev/vda rw init=/bin/bash" Inside the VM, create a directory and mount the rwdisk device on it. The rwdisk will hang and mount operation will not complete. This commit fixes the wrap counter error by flipping the packed.avail_wrap_counter, when start of descriptor chain equals to the end of descriptor chain (head == i). Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support") Signed-off-by: Yuan Yao <yuanyaogoog@chromium.org> Message-Id: <20230808051110.3492693-1-yuanyaogoog@chromium.org> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_vdpa: build affinity masks conditionallyJason Wang
We try to build affinity mask via create_affinity_masks() unconditionally which may lead several issues: - the affinity mask is not used for parent without affinity support (only VDUSE support the affinity now) - the logic of create_affinity_masks() might not work for devices other than block. For example it's not rare in the networking device where the number of queues could exceed the number of CPUs. Such case breaks the current affinity logic which is based on group_cpus_evenly() who assumes the number of CPUs are not less than the number of groups. This can trigger a warning[1]: if (ret >= 0) WARN_ON(nr_present + nr_others < numgrps); Fixing this by only build the affinity masks only when - Driver passes affinity descriptor, driver like virtio-blk can make sure to limit the number of queues when it exceeds the number of CPUs - Parent support affinity setting config ops This help to avoid the warning. More optimizations could be done on top. [1] [ 682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 group_cpus_evenly+0x1aa/0x1c0 [ 682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79 [ 682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [ 682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0 [ 682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 1b c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb c2 <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc [ 682.146679] RSP: 0018:ffffc9000215f498 EFLAGS: 00010293 [ 682.146682] RAX: 000000000001f1e0 RBX: 0000000000000041 RCX: 0000000000000000 [ 682.146684] RDX: ffff888109922058 RSI: 0000000000000041 RDI: 0000000000000030 [ 682.146686] RBP: ffff888109922058 R08: ffffc9000215f498 R09: ffffc9000215f4a0 [ 682.146687] R10: 00000000000198d0 R11: 0000000000000030 R12: ffff888107e02800 [ 682.146689] R13: 0000000000000030 R14: 0000000000000030 R15: 0000000000000041 [ 682.146692] FS: 00007fef52315740(0000) GS:ffff888237380000(0000) knlGS:0000000000000000 [ 682.146695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 682.146696] CR2: 00007fef52509000 CR3: 0000000110dbc004 CR4: 0000000000370ee0 [ 682.146698] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 682.146700] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 682.146701] Call Trace: [ 682.146703] <TASK> [ 682.146705] ? __warn+0x7b/0x130 [ 682.146709] ? group_cpus_evenly+0x1aa/0x1c0 [ 682.146712] ? report_bug+0x1c8/0x1e0 [ 682.146717] ? handle_bug+0x3c/0x70 [ 682.146721] ? exc_invalid_op+0x14/0x70 [ 682.146723] ? asm_exc_invalid_op+0x16/0x20 [ 682.146727] ? group_cpus_evenly+0x1aa/0x1c0 [ 682.146729] ? group_cpus_evenly+0x15c/0x1c0 [ 682.146731] create_affinity_masks+0xaf/0x1a0 [ 682.146735] virtio_vdpa_find_vqs+0x83/0x1d0 [ 682.146738] ? __pfx_default_calc_sets+0x10/0x10 [ 682.146742] virtnet_find_vqs+0x1f0/0x370 [ 682.146747] virtnet_probe+0x501/0xcd0 [ 682.146749] ? vp_modern_get_status+0x12/0x20 [ 682.146751] ? get_cap_addr.isra.0+0x10/0xc0 [ 682.146754] virtio_dev_probe+0x1af/0x260 [ 682.146759] really_probe+0x1a5/0x410 Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity spreading mechanism") Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230811091539.1359865-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_net: merge dma operations when filling mergeable buffersXuan Zhuo
Currently, the virtio core will perform a dma operation for each buffer. Although, the same page may be operated multiple times. This patch, the driver does the dma operation and manages the dma address based the feature premapped of virtio core. This way, we can perform only one dma operation for the pages of the alloc frag. This is beneficial for the iommu device. kernel command line: intel_iommu=on iommu.passthrough=0 | strict=0 | strict=1 Before | 775496pps | 428614pps After | 1109316pps | 742853pps Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-13-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: introduce dma sync api for virtqueueXuan Zhuo
These API has been introduced: * virtqueue_dma_need_sync * virtqueue_dma_sync_single_range_for_cpu * virtqueue_dma_sync_single_range_for_device These APIs can be used together with the premapped mechanism to sync the DMA address. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-12-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: introduce dma map api for virtqueueXuan Zhuo
Added virtqueue_dma_map_api* to map DMA addresses for virtual memory in advance. The purpose is to keep memory mapped across multiple add/get buf operations. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-11-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: introduce virtqueue_reset()Xuan Zhuo
Introduce virtqueue_reset() to release all buffer inside vq. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230810123057.43407-10-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: separate the logic of reset/enable from virtqueue_resizeXuan Zhuo
The subsequent reset function will reuse these logic. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230810123057.43407-9-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: correct the expression of the description of virtqueue_resize()Xuan Zhuo
Modify the "useless" to a more accurate "unused". Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230810123057.43407-8-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: skip unmap for premappedXuan Zhuo
Now we add a case where we skip dma unmap, the vq->premapped is true. We can't just rely on use_dma_api to determine whether to skip the dma operation. For convenience, I introduced the "do_unmap". By default, it is the same as use_dma_api. If the driver is configured with premapped, then do_unmap is false. So as long as do_unmap is false, for addr of desc, we should skip dma unmap operation. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-7-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: introduce virtqueue_dma_dev()Xuan Zhuo
Added virtqueue_dma_dev() to get DMA device for virtio. Then the caller can do dma operation in advance. The purpose is to keep memory mapped across multiple add/get buf operations. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230810123057.43407-6-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: support add premapped bufXuan Zhuo
If the vq is the premapped mode, use the sg_dma_address() directly. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-5-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: introduce virtqueue_set_dma_premapped()Xuan Zhuo
This helper allows the driver change the dma mode to premapped mode. Under the premapped mode, the virtio core do not do dma mapping internally. This just work when the use_dma_api is true. If the use_dma_api is false, the dma options is not through the DMA APIs, that is not the standard way of the linux kernel. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-4-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: put mapping error check in vring_map_one_sgXuan Zhuo
This patch put the dma addr error check in vring_map_one_sg(). The benefits of doing this: 1. reduce one judgment of vq->use_dma_api. 2. make vring_map_one_sg more simple, without calling vring_mapping_error to check the return value. simplifies subsequent code Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230810123057.43407-3-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: check use_dma_api before unmap desc for indirectXuan Zhuo
Inside detach_buf_split(), if use_dma_api is false, vring_unmap_one_split_indirect will be called many times, but actually nothing is done. So this patch check use_dma_api firstly. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230810123057.43407-2-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OKEugenio Pérez
Start offering the feature in the simulator. Other parent drivers can follow this code to offer it too. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230609092127.170673-5-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03vdpa: add get_backend_features vdpa operationEugenio Pérez
This operation allow vdpa parent to expose its own backend feature bits. Next patches introduce a feature not compatible with all parent drivers: the ability to enable vq after driver_ok. Each parent must declare if it allows it or not. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230609092127.170673-4-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend featureEugenio Pérez
Accepting VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature if userland sets it. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230609092127.170673-3-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flagEugenio Pérez
This feature flag allows the driver enabling virtqueues both before and after DRIVER_OK. This is needed for software assisted live migration, so userland can restore the device status in devices with control virtqueue before the dataplane is enabled. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230609092127.170673-2-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03vdpa/mlx5: Remove unused function declarationsYue Haibing
Commit 29064bfdabd5 ("vdpa/mlx5: Add support library for mlx5 VDPA implementation") declared but never implemented these. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Message-Id: <20230803143041.23388-1-yuehaibing@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03Merge tag 'dmaengine-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "New controller support and updates to drivers. New support: - Qualcomm SM6115 and QCM2290 dmaengine support - at_xdma support for microchip,sam9x7 controller Updates: - idxd updates for wq simplification and ats knob updates - fsl edma updates for v3 support - Xilinx AXI4-Stream control support - Yaml conversion for bcm dma binding" * tag 'dmaengine-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (53 commits) dmaengine: fsl-edma: integrate v3 support dt-bindings: fsl-dma: fsl-edma: add edma3 compatible string dmaengine: fsl-edma: move tcd into struct fsl_dma_chan dmaengine: fsl-edma: refactor chan_name setup and safety dmaengine: fsl-edma: move clearing of register interrupt into setup_irq function dmaengine: fsl-edma: refactor using devm_clk_get_enabled dmaengine: fsl-edma: simply ATTR_DSIZE and ATTR_SSIZE by using ffs() dmaengine: fsl-edma: move common IRQ handler to common.c dmaengine: fsl-edma: Remove enum edma_version dmaengine: fsl-edma: transition from bool fields to bitmask flags in drvdata dmaengine: fsl-edma: clean up EXPORT_SYMBOL_GPL in fsl-edma-common.c dmaengine: fsl-edma: fix build error when arch is s390 dmaengine: idxd: Fix issues with PRS disable sysfs knob dmaengine: idxd: Allow ATS disable update only for configurable devices dmaengine: xilinx_dma: Program interrupt delay timeout dmaengine: xilinx_dma: Use tasklet_hi_schedule for timing critical usecase dmaengine: xilinx_dma: Freeup active list based on descriptor completion bit dmaengine: xilinx_dma: Increase AXI DMA transaction segment count dmaengine: xilinx_dma: Pass AXI4-Stream control words to dma client dt-bindings: dmaengine: xilinx_dma: Add xlnx,irq-delay property ...
2023-09-03Merge tag 'phy-for-6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Pull phy updates from Vinod Koul: "As usual a couple of new drivers, a bunch of new device support and few updates to existing drivers New Support: - Starfive dphy rx, JH7110 usb and pcie support - Rockchip rv1126 inno-dsi phy, rk3588 usb and pcie support - Qualcomm sa8775p PCIe support, M31 USB PHY driver - Samsung Exynos850 usb support Updates: - Mediatek dsi driver clock updates - Qualcomm sm8150 combo phy with reworking of qmp pcie driver - Xilinx zynqmp runtime PM support" * tag 'phy-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (83 commits) phy: exynos5-usbdrd: Add Exynos850 support phy: exynos5-usbdrd: Add 26MHz ref clk support phy: exynos5-usbdrd: Make it possible to pass custom phy ops dt-bindings: phy: samsung,usb3-drd-phy: Add Exynos850 support phy: qcom-qmp-combo: fix clock probing phy: qcom-qmp-pcie: support SM8150 PCIe QMP PHYs phy: qcom-qmp-pcie: populate offsets configuration phy: qcom-qmp-pcie: simplify clock handling phy: qcom-qmp-pcie: keep offset tables sorted phy: qcom-qmp-pcie: drop ln_shrd from v5_20 config dt-bindings: phy: qcom,qmp-pcie: describe SM8150 PCIe PHYs dt-bindings: phy: migrate QMP PCIe PHY bindings to qcom,sc8280xp-qmp-pcie-phy.yaml phy: fsl-imx8mq-usb: add dev_err_probe if getting vbus failed phy: qcom: Introduce M31 USB PHY driver dt-bindings: phy: qcom,m31: Document qcom,m31 USB phy phy: rockchip: inno-dsidphy: Add rv1126 support dt-bindings: phy: rockchip-inno-dsidphy: Document rv1126 dt-bindings: phy: mediatek,tphy: allow simple nodename pattern phy: amlogic: meson-g12a-usb2: fix Wvoid-pointer-to-enum-cast warning phy: marvell pxa-usb: fix Wvoid-pointer-to-enum-cast warning ...
2023-09-03Merge tag 'soundwire-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire updates from Vinod Koul: "Device numbering and intel driver changes are main features: - Core support for soundwire device number allocation - intel driver updates for adding hw_params for DAI ops, hybrid number allocation and power managemnt callback updates - DT header include changes for subsystem" * tag 'soundwire-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: soundwire: intel_ace2x: add DAI hw_params/prepare/hw_free callbacks soundwire: intel_auxdevice: add hybrid IDA-based device_number allocation soundwire: bus: add callbacks for device_number allocation soundwire: extend parameters of new_peripheral_assigned() callback soundWire: intel_auxdevice: resume 'sdw-master' on startup and system resume soundwire: intel_auxdevice: enable pm_runtime earlier on startup soundwire: Explicitly include correct DT includes
2023-09-04kbuild: Show marked Kconfig fragments in "help"Kees Cook
Currently the Kconfig fragments in kernel/configs and arch/*/configs that aren't used internally aren't discoverable through "make help", which consists of hard-coded lists of config fragments. Instead, list all the fragment targets that have a "# Help: " comment prefix so the targets can be generated dynamically. Add logic to the Makefile to search for and display the fragment and comment. Add comments to fragments that are intended to be direct targets. Signed-off-by: Kees Cook <keescook@chromium.org> Co-developed-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-09-03Merge tag 'mtd/for-6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull MTD updates from Miquel Raynal: "Core MTD changes: - Use refcount to prevent corruption - Call external _get and _put in right order - Fix use-after-free in mtd release - Explicitly include correct DT includes - Clean refcounting with MTD_PARTITIONED_MASTER - mtdblock: make warning messages ratelimited - dt-bindings: Add SEAMA partition bindings Device driver changes: - Use devm helper functions - Fix questionable cast, remove pointless ones. - error handling fixes - add support for new chip versions - update DT bindings - misc cleanups - fix typos, whitespace, indentation" * tag 'mtd/for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (105 commits) dt-bindings: mtd: amlogic,meson-nand: drop unneeded quotes mtd: spear_smi: Use helper function devm_clk_get_enabled() mtd: rawnand: orion: Use helper function devm_clk_get_optional_enabled() mtd: rawnand: vf610_nfc: Use helper function devm_clk_get_enabled() mtd: rawnand: sunxi: Use helper function devm_clk_get_enabled() mtd: rawnand: stm32_fmc2: Use helper function devm_clk_get_enabled() mtd: rawnand: mtk: Use helper function devm_clk_get_enabled() mtd: rawnand: mpc5121: Use helper function devm_clk_get_enabled() mtd: rawnand: lpc32xx_slc: Use helper function devm_clk_get_enabled() mtd: rawnand: intel: Use helper function devm_clk_get_enabled() mtd: rawnand: fsmc: Use helper function devm_clk_get_enabled() mtd: rawnand: arasan: Use helper function devm_clk_get_enabled() mtd: rawnand: qcom: Add read/read_start ops in exec_op path mtd: rawnand: qcom: Clear buf_count and buf_start in raw read mtd: maps: fix -Wvoid-pointer-to-enum-cast warning mtd: rawnand: fix -Wvoid-pointer-to-enum-cast warning mtd: rawnand: fsmc: handle clk prepare error in fsmc_nand_resume() mtd: rawnand: Propagate error and simplify ternary operators for brcmstb_nand_wait_for_completion() mtd: rawnand: qcom: Sort includes alphabetically mtd: rawnand: qcom: Do not override the error no of submit_descs() ...
2023-09-02Merge tag 'f2fs-for-6-6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, we don't have a highlighted feature enhancement, but mostly have fixed issues mainly in two parts: 1) zoned block device, and 2) compression support. For zoned block device, we've tried to improve the power-off recovery flow as much as possible. For compression, we found some corner cases caused by wrong compression policy and logics. Other than them, there were some reverts and stat corrections. Bug fixes: - use finish zone command when closing a zone - check zone type before sending async reset zone command - fix to assign compress_level for lz4 correctly - fix error path of f2fs_submit_page_read() - don't {,de}compress non-full cluster - send small discard commands during checkpoint back - flush inode if atomic file is aborted - correct to account gc/cp stats And, there are minor bug fixes, avoiding false lockdep warning, and clean-ups" * tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (25 commits) f2fs: use finish zone command when closing a zone f2fs: compress: fix to assign compress_level for lz4 correctly f2fs: fix error path of f2fs_submit_page_read() f2fs: clean up error handling in sanity_check_{compress_,}inode() f2fs: avoid false alarm of circular locking Revert "f2fs: do not issue small discard commands during checkpoint" f2fs: doc: fix description of max_small_discards f2fs: should update REQ_TIME for direct write f2fs: fix to account cp stats correctly f2fs: fix to account gc stats correctly f2fs: remove unneeded check condition in __f2fs_setxattr() f2fs: fix to update i_ctime in __f2fs_setxattr() Revert "f2fs: fix to do sanity check on extent cache correctly" f2fs: increase usage of folio_next_index() helper f2fs: Only lfs mode is allowed with zoned block device feature f2fs: check zone type before sending async reset zone command f2fs: compress: don't {,de}compress non-full cluster f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted f2fs: don't reopen the main block device in f2fs_scan_devices f2fs: fix to avoid mmap vs set_compress_option case ...
2023-09-02mm/kmemleak: move up cond_resched() call in page scanning loopWaiman Long
Commit bde5f6bc68db ("kmemleak: add scheduling point to kmemleak_scan()") added a cond_resched() call to the struct page scanning loop to prevent soft lockup from happening. However, soft lockup can still happen in that loop in some corner cases when the pages that satisfy the "!(pfn & 63)" check are skipped for some reasons. Fix this corner case by moving up the cond_resched() check so that it will be called every 64 pages unconditionally. Link: https://lkml.kernel.org/r/20230825164947.1317981-1-longman@redhat.com Fixes: bde5f6bc68db ("kmemleak: add scheduling point to kmemleak_scan()") Signed-off-by: Waiman Long <longman@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Yisheng Xie <xieyisheng1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-02mm: page_alloc: remove stale CMA guard codeJohannes Weiner
In the past, movable allocations could be disallowed from CMA through PF_MEMALLOC_PIN. As CMA pages are funneled through the MOVABLE pcplist, this required filtering that cornercase during allocations, such that pinnable allocations wouldn't accidentally get a CMA page. However, since 8e3560d963d2 ("mm: honor PF_MEMALLOC_PIN for all movable pages"), PF_MEMALLOC_PIN automatically excludes __GFP_MOVABLE. Once again, MOVABLE implies CMA is allowed. Remove the stale filtering code. Also remove a stale comment that was introduced as part of the filtering code, because the filtering let order-0 pages fall through to the buddy allocator. See 1d91df85f399 ("mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs") for context. The comment's been obsolete since the introduction of the explicit ALLOC_HIGHATOMIC flag in eb2e2b425c69 ("mm/page_alloc: explicitly record high-order atomic allocations in alloc_flags"). Link: https://lkml.kernel.org/r/20230824153821.243148-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: David Hildenbrand <david@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-02MAINTAINERS: add rmap.h to mm entryBaruch Siach
Make it easier to figure out where to send patches for this file. Link: https://lkml.kernel.org/r/efbc7689d35a48ff402644d696aa9a8d8bb6333a.1692877089.git.baruch@tkos.co.il Signed-off-by: Baruch Siach <baruch@tkos.co.il> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-02rmap: remove anon_vma_link() nommu stubBaruch Siach
anon_vma_link() is unused since commit 5beb49305251 ("mm: change anon_vma linking to fix multi-process server scalability issue"). Link: https://lkml.kernel.org/r/cdce9b00c9ab15f6d02eddf40dcad537d3e9676f.1692877089.git.baruch@tkos.co.il Signed-off-by: Baruch Siach <baruch@tkos.co.il> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-02proc/ksm: add ksm stats to /proc/pid/smapsStefan Roesch
With madvise and prctl KSM can be enabled for different VMA's. Once it is enabled we can query how effective KSM is overall. However we cannot easily query if an individual VMA benefits from KSM. This commit adds a KSM section to the /prod/<pid>/smaps file. It reports how many of the pages are KSM pages. Note that KSM-placed zeropages are not included, only actual KSM pages. Here is a typical output: 7f420a000000-7f421a000000 rw-p 00000000 00:00 0 Size: 262144 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 51212 kB Pss: 8276 kB Shared_Clean: 172 kB Shared_Dirty: 42996 kB Private_Clean: 196 kB Private_Dirty: 7848 kB Referenced: 15388 kB Anonymous: 51212 kB KSM: 41376 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 202016 kB SwapPss: 3882 kB Locked: 0 kB THPeligible: 0 ProtectionKey: 0 ksm_state: 0 ksm_skip_base: 0 ksm_skip_count: 0 VmFlags: rd wr mr mw me nr mg anon This information also helps with the following workflow: - First enable KSM for all the VMA's of a process with prctl. - Then analyze with the above smaps report which VMA's benefit the most - Change the application (if possible) to add the corresponding madvise calls for the VMA's that benefit the most [shr@devkernel.io: v5] Link: https://lkml.kernel.org/r/20230823170107.1457915-1-shr@devkernel.io Link: https://lkml.kernel.org/r/20230822180539.1424843-1-shr@devkernel.io Signed-off-by: Stefan Roesch <shr@devkernel.io> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-02mm/hwpoison: rename hwp_walk* to hwpoison_walk*Jiaqi Yan
In the discussion of "Improve hugetlbfs read on HWPOISON hugepages" [1], Matthew Wilcox suggests hwp is a bad abbreviation of hwpoison, as hwp is already used as "an acronym by acpi, intel_pstate, some clock drivers, an ethernet driver, and a scsi driver"[1]. So rename hwp_walk and hwp_walk_ops to hwpoison_walk and hwpoison_walk_ops respectively. raw_hwp_(page|list), *_raw_hwp, and raw_hwp_unreliable flag are other major appearances of "hwp". However, given the "raw" hint in the name, it is easy to differentiate them from other "hwp" acronyms. Since renaming them is not as straightforward as renaming hwp_walk*, they are not covered by this commit. [1] https://lore.kernel.org/lkml/20230707201904.953262-5-jiaqiyan@google.com/T/#me6fecb8ce1ad4d5769199c9e162a44bc88f7bdec Link: https://lkml.kernel.org/r/20230713235553.4121855-1-jiaqiyan@google.com Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-02mm: memory-failure: add PageOffline() checkMiaohe Lin
Memory failure is not interested in logically offlined pages. Skip this type of page. Link: https://lkml.kernel.org/r/20230727115643.639741-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-02Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr, libsas) and the usual minor updates and bug fixes but no significant core changes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (116 commits) scsi: storvsc: Handle additional SRB status values scsi: libsas: Delete sas_ata_task.retry_count scsi: libsas: Delete sas_ata_task.stp_affil_pol scsi: libsas: Delete sas_ata_task.set_affil_pol scsi: libsas: Delete sas_ssp_task.task_prio scsi: libsas: Delete sas_ssp_task.enable_first_burst scsi: libsas: Delete sas_ssp_task.retry_count scsi: libsas: Delete struct scsi_core scsi: libsas: Delete enum sas_phy_type scsi: libsas: Delete enum sas_class scsi: libsas: Delete sas_ha_struct.lldd_module scsi: target: Fix write perf due to unneeded throttling scsi: lpfc: Do not abuse UUID APIs and LPFC_COMPRESS_VMID_SIZE scsi: pm8001: Remove unused declarations scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock scsi: elx: sli4: Remove code duplication scsi: bfa: Replace one-element array with flexible-array member in struct fc_rscn_pl_s scsi: qla2xxx: Remove unused declarations scsi: pmcraid: Use pci_dev_id() to simplify the code scsi: pm80xx: Set RETFIS when requested by libsas ...
2023-09-02Merge tag 'probes-v6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: - kprobes: use struct_size() for variable size kretprobe_instance data structure. - eprobe: Simplify trace_eprobe list iteration. - probe events: Data structure field access support on BTF argument. - Update BTF argument support on the functions in the kernel loadable modules (only loaded modules are supported). - Move generic BTF access function (search function prototype and get function parameters) to a separated file. - Add a function to search a member of data structure in BTF. - Support accessing BTF data structure member from probe args by C-like arrow('->') and dot('.') operators. e.g. 't sched_switch next=next->pid vruntime=next->se.vruntime' - Support accessing BTF data structure member from $retval. e.g. 'f getname_flags%return +0($retval->name):string' - Add string type checking if BTF type info is available. This will reject if user specify ":string" type for non "char pointer" type. - Automatically assume the fprobe event as a function return event if $retval is used. - selftests/ftrace: Add BTF data field access test cases. - Documentation: Update fprobe event example with BTF data field. * tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: Documentation: tracing: Update fprobe event example with BTF field selftests/ftrace: Add BTF fields access testcases tracing/fprobe-event: Assume fprobe is a return event by $retval tracing/probes: Add string type check with BTF tracing/probes: Support BTF field access from $retval tracing/probes: Support BTF based data structure field access tracing/probes: Add a function to search a member of a struct/union tracing/probes: Move finding func-proto API and getting func-param API to trace_btf tracing/probes: Support BTF argument on module functions tracing/eprobe: Iterate trace_eprobe directly kernel: kprobes: Use struct_size()
2023-09-02Merge tag 'trace-v6.6-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull more tracing updates from Steven Rostedt: "Tracing fixes and clean ups: - Replace strlcpy() with strscpy() - Initialize the pipe cpumask to zero on allocation - Use within_module() instead of open coding it - Remove extra space in hwlat_detectory/mode output - Use LIST_HEAD() instead of open coding it - A bunch of clean ups and fixes for the cpumask filter - Set local da_mon_##name to static - Fix race in snapshot buffer between cpu write and swap" * tag 'trace-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/filters: Fix coding style issues tracing/filters: Change parse_pred() cpulist ternary into an if block tracing/filters: Fix double-free of struct filter_pred.mask tracing/filters: Fix error-handling of cpulist parsing buffer tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY ftrace: Use LIST_HEAD to initialize clear_hash ftrace: Use within_module to check rec->ip within specified module. tracing: Replace strlcpy with strscpy in trace/events/task.h tracing: Fix race issue between cpu buffer write and swap tracing: Remove extra space at the end of hwlat_detector/mode rv: Set variable 'da_mon_##name' to static
2023-09-02Merge tag 'pstore-v6.6-rc1-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore fix from Kees Cook: - Adjust sizes of buffers just avoid uncompress failures (Ard Biesheuvel) * tag 'pstore-v6.6-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore: Base compression input buffer size on estimated compressed size
2023-09-02Merge tag 'x86-urgent-2023-09-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 selftest fix from Ingo Molnar: "Fix the __NR_map_shadow_stack syscall-renumbering fallout in the x86 self-test code. [ Arguably the existing code was unnecessarily fragile, and tooling should have picked up the new syscall number, and a wider fix is being worked on - but meanwhile, let's not have the old syscall number in the kernel tree. ]" * tag 'x86-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/x86: Update map_shadow_stack syscall nr
2023-09-02Merge tag 'timers-urgent-2023-09-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix false positive 'softirq work is pending' messages on -rt kernels, caused by a buggy factoring-out of existing code" * tag 'timers-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/rcu: Fix false positive "softirq work is pending" messages
2023-09-02Merge tag 'smp-urgent-2023-09-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug fix from Ingo Molnar: "Fix a CPU hotplug related deadlock between the task which initiates and controls a CPU hot-unplug operation vs. the CFS bandwidth timer" * tag 'smp-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Prevent self deadlock on CPU hot-unplug
2023-09-02Merge tag 'sched-urgent-2023-09-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Miscellaneous scheduler fixes: a reporting fix, a static symbol fix, and a kernel-doc fix" * tag 'sched-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Report correct state for TASK_IDLE | TASK_FREEZABLE sched/fair: Make update_entity_lag() static sched/core: Add kernel-doc for set_cpus_allowed_ptr()
2023-09-02mm/pagewalk: fix bootstopping regression from extra pte_unmap()Hugh Dickins
Mikhail reports early-6.6-based Fedora Rawhide not booting: "rcu_preempt detected expedited stalls", minutes wait, and then hung_task splat while kworker trying to synchronize_rcu_expedited(). Nothing logged to disk. He bisected to my 6.6 a349d72fd9ef ("mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s"): but the one to blame is my 6.5 commit to fix the espfix "bad pmd" warnings when booting x86_64 with CONFIG_EFI_PGT_DUMP=y. Gaah, that added an "addr >= TASK_SIZE" check to avoid pte_offset_map(), but failed to add the equivalent check when choosing to pte_unmap(). It's not a problem on 6.5 (for different reasons, it's harmless on both 64-bit and 32-bit), but becomes a bootstopper on 6.6 with the unbalanced rcu_read_unlock() - RCU has a WARN_ON_ONCE for that, but it would have scrolled off Mikhail's console too quickly. Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Closes: https://lore.kernel.org/linux-mm/CABXGCsNi8Tiv5zUPNXr6UJw6qV1VdaBEfGqEAMkkXE3QPvZuAQ@mail.gmail.com/ Fixes: 8b1cb4a2e819 ("mm/pagewalk: fix EFI_PGT_DUMP of espfix area") Fixes: a349d72fd9ef ("mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s") Signed-off-by: Hugh Dickins <hughd@google.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-09-02cgroup: fix build when CGROUP_SCHED is not enabledLinus Torvalds
Sudip Mukherjee reports that the mips sb1250_swarm_defconfig build fails with the current kernel. It isn't actually MIPS-specific, it's just that that defconfig does not have CGROUP_SCHED enabled like most configs do, and as such shows this error: kernel/cgroup/cgroup.c: In function 'cgroup_local_stat_show': kernel/cgroup/cgroup.c:3699:15: error: implicit declaration of function 'cgroup_tryget_css'; did you mean 'cgroup_tryget'? [-Werror=implicit-function-declaration] 3699 | css = cgroup_tryget_css(cgrp, ss); | ^~~~~~~~~~~~~~~~~ | cgroup_tryget kernel/cgroup/cgroup.c:3699:13: warning: assignment to 'struct cgroup_subsys_state *' from 'int' makes pointer from integer without a cast [-Wint-conversion] 3699 | css = cgroup_tryget_css(cgrp, ss); | ^ because cgroup_tryget_css() only exists when CGROUP_SCHED is enabled, and the cgroup_local_stat_show() function should similarly be guarded by that config option. Move things around a bit to fix this all. Fixes: d1d4ff5d11a5 ("cgroup: put cgroup_tryget_css() inside CONFIG_CGROUP_SCHED") Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-09-02fbdev/g364fb: fix build failure with mipsSudip Mukherjee
Fix the typo which resulted in the driver using FB_DEFAULT_IOMEM_HELPERS instead of FB_DEFAULT_IOMEM_OPS as the fbdev I/O helpers. Fixes: 501126083855 ("fbdev/g364fb: Use fbdev I/O helpers") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-09-02Merge branch 'fixes' into miscJames Bottomley
2023-09-02ata: libata-core: Disable NCQ_TRIM on Micron 1100 drivesPawel Zmarzly
Micron 1100 drives lock up when encountering queued TRIM command. It is a quite old hardware series, for past years we have been running our machines with these drives using libata.force=noncqtrim. [Damien] Move the "Crucial_CT*M500*" entry to keep Micron and Crucial entries together. Signed-off-by: Pawel Zmarzly <pzmarzly@meta.com> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-09-02ata: ahci: Add Elkhart Lake AHCI controllerWerner Fischer
Elkhart Lake is the successor of Apollo Lake and Gemini Lake. These CPUs and their PCHs are used in mobile and embedded environments. With this patch I suggest that Elkhart Lake SATA controllers [1] should use the default LPM policy for mobile chipsets. The disadvantage of missing hot-plug support with this setting should not be an issue, as those CPUs are used in embedded environments and not in servers with hot-plug backplanes. We discovered that the Elkhart Lake SATA controllers have been missing in ahci.c after a customer reported the throttling of his SATA SSD after a short period of higher I/O. We determined the high temperature of the SSD controller in idle mode as the root cause for that. Depending on the used SSD, we have seen up to 1.8 Watt lower system idle power usage and up to 30°C lower SSD controller temperatures in our tests, when we set med_power_with_dipm manually. I have provided a table showing seven different SATA SSDs from ATP, Intel/Solidigm and Samsung [2]. Intel lists a total of 3 SATA controller IDs (4B60, 4B62, 4B63) in [1] for those mobile PCHs. This commit just adds 0x4b63 as I do not have test systems with 0x4b60 and 0x4b62 SATA controllers. I have tested this patch with a system which uses 0x4b63 as SATA controller. [1] https://sata-io.org/product/8803 [2] https://www.thomas-krenn.com/en/wiki/SATA_Link_Power_Management#Example_LES_v4 Signed-off-by: Werner Fischer <devlists@wefi.net> Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-09-01tracing/filters: Fix coding style issuesValentin Schneider
Recent commits have introduced some coding style issues, fix those up. Link: https://lkml.kernel.org/r/20230901151039.125186-5-vschneid@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-01tracing/filters: Change parse_pred() cpulist ternary into an if blockValentin Schneider
Review comments noted that an if block would be clearer than a ternary, so swap it out. No change in behaviour intended Link: https://lkml.kernel.org/r/20230901151039.125186-4-vschneid@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>