summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-03scsi: megaraid_sas: Block zero-length ATA VPD inquiryChandrakanth Patil
A firmware bug was observed where ATA VPD inquiry commands with a zero-length data payload were not handled and failed with a non-standard status code of 0xf0. Avoid sending ATA VPD inquiry commands without data payload by setting the device no_vpd_size flag to 1. In addition, if the firmware returns a status code of 0xf0, set scsi_cmnd->result to CHECK_CONDITION to facilitate proper error handling. Suggested-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250402193735.5098-1-chandrakanth.patil@broadcom.com Tested-by: Ryan Lahfa <ryan@lahfa.xyz> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03scsi: scsi_transport_srp: Replace min/max nesting with clamp()Li Haoran
This patch replaces min(a, max(b, c)) by clamp(val, lo, hi) in the SRP transport layer. The clamp() macro explicitly expresses the intent of constraining a value within bounds, improving code readability. Signed-off-by: Li Haoran <li.haoran7@zte.com.cn> Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn> Link: https://lore.kernel.org/r/202503311555115618U8Md16mKpRYOIy2TOmB6@zte.com.cn Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03scsi: ufs: core: Add device level exception supportBao D. Nguyen
The ufs device JEDEC specification version 4.1 adds support for the device level exception events. To support this new device level exception feature, expose two new sysfs nodes below to provide the user space access to the device level exception information. /sys/bus/platform/drivers/ufshcd/*/device_lvl_exception_count /sys/bus/platform/drivers/ufshcd/*/device_lvl_exception_id The device_lvl_exception_count sysfs node reports the number of device level exceptions that have occurred since the last time this variable is reset. Writing a value of 0 will reset it. The device_lvl_exception_id reports the exception ID which is the qDeviceLevelExceptionID attribute of the device JEDEC specifications version 4.1 and later. The user space application can query these sysfs nodes to get more information about the device level exception. Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com> Link: https://lore.kernel.org/r/6278d7c125b2f0cf5056f4a647a4b9c1fdd24fc7.1743198325.git.quic_nguyenb@quicinc.com Reviewed-by: Peter Wang <peter.wang@mediatek.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Arthur Simchaev <arthur.simchaev@sandisk.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03scsi: ufs: core: Rename ufshcd_wb_presrv_usrspc_keep_vcc_on()Bao D. Nguyen
The ufshcd_wb_presrv_usrspc_keep_vcc_on() function has deviated from its original implementation. The "_keep_vcc_on" part of the function name is misleading. Rename the function to ufshcd_wb_curr_buff_threshold_check() to improve the readability. Also, updated the comments in the function. There is no change to the functionality. Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com> Link: https://lore.kernel.org/r/02ae5e133f6ebf23b54d943e6d1d9de2544eb80e.1743192926.git.quic_nguyenb@quicinc.com Reviewed-by: Avri Altman <avri.altman@sandisk.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03block: don't grab elevator lock during queue initializationMing Lei
->elevator_lock depends on queue freeze lock, see block/blk-sysfs.c. queue freeze lock depends on fs_reclaim. So don't grab elevator lock during queue initialization which needs to call kmalloc(GFP_KERNEL), and we can cut the dependency between ->elevator_lock and fs_reclaim, then the lockdep warning can be killed. This way is safe because elevator setting isn't ready to run during queue initialization. There isn't such issue in __blk_mq_update_nr_hw_queues() because memalloc_noio_save() is called before acquiring elevator lock. Fixes the following lockdep warning: https://lore.kernel.org/linux-block/67e6b425.050a0220.2f068f.007b.GAE@google.com/ Reported-by: syzbot+4c7e0f9b94ad65811efb@syzkaller.appspotmail.com Cc: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250403105402.1334206-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-03io_uring: always do atomic put from iowqPavel Begunkov
io_uring always switches requests to atomic refcounting for iowq execution before there is any parallilism by setting REQ_F_REFCOUNT, and the flag is not cleared until the request completes. That should be fine as long as the compiler doesn't make up a non existing value for the flags, however KCSAN still complains when the request owner changes oter flag bits: BUG: KCSAN: data-race in io_req_task_cancel / io_wq_free_work ... read to 0xffff888117207448 of 8 bytes by task 3871 on cpu 0: req_ref_put_and_test io_uring/refs.h:22 [inline] Skip REQ_F_REFCOUNT checks for iowq, we know it's set. Reported-by: syzbot+903a2ad71fb3f1e47cf5@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d880bc27fb8c3209b54641be4ff6ac02b0e5789a.1743679736.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-03scsi: smartpqi: Use is_kdump_kernel() to check for kdumpMartin Wilck
The smartpqi driver checks the reset_devices variable to determine whether special adjustments need to be made for kdump. This has the effect that after a regular kexec reboot, some driver parameters such as max_transfer_size are much lower than usual. More importantly, kexec reboot tests have revealed memory corruption caused by the driver log being written to system memory after a kexec. Fix this by testing is_kdump_kernel() rather than reset_devices where appropriate. Fixes: 058311b72f54 ("scsi: smartpqi: Add fw log to kdump") Signed-off-by: Martin Wilck <mwilck@suse.com> Link: https://lore.kernel.org/r/20250321223319.109250-1-mwilck@suse.com Cc: Randy Wright <rwright@hpe.com> Acked-by: Don Brace <don.brace@microchip.com> Tested-by: Don Brace <don.brace@microchip.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03scsi: pm80xx: Set phy_attached to zero when device is goneIgor Pylypiv
When a fatal error occurs, a phy down event may not be received to set phy->phy_attached to zero. Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Salomon Dushimirimana <salomondush@google.com> Link: https://lore.kernel.org/r/20250319230305.3172920-1-salomondush@google.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03Merge patch series "ufs-exynos stability fixes for gs101"Martin K. Petersen
Peter Griffin <peter.griffin@linaro.org> says: Hi folks, This series fixes several stability issues with the upstream ufs-exynos driver, specifically for the gs101 SoC found in Pixel 6. The main fix is regarding the IO cache coherency setting and ensuring that it is correctly applied depending on if the dma-coherent property is specified in device tree. This fixes the UFS stability issues on gs101 and I would imagine will also fix issues on exynosauto platform that seems to have similar iocc shareability bits. Additionally the phy reference counting is fixed which allows module load/unload to work reliably and keeps the phy state machine in sync with the controller glue driver. regards, Peter Link: https://lore.kernel.org/r/20250319-exynos-ufs-stability-fixes-v2-0-96722cc2ba1b@linaro.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03scsi: ufs: exynos: gs101: Put UFS device in reset on .suspend()Peter Griffin
GPIO_OUT[0] is connected to the reset pin of embedded UFS device. Before powering off the phy assert the reset signal. This is added as a gs101 specific suspend hook so as not to have any unintended consequences for other SoCs supported by this driver. Signed-off-by: Peter Griffin <peter.griffin@linaro.org> Link: https://lore.kernel.org/r/20250319-exynos-ufs-stability-fixes-v2-7-96722cc2ba1b@linaro.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03scsi: ufs: exynos: Move phy calls to .exit() callbackPeter Griffin
ufshcd_pltfrm_remove() calls ufshcd_remove(hba) which in turn calls ufshcd_hba_exit(). By moving the phy_power_off() and phy_exit() calls to the newly created .exit callback they get called by ufshcd_variant_hba_exit() before ufshcd_hba_exit() turns off the regulators. This is also similar flow to the ufs-qcom driver. Signed-off-by: Peter Griffin <peter.griffin@linaro.org> Link: https://lore.kernel.org/r/20250319-exynos-ufs-stability-fixes-v2-6-96722cc2ba1b@linaro.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03scsi: ufs: exynos: Enable PRDT pre-fetching with UFSHCD_CAP_CRYPTOPeter Griffin
PRDT_PREFETCH_ENABLE[31] bit should be set when desctype field of fmpsecurity0 register is type2 (double file encryption) or type3 (support for file and disk encryption). Setting this bit enables PRDT pre-fetching on both TXPRDT and RXPRDT. Signed-off-by: Peter Griffin <peter.griffin@linaro.org> Link: https://lore.kernel.org/r/20250319-exynos-ufs-stability-fixes-v2-5-96722cc2ba1b@linaro.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03scsi: ufs: exynos: Ensure consistent phy reference countsPeter Griffin
ufshcd_link_startup() can call ufshcd_vops_link_startup_notify() multiple times when retrying. This causes the phy reference count to keep increasing and the phy to not properly re-initialize. If the phy has already been previously powered on, first issue a phy_power_off() and phy_exit(), before re-initializing and powering on again. Signed-off-by: Peter Griffin <peter.griffin@linaro.org> Link: https://lore.kernel.org/r/20250319-exynos-ufs-stability-fixes-v2-4-96722cc2ba1b@linaro.org Fixes: 3d73b200f989 ("scsi: ufs: ufs-exynos: Change ufs phy control sequence") Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03scsi: ufs: exynos: Disable iocc if dma-coherent property isn't setPeter Griffin
If dma-coherent property isn't set then descriptors are non-cacheable and the iocc shareability bits should be disabled. Without this UFS can end up in an incompatible configuration and suffer from random cache related stability issues. Suggested-by: Bart Van Assche <bvanassche@acm.org> Fixes: cc52e15397cc ("scsi: ufs: ufs-exynos: Support ExynosAuto v9 UFS") Signed-off-by: Peter Griffin <peter.griffin@linaro.org> Link: https://lore.kernel.org/r/20250319-exynos-ufs-stability-fixes-v2-3-96722cc2ba1b@linaro.org Cc: Chanho Park <chanho61.park@samsung.com> Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03scsi: ufs: exynos: Move UFS shareability value to drvdataPeter Griffin
gs101 I/O coherency shareability bits differ from exynosauto SoC. To support both SoCs move this info the SoC drvdata. Currently both the value and mask are the same for both gs101 and exynosauto, thus we use the same value. Signed-off-by: Peter Griffin <peter.griffin@linaro.org> Link: https://lore.kernel.org/r/20250319-exynos-ufs-stability-fixes-v2-2-96722cc2ba1b@linaro.org Fixes: d11e0a318df8 ("scsi: ufs: exynos: Add support for Tensor gs101 SoC") Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03scsi: ufs: exynos: Ensure pre_link() executes before exynos_ufs_phy_init()Peter Griffin
Ensure clocks are enabled before configuring unipro. Additionally move the pre_link() hook before the exynos_ufs_phy_init() calls. This means the register write sequence more closely resembles the ordering of the downstream driver. Signed-off-by: Peter Griffin <peter.griffin@linaro.org> Link: https://lore.kernel.org/r/20250319-exynos-ufs-stability-fixes-v2-1-96722cc2ba1b@linaro.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03scsi: iscsi: Fix missing scsi_host_put() in error pathMiaoqian Lin
Add goto to ensure scsi_host_put() is called in all error paths of iscsi_set_host_param() function. This fixes a potential memory leak when strlen() check fails. Fixes: ce51c8170084 ("scsi: iscsi: Add strlen() check in iscsi_if_set{_host}_param()") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20250318094344.91776-1-linmq006@gmail.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-03netfilter: nft_tunnel: fix geneve_opt type confusion additionLin Ma
When handling multiple NFTA_TUNNEL_KEY_OPTS_GENEVE attributes, the parsing logic should place every geneve_opt structure one by one compactly. Hence, when deciding the next geneve_opt position, the pointer addition should be in units of char *. However, the current implementation erroneously does type conversion before the addition, which will lead to heap out-of-bounds write. [ 6.989857] ================================================================== [ 6.990293] BUG: KASAN: slab-out-of-bounds in nft_tunnel_obj_init+0x977/0xa70 [ 6.990725] Write of size 124 at addr ffff888005f18974 by task poc/178 [ 6.991162] [ 6.991259] CPU: 0 PID: 178 Comm: poc-oob-write Not tainted 6.1.132 #1 [ 6.991655] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 6.992281] Call Trace: [ 6.992423] <TASK> [ 6.992586] dump_stack_lvl+0x44/0x5c [ 6.992801] print_report+0x184/0x4be [ 6.993790] kasan_report+0xc5/0x100 [ 6.994252] kasan_check_range+0xf3/0x1a0 [ 6.994486] memcpy+0x38/0x60 [ 6.994692] nft_tunnel_obj_init+0x977/0xa70 [ 6.995677] nft_obj_init+0x10c/0x1b0 [ 6.995891] nf_tables_newobj+0x585/0x950 [ 6.996922] nfnetlink_rcv_batch+0xdf9/0x1020 [ 6.998997] nfnetlink_rcv+0x1df/0x220 [ 6.999537] netlink_unicast+0x395/0x530 [ 7.000771] netlink_sendmsg+0x3d0/0x6d0 [ 7.001462] __sock_sendmsg+0x99/0xa0 [ 7.001707] ____sys_sendmsg+0x409/0x450 [ 7.002391] ___sys_sendmsg+0xfd/0x170 [ 7.003145] __sys_sendmsg+0xea/0x170 [ 7.004359] do_syscall_64+0x5e/0x90 [ 7.005817] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 7.006127] RIP: 0033:0x7ec756d4e407 [ 7.006339] Code: 48 89 fa 4c 89 df e8 38 aa 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 faf [ 7.007364] RSP: 002b:00007ffed5d46760 EFLAGS: 00000202 ORIG_RAX: 000000000000002e [ 7.007827] RAX: ffffffffffffffda RBX: 00007ec756cc4740 RCX: 00007ec756d4e407 [ 7.008223] RDX: 0000000000000000 RSI: 00007ffed5d467f0 RDI: 0000000000000003 [ 7.008620] RBP: 00007ffed5d468a0 R08: 0000000000000000 R09: 0000000000000000 [ 7.009039] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 7.009429] R13: 00007ffed5d478b0 R14: 00007ec756ee5000 R15: 00005cbd4e655cb8 Fix this bug with correct pointer addition and conversion in parse and dump code. Fixes: 925d844696d9 ("netfilter: nft_tunnel: add support for geneve opts") Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-03rseq: Eliminate useless task_work on execveMathieu Desnoyers
Eliminate a useless task_work on execve by moving the call to rseq_set_notify_resume() from sched_mm_cid_after_execve() to the error path of bprm_execve(). The call to rseq_set_notify_resume() from sched_mm_cid_after_execve() is pointless in the success case, because rseq_execve() will clear the rseq pointer before returning to userspace. sched_mm_cid_after_execve() is called from both the success and error paths of bprm_execve(). The call to rseq_set_notify_resume() is needed on error because the mm_cid may have changed. Also move the rseq_execve() to right after sched_mm_cid_after_execve() in bprm_execve(). [ mingo: Merged to a recent upstream kernel, extended the changelog. ] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250327132945.1558783-1-mathieu.desnoyers@efficios.com
2025-04-03sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMPOleg Nesterov
kernel/sched/isolation.c obviously makes no sense without CONFIG_SMP, but the Kconfig entry we have right now: config CPU_ISOLATION bool "CPU isolation" depends on SMP || COMPILE_TEST allows the creation of pointless .config's which cause build failures. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250330134955.GA7910@redhat.com Closes: https://lore.kernel.org/oe-kbuild-all/202503260646.lrUqD3j5-lkp@intel.com/
2025-04-03net: decrease cached dst counters in dst_releaseAntoine Tenart
Upstream fix ac888d58869b ("net: do not delay dst_entries_add() in dst_release()") moved decrementing the dst count from dst_destroy to dst_release to avoid accessing already freed data in case of netns dismantle. However in case CONFIG_DST_CACHE is enabled and OvS+tunnels are used, this fix is incomplete as the same issue will be seen for cached dsts: Unable to handle kernel paging request at virtual address ffff5aabf6b5c000 Call trace: percpu_counter_add_batch+0x3c/0x160 (P) dst_release+0xec/0x108 dst_cache_destroy+0x68/0xd8 dst_destroy+0x13c/0x168 dst_destroy_rcu+0x1c/0xb0 rcu_do_batch+0x18c/0x7d0 rcu_core+0x174/0x378 rcu_core_si+0x18/0x30 Fix this by invalidating the cache, and thus decrementing cached dst counters, in dst_release too. Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel") Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250326173634.31096-1-atenart@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-03crypto: inside-secure/eip93 - acquire lock on eip93_put_descriptor hashChristian Marangi
In the EIP93 HASH functions, the eip93_put_descriptor is called without acquiring lock. This is problematic when multiple thread execute hash operations. Correctly acquire ring write lock on calling eip93_put_descriptor to prevent concurrent access and mess with the ring pointers. Fixes: 9739f5f93b78 ("crypto: eip93 - Add Inside Secure SafeXcel EIP-93 crypto engine support") Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-03Merge tag 'amd-drm-next-6.15-2025-03-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.15-2025-03-27: amdgpu: - Guard against potential division by 0 in fan code - Zero RPM support for SMU 14.0.2 - Properly handle SI and CIK support being disabled - PSR fixes - DML2 fixes - DP Link training fix - Vblank fixes - RAS fixes - Partitioning fix - SDMA fix - SMU 13.0.x fixes - Rom fetching fix - MES fixes - Queue reset fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250328004749.3392457-1-alexander.deucher@amd.com
2025-04-03Merge tag 'drm-xe-next-fixes-2025-03-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next Driver Changes: - Fix NULL pointer dereference on error path - Add missing HW workaround for BMG - Fix survivability mode not triggering - Fix build warning when DRM_FBDEV_EMULATION is not set Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/vxy5kwdkzgp2u2umnyxv4ygslmdlvzjl22xotzxaw55dv7plpz@34miqxkbvggu
2025-04-03Merge tag 'drm-intel-next-fixes-2025-03-25' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next drm/i915 fixes for v6.15 merge window: - Bounds check for scalers in DSC prefill latency computation - Fix build by adding a missing include Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/878qota36x.fsf@intel.com
2025-04-03Merge tag 'drm-misc-next-fixes-2025-03-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next Short summary of fixes pull: adp: - Fix error handling in plane setup Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250327141835.GA96037@linux.fritz.box
2025-04-02Merge tag 'firewire-updates-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire update from Takashi Sakamoto: "A single commit to use the common helper function for on-stack trailing array to enqueue any isochronous packet by the requests from userspace applications" * tag 'firewire-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: core: avoid -Wflex-array-member-not-at-end warning
2025-04-02selftests/bpf: Fix verifier_private_stack test failureYonghong Song
Several verifier_private_stack tests failed with latest bpf-next. For example, for 'Private stack, single prog' subtest, the jitted code: func #0: 0: f3 0f 1e fa endbr64 4: 0f 1f 44 00 00 nopl (%rax,%rax) 9: 0f 1f 00 nopl (%rax) c: 55 pushq %rbp d: 48 89 e5 movq %rsp, %rbp 10: f3 0f 1e fa endbr64 14: 49 b9 58 74 8a 8f 7d 60 00 00 movabsq $0x607d8f8a7458, %r9 1e: 65 4c 03 0c 25 28 c0 48 87 addq %gs:-0x78b73fd8, %r9 27: bf 2a 00 00 00 movl $0x2a, %edi 2c: 49 89 b9 00 ff ff ff movq %rdi, -0x100(%r9) 33: 31 c0 xorl %eax, %eax 35: c9 leave 36: e9 20 5d 0f e1 jmp 0xffffffffe10f5d5b The insn 'addq %gs:-0x78b73fd8, %r9' does not match the expected regex 'addq %gs:0x{{.*}}, %r9' and this caused test failure. Fix it by changing '%gs:0x{{.*}}' to '%gs:{{.*}}' to accommodate the possible negative offset. A few other subtests are fixed in a similar way. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250331033828.365077-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02selftests/bpf: Fix verifier_bpf_fastcall testSong Liu
Commit [1] moves percpu data on x86 from address 0x000... to address 0xfff... Before [1]: 159020: 0000000000030700 0 OBJECT GLOBAL DEFAULT 23 pcpu_hot After [1]: 152602: ffffffff83a3e034 4 OBJECT GLOBAL DEFAULT 35 pcpu_hot As a result, verifier_bpf_fastcall tests should now expect a negative value for pcpu_hot, IOW, the disassemble should show "r=" instead of "w=". Fix this in the test. Note that, a later change created a new variable "cpu_number" for bpf_get_smp_processor_id() [2]. The inlining logic is updated properly as part of this change, so there is no need to fix anything on the kernel side. [1] commit 9d7de2aa8b41 ("x86/percpu/64: Use relative percpu offsets") [2] commit 01c7bc5198e9 ("x86/smp: Move cpu number to percpu hot section") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250328193124.808784-1-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02selftests/bpf: Fix tests after fields reorder in struct fileSong Liu
The change in struct file [1] moved f_ref to the 3rd cache line. It made *(u64 *)file dereference invalid from the verifier point of view, because btf_struct_walk() walks into f_lock field, which is 4-byte long. Fix the selftests to deference the file pointer as a 4-byte access. [1] commit e249056c91a2 ("fs: place f_ref to 3rd cache line in struct file to resolve false sharing") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250327185528.1740787-1-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02xsk: Fix __xsk_generic_xmit() error code when cq is fullWang Liang
When the cq reservation is failed, the error code is not set which is initialized to zero in __xsk_generic_xmit(). That means the packet is not send successfully but sendto() return ok. Considering the impact on uapi, return -EAGAIN is a good idea. The cq is full usually because it is not released in time, try to send msg again is appropriate. The bug was at the very early implementation of xsk, so the Fixes tag targets the commit that introduced the changes in xsk_cq_reserve_addr_locked where this fix depends on. Fixes: e6c4047f5122 ("xsk: Use xsk_buff_pool directly for cq functions") Suggested-by: Magnus Karlsson <magnus.karlsson@gmail.com> Signed-off-by: Wang Liang <wangliang74@huawei.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250227081052.4096337-1-wangliang74@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02Merge tag 'for-6.15/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mikulas Patocka: - dm-crypt: switch to using the crc32 library - dm-verity, dm-integrity, dm-crypt: documentation improvement - dm-vdo fixes - dm-stripe: enable inline crypto passthrough - dm-integrity: set ti->error on memory allocation failure - dm-bufio: remove unused return value - dm-verity: do forward error correction on metadata I/O errors - dm: fix unconditional IO throttle caused by REQ_PREFLUSH - dm cache: prevent BUG_ON by blocking retries on failed device resumes - dm cache: support shrinking the origin device - dm: restrict dm device size to 2^63-512 bytes - dm-delay: support zoned devices - dm-verity: support block number limits for different ioprio classes - dm-integrity: fix non-constant-time tag verification (security bug) - dm-verity, dm-ebs: fix prefetch-vs-suspend race * tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (27 commits) dm-ebs: fix prefetch-vs-suspend race dm-verity: fix prefetch-vs-suspend race dm-integrity: fix non-constant-time tag verification dm-verity: support block number limits for different ioprio classes dm-delay: support zoned devices dm: restrict dm device size to 2^63-512 bytes dm cache: support shrinking the origin device dm cache: prevent BUG_ON by blocking retries on failed device resumes dm vdo indexer: reorder uds_request to reduce padding dm: fix unconditional IO throttle caused by REQ_PREFLUSH dm vdo: rework processing of loaded refcount byte arrays dm vdo: remove remaining ring references dm-verity: do forward error correction on metadata I/O errors dm-bufio: remove unused return value dm-integrity: set ti->error on memory allocation failure dm: Enable inline crypto passthrough for striped target dm vdo slab-depot: read refcount blocks in large chunks at load time dm vdo vio-pool: allow variable-sized metadata vios dm vdo vio-pool: support pools with multiple data blocks per vio dm vdo vio-pool: add a pool pointer to pooled_vio ...
2025-04-03docs: fs/9p: Add missing "not" in cache documentationTingmao Wang
A quick fix for what I assume is a typo. Signed-off-by: Tingmao Wang <m@maowtm.org> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Message-ID: <20250330213443.98434-1-m@maowtm.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2025-04-02Merge tag 'libnvdimm-for-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Ira Weiny: "Most of the code changes are to remove dead code. The bug fixes are minor, Syzkaller and one for broken devices which are unlikely to be in the field. So no need to backport them. - two patches to remove dead code: nd_attach_ndns() and nd_region_conflict() have not been used since 2017 and 2019 respectively - Fix divide-by-0 if device returns a broken LSA value - Fix Syzkaller reported bug" * tag 'libnvdimm-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm/labels: Fix divide error in nd_label_data_init() libnvdimm: Remove unused nd_attach_ndns libnvdimm: Remove unused nd_region_conflict acpi: nfit: fix narrowing conversion in acpi_nfit_ctl
2025-04-02Merge tag 'cxl-for-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull Compute Express Link (CXL) updates from Dave Jiang: - Add support for Global Persistent Flush (GPF) - Cleanup of DPA partition metadata handling: - Remove the CXL_DECODER_MIXED enum that's not needed anymore - Introduce helpers to access resource and perf meta data - Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' - Make cxl_dpa_alloc() DPA partition number agnostic - Remove cxl_decoder_mode - Cleanup partition size and perf helpers - Remove unused CXL partition values - Add logging support for CXL CPER endpoint and port protocol errors: - Prefix protocol error struct and function names with cxl_ - Move protocol error definitions and structures to a common location - Remove drivers/firmware/efi/cper_cxl.h to include/linux/cper.h - Add support in GHES to process CXL CPER protocol errors - Process CXL CPER protocol errors - Add trace logging for CXL PCIe port RAS errors - Remove redundant gp_port init - Add validation of cxl device serial number - CXL ABI documentation updates/fixups - A series that uses guard() to clean up open coded mutex lockings and remove gotos for error handling. - Some followup patches to support dirty shutdown accounting: - Add helper to retrieve DVSEC offset for dirty shutdown registers - Rename cxl_get_dirty_shutdown() to cxl_arm_dirty_shutdown() - Add support for dirty shutdown count via sysfs - cxl_test support for dirty shutdown - A series to support CXL mailbox Features commands. Mostly in preparation for CXL EDAC code to utilize the Features commands. It's also in preparation for CXL fwctl support to utilize the CXL Features. The commands include "Get Supported Features", "Get Feature", and "Set Feature". - A series to support extended linear cache support described by the ACPI HMAT table. The addition helps enumerate the cache and also provides additional RAS reporting support for configuration with extended linear cache. (and related fixes for the series). - An update to cxl_test to support a 3-way capable CFMWS - A documentation fix to remove unused "mixed mode" * tag 'cxl-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (39 commits) cxl/region: Fix the first aliased address miscalculation cxl/region: Quiet some dev_warn()s in extended linear cache setup cxl/Documentation: Remove 'mixed' from sysfs mode doc cxl: Fix warning from emitting resource_size_t as long long int on 32bit systems cxl/test: Define a CFMWS capable of a 3 way HB interleave cxl/mem: Do not return error if CONFIG_CXL_MCE unset tools/testing/cxl: Set Shutdown State support cxl/pmem: Export dirty shutdown count via sysfs cxl/pmem: Rename cxl_dirty_shutdown_state() cxl/pci: Introduce cxl_gpf_get_dvsec() cxl/pci: Support Global Persistent Flush (GPF) cxl: Document missing sysfs files cxl: Plug typos in ABI doc cxl/pmem: debug invalid serial number data cxl/cdat: Remove redundant gp_port initialization cxl/memdev: Remove unused partition values cxl/region: Drop goto pattern of construct_region() cxl/region: Drop goto pattern in cxl_dax_region_alloc() cxl/core: Use guard() to drop goto pattern of cxl_dpa_alloc() cxl/core: Use guard() to drop the goto pattern of cxl_dpa_free() ...
2025-04-02Merge tag 'usb-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB / Thunderbolt updates from Greg KH: "Here is the big set of USB and Thunderbolt driver updates for 6.15-rc1. Included in here are: - Thunderbolt driver and core api updates for new hardware and features - usb-storage const array cleanups - typec driver updates - dwc3 driver updates - xhci driver updates and bugfixes - small USB documentation updates - usb cdns3 driver updates - usb gadget driver updates - other small driver updates and fixes All of these have been in linux-next for a while with no reported issues" * tag 'usb-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (92 commits) thunderbolt: Do not add non-active NVM if NVM upgrade is disabled for retimer thunderbolt: Scan retimers after device router has been enumerated usb: host: cdns3: forward lost power information to xhci usb: host: xhci-plat: allow upper layers to signal power loss usb: xhci: change xhci_resume() parameters to explicit the desired info usb: cdns3-ti: run HW init at resume() if HW was reset usb: cdns3-ti: move reg writes to separate function usb: cdns3: call cdns_power_is_lost() only once in cdns_resume() usb: cdns3: rename hibernated argument of role->resume() to lost_power usb: xhci: tegra: rename `runtime` boolean to `is_auto_runtime` usb: host: xhci-plat: mvebu: use ->quirks instead of ->init_quirk() func usb: dwc3: Don't use %pK through printk usb: core: Don't use %pK through printk usb: gadget: aspeed: Add NULL pointer check in ast_vhub_init_dev() dt-bindings: usb: qcom,dwc3: Synchronize minItems for interrupts and -names usb: common: usb-conn-gpio: switch psy_cfg from of_node to fwnode usb: xhci: Avoid Stop Endpoint retry loop if the endpoint seems Running usb: xhci: Don't change the status of stalled TDs on failed Stop EP xhci: Avoid queuing redundant Stop Endpoint command for stalled endpoint xhci: Handle spurious events on Etron host isoc enpoints ...
2025-04-02Merge tag 'tty-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here is the big set of serial and tty driver updates for 6.15-rc1. Include in here are the following: - more great tty layer cleanups from Jiri. Someday this will be done, but that's not going to be any year soon... - kdb debug driver reverts to fix a reported issue - lots of .dts binding updates for different devices with serial devices - lots of tiny updates and tweaks and a few bugfixes for different serial drivers. All of these have been in linux-next for a while with no reported issues" * tag 'tty-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (79 commits) tty: serial: fsl_lpuart: Fix unused variable 'sport' build warning serial: stm32: do not deassert RS485 RTS GPIO prematurely serial: 8250: add driver for NI UARTs dt-bindings: serial: snps-dw-apb-uart: document RZ/N1 binding without DMA serial: icom: fix code format problems serial: sh-sci: Save and restore more registers tty: serial: pl011: remove incorrect of_match_ptr annotation dt-bindings: serial: snps-dw-apb-uart: Add support for rk3562 tty: serial: lpuart: only disable CTS instead of overwriting the whole UARTMODIR register tty: caif: removed unused function debugfs_tx() serial: 8250_dma: terminate correct DMA in tx_dma_flush() tty: serial: fsl_lpuart: rename register variables more specifically tty: serial: fsl_lpuart: use port struct directly to simply code tty: serial: fsl_lpuart: Use u32 and u8 for register variables tty: serial: fsl_lpuart: disable transmitter before changing RS485 related registers tty: serial: 8250: Add Brainboxes XC devices dt-bindings: serial: fsl-lpuart: support i.MX94 tty: serial: 8250: Add some more device IDs dt-bindings: serial: samsung: add exynos7870-uart compatible serial: 8250_dw: Comment possible corner cases in serial_out() implementation ...
2025-04-02Merge tag 'staging-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver updates from Greg KH: "Here is the big set of staging driver cleanups and updates for 6.15-rc1. As expected, with the introduction of the gpib drivers, loads of cleanups and fixes showed up, with the huge majority of changes being for that chunk of drivers. This is good and shows that the community can fix up things in public when asked to. Also included in here are: - small sm750fb cleanups - tiny rtl8723bs cleanups - more vchiq_arm cleanups and changes, hopefully this will get out of staging soon All of these have been in linux-next for almost 2 weeks now with no reported issues" * tag 'staging-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (76 commits) staging: rtl8723bs: fixed a unnecessary parentheses coding style issue staging: vchiq_arm: Improve initial VCHIQ connect staging: vchiq_arm: Create keep-alive thread during probe staging: vchiq_arm: Stop kthreads if vchiq cdev register fails staging: vchiq_arm: Fix possible NPR of keep-alive thread staging: vchiq_arm: Register debugfs after cdev staging: vchiq_arm: Don't use %pK through printk staging: rtl8723bs: select CONFIG_CRYPTO_LIB_AES staging: rtl8723bs: Remove some unused functions, macros, and structs staging: gpib: change return type of t1_delay function to report errors staging: gpib: remove commented-out lines staging: gpib: fix kernel-doc section for usb_gpib_line_status() function staging: gpib: fix kernel-doc section for function usb_gpib_interface_clear() staging: gpib: fix kernel-doc section for write_loop() function staging: gpib: Removing typedef for gpib_board staging: gpib: struct typing for gpib_gboard_t staging: gpib: tnt4882: struct gpib_board staging: gpib: tms9914: struct gpib_board staging: gpib: pc2: struct gpib_board staging: gpib: ni_usb_gpib: struct gpib_board ...
2025-04-02Merge tag 'char-misc-6.15-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc fixes from Greg KH: "Here are two counter driver fixes that I realized I never sent to you for 6.14-final. They have been in my for weeks, as well as linux-next, my fault for not sending them earlier. They are: - bugfix for stm32-lptimer-cnt counter driver - bugfix for microchip-tcb-capture counter driver Again, these have been in linux-next for weeks with no reported issues" * tag 'char-misc-6.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: counter: microchip-tcb-capture: Fix undefined counter channel state on probe counter: stm32-lptimer-cnt: fix error handling when enabling
2025-04-02cifs: update internal version numberSteve French
To 2.52 Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-02cifs: Implement is_network_name_deleted for SMB1Pali Rohár
This change allows Linux SMB1 client to autoreconnect the share when it is modified on server by admin operation which removes and re-adds it. Implementation is reused from SMB2+ is_network_name_deleted callback. There are just adjusted checks for error codes and access to struct smb_hdr. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-02cifs: Remove cifs_truncate_page() as it should be superfluousDavid Howells
The calls to cifs_truncate_page() should be superfluous as the places that call it also call truncate_setsize() or cifs_setsize() and therefore truncate_pagecache() which should also clear the tail part of the folio containing the EOF marker. Further, smb3_simple_falloc() calls both cifs_setsize() and truncate_setsize() in addition to cifs_truncate_page(). Remove the superfluous calls. This gets rid of another function referring to struct page. [Should cifs_setsize() also set inode->i_blocks?] Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-02tunnels: Accept PACKET_HOST in skb_tunnel_check_pmtu().Guillaume Nault
Because skb_tunnel_check_pmtu() doesn't handle PACKET_HOST packets, commit 30a92c9e3d6b ("openvswitch: Set the skbuff pkt_type for proper pmtud support.") forced skb->pkt_type to PACKET_OUTGOING for openvswitch packets that are sent using the OVS_ACTION_ATTR_OUTPUT action. This allowed such packets to invoke the iptunnel_pmtud_check_icmp() or iptunnel_pmtud_check_icmpv6() helpers and thus trigger PMTU update on the input device. However, this also broke other parts of PMTU discovery. Since these packets don't have the PACKET_HOST type anymore, they won't trigger the sending of ICMP Fragmentation Needed or Packet Too Big messages to remote hosts when oversized (see the skb_in->pkt_type condition in __icmp_send() for example). These two skb->pkt_type checks are therefore incompatible as one requires skb->pkt_type to be PACKET_HOST, while the other requires it to be anything but PACKET_HOST. It makes sense to not trigger ICMP messages for non-PACKET_HOST packets as these messages should be generated only for incoming l2-unicast packets. However there doesn't seem to be any reason for skb_tunnel_check_pmtu() to ignore PACKET_HOST packets. Allow both cases to work by allowing skb_tunnel_check_pmtu() to work on PACKET_HOST packets and not overriding skb->pkt_type in openvswitch anymore. Fixes: 30a92c9e3d6b ("openvswitch: Set the skbuff pkt_type for proper pmtud support.") Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/eac941652b86fddf8909df9b3bf0d97bc9444793.1743208264.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02vsock: avoid timeout during connect() if the socket is closingStefano Garzarella
When a peer attempts to establish a connection, vsock_connect() contains a loop that waits for the state to be TCP_ESTABLISHED. However, the other peer can be fast enough to accept the connection and close it immediately, thus moving the state to TCP_CLOSING. When this happens, the peer in the vsock_connect() is properly woken up, but since the state is not TCP_ESTABLISHED, it goes back to sleep until the timeout expires, returning -ETIMEDOUT. If the socket state is TCP_CLOSING, waiting for the timeout is pointless. vsock_connect() can return immediately without errors or delay since the connection actually happened. The socket will be in a closing state, but this is not an issue, and subsequent calls will fail as expected. We discovered this issue while developing a test that accepts and immediately closes connections to stress the transport switch between two connect() calls, where the first one was interrupted by a signal (see Closes link). Reported-by: Luigi Leonardi <leonardi@redhat.com> Closes: https://lore.kernel.org/virtualization/bq6hxrolno2vmtqwcvb5bljfpb7mvwb3kohrvaed6auz5vxrfv@ijmd2f3grobn/ Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Link: https://patch.msgid.link/20250328141528.420719-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02Merge branch ↵Jakub Kicinski
'udp-fix-two-integer-overflows-when-sk-sk_rcvbuf-is-close-to-int_max' Kuniyuki Iwashima says: ==================== udp: Fix two integer overflows when sk->sk_rcvbuf is close to INT_MAX. I got a report that UDP mem usage in /proc/net/sockstat did not drop even after an application was terminated. The issue could happen if sk->sk_rmem_alloc wraps around due to a large sk->sk_rcvbuf, which was INT_MAX in our case. The patch 2 fixes the issue, and the patch 1 fixes yet another overflow I found while investigating the issue. v3: https://lore.kernel.org/20250327202722.63756-1-kuniyu@amazon.com v2: https://lore.kernel.org/20250325195826.52385-1-kuniyu@amazon.com v1: https://lore.kernel.org/20250323231016.74813-1-kuniyu@amazon.com ==================== Link: https://patch.msgid.link/20250401184501.67377-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02udp: Fix memory accounting leak.Kuniyuki Iwashima
Matt Dowling reported a weird UDP memory usage issue. Under normal operation, the UDP memory usage reported in /proc/net/sockstat remains close to zero. However, it occasionally spiked to 524,288 pages and never dropped. Moreover, the value doubled when the application was terminated. Finally, it caused intermittent packet drops. We can reproduce the issue with the script below [0]: 1. /proc/net/sockstat reports 0 pages # cat /proc/net/sockstat | grep UDP: UDP: inuse 1 mem 0 2. Run the script till the report reaches 524,288 # python3 test.py & sleep 5 # cat /proc/net/sockstat | grep UDP: UDP: inuse 3 mem 524288 <-- (INT_MAX + 1) >> PAGE_SHIFT 3. Kill the socket and confirm the number never drops # pkill python3 && sleep 5 # cat /proc/net/sockstat | grep UDP: UDP: inuse 1 mem 524288 4. (necessary since v6.0) Trigger proto_memory_pcpu_drain() # python3 test.py & sleep 1 && pkill python3 5. The number doubles # cat /proc/net/sockstat | grep UDP: UDP: inuse 1 mem 1048577 The application set INT_MAX to SO_RCVBUF, which triggered an integer overflow in udp_rmem_release(). When a socket is close()d, udp_destruct_common() purges its receive queue and sums up skb->truesize in the queue. This total is calculated and stored in a local unsigned integer variable. The total size is then passed to udp_rmem_release() to adjust memory accounting. However, because the function takes a signed integer argument, the total size can wrap around, causing an overflow. Then, the released amount is calculated as follows: 1) Add size to sk->sk_forward_alloc. 2) Round down sk->sk_forward_alloc to the nearest lower multiple of PAGE_SIZE and assign it to amount. 3) Subtract amount from sk->sk_forward_alloc. 4) Pass amount >> PAGE_SHIFT to __sk_mem_reduce_allocated(). When the issue occurred, the total in udp_destruct_common() was 2147484480 (INT_MAX + 833), which was cast to -2147482816 in udp_rmem_release(). At 1) sk->sk_forward_alloc is changed from 3264 to -2147479552, and 2) sets -2147479552 to amount. 3) reverts the wraparound, so we don't see a warning in inet_sock_destruct(). However, udp_memory_allocated ends up doubling at 4). Since commit 3cd3399dd7a8 ("net: implement per-cpu reserves for memory_allocated"), memory usage no longer doubles immediately after a socket is close()d because __sk_mem_reduce_allocated() caches the amount in udp_memory_per_cpu_fw_alloc. However, the next time a UDP socket receives a packet, the subtraction takes effect, causing UDP memory usage to double. This issue makes further memory allocation fail once the socket's sk->sk_rmem_alloc exceeds net.ipv4.udp_rmem_min, resulting in packet drops. To prevent this issue, let's use unsigned int for the calculation and call sk_forward_alloc_add() only once for the small delta. Note that first_packet_length() also potentially has the same problem. [0]: from socket import * SO_RCVBUFFORCE = 33 INT_MAX = (2 ** 31) - 1 s = socket(AF_INET, SOCK_DGRAM) s.bind(('', 0)) s.setsockopt(SOL_SOCKET, SO_RCVBUFFORCE, INT_MAX) c = socket(AF_INET, SOCK_DGRAM) c.connect(s.getsockname()) data = b'a' * 100 while True: c.send(data) Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers") Reported-by: Matt Dowling <madowlin@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250401184501.67377-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02udp: Fix multiple wraparounds of sk->sk_rmem_alloc.Kuniyuki Iwashima
__udp_enqueue_schedule_skb() has the following condition: if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) goto drop; sk->sk_rcvbuf is initialised by net.core.rmem_default and later can be configured by SO_RCVBUF, which is limited by net.core.rmem_max, or SO_RCVBUFFORCE. If we set INT_MAX to sk->sk_rcvbuf, the condition is always false as sk->sk_rmem_alloc is also signed int. Then, the size of the incoming skb is added to sk->sk_rmem_alloc unconditionally. This results in integer overflow (possibly multiple times) on sk->sk_rmem_alloc and allows a single socket to have skb up to net.core.udp_mem[1]. For example, if we set a large value to udp_mem[1] and INT_MAX to sk->sk_rcvbuf and flood packets to the socket, we can see multiple overflows: # cat /proc/net/sockstat | grep UDP: UDP: inuse 3 mem 7956736 <-- (7956736 << 12) bytes > INT_MAX * 15 ^- PAGE_SHIFT # ss -uam State Recv-Q ... UNCONN -1757018048 ... <-- flipping the sign repeatedly skmem:(r2537949248,rb2147483646,t0,tb212992,f1984,w0,o0,bl0,d0) Previously, we had a boundary check for INT_MAX, which was removed by commit 6a1f12dd85a8 ("udp: relax atomic operation on sk->sk_rmem_alloc"). A complete fix would be to revert it and cap the right operand by INT_MAX: rmem = atomic_add_return(size, &sk->sk_rmem_alloc); if (rmem > min(size + (unsigned int)sk->sk_rcvbuf, INT_MAX)) goto uncharge_drop; but we do not want to add the expensive atomic_add_return() back just for the corner case. Casting rmem to unsigned int prevents multiple wraparounds, but we still allow a single wraparound. # cat /proc/net/sockstat | grep UDP: UDP: inuse 3 mem 524288 <-- (INT_MAX + 1) >> 12 # ss -uam State Recv-Q ... UNCONN -2147482816 ... <-- INT_MAX + 831 bytes skmem:(r2147484480,rb2147483646,t0,tb212992,f3264,w0,o0,bl0,d14468947) So, let's define rmem and rcvbuf as unsigned int and check skb->truesize only when rcvbuf is large enough to lower the overflow possibility. Note that we still have a small chance to see overflow if multiple skbs to the same socket are processed on different core at the same time and each size does not exceed the limit but the total size does. Note also that we must ignore skb->truesize for a small buffer as explained in commit 363dc73acacb ("udp: be less conservative with sock rmem accounting"). Fixes: 6a1f12dd85a8 ("udp: relax atomic operation on sk->sk_rmem_alloc") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250401184501.67377-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02rtnetlink: Use register_pernet_subsys() in rtnl_net_debug_init().Kuniyuki Iwashima
rtnl_net_debug_init() registers rtnl_net_debug_net_ops by register_pernet_device() but calls unregister_pernet_subsys() in case register_netdevice_notifier() fails. It corrupts pernet_list because first_device is updated in register_pernet_device() but not unregister_pernet_subsys(). Let's fix it by calling register_pernet_subsys() instead. The _subsys() one fits better for the use case because it keeps the notifier alive until default_device_exit_net(), giving it more chance to test NETDEV_UNREGISTER. Fixes: 03fa53485659 ("rtnetlink: Add ASSERT_RTNL_NET() placeholder for netdev notifier.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250401190716.70437-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02Merge tag 'nfs-for-6.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Bugfixes: - Three fixes for looping in the NFSv4 state manager delegation code - Fix for the NFSv4 state XDR code (Neil Brown) - Fix a leaked reference in nfs_lock_and_join_requests() - Fix a use-after-free in the delegation return code Features: - Implement the NFSv4.2 copy offload OFFLOAD_STATUS operation to allow monitoring of an in-progress copy - Add a mount option to force NFSv3/NFSv4 to use READDIRPLUS in a getdents() call - SUNRPC now allows some basic management of an existing RPC client's connections using sysfs - Improvements to the automated teardown of a NFS client when the container it was initiated from gets killed - Improvements to prevent tasks from getting stuck in a killable wait state after calling exit_signals()" * tag 'nfs-for-6.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (29 commits) nfs: Add missing release on error in nfs_lock_and_join_requests() NFSv4: Check for delegation validity in nfs_start_delegation_return_locked() NFS: Don't allow waiting for exiting tasks SUNRPC: Don't allow waiting for exiting tasks NFSv4: Treat ENETUNREACH errors as fatal for state recovery NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client NFSv4: Further cleanups to shutdown loops NFS: Shut down the nfs_client only after all the superblocks SUNRPC: rpc_clnt_set_transport() must not change the autobind setting SUNRPC: rpcbind should never reset the port to the value '0' pNFS/flexfiles: Report ENETDOWN as a connection error pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers NFS: Treat ENETUNREACH errors as fatal in containers NFS: Add a mount option to make ENETUNREACH errors fatal sunrpc: Add a sysfs file for one-step xprt deletion sunrpc: Add a sysfs file for adding a new xprt sunrpc: Add a sysfs files for rpc_clnt information sunrpc: Add a sysfs attr for xprtsec NFS: Add implid to sysfs NFS: Extend rdirplus mount option with "force|none" ...
2025-04-02Merge tag 'fuse-update-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Allow connection to server to time out (Joanne Koong) - If server doesn't support creating a hard link, return EPERM rather than ENOSYS (Matt Johnston) - Allow file names longer than 1024 chars (Bernd Schubert) - Fix a possible race if request on io_uring queue is interrupted (Bernd Schubert) - Misc fixes and cleanups * tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: remove unneeded atomic set in uring creation fuse: fix uring race condition for null dereference of fc fuse: Increase FUSE_NAME_MAX to PATH_MAX fuse: Allocate only namelen buf memory in fuse_notify_ fuse: add default_request_timeout and max_request_timeout sysctls fuse: add kernel-enforced timeout option for requests fuse: optmize missing FUSE_LINK support fuse: Return EPERM rather than ENOSYS from link() fuse: removed unused function fuse_uring_create() from header fuse: {io-uring} Fix a possible req cancellation race