summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-10-15net: phy: mdio-bcm-unimac: Add BCM6846 supportLinus Walleij
Add Unimac mdio compatible string for the special BCM6846 variant. This variant has a few extra registers compared to other versions. Suggested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/linux-devicetree/b542b2e8-115c-4234-a464-e73aa6bece5c@broadcom.com/ Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patch.msgid.link/20241012-bcm6846-mdio-v1-2-c703ca83e962@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdioLinus Walleij
The MDIO block in the BCM6846 is not identical to any of the previous versions, but has extended registers not present in the other variants. For this reason we need to use a new compatible especially for this SoC. Suggested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/linux-devicetree/b542b2e8-115c-4234-a464-e73aa6bece5c@broadcom.com/ Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20241012-bcm6846-mdio-v1-1-c703ca83e962@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15udp: Compute L4 checksum as usual when not segmenting the skbJakub Sitnicki
If: 1) the user requested USO, but 2) there is not enough payload for GSO to kick in, and 3) the egress device doesn't offer checksum offload, then we want to compute the L4 checksum in software early on. In the case when we are not taking the GSO path, but it has been requested, the software checksum fallback in skb_segment doesn't get a chance to compute the full checksum, if the egress device can't do it. As a result we end up sending UDP datagrams with only a partial checksum filled in, which the peer will discard. Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload") Reported-by: Ivan Babrou <ivan@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20241011-uso-swcsum-fixup-v2-1-6e1ddc199af9@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15genetlink: hold RCU in genlmsg_mcast()Eric Dumazet
While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw one lockdep splat [1]. genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU. Instead of letting all callers guard genlmsg_multicast_allns() with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast(). This also means the @flags parameter is useless, we need to always use GFP_ATOMIC. [1] [10882.424136] ============================= [10882.424166] WARNING: suspicious RCU usage [10882.424309] 6.12.0-rc2-virtme #1156 Not tainted [10882.424400] ----------------------------- [10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!! [10882.424469] other info that might help us debug this: [10882.424500] rcu_scheduler_active = 2, debug_locks = 1 [10882.424744] 2 locks held by ip/15677: [10882.424791] #0: ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219) [10882.426334] #1: ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209) [10882.426465] stack backtrace: [10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156 [10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [10882.427046] Call Trace: [10882.427131] <TASK> [10882.427244] dump_stack_lvl (lib/dump_stack.c:123) [10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) [10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7)) [10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink [10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink [10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115) [10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210) [10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink [10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201) [10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551) [10882.428069] genl_rcv (net/netlink/genetlink.c:1220) [10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357) [10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901) [10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1)) Fixes: 33f72e6f0c67 ("l2tp : multicast notification to the registered listeners") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Cc: Tom Parkin <tparkin@katalix.com> Cc: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20241011171217.3166614-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361Peter Rashleigh
According to the Marvell datasheet the 88E6361 has two VTU pages (4k VIDs per page) so the max_vid should be 8191, not 4095. In the current implementation mv88e6xxx_vtu_walk() gives unexpected results because of this error. I verified that mv88e6xxx_vtu_walk() works correctly on the MV88E6361 with this patch in place. Fixes: 12899f299803 ("net: dsa: mv88e6xxx: enable support for 88E6361 switch") Signed-off-by: Peter Rashleigh <peter@rashleigh.ca> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241014204342.5852-1-peter@rashleigh.ca Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().Kuniyuki Iwashima
Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler(). """ We are seeing a use-after-free from a bpf prog attached to trace_tcp_retransmit_synack. The program passes the req->sk to the bpf_sk_storage_get_tracing kernel helper which does check for null before using it. """ The commit 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()") added timer_pending() in reqsk_queue_unlink() not to call del_timer_sync() from reqsk_timer_handler(), but it introduced a small race window. Before the timer is called, expire_timers() calls detach_timer(timer, true) to clear timer->entry.pprev and marks it as not pending. If reqsk_queue_unlink() checks timer_pending() just after expire_timers() calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will continue running and send multiple SYN+ACKs until it expires. The reported UAF could happen if req->sk is close()d earlier than the timer expiration, which is 63s by default. The scenario would be 1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(), but del_timer_sync() is missed 2. reqsk timer is executed and scheduled again 3. req->sk is accept()ed and reqsk_put() decrements rsk_refcnt, but reqsk timer still has another one, and inet_csk_accept() does not clear req->sk for non-TFO sockets 4. sk is close()d 5. reqsk timer is executed again, and BPF touches req->sk Let's not use timer_pending() by passing the caller context to __inet_csk_reqsk_queue_drop(). Note that reqsk timer is pinned, so the issue does not happen in most use cases. [1] [0] BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0 Use-after-free read at 0x00000000a891fb3a (in kfence-#1): bpf_sk_storage_get_tracing+0x2e/0x1b0 bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda bpf_trace_run2+0x4c/0xc0 tcp_rtx_synack+0xf9/0x100 reqsk_timer_handler+0xda/0x3d0 run_timer_softirq+0x292/0x8a0 irq_exit_rcu+0xf5/0x320 sysvec_apic_timer_interrupt+0x6d/0x80 asm_sysvec_apic_timer_interrupt+0x16/0x20 intel_idle_irq+0x5a/0xa0 cpuidle_enter_state+0x94/0x273 cpu_startup_entry+0x15e/0x260 start_secondary+0x8a/0x90 secondary_startup_64_no_verify+0xfa/0xfb kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6 allocated by task 0 on cpu 9 at 260507.901592s: sk_prot_alloc+0x35/0x140 sk_clone_lock+0x1f/0x3f0 inet_csk_clone_lock+0x15/0x160 tcp_create_openreq_child+0x1f/0x410 tcp_v6_syn_recv_sock+0x1da/0x700 tcp_check_req+0x1fb/0x510 tcp_v6_rcv+0x98b/0x1420 ipv6_list_rcv+0x2258/0x26e0 napi_complete_done+0x5b1/0x2990 mlx5e_napi_poll+0x2ae/0x8d0 net_rx_action+0x13e/0x590 irq_exit_rcu+0xf5/0x320 common_interrupt+0x80/0x90 asm_common_interrupt+0x22/0x40 cpuidle_enter_state+0xfb/0x273 cpu_startup_entry+0x15e/0x260 start_secondary+0x8a/0x90 secondary_startup_64_no_verify+0xfa/0xfb freed by task 0 on cpu 9 at 260507.927527s: rcu_core_si+0x4ff/0xf10 irq_exit_rcu+0xf5/0x320 sysvec_apic_timer_interrupt+0x6d/0x80 asm_sysvec_apic_timer_interrupt+0x16/0x20 cpuidle_enter_state+0xfb/0x273 cpu_startup_entry+0x15e/0x260 start_secondary+0x8a/0x90 secondary_startup_64_no_verify+0xfa/0xfb Fixes: 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()") Reported-by: Martin KaFai Lau <martin.lau@kernel.org> Closes: https://lore.kernel.org/netdev/eb6684d0-ffd9-4bdc-9196-33f690c25824@linux.dev/ Link: https://lore.kernel.org/netdev/b55e2ca0-42f2-4b7c-b445-6ffd87ca74a0@linux.dev/ [1] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20241014223312.4254-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15neighbour: Remove NEIGH_DN_TABLE.Kuniyuki Iwashima
Since commit 1202cdd66531 ("Remove DECnet support from kernel"), NEIGH_DN_TABLE is no longer used. MPLS has implicit dependency on it in nla_put_via(), but nla_get_via() does not support DECnet. Let's remove NEIGH_DN_TABLE. Now, neigh_tables[] has only 2 elements and no extra iteration for DECnet in many places. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241014235216.10785-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: bcmasp: fix potential memory leak in bcmasp_xmit()Wang Hai
The bcmasp_xmit() returns NETDEV_TX_OK without freeing skb in case of mapping fails, add dev_kfree_skb() to fix it. Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241014145901.48940-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: cxgb3: Remove stid deadcodeDr. David Alan Gilbert
cxgb3_alloc_stid() and cxgb3_free_stid() have been unused since commit 30e0f6cf5acb ("RDMA/iw_cxgb3: Remove the iw_cxgb3 module from kernel") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20241013012946.284721-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15Merge branch 'cxgb4-deadcode-removal'Jakub Kicinski
Dr. David Alan Gilbert says: ==================== cxgb4: Deadcode removal This is a bunch of deadcode removal in cxgb4. It's all complete function removal rather than any actual change to logic. Build and boot tested, but I don't have the hardware to test the actual card. ==================== Link: https://patch.msgid.link/20241013203831.88051-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15cxgb4: Remove unused t4_free_ofld_rxqsDr. David Alan Gilbert
t4_free_ofld_rxqs() has been unused since commit 0fbc81b3ad51 ("chcr/cxgb4i/cxgbit/RDMA/cxgb4: Allocate resources dynamically for all cxgb4 ULD's") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20241013203831.88051-7-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15cxgb4: Remove unused cxgb4_l2t_alloc_switchingDr. David Alan Gilbert
cxgb4_l2t_alloc_switching() has been unused since it was added in commit f7502659cec8 ("cxgb4: Add API to alloc l2t entry; also update existing ones") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20241013203831.88051-6-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15cxgb4: Remove unused cxgb4_scsi_initDr. David Alan Gilbert
cxgb4_iscsi_init() has been unused since 2016's commit 5999299f1ce9 ("cxgb3i,cxgb4i,libcxgbi: remove iSCSI DDP support") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20241013203831.88051-5-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15cxgb4: Remove unused cxgb4_get_srq_entryDr. David Alan Gilbert
cxgb4_get_srq_entry() has been unused since 2018's commit e47094751ddc ("cxgb4: Add support to initialise/read SRQ entries") which added it. Remove it. Note: I'm a bit suspicious whether any of the srq code in there actually does anything useful; without this get I can't see anything that reads the data, so perhaps the whole thing should go? But that however would remove one of the opcode handlers, and I have no way to test that. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20241013203831.88051-4-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15cxgb4: Remove unused cxgb4_alloc/free_raw_mac_filtDr. David Alan Gilbert
cxgb4_alloc_raw_mac_filt() and cxgb4_free_raw_mac_filt() have been unused since they were added in 2019 commit 5fab51581f62 ("cxgb4: Add MPS TCAM refcounting for raw mac filters") Remove them. This was also the last use of cxgb4_mps_ref_dec(). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20241013203831.88051-3-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15cxgb4: Remove unused cxgb4_alloc/free_encap_mac_filtDr. David Alan Gilbert
cxgb4_alloc_encap_mac_filt() and cxgb4_free_encap_mac_filt() have been unused since commit 28b3870578ef ("cxgb4: Re-work the logic for mps refcounting") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20241013203831.88051-2-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: systemport: fix potential memory leak in bcm_sysport_xmit()Wang Hai
The bcm_sysport_xmit() returns NETDEV_TX_OK without freeing skb in case of dma_map_single() fails, add dev_kfree_skb() to fix it. Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://patch.msgid.link/20241014145115.44977-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15Merge tag 'trace-ringbuffer-v6.12-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer fixes from Steven Rostedt: - Fix ref counter of buffers assigned at boot up A tracing instance can be created from the kernel command line. If it maps to memory, it is considered permanent and should not be deleted, or bad things can happen. If it is not mapped to memory, then the user is fine to delete it via rmdir from the instances directory. But the ref counts assumed 0 was free to remove and greater than zero was not. But this was not the case. When an instance is created, it should have the reference of 1, and if it should not be removed, it must be greater than 1. The boot up code set normal instances with a ref count of 0, which could get removed if something accessed it and then released it. And memory mapped instances had a ref count of 1 which meant it could be deleted, and bad things happen. Keep normal instances ref count as 1, and set memory mapped instances ref count to 2. - Protect sub buffer size (order) updates from other modifications When a ring buffer is changing the size of its sub-buffers, no other operations should be performed on the ring buffer. That includes reading it. But the locking only grabbed the buffer->mutex that keeps some operations from touching the ring buffer. It also must hold the cpu_buffer->reader_lock as well when updates happen as other paths use that to do some operations on the ring buffer. * tag 'trace-ringbuffer-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Fix reader locking when changing the sub buffer order ring-buffer: Fix refcount setting of boot mapped buffers
2024-10-15Merge tag 'bcachefs-2024-10-14' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: - New metadata version inode_has_child_snapshots This fixes bugs with handling of unlinked inodes + snapshots, in particular when an inode is reattached after taking a snapshot; deleted inodes now get correctly cleaned up across snapshots. - Disk accounting rewrite fixes - validation fixes for when a device has been removed - fix journal replay failing with "journal_reclaim_would_deadlock" - Some more small fixes for erasure coding + device removal - Assorted small syzbot fixes * tag 'bcachefs-2024-10-14' of git://evilpiepirate.org/bcachefs: (27 commits) bcachefs: Fix sysfs warning in fstests generic/730,731 bcachefs: Handle race between stripe reuse, invalidate_stripe_to_dev bcachefs: Fix kasan splat in new_stripe_alloc_buckets() bcachefs: Add missing validation for bch_stripe.csum_granularity_bits bcachefs: Fix missing bounds checks in bch2_alloc_read() bcachefs: fix uaf in bch2_dio_write_done() bcachefs: Improve check_snapshot_exists() bcachefs: Fix bkey_nocow_lock() bcachefs: Fix accounting replay flags bcachefs: Fix invalid shift in member_to_text() bcachefs: Fix bch2_have_enough_devs() for BCH_SB_MEMBER_INVALID bcachefs: __wait_for_freeing_inode: Switch to wait_bit_queue_entry bcachefs: Check if stuck in journal_res_get() closures: Add closure_wait_event_timeout() bcachefs: Fix state lock involved deadlock bcachefs: Fix NULL pointer dereference in bch2_opt_to_text bcachefs: Release transaction before wake up bcachefs: add check for btree id against max in try read node bcachefs: Disk accounting device validation fixes bcachefs: bch2_inode_or_descendents_is_open() ...
2024-10-15net: ethernet: rtsn: fix potential memory leak in rtsn_start_xmit()Wang Hai
The rtsn_start_xmit() returns NETDEV_TX_OK without freeing skb in case of skb->len being too long, add dev_kfree_skb_any() to fix it. Fixes: b0d3969d2b4d ("net: ethernet: rtsn: Add support for Renesas Ethernet-TSN") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241014144250.38802-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: xilinx: axienet: fix potential memory leak in axienet_start_xmit()Wang Hai
The axienet_start_xmit() returns NETDEV_TX_OK without freeing skb in case of dma_map_single() fails, add dev_kfree_skb_any() to fix it. Fixes: 71791dc8bdea ("net: axienet: Check for DMA mapping errors") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://patch.msgid.link/20241014143704.31938-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15Merge branch 'mptcp-prevent-mpc-handshake-on-port-based-signal-endpoints'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: prevent MPC handshake on port-based signal endpoints MPTCP connection requests toward a listening socket created by the in-kernel PM for a port based signal endpoint will never be accepted, they need to be explicitly rejected. - Patch 1: Explicitly reject such requests. A fix for >= v5.12. - Patch 2: Cover this case in the MPTCP selftests to avoid regressions. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> v1: https://lore.kernel.org/20240908180620.822579-1-xiyou.wangcong@gmail.com Link: https://lore.kernel.org/a5289a0d-2557-40b8-9575-6f1a0bbf06e4@redhat.com ==================== Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-0-7faea8e6b6ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15selftests: mptcp: join: test for prohibited MPC to port-based endpPaolo Abeni
Explicitly verify that MPC connection attempts towards a port-based signal endpoint fail with a reset. Note that this new test is a bit different from the other ones, not using 'run_tests'. It is then needed to add the capture capability, and the picking the right port which have been extracted into three new helpers. The info about the capture can also be printed from a single point, which simplifies the exit paths in do_transfer(). The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-2-7faea8e6b6ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15mptcp: prevent MPC handshake on port-based signal endpointsPaolo Abeni
Syzkaller reported a lockdep splat: ============================================ WARNING: possible recursive locking detected 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted -------------------------------------------- syz-executor364/5113 is trying to acquire lock: ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 but task is already holding lock: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(k-slock-AF_INET); lock(k-slock-AF_INET); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by syz-executor364/5113: #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806 #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727 #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470 #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228 #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline] #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104 #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232 #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 stack backtrace: CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 check_deadlock kernel/locking/lockdep.c:3061 [inline] validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855 __lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279 subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874 tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853 tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267 ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 __netif_receive_skb_one_core net/core/dev.c:5661 [inline] __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775 process_backlog+0x662/0x15b0 net/core/dev.c:6108 __napi_poll+0xcb/0x490 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:6963 handle_softirqs+0x2c4/0x970 kernel/softirq.c:554 do_softirq+0x11b/0x1e0 kernel/softirq.c:455 </IRQ> <TASK> __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline] __dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450 dev_queue_xmit include/linux/netdevice.h:3105 [inline] neigh_hh_output include/net/neighbour.h:526 [inline] neigh_output include/net/neighbour.h:540 [inline] ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235 ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535 __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466 tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline] tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729 tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934 sk_backlog_rcv include/net/sock.h:1111 [inline] __release_sock+0x214/0x350 net/core/sock.c:3004 release_sock+0x61/0x1f0 net/core/sock.c:3558 mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733 mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597 ___sys_sendmsg net/socket.c:2651 [inline] __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737 __do_sys_sendmmsg net/socket.c:2766 [inline] __se_sys_sendmmsg net/socket.c:2763 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f04fb13a6b9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd651f42d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f04fb13a6b9 RDX: 0000000000000001 RSI: 0000000020000d00 RDI: 0000000000000004 RBP: 00007ffd651f4310 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000020000080 R11: 0000000000000246 R12: 00000000000f4240 R13: 00007f04fb187449 R14: 00007ffd651f42f4 R15: 00007ffd651f4300 </TASK> As noted by Cong Wang, the splat is false positive, but the code path leading to the report is an unexpected one: a client is attempting an MPC handshake towards the in-kernel listener created by the in-kernel PM for a port based signal endpoint. Such connection will be never accepted; many of them can make the listener queue full and preventing the creation of MPJ subflow via such listener - its intended role. Explicitly detect this scenario at initial-syn time and drop the incoming MPC request. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Cc: stable@vger.kernel.org Reported-by: syzbot+f4aacdfef2c6a6529c3e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e Cc: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-1-7faea8e6b6ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net/smc: Fix searching in list of known pnetids in smc_pnet_add_pnetidLi RongQing
pnetid of pi (not newly allocated pe) should be compared Fixes: e888a2e8337c ("net/smc: introduce list of pnetids for Ethernet devices") Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Link: https://patch.msgid.link/20241014115321.33234-1-lirongqing@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15Merge branch 'net-ethernet-freescale-use-pa-to-format-resource_size_t'Jakub Kicinski
Simon Horman says: ==================== net: ethernet: freescale: Use %pa to format resource_size_t This short series addersses the formatting of variables of type resource_size_t in freescale drivers. The correct format string for resource_size_t is %pa which acts on the address of the variable to be formatted [1]. [1] https://elixir.bootlin.com/linux/v6.11.3/source/Documentation/core-api/printk-formats.rst#L229 These problems were introduced by commit 9d9326d3bc0e ("phy: Change mii_bus id field to a string") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/netdev/711d7f6d-b785-7560-f4dc-c6aad2cce99@linux-m68k.org/ Reviewed-by: Daniel Machon <daniel.machon@microchip.com> ==================== Link: https://patch.msgid.link/20241014-net-pa-fmt-v1-0-dcc9afb8858b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: ethernet: fs_enet: Use %pa to format resource_size_tSimon Horman
The correct format string for resource_size_t is %pa which acts on the address of the variable to be formatted [1]. [1] https://elixir.bootlin.com/linux/v6.11.3/source/Documentation/core-api/printk-formats.rst#L229 Introduced by commit 9d9326d3bc0e ("phy: Change mii_bus id field to a string") Flagged by gcc-14 as: drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c: In function 'fs_mii_bitbang_init': drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c:126:46: warning: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long long unsigned int'} [-Wformat=] 126 | snprintf(bus->id, MII_BUS_ID_SIZE, "%x", res.start); | ~^ ~~~~~~~~~ | | | | | resource_size_t {aka long long unsigned int} | unsigned int | %llx No functional change intended. Compile tested only. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/netdev/711d7f6d-b785-7560-f4dc-c6aad2cce99@linux-m68k.org/ Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241014-net-pa-fmt-v1-2-dcc9afb8858b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: fec_mpc52xx_phy: Use %pa to format resource_size_tSimon Horman
The correct format string for resource_size_t is %pa which acts on the address of the variable to be formatted [1]. [1] https://elixir.bootlin.com/linux/v6.11.3/source/Documentation/core-api/printk-formats.rst#L229 Introduced by commit 9d9326d3bc0e ("phy: Change mii_bus id field to a string") Flagged by gcc-14 as: drivers/net/ethernet/freescale/fec_mpc52xx_phy.c: In function 'mpc52xx_fec_mdio_probe': drivers/net/ethernet/freescale/fec_mpc52xx_phy.c:97:46: warning: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long long unsigned int'} [-Wformat=] 97 | snprintf(bus->id, MII_BUS_ID_SIZE, "%x", res.start); | ~^ ~~~~~~~~~ | | | | | resource_size_t {aka long long unsigned int} | unsigned int | %llx No functional change intended. Compile tested only. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/netdev/711d7f6d-b785-7560-f4dc-c6aad2cce99@linux-m68k.org/ Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20241014-net-pa-fmt-v1-1-dcc9afb8858b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15Merge branch 'net-string-format-safety-updates'Jakub Kicinski
Simon Horman says: ==================== net: String format safety updates This series addresses string format safety issues that are flagged by tooling in files touched by recent patches. I do not believe that any of these issues are bugs. Rather, I am providing these updates as I think there is a value in addressing such warnings so real problems stand out. v1: https://lore.kernel.org/20241011-string-thing-v1-0-acc506568033@kernel.org ==================== Link: https://patch.msgid.link/20241014-string-thing-v2-0-b9b29625060a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: txgbe: Pass string literal as format argument of alloc_workqueue()Simon Horman
Recently I noticed that both gcc-14 and clang-18 report that passing a non-string literal as the format argument of clkdev_create() is potentially insecure. E.g. clang-18 says: .../txgbe_phy.c:582:35: warning: format string is not a string literal (potentially insecure) [-Wformat-security] 581 | clock = clkdev_create(clk, NULL, clk_name); | ^~~~~~~~ .../txgbe_phy.c:582:35: note: treat the string as an argument to avoid this 581 | clock = clkdev_create(clk, NULL, clk_name); | ^ | "%s", It is always the case where the contents of clk_name is safe to pass as the format argument. That is, in my understanding, it never contains any format escape sequences. However, it seems better to be safe than sorry. And, as a bonus, compiler output becomes less verbose by addressing this issue as suggested by clang-18. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241014-string-thing-v2-2-b9b29625060a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: dsa: microchip: copy string using strscpySimon Horman
Prior to this patch ksz_ptp_msg_irq_setup() uses snprintf() to copy strings. It does so by passing strings as the format argument of snprintf(). This appears to be safe, due to the absence of format specifiers in the strings, which are declared within the same function. But nonetheless GCC 14 warns about it: .../ksz_ptp.c:1109:55: warning: format string is not a string literal (potentially insecure) [-Wformat-security] 1109 | snprintf(ptpmsg_irq->name, sizeof(ptpmsg_irq->name), name[n]); | ^~~~~~~ .../ksz_ptp.c:1109:55: note: treat the string as an argument to avoid this 1109 | snprintf(ptpmsg_irq->name, sizeof(ptpmsg_irq->name), name[n]); | ^ | "%s", As what we are really dealing with here is a string copy, it seems make sense to use a function designed for this purpose. In this case null padding is not required, so strscpy is appropriate. And as the destination is an array of fixed size, the 2-argument variant may be used. Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241014-string-thing-v2-1-b9b29625060a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15Merge branch 'replace-call_rcu-by-kfree_rcu-for-simple-kmem_cache_free-callback'Jakub Kicinski
Julia Lawall says: ==================== replace call_rcu by kfree_rcu for simple kmem_cache_free callback Since SLOB was removed and since commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"), it is not necessary to use call_rcu when the callback only performs kmem_cache_free. Use kfree_rcu() directly. The changes were done using the following Coccinelle semantic patch. This semantic patch is designed to ignore cases where the callback function is used in another way. // <smpl> @r@ expression e; local idexpression e2; identifier cb,f,g; position p; @@ ( call_rcu(...,e2) | call_rcu(&e->f,cb@p) | call_rcu(&e->f.g,cb@p) ) @r1@ type T,T1; identifier x,r.cb; @@ cb(...) { ( kmem_cache_free(...); | T x = ...; kmem_cache_free(...,(T1)x); | T x; x = ...; kmem_cache_free(...,(T1)x); ) } @s depends on r1@ position p != r.p; identifier r.cb; @@ cb@p @script:ocaml@ cb << r.cb; p << s.p; @@ Printf.eprintf "Other use of %s at %s:%d\n" cb (List.hd p).file (List.hd p).line @depends on r1 && !s@ expression e; identifier r.cb,f,g; position r.p; @@ ( - call_rcu(&e->f,cb@p) + kfree_rcu(e,f) | - call_rcu(&e->f.g,cb@p) + kfree_rcu(e,f.g) ) @r1a depends on !s@ type T,T1; identifier x,r.cb; @@ - cb(...) { ( - kmem_cache_free(...); | - T x = ...; - kmem_cache_free(...,(T1)x); | - T x; - x = ...; - kmem_cache_free(...,(T1)x); ) - } @r2 depends on !r1@ identifier r.cb; @@ cb(...) { ... } @script:ocaml depends on !r1 && !r2@ cb << r.cb; @@ Printf.eprintf "need definition for %s\n" cb // </smpl> ==================== Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Link: https://patch.msgid.link/20241013201704.49576-1-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15kcm: replace call_rcu by kfree_rcu for simple kmem_cache_free callbackJulia Lawall
Since SLOB was removed and since commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"), it is not necessary to use call_rcu when the callback only performs kmem_cache_free. Use kfree_rcu() directly. The changes were made using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Link: https://patch.msgid.link/20241013201704.49576-15-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: bridge: replace call_rcu by kfree_rcu for simple kmem_cache_free callbackJulia Lawall
Since SLOB was removed and since commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"), it is not necessary to use call_rcu when the callback only performs kmem_cache_free. Use kfree_rcu() directly. The changes were made using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Link: https://patch.msgid.link/20241013201704.49576-9-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15ipv6: replace call_rcu by kfree_rcu for simple kmem_cache_free callbackJulia Lawall
Since SLOB was removed and since commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"), it is not necessary to use call_rcu when the callback only performs kmem_cache_free. Use kfree_rcu() directly. The changes were made using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Link: https://patch.msgid.link/20241013201704.49576-5-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15inetpeer: replace call_rcu by kfree_rcu for simple kmem_cache_free callbackJulia Lawall
Since SLOB was removed and since commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"), it is not necessary to use call_rcu when the callback only performs kmem_cache_free. Use kfree_rcu() directly. The changes were made using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Link: https://patch.msgid.link/20241013201704.49576-4-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15ipv4: replace call_rcu by kfree_rcu for simple kmem_cache_free callbackJulia Lawall
Since SLOB was removed and since commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"), it is not necessary to use call_rcu when the callback only performs kmem_cache_free. Use kfree_rcu() directly. The changes were made using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Link: https://patch.msgid.link/20241013201704.49576-3-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: macb: Avoid 20s boot delay by skipping MDIO bus registration for ↵Oleksij Rempel
fixed-link PHY A boot delay was introduced by commit 79540d133ed6 ("net: macb: Fix handling of fixed-link node"). This delay was caused by the call to `mdiobus_register()` in cases where a fixed-link PHY was present. The MDIO bus registration triggered unnecessary PHY address scans, leading to a 20-second delay due to attempts to detect Clause 45 (C45) compatible PHYs, despite no MDIO bus being attached. The commit 79540d133ed6 ("net: macb: Fix handling of fixed-link node") was originally introduced to fix a regression caused by commit 7897b071ac3b4 ("net: macb: convert to phylink"), which caused the driver to misinterpret fixed-link nodes as PHY nodes. This resulted in warnings like: mdio_bus f0028000.ethernet-ffffffff: fixed-link has invalid PHY address mdio_bus f0028000.ethernet-ffffffff: scan phy fixed-link at address 0 ... mdio_bus f0028000.ethernet-ffffffff: scan phy fixed-link at address 31 This patch reworks the logic to avoid registering and allocation of the MDIO bus when: - The device tree contains a fixed-link node. - There is no "mdio" child node in the device tree. If a child node named "mdio" exists, the MDIO bus will be registered to support PHYs attached to the MACB's MDIO bus. Otherwise, with only a fixed-link, the MDIO bus is skipped. Tested on a sama5d35 based system with a ksz8863 switch attached to macb0. Fixes: 79540d133ed6 ("net: macb: Fix handling of fixed-link node") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241013052916.3115142-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: ethernet: aeroflex: fix potential memory leak in greth_start_xmit_gbit()Wang Hai
The greth_start_xmit_gbit() returns NETDEV_TX_OK without freeing skb in case of skb->len being too long, add dev_kfree_skb() to fix it. Fixes: d4c41139df6e ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://patch.msgid.link/20241012110434.49265-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15netdevsim: use cond_resched() in nsim_dev_trap_report_work()Eric Dumazet
I am still seeing many syzbot reports hinting that syzbot might fool nsim_dev_trap_report_work() with hundreds of ports [1] Lets use cond_resched(), and system_unbound_wq instead of implicit system_wq. [1] INFO: task syz-executor:20633 blocked for more than 143 seconds. Not tainted 6.12.0-rc2-syzkaller-00205-g1d227fcc7222 #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor state:D stack:25856 pid:20633 tgid:20633 ppid:1 flags:0x00004006 ... NMI backtrace for cpu 1 CPU: 1 UID: 0 PID: 16760 Comm: kworker/1:0 Not tainted 6.12.0-rc2-syzkaller-00205-g1d227fcc7222 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: events nsim_dev_trap_report_work RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x70 kernel/kcov.c:210 Code: 89 fb e8 23 00 00 00 48 8b 3d 04 fb 9c 0c 48 89 de 5b e9 c3 c7 5d 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 0f 1e fa 48 8b 04 24 65 48 8b 0c 25 c0 d7 03 00 65 8b 15 60 f0 RSP: 0018:ffffc90000a187e8 EFLAGS: 00000246 RAX: 0000000000000100 RBX: ffffc90000a188e0 RCX: ffff888027d3bc00 RDX: ffff888027d3bc00 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88804a2e6000 R08: ffffffff8a4bc495 R09: ffffffff89da3577 R10: 0000000000000004 R11: ffffffff8a4bc2b0 R12: dffffc0000000000 R13: ffff88806573b503 R14: dffffc0000000000 R15: ffff8880663cca00 FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc90a747f98 CR3: 000000000e734000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 000000000000002b DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <NMI> </NMI> <TASK> __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382 spin_unlock_bh include/linux/spinlock.h:396 [inline] nsim_dev_trap_report drivers/net/netdevsim/dev.c:820 [inline] nsim_dev_trap_report_work+0x75d/0xaa0 drivers/net/netdevsim/dev.c:850 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Fixes: ba5e1272142d ("netdevsim: avoid potential loop in nsim_dev_trap_report_work()") Reported-by: syzbot+d383dc9579a76f56c251@syzkaller.appspotmail.com Reported-by: syzbot+c596faae21a68bf7afd0@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20241012094230.3893510-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: airoha: Implement BQL supportLorenzo Bianconi
Introduce BQL support in the airoha_eth driver reporting to the kernel info about tx hw DMA queues in order to avoid bufferbloat and keep the latency small. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20241012-en7581-bql-v2-1-4deb4efdb60b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15macsec: don't increment counters for an unrelated SASabrina Dubroca
On RX, we shouldn't be incrementing the stats for an arbitrary SA in case the actual SA hasn't been set up. Those counters are intended to track packets for their respective AN when the SA isn't currently configured. Due to the way MACsec is implemented, we don't keep counters unless the SA is configured, so we can't track those packets, and those counters will remain at 0. The RXSC's stats keeps track of those packets without telling us which AN they belonged to. We could add counters for non-existent SAs, and then find a way to integrate them in the dump to userspace, but I don't think it's worth the effort. Fixes: 91ec9bd57f35 ("macsec: Fix traffic counters/statistics") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/f5ac92aaa5b89343232615f4c03f9f95042c6aa0.1728657709.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net: phy: aquantia: fix return value check in aqr107_config_mdi()Daniel Golle
of_property_read_u32() returns -EINVAL in case the property cannot be found rather than -ENOENT. Fix the check to not abort probing in case of the property being missing, and also in case CONFIG_OF is not set which will result in -ENOSYS. Fixes: a2e1ba275eae ("net: phy: aquantia: allow forcing order of MDI pairs") Reported-by: Jon Hunter <jonathanh@nvidia.com> Closes: https://lore.kernel.org/all/114b4c03-5d16-42ed-945d-cf78eabea12b@nvidia.com/ Suggested-by: Hans-Frieder Vogt <hfdevel@gmx.net> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Link: https://patch.msgid.link/f8282e2fc6a5ac91fe91491edc7f1ca8f4a65a0d.1728825323.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15Merge branch 'net-af_packet-allow-joining-a-fanout-when-link-is-down'Jakub Kicinski
Gur Stavi says: ==================== net: af_packet: allow joining a fanout when link is down PACKET socket can retain its fanout membership through link down and up and leave a fanout while closed regardless of link state. However, socket was forbidden from joining a fanout while it was not RUNNING. This scenario was identified while studying DPDK pmd_af_packet_drv. Since sockets are only created during initialization, there is no reason to fail the initialization if a single link is temporarily down. This patch allows PACKET socket to join a fanout while not RUNNING. Selftest psock_fanout is extended to test this "fanout while link down" scenario. Selftest psock_fanout is also extended to test fanout create/join by socket that did not bind or specified a protocol, which carries an implicit bind. v3: https://lore.kernel.org/cover.1728555449.git.gur.stavi@huawei.com v2: https://lore.kernel.org/cover.1728382839.git.gur.stavi@huawei.com v1: https://lore.kernel.org/cover.1728303615.git.gur.stavi@huawei.com ==================== Link: https://patch.msgid.link/cover.1728802323.git.gur.stavi@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15selftests: net/psock_fanout: unbound socket fanoutGur Stavi
Add a test that validates that an unbound packet socket cannot create/join a fanout group. Signed-off-by: Gur Stavi <gur.stavi@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/7612fa90f613100e2b64c563cab3d7fdf36010db.1728802323.git.gur.stavi@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15selftests: net/psock_fanout: socket joins fanout when link is downGur Stavi
Modify test_control_group to have toggle parameter. When toggle is non-zero, loopback device will be set down for the initialization of fd[1] which is still expected to successfully join the fanout. Signed-off-by: Gur Stavi <gur.stavi@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/6f4a506ed5f08f8fc00a966dec8febd1030c6e98.1728802323.git.gur.stavi@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15af_packet: allow fanout_add when socket is not RUNNINGGur Stavi
PACKET socket can retain its fanout membership through link down and up and leave a fanout while closed regardless of link state. However, socket was forbidden from joining a fanout while it was not RUNNING. This patch allows PACKET socket to join fanout while not RUNNING. Socket can be RUNNING if it has a specified protocol. Either directly from packet_create (being implicitly bound to any interface) or following a successful bind. Socket RUNNING state is switched off if it is bound to an interface that went down. Instead of the test for RUNNING, this patch adds a test that socket can become RUNNING. Signed-off-by: Gur Stavi <gur.stavi@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/4f1a3c37dbef980ef044c4d2adf91c76e2eca14b.1728802323.git.gur.stavi@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15r8169: implement additional ethtool stats opsHeiner Kallweit
This adds support for ethtool standard statistics, and makes use of the extended hardware statistics being available from RTl8125. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/58e0da73-a7dd-4be3-82ae-d5b3f9069bde@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15ring-buffer: Fix reader locking when changing the sub buffer orderPetr Pavlu
The function ring_buffer_subbuf_order_set() updates each ring_buffer_per_cpu and installs new sub buffers that match the requested page order. This operation may be invoked concurrently with readers that rely on some of the modified data, such as the head bit (RB_PAGE_HEAD), or the ring_buffer_per_cpu.pages and reader_page pointers. However, no exclusive access is acquired by ring_buffer_subbuf_order_set(). Modifying the mentioned data while a reader also operates on them can then result in incorrect memory access and various crashes. Fix the problem by taking the reader_lock when updating a specific ring_buffer_per_cpu in ring_buffer_subbuf_order_set(). Link: https://lore.kernel.org/linux-trace-kernel/20240715145141.5528-1-petr.pavlu@suse.com/ Link: https://lore.kernel.org/linux-trace-kernel/20241010195849.2f77cc3f@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20241011112850.17212b25@gandalf.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241015112440.26987-1-petr.pavlu@suse.com Fixes: 8e7b58c27b3c ("ring-buffer: Just update the subbuffers when changing their allocation order") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-15Merge tag 'batadv-next-pullrequest-20241015' of ↵Paolo Abeni
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - Add flex array to struct batadv_tvlv_tt_data, by Erick Archer - Use string choice helper to print booleans, by Sven Eckelmann - replace call_rcu by kfree_rcu for simple kmem_cache_free callback, by Julia Lawall * tag 'batadv-next-pullrequest-20241015' of git://git.open-mesh.org/linux-merge: batman-adv: replace call_rcu by kfree_rcu for simple kmem_cache_free callback batman-adv: Use string choice helper to print booleans batman-adv: Add flex array to struct batadv_tvlv_tt_data batman-adv: Start new development cycle ==================== Link: https://patch.msgid.link/20241015073946.46613-1-sw@simonwunderlich.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>