linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2025-03-17	tcp: reorganize tcp_in_ack_event() and tcp_count_delivered()	Ilpo Järvinen
	- Move tcp_count_delivered() earlier and split tcp_count_delivered_ce() out of it - Move tcp_in_ack_event() later - While at it, remove the inline from tcp_in_ack_event() and let the compiler to decide Accurate ECN's heuristics does not know if there is going to be ACE field based CE counter increase or not until after rtx queue has been processed. Only then the number of ACKed bytes/pkts is available. As CE or not affects presence of FLAG_ECE, that information for tcp_in_ack_event is not yet available in the old location of the call to tcp_in_ack_event(). Signed-off-by: Ilpo Järvinen <ij@kernel.org> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-14	net/smc: use the correct ndev to find pnetid by pnetid table	Guangguan Wang
	When using smc_pnet in SMC, it will only search the pnetid in the base_ndev of the netdev hierarchy(both HW PNETID and User-defined sw pnetid). This may not work for some scenarios when using SMC in container on cloud environment. In container, there have choices of different container network, such as directly using host network, virtual network IPVLAN, veth, etc. Different choices of container network have different netdev hierarchy. Examples of netdev hierarchy show below. (eth0 and eth1 in host below is the netdev directly related to the physical device). _______________________________ \| _________________ \| \| \|POD \| \| \| \| \| \| \| \| eth0_________ \| \| \| \|____\| \|__\| \| \| \| \| \| \| \| \| \| \| eth1\|base_ndev\| eth0_______ \| \| \| \| \| RDMA \|\| \| host \|_________\| \|_______\|\| --------------------------------- netdev hierarchy if directly using host network ________________________________ \| _________________ \| \| \|POD __________ \| \| \| \| \|upper_ndev\| \| \| \| \|eth0\|__________\| \| \| \| \|_______\|_________\| \| \| \|lower netdev \| \| __\|______ \| \| eth1\| \| eth0_______ \| \| \|base_ndev\| \| RDMA \|\| \| host \|_________\| \|_______\|\| --------------------------------- netdev hierarchy if using IPVLAN _______________________________ \| _____________________ \| \| \|POD _________ \| \| \| \| \|base_ndev\|\| \| \| \|eth0(veth)\|_________\|\| \| \| \|____________\|________\| \| \| \|pairs \| \| _______\|_ \| \| \| \| eth0_______ \| \| veth\|base_ndev\| \| RDMA \|\| \| \|_________\| \|_______\|\| \| _________ \| \| eth1\|base_ndev\| \| \| host \|_________\| \| --------------------------------- netdev hierarchy if using veth Due to some reasons, the eth1 in host is not RDMA attached netdevice, pnetid is needed to map the eth1(in host) with RDMA device so that POD can do SMC-R. Because the eth1(in host) is managed by CNI plugin(such as Terway, network management plugin in container environment), and in cloud environment the eth(in host) can dynamically be inserted by CNI when POD create and dynamically be removed by CNI when POD destroy and no POD related to the eth(in host) anymore. It is hard to config the pnetid to the eth1(in host). But it is easy to config the pnetid to the netdevice which can be seen in POD. When do SMC-R, both the container directly using host network and the container using veth network can successfully match the RDMA device, because the configured pnetid netdev is a base_ndev. But the container using IPVLAN can not successfully match the RDMA device and 0x03030000 fallback happens, because the configured pnetid netdev is not a base_ndev. Additionally, if config pnetid to the eth1(in host) also can not work for matching RDMA device when using veth network and doing SMC-R in POD. To resolve the problems list above, this patch extends to search user -defined sw pnetid in the clc handshake ndev when no pnetid can be found in the base_ndev, and the base_ndev take precedence over ndev for backward compatibility. This patch also can unify the pnetid setup of different network choices list above in container(Config user-defined sw pnetid in the netdevice can be seen in POD). Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-14	selftests: drv-net: fix merge conflicts resolution	Matthieu Baerts (NGI0)
	After the recent merge between net-next and net, I got some conflicts on my side because the merge resolution was different from Stephen's one [1] I applied on my side in the MPTCP tree. It looks like the code that is now in net-next is using the old way to retrieve the local and remote addresses. This patch is now using the new way, like what was in Stephen's email [1]. Also, in get_interface_info(), there were no conflicts in this area, because that was new code from 'net', but a small adaptation was needed there as well to get the remote address. Fixes: 941defcea7e1 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") Link: https://lore.kernel.org/20250311115758.17a1d414@canb.auug.org.au [1] Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250314-net-next-drv-net-ping-fix-merge-v1-1-0d5c19daf707@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Paolo Abeni
	Cross-merge networking fixes after downstream PR (net-6.14-rc6). Conflicts: tools/testing/selftests/drivers/net/ping.py 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py") de94e8697405 ("selftests: drv-net: store addresses in dict indexed by ipver") https://lore.kernel.org/netdev/20250311115758.17a1d414@canb.auug.org.au/ net/core/devmem.c a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()") 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations") https://lore.kernel.org/netdev/20250313114929.43744df1@canb.auug.org.au/ Adjacent changes: tools/testing/selftests/net/Makefile 6f50175ccad4 ("selftests: Add IPv6 link-local address generation tests for GRE devices.") 2e5584e0f913 ("selftests/net: expand cmsg_ipv6.sh with ipv4") drivers/net/ethernet/broadcom/bnxt/bnxt.c 661958552eda ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic") fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	Merge tag 'net-6.14-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter, bluetooth and wireless. No known regressions outstanding. Current release - regressions: - wifi: nl80211: fix assoc link handling - eth: lan78xx: sanitize return values of register read/write functions Current release - new code bugs: - ethtool: tsinfo: fix dump command - bluetooth: btusb: configure altsetting for HCI_USER_CHANNEL - eth: mlx5: DR, use the right action structs for STEv3 Previous releases - regressions: - netfilter: nf_tables: make destruction work queue pernet - gre: fix IPv6 link-local address generation. - wifi: iwlwifi: fix TSO preparation - bluetooth: revert "bluetooth: hci_core: fix sleeping function called from invalid context" - ovs: revert "openvswitch: switch to per-action label counting in conntrack" - eth: - ice: fix switchdev slow-path in LAG - bonding: fix incorrect MAC address setting to receive NS messages Previous releases - always broken: - core: prevent TX of unreadable skbs - sched: prevent creation of classes with TC_H_ROOT - netfilter: nft_exthdr: fix offset with ipv4_find_option() - wifi: cfg80211: cancel wiphy_work before freeing wiphy - mctp: copy headers if cloned - phy: nxp-c45-tja11xx: add errata for TJA112XA/B - eth: - bnxt: fix kernel panic in the bnxt_get_queue_stats{rx \| tx} - mlx5: bridge, fix the crash caused by LAG state check" * tag 'net-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits) net: mana: cleanup mana struct after debugfs_remove() net/mlx5e: Prevent bridge link show failure for non-eswitch-allowed devices net/mlx5: Bridge, fix the crash caused by LAG state check net/mlx5: Lag, Check shared fdb before creating MultiPort E-Switch net/mlx5: Fix incorrect IRQ pool usage when releasing IRQs net/mlx5: HWS, Rightsize bwc matcher priority net/mlx5: DR, use the right action structs for STEv3 Revert "openvswitch: switch to per-action label counting in conntrack" net: openvswitch: remove misbehaving actions length check selftests: Add IPv6 link-local address generation tests for GRE devices. gre: Fix IPv6 link-local address generation. netfilter: nft_exthdr: fix offset with ipv4_find_option() selftests/tc-testing: Add a test case for DRR class with TC_H_ROOT net_sched: Prevent creation of classes with TC_H_ROOT ipvs: prevent integer overflow in do_ip_vs_get_ctl() selftests: netfilter: skip br_netfilter queue tests if kernel is tainted netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in insert_tree() wifi: mac80211: fix MPDU length parsing for EHT 5/6 GHz qlcnic: fix memory leak issues in qlcnic_sriov_common.c rtase: Fix improper release of ring list entries in rtase_sw_reset ...
2025-03-13	Merge tag 'vfs-6.14-rc7.fixes' of ↵	Linus Torvalds
	gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Bring in an RCU pathwalk fix for afs. This is brought in as a merge from the vfs-6.15.shared.afs branch that needs this commit and other trees already depend on it. - Fix vboxfs unterminated string handling. * tag 'vfs-6.14-rc7.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: vboxsf: Add __nonstring annotations for unterminated strings afs: Fix afs_atcell_get_link() to handle RCU pathwalk
2025-03-13	Merge tag 'nf-25-03-13' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for net: 1) Missing initialization of cpu and jiffies32 fields in conncount, from Kohei Enju. 2) Skip several tests in case kernel is tainted, otherwise tests bogusly report failure too as they also check for tainted kernel, from Florian Westphal. 3) Fix a hyphothetical integer overflow in do_ip_vs_get_ctl() leading to bogus error logs, from Dan Carpenter. 4) Fix incorrect offset in ipv4 option match in nft_exthdr, from Alexey Kashavkin. netfilter pull request 25-03-13 * tag 'nf-25-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_exthdr: fix offset with ipv4_find_option() ipvs: prevent integer overflow in do_ip_vs_get_ctl() selftests: netfilter: skip br_netfilter queue tests if kernel is tainted netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in insert_tree() ==================== Link: https://patch.msgid.link/20250313095636.2186-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net: mana: cleanup mana struct after debugfs_remove()	Shradha Gupta
	When on a MANA VM hibernation is triggered, as part of hibernate_snapshot(), mana_gd_suspend() and mana_gd_resume() are called. If during this mana_gd_resume(), a failure occurs with HWC creation, mana_port_debugfs pointer does not get reinitialized and ends up pointing to older, cleaned-up dentry. Further in the hibernation path, as part of power_down(), mana_gd_shutdown() is triggered. This call, unaware of the failures in resume, tries to cleanup the already cleaned up mana_port_debugfs value and hits the following bug: [ 191.359296] mana 7870:00:00.0: Shutdown was called [ 191.359918] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ 191.360584] #PF: supervisor write access in kernel mode [ 191.361125] #PF: error_code(0x0002) - not-present page [ 191.361727] PGD 1080ea067 P4D 0 [ 191.362172] Oops: Oops: 0002 [#1] SMP NOPTI [ 191.362606] CPU: 11 UID: 0 PID: 1674 Comm: bash Not tainted 6.14.0-rc5+ #2 [ 191.363292] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024 [ 191.364124] RIP: 0010:down_write+0x19/0x50 [ 191.364537] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 de cd ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 16 65 48 8b 05 88 24 4c 6a 48 89 43 08 48 8b 5d [ 191.365867] RSP: 0000:ff45fbe0c1c037b8 EFLAGS: 00010246 [ 191.366350] RAX: 0000000000000000 RBX: 0000000000000098 RCX: ffffff8100000000 [ 191.366951] RDX: 0000000000000001 RSI: 0000000000000064 RDI: 0000000000000098 [ 191.367600] RBP: ff45fbe0c1c037c0 R08: 0000000000000000 R09: 0000000000000001 [ 191.368225] R10: ff45fbe0d2b01000 R11: 0000000000000008 R12: 0000000000000000 [ 191.368874] R13: 000000000000000b R14: ff43dc27509d67c0 R15: 0000000000000020 [ 191.369549] FS: 00007dbc5001e740(0000) GS:ff43dc663f380000(0000) knlGS:0000000000000000 [ 191.370213] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.370830] CR2: 0000000000000098 CR3: 0000000168e8e002 CR4: 0000000000b73ef0 [ 191.371557] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 191.372192] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 191.372906] Call Trace: [ 191.373262] <TASK> [ 191.373621] ? show_regs+0x64/0x70 [ 191.374040] ? __die+0x24/0x70 [ 191.374468] ? page_fault_oops+0x290/0x5b0 [ 191.374875] ? do_user_addr_fault+0x448/0x800 [ 191.375357] ? exc_page_fault+0x7a/0x160 [ 191.375971] ? asm_exc_page_fault+0x27/0x30 [ 191.376416] ? down_write+0x19/0x50 [ 191.376832] ? down_write+0x12/0x50 [ 191.377232] simple_recursive_removal+0x4a/0x2a0 [ 191.377679] ? __pfx_remove_one+0x10/0x10 [ 191.378088] debugfs_remove+0x44/0x70 [ 191.378530] mana_detach+0x17c/0x4f0 [ 191.378950] ? __flush_work+0x1e2/0x3b0 [ 191.379362] ? __cond_resched+0x1a/0x50 [ 191.379787] mana_remove+0xf2/0x1a0 [ 191.380193] mana_gd_shutdown+0x3b/0x70 [ 191.380642] pci_device_shutdown+0x3a/0x80 [ 191.381063] device_shutdown+0x13e/0x230 [ 191.381480] kernel_power_off+0x35/0x80 [ 191.381890] hibernate+0x3c6/0x470 [ 191.382312] state_store+0xcb/0xd0 [ 191.382734] kobj_attr_store+0x12/0x30 [ 191.383211] sysfs_kf_write+0x3e/0x50 [ 191.383640] kernfs_fop_write_iter+0x140/0x1d0 [ 191.384106] vfs_write+0x271/0x440 [ 191.384521] ksys_write+0x72/0xf0 [ 191.384924] __x64_sys_write+0x19/0x20 [ 191.385313] x64_sys_call+0x2b0/0x20b0 [ 191.385736] do_syscall_64+0x79/0x150 [ 191.386146] ? __mod_memcg_lruvec_state+0xe7/0x240 [ 191.386676] ? __lruvec_stat_mod_folio+0x79/0xb0 [ 191.387124] ? __pfx_lru_add+0x10/0x10 [ 191.387515] ? queued_spin_unlock+0x9/0x10 [ 191.387937] ? do_anonymous_page+0x33c/0xa00 [ 191.388374] ? __handle_mm_fault+0xcf3/0x1210 [ 191.388805] ? __count_memcg_events+0xbe/0x180 [ 191.389235] ? handle_mm_fault+0xae/0x300 [ 191.389588] ? do_user_addr_fault+0x559/0x800 [ 191.390027] ? irqentry_exit_to_user_mode+0x43/0x230 [ 191.390525] ? irqentry_exit+0x1d/0x30 [ 191.390879] ? exc_page_fault+0x86/0x160 [ 191.391235] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 191.391745] RIP: 0033:0x7dbc4ff1c574 [ 191.392111] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 [ 191.393412] RSP: 002b:00007ffd95a23ab8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 191.393990] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007dbc4ff1c574 [ 191.394594] RDX: 0000000000000005 RSI: 00005a6eeadb0ce0 RDI: 0000000000000001 [ 191.395215] RBP: 00007ffd95a23ae0 R08: 00007dbc50003b20 R09: 0000000000000000 [ 191.395805] R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000000005 [ 191.396404] R13: 00005a6eeadb0ce0 R14: 00007dbc500045c0 R15: 00007dbc50001ee0 [ 191.396987] </TASK> To fix this, we explicitly set such mana debugfs variables to NULL after debugfs_remove() is called. Fixes: 6607c17c6c5e ("net: mana: Enable debugfs files for MANA device") Cc: stable@vger.kernel.org Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/1741688260-28922-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	Merge branch 'mlx5-misc-fixes-2025-03-10'	Paolo Abeni
	Tariq Toukan says: ==================== mlx5 misc fixes 2025-03-10 This patchset provides misc bug fixes from the team to the mlx5 core and Eth drivers. ==================== Link: https://patch.msgid.link/1741644104-97767-1-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net/mlx5e: Prevent bridge link show failure for non-eswitch-allowed devices	Carolina Jubran
	mlx5_eswitch_get_vepa returns -EPERM if the device lacks eswitch_manager capability, blocking mlx5e_bridge_getlink from retrieving VEPA mode. Since mlx5e_bridge_getlink implements ndo_bridge_getlink, returning -EPERM causes bridge link show to fail instead of skipping devices without this capability. To avoid this, return -EOPNOTSUPP from mlx5e_bridge_getlink when mlx5_eswitch_get_vepa fails, ensuring the command continues processing other devices while ignoring those without the necessary capability. Fixes: 4b89251de024 ("net/mlx5: Support ndo bridge_setlink and getlink") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1741644104-97767-7-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net/mlx5: Bridge, fix the crash caused by LAG state check	Jianbo Liu
	When removing LAG device from bridge, NETDEV_CHANGEUPPER event is triggered. Driver finds the lower devices (PFs) to flush all the offloaded entries. And mlx5_lag_is_shared_fdb is checked, it returns false if one of PF is unloaded. In such case, mlx5_esw_bridge_lag_rep_get() and its caller return NULL, instead of the alive PF, and the flush is skipped. Besides, the bridge fdb entry's lastuse is updated in mlx5 bridge event handler. But this SWITCHDEV_FDB_ADD_TO_BRIDGE event can be ignored in this case because the upper interface for bond is deleted, and the entry will never be aged because lastuse is never updated. To make things worse, as the entry is alive, mlx5 bridge workqueue keeps sending that event, which is then handled by kernel bridge notifier. It causes the following crash when accessing the passed bond netdev which is already destroyed. To fix this issue, remove such checks. LAG state is already checked in commit 15f8f168952f ("net/mlx5: Bridge, verify LAG state when adding bond to bridge"), driver still need to skip offload if LAG becomes invalid state after initialization. Oops: stack segment: 0000 [#1] SMP CPU: 3 UID: 0 PID: 23695 Comm: kworker/u40:3 Tainted: G OE 6.11.0_mlnx #1 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_bridge_wq mlx5_esw_bridge_update_work [mlx5_core] RIP: 0010:br_switchdev_event+0x2c/0x110 [bridge] Code: 44 00 00 48 8b 02 48 f7 00 00 02 00 00 74 69 41 54 55 53 48 83 ec 08 48 8b a8 08 01 00 00 48 85 ed 74 4a 48 83 fe 02 48 89 d3 <4c> 8b 65 00 74 23 76 49 48 83 fe 05 74 7e 48 83 fe 06 75 2f 0f b7 RSP: 0018:ffffc900092cfda0 EFLAGS: 00010297 RAX: ffff888123bfe000 RBX: ffffc900092cfe08 RCX: 00000000ffffffff RDX: ffffc900092cfe08 RSI: 0000000000000001 RDI: ffffffffa0c585f0 RBP: 6669746f6e690a30 R08: 0000000000000000 R09: ffff888123ae92c8 R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888123ae9c60 R13: 0000000000000001 R14: ffffc900092cfe08 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f15914c8734 CR3: 0000000002830005 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __die_body+0x1a/0x60 ? die+0x38/0x60 ? do_trap+0x10b/0x120 ? do_error_trap+0x64/0xa0 ? exc_stack_segment+0x33/0x50 ? asm_exc_stack_segment+0x22/0x30 ? br_switchdev_event+0x2c/0x110 [bridge] ? sched_balance_newidle.isra.149+0x248/0x390 notifier_call_chain+0x4b/0xa0 atomic_notifier_call_chain+0x16/0x20 mlx5_esw_bridge_update+0xec/0x170 [mlx5_core] mlx5_esw_bridge_update_work+0x19/0x40 [mlx5_core] process_scheduled_works+0x81/0x390 worker_thread+0x106/0x250 ? bh_worker+0x110/0x110 kthread+0xb7/0xe0 ? kthread_park+0x80/0x80 ret_from_fork+0x2d/0x50 ? kthread_park+0x80/0x80 ret_from_fork_asm+0x11/0x20 </TASK> Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1741644104-97767-6-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net/mlx5: Lag, Check shared fdb before creating MultiPort E-Switch	Shay Drory
	Currently, MultiPort E-Switch is requesting to create a LAG with shared FDB without checking the LAG is supporting shared FDB. Add the check. Fixes: a32327a3a02c ("net/mlx5: Lag, Control MultiPort E-Switch single FDB mode") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1741644104-97767-5-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net/mlx5: Fix incorrect IRQ pool usage when releasing IRQs	Shay Drory
	mlx5_irq_pool_get() is a getter for completion IRQ pool only. However, after the cited commit, mlx5_irq_pool_get() is called during ctrl IRQ release flow to retrieve the pool, resulting in the use of an incorrect IRQ pool. Hence, use the newly introduced mlx5_irq_get_pool() getter to retrieve the correct IRQ pool based on the IRQ itself. While at it, rename mlx5_irq_pool_get() to mlx5_irq_table_get_comp_irq_pool() which accurately reflects its purpose and improves code readability. Fixes: 0477d5168bbb ("net/mlx5: Expose SFs IRQs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1741644104-97767-4-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net/mlx5: HWS, Rightsize bwc matcher priority	Vlad Dogaru
	The bwc layer was clamping the matcher priority from 32 bits to 16 bits. This didn't show up until a matcher was resized, since the initial native matcher was created using the correct 32 bit value. The fix also reorders fields to avoid some padding. Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling") Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1741644104-97767-3-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net/mlx5: DR, use the right action structs for STEv3	Yevgeny Kliteynik
	Some actions in ConnectX-8 (STEv3) have different structure, and they are handled separately in ste_ctx_v3. This separate handling was missing two actions: INSERT_HDR and REMOVE_HDR, which broke SWS for Linux Bridge. This patch resolves the issue by introducing dedicated callbacks for the insert and remove header functions, with version-specific implementations for each STE variant. Fixes: 4d617b57574f ("net/mlx5: DR, add support for ConnectX-8 steering") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Itamar Gozlan <igozlan@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1741644104-97767-2-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	Merge branch 'mlx5-next' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next updates 2025-03-10 The following pull-request contains common mlx5 updates for your net-next tree. Please pull and let me know of any problem. * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Add IFC bits for PPCNT recovery counters group net/mlx5: fs, add RDMA TRANSPORT steering domain support net/mlx5: Query ADV_RDMA capabilities net/mlx5: Limit non-privileged commands net/mlx5: Allow the throttle mechanism to be more dynamic net/mlx5: Add RDMA_CTRL HW capabilities ==================== Link: https://patch.msgid.link/1741608293-41436-1-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	dt-bindings: net: Define interrupt constraints for DWMAC vendor bindings	Lad Prabhakar
	The `snps,dwmac.yaml` binding currently sets `maxItems: 3` for the `interrupts` and `interrupt-names` properties, but vendor bindings selecting `snps,dwmac.yaml` do not impose these limits. Define constraints for `interrupts` and `interrupt-names` properties in various DWMAC vendor bindings to ensure proper validation and consistency. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Acked-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Link: https://patch.msgid.link/20250309003301.1152228-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	Merge branch 'net-stmmac-dwmac-rk-validate-grf-and-peripheral-grf-during-probe'	Paolo Abeni
	Jonas Karlman says: ==================== net: stmmac: dwmac-rk: Validate GRF and peripheral GRF during probe All Rockchip GMAC variants typically write to GRF regs to control e.g. interface mode, speed and MAC rx/tx delay. Newer SoCs such as RK3576 and RK3588 use a mix of GRF and peripheral GRF regs. These syscon regmaps is located with help of a rockchip,grf and rockchip,php-grf phandle. However, validating the rockchip,grf and rockchip,php-grf syscon regmap is deferred until e.g. interface mode or speed is configured. This series change to validate the GRF and peripheral GRF syscon regmap at probe time to help simplify the SoC specific operations. This should not introduce any backward compatibility issues as all GMAC nodes have been added together with a rockchip,grf phandle (and rockchip,php-grf where required) in their initial commit. ==================== Link: https://patch.msgid.link/20250308213720.2517944-1-jonas@kwiboo.se Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net: stmmac: dwmac-rk: Remove unneeded GRF and peripheral GRF checks	Jonas Karlman
	Now that GRF, and peripheral GRF where needed, is validated at probe time there is no longer any need to check and log an error in each SoC specific operation. Remove unneeded IS_ERR() checks and early bail out from each SoC specific operation. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250308213720.2517944-4-jonas@kwiboo.se Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net: stmmac: dwmac-rk: Validate GRF and peripheral GRF during probe	Jonas Karlman
	All Rockchip GMAC variants typically write to GRF regs to control e.g. interface mode, speed and MAC rx/tx delay. Newer SoCs such as RK3576 and RK3588 use a mix of GRF and peripheral GRF regs. These syscon regmaps is located with help of a rockchip,grf and rockchip,php-grf phandle. However, validating the rockchip,grf and rockchip,php-grf syscon regmap is deferred until e.g. interface mode or speed is configured, inside the individual SoC specific operations. Change to validate the rockchip,grf and rockchip,php-grf syscon regmap at probe time to simplify all SoC specific operations. This should not introduce any backward compatibility issues as all GMAC nodes have been added together with a rockchip,grf phandle (and rockchip,php-grf where required) in their initial commit. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250308213720.2517944-3-jonas@kwiboo.se Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	dt-bindings: net: rockchip-dwmac: Require rockchip,grf and rockchip,php-grf	Jonas Karlman
	All Rockchip GMAC variants typically write to GRF regs to control e.g. interface mode, speed and MAC rx/tx delay. Newer SoCs such as RK3562, RK3576 and RK3588 use a mix of GRF and peripheral GRF regs. Prior to the commit b331b8ef86f0 ("dt-bindings: net: convert rockchip-dwmac to json-schema") the property rockchip,grf was listed under "Required properties". During the conversion this was lost and rockchip,grf has since then incorrectly been treated as optional and not as required. Similarly, when rockchip,php-grf was added to the schema in the commit a2b77831427c ("dt-bindings: net: rockchip-dwmac: add rk3588 gmac compatible") it also incorrectly has been treated as optional for all GMAC variants, when it should have been required for RK3588, and later also for RK3576. Update this binding to require rockchip,grf and rockchip,php-grf to properly reflect that GRF (and peripheral GRF for RK3576/RK3588) is required to control part of GMAC. This should not introduce any breakage as all Rockchip GMAC nodes have been added together with a rockchip,grf phandle (and rockchip,php-grf where required) in their initial commit. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250308213720.2517944-2-jonas@kwiboo.se Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	Revert "openvswitch: switch to per-action label counting in conntrack"	Xin Long
	Currently, ovs_ct_set_labels() is only called for confirmed conntrack entries (ct) within ovs_ct_commit(). However, if the conntrack entry does not have the labels_ext extension, attempting to allocate it in ovs_ct_get_conn_labels() for a confirmed entry triggers a warning in nf_ct_ext_add(): WARN_ON(nf_ct_is_confirmed(ct)); This happens when the conntrack entry is created externally before OVS increments net->ct.labels_used. The issue has become more likely since commit fcb1aa5163b1 ("openvswitch: switch to per-action label counting in conntrack"), which changed to use per-action label counting and increment net->ct.labels_used when a flow with ct action is added. Since there’s no straightforward way to fully resolve this issue at the moment, this reverts the commit to avoid breaking existing use cases. Fixes: fcb1aa5163b1 ("openvswitch: switch to per-action label counting in conntrack") Reported-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/1bdeb2f3a812bca016a225d3de714427b2cd4772.1741457143.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net: openvswitch: remove misbehaving actions length check	Ilya Maximets
	The actions length check is unreliable and produces different results depending on the initial length of the provided netlink attribute and the composition of the actual actions inside of it. For example, a user can add 4088 empty clone() actions without triggering -EMSGSIZE, on attempt to add 4089 such actions the operation will fail with the -EMSGSIZE verdict. However, if another 16 KB of other actions will be appended to the previous 4089 clone() actions, the check passes and the flow is successfully installed into the openvswitch datapath. The reason for a such a weird behavior is the way memory is allocated. When ovs_flow_cmd_new() is invoked, it calls ovs_nla_copy_actions(), that in turn calls nla_alloc_flow_actions() with either the actual length of the user-provided actions or the MAX_ACTIONS_BUFSIZE. The function adds the size of the sw_flow_actions structure and then the actually allocated memory is rounded up to the closest power of two. So, if the user-provided actions are larger than MAX_ACTIONS_BUFSIZE, then MAX_ACTIONS_BUFSIZE + sizeof(*sfa) rounded up is 32K + 24 -> 64K. Later, while copying individual actions, we look at ksize(), which is 64K, so this way the MAX_ACTIONS_BUFSIZE check is not actually triggered and the user can easily allocate almost 64 KB of actions. However, when the initial size is less than MAX_ACTIONS_BUFSIZE, but the actions contain ones that require size increase while copying (such as clone() or sample()), then the limit check will be performed during the reserve_sfa_size() and the user will not be allowed to create actions that yield more than 32 KB internally. This is one part of the problem. The other part is that it's not actually possible for the userspace application to know beforehand if the particular set of actions will be rejected or not. Certain actions require more space in the internal representation, e.g. an empty clone() takes 4 bytes in the action list passed in by the user, but it takes 12 bytes in the internal representation due to an extra nested attribute, and some actions require less space in the internal representations, e.g. set(tunnel(..)) normally takes 64+ bytes in the action list provided by the user, but only needs to store a single pointer in the internal implementation, since all the data is stored in the tunnel_info structure instead. And the action size limit is applied to the internal representation, not to the action list passed by the user. So, it's not possible for the userpsace application to predict if the certain combination of actions will be rejected or not, because it is not possible for it to calculate how much space these actions will take in the internal representation without knowing kernel internals. All that is causing random failures in ovs-vswitchd in userspace and inability to handle certain traffic patterns as a result. For example, it is reported that adding a bit more than a 1100 VMs in an OpenStack setup breaks the network due to OVS not being able to handle ARP traffic anymore in some cases (it tries to install a proper datapath flow, but the kernel rejects it with -EMSGSIZE, even though the action list isn't actually that large.) Kernel behavior must be consistent and predictable in order for the userspace application to use it in a reasonable way. ovs-vswitchd has a mechanism to re-direct parts of the traffic and partially handle it in userspace if the required action list is oversized, but that doesn't work properly if we can't actually tell if the action list is oversized or not. Solution for this is to check the size of the user-provided actions instead of the internal representation. This commit just removes the check from the internal part because there is already an implicit size check imposed by the netlink protocol. The attribute can't be larger than 64 KB. Realistically, we could reduce the limit to 32 KB, but we'll be risking to break some existing setups that rely on the fact that it's possible to create nearly 64 KB action lists today. Vast majority of flows in real setups are below 100-ish bytes. So removal of the limit will not change real memory consumption on the system. The absolutely worst case scenario is if someone adds a flow with 64 KB of empty clone() actions. That will yield a 192 KB in the internal representation consuming 256 KB block of memory. However, that list of actions is not meaningful and also a no-op. Real world very large action lists (that can occur for a rare cases of BUM traffic handling) are unlikely to contain a large number of clones and will likely have a lot of tunnel attributes making the internal representation comparable in size to the original action list. So, it should be fine to just remove the limit. Commit in the 'Fixes' tag is the first one that introduced the difference between internal representation and the user-provided action lists, but there were many more afterwards that lead to the situation we have today. Fixes: 7d5437c709de ("openvswitch: Add tunneling interface.") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250308004609.2881861-1-i.maximets@ovn.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	Merge branch 'gre-fix-regressions-in-ipv6-link-local-address-generation'	Paolo Abeni
	Guillaume Nault says: ==================== gre: Fix regressions in IPv6 link-local address generation. IPv6 link-local address generation has some special cases for GRE devices. This has led to several regressions in the past, and some of them are still not fixed. This series fixes the remaining problems, like the ipv6.conf.<dev>.addr_gen_mode sysctl being ignored and the router discovery process not being started (see details in patch 1). To avoid any further regressions, patch 2 adds selftests covering IPv4 and IPv6 gre/gretap devices with all combinations of currently supported addr_gen_mode values. ==================== Link: https://patch.msgid.link/cover.1741375285.git.gnault@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	selftests: Add IPv6 link-local address generation tests for GRE devices.	Guillaume Nault
	GRE devices have their special code for IPv6 link-local address generation that has been the source of several regressions in the past. Add selftest to check that all gre, ip6gre, gretap and ip6gretap get an IPv6 link-link local address in accordance with the net.ipv6.conf.<dev>.addr_gen_mode sysctl. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/2d6772af8e1da9016b2180ec3f8d9ee99f470c77.1741375285.git.gnault@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	gre: Fix IPv6 link-local address generation.	Guillaume Nault
	Use addrconf_addr_gen() to generate IPv6 link-local addresses on GRE devices in most cases and fall back to using add_v4_addrs() only in case the GRE configuration is incompatible with addrconf_addr_gen(). GRE used to use addrconf_addr_gen() until commit e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") restricted this use to gretap and ip6gretap devices, and created add_v4_addrs() (borrowed from SIT) for non-Ethernet GRE ones. The original problem came when commit 9af28511be10 ("addrconf: refuse isatap eui64 for INADDR_ANY") made __ipv6_isatap_ifid() fail when its addr parameter was 0. The commit says that this would create an invalid address, however, I couldn't find any RFC saying that the generated interface identifier would be wrong. Anyway, since gre over IPv4 devices pass their local tunnel address to __ipv6_isatap_ifid(), that commit broke their IPv6 link-local address generation when the local address was unspecified. Then commit e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") tried to fix that case by defining add_v4_addrs() and calling it to generate the IPv6 link-local address instead of using addrconf_addr_gen() (apart for gretap and ip6gretap devices, which would still use the regular addrconf_addr_gen(), since they have a MAC address). That broke several use cases because add_v4_addrs() isn't properly integrated into the rest of IPv6 Neighbor Discovery code. Several of these shortcomings have been fixed over time, but add_v4_addrs() remains broken on several aspects. In particular, it doesn't send any Router Sollicitations, so the SLAAC process doesn't start until the interface receives a Router Advertisement. Also, add_v4_addrs() mostly ignores the address generation mode of the interface (/proc/sys/net/ipv6/conf//addr_gen_mode), thus breaking the IN6_ADDR_GEN_MODE_RANDOM and IN6_ADDR_GEN_MODE_STABLE_PRIVACY cases. Fix the situation by using add_v4_addrs() only in the specific scenario where the normal method would fail. That is, for interfaces that have all of the following characteristics: run over IPv4, * transport IP packets directly, not Ethernet (that is, not gretap interfaces), * tunnel endpoint is INADDR_ANY (that is, 0), * device address generation mode is EUI64. In all other cases, revert back to the regular addrconf_addr_gen(). Also, remove the special case for ip6gre interfaces in add_v4_addrs(), since ip6gre devices now always use addrconf_addr_gen() instead. Fixes: e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/559c32ce5c9976b269e6337ac9abb6a96abe5096.1741375285.git.gnault@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net: hsr: Add KUnit test for PRP	Jaakko Karrenpalo
	Add unit tests for the PRP duplicate detection Signed-off-by: Jaakko Karrenpalo <jkarrenpalo@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250307161700.1045-2-jkarrenpalo@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net: hsr: Fix PRP duplicate detection	Jaakko Karrenpalo
	Add PRP specific function for handling duplicate packets. This is needed because of potential L2 802.1p prioritization done by network switches. The L2 prioritization can re-order the PRP packets from a node causing the existing implementation to discard the frame(s) that have been received 'late' because the sequence number is before the previous received packet. This can happen if the node is sending multiple frames back-to-back with different priority. Signed-off-by: Jaakko Karrenpalo <jkarrenpalo@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250307161700.1045-1-jkarrenpalo@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	netfilter: nft_exthdr: fix offset with ipv4_find_option()	Alexey Kashavkin
	There is an incorrect calculation in the offset variable which causes the nft_skb_copy_to_reg() function to always return -EFAULT. Adding the start variable is redundant. In the __ip_options_compile() function the correct offset is specified when finding the function. There is no need to add the size of the iphdr structure to the offset. Fixes: dbb5281a1f84 ("netfilter: nf_tables: add support for matching IPv4 options") Signed-off-by: Alexey Kashavkin <akashavkin@gmail.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-13	net: cn23xx: fix typos	Janik Haag
	This patch fixes a few typos, spelling mistakes, and a bit of grammar, increasing the comments readability. Signed-off-by: Janik Haag <janik@aq0.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250307145648.1679912-2-janik@aq0.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13	net: hns3: use string choices helper	Jian Shen
	Use string choices helper for better readability. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250307113733.819448-1-shaojijie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-12	Merge tag 'sched_ext-for-6.14-rc6-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fix from Tejun Heo: "BPF schedulers could trigger a crash by passing in an invalid CPU to the scx_bpf_select_cpu_dfl() helper. Fix it by verifying input validity" * tag 'sched_ext-for-6.14-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()
2025-03-12	Merge tag 'spi-fix-v6.14-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A couple of driver specific fixes, an error handling fix for the Atmel QuadSPI driver and a fix for a nasty synchronisation issue in the data path for the Microchip driver which affects larger transfers. There's also a MAINTAINERS update for the Samsung driver" * tag 'spi-fix-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: microchip-core: prevent RX overflows when transmit size > FIFO size MAINTAINERS: add tambarus as R for Samsung SPI spi: atmel-quadspi: remove references to runtime PM on error path
2025-03-12	netdevsim: 'support' multi-buf XDP	Jakub Kicinski
	Don't error out on large MTU if XDP is multi-buf. The ping test now tests ping with XDP and high MTU. netdevsim doesn't actually run the prog (yet?) so it doesn't matter if the prog was multi-buf.. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20250311092820.542148-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	Merge branch 'net-remove-rtnl_lock-from-the-callers-of-queue-apis'	Jakub Kicinski
	Stanislav Fomichev says: ==================== net: remove rtnl_lock from the callers of queue APIs All drivers that use queue management APIs already depend on the netdev lock. Ultimately, we want to have most of the paths that work with specific netdev to be rtnl_lock-free (ethtool mostly in particular). Queue API currently has a much smaller API surface, so start with rtnl_lock from it: - add mutex to each dmabuf binding (to replace rtnl_lock) - move netdev lock management to the callers of netdev_rx_queue_restart and drop rtnl_lock ==================== Link: https://patch.msgid.link/20250311144026.4154277-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	net: drop rtnl_lock for queue_mgmt operations	Stanislav Fomichev
	All drivers that use queue API are already converted to use netdev instance lock. Move netdev instance lock management to the netlink layer and drop rtnl_lock. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Mina Almasry. <almasrymina@google.com> Link: https://patch.msgid.link/20250311144026.4154277-4-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	net: add granular lock for the netdev netlink socket	Stanislav Fomichev
	As we move away from rtnl_lock for queue ops, introduce per-netdev_nl_sock lock. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250311144026.4154277-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	net: create netdev_nl_sock to wrap bindings list	Stanislav Fomichev
	No functional changes. Next patches will add more granular locking to netdev_nl_sock. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250311144026.4154277-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	net/mlx5: Avoid unnecessary use of comma operator	Simon Horman
	Although it does not seem to have any untoward side-effects, the use of ';' to separate to assignments seems more appropriate than ','. Flagged by clang-19 -Wcomma No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250307-mlx5-comma-v1-1-934deb6927bb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	selftests: net: bump GRO timeout for gro/setup_veth	Jakub Kicinski
	Commit 51bef03e1a71 ("selftests/net: deflake GRO tests") recently switched to NAPI suspension, and lowered the timeout from 1ms to 100us. This started causing flakes in netdev-run CI. Let's bump it to 200us. In a quick test of a debug kernel I see failures with 100us, with 200us in 5 runs I see 2 completely clean runs and 3 with a single retry (GRO test will retry up to 5 times). Reviewed-by: Kevin Krakauer <krakauer@google.com> Link: https://patch.msgid.link/20250310110821.385621-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	eth: bnxt: add missing netdev lock management to bnxt_dl_reload_up	Stanislav Fomichev
	bnxt_dl_reload_up is completely missing instance lock management which can result in `devlink dev reload` leaving with instance lock held. Add the missing calls. Also add netdev_assert_locked to make it clear that the up() method is running with the instance lock grabbed. v2: - add net/netdev_lock.h include to bnxt_devlink.c for netdev_assert_locked Fixes: 004b5008016a ("eth: bnxt: remove most dependencies on RTNL") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250309215851.2003708-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	eth: bnxt: request unconditional ops lock	Stanislav Fomichev
	netdev_lock_ops conditionally grabs instance lock when queue_mgmt_ops is defined. However queue_mgmt_ops support is signaled via FW so we can sometimes boot without queue_mgmt_ops being set. This will result in bnxt running without instance lock which the driver now heavily depends on. Set request_ops_lock to true unconditionally to always request netdev instance lock. Fixes: 004b5008016a ("eth: bnxt: remove most dependencies on RTNL") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250309215851.2003708-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	eth: bnxt: switch to netif_close	Stanislav Fomichev
	All (error) paths that call dev_close are already holding instance lock, so switch to netif_close to avoid the deadlock. v2: - add missing EXPORT_MODULE for netif_close Fixes: 004b5008016a ("eth: bnxt: remove most dependencies on RTNL") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250309215851.2003708-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	net: revert to lockless TC_SETUP_BLOCK and TC_SETUP_FT	Stanislav Fomichev
	There is a couple of places from which we can arrive to ndo_setup_tc with TC_SETUP_BLOCK/TC_SETUP_FT: - netlink - netlink notifier - netdev notifier Locking netdev too deep in this call chain seems to be problematic (especially assuming some/all of the call_netdevice_notifiers NETDEV_UNREGISTER) might soon be running with the instance lock). Revert to lockless ndo_setup_tc for TC_SETUP_BLOCK/TC_SETUP_FT. NFT framework already takes care of most of the locking. Document the assumptions. ndo_setup_tc TC_SETUP_BLOCK nft_block_offload_cmd nft_chain_offload_cmd nft_flow_block_chain nft_flow_offload_chain nft_flow_rule_offload_abort nft_flow_rule_offload_commit nft_flow_rule_offload_commit nf_tables_commit nfnetlink_rcv_batch nfnetlink_rcv_skb_batch nfnetlink_rcv nft_offload_netdev_event NETDEV_UNREGISTER notifier ndo_setup_tc TC_SETUP_FT nf_flow_table_offload_cmd nf_flow_table_offload_setup nft_unregister_flowtable_hook nft_register_flowtable_net_hooks nft_flowtable_update nf_tables_newflowtable nfnetlink_rcv_batch (.call NFNL_CB_BATCH) nft_flowtable_update nf_tables_newflowtable nft_flowtable_event nf_tables_flowtable_event NETDEV_UNREGISTER notifier __nft_unregister_flowtable_net_hooks nft_unregister_flowtable_net_hooks nf_tables_commit nfnetlink_rcv_batch (.call NFNL_CB_BATCH) __nf_tables_abort nf_tables_abort nfnetlink_rcv_batch __nft_release_hook __nft_release_hooks nf_tables_pre_exit_net -> module unload nft_rcv_nl_event netlink_register_notifier (oh boy) nft_register_flowtable_net_hooks nft_flowtable_update nf_tables_newflowtable nf_tables_newflowtable Fixes: c4f0f30b424e ("net: hold netdev instance lock during nft ndo_setup_tc") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reported-by: syzbot+0afb4bcf91e5a1afdcad@syzkaller.appspotmail.com Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250308044726.1193222-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	Merge branch 'net_sched-prevent-creation-of-classes-with-tc_h_root'	Jakub Kicinski
	Cong Wang says: ==================== net_sched: Prevent creation of classes with TC_H_ROOT This patchset contains a bug fix and its TDC test case. ==================== Link: https://patch.msgid.link/20250306232355.93864-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	selftests/tc-testing: Add a test case for DRR class with TC_H_ROOT	Cong Wang
	Integrate the reproduer from Mingi to TDC. All test results: 1..4 ok 1 0385 - Create DRR with default setting ok 2 2375 - Delete DRR with handle ok 3 3092 - Show DRR class ok 4 4009 - Reject creation of DRR class with classid TC_H_ROOT Cc: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250306232355.93864-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	net_sched: Prevent creation of classes with TC_H_ROOT	Cong Wang
	The function qdisc_tree_reduce_backlog() uses TC_H_ROOT as a termination condition when traversing up the qdisc tree to update parent backlog counters. However, if a class is created with classid TC_H_ROOT, the traversal terminates prematurely at this class instead of reaching the actual root qdisc, causing parent statistics to be incorrectly maintained. In case of DRR, this could lead to a crash as reported by Mingi Cho. Prevent the creation of any Qdisc class with classid TC_H_ROOT (0xFFFFFFFF) across all qdisc types, as suggested by Jamal. Reported-by: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: 066a3b5b2346 ("[NET_SCHED] sch_api: fix qdisc_tree_decrease_qlen() loop") Link: https://patch.msgid.link/20250306232355.93864-2-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-12	ipvs: prevent integer overflow in do_ip_vs_get_ctl()	Dan Carpenter
	The get->num_services variable is an unsigned int which is controlled by the user. The struct_size() function ensures that the size calculation does not overflow an unsigned long, however, we are saving the result to an int so the calculation can overflow. Both "len" and "get->num_services" come from the user. This check is just a sanity check to help the user and ensure they are using the API correctly. An integer overflow here is not a big deal. This has no security impact. Save the result from struct_size() type size_t to fix this integer overflow bug. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-12	selftests: netfilter: skip br_netfilter queue tests if kernel is tainted	Florian Westphal
	These scripts fail if the kernel is tainted which leads to wrong test failure reports in CI environments when an unrelated test triggers some splat. Check taint state at start of script and SKIP if its already dodgy. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-12	netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in ↵	Kohei Enju
	insert_tree() Since commit b36e4523d4d5 ("netfilter: nf_conncount: fix garbage collection confirm race"), `cpu` and `jiffies32` were introduced to the struct nf_conncount_tuple. The commit made nf_conncount_add() initialize `conn->cpu` and `conn->jiffies32` when allocating the struct. In contrast, count_tree() was not changed to initialize them. By commit 34848d5c896e ("netfilter: nf_conncount: Split insert and traversal"), count_tree() was split and the relevant allocation code now resides in insert_tree(). Initialize `conn->cpu` and `conn->jiffies32` in insert_tree(). BUG: KMSAN: uninit-value in find_or_evict net/netfilter/nf_conncount.c:117 [inline] BUG: KMSAN: uninit-value in __nf_conncount_add+0xd9c/0x2850 net/netfilter/nf_conncount.c:143 find_or_evict net/netfilter/nf_conncount.c:117 [inline] __nf_conncount_add+0xd9c/0x2850 net/netfilter/nf_conncount.c:143 count_tree net/netfilter/nf_conncount.c:438 [inline] nf_conncount_count+0x82f/0x1e80 net/netfilter/nf_conncount.c:521 connlimit_mt+0x7f6/0xbd0 net/netfilter/xt_connlimit.c:72 __nft_match_eval net/netfilter/nft_compat.c:403 [inline] nft_match_eval+0x1a5/0x300 net/netfilter/nft_compat.c:433 expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline] nft_do_chain+0x426/0x2290 net/netfilter/nf_tables_core.c:288 nft_do_chain_ipv4+0x1a5/0x230 net/netfilter/nft_chain_filter.c:23 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626 nf_hook_slow_list+0x24d/0x860 net/netfilter/core.c:663 NF_HOOK_LIST include/linux/netfilter.h:350 [inline] ip_sublist_rcv+0x17b7/0x17f0 net/ipv4/ip_input.c:633 ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:669 __netif_receive_skb_list_ptype net/core/dev.c:5936 [inline] __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5983 __netif_receive_skb_list net/core/dev.c:6035 [inline] netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:6126 netif_receive_skb_list+0x5a/0x460 net/core/dev.c:6178 xdp_recv_frames net/bpf/test_run.c:280 [inline] xdp_test_run_batch net/bpf/test_run.c:361 [inline] bpf_test_run_xdp_live+0x2e86/0x3480 net/bpf/test_run.c:390 bpf_prog_test_run_xdp+0xf1d/0x1ae0 net/bpf/test_run.c:1316 bpf_prog_test_run+0x5e5/0xa30 kernel/bpf/syscall.c:4407 __sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5813 __do_sys_bpf kernel/bpf/syscall.c:5902 [inline] __se_sys_bpf kernel/bpf/syscall.c:5900 [inline] __ia32_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5900 ia32_sys_call+0x394d/0x4180 arch/x86/include/generated/asm/syscalls_32.h:358 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0xb0/0x110 arch/x86/entry/common.c:387 do_fast_syscall_32+0x38/0x80 arch/x86/entry/common.c:412 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:450 entry_SYSENTER_compat_after_hwframe+0x84/0x8e Uninit was created at: slab_post_alloc_hook mm/slub.c:4121 [inline] slab_alloc_node mm/slub.c:4164 [inline] kmem_cache_alloc_noprof+0x915/0xe10 mm/slub.c:4171 insert_tree net/netfilter/nf_conncount.c:372 [inline] count_tree net/netfilter/nf_conncount.c:450 [inline] nf_conncount_count+0x1415/0x1e80 net/netfilter/nf_conncount.c:521 connlimit_mt+0x7f6/0xbd0 net/netfilter/xt_connlimit.c:72 __nft_match_eval net/netfilter/nft_compat.c:403 [inline] nft_match_eval+0x1a5/0x300 net/netfilter/nft_compat.c:433 expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline] nft_do_chain+0x426/0x2290 net/netfilter/nf_tables_core.c:288 nft_do_chain_ipv4+0x1a5/0x230 net/netfilter/nft_chain_filter.c:23 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626 nf_hook_slow_list+0x24d/0x860 net/netfilter/core.c:663 NF_HOOK_LIST include/linux/netfilter.h:350 [inline] ip_sublist_rcv+0x17b7/0x17f0 net/ipv4/ip_input.c:633 ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:669 __netif_receive_skb_list_ptype net/core/dev.c:5936 [inline] __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5983 __netif_receive_skb_list net/core/dev.c:6035 [inline] netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:6126 netif_receive_skb_list+0x5a/0x460 net/core/dev.c:6178 xdp_recv_frames net/bpf/test_run.c:280 [inline] xdp_test_run_batch net/bpf/test_run.c:361 [inline] bpf_test_run_xdp_live+0x2e86/0x3480 net/bpf/test_run.c:390 bpf_prog_test_run_xdp+0xf1d/0x1ae0 net/bpf/test_run.c:1316 bpf_prog_test_run+0x5e5/0xa30 kernel/bpf/syscall.c:4407 __sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5813 __do_sys_bpf kernel/bpf/syscall.c:5902 [inline] __se_sys_bpf kernel/bpf/syscall.c:5900 [inline] __ia32_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5900 ia32_sys_call+0x394d/0x4180 arch/x86/include/generated/asm/syscalls_32.h:358 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0xb0/0x110 arch/x86/entry/common.c:387 do_fast_syscall_32+0x38/0x80 arch/x86/entry/common.c:412 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:450 entry_SYSENTER_compat_after_hwframe+0x84/0x8e Reported-by: syzbot+83fed965338b573115f7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=83fed965338b573115f7 Fixes: b36e4523d4d5 ("netfilter: nf_conncount: fix garbage collection confirm race") Signed-off-by: Kohei Enju <enjuk@amazon.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>