summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)Author
2025-03-25Merge tag 'timers-cleanups-2025-03-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A treewide hrtimer timer cleanup hrtimers are initialized with hrtimer_init() and a subsequent store to the callback pointer. This turned out to be suboptimal for the upcoming Rust integration and is obviously a silly implementation to begin with. This cleanup replaces the hrtimer_init(T); T->function = cb; sequence with hrtimer_setup(T, cb); The conversion was done with Coccinelle and a few manual fixups. Once the conversion has completely landed in mainline, hrtimer_init() will be removed and the hrtimer::function becomes a private member" * tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) wifi: rt2x00: Switch to use hrtimer_update_function() io_uring: Use helper function hrtimer_update_function() serial: xilinx_uartps: Use helper function hrtimer_update_function() ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup() RDMA: Switch to use hrtimer_setup() virtio: mem: Switch to use hrtimer_setup() drm/vmwgfx: Switch to use hrtimer_setup() drm/xe/oa: Switch to use hrtimer_setup() drm/vkms: Switch to use hrtimer_setup() drm/msm: Switch to use hrtimer_setup() drm/i915/request: Switch to use hrtimer_setup() drm/i915/uncore: Switch to use hrtimer_setup() drm/i915/pmu: Switch to use hrtimer_setup() drm/i915/perf: Switch to use hrtimer_setup() drm/i915/gvt: Switch to use hrtimer_setup() drm/i915/huc: Switch to use hrtimer_setup() drm/amdgpu: Switch to use hrtimer_setup() stm class: heartbeat: Switch to use hrtimer_setup() i2c: Switch to use hrtimer_setup() iio: Switch to use hrtimer_setup() ...
2025-03-25stmmac: intel: interface switching support for RPL-P platformChoong Yong Liang
Based on the patch series [1], the enablement of interface switching for RPL-P will use the same handling as ADL-N. Link: https://patchwork.kernel.org/project/netdevbpf/cover/20250227121522.1802832-1-yong.liang.choong@linux.intel.com/ [1] Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Link: https://patch.msgid.link/20250324062742.462771-1-yong.liang.choong@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25stmmac: Replace deprecated PCI functionsPhilipp Stanner
The PCI functions - pcim_iomap_regions() and - pcim_iomap_table() have been deprecated. Replace them with their successor function, pcim_iomap_region(). Make variable declaration order at closeby places comply with reverse christmas tree order. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Tested-by: Henry Chen <chenx97@aosc.io> Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250324092928.9482-6-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25stmmac: Remove pcim_* functions for driver detachPhilipp Stanner
Functions prefixed with "pcim_" are managed devres functions which perform automatic cleanup once the driver unloads. It is, thus, not necessary to call any cleanup functions in remove() callbacks. Remove the pcim_ cleanup function calls in the remove() callbacks. Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Tested-by: Henry Chen <chenx97@aosc.io> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250324092928.9482-5-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25stmmac: loongson: Remove surplus loopPhilipp Stanner
loongson_dwmac_probe() contains a loop which doesn't have an effect, because it tries to call pcim_iomap_regions() with the same parameters several times. The break statement at the loop's end furthermore ensures that the loop only runs once anyways. Remove the surplus loop. Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Tested-by: Henry Chen <chenx97@aosc.io> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250324092928.9482-4-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25octeontx2-af: mcs: Remove redundant 'flush_workqueue()' callsChen Ni
'destroy_workqueue()' already drains the queue before destroying it, so there is no need to flush it explicitly. Remove the redundant 'flush_workqueue()' calls. This was generated with coccinelle: @@ expression E; @@ - flush_workqueue(E); destroy_workqueue(E); Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250324080854.408188-1-nichen@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25Merge tag 'ipsec-next-2025-03-24' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2025-03-24 1) Prevent setting high order sequence number bits input in non-ESN mode. From Leon Romanovsky. 2) Support PMTU handling in tunnel mode for packet offload. From Leon Romanovsky. 3) Make xfrm_state_lookup_byaddr lockless. From Florian Westphal. 4) Remove unnecessary NULL check in xfrm_lookup_with_ifid(). From Dan Carpenter. * tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Remove unnecessary NULL check in xfrm_lookup_with_ifid() xfrm: state: make xfrm_state_lookup_byaddr lockless xfrm: check for PMTU in tunnel mode for packet offload xfrm: provide common xdo_dev_offload_ok callback implementation xfrm: rely on XFRM offload xfrm: simplify SA initialization routine xfrm: delay initialization of offload path till its actually requested xfrm: prevent high SEQ input in non-ESN mode ==================== Link: https://patch.msgid.link/20250324061855.4116819-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: au1000_eth: Mark au1000_ReleaseDB() staticJohan Korsnes
This fixes the following build warning: ``` drivers/net/ethernet/amd/au1000_eth.c:574:6: warning: no previous prototype for 'au1000_ReleaseDB' [-Wmissing-prototypes] 574 | void au1000_ReleaseDB(struct au1000_private *aup, struct db_dest *pDB) | ^~~~~~~~~~~~~~~~ ``` Signed-off-by: Johan Korsnes <johan.korsnes@gmail.com> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250323190450.111241-1-johan.korsnes@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25ibmvnic: Use kernel helpers for hex dumpsNick Child
Previously, when the driver was printing hex dumps, the buffer was cast to an 8 byte long and printed using string formatters. If the buffer size was not a multiple of 8 then a read buffer overflow was possible. Therefore, create a new ibmvnic function that loops over a buffer and calls hex_dump_to_buffer instead. This patch address KASAN reports like the one below: ibmvnic 30000003 env3: Login Buffer: ibmvnic 30000003 env3: 01000000af000000 <...> ibmvnic 30000003 env3: 2e6d62692e736261 ibmvnic 30000003 env3: 65050003006d6f63 ================================================================== BUG: KASAN: slab-out-of-bounds in ibmvnic_login+0xacc/0xffc [ibmvnic] Read of size 8 at addr c0000001331a9aa8 by task ip/17681 <...> Allocated by task 17681: <...> ibmvnic_login+0x2f0/0xffc [ibmvnic] ibmvnic_open+0x148/0x308 [ibmvnic] __dev_open+0x1ac/0x304 <...> The buggy address is located 168 bytes inside of allocated 175-byte region [c0000001331a9a00, c0000001331a9aaf) <...> ================================================================= ibmvnic 30000003 env3: 000000000033766e Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Dave Marquardt <davemarq@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250320212951.11142-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: stmmac: dwmac-rk: Add initial support for RK3528 integrated PHYJonas Karlman
Rockchip RK3528 (and RV1106) has a different integrated PHY compared to the integrated PHY on RK3228/RK3328. Current powerup/down operation is not compatible with the integrated PHY found in these newer SoCs. Add operations to powerup/down the integrated PHY found in RK3528. Use helpers that can be used by other GMAC variants in the future. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250319214415.3086027-6-jonas@kwiboo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: stmmac: dwmac-rk: Add integrated_phy_powerdown operationJonas Karlman
Rockchip RK3528 (and RV1106) has a different integrated PHY compared to the integrated PHY on RK3228/RK3328. Current powerup/down operation is not compatible with the integrated PHY found in these newer SoCs. Add a new integrated_phy_powerdown operation and change the call chain for integrated_phy_powerup to prepare support for the integrated PHY found in these newer SoCs. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250319214415.3086027-5-jonas@kwiboo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: stmmac: dwmac-rk: Move integrated_phy_powerup/down functionsJonas Karlman
Rockchip RK3528 (and RV1106) has a different integrated PHY compared to the integrated PHY on RK3228/RK3328. Current powerup/down operation is not compatible with the integrated PHY found in these SoCs. Move the rk_gmac_integrated_phy_powerup/down functions to top of the file to prepare for them to be called directly by a GMAC variant specific powerup/down operation. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250319214415.3086027-4-jonas@kwiboo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: stmmac: dwmac-rk: Add GMAC support for RK3528David Wu
Rockchip RK3528 has two Ethernet controllers based on Synopsys DWC Ethernet QoS IP. Add initial support for the RK3528 GMAC variant. Signed-off-by: David Wu <david.wu@rock-chips.com> Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Link: https://patch.msgid.link/20250319214415.3086027-3-jonas@kwiboo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: stmmac: block PHY RXC clock-stopRussell King (Oracle)
The DesignWare core requires the receive clock to be running during certain operations. Ensure that we block PHY RXC clock-stop during these operations. This is a best-efforts change - not everywhere can be covered by this because of net's core locking, which means we can't access the MDIO bus to configure the PHY to disable RXC clock-stop in certain areas. These are marked with FIXME comments. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6p-008Vjz-Qy@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: stmmac: socfpga: remove phy_resume() callRussell King (Oracle)
As the previous commit addressed DWGMAC resuming with a PHY in suspended state, there is now no need for socfpga to work around this. Remove this code. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6f-008Vjn-J1@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: stmmac: address non-LPI resume failures properlyRussell King (Oracle)
The Synopsys Designware GMAC core databook requires all clocks to be active in order to complete software reset, which we perform during resume. However, IEEE 802.3 allows a PHY to stop its clocks when placed in low-power mode, which happens when the system is suspended and WoL is not enabled. As an attempt to work around this, commit 36d18b5664ef ("net: stmmac: start phylink instance before stmmac_hw_setup()") started phylink early, but this has the side effect that the mac_link_up() method may be called before or during the initialisation of GMAC hardware. We also have the socfpga glue driver directly calling phy_resume() also as an attempt to work around this. In a previous commit, phylink_prepare_resume() has been introduced to give MAC drivers a way to ensure that the PHY is resumed prior to their initialisation of their MAC hardware. This commit adds the call, and moves the phylink_resume() call back to where it should be before the aforementioned commit. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6a-008Vjh-FG@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25sfc: support X4 devlink flashEdward Cree
Unlike X2 and EF100, we do not attempt to parse the firmware file to find an image within it; we simply hand the entire file to the MC, which is responsible for understanding any container formats we might use and validating that the firmware file is applicable to this NIC. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/9a72a74002a7819c780b0a18ce9294c9d4e1db12.1742493017.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25sfc: update MCDI protocol headersEdward Cree
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/bcb7597460a5a99d1dca4ef282f4aa2dd46ae545.1742493017.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25sfc: rip out MDIO supportEdward Cree
Unlike Siena, no EF10 board ever had an external PHY, and consequently MDIO handling isn't even built into the firmware. Since Siena has been split out into its own driver, the MDIO code can be deleted from the sfc driver. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/aa689d192ddaef7abe82709316c2be648a7bd66e.1742493017.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net/mlx5e: TC, Don't offload CT commit if it's the last actionJianbo Liu
For CT action with commit argument, it's usually followed by the forward action, either to the output netdev or next chain. The default behavior for software is to drop by setting action attribute to TC_ACT_SHOT instead of TC_ACT_PIPE if it's the last action. But driver can't handle it, so block the offload for such case. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-6-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net/mlx5e: CT: Filter legacy rules that are unrelated to nicPaul Blakey
In nic mode CT setup where we do hairpin between the two nics, both nics register to the same flow table (per zone), and try to offload all rules on it. Instead, filter the rules that originated from the relevant nic (so only one side is offloaded for each nic). Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net/mlx5: Update pfnum retrieval for devlink port attributesShay Drory
Align mlx5 driver usage of 'pfnum' with the documentation clarification introduced in commit bb70b0d48d8e ("devlink: Improve the port attributes description"). Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net/mlx5: fw reset, check bridge accessibility at earlier stageAmir Tzin
Currently, mlx5_is_reset_now_capable() checks whether the pci bridge is accessible only on bridge hot plug capability check. If the pci bridge is not accessible, reset now will fail regardless of bridge hotplug capability. Move this check to function mlx5_is_reset_now_capable() which, in such case, aborts the reset and does so in the request phase instead of the reset now phase. Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Amir Tzin <amirtz@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net/mlx5: Lag, use port selection tables when availableMark Bloch
As queue affinity is being deprecated and will no longer be supported in the future, Always check for the presence of the port selection namespace. When available, leverage it to distribute traffic across the physical ports via steering, ensuring compatibility with future NICs. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net/mlx5e: TX, Utilize WQ fragments edge for multi-packet WQEsTariq Toukan
For simplicity reasons, the driver avoids crossing work queue fragment boundaries within the same TX WQE (Work-Queue Element). Until today, as the number of packets in a TX MPWQE (Multi-Packet WQE) descriptor is not known in advance, the driver pre-prepared contiguous memory for the largest possible WQE. For this, when getting too close to the fragment edge, having no room for the largest WQE possible, the driver was filling the fragment remainder with NOP descriptors, aligning the next descriptor to the beginning of the next fragment. Generating and handling these NOPs wastes resources, like: CPU cycles, work-queue entries fetched to the device, and PCI bandwidth. In this patch, we replace this NOPs filling mechanism in the TX MPWQE flow. Instead, we utilize the remaining entries of the fragment with a TX MPWQE. If this room turns out to be too small, we simply open an additional descriptor starting at the beginning of the next fragment. Performance benchmark: uperf test, single server against 3 clients. TCP multi-stream, bidir, traffic profile "2x350B read, 1400B write". Bottleneck is in inbound PCI bandwidth (device POV). +---------------+------------+------------+--------+ | | Before | After | | +---------------+------------+------------+--------+ | BW | 117.4 Gbps | 121.1 Gbps | +3.1% | +---------------+------------+------------+--------+ | tx_packets | 15 M/sec | 15.5 M/sec | +3.3% | +---------------+------------+------------+--------+ | tx_nops | 3 M/sec | 0 | -100% | +---------------+------------+------------+--------+ Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1742391746-118647-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-03-18 (ice, idpf) For ice: Przemek modifies string declarations to resolve compile issues on gcc 7.5. Karol adds padding to initial programming of GLTSYN_TIME* registers to ensure it will occur in the future to prevent hardware issues. Jesse Brandeburg turns off driver RDMA capability when the corresponding kernel config is not enabled to aid in preventing resource exhaustion. Jan adjusts type declaration to properly catch error conditions and prevent truncation of values. He also adds bounds checking to prevent overflow in ice_vc_cfg_q_quanta(). Lukasz adds checking and error reporting for invalid values in ice_vc_cfg_q_bw(). Mateusz adds check for valid size for ice_vc_fdir_parse_raw(). For idpf: Emil adds check, and handling, on failure to register netdev. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: idpf: check error for register_netdev() on init ice: fix using untrusted value of pkt_len in ice_vc_fdir_parse_raw() ice: fix input validation for virtchnl BW ice: validate queue quanta parameters to prevent OOB access ice: stop truncating queue ids when checking virtchnl: make proto and filter action count unsigned ice: fix reservation of resources for RDMA when disabled ice: ensure periodic output start time is in the future ice: health.c: fix compilation on gcc 7.5 ==================== Link: https://patch.msgid.link/20250318200511.2958251-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: ti: cpsw: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in cpsw/cpsw_new drivers. ti cpsw and cpsw_new drivers set xdp headroom at least to CPSW_HEADROOM_NA: CPSW_HEADROOM_NA max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN so the headroom is large enough to contain xdp_frame and xdp metadata. Please note this patch is just compiled tested. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-7-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: mana: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in mana driver. mana driver sets xdp headroom to XDP_PACKET_HEADROOM so the headroom is large enough to contain xdp_frame and xdp metadata. Please note this patch is just compiled tested. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-6-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: ethernet: mediatek: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in mediatek driver. mtk_eth_soc driver sets xdp headroom to XDP_PACKET_HEADROOM so the headroom is large enough to contain xdp_frame and xdp metadata. Please note this patch is just compiled tested. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-5-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: octeontx2: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in octeontx2 driver. octeontx2 driver sets xdp headroom to OTX2_HEAD_ROOM OTX2_HEAD_ROOM OTX2_ALIGN OTX2_ALIGN 128 so the headroom is large enough to contain xdp_frame and xdp metadata. Please note this patch is just compiled tested. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-4-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: netsec: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in netsec driver. netsec driver sets xdp headroom to NETSEC_RXBUF_HEADROOM: NETSEC_RXBUF_HEADROOM max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN so the headroom is large enough to contain xdp_frame and xdp metadata. Please note this patch is just compiled tested. Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-3-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: mvpp2: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in mvpp2 driver mvpp2 driver sets xdp headroom to: MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM where MVPP2_MH_SIZE 2 MVPP2_SKB_HEADROOM min(max(XDP_PACKET_HEADROOM, NET_SKB_PAD), 224) so the headroom is large enough to contain xdp_frame and xdp metadata. Please note this patch is just compiled tested. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-2-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: mvneta: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in mvneta driver mvneta sets xdp headroom to: MVNETA_MH_SIZE + MVNETA_SKB_HEADROOM where MVNETA_MH_SIZE 2 MVNETA_SKB_HEADROOM max(NET_SKB_PAD, XDP_PACKET_HEADROOM) so the headroom is large enough to contain xdp_frame and xdp metadata. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-1-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: tulip: avoid unused variable warningSimon Horman
There is an effort to achieve W=1 kernel builds without warnings. As part of that effort Helge Deller highlighted the following warnings in the tulip driver when compiling with W=1 and CONFIG_TULIP_MWI=n: .../tulip_core.c: In function ‘tulip_init_one’: .../tulip_core.c:1309:22: warning: variable ‘force_csr0’ set but not used This patch addresses that problem using IS_ENABLED(). This approach has the added benefit of reducing conditionally compiled code. And thus increasing compile coverage. E.g. for allmodconfig builds which enable CONFIG_TULIP_MWI. Compile tested only. No run-time effect intended. Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250318-tulip-w1-v3-1-a813fadd164d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24Merge tag 'bitmap-for-6.15' of https://github.com/norov/linuxLinus Torvalds
Pull bitmap updates from Yury Norov: - cpumask_next_wrap() rework (me) - GENMASK() simplification (I Hsin) - rust bindings for cpumasks (Viresh and me) - scattered cleanups (Andy, Tamir, Vincent, Ignacio and Joel) * tag 'bitmap-for-6.15' of https://github.com/norov/linux: (22 commits) cpumask: align text in comment riscv: fix test_and_{set,clear}_bit ordering documentation treewide: fix typo 'unsigned __init128' -> 'unsigned __int128' MAINTAINERS: add rust bindings entry for bitmap API rust: Add cpumask helpers uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)" cpumask: drop cpumask_next_wrap_old() PCI: hv: Switch hv_compose_multi_msi_req_get_cpu() to using cpumask_next_wrap() scsi: lpfc: rework lpfc_next_{online,present}_cpu() scsi: lpfc: switch lpfc_irq_rebalance() to using cpumask_next_wrap() s390: switch stop_machine_yield() to using cpumask_next_wrap() padata: switch padata_find_next() to using cpumask_next_wrap() cpumask: use cpumask_next_wrap() where appropriate cpumask: re-introduce cpumask_next{,_and}_wrap() cpumask: deprecate cpumask_next_wrap() powerpc/xmon: simplify xmon_batch_next_cpu() ibmvnic: simplify ibmvnic_set_queue_affinity() virtio_net: simplify virtnet_set_affinity() objpool: rework objpool_pop() cpumask: add for_each_{possible,online}_cpu_wrap ...
2025-03-24net/mlx5: Start health poll after enable hcaMoshe Shemesh
The health poll mechanism performs periodic checks to detect firmware errors. One of the checks verifies the function is still enabled on firmware side, but the function is enabled only after enable_hca command completed. Start health poll after enable_hca command to avoid a race between function enabled and first health polling. Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742331077-102038-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5: LAG, reload representors on LAG creation failureMark Bloch
When LAG creation fails, the driver reloads the RDMA devices. If RDMA representors are present, they should also be reloaded. This step was missed in the cited commit. Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742331077-102038-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24mlxsw: spectrum_acl_bloom_filter: Workaround for some LLVM versionsWangYuli
This is a workaround to mitigate a compiler anomaly. During LLVM toolchain compilation of this driver on s390x architecture, an unreasonable __write_overflow_field warning occurs. Contextually, chunk_index is restricted to 0, 1 or 2. By expanding these possibilities, the compile warning is suppressed. Fix follow error with clang-19 when -Werror: In file included from drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_bloom_filter.c:5: In file included from ./include/linux/gfp.h:7: In file included from ./include/linux/mmzone.h:8: In file included from ./include/linux/spinlock.h:63: In file included from ./include/linux/lockdep.h:14: In file included from ./include/linux/smp.h:13: In file included from ./include/linux/cpumask.h:12: In file included from ./include/linux/bitmap.h:13: In file included from ./include/linux/string.h:392: ./include/linux/fortify-string.h:571:4: error: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror,-Wattribute-warning] 571 | __write_overflow_field(p_size_field, size); | ^ 1 error generated. According to the testing, we can be fairly certain that this is a clang compiler bug, impacting only clang-19 and below. Clang versions 20 and 21 do not exhibit this behavior. Link: https://lore.kernel.org/all/484364B641C901CD+20250311141025.1624528-1-wangyuli@uniontech.com/ Fixes: 7585cacdb978 ("mlxsw: spectrum_acl: Add Bloom filter handling") Co-developed-by: Zijian Chen <czj2441@163.com> Signed-off-by: Zijian Chen <czj2441@163.com> Co-developed-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Suggested-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Tested-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Link: https://patch.msgid.link/A1858F1D36E653E0+20250318103654.708077-1-wangyuli@uniontech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24mlxsw: Add VXLAN bridge ports to same hardware domain as physical bridge portsAmit Cohen
When hardware floods packets to bridge ports, but flooding to VXLAN bridge port fails during encapsulation to one of the remote VTEPs, the packets are trapped to CPU. In such case, the packets are marked with skb->offload_fwd_mark, which means that packet was L2-forwarded in hardware. Software data path repeats flooding, but packets which are marked with skb->offload_fwd_mark will not be flooded by the bridge to bridge ports which are in the same hardware domain as the ingress port. Currently, mlxsw does not add VXLAN bridge ports to the same hardware domain as physical bridge ports despite the fact that the device is able to forward packets to and from VXLAN tunnels in hardware. In some scenarios (as mentioned above) this can result in remote VTEPs receiving duplicate packets. The packets are first flooded by hardware and after an encapsulation failure, they are flooded again to all remote VTEPs by software. Solve this by adding VXLAN bridge ports to the same hardware domain as physical bridge ports, so then nbp_switchdev_allowed_egress() will return false also for VXLAN, and packets will not be sent twice from VXLAN device. switchdev_bridge_port_offload() should get vxlan_dev not as const, so some changes are required. Call switchdev API from mlxsw_sp_bridge_vxlan_{join,leave}() which handle offload configurations. Reported-by: Vladimir Oltean <olteanv@gmail.com> Closes: https://lore.kernel.org/all/20250210152246.4ajumdchwhvbarik@skbuf/ Reported-by: Vladyslav Mykhaliuk <vmykhaliuk@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/7279056843140fae3a72c2d204c7886b79d03899.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24mlxsw: spectrum_switchdev: Move mlxsw_sp_bridge_vxlan_join()Amit Cohen
Next patch will call __mlxsw_sp_bridge_vxlan_leave() from mlxsw_sp_bridge_vxlan_join() as part of error flow, move the function to be able to call the second one. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/64750a0965536530482318578bada30fac372b8a.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24mlxsw: spectrum_switchdev: Add an internal API for VXLAN leaveAmit Cohen
There is asymmetry in how the VXLAN join and leave functions are used. The join function (mlxsw_sp_bridge_vxlan_join()) is only called in response to netdev events (e.g., VXLAN device joining a bridge), but the leave function is also called in response to switchdev events (e.g., VLAN configuration on top of the VXLAN device) in order to invalidate VNI to FID mappings. This asymmetry will cause problems when the functions will be later extended to mark VXLAN bridge ports as offloaded or not. Therefore, create an internal function (__mlxsw_sp_bridge_vxlan_leave()) that is used to invalidate VNI to FID mappings and call it from mlxsw_sp_bridge_vxlan_leave() which will only be invoked in response to netdev events, like mlxsw_sp_bridge_vxlan_join(). No functional changes intended. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/f3a32bd2d87a0b7ac4d2bb98a427dc6d95a01cd0.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24mlxsw: spectrum: Call mlxsw_sp_bridge_vxlan_{join, leave}() for VLAN-aware ↵Amit Cohen
bridge mlxsw_sp_bridge_vxlan_{join,leave}() are not called when a VXLAN device joins or leaves a VLAN-aware bridge. As mentioned in the comment - when the bridge is VLAN-aware, the VNI of the VXLAN device needs to be mapped to a VLAN, but at this point no VLANs are configured on the VxLAN device. This means that we can call the APIs, but there is no point to do that, as they do not configure anything in such cases. Next patch will extend mlxsw_sp_bridge_vxlan_{join,leave}() to set hardware domain for VXLAN, this should be done also when a VXLAN device joins or leaves a VLAN-aware bridge. Call the APIs, which for now do not do anything in these flows. Align the call to mlxsw_sp_bridge_vxlan_leave() to be called like mlxsw_sp_bridge_vxlan_join(), only in case that the VXLAN device is up, so move the check to be done before calling mlxsw_sp_bridge_vxlan_{join,leave}(). This does not change the existing behavior, as there is a similar check inside mlxsw_sp_bridge_vxlan_leave(). Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/994c1ea93520f9ea55d1011cd47dc2180d526484.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24mlxsw: Trap ARP packets at layer 2 instead of layer 3Amit Cohen
Next patch will set the same hardware domain for all bridge ports, including VXLAN, to prevent packets from being forwarded by software when they were already forwarded by hardware. ARP packets are not flooded by hardware to VXLAN, so software should handle such flooding. When hardware domain of VXLAN device will be changed, ARP packets which are trapped and marked with offload_fwd_mark will not be flooded to VXLAN also in software, which will break VXLAN traffic. To prevent such breaking, trap ARP packets at layer 2 and don't mark them as L2-forwarded in hardware, then flooding ARP packets will be done only in software, and VXLAN will send ARP packets. Remove NVE_ENCAP_ARP which is no longer needed, as now ARP packets are trapped when they enter the device. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/b2a2cc607a1f4cb96c10bd3b0b0244ba3117fd2e.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24bnxt_en: Linearize TX SKB if the fragments exceed the maxMichael Chan
If skb_shinfo(skb)->nr_frags excceds what the chip can support, linearize the SKB and warn once to let the user know. net.core.max_skb_frags can be lowered, for example, to avoid the issue. Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250321211639.3812992-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24bnxt_en: Mask the bd_cnt field in the TX BD properlyMichael Chan
The bd_cnt field in the TX BD specifies the total number of BDs for the TX packet. The bd_cnt field has 5 bits and the maximum number supported is 32 with the value 0. CONFIG_MAX_SKB_FRAGS can be modified and the total number of SKB fragments can approach or exceed the maximum supported by the chip. Add a macro to properly mask the bd_cnt field so that the value 32 will be properly masked and set to 0 in the bd_cnd field. Without this patch, the out-of-range bd_cnt value will corrupt the TX BD and may cause TX timeout. The next patch will check for values exceeding 32. Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250321211639.3812992-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net: ethernet: Drop unused of_gpio.hPeng Fan
of_gpio.h is deprecated. Since there is no of_gpio_x API, drop unused of_gpio.h. While at here, drop gpio.h and gpio/consumer.h if no user in driver. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250320031542.3960381-1-peng.fan@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5e: Always select CONFIG_PAGE_POOL_STATSTariq Toukan
Always set PAGE_POOL_STATS in mlx5 Eth driver. Cleanup the corresponding #ifdefs. Page pool stats are essential to monitor and analyze RX performance. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/1742412199-159596-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5e: Use right API to free bitmap memoryMark Zhang
Use bitmap_free() to free memory allocated with bitmap_zalloc_node(). This fixes memtrack error: mtl rsc inconsistency: memtrack_free: .../drivers/net/ethernet/mellanox/mlx5/core/en_main.c::466: kfree for unknown address=0xFFFF0000CA3619E8, device=0x0 Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742412199-159596-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5: Remove NULL check before dev_{put, hold}Gal Pressman
Fix coccinelle warnings: WARNING: NULL check before dev_{put, hold} functions is not needed. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742412199-159596-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24net/mlx5e: Fix ethtool -N flow-type ip4 to RSS contextMaxim Mikityanskiy
There commands can be used to add an RSS context and steer some traffic into it: # ethtool -X eth0 context new New RSS context is 1 # ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1 Added rule with ID 1023 However, the second command fails with EINVAL on mlx5e: # ethtool -N eth0 flow-type ip4 dst-ip 1.1.1.1 context 1 rmgr: Cannot insert RX class rule: Invalid argument Cannot insert classification rule It happens when flow_get_tirn calls flow_type_to_traffic_type with flow_type = IP_USER_FLOW or IPV6_USER_FLOW. That function only handles IPV4_FLOW and IPV6_FLOW cases, but unlike all other cases which are common for hash and spec, IPv4 and IPv6 defines different contants for hash and for spec: #define TCP_V4_FLOW 0x01 /* hash or spec (tcp_ip4_spec) */ #define UDP_V4_FLOW 0x02 /* hash or spec (udp_ip4_spec) */ ... #define IPV4_USER_FLOW 0x0d /* spec only (usr_ip4_spec) */ #define IP_USER_FLOW IPV4_USER_FLOW #define IPV6_USER_FLOW 0x0e /* spec only (usr_ip6_spec; nfc only) */ #define IPV4_FLOW 0x10 /* hash only */ #define IPV6_FLOW 0x11 /* hash only */ Extend the switch in flow_type_to_traffic_type to support both, which fixes the failing ethtool -N command with flow-type ip4 or ip6. Fixes: 248d3b4c9a39 ("net/mlx5e: Support flow classification into RSS contexts") Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Tested-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250319124508.3979818-1-maxim@isovalent.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>