summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-25dt-bindings: net: rockchip-dwmac: Add compatible string for RK3528Jonas Karlman
Rockchip RK3528 has two Ethernet controllers based on Synopsys DWC Ethernet QoS IP. Add compatible string for the RK3528 variant. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250319214415.3086027-2-jonas@kwiboo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25Merge branch 'net-improve-stmmac-resume-rx-clocking'Jakub Kicinski
Russell King says: ==================== net: improve stmmac resume rx clocking stmmac has had a long history of problems with resuming, illustrated by reset failure due to the receive clock not running. Several attempts have been attempted over the years to address this issue, such as moving phylink_start() (now phylink_resume()) super early in stmmac_resume() in commit 90702dcd19c0 ("net: stmmac: fix MAC not working when system resume back with WoL a ctive.") However, this has the downside that stmmac_mac_link_up() can (and demonstrably is) called before or during the driver initialisation in another thread. This can cause issues as packets could begin to be queued, and the transmit/receive enable bits will be set before any initialisation has been done. Another attempt is used by dwmac-socfpga.c in commit 2d871aa07136 ("net: stmmac: add platform init/exit for Altera's ARM socfpga") which pre-dates the above commit. Neither of these two approaches consider the effect of EEE with a PHY that supports receive clock-stop and has that feature enabled (which the stmmac driver does enable). If the link is up, then there is the possibility for the receive path to be in low-power mode, and the PHY may stop its receive clock. This series addresses these issues by (each is not necessarily a separate patch): 1) introducing phylink_prepare_resume(), which can be used by MAC drivers to ensure that the PHY is resumed prior to doing any re-initialisation work. This call is added to stmmac_resume(). 2) moving phylink_resume() after all re-initialisation has completed, thereby ensuring that the hardware is ready to be enabled for packet reception/transmission. 3) with (1) and (2) addressed, the need for socfpga to have a private work-around is no longer necessary, so it is removed. 4) introducing phylink functions to block/unblock the receive clock- stop at the PHY. As these require PHY access over the MDIO bus, they can sleep, so are not suitable for atomic access. 5) the stmmac hardware requires the receive clock to be running for reset to complete. Depending on synthesis options, this requirement may also extend to writing various registers as well, e.g. setting the MAC address, writing some of the vlan registers, etc. Full details are in the databook. We add blocking/unblocking of the PHY receive clock-stop around parts of the main stmmac driver where we have a context that we can sleep. These are wrapped with the new phylink functions. However, depending on synthesis options, there could be other places where the net core calls the driver with a BH-disabled context where we can't sleep, and thus can't block the PHY from disabling its receive clock. These are documented with FIXME comments. Given the last paragraph above, I am wondering whether a better approach would be to ensure that receive clock-stop is always disabled at the PHY with stmmac. From what I can see, implementations do not document to this level of detail, which makes it difficult to tell which registers require the receive clock to be running to behave correctly. This patch series has been tested on the Tegra194 Jetson Xavier NX board kindly donated by NVidia, with two additional patches that are pending in patchwork - the first is required to have EEE's LPI mode passed through to the MAC on this platform to allow testing under PHY clock-stop scenarios. The second is a bug fix for PHYLIB and makes "eee off" functional, but should not affect this series. All patches on top of net-next commit f749448ce9f1 ("Merge branch 'net-mlx5-hw-steering-cleanups'") https://patchwork.kernel.org/project/netdevbpf/patch/E1ttnHW-00785s-Uq@rmk-PC.armlinux.org.uk/ https://patchwork.kernel.org/project/netdevbpf/patch/E1ttmWN-0077Mb-Q6@rmk-PC.armlinux.org.uk/ ==================== Link: https://patch.msgid.link/Z9ySeo61VYTClIJJ@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: stmmac: block PHY RXC clock-stopRussell King (Oracle)
The DesignWare core requires the receive clock to be running during certain operations. Ensure that we block PHY RXC clock-stop during these operations. This is a best-efforts change - not everywhere can be covered by this because of net's core locking, which means we can't access the MDIO bus to configure the PHY to disable RXC clock-stop in certain areas. These are marked with FIXME comments. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6p-008Vjz-Qy@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: phylink: add functions to block/unblock rx clock stopRussell King (Oracle)
Some MACs require the PHY receive clock to be running to complete setup actions. This may fail if the PHY has negotiated EEE, the MAC supports receive clock stop, and the link has entered LPI state. Provide a pair of APIs that MAC drivers can use to temporarily block the PHY disabling the receive clock. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6k-008Vjt-MZ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: stmmac: socfpga: remove phy_resume() callRussell King (Oracle)
As the previous commit addressed DWGMAC resuming with a PHY in suspended state, there is now no need for socfpga to work around this. Remove this code. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6f-008Vjn-J1@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: stmmac: address non-LPI resume failures properlyRussell King (Oracle)
The Synopsys Designware GMAC core databook requires all clocks to be active in order to complete software reset, which we perform during resume. However, IEEE 802.3 allows a PHY to stop its clocks when placed in low-power mode, which happens when the system is suspended and WoL is not enabled. As an attempt to work around this, commit 36d18b5664ef ("net: stmmac: start phylink instance before stmmac_hw_setup()") started phylink early, but this has the side effect that the mac_link_up() method may be called before or during the initialisation of GMAC hardware. We also have the socfpga glue driver directly calling phy_resume() also as an attempt to work around this. In a previous commit, phylink_prepare_resume() has been introduced to give MAC drivers a way to ensure that the PHY is resumed prior to their initialisation of their MAC hardware. This commit adds the call, and moves the phylink_resume() call back to where it should be before the aforementioned commit. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6a-008Vjh-FG@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: phylink: add phylink_prepare_resume()Russell King (Oracle)
When the system is suspended, the PHY may be placed in low-power mode by setting the BMCR 0.11 Power down bit. IEEE 802.3 states that the behaviour of the PHY in this state is implementation specific, and the PHY is not required to meet the RX_CLK and TX_CLK requirements. Essentially, this means that a PHY may stop the clocks that it is generating while in power down state. However, MACs exist which require the clocks from the PHY to be running in order to properly resume. phylink_prepare_resume() provides them with a way to clear the Power down bit early. Note, however, that IEEE 802.3 gives PHYs up to 500ms grace before the transmit and receive clocks meet the requirements after clearing the power down bit. Add a resume preparation function, which will ensure that the receive clock from the PHY is appropriately configured while resuming. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6V-008Vjb-AP@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25Merge branch 'sfc-devlink-flash-for-x4'Jakub Kicinski
Edward Cree says: ==================== sfc: devlink flash for X4 Updates to support devlink flash on X4 NICs. Patch #2 is needed for NVRAM_PARTITION_TYPE_AUTO, and patch #1 is needed because the latest MCDI headers from firmware no longer include MDIO read/write commands. v1: https://lore.kernel.org/cover.1742223233.git.ecree.xilinx@gmail.com ==================== Link: https://patch.msgid.link/cover.1742493016.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25sfc: support X4 devlink flashEdward Cree
Unlike X2 and EF100, we do not attempt to parse the firmware file to find an image within it; we simply hand the entire file to the MC, which is responsible for understanding any container formats we might use and validating that the firmware file is applicable to this NIC. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/9a72a74002a7819c780b0a18ce9294c9d4e1db12.1742493017.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25sfc: update MCDI protocol headersEdward Cree
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/bcb7597460a5a99d1dca4ef282f4aa2dd46ae545.1742493017.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25sfc: rip out MDIO supportEdward Cree
Unlike Siena, no EF10 board ever had an external PHY, and consequently MDIO handling isn't even built into the firmware. Since Siena has been split out into its own driver, the MDIO code can be deleted from the sfc driver. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/aa689d192ddaef7abe82709316c2be648a7bd66e.1742493017.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: reorganize IP MIB values (II)Eric Dumazet
Commit 14a196807482 ("net: reorganize IP MIB values") changed MIB values to group hot fields together. Since then 5 new fields have been added without caring about data locality. This patch moves IPSTATS_MIB_OUTPKTS, IPSTATS_MIB_NOECTPKTS, IPSTATS_MIB_ECT1PKTS, IPSTATS_MIB_ECT0PKTS, IPSTATS_MIB_CEPKTS to the hot portion of per-cpu data. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250320101434.3174412-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25tcp: avoid atomic operations on sk->sk_rmem_allocEric Dumazet
TCP uses generic skb_set_owner_r() and sock_rfree() for received packets, with socket lock being owned. Switch to private versions, avoiding two atomic operations per packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250320121604.3342831-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25Merge branch 'nexthop-convert-rtm_-new-del-nexthop-to-per-netns-rtnl'Jakub Kicinski
Kuniyuki Iwashima says: ==================== nexthop: Convert RTM_{NEW,DEL}NEXTHOP to per-netns RTNL. Patch 1 - 5 move some validation for RTM_NEWNEXTHOP so that it can be called without RTNL. Patch 6 & 7 converts RTM_NEWNEXTHOP and RTM_DELNEXTHOP to per-netns RTNL. Note that RTM_GETNEXTHOP and RTM_GETNEXTHOPBUCKET are not touched in this series. rtm_get_nexthop() can be easily converted to RCU, but rtm_dump_nexthop() needs more work due to the left-to-right rbtree walk, which looks prone to node deletion and tree rotation without a retry mechanism. v1: https://lore.kernel.org/netdev/20250318233240.53946-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20250319230743.65267-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25nexthop: Convert RTM_DELNEXTHOP to per-netns RTNL.Kuniyuki Iwashima
In rtm_del_nexthop(), only nexthop_find_by_id() and remove_nexthop() require RTNL as they touch net->nexthop.rb_root. Let's move RTNL down as rtnl_net_lock() before nexthop_find_by_id(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250319230743.65267-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25nexthop: Convert RTM_NEWNEXTHOP to per-netns RTNL.Kuniyuki Iwashima
If we pass false to the rtnl_held param of lwtunnel_valid_encap_type(), we can move RTNL down before rtm_to_nh_config_rtnl(). Let's use rtnl_net_lock() in rtm_new_nexthop(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250319230743.65267-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25nexthop: Remove redundant group len check in nexthop_create_group().Kuniyuki Iwashima
The number of NHA_GROUP entries is guaranteed to be non-zero in nh_check_attr_group(). Let's remove the redundant check in nexthop_create_group(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250319230743.65267-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25nexthop: Check NLM_F_REPLACE and NHA_ID in rtm_new_nexthop().Kuniyuki Iwashima
nexthop_add() checks if NLM_F_REPLACE is specified without non-zero NHA_ID, which does not require RTNL. Let's move the check to rtm_new_nexthop(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250319230743.65267-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25nexthop: Move NHA_OIF validation to rtm_to_nh_config_rtnl().Kuniyuki Iwashima
NHA_OIF needs to look up a device by __dev_get_by_index(), which requires RTNL. Let's move NHA_OIF validation to rtm_to_nh_config_rtnl(). Note that the proceeding checks made the original !cfg->nh_fdb check redundant. NHA_FDB is set -> NHA_OIF cannot be set NHA_FDB is set but false -> NHA_OIF must be set NHA_FDB is not set -> NHA_OIF must be set Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250319230743.65267-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25nexthop: Split nh_check_attr_group().Kuniyuki Iwashima
We will push RTNL down to rtm_new_nexthop(), and then we want to move non-RTNL operations out of the scope. nh_check_attr_group() validates NHA_GROUP attributes, and nexthop_find_by_id() and some validation requires RTNL. Let's factorise such parts as nh_check_attr_group_rtnl() and call it from rtm_to_nh_config_rtnl(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250319230743.65267-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25nexthop: Move nlmsg_parse() in rtm_to_nh_config() to rtm_new_nexthop().Kuniyuki Iwashima
We will split rtm_to_nh_config() into non-RTNL and RTNL parts, and then the latter also needs tb. As a prep, let's move nlmsg_parse() to rtm_new_nexthop(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250319230743.65267-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25ipv6: fix _DEVADD() and _DEVUPD() macrosEric Dumazet
ip6_rcv_core() is using: __IP6_ADD_STATS(net, idev, IPSTATS_MIB_NOECTPKTS + (ipv6_get_dsfield(hdr) & INET_ECN_MASK), max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs)); This is currently evaluating both expressions twice. Fix _DEVADD() and _DEVUPD() macros to evaluate their arguments once. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250319212516.2385451-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25Merge branch 'mlx5-misc-enhancements-2025-03-19'Jakub Kicinski
Tariq Toukan says: ==================== mlx5 misc enhancements 2025-03-19 This series introduces multiple small misc enhancements from the team to the mlx5 core and Eth drivers. ==================== Link: https://patch.msgid.link/1742392983-153050-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net/mlx5e: TC, Don't offload CT commit if it's the last actionJianbo Liu
For CT action with commit argument, it's usually followed by the forward action, either to the output netdev or next chain. The default behavior for software is to drop by setting action attribute to TC_ACT_SHOT instead of TC_ACT_PIPE if it's the last action. But driver can't handle it, so block the offload for such case. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-6-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net/mlx5e: CT: Filter legacy rules that are unrelated to nicPaul Blakey
In nic mode CT setup where we do hairpin between the two nics, both nics register to the same flow table (per zone), and try to offload all rules on it. Instead, filter the rules that originated from the relevant nic (so only one side is offloaded for each nic). Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net/mlx5: Update pfnum retrieval for devlink port attributesShay Drory
Align mlx5 driver usage of 'pfnum' with the documentation clarification introduced in commit bb70b0d48d8e ("devlink: Improve the port attributes description"). Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net/mlx5: fw reset, check bridge accessibility at earlier stageAmir Tzin
Currently, mlx5_is_reset_now_capable() checks whether the pci bridge is accessible only on bridge hot plug capability check. If the pci bridge is not accessible, reset now will fail regardless of bridge hotplug capability. Move this check to function mlx5_is_reset_now_capable() which, in such case, aborts the reset and does so in the request phase instead of the reset now phase. Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Amir Tzin <amirtz@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net/mlx5: Lag, use port selection tables when availableMark Bloch
As queue affinity is being deprecated and will no longer be supported in the future, Always check for the presence of the port selection namespace. When available, leverage it to distribute traffic across the physical ports via steering, ensuring compatibility with future NICs. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net/mlx5e: TX, Utilize WQ fragments edge for multi-packet WQEsTariq Toukan
For simplicity reasons, the driver avoids crossing work queue fragment boundaries within the same TX WQE (Work-Queue Element). Until today, as the number of packets in a TX MPWQE (Multi-Packet WQE) descriptor is not known in advance, the driver pre-prepared contiguous memory for the largest possible WQE. For this, when getting too close to the fragment edge, having no room for the largest WQE possible, the driver was filling the fragment remainder with NOP descriptors, aligning the next descriptor to the beginning of the next fragment. Generating and handling these NOPs wastes resources, like: CPU cycles, work-queue entries fetched to the device, and PCI bandwidth. In this patch, we replace this NOPs filling mechanism in the TX MPWQE flow. Instead, we utilize the remaining entries of the fragment with a TX MPWQE. If this room turns out to be too small, we simply open an additional descriptor starting at the beginning of the next fragment. Performance benchmark: uperf test, single server against 3 clients. TCP multi-stream, bidir, traffic profile "2x350B read, 1400B write". Bottleneck is in inbound PCI bandwidth (device POV). +---------------+------------+------------+--------+ | | Before | After | | +---------------+------------+------------+--------+ | BW | 117.4 Gbps | 121.1 Gbps | +3.1% | +---------------+------------+------------+--------+ | tx_packets | 15 M/sec | 15.5 M/sec | +3.3% | +---------------+------------+------------+--------+ | tx_nops | 3 M/sec | 0 | -100% | +---------------+------------+------------+--------+ Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1742391746-118647-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25dql: Fix dql->limit value when reset.Jing Su
Executing dql_reset after setting a non-zero value for limit_min can lead to an unreasonable situation where dql->limit is less than dql->limit_min. For instance, after setting /sys/class/net/eth*/queues/tx-0/byte_queue_limits/limit_min, an ifconfig down/up operation might cause the ethernet driver to call netdev_tx_reset_queue, which in turn invokes dql_reset. In this case, dql->limit is reset to 0 while dql->limit_min remains non-zero value, which is unexpected. The limit should always be greater than or equal to limit_min. Signed-off-by: Jing Su <jingsusu@didiglobal.com> Link: https://patch.msgid.link/Z9qHD1s/NEuQBdgH@pilot-ThinkCentre-M930t-N000 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25Merge branch 'selftests-net-mixed-select-polling-mode-for-tcp-ao-tests'Jakub Kicinski
Dmitry Safonov via says: ==================== selftests/net: Mixed select()+polling mode for TCP-AO tests Should fix flaky tcp-ao/connect-deny-ipv6 test. v1: https://lore.kernel.org/20250312-tcp-ao-selftests-polling-v1-0-72a642b855d5@gmail.com ==================== Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-0-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Drop timeout argument from test_client_verify()Dmitry Safonov
It's always TEST_TIMEOUT_SEC, with an unjustified exception in rst test, that is more paranoia-long timeout rather than based on requirements. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-7-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Delete timeout from test_connect_socket()Dmitry Safonov
Unused: it's always either the default timeout or asynchronous connect(). Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-6-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Print the testing side in unsigned-md5Dmitry Safonov
As both client and server print the same test name on failure or pass, add "[server]" so that it's more obvious from a log which side printed "ok" or "not ok". Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-5-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Add mixed select()+polling mode to TCP-AO testsDmitry Safonov
Currently, tcp_ao tests have two timeouts: TEST_RETRANSMIT_SEC and TEST_TIMEOUT_SEC [by default 1 and 5 seconds]. The first one, TEST_RETRANSMIT_SEC is used for operations that are expected to succeed in order for a test to pass. It is usually not consumed and exists only to avoid indefinite test run if the operation didn't complete. The second one, TEST_RETRANSMIT_SEC exists for the tests that checking operations, that are expected to fail/timeout. It is shorter as it is fully consumed, with an expectation that if operation didn't succeed during that period, it will timeout. And the related test that expects the timeout is passing. The actual operation failure is then cross-verified by other means like counters checks. The issue with TEST_RETRANSMIT_SEC timeout is that 1 second is the exact initial TCP timeout. So, in case the initial segment gets lost (quite unlikely on local veth interface between two net namespaces, yet happens in slow VMs), the retransmission never happens and as a result, the test is not actually testing the functionality. Which in the end fails counters checks. As I want tcp_ao selftests to be fast and finishing in a reasonable amount of time on manual run, I didn't consider increasing TEST_RETRANSMIT_SEC. Rather, initially, BPF_SOCK_OPS_TIMEOUT_INIT looked promising as a lever to make the initial TCP timeout shorter. But as it's not a socket bpf attached thing, but sock_ops (attaches to cgroups), the selftests would have to use libbpf, which I wanted to avoid if not absolutely required. Instead, use a mixed select() and counters polling mode with the longer TEST_TIMEOUT_SEC timeout to detect running-away failed tests. It actually not only allows losing segments and succeeding after the previous TEST_RETRANSMIT_SEC timeout was consumed, but makes the tests expecting timeout/failure pass faster. The only test case taking longer (TEST_TIMEOUT_SEC) now is connect-deny "wrong snd id", which checks for no key on SYN-ACK for which there is no counter in the kernel (see tcp_make_synack()). Yet it can be speed up by poking skpair from the trace event (see trace_tcp_ao_synack_no_key). Fixes: ed9d09b309b1 ("selftests/net: Add a test for TCP-AO keys matching") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20241205070656.6ef344d7@kernel.org/ Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-4-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Fetch and check TCP-MD5 countersDmitry Safonov
There are related TCP-MD5 <=> TCP and TCP-MD5 <=> TCP-AO tests that can benefit from checking the related counters, not only from validating operations timeouts. It also prepares the code for introduction of mixed select()+poll mode, see the follow-up patches. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-3-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Provide tcp-ao counters comparison helperDmitry Safonov
Rename __test_tcp_ao_counters_cmp() into test_assert_counters_ao() and test_tcp_ao_key_counters_cmp() into test_assert_counters_key() as they are asserts, rather than just compare functions. Provide test_cmp_counters() helper, that's going to be used to compare ao_info and netns counters as a stop condition for polling the sockets. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-2-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25selftests/net: Print TCP flags in more common formatDmitry Safonov
Before: ># 13145[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: !FS!R!P!., keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1 After: ># 13487[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: S, keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1 For the history, I think the initial format was to emphasize the absence of flags as well as their presence (!R meant no RST flag). But looking again, it's just unreadable and hard to understand. Make it the standard/expected one. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-1-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25ynl: devlink: add missing board-serial-numberJiri Pirko
Add a missing attribute of board serial number. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250320085947.103419-2-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25Merge branch 'net-xdp-add-missing-metadata-support-for-some-xdp-drvs'Jakub Kicinski
Lorenzo Bianconi says: ==================== net: xdp: Add missing metadata support for some xdp drvs Introduce missing metadata support for some xdp drivers setting metadata size building the skb from xdp_buff. Please note most of the drivers are just compile tested. v1: https://lore.kernel.org/20250311-mvneta-xdp-meta-v1-0-36cf1c99790e@kernel.org ==================== Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-0-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: ti: cpsw: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in cpsw/cpsw_new drivers. ti cpsw and cpsw_new drivers set xdp headroom at least to CPSW_HEADROOM_NA: CPSW_HEADROOM_NA max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN so the headroom is large enough to contain xdp_frame and xdp metadata. Please note this patch is just compiled tested. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-7-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: mana: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in mana driver. mana driver sets xdp headroom to XDP_PACKET_HEADROOM so the headroom is large enough to contain xdp_frame and xdp metadata. Please note this patch is just compiled tested. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-6-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: ethernet: mediatek: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in mediatek driver. mtk_eth_soc driver sets xdp headroom to XDP_PACKET_HEADROOM so the headroom is large enough to contain xdp_frame and xdp metadata. Please note this patch is just compiled tested. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-5-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: octeontx2: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in octeontx2 driver. octeontx2 driver sets xdp headroom to OTX2_HEAD_ROOM OTX2_HEAD_ROOM OTX2_ALIGN OTX2_ALIGN 128 so the headroom is large enough to contain xdp_frame and xdp metadata. Please note this patch is just compiled tested. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-4-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: netsec: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in netsec driver. netsec driver sets xdp headroom to NETSEC_RXBUF_HEADROOM: NETSEC_RXBUF_HEADROOM max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN so the headroom is large enough to contain xdp_frame and xdp metadata. Please note this patch is just compiled tested. Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-3-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: mvpp2: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in mvpp2 driver mvpp2 driver sets xdp headroom to: MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM where MVPP2_MH_SIZE 2 MVPP2_SKB_HEADROOM min(max(XDP_PACKET_HEADROOM, NET_SKB_PAD), 224) so the headroom is large enough to contain xdp_frame and xdp metadata. Please note this patch is just compiled tested. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-2-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: mvneta: Add metadata support for xdp modeLorenzo Bianconi
Set metadata size building the skb from xdp_buff in mvneta driver mvneta sets xdp headroom to: MVNETA_MH_SIZE + MVNETA_SKB_HEADROOM where MVNETA_MH_SIZE 2 MVNETA_SKB_HEADROOM max(NET_SKB_PAD, XDP_PACKET_HEADROOM) so the headroom is large enough to contain xdp_frame and xdp metadata. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-1-b6075778f61f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25net: tulip: avoid unused variable warningSimon Horman
There is an effort to achieve W=1 kernel builds without warnings. As part of that effort Helge Deller highlighted the following warnings in the tulip driver when compiling with W=1 and CONFIG_TULIP_MWI=n: .../tulip_core.c: In function ‘tulip_init_one’: .../tulip_core.c:1309:22: warning: variable ‘force_csr0’ set but not used This patch addresses that problem using IS_ENABLED(). This approach has the added benefit of reducing conditionally compiled code. And thus increasing compile coverage. E.g. for allmodconfig builds which enable CONFIG_TULIP_MWI. Compile tested only. No run-time effect intended. Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250318-tulip-w1-v3-1-a813fadd164d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25Merge branch 'af_unix-clean-up-headers'Jakub Kicinski
Kuniyuki Iwashima says: ==================== af_unix: Clean up headers. AF_UNIX files include many unnecessary headers (netdevice.h and rtnetlink.h, etc), and this series cleans them up. Note that there are still some headers included indirectly and modifying them triggers rebuild, which seems mostly inevitable. [0] $ python3 include_graph.py net/unix/garbage.c linux/rtnetlink.h linux/netdevice.h ... include/net/af_unix.h | include/linux/net.h | | include/linux/once.h | | include/linux/sockptr.h | | include/uapi/linux/net.h | include/net/sock.h | | include/linux/netdevice.h <--- ... | | include/net/dst.h | | | include/linux/rtnetlink.h <--- [0]: https://gist.github.com/q2ven/9c5897f11a493145829029c0bfb364d0 ==================== Link: https://patch.msgid.link/20250318034934.86708-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25af_unix: Clean up #include under net/unix/.Kuniyuki Iwashima
net/unix/*.c include many unnecessary header files (rtnetlink.h, netdevice.h, etc). Let's clean them up. af_unix.c: +uapi/linux/sockios.h : Only exist under include/uapi +uapi/linux/termios.h : Only exist under include/uapi -linux/freezer.h : No longer use freezable_schedule_timeout() -linux/in.h : No ipv4_is_XXX() etc -linux/module.h : No longer support CONFIG_UNIX=m -linux/netdevice.h : No dev used -linux/rtnetlink.h : Not part of rtnetlink API -linux/signal.h : signal_pending() is defined in sched/signal.h -linux/stat.h : No struct stat used -net/checksum.h : CHECKSUM_UNNECESSARY is defined in skbuff.h diag.c: +linux/dcache.h : struct dentry in sk_diag_dump_vfs() +linux/user_namespace.h : struct user_namespace in sk_diag_dump_uid() +uapi/linux/unix_diag.h : Only exist under include/uapi/ garbage.c: +linux/list.h : struct unix_{vertex,edge}, etc +linux/workqueue.h : DECLARE_WORK(unix_gc_work, ...) -linux/file.h : No fget() etc -linux/kernel.h : No cond_resched() etc -linux/netdevice.h : No dev used -linux/proc_fs.h : No procfs provided -linux/string.h : No memcpy(), kmemdup(), etc sysctl_net_unix.c: +linux/string.h : kmemdup() +net/net_namespace.h : struct net, net_eq() -linux/mm.h : slab.h is enough Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250318034934.86708-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>