Age Commit message (Author)
2025-04-09 net: txgbe: add sriov function support (Mengyuan Lou)
Add sriov_configure for driver ops. Add mailbox handler wx_msg_task for txgbe. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/ECDC57CF4F2316B9+20250408091556.9640-7-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 net: ngbe: add sriov function support (Mengyuan Lou)
Add sriov_configure for driver ops. Add mailbox handler wx_msg_task for ngbe in the interrupt handler. Add the notification flow when the vfs exist. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/C9A0A43732966022+20250408091556.9640-6-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 net: libwx: Add msg task func (Mengyuan Lou)
Implement wx_msg_task which is used to process mailbox messages sent by vf. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/8257B39B95CDB469+20250408091556.9640-5-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 net: libwx: Redesign flow when sriov is enabled (Mengyuan Lou)
Reallocate queue and int resources when sriov is enabled. Redefine macro VMDQ to make it work in VT mode. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/64B616774ABE3C5A+20250408091556.9640-4-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 net: libwx: Add sriov api for wangxun nics (Mengyuan Lou)
Implement sriov_configure interface for wangxun nics in libwx. Enable VT mode and initialize the vf control structure when sriov is enabled. Disabling sriov is not allowed while vfs are assigned. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/81EA45C21B0A98B0+20250408091556.9640-3-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 net: libwx: Add mailbox api for wangxun pf drivers (Mengyuan Lou)
Implement the mailbox interfaces for wangxun pf drivers ngbe and txgbe. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Link: https://patch.msgid.link/70017BD4D67614A4+20250408091556.9640-2-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 net: ethernet: cortina: Use TOE/TSO on all TCP (Linus Walleij)
It is desirable to push the hardware accelerator to also process non-segmented TCP frames: we pass the skb->len to the "TOE/TSO" offloader and it will handle them. Without this quirk the driver becomes unstable, locks up and crashes. I do not know exactly why, but it is probably due to the TOE (TCP offload engine) feature that is coupled with the segmentation feature - it is not possible to turn one part off and not the other, either both TOE and TSO are active, or neither of them. Not having the TOE part active seems detrimental, as if that hardware feature is not really supposed to be turned off. The datasheet says: "Based on packet parsing and TCP connection/NAT table lookup results, the NetEngine puts the packets belonging to the same TCP connection to the same queue for the software to process. The NetEngine puts incoming packets to the buffer or series of buffers for a jumbo packet. With this hardware acceleration, IP/TCP header parsing, checksum validation and connection lookup are offloaded from the software processing." After numerous tests with the hardware locking up after something between minutes and hours depending on load using iperf3 I have concluded this is necessary to stabilize the hardware. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patch.msgid.link/20250408-gemini-ethernet-tso-always-v1-1-e669f932359c@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 Merge branch 'bridge-prevent-unicast-arp-ns-packets-from-being-suppressed-by-bridge' (Jakub Kicinski)
Amit Cohen says: ==================== bridge: Prevent unicast ARP/NS packets from being suppressed by bridge Currently, unicast ARP requests/NS packets are answered by the bridge when suppression is enabled, then they are also forwarded, which results in two replicas of the ARP reply/NA - one from the bridge and a second from the target. The purpose of ARP/ND suppression is to reduce flooding in the broadcast domain, which is not relevant for unicast packets. In addition, the use case of unicast ARP/NS is to poll a specific host, so it does not make sense to have the switch answer on behalf of the host. Forward ARP requests/NS packets and prevent the bridge from replying to them. Patch set overview: Patch #1 prevents unicast ARP/NS packets from being suppressed by bridge Patch #2 adds test cases for unicast ARP/NS with suppression enabled ==================== Link: https://patch.msgid.link/cover.1744123493.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 selftests: test_bridge_neigh_suppress: Test unicast ARP/NS with suppression (Amit Cohen)
Add test cases to check that unicast ARP/NS packets are replied to once, even if ARP/ND suppression is enabled.
Without the previous patch:
    $ ./test_bridge_neigh_suppress.sh
    ...
    Unicast ARP, per-port ARP suppression - VLAN 10
    -----------------------------------------------
    TEST: "neigh_suppress" is on                          [ OK ]
    TEST: Unicast ARP, suppression on, h1 filter          [FAIL]
    TEST: Unicast ARP, suppression on, h2 filter          [ OK ]
    Unicast ARP, per-port ARP suppression - VLAN 20
    -----------------------------------------------
    TEST: "neigh_suppress" is on                          [ OK ]
    TEST: Unicast ARP, suppression on, h1 filter          [FAIL]
    TEST: Unicast ARP, suppression on, h2 filter          [ OK ]
    ...
    Unicast NS, per-port NS suppression - VLAN 10
    ---------------------------------------------
    TEST: "neigh_suppress" is on                          [ OK ]
    TEST: Unicast NS, suppression on, h1 filter           [FAIL]
    TEST: Unicast NS, suppression on, h2 filter           [ OK ]
    Unicast NS, per-port NS suppression - VLAN 20
    ---------------------------------------------
    TEST: "neigh_suppress" is on                          [ OK ]
    TEST: Unicast NS, suppression on, h1 filter           [FAIL]
    TEST: Unicast NS, suppression on, h2 filter           [ OK ]
    ...
    Tests passed: 156
    Tests failed:   4
With the previous patch:
    $ ./test_bridge_neigh_suppress.sh
    ...
    Unicast ARP, per-port ARP suppression - VLAN 10
    -----------------------------------------------
    TEST: "neigh_suppress" is on                          [ OK ]
    TEST: Unicast ARP, suppression on, h1 filter          [ OK ]
    TEST: Unicast ARP, suppression on, h2 filter          [ OK ]
    Unicast ARP, per-port ARP suppression - VLAN 20
    -----------------------------------------------
    TEST: "neigh_suppress" is on                          [ OK ]
    TEST: Unicast ARP, suppression on, h1 filter          [ OK ]
    TEST: Unicast ARP, suppression on, h2 filter          [ OK ]
    ...
    Unicast NS, per-port NS suppression - VLAN 10
    ---------------------------------------------
    TEST: "neigh_suppress" is on                          [ OK ]
    TEST: Unicast NS, suppression on, h1 filter           [ OK ]
    TEST: Unicast NS, suppression on, h2 filter           [ OK ]
    Unicast NS, per-port NS suppression - VLAN 20
    ---------------------------------------------
    TEST: "neigh_suppress" is on                          [ OK ]
    TEST: Unicast NS, suppression on, h1 filter           [ OK ]
    TEST: Unicast NS, suppression on, h2 filter           [ OK ]
    ...
    Tests passed: 160
    Tests failed:   0
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/dc240b9649b31278295189f412223f320432c5f2.1744123493.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 net: bridge: Prevent unicast ARP/NS packets from being suppressed by bridge (Amit Cohen)
When Proxy ARP or ARP/ND suppression are enabled, ARP/NS packets can be handled by the bridge in br_do_proxy_suppress_arp()/br_do_suppress_nd(). Broadcast packets are answered by the bridge and are then not flooded. Currently, unicast packets are also answered by the bridge when suppression is enabled, and they are forwarded as well, which results in two replicas of the ARP reply/NA - one from the bridge and a second from the target. RFC 1122 describes a use case for unicast ARP packets - "unicast poll" - actively poll the remote host by periodically sending a point-to-point ARP request to it, and delete the entry if no ARP reply is received from N successive polls. The purpose of ARP/ND suppression is to reduce flooding in the broadcast domain. If a host is sending a unicast ARP/NS, then it means it already knows the address and the switches probably know it as well and there will not be any flooding. In addition, the use case of unicast ARP/NS is to poll a specific host, so it does not make sense to have the switch answer on behalf of the host. According to RFC 9161: "A PE SHOULD reply to broadcast/multicast address resolution messages, i.e., ARP Requests, ARP probes, NS messages, as well as DAD NS messages. An ARP probe is an ARP Request constructed with an all-zero sender IP address that may be used by hosts for IPv4 Address Conflict Detection as specified in [RFC5227]. A PE SHOULD NOT reply to unicast address resolution requests (for instance, NUD NS messages)." Forward such requests and prevent the bridge from replying to them. Reported-by: Denis Yulevych <denisyu@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/6bf745a149ddfe5e6be8da684a63aa574a326f8d.1744123493.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
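The reply-or-forward decision described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: `toy_is_group_addr()` and `toy_bridge_may_reply()` are hypothetical names, not functions from the bridge code. The group-address test itself (lowest bit of the first destination octet) is the standard Ethernet rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_ALEN 6 /* length of an Ethernet MAC address in bytes */

/* A destination is a group (multicast or broadcast) address when the
 * least significant bit of its first octet is set. */
static bool toy_is_group_addr(const uint8_t dest[TOY_ETH_ALEN])
{
	return (dest[0] & 0x01) != 0;
}

/* Hypothetical helper: with suppression enabled, the bridge answers only
 * requests sent to a group address. Unicast requests are forwarded, so
 * the target host sends the single reply itself. */
static bool toy_bridge_may_reply(const uint8_t dest[TOY_ETH_ALEN],
				 bool suppress_on)
{
	assert(dest != NULL);
	return suppress_on && toy_is_group_addr(dest);
}
```

With this rule a unicast poll reaches the polled host unchanged, which matches the RFC 9161 text quoted in the commit message.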
2025-04-09 net: remove cpu stall in txq_trans_update() (Eric Dumazet)
txq_trans_update() currently uses txq->xmit_lock_owner to conditionally update txq->trans_start. For regular devices, txq->xmit_lock_owner is updated from HARD_TX_LOCK() and HARD_TX_UNLOCK(), and this apparently causes cpu stalls. Using dev->lltx, which sits in a read-mostly cache-line and is already used in HARD_TX_LOCK() and HARD_TX_UNLOCK(), helps cpu prediction. On an AMD EPYC 7B12 dual socket server, tcp_rr with 128 threads and 30,000 flows gets a 5% increase in throughput. As explained in commit 95ecba62e2fd ("net: fix races in netdev_tx_sent_queue()/dev_watchdog()") I am planning to no longer update txq->trans_start in the fast path in a followup patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250408202742.2145516-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 octeontx2-pf: Add error log for cn10k_map_unmap_rq_policer() (Wentao Liang)
The cn10k_free_matchall_ipolicer() calls the cn10k_map_unmap_rq_policer() for each queue in a for loop without checking for any errors. Check the return value of the cn10k_map_unmap_rq_policer() function during each loop, and report a warning if the function fails. Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250408032602.2909-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 configs/debug: run and debug PREEMPT (Stanislav Fomichev)
Recent change [0] resulted in a "BUG: using __this_cpu_read() in preemptible" splat [1]. PREEMPT kernels have additional requirements on what can and cannot run with/without preemption enabled. Expose those constraints in the debug kernels. 0: https://lore.kernel.org/netdev/20250314120048.12569-2-justin.iurman@uliege.be/ 1: https://lore.kernel.org/netdev/20250402094458.006ba2a7@kernel.org/T/#mbf72641e9d7d274daee9003ef5edf6833201f1bc Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250402172305.1775226-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 net: ipvlan: remove __get_unaligned_cpu32 from ipvlan driver (Julian Vetter)
The __get_unaligned_cpu32 function is deprecated. So, replace it with the more generic get_unaligned and just cast the input parameter. Signed-off-by: Julian Vetter <julian@outer-limits.org> Link: https://patch.msgid.link/20250408091946.2266271-1-julian@outer-limits.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
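The idea behind a generic unaligned load can be sketched in userspace C. This is a stand-in only: `toy_load_unaligned_u32()` is a hypothetical name, and the sketch does not reproduce the kernel's get_unaligned() macro. It shows the usual portable technique, copying the bytes into a properly aligned local variable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for an unaligned 32-bit load in CPU byte order.
 * memcpy() lets the compiler choose a safe access sequence, so the
 * pointer may have any alignment. */
static uint32_t toy_load_unaligned_u32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

A caller would pass, for example, a pointer into the middle of a packet header, where a plain `*(uint32_t *)p` dereference could fault or be undefined on strict-alignment targets.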
2025-04-09 net: remove __get_unaligned_cpu32 from macvlan driver (Julian Vetter)
The __get_unaligned_cpu32 function is deprecated. So, replace it with the more generic get_unaligned and just cast the input parameter. Signed-off-by: Julian Vetter <julian@outer-limits.org> Link: https://patch.msgid.link/20250408091548.2263911-1-julian@outer-limits.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 Merge branch 'net-depend-on-instance-lock-for-queue-related-netlink-ops' (Jakub Kicinski)
Jakub Kicinski says: ==================== net: depend on instance lock for queue related netlink ops netdev-genl used to be protected by rtnl_lock. In the previous release we already switched the queue management ops (for Rx zero-copy) to the instance lock. This series converts other ops to depend on the instance lock when possible. Unfortunately queue related state is hard to lock (unlike NAPI) as the process of switching the number of queues usually involves a large reconfiguration of the driver. The reconfig process has historically been under rtnl_lock, but for drivers which opt into ops locking it is also under the instance lock. Leverage that and conditionally take rtnl_lock or instance lock depending on the device capabilities. v1: https://lore.kernel.org/20250407190117.16528-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250408195956.412733-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 netdev: depend on netdev->lock for qstats in ops locked drivers (Jakub Kicinski)
We mostly needed rtnl_lock in qstat to make sure the queue count is stable while we work. For "ops locked" drivers the instance lock protects the queue count, so we don't have to take rtnl_lock. For currently ops-locked drivers: netdevsim and bnxt need the protection from netdev going down while we dump, which instance lock provides. gve doesn't care. Reviewed-by: Joe Damato <jdamato@fastly.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250408195956.412733-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 docs: netdev: break down the instance locking info per ops struct (Jakub Kicinski)
Explicitly list all the ops structs and what locking they provide. Use "ops locked" as a term for drivers which have ops called under the instance lock. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250408195956.412733-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 netdev: depend on netdev->lock for xdp features (Jakub Kicinski)
Writes to XDP features are now protected by netdev->lock. Other things we report are based on ops which don't change once device has been registered. It is safe to stop taking rtnl_lock, and depend on netdev->lock instead. Reviewed-by: Joe Damato <jdamato@fastly.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250408195956.412733-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 xdp: double protect netdev->xdp_flags with netdev->lock (Jakub Kicinski)
Protect xdp_features with netdev->lock. This way pure readers no longer have to take rtnl_lock to access the field. This includes calling NETDEV_XDP_FEAT_CHANGE under the lock. Looks like that's fine for bonding, the only "real" listener, it's the same as ethtool feature change. In terms of normal drivers - only GVE needs special consideration (other drivers don't use instance lock or don't support XDP). It calls xdp_set_features_flag() helper from gve_init_priv() which in turn is called from gve_reset_recovery() (locked), or prior to netdev registration. So switch to _locked. Reviewed-by: Joe Damato <jdamato@fastly.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Harshitha Ramamurthy <hramamurthy@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250408195956.412733-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 netdev: don't hold rtnl_lock over nl queue info get when possible (Jakub Kicinski)
Netdev queue dump accesses: NAPI, memory providers, XSk pointers. All three are "ops protected" now, switch to the op compat locking. rtnl lock does not have to be taken for "ops locked" devices. Reviewed-by: Joe Damato <jdamato@fastly.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250408195956.412733-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 netdev: add "ops compat locking" helpers (Jakub Kicinski)
Add helpers to "lock a netdev in a backward-compatible way", which for ops-locked netdevs will mean take the instance lock. For drivers which haven't opted into the ops locking we'll take rtnl_lock. The scoped foreach is dropping and re-taking the lock for each device, even if prev and next are both under rtnl_lock. I hope that's fine since we expect netdev nl to be mostly supported by modern drivers, and modern drivers should also opt into the instance locking. Note that these helpers are mostly needed for queue related state, because drivers modify queue config in their ops in a non-atomic way. Or differently put, queue changes don't have a clear-cut API like NAPI configuration. Any state that can should just use the instance lock directly, not the "compat" hacks. Reviewed-by: Joe Damato <jdamato@fastly.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250408195956.412733-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
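The "compat" locking rule can be sketched with toy locks in userspace C. All names here (`toy_lock`, `toy_netdev`, `toy_lock_ops_compat()`) are hypothetical and chosen for illustration. They are not the helper names added by the patch, and a boolean flag stands in for a real mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: a flag is enough to show which lock a path takes.
 * A real implementation would use a mutex. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

static struct toy_lock toy_rtnl; /* stands in for rtnl_lock */

struct toy_netdev {
	bool ops_locked;          /* driver opted into ops locking */
	struct toy_lock instance; /* stands in for netdev->lock */
};

/* Take the instance lock for ops-locked devices, rtnl_lock otherwise. */
static void toy_lock_ops_compat(struct toy_netdev *dev)
{
	if (dev->ops_locked)
		toy_lock_acquire(&dev->instance);
	else
		toy_lock_acquire(&toy_rtnl);
}

static void toy_unlock_ops_compat(struct toy_netdev *dev)
{
	if (dev->ops_locked)
		toy_lock_release(&dev->instance);
	else
		toy_lock_release(&toy_rtnl);
}
```

The lock and unlock helpers branch on the same per-device flag, so a caller never needs to know which of the two locks protects the queue state of a given device.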
2025-04-09 net: designate XSK pool pointers in queues as "ops protected" (Jakub Kicinski)
Read accesses go via xsk_get_pool_from_qid(), the calls coming from the core and gve look safe (other "ops locked" drivers don't support XSK). Write accesses go via xsk_reg_pool_at_qid() and xsk_clear_pool_at_qid(). Former is already under the ops lock, latter is not (both coming from the workqueue via xp_clear_dev() and NETDEV_UNREGISTER via xsk_notifier()). Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250408195956.412733-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09 net: avoid potential race between netdev_get_by_index_lock() and netns switch (Jakub Kicinski)
netdev_get_by_index_lock() performs the following steps:
    rcu_lock();
    dev = lookup(netns, ifindex);
    dev_get(dev);
    rcu_unlock();
    [... lock & validate the dev ...]
    return dev
Validation right now only checks if the device is registered but since the lookup is netns-aware we must also protect against the device switching netns right after we dropped the RCU lock. Otherwise the caller in netns1 may get a pointer to a device which has just switched to netns2. We can't hold the lock for the entire netns change process (because of the NETDEV_UNREGISTER notifier), and there's no existing marking to indicate that the netns is unlisted because of netns move, so add one. AFAIU none of the existing netdev_get_by_index_lock() callers can suffer from this problem (NAPI code double checks the netns membership and other callers are either under rtnl_lock or not ns-sensitive), so this patch does not have to be treated as a fix. Reviewed-by: Joe Damato <jdamato@fastly.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250408195956.412733-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
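The lookup-then-revalidate pattern from this commit can be sketched as a single-threaded userspace model. Everything below is an assumption made for illustration: the `toy_` names, the `moving_ns` marker and the `race` hook (which simulates a concurrent netns move between lookup and lock) are hypothetical and do not mirror the kernel data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev {
	int ifindex;
	int netns;       /* namespace the device currently belongs to */
	bool registered;
	bool moving_ns;  /* hypothetical marker: unlisted due to a netns move */
	int refcnt;
	bool locked;
};

/* Lookup by (netns, ifindex); in the kernel this step runs under RCU. */
static struct toy_dev *toy_lookup(struct toy_dev *devs, size_t n,
				  int netns, int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].netns == netns && devs[i].ifindex == ifindex)
			return &devs[i];
	return NULL;
}

/* Hook run between lookup and lock, to simulate a concurrent netns move. */
typedef void (*toy_race_fn)(struct toy_dev *dev);

static struct toy_dev *toy_get_by_index_lock(struct toy_dev *devs, size_t n,
					     int netns, int ifindex,
					     toy_race_fn race)
{
	struct toy_dev *dev = toy_lookup(devs, n, netns, ifindex);

	if (!dev)
		return NULL;
	dev->refcnt++;      /* keep the device alive past the lookup */
	if (race)
		race(dev);  /* window in which the device may move */
	dev->locked = true; /* take the instance lock */

	/* Validate again under the lock: still registered, not in the
	 * middle of a move, and still in the namespace we looked in. */
	if (!dev->registered || dev->moving_ns || dev->netns != netns) {
		dev->locked = false;
		dev->refcnt--;
		return NULL;
	}
	return dev;
}

/* Example race: the device switches to namespace 2. */
static void toy_move_to_ns2(struct toy_dev *dev)
{
	dev->netns = 2;
}
```

Without the second check, a caller in namespace 1 would receive a locked pointer to a device that now lives in namespace 2, which is the situation the commit message describes.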
2025-04-08 net: Drop unused @sk of __skb_try_recv_from_queue() (Michal Luczaj)
__skb_try_recv_from_queue() deals with a queue, @sk is not used since commit e427cad6eee4 ("net: datagram: drop 'destructor' argument from several helpers"). Remove sk from function parameters, adapt callers. No functional change intended. Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250407-cleanup-drop-param-sk-v1-1-cd076979afac@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08 Merge branch 'udp_tunnel-gro-optimizations' (Jakub Kicinski)
Paolo Abeni says: ==================== udp_tunnel: GRO optimizations The UDP tunnel GRO stage is a source of measurable overhead for workloads based on UDP-encapsulated traffic: each incoming packet requires a full UDP socket lookup and an indirect call. In the most common setups a single UDP tunnel device is used. In such case we can optimize both the lookup and the indirect call. Patch 1 tracks per netns the active UDP tunnels and replaces the socket lookup with a single destination port comparison when possible. Patch 2 tracks the different types of UDP tunnels and replaces the indirect call with a static one when there is a single UDP tunnel type active. I measure ~10% performance improvement in TCP over UDP tunnel stream tests on top of this series. v4: https://lore.kernel.org/cover.1741718157.git.pabeni@redhat.com v3: https://lore.kernel.org/cover.1741632298.git.pabeni@redhat.com v2: https://lore.kernel.org/cover.1741338765.git.pabeni@redhat.com v1: https://lore.kernel.org/cover.1741275846.git.pabeni@redhat.com ==================== Link: https://patch.msgid.link/cover.1744040675.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08 udp_tunnel: use static call for GRO hooks when possible (Paolo Abeni)
It's quite common to have a single UDP tunnel type active in the whole system. In such a case we can replace the indirect call for the UDP tunnel GRO callback with a static call. Add the related accounting in the control path and switch to static call when possible. To keep the code simple use a static array for the registered tunnel types, and size such array based on the kernel config. Note that there are valid kernel configurations leading to UDP_MAX_TUNNEL_TYPES == 0 even with IS_ENABLED(CONFIG_NET_UDP_TUNNEL). Explicitly skip the accounting in such a case, to avoid a compile warning when accessing "udp_tunnel_gro_types". Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/53d156cdfddcc9678449e873cc83e68fa1582653.1744040675.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
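The accounting described above can be modelled in userspace C. Userspace C has no equivalent of kernel static calls, so in this sketch a cached function pointer (`toy_fast_gro`) merely stands in for the patched call site. The array size, the `toy_` names and the two example callbacks are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TUNNEL_TYPES 4 /* hypothetical; the patch sizes this from the kernel config */

typedef int (*toy_gro_fn)(int pkt);

struct toy_tunnel_type {
	toy_gro_fn gro; /* GRO callback for this tunnel type */
	int count;      /* number of active tunnels of this type */
};

static struct toy_tunnel_type toy_types[TOY_MAX_TUNNEL_TYPES];
static int toy_active_types;    /* entries with count > 0 */
static toy_gro_fn toy_fast_gro; /* set only while exactly one type is active */

static void toy_update_fast_path(void)
{
	toy_fast_gro = NULL;
	if (toy_active_types != 1)
		return;
	for (int i = 0; i < TOY_MAX_TUNNEL_TYPES; i++)
		if (toy_types[i].count > 0)
			toy_fast_gro = toy_types[i].gro;
}

/* Control path: account for one tunnel being added (+1) or removed (-1). */
static int toy_account(toy_gro_fn gro, int delta)
{
	struct toy_tunnel_type *slot = NULL;

	assert(delta == 1 || delta == -1);
	for (int i = 0; i < TOY_MAX_TUNNEL_TYPES; i++) {
		if (toy_types[i].count > 0 && toy_types[i].gro == gro) {
			slot = &toy_types[i]; /* existing entry wins */
			break;
		}
		if (!slot && toy_types[i].count == 0)
			slot = &toy_types[i]; /* remember first free slot */
	}
	if (!slot || (delta < 0 && slot->count == 0))
		return -1; /* table full, or removing an unknown type */
	if (slot->count == 0) {
		slot->gro = gro;
		toy_active_types++;
	}
	slot->count += delta;
	if (slot->count == 0)
		toy_active_types--;
	toy_update_fast_path();
	return 0;
}

/* Data path: direct call when one type is active, indirect otherwise. */
static int toy_gro_receive(toy_gro_fn per_socket_gro, int pkt)
{
	if (toy_fast_gro)
		return toy_fast_gro(pkt);
	return per_socket_gro(pkt);
}

/* Example callbacks standing in for two tunnel types. */
static int toy_vxlan_gro(int pkt) { return pkt + 1; }
static int toy_geneve_gro(int pkt) { return pkt + 2; }
```

The cost of choosing the target is paid in the control path when tunnels come and go, so the per-packet path only tests one pointer.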
2025-04-08 udp_tunnel: create a fastpath GRO lookup. (Paolo Abeni)
Most UDP tunnels bind a socket to a local port, with ANY address, no peer and no interface index specified. Additionally it's quite common to have a single tunnel device per namespace. Track in each namespace the UDP tunnel socket respecting the above. When only a single one is present, store a reference in the netns. When such reference is not NULL, UDP tunnel GRO lookup just needs to match the incoming packet destination port vs the socket local port. The tunnel socket never sets the reuse[port] flag[s]. When bound to no address and interface, no other socket can exist in the same netns matching the specified local port. Matching packets with non-local destination addresses will be aggregated, and eventually segmented as needed - no behavior changes intended. Restrict the optimization to kernel sockets only: it covers all the relevant use-cases, and user-space owned sockets could be disconnected and rebound after setup_udp_tunnel_sock(), breaking the uniqueness assumption. Note that the UDP tunnel socket reference is stored into struct netns_ipv4 for both IPv4 and IPv6 tunnels. That is intentional to keep all the fastpath-related netns fields in the same struct and allow cacheline-based optimization. Currently both the IPv4 and IPv6 socket pointer share the same cacheline as the `udp_table` field. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/41d16bc8d1257d567f9344c445b4ae0b4a91ede4.1744040675.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
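A minimal userspace model of the single-socket fast path follows. The `toy_` structures, the fixed-size socket array and the `slow_lookups` counter are assumptions made for this sketch. The real code keeps the reference in struct netns_ipv4, and its slow path is the full UDP socket lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_TUNNELS 8 /* hypothetical limit for this sketch */

struct toy_sock {
	uint16_t local_port;
};

struct toy_netns {
	struct toy_sock *tunnels[TOY_MAX_TUNNELS]; /* all tunnel sockets */
	int count;
	struct toy_sock *single; /* set only while exactly one tunnel exists */
	int slow_lookups;        /* how often the slow path ran */
};

static void toy_refresh_single(struct toy_netns *ns)
{
	ns->single = (ns->count == 1) ? ns->tunnels[0] : NULL;
}

static int toy_tunnel_add(struct toy_netns *ns, struct toy_sock *sk)
{
	if (ns->count == TOY_MAX_TUNNELS)
		return -1;
	ns->tunnels[ns->count++] = sk;
	toy_refresh_single(ns);
	return 0;
}

static void toy_tunnel_del(struct toy_netns *ns, struct toy_sock *sk)
{
	for (int i = 0; i < ns->count; i++) {
		if (ns->tunnels[i] == sk) {
			/* Fill the hole with the last entry. */
			ns->tunnels[i] = ns->tunnels[--ns->count];
			break;
		}
	}
	toy_refresh_single(ns);
}

/* Slow path: stands in for the full UDP socket lookup. */
static struct toy_sock *toy_full_lookup(struct toy_netns *ns, uint16_t dport)
{
	ns->slow_lookups++;
	for (int i = 0; i < ns->count; i++)
		if (ns->tunnels[i]->local_port == dport)
			return ns->tunnels[i];
	return NULL;
}

/* Fast path: one port comparison when a single tunnel socket is tracked. */
static struct toy_sock *toy_gro_lookup(struct toy_netns *ns, uint16_t dport)
{
	if (ns->single && ns->single->local_port == dport)
		return ns->single;
	return toy_full_lookup(ns, dport);
}
```

Falling back to the full lookup on a port mismatch keeps the sketch correct regardless of what other sockets exist, at the price of one extra comparison.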
2025-04-08 selftests: tc-testing: Pre-load IFE action and its submodules (Victor Nogueira)
Recently we had some issues in parallel TDC where some of the IFE tests were failing due to some of IFE's submodules (like act_meta_skbtcindex and act_meta_skbprio) taking too long to load [1]. To avoid that issue, pre-load IFE and all its submodules before running any of the tests in tdc.sh. [1] https://lore.kernel.org/netdev/e909b2a0-244e-4141-9fa9-1b7d96ab7d71@mojatatu.com/T/#u Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250407215656.2535990-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08 net: ena: Support persistent per-NAPI config. (Kuniyuki Iwashima)
Let's pass the queue index to netif_napi_add_config() to preserve per-NAPI config.
Test: Set defer-hard-irqs to 100 (default is 0) and check the value after link down & up.
    $ cat /sys/class/net/enp39s0/napi_defer_hard_irqs
    0
    $ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
        --dump napi-get --json='{"ifindex": 2}'
    [{'defer-hard-irqs': 0,
      'gro-flush-timeout': 0,
      'id': 65,
      'ifindex': 2,
      'irq': 29,
      'irq-suspend-timeout': 0}]
    $ sudo ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
        --do napi-set --json='{"id": 65, "defer-hard-irqs": 100}'
    $ sudo ip link set enp39s0 down && sudo ip link set enp39s0 up
Without patch:
    $ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
        --dump napi-get --json='{"ifindex": 2}'
    [{'defer-hard-irqs': 0,   <------------------- Reset to 0
      'gro-flush-timeout': 0,
      'id': 66,               <------------------- New ID
      'ifindex': 2,
      'irq': 29,
      'irq-suspend-timeout': 0}]
With patch:
    $ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
        --dump napi-get --json='{"ifindex": 2}'
    [{'defer-hard-irqs': 100, <--------------+-- Preserved
      'gro-flush-timeout': 0,                |
      'id': 65,               <--------------'
      'ifindex': 2,
      'irq': 29,
      'irq-suspend-timeout': 0}]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com> Link: https://patch.msgid.link/20250407164802.25184-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08 Merge branch 'rps-misc-changes' (Jakub Kicinski)
Eric Dumazet says: ==================== rps: misc changes Minor changes in rps: skb_flow_limit() is probably unused these days, and data-races are quite theoretical. ==================== Link: https://patch.msgid.link/20250407163602.170356-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08 net: rps: remove kfree_rcu_mightsleep() use (Eric Dumazet)
Add an rcu_head to sd_flow_limit and rps_sock_flow_table structs to use the more conventional and predictable k[v]free_rcu(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250407163602.170356-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08 net: add data-race annotations in softnet_seq_show() (Eric Dumazet)
softnet_seq_show() reads several fields that might be updated concurrently. Add READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250407163602.170356-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08 net: rps: annotate data-races around (struct sd_flow_limit)->count (Eric Dumazet)
softnet_seq_show() can read fl->count while another cpu updates this field from skb_flow_limit(). Make this field an 'unsigned int', as its only consumer only deals with 32 bit. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250407163602.170356-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08 net: rps: change skb_flow_limit() hash function (Eric Dumazet)
As explained in commit f3483c8e1da6 ("net: rfs: hash function change"), masking low order bits of skb_get_hash(skb) has low entropy. A NIC with 32 RX queues uses the 5 low order bits of rss key to select a queue. This means all packets landing on a given queue share the same 5 low order bits. Switch to hash_32() to reduce hash collisions. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250407163602.170356-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
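The entropy argument can be checked with a small userspace experiment. The sketch below re-implements a multiplicative 32-bit hash in the style of the kernel's hash_32() (multiply by the golden-ratio constant 0x61C88647, keep the top bits). The `toy_` names, the 4096-bucket table size and the fixed low-bit pattern 0x13 are arbitrary choices made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_GOLDEN_RATIO_32 0x61C88647u /* multiplier used by the kernel's 32-bit hash */

/* Keep the `bits` highest bits of a multiplicative hash (1 <= bits <= 32). */
static uint32_t toy_hash_32(uint32_t val, unsigned int bits)
{
	return (uint32_t)(val * TOY_GOLDEN_RATIO_32) >> (32 - bits);
}

/* Count distinct buckets hit by hashes that all share the same 5 low
 * order bits, as packets steered to one of 32 RX queues would. */
static unsigned int toy_distinct_buckets(int use_hash_32)
{
	enum { LOG_BUCKETS = 12, BUCKETS = 1 << LOG_BUCKETS };
	static unsigned char seen[BUCKETS];
	unsigned int distinct = 0;

	for (uint32_t i = 0; i < BUCKETS; i++)
		seen[i] = 0;
	for (uint32_t i = 0; i < BUCKETS; i++) {
		uint32_t hash = (i << 5) | 0x13; /* fixed low order bits */
		uint32_t b = use_hash_32 ? toy_hash_32(hash, LOG_BUCKETS)
					 : (hash & (BUCKETS - 1));

		if (!seen[b]) {
			seen[b] = 1;
			distinct++;
		}
	}
	return distinct;
}
```

With masking, the 5 fixed bits leave only 7 variable bits in a 12-bit bucket index, so 4096 inputs can reach at most 128 buckets. Taking the high bits of the product lets every input bit influence the index.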
2025-04-08 amd-xgbe: Convert to SPDX identifier (Raju Rangoju)
Use SPDX-License-Identifier across all the files of the xgbe driver to ensure compliance with Linux kernel standards, thus removing the boiler-plate template license text. Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250407102913.3063691-1-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08 rocker: Simplify if condition in ofdpa_port_fdb() (Thorsten Blum)
Remove the double negation and simplify the if condition. No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250407091442.743478-1-thorsten.blum@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08 eth: nfp: remove __get_unaligned_cpu32 from netronome drivers (Julian Vetter)
The __get_unaligned_cpu32 function is deprecated. So, replace it with the more generic get_unaligned and just cast the input parameter. Signed-off-by: Julian Vetter <julian@outer-limits.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250407083306.1553921-1-julian@outer-limits.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08 hamradio: Remove unnecessary strscpy_pad() size arguments (Thorsten Blum)
If the destination buffer has a fixed length, strscpy_pad() automatically determines its size using sizeof() when the argument is omitted. This makes the explicit sizeof() calls unnecessary - remove them. No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20250407082607.741919-2-thorsten.blum@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-04 Merge tag 'net-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (Linus Torvalds)
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter.
  Current release - regressions:
   - four fixes for the netdev per-instance locking
  Current release - new code bugs:
   - consolidate more code between existing Rx zero-copy and uring so that the latter doesn't miss / have to duplicate the safety checks
  Previous releases - regressions:
   - ipv6: fix omitted Netlink attributes when using SKIP_STATS
  Previous releases - always broken:
   - net: fix geneve_opt length integer overflow
   - udp: fix multiple wrap arounds of sk->sk_rmem_alloc when it approaches INT_MAX
   - dsa: mvpp2: add a lock to avoid corruption of the shared TCAM
   - dsa: airoha: fix issues with traffic QoS configuration / offload, and flow table offload
  Misc:
   - touch up the Netlink YAML specs of old families to make them usable for user space C codegen"
* tag 'net-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
  selftests: net: amt: indicate progress in the stress test
  netlink: specs: rt_route: pull the ifa- prefix out of the names
  netlink: specs: rt_addr: pull the ifa- prefix out of the names
  netlink: specs: rt_addr: fix get multi command name
  netlink: specs: rt_addr: fix the spec format / schema failures
  net: avoid false positive warnings in __net_mp_close_rxq()
  net: move mp dev config validation to __net_mp_open_rxq()
  net: ibmveth: make veth_pool_store stop hanging
  arcnet: Add NULL check in com20020pci_probe()
  ipv6: Do not consider link down nexthops in path selection
  ipv6: Start path selection from the first nexthop
  usbnet:fix NPE during rx_complete
  net: octeontx2: Handle XDP_ABORTED and XDP invalid as XDP_DROP
  net: fix geneve_opt length integer overflow
  io_uring/zcrx: fix selftests w/ updated netdev Python helpers
  selftests: net: use netdevsim in netns test
  docs: net: document netdev notifier expectations
  net: dummy: request ops lock
  netdevsim: add dummy device notifiers
  net: rename rtnl_net_debug to lock_debug
  ...
2025-04-04Merge tag 'spi-fix-v6.15-merge-window' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A small collection of fixes that came in during the merge window, everything is driver specific with nothing standing out particularly" * tag 'spi-fix-v6.15-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: bcm2835: Restore native CS probing when pinctrl-bcm2835 is absent spi: bcm2835: Do not call gpiod_put() on invalid descriptor spi: cadence-qspi: revert "Improve spi memory performance" spi: cadence: Fix out-of-bounds array access in cdns_mrvl_xspi_setup_clock() spi: fsl-qspi: use devm function instead of driver remove spi: SPI_QPIC_SNAND should be tristate and depend on MTD spi-rockchip: Fix register out of bounds access
2025-04-04Merge tag 'soc-drivers-6.15-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull more SoC driver updates from Arnd Bergmann: "This is the promised follow-up to the soc drivers branch, adding minor updates to omap and freescale drivers. Most notably, Ioana Ciornei takes over maintenance of the DPAA bus driver used in some NXP (originally Freescale) chips" * tag 'soc-drivers-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: bus: fsl-mc: Remove deadcode MAINTAINERS: add the linuppc-dev list to the fsl-mc bus entry MAINTAINERS: fix nonexistent dtbinding file name MAINTAINERS: add myself as maintainer for the fsl-mc bus irqdomain: soc: Switch to irq_find_mapping() Input: tsc2007 - accept standard properties
2025-04-04Merge tag 'platform-drivers-x86-v6.15-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - thinkpad_acpi: - Fix NULL pointer dereferences while probing - Disable ACPI fan access for T495* and E560 - ISST: Correct command storage data length * tag 'platform-drivers-x86-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: MAINTAINERS: consistently use my dedicated email address platform/x86: ISST: Correct command storage data length platform/x86: thinkpad_acpi: disable ACPI fan access for T495* and E560 platform/x86: thinkpad_acpi: Fix NULL pointer dereferences while probing
2025-04-04selftests: net: amt: indicate progress in the stress testJakub Kicinski
Our CI expects output from the test at least once every 10 minutes. The AMT test when running on debug kernel is just on the edge of that time for the stress test. Improve the output: - print the name of the test first, before starting it, - output a dot every 10% of the way. Output after: TEST: amt discovery [ OK ] TEST: IPv4 amt multicast forwarding [ OK ] TEST: IPv6 amt multicast forwarding [ OK ] TEST: IPv4 amt traffic forwarding torture .......... [ OK ] TEST: IPv6 amt traffic forwarding torture .......... [ OK ] Reviewed-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250403145636.2891166-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
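The progress scheme described in the commit above (print the test name before starting, then one dot per 10% of the work) can be sketched as follows. This is only an illustration in Python, not the selftest's actual shell code; the function name run_with_progress and its parameters are hypothetical.

```python
import sys


def run_with_progress(name, total_steps, step, out=sys.stdout):
    """Print the test name first, then one dot per completed 10% of steps."""
    # Emit the name before any work starts, so a watcher sees output early.
    out.write(f"TEST: {name} ")
    out.flush()
    dots = 0
    for i in range(1, total_steps + 1):
        step(i)  # hypothetical unit of work, e.g. one stress iteration
        # Dots owed after i steps: one for every completed tenth of the run.
        owed = (i * 10) // total_steps
        while dots < owed:
            out.write(".")
            out.flush()
            dots += 1
    out.write(" [ OK ]\n")
```

Flushing after every write matters here: the point of the change is that a CI watchdog sees output regularly, and buffered dots would defeat that.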
2025-04-04Merge branch 'netlink-specs-rt_addr-fix-problems-revealed-by-c-codegen'Jakub Kicinski
Jakub Kicinski says: ==================== netlink: specs: rt_addr: fix problems revealed by C codegen I put together basic YNL C support for classic netlink. This revealed a few problems in the rt_addr spec. v1: https://lore.kernel.org/20250401012939.2116915-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250403013706.2828322-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04netlink: specs: rt_route: pull the ifa- prefix out of the namesJakub Kicinski
YAML specs don't normally include the C prefix name in the name of the YAML attr. Remove the ifa- prefix from all attributes in route-attrs and metrics and specify name-prefix instead. This is a bit risky, hopefully there aren't many users out there. Fixes: 023289b4f582 ("doc/netlink: Add spec for rt route messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250403013706.2828322-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04netlink: specs: rt_addr: pull the ifa- prefix out of the namesJakub Kicinski
YAML specs don't normally include the C prefix name in the name of the YAML attr. Remove the ifa- prefix from all attributes in addr-attrs and specify name-prefix instead. This is a bit risky, hopefully there aren't many users out there. Fixes: dfb0f7d9d979 ("doc/netlink: Add spec for rt addr messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250403013706.2828322-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
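As a rough illustration of why the prefix can move out of the attribute names, the sketch below assumes a code generator that builds the C identifier by concatenating name-prefix and the attribute name, upper-casing the result and turning dashes into underscores. The helper c_enum_name is hypothetical and is not taken from the kernel's generator.

```python
def c_enum_name(name_prefix, attr_name):
    """Join prefix and attribute name into a C-style identifier.

    Assumed rule for this sketch: upper-case and replace '-' with '_'.
    """
    return (name_prefix + attr_name).upper().replace("-", "_")


# With the prefix declared once, the attribute itself is just "address".
print(c_enum_name("ifa-", "address"))
```

Under that assumption, an attribute named address with name-prefix ifa- maps to the same identifier, IFA_ADDRESS, as an attribute that carried the prefix in its own name.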
2025-04-04netlink: specs: rt_addr: fix get multi command nameJakub Kicinski
Command names should match C defines, codegens may depend on it. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250403013706.2828322-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04netlink: specs: rt_addr: fix the spec format / schema failuresJakub Kicinski
The spec is mis-formatted, schema validation says: Failed validating 'type' in schema['properties']['operations']['properties']['list']['items']['properties']['dump']['properties']['request']['properties']['value']: {'minimum': 0, 'type': 'integer'} On instance['operations']['list'][3]['dump']['request']['value']: '58 - ifa-family' The ifa-family clearly wants to be part of an attribute list. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Yuyang Huang <yuyanghuang@google.com> Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support") Link: https://patch.msgid.link/20250403013706.2828322-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
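The failure quoted above comes down to a string appearing where the schema demands a non-negative integer. A minimal stand-in for that one rule, written in plain Python rather than with the real schema validator, might look like this; check_integer_minimum is a hypothetical helper and its messages are only modelled on typical validator output.

```python
def check_integer_minimum(value, minimum=0):
    """Return an error string if value breaks {'type': 'integer', 'minimum': N}."""
    # bool is a subclass of int in Python, but a schema 'integer' excludes booleans.
    if isinstance(value, bool) or not isinstance(value, int):
        return f"{value!r} is not of type 'integer'"
    if value < minimum:
        return f"{value!r} is less than the minimum of {minimum}"
    return None  # value satisfies the rule


# The instance from the error message is a string, so the type check fails.
print(check_integer_minimum("58 - ifa-family"))
# A bare integer passes.
print(check_integer_minimum(58))
```

The string value suggests that the attribute name was absorbed into the scalar, which matches the commit's remark that ifa-family belongs in an attribute list.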
2025-04-04Merge branch 'net-make-memory-provider-install-close-paths-more-common'Jakub Kicinski
Jakub Kicinski says: ==================== net: make memory provider install / close paths more common We seem to be fixing bugs in config path for devmem which also exist in the io_uring ZC path. Let's try to make the two paths more common, otherwise this is bound to keep happening. Found by code inspection and compile tested only. v1: https://lore.kernel.org/20250331194201.2026422-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250403013405.2827250-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>