summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-09docs: netdev: break down the instance locking info per ops structJakub Kicinski
Explicitly list all the ops structs and what locking they provide. Use "ops locked" as a term for drivers which have ops called under the instance lock. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250408195956.412733-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09netdev: depend on netdev->lock for xdp featuresJakub Kicinski
Writes to XDP features are now protected by netdev->lock. Other things we report are based on ops which don't change once device has been registered. It is safe to stop taking rtnl_lock, and depend on netdev->lock instead. Reviewed-by: Joe Damato <jdamato@fastly.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250408195956.412733-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09xdp: double protect netdev->xdp_flags with netdev->lockJakub Kicinski
Protect xdp_features with netdev->lock. This way pure readers no longer have to take rtnl_lock to access the field. This includes calling NETDEV_XDP_FEAT_CHANGE under the lock. Looks like that's fine for bonding, the only "real" listener, it's the same as ethtool feature change. In terms of normal drivers - only GVE need special consideration (other drivers don't use instance lock or don't support XDP). It calls xdp_set_features_flag() helper from gve_init_priv() which in turn is called from gve_reset_recovery() (locked), or prior to netdev registration. So switch to _locked. Reviewed-by: Joe Damato <jdamato@fastly.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Harshitha Ramamurthy <hramamurthy@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250408195956.412733-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09netdev: don't hold rtnl_lock over nl queue info get when possibleJakub Kicinski
Netdev queue dump accesses: NAPI, memory providers, XSk pointers. All three are "ops protected" now, switch to the op compat locking. rtnl lock does not have to be taken for "ops locked" devices. Reviewed-by: Joe Damato <jdamato@fastly.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250408195956.412733-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09netdev: add "ops compat locking" helpersJakub Kicinski
Add helpers to "lock a netdev in a backward-compatible way", which for ops-locked netdevs will mean take the instance lock. For drivers which haven't opted into the ops locking we'll take rtnl_lock. The scoped foreach is dropping and re-taking the lock for each device, even if prev and next are both under rtnl_lock. I hope that's fine since we expect that netdev nl to be mostly supported by modern drivers, and modern drivers should also opt into the instance locking. Note that these helpers are mostly needed for queue related state, because drivers modify queue config in their ops in a non-atomic way. Or differently put, queue changes don't have a clear-cut API like NAPI configuration. Any state that can should just use the instance lock directly, not the "compat" hacks. Reviewed-by: Joe Damato <jdamato@fastly.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250408195956.412733-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09net: designate XSK pool pointers in queues as "ops protected"Jakub Kicinski
Read accesses go via xsk_get_pool_from_qid(), the call coming from the core and gve look safe (other "ops locked" drivers don't support XSK). Write accesses go via xsk_reg_pool_at_qid() and xsk_clear_pool_at_qid(). Former is already under the ops lock, latter is not (both coming from the workqueue via xp_clear_dev() and NETDEV_UNREGISTER via xsk_notifier()). Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250408195956.412733-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-09net: avoid potential race between netdev_get_by_index_lock() and netns switchJakub Kicinski
netdev_get_by_index_lock() performs following steps: rcu_lock(); dev = lookup(netns, ifindex); dev_get(dev); rcu_unlock(); [... lock & validate the dev ...] return dev Validation right now only checks if the device is registered but since the lookup is netns-aware we must also protect against the device switching netns right after we dropped the RCU lock. Otherwise the caller in netns1 may get a pointer to a device which has just switched to netns2. We can't hold the lock for the entire netns change process (because of the NETDEV_UNREGISTER notifier), and there's no existing marking to indicate that the netns is unlisted because of netns move, so add one. AFAIU none of the existing netdev_get_by_index_lock() callers can suffer from this problem (NAPI code double checks the netns membership and other callers are either under rtnl_lock or not ns-sensitive), so this patch does not have to be treated as a fix. Reviewed-by: Joe Damato <jdamato@fastly.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250408195956.412733-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08net: Drop unused @sk of __skb_try_recv_from_queue()Michal Luczaj
__skb_try_recv_from_queue() deals with a queue, @sk is not used since commit e427cad6eee4 ("net: datagram: drop 'destructor' argument from several helpers"). Remove sk from function parameters, adapt callers. No functional change intended. Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250407-cleanup-drop-param-sk-v1-1-cd076979afac@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08Merge branch 'udp_tunnel-gro-optimizations'Jakub Kicinski
Paolo Abeni says: ==================== udp_tunnel: GRO optimizations The UDP tunnel GRO stage is source of measurable overhead for workload based on UDP-encapsulated traffic: each incoming packets requires a full UDP socket lookup and an indirect call. In the most common setups a single UDP tunnel device is used. In such case we can optimize both the lookup and the indirect call. Patch 1 tracks per netns the active UDP tunnels and replaces the socket lookup with a single destination port comparison when possible. Patch 2 tracks the different types of UDP tunnels and replaces the indirect call with a static one when there is a single UDP tunnel type active. I measure ~10% performance improvement in TCP over UDP tunnel stream tests on top of this series. v4: https://lore.kernel.org/cover.1741718157.git.pabeni@redhat.com v3: https://lore.kernel.org/cover.1741632298.git.pabeni@redhat.com v2: https://lore.kernel.org/cover.1741338765.git.pabeni@redhat.com v1: https://lore.kernel.org/cover.1741275846.git.pabeni@redhat.com ==================== Link: https://patch.msgid.link/cover.1744040675.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08udp_tunnel: use static call for GRO hooks when possiblePaolo Abeni
It's quite common to have a single UDP tunnel type active in the whole system. In such a case we can replace the indirect call for the UDP tunnel GRO callback with a static call. Add the related accounting in the control path and switch to static call when possible. To keep the code simple use a static array for the registered tunnel types, and size such array based on the kernel config. Note that there are valid kernel configurations leading to UDP_MAX_TUNNEL_TYPES == 0 even with IS_ENABLED(CONFIG_NET_UDP_TUNNEL), Explicitly skip the accounting in such a case, to avoid compile warning when accessing "udp_tunnel_gro_types". Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/53d156cdfddcc9678449e873cc83e68fa1582653.1744040675.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08udp_tunnel: create a fastpath GRO lookup.Paolo Abeni
Most UDP tunnels bind a socket to a local port, with ANY address, no peer and no interface index specified. Additionally it's quite common to have a single tunnel device per namespace. Track in each namespace the UDP tunnel socket respecting the above. When only a single one is present, store a reference in the netns. When such reference is not NULL, UDP tunnel GRO lookup just need to match the incoming packet destination port vs the socket local port. The tunnel socket never sets the reuse[port] flag[s]. When bound to no address and interface, no other socket can exist in the same netns matching the specified local port. Matching packets with non-local destination addresses will be aggregated, and eventually segmented as needed - no behavior changes intended. Restrict the optimization to kernel sockets only: it covers all the relevant use-cases, and user-space owned sockets could be disconnected and rebound after setup_udp_tunnel_sock(), breaking the uniqueness assumption Note that the UDP tunnel socket reference is stored into struct netns_ipv4 for both IPv4 and IPv6 tunnels. That is intentional to keep all the fastpath-related netns fields in the same struct and allow cacheline-based optimization. Currently both the IPv4 and IPv6 socket pointer share the same cacheline as the `udp_table` field. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/41d16bc8d1257d567f9344c445b4ae0b4a91ede4.1744040675.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08selftests: tc-testing: Pre-load IFE action and its submodulesVictor Nogueira
Recently we had some issues in parallel TDC where some of IFE tests are failing due to some of IFE's submodules (like act_meta_skbtcindex and act_meta_skbprio) taking too long to load [1]. To avoid that issue, pre-load IFE and all its submodules before running any of the tests in tdc.sh [1] https://lore.kernel.org/netdev/e909b2a0-244e-4141-9fa9-1b7d96ab7d71@mojatatu.com/T/#u Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250407215656.2535990-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08net: ena: Support persistent per-NAPI config.Kuniyuki Iwashima
Let's pass the queue index to netif_napi_add_config() to preserve per-NAPI config. Test: Set 100 to defer-hard-irqs (default is 0) and check the value after link down & up. $ cat /sys/class/net/enp39s0/napi_defer_hard_irqs 0 $ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 65, 'ifindex': 2, 'irq': 29, 'irq-suspend-timeout': 0}] $ sudo ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --do napi-set --json='{"id": 65, "defer-hard-irqs": 100}' $ sudo ip link set enp39s0 down && sudo ip link set enp39s0 up Without patch: $ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'defer-hard-irqs': 0, <------------------- Reset to 0 'gro-flush-timeout': 0, 'id': 66, <------------------------------- New ID 'ifindex': 2, 'irq': 29, 'irq-suspend-timeout': 0}] With patch: $ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'defer-hard-irqs': 100, <--------------+-- Preserved 'gro-flush-timeout': 0, | 'id': 65, <----------------------------' 'ifindex': 2, 'irq': 29, 'irq-suspend-timeout': 0}] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com> Link: https://patch.msgid.link/20250407164802.25184-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08Merge branch 'rps-misc-changes'Jakub Kicinski
Eric Dumazet says: ==================== rps: misc changes Minor changes in rps: skb_flow_limit() is probably unused these days, and data-races are quite theoretical. ==================== Link: https://patch.msgid.link/20250407163602.170356-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08net: rps: remove kfree_rcu_mightsleep() useEric Dumazet
Add an rcu_head to sd_flow_limit and rps_sock_flow_table structs to use the more conventional and predictable k[v]free_rcu(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250407163602.170356-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08net: add data-race annotations in softnet_seq_show()Eric Dumazet
softnet_seq_show() reads several fields that might be updated concurrently. Add READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250407163602.170356-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08net: rps: annotate data-races around (struct sd_flow_limit)->countEric Dumazet
softnet_seq_show() can read fl->count while another cpu updates this field from skb_flow_limit(). Make this field an 'unsigned int', as its only consumer only deals with 32 bit. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250407163602.170356-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08net: rps: change skb_flow_limit() hash functionEric Dumazet
As explained in commit f3483c8e1da6 ("net: rfs: hash function change"), masking low order bits of skb_get_hash(skb) has low entropy. A NIC with 32 RX queues uses the 5 low order bits of rss key to select a queue. This means all packets landing to a given queue share the same 5 low order bits. Switch to hash_32() to reduce hash collisions. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250407163602.170356-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-08amd-xgbe: Convert to SPDX identifierRaju Rangoju
Use SPDX-License-Identifier accross all the files of the xgbe driver to ensure compliance with Linux kernel standards, thus removing the boiler-plate template license text. Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250407102913.3063691-1-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08rocker: Simplify if condition in ofdpa_port_fdb()Thorsten Blum
Remove the double negation and simplify the if condition. No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250407091442.743478-1-thorsten.blum@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08eth: nfp: remove __get_unaligned_cpu32 from netronome driversJulian Vetter
The __get_unaligned_cpu32 function is deprecated. So, replace it with the more generic get_unaligned and just cast the input parameter. Signed-off-by: Julian Vetter <julian@outer-limits.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250407083306.1553921-1-julian@outer-limits.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-08hamradio: Remove unnecessary strscpy_pad() size argumentsThorsten Blum
If the destination buffer has a fixed length, strscpy_pad() automatically determines its size using sizeof() when the argument is omitted. This makes the explicit sizeof() calls unnecessary - remove them. No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20250407082607.741919-2-thorsten.blum@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-04Merge tag 'net-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - regressions: - four fixes for the netdev per-instance locking Current release - new code bugs: - consolidate more code between existing Rx zero-copy and uring so that the latter doesn't miss / have to duplicate the safety checks Previous releases - regressions: - ipv6: fix omitted Netlink attributes when using SKIP_STATS Previous releases - always broken: - net: fix geneve_opt length integer overflow - udp: fix multiple wrap arounds of sk->sk_rmem_alloc when it approaches INT_MAX - dsa: mvpp2: add a lock to avoid corruption of the shared TCAM - dsa: airoha: fix issues with traffic QoS configuration / offload, and flow table offload Misc: - touch up the Netlink YAML specs of old families to make them usable for user space C codegen" * tag 'net-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits) selftests: net: amt: indicate progress in the stress test netlink: specs: rt_route: pull the ifa- prefix out of the names netlink: specs: rt_addr: pull the ifa- prefix out of the names netlink: specs: rt_addr: fix get multi command name netlink: specs: rt_addr: fix the spec format / schema failures net: avoid false positive warnings in __net_mp_close_rxq() net: move mp dev config validation to __net_mp_open_rxq() net: ibmveth: make veth_pool_store stop hanging arcnet: Add NULL check in com20020pci_probe() ipv6: Do not consider link down nexthops in path selection ipv6: Start path selection from the first nexthop usbnet:fix NPE during rx_complete net: octeontx2: Handle XDP_ABORTED and XDP invalid as XDP_DROP net: fix geneve_opt length integer overflow io_uring/zcrx: fix selftests w/ updated netdev Python helpers selftests: net: use netdevsim in netns test docs: net: document netdev notifier expectations net: dummy: request ops lock netdevsim: add dummy device notifiers net: rename rtnl_net_debug to lock_debug ...
2025-04-04Merge tag 'spi-fix-v6.15-merge-window' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A small collection of fixes that came in during the merge window, everything is driver specific with nothing standing out particularly" * tag 'spi-fix-v6.15-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: bcm2835: Restore native CS probing when pinctrl-bcm2835 is absent spi: bcm2835: Do not call gpiod_put() on invalid descriptor spi: cadence-qspi: revert "Improve spi memory performance" spi: cadence: Fix out-of-bounds array access in cdns_mrvl_xspi_setup_clock() spi: fsl-qspi: use devm function instead of driver remove spi: SPI_QPIC_SNAND should be tristate and depend on MTD spi-rockchip: Fix register out of bounds access
2025-04-04Merge tag 'soc-drivers-6.15-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull more SoC driver updates from Arnd Bergmann: "This is the promised follow-up to the soc drivers branch, adding minor updates to omap and freescale drivers. Most notably, Ioana Ciornei takes over maintenance of the DPAA bus driver used in some NXP (originally Freescale) chips" * tag 'soc-drivers-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: bus: fsl-mc: Remove deadcode MAINTAINERS: add the linuppc-dev list to the fsl-mc bus entry MAINTAINERS: fix nonexistent dtbinding file name MAINTAINERS: add myself as maintainer for the fsl-mc bus irqdomain: soc: Switch to irq_find_mapping() Input: tsc2007 - accept standard properties
2025-04-04Merge tag 'platform-drivers-x86-v6.15-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - thinkpad_acpi: - Fix NULL pointer dereferences while probing - Disable ACPI fan access for T495* and E560 - ISST: Correct command storage data length * tag 'platform-drivers-x86-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: MAINTAINERS: consistently use my dedicated email address platform/x86: ISST: Correct command storage data length platform/x86: thinkpad_acpi: disable ACPI fan access for T495* and E560 platform/x86: thinkpad_acpi: Fix NULL pointer dereferences while probing
2025-04-04selftests: net: amt: indicate progress in the stress testJakub Kicinski
Our CI expects output from the test at least once every 10 minutes. The AMT test when running on debug kernel is just on the edge of that time for the stress test. Improve the output: - print the name of the test first, before starting it, - output a dot every 10% of the way. Output after: TEST: amt discovery [ OK ] TEST: IPv4 amt multicast forwarding [ OK ] TEST: IPv6 amt multicast forwarding [ OK ] TEST: IPv4 amt traffic forwarding torture .......... [ OK ] TEST: IPv6 amt traffic forwarding torture .......... [ OK ] Reviewed-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250403145636.2891166-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04Merge branch 'netlink-specs-rt_addr-fix-problems-revealed-by-c-codegen'Jakub Kicinski
Jakub Kicinski says: ==================== netlink: specs: rt_addr: fix problems revealed by C codegen I put together basic YNL C support for classic netlink. This revealed a few problems in the rt_addr spec. v1: https://lore.kernel.org/20250401012939.2116915-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250403013706.2828322-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04netlink: specs: rt_route: pull the ifa- prefix out of the namesJakub Kicinski
YAML specs don't normally include the C prefix name in the name of the YAML attr. Remove the ifa- prefix from all attributes in route-attrs and metrics and specify name-prefix instead. This is a bit risky, hopefully there aren't many users out there. Fixes: 023289b4f582 ("doc/netlink: Add spec for rt route messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250403013706.2828322-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04netlink: specs: rt_addr: pull the ifa- prefix out of the namesJakub Kicinski
YAML specs don't normally include the C prefix name in the name of the YAML attr. Remove the ifa- prefix from all attributes in addr-attrs and specify name-prefix instead. This is a bit risky, hopefully there aren't many users out there. Fixes: dfb0f7d9d979 ("doc/netlink: Add spec for rt addr messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250403013706.2828322-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04netlink: specs: rt_addr: fix get multi command nameJakub Kicinski
Command names should match C defines, codegens may depend on it. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250403013706.2828322-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04netlink: specs: rt_addr: fix the spec format / schema failuresJakub Kicinski
The spec is mis-formatted, schema validation says: Failed validating 'type' in schema['properties']['operations']['properties']['list']['items']['properties']['dump']['properties']['request']['properties']['value']: {'minimum': 0, 'type': 'integer'} On instance['operations']['list'][3]['dump']['request']['value']: '58 - ifa-family' The ifa-family clearly wants to be part of an attribute list. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Yuyang Huang <yuyanghuang@google.com> Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support") Link: https://patch.msgid.link/20250403013706.2828322-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04Merge branch 'net-make-memory-provider-install-close-paths-more-common'Jakub Kicinski
Jakub Kicinski says: ==================== net: make memory provider install / close paths more common We seem to be fixing bugs in config path for devmem which also exist in the io_uring ZC path. Let's try to make the two paths more common, otherwise this is bound to keep happening. Found by code inspection and compile tested only. v1: https://lore.kernel.org/20250331194201.2026422-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250403013405.2827250-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04net: avoid false positive warnings in __net_mp_close_rxq()Jakub Kicinski
Commit under Fixes solved the problem of spurious warnings when we uninstall an MP from a device while its down. The __net_mp_close_rxq() which is used by io_uring was not fixed. Move the fix over and reuse __net_mp_close_rxq() in the devmem path. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Fixes: a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()") Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250403013405.2827250-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04net: move mp dev config validation to __net_mp_open_rxq()Jakub Kicinski
devmem code performs a number of safety checks to avoid having to reimplement all of them in the drivers. Move those to __net_mp_open_rxq() and reuse that function for binding to make sure that io_uring ZC also benefits from them. While at it rename the queue ID variable to rxq_idx in __net_mp_open_rxq(), we touch most of the relevant lines. The XArray insertion is reordered after the netdev_rx_queue_restart() call, otherwise we'd need to duplicate the queue index check or risk inserting an invalid pointer. The XArray allocation failures should be extremely rare. Reviewed-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Fixes: 6e18ed929d3b ("net: add helpers for setting a memory provider on an rx queue") Link: https://patch.msgid.link/20250403013405.2827250-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04net: ibmveth: make veth_pool_store stop hangingDave Marquardt
v2: - Created a single error handling unlock and exit in veth_pool_store - Greatly expanded commit message with previous explanatory-only text Summary: Use rtnl_mutex to synchronize veth_pool_store with itself, ibmveth_close and ibmveth_open, preventing multiple calls in a row to napi_disable. Background: Two (or more) threads could call veth_pool_store through writing to /sys/devices/vio/30000002/pool*/*. You can do this easily with a little shell script. This causes a hang. I configured LOCKDEP, compiled ibmveth.c with DEBUG, and built a new kernel. I ran this test again and saw: Setting pool0/active to 0 Setting pool1/active to 1 [ 73.911067][ T4365] ibmveth 30000002 eth0: close starting Setting pool1/active to 1 Setting pool1/active to 0 [ 73.911367][ T4366] ibmveth 30000002 eth0: close starting [ 73.916056][ T4365] ibmveth 30000002 eth0: close complete [ 73.916064][ T4365] ibmveth 30000002 eth0: open starting [ 110.808564][ T712] systemd-journald[712]: Sent WATCHDOG=1 notification. [ 230.808495][ T712] systemd-journald[712]: Sent WATCHDOG=1 notification. [ 243.683786][ T123] INFO: task stress.sh:4365 blocked for more than 122 seconds. [ 243.683827][ T123] Not tainted 6.14.0-01103-g2df0c02dab82-dirty #8 [ 243.683833][ T123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 243.683838][ T123] task:stress.sh state:D stack:28096 pid:4365 tgid:4365 ppid:4364 task_flags:0x400040 flags:0x00042000 [ 243.683852][ T123] Call Trace: [ 243.683857][ T123] [c00000000c38f690] [0000000000000001] 0x1 (unreliable) [ 243.683868][ T123] [c00000000c38f840] [c00000000001f908] __switch_to+0x318/0x4e0 [ 243.683878][ T123] [c00000000c38f8a0] [c000000001549a70] __schedule+0x500/0x12a0 [ 243.683888][ T123] [c00000000c38f9a0] [c00000000154a878] schedule+0x68/0x210 [ 243.683896][ T123] [c00000000c38f9d0] [c00000000154ac80] schedule_preempt_disabled+0x30/0x50 [ 243.683904][ T123] [c00000000c38fa00] [c00000000154dbb0] __mutex_lock+0x730/0x10f0 [ 243.683913][ T123] [c00000000c38fb10] [c000000001154d40] napi_enable+0x30/0x60 [ 243.683921][ T123] [c00000000c38fb40] [c000000000f4ae94] ibmveth_open+0x68/0x5dc [ 243.683928][ T123] [c00000000c38fbe0] [c000000000f4aa20] veth_pool_store+0x220/0x270 [ 243.683936][ T123] [c00000000c38fc70] [c000000000826278] sysfs_kf_write+0x68/0xb0 [ 243.683944][ T123] [c00000000c38fcb0] [c0000000008240b8] kernfs_fop_write_iter+0x198/0x2d0 [ 243.683951][ T123] [c00000000c38fd00] [c00000000071b9ac] vfs_write+0x34c/0x650 [ 243.683958][ T123] [c00000000c38fdc0] [c00000000071bea8] ksys_write+0x88/0x150 [ 243.683966][ T123] [c00000000c38fe10] [c0000000000317f4] system_call_exception+0x124/0x340 [ 243.683973][ T123] [c00000000c38fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec ... [ 243.684087][ T123] Showing all locks held in the system: [ 243.684095][ T123] 1 lock held by khungtaskd/123: [ 243.684099][ T123] #0: c00000000278e370 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x50/0x248 [ 243.684114][ T123] 4 locks held by stress.sh/4365: [ 243.684119][ T123] #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150 [ 243.684132][ T123] #1: c000000041aea888 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0 [ 243.684143][ T123] #2: c0000000366fb9a8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0 [ 243.684155][ T123] #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_enable+0x30/0x60 [ 243.684166][ T123] 5 locks held by stress.sh/4366: [ 243.684170][ T123] #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150 [ 243.684183][ T123] #1: c00000000aee2288 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0 [ 243.684194][ T123] #2: c0000000366f4ba8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0 [ 243.684205][ T123] #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_disable+0x30/0x60 [ 243.684216][ T123] #4: c0000003ff9bbf18 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0x138/0x12a0 From the ibmveth debug, two threads are calling veth_pool_store, which calls ibmveth_close and ibmveth_open. Here's the sequence: T4365 T4366 ----------------- ----------------- --------- veth_pool_store veth_pool_store ibmveth_close ibmveth_close napi_disable napi_disable ibmveth_open napi_enable <- HANG ibmveth_close calls napi_disable at the top and ibmveth_open calls napi_enable at the top. https://docs.kernel.org/networking/napi.html]] says The control APIs are not idempotent. Control API calls are safe against concurrent use of datapath APIs but an incorrect sequence of control API calls may result in crashes, deadlocks, or race conditions. For example, calling napi_disable() multiple times in a row will deadlock. In the normal open and close paths, rtnl_mutex is acquired to prevent other callers. This is missing from veth_pool_store. Use rtnl_mutex in veth_pool_store fixes these hangs. Signed-off-by: Dave Marquardt <davemarq@linux.ibm.com> Fixes: 860f242eb534 ("[PATCH] ibmveth change buffer pools dynamically") Reviewed-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250402154403.386744-1-davemarq@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04arcnet: Add NULL check in com20020pci_probe()Henry Martin
devm_kasprintf() returns NULL when memory allocation fails. Currently, com20020pci_probe() does not check for this case, which results in a NULL pointer dereference. Add NULL check after devm_kasprintf() to prevent this issue and ensure no resources are left allocated. Fixes: 6b17a597fc2f ("arcnet: restoring support for multiple Sohard Arcnet cards") Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com> Link: https://patch.msgid.link/20250402135036.44697-1-bsdhenrymartin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04Merge branch 'ipv6-multipath-routing-fixes'Jakub Kicinski
Ido Schimmel says: ==================== ipv6: Multipath routing fixes This patchset contains two fixes for IPv6 multipath routing. See the commit messages for more details. ==================== Link: https://patch.msgid.link/20250402114224.293392-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04ipv6: Do not consider link down nexthops in path selectionIdo Schimmel
Nexthops whose link is down are not supposed to be considered during path selection when the "ignore_routes_with_linkdown" sysctl is set. This is done by assigning them a negative region boundary. However, when comparing the computed hash (unsigned) with the region boundary (signed), the negative region boundary is treated as unsigned, resulting in incorrect nexthop selection. Fix by treating the computed hash as signed. Note that the computed hash is always in range of [0, 2^31 - 1]. Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250402114224.293392-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04ipv6: Start path selection from the first nexthopIdo Schimmel
Cited commit transitioned IPv6 path selection to use hash-threshold instead of modulo-N. With hash-threshold, each nexthop is assigned a region boundary in the multipath hash function's output space and a nexthop is chosen if the calculated hash is smaller than the nexthop's region boundary. Hash-threshold does not work correctly if path selection does not start with the first nexthop. For example, if fib6_select_path() is always passed the last nexthop in the group, then it will always be chosen because its region boundary covers the entire hash function's output space. Fix this by starting the selection process from the first nexthop and do not consider nexthops for which rt6_score_route() provided a negative score. Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N") Reported-by: Stanislav Fomichev <stfomichev@gmail.com> Closes: https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250402114224.293392-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04usbnet:fix NPE during rx_completeYing Lu
Missing usbnet_going_away Check in Critical Path. The usb_submit_urb function lacks a usbnet_going_away validation, whereas __usbnet_queue_skb includes this check. This inconsistency creates a race condition where: A URB request may succeed, but the corresponding SKB data fails to be queued. Subsequent processes: (e.g., rx_complete → defer_bh → __skb_unlink(skb, list)) attempt to access skb->next, triggering a NULL pointer dereference (Kernel Panic). Fixes: 04e906839a05 ("usbnet: fix cyclical race on disconnect with work queue") Cc: stable@vger.kernel.org Signed-off-by: Ying Lu <luying1@xiaomi.com> Link: https://patch.msgid.link/4c9ef2efaa07eb7f9a5042b74348a67e5a3a7aea.1743584159.git.luying1@xiaomi.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04net: octeontx2: Handle XDP_ABORTED and XDP invalid as XDP_DROPLorenzo Bianconi
In the current implementation octeontx2 manages XDP_ABORTED and XDP invalid as XDP_PASS forwarding the skb to the networking stack. Align the behaviour to other XDP drivers handling XDP_ABORTED and XDP invalid as XDP_DROP. Please note this patch has just compile tested. Fixes: 06059a1a9a4a5 ("octeontx2-pf: Add XDP support to netdev PF") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250401-octeontx2-xdp-abort-fix-v1-1-f0587c35a0b9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04Merge tag 'x86-urgent-2025-04-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix a performance regression on AMD iGPU and dGPU drivers, related to the unintended activation of DMA bounce buffers that regressed game performance if KASLR disturbed things just enough - Fix a copy_user_generic() performance regression on certain older non-FSRM/ERMS CPUs - Fix a Clang build warning due to a semantic merge conflict the Kunit tree generated with the x86 tree - Fix FRED related system hang during S4 resume - Remove an unused API * tag 'x86-urgent-2025-04-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fred: Fix system hang during S4 resume with FRED enabled x86/platform/iosf_mbi: Remove unused iosf_mbi_unregister_pmic_bus_access_notifier() x86/mm/init: Handle the special case of device private pages in add_pages(), to not increase max_pfn and trigger dma_addressing_limited() bounce buffers x86/tools: Drop duplicate unlikely() definition in insn_decoder_test.c x86/uaccess: Improve performance by aligning writes to 8 bytes in copy_user_generic(), on non-FSRM/ERMS CPUs
2025-04-04Merge tag 'sound-fix-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of device-specific fixes that have been gathered since the previous pull: - A few more HD-audio quirks and fixups - A series of Qualcomm AudioReach fixes - Various small fixes for ASoC rt5665, WSA, SOF and Cirrus" * tag 'sound-fix-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek: Fix built-in mic on another ASUS VivoBook model ALSA: hda/realtek - Support mute led function for HP platform ASoC: imx-card: Add NULL check in imx_card_probe() ASoC: codecs: rt5665: Fix some error handling paths in rt5665_probe() ASoC: q6apm-dai: make use of q6apm_get_hw_pointer ASoC: qdsp6: q6apm-dai: fix capture pipeline overruns. ASoC: qdsp6: q6apm-dai: set 10 ms period and buffer alignment. ASoC: q6apm: add q6apm_get_hw_pointer helper ASoC: q6apm-dai: schedule all available frames to avoid dsp under-runs ASoC: SOF: hda/ptl: Move mic privacy change notification sending to a work ALSA/hda: intel-sdw-acpi: Remove (explicitly) unused header ALSA: hda/realtek: Enable Mute LED on HP OMEN 16 Laptop xd000xx ALSA: hda/tas2781: Upgrade calibratd-data writing code to support Alpha and Beta dsp firmware ASoC: qdsp6: q6asm-dai: fix q6asm_dai_compr_set_params error path ALSA: hda/realtek: Fix built-in mic breakage on ASUS VivoBook X515JA ASoC: sma1307: Fix error handling in sma1307_setting_loaded() ASoC: codecs: wsa884x: Correct VI sense channel mask ASoC: codecs: wsa883x: Correct VI sense channel mask firmware: cs_dsp: Ensure cs_dsp_load[_coeff]() returns 0 on success
2025-04-04Merge tag 'omap-for-v6.14/drivers-signed' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap into soc/drivers-2 arm/omap: drivers: updates for v6.14 * tag 'omap-for-v6.14/drivers-signed' of https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap: Input: tsc2007 - accept standard properties
2025-04-04Merge tag 'soc_fsl-6.15-1' of https://github.com/chleroy/linux into ↵Arnd Bergmann
soc/drivers-2 FSL SOC Changes for 6.15: - irqdomain cleanups from Jiry - Add Ioana as Maintainer of fsl-mc bus and remove Laurentiu and Stuart - Remove deadcode from fsl-mc bus * tag 'soc_fsl-6.15-1' of https://github.com/chleroy/linux: bus: fsl-mc: Remove deadcode MAINTAINERS: add the linuppc-dev list to the fsl-mc bus entry MAINTAINERS: fix nonexistent dtbinding file name MAINTAINERS: add myself as maintainer for the fsl-mc bus irqdomain: soc: Switch to irq_find_mapping()
2025-04-03Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull dcache fixes from Al Viro: "Fixes for bugs caught as part of tree-in-dcache work. Mostly dentry refcount mishandling" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: hypfs_create_cpu_files(): add missing check for hypfs_mkdir() failure qibfs: fix _another_ leak spufs: fix a leak in spufs_create_context() spufs: fix gang directory lifetimes spufs: fix a leak on spufs_new_file() failure
2025-04-03Merge tag 'nf-25-04-03' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains Netfilter fixes for net: 1) conncount incorrectly removes element for non-dynamic sets, these elements represent a static control plane configuration, leave them in place. 2) syzbot found a way to unregister a basechain that has been never registered from the chain update path, fix from Florian Westphal. 3) Fix incorrect pointer arithmetics in geneve support for tunnel, from Lin Ma. * tag 'nf-25-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_tunnel: fix geneve_opt type confusion addition netfilter: nf_tables: don't unregister hook when table is dormant netfilter: nft_set_hash: GC reaps elements with conncount for dynamic sets only ==================== Link: https://patch.msgid.link/20250403115752.19608-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03Merge tag 'v6.15rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: "Four ksmbd SMB3 server fixes, all also for stable" * tag 'v6.15rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix null pointer dereference in alloc_preauth_hash() ksmbd: validate zero num_subauth before sub_auth is accessed ksmbd: fix overflow in dacloffset bounds check ksmbd: fix session use-after-free in multichannel connection
2025-04-03Merge tag 'trace-ringbuffer-v6.15-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer updates from Steven Rostedt: "Persistent buffer cleanups and simplifications. It was mistaken that the physical memory returned from "reserve_mem" had to be vmap()'d to get to it from a virtual address. But reserve_mem already maps the memory to the virtual address of the kernel so a simple phys_to_virt() can be used to get to the virtual address from the physical memory returned by "reserve_mem". With this new found knowledge, the code can be cleaned up and simplified. - Enforce that the persistent memory is page aligned As the buffers using the persistent memory are all going to be mapped via pages, make sure that the memory given to the tracing infrastructure is page aligned. If it is not, it will print a warning and fail to map the buffer. - Use phys_to_virt() to get the virtual address from reserve_mem Instead of calling vmap() on the physical memory returned from "reserve_mem", use phys_to_virt() instead. As the memory returned by "memmap" or any other means where a physical address is given to the tracing infrastructure, it still needs to be vmap(). Since this memory can never be returned back to the buddy allocator nor should it ever be memmory mapped to user space, flag this buffer and up the ref count. The ref count will keep it from ever being freed, and the flag will prevent it from ever being memory mapped to user space. - Use vmap_page_range() for memmap virtual address mapping For the memmap buffer, instead of allocating an array of struct pages, assigning them to the contiguous phsycial memory and then passing that to vmap(), use vmap_page_range() instead - Replace flush_dcache_folio() with flush_kernel_vmap_range() Instead of calling virt_to_folio() and passing that to flush_dcache_folio(), just call flush_kernel_vmap_range() directly. This also fixes a bug where if a subbuffer was bigger than PAGE_SIZE only the PAGE_SIZE portion would be flushed" * tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio() tracing: Use vmap_page_range() to map memmap ring buffer tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer tracing: Enforce the persistent ring buffer to be page aligned