summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-03net: dummy: request ops lockStanislav Fomichev
Even though dummy device doesn't really need an instance lock, a lot of selftests use dummy so it's useful to have extra expose to the instance lock on NIPA. Request the instance/ops locking. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03netdevsim: add dummy device notifiersStanislav Fomichev
In order to exercise and verify notifiers' locking assumptions, register dummy notifiers (via register_netdevice_notifier_dev_net). Share notifier event handler that enforces the assumptions with lock_debug.c (rename and export rtnl_net_debug_event as netdev_debug_event). Add ops lock asserts to netdev_debug_event. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-6-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03net: rename rtnl_net_debug to lock_debugStanislav Fomichev
And make it selected by CONFIG_DEBUG_NET. Don't rename any of the structs/functions. Next patch will use rtnl_net_debug_event in netdevsim. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-5-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03net: use netif_disable_lro in ipv6_add_devStanislav Fomichev
ipv6_add_dev might call dev_disable_lro which unconditionally grabs instance lock, so it will deadlock during NETDEV_REGISTER. Switch to netif_disable_lro. Make sure all callers hold the instance lock as well. Cc: Cosmin Ratiu <cratiu@nvidia.com> Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-4-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03net: hold instance lock during NETDEV_REGISTER/UPStanislav Fomichev
Callers of inetdev_init can come from several places with inconsistent expectation about netdev instance lock. Grab instance lock during REGISTER (plus UP). Also solve the inconsistency with UNREGISTER where it was locked only during move netns path. WARNING: CPU: 10 PID: 1479 at ./include/net/netdev_lock.h:54 __netdev_update_features+0x65f/0xca0 __warn+0x81/0x180 __netdev_update_features+0x65f/0xca0 report_bug+0x156/0x180 handle_bug+0x4f/0x90 exc_invalid_op+0x13/0x60 asm_exc_invalid_op+0x16/0x20 __netdev_update_features+0x65f/0xca0 netif_disable_lro+0x30/0x1d0 inetdev_init+0x12f/0x1f0 inetdev_event+0x48b/0x870 notifier_call_chain+0x38/0xf0 register_netdevice+0x741/0x8b0 register_netdev+0x1f/0x40 mlx5e_probe+0x4e3/0x8e0 [mlx5_core] auxiliary_bus_probe+0x3f/0x90 really_probe+0xc3/0x3a0 __driver_probe_device+0x80/0x150 driver_probe_device+0x1f/0x90 __device_attach_driver+0x7d/0x100 bus_for_each_drv+0x80/0xd0 __device_attach+0xb4/0x1c0 bus_probe_device+0x91/0xa0 device_add+0x657/0x870 Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reported-by: Cosmin Ratiu <cratiu@nvidia.com> Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03net: switch to netif_disable_lro in inetdev_initStanislav Fomichev
Cosmin reports the following deadlock: dump_stack_lvl+0x62/0x90 print_deadlock_bug+0x274/0x3b0 __lock_acquire+0x1229/0x2470 lock_acquire+0xb7/0x2b0 __mutex_lock+0xa6/0xd20 dev_disable_lro+0x20/0x80 inetdev_init+0x12f/0x1f0 inetdev_event+0x48b/0x870 notifier_call_chain+0x38/0xf0 netif_change_net_namespace+0x72e/0x9f0 do_setlink.isra.0+0xd5/0x1220 rtnl_newlink+0x7ea/0xb50 rtnetlink_rcv_msg+0x459/0x5e0 netlink_rcv_skb+0x54/0x100 netlink_unicast+0x193/0x270 netlink_sendmsg+0x204/0x450 Switch to netif_disable_lro which assumes the caller holds the instance lock. inetdev_init is called for blackhole device (which sw device and doesn't grab instance lock) and from REGISTER/UNREGISTER notifiers. We already hold the instance lock for REGISTER notifier during netns change and we'll soon hold the lock during other paths. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reported-by: Cosmin Ratiu <cratiu@nvidia.com> Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03net: airoha: Validate egress gdm port in airoha_ppe_foe_entry_prepare()Lorenzo Bianconi
Dev pointer in airoha_ppe_foe_entry_prepare routine is not strictly a device allocated by airoha_eth driver since it is an egress device and the flowtable can contain even wlan, pppoe or vlan devices. E.g: flowtable ft { hook ingress priority filter devices = { eth1, lan1, lan2, lan3, lan4, wlan0 } flags offload ^ | "not allocated by airoha_eth" -- } In this case airoha_get_dsa_port() will just return the original device pointer and we can't assume netdev priv pointer points to an airoha_gdm_port struct. Fix the issue validating egress gdm port in airoha_ppe_foe_entry_prepare routine before accessing net_device priv pointer. Fixes: 00a7678310fe ("net: airoha: Introduce flowtable offload support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250401-airoha-validate-egress-gdm-port-v4-1-c7315d33ce10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03net: dsa: mv88e6xxx: propperly shutdown PPU re-enable timer on destroyDavid Oberhollenzer
The mv88e6xxx has an internal PPU that polls PHY state. If we want to access the internal PHYs, we need to disable the PPU first. Because that is a slow operation, a 10ms timer is used to re-enable it, canceled with every access, so bulk operations effectively only disable it once and re-enable it some 10ms after the last access. If a PHY is accessed and then the mv88e6xxx module is removed before the 10ms are up, the PPU re-enable ends up accessing a dangling pointer. This especially affects probing during bootup. The MDIO bus and PHY registration may succeed, but registration with the DSA framework may fail later on (e.g. because the CPU port depends on another, very slow device that isn't done probing yet, returning -EPROBE_DEFER). In this case, probe() fails, but the MDIO subsystem may already have accessed the MIDO bus or PHYs, arming the timer. This is fixed as follows: - If probe fails after mv88e6xxx_phy_init(), make sure we also call mv88e6xxx_phy_destroy() before returning - In mv88e6xxx_remove(), make sure we do the teardown in the correct order, calling mv88e6xxx_phy_destroy() after unregistering the switch device. - In mv88e6xxx_phy_destroy(), destroy both the timer and the work item that the timer might schedule, synchronously waiting in case one of the callbacks already fired and destroying the timer first, before waiting for the work item. - Access to the PPU is guarded by a mutex, the worker acquires it with a mutex_trylock(), not proceeding with the expensive shutdown if that fails. We grab the mutex in mv88e6xxx_phy_destroy() to make sure the slow PPU shutdown is already done or won't even enter, when we wait for the work item. Fixes: 2e5f032095ff ("dsa: add support for the Marvell 88E6131 switch chip") Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20250401135705.92760-1-david.oberhollenzer@sigma-star.at Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03MAINTAINERS: Update Loic Poulain's email addressLoic Poulain
Update Loic Poulain's email address to @oss.qualcomm.com. Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250401145344.10669-1-loic.poulain@oss.qualcomm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03ipv6: fix omitted netlink attributes when using RTEXT_FILTER_SKIP_STATSFernando Fernandez Mancera
Using RTEXT_FILTER_SKIP_STATS is incorrectly skipping non-stats IPv6 netlink attributes on link dump. This causes issues on userspace tools, e.g iproute2 is not rendering address generation mode as it should due to missing netlink attribute. Move the filling of IFLA_INET6_STATS and IFLA_INET6_ICMP6STATS to a helper function guarded by a flag check to avoid hitting the same situation in the future. Fixes: d5566fd72ec1 ("rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250402121751.3108-1-ffmancera@riseup.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03eth: bnxt: fix deadlock in the mgmt_opsTaehee Yoo
When queue is being reset, callbacks of mgmt_ops are called by netdev_nl_bind_rx_doit(). The netdev_nl_bind_rx_doit() first acquires netdev_lock() and then calls callbacks. So, mgmt_ops callbacks should not acquire netdev_lock() internaly. The bnxt_queue_{start | stop}() calls napi_{enable | disable}() but they internally acquire netdev_lock(). So, deadlock occurs. To avoid deadlock, napi_{enable | disable}_locked() should be used instead. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Fixes: cae03e5bdd9e ("net: hold netdev instance lock during queue operations") Link: https://patch.msgid.link/20250402133123.840173-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03net/selftests: Add loopback link local route for self-connectDmitry Safonov
self-connect-ipv6 got slightly flaky on netdev: > # timeout set to 120 > # selftests: net/tcp_ao: self-connect_ipv6 > # 1..5 > # # 708[lib/setup.c:250] rand seed 1742872572 > # TAP version 13 > # # 708[lib/proc.c:213] Snmp6 Ip6OutNoRoutes: 0 => 1 > # not ok 1 # error 708[self-connect.c:70] failed to connect() > # ok 2 No unexpected trace events during the test run > # # Planned tests != run tests (5 != 2) > # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:1 > ok 1 selftests: net/tcp_ao: self-connect_ipv6 I can not reproduce it on my machines, but judging by "Ip6OutNoRoutes" there is no route to the local_addr (::1). Looking at the kernel code, I see that kernel does add link-local address automatically in init_loopback(), but that is called from ipv6 notifier block. So, in turn the userspace that brought up the loopback interface may see rtnetlink ACK earlier than addrconf_notify() does it's job (at least, on a slow VM such as netdev). Probably, for ipv4 it's the same, judging by inetdev_event(). The fix is quite simple: set the link-local route straight after bringing the loopback interface. That will make it synchronous. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250402-tcp-ao-selfconnect-flake-v1-1-8388d629ef3d@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03sfc: fix NULL dereferences in ef100_process_design_param()Edward Cree
Since cited commit, ef100_probe_main() and hence also ef100_check_design_params() run before efx->net_dev is created; consequently, we cannot netif_set_tso_max_size() or _segs() at this point. Move those netif calls to ef100_probe_netdev(), and also replace netif_err within the design params code with pci_err. Reported-by: Kyungwook Boo <bookyungwook@gmail.com> Fixes: 98ff4c7c8ac7 ("sfc: Separate netdev probe/remove from PCI probe/remove") Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250401225439.2401047-1-edward.cree@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03gve: handle overflow when reporting TX consumed descriptorsJoshua Washington
When the tx tail is less than the head (in cases of wraparound), the TX consumed descriptor statistic in DQ will be reported as UINT32_MAX - head + tail, which is incorrect. Mask the difference of head and tail according to the ring size when reporting the statistic. Cc: stable@vger.kernel.org Fixes: 2c9198356d56 ("gve: Add consumed counts to ethtool stats") Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250402001037.2717315-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03net: decrease cached dst counters in dst_releaseAntoine Tenart
Upstream fix ac888d58869b ("net: do not delay dst_entries_add() in dst_release()") moved decrementing the dst count from dst_destroy to dst_release to avoid accessing already freed data in case of netns dismantle. However in case CONFIG_DST_CACHE is enabled and OvS+tunnels are used, this fix is incomplete as the same issue will be seen for cached dsts: Unable to handle kernel paging request at virtual address ffff5aabf6b5c000 Call trace: percpu_counter_add_batch+0x3c/0x160 (P) dst_release+0xec/0x108 dst_cache_destroy+0x68/0xd8 dst_destroy+0x13c/0x168 dst_destroy_rcu+0x1c/0xb0 rcu_do_batch+0x18c/0x7d0 rcu_core+0x174/0x378 rcu_core_si+0x18/0x30 Fix this by invalidating the cache, and thus decrementing cached dst counters, in dst_release too. Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel") Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250326173634.31096-1-atenart@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-02tunnels: Accept PACKET_HOST in skb_tunnel_check_pmtu().Guillaume Nault
Because skb_tunnel_check_pmtu() doesn't handle PACKET_HOST packets, commit 30a92c9e3d6b ("openvswitch: Set the skbuff pkt_type for proper pmtud support.") forced skb->pkt_type to PACKET_OUTGOING for openvswitch packets that are sent using the OVS_ACTION_ATTR_OUTPUT action. This allowed such packets to invoke the iptunnel_pmtud_check_icmp() or iptunnel_pmtud_check_icmpv6() helpers and thus trigger PMTU update on the input device. However, this also broke other parts of PMTU discovery. Since these packets don't have the PACKET_HOST type anymore, they won't trigger the sending of ICMP Fragmentation Needed or Packet Too Big messages to remote hosts when oversized (see the skb_in->pkt_type condition in __icmp_send() for example). These two skb->pkt_type checks are therefore incompatible as one requires skb->pkt_type to be PACKET_HOST, while the other requires it to be anything but PACKET_HOST. It makes sense to not trigger ICMP messages for non-PACKET_HOST packets as these messages should be generated only for incoming l2-unicast packets. However there doesn't seem to be any reason for skb_tunnel_check_pmtu() to ignore PACKET_HOST packets. Allow both cases to work by allowing skb_tunnel_check_pmtu() to work on PACKET_HOST packets and not overriding skb->pkt_type in openvswitch anymore. Fixes: 30a92c9e3d6b ("openvswitch: Set the skbuff pkt_type for proper pmtud support.") Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/eac941652b86fddf8909df9b3bf0d97bc9444793.1743208264.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02vsock: avoid timeout during connect() if the socket is closingStefano Garzarella
When a peer attempts to establish a connection, vsock_connect() contains a loop that waits for the state to be TCP_ESTABLISHED. However, the other peer can be fast enough to accept the connection and close it immediately, thus moving the state to TCP_CLOSING. When this happens, the peer in the vsock_connect() is properly woken up, but since the state is not TCP_ESTABLISHED, it goes back to sleep until the timeout expires, returning -ETIMEDOUT. If the socket state is TCP_CLOSING, waiting for the timeout is pointless. vsock_connect() can return immediately without errors or delay since the connection actually happened. The socket will be in a closing state, but this is not an issue, and subsequent calls will fail as expected. We discovered this issue while developing a test that accepts and immediately closes connections to stress the transport switch between two connect() calls, where the first one was interrupted by a signal (see Closes link). Reported-by: Luigi Leonardi <leonardi@redhat.com> Closes: https://lore.kernel.org/virtualization/bq6hxrolno2vmtqwcvb5bljfpb7mvwb3kohrvaed6auz5vxrfv@ijmd2f3grobn/ Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Link: https://patch.msgid.link/20250328141528.420719-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02Merge branch ↵Jakub Kicinski
'udp-fix-two-integer-overflows-when-sk-sk_rcvbuf-is-close-to-int_max' Kuniyuki Iwashima says: ==================== udp: Fix two integer overflows when sk->sk_rcvbuf is close to INT_MAX. I got a report that UDP mem usage in /proc/net/sockstat did not drop even after an application was terminated. The issue could happen if sk->sk_rmem_alloc wraps around due to a large sk->sk_rcvbuf, which was INT_MAX in our case. The patch 2 fixes the issue, and the patch 1 fixes yet another overflow I found while investigating the issue. v3: https://lore.kernel.org/20250327202722.63756-1-kuniyu@amazon.com v2: https://lore.kernel.org/20250325195826.52385-1-kuniyu@amazon.com v1: https://lore.kernel.org/20250323231016.74813-1-kuniyu@amazon.com ==================== Link: https://patch.msgid.link/20250401184501.67377-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02udp: Fix memory accounting leak.Kuniyuki Iwashima
Matt Dowling reported a weird UDP memory usage issue. Under normal operation, the UDP memory usage reported in /proc/net/sockstat remains close to zero. However, it occasionally spiked to 524,288 pages and never dropped. Moreover, the value doubled when the application was terminated. Finally, it caused intermittent packet drops. We can reproduce the issue with the script below [0]: 1. /proc/net/sockstat reports 0 pages # cat /proc/net/sockstat | grep UDP: UDP: inuse 1 mem 0 2. Run the script till the report reaches 524,288 # python3 test.py & sleep 5 # cat /proc/net/sockstat | grep UDP: UDP: inuse 3 mem 524288 <-- (INT_MAX + 1) >> PAGE_SHIFT 3. Kill the socket and confirm the number never drops # pkill python3 && sleep 5 # cat /proc/net/sockstat | grep UDP: UDP: inuse 1 mem 524288 4. (necessary since v6.0) Trigger proto_memory_pcpu_drain() # python3 test.py & sleep 1 && pkill python3 5. The number doubles # cat /proc/net/sockstat | grep UDP: UDP: inuse 1 mem 1048577 The application set INT_MAX to SO_RCVBUF, which triggered an integer overflow in udp_rmem_release(). When a socket is close()d, udp_destruct_common() purges its receive queue and sums up skb->truesize in the queue. This total is calculated and stored in a local unsigned integer variable. The total size is then passed to udp_rmem_release() to adjust memory accounting. However, because the function takes a signed integer argument, the total size can wrap around, causing an overflow. Then, the released amount is calculated as follows: 1) Add size to sk->sk_forward_alloc. 2) Round down sk->sk_forward_alloc to the nearest lower multiple of PAGE_SIZE and assign it to amount. 3) Subtract amount from sk->sk_forward_alloc. 4) Pass amount >> PAGE_SHIFT to __sk_mem_reduce_allocated(). When the issue occurred, the total in udp_destruct_common() was 2147484480 (INT_MAX + 833), which was cast to -2147482816 in udp_rmem_release(). At 1) sk->sk_forward_alloc is changed from 3264 to -2147479552, and 2) sets -2147479552 to amount. 3) reverts the wraparound, so we don't see a warning in inet_sock_destruct(). However, udp_memory_allocated ends up doubling at 4). Since commit 3cd3399dd7a8 ("net: implement per-cpu reserves for memory_allocated"), memory usage no longer doubles immediately after a socket is close()d because __sk_mem_reduce_allocated() caches the amount in udp_memory_per_cpu_fw_alloc. However, the next time a UDP socket receives a packet, the subtraction takes effect, causing UDP memory usage to double. This issue makes further memory allocation fail once the socket's sk->sk_rmem_alloc exceeds net.ipv4.udp_rmem_min, resulting in packet drops. To prevent this issue, let's use unsigned int for the calculation and call sk_forward_alloc_add() only once for the small delta. Note that first_packet_length() also potentially has the same problem. [0]: from socket import * SO_RCVBUFFORCE = 33 INT_MAX = (2 ** 31) - 1 s = socket(AF_INET, SOCK_DGRAM) s.bind(('', 0)) s.setsockopt(SOL_SOCKET, SO_RCVBUFFORCE, INT_MAX) c = socket(AF_INET, SOCK_DGRAM) c.connect(s.getsockname()) data = b'a' * 100 while True: c.send(data) Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers") Reported-by: Matt Dowling <madowlin@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250401184501.67377-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02udp: Fix multiple wraparounds of sk->sk_rmem_alloc.Kuniyuki Iwashima
__udp_enqueue_schedule_skb() has the following condition: if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) goto drop; sk->sk_rcvbuf is initialised by net.core.rmem_default and later can be configured by SO_RCVBUF, which is limited by net.core.rmem_max, or SO_RCVBUFFORCE. If we set INT_MAX to sk->sk_rcvbuf, the condition is always false as sk->sk_rmem_alloc is also signed int. Then, the size of the incoming skb is added to sk->sk_rmem_alloc unconditionally. This results in integer overflow (possibly multiple times) on sk->sk_rmem_alloc and allows a single socket to have skb up to net.core.udp_mem[1]. For example, if we set a large value to udp_mem[1] and INT_MAX to sk->sk_rcvbuf and flood packets to the socket, we can see multiple overflows: # cat /proc/net/sockstat | grep UDP: UDP: inuse 3 mem 7956736 <-- (7956736 << 12) bytes > INT_MAX * 15 ^- PAGE_SHIFT # ss -uam State Recv-Q ... UNCONN -1757018048 ... <-- flipping the sign repeatedly skmem:(r2537949248,rb2147483646,t0,tb212992,f1984,w0,o0,bl0,d0) Previously, we had a boundary check for INT_MAX, which was removed by commit 6a1f12dd85a8 ("udp: relax atomic operation on sk->sk_rmem_alloc"). A complete fix would be to revert it and cap the right operand by INT_MAX: rmem = atomic_add_return(size, &sk->sk_rmem_alloc); if (rmem > min(size + (unsigned int)sk->sk_rcvbuf, INT_MAX)) goto uncharge_drop; but we do not want to add the expensive atomic_add_return() back just for the corner case. Casting rmem to unsigned int prevents multiple wraparounds, but we still allow a single wraparound. # cat /proc/net/sockstat | grep UDP: UDP: inuse 3 mem 524288 <-- (INT_MAX + 1) >> 12 # ss -uam State Recv-Q ... UNCONN -2147482816 ... <-- INT_MAX + 831 bytes skmem:(r2147484480,rb2147483646,t0,tb212992,f3264,w0,o0,bl0,d14468947) So, let's define rmem and rcvbuf as unsigned int and check skb->truesize only when rcvbuf is large enough to lower the overflow possibility. Note that we still have a small chance to see overflow if multiple skbs to the same socket are processed on different core at the same time and each size does not exceed the limit but the total size does. Note also that we must ignore skb->truesize for a small buffer as explained in commit 363dc73acacb ("udp: be less conservative with sock rmem accounting"). Fixes: 6a1f12dd85a8 ("udp: relax atomic operation on sk->sk_rmem_alloc") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250401184501.67377-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02rtnetlink: Use register_pernet_subsys() in rtnl_net_debug_init().Kuniyuki Iwashima
rtnl_net_debug_init() registers rtnl_net_debug_net_ops by register_pernet_device() but calls unregister_pernet_subsys() in case register_netdevice_notifier() fails. It corrupts pernet_list because first_device is updated in register_pernet_device() but not unregister_pernet_subsys(). Let's fix it by calling register_pernet_subsys() instead. The _subsys() one fits better for the use case because it keeps the notifier alive until default_device_exit_net(), giving it more chance to test NETDEV_UNREGISTER. Fixes: 03fa53485659 ("rtnetlink: Add ASSERT_RTNL_NET() placeholder for netdev notifier.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250401190716.70437-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02selftests: tc-testing: fix nat regex matchingPedro Tammela
In iproute 6.14, the nat ip mask logic was fixed to remove an undefined behaviour[1]. So now instead of reporting '0.0.0.0/32' on x86 and potentially '0.0.0.0/0' in other platforms, it reports '0.0.0.0/0' in all platforms. [1] https://lore.kernel.org/netdev/20250306112520.188728-1-torben.nielsen@prevas.dk/ Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://patch.msgid.link/20250401144908.568140-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02net: mvpp2: Prevent parser TCAM memory corruptionTobias Waldekranz
Protect the parser TCAM/SRAM memory, and the cached (shadow) SRAM information, from concurrent modifications. Both the TCAM and SRAM tables are indirectly accessed by configuring an index register that selects the row to read or write to. This means that operations must be atomic in order to, e.g., avoid spreading writes across multiple rows. Since the shadow SRAM array is used to find free rows in the hardware table, it must also be protected in order to avoid TOCTOU errors where multiple cores allocate the same row. This issue was detected in a situation where `mvpp2_set_rx_mode()` ran concurrently on two CPUs. In this particular case the MVPP2_PE_MAC_UC_PROMISCUOUS entry was corrupted, causing the classifier unit to drop all incoming unicast - indicated by the `rx_classifier_drops` counter. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250401065855.3113635-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02eth: mlx4: select PAGE_POOLGreg Thelen
With commit 8533b14b3d65 ("eth: mlx4: create a page pool for Rx") mlx4 started using functions guarded by PAGE_POOL. This change introduced build errors when CONFIG_MLX4_EN is set but CONFIG_PAGE_POOL is not: ld: vmlinux.o: in function `mlx4_en_alloc_frags': en_rx.c:(.text+0xa5eaf9): undefined reference to `page_pool_alloc_pages' ld: vmlinux.o: in function `mlx4_en_create_rx_ring': (.text+0xa5ee91): undefined reference to `page_pool_create' Make MLX4_EN select PAGE_POOL to fix the ml;x4 build errors. Fixes: 8533b14b3d65 ("eth: mlx4: create a page pool for Rx") Signed-off-by: Greg Thelen <gthelen@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250401015315.2306092-1-gthelen@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02MAINTAINERS: update Open vSwitch maintainersJakub Kicinski
Pravin has not been active for a while, missingmaints reports: Subsystem OPENVSWITCH Changes 138 / 253 (54%) (No activity) Top reviewers: [41]: aconole@redhat.com [31]: horms@kernel.org [23]: echaudro@redhat.com [8]: fw@strlen.de [6]: i.maximets@ovn.org INACTIVE MAINTAINER Pravin B Shelar <pshelar@ovn.org> Let's elevate Aaron, Eelco and Ilya to the status of maintainers. Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250401001520.2080231-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02bpf: add missing ops lock around dev_xdp_attach_linkStanislav Fomichev
Syzkaller points out that create_link path doesn't grab ops lock, add it. Reported-by: syzbot+08936936fe8132f91f1a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/67e6b3e8.050a0220.2f068f.0079.GAE@google.com/ Fixes: 97246d6d21c2 ("net: hold netdev instance lock during ndo_bpf") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250331142814.1887506-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02net: airoha: Fix ETS priomap validationLorenzo Bianconi
ETS Qdisc schedules SP bands in a priority order assigning band-0 the highest priority (band-0 > band-1 > .. > band-n) while EN7581 arranges SP bands in a priority order assigning band-7 the highest priority (band-7 > band-6, .. > band-n). Fix priomap check in airoha_qdma_set_tx_ets_sched routine in order to align ETS Qdisc and airoha_eth driver SP priority ordering. Fixes: b56e4d660a96 ("net: airoha: Enforce ETS Qdisc priomap") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/20250331-airoha-ets-validate-priomap-v1-1-60a524488672@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02net: airoha: Fix qid report in airoha_tc_get_htb_get_leaf_queue()Lorenzo Bianconi
Fix the following kernel warning deleting HTB offloaded leafs and/or root HTB qdisc in airoha_eth driver properly reporting qid in airoha_tc_get_htb_get_leaf_queue routine. $tc qdisc replace dev eth1 root handle 10: htb offload $tc class add dev eth1 arent 10: classid 10:4 htb rate 100mbit ceil 100mbit $tc qdisc replace dev eth1 parent 10:4 handle 4: ets bands 8 \ quanta 1514 3028 4542 6056 7570 9084 10598 12112 $tc qdisc del dev eth1 root [ 55.827864] ------------[ cut here ]------------ [ 55.832493] WARNING: CPU: 3 PID: 2678 at 0xffffffc0798695a4 [ 55.956510] CPU: 3 PID: 2678 Comm: tc Tainted: G O 6.6.71 #0 [ 55.963557] Hardware name: Airoha AN7581 Evaluation Board (DT) [ 55.969383] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 55.976344] pc : 0xffffffc0798695a4 [ 55.979851] lr : 0xffffffc079869a20 [ 55.983358] sp : ffffffc0850536a0 [ 55.986665] x29: ffffffc0850536a0 x28: 0000000000000024 x27: 0000000000000001 [ 55.993800] x26: 0000000000000000 x25: ffffff8008b19000 x24: ffffff800222e800 [ 56.000935] x23: 0000000000000001 x22: 0000000000000000 x21: ffffff8008b19000 [ 56.008071] x20: ffffff8002225800 x19: ffffff800379d000 x18: 0000000000000000 [ 56.015206] x17: ffffffbf9ea59000 x16: ffffffc080018000 x15: 0000000000000000 [ 56.022342] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001 [ 56.029478] x11: ffffffc081471008 x10: ffffffc081575a98 x9 : 0000000000000000 [ 56.036614] x8 : ffffffc08167fd40 x7 : ffffffc08069e104 x6 : ffffff8007f86000 [ 56.043748] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000001 [ 56.050884] x2 : 0000000000000000 x1 : 0000000000000250 x0 : ffffff800222c000 [ 56.058020] Call trace: [ 56.060459] 0xffffffc0798695a4 [ 56.063618] 0xffffffc079869a20 [ 56.066777] __qdisc_destroy+0x40/0xa0 [ 56.070528] qdisc_put+0x54/0x6c [ 56.073748] qdisc_graft+0x41c/0x648 [ 56.077324] tc_get_qdisc+0x168/0x2f8 [ 56.080978] rtnetlink_rcv_msg+0x230/0x330 [ 56.085076] netlink_rcv_skb+0x5c/0x128 [ 56.088913] rtnetlink_rcv+0x14/0x1c [ 56.092490] netlink_unicast+0x1e0/0x2c8 [ 56.096413] netlink_sendmsg+0x198/0x3c8 [ 56.100337] ____sys_sendmsg+0x1c4/0x274 [ 56.104261] ___sys_sendmsg+0x7c/0xc0 [ 56.107924] __sys_sendmsg+0x44/0x98 [ 56.111492] __arm64_sys_sendmsg+0x20/0x28 [ 56.115580] invoke_syscall.constprop.0+0x58/0xfc [ 56.120285] do_el0_svc+0x3c/0xbc [ 56.123592] el0_svc+0x18/0x4c [ 56.126647] el0t_64_sync_handler+0x118/0x124 [ 56.131005] el0t_64_sync+0x150/0x154 [ 56.134660] ---[ end trace 0000000000000000 ]--- Fixes: ef1ca9271313b ("net: airoha: Add sched HTB offload support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20250331-airoha-htb-qdisc-offload-del-fix-v1-1-4ea429c2c968@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02sctp: add mutual exclusion in proc_sctp_do_udp_port()Eric Dumazet
We must serialize calls to sctp_udp_sock_stop() and sctp_udp_sock_start() or risk a crash as syzbot reported: Oops: general protection fault, probably for non-canonical address 0xdffffc000000000d: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f] CPU: 1 UID: 0 PID: 6551 Comm: syz.1.44 Not tainted 6.14.0-syzkaller-g7f2ff7b62617 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025 RIP: 0010:kernel_sock_shutdown+0x47/0x70 net/socket.c:3653 Call Trace: <TASK> udp_tunnel_sock_release+0x68/0x80 net/ipv4/udp_tunnel_core.c:181 sctp_udp_sock_stop+0x71/0x160 net/sctp/protocol.c:930 proc_sctp_do_udp_port+0x264/0x450 net/sctp/sysctl.c:553 proc_sys_call_handler+0x3d0/0x5b0 fs/proc/proc_sysctl.c:601 iter_file_splice_write+0x91c/0x1150 fs/splice.c:738 do_splice_from fs/splice.c:935 [inline] direct_splice_actor+0x18f/0x6c0 fs/splice.c:1158 splice_direct_to_actor+0x342/0xa30 fs/splice.c:1102 do_splice_direct_actor fs/splice.c:1201 [inline] do_splice_direct+0x174/0x240 fs/splice.c:1227 do_sendfile+0xafd/0xe50 fs/read_write.c:1368 __do_sys_sendfile64 fs/read_write.c:1429 [inline] __se_sys_sendfile64 fs/read_write.c:1415 [inline] __x64_sys_sendfile64+0x1d8/0x220 fs/read_write.c:1415 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] Fixes: 046c052b475e ("sctp: enable udp tunneling socks") Reported-by: syzbot+fae49d997eb56fa7c74d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/67ea5c01.050a0220.1547ec.012b.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20250331091532.224982-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02Merge branch 'net_sched-skbprio-remove-overly-strict-queue-assertions'Jakub Kicinski
Cong Wang says: ==================== net_sched: skbprio: Remove overly strict queue assertions ==================== Link: https://patch.msgid.link/20250329222536.696204-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02selftests: tc-testing: Add TBF with SKBPRIO queue length corner case testCong Wang
Add a test case to validate the interaction between TBF and SKBPRIO queueing disciplines, specifically targeting queue length accounting corner cases. This test complements the fix for the queue length accounting issue in the SKBPRIO qdisc. This is still best-effort, as timing and manipulating enqueue and dequeue from user-space is very hard. Cc: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20250329222536.696204-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02net_sched: skbprio: Remove overly strict queue assertionsCong Wang
In the current implementation, skbprio enqueue/dequeue contains an assertion that fails under certain conditions when SKBPRIO is used as a child qdisc under TBF with specific parameters. The failure occurs because TBF sometimes peeks at packets in the child qdisc without actually dequeuing them when tokens are unavailable. This peek operation creates a discrepancy between the parent and child qdisc queue length counters. When TBF later receives a high-priority packet, SKBPRIO's queue length may show a different value than what's reflected in its internal priority queue tracking, triggering the assertion. The fix removes this overly strict assertions in SKBPRIO, they are not necessary at all. Reported-by: syzbot+a3422a19b05ea96bee18@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a3422a19b05ea96bee18 Fixes: aea5f654e6b7 ("net/sched: add skbprio scheduler") Cc: Nishanth Devarajan <ndev2021@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20250329222536.696204-2-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-02netlabel: Fix NULL pointer exception caused by CALIPSO on IPv4 socketsDebin Zhu
When calling netlbl_conn_setattr(), addr->sa_family is used to determine the function behavior. If sk is an IPv4 socket, but the connect function is called with an IPv6 address, the function calipso_sock_setattr() is triggered. Inside this function, the following code is executed: sk_fullsock(__sk) ? inet_sk(__sk)->pinet6 : NULL; Since sk is an IPv4 socket, pinet6 is NULL, leading to a null pointer dereference. This patch fixes the issue by checking if inet6_sk(sk) returns a NULL pointer before accessing pinet6. Signed-off-by: Debin Zhu <mowenroot@163.com> Signed-off-by: Bitao Ouyang <1985755126@qq.com> Acked-by: Paul Moore <paul@paul-moore.com> Fixes: ceba1832b1b2 ("calipso: Set the calipso socket label to match the secattr.") Link: https://patch.msgid.link/20250401124018.4763-1-mowenroot@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-01Merge tag 'net-6.15-rc0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Rather tiny pull request, mostly so that we can get into our trees your fix to the x86 Makefile. Current release - regressions: - Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc", error queue accounting was missed Current release - new code bugs: - 5 fixes for the netdevice instance locking work Previous releases - regressions: - usbnet: restore usb%d name exception for local mac addresses Previous releases - always broken: - rtnetlink: allocate vfinfo size for VF GUIDs when supported, avoid spurious GET_LINK failures - eth: mana: Switch to page pool for jumbo frames - phy: broadcom: Correct BCM5221 PHY model detection Misc: - selftests: drv-net: replace helpers for referring to other files" * tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits) Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc" bnxt_en: bring back rtnl lock in bnxt_shutdown eth: gve: add missing netdev locks on reset and shutdown paths selftests: mptcp: ignore mptcp_diag binary selftests: mptcp: close fd_in before returning in main_loop selftests: mptcp: fix incorrect fd checks in main_loop mptcp: fix NULL pointer in can_accept_new_subflow octeontx2-af: Free NIX_AF_INT_VEC_GEN irq octeontx2-af: Fix mbox INTR handler when num VFs > 64 net: fix use-after-free in the netdev_nl_sock_priv_destroy() selftests: net: use Path helpers in ping selftests: net: use the dummy bpf from net/lib selftests: drv-net: replace the rpath helper with Path objects net: lapbether: use netdev_lockdep_set_classes() helper net: phy: broadcom: Correct BCM5221 PHY model detection net: usb: usbnet: restore usb%d name exception for local mac addresses net/mlx5e: SHAMPO, Make reserved size independent of page size net: mana: Switch to page pool for jumbo frames MAINTAINERS: Add dedicated entries for phy_link_topology net: move replay logic to tc_modify_qdisc ...
2025-04-01Merge tag 'vfio-v6.15-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - Relax IGD support code to match display class device rather than specifically requiring a VGA device (Tomita Moeko) - Accelerate DMA mapping of device MMIO by iterating at PMD and PUD levels to take advantage of huge pfnmap support added in v6.12 (Alex Williamson) - Extend virtio vfio-pci variant driver to include migration support for block devices where enabled by the PF (Yishai Hadas) - Virtualize INTx PIN register for devices where the platform does not route legacy PCI interrupts for the device and the interrupt is reported as IRQ_NOTCONNECTED (Alex Williamson) * tag 'vfio-v6.15-rc1' of https://github.com/awilliam/linux-vfio: vfio/pci: Handle INTx IRQ_NOTCONNECTED vfio/virtio: Enable support for virtio-block live migration vfio/type1: Use mapping page mask for pfnmaps mm: Provide address mask in struct follow_pfnmap_args vfio/type1: Use consistent types for page counts vfio/type1: Use vfio_batch for vaddr_get_pfns() vfio/type1: Convert all vaddr_get_pfns() callers to use vfio_batch vfio/type1: Catch zero from pin_user_pages_remote() vfio/pci: match IGD devices in display controller class
2025-04-01Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "A small number of improvements all over the place: - shutdown has been reworked to reset devices - virtio fs is now allowed in vduse - vhost-scsi memory use has been reduced - cleanups, fixes all over the place A couple more fixes are being tested and will be merged after rc1" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost-scsi: Reduce response iov mem use vhost-scsi: Allocate iov_iter used for unaligned copies when needed vhost-scsi: Stop duplicating se_cmd fields vhost-scsi: Dynamically allocate scatterlists vhost-scsi: Return queue full for page alloc failures during copy vhost-scsi: Add better resource allocation failure handling vhost-scsi: Allocate T10 PI structs only when enabled vhost-scsi: Reduce mem use by moving upages to per queue vduse: add virtio_fs to allowed dev id sound/virtio: Fix cancel_sync warnings on uninitialized work_structs vdpa/mlx5: Fix oversized null mkey longer than 32bit vdpa/mlx5: Fix mlx5_vdpa_get_config() endianness on big-endian machines vhost-scsi: Fix handling of multiple calls to vhost_scsi_set_endpoint tools: virtio/linux/module.h add MODULE_DESCRIPTION() define. tools: virtio/linux/compiler.h: Add data_race() define. tools/virtio: Add DMA_MAPPING_ERROR and sg_dma_len api define for virtio test virtio: break and reset virtio devices on device_shutdown()
2025-04-01Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Two significant new items: - Allow reporting IOMMU HW events to userspace when the events are clearly linked to a device. This is linked to the VIOMMU object and is intended to be used by a VMM to forward HW events to the virtual machine as part of emulating a vIOMMU. ARM SMMUv3 is the first driver to use this mechanism. Like the existing fault events the data is delivered through a simple FD returning event records on read(). - PASID support in VFIO. The "Process Address Space ID" is a PCI feature that allows the device to tag all PCI DMA operations with an ID. The IOMMU will then use the ID to select a unique translation for those DMAs. This is part of Intel's vIOMMU support as VT-D HW requires the hypervisor to manage each PASID entry. The support is generic so any VFIO user could attach any translation to a PASID, and the support should work on ARM SMMUv3 as well. AMD requires additional driver work. Some minor updates, along with fixes: - Prevent using nested parents with fault's, no driver support today - Put a single "cookie_type" value in the iommu_domain to indicate what owns the various opaque owner fields" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits) iommufd: Test attach before detaching pasid iommufd: Fix iommu_vevent_header tables markup iommu: Convert unreachable() to BUG() iommufd: Balance veventq->num_events inc/dec iommufd: Initialize the flags of vevent in iommufd_viommu_report_event() iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices ida: Add ida_find_first_range() iommufd/selftest: Add coverage for iommufd pasid attach/detach iommufd/selftest: Add test ops to test pasid attach/detach iommufd/selftest: Add a helper to get test device iommufd/selftest: Add set_dev_pasid in mock iommu iommufd: Allow allocating PASID-compatible domain iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support iommufd: Enforce PASID-compatible domain for RID iommufd: Support pasid attach/replace iommufd: Enforce PASID-compatible domain in PASID path iommufd/device: Add pasid_attach array to track per-PASID attach ...
2025-04-01Merge tag 'edac_urgent_for_v6.15_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC documentation fix from Borislav Petkov: - A single fix making sure the EDAC subtree is included in the documentation table of contents * tag 'edac_urgent_for_v6.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: Documentation/EDAC: Fix warning document isn't included in any toctree
2025-04-01Merge tag 'thermal-6.15-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more thermal control updates from Rafael Wysocki: "These are mostly assorted updates of thermal drivers used on ARM platforms: - Use dev_err_probe() helpers to simplify the init code in the Qoriq thermal driver (Frank Li) - Power down the Qoriq's TMU at suspend time (Alice Guo) - Add ipq5332, ipq5424 compatible to the QCom's tsens thermal driver and TSENS enable / calibration support for V2 (Praveenkumar I) - Add missing rk3328 mapping entry (Trevor Woerner) - Remove duplicate struct declaration from the thermal core header file (Xueqin Luo) - Disable the monitoring mode during suspend in the LVTS Mediatek driver to prevent temperature acquisition glitches (Nícolas F. R. A. Prado) - Disable Stage 3 thermal threshold in the LVTS Mediatek driver because it disables the suspend ability and does not have an interrupt handler (Nícolas F. R. A. Prado) - Fix low temperature offset interrupt in the LVTS Mediatek driver to prevent multiple interrupts from triggering when the system is at its normal functionning temperature (Nícolas F. R. A. Prado) - Enable interrupts in the LVTS Mediatek driver only on sensors that are in use (Nícolas F. R. A. Prado) - Add the BCM74110 compatible DT binding and the corresponding code to support a chip based on a different process node than previous chips (Florian Fainelli) - Correct indentation and style in DTS example (Krzysztof Kozlowski) - Unify hexadecimal annotatation in the rcar_gen3 driver (Niklas Söderlund) - Factor out the code logic to read fuses on Gen3 and Gen4 in the rcar_gen3 thermal driver (Niklas Söderlund) - Drop unused driver data from the QCom's spmi temperature alarm driver (Johan Hovold)" * tag 'thermal-6.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal/drivers/qcom-spmi-temp-alarm: Drop unused driver data thermal: rcar_gen3: Reuse logic to read fuses on Gen3 and Gen4 thermal: rcar_gen3: Use lowercase hex constants dt-bindings: thermal: Correct indentation and style in DTS example thermal/drivers/brcmstb_thermal: Add support for BCM74110 dt-bindings: thermal: Update for BCM74110 thermal/drivers/mediatek/lvts: Only update IRQ enable for valid sensors thermal/drivers/mediatek/lvts: Start sensor interrupts disabled thermal/drivers/mediatek/lvts: Disable low offset IRQ for minimum threshold thermal/drivers/mediatek/lvts: Disable Stage 3 thermal threshold thermal/drivers/mediatek/lvts: Disable monitor mode during suspend thermal: core: Remove duplicate struct declaration thermal/drivers/rockchip: Add missing rk3328 mapping entry thermal/drivers/tsens: Add TSENS enable and calibration support for V2 dt-bindings: thermal: tsens: Add ipq5332, ipq5424 compatible thermal/drivers/qoriq: Power down TMU on system suspend thermal/drivers/qoriq: Use dev_err_probe() simplify the code
2025-04-01Merge tag 'i3c/for-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c updates from Alexandre Belloni: "The silvaco driver gets support for the integration of the IP in the Nuvoton npcm845 SoC. There is also a fix for a possible NULL pointer dereference that can happen with early IBIs. Summary: Core: - Fix a possible NULL pointer dereference due to IBI coming when the target driver is not yet probed. Drivers: - mipi-i3c-hci: Use I2C DMA-safe api - svc: add Nuvoton npcm845 support" * tag 'i3c/for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c: Add NULL pointer check in i3c_master_queue_ibi() i3c: master: Drop duplicate check before calling OF APIs i3c: master: svc: Fix implicit fallthrough in svc_i3c_master_ibi_work() i3c: master: svc: Fix missing STOP for master request i3c: master: svc: Use readsb helper for reading MDB i3c: master: svc: Fix missing the IBI rules i3c: master: svc: Fix i3c_master_get_free_addr return check i3c: master: svc: Fix npcm845 DAA process corruption i3c: master: svc: Fix npcm845 invalid slvstart event i3c: master: svc: Fix npcm845 FIFO empty issue i3c: master: svc: Add support for Nuvoton npcm845 i3c dt-bindings: i3c: silvaco: Add npcm845 compatible string dt-bindings: i3c: dw: Add power-domains i3c: master: svc: Flush FIFO before sending Dynamic Address Assignment(DAA) i3c: mipi-i3c-hci: Use I2C DMA-safe api i3c: Remove the const qualifier from i2c_msg pointer in i2c_xfers API MAINTAINERS: Add Frank Li to Silvaco I3C MAINTAINERS: Remove Conor Culhane from Silvaco I3C
2025-04-01Merge tag 'linux-watchdog-6.15-rc1' of ↵Linus Torvalds
git://www.linux-watchdog.org/linux-watchdog Pull watchdog updates from Wim Van Sebroeck: - Add watchdog driver for Lenovo SE30 platform - Add support for Allwinner A523 - Add i.MX94 support - watchdog framework: Convert to use device property - renesas,wdt: Document RZ/G3E support - Various other fixes and improvemenents * tag 'linux-watchdog-6.15-rc1' of git://www.linux-watchdog.org/linux-watchdog: watchdog: sunxi_wdt: Add support for Allwinner A523 dt-bindings: watchdog: sunxi: add Allwinner A523 compatible string watchdog: aspeed: fix 64-bit division watchdog: npcm: Remove unnecessary NULL check before clk_prepare_enable/clk_disable_unprepare dt-bindings: watchdog: renesas,wdt: Document RZ/G3E support watchdog: Convert to use device property watchdog: lenovo_se30_wdt: include io.h for devm_ioremap() dt-bindings: watchdog: fsl-imx7ulp-wdt: Add i.MX94 support watchdog: nic7018_wdt: tidy up ACPI ID table watchdog: s3c2410_wdt: Fix PMU register bits for ExynosAutoV920 SoC watchdog: lenovo_se30_wdt: Watchdog driver for Lenovo SE30 platform watchdog: Enable RZV2HWDT driver depend on ARCH_RENESAS watchdog: cros-ec: Add newlines to printks watchdog: aspeed: Update bootstatus handling
2025-04-01Merge tag 'i2c-for-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "i2c-core updates (collected by Wolfram): - remove last user and unexport i2c_of_match_device() - irq usage cleanup from Jiri i2c-host updates (collected by Andi): Refactoring and cleanups: - octeon, cadence, i801, pasemi, mlxbf, bcm-iproc: general refactorings - octeon: remove 10-bit address support Improvements: - amd-asf: improved error handling - designware: use guard(mutex) - amd-asf, designware: update naming to follow latest specs - cadence: fix cleanup path in probe - i801: use MMIO and I/O mapping helpers to access registers - pxa: handle error after clk_prepare_enable New features: - added i2c_10bit_addr_*_from_msg() and updated multiple drivers - omap: added multiplexer state handling - qcom-geni: update frequency configuration - qup: introduce DMA usage policy New hardware support: - exynos: add support for Samsung exynos7870 - k1: add support for spacemit k1 (new driver) - imx: add support for i.mx94 lpi2c - rk3x: add support for rk3562 - designware: add support for Renesas RZ/N1D Multiplexers: - ltc4306, reg: fix assignment in platform_driver structure at24 eeprom updates (collected by Bartosz): - add two new compatible entries to the DT binding document - drop of_match_ptr() and ACPI_PTR() macros" * tag 'i2c-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (50 commits) dt-bindings: i2c: snps,designware-i2c: describe Renesas RZ/N1D variant irqdomain: i2c: Switch to irq_find_mapping() i2c: iproc: Refactor prototype and remove redundant error checks i2c: qcom-geni: Update i2c frequency table to match hardware guidance i2c: mlxbf: Use readl_poll_timeout_atomic() for polling i2c: pasemi: Add registers bits and switch to BIT() i2c: k1: Initialize variable before use i2c: spacemit: add support for SpacemiT K1 SoC dt-bindings: i2c: spacemit: add support for K1 SoC i2c: omap: Add support for setting mux dt-bindings: i2c: omap: Add mux-states property i2c: octeon: remove 10-bit addressing support i2c: octeon: fix return commenting i2c: i801: Use MMIO if available i2c: i801: Switch to iomapped register access i2c: i801: Improve too small kill wait time in i801_check_post i2c: i801: Move i801_wait_intr and i801_wait_byte_done in the code i2c: i801: Cosmetic improvements i2c: cadence: Move reset_control_assert after pm_runtime_set_suspended in probe error path i2c: cadence: Simplify using devm_clk_get_enabled() ...
2025-04-01Documentation/EDAC: Fix warning document isn't included in any toctreeShiju Jose
Fix the build (htmldocs) warning: Documentation/edac/index.rst: WARNING: document isn't included in any toctree. Fixes: db99ea5f2c03 ("EDAC: Add support for EDAC device features control") Closes: https://lore.kernel.org/all/20250228185102.15842f8b@canb.auug.org.au/ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250401115823.573-1-shiju.jose@huawei.com
2025-04-01Merge tag 'dmaengine-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "The dmaengine subsystem updates for this cycle consist of a new driver (Microchip) along with couple of yaml binding conversions, core api updates and bunch of driver updates etc. New HW support: - Microchip sama7d65 dma controller - Yaml conversion of atmel dma binding and Freescale Elo DMA Controller binding Core: - Remove device_prep_dma_imm_data() API as users are removed - Reduce scope of some less frequently used DMA request channel APIs with aim to cleanup these in future Updates: - Drop Fenghua Yu from idxd maintainers, as he changed jobs - AMD ptdma support for multiqueue and ae4dma deprecated PCI IDs removal" * tag 'dmaengine-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (29 commits) dmaengine: ptdma: Utilize the AE4DMA engine's multi-queue functionality dmaengine: ae4dma: Use the MSI count and its corresponding IRQ number dmaengine: ae4dma: Remove deprecated PCI IDs dmaengine: Remove device_prep_dma_imm_data from struct dma_device dmaengine: ti: edma: support sw triggered chans in of_edma_xlate() dmaengine: ti: k3-udma: Enable second resource range for BCDMA and PKTDMA dmaengine: fsl-edma: free irq correctly in remove path dmaengine: fsl-edma: cleanup chan after dma_async_device_unregister dt-bindings: dma: snps,dw-axi-dmac: Allow devices to be marked as noncoherent dmaengine: dmatest: Fix dmatest waiting less when interrupted dt-bindings: dma: Convert fsl,elo*-dma to YAML dt-bindings: dma: fsl-mxs-dma: Add compatible string for i.MX8 chips dmaengine: Fix typo in comment dmaengine: ti: k3-udma-glue: Drop skip_fdq argument from k3_udma_glue_reset_rx_chn dmaengine: bcm2835-dma: fix warning when CONFIG_PM=n dt-bindings: dma: fsl,edma: Add i.MX94 support dt-bindings: dma: atmel: add microchip,sama7d65-dma dmaengine: img-mdc: remove incorrect of_match_ptr annotation dmaengine: idxd: Delete unnecessary NULL check dmaengine: pxa: Enable compile test ...
2025-04-01Merge tag 'phy-for-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Pull phy updates from Vinod Koul: "A fairly moderate sized request for the generic phy subsystem with some new device and driver support along with driver updates with Samsung and Qualcomm ones being major ones. New HW Support: - Qualcomm X1P42100 PCIe Gen4x4, QCS615 qmp usbc, PCIe UNIPHY 28LP driver, SM8750 QMP UFS PHY - Rockchip rk3576 hdptx, rk3562 naneng-combo support - Samsung MIPI D-/C-PHY driver, ExynosAutov920 ufs phy driver Updates: - Samsung USB3 Type-C lane orientation detection and configuration for Google gs101 - Qualcomm support for dual lane PHY support for QCS8300 SoC" * tag 'phy-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (47 commits) phy: rockchip-naneng-combo: Support rk3562 dt-bindings: phy: rockchip: Add rk3562 naneng-combophy compatible phy: rockchip: Add Samsung MIPI D-/C-PHY driver dt-bindings: phy: Add Rockchip MIPI C-/D-PHY schema phy: qcom: uniphy-28lp: add COMMON_CLK dependency phy: rockchip: usbdp: Remove unnecessary bool conversion phy: rockchip: usbdp: Avoid call hpd_event_trigger in dp_phy_init phy: rockchip: usbdp: Only verify link rates/lanes/voltage when the corresponding set flags are set phy: qcom-qmp-pcie: add dual lane PHY support for QCS8300 dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: Document the QCS8300 QMP PCIe PHY Gen4 x2 phy: qcom-qmp-ufs: Add PHY Configuration support for sm8750 dt-bindings: phy: qcom,sc8280xp-qmp-ufs-phy: document the SM8750 QMP UFS PHY phy: qcom: Introduce PCIe UNIPHY 28LP driver dt-bindings: phy: qcom,uniphy-pcie: Document PCIe uniphy phy: qcom: qmp-usbc: Add qmp configuration for QCS615 phy: freescale: imx8m-pcie: assert phy reset and perst in power off phy: freescale: imx8m-pcie: cleanup reset logic phy: core: Remove unused phy_pm_runtime_(allow|forbid) dt-bindings: phy: document Allwinner A523 USB-2.0 PHY phy: phy-rockchip-samsung-hdptx: Add support for RK3576 ...
2025-04-01Merge tag 'soundwire-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire updates from Vinod Koul: - Support for SoundWire Bulk Register Access (BRA) protocol in core along with Intel driver support and ASoC bits required - AMD driver updates and support for ACP 7.0 and 7.1 platforms * tag 'soundwire-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (28 commits) soundwire: take in count the bandwidth of a prepared stream ASoC: rt711-sdca: add DP0 support soundwire: debugfs: add interface for BPT/BRA transfers ASoC: SOF: Intel: hda-sdw-bpt: add CHAIN_DMA support soundwire: intel_ace2x: add BPT send_async/wait callbacks soundwire: intel: add BPT context definition ASoC: SOF: Intel: hda-sdw-bpt: add helpers for SoundWire BPT DMA soundwire: intel_auxdevice: add indirection for BPT send_async/wait soundwire: cadence: add BTP/BRA helpers to format data soundwire: bus: add bpt_stream pointer soundwire: bus: add send_async/wait APIs for BPT protocol soundwire: stream: reuse existing code for BPT stream soundwire: stream: special-case the bus compute_params() routine soundwire: stream: extend sdw_alloc_stream() to take 'type' parameter soundwire: extend sdw_stream_type to BPT soundwire: cadence: add BTP support for DP0 Documentation: driver: add SoundWire BRA description soundwire: amd: change the log level for command response log soundwire: slave: fix an OF node reference leak in soundwire slave device soundwire: Use str_enable_disable-like helpers ...
2025-04-01Merge tag 'char-misc-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc / IIO driver updates from Greg KH: "Here is the big set of char, misc, iio, and other smaller driver subsystems for 6.15-rc1. Lots of stuff in here, including: - loads of IIO changes and driver updates - counter driver updates - w1 driver updates - faux conversions for some drivers that were abusing the platform bus interface - coresight driver updates - rust miscdevice binding updates based on real-world-use - other minor driver updates All of these have been in linux-next with no reported issues for quite a while" * tag 'char-misc-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits) samples: rust_misc_device: fix markup in top-level docs Coresight: Fix a NULL vs IS_ERR() bug in probe misc: lis3lv02d: convert to use faux_device tlclk: convert to use faux_device regulator: dummy: convert to use the faux device interface bus: mhi: host: Fix race between unprepare and queue_buf coresight: configfs: Constify struct config_item_type doc: iio: ad7380: describe offload support iio: ad7380: add support for SPI offload iio: light: Add check for array bounds in veml6075_read_int_time_ms iio: adc: ti-ads7924 Drop unnecessary function parameters staging: iio: ad9834: Use devm_regulator_get_enable() staging: iio: ad9832: Use devm_regulator_get_enable() iio: gyro: bmg160_spi: add of_match_table dt-bindings: iio: adc: Add i.MX94 and i.MX95 support iio: adc: ad7768-1: remove unnecessary locking Documentation: ABI: add wideband filter type to sysfs-bus-iio iio: adc: ad7768-1: set MOSI idle state to prevent accidental reset iio: adc: ad7768-1: Fix conversion result sign iio: adc: ad7124: Benefit of dev = indio_dev->dev.parent in ad7124_parse_channel_config() ...
2025-04-01Merge tag 'driver-core-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updatesk from Greg KH: "Here is the big set of driver core updates for 6.15-rc1. Lots of stuff happened this development cycle, including: - kernfs scaling changes to make it even faster thanks to rcu - bin_attribute constify work in many subsystems - faux bus minor tweaks for the rust bindings - rust binding updates for driver core, pci, and platform busses, making more functionaliy available to rust drivers. These are all due to people actually trying to use the bindings that were in 6.14. - make Rafael and Danilo full co-maintainers of the driver core codebase - other minor fixes and updates" * tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits) rust: platform: require Send for Driver trait implementers rust: pci: require Send for Driver trait implementers rust: platform: impl Send + Sync for platform::Device rust: pci: impl Send + Sync for pci::Device rust: platform: fix unrestricted &mut platform::Device rust: pci: fix unrestricted &mut pci::Device rust: device: implement device context marker rust: pci: use to_result() in enable_device_mem() MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers rust/kernel/faux: mark Registration methods inline driver core: faux: only create the device if probe() succeeds rust/faux: Add missing parent argument to Registration::new() rust/faux: Drop #[repr(transparent)] from faux::Registration rust: io: fix devres test with new io accessor functions rust: io: rename `io::Io` accessors kernfs: Move dput() outside of the RCU section. efi: rci2: mark bin_attribute as __ro_after_init rapidio: constify 'struct bin_attribute' firmware: qemu_fw_cfg: constify 'struct bin_attribute' powerpc/perf/hv-24x7: Constify 'struct bin_attribute' ...
2025-04-01Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - The series "powerpc/crash: use generic crashkernel reservation" from Sourabh Jain changes powerpc's kexec code to use more of the generic layers. - The series "get_maintainer: report subsystem status separately" from Vlastimil Babka makes some long-requested improvements to the get_maintainer output. - The series "ucount: Simplify refcounting with rcuref_t" from Sebastian Siewior cleans up and optimizing the refcounting in the ucount code. - The series "reboot: support runtime configuration of emergency hw_protection action" from Ahmad Fatoum improves the ability for a driver to perform an emergency system shutdown or reboot. - The series "Converge on using secs_to_jiffies() part two" from Easwar Hariharan performs further migrations from msecs_to_jiffies() to secs_to_jiffies(). - The series "lib/interval_tree: add some test cases and cleanup" from Wei Yang permits more userspace testing of kernel library code, adds some more tests and performs some cleanups. - The series "hung_task: Dump the blocking task stacktrace" from Masami Hiramatsu arranges for the hung_task detector to dump the stack of the blocking task and not just that of the blocked task. - The series "resource: Split and use DEFINE_RES*() macros" from Andy Shevchenko provides some cleanups to the resource definition macros. - Plus the usual shower of singleton patches - please see the individual changelogs for details. * tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) mailmap: consolidate email addresses of Alexander Sverdlin fs/procfs: fix the comment above proc_pid_wchan() relay: use kasprintf() instead of fixed buffer formatting resource: replace open coded variant of DEFINE_RES() resource: replace open coded variants of DEFINE_RES_*_NAMED() resource: replace open coded variant of DEFINE_RES_NAMED_DESC() resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED() samples: add hung_task detector mutex blocking sample hung_task: show the blocker task if the task is hung on mutex kexec_core: accept unaccepted kexec segments' destination addresses watchdog/perf: optimize bytes copied and remove manual NUL-termination lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap() lib/interval_tree: skip the check before go to the right subtree lib/interval_tree: add test case for span iteration lib/interval_tree: add test case for interval_tree_iter_xxx() helpers lib/rbtree: add random seed lib/rbtree: split tests lib/rbtree: enable userland test suite for rbtree related data structure checkpatch: describe --min-conf-desc-length scripts/gdb/symbols: determine KASLR offset on s390 ...
2025-04-01Merge tag 'mm-stable-2025-03-30-16-52' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was found to be incorrect. - The series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallalization of huge page initialization. - The series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documention is updated accordingly. - The series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permittig the removal of a number of special-case checks. - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatoty refactoring and cleanup of the mremap() code. - The series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into code generic code, where that is practical. - The series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. * tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits) mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex() x86/mm: restore early initialization of high_memory for 32-bits mm/vmscan: don't try to reclaim hwpoison folio mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper cgroup: docs: add pswpin and pswpout items in cgroup v2 doc mm: vmscan: split proactive reclaim statistics from direct reclaim statistics selftests/mm: speed up split_huge_page_test selftests/mm: uffd-unit-tests support for hugepages > 2M docs/mm/damon/design: document active DAMOS filter type mm/damon: implement a new DAMOS filter type for active pages fs/dax: don't disassociate zero page entries MM documentation: add "Unaccepted" meminfo entry selftests/mm: add commentary about 9pfs bugs fork: use __vmalloc_node() for stack allocation docs/mm: Physical Memory: Populate the "Zones" section xen: balloon: update the NR_BALLOON_PAGES state hv_balloon: update the NR_BALLOON_PAGES state balloon_compaction: update the NR_BALLOON_PAGES state meminfo: add a per node counter for balloon drivers mm: remove references to folio in __memcg_kmem_uncharge_page() ...