Age | Commit message | Author
2025-03-05 | arch: x86: add IPC mailbox accessor function and add SoC register access | David E. Box
- Exports intel_pmc_ipc() for host access to the PMC IPC mailbox - Enables the host to access specific SoC registers through the PMC firmware using IPC commands. This access method is necessary for registers that are not available through direct Memory-Mapped I/O (MMIO), which is used for other accessible parts of the PMC. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Chao Qin <chao.qin@intel.com> Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20250227121522.1802832-4-yong.liang.choong@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-05 | net: pcs: xpcs: re-initiate clause 37 Auto-negotiation | Choong Yong Liang
The xpcs_switch_interface_mode function was introduced to handle interface switching. According to the XPCS datasheet, a soft reset is required to initiate Clause 37 auto-negotiation when the XPCS switches interface modes. When the interface mode switches from 2500BASE-X to SGMII, re-initiating Clause 37 auto-negotiation is required for the SGMII interface mode to function properly. Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Link: https://patch.msgid.link/20250227121522.1802832-3-yong.liang.choong@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-05 | net: phylink: use pl->link_interface in phylink_expects_phy() | Choong Yong Liang
The phylink_expects_phy() function allows MAC drivers to check if they are expecting a PHY to attach. The checking condition in phylink_expects_phy() aims to achieve the same result as the checking condition in phylink_attach_phy(). However, the checking condition in phylink_expects_phy() uses pl->link_config.interface, while phylink_attach_phy() uses pl->link_interface. Initially, both pl->link_interface and pl->link_config.interface are set to SGMII, and pl->cfg_link_an_mode is set to MLO_AN_INBAND. When the interface switches from SGMII to 2500BASE-X, pl->link_config.interface is updated by phylink_major_config(). At this point, pl->cfg_link_an_mode remains MLO_AN_INBAND, and pl->link_config.interface is set to 2500BASE-X. Subsequently, when the STMMAC interface is taken down administratively and brought back up, it is blocked by phylink_expects_phy(). Since phylink_expects_phy() and phylink_attach_phy() aim to achieve the same result, phylink_expects_phy() should check pl->link_interface, which never changes, instead of pl->link_config.interface, which is updated by phylink_major_config(). Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com> Link: https://patch.msgid.link/20250227121522.1802832-2-yong.liang.choong@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
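To make the failure mode concrete, here is a simplified userspace model of the check. The enum values, `struct pl_model`, and the `expects_phy_old()`/`expects_phy_new()` helpers are hypothetical stand-ins for the phylink internals; the condition follows the shape described above (no PHY is expected for fixed links, or for in-band links using an 802.3z mode such as 2500BASE-X), but this is a sketch, not the kernel's code.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the phylink types. */
enum an_mode { MLO_AN_PHY, MLO_AN_FIXED, MLO_AN_INBAND };
enum iface { IFACE_SGMII, IFACE_1000BASEX, IFACE_2500BASEX };

struct pl_model {
	enum an_mode cfg_link_an_mode;
	enum iface link_interface;    /* set once at creation, never changes */
	enum iface link_config_iface; /* updated on a major reconfiguration */
};

/* True for the 802.3z modes, which carry no PHY when running in-band. */
static bool iface_is_8023z(enum iface i)
{
	return i == IFACE_1000BASEX || i == IFACE_2500BASEX;
}

/* No PHY is expected for fixed links or for in-band 802.3z links. */
static bool expects_phy(const struct pl_model *pl, enum iface i)
{
	if (pl->cfg_link_an_mode == MLO_AN_FIXED)
		return false;
	if (pl->cfg_link_an_mode == MLO_AN_INBAND && iface_is_8023z(i))
		return false;
	return true;
}

/* Old behaviour: looks at the interface that phylink_major_config() updates. */
bool expects_phy_old(const struct pl_model *pl)
{
	return expects_phy(pl, pl->link_config_iface);
}

/* New behaviour: looks at the interface fixed at creation time. */
bool expects_phy_new(const struct pl_model *pl)
{
	return expects_phy(pl, pl->link_interface);
}
```

With the state from the commit message (in-band, SGMII at creation, 2500BASE-X after the switch), the old check reports that no PHY is expected, while the new one keeps agreeing with the attach path.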
2025-03-05 | Merge tag 'hid-for-linus-2025030501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid | Linus Torvalds
Pull HID fixes from Jiri Kosina: - power management fix in intel-thc-hid (Even Xu) - nintendo gencon mapping fix (Ryan McClelland) - fix for UAF on device disconnect path in hid-steam (Vicki Pfau) - two fixes for UAF on device disconnect path in intel-ish-hid (Zhang Lixu) - fix for potential NULL dereference in hid-appleir (Daniil Dulov) - a few other small cosmetic fixes (e.g. typos) * tag 'hid-for-linus-2025030501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: Intel-thc-hid: Intel-quickspi: Correct device state after S4 HID: intel-thc-hid: Fix spelling mistake "intput" -> "input" HID: hid-steam: Fix use-after-free when detaching device HID: debug: Fix spelling mistake "Messanger" -> "Messenger" HID: appleir: Fix potential NULL dereference at raw event handle HID: apple: disable Fn key handling on the Omoton KB066 HID: i2c-hid: improve i2c_hid_get_report error message HID: intel-ish-hid: Fix use-after-free issue in ishtp_hid_remove() HID: intel-ish-hid: Fix use-after-free issue in hid_ishtp_cl_remove() HID: google: fix unused variable warning under !CONFIG_ACPI HID: nintendo: fix gencon button events map HID: corsair-void: Update power supply values with a unified work handler
2025-03-05 | fs/pipe: remove buggy and unused 'helper' function | Linus Torvalds
While looking for incorrect users of the pipe head/tail fields (see commit c27c66afc449: "fs/pipe: Fix pipe_occupancy() with 16-bit indexes"), I found a bug in pipe_discard_from() that looked entirely broken. However, the fix is trivial: this buggy function isn't actually called by anything, so let's just remove it ASAP. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-05 | include/linux/pipe_fs_i: Add htmldoc annotation for "head_tail" member | K Prateek Nayak
Add htmldoc annotation for the newly introduced "head_tail" member describing it to be a union of the pipe_inode_info's @head and @tail members. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/lkml/20250305204609.5e64768e@canb.auug.org.au/ Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex") Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
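As a rough illustration of what "a union of @head and @tail" means, the sketch below packs two 16-bit indexes into one 32-bit word so both can be read with a single load. The names `pipe_head_tail`, `pipe_model`, and `pipe_snapshot()` are hypothetical, the 16-bit `pipe_index_t` width is an assumption, and the volatile read only approximates the kernel's `READ_ONCE()`.

```c
#include <stdint.h>

typedef uint16_t pipe_index_t; /* assumption: 16-bit indexes */

/* Hypothetical layout: head and tail share one 32-bit word. */
union pipe_head_tail {
	uint32_t head_tail;
	struct {
		pipe_index_t head;
		pipe_index_t tail;
	};
};

struct pipe_model {
	union pipe_head_tail idx;
};

/* Copy both indexes with one 32-bit load so the pair is self-consistent,
 * rather than reading head and tail separately. */
union pipe_head_tail pipe_snapshot(const struct pipe_model *p)
{
	union pipe_head_tail snap;

	snap.head_tail = *(const volatile uint32_t *)&p->idx.head_tail;
	return snap;
}
```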
2025-03-05 | fs/pipe: Fix pipe_occupancy() with 16-bit indexes | Linus Torvalds
The pipe_occupancy() logic implicitly relied on the natural unsigned modulo arithmetic in C, but that doesn't work for the new 'pipe_index_t' case, since any arithmetic will be done in 'int' (and here we had also made it 'unsigned int' due to the function call boundary). So make the modulo arithmetic explicit by casting the result to the proper type. Cc: Oleg Nesterov <oleg@redhat.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Swapnil Sapkal <swapnil.sapkal@amd.com> Cc: Alexey Gladkov <legion@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/all/CAHk-=wjyHsGLx=rxg6PKYBNkPYAejgo7=CbyL3=HGLZLsAaJFQ@mail.gmail.com/ Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
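A minimal demonstration of the arithmetic, assuming a 16-bit `pipe_index_t`: once the indexes are widened to `unsigned int` at the call boundary, `head - tail` no longer wraps at 2^16, and casting the result back to `pipe_index_t` restores the modulo. `occupancy_buggy()` and `occupancy_fixed()` are illustrative names; the fixed variant mirrors the cast the commit describes.

```c
#include <stdint.h>

typedef uint16_t pipe_index_t; /* assumption: 16-bit pipe indexes */

/* Buggy: the subtraction happens in unsigned int, so a wrapped 16-bit
 * head produces a huge 32-bit result instead of a small count. */
unsigned int occupancy_buggy(unsigned int head, unsigned int tail)
{
	return head - tail;
}

/* Fixed: make the modulo-2^16 arithmetic explicit with a cast. */
unsigned int occupancy_fixed(unsigned int head, unsigned int tail)
{
	return (pipe_index_t)(head - tail);
}
```

With `tail = 0xFFFF` and `head` wrapped around to 1, the fixed version reports 2 occupied slots, while the buggy one returns a value close to 2^32.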
2025-03-05 | net-timestamp: support TCP GSO case for a few missing flags | Jason Xing
When I read through the TSO code, I found that we probably miss initializing the tx_flags of the last segment when TSO is turned off, which means that no timestamp will be generated for this last segment at the later timestamping points. There are three flags to be handled in this patch: 1. SKBTX_HW_TSTAMP 2. SKBTX_BPF 3. SKBTX_SCHED_TSTAMP Note that SKBTX_BPF[1] was added in 6.14.0-rc2 by commit 6b98ec7e882af ("bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback") and is only net-next material for now. The common issue of the above three flags can be fixed by this single patch. This patch initializes the tx_flags using SKBTX_ANY_TSTAMP, as UDP GSO does, so that the newly segmented last skb inherits the tx_flags and the requested timestamps are generated at each layer; otherwise the last skb has a tx_flags value of zero, which leads to no timestamp at all. Fixes: 4ed2d765dfacc ("net-timestamp: TCP timestamping") Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
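The inheritance described above can be sketched as a masked OR. The `TX_*` bit values, the `TX_ANY_TSTAMP` mask, and `last_seg_tx_flags()` are hypothetical; they stand in for the `SKBTX_*` constants in include/linux/skbuff.h and are not their real values.

```c
#include <stdint.h>

/* Hypothetical bit assignments; NOT the real SKBTX_* values. */
enum {
	TX_HW_TSTAMP    = 1u << 0,
	TX_SW_TSTAMP    = 1u << 1,
	TX_SCHED_TSTAMP = 1u << 2,
	TX_BPF          = 1u << 3,
	TX_UNRELATED    = 1u << 4, /* some non-timestamp flag */
};

#define TX_ANY_TSTAMP (TX_HW_TSTAMP | TX_SW_TSTAMP | TX_SCHED_TSTAMP | TX_BPF)

/* The last segment keeps its own flags and inherits only the
 * timestamp-request bits of the original (pre-segmentation) skb. */
uint8_t last_seg_tx_flags(uint8_t orig_tx_flags, uint8_t seg_tx_flags)
{
	return (uint8_t)(seg_tx_flags | (orig_tx_flags & TX_ANY_TSTAMP));
}
```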
2025-03-05 | exfat: add a check for invalid data size | Yuezhang Mo
Add a check for invalid data size to prevent a corrupted filesystem from being corrupted further. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-05 | exfat: short-circuit zero-byte writes in exfat_file_write_iter | Eric Sandeen
When generic_write_checks() returns zero, it means that iov_iter_count() is zero, and there is no work to do. Simply return success like all other filesystems do, rather than proceeding down the write path, which today yields an -EFAULT in generic_perform_write() via the (fault_in_iov_iter_readable(i, bytes) == bytes) check when bytes == 0. Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Reported-by: Noah <kernel-org-10@maxgrass.eu> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
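The control flow can be sketched as follows, assuming a hypothetical `model_write_iter()` that receives the result of the write checks and a callback standing in for the rest of the write path. The `ret <= 0` early return is the common filesystem pattern the commit alludes to, not exfat's exact code.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback standing in for the remainder of the write path. */
typedef ssize_t (*write_path_fn)(size_t count);

ssize_t model_write_iter(ssize_t checked, write_path_fn perform)
{
	/* < 0: error from the checks; 0: nothing to write, report success. */
	if (checked <= 0)
		return checked;
	return perform((size_t)checked);
}
```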
2025-03-05 | exfat: fix soft lockup in exfat_clear_bitmap | Namjae Jeon
The bitmap clear loop in __exfat_free_cluster() can take a long time if the data size of a file/dir entry is invalid. If a cluster bit in the bitmap is already clear, stop clearing the bitmap and exit the loop. Fixes: 31023864e67a ("exfat: add fat entry operations") Reported-by: Kun Hu <huk23@m.fudan.edu.cn>, Jiaji Qin <jjtan24@m.fudan.edu.cn> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
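A userspace sketch of the loop-termination idea, assuming a plain byte-array bitmap and hypothetical helper names (`free_cluster_run()` is not the exfat function): clearing stops at the first bit that is already clear, so a bogus, oversized length cannot keep the loop spinning over the rest of the bitmap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool bit_is_set(const uint8_t *map, size_t bit)
{
	return (map[bit / 8] >> (bit % 8)) & 1u;
}

static void bit_clear(uint8_t *map, size_t bit)
{
	map[bit / 8] &= (uint8_t)~(1u << (bit % 8));
}

/* Clears up to `count` bits starting at `start` and returns how many were
 * cleared. Stops at the first bit that is already clear, which a valid
 * cluster chain should never contain. The caller must keep start + count
 * within the bitmap. */
size_t free_cluster_run(uint8_t *map, size_t start, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (!bit_is_set(map, start + i))
			break;
		bit_clear(map, start + i);
	}
	return i;
}
```

With only bits 0-3 set in a 16-bit map, asking for a 16-cluster run clears four bits and returns 4.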
2025-03-05 | exfat: fix just enough dentries but allocate a new cluster to dir | Yuezhang Mo
This commit fixes the condition for allocating a cluster to the parent directory, so that no new cluster is allocated when there are just enough empty directory entries at the end of the parent directory. Fixes: af02c72d0b62 ("exfat: convert exfat_find_empty_entry() to use dentry cache") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
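One way to picture the corrected condition, assuming the directory is modeled as an array of in-use flags (`dir_needs_new_cluster()` and `trailing_free()` are hypothetical names, and the real exfat code tracks entries differently): a new cluster is needed only when the trailing free slots are strictly fewer than the entries required.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counts the free slots at the end of the directory. */
static size_t trailing_free(const bool *in_use, size_t n)
{
	size_t free_run = 0;

	while (free_run < n && !in_use[n - 1 - free_run])
		free_run++;
	return free_run;
}

/* "Just enough" free slots (free == needed) must NOT trigger allocation. */
bool dir_needs_new_cluster(const bool *in_use, size_t n, size_t needed)
{
	return trailing_free(in_use, n) < needed;
}
```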
2025-03-05 | Merge branch 'dynamic-possix-clocks-permission-checks' | David S. Miller
Wojtek Wasko says: ==================== Permission checks for dynamic POSIX clocks Dynamic clocks - such as PTP clocks - extend beyond the standard POSIX clock API by using ioctl calls. While file permissions are enforced for standard POSIX operations, they are not implemented for ioctl calls, since the POSIX layer cannot differentiate between calls which modify the clock's state (like enabling PPS output generation) and those that don't (such as retrieving the clock's PPS capabilities). On the other hand, drivers implementing the dynamic clocks lack the necessary context to enforce permission checks themselves. Additionally, the POSIX clock layer requires WRITE permission even for readonly adjtime() operations before invoking the callback. Add a struct file pointer to the POSIX clock context and use it to implement the appropriate permission checks on PTP chardevs. Permit readonly adjtime() for dynamic clocks. Add a readonly option to testptp. Changes in v4: - Allow readonly adjtime() for dynamic clocks, as suggested by Thomas Changes in v3: - Reword the log message for commit against posix-clock and fix documentation of struct posix_clock_context, as suggested by Thomas Changes in v2: - Store file pointer in POSIX clock context rather than fmode in the PTP clock's private data, as suggested by Richard. - Move testptp.c changes into separate patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-05 | testptp: Add option to open PHC in readonly mode | Wojtek Wasko
PTP Hardware Clocks no longer require WRITE permission to perform readonly operations, such as listing device capabilities or listening to EXTTS events once they have been enabled by a process with WRITE permissions. Add '-r' option to testptp to open the PHC in readonly mode instead of the default read-write mode. Skip enabling EXTTS if readonly mode is requested. Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Wojtek Wasko <wwasko@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-05 | ptp: Add PHC file mode checks. Allow RO adjtime() without FMODE_WRITE. | Wojtek Wasko
Many devices implement highly accurate clocks, which the kernel manages as PTP Hardware Clocks (PHCs). Userspace applications rely on these clocks to timestamp events, trace workload execution, correlate timescales across devices, and keep various clocks in sync. The kernel’s current implementation of PTP clocks does not enforce file permission checks for most device operations except for POSIX clock operations, where file mode is verified in the POSIX layer before forwarding the call to the PTP subsystem. Consequently, it is common practice to not give unprivileged userspace applications any access to PTP clocks whatsoever by giving the PTP chardevs 600 permissions. An example of users running into this limitation is documented in [1]. Additionally, the POSIX layer requires WRITE permission even for readonly adjtime() calls, which are used in the PTP layer to return the current frequency offset applied to the PHC. Add permission checks for functions that modify the state of a PTP device. Continue enforcing permission checks for POSIX clock operations (settime, adjtime) in the POSIX layer. For dynamic clocks, require WRITE access for adjtime() only if any flags are set in the modes field. [1] https://lists.nwtime.org/sympa/arc/linuxptp-users/2024-01/msg00036.html Changes in v4: - Require FMODE_WRITE in adjtime() only for calls modifying the clock in any way. Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Wojtek Wasko <wwasko@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
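The new adjtime() rule can be sketched as a small predicate. `check_adjtime_access()` is a hypothetical helper, the `writable` flag stands in for FMODE_WRITE, and returning `-EACCES` is an assumption about the error code; `struct timex` and `ADJ_FREQUENCY` come from the Linux `<sys/timex.h>` header.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/timex.h>

/* Hypothetical: `writable` stands in for (file->f_mode & FMODE_WRITE). */
int check_adjtime_access(const struct timex *tx, bool writable)
{
	/* modes == 0 is a pure query (e.g. reading the frequency offset),
	 * so only calls that set flags need write access. */
	if (tx->modes != 0 && !writable)
		return -EACCES;
	return 0;
}
```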
2025-03-05 | posix-clock: Store file pointer in struct posix_clock_context | Wojtek Wasko
File descriptor based pc_clock_*() operations of dynamic posix clocks have access to the file pointer and implement permission checks in the generic code before invoking the relevant dynamic clock callback. Character device operations (open, read, poll, ioctl) do not implement a generic permission control and the dynamic clock callbacks have no access to the file pointer to implement them. Extend struct posix_clock_context with a struct file pointer and initialize it in posix_clock_open(), so that all dynamic clock callbacks can access it. Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Wojtek Wasko <wwasko@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-05 | doc: correcting two prefix errors in idmappings.rst | Aiden Ma
Add the 'k' prefix to id 21000. And id `u1000` in the third idmapping should be mapped to `k31000`, not `u31000`. Signed-off-by: Aiden Ma <jiaheng.ma@foxmail.com> Link: https://lore.kernel.org/r/tencent_4E7B1F143E8051530C21FCADF4E014DCBB06@qq.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-04 | Merge tag 'x86_microcode_for_v6.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull AMD microcode loading fixes from Borislav Petkov: - Load only sha256-signed microcode patch blobs - Other good cleanups * tag 'x86_microcode_for_v6.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Load only SHA256-checksummed patches x86/microcode/AMD: Add get_patch_level() x86/microcode/AMD: Get rid of the _load_microcode_amd() forward declaration x86/microcode/AMD: Merge early_apply_microcode() into its single callsite x86/microcode/AMD: Remove unused save_microcode_in_initrd_amd() declarations x86/microcode/AMD: Remove ugly linebreak in __verify_patch_section() signature
2025-03-04 | vlan: enforce underlying device type | Oscar Maes
Currently, VLAN devices can be created on top of non-ethernet devices. Besides the fact that it doesn't make much sense, this also causes a bug which leaks the address of a kernel function to usermode. When creating a VLAN device, we initialize GARP (garp_init_applicant) and MRP (mrp_init_applicant) for the underlying device. As part of the initialization process, we add the multicast address of each applicant to the underlying device, by calling dev_mc_add. __dev_mc_add uses dev->addr_len to determine the length of the new multicast address. This causes an out-of-bounds read if dev->addr_len is greater than 6, since the multicast addresses provided by GARP and MRP are only 6 bytes long. This behaviour can be reproduced using the following commands: ip tunnel add gretest mode ip6gre local ::1 remote ::2 dev lo ip l set up dev gretest ip link add link gretest name vlantest type vlan id 100 Then, the following command will display the address of garp_pdu_rcv: ip maddr show | grep 01:80:c2:00:00:21 Fix the bug by enforcing the type of the underlying device during VLAN device initialization. Fixes: 22bedad3ce11 ("net: convert multicast list to list_head") Reported-by: syzbot+91161fe81857b396c8a0@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/000000000000ca9a81061a01ec20@google.com/ Signed-off-by: Oscar Maes <oscmaes92@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250303155619.8918-1-oscmaes92@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
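A hedged sketch of both halves of the problem, using trimmed-down hypothetical types (`struct lower_dev_model`, `mc_add_overread()`, `vlan_check_lower_dev()`): the first helper shows how many bytes past a 6-byte GARP/MRP address a copy of `addr_len` bytes would read, and the second rejects non-Ethernet lower devices. The `-EOPNOTSUPP` return value is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <net/ethernet.h> /* ETH_ALEN */
#include <net/if_arp.h>   /* ARPHRD_ETHER */

/* Hypothetical, trimmed-down stand-in for struct net_device. */
struct lower_dev_model {
	unsigned short type;    /* ARPHRD_* */
	unsigned char addr_len; /* larger than ETH_ALEN for e.g. ip6gre */
};

/* Bytes read past the end of a 6-byte multicast address buffer when
 * dev->addr_len bytes are copied from it. */
size_t mc_add_overread(const struct lower_dev_model *dev)
{
	return dev->addr_len > ETH_ALEN ? (size_t)(dev->addr_len - ETH_ALEN) : 0;
}

/* The fix: refuse to stack a VLAN on anything that is not Ethernet. */
int vlan_check_lower_dev(const struct lower_dev_model *dev)
{
	if (dev->type != ARPHRD_ETHER)
		return -EOPNOTSUPP;
	return 0;
}
```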
2025-03-04 | net: Prevent use after free in netif_napi_set_irq_locked() | Dan Carpenter
cpu_rmap_put() calls kfree() when the last reference is dropped, so dereferencing the same pointer on the next line could result in a use-after-free. Move the cpu_rmap_put() after the dereference. Fixes: bd7c00605ee0 ("net: move aRFS rmap management and CPU affinity to core") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/5a9c53a4-5487-4b8c-9ffa-d8e5343aaaaf@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
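The ordering rule generalizes to any refcounted object whose put() may free it. In this sketch, `struct rmap_model` and `rmap_put_model()` are hypothetical stand-ins for the cpu_rmap and cpu_rmap_put(); the point is only that every read happens before the reference is dropped.

```c
#include <stdlib.h>

/* Hypothetical refcounted object. */
struct rmap_model {
	int refcount;
	int slot;
};

/* Drops a reference and frees the object on the last put.
 * Returns 1 if the object was freed. */
int rmap_put_model(struct rmap_model *r)
{
	if (--r->refcount == 0) {
		free(r);
		return 1;
	}
	return 0;
}

/* Fixed ordering: read first, then drop the reference. */
int read_slot_then_put(struct rmap_model *r)
{
	int slot = r->slot; /* dereference while we still hold a reference */

	rmap_put_model(r);  /* may free r; r must not be touched after this */
	return slot;
}
```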
2025-03-04 | net: cadence: macb: Synchronize standard stats | Sean Anderson
The new stats calculations add several additional calls to macb/gem_update_stats() and accesses to bp->hw_stats. These are protected by a spinlock since commit fa52f15c745c ("net: cadence: macb: Synchronize stats calculations"), which was applied in parallel. Add some locking now that the net has been merged into net-next. Fixes: f6af690a295a ("net: cadence: macb: Report standard stats") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250303231832.1648274-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 | Merge branch 'tcp-scale-connect-under-pressure' | Jakub Kicinski
Eric Dumazet says: ==================== tcp: scale connect() under pressure Adoption of bhash2 in linux-6.1 made some operations almost twice as expensive, because of additional locks. This series adds RCU in __inet_hash_connect() to help the case where many attempts need to be made before finding an available 4-tuple. This brings a ~200 % improvement in this experiment: Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server Before series: utime_start=0.288582 utime_end=1.548707 stime_start=20.637138 stime_end=2002.489845 num_transactions=484453 latency_min=0.156279245 latency_max=20.922042756 latency_mean=1.546521274 latency_stddev=3.936005194 num_samples=312537 throughput=47426.00 perf top on the client: 49.54% [kernel] [k] _raw_spin_lock 25.87% [kernel] [k] _raw_spin_lock_bh 5.97% [kernel] [k] queued_spin_lock_slowpath 5.67% [kernel] [k] __inet_hash_connect 3.53% [kernel] [k] __inet6_check_established 3.48% [kernel] [k] inet6_ehashfn 0.64% [kernel] [k] rcu_all_qs After this series: utime_start=0.271607 utime_end=3.847111 stime_start=18.407684 stime_end=1997.485557 num_transactions=1350742 latency_min=0.014131929 latency_max=17.895073144 latency_mean=0.505675853 # Nice reduction of latency metrics latency_stddev=2.125164772 num_samples=307884 throughput=139866.80 # 194 % increase perf top on client: 56.86% [kernel] [k] __inet6_check_established 17.96% [kernel] [k] __inet_hash_connect 13.88% [kernel] [k] inet6_ehashfn 2.52% [kernel] [k] rcu_all_qs 2.01% [kernel] [k] __cond_resched 0.41% [kernel] [k] _raw_spin_lock ==================== Link: https://patch.msgid.link/20250302124237.3913746-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 | tcp: use RCU lookup in __inet_hash_connect() | Eric Dumazet
When __inet_hash_connect() has to try many 4-tuples before finding an available one, we see a high spinlock cost from the many spin_lock_bh(&head->lock) performed in its loop. This patch adds an RCU lookup to avoid the spinlock cost. check_established() gets a new @rcu_lookup argument. First reason is to not make any changes while head->lock is not held. Second reason is to not make this RCU lookup a second time after the spinlock has been acquired. Tested: Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server Before series: utime_start=0.288582 utime_end=1.548707 stime_start=20.637138 stime_end=2002.489845 num_transactions=484453 latency_min=0.156279245 latency_max=20.922042756 latency_mean=1.546521274 latency_stddev=3.936005194 num_samples=312537 throughput=47426.00 perf top on the client: 49.54% [kernel] [k] _raw_spin_lock 25.87% [kernel] [k] _raw_spin_lock_bh 5.97% [kernel] [k] queued_spin_lock_slowpath 5.67% [kernel] [k] __inet_hash_connect 3.53% [kernel] [k] __inet6_check_established 3.48% [kernel] [k] inet6_ehashfn 0.64% [kernel] [k] rcu_all_qs After this series: utime_start=0.271607 utime_end=3.847111 stime_start=18.407684 stime_end=1997.485557 num_transactions=1350742 latency_min=0.014131929 latency_max=17.895073144 latency_mean=0.505675853 # Nice reduction of latency metrics latency_stddev=2.125164772 num_samples=307884 throughput=139866.80 # 190 % increase perf top on client: 56.86% [kernel] [k] __inet6_check_established 17.96% [kernel] [k] __inet_hash_connect 13.88% [kernel] [k] inet6_ehashfn 2.52% [kernel] [k] rcu_all_qs 2.01% [kernel] [k] __cond_resched 0.41% [kernel] [k] _raw_spin_lock Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250302124237.3913746-5-edumazet@google.com 
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 | tcp: add RCU management to inet_bind_bucket | Eric Dumazet
Add RCU protection to inet_bind_bucket structure. - Add rcu_head field to the structure definition. - Use kfree_rcu() at destroy time, and remove inet_bind_bucket_destroy() first argument. - Use hlist_del_rcu() and hlist_add_head_rcu() methods. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250302124237.3913746-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 | tcp: optimize inet_use_bhash2_on_bind() | Eric Dumazet
There is no reason to call ipv6_addr_type(). Instead, use highly optimized ipv6_addr_any() and ipv6_addr_v4mapped(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250302124237.3913746-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
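A userspace analogue of the simplified check, using the `IN6_IS_ADDR_UNSPECIFIED()` and `IN6_IS_ADDR_V4MAPPED()` macros from `<netinet/in.h>` in place of the kernel's ipv6_addr_any() and ipv6_addr_v4mapped(). `bound_addr_is_specific()` is a hypothetical name, and reading the embedded IPv4 address in the v4-mapped case is an assumption about how the fallback behaves, not a copy of the kernel function.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true if the bound address is a specific (non-wildcard) address. */
bool bound_addr_is_specific(const struct in6_addr *a)
{
	uint32_t v4;

	if (IN6_IS_ADDR_UNSPECIFIED(a))
		return false; /* "::" wildcard */
	if (!IN6_IS_ADDR_V4MAPPED(a))
		return true;  /* native IPv6 address */

	/* ::ffff:a.b.c.d - inspect the embedded IPv4 address (network order). */
	memcpy(&v4, &a->s6_addr[12], sizeof(v4));
	return v4 != htonl(INADDR_ANY);
}
```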
2025-03-04 | tcp: use RCU in __inet{6}_check_established() | Eric Dumazet
When __inet_hash_connect() has to try many 4-tuples before finding an available one, we see a high spinlock cost from __inet_check_established() and/or __inet6_check_established(). This patch adds an RCU lookup to avoid the spinlock acquisition when the 4-tuple is found in the hash table. Note that there are still spin_lock_bh() calls in __inet_hash_connect() to protect inet_bind_hashbucket; this will be fixed later in this series. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250302124237.3913746-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 | mptcp: fix 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr | Krister Johansen
If multiple connection requests attempt to create an implicit mptcp endpoint in parallel, more than one caller may end up in mptcp_pm_nl_append_new_local_addr because none found the address in local_addr_list during their call to mptcp_pm_nl_get_local_id. In this case, the concurrent new_local_addr calls may delete the address entry created by the previous caller. These deletes use synchronize_rcu, but this is not permitted in some of the contexts where this function may be called. During packet recv, the caller may be in a rcu read critical section and have preemption disabled. An example stack: BUG: scheduling while atomic: swapper/2/0/0x00000302 Call Trace: <IRQ> dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1)) dump_stack (lib/dump_stack.c:124) __schedule_bug (kernel/sched/core.c:5943) schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970) __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621) schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818) schedule_timeout (kernel/time/timer.c:2160) wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148) __wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444) synchronize_rcu (kernel/rcu/tree.c:3609) mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061) mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164) mptcp_pm_get_local_id (net/mptcp/pm.c:420) subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213) subflow_v4_route_req (net/mptcp/subflow.c:305) tcp_conn_request (net/ipv4/tcp_input.c:7216) subflow_v4_conn_request (net/mptcp/subflow.c:651) tcp_rcv_state_process (net/ipv4/tcp_input.c:6709) tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934) tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 
(discriminator 1)) ip_local_deliver_finish (include/linux/rcupdate.h:813 net/ipv4/ip_input.c:234) ip_local_deliver (include/linux/netfilter.h:314 include/linux/netfilter.h:308 net/ipv4/ip_input.c:254) ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580) ip_sublist_rcv (net/ipv4/ip_input.c:640) ip_list_rcv (net/ipv4/ip_input.c:675) __netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631) netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774) napi_complete_done (include/linux/list.h:37 include/net/gro.h:449 include/net/gro.h:444 net/core/dev.c:6114) igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb __napi_poll (net/core/dev.c:6582) net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787) handle_softirqs (kernel/softirq.c:553) __irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427 kernel/softirq.c:636) irq_exit_rcu (kernel/softirq.c:651) common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14)) </IRQ> This problem seems particularly prevalent if the user advertises an endpoint that has a different external vs internal address. In the case where the external address is advertised and multiple connections already exist, multiple subflow SYNs arrive in parallel which tends to trigger the race during creation of the first local_addr_list entries which have the internal address instead. Fix by skipping the replacement of an existing implicit local address if called via mptcp_pm_nl_get_local_id. Fixes: d045b9eb95a9 ("mptcp: introduce implicit endpoints") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250303-net-mptcp-fix-sched-while-atomic-v1-1-f6a216c5a74c@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 | net: ethernet: ti: cpsw_new: populate netdev of_node | Alexander Sverdlin
So that of_find_net_device_by_node() can find CPSW ports and other DSA switches can be stacked downstream. Tested in conjunction with KSZ8873. Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Link: https://patch.msgid.link/20250303074703.1758297-1-alexander.sverdlin@siemens.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 | tipc: Reduce scope for the variable “fdefq” in tipc_link_tnl_prepare() | Markus Elfring
The address of a data structure member was determined before a corresponding null pointer check in the implementation of the function “tipc_link_tnl_prepare”. Thus avoid the risk for undefined behaviour by moving the definition for the local variable “fdefq” into an if branch at the end. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Link: https://patch.msgid.link/08fe8fc3-19c3-4324-8719-0ee74b0f32c9@web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
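The pattern Coccinelle flagged, reduced to a sketch with hypothetical types (`struct link_model`, `struct queue_model`, `deferred_len()`): forming `&tnl->member` is deferred until inside the branch that has already established that `tnl` is non-NULL.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff_head and struct tipc_link. */
struct queue_model {
	int len;
};

struct link_model {
	struct queue_model failover_deferdq;
};

/* Returns the deferred-queue length, or -1 when there is no tunnel link. */
int deferred_len(struct link_model *tnl)
{
	if (tnl) {
		/* Safe: tnl is known to be non-NULL here. */
		struct queue_model *fdefq = &tnl->failover_deferdq;

		return fdefq->len;
	}
	return -1;
}
```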
2025-03-04 | selftests: drv-net: use env.rpath in the HDS test | Jakub Kicinski
Commit 29b036be1b0b ("selftests: drv-net: test XDP, HDS auto and the ioctl path") added a new test case in the net tree, now that this code has made its way to net-next convert it to use the env.rpath() helper instead of manually computing the relative path. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250228212956.25399-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 | dsa: mt7530: Utilize REGMAP_IRQ for interrupt handling | Daniel Golle
Replace the custom IRQ chip handler and mask/unmask functions with REGMAP_IRQ. This significantly simplifies the code and allows for the removal of almost all interrupt-related functions from mt7530.c. Tested on MT7988A built-in switch (MMIO) as well as MT7531AE IC (MDIO). Signed-off-by: Daniel Golle <daniel@makrotopia.org> Acked-by: Chester A. Unal <chester.a.unal@arinc9.com> Link: https://patch.msgid.link/221013c3530b61504599e285c341a993f6188f00.1740792674.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04 | net: ethtool: netlink: Allow NULL nlattrs when getting a phy_device | Maxime Chevallier
ethnl_req_get_phydev() is used to look up a phy_device when an ethtool netlink command targets a specific phydev within a netdev's topology. It takes as a parameter a const struct nlattr *header that's used for error handling: if (!phydev) { NL_SET_ERR_MSG_ATTR(extack, header, "no phy matching phyindex"); return ERR_PTR(-ENODEV); } In the notify path after a ->set operation, however, there are no request attributes available. The typical callsite for the above function looks like: phydev = ethnl_req_get_phydev(req_base, tb[ETHTOOL_A_XXX_HEADER], info->extack); So, when tb is NULL (such as in the ethnl notify path), we have a nice crash. It turns out that only the PLCA command is in that situation, as the other phydev-specific commands don't have a notification. This commit fixes the crash by passing the cmd index and the nlattr array separately, allowing NULL-checking it directly inside the helper. Fixes: c15e065b46dc ("net: ethtool: Allow passing a phy index for some commands") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Reported-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com> Link: https://patch.msgid.link/20250301141114.97204-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
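The shape of the fix can be sketched with hypothetical types: `get_phydev_model()` receives the attribute array and the header index separately, so it can report a NULL error attribute instead of indexing into a NULL `tb`. The real helper's signature and behavior may differ in detail.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nlattr. */
struct nlattr_model {
	int type;
};

/* On failure, *err_attr is the attribute to blame, or NULL when no
 * request attributes exist (e.g. the notify path, where tb == NULL). */
int get_phydev_model(int phy_found, struct nlattr_model *const *tb,
		     unsigned int header, const struct nlattr_model **err_attr)
{
	*err_attr = NULL;
	if (!phy_found) {
		if (tb)
			*err_attr = tb[header];
		return -ENODEV;
	}
	return 0;
}
```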
2025-03-04 | ppp: use IFF_NO_QUEUE in virtual interfaces | Qingfang Deng
For PPPoE, PPTP, and PPPoL2TP, the start_xmit() function directly forwards packets to the underlying network stack and never returns anything other than 1. So these interfaces do not require a qdisc, and the IFF_NO_QUEUE flag should be set. Introduce a direct_xmit flag in struct ppp_channel to indicate when IFF_NO_QUEUE should be applied. The flag is set in ppp_connect_channel() for relevant protocols. While at it, remove the unused latency member from struct ppp_channel. Signed-off-by: Qingfang Deng <dqfext@gmail.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250301135517.695809-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04Merge branch 'eth-fbnic-cleanup-macros-and-string-function'Jakub Kicinski
Lee Trager says: ==================== eth: fbnic: Cleanup macros and string function We have received some feedback that the macros we use for reading FW mailbox attributes are too large in scope and confusing to understand. Additionally, the string function did not return errors, allowing it to silently succeed. This patch set fixes these issues. ==================== Link: https://patch.msgid.link/20250228191935.3953712-1-lee@trager.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04eth: fbnic: Replace firmware field macrosLee Trager
Replace the firmware field macros with new macros which follow typical kernel standards. No variables are required to be predefined for use, and results are now returned. These macros are prefixed with fta, short for fbnic TLV attribute. Signed-off-by: Lee Trager <lee@trager.us> Link: https://patch.msgid.link/20250228191935.3953712-4-lee@trager.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
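The difference between the two macro styles can be sketched as follows. Both macros and struct tlv_attr are hypothetical illustrations, not the fbnic definitions: the "old" one silently depends on a caller-defined results array and assigns into a caller-named variable, while the "new" one takes every input as an argument and yields its result as an expression.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical attribute type; not the fbnic TLV layout. */
struct tlv_attr { uint32_t value; };

/*
 * Old style (hypothetical): expands to code that uses a `results` array the
 * caller must have declared, and writes into `var` as a side effect.
 */
#define OLD_GET_U32(var, idx) \
    do { if (results[idx]) (var) = results[idx]->value; } while (0)

/*
 * New style (hypothetical): all inputs are arguments and the value of the
 * expression is the result, defaulting to 0 when the attribute is missing.
 */
#define tlv_get_u32(tb, idx) ((tb)[idx] ? (tb)[idx]->value : 0u)
```

A call such as `uint32_t speed = tlv_get_u32(tb, 0);` reads on its own, without the reader having to know which surrounding variables the macro touches.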
2025-03-04eth: fbnic: Update fbnic_tlv_attr_get_string() to work like nla_strscpy()Lee Trager
Allow fbnic_tlv_attr_get_string() to return an error code. If the source mailbox attribute is missing, return -EINVAL. Like nla_strscpy(), return -E2BIG when the source string is larger than the destination buffer; in this case the amount of data copied is equal to dstsize. Signed-off-by: Lee Trager <lee@trager.us> Link: https://patch.msgid.link/20250228191935.3953712-3-lee@trager.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
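A userspace model of the strscpy()-style contract described above might look like this. attr_get_string() is a hypothetical name and the code is a sketch, not the fbnic implementation: a missing source yields -EINVAL, an oversized source fills the whole destination (including the terminating NUL) and yields -E2BIG, and otherwise the number of characters copied is returned.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical model of strscpy()-like semantics (not the fbnic code).
 * src/srclen describe the attribute payload, which may or may not carry a
 * trailing NUL; dst is always NUL-terminated when dstsize > 0.
 */
static ssize_t attr_get_string(const char *src, size_t srclen,
                               char *dst, size_t dstsize)
{
    size_t n;

    if (!src)
        return -EINVAL;            /* attribute missing */
    if (dstsize == 0)
        return -E2BIG;             /* no room even for the NUL */

    n = strnlen(src, srclen);      /* stop at an embedded NUL or srclen */
    if (n >= dstsize) {
        /* Truncate: dstsize bytes written, including the NUL. */
        memcpy(dst, src, dstsize - 1);
        dst[dstsize - 1] = '\0';
        return -E2BIG;
    }

    memcpy(dst, src, n);
    dst[n] = '\0';
    return (ssize_t)n;
}
```

Returning an error on truncation, rather than silently succeeding, lets callers decide whether a shortened string such as a firmware version is acceptable.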
2025-03-04eth: fbnic: Prepend TSENE FW fields with FBNIC_FWLee Trager
All other firmware fields are prepended with FBNIC_FW. Update TSENE fields to follow the same format. Signed-off-by: Lee Trager <lee@trager.us> Link: https://patch.msgid.link/20250228191935.3953712-2-lee@trager.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04Merge branch 'net-convert-gianfar-triple-speed-ethernet-controller-bindings-to-yaml'Jakub Kicinski
J. Neuschäfer says: ==================== net: Convert Gianfar (Triple Speed Ethernet Controller) bindings to YAML The aim of this series is to modernize the device tree bindings for the Freescale "Gianfar" ethernet controller (a.k.a. TSEC, Triple Speed Ethernet Controller) by converting them to YAML. v1: https://lore.kernel.org/20250220-gianfar-yaml-v1-0-0ba97fd1ef92@posteo.net ==================== Link: https://patch.msgid.link/20250228-gianfar-yaml-v2-0-6beeefbd4818@posteo.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04dt-bindings: net: Convert fsl,gianfar to YAMLJ. Neuschäfer
Add a binding for the "Gianfar" ethernet controller, also known as TSEC/eTSEC. Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250228-gianfar-yaml-v2-3-6beeefbd4818@posteo.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04dt-bindings: net: fsl,gianfar-mdio: Update information about TBIJ. Neuschäfer
When this binding was originally written, all known TSEC Ethernet controllers had a Ten-Bit Interface (TBI). However, some datasheets, such as the one for the MPC8315E, suggest that this is not universally true: The eTSECs do not support TBI, GMII, and FIFO operating modes, so all references to these interfaces and features should be ignored for this device. Acked-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Link: https://patch.msgid.link/20250228-gianfar-yaml-v2-2-6beeefbd4818@posteo.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04dt-bindings: net: Convert fsl,gianfar-{mdio,tbi} to YAMLJ. Neuschäfer
Move the information related to the Freescale Gianfar (TSEC) MDIO bus and the Ten-Bit Interface (TBI) from fsl-tsec-phy.txt to a new binding file in YAML format, fsl,gianfar-mdio.yaml. Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250228-gianfar-yaml-v2-1-6beeefbd4818@posteo.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04Merge branch 'net-phy-nxp-c45-tja11xx-add-support-for-tja1121'Jakub Kicinski
Andrei Botila says: ==================== net: phy: nxp-c45-tja11xx: add support for TJA1121 This patch series adds .match_phy_device for the existing TJAs to differentiate between TJA1103/TJA1104 and TJA1120/TJA1121. TJA1103 and TJA1104 share the same PHY_ID but TJA1104 has MACsec capabilities while TJA1103 doesn't. Also add support for TJA1121 which is based on TJA1120 hardware with additional MACsec IP. ==================== Link: https://patch.msgid.link/20250228154320.2979000-1-andrei.botila@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04net: phy: nxp-c45-tja11xx: add support for TJA1121Andrei Botila
Add support for TJA1121 which is based on TJA1120 but with additional MACsec IP. Signed-off-by: Andrei Botila <andrei.botila@oss.nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250228154320.2979000-3-andrei.botila@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04net: phy: nxp-c45-tja11xx: add match_phy_device to TJA1103/TJA1104Andrei Botila
Add .match_phy_device for the existing TJAs to differentiate between TJA1103 and TJA1104. TJA1103 and TJA1104 share the same PHY_ID but TJA1104 has MACsec capabilities while TJA1103 doesn't. Signed-off-by: Andrei Botila <andrei.botila@oss.nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250228154320.2979000-2-andrei.botila@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04dpll: Add an assertion to check freq_supported_numJiasheng Jiang
Since the driver is broken when src->freq_supported is not NULL but src->freq_supported_num is 0, add an assertion to catch this case. Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20250228150210.34404-1-jiashengjiangcool@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
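The invariant being asserted can be modelled in userspace. The structures and functions below are hypothetical stand-ins, not the dpll code, and the kernel patch may use a different assertion primitive; the sketch only shows where such a check sits when duplicating a pointer-plus-count table.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not the dpll structures. */
struct freq_range { unsigned long long min, max; };
struct pin_props {
    const struct freq_range *freq_supported;
    size_t freq_supported_num;
};

/* A non-NULL table with a zero count is the inconsistent state. */
static int freq_props_consistent(const struct pin_props *p)
{
    return !(p->freq_supported && p->freq_supported_num == 0);
}

/* Duplicate the supported-frequency table from src into dst. */
static int copy_freq_supported(struct pin_props *dst,
                               const struct pin_props *src)
{
    struct freq_range *copy;
    size_t size;

    /* Catch the broken combination up front. */
    assert(freq_props_consistent(src));

    dst->freq_supported = NULL;
    dst->freq_supported_num = 0;
    if (!src->freq_supported)
        return 0;

    /* With a zero count this would be malloc(0): NULL or a unique pointer. */
    size = src->freq_supported_num * sizeof(*copy);
    copy = malloc(size);
    if (!copy)
        return -ENOMEM;
    memcpy(copy, src->freq_supported, size);

    dst->freq_supported = copy;
    dst->freq_supported_num = src->freq_supported_num;
    return 0;
}
```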
2025-03-04Merge branch 'mptcp-improve-code-coverage-and-small-optimisations'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: improve code coverage and small optimisations This small series has various unrelated patches: - Patch 1 and 2: improve code coverage by validating mptcp_diag_dump_one thanks to a new tool displaying MPTCP info for a specific token. - Patch 3: a fix for a commit which is only in net-next. - Patch 4: reduce parameters for one in-kernel PM helper. - Patch 5: exit early when processing an ADD_ADDR echo to avoid unneeded operations. ==================== Link: https://patch.msgid.link/20250228-net-next-mptcp-coverage-small-opti-v1-0-f933c4275676@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04mptcp: pm: exit early with ADD_ADDR echo if possibleMatthieu Baerts (NGI0)
When the userspace PM is used, or when the in-kernel limits are reached, there is no need to schedule the PM worker to signal new addresses. That corresponds to pm->work_pending being set to 0. In this case, mptcp_pm_add_addr_echoed() can exit early instead of taking the PM lock and iterating over the list of announced addresses, only to not schedule the worker anyway. This is similar to what is done when a connection or a subflow has been established. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250228-net-next-mptcp-coverage-small-opti-v1-5-f933c4275676@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
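The early-exit pattern can be sketched with a simplified model. struct pm_model and pm_add_addr_echoed() are hypothetical and only count how often the lock would be taken and the worker scheduled; the real MPTCP code is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified PM state; not the MPTCP structures. */
struct pm_model {
    bool work_pending;      /* false: userspace PM, or in-kernel limits reached */
    int lock_taken;         /* counts (modelled) PM lock acquisitions */
    int worker_scheduled;   /* counts (modelled) worker scheduling */
    const int *announced;   /* ids of announced addresses */
    size_t n_announced;
};

static void pm_add_addr_echoed(struct pm_model *pm, int addr_id)
{
    /* Nothing could be scheduled: skip the lock and the list walk. */
    if (!pm->work_pending)
        return;

    pm->lock_taken++;                   /* models taking the PM lock */
    for (size_t i = 0; i < pm->n_announced; i++) {
        if (pm->announced[i] == addr_id) {
            pm->worker_scheduled++;     /* models scheduling the worker */
            break;
        }
    }
    /* models releasing the PM lock */
}
```

The check is cheap compared with the lock plus list walk it avoids, which is why it sits at the very top of the function.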
2025-03-04mptcp: pm: in-kernel: reduce parameters of set_flagsGeliang Tang
The number of parameters of mptcp_nl_set_flags() can be reduced: only a "local" parameter needs to be passed to it, instead of "local->addr" and "local->flags". Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250228-net-next-mptcp-coverage-small-opti-v1-4-f933c4275676@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04mptcp: pm: in-kernel: avoid access entry without lockGeliang Tang
In mptcp_pm_nl_set_flags(), "entry" is copied to "local" while pernet->lock is held, to avoid accessing entry directly without pernet->lock. Therefore, once pernet->lock is no longer held, "local->flags" should be passed to mptcp_nl_set_flags() instead of "entry->flags", so as to avoid accessing entry. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Fixes: 145dc6cc4abd ("mptcp: pm: change to fullmesh only for 'subflow'") Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250228-net-next-mptcp-coverage-small-opti-v1-3-f933c4275676@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
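The snapshot-under-lock pattern can be sketched with a pthread mutex standing in for pernet->lock. All names here (struct endpoint, entry, set_flags(), apply_flags()) are hypothetical; the sketch only shows that code running after the unlock reads the private copy, never the shared entry. Depending on the libc, linking may need -pthread.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical endpoint record; not the MPTCP structures. */
struct endpoint { int addr; unsigned int flags; };

static pthread_mutex_t pernet_lock = PTHREAD_MUTEX_INITIALIZER;
static struct endpoint entry = { 10, 0 };  /* shared, protected by pernet_lock */

/* Runs without the lock, so it must only see the private snapshot. */
static unsigned int apply_flags(const struct endpoint *local)
{
    return local->flags;
}

static unsigned int set_flags(unsigned int new_flags)
{
    struct endpoint local;

    pthread_mutex_lock(&pernet_lock);
    entry.flags = new_flags;
    local = entry;                 /* copy while the lock is held */
    pthread_mutex_unlock(&pernet_lock);

    /* Pass &local, not &entry: entry may change once the lock is dropped. */
    return apply_flags(&local);
}
```

Passing the whole snapshot by pointer also keeps the unlocked helper's parameter list short, since it can read both the address and the flags from one argument.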
2025-03-04selftests: mptcp: add a test for mptcp_diag_dump_oneGang Yan
This patch introduces a new 'chk_diag' test in diag.sh. It retrieves the token for a specified MPTCP socket (msk) using the 'ss' command and then queries 'mptcp_diag_dump_one' in the kernel via ./mptcp_diag to verify that the correct token is returned. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/524 Signed-off-by: Gang Yan <yangang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250228-net-next-mptcp-coverage-small-opti-v1-2-f933c4275676@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>