summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-07-31net: wan: fsl_qmc_hdlc: Discard received CRCHerve Codina
Received frame from QMC contains the CRC. Upper layers don't need this CRC and tcpdump mentioned trailing junk data due to this CRC presence. As some other HDLC driver, simply discard this CRC. Fixes: d0f2258e79fd ("net: wan: Add support for QMC HDLC") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240730063133.179598-1-herve.codina@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net: wan: fsl_qmc_hdlc: Convert carrier_lock spinlock to a mutexHerve Codina
The carrier_lock spinlock protects the carrier detection. While it is held, framer_get_status() is called which in turn takes a mutex. This is not correct and can lead to a deadlock. A run with PROVE_LOCKING enabled detected the issue: [ BUG: Invalid wait context ] ... c204ddbc (&framer->mutex){+.+.}-{3:3}, at: framer_get_status+0x40/0x78 other info that might help us debug this: context-{4:4} 2 locks held by ifconfig/146: #0: c0926a38 (rtnl_mutex){+.+.}-{3:3}, at: devinet_ioctl+0x12c/0x664 #1: c2006a40 (&qmc_hdlc->carrier_lock){....}-{2:2}, at: qmc_hdlc_framer_set_carrier+0x30/0x98 Avoid the spinlock usage and convert carrier_lock to a mutex. Fixes: 54762918ca85 ("net: wan: fsl_qmc_hdlc: Add framer support") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240730063104.179553-1-herve.codina@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31Merge branch 'mlx5-misc-fixes-2024-07-30'Jakub Kicinski
Tariq Toukan says: ==================== mlx5 misc fixes 2024-07-30 This patchset provides misc bug fixes from the team to the mlx5 core and Eth drivers. ==================== Link: https://patch.msgid.link/20240730061638.1831002-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net/mlx5e: Add a check for the return value from mlx5_port_set_eth_ptysShahar Shitrit
Since the documentation for mlx5_toggle_port_link states that it should only be used after setting the port register, we add a check for the return value from mlx5_port_set_eth_ptys to ensure the register was successfully set before calling it. Fixes: 667daedaecd1 ("net/mlx5e: Toggle link only after modifying port parameters") Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://patch.msgid.link/20240730061638.1831002-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net/mlx5e: Fix CT entry update leaks of modify header contextChris Mi
The cited commit allocates a new modify header to replace the old one when updating CT entry. But if failed to allocate a new one, eg. exceed the max number firmware can support, modify header will be an error pointer that will trigger a panic when deallocating it. And the old modify header point is copied to old attr. When the old attr is freed, the old modify header is lost. Fix it by restoring the old attr to attr when failed to allocate a new modify header context. So when the CT entry is freed, the right modify header context will be freed. And the panic of accessing error pointer is also fixed. Fixes: 94ceffb48eac ("net/mlx5e: Implement CT entry update") Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://patch.msgid.link/20240730061638.1831002-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net/mlx5e: Require mlx5 tc classifier action support for IPsec prio capabilityRahul Rameshbabu
Require mlx5 classifier action support when creating IPSec chains in offload path. MLX5_IPSEC_CAP_PRIO should only be set if CONFIG_MLX5_CLS_ACT is enabled. If CONFIG_MLX5_CLS_ACT=n and MLX5_IPSEC_CAP_PRIO is set, configuring IPsec offload will fail due to the mlxx5 ipsec chain rules failing to be created due to lack of classifier action support. Fixes: fa5aa2f89073 ("net/mlx5e: Use chains for IPsec policy priority offload") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://patch.msgid.link/20240730061638.1831002-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net/mlx5: Fix missing lock on sync reset reloadMoshe Shemesh
On sync reset reload work, when remote host updates devlink on reload actions performed on that host, it misses taking devlink lock before calling devlink_remote_reload_actions_performed() which results in triggering lock assert like the following: WARNING: CPU: 4 PID: 1164 at net/devlink/core.c:261 devl_assert_locked+0x3e/0x50 … CPU: 4 PID: 1164 Comm: kworker/u96:6 Tainted: G S W 6.10.0-rc2+ #116 Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0 12/18/2015 Workqueue: mlx5_fw_reset_events mlx5_sync_reset_reload_work [mlx5_core] RIP: 0010:devl_assert_locked+0x3e/0x50 … Call Trace: <TASK> ? __warn+0xa4/0x210 ? devl_assert_locked+0x3e/0x50 ? report_bug+0x160/0x280 ? handle_bug+0x3f/0x80 ? exc_invalid_op+0x17/0x40 ? asm_exc_invalid_op+0x1a/0x20 ? devl_assert_locked+0x3e/0x50 devlink_notify+0x88/0x2b0 ? mlx5_attach_device+0x20c/0x230 [mlx5_core] ? __pfx_devlink_notify+0x10/0x10 ? process_one_work+0x4b6/0xbb0 process_one_work+0x4b6/0xbb0 […] Fixes: 84a433a40d0e ("net/mlx5: Lock mlx5 devlink reload callbacks") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://patch.msgid.link/20240730061638.1831002-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net/mlx5: Lag, don't use the hardcoded value of the first portMark Bloch
The cited commit didn't change the body of the loop as it should. It shouldn't be using MLX5_LAG_P1. Fixes: 7e978e7714d6 ("net/mlx5: Lag, use actual number of lag ports") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://patch.msgid.link/20240730061638.1831002-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net/mlx5: DR, Fix 'stack guard page was hit' error in dr_ruleYevgeny Kliteynik
This patch reduces the size of hw_ste_arr_optimized array that is allocated on stack from 640 bytes (5 match STEs + 5 action STES) to 448 bytes (2 match STEs + 5 action STES). This fixes the 'stack guard page was hit' issue, while still fitting majority of the usecases (up to 2 match STEs). Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://patch.msgid.link/20240730061638.1831002-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net/mlx5: Fix error handling in irq_pool_request_irqShay Drory
In case mlx5_irq_alloc fails, the previously allocated index remains in the XArray, which could lead to inconsistencies. Fix it by adding error handling that erases the allocated index from the XArray if mlx5_irq_alloc returns an error. Fixes: c36326d38d93 ("net/mlx5: Round-Robin EQs over IRQs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://patch.msgid.link/20240730061638.1831002-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net/mlx5: Always drain health in shutdown callbackShay Drory
There is no point in recovery during device shutdown. if health work started need to wait for it to avoid races and NULL pointer access. Hence, drain health WQ on shutdown callback. Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Fixes: d2aa060d40fa ("net/mlx5: Cancel health poll before sending panic teardown command") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://patch.msgid.link/20240730061638.1831002-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net: Add skbuff.h to MAINTAINERSBreno Leitao
The network maintainers need to be copied if the skbuff.h is touched. This also helps git-send-email to figure out the proper maintainers when touching the file. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240730161404.2028175-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31r8169: don't increment tx_dropped in case of NETDEV_TX_BUSYHeiner Kallweit
The skb isn't consumed in case of NETDEV_TX_BUSY, therefore don't increment the tx_dropped counter. Fixes: 188f4af04618 ("r8169: use NETDEV_TX_{BUSY/OK}") Cc: stable@vger.kernel.org Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://patch.msgid.link/bbba9c48-8bac-4932-9aa1-d2ed63bc9433@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-07-31 We've added 2 non-merge commits during the last 2 day(s) which contain a total of 2 files changed, 2 insertions(+), 2 deletions(-). The main changes are: 1) Fix BPF selftest build after tree sync with regards to a _GNU_SOURCE macro redefined compilation error, from Stanislav Fomichev. 2) Fix a wrong test in the ASSERT_OK() check in uprobe_syscall BPF selftest, from Jiri Olsa. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf/selftests: Fix ASSERT_OK condition check in uprobe_syscall test selftests/bpf: Filter out _GNU_SOURCE when compiling test_cpp ==================== Link: https://patch.msgid.link/20240731115706.19677-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01nouveau: set placement to original placement on uvmm validate.Dave Airlie
When a buffer is evicted for memory pressure or TTM evict all, the placement is set to the eviction domain, this means the buffer never gets revalidated on the next exec to the correct domain. I think this should be fine to use the initial domain from the object creation, as least with VM_BIND this won't change after init so this should be the correct answer. Fixes: b88baab82871 ("drm/nouveau: implement new VM_BIND uAPI") Cc: Danilo Krummrich <dakr@redhat.com> Cc: <stable@vger.kernel.org> # v6.6 Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240515025542.2156774-1-airlied@gmail.com
2024-07-31netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init().Kuniyuki Iwashima
ip6table_nat_table_init() accesses net->gen->ptr[ip6table_nat_net_ops.id], but the function is exposed to user space before the entry is allocated via register_pernet_subsys(). Let's call register_pernet_subsys() before xt_register_template(). Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-07-31netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init().Kuniyuki Iwashima
We had a report that iptables-restore sometimes triggered null-ptr-deref at boot time. [0] The problem is that iptable_nat_table_init() is exposed to user space before the kernel fully initialises netns. In the small race window, a user could call iptable_nat_table_init() that accesses net_generic(net, iptable_nat_net_id), which is available only after registering iptable_nat_net_ops. Let's call register_pernet_subsys() before xt_register_template(). [0]: bpfilter: Loaded bpfilter_umh pid 11702 Started bpfilter BUG: kernel NULL pointer dereference, address: 0000000000000013 PF: supervisor write access in kernel mode PF: error_code(0x0002) - not-present page PGD 0 P4D 0 PREEMPT SMP NOPTI CPU: 2 PID: 11879 Comm: iptables-restor Not tainted 6.1.92-99.174.amzn2023.x86_64 #1 Hardware name: Amazon EC2 c6i.4xlarge/, BIOS 1.0 10/16/2017 RIP: 0010:iptable_nat_table_init (net/ipv4/netfilter/iptable_nat.c:87 net/ipv4/netfilter/iptable_nat.c:121) iptable_nat Code: 10 4c 89 f6 48 89 ef e8 0b 19 bb ff 41 89 c4 85 c0 75 38 41 83 c7 01 49 83 c6 28 41 83 ff 04 75 dc 48 8b 44 24 08 48 8b 0c 24 <48> 89 08 4c 89 ef e8 a2 3b a2 cf 48 83 c4 10 44 89 e0 5b 5d 41 5c RSP: 0018:ffffbef902843cd0 EFLAGS: 00010246 RAX: 0000000000000013 RBX: ffff9f4b052caa20 RCX: ffff9f4b20988d80 RDX: 0000000000000000 RSI: 0000000000000064 RDI: ffffffffc04201c0 RBP: ffff9f4b29394000 R08: ffff9f4b07f77258 R09: ffff9f4b07f77240 R10: 0000000000000000 R11: ffff9f4b09635388 R12: 0000000000000000 R13: ffff9f4b1a3c6c00 R14: ffff9f4b20988e20 R15: 0000000000000004 FS: 00007f6284340000(0000) GS:ffff9f51fe280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000013 CR3: 00000001d10a6005 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) ? xt_find_table_lock (net/netfilter/x_tables.c:1259) ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:420) ? page_fault_oops (arch/x86/mm/fault.c:727) ? exc_page_fault (./arch/x86/include/asm/irqflags.h:40 ./arch/x86/include/asm/irqflags.h:75 arch/x86/mm/fault.c:1470 arch/x86/mm/fault.c:1518) ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570) ? iptable_nat_table_init (net/ipv4/netfilter/iptable_nat.c:87 net/ipv4/netfilter/iptable_nat.c:121) iptable_nat xt_find_table_lock (net/netfilter/x_tables.c:1259) xt_request_find_table_lock (net/netfilter/x_tables.c:1287) get_info (net/ipv4/netfilter/ip_tables.c:965) ? security_capable (security/security.c:809 (discriminator 13)) ? ns_capable (kernel/capability.c:376 kernel/capability.c:397) ? do_ipt_get_ctl (net/ipv4/netfilter/ip_tables.c:1656) ? bpfilter_send_req (net/bpfilter/bpfilter_kern.c:52) bpfilter nf_getsockopt (net/netfilter/nf_sockopt.c:116) ip_getsockopt (net/ipv4/ip_sockglue.c:1827) __sys_getsockopt (net/socket.c:2327) __x64_sys_getsockopt (net/socket.c:2342 net/socket.c:2339 net/socket.c:2339) do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:81) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) RIP: 0033:0x7f62844685ee Code: 48 8b 0d 45 28 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 09 RSP: 002b:00007ffd1f83d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 00007ffd1f83d680 RCX: 00007f62844685ee RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 0000000000000004 R08: 00007ffd1f83d670 R09: 0000558798ffa2a0 R10: 00007ffd1f83d680 R11: 0000000000000246 R12: 00007ffd1f83e3b2 R13: 00007f628455baa0 R14: 00007ffd1f83d7b0 R15: 00007f628457a008 </TASK> Modules linked in: iptable_nat(+) bpfilter rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache veth xt_state xt_connmark xt_nat xt_statistic xt_MASQUERADE xt_mark xt_addrtype ipt_REJECT nf_reject_ipv4 nft_chain_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment nft_compat nf_tables nfnetlink overlay nls_ascii nls_cp437 vfat fat ghash_clmulni_intel aesni_intel ena crypto_simd ptp cryptd i8042 pps_core serio button sunrpc sch_fq_codel configfs loop dm_mod fuse dax dmi_sysfs crc32_pclmul crc32c_intel efivarfs CR2: 0000000000000013 Fixes: fdacd57c79b7 ("netfilter: x_tables: never register tables by default") Reported-by: Takahiro Kawahara <takawaha@amazon.co.jp> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-07-31x86/setup: Parse the builtin command line before mergingBorislav Petkov (AMD)
Commit in Fixes was added as a catch-all for cases where the cmdline is parsed before being merged with the builtin one. And promptly one issue appeared, see Link below. The microcode loader really needs to parse it that early, but the merging happens later. Reshuffling the early boot nightmare^W code to handle that properly would be a painful exercise for another day so do the chicken thing and parse the builtin cmdline too before it has been merged. Fixes: 0c40b1c7a897 ("x86/setup: Warn when option parsing is done too early") Reported-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240730152108.GAZqkE5Dfi9AuKllRw@fat_crate.local Link: https://lore.kernel.org/r/20240722152330.GCZp55ck8E_FT4kPnC@fat_crate.local
2024-07-31drm/atomic: Allow userspace to use damage clips with async flipsAndré Almeida
Allow userspace to use damage clips with atomic async flips. Damage clips are useful for partial plane updates, which can be helpful for clients that want to do flips asynchronously. Fixes: 0e26cc72c71c ("drm: Refuse to async flip with atomic prop changes") Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Simon Ser <contact@emersion.fr> Signed-off-by: Simon Ser <contact@emersion.fr> Link: https://patchwork.freedesktop.org/patch/msgid/20240702212215.109696-2-andrealmeid@igalia.com
2024-07-31drm/atomic: Allow userspace to use explicit sync with atomic async flipsAndré Almeida
Allow userspace to use explicit synchronization with atomic async flips. That means that the flip will wait for some hardware fence, and then will flip as soon as possible (async) in regard of the vblank. Fixes: 0e26cc72c71c ("drm: Refuse to async flip with atomic prop changes") Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Simon Ser <contact@emersion.fr> Signed-off-by: Simon Ser <contact@emersion.fr> Link: https://patchwork.freedesktop.org/patch/msgid/20240702212215.109696-1-andrealmeid@igalia.com
2024-07-31ALSA: hda: Conditionally use snooping for AMD HDMITakashi Iwai
The recent regression report revealed that the use of WC pages for AMD HDMI device together with AMD IOMMU leads to unexpected truncation or noises. The issue seems triggered by the change in the kernel core memory allocation that enables IOMMU driver to use always S/G buffers. Meanwhile, the use of WC pages has been a workaround for the similar issue with standard pages in the past. So, now we need to apply the workaround conditionally, namely, only when IOMMU isn't in place. This patch modifies the workaround code to check the DMA ops at first and apply the snoop-off only when needed. Fixes: f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219087 Link: https://patch.msgid.link/20240731170521.31714-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-07-31minmax: fix up min3() and max3() tooLinus Torvalds
David Laight pointed out that we should deal with the min3() and max3() mess too, which still does excessive expansion. And our current macros are actually rather broken. In particular, the macros did this: #define min3(x, y, z) min((typeof(x))min(x, y), z) #define max3(x, y, z) max((typeof(x))max(x, y), z) and that not only is a nested expansion of possibly very complex arguments with all that involves, the typing with that "typeof()" cast is completely wrong. For example, imagine what happens in max3() if 'x' happens to be a 'unsigned char', but 'y' and 'z' are 'unsigned long'. The types are compatible, and there's no warning - but the result is just random garbage. No, I don't think we've ever hit that issue in practice, but since we now have sane infrastructure for doing this right, let's just use it. It fixes any excessive expansion, and also avoids these kinds of broken type issues. Requested-by: David Laight <David.Laight@aculab.com> Acked-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-31riscv: cpufeature: Do not drop Linux-internal extensionsSamuel Holland
The Linux-internal Xlinuxenvcfg ISA extension is omitted from the riscv_isa_ext array because it has no DT binding and should not appear in /proc/cpuinfo. The logic added in commit 625034abd52a ("riscv: add ISA extensions validation callback") assumes all extensions are included in riscv_isa_ext, and so riscv_resolve_isa() wrongly drops Xlinuxenvcfg from the final ISA string. Instead, accept such Linux-internal ISA extensions as if they have no validation callback. Fixes: 625034abd52a ("riscv: add ISA extensions validation callback") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240718213011.2600150-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-31s390: Keep inittext section writableHeiko Carstens
There is no added security by making the inittext section non-writable, however it does split part of the kernel mapping into 4K mappings instead of 1M mappings: ---[ Kernel Image Start ]--- 0x000003ffe0000000-0x000003ffe0e00000 14M PMD RO X 0x000003ffe0e00000-0x000003ffe0ec7000 796K PTE RO X 0x000003ffe0ec7000-0x000003ffe0f00000 228K PTE RO NX 0x000003ffe0f00000-0x000003ffe1300000 4M PMD RO NX 0x000003ffe1300000-0x000003ffe1353000 332K PTE RO NX 0x000003ffe1353000-0x000003ffe1400000 692K PTE RW NX 0x000003ffe1400000-0x000003ffe1500000 1M PMD RW NX 0x000003ffe1500000-0x000003ffe1700000 2M PTE RW NX <--- 0x000003ffe1700000-0x000003ffe1800000 1M PMD RW NX 0x000003ffe1800000-0x000003ffe187e000 504K PTE RW NX ---[ Kernel Image End ]--- Keep the inittext writable and enable instruction execution protection (aka noexec) later to prevent this. This also allows to use the generic free_initmem() implementation. ---[ Kernel Image Start ]--- 0x000003ffe0000000-0x000003ffe0e00000 14M PMD RO X 0x000003ffe0e00000-0x000003ffe0ec7000 796K PTE RO X 0x000003ffe0ec7000-0x000003ffe0f00000 228K PTE RO NX 0x000003ffe0f00000-0x000003ffe1300000 4M PMD RO NX 0x000003ffe1300000-0x000003ffe1353000 332K PTE RO NX 0x000003ffe1353000-0x000003ffe1400000 692K PTE RW NX 0x000003ffe1400000-0x000003ffe1800000 4M PMD RW NX <--- 0x000003ffe1800000-0x000003ffe187e000 504K PTE RW NX ---[ Kernel Image End ]--- Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31s390/vmlinux.lds.S: Move ro_after_init section behind rodata sectionHeiko Carstens
The .data.rel.ro and .got section were added between the rodata and ro_after_init data section, which adds an RW mapping in between all RO mapping of the kernel image: ---[ Kernel Image Start ]--- 0x000003ffe0000000-0x000003ffe0e00000 14M PMD RO X 0x000003ffe0e00000-0x000003ffe0ec7000 796K PTE RO X 0x000003ffe0ec7000-0x000003ffe0f00000 228K PTE RO NX 0x000003ffe0f00000-0x000003ffe1300000 4M PMD RO NX 0x000003ffe1300000-0x000003ffe1331000 196K PTE RO NX 0x000003ffe1331000-0x000003ffe13b3000 520K PTE RW NX <--- 0x000003ffe13b3000-0x000003ffe13d5000 136K PTE RO NX 0x000003ffe13d5000-0x000003ffe1400000 172K PTE RW NX 0x000003ffe1400000-0x000003ffe1500000 1M PMD RW NX 0x000003ffe1500000-0x000003ffe1700000 2M PTE RW NX 0x000003ffe1700000-0x000003ffe1800000 1M PMD RW NX 0x000003ffe1800000-0x000003ffe187e000 504K PTE RW NX ---[ Kernel Image End ]--- Move the ro_after_init data section again right behind the rodata section to prevent interleaving RO and RW mappings: ---[ Kernel Image Start ]--- 0x000003ffe0000000-0x000003ffe0e00000 14M PMD RO X 0x000003ffe0e00000-0x000003ffe0ec7000 796K PTE RO X 0x000003ffe0ec7000-0x000003ffe0f00000 228K PTE RO NX 0x000003ffe0f00000-0x000003ffe1300000 4M PMD RO NX 0x000003ffe1300000-0x000003ffe1353000 332K PTE RO NX 0x000003ffe1353000-0x000003ffe1400000 692K PTE RW NX 0x000003ffe1400000-0x000003ffe1500000 1M PMD RW NX 0x000003ffe1500000-0x000003ffe1700000 2M PTE RW NX 0x000003ffe1700000-0x000003ffe1800000 1M PMD RW NX 0x000003ffe1800000-0x000003ffe187e000 504K PTE RW NX ---[ Kernel Image End ]--- Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31s390/mm: Get rid of RELOC_HIDE()Heiko Carstens
Since __va(0) does not translate to NULL anymore remove RELOC_HIDE() which was only added to get rid of a compile warning with clang W=1: arch/s390/mm/vmem.c:666:36: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 666 | __set_memory_4k(__va(0), __va(0) + ident_map_size); | ~~~~~~~ ^ Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31s390/mm/ptdump: Improve sorting of markersHeiko Carstens
Use the sort() from lib/sort.c to sort markers instead of the private implementation. The current implementation does not sort markers properly if they have to be moved downwards: ---[ Real Memory Copy Area Start ]--- 0x0000035b903ff000-0x0000035b90400000 4K PTE I ---[ vmalloc Area Start ]--- ---[ Real Memory Copy Area End ]--- Add a new member to each marker which indicates if a marker is start of an area. If addresses of areas are equal consider an address which defines the start of an area higher than the address which defines the end of an area. In result the output is sorted as intended: ---[ Real Memory Copy Area Start ]--- 0x0000019cedcff000-0x0000019cedd00000 4K PTE I ---[ Real Memory Copy Area End ]--- ---[ vmalloc Area Start ]--- Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31s390/mm/ptdump: Add support for relocated lowcore mappingHeiko Carstens
The page table dumper contains a hard coded assumption that the first mapped area starts at address zero. With a relocated lowcore this is not true anymore. Subsequently the first entry (lowcore) is printed as if it would contain everything from address zero until the end of the location of the lowcore area. Fix this by adding a single "Kernel Virtual Address Space" entry, which always starts at address zero. It ends when the lowcore area starts which is either address zero, or its relocated address. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31s390/mm/ptdump: Fix handling of identity mapping areaHeiko Carstens
Since virtual and real addresses are not the same anymore the assumption that the kernel image is contained within the identity mapping is also not true anymore. Fix this by adding two explicit areas and at the correct locations: one for the 8kb lowcore area, and one for the identity mapping. Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces") Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31s390/cio: Add missing MODULE_DESCRIPTION() macrosJeff Johnson
With ARCH=s390, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/s390/cio/ccwgroup.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/s390/cio/vfio_ccw.o Add the missing invocations of the MODULE_DESCRIPTION() macro. Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240715-md-s390-drivers-s390-cio-v2-1-97eaa6971124@quicinc.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31s390/alternatives: Remove unused empty header fileHeiko Carstens
Remove the unused and empty arch/s390/kernel/alternative.h header file which was added by mistake. Fixes: 5ade5be4edf8 ("s390: Add infrastructure to patch lowcore accesses") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31s390/fpu: Re-add exception handling in load_fpu_state()Heiko Carstens
With the recent rewrite of the fpu code exception handling for the lfpc instruction within load_fpu_state() was erroneously removed. Add it again to prevent that loading invalid floating point register values cause an unhandled specification exception. Fixes: 8c09871a950a ("s390/fpu: limit save and restore to used registers") Cc: stable@vger.kernel.org Reported-by: Aristeu Rozanski <aris@redhat.com> Tested-by: Aristeu Rozanski <aris@redhat.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31ALSA: usb-audio: Correct surround channels in UAC1 channel mapTakashi Iwai
USB-audio driver puts SNDRV_CHMAP_SL and _SR as left and right surround channels for UAC1 channel map, respectively. But they should have been SNDRV_CHMAP_RL and _RR; the current value *_SL and _SR are rather "side" channels, not "surround". I guess I took those mistakenly when I read the spec mentioning "surround left". This patch corrects those entries to be the right channels. Suggested-by: Sylvain BERTRAND <sylvain.bertrand@legeek.net> Closes: https://lore.kernel.orgZ/qIyJD8lhd8hFhlC@freedom Fixes: 04324ccc75f9 ("ALSA: usb-audio: add channel map support") Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20240731142018.24750-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-07-31ALSA: seq: ump: Explicitly reset RPN with Null RPNTakashi Iwai
RPN with 127:127 is treated as a Null RPN, just to reset the parameters, and it's not translated to MIDI2. Although the current code can work as is in most cases, better to implement the RPN reset explicitly for Null message. Link: https://patch.msgid.link/20240731130528.12600-6-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-07-31ALSA: seq: ump: Transmit RPN/NRPN message at each MSB/LSB data receptionTakashi Iwai
Just like the core UMP conversion helper, we need to deal with the partially-filled RPN/NRPN data in the sequencer UMP converter as well. Link: https://patch.msgid.link/20240731130528.12600-5-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-07-31ALSA: seq: ump: Use the common RPN/bank conversion contextTakashi Iwai
The UMP core conversion helper API already defines the context needed to record the bank and RPN/NRPN values, and we can simply re-use the same struct instead of re-defining the same content as a different name. Link: https://patch.msgid.link/20240731130528.12600-4-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-07-31ALSA: ump: Explicitly reset RPN with Null RPNTakashi Iwai
RPN with 127:127 is treated as a Null RPN, just to reset the parameters, and it's not translated to MIDI2. Although the current code can work as is in most cases, better to implement the RPN reset explicitly for Null message. Link: https://patch.msgid.link/20240731130528.12600-3-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-07-31ALSA: ump: Transmit RPN/NRPN message at each MSB/LSB data receptionTakashi Iwai
The UMP 1.1 spec says that an RPN/NRPN should be sent when one of the following occurs: * a CC 38 is received * a subsequent CC 6 is received * a CC 98, 99, 100, and 101 is received, indicating the last RPN/NRPN message has ended and a new one has started That said, we should send a partial data even if it's not fully filled. Let's change the UMP conversion helper code to follow that rule. Link: https://patch.msgid.link/20240731130528.12600-2-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-07-31perf/x86: Fix smp_processor_id()-in-preemptible warningsLi Huafei
The following bug was triggered on a system built with CONFIG_DEBUG_PREEMPT=y: # echo p > /proc/sysrq-trigger BUG: using smp_processor_id() in preemptible [00000000] code: sh/117 caller is perf_event_print_debug+0x1a/0x4c0 CPU: 3 UID: 0 PID: 117 Comm: sh Not tainted 6.11.0-rc1 #109 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4f/0x60 check_preemption_disabled+0xc8/0xd0 perf_event_print_debug+0x1a/0x4c0 __handle_sysrq+0x140/0x180 write_sysrq_trigger+0x61/0x70 proc_reg_write+0x4e/0x70 vfs_write+0xd0/0x430 ? handle_mm_fault+0xc8/0x240 ksys_write+0x9c/0xd0 do_syscall_64+0x96/0x190 entry_SYSCALL_64_after_hwframe+0x4b/0x53 This is because the commit d4b294bf84db ("perf/x86: Hybrid PMU support for counters") took smp_processor_id() outside the irq critical section. If a preemption occurs in perf_event_print_debug() and the task is migrated to another cpu, we may get incorrect pmu debug information. Move smp_processor_id() back inside the irq critical section to fix this issue. Fixes: d4b294bf84db ("perf/x86: Hybrid PMU support for counters") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20240729220928.325449-1-lihuafei1@huawei.com
2024-07-31jump_label: Fix the fix, brown paper bags galorePeter Zijlstra
Per the example of: !atomic_cmpxchg(&key->enabled, 0, 1) the inverse was written as: atomic_cmpxchg(&key->enabled, 1, 0) except of course, that while !old is only true for old == 0, old is true for everything except old == 0. Fix it to read: atomic_cmpxchg(&key->enabled, 1, 0) == 1 such that only the 1->0 transition returns true and goes on to disable the keys. Fixes: 83ab38ef0a0b ("jump_label: Fix concurrency issues in static_key_slow_dec()") Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lkml.kernel.org/r/20240731105557.GY33588@noisy.programming.kicks-ass.net
2024-07-31tick/broadcast: Move per CPU pointer access into the atomic sectionThomas Gleixner
The recent fix for making the take over of the broadcast timer more reliable retrieves a per CPU pointer in preemptible context. This went unnoticed as compilers hoist the access into the non-preemptible region where the pointer is actually used. But of course it's valid that the compiler keeps it at the place where the code puts it which rightfully triggers: BUG: using smp_processor_id() in preemptible [00000000] code: caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0 Move it to the actual usage site which is in a non-preemptible region. Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable") Reported-by: David Wang <00107082@163.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Yu Liao <liaoyu15@huawei.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/87ttg56ers.ffs@tglx
2024-07-31Merge branch 'thermal-intel'Rafael J. Wysocki
Merge fixes for the int340x thermal driver handling of MSI IRQs: - Fix MSI error path cleanup in int340x, allow it to work with a subset of thermal MSI IRQs if some of them are not working and make it free all MSI IRQs on module exit (Srinivas Pandruvada). * thermal-intel: thermal: intel: int340x: Free MSI IRQ vectors on module exit thermal: intel: int340x: Allow limited thermal MSI support thermal: intel: int340x: Fix kernel warning during MSI cleanup
2024-07-31thermal: trip: Avoid skipping trips in thermal_zone_set_trips()Rafael J. Wysocki
Say there are 3 trip points A, B, C sorted in ascending temperature order with no hysteresis. If the zone temerature is exactly equal to B, thermal_zone_set_trips() will set the boundaries to A and C and the hardware will not catch any crossing of B (either way) until either A or C is crossed and the boundaries are changed. To avoid that, use non-strict inequalities when comparing the trip threshold to the zone temperature in thermal_zone_set_trips(). In the example above, it will cause both boundaries to be set to B, which is desirable because an interrupt will trigger when the zone temperature becomes different from B regardless of which way it goes. That will allow a new interval to be set depending on the direction of the zone temperature change. Fixes: 893bae92237d ("thermal: trip: Make thermal_zone_set_trips() use trip thresholds") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/12509184.O9o76ZdvQC@rjwysocki.net
2024-07-31Revert "ALSA: firewire-lib: operate for period elapse event in process context"Edmund Raile
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context") removed the process context workqueue from amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove its overhead. With RME Fireface 800, this lead to a regression since Kernels 5.14.0, causing an AB/BA deadlock competition for the substream lock with eventual system freeze under ALSA operation: thread 0: * (lock A) acquire substream lock by snd_pcm_stream_lock_irq() in snd_pcm_status64() * (lock B) wait for tasklet to finish by calling tasklet_unlock_spin_wait() in tasklet_disable_in_atomic() in ohci_flush_iso_completions() of ohci.c thread 1: * (lock B) enter tasklet * (lock A) attempt to acquire substream lock, waiting for it to be released: snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed() in update_pcm_pointers() in process_ctx_payloads() in process_rx_packets() of amdtp-stream.c ? tasklet_unlock_spin_wait </NMI> <TASK> ohci_flush_iso_completions firewire_ohci amdtp_domain_stream_pcm_pointer snd_firewire_lib snd_pcm_update_hw_ptr0 snd_pcm snd_pcm_status64 snd_pcm ? native_queued_spin_lock_slowpath </NMI> <IRQ> _raw_spin_lock_irqsave snd_pcm_period_elapsed snd_pcm process_rx_packets snd_firewire_lib irq_target_callback snd_firewire_lib handle_it_packet firewire_ohci context_tasklet firewire_ohci Restore the process context work queue to prevent deadlock AB/BA deadlock competition for ALSA substream lock of snd_pcm_stream_lock_irq() in snd_pcm_status64() and snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed(). revert commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context") Replace inline description to prevent future deadlock. Cc: stable@vger.kernel.org Fixes: 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context") Reported-by: edmund.raile <edmund.raile@proton.me> Closes: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/ Signed-off-by: Edmund Raile <edmund.raile@protonmail.com> Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20240730195318.869840-3-edmund.raile@protonmail.com
2024-07-31Revert "ALSA: firewire-lib: obsolete workqueue for period update"Edmund Raile
prepare resolution of AB/BA deadlock competition for substream lock: restore workqueue previously used for process context: revert commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period update") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl@jn3eo7pe642q/ Signed-off-by: Edmund Raile <edmund.raile@protonmail.com> Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20240730195318.869840-2-edmund.raile@protonmail.com
2024-07-30Merge tag 'for-6.11-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix regression in extent map rework when handling insertion of overlapping compressed extent - fix unexpected file length when appending to a file using direct io and buffer not faulted in - in zoned mode, fix accounting of unusable space when flipping read-only block group back to read-write - fix page locking when COWing an inline range, assertion failure found by syzbot - fix calculation of space info in debugging print - tree-checker, add validation of data reference item - fix a few -Wmaybe-uninitialized build warnings * tag 'for-6.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: initialize location to fix -Wmaybe-uninitialized in btrfs_lookup_dentry() btrfs: fix corruption after buffer fault in during direct IO append write btrfs: zoned: fix zone_unusable accounting on making block group read-write again btrfs: do not subtract delalloc from avail bytes btrfs: make cow_file_range_inline() honor locked_page on error btrfs: fix corrupt read due to bad offset of a compressed extent map btrfs: tree-checker: validate dref root and objectid
2024-07-30Merge tag 'perf-tools-fixes-for-v6.11-2024-07-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "Some more build fixes and a random crash fix: - Fix cross-build by setting pkg-config env according to the arch - Fix static build for missing library dependencies - Fix Segfault when callchain has no symbols" * tag 'perf-tools-fixes-for-v6.11-2024-07-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf docs: Document cross compilation perf: build: Link lib 'zstd' for static build perf: build: Link lib 'lzma' for static build perf: build: Only link libebl.a for old libdw perf: build: Set Python configuration for cross compilation perf: build: Setup PKG_CONFIG_LIBDIR for cross compilation perf tool: fix dereferencing NULL al->maps
2024-07-30Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== ice: fix AF_XDP ZC timeout and concurrency issues Maciej Fijalkowski says: Changes included in this patchset address an issue that customer has been facing when AF_XDP ZC Tx sockets were used in combination with flow control and regular Tx traffic. After executing: ethtool --set-priv-flags $dev link-down-on-close on ethtool -A $dev rx on tx on launching multiple ZC Tx sockets on $dev + pinging remote interface (so that regular Tx traffic is present) and then going through down/up of $dev, Tx timeout occurred and then most of the time ice driver was unable to recover from that state. These patches combined together solve the described above issue on customer side. Main focus here is to forbid producing Tx descriptors when either carrier is not yet initialized or process of bringing interface down has already started. v1: https://lore.kernel.org/netdev/20240708221416.625850-1-anthony.l.nguyen@intel.com/ * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: xsk: fix txq interrupt mapping ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_prog ice: improve updating ice_{t,r}x_ring::xsk_pool ice: toggle netif_carrier when setting up XSK pool ice: modify error handling when setting XSK pool in ndo_bpf ice: replace synchronize_rcu with synchronize_net ice: don't busy wait for Rx queue disable in ice_qp_dis() ice: respect netif readiness in AF_XDP ZC related ndo's ==================== Link: https://patch.msgid.link/20240729200716.681496-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-30net: drop bad gso csum_start and offset in virtio_net_hdrWillem de Bruijn
Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb for GSO packets. The function already checks that a checksum requested with VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets this might not hold for segs after segmentation. Syzkaller demonstrated to reach this warning in skb_checksum_help offset = skb_checksum_start_offset(skb); ret = -EINVAL; if (WARN_ON_ONCE(offset >= skb_headlen(skb))) By injecting a TSO packet: WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0 ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774 ip_finish_output_gso net/ipv4/ip_output.c:279 [inline] __ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301 iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4850 [inline] netdev_start_xmit include/linux/netdevice.h:4864 [inline] xmit_one net/core/dev.c:3595 [inline] dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611 __dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261 packet_snd net/packet/af_packet.c:3073 [inline] The geometry of the bad input packet at tcp_gso_segment: [ 52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0 [ 52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244 [ 52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0)) [ 52.003050][ T8403] csum(0x60000c7 start=199 offset=1536 ip_summed=3 complete_sw=0 valid=0 level=0) Mitigate with stricter input validation. csum_offset: for GSO packets, deduce the correct value from gso_type. This is already done for USO. Extend it to TSO. Let UFO be: udp[46]_ufo_fragment ignores these fields and always computes the checksum in software. csum_start: finding the real offset requires parsing to the transport header. Do not add a parser, use existing segmentation parsing. Thanks to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded. Again test both TSO and USO. Do not test UFO for the above reason, and do not test UDP tunnel offload. GSO packet are almost always CHECKSUM_PARTIAL. USO packets may be CHECKSUM_NONE since commit 10154dbded6d6 ("udp: Allow GSO transmit from devices with no checksum offload"), but then still these fields are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So no need to test for ip_summed == CHECKSUM_PARTIAL first. This revises an existing fix mentioned in the Fixes tag, which broke small packets with GSO offload, as detected by kselftests. Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871 Link: https://lore.kernel.org/netdev/20240723223109.2196886-1-kuba@kernel.org Fixes: e269d79c7d35 ("net: missing check virtio") Cc: stable@vger.kernel.org Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240729201108.1615114-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-30net: phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and aqr115cBartosz Golaszewski
Commit 708405f3e56e ("net: phy: aquantia: wait for the GLOBAL_CFG to start returning real values") introduced a workaround for an issue observed on aqr115c. However there were never any reports of it happening on other models and the workaround has been reported to cause and issue on aqr113c (and it may cause the same on any other model not supporting 10M mode). Let's limit the impact of the workaround to aqr113, aqr113c and aqr115c and poll the 100M GLOBAL_CFG register instead as both models are known to support it correctly. Reported-by: Jon Hunter <jonathanh@nvidia.com> Closes: https://lore.kernel.org/lkml/7c0140be-4325-4005-9068-7e0fc5ff344d@nvidia.com/ Fixes: 708405f3e56e ("net: phy: aquantia: wait for the GLOBAL_CFG to start returning real values") Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20240729150315.65798-1-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>