summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-07-30thermal: intel: int340x: Allow limited thermal MSI supportSrinivas Pandruvada
On some Lunar Lake pre-production systems, not all the MSI thermal vectors are valid. In that case instead of failing module load, continue with partial thermal interrupt support. pci_alloc_irq_vectors() can return less than expected maximum vectors. In that case call devm_request_threaded_irq() only for current maximum vectors. Fixes: 7a9a8c5faf41 ("thermal: intel: int340x: Support MSI interrupt for Lunar Lake") Reported-by: Yijun Shen <Yijun.Shen@dell.com> Tested-by: Yijun Shen <Yijun.Shen@dell.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Link: https://patch.msgid.link/20240723140228.865919-3-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-30thermal: intel: int340x: Fix kernel warning during MSI cleanupSrinivas Pandruvada
On some pre-production Lunar Lake systems, there is a kernel warning: remove_proc_entry: removing non-empty directory 'irq/172' WARNING: CPU: 0 PID: 501 at fs/proc/generic.c:717 remove_proc_entry+0x1b4/0x1e0 ... ... remove_proc_entry+0x1b4/0x1e0 report_bug+0x182/0x1b0 handle_bug+0x51/0xa0 exc_invalid_op+0x18/0x80 asm_exc_invalid_op+0x1b/0x20 remove_proc_entry+0x1b4/0x1e0 remove_proc_entry+0x1b4/0x1e0 unregister_irq_proc+0xf2/0x120 free_desc+0x41/0xe0 irq_domain_free_irqs+0x138/0x1c0 irq_free_descs+0x52/0x80 irq_domain_free_irqs+0x151/0x1c0 msi_domain_free_locked.part.0+0x17e/0x1c0 msi_domain_free_irqs_all_locked+0x74/0xc0 pci_msi_teardown_msi_irqs+0x50/0x60 pci_free_msi_irqs+0x12/0x40 pci_free_irq_vectors+0x58/0x70 On these systems, not all the MSI thermal vectors are valid. This causes devm_request_threaded_irq() to fail for some vectors. As part of the clean up on this error, pci_free_irq_vectors() is called without calling devm_free_irq(). This causes the above warning. Add a function proc_thermal_free_msi() to call devm_free_irq() for all successfully registered IRQ handlers, then call pci_free_irq_vectors(). Call this function for MSI cleanup. Fixes: 7a9a8c5faf41 ("thermal: intel: int340x: Support MSI interrupt for Lunar Lake") Reported-by: Yijun Shen <Yijun.shen@dell.com> Tested-by: Yijun Shen <Yijun.shen@dell.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Link: https://patch.msgid.link/20240723140228.865919-2-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-30drm/i915: Fix possible int overflow in skl_ddi_calculate_wrpll()Nikita Zhandarovich
On the off chance that clock value ends up being too high (by means of skl_ddi_calculate_wrpll() having been called with big enough value of crtc_state->port_clock * 1000), one possible consequence may be that the result will not be able to fit into signed int. Fix this issue by moving conversion of clock parameter from kHz to Hz into the body of skl_ddi_calculate_wrpll(), as well as casting the same parameter to u64 type while calculating the value for AFE clock. This both mitigates the overflow problem and avoids possible erroneous integer promotion mishaps. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: 82d354370189 ("drm/i915/skl: Implementation of SKL DPLL programming") Cc: stable@vger.kernel.org Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240729174035.25727-1-n.zhandarovich@fintech.ru (cherry picked from commit 833cf12846aa19adf9b76bc79c40747726f3c0c1) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2024-07-30drm/i915/hdcp: Fix HDCP2_STREAM_STATUS macroSuraj Kandpal
Fix HDCP2_STREAM_STATUS macro, it called pipe instead of port never threw a compile error as no one used it. --v2 -Add Fixes [Jani] Fixes: d631b984cc90 ("drm/i915/hdcp: Add HDCP 2.2 stream register") Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240730035505.3759899-1-suraj.kandpal@intel.com (cherry picked from commit 73d7cd542bbd0a7c6881ea0df5255f190a1e7236) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2024-07-30btrfs: initialize location to fix -Wmaybe-uninitialized in btrfs_lookup_dentry()David Sterba
Some arch + compiler combinations report a potentially unused variable location in btrfs_lookup_dentry(). This is a false alert as the variable is passed by value and always valid or there's an error. The compilers cannot probably reason about that although btrfs_inode_by_name() is in the same file. > + /kisskb/src/fs/btrfs/inode.c: error: 'location.objectid' may be used +uninitialized in this function [-Werror=maybe-uninitialized]: => 5603:9 > + /kisskb/src/fs/btrfs/inode.c: error: 'location.type' may be used +uninitialized in this function [-Werror=maybe-uninitialized]: => 5674:5 m68k-gcc8/m68k-allmodconfig mips-gcc8/mips-allmodconfig powerpc-gcc5/powerpc-all{mod,yes}config powerpc-gcc5/ppc64_defconfig Initialize it to zero, this should fix the warnings and won't change the behaviour as btrfs_inode_by_name() accepts only a root or inode item types, otherwise returns an error. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/linux-btrfs/bd4e9928-17b3-9257-8ba7-6b7f9bbb639a@linux-m68k.org/ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-30x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 rangePerry Yuan
Add some new Zen5 models for the 0x1A family. [ bp: Merge the 0x60 and 0x70 ranges. ] Signed-off-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240729064626.24297-1-bp@kernel.org
2024-07-30net/iucv: fix use after free in iucv_sock_close()Alexandra Winter
iucv_sever_path() is called from process context and from bh context. iucv->path is used as indicator whether somebody else is taking care of severing the path (or it is already removed / never existed). This needs to be done with atomic compare and swap, otherwise there is a small window where iucv_sock_close() will try to work with a path that has already been severed and freed by iucv_callback_connrej() called by iucv_tasklet_fn(). Example: [452744.123844] Call Trace: [452744.123845] ([<0000001e87f03880>] 0x1e87f03880) [452744.123966] [<00000000d593001e>] iucv_path_sever+0x96/0x138 [452744.124330] [<000003ff801ddbca>] iucv_sever_path+0xc2/0xd0 [af_iucv] [452744.124336] [<000003ff801e01b6>] iucv_sock_close+0xa6/0x310 [af_iucv] [452744.124341] [<000003ff801e08cc>] iucv_sock_release+0x3c/0xd0 [af_iucv] [452744.124345] [<00000000d574794e>] __sock_release+0x5e/0xe8 [452744.124815] [<00000000d5747a0c>] sock_close+0x34/0x48 [452744.124820] [<00000000d5421642>] __fput+0xba/0x268 [452744.124826] [<00000000d51b382c>] task_work_run+0xbc/0xf0 [452744.124832] [<00000000d5145710>] do_notify_resume+0x88/0x90 [452744.124841] [<00000000d5978096>] system_call+0xe2/0x2c8 [452744.125319] Last Breaking-Event-Address: [452744.125321] [<00000000d5930018>] iucv_path_sever+0x90/0x138 [452744.125324] [452744.125325] Kernel panic - not syncing: Fatal exception in interrupt Note that bh_lock_sock() is not serializing the tasklet context against process context, because the check for sock_owned_by_user() and corresponding handling is missing. Ideas for a future clean-up patch: A) Correct usage of bh_lock_sock() in tasklet context, as described in Link: https://lore.kernel.org/netdev/1280155406.2899.407.camel@edumazet-laptop/ Re-enqueue, if needed. This may require adding return values to the tasklet functions and thus changes to all users of iucv. B) Change iucv tasklet into worker and use only lock_sock() in af_iucv. Fixes: 7d316b945352 ("af_iucv: remove IUCV-pathes completely") Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20240729122818.947756-1-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30io_uring: remove unused local list heads in NAPI functionsOlivier Langlois
These lists are unused, remove them. Signed-off-by: Olivier Langlois <olivier@trillion01.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0a0ae3e955aed0f3e3d29882fb3d3cb575e0009b.1722294947.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-30io_uring: keep multishot request NAPI timeout currentOlivier Langlois
This refresh statement was originally present in the original patch: https://lore.kernel.org/netdev/20221121191437.996297-2-shr@devkernel.io/ It has been removed with no explanation in v6: https://lore.kernel.org/netdev/20230201222254.744422-2-shr@devkernel.io/ It is important to make the refresh for multishot requests, because if no new requests using the same NAPI device are added to the ring, the entry will become stale and be removed silently. The unsuspecting user will not know that their ring had busy polling for only 60 seconds before being pruned. Signed-off-by: Olivier Langlois <olivier@trillion01.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Fixes: 8d0c12a80cdeb ("io-uring: add napi busy poll support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/0fe61a019ec61e5708cd117cb42ed0dab95e1617.1722294646.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-30platform/chrome: cros_ec_proto: Lock device when updating MKBP versionPatryk Duda
The cros_ec_get_host_command_version_mask() function requires that the caller must have ec_dev->lock mutex before calling it. This requirement was not met and as a result it was possible that two commands were sent to the device at the same time. The problem was observed while using UART backend which doesn't use any additional locks, unlike SPI backend which locks the controller until response is received. Fixes: f74c7557ed0d ("platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure") Cc: stable@vger.kernel.org Signed-off-by: Patryk Duda <patrykd@google.com> Link: https://lore.kernel.org/r/20240730104425.607083-1-patrykd@google.com Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
2024-07-30net/smc: prevent UAF in inet_create()D. Wythe
Following syzbot repro crashes the kernel: socketpair(0x2, 0x1, 0x100, &(0x7f0000000140)) (fail_nth: 13) Fix this by not calling sk_common_release() from smc_create_clcsk(). Stack trace: socket: no more sockets ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 1 PID: 5092 at lib/refcount.c:28 refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28 Modules linked in: CPU: 1 PID: 5092 Comm: syz-executor424 Not tainted 6.10.0-syzkaller-04483-g0be9ae5486cd #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024 RIP: 0010:refcount_warn_saturate+0x15a/0x1d0 lib/refcount.c:28 Code: 80 f3 1f 8c e8 e7 69 a8 fc 90 0f 0b 90 90 eb 99 e8 cb 4f e6 fc c6 05 8a 8d e8 0a 01 90 48 c7 c7 e0 f3 1f 8c e8 c7 69 a8 fc 90 <0f> 0b 90 90 e9 76 ff ff ff e8 a8 4f e6 fc c6 05 64 8d e8 0a 01 90 RSP: 0018:ffffc900034cfcf0 EFLAGS: 00010246 RAX: 3b9fcde1c862f700 RBX: ffff888022918b80 RCX: ffff88807b39bc00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000003 R08: ffffffff815878a2 R09: fffffbfff1c39d94 R10: dffffc0000000000 R11: fffffbfff1c39d94 R12: 00000000ffffffe9 R13: 1ffff11004523165 R14: ffff888022918b28 R15: ffff888022918b00 FS: 00005555870e7380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000140 CR3: 000000007582e000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> inet_create+0xbaf/0xe70 __sock_create+0x490/0x920 net/socket.c:1571 sock_create net/socket.c:1622 [inline] __sys_socketpair+0x2ca/0x720 net/socket.c:1769 __do_sys_socketpair net/socket.c:1822 [inline] __se_sys_socketpair net/socket.c:1819 [inline] __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fbcb9259669 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fffe931c6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000035 RAX: ffffffffffffffda RBX: 00007fffe931c6f0 RCX: 00007fbcb9259669 RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000002 RBP: 0000000000000002 R08: 00007fffe931c476 R09: 00000000000000a0 R10: 0000000020000140 R11: 0000000000000246 R12: 00007fffe931c6ec R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001 </TASK> Link: https://lore.kernel.org/r/20240723175809.537291-1-edumazet@google.com/ Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://patch.msgid.link/1722224415-30999-1-git-send-email-alibuda@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30Merge branch 'mptcp-fix-inconsistent-backup-usage'Paolo Abeni
Matthieu Baerts says: ==================== mptcp: fix inconsistent backup usage In all the MPTCP backup related tests, the backup flag was set on one side, and the expected behaviour is to have both sides respecting this decision. That's also the "natural" way, and what the users seem to expect. On the scheduler side, only the 'backup' field was checked, which is supposed to be set only if the other peer flagged a subflow as backup. But in various places, this flag was also set when the local host flagged the subflow as backup, certainly to have the expected behaviour mentioned above. Patch 1 modifies the packet scheduler to check if the backup flag has been set on both directions, not to change its behaviour after having applied the following patches. That's what the default packet scheduler should have done since the beginning in v5.7. Patch 2 fixes the backup flag being mirrored on the MPJ+SYN+ACK by accident since its introduction in v5.7. Instead, the received and sent backup flags are properly distinguished in requests. Patch 3 stops setting the received backup flag as well when sending an MP_PRIO, something that was done since the MP_PRIO support in v5.12. Patch 4 adds related and missing MIB counters to be able to easily check if MP_JOIN are sent with a backup flag. Certainly because these counters were not there, the behaviour that is fixed by patches here was not properly verified. Patch 5 validates the previous patch by extending the MPTCP Join selftest. Patch 6 fixes the backup support in signal endpoints: if a signal endpoint had the backup flag, it was not set in the MPJ+SYN+ACK as expected. It was only set for ongoing connections, but not future ones as expected, since the introduction of the backup flag in endpoints in v5.10. Patch 7 validates the previous patch by extending the MPTCP Join selftest as well. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- Matthieu Baerts (NGI0) (7): mptcp: sched: check both directions for backup mptcp: distinguish rcv vs sent backup flag in requests mptcp: pm: only set request_bkup flag when sending MP_PRIO mptcp: mib: count MPJ with backup flag selftests: mptcp: join: validate backup in MPJ mptcp: pm: fix backup support in signal endpoints selftests: mptcp: join: check backup support in signal endp include/trace/events/mptcp.h | 2 +- net/mptcp/mib.c | 2 + net/mptcp/mib.h | 2 + net/mptcp/options.c | 2 +- net/mptcp/pm.c | 12 +++++ net/mptcp/pm_netlink.c | 19 ++++++- net/mptcp/pm_userspace.c | 18 +++++++ net/mptcp/protocol.c | 10 ++-- net/mptcp/protocol.h | 4 ++ net/mptcp/subflow.c | 10 ++++ tools/testing/selftests/net/mptcp/mptcp_join.sh | 72 ++++++++++++++++++++----- 11 files changed, 132 insertions(+), 21 deletions(-) ==================== Link: https://patch.msgid.link/20240727-upstream-net-20240727-mptcp-backup-signal-v1-0-f50b31604cf1@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30selftests: mptcp: join: check backup support in signal endpMatthieu Baerts (NGI0)
Before the previous commit, 'signal' endpoints with the 'backup' flag were ignored when sending the MP_JOIN. The MPTCP Join selftest has then been modified to validate this case: the "single address, backup" test, is now validating the MP_JOIN with a backup flag as it is what we expect it to do with such name. The previous version has been kept, but renamed to "single address, switch to backup" to avoid confusions. The "single address with port, backup" test is also now validating the MPJ with a backup flag, which makes more sense than checking the switch to backup with an MP_PRIO. The "mpc backup both sides" test is now validating that the backup flag is also set in MP_JOIN from and to the addresses used in the initial subflow, using the special ID 0. The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30mptcp: pm: fix backup support in signal endpointsMatthieu Baerts (NGI0)
There was a support for signal endpoints, but only when the endpoint's flag was changed during a connection. If an endpoint with the signal and backup was already present, the MP_JOIN reply was not containing the backup flag as expected. That's confusing to have this inconsistent behaviour. On the other hand, the infrastructure to set the backup flag in the SYN + ACK + MP_JOIN was already there, it was just never set before. Now when requesting the local ID from the path-manager, the backup status is also requested. Note that when the userspace PM is used, the backup flag can be set if the local address was already used before with a backup flag, e.g. if the address was announced with the 'backup' flag, or a subflow was created with the 'backup' flag. Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/507 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30selftests: mptcp: join: validate backup in MPJMatthieu Baerts (NGI0)
A peer can notify the other one that a subflow has to be treated as "backup" by two different ways: either by sending a dedicated MP_PRIO notification, or by setting the backup flag in the MP_JOIN handshake. The selftests were previously monitoring the former, but not the latter. This is what is now done here by looking at these new MIB counters when validating the 'backup' cases: MPTcpExtMPJoinSynBackupRx MPTcpExtMPJoinSynAckBackupRx The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it will help to validate a new fix for an issue introduced by this commit ID. Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30mptcp: mib: count MPJ with backup flagMatthieu Baerts (NGI0)
Without such counters, it is difficult to easily debug issues with MPJ not having the backup flags on production servers. This is not strictly a fix, but it eases to validate the following patches without requiring to take packet traces, to query ongoing connections with Netlink with admin permissions, or to guess by looking at the behaviour of the packet scheduler. Also, the modification is self contained, isolated, well controlled, and the increments are done just after others, there from the beginning. It looks then safe, and helpful to backport this. Fixes: 4596a2c1b7f5 ("mptcp: allow creating non-backup subflows") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30mptcp: pm: only set request_bkup flag when sending MP_PRIOMatthieu Baerts (NGI0)
The 'backup' flag from mptcp_subflow_context structure is supposed to be set only when the other peer flagged a subflow as backup, not the opposite. Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30mptcp: distinguish rcv vs sent backup flag in requestsMatthieu Baerts (NGI0)
When sending an MP_JOIN + SYN + ACK, it is possible to mark the subflow as 'backup' by setting the flag with the same name. Before this patch, the backup was set if the other peer set it in its MP_JOIN + SYN request. It is not correct: the backup flag should be set in the MPJ+SYN+ACK only if the host asks for it, and not mirroring what was done by the other peer. It is then required to have a dedicated bit for each direction, similar to what is done in the subflow context. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30mptcp: sched: check both directions for backupMatthieu Baerts (NGI0)
The 'mptcp_subflow_context' structure has two items related to the backup flags: - 'backup': the subflow has been marked as backup by the other peer - 'request_bkup': the backup flag has been set by the host Before this patch, the scheduler was only looking at the 'backup' flag. That can make sense in some cases, but it looks like that's not what we wanted for the general use, because either the path-manager was setting both of them when sending an MP_PRIO, or the receiver was duplicating the 'backup' flag in the subflow request. Note that the use of these two flags in the path-manager are going to be fixed in the next commits, but this change here is needed not to modify the behaviour. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-30drm/ast: astdp: Wake up during connector status detectionThomas Zimmermann
Power up the ASTDP connector for connection status detection if the connector is not active. Keep it powered if a display is attached. This fixes a bug where the connector does not come back after disconnecting the display. The encoder's atomic_disable turns off power on the physical connector. Further HPD reads will fail, thus preventing the driver from detecting re-connected displays. For connectors that are actively used, only test the HPD flag without touching power. Fixes: f81bb0ac7872 ("drm/ast: report connection status on Display Port.") Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Dave Airlie <airlied@redhat.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.6+ Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240717143319.104012-2-tzimmermann@suse.de
2024-07-30x86/sev: Fix __reserved field in sev_configPavan Kumar Paluri
sev_config currently has debug, ghcbs_initialized, and use_cas fields. However, __reserved count has not been updated. Fix this. Fixes: 34ff65901735 ("x86/sev: Use kernel provided SVSM Calling Areas") Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240729180808.366587-1-papaluri@amd.com
2024-07-30Merge drm-misc/drm-misc-next-fixes into drm-misc-fixesMaxime Ripard
There's a patch left in drm-misc-next-fixes, let's bring it into drm-misc-fixes. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-07-30Merge drm/drm-fixes into drm-misc-fixesMaxime Ripard
Let's start the new drm-misc-fixes cycle by bringing in 6.11-rc1. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-07-30media: v4l: Fix missing tabular column hint for Y14P formatJean-Michel Hautbois
The original patch added two columns in the flat-table of Luma-Only Image Formats, without updating hints to latex: above it. This results in wrong column count in the output of Sphinx's latex builder. Fix it. Reported-by: Akira Yokosawa <akiyks@gmail.com> Closes: https://lore.kernel.org/linux-media/bdbc27ba-5098-49fb-aabf-753c81361cc7@gmail.com/ Fixes: adb1d4655e53 ("media: v4l: Add V4L2-PIX-FMT-Y14P format") Cc: stable@vger.kernel.org # for v6.10 Signed-off-by: Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-07-30media: intel/ipu6: select AUXILIARY_BUS in KconfigBingbu Cao
Intel IPU6 PCI driver need register its devices on auxiliary bus, so it needs to select the AUXILIARY_BUS in Kconfig. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202407161833.7BEFXejx-lkp@intel.com/ Fixes: c70281cc83d6 ("media: intel/ipu6: add Kconfig and Makefile") Signed-off-by: Bingbu Cao <bingbu.cao@intel.com> Cc: stable@vger.kernel.org # for v6.10 Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-07-30media: ipu-bridge: fix ipu6 Kconfig dependenciesArnd Bergmann
Commit 4670c8c3fb04 ("media: ipu-bridge: Fix Kconfig dependencies") changed how IPU_BRIDGE dependencies are handled for all drivers, but the IPU6 variant was added the old way, which causes build time warnings when I2C is turned off: WARNING: unmet direct dependencies detected for IPU_BRIDGE Depends on [n]: MEDIA_SUPPORT [=m] && PCI [=y] && MEDIA_PCI_SUPPORT [=y] && (ACPI [=y] || COMPILE_TEST [=y]) && I2C [=n] Selected by [m]: - VIDEO_INTEL_IPU6 [=m] && MEDIA_SUPPORT [=m] && PCI [=y] && MEDIA_PCI_SUPPORT [=y] && (ACPI [=y] || COMPILE_TEST [=y]) && VIDEO_DEV [=m] && X86 [=y] && X86_64 [=y] && HAS_DMA [=y] To make it consistent with the other IPU drivers as well as avoid this warning, change the 'select' into 'depends on'. Fixes: c70281cc83d6 ("media: intel/ipu6: add Kconfig and Makefile") Signed-off-by: Arnd Bergmann <arnd@arndb.de> [Sakari Ailus: Alternatively depend on !IPU_BRIDGE.] Cc: stable@vger.kernel.org # for v6.10 Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-07-29Merge branch '6.11/scsi-queue' into 6.11/scsi-fixesMartin K. Petersen
Pull outstanding commits from 6.11 queue into fixes. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-29profiling: remove stale percpu flip buffer variablesLinus Torvalds
For some reason I didn't see this issue on my arm64 or x86-64 builds, but Stephen Rothwell reports that commit 2accfdb7eff6 ("profiling: attempt to remove per-cpu profile flip buffer") left these static variables around, and the powerpc build is unhappy about them: kernel/profile.c:52:28: warning: 'cpu_profile_flip' defined but not used [-Wunused-variable] 52 | static DEFINE_PER_CPU(int, cpu_profile_flip); | ^~~~~~~~~~~~~~~~ .. So remove these stale left-over remnants too. Fixes: 2accfdb7eff6 ("profiling: attempt to remove per-cpu profile flip buffer") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-29selftests/bpf: Filter out _GNU_SOURCE when compiling test_cppStanislav Fomichev
Jakub reports build failures when merging linux/master with net tree: CXX test_cpp In file included from <built-in>:454: <command line>:2:9: error: '_GNU_SOURCE' macro redefined [-Werror,-Wmacro-redefined] 2 | #define _GNU_SOURCE | ^ <built-in>:445:9: note: previous definition is here 445 | #define _GNU_SOURCE 1 The culprit is commit cc937dad85ae ("selftests: centralize -D_GNU_SOURCE= to CFLAGS in lib.mk") which unconditionally added -D_GNU_SOUCE to CLFAGS. Apparently clang++ also unconditionally adds it for the C++ targets [0] which causes a conflict. Add small change in the selftests makefile to filter it out for test_cpp. Not sure which tree it should go via, targeting bpf for now, but net might be better? 0: https://stackoverflow.com/questions/11670581/why-is-gnu-source-defined-by-default-and-how-to-turn-it-off Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240725214029.1760809-1-sdf@fomichev.me
2024-07-29Merge tag 'for-linus-2024072901' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Benjamin Tissoires: - fixes for HID-BPF after the merge with the bpf tree (Arnd Bergmann and Benjamin Tissoires) - some tool type fix for the Wacom driver (Tatsunosuke Tobita) - a reorder of the sensor discovery to ensure the HID AMD SFH is removed when no sensors are available (Basavaraj Natikar) * tag 'for-linus-2024072901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: selftests/hid: add test for attaching multiple time the same struct_ops HID: bpf: prevent the same struct_ops to be attached more than once selftests/hid: disable struct_ops auto-attach selftests/hid: fix bpf_wq new API HID: amd_sfh: Move sensor discovery before HID device initialization hid: bpf: add BPF_JIT dependency HID: wacom: more appropriate tool type categorization HID: wacom: Modify pen IDs
2024-07-29Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "The biggest thing here is the adminq change - but it looks like the only way to avoid headq blocking causing indefinite stalls. This fixes three issues: - Prevent admin commands on one VF blocking another. This prevents a bad VF from blocking a good one, as well as fixing a scalability issue with large # of VFs - Correctly return error on command failure on octeon. We used to treat failed commands as a success. - Fix modpost warning when building virtio_dma_buf. Harmless, but the fix is trivial" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_pci_modern: remove admin queue serialization lock virtio_pci_modern: use completion instead of busy loop to wait on admin cmd result virtio_pci_modern: pass cmd as an identification token virtio_pci_modern: create admin queue of queried size virtio: create admin queues alongside other virtqueues virtio_pci: pass vq info as an argument to vp_setup_vq() virtio: push out code to vp_avq_index() virtio_pci_modern: treat vp_dev->admin_vq.info.vq pointer as static virtio_pci: introduce vector allocation fallback for slow path virtqueues virtio_pci: pass vector policy enum to vp_find_one_vq_msix() virtio_pci: pass vector policy enum to vp_find_vqs_msix() virtio_pci: simplify vp_request_msix_vectors() call a bit virtio_pci: push out single vq find code to vp_find_one_vq_msix() vdpa/octeon_ep: Fix error code in octep_process_mbox() virtio: add missing MODULE_DESCRIPTION() macro
2024-07-29task_work: make TWA_NMI_CURRENT handling conditional on IRQ_WORKLinus Torvalds
The TWA_NMI_CURRENT handling very much depends on IRQ_WORK, but that isn't universally enabled everywhere. Maybe the IRQ_WORK infrastructure should just be unconditional - x86 ends up indirectly enabling it through unconditionally enabling PERF_EVENTS, for example. But it also gets enabled by having SMP support, or even if you just have PRINTK enabled. But in the meantime TWA_NMI_CURRENT causes tons of build failures on various odd minimal configs. Which did show up in linux-next, but despite that nobody bothered to fix it or even inform me until -rc1 was out. Fixes: 466e4d801cd4 ("task_work: Add TWA_NMI_CURRENT as an additional notify mode") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: kernelci.org bot <bot@kernelci.org> Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-29profiling: attempt to remove per-cpu profile flip bufferLinus Torvalds
This is the really old legacy kernel profiling code, which has long since been obviated by "real profiling" (ie 'prof' and company), and mainly remains as a source of syzbot reports. There are anecdotal reports that people still use it for boot-time profiling, but it's unlikely that such use would care about the old NUMA optimizations in this code from 2004 (commit ad02973d42: "profile: 512x Altix timer interrupt livelock fix" in the BK import archive at [1]) So in order to head off future syzbot reports, let's try to simplify this code and get rid of the per-cpu profile buffers that are quite a large portion of the complexity footprint of this thing (including CPU hotplug callbacks etc). It's unlikely anybody will actually notice, or possibly, as Thomas put it: "Only people who indulge in nostalgia will notice :)". That said, if it turns out that this code is actually actively used by somebody, we can always revert this removal. Thus the "attempt" in the summary line. [ Note: in a small nod to "the profiling code can cause NUMA problems", this also removes the "increment the last entry in the profiling array on any unknown hits" logic. That would account any program counter in a module to that single counter location, and might exacerbate any NUMA cacheline bouncing issues ] Link: https://lore.kernel.org/all/CAHk-=wgs52BxT4Zjmjz8aNvHWKxf5_ThBY4bYL1Y6CTaNL2dTw@mail.gmail.com/ Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git [1] Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-29profiling: remove prof_cpu_maskTetsuo Handa
syzbot is reporting uninit-value at profile_hits(), for there is a race window between if (!alloc_cpumask_var(&prof_cpu_mask, GFP_KERNEL)) return -ENOMEM; cpumask_copy(prof_cpu_mask, cpu_possible_mask); in profile_init() and cpumask_available(prof_cpu_mask) && cpumask_test_cpu(smp_processor_id(), prof_cpu_mask)) in profile_tick(); prof_cpu_mask remains uninitialzed until cpumask_copy() completes while cpumask_available(prof_cpu_mask) returns true as soon as alloc_cpumask_var(&prof_cpu_mask) completes. We could replace alloc_cpumask_var() with zalloc_cpumask_var() and call cpumask_copy() from create_proc_profile() on only UP kernels, for profile_online_cpu() calls cpumask_set_cpu() as needed via cpuhp_setup_state(CPUHP_AP_ONLINE_DYN) on SMP kernels. But this patch removes prof_cpu_mask because it seems unnecessary. The cpumask_test_cpu(smp_processor_id(), prof_cpu_mask) test in profile_tick() is likely always true due to a CPU cannot call profile_tick() if that CPU is offline and cpumask_set_cpu(cpu, prof_cpu_mask) is called when that CPU becomes online and cpumask_clear_cpu(cpu, prof_cpu_mask) is called when that CPU becomes offline . This test could be false during transition between online and offline. But according to include/linux/cpuhotplug.h , CPUHP_PROFILE_PREPARE belongs to PREPARE section, which means that the CPU subjected to profile_dead_cpu() cannot be inside profile_tick() (i.e. no risk of use-after-free bug) because interrupt for that CPU is disabled during PREPARE section. Therefore, this test is guaranteed to be true, and can be removed. (Since profile_hits() checks prof_buffer != NULL, we don't need to check prof_buffer != NULL here unless get_irq_regs() or user_mode() is such slow that we want to avoid when prof_buffer == NULL). do_profile_hits() is called from profile_tick() from timer interrupt only if cpumask_test_cpu(smp_processor_id(), prof_cpu_mask) is true and prof_buffer is not NULL. But syzbot is also reporting that sometimes do_profile_hits() is called while current thread is still doing vzalloc(), where prof_buffer must be NULL at this moment. This indicates that multiple threads concurrently tried to write to /sys/kernel/profiling interface, which caused that somebody else try to re-allocate prof_buffer despite somebody has already allocated prof_buffer. Fix this by using serialization. Reported-by: syzbot <syzbot+b1a83ab2a9eb9321fbdd@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=b1a83ab2a9eb9321fbdd Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+b1a83ab2a9eb9321fbdd@syzkaller.appspotmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-29Input: MT - limit max slotsTetsuo Handa
syzbot is reporting too large allocation at input_mt_init_slots(), for num_slots is supplied from userspace using ioctl(UI_DEV_CREATE). Since nobody knows possible max slots, this patch chose 1024. Reported-by: syzbot <syzbot+0122fa359a69694395d5@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=0122fa359a69694395d5 Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linuxLinus Torvalds
Pull ARM updates from Russell King: - ftrace: don't assume stack frames are contiguous in memory - remove unused mod_inwind_map structure - spelling fixes - allow use of LD dead code/data elimination - fix callchain_trace() return value - add support for stackleak gcc plugin - correct some reset asm function prototypes for CFI [ Missed the merge window because Russell forgot to push out ] * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux: ARM: 9408/1: mm: CFI: Fix some erroneous reset prototypes ARM: 9407/1: Add support for STACKLEAK gcc plugin ARM: 9406/1: Fix callchain_trace() return value ARM: 9404/1: arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION ARM: 9403/1: Alpine: Spelling s/initialiing/initializing/ ARM: 9402/1: Kconfig: Spelling s/Cortex A-/Cortex-A/ ARM: 9400/1: Remove unused struct 'mod_unwind_map'
2024-07-29btrfs: fix corruption after buffer fault in during direct IO append writeFilipe Manana
During an append (O_APPEND write flag) direct IO write if the input buffer was not previously faulted in, we can corrupt the file in a way that the final size is unexpected and it includes an unexpected hole. The problem happens like this: 1) We have an empty file, with size 0, for example; 2) We do an O_APPEND direct IO with a length of 4096 bytes and the input buffer is not currently faulted in; 3) We enter btrfs_direct_write(), lock the inode and call generic_write_checks(), which calls generic_write_checks_count(), and that function sets the iocb position to 0 with the following code: if (iocb->ki_flags & IOCB_APPEND) iocb->ki_pos = i_size_read(inode); 4) We call btrfs_dio_write() and enter into iomap, which will end up calling btrfs_dio_iomap_begin() and that calls btrfs_get_blocks_direct_write(), where we update the i_size of the inode to 4096 bytes; 5) After btrfs_dio_iomap_begin() returns, iomap will attempt to access the page of the write input buffer (at iomap_dio_bio_iter(), with a call to bio_iov_iter_get_pages()) and fail with -EFAULT, which gets returned to btrfs at btrfs_direct_write() via btrfs_dio_write(); 6) At btrfs_direct_write() we get the -EFAULT error, unlock the inode, fault in the write buffer and then goto to the label 'relock'; 7) We lock again the inode, do all the necessary checks again and call again generic_write_checks(), which calls generic_write_checks_count() again, and there we set the iocb's position to 4K, which is the current i_size of the inode, with the following code pointed above: if (iocb->ki_flags & IOCB_APPEND) iocb->ki_pos = i_size_read(inode); 8) Then we go again to btrfs_dio_write() and enter iomap and the write succeeds, but it wrote to the file range [4K, 8K), leaving a hole in the [0, 4K) range and an i_size of 8K, which goes against the expectations of having the data written to the range [0, 4K) and get an i_size of 4K. Fix this by not unlocking the inode before faulting in the input buffer, in case we get -EFAULT or an incomplete write, and not jumping to the 'relock' label after faulting in the buffer - instead jump to a location immediately before calling iomap, skipping all the write checks and relocking. This solves this problem and it's fine even in case the input buffer is memory mapped to the same file range, since only holding the range locked in the inode's io tree can cause a deadlock, it's safe to keep the inode lock (VFS lock), as was fixed and described in commit 51bd9563b678 ("btrfs: fix deadlock due to page faults during direct IO reads and writes"). A sample reproducer provided by a reporter is the following: $ cat test.c #ifndef _GNU_SOURCE #define _GNU_SOURCE #endif #include <fcntl.h> #include <stdio.h> #include <sys/mman.h> #include <sys/stat.h> #include <unistd.h> int main(int argc, char *argv[]) { if (argc < 2) { fprintf(stderr, "Usage: %s <test file>\n", argv[0]); return 1; } int fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT | O_APPEND, 0644); if (fd < 0) { perror("creating test file"); return 1; } char *buf = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); ssize_t ret = write(fd, buf, 4096); if (ret < 0) { perror("pwritev2"); return 1; } struct stat stbuf; ret = fstat(fd, &stbuf); if (ret < 0) { perror("stat"); return 1; } printf("size: %llu\n", (unsigned long long)stbuf.st_size); return stbuf.st_size == 4096 ? 0 : 1; } A test case for fstests will be sent soon. Reported-by: Hanna Czenczek <hreitz@redhat.com> Link: https://lore.kernel.org/linux-btrfs/0b841d46-12fe-4e64-9abb-871d8d0de271@redhat.com/ Fixes: 8184620ae212 ("btrfs: fix lost file sync on direct IO write with nowait and dsync iocb") CC: stable@vger.kernel.org # 6.1+ Tested-by: Hanna Czenczek <hreitz@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-29btrfs: zoned: fix zone_unusable accounting on making block group read-write ↵Naohiro Aota
again When btrfs makes a block group read-only, it adds all free regions in the block group to space_info->bytes_readonly. That free space excludes reserved and pinned regions. OTOH, when btrfs makes the block group read-write again, it moves all the unused regions into the block group's zone_unusable. That unused region includes reserved and pinned regions. As a result, it counts too much zone_unusable bytes. Fortunately (or unfortunately), having erroneous zone_unusable does not affect the calculation of space_info->bytes_readonly, because free space (num_bytes in btrfs_dec_block_group_ro) calculation is done based on the erroneous zone_unusable and it reduces the num_bytes just to cancel the error. This behavior can be easily discovered by adding a WARN_ON to check e.g, "bg->pinned > 0" in btrfs_dec_block_group_ro(), and running fstests test case like btrfs/282. Fix it by properly considering pinned and reserved in btrfs_dec_block_group_ro(). Also, add a WARN_ON and introduce btrfs_space_info_update_bytes_zone_unusable() to catch a similar mistake. Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-29btrfs: do not subtract delalloc from avail bytesNaohiro Aota
The block group's avail bytes printed when dumping a space info subtract the delalloc_bytes. However, as shown in btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes(), it is added or subtracted along with "reserved" for the delalloc case, which means the "delalloc_bytes" is a part of the "reserved" bytes. So, excluding it to calculate the avail space counts delalloc_bytes twice, which can lead to an invalid result. Fixes: e50b122b832b ("btrfs: print available space for a block group when dumping a space info") CC: stable@vger.kernel.org # 6.6+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-29btrfs: make cow_file_range_inline() honor locked_page on errorBoris Burkov
The btrfs buffered write path runs through __extent_writepage() which has some tricky return value handling for writepage_delalloc(). Specifically, when that returns 1, we exit, but for other return values we continue and end up calling btrfs_folio_end_all_writers(). If the folio has been unlocked (note that we check the PageLocked bit at the start of __extent_writepage()), this results in an assert panic like this one from syzbot: BTRFS: error (device loop0 state EAL) in free_log_tree:3267: errno=-5 IO failure BTRFS warning (device loop0 state EAL): Skipping commit of aborted transaction. BTRFS: error (device loop0 state EAL) in cleanup_transaction:2018: errno=-5 IO failure assertion failed: folio_test_locked(folio), in fs/btrfs/subpage.c:871 ------------[ cut here ]------------ kernel BUG at fs/btrfs/subpage.c:871! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 1 PID: 5090 Comm: syz-executor225 Not tainted 6.10.0-syzkaller-05505-gb1bc554e009e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024 RIP: 0010:btrfs_folio_end_all_writers+0x55b/0x610 fs/btrfs/subpage.c:871 Code: e9 d3 fb ff ff e8 25 22 c2 fd 48 c7 c7 c0 3c 0e 8c 48 c7 c6 80 3d 0e 8c 48 c7 c2 60 3c 0e 8c b9 67 03 00 00 e8 66 47 ad 07 90 <0f> 0b e8 6e 45 b0 07 4c 89 ff be 08 00 00 00 e8 21 12 25 fe 4c 89 RSP: 0018:ffffc900033d72e0 EFLAGS: 00010246 RAX: 0000000000000045 RBX: 00fff0000000402c RCX: 663b7a08c50a0a00 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: ffffc900033d73b0 R08: ffffffff8176b98c R09: 1ffff9200067adfc R10: dffffc0000000000 R11: fffff5200067adfd R12: 0000000000000001 R13: dffffc0000000000 R14: 0000000000000000 R15: ffffea0001cbee80 FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f5f076012f8 CR3: 000000000e134000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __extent_writepage fs/btrfs/extent_io.c:1597 [inline] extent_write_cache_pages fs/btrfs/extent_io.c:2251 [inline] btrfs_writepages+0x14d7/0x2760 fs/btrfs/extent_io.c:2373 do_writepages+0x359/0x870 mm/page-writeback.c:2656 filemap_fdatawrite_wbc+0x125/0x180 mm/filemap.c:397 __filemap_fdatawrite_range mm/filemap.c:430 [inline] __filemap_fdatawrite mm/filemap.c:436 [inline] filemap_flush+0xdf/0x130 mm/filemap.c:463 btrfs_release_file+0x117/0x130 fs/btrfs/file.c:1547 __fput+0x24a/0x8a0 fs/file_table.c:422 task_work_run+0x24f/0x310 kernel/task_work.c:222 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0xa2f/0x27f0 kernel/exit.c:877 do_group_exit+0x207/0x2c0 kernel/exit.c:1026 __do_sys_exit_group kernel/exit.c:1037 [inline] __se_sys_exit_group kernel/exit.c:1035 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1035 x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5f075b70c9 Code: Unable to access opcode bytes at 0x7f5f075b709f. I was hitting the same issue by doing hundreds of accelerated runs of generic/475, which also hits IO errors by design. I instrumented that reproducer with bpftrace and found that the undesirable folio_unlock was coming from the following callstack: folio_unlock+5 __process_pages_contig+475 cow_file_range_inline.constprop.0+230 cow_file_range+803 btrfs_run_delalloc_range+566 writepage_delalloc+332 __extent_writepage # inlined in my stacktrace, but I added it here extent_write_cache_pages+622 Looking at the bisected-to patch in the syzbot report, Josef realized that the logic of the cow_file_range_inline error path subtly changing. In the past, on error, it jumped to out_unlock in cow_file_range(), which honors the locked_page, so when we ultimately call folio_end_all_writers(), the folio of interest is still locked. After the change, we always unlocked ignoring the locked_page, on both success and error. On the success path, this all results in returning 1 to __extent_writepage(), which skips the folio_end_all_writers() call, which makes it OK to have unlocked. Fix the bug by wiring the locked_page into cow_file_range_inline() and only setting locked_page to NULL on success. Reported-by: syzbot+a14d8ac9af3a2a4fd0c8@syzkaller.appspotmail.com Fixes: 0586d0a89e77 ("btrfs: move extent bit and page cleanup into cow_file_range_inline") CC: stable@vger.kernel.org # 6.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-29ice: xsk: fix txq interrupt mappingMaciej Fijalkowski
ice_cfg_txq_interrupt() internally handles XDP Tx ring. Do not use ice_for_each_tx_ring() in ice_qvec_cfg_msix() as this causing us to treat XDP ring that belongs to queue vector as Tx ring and therefore misconfiguring the interrupts. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_progMaciej Fijalkowski
It is read by data path and modified from process context on remote cpu so it is needed to use WRITE_ONCE to clear the pointer. Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: improve updating ice_{t,r}x_ring::xsk_poolMaciej Fijalkowski
xsk_buff_pool pointers that ice ring structs hold are updated via ndo_bpf that is executed in process context while it can be read by remote CPU at the same time within NAPI poll. Use synchronize_net() after pointer update and {READ,WRITE}_ONCE() when working with mentioned pointer. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: toggle netif_carrier when setting up XSK poolMaciej Fijalkowski
This so we prevent Tx timeout issues. One of conditions checked on running in the background dev_watchdog() is netif_carrier_ok(), so let us turn it off when we disable the queues that belong to a q_vector where XSK pool is being configured. Turn carrier on in ice_qp_ena() only when ice_get_link_status() tells us that physical link is up. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: modify error handling when setting XSK pool in ndo_bpfMaciej Fijalkowski
Don't bail out right when spotting an error within ice_qp_{dis,ena}() but rather track error and go through whole flow of disabling and enabling queue pair. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: replace synchronize_rcu with synchronize_netMaciej Fijalkowski
Given that ice_qp_dis() is called under rtnl_lock, synchronize_net() can be called instead of synchronize_rcu() so that XDP rings can finish its job in a faster way. Also let us do this as earlier in XSK queue disable flow. Additionally, turn off regular Tx queue before disabling irqs and NAPI. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: don't busy wait for Rx queue disable in ice_qp_dis()Maciej Fijalkowski
When ice driver is spammed with multiple xdpsock instances and flow control is enabled, there are cases when Rx queue gets stuck and unable to reflect the disable state in QRX_CTRL register. Similar issue has previously been addressed in commit 13a6233b033f ("ice: Add support to enable/disable all Rx queues before waiting"). To workaround this, let us simply not wait for a disabled state as later patch will make sure that regardless of the encountered error in the process of disabling a queue pair, the Rx queue will be enabled. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: respect netif readiness in AF_XDP ZC related ndo'sMichal Kubiak
Address a scenario in which XSK ZC Tx produces descriptors to XDP Tx ring when link is either not yet fully initialized or process of stopping the netdev has already started. To avoid this, add checks against carrier readiness in ice_xsk_wakeup() and in ice_xmit_zc(). One could argue that bailing out early in ice_xsk_wakeup() would be sufficient but given the fact that we produce Tx descriptors on behalf of NAPI that is triggered for Rx traffic, the latter is also needed. Bringing link up is an asynchronous event executed within ice_service_task so even though interface has been brought up there is still a time frame where link is not yet ok. Without this patch, when AF_XDP ZC Tx is used simultaneously with stack Tx, Tx timeouts occur after going through link flap (admin brings interface down then up again). HW seem to be unable to transmit descriptor to the wire after HW tail register bump which in turn causes bit __QUEUE_STATE_STACK_XOFF to be set forever as netdev_tx_completed_queue() sees no cleaned bytes on the input. Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29parisc: fix a possible DMA corruptionMikulas Patocka
ARCH_DMA_MINALIGN was defined as 16 - this is too small - it may be possible that two unrelated 16-byte allocations share a cache line. If one of these allocations is written using DMA and the other is written using cached write, the value that was written with DMA may be corrupted. This commit changes ARCH_DMA_MINALIGN to be 128 on PA20 and 32 on PA1.1 - that's the largest possible cache line size. As different parisc microarchitectures have different cache line size, we define arch_slab_minalign(), cache_line_size() and dma_get_cache_alignment() so that the kernel may tune slab cache parameters dynamically, based on the detected cache line size. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de>
2024-07-29parisc: fix unaligned accesses in BPFMikulas Patocka
There were spurious unaligned access warnings when calling BPF code. Sometimes, the warnings were triggered with any incoming packet, making the machine hard to use. The reason for the warnings is this: on parisc64, pointers to functions are not really pointers to functions, they are pointers to 16-byte descriptor. The first 8 bytes of the descriptor is a pointer to the function and the next 8 bytes of the descriptor is the content of the "dp" register. This descriptor is generated in the function bpf_jit_build_prologue. The problem is that the function bpf_int_jit_compile advertises 4-byte alignment when calling bpf_jit_binary_alloc, bpf_jit_binary_alloc randomizes the returned array and if the array happens to be not aligned on 8-byte boundary, the descriptor generated in bpf_jit_build_prologue is also not aligned and this triggers the unaligned access warning. Fix this by advertising 8-byte alignment on parisc64 when calling bpf_jit_binary_alloc. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de>