summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-12-30kcov: mark in_softirq_really() as __always_inlineArnd Bergmann
If gcc decides not to inline in_softirq_really(), objtool warns about a function call with UACCESS enabled: kernel/kcov.o: warning: objtool: __sanitizer_cov_trace_pc+0x1e: call to in_softirq_really() with UACCESS enabled kernel/kcov.o: warning: objtool: check_kcov_mode+0x11: call to in_softirq_really() with UACCESS enabled Mark this as __always_inline to avoid the problem. Link: https://lkml.kernel.org/r/20241217071814.2261620-1-arnd@kernel.org Fixes: 7d4df2dad312 ("kcov: properly check for softirq context") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Marco Elver <elver@google.com> Cc: Aleksandr Nogikh <nogikh@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-30docs: mm: fix the incorrect 'FileHugeMapped' fieldBaolin Wang
The '/proc/PID/smaps' does not have the 'FileHugeMapped' field to count the file transparent huge pages, instead, the 'FilePmdMapped' field should be used. Fix it. Link: https://lkml.kernel.org/r/d520ce3aba2b03b088be30bece732426a939049a.1734425264.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-30mailmap: modify the entry for Mathieu OthaceheMathieu Othacehe
Set my gnu address as the main one. Link: https://lkml.kernel.org/r/20241217100924.7821-1-othacehe@gnu.org Signed-off-by: Mathieu Othacehe <othacehe@gnu.org> Cc: Alex Elder <elder@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Geliang Tang <geliang@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Cc: Quentin Monnet <qmo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-30mm/kmemleak: fix sleeping function called from invalid context at print messageAlessandro Carminati
Address a bug in the kernel that triggers a "sleeping function called from invalid context" warning when /sys/kernel/debug/kmemleak is printed under specific conditions: - CONFIG_PREEMPT_RT=y - Set SELinux as the LSM for the system - Set kptr_restrict to 1 - kmemleak buffer contains at least one item BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat preempt_count: 1, expected: 0 RCU nest depth: 2, expected: 2 6 locks held by cat/136: #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30 #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128 #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0 #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0 #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0 irq event stamp: 136660 hardirqs last enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8 hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0 softirqs last enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8 softirqs last disabled at (0): [<0000000000000000>] 0x0 Preemption disabled at: [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0 CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G E 6.11.0-rt7+ #34 Tainted: [E]=UNSIGNED_MODULE Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0xa0/0x128 show_stack+0x1c/0x30 dump_stack_lvl+0xe8/0x198 dump_stack+0x18/0x20 rt_spin_lock+0x8c/0x1a8 avc_perm_nonode+0xa0/0x150 cred_has_capability.isra.0+0x118/0x218 selinux_capable+0x50/0x80 security_capable+0x7c/0xd0 has_ns_capability_noaudit+0x94/0x1b0 has_capability_noaudit+0x20/0x30 restricted_pointer+0x21c/0x4b0 pointer+0x298/0x760 vsnprintf+0x330/0xf70 seq_printf+0x178/0x218 print_unreferenced+0x1a4/0x2d0 kmemleak_seq_show+0xd0/0x1e0 seq_read_iter+0x354/0xe30 seq_read+0x250/0x378 full_proxy_read+0xd8/0x148 vfs_read+0x190/0x918 ksys_read+0xf0/0x1e0 __arm64_sys_read+0x70/0xa8 invoke_syscall.constprop.0+0xd4/0x1d8 el0_svc+0x50/0x158 el0t_64_sync+0x17c/0x180 %pS and %pK, in the same back trace line, are redundant, and %pS can void %pK service in certain contexts. %pS alone already provides the necessary information, and if it cannot resolve the symbol, it falls back to printing the raw address voiding the original intent behind the %pK. Additionally, %pK requires a privilege check CAP_SYSLOG enforced through the LSM, which can trigger a "sleeping function called from invalid context" warning under RT_PREEMPT kernels when the check occurs in an atomic context. This issue may also affect other LSMs. This change avoids the unnecessary privilege check and resolves the sleeping function warning without any loss of information. Link: https://lkml.kernel.org/r/20241217142032.55793-1-acarmina@redhat.com Fixes: 3a6f33d86baa ("mm/kmemleak: use %pK to display kernel pointers in backtrace") Signed-off-by: Alessandro Carminati <acarmina@redhat.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Clément Léger <clement.leger@bootlin.com> Cc: Alessandro Carminati <acarmina@redhat.com> Cc: Eric Chanudet <echanude@redhat.com> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-30mm: hugetlb: independent PMD page table shared countLiu Shixin
The folio refcount may be increased unexpectly through try_get_folio() by caller such as split_huge_pages. In huge_pmd_unshare(), we use refcount to check whether a pmd page table is shared. The check is incorrect if the refcount is increased by the above caller, and this can cause the page table leaked: BUG: Bad page state in process sh pfn:109324 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x66 pfn:0x109324 flags: 0x17ffff800000000(node=0|zone=2|lastcpupid=0xfffff) page_type: f2(table) raw: 017ffff800000000 0000000000000000 0000000000000000 0000000000000000 raw: 0000000000000066 0000000000000000 00000000f2000000 0000000000000000 page dumped because: nonzero mapcount ... CPU: 31 UID: 0 PID: 7515 Comm: sh Kdump: loaded Tainted: G B 6.13.0-rc2master+ #7 Tainted: [B]=BAD_PAGE Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: show_stack+0x20/0x38 (C) dump_stack_lvl+0x80/0xf8 dump_stack+0x18/0x28 bad_page+0x8c/0x130 free_page_is_bad_report+0xa4/0xb0 free_unref_page+0x3cc/0x620 __folio_put+0xf4/0x158 split_huge_pages_all+0x1e0/0x3e8 split_huge_pages_write+0x25c/0x2d8 full_proxy_write+0x64/0xd8 vfs_write+0xcc/0x280 ksys_write+0x70/0x110 __arm64_sys_write+0x24/0x38 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x34/0x128 el0t_64_sync_handler+0xc8/0xd0 el0t_64_sync+0x190/0x198 The issue may be triggered by damon, offline_page, page_idle, etc, which will increase the refcount of page table. 1. The page table itself will be discarded after reporting the "nonzero mapcount". 2. The HugeTLB page mapped by the page table miss freeing since we treat the page table as shared and a shared page table will not be unmapped. Fix it by introducing independent PMD page table shared count. As described by comment, pt_index/pt_mm/pt_frag_refcount are used for s390 gmap, x86 pgds and powerpc, pt_share_count is used for x86/arm64/riscv pmds, so we can reuse the field as pt_share_count. Link: https://lkml.kernel.org/r/20241216071147.3984217-1-liushixin2@huawei.com Fixes: 39dde65c9940 ("[PATCH] shared page table for hugetlb page") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Ken Chen <kenneth.w.chen@intel.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-30maple_tree: reload mas before the second call for mas_empty_areaYang Erkun
Change the LONG_MAX in simple_offset_add to 1024, and do latter: [root@fedora ~]# mkdir /tmp/dir [root@fedora ~]# for i in {1..1024}; do touch /tmp/dir/$i; done touch: cannot touch '/tmp/dir/1024': Device or resource busy [root@fedora ~]# rm /tmp/dir/123 [root@fedora ~]# touch /tmp/dir/1024 [root@fedora ~]# rm /tmp/dir/100 [root@fedora ~]# touch /tmp/dir/1025 touch: cannot touch '/tmp/dir/1025': Device or resource busy After we delete file 100, actually this is a empty entry, but the latter create failed unexpected. mas_alloc_cyclic has two chance to find empty entry. First find the entry with range range_lo and range_hi, if no empty entry exist, and range_lo > min, retry find with range min and range_hi. However, the first call mas_empty_area may mark mas as EBUSY, and the second call for mas_empty_area will return false directly. Fix this by reload mas before second call for mas_empty_area. [Liam.Howlett@Oracle.com: fix mas_alloc_cyclic() second search] Link: https://lore.kernel.org/all/20241216060600.287B4C4CED0@smtp.kernel.org/ Link: https://lkml.kernel.org/r/20241216190113.1226145-2-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20241214093005.72284-1-yangerkun@huaweicloud.com Fixes: 9b6713cc7522 ("maple_tree: Add mtree_alloc_cyclic()") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> says: Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-30mm/readahead: fix large folio support in async readaheadYafang Shao
When testing large folio support with XFS on our servers, we observed that only a few large folios are mapped when reading large files via mmap. After a thorough analysis, I identified it was caused by the `/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this parameter is set to 128KB. After I tune it to 2MB, the large folio can work as expected. However, I believe the large folio behavior should not be dependent on the value of read_ahead_kb. It would be more robust if the kernel can automatically adopt to it. With /sys/block/*/queue/read_ahead_kb set to 128KB and performing a sequential read on a 1GB file using MADV_HUGEPAGE, the differences in /proc/meminfo are as follows: - before this patch FileHugePages: 18432 kB FilePmdMapped: 4096 kB - after this patch FileHugePages: 1067008 kB FilePmdMapped: 1048576 kB This shows that after applying the patch, the entire 1GB file is mapped to huge pages. The stable list is CCed, as without this patch, large folios don't function optimally in the readahead path. It's worth noting that if read_ahead_kb is set to a larger value that isn't aligned with huge page sizes (e.g., 4MB + 128KB), it may still fail to map to hugepages. Link: https://lkml.kernel.org/r/20241108141710.9721-1-laoar.shao@gmail.com Link: https://lkml.kernel.org/r/20241206083025.3478-1-laoar.shao@gmail.com Fixes: 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings") Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Tested-by: kernel test robot <oliver.sang@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-30mm: don't try THP alignment for FS without get_unmapped_areaKefeng Wang
Commit ed48e87c7df3 ("thp: add thp_get_unmapped_area_vmflags()") changes thp_get_unmapped_area() to thp_get_unmapped_area_vmflags() in __get_unmapped_area(), which doesn't initialize local get_area for anonymous mappings. This leads to us always trying THP alignment even for file_operations which have a NULL ->get_unmapped_area() callback. Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") we only want to enable THP alignment for anonymous mappings, so add a !file check to avoid attempting THP alignment for file mappings. Found issue by code inspection. THP alignment is used for easy or more pmd mappings, from vma side. This may cause unnecessary VMA fragmentation and potentially worse performance on filesystems that do not actually support THPs and thus cannot benefit from the alignment. Link: https://lkml.kernel.org/r/20241206070345.2526501-1-wangkefeng.wang@huawei.com Fixes: ed48e87c7df3 ("thp: add thp_get_unmapped_area_vmflags()") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-30mm: vmscan: account for free pages to prevent infinite Loop in ↵Seiji Nishikawa
throttle_direct_reclaim() The task sometimes continues looping in throttle_direct_reclaim() because allow_direct_reclaim(pgdat) keeps returning false. #0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac #1 [ffff80002cb6f900] __schedule at ffff800008abbd1c #2 [ffff80002cb6f990] schedule at ffff800008abc50c #3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550 #4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68 #5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660 #6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98 #7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8 #8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974 #9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4 At this point, the pgdat contains the following two zones: NODE: 4 ZONE: 0 ADDR: ffff00817fffe540 NAME: "DMA32" SIZE: 20480 MIN/LOW/HIGH: 11/28/45 VM_STAT: NR_FREE_PAGES: 359 NR_ZONE_INACTIVE_ANON: 18813 NR_ZONE_ACTIVE_ANON: 0 NR_ZONE_INACTIVE_FILE: 50 NR_ZONE_ACTIVE_FILE: 0 NR_ZONE_UNEVICTABLE: 0 NR_ZONE_WRITE_PENDING: 0 NR_MLOCK: 0 NR_BOUNCE: 0 NR_ZSPAGES: 0 NR_FREE_CMA_PAGES: 0 NODE: 4 ZONE: 1 ADDR: ffff00817fffec00 NAME: "Normal" SIZE: 8454144 PRESENT: 98304 MIN/LOW/HIGH: 68/166/264 VM_STAT: NR_FREE_PAGES: 146 NR_ZONE_INACTIVE_ANON: 94668 NR_ZONE_ACTIVE_ANON: 3 NR_ZONE_INACTIVE_FILE: 735 NR_ZONE_ACTIVE_FILE: 78 NR_ZONE_UNEVICTABLE: 0 NR_ZONE_WRITE_PENDING: 0 NR_MLOCK: 0 NR_BOUNCE: 0 NR_ZSPAGES: 0 NR_FREE_CMA_PAGES: 0 In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of inactive/active file-backed pages calculated in zone_reclaimable_pages() based on the result of zone_page_state_snapshot() is zero. Additionally, since this system lacks swap, the calculation of inactive/ active anonymous pages is skipped. crash> p nr_swap_pages nr_swap_pages = $1937 = { counter = 0 } As a result, ZONE_DMA32 is deemed unreclaimable and skipped, moving on to the processing of the next zone, ZONE_NORMAL, despite ZONE_DMA32 having free pages significantly exceeding the high watermark. The problem is that the pgdat->kswapd_failures hasn't been incremented. crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures $1935 = 0x0 This is because the node deemed balanced. The node balancing logic in balance_pgdat() evaluates all zones collectively. If one or more zones (e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the entire node is deemed balanced. This causes balance_pgdat() to exit early before incrementing the kswapd_failures, as it considers the overall memory state acceptable, even though some zones (like ZONE_NORMAL) remain under significant pressure. The patch ensures that zone_reclaimable_pages() includes free pages (NR_FREE_PAGES) in its calculation when no other reclaimable pages are available (e.g., file-backed or anonymous pages). This change prevents zones like ZONE_DMA32, which have sufficient free pages, from being mistakenly deemed unreclaimable. By doing so, the patch ensures proper node balancing, avoids masking pressure on other zones like ZONE_NORMAL, and prevents infinite loops in throttle_direct_reclaim() caused by allow_direct_reclaim(pgdat) repeatedly returning false. The kernel hangs due to a task stuck in throttle_direct_reclaim(), caused by a node being incorrectly deemed balanced despite pressure in certain zones, such as ZONE_NORMAL. This issue arises from zone_reclaimable_pages() returning 0 for zones without reclaimable file- backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient free pages to be skipped. The lack of swap or reclaimable pages results in ZONE_DMA32 being ignored during reclaim, masking pressure in other zones. Consequently, pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback mechanisms in allow_direct_reclaim() from being triggered, leading to an infinite loop in throttle_direct_reclaim(). This patch modifies zone_reclaimable_pages() to account for free pages (NR_FREE_PAGES) when no other reclaimable pages exist. This ensures zones with sufficient free pages are not skipped, enabling proper balancing and reclaim behavior. [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/20241130164346.436469-1-snishika@redhat.com Link: https://lkml.kernel.org/r/20241130161236.433747-2-snishika@redhat.com Fixes: 5a1c84b404a7 ("mm: remove reclaim and compaction retry approximations") Signed-off-by: Seiji Nishikawa <snishika@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-30selftests/memfd: add test for mapping write-sealed memfd read-onlyLorenzo Stoakes
Now we have reinstated the ability to map F_SEAL_WRITE mappings read-only, assert that we are able to do this in a test to ensure that we do not regress this again. Link: https://lkml.kernel.org/r/a6377ec470b14c0539b4600cf8fa24bf2e4858ae.1732804776.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Julian Orth <ju.orth@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-30mm: reinstate ability to map write-sealed memfd mappings read-onlyLorenzo Stoakes
Patch series "mm: reinstate ability to map write-sealed memfd mappings read-only". In commit 158978945f31 ("mm: perform the mapping_map_writable() check after call_mmap()") (and preceding changes in the same series) it became possible to mmap() F_SEAL_WRITE sealed memfd mappings read-only. Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour") unintentionally undid this logic by moving the mapping_map_writable() check before the shmem_mmap() hook is invoked, thereby regressing this change. This series reworks how we both permit write-sealed mappings being mapped read-only and disallow mprotect() from undoing the write-seal, fixing this regression. We also add a regression test to ensure that we do not accidentally regress this in future. Thanks to Julian Orth for reporting this regression. This patch (of 2): In commit 158978945f31 ("mm: perform the mapping_map_writable() check after call_mmap()") (and preceding changes in the same series) it became possible to mmap() F_SEAL_WRITE sealed memfd mappings read-only. This was previously unnecessarily disallowed, despite the man page documentation indicating that it would be, thereby limiting the usefulness of F_SEAL_WRITE logic. We fixed this by adapting logic that existed for the F_SEAL_FUTURE_WRITE seal (one which disallows future writes to the memfd) to also be used for F_SEAL_WRITE. For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE for a read-only mapping to disallow mprotect() from overriding the seal - an operation performed by seal_check_write(), invoked from shmem_mmap(), the f_op->mmap() hook used by shmem mappings. By extending this to F_SEAL_WRITE and critically - checking mapping_map_writable() to determine if we may map the memfd AFTER we invoke shmem_mmap() - the desired logic becomes possible. This is because mapping_map_writable() explicitly checks for VM_MAYWRITE, which we will have cleared. Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour") unintentionally undid this logic by moving the mapping_map_writable() check before the shmem_mmap() hook is invoked, thereby regressing this change. We reinstate this functionality by moving the check out of shmem_mmap() and instead performing it in do_mmap() at the point at which VMA flags are being determined, which seems in any case to be a more appropriate place in which to make this determination. In order to achieve this we rework memfd seal logic to allow us access to this information using existing logic and eliminate the clearing of VM_MAYWRITE from seal_check_write() which we are performing in do_mmap() instead. Link: https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.1732804776.git.lorenzo.stoakes@oracle.com Fixes: 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Julian Orth <ju.orth@gmail.com> Closes: https://lore.kernel.org/all/CAHijbEUMhvJTN9Xw1GmbM266FXXv=U7s4L_Jem5x3AaPZxrYpQ@mail.gmail.com/ Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-30sky2: Add device ID 11ab:4373 for Marvell 88E8075Pascal Hambourg
A Marvell 88E8075 ethernet controller has this device ID instead of 11ab:4370 and works fine with the sky2 driver. Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/10165a62-99fb-4be6-8c64-84afd6234085@plouf.fr.eu.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-30mptcp: fix TCP options overflow.Paolo Abeni
Syzbot reported the following splat: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 1 UID: 0 PID: 5836 Comm: sshd Not tainted 6.13.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/25/2024 RIP: 0010:_compound_head include/linux/page-flags.h:242 [inline] RIP: 0010:put_page+0x23/0x260 include/linux/mm.h:1552 Code: 90 90 90 90 90 90 90 55 41 57 41 56 53 49 89 fe 48 bd 00 00 00 00 00 fc ff df e8 f8 5e 12 f8 49 8d 5e 08 48 89 d8 48 c1 e8 03 <80> 3c 28 00 74 08 48 89 df e8 8f c7 78 f8 48 8b 1b 48 89 de 48 83 RSP: 0000:ffffc90003916c90 EFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000008 RCX: ffff888030458000 RDX: 0000000000000100 RSI: 0000000000000000 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff898ca81d R09: 1ffff110054414ac R10: dffffc0000000000 R11: ffffed10054414ad R12: 0000000000000007 R13: ffff88802a20a542 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f34f496e800(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9d6ec9ec28 CR3: 000000004d260000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> skb_page_unref include/linux/skbuff_ref.h:43 [inline] __skb_frag_unref include/linux/skbuff_ref.h:56 [inline] skb_release_data+0x483/0x8a0 net/core/skbuff.c:1119 skb_release_all net/core/skbuff.c:1190 [inline] __kfree_skb+0x55/0x70 net/core/skbuff.c:1204 tcp_clean_rtx_queue net/ipv4/tcp_input.c:3436 [inline] tcp_ack+0x2442/0x6bc0 net/ipv4/tcp_input.c:4032 tcp_rcv_state_process+0x8eb/0x44e0 net/ipv4/tcp_input.c:6805 tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1939 tcp_v4_rcv+0x2dc0/0x37f0 net/ipv4/tcp_ipv4.c:2351 ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 __netif_receive_skb_one_core net/core/dev.c:5672 [inline] __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5785 process_backlog+0x662/0x15b0 net/core/dev.c:6117 __napi_poll+0xcb/0x490 net/core/dev.c:6883 napi_poll net/core/dev.c:6952 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:7074 handle_softirqs+0x2d4/0x9b0 kernel/softirq.c:561 __do_softirq kernel/softirq.c:595 [inline] invoke_softirq kernel/softirq.c:435 [inline] __irq_exit_rcu+0xf7/0x220 kernel/softirq.c:662 irq_exit_rcu+0x9/0x30 kernel/softirq.c:678 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline] sysvec_apic_timer_interrupt+0x57/0xc0 arch/x86/kernel/apic/apic.c:1049 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 RIP: 0033:0x7f34f4519ad5 Code: 85 d2 74 0d 0f 10 02 48 8d 54 24 20 0f 11 44 24 20 64 8b 04 25 18 00 00 00 85 c0 75 27 41 b8 08 00 00 00 b8 0f 01 00 00 0f 05 <48> 3d 00 f0 ff ff 76 75 48 8b 15 24 73 0d 00 f7 d8 64 89 02 48 83 RSP: 002b:00007ffec5b32ce0 EFLAGS: 00000246 RAX: 0000000000000001 RBX: 00000000000668a0 RCX: 00007f34f4519ad5 RDX: 00007ffec5b32d00 RSI: 0000000000000004 RDI: 0000564f4bc6cae0 RBP: 0000564f4bc6b5a0 R08: 0000000000000008 R09: 0000000000000000 R10: 00007ffec5b32de8 R11: 0000000000000246 R12: 0000564f48ea8aa4 R13: 0000000000000001 R14: 0000564f48ea93e8 R15: 00007ffec5b32d68 </TASK> Eric noted a probable shinfo->nr_frags corruption, which indeed occurs. The root cause is a buggy MPTCP option len computation in some circumstances: the ADD_ADDR option should be mutually exclusive with DSS since the blamed commit. Still, mptcp_established_options_add_addr() tries to set the relevant info in mptcp_out_options, if the remaining space is large enough even when DSS is present. Since the ADD_ADDR infos and the DSS share the same union fields, adding first corrupts the latter. In the worst-case scenario, such corruption increases the DSS binary layout, exceeding the computed length and possibly overwriting the skb shared info. Address the issue by enforcing mutual exclusion in mptcp_established_options_add_addr(), too. Cc: stable@vger.kernel.org Reported-by: syzbot+38a095a81f30d82884c1@syzkaller.appspotmail.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/538 Fixes: 1bff1e43a30e ("mptcp: optimize out option generation") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/025d9df8cde3c9a557befc47e9bc08fbbe3476e5.1734771049.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-30net: mv643xx_eth: fix an OF node reference leakJoe Hattori
Current implementation of mv643xx_eth_shared_of_add_port() calls of_parse_phandle(), but does not release the refcount on error. Call of_node_put() in the error path and in mv643xx_eth_shared_of_remove(). This bug was found by an experimental verification tool that I am developing. Fixes: 76723bca2802 ("net: mv643xx_eth: add DT parsing support") Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp> Link: https://patch.msgid.link/20241221081448.3313163-1-joe@pf.is.s.u-tokyo.ac.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-30gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeupJoshua Washington
Commit ba0925c34e0f ("gve: process XSK TX descriptors as part of RX NAPI") moved XSK TX processing to be part of the RX NAPI. However, that commit did not include triggering the RX NAPI in gve_xsk_wakeup. This is necessary because the TX NAPI only processes TX completions, meaning that a TX wakeup would not actually trigger XSK descriptor processing. Also, the branch on XDP_WAKEUP_TX was supposed to have been removed, as the NAPI should be scheduled whether the wakeup is for RX or TX. Fixes: ba0925c34e0f ("gve: process XSK TX descriptors as part of RX NAPI") Cc: stable@vger.kernel.org Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Link: https://patch.msgid.link/20241221032807.302244-1-pkaligineedi@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-30eth: bcmsysport: fix call balance of priv->clk handling routinesVitalii Mordan
Check the return value of clk_prepare_enable to ensure that priv->clk has been successfully enabled. If priv->clk was not enabled during bcm_sysport_probe, bcm_sysport_resume, or bcm_sysport_open, it must not be disabled in any subsequent execution paths. Fixes: 31bc72d97656 ("net: systemport: fetch and use clock resources") Signed-off-by: Vitalii Mordan <mordan@ispras.ru> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20241227123007.2333397-1-mordan@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-30io_uring/timeout: flush timeouts outside of the timeout lockJens Axboe
syzbot reports that a recent fix causes nesting issues between the (now) raw timeoutlock and the eventfd locking: ============================= [ BUG: Invalid wait context ] 6.13.0-rc4-00080-g9828a4c0901f #29 Not tainted ----------------------------- kworker/u32:0/68094 is trying to lock: ffff000014d7a520 (&ctx->wqh#2){..-.}-{3:3}, at: eventfd_signal_mask+0x64/0x180 other info that might help us debug this: context-{5:5} 6 locks held by kworker/u32:0/68094: #0: ffff0000c1d98148 ((wq_completion)iou_exit){+.+.}-{0:0}, at: process_one_work+0x4e8/0xfc0 #1: ffff80008d927c78 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x53c/0xfc0 #2: ffff0000c59bc3d8 (&ctx->completion_lock){+.+.}-{3:3}, at: io_kill_timeouts+0x40/0x180 #3: ffff0000c59bc358 (&ctx->timeout_lock){-.-.}-{2:2}, at: io_kill_timeouts+0x48/0x180 #4: ffff800085127aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x8/0x38 #5: ffff800085127aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x8/0x38 stack backtrace: CPU: 7 UID: 0 PID: 68094 Comm: kworker/u32:0 Not tainted 6.13.0-rc4-00080-g9828a4c0901f #29 Hardware name: linux,dummy-virt (DT) Workqueue: iou_exit io_ring_exit_work Call trace: show_stack+0x1c/0x30 (C) __dump_stack+0x24/0x30 dump_stack_lvl+0x60/0x80 dump_stack+0x14/0x20 __lock_acquire+0x19f8/0x60c8 lock_acquire+0x1a4/0x540 _raw_spin_lock_irqsave+0x90/0xd0 eventfd_signal_mask+0x64/0x180 io_eventfd_signal+0x64/0x108 io_req_local_work_add+0x294/0x430 __io_req_task_work_add+0x1c0/0x270 io_kill_timeout+0x1f0/0x288 io_kill_timeouts+0xd4/0x180 io_uring_try_cancel_requests+0x2e8/0x388 io_ring_exit_work+0x150/0x550 process_one_work+0x5e8/0xfc0 worker_thread+0x7ec/0xc80 kthread+0x24c/0x300 ret_from_fork+0x10/0x20 because after the preempt-rt fix for the timeout lock nesting inside the io-wq lock, we now have the eventfd spinlock nesting inside the raw timeout spinlock. Rather than play whack-a-mole with other nesting on the timeout lock, split the deletion and killing of timeouts so queueing the task_work for the timeout cancelations can get done outside of the timeout lock. Reported-by: syzbot+b1fc199a40b65d601b65@syzkaller.appspotmail.com Fixes: 020b40f35624 ("io_uring: make ctx->timeout_lock a raw spinlock") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-30cdrom: Fix typo, 'devicen' to 'device'Steven Davis
Fix typo in cd_dbg line to add trailing newline character. Signed-off-by: Steven Davis <goldside000@outlook.com> Link: https://lore.kernel.org/lkml/20241229165744.21725-1-goldside000@outlook.com Reviewed-by: Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/lkml/Z3GV2W_MUOw5BrtR@equinox Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/r/20241230193431.441120-2-phil@philpotter.co.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-30Merge tag 'platform-drivers-x86-v6.13-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers fixes from Ilpo Järvinen: "hp-wmi: - mark 8A15 board for timed OMEN thermal profile mlx-platform: - call pci_dev_put() to balance the refcount thinkpad-acpi: - Add support for hotkey 0x1401" * tag 'platform-drivers-x86-v6.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: thinkpad-acpi: Add support for hotkey 0x1401 platform/x86: hp-wmi: mark 8A15 board for timed OMEN thermal profile platform/x86: mlx-platform: call pci_dev_put() to balance the refcount
2024-12-30watchdog: stm32_iwdg: fix error message during driver probeClément Le Goffic
The commit 3ab1663af6c1 ("watchdog: stm32_iwdg: Add pretimeout support") introduces the support for the pre-timeout interrupt. The support for this interrupt is optional but the driver uses the platform_get_irq() which produces an error message during the driver probe if we don't have any `interrupts` property in the DT. Use the platform_get_irq_optional() API to get rid of the error message as this property is optional. Fixes: 3ab1663af6c1 ("watchdog: stm32_iwdg: Add pretimeout support") Signed-off-by: Clément Le Goffic <clement.legoffic@foss.st.com> Reviewed-by: Marek Vasut <marex@denx.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20241218092227.771133-1-clement.legoffic@foss.st.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2024-12-30Revert "ALSA: ump: Don't enumeration invalid groups for legacy rawmidi"Takashi Iwai
This reverts commit c2d188e137e77294323132a760a4608321a36a70. Although it's fine to filter the invalid UMP groups at the first probe time, this will become a problem when UMP groups are updated and (re-)activated. Then there is no way to re-add the substreams properly for the legacy rawmidi, and the new active groups will be still invisible. So let's revert the change. This will move back to showing the full 16 groups, but it's better than forever lost. Link: https://patch.msgid.link/20241230114023.3787-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-12-30ALSA: seq: oss: Fix races at processing SysEx messagesTakashi Iwai
OSS sequencer handles the SysEx messages split in 6 bytes packets, and ALSA sequencer OSS layer tries to combine those. It stores the data in the internal buffer and this access is racy as of now, which may lead to the out-of-bounds access. As a temporary band-aid fix, introduce a mutex for serializing the process of the SysEx message packets. Reported-by: Kun Hu <huk23@m.fudan.edu.cn> Closes: https://lore.kernel.org/2B7E93E4-B13A-4AE4-8E87-306A8EE9BBB7@m.fudan.edu.cn Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20241230110543.32454-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-12-30ALSA: compress_offload: fix remaining descriptor races in ↵Al Viro
sound/core/compress_offload.c 3d3f43fab4cf ("ALSA: compress_offload: improve file descriptors installation for dma-buf") fixed some of descriptor races in snd_compr_task_new(), but there's a couple more left. We need to grab the references to dmabuf before moving them into descriptor table - trying to do that by descriptor afterwards might end up getting a different object, with a dangling reference left in task->{input,output} Fixes: 3d3f43fab4cf ("ALSA: compress_offload: improve file descriptors installation for dma-buf") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://patch.msgid.link/20241229185232.GA1977892@ZenIV Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-12-30ALSA: compress_offload: Drop unneeded no_free_ptr()Takashi Iwai
The error path for memdup_user() no longer needs the tricky wrap with no_free_ptr() and we can safely return the error pointer directly. Fixes: 04177158cf98 ("ALSA: compress_offload: introduce accel operation mode") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202412290846.cncnpGaw-lkp@intel.com/ Link: https://patch.msgid.link/20241229083917.14912-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-12-30ALSA: hda/tas2781: Ignore SUBSYS_ID not found for tas2563 projectsBaojun Xu
Driver will return error if no SUBSYS_ID found in BIOS(acpi). It will cause error in tas2563 projects, which have no SUBSYS_ID. Fixes: 4e7035a75da9 ("ALSA: hda/tas2781: Add speaker id check for ASUS projects") Signed-off-by: Baojun Xu <baojun.xu@ti.com> Link: https://lore.kernel.org/20241223225442.1358491-1-stuart.a.hayhurst@gmail.com Link: https://patch.msgid.link/20241230064910.1583-1-baojun.xu@ti.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-12-29Linux 6.13-rc5v6.13-rc5Linus Torvalds
2024-12-29Merge tag 'sched-urgent-2024-12-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a procfs task state reporting regression when freezing sleeping tasks" * tag 'sched-urgent-2024-12-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: freezer, sched: Report frozen tasks as 'D' instead of 'R'
2024-12-29Merge tag 'x86-urgent-2024-12-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix a hang in the "kernel IBT no ENDBR" self-test that may trigger on FRED systems, caused by incomplete FRED state cleanup in the #CP fault handler - Improve TDX (Coco VM) guest unrecoverable error handling to not potentially leak decrypted memory * tag 'x86-urgent-2024-12-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: virt: tdx-guest: Just leak decrypted memory on unrecoverable errors x86/fred: Clear WFE in missing-ENDBRANCH #CPs
2024-12-29Merge tag 'perf-urgent-2024-12-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Ingo Molnar: - Fix Intel Lunar Lake build-in event definitions - Fall back to (compatible) legacy features on new Intel PEBS format v6 hardware - Enable uncore support on Intel Clearwater Forest CPUs, which is the same as the existing Sierra Forest uncore driver * tag 'perf-urgent-2024-12-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix bitmask of OCR and FRONTEND events for LNC perf/x86/intel/ds: Add PEBS format 6 perf/x86/intel/uncore: Add Clearwater Forest support
2024-12-29Merge tag 'objtool-urgent-2024-12-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Ingo Molnar: "Fix false positive objtool build warning related to a noreturn function in the bcachefs code" * tag 'objtool-urgent-2024-12-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Add bch2_trans_unlocked_error() to bcachefs noreturns
2024-12-29Merge tag 'locking-urgent-2024-12-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "Fix missed rtmutex wakeups causing sporadic boot hangs and other misbehavior" * tag 'locking-urgent-2024-12-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Make sure we wake anything on the wake_q when we release the lock->wait_lock
2024-12-29Merge tag 'irq-urgent-2024-12-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "Fix bogus MSI IRQ setup warning on RISC-V" * tag 'irq-urgent-2024-12-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI/MSI: Handle lack of irqdomain gracefully
2024-12-29Merge tag 'for-6.13-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes that accumulated over the last two weeks, fixing some user reported problems: - swapfile fixes: - conditional reschedule in the activation loop - fix race with memory mapped file when activating - make activation loop interruptible - rework and fix extent sharing checks - folio fixes: - in send, recheck folio mapping after unlock - in relocation, recheck folio mapping after unlock - fix waiting for encoded read io_uring requests - fix transaction atomicity when enabling simple quotas - move COW block trace point before the block gets freed - print various sizes in sysfs with correct endianity" * tag 'for-6.13-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: sysfs: fix direct super block member reads btrfs: fix transaction atomicity bug when enabling simple quotas btrfs: avoid monopolizing a core when activating a swap file btrfs: allow swap activation to be interruptible btrfs: fix swap file activation failure due to extents that used to be shared btrfs: fix race with memory mapped writes when activating swap file btrfs: check folio mapping after unlock in put_file_data() btrfs: check folio mapping after unlock in relocate_one_folio() btrfs: fix use-after-free when COWing tree bock and tracing is enabled btrfs: fix use-after-free waiting for encoded read endios
2024-12-29Merge tag 'i2c-for-6.13-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - IMX: fix stop condition in single master mode and add compatible string for errata adherence - Microchip: Add support for proper repeated sends and fix unnecessary NAKs on empty messages, which caused false bus detection * tag 'i2c-for-6.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: microchip-core: fix "ghost" detections i2c: microchip-core: actually use repeated sends i2c: imx: add imx7d compatible string for applying erratum ERR007805 i2c: imx: fix missing stop condition in single-master mode
2024-12-29platform/x86: thinkpad-acpi: Add support for hotkey 0x1401Vishnu Sankar
F8 mode key on Lenovo 2025 platforms use a different key code. Adding support for the new keycode 0x1401. Tested on X1 Carbon Gen 13 and X1 2-in-1 Gen 10. Signed-off-by: Vishnu Sankar <vishnuocv@gmail.com> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://lore.kernel.org/r/20241227231840.21334-1-vishnuocv@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-12-29platform/x86: hp-wmi: mark 8A15 board for timed OMEN thermal profileMingcong Bai
The HP OMEN 8 (2022), corresponding to a board ID of 8A15, supports OMEN thermal profile and requires the timed profile quirk. Upon adding this ID to both the omen_thermal_profile_boards and omen_timed_thermal_profile_boards, significant bump in performance can be observed. For instance, SilverBench (https://silver.urih.com/) results improved from ~56,000 to ~69,000, as a result of higher power draws (and thus core frequencies) whilst under load: Package Power: - Before the patch: ~65W (dropping to about 55W under sustained load). - After the patch: ~115W (dropping to about 105W under sustained load). Core Power: - Before: ~60W (ditto above). - After: ~108W (ditto above). Add 8A15 to omen_thermal_profile_boards and omen_timed_thermal_profile_boards to improve performance. Signed-off-by: Xi Xiao <1577912515@qq.com> Signed-off-by: Mingcong Bai <jeffbai@aosc.io> Link: https://lore.kernel.org/r/20241226062207.3352629-1-jeffbai@aosc.io Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-12-29virt: tdx-guest: Just leak decrypted memory on unrecoverable errorsLi RongQing
In CoCo VMs it is possible for the untrusted host to cause set_memory_decrypted() to fail such that an error is returned and the resulting memory is shared. Callers need to take care to handle these errors to avoid returning decrypted (shared) memory to the page allocator, which could lead to functional or security issues. Leak the decrypted memory when set_memory_decrypted() fails, and don't need to print an error since set_memory_decrypted() will call WARN_ONCE(). Fixes: f4738f56d1dc ("virt: tdx-guest: Add Quote generation support using TSM_REPORTS") Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240619111801.25630-1-lirongqing%40baidu.com
2024-12-29x86/fred: Clear WFE in missing-ENDBRANCH #CPsXin Li (Intel)
An indirect branch instruction sets the CPU indirect branch tracker (IBT) into WAIT_FOR_ENDBRANCH (WFE) state and WFE stays asserted across the instruction boundary. When the decoder finds an inappropriate instruction while WFE is set ENDBR, the CPU raises a #CP fault. For the "kernel IBT no ENDBR" selftest where #CPs are deliberately triggered, the WFE state of the interrupted context needs to be cleared to let execution continue. Otherwise when the CPU resumes from the instruction that just caused the previous #CP, another missing-ENDBRANCH #CP is raised and the CPU enters a dead loop. This is not a problem with IDT because it doesn't preserve WFE and IRET doesn't set WFE. But FRED provides space on the entry stack (in an expanded CS area) to save and restore the WFE state, thus the WFE state is no longer clobbered, so software must clear it. Clear WFE to avoid dead looping in ibt_clear_fred_wfe() and the !ibt_fatal code path when execution is allowed to continue. Clobbering WFE in any other circumstance is a security-relevant bug. [ dhansen: changelog rewording ] Fixes: a5f6c2ace997 ("x86/shstk: Add user control-protection fault handler") Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241113175934.3897541-1-xin%40zytor.com
2024-12-29freezer, sched: Report frozen tasks as 'D' instead of 'R'Chen Ridong
Before commit: f5d39b020809 ("freezer,sched: Rewrite core freezer logic") the frozen task stat was reported as 'D' in cgroup v1. However, after rewriting the core freezer logic, the frozen task stat is reported as 'R'. This is confusing, especially when a task with stat of 'S' is frozen. This bug can be reproduced with these steps: $ cd /sys/fs/cgroup/freezer/ $ mkdir test $ sleep 1000 & [1] 739 // task whose stat is 'S' $ echo 739 > test/cgroup.procs $ echo FROZEN > test/freezer.state $ ps -aux | grep 739 root 739 0.1 0.0 8376 1812 pts/0 R 10:56 0:00 sleep 1000 As shown above, a task whose stat is 'S' was changed to 'R' when it was frozen. To solve this regression, simply maintain the same reported state as before the rewrite. [ mingo: Enhanced the changelog and comments ] Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic") Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Koutný <mkoutny@suse.com> Link: https://lore.kernel.org/r/20241217004818.3200515-1-chenridong@huaweicloud.com
2024-12-29objtool: Add bch2_trans_unlocked_error() to bcachefs noreturnschenchangcheng
Fix the following objtool warning during build time: fs/bcachefs/btree_trans_commit.o: warning: objtool: bch2_trans_commit_write_locked.isra.0() falls through to next function do_bch2_trans_commit.isra.0() fs/bcachefs/btree_trans_commit.o: warning: objtool: .text: unexpected end of section ...... fs/bcachefs/btree_update.o: warning: objtool: bch2_trans_update_get_key_cache() falls through to next function flush_new_cached_update() fs/bcachefs/btree_update.o: warning: objtool: flush_new_cached_update() falls through to next function bch2_trans_update_by_path() bch2_trans_unlocked_error() is an Obviously Correct (tm) panic() wrapper, add it to the list of known noreturns. [ mingo: Improved the changelog ] Fixes: fd104e2967b7 ("bcachefs: bch2_trans_verify_not_unlocked()") Signed-off-by: chenchangcheng <chenchangcheng@kylinos.cn> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20241220074847.3418134-1-ccc194101@163.com
2024-12-29ALSA: usb-audio: US16x08: Initialize array before useTanya Agarwal
Initialize meter_urb array before use in mixer_us16x08.c. CID 1410197: (#1 of 1): Uninitialized scalar variable (UNINIT) uninit_use_in_call: Using uninitialized value *meter_urb when calling get_meter_levels_from_urb. Coverity Link: https://scan7.scan.coverity.com/#/project-view/52849/11354?selectedIssue=1410197 Fixes: d2bb390a2081 ("ALSA: usb-audio: Tascam US-16x08 DSP mixer quirk") Signed-off-by: Tanya Agarwal <tanyaagarwal25699@gmail.com> Link: https://patch.msgid.link/20241229060240.1642-1-tanyaagarwal25699@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-12-28io_uring/rw: fix downgraded mshot readPavel Begunkov
The io-wq path can downgrade a multishot request to oneshot mode, however io_read_mshot() doesn't handle that and would still post multiple CQEs. That's not allowed, because io_req_post_cqe() requires stricter context requirements. The described can only happen with pollable files that don't support FMODE_NOWAIT, which is an odd combination, so if even allowed it should be fairly rare. Cc: stable@vger.kernel.org Reported-by: chase xd <sl1589472800@gmail.com> Fixes: bee1d5becdf5b ("io_uring: disable io-wq execution of multishot NOWAIT requests") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c5c8c4a50a882fd581257b81bf52eee260ac29fd.1735407848.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-28Merge tag 'block-6.13-20241228' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fix from Jens Axboe: "Just a single fix for ublk setup error handling" * tag 'block-6.13-20241228' of git://git.kernel.dk/linux: ublk: detach gendisk from ublk device if add_disk() fails
2024-12-28Merge tag 'io_uring-6.13-20241228' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fix from Jens Axboe: "Just a single fix for a theoretical issue with SQPOLL setup" * tag 'io_uring-6.13-20241228' of git://git.kernel.dk/linux: io_uring/sqpoll: fix sqpoll error handling races
2024-12-28Merge tag '6.13-rc4-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - fix caching of files that will be reused for write - minor cleanup * tag '6.13-rc4-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Remove unused is_server_using_iface() smb: enable reuse of deferred file handles for write operations
2024-12-28modpost: work around unaligned data access errorMasahiro Yamada
With the latest binutils, modpost fails with a bus error on some architectures such as ARM and sparc64. Since binutils commit 1f1b5e506bf0 ("bfd/ELF: restrict file alignment for object files"), the byte offset to each section (sh_offset) in relocatable ELF is no longer guaranteed to be aligned. modpost parses MODULE_DEVICE_TABLE() data structures, which are usually located in the .rodata section. If it is not properly aligned, unaligned access errors may occur. To address the issue, this commit imports the get_unaligned() helper from include/linux/unaligned.h. The get_unaligned_native() helper caters to the endianness in addition to handling the unaligned access. I slightly refactored do_pcmcia_entry() and do_input() to avoid writing back to an unaligned address. (We would need the put_unaligned() helper to do that.) The addend_*_rel() functions need similar adjustments because the .text sections are not aligned either. It seems that the .symtab, .rel.* and .rela.* sections are still aligned. Keep normal pointer access for these sections to avoid unnecessary performance costs. Reported-by: Paulo Pisati <paolo.pisati@canonical.com> Reported-by: Matthias Klose <doko@debian.org> Closes: https://sourceware.org/bugzilla/show_bug.cgi?id=32435 Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Closes: https://sourceware.org/bugzilla/show_bug.cgi?id=32493 Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2024-12-28modpost: refactor do_vmbus_entry()Masahiro Yamada
Optimize the size of guid_name[], as it only requires 1 additional byte for '\0' instead of 2. Simplify the loop by incrementing the iterator by 1 instead of 2. Remove the unnecessary TO_NATIVE() call, as the guid is represented as a byte stream. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2024-12-28modpost: fix the missed iteration for the max bit in do_input()Masahiro Yamada
This loop should iterate over the range from 'min' to 'max' inclusively. The last interation is missed. Fixes: 1d8f430c15b3 ("[PATCH] Input: add modalias support") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2024-12-28scripts/mksysmap: Fix escape chars '$'Mostafa Saleh
Commit b18b047002b7 ("kbuild: change scripts/mksysmap into sed script") changed the invocation of the script, to call sed directly without shell. That means, the current extra escape that was added in: commit ec336aa83162 ("scripts/mksysmap: Fix badly escaped '$'") for the shell is not correct any more, at the moment the stack traces for nvhe are corrupted: [ 22.840904] kvm [190]: [<ffff80008116dd54>] __kvm_nvhe_$x.220+0x58/0x9c [ 22.842913] kvm [190]: [<ffff8000811709bc>] __kvm_nvhe_$x.9+0x44/0x50 [ 22.844112] kvm [190]: [<ffff80008116f8fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4 With this patch: [ 25.793513] kvm [192]: nVHE call trace: [ 25.794141] kvm [192]: [<ffff80008116dd54>] __kvm_nvhe_hyp_panic+0xb0/0xf4 [ 25.796590] kvm [192]: [<ffff8000811709bc>] __kvm_nvhe_handle_trap+0xe4/0x188 [ 25.797553] kvm [192]: [<ffff80008116f8fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4 Fixes: b18b047002b7 ("kbuild: change scripts/mksysmap into sed script") Signed-off-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-12-27Merge tag 'trace-tools-v6.13-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing tool fix from Steven Rostedt: - Fix rtla divide by zero when the count is zero in histograms * tag 'trace-tools-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rtla/timerlat: Fix histogram ALL for zero samples