summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-06-08tools/power turbostat: version 2025.06.08Len Brown
Add initial DMR support, which required smarter RAPL probe Fix AMD MSR RAPL energy reporting Add RAPL power limit configuration output Minor fixes Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Add initial support for BartlettLakeZhang Rui
Add initial support for BartlettLake. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Add initial support for DMRZhang Rui
Add initial support for DMR. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Dump RAPL sysfs infoZhang Rui
for example: intel-rapl:1: psys 28.0s:100W 976.0us:100W intel-rapl:0: package-0 28.0s:57W,max:15W 2.4ms:57W intel-rapl:0/intel-rapl:0:0: core disabled intel-rapl:0/intel-rapl:0:1: uncore disabled intel-rapl-mmio:0: package-0 28.0s:28W,max:15W 2.4ms:57W [lenb: simplified format] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> squish me Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Avoid probing the same perf countersZhang Rui
For the RAPL package energy status counter, Intel and AMD share the same perf_subsys and perf_name, but with different MSR addresses. Both rapl_counter_arch_infos[0] and rapl_counter_arch_infos[1] are introduced to describe this counter for different Vendors. As a result, the perf counter is probed twice, and causes a failure in in get_rapl_counters() because expected_read_size and actual_read_size don't match. Fix the problem by skipping the already probed counter. Note, this is not a perfect fix. For example, if different vendors/platforms use the same MSR value for different purpose, the code can be fooled when it probes a rapl_counter_arch_infos[] entry that does not belong to the running Vendor/Platform. In a long run, better to put rapl_counter_arch_infos[] into the platform_features so that this becomes Vendor/Platform specific. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs ↵Zhang Rui
cleared platform_features->rapl_msrs describes the RAPL MSRs supported. While RAPL Perf counters can be exposed from different kernel backend drivers, e.g. RAPL MSR I/F driver, or RAPL TPMI I/F driver. Thus, turbostat should first blindly probe all the available RAPL Perf counters, and falls back to the RAPL MSR counters if they are listed in platform_features->rapl_msrs. With this, platforms that don't have RAPL MSRs can clear the platform_features->rapl_msrs bits and use RAPL Perf counters only. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Clean up add perf/msr counter logicZhang Rui
Increase the code readability by moving the no_perf/no_msr flag and the cai->perf_name/cai->msr sanity checks into the counter probe functions. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Introduce add_msr_counter()Zhang Rui
probe_rapl_msr() is reused for probing RAPL MSR counters, cstate MSR counters and MPERF/APERF/SMI MSR counters, thus its name is misleading. Similar to add_perf_counter(), introduce add_msr_counter() to probe a counter via MSR. Introduce wrapper function add_rapl_msr_counter() at the same time to add extra check for Zero return value for specified RAPL counters. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Remove add_msr_perf_counter_()Zhang Rui
As the only caller of add_msr_perf_counter_(), add_msr_perf_counter() just gives extra debug output on top. There is no need to keep both functions. Remove add_msr_perf_counter_() and move all the logic to add_msr_perf_counter(). No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Remove add_cstate_perf_counter_()Zhang Rui
As the only caller of add_cstate_perf_counter_(), add_cstate_perf_counter() just gives extra debug output on top. There is no need to keep both functions. Remove add_cstate_perf_counter_() and move all the logic to add_cstate_perf_counter(). No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Remove add_rapl_perf_counter_()Zhang Rui
As the only caller of add_rapl_perf_counter_(), add_rapl_perf_counter() just gives extra debug output on top. There is no need to keep both functions. Remove add_rapl_perf_counter_() and move all the logic to add_rapl_perf_counter(). No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Quit early for unsupported RAPL countersZhang Rui
Quit early for unsupported RAPL counters. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Always check rapl_joules flagZhang Rui
rapl_joules bit should always be checked even if platform_features->rapl_msrs is not set or no_msr flag is used. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Fix AMD package-energy reportingGautham R. Shenoy
commit 05a2f07db888 ("tools/power turbostat: read RAPL counters via perf") that adds support to read RAPL counters via perf defines the notion of a RAPL domain_id which is set to physical_core_id on platforms which support per_core_rapl counters (Eg: AMD processors Family 17h onwards) and is set to the physical_package_id on all the other platforms. However, the physical_core_id is only unique within a package and on platforms with multiple packages more than one core can have the same physical_core_id and thus the same domain_id. (For eg, the first cores of each package have the physical_core_id = 0). This results in all these cores with the same physical_core_id using the same entry in the rapl_counter_info_perdomain[]. Since rapl_perf_init() skips the perf-initialization for cores whose domain_ids have already been visited, cores that have the same physical_core_id always read the perf file corresponding to the physical_core_id of the first package and thus the package-energy is incorrectly reported to be the same value for different packages. Note: This issue only arises when RAPL counters are read via perf and not when they are read via MSRs since in the latter case the MSRs are read separately on each core. Fix this issue by associating each CPU with rapl_core_id which is unique across all the packages in the system. Fixes: 05a2f07db888 ("tools/power turbostat: read RAPL counters via perf") Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Fix RAPL_GFX_ALL typoKaushlendra Kumar
Fix typo in the currently unused RAPL_GFX_ALL macro definition. Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Add Android support for MSR device handlingKaushlendra Kumar
It uses /dev/msrN device paths on Android instead of /dev/cpu/N/msr, updates error messages and permission checks to reflect the Android device path, and wraps platform-specific code with #if defined(ANDROID) to ensure correct behavior on both Android and non-Android systems. These changes improve compatibility and usability of turbostat on Android devices. Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat.8: pm_domain wording fixLen Brown
turbostat.8: clarify that uncore "domains" are Power Management domains, aka pm_domains. Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat.8: fix typo: idle_pct should be pct_idleLen Brown
idle_pct should be pct_idle Signed-off-by: Len Brown <len.brown@intel.com>
2025-05-25Linux 6.15v6.15Linus Torvalds
2025-05-25Disable FOP_DONTCACHE for now due to bugsLinus Torvalds
This is kind of last-minute, but Al Viro reported that the new FOP_DONTCACHE flag causes memory corruption due to use-after-free issues. This was triggered by commit 974c5e6139db ("xfs: flag as supporting FOP_DONTCACHE"), but that is not the underlying bug - it is just the first user of the flag. Vlastimil Babka suspects the underlying problem stems from the folio_end_writeback() logic introduced in commit fb7d3bc414939 ("mm/filemap: drop streaming/uncached pages when writeback completes"). The most straightforward fix would be to just revert the commit that exposed this, but Matthew Wilcox points out that other filesystems are also starting to enable the FOP_DONTCACHE logic, so this instead disables that bit globally for now. The fix will hopefully end up being trivial and we can just re-enable this logic after more testing, but until such a time we'll have to disable the new FOP_DONTCACHE flag. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/all/20250525083209.GS2023217@ZenIV/ Triggered-by: 974c5e6139db ("xfs: flag as supporting FOP_DONTCACHE") Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-05-25Merge tag 'mm-hotfixes-stable-2025-05-25-00-58' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "22 hotfixes. 13 are cc:stable and the remainder address post-6.14 issues or aren't considered necessary for -stable kernels. 19 are for MM" * tag 'mm-hotfixes-stable-2025-05-25-00-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) mailmap: add Jarkko's employer email address mm: fix copy_vma() error handling for hugetlb mappings memcg: always call cond_resched() after fn() mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios mm: vmalloc: only zero-init on vrealloc shrink mm: vmalloc: actually use the in-place vrealloc region alloc_tag: allocate percpu counters for module tags dynamically module: release codetag section when module load fails mm/cma: make detection of highmem_start more robust MAINTAINERS: add mm memory policy section MAINTAINERS: add mm ksm section kasan: avoid sleepable page allocation from atomic context highmem: add folio_test_partial_kmap() MAINTAINERS: add hung-task detector section taskstats: fix struct taskstats breaks backward compatibility since version 15 mm/truncate: fix out-of-bounds when doing a right-aligned split MAINTAINERS: add mm reclaim section MAINTAINERS: update page allocator section mm: fix VM_UFFD_MINOR == VM_SHADOW_STACK on USERFAULTFD=y && ARM64_GCS=y mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled ...
2025-05-25mailmap: add Jarkko's employer email addressJarkko Sakkinen
Add the current employer email address to mailmap. Link: https://lkml.kernel.org/r/20250523121105.15850-1-jarkko@kernel.org Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Cc: Alexander Sverdlin <alexander.sverdlin@gmail.com> Cc: Antonio Quartulli <antonio@openvpn.net> Cc: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm: fix copy_vma() error handling for hugetlb mappingsRicardo Cañuelo Navarro
If, during a mremap() operation for a hugetlb-backed memory mapping, copy_vma() fails after the source vma has been duplicated and opened (ie. vma_link() fails), the error is handled by closing the new vma. This updates the hugetlbfs reservation counter of the reservation map which at this point is referenced by both the source vma and the new copy. As a result, once the new vma has been freed and copy_vma() returns, the reservation counter for the source vma will be incorrect. This patch addresses this corner case by clearing the hugetlb private page reservation reference for the new vma and decrementing the reference before closing the vma, so that vma_close() won't update the reservation counter. This is also what copy_vma_and_data() does with the source vma if copy_vma() succeeds, so a helper function has been added to do the fixup in both functions. The issue was reported by a private syzbot instance and can be reproduced using the C reproducer in [1]. It's also a possible duplicate of public syzbot report [2]. The WARNING report is: ============================================================ page_counter underflow: -1024 nr_pages=1024 WARNING: CPU: 0 PID: 3287 at mm/page_counter.c:61 page_counter_cancel+0xf6/0x120 Modules linked in: CPU: 0 UID: 0 PID: 3287 Comm: repro__WARNING_ Not tainted 6.15.0-rc7+ #54 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014 RIP: 0010:page_counter_cancel+0xf6/0x120 Code: ff 5b 41 5e 41 5f 5d c3 cc cc cc cc e8 f3 4f 8f ff c6 05 64 01 27 06 01 48 c7 c7 60 15 f8 85 48 89 de 4c 89 fa e8 2a a7 51 ff <0f> 0b e9 66 ff ff ff 44 89 f9 80 e1 07 38 c1 7c 9d 4c 81 RSP: 0018:ffffc900025df6a0 EFLAGS: 00010246 RAX: 2edfc409ebb44e00 RBX: fffffffffffffc00 RCX: ffff8880155f0000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff81c4a23c R09: 1ffff1100330482a R10: dffffc0000000000 R11: ffffed100330482b R12: 0000000000000000 R13: ffff888058a882c0 R14: ffff888058a882c0 R15: 0000000000000400 FS: 0000000000000000(0000) GS:ffff88808fc53000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004b33e0 CR3: 00000000076d6000 CR4: 00000000000006f0 Call Trace: <TASK> page_counter_uncharge+0x33/0x80 hugetlb_cgroup_uncharge_counter+0xcb/0x120 hugetlb_vm_op_close+0x579/0x960 ? __pfx_hugetlb_vm_op_close+0x10/0x10 remove_vma+0x88/0x130 exit_mmap+0x71e/0xe00 ? __pfx_exit_mmap+0x10/0x10 ? __mutex_unlock_slowpath+0x22e/0x7f0 ? __pfx_exit_aio+0x10/0x10 ? __up_read+0x256/0x690 ? uprobe_clear_state+0x274/0x290 ? mm_update_next_owner+0xa9/0x810 __mmput+0xc9/0x370 exit_mm+0x203/0x2f0 ? __pfx_exit_mm+0x10/0x10 ? taskstats_exit+0x32b/0xa60 do_exit+0x921/0x2740 ? do_raw_spin_lock+0x155/0x3b0 ? __pfx_do_exit+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? _raw_spin_lock_irq+0xc5/0x100 do_group_exit+0x20c/0x2c0 get_signal+0x168c/0x1720 ? __pfx_get_signal+0x10/0x10 ? schedule+0x165/0x360 arch_do_signal_or_restart+0x8e/0x7d0 ? __pfx_arch_do_signal_or_restart+0x10/0x10 ? __pfx___se_sys_futex+0x10/0x10 syscall_exit_to_user_mode+0xb8/0x2c0 do_syscall_64+0x75/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x422dcd Code: Unable to access opcode bytes at 0x422da3. RSP: 002b:00007ff266cdb208 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: 0000000000000001 RBX: 00007ff266cdbcdc RCX: 0000000000422dcd RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000004c7bec RBP: 00007ff266cdb220 R08: 203a6362696c6720 R09: 203a6362696c6720 R10: 0000200000c00000 R11: 0000000000000246 R12: ffffffffffffffd0 R13: 0000000000000002 R14: 00007ffe1cb5f520 R15: 00007ff266cbb000 </TASK> ============================================================ Link: https://lkml.kernel.org/r/20250523-warning_in_page_counter_cancel-v2-1-b6df1a8cfefd@igalia.com Link: https://people.igalia.com/rcn/kernel_logs/20250422__WARNING_in_page_counter_cancel__repro.c [1] Link: https://lore.kernel.org/all/67000a50.050a0220.49194.048d.GAE@google.com/ [2] Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Florent Revest <revest@google.com> Cc: Jann Horn <jannh@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25memcg: always call cond_resched() after fn()Breno Leitao
I am seeing soft lockup on certain machine types when a cgroup OOMs. This is happening because killing the process in certain machine might be very slow, which causes the soft lockup and RCU stalls. This happens usually when the cgroup has MANY processes and memory.oom.group is set. Example I am seeing in real production: [462012.244552] Memory cgroup out of memory: Killed process 3370438 (crosvm) .... .... [462037.318059] Memory cgroup out of memory: Killed process 4171372 (adb) .... [462037.348314] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [stat_manager-ag:1618982] .... Quick look at why this is so slow, it seems to be related to serial flush for certain machine types. For all the crashes I saw, the target CPU was at console_flush_all(). In the case above, there are thousands of processes in the cgroup, and it is soft locking up before it reaches the 1024 limit in the code (which would call the cond_resched()). So, cond_resched() in 1024 blocks is not sufficient. Remove the counter-based conditional rescheduling logic and call cond_resched() unconditionally after each task iteration, after fn() is called. This avoids the lockup independently of how slow fn() is. Link: https://lkml.kernel.org/r/20250523-memcg_fix-v1-1-ad3eafb60477@debian.org Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process") Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Rik van Riel <riel@surriel.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Michael van der Westhuizen <rmikey@meta.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Chen Ridong <chenridong@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb ↵Ge Yang
folios A kernel crash was observed when replacing free hugetlb folios: BUG: kernel NULL pointer dereference, address: 0000000000000028 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary) RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 Call Trace: <TASK> replace_free_hugepage_folios+0xb6/0x100 alloc_contig_range_noprof+0x18a/0x590 ? srso_return_thunk+0x5/0x5f ? down_read+0x12/0xa0 ? srso_return_thunk+0x5/0x5f cma_range_alloc.constprop.0+0x131/0x290 __cma_alloc+0xcf/0x2c0 cma_alloc_write+0x43/0xb0 simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 debugfs_attr_write+0x46/0x70 full_proxy_write+0x62/0xa0 vfs_write+0xf8/0x420 ? srso_return_thunk+0x5/0x5f ? filp_flush+0x86/0xa0 ? srso_return_thunk+0x5/0x5f ? filp_close+0x1f/0x30 ? srso_return_thunk+0x5/0x5f ? do_dup2+0xaf/0x160 ? srso_return_thunk+0x5/0x5f ksys_write+0x65/0xe0 do_syscall_64+0x64/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e There is a potential race between __update_and_free_hugetlb_folio() and replace_free_hugepage_folios(): CPU1 CPU2 __update_and_free_hugetlb_folio replace_free_hugepage_folios folio_test_hugetlb(folio) -- It's still hugetlb folio. __folio_clear_hugetlb(folio) hugetlb_free_folio(folio) h = folio_hstate(folio) -- Here, h is NULL pointer When the above race condition occurs, folio_hstate(folio) returns NULL, and subsequent access to this NULL pointer will cause the system to crash. To resolve this issue, execute folio_hstate(folio) under the protection of the hugetlb_lock lock, ensuring that folio_hstate(folio) does not return NULL. Link: https://lkml.kernel.org/r/1747884137-26685-1-git-send-email-yangge1116@126.com Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration") Signed-off-by: Ge Yang <yangge1116@126.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm: vmalloc: only zero-init on vrealloc shrinkKees Cook
The common case is to grow reallocations, and since init_on_alloc will have already zeroed the whole allocation, we only need to zero when shrinking the allocation. Link: https://lkml.kernel.org/r/20250515214217.619685-2-kees@kernel.org Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing") Signed-off-by: Kees Cook <kees@kernel.org> Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: Shung-Hsi Yu <shung-hsi.yu@suse.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm: vmalloc: actually use the in-place vrealloc regionKees Cook
Patch series "mm: vmalloc: Actually use the in-place vrealloc region". This fixes a performance regression[1] with vrealloc()[1]. The refactoring to not build a new vmalloc region only actually worked when shrinking. Actually return the resized area when it grows. Ugh. Link: https://lkml.kernel.org/r/20250515214217.619685-1-kees@kernel.org Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing") Signed-off-by: Kees Cook <kees@kernel.org> Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Closes: https://lore.kernel.org/all/20250515-bpf-verifier-slowdown-vwo2meju4cgp2su5ckj@6gi6ssxbnfqg [1] Tested-by: Eduard Zingerman <eddyz87@gmail.com> Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25alloc_tag: allocate percpu counters for module tags dynamicallySuren Baghdasaryan
When a module gets unloaded it checks whether any of its tags are still in use and if so, we keep the memory containing module's allocation tags alive until all tags are unused. However percpu counters referenced by the tags are freed by free_module(). This will lead to UAF if the memory allocated by a module is accessed after module was unloaded. To fix this we allocate percpu counters for module allocation tags dynamically and we keep it alive for tags which are still in use after module unloading. This also removes the requirement of a larger PERCPU_MODULE_RESERVE when memory allocation profiling is enabled because percpu memory for counters does not need to be reserved anymore. Link: https://lkml.kernel.org/r/20250517000739.5930-1-surenb@google.com Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: David Wang <00107082@163.com> Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/ Tested-by: David Wang <00107082@163.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25module: release codetag section when module load failsDavid Wang
When module load fails after memory for codetag section is ready, codetag section memory will not be properly released. This causes memory leak, and if next module load happens to get the same module address, codetag may pick the uninitialized section when manipulating tags during module unload, and leads to "unable to handle page fault" BUG. Link: https://lkml.kernel.org/r/20250519163823.7540-1-00107082@163.com Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory") Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/ Signed-off-by: David Wang <00107082@163.com> Acked-by: Suren Baghdasaryan <surenb@google.com> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm/cma: make detection of highmem_start more robustMike Rapoport (Microsoft)
Pratyush Yadav reports the following crash: ------------[ cut here ]------------ kernel BUG at arch/x86/mm/physaddr.c:23! ception 0x06 IP 10:ffffffff812ebbf8 error 0 cr2 0xffff88903ffff000 CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc6+ #231 PREEMPT(undef) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:__phys_addr+0x58/0x60 Code: 01 48 89 c2 48 d3 ea 48 85 d2 75 05 e9 91 52 cf 00 0f 0b 48 3d ff ff ff 1f 77 0f 48 8b 05 20 54 55 01 48 01 d0 e9 78 52 cf 00 <0f> 0b 90 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 0000:ffffffff82803dd8 EFLAGS: 00010006 ORIG_RAX: 0000000000000000 RAX: 000000007fffffff RBX: 00000000ffffffff RCX: 0000000000000000 RDX: 000000007fffffff RSI: 0000000280000000 RDI: ffffffffffffffff RBP: ffffffff82803e68 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff83153180 R11: ffffffff82803e48 R12: ffffffff83c9aed0 R13: 0000000000000000 R14: 0000001040000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88903ffff000 CR3: 0000000002838000 CR4: 00000000000000b0 Call Trace: <TASK> ? __cma_declare_contiguous_nid+0x6e/0x340 ? cma_declare_contiguous_nid+0x33/0x70 ? dma_contiguous_reserve_area+0x2f/0x70 ? setup_arch+0x6f1/0x870 ? start_kernel+0x52/0x4b0 ? x86_64_start_reservations+0x29/0x30 ? x86_64_start_kernel+0x7c/0x80 ? common_startup_64+0x13e/0x141 The reason is that __cma_declare_contiguous_nid() does: highmem_start = __pa(high_memory - 1) + 1; If dma_contiguous_reserve_area() (or any other CMA declaration) is called before free_area_init(), high_memory is uninitialized. Without CONFIG_DEBUG_VIRTUAL, it will likely work but use the wrong value for highmem_start. The issue occurs because commit e120d1bc12da ("arch, mm: set high_memory in free_area_init()") moved initialization of high_memory after the call to dma_contiguous_reserve() -> __cma_declare_contiguous_nid() on several architectures. In the case CONFIG_HIGHMEM is enabled, some architectures that actually support HIGHMEM (arm, powerpc and x86) have initialization of high_memory before a possible call to __cma_declare_contiguous_nid() and some initialized high_memory late anyway (arc, csky, microblase, mips, sparc, xtensa) even before the commit e120d1bc12da so they are fine with using uninitialized value of high_memory. And in the case CONFIG_HIGHMEM is disabled high_memory essentially becomes the first address after memory end, so instead of relying on high_memory to calculate highmem_start use memblock_end_of_DRAM() and eliminate the dependency of CMA area creation on high_memory in majority of configurations. Link: https://lkml.kernel.org/r/20250519171805.1288393-1-rppt@kernel.org Fixes: e120d1bc12da ("arch, mm: set high_memory in free_area_init()") Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reported-by: Pratyush Yadav <ptyadav@amazon.de> Tested-by: Pratyush Yadav <ptyadav@amazon.de> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-24Merge tag 'input-for-v6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - even more Xbox controllers added to xpad driver: Turtle Beach Recon Wired Controller, Turtle Beach Stealth Ultra, and PowerA Wired Controller - a fix to Synaptics RMI driver to not crash if controller reports unsupported version of F34 (firmware flash) function * tag 'input-for-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: synaptics-rmi - fix crash with unsupported versions of F34 Input: xpad - add more controllers
2025-05-24Merge tag 'spi-fix-v6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few final fixes for v6.15, some driver fixes for the Freescale DSPI driver pulled over from their vendor code and another instance of the fixes Greg has been sending throughout the kernel for constification of the bus_type in driver core match() functions" * tag 'spi-fix-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-fsl-dspi: Reset SR flags before sending a new message spi: spi-fsl-dspi: Halt the module after a new message transfer spi: spi-fsl-dspi: restrict register range for regmap access spi: use container_of_cont() for to_spi_device()
2025-05-24Merge tag 'iommu-fixes-v6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fix from Joerg Roedel: - core: skip PASID validation for devices without PASID support * tag 'iommu-fixes-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu: Skip PASID validation for devices without PASID capability
2025-05-23Merge tag 'drm-fixes-2025-05-24' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly drm fixes pull, on target to be quiet, just one amdgpu, one edid and a few minor xe fixes. edid: - fix HDR metadata reset amdgpu: - Hibernate fix xe: - Make sure to check all forcewakes when dumping mocs - Fix wrong use of read64 on 32b register - Synchronize Panther Lake PCI IDs" * tag 'drm-fixes-2025-05-24' of https://gitlab.freedesktop.org/drm/kernel: drm/xe/ptl: Update the PTL pci id table drm/xe: Use xe_mmio_read32() to read mtcfg register drm/xe/mocs: Check if all domains awake Revert "drm/amd: Keep display off while going into S4" drm/edid: fixed the bug that hdr metadata was not reset
2025-05-24Merge tag 'drm-xe-fixes-2025-05-23' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Make sure to check all forcewakes when dumping mocs - Fix wrong use of read64 on 32b register - Synchronize Panther Lake PCI IDs Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/uixp5cq7emz32lmwwvq4vbujppugfozhyj3cm2aqzx4lcg7ivn@m2khvf4kvz5p
2025-05-24Merge tag 'amd-drm-fixes-6.15-2025-05-22' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.15-2025-05-22: amdgpu: - Hibernate fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250522183941.9606-1-alexander.deucher@amd.com
2025-05-24Merge tag 'drm-misc-fixes-2025-05-22' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: edid: - fix HDR metadata reset Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250522113902.GA7000@localhost.localdomain
2025-05-23Merge tag 'thermal-6.15-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "This fixes a coding mistake in the x86_pkg_temp_thermal Intel thermal driver that was introduced by an incorrect conflict resolution during a merge (Zhang Rui)" * tag 'thermal-6.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel: x86_pkg_temp_thermal: Fix bogus trip temperature
2025-05-23Merge tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Fix for rename regression due to the recent VFS lookup changes - Fix write failure - locking fix for oplock handling * tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: use list_first_entry_or_null for opinfo_get_list() ksmbd: fix rename failure ksmbd: fix stream write failure
2025-05-23Merge tag 'soc-fixes-6.15-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "A few last minute fixes: - two driver fixes for samsung/google platforms, both addressing mistakes in changes from the 6.15 merge window - a revert for an allwinner devicetree change that caused problems - a fix for an older regression with the LEDs on Marvell Armada 3720 - a defconfig change to enable chacha20 again after a crypto subsystem change in 6.15 inadventently turned it off" * tag 'soc-fixes-6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: defconfig: Ensure CRYPTO_CHACHA20_NEON is selected arm64: dts: marvell: uDPU: define pinctrl state for alarm LEDs Revert "arm64: dts: allwinner: h6: Use RSB for AXP805 PMIC connection" soc: samsung: usi: prevent wrong bits inversion during unconfiguring firmware: exynos-acpm: check saved RX before bailing out on empty RX queue
2025-05-23Merge tag 'platform-drivers-x86-v6.15-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers fixes from Ilpo Järvinen: - dell-wmi-sysman: Avoid buffer overflow in current_password_store() - fujitsu-laptop: Support Lifebook S2110 hotkeys - intel/pmc: Fix Arrow Lake U/H NPU PCI ID - think-lmi: Fix attribute name usage for non-compliant items - thinkpad_acpi: Ignore battery threshold change event notification * tag 'platform-drivers-x86-v6.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/intel/pmc: Fix Arrow Lake U/H NPU PCI ID platform/x86: think-lmi: Fix attribute name usage for non-compliant items platform/x86: thinkpad_acpi: Ignore battery threshold change event notification platform/x86: dell-wmi-sysman: Avoid buffer overflow in current_password_store() platform/x86: fujitsu-laptop: Support Lifebook S2110 hotkeys
2025-05-23Merge tag 'vfs-6.15-rc8.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains a small set of fixes for the blocking buffer lookup conversion done earlier this cycle. It adds a missing conversion in the getblk slowpath and a few minor optimizations and cleanups" * tag 'vfs-6.15-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/buffer: optimize discard_buffer() fs/buffer: remove superfluous statements fs/buffer: avoid redundant lookup in getblk slowpath fs/buffer: use sleeping lookup in __getblk_slowpath()
2025-05-23platform/x86/intel/pmc: Fix Arrow Lake U/H NPU PCI IDTodd Brandt
The ARL requires that the GMA and NPU devices both be in D3Hot in order for PC10 and S0iX to be achieved in S2idle. The original ARL-H/U addition to the intel_pmc_core driver attempted to do this by switching them to D3 in the init and resume calls of the intel_pmc_core driver. The problem is the ARL-H/U have a different NPU device and thus are not being properly set and thus S0iX does not work properly in ARL-H/U. This patch creates a new ARL-H specific device id that is correct and also adds the D3 fixup to the suspend callback. This way if the PCI devies drop from D3 to D0 after resume they can be corrected for the next suspend. Thus there is no dropout in S0iX. Fixes: bd820906ea9d ("platform/x86/intel/pmc: Add Arrow Lake U/H support to intel_pmc_core driver") Signed-off-by: Todd Brandt <todd.e.brandt@intel.com> Link: https://lore.kernel.org/r/a61f78be45c13f39e122dcc684b636f4b21e79a0.1747737446.git.todd.e.brandt@intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-22drm/xe/ptl: Update the PTL pci id tableMatt Atwood
Update to current bspec table. Bspec: 72574 Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com> Link: https://lore.kernel.org/r/20250520195749.371748-1-matthew.s.atwood@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 49c6dc74b5968885f421f9f1b45eb4890b955870) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-22drm/xe: Use xe_mmio_read32() to read mtcfg registerShuicheng Lin
The mtcfg register is a 32-bit register and should therefore be accessed using xe_mmio_read32(). Other 3 changes per codestyle suggestion: " xe_mmio.c:83: CHECK: Alignment should match open parenthesis xe_mmio.c:131: CHECK: Comparison to NULL could be written "!xe->mmio.regs" xe_mmio.c:315: CHECK: line length of 103 exceeds 100 columns " Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250513153010.3464767-1-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit d2662cf8f44a68deb6c76ad9f1d9f29dbf7ba601) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-22drm/xe/mocs: Check if all domains awakeTejas Upadhyay
Check if all domains are awake specially for LNCF regs Fixes: 298661cd9cea ("drm/xe: Fix MOCS debugfs LNCF readout") Improvements-suggested-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250506142300.1865783-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> (cherry picked from commit a383cf218ef8bb35d4c03958bd956573b65cf778) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-22Merge tag 'bcachefs-2025-05-22' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Small stuff, main ones users will be interested in: - Couple more casefolding fixes; we can now detect and repair casefolded dirents in non-casefolded dir and vice versa - Fix for massive write inflation with mmapped io, which hit certain databases" * tag 'bcachefs-2025-05-22' of git://evilpiepirate.org/bcachefs: bcachefs: Check for casefolded dirents in non casefolded dirs bcachefs: Fix bch2_dirent_create_snapshot() for casefolding bcachefs: Fix casefold opt via xattr interface bcachefs: mkwrite() now only dirties one page bcachefs: fix extent_has_stripe_ptr() bcachefs: Fix bch2_btree_path_traverse_cached() when paths realloced
2025-05-22Merge tag 'pmdomain-v6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull pmdomain fixes from Ulf Hansson: "Core: - Fix error checking in genpd_dev_pm_attach_by_id() Providers: - renesas: Remove obsolete nullify checks for rcar domains" * tag 'pmdomain-v6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: pmdomain: core: Fix error checking in genpd_dev_pm_attach_by_id() pmdomain: renesas: rcar: Remove obsolete nullify checks
2025-05-22Merge tag 'mmc-v6.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC host fixes from Ulf Hansson: - sdhci_am654: Fix MMC init failures on am62x boards - sdhci-of-dwcmshc: Add PD workaround on RK3576 to avoid hang * tag 'mmc-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci_am654: Add SDHCI_QUIRK2_SUPPRESS_V1P8_ENA quirk to am62 compatible mmc: sdhci-of-dwcmshc: add PD workaround on RK3576
2025-05-22Merge tag 'block-6.15-20250522' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fix for a regression with setting up loop on a file system without ->write_iter() - Fix for an nvme sysfs regression * tag 'block-6.15-20250522' of git://git.kernel.dk/linux: nvme: avoid creating multipath sysfs group under namespace path devices loop: don't require ->write_iter for writable files in loop_configure