summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-17kasan: don't call find_vm_area() in a PREEMPT_RT kernelWaiman Long
The following bug report was found when running a PREEMPT_RT debug kernel. BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 140605, name: kunit_try_catch preempt_count: 1, expected: 0 Call trace: rt_spin_lock+0x70/0x140 find_vmap_area+0x84/0x168 find_vm_area+0x1c/0x50 print_address_description.constprop.0+0x2a0/0x320 print_report+0x108/0x1f8 kasan_report+0x90/0xc8 Since commit e30a0361b851 ("kasan: make report_lock a raw spinlock"), report_lock was changed to raw_spinlock_t to fix another similar PREEMPT_RT problem. That alone isn't enough to cover other corner cases. print_address_description() is always invoked under the report_lock. The context under this lock is always atomic even on PREEMPT_RT. find_vm_area() acquires vmap_node::busy.lock which is a spinlock_t, becoming a sleeping lock on PREEMPT_RT and must not be acquired in atomic context. Don't invoke find_vm_area() on PREEMPT_RT and just print the address. Non-PREEMPT_RT builds remain unchanged. Add a DEFINE_WAIT_OVERRIDE_MAP() macro to tell lockdep that this lock nesting is allowed because the PREEMPT_RT part (which is invalid) has been taken care of. This macro was first introduced in commit 0cce06ba859a ("debugobjects,locking: Annotate debug_object_fill_pool() wait type violation"). Link: https://lkml.kernel.org/r/20250217204402.60533-1-longman@redhat.com Fixes: e30a0361b851 ("kasan: make report_lock a raw spinlock") Signed-off-by: Waiman Long <longman@redhat.com> Suggested-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mariano Pache <npache@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17MAINTAINERS: update Nick's contact infoNick Desaulniers
Updated .mailmap, but forgot these other places. Link: https://lkml.kernel.org/r/20250212173523.3979840-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17selftests/mm: fix check for running THP testsMark Brown
When testing if we should try to compact memory or drop caches before we run the THP or HugeTLB tests we use | as an or operator. This doesn't work since run_vmtests.sh is written in shell where this is used to pipe the output of the first argument into the second. Instead use the shell's -o operator. Link: https://lkml.kernel.org/r/20250212-kselftest-mm-no-hugepages-v1-1-44702f538522@kernel.org Fixes: b433ffa8dbac ("selftests: mm: perform some system cleanup before using hugepages") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Nico Pache <npache@redhat.com> Cc: Mariano Pache <npache@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17mm: hugetlb: avoid fallback for specific node allocation of 1G pagesLuiz Capitulino
When using the HugeTLB kernel command-line to allocate 1G pages from a specific node, such as: default_hugepagesz=1G hugepages=1:1 If node 1 happens to not have enough memory for the requested number of 1G pages, the allocation falls back to other nodes. A quick way to reproduce this is by creating a KVM guest with a memory-less node and trying to allocate 1 1G page from it. Instead of failing, the allocation will fallback to other nodes. This defeats the purpose of node specific allocation. Also, specific node allocation for 2M pages don't have this behavior: the allocation will just fail for the pages it can't satisfy. This issue happens because HugeTLB calls memblock_alloc_try_nid_raw() for 1G boot-time allocation as this function falls back to other nodes if the allocation can't be satisfied. Use memblock_alloc_exact_nid_raw() instead, which ensures that the allocation will only be satisfied from the specified node. Link: https://lkml.kernel.org/r/20250211034856.629371-1-luizcap@redhat.com Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation") Signed-off-by: Luiz Capitulino <luizcap@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Zhenguo Yao <yaozhenguo1@gmail.com> Cc: Frank van der Linden <fvdl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17memcg: avoid dead loop when setting memory.maxChen Ridong
A softlockup issue was found with stress test: watchdog: BUG: soft lockup - CPU#27 stuck for 26s! [migration/27:181] CPU: 27 UID: 0 PID: 181 Comm: migration/27 6.14.0-rc2-next-20250210 #1 Stopper: multi_cpu_stop <- stop_machine_from_inactive_cpu RIP: 0010:stop_machine_yield+0x2/0x10 RSP: 0000:ff4a0dcecd19be48 EFLAGS: 00000246 RAX: ffffffff89c0108f RBX: ff4a0dcec03afe44 RCX: 0000000000000000 RDX: ff1cdaaf6eba5808 RSI: 0000000000000282 RDI: ff1cda80c1775a40 RBP: 0000000000000001 R08: 00000011620096c6 R09: 7fffffffffffffff R10: 0000000000000001 R11: 0000000000000100 R12: ff1cda80c1775a40 R13: 0000000000000000 R14: 0000000000000001 R15: ff4a0dcec03afe20 FS: 0000000000000000(0000) GS:ff1cdaaf6eb80000(0000) CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000025e2c2a001 CR4: 0000000000773ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: multi_cpu_stop+0x8f/0x100 cpu_stopper_thread+0x90/0x140 smpboot_thread_fn+0xad/0x150 kthread+0xc2/0x100 ret_from_fork+0x2d/0x50 The stress test involves CPU hotplug operations and memory control group (memcg) operations. The scenario can be described as follows: echo xx > memory.max cache_ap_online oom_reaper (CPU23) (CPU50) xx < usage stop_machine_from_inactive_cpu for(;;) // all active cpus trigger OOM queue_stop_cpus_work // waiting oom_reaper multi_cpu_stop(migration/xx) // sync all active cpus ack // waiting cpu23 ack // CPU50 loops in multi_cpu_stop waiting cpu50 Detailed explanation: 1. When the usage is larger than xx, an OOM may be triggered. If the process does not handle with ths kill signal immediately, it will loop in the memory_max_write. 2. When cache_ap_online is triggered, the multi_cpu_stop is queued to the active cpus. Within the multi_cpu_stop function, it attempts to synchronize the CPU states. However, the CPU23 didn't acknowledge because it is stuck in a loop within the for(;;). 3. The oom_reaper process is blocked because CPU50 is in a loop, waiting for CPU23 to acknowledge the synchronization request. 4. Finally, it formed cyclic dependency and lead to softlockup and dead loop. To fix this issue, add cond_resched() in the memory_max_write, so that it will not block migration task. Link: https://lkml.kernel.org/r/20250211081819.33307-1-chenridong@huaweicloud.com Fixes: b6e6edcfa405 ("mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage") Signed-off-by: Chen Ridong <chenridong@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Wang Weiyang <wangweiyang2@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17mailmap: update Nick's entryNick Desaulniers
Link: https://lkml.kernel.org/r/20250211212117.3195265-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17mm: pgtable: fix incorrect reclaim of non-empty PTE pagesQi Zheng
In zap_pte_range(), if the pte lock was released midway, the pte entries may be refilled with physical pages by another thread, which may cause a non-empty PTE page to be reclaimed and eventually cause the system to crash. To fix it, fall back to the slow path in this case to recheck if all pte entries are still none. Link: https://lkml.kernel.org/r/20250211072625.89188-1-zhengqi.arch@bytedance.com Fixes: 6375e95f381e ("mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reported-by: Christian Brauner <brauner@kernel.org> Closes: https://lore.kernel.org/all/20250207-anbot-bankfilialen-acce9d79a2c7@brauner/ Reported-by: Qu Wenruo <quwenruo.btrfs@gmx.com> Closes: https://lore.kernel.org/all/152296f3-5c81-4a94-97f3-004108fba7be@gmx.com/ Tested-by: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17taskstats: modify taskstats versionWang Yaxin
After adding "delay max" and "delay min" to the taskstats structure, the taskstats version needs to be updated. Link: https://lkml.kernel.org/r/20250208144901218Q5ptVpqsQkb2MOEmW4Ujn@zte.com.cn Fixes: f65c64f311ee ("delayacct: add delay min to record delay peak") Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn> Reviewed-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17getdelays: fix error format charactersWang Yaxin
getdelays had a compilation issue because the format string was not updated when the "delay min" was added. For example, after adding the "delay min" in printf, there were 7 strings but only 6 "%s" format specifiers. Similarly, after adding the 't->cpu_delay_total', there were 7 variables but only 6 format characters specifiers, causing compilation issues as follows. This commit fixes these issues to ensure that getdelays compiles correctly. root@xx:~/linux-next/tools/accounting$ make getdelays.c:199:9: warning: format `%llu' expects argument of type `long long unsigned int', but argument 8 has type `char *' [-Wformat=] 199 | printf("\n\nCPU %15s%15s%15s%15s%15s%15s\n" | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ..... 216 | "delay total", "delay average", "delay max", "delay min", | ~~~~~~~~~~~ | | | char * getdelays.c:200:21: note: format string is defined here 200 | " %15llu%15llu%15llu%15llu%15.3fms%13.6fms\n" | ~~~~~^ | | | long long unsigned int | %15s getdelays.c:199:9: warning: format `%f' expects argument of type `double', but argument 12 has type `long long unsigned int' [-Wformat=] 199 | printf("\n\nCPU %15s%15s%15s%15s%15s%15s\n" | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ..... 220 | (unsigned long long)t->cpu_delay_total, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | | long long unsigned int ..... Link: https://lkml.kernel.org/r/20250208144400544RduNRhwIpT3m2JyRBqskZ@zte.com.cn Fixes: f65c64f311ee ("delayacct: add delay min to record delay peak") Reviewed-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Peilin He <he.peilin@zte.com.cn> Cc: Qiang Tu <tu.qiang35@zte.com.cn> Cc: wangyong <wang.yong12@zte.com.cn> Cc: ye xingchen <ye.xingchen@zte.com.cn> Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17mm/migrate_device: don't add folio to be freed to LRU in ↵David Hildenbrand
migrate_device_finalize() If migration succeeded, we called folio_migrate_flags()->mem_cgroup_migrate() to migrate the memcg from the old to the new folio. This will set memcg_data of the old folio to 0. Similarly, if migration failed, memcg_data of the dst folio is left unset. If we call folio_putback_lru() on such folios (memcg_data == 0), we will add the folio to be freed to the LRU, making memcg code unhappy. Running the hmm selftests: # ./hmm-tests ... # RUN hmm.hmm_device_private.migrate ... [ 102.078007][T14893] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff27d200 pfn:0x13cc00 [ 102.079974][T14893] anon flags: 0x17ff00000020018(uptodate|dirty|swapbacked|node=0|zone=2|lastcpupid=0x7ff) [ 102.082037][T14893] raw: 017ff00000020018 dead000000000100 dead000000000122 ffff8881353896c9 [ 102.083687][T14893] raw: 00000007ff27d200 0000000000000000 00000001ffffffff 0000000000000000 [ 102.085331][T14893] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled()) [ 102.087230][T14893] ------------[ cut here ]------------ [ 102.088279][T14893] WARNING: CPU: 0 PID: 14893 at ./include/linux/memcontrol.h:726 folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.090478][T14893] Modules linked in: [ 102.091244][T14893] CPU: 0 UID: 0 PID: 14893 Comm: hmm-tests Not tainted 6.13.0-09623-g6c216bc522fd #151 [ 102.093089][T14893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 [ 102.094848][T14893] RIP: 0010:folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.096104][T14893] Code: ... [ 102.099908][T14893] RSP: 0018:ffffc900236c37b0 EFLAGS: 00010293 [ 102.101152][T14893] RAX: 0000000000000000 RBX: ffffea0004f30000 RCX: ffffffff8183f426 [ 102.102684][T14893] RDX: ffff8881063cb880 RSI: ffffffff81b8117f RDI: ffff8881063cb880 [ 102.104227][T14893] RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000 [ 102.105757][T14893] R10: 0000000000000001 R11: 0000000000000002 R12: ffffc900236c37d8 [ 102.107296][T14893] R13: ffff888277a2bcb0 R14: 000000000000001f R15: 0000000000000000 [ 102.108830][T14893] FS: 00007ff27dbdd740(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000 [ 102.110643][T14893] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 102.111924][T14893] CR2: 00007ff27d400000 CR3: 000000010866e000 CR4: 0000000000750ef0 [ 102.113478][T14893] PKRU: 55555554 [ 102.114172][T14893] Call Trace: [ 102.114805][T14893] <TASK> [ 102.115397][T14893] ? folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.116547][T14893] ? __warn.cold+0x110/0x210 [ 102.117461][T14893] ? folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.118667][T14893] ? report_bug+0x1b9/0x320 [ 102.119571][T14893] ? handle_bug+0x54/0x90 [ 102.120494][T14893] ? exc_invalid_op+0x17/0x50 [ 102.121433][T14893] ? asm_exc_invalid_op+0x1a/0x20 [ 102.122435][T14893] ? __wake_up_klogd.part.0+0x76/0xd0 [ 102.123506][T14893] ? dump_page+0x4f/0x60 [ 102.124352][T14893] ? folio_lruvec_lock_irqsave+0x10e/0x170 [ 102.125500][T14893] folio_batch_move_lru+0xd4/0x200 [ 102.126577][T14893] ? __pfx_lru_add+0x10/0x10 [ 102.127505][T14893] __folio_batch_add_and_move+0x391/0x720 [ 102.128633][T14893] ? __pfx_lru_add+0x10/0x10 [ 102.129550][T14893] folio_putback_lru+0x16/0x80 [ 102.130564][T14893] migrate_device_finalize+0x9b/0x530 [ 102.131640][T14893] dmirror_migrate_to_device.constprop.0+0x7c5/0xad0 [ 102.133047][T14893] dmirror_fops_unlocked_ioctl+0x89b/0xc80 Likely, nothing else goes wrong: putting the last folio reference will remove the folio from the LRU again. So besides memcg complaining, adding the folio to be freed to the LRU is just an unnecessary step. The new flow resembles what we have in migrate_folio_move(): add the dst to the lru, remove migration ptes, unlock and unref dst. Link: https://lkml.kernel.org/r/20250210161317.717936-1-david@redhat.com Fixes: 8763cb45ab96 ("mm/migrate: new memory migration helper for use with device memory") Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17tools/mm: fix build warnings with musl-libcFlorian Fainelli
musl-libc warns about the following: /home/florian/dev/buildroot/output/arm64/rpi4-b/host/aarch64-buildroot-linux-musl/sysroot/usr/include/sys/errno.h:1:2: attention: #warning redirecting incorrect #include <sys/errno.h> to <errno.h> [-Wcpp] 1 | #warning redirecting incorrect #include <sys/errno.h> to <errno.h> | ^~~~~~~ /home/florian/dev/buildroot/output/arm64/rpi4-b/host/aarch64-buildroot-linux-musl/sysroot/usr/include/sys/fcntl.h:1:2: attention: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Wcpp] 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> | ^~~~~~~ include errno.h and fcntl.h directly. Link: https://lkml.kernel.org/r/20250210200518.1137295-1-florian.fainelli@broadcom.com Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17mailmap: add entry for Feng TangFeng Tang
Map my old business email to personal email. Link: https://lkml.kernel.org/r/20250205060457.53667-1-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17.mailmap: add entries for Jeff JohnsonJeff Johnson
Map past iterations of my e-mail addresses to the current one. Link: https://lkml.kernel.org/r/20250205-jjohnson-mailmap-v1-1-269cb7b1710d@oss.qualcomm.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17mm,madvise,hugetlb: check for 0-length range after end address adjustmentRicardo Cañuelo Navarro
Add a sanity check to madvise_dontneed_free() to address a corner case in madvise where a race condition causes the current vma being processed to be backed by a different page size. During a madvise(MADV_DONTNEED) call on a memory region registered with a userfaultfd, there's a period of time where the process mm lock is temporarily released in order to send a UFFD_EVENT_REMOVE and let userspace handle the event. During this time, the vma covering the current address range may change due to an explicit mmap done concurrently by another thread. If, after that change, the memory region, which was originally backed by 4KB pages, is now backed by hugepages, the end address is rounded down to a hugepage boundary to avoid data loss (see "Fixes" below). This rounding may cause the end address to be truncated to the same address as the start. Make this corner case follow the same semantics as in other similar cases where the requested region has zero length (ie. return 0). This will make madvise_walk_vmas() continue to the next vma in the range (this time holding the process mm lock) which, due to the prev pointer becoming stale because of the vma change, will be the same hugepage-backed vma that was just checked before. The next time madvise_dontneed_free() runs for this vma, if the start address isn't aligned to a hugepage boundary, it'll return -EINVAL, which is also in line with the madvise api. From userspace perspective, madvise() will return EINVAL because the start address isn't aligned according to the new vma alignment requirements (hugepage), even though it was correctly page-aligned when the call was issued. Link: https://lkml.kernel.org/r/20250203075206.1452208-1-rcn@igalia.com Fixes: 8ebe0a5eaaeb ("mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs") Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Florent Revest <revest@google.com> Cc: Rik van Riel <riel@surriel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17mm/zswap: fix inconsistency when zswap_store_page() failsHyeonggon Yoo
Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()") skips charging any zswap entries when it failed to zswap the entire folio. However, when some base pages are zswapped but it failed to zswap the entire folio, the zswap operation is rolled back. When freeing zswap entries for those pages, zswap_entry_free() uncharges the zswap entries that were not previously charged, causing zswap charging to become inconsistent. This inconsistency triggers two warnings with following steps: # On a machine with 64GiB of RAM and 36GiB of zswap $ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng $ sudo reboot The two warnings are: in mm/memcontrol.c:163, function obj_cgroup_release(): WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1)); in mm/page_counter.c:60, function page_counter_cancel(): if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n", new, nr_pages)) zswap_stored_pages also becomes inconsistent in the same way. As suggested by Kanchana, increment zswap_stored_pages and charge zswap entries within zswap_store_page() when it succeeds. This way, zswap_entry_free() will decrement the counter and uncharge the entries when it failed to zswap the entire folio. While this could potentially be optimized by batching objcg charging and incrementing the counter, let's focus on fixing the bug this time and leave the optimization for later after some evaluation. After resolving the inconsistency, the warnings disappear. [42.hyeyoo@gmail.com: refactor zswap_store_page()] Link: https://lkml.kernel.org/r/20250131082037.2426-1-42.hyeyoo@gmail.com Link: https://lkml.kernel.org/r/20250129100844.2935-1-42.hyeyoo@gmail.com Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()") Co-developed-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17lib/iov_iter: fix import_iovec_ubuf iovec managementPavel Begunkov
import_iovec() says that it should always be fine to kfree the iovec returned in @iovp regardless of the error code. __import_iovec_ubuf() never reallocates it and thus should clear the pointer even in cases when copy_iovec_*() fail. Link: https://lkml.kernel.org/r/378ae26923ffc20fd5e41b4360d673bf47b1775b.1738332461.git.asml.silence@gmail.com Fixes: 3b2deb0e46da ("iov_iter: import single vector iovecs as ITER_UBUF") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17procfs: fix a locking bug in a vmcore_add_device_dump() error pathBart Van Assche
Unlock vmcore_mutex when returning -EBUSY. Link: https://lkml.kernel.org/r/20250129222003.1495713-1-bvanassche@acm.org Fixes: 0f3b1c40c652 ("fs/proc/vmcore: disallow vmcore modifications while the vmcore is open") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baoquan he <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17Merge branch 'extend-struct_ops-support-for-operators'Alexei Starovoitov
Amery Hung says: ==================== This patchset supports struct_ops operators that acquire kptrs through arguments and operators that return a kptr. A coming new struct_ops use case, bpf qdisc [0], has two operators that are not yet supported by current struct_ops infrastructure. Qdisc_ops::enqueue requires getting referenced skb kptr from the argument; Qdisc_ops::dequeue needs to return a referenced skb kptr. This patchset will allow bpf qdisc and other potential struct_ops implementers to do so. For struct_ops implementers: - To get a kptr from an argument, a struct_ops implementer needs to annotate the argument name in the stub function with "__ref" suffix. - The kptr return will automatically work as we now allow operators that return a struct pointer. - The verifier allows returning a null pointer. More control can be added later if there is a future struct_ops implementer only expecting valid pointers. For struct_ops users: - The referenced kptr acquired through the argument needs to be released or xchged into maps just like ones acquired via kfuncs. - To return a referenced kptr in struct_ops, 1) The type of the pointer must matches the return type 2) The pointer must comes from the kernel (not locally allocated), and 3) The pointer must be in its unmodified form [0] https://lore.kernel.org/bpf/20250210174336.2024258-1-ameryhung@gmail.com/ --- v2 - Replace kcalloc+memcpy with kmemdup_array in bpf_prog_ctx_arg_info_init() - Remove unnecessary checks when kfree-ing ctx_arg_info - Remove conditional assignment of ref_obj_id in btf_ctx_access() v1 - Link: https://lore.kernel.org/bpf/20250214164520.1001211-1-ameryhung@gmail.com/ - Fix missing kfree for ctx_arg_info ==================== Link: https://patch.msgid.link/20250217190640.1748177-1-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-17selftests/bpf: Test returning referenced kptr from struct_ops programsAmery Hung
Test struct_ops programs returning referenced kptr. When the return type of a struct_ops operator is pointer to struct, the verifier should only allow programs that return a scalar NULL or a non-local kptr with the correct type in its unmodified form. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-6-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-17bpf: Allow struct_ops prog to return referenced kptrAmery Hung
Allow a struct_ops program to return a referenced kptr if the struct_ops operator's return type is a struct pointer. To make sure the returned pointer continues to be valid in the kernel, several constraints are required: 1) The type of the pointer must matches the return type 2) The pointer originally comes from the kernel (not locally allocated) 3) The pointer is in its unmodified form Implementation wise, a referenced kptr first needs to be allowed to _leak_ in check_reference_leak() if it is in the return register. Then, in check_return_code(), constraints 1-3 are checked. During struct_ops registration, a check is also added to warn about operators with non-struct pointer return. In addition, since the first user, Qdisc_ops::dequeue, allows a NULL pointer to be returned when there is no skb to be dequeued, we will allow a scalar value with value equals to NULL to be returned. In the future when there is a struct_ops user that always expects a valid pointer to be returned from an operator, we may extend tagging to the return value. We can tell the verifier to only allow NULL pointer return if the return value is tagged with MAY_BE_NULL. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-5-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-17selftests/bpf: Test referenced kptr arguments of struct_ops programsAmery Hung
Test referenced kptr acquired through struct_ops argument tagged with "__ref". The success case checks whether 1) a reference to the correct type is acquired, and 2) the referenced kptr argument can be accessed in multiple paths as long as it hasn't been released. In the fail cases, we first confirm that a referenced kptr acquried through a struct_ops argument is not allowed to be leaked. Then, we make sure this new referenced kptr acquiring mechanism does not accidentally allow referenced kptrs to flow into global subprograms through their arguments. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-4-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-17bpf: Support getting referenced kptr from struct_ops argumentAmery Hung
Allows struct_ops programs to acqurie referenced kptrs from arguments by directly reading the argument. The verifier will acquire a reference for struct_ops a argument tagged with "__ref" in the stub function in the beginning of the main program. The user will be able to access the referenced kptr directly by reading the context as long as it has not been released by the program. This new mechanism to acquire referenced kptr (compared to the existing "kfunc with KF_ACQUIRE") is introduced for ergonomic and semantic reasons. In the first use case, Qdisc_ops, an skb is passed to .enqueue in the first argument. This mechanism provides a natural way for users to get a referenced kptr in the .enqueue struct_ops programs and makes sure that a qdisc will always enqueue or drop the skb. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-3-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-17bpf: Make every prog keep a copy of ctx_arg_infoAmery Hung
Currently, ctx_arg_info is read-only in the view of the verifier since it is shared among programs of the same attach type. Make each program have their own copy of ctx_arg_info so that we can use it to store program specific information. In the next patch where we support acquiring a referenced kptr through a struct_ops argument tagged with "__ref", ctx_arg_info->ref_obj_id will be used to store the unique reference object id of the argument. This avoids creating a requirement in the verifier that "__ref" tagged arguments must be the first set of references acquired [0]. [0] https://lore.kernel.org/bpf/20241220195619.2022866-2-amery.hung@gmail.com/ Signed-off-by: Amery Hung <ameryhung@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-2-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-17MAINTAINERS: create entry for ethtool MAC mergeJakub Kicinski
Vladimir implemented the MAC merge support and reviews all the new driver implementations. Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250215225200.2652212-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17ibmvnic: Don't reference skb after sending to VIOSNick Child
Previously, after successfully flushing the xmit buffer to VIOS, the tx_bytes stat was incremented by the length of the skb. It is invalid to access the skb memory after sending the buffer to the VIOS because, at any point after sending, the VIOS can trigger an interrupt to free this memory. A race between reading skb->len and freeing the skb is possible (especially during LPM) and will result in use-after-free: ================================================================== BUG: KASAN: slab-use-after-free in ibmvnic_xmit+0x75c/0x1808 [ibmvnic] Read of size 4 at addr c00000024eb48a70 by task hxecom/14495 <...> Call Trace: [c000000118f66cf0] [c0000000018cba6c] dump_stack_lvl+0x84/0xe8 (unreliable) [c000000118f66d20] [c0000000006f0080] print_report+0x1a8/0x7f0 [c000000118f66df0] [c0000000006f08f0] kasan_report+0x128/0x1f8 [c000000118f66f00] [c0000000006f2868] __asan_load4+0xac/0xe0 [c000000118f66f20] [c0080000046eac84] ibmvnic_xmit+0x75c/0x1808 [ibmvnic] [c000000118f67340] [c0000000014be168] dev_hard_start_xmit+0x150/0x358 <...> Freed by task 0: kasan_save_stack+0x34/0x68 kasan_save_track+0x2c/0x50 kasan_save_free_info+0x64/0x108 __kasan_mempool_poison_object+0x148/0x2d4 napi_skb_cache_put+0x5c/0x194 net_tx_action+0x154/0x5b8 handle_softirqs+0x20c/0x60c do_softirq_own_stack+0x6c/0x88 <...> The buggy address belongs to the object at c00000024eb48a00 which belongs to the cache skbuff_head_cache of size 224 ================================================================== Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250214155233.235559-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17s390/ism: add release function for struct deviceJulian Ruess
According to device_release() in /drivers/base/core.c, a device without a release function is a broken device and must be fixed. The current code directly frees the device after calling device_add() without waiting for other kernel parts to release their references. Thus, a reference could still be held to a struct device, e.g., by sysfs, leading to potential use-after-free issues if a proper release function is not set. Fixes: 8c81ba20349d ("net/smc: De-tangle ism and smc device initialization") Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Julian Ruess <julianr@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250214120137.563409-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17drop_monitor: fix incorrect initialization orderGavrilov Ilia
Syzkaller reports the following bug: BUG: spinlock bad magic on CPU#1, syz-executor.0/7995 lock: 0xffff88805303f3e0, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 CPU: 1 PID: 7995 Comm: syz-executor.0 Tainted: G E 5.10.209+ #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x119/0x179 lib/dump_stack.c:118 debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline] do_raw_spin_lock+0x1f6/0x270 kernel/locking/spinlock_debug.c:112 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:117 [inline] _raw_spin_lock_irqsave+0x50/0x70 kernel/locking/spinlock.c:159 reset_per_cpu_data+0xe6/0x240 [drop_monitor] net_dm_cmd_trace+0x43d/0x17a0 [drop_monitor] genl_family_rcv_msg_doit+0x22f/0x330 net/netlink/genetlink.c:739 genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] genl_rcv_msg+0x341/0x5a0 net/netlink/genetlink.c:800 netlink_rcv_skb+0x14d/0x440 net/netlink/af_netlink.c:2497 genl_rcv+0x29/0x40 net/netlink/genetlink.c:811 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline] netlink_unicast+0x54b/0x800 net/netlink/af_netlink.c:1348 netlink_sendmsg+0x914/0xe00 net/netlink/af_netlink.c:1916 sock_sendmsg_nosec net/socket.c:651 [inline] __sock_sendmsg+0x157/0x190 net/socket.c:663 ____sys_sendmsg+0x712/0x870 net/socket.c:2378 ___sys_sendmsg+0xf8/0x170 net/socket.c:2432 __sys_sendmsg+0xea/0x1b0 net/socket.c:2461 do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x62/0xc7 RIP: 0033:0x7f3f9815aee9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f3f972bf0c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f3f9826d050 RCX: 00007f3f9815aee9 RDX: 0000000020000000 RSI: 0000000020001300 RDI: 0000000000000007 RBP: 00007f3f981b63bd R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000006e R14: 00007f3f9826d050 R15: 00007ffe01ee6768 If drop_monitor is built as a kernel module, syzkaller may have time to send a netlink NET_DM_CMD_START message during the module loading. This will call the net_dm_monitor_start() function that uses a spinlock that has not yet been initialized. To fix this, let's place resource initialization above the registration of a generic netlink family. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 9a8afc8d3962 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol") Cc: stable@vger.kernel.org Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250213152054.2785669-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17Merge tag 'vfs-6.14-rc4.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "It was reported that the acct(2) system call can be used to trigger a NULL deref in cases where it is set to write to a file that triggers an internal lookup. This can e.g., happen when pointing acct(2) to /sys/power/resume. At the point the where the write to this file happens the calling task has already exited and called exit_fs() but an internal lookup might be triggered through lookup_bdev(). This may trigger a NULL-deref when accessing current->fs. Reorganize the code so that the the final write happens from the workqueue but with the caller's credentials. This preserves the (strange) permission model and has almost no regression risk. Also block access to kernel internal filesystems as well as procfs and sysfs in the first place. Various fixes for netfslib: - Fix a number of read-retry hangs, including: - Incorrect getting/putting of references on subreqs as we retry them - Failure to track whether a last old subrequest in a retried set is superfluous - Inconsistency in the usage of wait queues used for subrequests (ie. using clear_and_wake_up_bit() whilst waiting on a private waitqueue) - Add stats counters for retries and publish in /proc/fs/netfs/stats. This is not a fix per se, but is useful in debugging and shouldn't otherwise change the operation of the code - Fix the ordering of queuing subrequests with respect to setting the request flag that says we've now queued them all" * tag 'vfs-6.14-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued netfs: Add retry stat counters netfs: Fix a number of read-retry hangs acct: block access to kernel internal filesystems acct: perform last write from workqueue
2025-02-17Merge tag 'powerpc-6.14-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Madhavan Srinivasan: - Couple of patches to fix KASAN failduring boot - Fix to avoid warnings/errors when building with 4k page size Thanks to Christophe Leroy, Ritesh Harjani (IBM), and Erhard Furtner * tag 'powerpc-6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/code-patching: Fix KASAN hit by not flagging text patching area as VM_ALLOC powerpc/64s: Rewrite __real_pte() and __rpte_to_hidx() as static inline powerpc/code-patching: Disable KASAN report during patching via temporary mm
2025-02-17ALSA: seq: Drop UMP events when no UMP-conversion is setTakashi Iwai
When a destination client is a user client in the legacy MIDI mode and it sets the no-UMP-conversion flag, currently the all UMP events are still passed as-is. But this may confuse the user-space, because the event packet size is different from the legacy mode. Since we cannot handle UMP events in user clients unless it's running in the UMP client mode, we should filter out those events instead of accepting blindly. This patch addresses it by slightly adjusting the conditions for UMP event handling at the event delivery time. Fixes: 329ffe11a014 ("ALSA: seq: Allow suppressing UMP conversions") Link: https://lore.kernel.org/b77a2cd6-7b59-4eb0-a8db-22d507d3af5f@gmail.com Link: https://patch.msgid.link/20250217170034.21930-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-02-17ALSA: hda/conexant: Add quirk for HP ProBook 450 G4 mute LEDJohn Veness
Allows the LED on the dedicated mute button on the HP ProBook 450 G4 laptop to change colour correctly. Signed-off-by: John Veness <john-linux@pelago.org.uk> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/2fb55d48-6991-4a42-b591-4c78f2fad8d7@pelago.org.uk Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-02-16smb: client: fix chmod(2) regression with ATTR_READONLYPaulo Alcantara
When the user sets a file or directory as read-only (e.g. ~S_IWUGO), the client will set the ATTR_READONLY attribute by sending an SMB2_SET_INFO request to the server in cifs_setattr_{,nounix}(), but cifsInodeInfo::cifsAttrs will be left unchanged as the client will only update the new file attributes in the next call to {smb311_posix,cifs}_get_inode_info() with the new metadata filled in @data parameter. Commit a18280e7fdea ("smb: cilent: set reparse mount points as automounts") mistakenly removed the @data NULL check when calling is_inode_cache_good(), which broke the above case as the new ATTR_READONLY attribute would end up not being updated on files with a read lease. Fix this by updating the inode whenever we have cached metadata in @data parameter. Reported-by: Horst Reiterer <horst.reiterer@fabasoft.com> Closes: https://lore.kernel.org/r/85a16504e09147a195ac0aac1c801280@fabasoft.com Fixes: a18280e7fdea ("smb: cilent: set reparse mount points as automounts") Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-02-16Linux 6.14-rc3v6.14-rc3Linus Torvalds
2025-02-16Merge tag 'kbuild-fixes-v6.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix annoying logs when building tools in parallel - Fix the Debian linux-headers package build again - Fix the target triple detection for userspace programs on Clang * tag 'kbuild-fixes-v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: modpost: Fix a few typos in a comment kbuild: userprogs: fix bitsize and target detection on clang kbuild: fix linux-headers package build when $(CC) cannot link userspace tools: fix annoying "mkdir -p ..." logs when building tools in parallel
2025-02-16Merge tag 'driver-core-6.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core api addition from Greg KH: "Here is a driver core new api for 6.14-rc3 that is being added to allow platform devices from stop being abused. It adds a new 'faux_device' structure and bus and api to allow almost a straight or simpler conversion from platform devices that were not really a platform device. It also comes with a binding for rust, with an example driver in rust showing how it's used. I'm adding this now so that the patches that convert the different drivers and subsystems can all start flowing into linux-next now through their different development trees, in time for 6.15-rc1. We have a number that are already reviewed and tested, but adding those conversions now doesn't seem right. For now, no one is using this, and it passes all build tests from 0-day and linux-next, so all should be good" * tag 'driver-core-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: rust/kernel: Add faux device bindings driver core: add a faux bus for use when a simple device/bus is needed
2025-02-16Merge tag 'tty-6.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull serial driver fixes from Greg KH: "Here are some small serial driver fixes for some reported problems. Nothing major, just: - sc16is7xx irq check fix - 8250 fifo underflow fix - serial_port and 8250 iotype fixes Most of these have been in linux-next already, and all have passed 0-day testing" * tag 'tty-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: 8250: Fix fifo underflow on flush serial: 8250_pnp: Remove unneeded ->iotype assignment serial: 8250_platform: Remove unneeded ->iotype assignment serial: 8250_of: Remove unneeded ->iotype assignment serial: port: Make ->iotype validation global in __uart_read_properties() serial: port: Always update ->iotype in __uart_read_properties() serial: port: Assign ->iotype correctly when ->iobase is set serial: sc16is7xx: Fix IRQ number check behavior
2025-02-16Merge tag 'usb-6.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB driver fixes, and new device ids, for 6.14-rc3. Lots of tiny stuff for reported problems, including: - new device ids and quirks - usb hub crash fix found by syzbot - dwc2 driver fix - dwc3 driver fixes - uvc gadget driver fix - cdc-acm driver fixes for a variety of different issues - other tiny bugfixes Almost all of these have been in linux-next this week, and all have passed 0-day testing" * tag 'usb-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits) usb: typec: tcpm: PSSourceOffTimer timeout in PR_Swap enters ERROR_RECOVERY usb: roles: set switch registered flag early on usb: gadget: uvc: Fix unstarted kthread worker USB: quirks: add USB_QUIRK_NO_LPM quirk for Teclast dist usb: gadget: core: flush gadget workqueue after device removal USB: gadget: f_midi: f_midi_complete to call queue_work usb: core: fix pipe creation for get_bMaxPacketSize0 usb: dwc3: Fix timeout issue during controller enter/exit from halt state USB: Add USB_QUIRK_NO_LPM quirk for sony xperia xz1 smartphone USB: cdc-acm: Fill in Renesas R-Car D3 USB Download mode quirk usb: cdc-acm: Fix handling of oversized fragments usb: cdc-acm: Check control transfer buffer size before access usb: xhci: Restore xhci_pci support for Renesas HCs USB: pci-quirks: Fix HCCPARAMS register error for LS7A EHCI USB: serial: option: drop MeiG Smart defines USB: serial: option: fix Telit Cinterion FN990A name USB: serial: option: add Telit Cinterion FN990B compositions USB: serial: option: add MeiG Smart SLM828 usb: gadget: f_midi: fix MIDI Streaming descriptor lengths usb: dwc2: gadget: remove of_node reference upon udc_stop ...
2025-02-16Merge tag 'irq_urgent_for_v6.14_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq Kconfig cleanup from Borislav Petkov: - Remove an unused config item GENERIC_PENDING_IRQ_CHIPFLAGS * tag 'irq_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Remove unused CONFIG_GENERIC_PENDING_IRQ_CHIPFLAGS
2025-02-16Merge tag 'perf_urgent_for_v6.14_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Borislav Petkov: - Explicitly clear DEBUGCTL.LBR to prevent LBRs continuing being enabled after handoff to the OS - Check CPUID(0x23) leaf and subleafs presence properly - Remove the PEBS-via-PT feature from being supported on hybrid systems - Fix perf record/top default commands on systems without a raw PMU registered * tag 'perf_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Ensure LBRs are disabled when a CPU is starting perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF perf/x86/intel: Clean up PEBS-via-PT on hybrid perf/x86/rapl: Fix the error checking order
2025-02-16Merge tag 'sched_urgent_for_v6.14_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: - Clarify what happens when a task is woken up from the wake queue and make clear its removal from that queue is atomic * tag 'sched_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Clarify wake_up_q()'s write to task->wake_q.next
2025-02-16Merge tag 'objtool_urgent_for_v6.14_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Borislav Petkov: - Move a warning about a lld.ld breakage into the verbose setting as said breakage has been fixed in the meantime - Teach objtool to ignore dangling jump table entries added by Clang * tag 'objtool_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Move dodgy linker warn to verbose objtool: Ignore dangling jump table entries
2025-02-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Large set of fixes for vector handling, especially in the interactions between host and guest state. This fixes a number of bugs affecting actual deployments, and greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland for dealing with this thankless task. - Fix an ugly race between vcpu and vgic creation/init, resulting in unexpected behaviours - Fix use of kernel VAs at EL2 when emulating timers with nVHE - Small set of pKVM improvements and cleanups x86: - Fix broken SNP support with KVM module built-in, ensuring the PSP module is initialized before KVM even when the module infrastructure cannot be used to order initcalls - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated by KVM to fix a NULL pointer dereference - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's nested NPT MMU so that the MMU is properly tagged for L2, not L1 - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the guest's value may be stale if a VM-Exit is handled in the fastpath" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) x86/sev: Fix broken SNP support with KVM module built-in KVM: SVM: Ensure PSP module is initialized if KVM module is built-in crypto: ccp: Add external API interface for PSP module initialization KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic() KVM: arm64: timer: Drop warning on failed interrupt signalling KVM: arm64: Fix alignment of kvm_hyp_memcache allocations KVM: arm64: Convert timer offset VA when accessed in HYP code KVM: arm64: Simplify warning in kvm_arch_vcpu_load_fp() KVM: arm64: Eagerly switch ZCR_EL{1,2} KVM: arm64: Mark some header functions as inline KVM: arm64: Refactor exit handlers KVM: arm64: Refactor CPTR trap deactivation KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN KVM: arm64: Remove host FPSIMD saving for non-protected KVM KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop KVM: nSVM: Enter guest mode before initializing nested NPT MMU KVM: selftests: Add CPUID tests for Hyper-V features that need in-kernel APIC KVM: selftests: Manage CPUID array in Hyper-V CPUID test's core helper ...
2025-02-16Merge tag 'mips-fixes_6.14_1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: "Fix for o32 ptrace/get_syscall_info" * tag 'mips-fixes_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: fix mips_get_syscall_arg() for o32 MIPS: Export syscall stack arguments properly for remote use
2025-02-15Merge tag 'devicetree-fixes-for-6.14-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Add bindings for QCom QCS8300 clocks, QCom SAR2130P qfprom, and powertip,{st7272|hx8238a} displays - Fix compatible for TI am62a7 dss - Add a kunit test for __of_address_resource_bounds() * tag 'devicetree-fixes-for-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: display: Add powertip,{st7272|hx8238a} as DT Schema description dt-bindings: nvmem: qcom,qfprom: Add SAR2130P compatible dt-bindings: display: ti: Fix compatible for am62a7 dss of: address: Add kunit test for __of_address_resource_bounds() dt-bindings: clock: qcom: Add QCS8300 video clock controller dt-bindings: clock: qcom: Add CAMCC clocks for QCS8300 dt-bindings: clock: qcom: Add GPU clocks for QCS8300
2025-02-15Merge tag 'uml-for-linus-6.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML fixes from Richard Weinberger: - Align signal stack correctly - Convert to raw spinlocks where needed (irq and virtio) - FPU related fixes * tag 'uml-for-linus-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: convert irq_lock to raw spinlock um: virtio_uml: use raw spinlock um: virt-pci: don't use kmalloc() um: fix execve stub execution on old host OSs um: properly align signal stack on x86_64 um: avoid copying FP state from init_task um: add back support for FXSAVE registers
2025-02-15Merge tag 'trace-ring-buffer-v6.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull trace ring buffer fixes from Steven Rostedt: - Enable resize on mmap() error When a process mmaps a ring buffer, its size is locked and resizing is disabled. But if the user passes in a wrong parameter, the mmap() can fail after the resize was disabled and the mmap() exits with error without reenabling the ring buffer resize. This prevents the ring buffer from ever being resized after that. Reenable resizing of the ring buffer on mmap() error. - Have resizing return proper error and not always -ENOMEM If the ring buffer is mmapped by one task and another task tries to resize the buffer it will error with -ENOMEM. This is confusing to the user as there may be plenty of memory available. Have it return the error that actually happens (in this case -EBUSY) where the user can understand why the resize failed. - Test the sub-buffer array to validate persistent memory buffer On boot up, the initialization of the persistent memory buffer will do a validation check to see if the content of the data is valid, and if so, it will use the memory as is, otherwise it re-initializes it. There's meta data in this persistent memory that keeps track of which sub-buffer is the reader page and an array that states the order of the sub-buffers. The values in this array are indexes into the sub-buffers. The validator checks to make sure that all the entries in the array are within the sub-buffer list index, but it does not check for duplications. While working on this code, the array got corrupted and had duplicates, where not all the sub-buffers were accounted for. This passed the validator as all entries were valid, but the link list was incorrect and could have caused a crash. The corruption only produced incorrect data, but it could have been more severe. To fix this, create a bitmask that covers all the sub-buffer indexes and set it to all zeros. While iterating the array checking the values of the array content, have it set a bit corresponding to the index in the array. If the bit was already set, then it is a duplicate and mark the buffer as invalid and reset it. - Prevent mmap()ing persistent ring buffer The persistent ring buffer uses vmap() to map the persistent memory. Currently, the mmap() logic only uses virt_to_page() to get the page from the ring buffer memory and use that to map to user space. This works because a normal ring buffer uses alloc_page() to allocate its memory. But because the persistent ring buffer use vmap() it causes a kernel crash. Fixing this to work with vmap() is not hard, but since mmap() on persistent memory buffers never worked, just have the mmap() return -ENODEV (what was returned before mmap() for persistent memory ring buffers, as they never supported mmap. Normal buffers will still allow mmap(). Implementing mmap() for persistent memory ring buffers can wait till the next merge window. - Fix polling on persistent ring buffers There's a "buffer_percent" option (default set to 50), that is used to have reads of the ring buffer binary data block until the buffer fills to that percentage. The field "pages_touched" is incremented every time a new sub-buffer has content added to it. This field is used in the calculations to determine the amount of content is in the buffer and if it exceeds the "buffer_percent" then it will wake the task polling on the buffer. As persistent ring buffers can be created by the content from a previous boot, the "pages_touched" field was not updated. This means that if a task were to poll on the persistent buffer, it would block even if the buffer was completely full. It would block even if the "buffer_percent" was zero, because with "pages_touched" as zero, it would be calculated as the buffer having no content. Update pages_touched when initializing the persistent ring buffer from a previous boot. * tag 'trace-ring-buffer-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Update pages_touched to reflect persistent buffer content tracing: Do not allow mmap() of persistent ring buffer ring-buffer: Validate the persistent meta data subbuf array tracing: Have the error of __tracing_resize_ring_buffer() passed to user ring-buffer: Unlock resize on mmap error
2025-02-15ring-buffer: Update pages_touched to reflect persistent buffer contentSteven Rostedt
The pages_touched field represents the number of subbuffers in the ring buffer that have content that can be read. This is used in accounting of "dirty_pages" and "buffer_percent" to allow the user to wait for the buffer to be filled to a certain amount before it reads the buffer in blocking mode. The persistent buffer never updated this value so it was set to zero, and this accounting would take it as it had no content. This would cause user space to wait for content even though there's enough content in the ring buffer that satisfies the buffer_percent. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/20250214123512.0631436e@gandalf.local.home Fixes: 5f3b6e839f3ce ("ring-buffer: Validate boot range memory events") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-15tracing: Do not allow mmap() of persistent ring bufferSteven Rostedt
When trying to mmap a trace instance buffer that is attached to reserve_mem, it would crash: BUG: unable to handle page fault for address: ffffe97bd00025c8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 2862f3067 P4D 2862f3067 PUD 0 Oops: Oops: 0000 [#1] PREEMPT_RT SMP PTI CPU: 4 UID: 0 PID: 981 Comm: mmap-rb Not tainted 6.14.0-rc2-test-00003-g7f1a5e3fbf9e-dirty #233 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:validate_page_before_insert+0x5/0xb0 Code: e2 01 89 d0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <48> 8b 46 08 a8 01 75 67 66 90 48 89 f0 8b 50 34 85 d2 74 76 48 89 RSP: 0018:ffffb148c2f3f968 EFLAGS: 00010246 RAX: ffff9fa5d3322000 RBX: ffff9fa5ccff9c08 RCX: 00000000b879ed29 RDX: ffffe97bd00025c0 RSI: ffffe97bd00025c0 RDI: ffff9fa5ccff9c08 RBP: ffffb148c2f3f9f0 R08: 0000000000000004 R09: 0000000000000004 R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000 R13: 00007f16a18d5000 R14: ffff9fa5c48db6a8 R15: 0000000000000000 FS: 00007f16a1b54740(0000) GS:ffff9fa73df00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffe97bd00025c8 CR3: 00000001048c6006 CR4: 0000000000172ef0 Call Trace: <TASK> ? __die_body.cold+0x19/0x1f ? __die+0x2e/0x40 ? page_fault_oops+0x157/0x2b0 ? search_module_extables+0x53/0x80 ? validate_page_before_insert+0x5/0xb0 ? kernelmode_fixup_or_oops.isra.0+0x5f/0x70 ? __bad_area_nosemaphore+0x16e/0x1b0 ? bad_area_nosemaphore+0x16/0x20 ? do_kern_addr_fault+0x77/0x90 ? exc_page_fault+0x22b/0x230 ? asm_exc_page_fault+0x2b/0x30 ? validate_page_before_insert+0x5/0xb0 ? vm_insert_pages+0x151/0x400 __rb_map_vma+0x21f/0x3f0 ring_buffer_map+0x21b/0x2f0 tracing_buffers_mmap+0x70/0xd0 __mmap_region+0x6f0/0xbd0 mmap_region+0x7f/0x130 do_mmap+0x475/0x610 vm_mmap_pgoff+0xf2/0x1d0 ksys_mmap_pgoff+0x166/0x200 __x64_sys_mmap+0x37/0x50 x64_sys_call+0x1670/0x1d70 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f The reason was that the code that maps the ring buffer pages to user space has: page = virt_to_page((void *)cpu_buffer->subbuf_ids[s]); And uses that in: vm_insert_pages(vma, vma->vm_start, pages, &nr_pages); But virt_to_page() does not work with vmap()'d memory which is what the persistent ring buffer has. It is rather trivial to allow this, but for now just disable mmap() of instances that have their ring buffer from the reserve_mem option. If an mmap() is performed on a persistent buffer it will return -ENODEV just like it would if the .mmap field wasn't defined in the file_operations structure. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/20250214115547.0d7287d3@gandalf.local.home Fixes: 9b7bdf6f6ece6 ("tracing: Have trace_printk not use binary prints if boot buffer") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-15Merge tag 'i2c-for-6.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "MAINTAINERS maintenance. Changed email, added entry, deleted entry falling back to a generic one" * tag 'i2c-for-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: MAINTAINERS: Add maintainer for Qualcomm's I2C GENI driver MAINTAINERS: delete entry for AXXIA I2C MAINTAINERS: Use my kernel.org address for I2C ACPI work
2025-02-15Merge tag 's390-6.14-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix isolated VFs handling by verifying that a VF’s parent PF is locally owned before registering it in an existing PCI domain - Disable arch_test_bit() optimization for PROFILE_ALL_BRANCHES to workaround gcc failure in handling __builtin_constant_p() in this case - Fix CHPID "configure" attribute caching in CIO by not updating the cache when SCLP returns no data, ensuring consistent sysfs output - Remove CONFIG_LSM from default configs and rely on defaults, which enables BPF LSM hook * tag 's390-6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pci: Fix handling of isolated VFs s390/pci: Pull search for parent PF out of zpci_iov_setup_virtfn() s390/bitops: Disable arch_test_bit() optimization for PROFILE_ALL_BRANCHES s390/cio: Fix CHPID "configure" attribute caching s390/configs: Remove CONFIG_LSM