Age  Commit message  Author
2024-12-06  bpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem  (Hou Tao)
There is no need to call kfree(im_node) when updating element fails, because im_node must be NULL. Remove the unnecessary kfree() for im_node. Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241206110622.1161752-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-06  bpf: Remove unnecessary check when updating LPM trie  (Hou Tao)
When "node->prefixlen == matchlen" is true, it means that the node is fully matched. If "node->prefixlen == key->prefixlen" is false, it means the prefix length of key is greater than the prefix length of node, otherwise, matchlen will not be equal with node->prefixlen. However, it also implies that the prefix length of node must be less than max_prefixlen. Therefore, "node->prefixlen == trie->max_prefixlen" will always be false when the check of "node->prefixlen == key->prefixlen" returns false. Remove this unnecessary comparison. Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241206110622.1161752-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-06  blk-mq: move cpuhp callback registering out of q->sysfs_lock  (Ming Lei)
Registering and unregistering cpuhp callback requires global cpu hotplug lock, which is used everywhere. Meantime q->sysfs_lock is used in block layer almost everywhere. It is easy to trigger lockdep warning[1] by connecting the two locks. Fix the warning by moving blk-mq's cpuhp callback registering out of q->sysfs_lock. Add one dedicated global lock for covering registering & unregistering hctx's cpuhp, and it is safe to do so because hctx is guaranteed to be live if our request_queue is live. [1] https://lore.kernel.org/lkml/Z04pz3AlvI4o0Mr8@agluck-desk3/ Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Newman <peternewman@google.com> Cc: Babu Moger <babu.moger@amd.com> Reported-by: Luck Tony <tony.luck@intel.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20241206111611.978870-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-06  blk-mq: register cpuhp callback after hctx is added to xarray table  (Ming Lei)
We need to retrieve 'hctx' from xarray table in the cpuhp callback, so the callback should be registered after this 'hctx' is added to xarray table. Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Newman <peternewman@google.com> Cc: Babu Moger <babu.moger@amd.com> Cc: Luck Tony <tony.luck@intel.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20241206111611.978870-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-06  smb: client: fix potential race in cifs_put_tcon()  (Paulo Alcantara)
dfs_cache_refresh() delayed worker could race with cifs_put_tcon(), so make sure to call list_replace_init() on @tcon->dfs_ses_list after kworker is cancelled or finished. Fixes: 4f42a8b54b5c ("smb: client: fix DFS interlink failover") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-12-06  smb3.1.1: fix posix mounts to older servers  (Steve French)
Some servers which implement the SMB3.1.1 POSIX extensions did not set the file type in the mode in the infolevel 100 response. With the recent changes for checking the file type via the mode field, this can cause the root directory to be reported incorrectly and mounts (e.g. to ksmbd) to fail. Fixes: 6a832bc8bbb2 ("fs/smb/client: Implement new SMB3 POSIX type") Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Cc: Ralph Boehme <slow@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-12-06  x86/cacheinfo: Delete global num_cache_leaves  (Ricardo Neri)
Linux remembers cpu_cacheinfo::num_leaves per CPU, but x86 initializes all CPUs from the same global "num_cache_leaves". This is erroneous on systems such as Meteor Lake, where each CPU has a distinct num_leaves value. Delete the global "num_cache_leaves" and initialize num_leaves on each CPU. init_cache_level() no longer needs to set num_leaves. Also, it never had to set num_levels as it is unnecessary in x86. Keep checking for zero cache leaves. Such a condition indicates a bug. [ bp: Cleanup. ] Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org # 6.3+ Link: https://lore.kernel.org/r/20241128002247.26726-3-ricardo.neri-calderon@linux.intel.com
2024-12-06  cacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU  (Ricardo Neri)
Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") adds functionality that architectures can use to optionally allocate and build cacheinfo early during boot. Commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") lets secondary CPUs correct (and reallocate memory) cacheinfo data if needed. If the early build functionality is not used and cacheinfo does not need correction, memory for cacheinfo is never allocated. x86 does not use the early build functionality. Consequently, during the cacheinfo CPU hotplug callback, last_level_cache_is_valid() attempts to dereference a NULL pointer: BUG: kernel NULL pointer dereference, address: 0000000000000100 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not present page PGD 0 P4D 0 Oops: 0000 [#1] PREEPMT SMP NOPTI CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1 RIP: 0010: last_level_cache_is_valid+0x95/0xe0a Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if not done earlier. Moreover, before determining the validity of the last-level cache info, ensure that it has been allocated. Simply checking for non-zero cache_leaves() is not sufficient, as some architectures (e.g., Intel processors) have non-zero cache_leaves() before allocation. Dereferencing NULL cacheinfo can occur in update_per_cpu_data_slice_size(). This function iterates over all online CPUs. However, a CPU may have come online recently, but its cacheinfo may not have been allocated yet. While here, remove an unnecessary indentation in allocate_cache_info(). [ bp: Massage. ] Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Radu Rendec <rrendec@redhat.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Cc: stable@vger.kernel.org # 6.3+ Link: https://lore.kernel.org/r/20241128002247.26726-2-ricardo.neri-calderon@linux.intel.com
2024-12-06  x86/kexec: Restore GDT on return from ::preserve_context kexec  (David Woodhouse)
The restore_processor_state() function explicitly states that "the asm code that gets us here will have restored a usable GDT". That wasn't true in the case of returning from a ::preserve_context kexec. Make it so. Without this, the kernel was depending on the called function to reload a GDT which is appropriate for the kernel before returning. Test program: #include <unistd.h> #include <errno.h> #include <stdio.h> #include <stdlib.h> #include <linux/kexec.h> #include <linux/reboot.h> #include <sys/reboot.h> #include <sys/syscall.h> int main (void) { struct kexec_segment segment = {}; unsigned char purgatory[] = { 0x66, 0xba, 0xf8, 0x03, // mov $0x3f8, %dx 0xb0, 0x42, // mov $0x42, %al 0xee, // outb %al, (%dx) 0xc3, // ret }; int ret; segment.buf = &purgatory; segment.bufsz = sizeof(purgatory); segment.mem = (void *)0x400000; segment.memsz = 0x1000; ret = syscall(__NR_kexec_load, 0x400000, 1, &segment, KEXEC_PRESERVE_CONTEXT); if (ret) { perror("kexec_load"); exit(1); } ret = syscall(__NR_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, LINUX_REBOOT_CMD_KEXEC); if (ret) { perror("kexec reboot"); exit(1); } printf("Success\n"); return 0; } Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241205153343.3275139-2-dwmw2@infradead.org
2024-12-05  iio: magnetometer: yas530: use signed integer type for clamp limits  (Jakob Hauser)
In the function yas537_measure() there is a clamp_val() with limits of -BIT(13) and BIT(13) - 1. The input clamp value h[] is of type s32. The BIT() is of type unsigned long integer due to its define in include/vdso/bits.h. The lower limit -BIT(13) is recognized as -8192 but expressed as an unsigned long integer. The size of an unsigned long integer differs between 32-bit and 64-bit architectures. Converting this to type s32 may lead to undesired behavior. Additionally, in the calculation lines h[0], h[1] and h[2] the unsigned long integer divisor BIT(13) causes an unsigned division, shifting the left-hand side of the equation back and forth, possibly ending up in large positive values instead of negative values on 32-bit architectures. To solve those two issues, declare a signed integer with a value of BIT(13). There is another omission in the clamp line: clamp_val() returns a value and it's going nowhere here. Self-assign it to h[i] to make use of the clamp macro. Finally, replace clamp_val() macro by clamp() because after changing the limits from type unsigned long integer to signed integer it's fine that way. Link: https://lkml.kernel.org/r/11609b2243c295d65ab4d47e78c239d61ad6be75.1732914810.git.jahau@rocketmail.com Fixes: 65f79b501030 ("iio: magnetometer: yas530: Add YAS537 variant") Signed-off-by: Jakob Hauser <jahau@rocketmail.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202411230458.dhZwh3TT-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202411282222.oF0B4110-lkp@intel.com/ Reviewed-by: David Laight <david.laight@aculab.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
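A rough illustration of the fix described above (variable names here are illustrative, not necessarily those used in the driver): keep the limit in a signed type so both the clamp and the later divisions stay signed, even on 32-bit builds.

    s32 limit = BIT(13);   /* 8192, held in a signed type */
    int i;

    /* signed clamp with a signed divisor later on; the clamp result is assigned back */
    for (i = 0; i < 3; i++)
        h[i] = clamp(h[i], -limit, limit - 1);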
2024-12-05  sched/numa: fix memory leak due to the overwritten vma->numab_state  (Adrian Huang)
[Problem Description] When running the hackbench program of LTP, the following memory leak is reported by kmemleak. # /opt/ltp/testcases/bin/hackbench 20 thread 1000 Running with 20*40 (== 800) tasks. # dmesg | grep kmemleak ... kmemleak: 480 new suspected memory leaks (see /sys/kernel/debug/kmemleak) kmemleak: 665 new suspected memory leaks (see /sys/kernel/debug/kmemleak) # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff888cd8ca2c40 (size 64): comm "hackbench", pid 17142, jiffies 4299780315 hex dump (first 32 bytes): ac 74 49 00 01 00 00 00 4c 84 49 00 01 00 00 00 .tI.....L.I..... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc bff18fd4): [<ffffffff81419a89>] __kmalloc_cache_noprof+0x2f9/0x3f0 [<ffffffff8113f715>] task_numa_work+0x725/0xa00 [<ffffffff8110f878>] task_work_run+0x58/0x90 [<ffffffff81ddd9f8>] syscall_exit_to_user_mode+0x1c8/0x1e0 [<ffffffff81dd78d5>] do_syscall_64+0x85/0x150 [<ffffffff81e0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e ... This issue can be consistently reproduced on three different servers: * a 448-core server * a 256-core server * a 192-core server [Root Cause] Since multiple threads are created by the hackbench program (along with the command argument 'thread'), a shared vma might be accessed by two or more cores simultaneously. When two or more cores observe that vma->numab_state is NULL at the same time, vma->numab_state will be overwritten. Although current code ensures that only one thread scans the VMAs in a single 'numa_scan_period', there might be a chance for another thread to enter in the next 'numa_scan_period' while we have not gotten till numab_state allocation [1]. Note that the command `/opt/ltp/testcases/bin/hackbench 50 process 1000` cannot the reproduce the issue. It is verified with 200+ test runs. [Solution] Use the cmpxchg atomic operation to ensure that only one thread executes the vma->numab_state assignment. [1] https://lore.kernel.org/lkml/1794be3c-358c-4cdc-a43d-a1f841d91ef7@amd.com/ Link: https://lkml.kernel.org/r/20241113102146.2384-1-ahuang12@lenovo.com Fixes: ef6a22b70f6d ("sched/numa: apply the scan delay to every new vma") Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Reported-by: Jiwei Sun <sunjw10@lenovo.com> Reviewed-by: Raghavendra K T <raghavendra.kt@amd.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
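A minimal sketch of the cmpxchg publication pattern described in the solution above (not the literal patch; allocation details and surrounding control flow are abbreviated):

    struct vma_numab_state *ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);

    if (!ptr)
        return;
    /* Only one thread may install its allocation; any loser frees its own copy. */
    if (cmpxchg(&vma->numab_state, NULL, ptr)) {
        kfree(ptr);
        return;
    }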
2024-12-05  mm/damon: fix order of arguments in damos_before_apply tracepoint  (Akinobu Mita)
Since the order of the scheme_idx and target_idx arguments in TP_ARGS is reversed, they are stored in the trace record in reverse. Link: https://lkml.kernel.org/r/20241115182023.43118-1-sj@kernel.org Link: https://patch.msgid.link/20241112154828.40307-1-akinobu.mita@gmail.com Fixes: c603c630b509 ("mm/damon/core: add a tracepoint for damos apply target regions") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  lib: stackinit: hide never-taken branch from compiler  (Kees Cook)
The never-taken branch leads to an invalid bounds condition, which is by design. To avoid the unwanted warning from the compiler, hide the variable from the optimizer. ../lib/stackinit_kunit.c: In function 'do_nothing_u16_zero': ../lib/stackinit_kunit.c:51:49: error: array subscript 1 is outside array bounds of 'u16[0]' {aka 'short unsigned int[]'} [-Werror=array-bounds=] 51 | #define DO_NOTHING_RETURN_SCALAR(ptr) *(ptr) | ^~~~~~ ../lib/stackinit_kunit.c:219:24: note: in expansion of macro 'DO_NOTHING_RETURN_SCALAR' 219 | return DO_NOTHING_RETURN_ ## which(ptr + 1); \ | ^~~~~~~~~~~~~~~~~~ Link: https://lkml.kernel.org/r/20241117113813.work.735-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio()  (David Hildenbrand)
The folio can get freed + buddy-merged + reallocated in the meantime, resulting in us calling folio_test_locked() possibly on a tail page. This makes const_folio_flags() trigger its VM_BUG_ON_PGFLAGS() when stumbling over the tail page. Could this result in other issues? Doesn't look like it. False positives and false negatives don't really matter, because this folio would get skipped either way when detecting that they have been reallocated in the meantime. Fix it by performing the folio_test_locked() check after grabbing a reference. If this ever becomes a real problem, we could add a special helper that racily checks if the bit is set even on tail pages ... but let's hope that's not required so we can just handle it cleaner: work on the folio after we hold a reference. Do we really need the folio_test_locked() check if we are going to trylock briefly after? Well, we can at least avoid a xas_reload(). It's a bit unclear which exact change introduced that issue. Likely, ever since we made PG_locked obey the PF_NO_TAIL policy it could have been triggered in some way. Link: https://lkml.kernel.org/r/20241129125303.4033164-1-david@redhat.com Fixes: 48c935ad88f5 ("page-flags: define PG_locked behavior on compound pages") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: syzbot+9f9a7f73fb079b2387a6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/674184c9.050a0220.1cc393.0001.GAE@google.com/ Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  scatterlist: fix incorrect func name in kernel-doc  (Randy Dunlap)
Fix a kernel-doc warning by making the kernel-doc function description match the function name: include/linux/scatterlist.h:323: warning: expecting prototype for sg_unmark_bus_address(). Prototype was for sg_dma_unmark_bus_address() instead Link: https://lkml.kernel.org/r/20241130022406.537973-1-rdunlap@infradead.org Fixes: 42399301203e ("lib/scatterlist: add flag for indicating P2PDMA segments in an SGL") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  mm: correct typo in MMAP_STATE() macro  (Lorenzo Stoakes)
We mistakenly refer to len rather than len_ here. The only existing caller passes len to the len_ parameter so this has no impact on the code, but it is obviously incorrect to do this, so fix it. Link: https://lkml.kernel.org/r/20241118175414.390827-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Jann Horn <jannh@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  mm: respect mmap hint address when aligning for THP  (Kalesh Singh)
Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") updated __get_unmapped_area() to align the start address for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y. It does this by effectively looking up a region that is of size, request_size + PMD_SIZE, and aligning up the start to a PMD boundary. Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment on 32 bit") opted out of this for 32bit due to regressions in mmap base randomization. Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes") restricted this to only mmap sizes that are multiples of the PMD_SIZE due to reported regressions in some performance benchmarks -- which seemed mostly due to the reduced spatial locality of related mappings due to the forced PMD-alignment. Another unintended side effect has emerged: When a user specifies an mmap hint address, the THP alignment logic modifies the behavior, potentially ignoring the hint even if a sufficiently large gap exists at the requested hint location. Example Scenario: Consider the following simplified virtual address (VA) space: ... 0x200000-0x400000 --- VMA A 0x400000-0x600000 --- Hole 0x600000-0x800000 --- VMA B ... A call to mmap() with hint=0x400000 and len=0x200000 behaves differently: - Before THP alignment: The requested region (size 0x200000) fits into the gap at 0x400000, so the hint is respected. - After alignment: The logic searches for a region of size 0x400000 (len + PMD_SIZE) starting at 0x400000. This search fails due to the mapping at 0x600000 (VMA B), and the hint is ignored, falling back to arch_get_unmapped_area[_topdown](). In general the hint is effectively ignored, if there is any existing mapping in the below range: [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE) This changes the semantics of mmap hint; from ""Respect the hint if a sufficiently large gap exists at the requested location" to "Respect the hint only if an additional PMD-sized gap exists beyond the requested size". This has performance implications for allocators that allocate their heap using mmap but try to keep it "as contiguous as possible" by using the end of the exisiting heap as the address hint. With the new behavior it's more likely to get a much less contiguous heap, adding extra fragmentation and performance overhead. To restore the expected behavior; don't use thp_get_unmapped_area_vmflags() when the user provided a hint address, for anonymous mappings. Note: As Yang Shi pointed out: the issue still remains for filesystems which are using thp_get_unmapped_area() for their get_unmapped_area() op. It is unclear what worklaods will regress for if we ignore THP alignment when the hint address is provided for such file backed mappings -- so this fix will be handled separately. 
Link: https://lkml.kernel.org/r/20241118214650.3667577-1-kaleshsingh@google.com Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hans Boehm <hboehm@google.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  mm: memcg: declare do_memsw_account inline  (John Sperbeck)
In commit 66d60c428b23 ("mm: memcg: move legacy memcg event code into memcontrol-v1.c"), the static do_memsw_account() function was moved from a .c file to a .h file. Unfortunately, the traditional inline keyword wasn't added. If a file (e.g., a unit test) includes the .h file, but doesn't refer to do_memsw_account(), it will get a warning like: mm/memcontrol-v1.h:41:13: warning: unused function 'do_memsw_account' [-Wunused-function] 41 | static bool do_memsw_account(void) | ^~~~~~~~~~~~~~~~ Link: https://lkml.kernel.org/r/20241128203959.726527-1-jsperbeck@google.com Fixes: 66d60c428b23 ("mm: memcg: move legacy memcg event code into memcontrol-v1.c") Signed-off-by: John Sperbeck <jsperbeck@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
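The change itself amounts to adding the missing keyword, roughly:

    -static bool do_memsw_account(void)
    +static inline bool do_memsw_account(void)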
2024-12-05  mm/codetag: swap tags when migrate pages  (David Wang)
Current solution to adjust codetag references during page migration is done in 3 steps: 1. sets the codetag reference of the old page as empty (not pointing to any codetag); 2. subtracts counters of the new page to compensate for its own allocation; 3. sets codetag reference of the new page to point to the codetag of the old page. This does not work if CONFIG_MEM_ALLOC_PROFILING_DEBUG=n because set_codetag_empty() becomes NOOP. Instead, let's simply swap codetag references so that the new page is referencing the old codetag and the old page is referencing the new codetag. This way accounting stays valid and the logic makes more sense. Link: https://lkml.kernel.org/r/20241129025213.34836-1-00107082@163.com Fixes: e0a955bf7f61 ("mm/codetag: add pgalloc_tag_copy()") Signed-off-by: David Wang <00107082@163.com> Closes: https://lore.kernel.org/lkml/20241124074318.399027-1-00107082@163.com/ Acked-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  ocfs2: update seq_file index in ocfs2_dlm_seq_next  (Wengang Wang)
The following INFO level message was seen: seq_file: buggy .next function ocfs2_dlm_seq_next [ocfs2] did not update position index Fix: Update *pos (so m->index) to make seq_read_iter happy though the index its self makes no sense to ocfs2_dlm_seq_next. Link: https://lkml.kernel.org/r/20241119174500.9198-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
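A sketch of the shape of the fix (the existing lockres iteration is unchanged and elided here; only the index update is the point):

    static void *ocfs2_dlm_seq_next(struct seq_file *m, void *v, loff_t *pos)
    {
        void *next = v;   /* existing iterator advance elided */

        (*pos)++;         /* new: keep seq_read_iter()'s m->index in sync; ocfs2 ignores the value */
        return next;
    }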
2024-12-05  stackdepot: fix stack_depot_save_flags() in NMI context  (Marco Elver)
Per documentation, stack_depot_save_flags() was meant to be usable from NMI context if STACK_DEPOT_FLAG_CAN_ALLOC is unset. However, it still would try to take the pool_lock in an attempt to save a stack trace in the current pool (if space is available). This could result in deadlock if an NMI is handled while pool_lock is already held. To avoid deadlock, only try to take the lock in NMI context and give up if unsuccessful. The documentation is fixed to clearly convey this. Link: https://lkml.kernel.org/r/Z0CcyfbPqmxJ9uJH@elver.google.com Link: https://lkml.kernel.org/r/20241122154051.3914732-1-elver@google.com Fixes: 4434a56ec209 ("stackdepot: make fast paths lock-less again") Signed-off-by: Marco Elver <elver@google.com> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
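A sketch of the NMI-safe locking rule described above (simplified; the flag handling in the real code is more involved):

    if (in_nmi()) {
        /* Never spin on pool_lock from NMI context; give up if contended. */
        if (!raw_spin_trylock_irqsave(&pool_lock, flags))
            return 0;   /* no handle saved */
    } else {
        raw_spin_lock_irqsave(&pool_lock, flags);
    }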
2024-12-05  mm: open-code page_folio() in dump_page()  (Matthew Wilcox (Oracle))
page_folio() calls page_fixed_fake_head() which will misidentify this page as being a fake head and load off the end of 'precise'. We may have a pointer to a fake head, but that's OK because it contains the right information for dump_page(). gcc-15 is smart enough to catch this with -Warray-bounds: In function 'page_fixed_fake_head', inlined from '_compound_head' at ../include/linux/page-flags.h:251:24, inlined from '__dump_page' at ../mm/debug.c:123:11: ../include/asm-generic/rwonce.h:44:26: warning: array subscript 9 is outside +array bounds of 'struct page[1]' [-Warray-bounds=] Link: https://lkml.kernel.org/r/20241125201721.2963278-2-willy@infradead.org Fixes: fae7d834c43c ("mm: add __dump_folio()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Kees Cook <kees@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  mm: open-code PageTail in folio_flags() and const_folio_flags()  (Matthew Wilcox (Oracle))
It is unsafe to call PageTail() in dump_page() as page_is_fake_head() will almost certainly return true when called on a head page that is copied to the stack. That will cause the VM_BUG_ON_PGFLAGS() in const_folio_flags() to trigger when it shouldn't. Fortunately, we don't need to call PageTail() here; it's fine to have a pointer to a virtual alias of the page's flag word rather than the real page's flag word. Link: https://lkml.kernel.org/r/20241125201721.2963278-1-willy@infradead.org Fixes: fae7d834c43c ("mm: add __dump_folio()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Kees Cook <kees@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  mm: fix vrealloc()'s KASAN poisoning logic  (Andrii Nakryiko)
When vrealloc() reuses already allocated vmap_area, we need to re-annotate poisoned and unpoisoned portions of underlying memory according to the new size. This results in a KASAN splat recorded at [1]. A KASAN mis-reporting issue where there is none. Note, hard-coding KASAN_VMALLOC_PROT_NORMAL might not be exactly correct, but KASAN flag logic is pretty involved and spread out throughout __vmalloc_node_range_noprof(), so I'm using the bare minimum flag here and leaving the rest to mm people to refactor this logic and reuse it here. Link: https://lkml.kernel.org/r/20241126005206.3457974-1-andrii@kernel.org Link: https://lore.kernel.org/bpf/67450f9b.050a0220.21d33d.0004.GAE@google.com/ [1] Fixes: 3ddc2fefe6f3 ("mm: vmalloc: implement vrealloc()") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()"  (Jan Kara)
This reverts commit 7c877586da3178974a8a94577b6045a48377ff25. Anders and Philippe have reported that recent kernels occasionally hang when used with NFS in readahead code. The problem has been bisected to 7c877586da3 ("readahead: properly shorten readahead when falling back to do_page_cache_ra()"). The cause of the problem is that ra->size can be shrunk by the read_pages() call and subsequently we end up calling do_page_cache_ra() with a negative (read huge positive) number of pages. Let's revert 7c877586da3 for now until we can find a proper way for the logic in read_pages() and page_cache_ra_order() to coexist. This can lead to reduced readahead throughput due to readahead window confusion but that's better than outright hangs. Link: https://lkml.kernel.org/r/20241126145208.985-1-jack@suse.cz Fixes: 7c877586da31 ("readahead: properly shorten readahead when falling back to do_page_cache_ra()") Reported-by: Anders Blomdell <anders.blomdell@gmail.com> Reported-by: Philippe Troin <phil@fifi.org> Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Philippe Troin <phil@fifi.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  selftests/damon: add _damon_sysfs.py to TEST_FILES  (Maximilian Heyne)
When running selftests I encountered the following error message with some damon tests: # Traceback (most recent call last): # File "[...]/damon/./damos_quota.py", line 7, in <module> # import _damon_sysfs # ModuleNotFoundError: No module named '_damon_sysfs' Fix this by adding the _damon_sysfs.py file to TEST_FILES so that it will be available when running the respective damon selftests. Link: https://lkml.kernel.org/r/20241127-picks-visitor-7416685b-mheyne@amazon.de Fixes: 306abb63a8ca ("selftests/damon: implement a python module for test-purpose DAMON sysfs controls") Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  selftest: hugetlb_dio: fix test naming  (Mark Brown)
The string logged when a test passes or fails is used by the selftest framework to identify which test is being reported. The hugetlb_dio test not only uses the same strings for every test that is run but it also uses different strings for test passes and failures which means that test automation is unable to follow what the test is doing at all. Pull the existing duplicated logging of the number of free huge pages before and after the test out of the conditional and replace that and the logging of the result with a single ksft_print_result() which incorporates the parameters passed into the test into the output. Link: https://lkml.kernel.org/r/20241127-kselftest-mm-hugetlb-dio-names-v1-1-22aab01bf550@kernel.org Fixes: fae1980347bf ("selftests: hugetlb_dio: fixup check for initial conditions to skip in the start") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  ocfs2: free inode when ocfs2_get_init_inode() fails  (Tetsuo Handa)
syzbot is reporting busy inodes after unmount, for commit 9c89fe0af826 ("ocfs2: Handle error from dquot_initialize()") forgot to call iput() when new_inode() succeeded and dquot_initialize() failed. Link: https://lkml.kernel.org/r/e68c0224-b7c6-4784-b4fa-a9fc8c675525@I-love.SAKURA.ne.jp Fixes: 9c89fe0af826 ("ocfs2: Handle error from dquot_initialize()") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot+0af00f6a2cba2058b5db@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0af00f6a2cba2058b5db Tested-by: syzbot+0af00f6a2cba2058b5db@syzkaller.appspotmail.com Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
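Roughly, the error path now drops the inode it just obtained instead of leaking it (a sketch, not the exact patch; the return convention is abbreviated):

    inode = new_inode(dir->i_sb);
    if (!inode)
        return NULL;

    status = dquot_initialize(inode);
    if (status) {
        iput(inode);    /* was missing: release the freshly allocated inode */
        return ERR_PTR(status);
    }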
2024-12-05  nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry()  (Ryusuke Konishi)
Syzbot reported that when searching for records in a directory where the inode's i_size is corrupted and has a large value, memory access outside the folio/page range may occur, or a use-after-free bug may be detected if KASAN is enabled. This is because nilfs_last_byte(), which is called by nilfs_find_entry() and others to calculate the number of valid bytes of directory data in a page from i_size and the page index, loses the upper 32 bits of the 64-bit size information due to an inappropriate type of local variable to which the i_size value is assigned. This caused a large byte offset value due to underflow in the end address calculation in the calling nilfs_find_entry(), resulting in memory access that exceeds the folio/page size. Fix this issue by changing the type of the local variable causing the bit loss from "unsigned int" to "u64". The return value of nilfs_last_byte() is also of type "unsigned int", but it is truncated so as not to exceed PAGE_SIZE and no bit loss occurs, so no change is required. Link: https://lkml.kernel.org/r/20241119172403.9292-1-konishi.ryusuke@gmail.com Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+96d5d14c47d97015c624@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=96d5d14c47d97015c624 Tested-by: syzbot+96d5d14c47d97015c624@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  kasan: make report_lock a raw spinlock  (Jared Kangas)
If PREEMPT_RT is enabled, report_lock is a sleeping spinlock and must not be locked when IRQs are disabled. However, KASAN reports may be triggered in such contexts. For example: char *s = kzalloc(1, GFP_KERNEL); kfree(s); local_irq_disable(); char c = *s; /* KASAN report here leads to spin_lock() */ local_irq_enable(); Make report_spinlock a raw spinlock to prevent rescheduling when PREEMPT_RT is enabled. Link: https://lkml.kernel.org/r/20241119210234.1602529-1-jkangas@redhat.com Fixes: 342a93247e08 ("locking/spinlock: Provide RT variant header: <linux/spinlock_rt.h>") Signed-off-by: Jared Kangas <jkangas@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
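The conversion is essentially the following (illustrative diff), with the corresponding lock/unlock sites switched to the raw_spin_lock_irqsave()/raw_spin_unlock_irqrestore() variants:

    -static DEFINE_SPINLOCK(report_lock);
    +static DEFINE_RAW_SPINLOCK(report_lock);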
2024-12-05  mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM  (David Hildenbrand)
We currently assume that there is at least one VMA in a MM, which isn't true. So we might end up having find_vma() return NULL, to then de-reference NULL. So properly handle find_vma() returning NULL. This fixes the report: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024 RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline] RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194 Code: ... RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000 RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044 RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1 R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003 R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8 FS: 00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709 __do_sys_migrate_pages mm/mempolicy.c:1727 [inline] __se_sys_migrate_pages mm/mempolicy.c:1723 [inline] __x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f [akpm@linux-foundation.org: add unlikely()] Link: https://lkml.kernel.org/r/20241120201151.9518-1-david@redhat.com Fixes: 39743889aaf7 ("[PATCH] Swap Migration V5: sys_migrate_pages interface") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: syzbot+3511625422f7aa637f0d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/673d2696.050a0220.3c9d61.012f.GAE@google.com/T/ Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
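The shape of the fix is a simple NULL check after the lookup (a sketch; the label name is illustrative and the surrounding function is elided):

    vma = find_vma(mm, 0);
    if (unlikely(!vma))   /* empty MM: nothing to migrate */
        goto out;         /* illustrative label */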
2024-12-05  mm/gup: handle NULL pages in unpin_user_pages()  (John Hubbard)
The recent addition of "pofs" (pages or folios) handling to gup has a flaw: it assumes that unpin_user_pages() handles NULL pages in the pages** array. That's not the case, as I discovered when I ran on a new configuration on my test machine. Fix this by skipping NULL pages in unpin_user_pages(), just like unpin_folios() already does. Details: when booting on x86 with "numa=fake=2 movablecore=4G" on Linux 6.12, and running this: tools/testing/selftests/mm/gup_longterm ...I get the following crash: BUG: kernel NULL pointer dereference, address: 0000000000000008 RIP: 0010:sanity_check_pinned_pages+0x3a/0x2d0 ... Call Trace: <TASK> ? __die_body+0x66/0xb0 ? page_fault_oops+0x30c/0x3b0 ? do_user_addr_fault+0x6c3/0x720 ? irqentry_enter+0x34/0x60 ? exc_page_fault+0x68/0x100 ? asm_exc_page_fault+0x22/0x30 ? sanity_check_pinned_pages+0x3a/0x2d0 unpin_user_pages+0x24/0xe0 check_and_migrate_movable_pages_or_folios+0x455/0x4b0 __gup_longterm_locked+0x3bf/0x820 ? mmap_read_lock_killable+0x12/0x50 ? __pfx_mmap_read_lock_killable+0x10/0x10 pin_user_pages+0x66/0xa0 gup_test_ioctl+0x358/0xb20 __se_sys_ioctl+0x6b/0xc0 do_syscall_64+0x7b/0x150 entry_SYSCALL_64_after_hwframe+0x76/0x7e Link: https://lkml.kernel.org/r/20241121034933.77502-1-jhubbard@nvidia.com Fixes: 94efde1d1539 ("mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases") Signed-off-by: John Hubbard <jhubbard@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vivek Kasireddy <vivek.kasireddy@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Junxiao Chang <junxiao.chang@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
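A simplified sketch of the new tolerance for NULL entries (the real function also sanity-checks pinned pages and batches folio refcount updates):

    void unpin_user_pages(struct page **pages, unsigned long npages)
    {
        unsigned long i;

        for (i = 0; i < npages; i++) {
            if (!pages[i])
                continue;   /* skip holes, as unpin_folios() already does */
            unpin_user_page(pages[i]);
        }
    }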
2024-12-05  fs/proc/vmcore.c: fix warning when CONFIG_MMU=n  (Andrew Morton)
>> fs/proc/vmcore.c:424:19: warning: 'mmap_vmcore_fault' defined but not used [-Wunused-function] 424 | static vm_fault_t mmap_vmcore_fault(struct vm_fault *vmf) Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202411140156.2o0nS4fl-lkp@intel.com/ Cc: Qi Xi <xiqi2@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05  Merge tag 'audit-pr-20241205' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit  (Linus Torvalds)
Pull audit build problem workaround from Paul Moore: "A minor audit patch that shuffles some code slightly to workaround a GCC bug affecting a number of people. The GCC folks have been able to reproduce the problem and are discussing solutions (see the bug report link in the commit), but since the workaround is trivial let's do that in the kernel so we can unblock people who are hitting this" * tag 'audit-pr-20241205' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: workaround a GCC bug triggered by task comm changes
2024-12-05  Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd  (Linus Torvalds)
Pull iommufd fixes from Jason Gunthorpe: "One bug fix and some documentation updates: - Correct typos in comments - Elaborate a comment about how the uAPI works for IOMMU_HW_INFO_TYPE_ARM_SMMUV3 - Fix a double free on error path and add test coverage for the bug" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommu/arm-smmu-v3: Improve uAPI comment for IOMMU_HW_INFO_TYPE_ARM_SMMUV3 iommufd/selftest: Cover IOMMU_FAULT_QUEUE_ALLOC in iommufd_fail_nth iommufd: Fix out_fput in iommufd_fault_alloc() iommufd: Fix typos in kernel-doc comments
2024-12-06  Merge tag 'drm-misc-fixes-2024-12-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes  (Dave Airlie)
drm-misc-fixes v6.13-rc2: - v3d performance counter fix. - A lot of DP-MST related fixes. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/2ce1650d-801f-4265-a876-5a8743f1c82b@linux.intel.com
2024-12-05  Merge tag 'v6.13-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbd  (Linus Torvalds)
Pull smb server fixes from Steve French: - Three fixes for potential out of bound accesses in read and write paths (e.g. when alternate data streams enabled) - GCC 15 build fix * tag 'v6.13-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: align aux_payload_buf to avoid OOB reads in cryptographic operations ksmbd: fix Out-of-Bounds Write in ksmbd_vfs_stream_write ksmbd: fix Out-of-Bounds Read in ksmbd_vfs_stream_read smb: server: Fix building with GCC 15
2024-12-06  Merge tag 'drm-xe-fixes-2024-12-04' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes  (Dave Airlie)
Driver Changes: - Missing init value and 64-bit write-order check (Zhanjung) - Fix a memory allocation issue causing lockdep violation (John) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Z1BidZBFQOLjz__J@fedora
2024-12-05  Merge tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Linus Torvalds)
Pull networking fixes from Paolo Abeni: "Including fixes from can and netfilter. Current release - regressions: - rtnetlink: fix double call of rtnl_link_get_net_ifla() - tcp: populate XPS related fields of timewait sockets - ethtool: fix access to uninitialized fields in set RXNFC command - selinux: use sk_to_full_sk() in selinux_ip_output() Current release - new code bugs: - net: make napi_hash_lock irq safe - eth: - bnxt_en: support header page pool in queue API - ice: fix NULL pointer dereference in switchdev Previous releases - regressions: - core: fix icmp host relookup triggering ip_rt_bug - ipv6: - avoid possible NULL deref in modify_prefix_route() - release expired exception dst cached in socket - smc: fix LGR and link use-after-free issue - hsr: avoid potential out-of-bound access in fill_frame_info() - can: hi311x: fix potential use-after-free - eth: ice: fix VLAN pruning in switchdev mode Previous releases - always broken: - netfilter: - ipset: hold module reference while requesting a module - nft_inner: incorrect percpu area handling under softirq - can: j1939: fix skb reference counting - eth: - mlxsw: use correct key block on Spectrum-4 - mlx5: fix memory leak in mlx5hws_definer_calc_layout" * tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits) net :mana :Request a V2 response version for MANA_QUERY_GF_STAT net: avoid potential UAF in default_operstate() vsock/test: verify socket options after setting them vsock/test: fix parameter types in SO_VM_SOCKETS_* calls vsock/test: fix failures due to wrong SO_RCVLOWAT parameter net/mlx5e: Remove workaround to avoid syndrome for internal port net/mlx5e: SD, Use correct mdev to build channel param net/mlx5: E-Switch, Fix switching to switchdev mode in MPV net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled net/mlx5: HWS: Properly set bwc queue locks lock classes net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layout bnxt_en: handle tpa_info in queue API implementation bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() bnxt_en: refactor tpa_info alloc/free into helpers geneve: do not assume mac header is set in geneve_xmit_skb() mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4 ethtool: Fix wrong mod state in case of verbose and no_mask bitset ipmr: tune the ipmr_can_free_table() checks. netfilter: nft_set_hash: skip duplicated elements pending gc run netfilter: ipset: Hold module reference while requesting a module ...
2024-12-05  Merge tag 'trace-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace  (Linus Torvalds)
Pull tracing fixes from Steven Rostedt: - Fix trace histogram sort function cmp_entries_dup() The sort function cmp_entries_dup() returns either 1 or 0, and not -1 if parameter "a" is less than "b" by memcmp(). - Fix archs that call trace_hardirqs_off() without RCU watching Both x86 and arm64 no longer call any tracepoints with RCU not watching. It was assumed that it was safe to get rid of the trace_*_rcuidle() version of the tracepoint calls. This was needed to get rid of the SRCU protection and be able to implement features like faultable tracepoints and add rust tracepoints. Unfortunately, there were a few architectures that still relied on that logic. There's only one file that has tracepoints that are called without RCU watching. Add macro logic around the tracepoints so that architectures that do not have CONFIG_ARCH_WANTS_NO_INSTR defined will check if the code is in the idle path (the only place RCU isn't watching), and enable RCU around calling the tracepoint, but only do it if the tracepoint is enabled. * tag 'trace-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix archs that still call tracepoints without RCU watching tracing: Fix cmp_entries_dup() to respect sort() comparison rules
2024-12-05  Merge tag 'hid-for-linus-2024120501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid  (Linus Torvalds)
Pull HID fixes from Benjamin Tissoires: - regression fix in suspend/resume for i2c-hid (Kenny Levinsen) - fix wacom driver assuming a name can not be null (WangYuli) - a couple of constify changes/fixes (Thomas Weißschuh) - a couple of selftests/hid fixes (Maximilian Heyne & Benjamin Tissoires) * tag 'hid-for-linus-2024120501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: selftests/hid: fix kfunc inclusions with newer bpftool HID: bpf: drop unneeded casts discarding const HID: bpf: constify hid_ops selftests: hid: fix typo and exit code HID: wacom: fix when get product name maybe null pointer HID: i2c-hid: Revert to using power commands to wake on resume
2024-12-05  arm64: ptrace: fix partial SETREGSET for NT_ARM_GCS  (Mark Rutland)
Currently gcs_set() doesn't initialize the temporary 'user_gcs' variable, and a SETREGSET call with a length of 0, 8, or 16 will leave some portion of this uninitialized. Consequently some arbitrary uninitialized values may be written back to the relevant fields in task struct, potentially leaking up to 192 bits of memory from the kernel stack. The read is limited to a specific slot on the stack, and the issue does not provide a write mechanism. As gcs_set() rejects cases where user_gcs::features_enabled has bits set other than PR_SHADOW_STACK_SUPPORTED_STATUS_MASK, a SETREGSET call with a length of zero will randomly succeed or fail depending on the value of the uninitialized value, it isn't possible to leak the full 192 bits. With a length of 8 or 16, user_gcs::features_enabled can be initialized to an accepted value, making it practical to leak 128 or 64 bits. Fix this by initializing the temporary value before copying the regset from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG, NT_ARM_SYSTEM_CALL). In the case of a zero-length or partial write, the existing contents of the fields which are not written to will be retained. To ensure that the extraction and insertion of fields is consistent across the GETREGSET and SETREGSET calls, new task_gcs_to_user() and task_gcs_from_user() helpers are added, matching the style of pac_address_keys_to_user() and pac_address_keys_from_user(). Before this patch: | # ./gcs-test | Attempting to write NT_ARM_GCS::user_gcs = { | .features_enabled = 0x0000000000000000, | .features_locked = 0x0000000000000000, | .gcspr_el0 = 0x900d900d900d900d, | } | SETREGSET(nt=0x410, len=24) wrote 24 bytes | | Attempting to read NT_ARM_GCS::user_gcs | GETREGSET(nt=0x410, len=24) read 24 bytes | Read NT_ARM_GCS::user_gcs = { | .features_enabled = 0x0000000000000000, | .features_locked = 0x0000000000000000, | .gcspr_el0 = 0x900d900d900d900d, | } | | Attempting partial write NT_ARM_GCS::user_gcs = { | .features_enabled = 0x0000000000000000, | .features_locked = 0x1de7ec7edbadc0de, | .gcspr_el0 = 0x1de7ec7edbadc0de, | } | SETREGSET(nt=0x410, len=8) wrote 8 bytes | | Attempting to read NT_ARM_GCS::user_gcs | GETREGSET(nt=0x410, len=24) read 24 bytes | Read NT_ARM_GCS::user_gcs = { | .features_enabled = 0x0000000000000000, | .features_locked = 0x000000000093e780, | .gcspr_el0 = 0xffff800083a63d50, | } After this patch: | # ./gcs-test | Attempting to write NT_ARM_GCS::user_gcs = { | .features_enabled = 0x0000000000000000, | .features_locked = 0x0000000000000000, | .gcspr_el0 = 0x900d900d900d900d, | } | SETREGSET(nt=0x410, len=24) wrote 24 bytes | | Attempting to read NT_ARM_GCS::user_gcs | GETREGSET(nt=0x410, len=24) read 24 bytes | Read NT_ARM_GCS::user_gcs = { | .features_enabled = 0x0000000000000000, | .features_locked = 0x0000000000000000, | .gcspr_el0 = 0x900d900d900d900d, | } | | Attempting partial write NT_ARM_GCS::user_gcs = { | .features_enabled = 0x0000000000000000, | .features_locked = 0x1de7ec7edbadc0de, | .gcspr_el0 = 0x1de7ec7edbadc0de, | } | SETREGSET(nt=0x410, len=8) wrote 8 bytes | | Attempting to read NT_ARM_GCS::user_gcs | GETREGSET(nt=0x410, len=24) read 24 bytes | Read NT_ARM_GCS::user_gcs = { | .features_enabled = 0x0000000000000000, | .features_locked = 0x0000000000000000, | .gcspr_el0 = 0x900d900d900d900d, | } Fixes: 7ec3b57cb29f ("arm64/ptrace: Expose GCS via ptrace and core files") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown 
<broonie@kernel.org> Link: https://lore.kernel.org/r/20241205121655.1824269-5-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-05  arm64: ptrace: fix partial SETREGSET for NT_ARM_POE  (Mark Rutland)
Currently poe_set() doesn't initialize the temporary 'ctrl' variable, and a SETREGSET call with a length of zero will leave this uninitialized. Consequently an arbitrary value will be written back to target->thread.por_el0, potentially leaking up to 64 bits of memory from the kernel stack. The read is limited to a specific slot on the stack, and the issue does not provide a write mechanism. Fix this by initializing the temporary value before copying the regset from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG, NT_ARM_SYSTEM_CALL). In the case of a zero-length write, the existing contents of POR_EL1 will be retained. Before this patch: | # ./poe-test | Attempting to write NT_ARM_POE::por_el0 = 0x900d900d900d900d | SETREGSET(nt=0x40f, len=8) wrote 8 bytes | | Attempting to read NT_ARM_POE::por_el0 | GETREGSET(nt=0x40f, len=8) read 8 bytes | Read NT_ARM_POE::por_el0 = 0x900d900d900d900d | | Attempting to write NT_ARM_POE (zero length) | SETREGSET(nt=0x40f, len=0) wrote 0 bytes | | Attempting to read NT_ARM_POE::por_el0 | GETREGSET(nt=0x40f, len=8) read 8 bytes | Read NT_ARM_POE::por_el0 = 0xffff8000839c3d50 After this patch: | # ./poe-test | Attempting to write NT_ARM_POE::por_el0 = 0x900d900d900d900d | SETREGSET(nt=0x40f, len=8) wrote 8 bytes | | Attempting to read NT_ARM_POE::por_el0 | GETREGSET(nt=0x40f, len=8) read 8 bytes | Read NT_ARM_POE::por_el0 = 0x900d900d900d900d | | Attempting to write NT_ARM_POE (zero length) | SETREGSET(nt=0x40f, len=0) wrote 0 bytes | | Attempting to read NT_ARM_POE::por_el0 | GETREGSET(nt=0x40f, len=8) read 8 bytes | Read NT_ARM_POE::por_el0 = 0x900d900d900d900d Fixes: 175198199262 ("arm64/ptrace: add support for FEAT_POE") Cc: <stable@vger.kernel.org> # 6.12.x Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241205121655.1824269-4-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-05  arm64: ptrace: fix partial SETREGSET for NT_ARM_FPMR  (Mark Rutland)
Currently fpmr_set() doesn't initialize the temporary 'fpmr' variable, and a SETREGSET call with a length of zero will leave this uninitialized. Consequently an arbitrary value will be written back to target->thread.uw.fpmr, potentially leaking up to 64 bits of memory from the kernel stack. The read is limited to a specific slot on the stack, and the issue does not provide a write mechanism. Fix this by initializing the temporary value before copying the regset from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG, NT_ARM_SYSTEM_CALL). In the case of a zero-length write, the existing contents of FPMR will be retained. Before this patch: | # ./fpmr-test | Attempting to write NT_ARM_FPMR::fpmr = 0x900d900d900d900d | SETREGSET(nt=0x40e, len=8) wrote 8 bytes | | Attempting to read NT_ARM_FPMR::fpmr | GETREGSET(nt=0x40e, len=8) read 8 bytes | Read NT_ARM_FPMR::fpmr = 0x900d900d900d900d | | Attempting to write NT_ARM_FPMR (zero length) | SETREGSET(nt=0x40e, len=0) wrote 0 bytes | | Attempting to read NT_ARM_FPMR::fpmr | GETREGSET(nt=0x40e, len=8) read 8 bytes | Read NT_ARM_FPMR::fpmr = 0xffff800083963d50 After this patch: | # ./fpmr-test | Attempting to write NT_ARM_FPMR::fpmr = 0x900d900d900d900d | SETREGSET(nt=0x40e, len=8) wrote 8 bytes | | Attempting to read NT_ARM_FPMR::fpmr | GETREGSET(nt=0x40e, len=8) read 8 bytes | Read NT_ARM_FPMR::fpmr = 0x900d900d900d900d | | Attempting to write NT_ARM_FPMR (zero length) | SETREGSET(nt=0x40e, len=0) wrote 0 bytes | | Attempting to read NT_ARM_FPMR::fpmr | GETREGSET(nt=0x40e, len=8) read 8 bytes | Read NT_ARM_FPMR::fpmr = 0x900d900d900d900d Fixes: 4035c22ef7d4 ("arm64/ptrace: Expose FPMR via ptrace") Cc: <stable@vger.kernel.org> # 6.9.x Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241205121655.1824269-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
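A sketch of the pattern used by this series of fixes: seed the temporary from the task's current value before copying in, so a short or zero-length write leaves the untouched bits as they were. Field names and details are abbreviated here; this is not the literal patch.

    static int fpmr_set(struct task_struct *target, const struct user_regset *regset,
                        unsigned int pos, unsigned int count,
                        const void *kbuf, const void __user *ubuf)
    {
        unsigned long fpmr = target->thread.uw.fpmr;   /* previously left uninitialized */
        int ret;

        ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &fpmr, 0, sizeof(fpmr));
        if (ret)
            return ret;

        target->thread.uw.fpmr = fpmr;
        return 0;
    }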
2024-12-05  Merge tag 'linux-watchdog-6.13-rc1' of git://www.linux-watchdog.org/linux-watchdog  (Linus Torvalds)
Pull watchdog updates from Wim Van Sebroeck: - Add support for exynosautov920 SoC - Add support for Airoha EN7851 watchdog - Add support for MT6735 TOPRGU/WDT - Delete the cpu5wdt driver - Always print when registering watchdog fails - Several other small fixes and improvements * tag 'linux-watchdog-6.13-rc1' of git://www.linux-watchdog.org/linux-watchdog: (36 commits) watchdog: rti: of: honor timeout-sec property watchdog: s3c2410_wdt: add support for exynosautov920 SoC dt-bindings: watchdog: Document ExynosAutoV920 watchdog bindings watchdog: mediatek: Add support for MT6735 TOPRGU/WDT watchdog: mediatek: Make sure system reset gets asserted in mtk_wdt_restart() dt-bindings: watchdog: fsl-imx-wdt: Add missing 'big-endian' property dt-bindings: watchdog: Document Qualcomm QCS8300 docs: ABI: Fix spelling mistake in pretimeout_avaialable_governors Revert "watchdog: s3c2410_wdt: use exynos_get_pmu_regmap_by_phandle() for PMU regs" watchdog: rzg2l_wdt: Power on the watchdog domain in the restart handler watchdog: Switch back to struct platform_driver::remove() watchdog: it87_wdt: add PWRGD enable quirk for Qotom QCML04 watchdog: da9063: Remove __maybe_unused notations watchdog: da9063: Do not use a global variable watchdog: Delete the cpu5wdt driver watchdog: Add support for Airoha EN7851 watchdog dt-bindings: watchdog: airoha: document watchdog for Airoha EN7581 watchdog: sl28cpld_wdt: don't print out if registering watchdog fails watchdog: rza_wdt: don't print out if registering watchdog fails watchdog: rti_wdt: don't print out if registering watchdog fails ...
2024-12-05  arm64: ptrace: fix partial SETREGSET for NT_ARM_TAGGED_ADDR_CTRL  (Mark Rutland)
Currently tagged_addr_ctrl_set() doesn't initialize the temporary 'ctrl' variable, and a SETREGSET call with a length of zero will leave this uninitialized. Consequently tagged_addr_ctrl_set() will consume an arbitrary value, potentially leaking up to 64 bits of memory from the kernel stack. The read is limited to a specific slot on the stack, and the issue does not provide a write mechanism. As set_tagged_addr_ctrl() only accepts values where bits [63:4] zero and rejects other values, a partial SETREGSET attempt will randomly succeed or fail depending on the value of the uninitialized value, and the exposure is significantly limited. Fix this by initializing the temporary value before copying the regset from userspace, as for other regsets (e.g. NT_PRSTATUS, NT_PRFPREG, NT_ARM_SYSTEM_CALL). In the case of a zero-length write, the existing value of the tagged address ctrl will be retained. The NT_ARM_TAGGED_ADDR_CTRL regset is only visible in the user_aarch64_view used by a native AArch64 task to manipulate another native AArch64 task. As get_tagged_addr_ctrl() only returns an error value when called for a compat task, tagged_addr_ctrl_get() and tagged_addr_ctrl_set() should never observe an error value from get_tagged_addr_ctrl(). Add a WARN_ON_ONCE() to both to indicate that such an error would be unexpected, and error handlnig is not missing in either case. Fixes: 2200aa7154cb ("arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset") Cc: <stable@vger.kernel.org> # 5.10.x Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241205121655.1824269-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-05  drm/v3d: Enable Performance Counters before clearing them  (Maíra Canal)
On the Raspberry Pi 5, performance counters are not being cleared when `v3d_perfmon_start()` is called, even though we write to the CLR register. As a result, their values accumulate until they overflow. The expected behavior is for performance counters to reset to zero at the start of a job. When the job finishes and the perfmon is stopped, the counters should accurately reflect the values for that specific job. To ensure this behavior, the performance counters are now enabled before being cleared. This allows the CLR register to function as intended, zeroing the counter values when the job begins. Fixes: 26a4dc29b74a ("drm/v3d: Expose performance counters to userspace") Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241204122831.17015-1-mcanal@igalia.com
2024-12-05  arm64: cpufeature: Add GCS to cpucap_is_possible()  (Robin Murphy)
Since system_supports_gcs() ends up referring to cpucap_is_possible(), teach the latter about GCS for consistency with similar features. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/416c7369fcdce4ebb2a8f12daae234507be27e38.1733406275.git.robin.murphy@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-05  Merge tag 'nvme-6.13-2024-12-05' of git://git.infradead.org/nvme into block-6.13  (Jens Axboe)
Pull NVMe fixes from Keith: "nvme fixes for Linux 6.13 - Target fix using incorrect zero buffer (Nilay) - Device specific deallocate quirk fixes (Christoph, Keith) - Fabrics fix for handling max command target bugs (Maurizio) - Cocci fix usage for kzalloc (Yu-Chen) - DMA size fix for host memory buffer feature (Christoph) - Fabrics queue cleanup fixes (Chunguang)" * tag 'nvme-6.13-2024-12-05' of git://git.infradead.org/nvme: nvme-tcp: simplify nvme_tcp_teardown_io_queues() nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues() nvme-rdma: unquiesce admin_q before destroy it nvme-tcp: fix the memleak while create new ctrl failed nvme-pci: don't use dma_alloc_noncontiguous with 0 merge boundary nvmet: replace kmalloc + memset with kzalloc for data allocation nvme-fabrics: handle zero MAXCMD without closing the connection nvme-pci: remove two deallocate zeroes quirks nvme: don't apply NVME_QUIRK_DEALLOCATE_ZEROES when DSM is not supported nvmet: use kzalloc instead of ZERO_PAGE in nvme_execute_identify_ns_nvm()
2024-12-05  Merge tag 'asoc-fix-v6.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus  (Takashi Iwai)
ASoC: Fixes for v6.13 A few small fixes for v6.13, all system specific - the biggest thing is the fix for jack handling over suspend on some Intel laptops.