summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-01-25mm/huge_memory.c: rename shadowed localAndrew Morton
split_huge_pages_write() has a lccal `buf' which shadows incoming arg `buf'. Reviewer confusion resulted. Rename the inner local to `tok_buf'. Cc: Leo Stone <leocstone@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25tools: testing: add simple __mmap_region() userland testLorenzo Stoakes
Introduce demonstrative, basic, __mmap_region() test upon which we can base further work upon moving forwards. This simply asserts that mappings can be made and merges occur as expected. As part of this change, fix the security_vm_enough_memory_mm() stub which was previously incorrectly implemented. Link: https://lkml.kernel.org/r/20241213162409.41498-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25mm: unexport apply_to_existing_page_rangeChristoph Hellwig
apply_to_existing_page_range() is only used by non-modular code. Link: https://lkml.kernel.org/r/20241212073423.1439954-1-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25mm: fix outdated incorrect code comments for handle_mm_fault()Jinliang Zheng
[akpm@linux-foundation.org: s/mmap_Lock/mmap_lock/, per Liam] Link: https://lkml.kernel.org/r/20241213031820.778342-1-alexjlzheng@tencent.com Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/early_ioremap: add null pointer checks to prevent NULL-pointer dereferenceGuo Weikang
The early_ioremap interface can fail and return NULL in certain cases. To prevent NULL-pointer dereference crashes, fixed issues in the acpi_extlog and copy_early_mem interfaces, improving robustness when handling early memory. Link: https://lkml.kernel.org/r/20241212101004.1544070-1-guoweikang.kernel@gmail.com Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Julian Stecklina <julian.stecklina@cyberus-technology.de> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <lenb@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: add comments to do_mmap(), mmap_region() and vm_mmap()Lorenzo Stoakes
It isn't always entirely clear to users the difference between do_mmap(), mmap_region() and vm_mmap(), so add comments to clarify what's going on in each. This is compounded by the fact that we actually allow callers external to mm to invoke both do_mmap() and mmap_region() (!), the latter of which is really strictly speaking an internal memory mapping implementation detail. Link: https://lkml.kernel.org/r/20241212113152.28849-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: assert mmap write lock held on do_mmap(), mmap_region()Lorenzo Stoakes
Both of these functions can be invoked outside of mm, so it is probably a good idea to assert that the required lock is held. Will only have an impact if CONFIG_DEBUG_VM is set, otherwise this amounts to no change at all. Link: https://lkml.kernel.org/r/20241212114841.55185-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13MAINTAINERS: update MEMORY MAPPING sectionLorenzo Stoakes
Update the MEMORY MAPPING section to contain VMA logic as it makes no sense to have these two sections separate. Additionally, add files which permit changes to the attributes and/or ranges spanned by memory mappings, in essence anything which might alter the output of /proc/$pid/[s]maps. This is necessarily fuzzy, as there is not quite as good separation of concerns as we would ideally like in the kernel. However each of these files interacts with the VMA and memory mapping logic in such a way as to be inseparatable from it, and it is important that they are maintained in conjunction with it. Link: https://lkml.kernel.org/r/20241211105315.21756-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13memcg/hugetlb: remove memcg hugetlb try-commit-cancel protocolJoshua Hahn
This patch fully removes the mem_cgroup_{try, commit, cancel}_charge functions, as well as their hugetlb variants. Link: https://lkml.kernel.org/r/20241211203951.764733-4-joshua.hahnjy@gmail.com Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13memcg/hugetlb: introduce mem_cgroup_charge_hugetlbJoshua Hahn
This patch introduces mem_cgroup_charge_hugetlb which combines the logic of mem_cgroup_hugetlb_try_charge / mem_cgroup_hugetlb_commit_charge and removes the need for mem_cgroup_hugetlb_cancel_charge. It also reduces the footprint of memcg in hugetlb code and consolidates all memcg related error paths into one. Link: https://lkml.kernel.org/r/20241211203951.764733-3-joshua.hahnjy@gmail.com Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13memcg/hugetlb: introduce memcg_accounts_hugetlbJoshua Hahn
Patch series "memcg/hugetlb: Rework memcg hugetlb charging", v3. This series cleans up memcg's hugetlb charging logic by deprecating the current memcg hugetlb try-charge + {commit, cancel} logic present in alloc_hugetlb_folio. A single function mem_cgroup_charge_hugetlb takes its place instead. This makes the code more maintainable by simplifying the error path and reduces memcg's footprint in hugetlb logic. This patch introduces a few changes in the hugetlb folio allocation error path: (a) Instead of having multiple return points, we consolidate them to two: one for reaching the memcg limit or running out of memory (-ENOMEM) and one for hugetlb allocation fails / limit being reached (-ENOSPC). (b) Previously, the memcg limit was checked before the folio is acquired, meaning the hugeTLB folio isn't acquired if the limit is reached. This patch performs the charging after the folio is reached, meaning if memcg's limit is reached, the acquired folio is freed right away. This patch builds on two earlier patch series: [2] which adds memcg hugeTLB counters, and [3] which deprecates charge moving and removes the last references to mem_cgroup_cancel_charge. The request for this cleanup can be found in [2]. [1] https://lore.kernel.org/all/20231006184629.155543-1-nphamcs@gmail.com/ [2] https://lore.kernel.org/all/20241101204402.1885383-1-joshua.hahnjy@gmail.com/ [3] https://lore.kernel.org/linux-mm/20241025012304.2473312-1-shakeel.butt@linux.dev/ This patch (of 3): This patch isolates the check for whether memcg accounts hugetlb. This condition can only be true if the memcg mount option memory_hugetlb_accounting is on, which includes hugetlb usage in memory.current. Link: https://lkml.kernel.org/r/20241211203951.764733-1-joshua.hahnjy@gmail.com Link: https://lkml.kernel.org/r/20241211203951.764733-2-joshua.hahnjy@gmail.com Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/migrate: remove slab checks in isolate_movable_page()Hyeonggon Yoo
Commit 8b8817630ae8 ("mm/migrate: make isolate_movable_page() skip slab pages") introduced slab checks to prevent mis-identification of slab pages as movable kernel pages. However, after Matthew's frozen folio series, these slab checks became unnecessary as the migration logic fails to increase the reference count for frozen slab folios. Remove these redundant slab checks and associated memory barriers. Link: https://lkml.kernel.org/r/20241210124807.8584-1-42.hyeyoo@gmail.com Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13samples/damon/prcl: implement schemes setupSeongJae Park
Implement a proactive cold memory regions reclaiming logic of prcl sample module using DAMOS. The logic treats memory regions that not accessed at all for five or more seconds as cold, and reclaim those as soon as found. Link: https://lkml.kernel.org/r/20241210215030.85675-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13samples/damon: introduce a skeleton of a smaple DAMON module for proactive ↵SeongJae Park
reclamation DAMON is not only for monitoring of access patterns, but also for access-aware system operations. For the system operations, DAMON provides a feature called DAMOS (Data Access Monitoring-based Operation Schemes). There is no sample API usage of DAMOS, though. Copy the working set size estimation sample modules with changed names of the module and symbols, to use it as a skeleton for a sample module showing the DAMOS API usage. The following commit will make it proactively reclaim cold memory of the given process, using DAMOS. Link: https://lkml.kernel.org/r/20241210215030.85675-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13samples/damon/wsse: implement working set size estimation and loggingSeongJae Park
Implement the DAMON-based working set size estimation logic. The logic iterates memory regions in DAMON-generated access pattern snapshot for every aggregation interval and get the total sum of the size of any region having one or higher 'nr_accesses' count. That is, it assumes any region having one or higher 'nr_accesses' to be a part of the working set. The estimated value is reported to the user by printing it to the kernel log. Link: https://lkml.kernel.org/r/20241210215030.85675-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13samples/damon/wsse: start and stop DAMON as the user requestsSeongJae Park
Start running DAMON to monitor accesses of a process that the user specified via 'target_pid' parameter, when 'y' is passed to 'enable' parameter. Stop running DAMON when 'n' is passed to 'enable' parameter. Estimating the working set size from DAMON's monitoring results and reporting it to the user will be implemented by the following commit. Link: https://lkml.kernel.org/r/20241210215030.85675-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13samples: add a skeleton of a sample DAMON module for working set size estimationSeongJae Park
Patch series "mm/damon: add sample modules". Implement a proactive cold memory regions reclaiming logic of prcl sample module using DAMOS. The logic treats memory regions that not accessed at all for five or more seconds as cold, and reclaim those as soon as found. This patch (of 5): Add a skeleton for a sample DAMON static module that can be used for estimating working set size of a given process. Note that it is a static module since DAMON is not exporting symbols to loadable modules for now. It exposes two module parameters, namely 'pid' and 'enable'. 'pid' will specify the process that the module will estimate the working set size of. 'enable' will receive whether to start or stop the estimation. Because this is just a skeleton, the parameters do nothing, though. The functionalities will be implemented by following commits. Link: https://lkml.kernel.org/r/20241210215030.85675-1-sj@kernel.org Link: https://lkml.kernel.org/r/20241210215030.85675-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: remove X permission from sigaltstack mappingKevin Brodsky
There is no reason why the alternate signal stack should be mapped as RWX. Map it as RW instead. Link: https://lkml.kernel.org/r/20241209095019.1732120-15-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: skip pkey_sighandler_tests if support is missingKevin Brodsky
The pkey_sighandler_tests are bound to fail if either the kernel or CPU doesn't support pkeys. Skip the tests if pkeys support is missing. Link: https://lkml.kernel.org/r/20241209095019.1732120-14-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: rename pkey register macroKevin Brodsky
PKEY_ALLOW_ALL is meant to represent the pkey register value that allows all accesses (enables all pkeys). However its current naming suggests that the value applies to *one* key only (like PKEY_DISABLE_ACCESS for instance). Rename PKEY_ALLOW_ALL to PKEY_REG_ALLOW_ALL to avoid such misunderstanding. This is consistent with the PKEY_REG_ALLOW_NONE macro introduced by commit 6e182dc9f268 ("selftests/mm: Use generic pkey register manipulation"). Link: https://lkml.kernel.org/r/20241209095019.1732120-13-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: use sys_pkey helpers consistentlyKevin Brodsky
sys_pkey_alloc, sys_pkey_free and sys_mprotect_pkey are currently used in protections_keys.c, while pkey_sighandler_tests.c calls the libc wrappers directly (e.g. pkey_mprotect()). This is probably ok when using glibc (those symbols appeared a while ago), but Musl does not currently provide them. The logging in the helpers from pkey-helpers.h can also come in handy. Make things more consistent by using the sys_pkey helpers in pkey_sighandler_tests.c too. To that end their implementation is moved to a common .c file (pkey_util.c). This also enables calling is_pkeys_supported() outside of protections_keys.c, since it relies on sys_pkey_{alloc,free}. [kevin.brodsky@arm.com: fix dependency on pkey_util.c] Link: https://lkml.kernel.org/r/20241216092849.2140850-1-kevin.brodsky@arm.com Link: https://lkml.kernel.org/r/20241209095019.1732120-12-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: ensure non-global pkey symbols are marked staticKevin Brodsky
The pkey tests define a whole lot of functions and some global variables. A few are truly global (declared in pkey-helpers.h), but the majority are file-scoped. Make sure those are labelled static. Some of the pkey_{access,write}_{allow,deny} helpers are not called, or only called when building for some architectures. Mark them __maybe_unused to suppress compiler warnings. Link: https://lkml.kernel.org/r/20241209095019.1732120-11-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: remove empty pkey helper definitionKevin Brodsky
Some of the functions declared in pkey-helpers.h are actually defined in protections_keys.c, meaning they can only be called from protections_keys.c. This is less than ideal, but it is hard to avoid as these helpers are themselves called from inline functions in pkey-<arch>.h. Let's at least add a comment clarifying that. We can also remove the empty definition in pkey_sighandler_tests.c: expected_pkey_fault() is not meant to be called from there. Link: https://lkml.kernel.org/r/20241209095019.1732120-10-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: ensure pkey-*.h define inline functions onlyKevin Brodsky
Headers should not define non-inline functions, as this prevents them from being included more than once in a given program. pkey-helpers.h and the arch-specific headers it includes currently define multiple such non-inline functions. In most cases those functions can simply be made inline - this patch does just that. read_ptr() is an exception as it must not be inlined. Since it is only called from protection_keys.c, we just move it there. Link: https://lkml.kernel.org/r/20241209095019.1732120-9-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: define types using typedef in pkey-helpers.hKevin Brodsky
Using #define to define types should be avoided. Use typedef instead. Also ensure that __u* types are actually defined by including <linux/types.h>. Link: https://lkml.kernel.org/r/20241209095019.1732120-8-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: remove unused pkey helpersKevin Brodsky
Commit 5f23f6d082a9 ("x86/pkeys: Add self-tests") introduced a number of helpers and functions that don't seem to have ever been used. Let's remove them. Link: https://lkml.kernel.org/r/20241209095019.1732120-7-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: build with -O2Kevin Brodsky
The mm kselftests are currently built with no optimisation (-O0). It's unclear why, and besides being obviously suboptimal, this also prevents the pkeys tests from working as intended. Let's build all the tests with -O2. [kevin.brodsky@arm.com: silence unused-result warnings] Link: https://lkml.kernel.org/r/20250107170110.2819685-1-kevin.brodsky@arm.com Link: https://lkml.kernel.org/r/20241209095019.1732120-6-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: fix -Warray-bounds warnings in pkey_sighandler_testsKevin Brodsky
GCC doesn't like dereferencing a pointer set to 0x1 (when building at -O2): pkey_sighandler_tests.c:166:9: warning: array subscript 0 is outside array bounds of 'int[0]' [-Warray-bounds=] 166 | *(int *) (0x1) = 1; | ^~~~~~~~~~~~~~ cc1: note: source object is likely at address zero Using NULL instead seems to make it happy. This should make no difference in practice (SIGSEGV with SEGV_MAPERR will be the outcome regardless), we just need to update the expected si_addr. [kevin.brodsky@arm.com: fix clang dereferencing-null issue] Link: https://lkml.kernel.org/r/20241218153615.2267571-1-kevin.brodsky@arm.com Link: https://lkml.kernel.org/r/20241209095019.1732120-5-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: fix strncpy() lengthKevin Brodsky
GCC complains (with -O2) that the length is equal to the destination size, which is indeed invalid. Subtract 1 from the size of the array to leave room for '\0'. Link: https://lkml.kernel.org/r/20241209095019.1732120-4-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: fix -Wmaybe-uninitialized warningsKevin Brodsky
A few -Wmaybe-uninitialized warnings show up when building the mm tests with -O2. None of them looks worrying; silence them by initialising the problematic variables. Link: https://lkml.kernel.org/r/20241209095019.1732120-3-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: fix condition in uffd_move_test_common()Kevin Brodsky
Patch series "pkeys kselftests improvements". This series brings various cleanups and fixes for the mm (mostly pkeys) kselftests. The original goal was to make the pkeys tests work out of the box and without build warning - it turned out to be more involved than expected. The most important change is enabling -O2 when building all mm kselftests (patch 5). This is actually needed for the pkeys tests to run successfully (see gcc command line at the top of protection_keys.c and pkey_sighandler_tests.c), and seems to have no negative impact on the other tests. It certainly can't hurt performance! The following patches address a few obvious issues in the pkeys tests (unused code, bad scope for functions/variables, etc.) and finally make a couple of small improvements. There is one ugliness that this series does not fix: some functions in pkey-<arch>.h call functions that are actually defined in protection_keys.c. For instance, expect_fault_on_read_execonly_key() in pkey-x86.h calls expected_pkey_fault(). This means that other test programs that use pkey-helpers.h (namely pkey_sighandler_tests) would fail to link if they called such functions defined in pkey-<arch>.h. Fixing this would require a more comprehensive reorganisation of the pkey-* headers, which doesn't seem worth it (patch 9 adds a comment to pkey-helpers.h to clarify the situation). Some more details on the patches: - Patch 1 is an unrelated fix that was revealed by inspecting a warning. It seems fairly harmless though, so I thought I'd just post it as part of this series. - Patch 2-5 fix various warnings that come up by building the mm tests at -O2 and finally enable -O2. - Patch 6-12 are various cleanups for the pkeys tests. Patch 11 in particular enables is_pkeys_supported() to be called from outside protection_keys.c (patch 13 relies on this). - Patch 13-14 are small improvements to pkey_sighandler_tests.c. Many thanks to Ryan Roberts for checking that the mm tests still run fine on arm64 with those patches applied. I've also checked that the pkeys tests run fine on arm64 and x86. This patch (of 14): area_src and area_dst are saved at the beginning of the function if chunk_size > page_size. The intention is quite clearly to restore them at the end based on the same condition, but step_size is considered instead of chunk_size. Considering that step_size is a number of pages, the condition is likely to be false. Use the same condition as when saving so that the globals are restored as intended. Link: https://lkml.kernel.org/r/20241209095019.1732120-1-kevin.brodsky@arm.com Link: https://lkml.kernel.org/r/20241209095019.1732120-2-kevin.brodsky@arm.com Fixes: a2bf6a9ca805 ("selftests/mm: add UFFDIO_MOVE ioctl test") Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/memory_hotplug: don't use __GFP_HARDWALL when migrating pages via memory ↵David Hildenbrand
offlining We'll migrate pages allocated by other context; respecting the cpuset of the memory offlining context when allocating a migration target does not make sense. Drop the __GFP_HARDWALL by using GFP_KERNEL. Note that in an ideal world, migration code could figure out the cpuset of the original context and take that into consideration. Link: https://lkml.kernel.org/r/20241205090508.2095225-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: don't use __GFP_HARDWALL when migrating pages via alloc_contig*()David Hildenbrand
Patch series "mm: don't use __GFP_HARDWALL when migrating remote pages". __GFP_HARDWALL means that we will be respecting the cpuset of the caller when allocating a page. However, when we are migrating remote allocations (pages allocated from other context), the cpuset of the current context is irrelevant. For memory offlining + alloc_contig_*(), this is rather obvious. There might be other such page migration users, let's start with the obvious ones. This patch (of 2): We'll migrate pages allocated by other contexts; respecting the cpuset of the alloc_contig*() caller when allocating a migration target does not make sense. Drop the __GFP_HARDWALL. Note that in an ideal world, migration code could figure out the cpuset of the original context and take that into consideration. Link: https://lkml.kernel.org/r/20241205090508.2095225-1-david@redhat.com Link: https://lkml.kernel.org/r/20241205090508.2095225-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: mremap_test: Remove unused variable and type mismatchesMuhammad Usama Anjum
Remove unused variable and fix type mismatches. Link: https://lkml.kernel.org/r/20241209185624.2245158-5-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: mseal_test: remove unused variablesMuhammad Usama Anjum
Fix following warnings: - Remove unused variables and fix following warnings: Link: https://lkml.kernel.org/r/20241209185624.2245158-4-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: pagemap_ioctl: Fix types mismatches shown by compiler optionsMuhammad Usama Anjum
Fix following warnings caught by compiler: - There are several type mismatches among different variables. - Remove unused variable warnings. Link: https://lkml.kernel.org/r/20241209185624.2245158-3-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: thp_settings: remove const from return typeMuhammad Usama Anjum
Patch series "selftest/mm: Remove warnings found by adding compiler flags". Recently, I reviewed a patch on the mm/kselftest mailing list about a test which had obvious type mismatch fix in it. It was strange why that wasn't caught during development and when patch was accepted. This led me to discover that those extra compiler options to catch these warnings aren't being used. When I added them, I found tens of warnings in just mm suite. In this series, I'm fixing those warnings in a few files. More fixes will be sent later. This patch (of 4): Remove cost from the return type as it is ignored anyways and generates the warning: warning: type qualifiers ignored on function return type [-Wignored-qualifiers] Link: https://lkml.kernel.org/r/20241209185624.2245158-1-usama.anjum@collabora.com Link: https://lkml.kernel.org/r/20241209185624.2245158-2-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mseal: remove can_do_mseal()Jeff Xu
No code logic change. can_do_mseal() is called exclusively by mseal.c, and mseal.c is compiled only when CONFIG_64BIT flag is set in makefile. Therefore, it is unnecessary to have 32 bit stub function in the header file, remove this function and merge the logic into do_mseal(). Link: https://lkml.kernel.org/r/20241206013934.2782793-1-jeffxu@google.com Link: https://lkml.kernel.org/r/20241206194839.3030596-2-jeffxu@google.com Signed-off-by: Jeff Xu <jeffxu@chromium.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: Kees Cook <keescook@chromium.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/hugetlb: support FOLL_FORCE|FOLL_WRITEGuillaume Morin
Eric reported that PTRACE_POKETEXT fails when applications use hugetlb for mapping text using huge pages. Before commit 1d8d14641fd9 ("mm/hugetlb: support write-faults in shared mappings"), PTRACE_POKETEXT worked by accident, but it was buggy and silently ended up mapping pages writable into the page tables even though VM_WRITE was not set. In general, FOLL_FORCE|FOLL_WRITE does currently not work with hugetlb. Let's implement FOLL_FORCE|FOLL_WRITE properly for hugetlb, such that what used to work in the past by accident now properly works, allowing applications using hugetlb for text etc. to get properly debugged. This change might also be required to implement uprobes support for hugetlb [1]. [1] https://lore.kernel.org/lkml/ZiK50qob9yl5e0Xz@bender.morinfr.org/ Link: https://lkml.kernel.org/r/Z1NshNfWuzUCPebA@bender.morinfr.org Signed-off-by: Guillaume Morin <guillaume@morinfr.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Xu <peterx@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Hagberg <ehagberg@janestreet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: perform all memfd seal checks in a single placeLorenzo Stoakes
We no longer actually need to perform these checks in the f_op->mmap() hook any longer. We already moved the operation which clears VM_MAYWRITE on a read-only mapping of a write-sealed memfd in order to work around the restrictions imposed by commit 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour"). There is no reason for us not to simply go ahead and additionally check to see if any pre-existing seals are in place here rather than defer this to the f_op->mmap() hook. By doing this we remove more logic from shmem_mmap() which doesn't belong there, as well as doing the same for hugetlbfs_file_mmap(). We also remove dubious shared logic in mm.h which simply does not belong there either. It makes sense to do these checks at the earliest opportunity, we know these are shmem (or hugetlbfs) mappings whose relevant VMA flags will not change from the invoking do_mmap() so there is simply no need to wait. This also means the implementation of further memfd seal flags can be done within mm/memfd.c and also have the opportunity to modify VMA flags as necessary early in the mapping logic. [lorenzo.stoakes@oracle.com: fix typos in !memfd inline stub] Link: https://lkml.kernel.org/r/7dee6c5d-480b-4c24-b98e-6fa47dbd8a23@lucifer.local Link: https://lkml.kernel.org/r/20241206212846.210835-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jeff Xu <jeffxu@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: enforce __must_check on VMA merge and splitLorenzo Stoakes
It is of critical importance to check the return results on VMA merge (and split), failure to do so can result in use-after-free's. This bug has recurred, so have the compiler enforce this check to prevent any future repetition. Link: https://lkml.kernel.org/r/20241206225036.273103-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/damon/tests/vaddr-kunit.h: reduce stack consumptionAndrew Morton
After "mm: move per-vma lock into vm_area_struct" we're hitting mm/damon/tests/vaddr-kunit.h: In function 'damon_test_three_regions_in_vmas': mm/damon/tests/vaddr-kunit.h:92:1: error: the frame size of 3280 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Fix by moving all those vmas off the stack. [akpm@linux-foundation.org: fix build] Closes: https://lkml.kernel.org/r/20241209170829.11311e70@canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: introduce mmap_lock_speculate_{try_begin|retry}Suren Baghdasaryan
Add helper functions to speculatively perform operations without read-locking mmap_lock, expecting that mmap_lock will not be write-locked and mm is not modified from under us. [akpm@linux-foundation.org: use read_seqcount_retry() in mmap_lock_speculate_retry(), per Wei Yang] Link: https://lkml.kernel.org/r/20241122174416.1367052-3-surenb@google.com Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: convert mm_lock_seq to a proper seqcountSuren Baghdasaryan
Convert mm_lock_seq to be seqcount_t and change all mmap_write_lock variants to increment it, in-line with the usual seqcount usage pattern. This lets us check whether the mmap_lock is write-locked by checking mm_lock_seq.sequence counter (odd=locked, even=unlocked). This will be used when implementing mmap_lock speculation functions. As a result vm_lock_seq is also change to be unsigned to match the type of mm_lock_seq.sequence. Link: https://lkml.kernel.org/r/20241122174416.1367052-2-surenb@google.com Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13seqlock: add raw_seqcount_try_beginSuren Baghdasaryan
Add raw_seqcount_try_begin() to opens a read critical section of the given seqcount_t if the counter is even. This enables eliding the critical section entirely if the counter is odd, instead of doing the speculation knowing it will fail. Link: https://lkml.kernel.org/r/20241122174416.1367052-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/shmem: refactor to reuse vfs_parse_monolithic_sep for option parsingGuo Weikang
shmem_parse_options() is refactored to use vfs_parse_monolithic_sep() with a custom separator function, shmem_next_opt(). This eliminates redundant logic for parsing comma-separated options and ensures consistency with other kernel code that uses the same interface. The vfs_parse_monolithic_sep() helper was introduced in commit e001d1447cd4 ("fs: factor out vfs_parse_monolithic_sep() helper"). Link: https://lkml.kernel.org/r/20241205094521.1244678-1-guoweikang.kernel@gmail.com Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13selftests/mm: add fork CoW guard page testLorenzo Stoakes
When we fork anonymous pages, apply a guard page then remove it, the previous CoW mapping is cleared. This might not be obvious to an outside observer without taking some time to think about how the overall process functions, so document that this is the case through a test, which also usefully asserts that the behaviour is as we expect. This is grouped with other, more important, fork tests that ensure that guard pages are correctly propagated on fork. Fix a typo in a nearby comment at the same time. [ryan.roberts@arm.com: static process_madvise() wrapper for guard-pages] Link: https://lkml.kernel.org/r/20250107142937.1870478-1-ryan.roberts@arm.com Link: https://lkml.kernel.org/r/20241205190748.115656-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: add per-order mTHP swap-in fallback/fallback_charge countersWenchao Hao
Currently, large folio swap-in is supported, but we lack a method to analyze their success ratio. Similar to anon_fault_fallback, we introduce per-order mTHP swpin_fallback and swpin_fallback_charge counters for calculating their success ratio. The new counters are located at: /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats/ swpin_fallback swpin_fallback_charge Link: https://lkml.kernel.org/r/20241202124730.2407037-1-haowenchao22@gmail.com Signed-off-by: Wenchao Hao <haowenchao22@gmail.com> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Lance Yang <ioworker0@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64Qi Zheng
Now, x86 has fully supported the CONFIG_PT_RECLAIM feature, and reclaiming PTE pages is profitable only on 64-bit systems, so select ARCH_SUPPORTS_PT_RECLAIM if X86_64. Link: https://lkml.kernel.org/r/841c1f35478d5354872d307888979c9e20de9c09.1733305182.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13x86: mm: free page table pages by RCU instead of semi RCUQi Zheng
Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, the page table pages will be freed by semi RCU, that is: - batch table freeing: asynchronous free by RCU - single table freeing: IPI + synchronous free In this way, the page table can be lockless traversed by disabling IRQ in paths such as fast GUP. But this is not enough to free the empty PTE page table pages in paths other that munmap and exit_mmap path, because IPI cannot be synchronized with rcu_read_lock() in pte_offset_map{_lock}(). In preparation for supporting empty PTE page table pages reclaimation, let single table also be freed by RCU like batch table freeing. Then we can also use pte_offset_map() etc to prevent PTE page from being freed. Like pte_free_defer(), we can also safely use ptdesc->pt_rcu_head to free the page table pages: - The pt_rcu_head is unioned with pt_list and pmd_huge_pte. - For pt_list, it is used to manage the PGD page in x86. Fortunately tlb_remove_table() will not be used for free PGD pages, so it is safe to use pt_rcu_head. - For pmd_huge_pte, it is used for THPs, so it is safe. After applying this patch, if CONFIG_PT_RECLAIM is enabled, the function call of free_pte() is as follows: free_pte pte_free_tlb __pte_free_tlb ___pte_free_tlb paravirt_tlb_remove_table tlb_remove_table [!CONFIG_PARAVIRT, Xen PV, Hyper-V, KVM] [no-free-memory slowpath:] tlb_table_invalidate tlb_remove_table_one __tlb_remove_table_one [frees via RCU] [fastpath:] tlb_table_flush tlb_remove_table_free [frees via RCU] native_tlb_remove_table [CONFIG_PARAVIRT on native] tlb_remove_table [see above] Link: https://lkml.kernel.org/r/0287d442a973150b0e1019cc406e6322d148277a.1733305182.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>