2025-01-25  riscv: mm: skip pgtable level check in {pud,p4d}_alloc_one  (Kevin Brodsky)
Patch series "move pagetable_*_dtor() to __tlb_remove_table()", v5. As proposed [1] by Peter Zijlstra below, this patch series aims to move pagetable_*_dtor() into __tlb_remove_table(). This will cleanup pagetable_*_dtor() a bit and more gracefully fix the UAF issue [2] reported by syzbot. : Notably: : : - s390 pud isn't calling the existing pagetable_pud_[cd]tor() : - none of the p4d things have pagetable_p4d_[cd]tor() (x86,arm64,s390,riscv) : and they have inconsistent accounting : - while much of the _ctor calls are in generic code, many of the _dtor : calls are in arch code for hysterial raisins, this could easily be : fixed : - if we fix ptlock_free() to handle NULL, then all the _dtor() : functions can use it, and we can observe they're all identical : and can be folded : : after all that cleanup, you can move the _dtor from *_free_tlb() into : tlb_remove_table() -- which for the above case, would then have it called : from __tlb_remove_table_free(). This patch (of 16): {pmd,pud,p4d}_alloc_one() is never called if the corresponding page table level is folded, as {pmd,pud,p4d}_alloc() already does the required check. We can therefore remove the runtime page table level checks in {pud,p4d}_alloc_one. The PUD helper becomes equivalent to the generic version, so we remove it altogether. This is consistent with the way arm64 and x86 handle this situation (runtime check in p4d_free() only). Link: https://lkml.kernel.org/r/cover.1736317725.git.zhengqi.arch@bytedance.com Link: https://lkml.kernel.org/r/93a1c6bddc0ded9f1a9f15658c1e4af5c93d1194.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  mm: remove unnecessary calls to lru_add_drain  (Rik van Riel)
There seem to be several categories of calls to lru_add_drain and lru_add_drain_all.

The first are code paths that recently allocated, swapped in, or otherwise processed a batch of pages, and want them all on the LRU. These drain pages that were recently allocated, probably on the local CPU.

A second category are code paths that are actively trying to reclaim, migrate, or offline memory. These often use lru_add_drain_all, to drain the caches on all CPUs.

However, there also seem to be some other callers where we aren't really doing either. They are calling lru_add_drain(), despite operating on pages that may have been allocated long ago, and quite possibly on different CPUs. Those calls are not likely to be effective at anything but creating lock contention on the LRU locks.

Remove the lru_add_drain calls in the latter category. For detailed reasoning, see [1] and [2].

Link: https://lkml.kernel.org/r/dca2824e8e88e826c6b260a831d79089b5b9c79d.camel@surriel.com [1]
Link: https://lkml.kernel.org/r/xxfhcjaq2xxcl5adastz5omkytenq7izo2e5f4q7e3ns4z6lko@odigjjc7hqrg [2]
Link: https://lkml.kernel.org/r/20241219153253.3da9e8aa@fangorn
Signed-off-by: Rik van Riel <riel@surriel.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  mm: add build-time option for hotplug memory default online type  (Gregory Price)
Memory hotplug presently auto-onlines memory into a zone the kernel deems appropriate if CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y. The memhp_default_state boot param enables runtime config, but it's not possible to do this at build-time.

Remove CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, and replace it with CONFIG_MHP_DEFAULT_ONLINE_TYPE_* choices that sync with the boot param.

Selections:

CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE
  => mhp_default_online_type = "offline"
     Memory will not be onlined automatically.

CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO
  => mhp_default_online_type = "online"
     Memory will be onlined automatically in a zone deemed appropriate by the kernel.

CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_KERNEL
  => mhp_default_online_type = "online_kernel"
     Memory will be onlined automatically. The zone may allow kernel data (e.g. ZONE_NORMAL).

CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE
  => mhp_default_online_type = "online_movable"
     Memory will be onlined automatically. The zone will be ZONE_MOVABLE.

Default to CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE to match the existing default CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n behavior. Existing users of CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y should use CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO.

[gourry@gourry.net: update KConfig comments]
Link: https://lkml.kernel.org/r/20241226182918.648799-1-gourry@gourry.net
Link: https://lkml.kernel.org/r/20241220210709.300066-1-gourry@gourry.net
Signed-off-by: Gregory Price <gourry@gourry.net>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
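[Editor's note: a hedged sketch of how such a Kconfig choice block might look. The option names come from the commit message; the prompt text, help text and layout here are assumptions, not the actual patch.]

    choice
            prompt "Memory Hotplug Default Online Type"
            default MHP_DEFAULT_ONLINE_TYPE_OFFLINE

    config MHP_DEFAULT_ONLINE_TYPE_OFFLINE
            bool "offline"
            help
              Hotplugged memory will not be onlined automatically.

    config MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO
            bool "online"
            help
              Hotplugged memory will be onlined automatically into a zone
              the kernel deems appropriate.

    config MHP_DEFAULT_ONLINE_TYPE_ONLINE_KERNEL
            bool "online_kernel"
            help
              Hotplugged memory will be onlined automatically; the zone may
              allow kernel data (e.g. ZONE_NORMAL).

    config MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE
            bool "online_movable"
            help
              Hotplugged memory will be onlined automatically into ZONE_MOVABLE.

    endchoice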
2025-01-25  selftests/mm: add new test cases to the migration test  (Donet Tom)
Added three new test cases to the migration tests:

1. Shared anon THP migration test
   This test will mmap shared anon memory, madvise it to MADV_HUGEPAGE, then do migration entry testing. One thread will move pages back and forth between nodes whilst other threads try and access them.

2. Private anon hugetlb migration test
   This test will mmap private anon hugetlb memory and then do the migration entry testing.

3. Shared anon hugetlb migration test
   This test will mmap shared anon hugetlb memory and then do the migration entry testing.

Test results
============

# ./tools/testing/selftests/mm/migration
TAP version 13
1..6
# Starting 6 tests from 1 test cases.
#  RUN           migration.private_anon ...
#            OK  migration.private_anon
ok 1 migration.private_anon
#  RUN           migration.shared_anon ...
#            OK  migration.shared_anon
ok 2 migration.shared_anon
#  RUN           migration.private_anon_thp ...
#            OK  migration.private_anon_thp
ok 3 migration.private_anon_thp
#  RUN           migration.shared_anon_thp ...
#            OK  migration.shared_anon_thp
ok 4 migration.shared_anon_thp
#  RUN           migration.private_anon_htlb ...
#            OK  migration.private_anon_htlb
ok 5 migration.private_anon_htlb
#  RUN           migration.shared_anon_htlb ...
#            OK  migration.shared_anon_htlb
ok 6 migration.shared_anon_htlb
# PASSED: 6 / 6 tests passed.
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0

Link: https://lkml.kernel.org/r/20241219102720.4487-1-donettom@linux.ibm.com
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  mm: replace free hugepage folios after migration  (yangge)
My machine has 4 NUMA nodes, each equipped with 32GB of memory. I have configured each NUMA node with 16GB of CMA and 16GB of in-use hugetlb pages. The allocation of contiguous memory via cma_alloc() can fail probabilistically.

When there are free hugetlb folios in the hugetlb pool, during the migration of in-use hugetlb folios, new folios are allocated from the free hugetlb pool. After the migration is completed, the old folios are released back to the free hugetlb pool instead of being returned to the buddy system. This can cause the test_pages_isolated() check to fail, ultimately leading to the failure of cma_alloc().

Call trace:

    cma_alloc()
        __alloc_contig_migrate_range()  // migrate in-use hugepage
        test_pages_isolated()
            __test_page_isolated_in_pageblock()
                PageBuddy(page)  // check if the page is in buddy

To address this issue, we introduce a function named replace_free_hugepage_folios(). This function will replace the hugepage in the free hugepage pool with a new one and release the old one to the buddy system. After the migration of in-use hugetlb pages is completed, we will invoke replace_free_hugepage_folios() to ensure that these hugepages are properly released to the buddy system. Following this step, when test_pages_isolated() is executed for inspection, it will successfully pass.

Additionally, when alloc_contig_range() is used to migrate multiple in-use hugetlb pages, it can result in some in-use hugetlb pages being released back to the free hugetlb pool and subsequently being reallocated and used again. For example:

    [huge 0] [huge 1]

To migrate huge 0, we obtain huge x from the pool. After the migration is completed, we return the now-freed huge 0 back to the pool. When it's time to migrate huge 1, we can simply reuse the now-freed huge 0 from the pool. As a result, when replace_free_hugepage_folios() is executed, it cannot release huge 0 back to the buddy system. To address this issue, we should prevent the reuse of isolated free hugepages during the migration process.

Link: https://lkml.kernel.org/r/1734503588-16254-1-git-send-email-yangge1116@126.com
Link: https://lkml.kernel.org/r/1736582300-11364-1-git-send-email-yangge1116@126.com
Signed-off-by: yangge <yangge1116@126.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  zram: cond_resched() in writeback loop  (Sergey Senozhatsky)
zram writeback is a costly operation, because every target slot (unless ZRAM_HUGE) is decompressed before it gets written to a backing device. The writeback to a backing device uses submit_bio_wait(), which may look like a rescheduling point. However, if the backing device has the BD_HAS_SUBMIT_BIO bit set, __submit_bio() calls disk->fops->submit_bio(bio) on the backing device directly, so when submit_bio_wait() calls blk_wait_io() the I/O is already done. On such systems we effectively end up in a loop

    for_each (target slot) {
        decompress(slot)
        __submit_bio()
            disk->fops->submit_bio(bio)
    }

which on PREEMPT_NONE systems triggers watchdogs (since there are no explicit rescheduling points). Add cond_resched() to the zram writeback loop.

Link: https://lkml.kernel.org/r/20241218063513.297475-8-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
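[Editor's note: a minimal sketch of the fix's shape, with hypothetical helper names (slot_needs_writeback, decompress_slot are placeholders, not the real zram functions); only the cond_resched() placement is the point.]

    /* per-slot writeback loop with an explicit rescheduling point */
    for (index = 0; index < nr_pages; index++) {
            if (!slot_needs_writeback(zram, index))     /* hypothetical */
                    continue;

            decompress_slot(zram, index, page);         /* hypothetical */
            submit_bio_wait(&bio);  /* may complete synchronously via
                                     * disk->fops->submit_bio() */

            cond_resched();  /* keep PREEMPT_NONE watchdogs quiet */
    }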
2025-01-25  zram: use zram_read_from_zspool() in writeback  (Sergey Senozhatsky)
In writeback we can only read pages from the zspool. zram_read_page() is not really right in that context, not only because it's a more generic function that handles ZRAM_WB pages, but also because it requires us to unlock the slot between the slot-flag check and the actual page read. Use zram_read_from_zspool() instead, and do the slot-flags check and page read under the same slot lock.

Link: https://lkml.kernel.org/r/20241218063513.297475-7-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  zram: factor out different page types read  (Sergey Senozhatsky)
Similarly to write, split the page read code into ZRAM_HUGE read, ZRAM_SAME read and compressed page read, to simplify the code.

Link: https://lkml.kernel.org/r/20241218063513.297475-6-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  zram: factor out ZRAM_HUGE write  (Sergey Senozhatsky)
zram_write_page() handles ZRAM_SAME page stores (already factored out), regular page stores and ZRAM_HUGE page stores. ZRAM_HUGE handling adds a significant amount of complexity; instead, we can handle ZRAM_HUGE in a separate function. This allows us to simplify the zs_handle allocation slow path, as it no longer needs to handle the ZRAM_HUGE case. ZRAM_HUGE zs_handle allocation, on the other hand, can now drop __GFP_KSWAPD_RECLAIM, because we handle ZRAM_HUGE in preemptible context (outside of the local-lock scope).

Link: https://lkml.kernel.org/r/20241218063513.297475-5-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  zram: factor out ZRAM_SAME write  (Sergey Senozhatsky)
Handling of ZRAM_SAME now uses a goto to the final stages of zram_write_page(), plus it introduces a branch and a flags variable, which does not make the code any simpler. In reality, we can handle ZRAM_SAME immediately when we detect such pages, and remove the goto and the branch. Factor out ZRAM_SAME handling into a separate routine to simplify zram_write_page().

Link: https://lkml.kernel.org/r/20241218063513.297475-4-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  zram: remove entry element member  (Sergey Senozhatsky)
Element is in the same anon union as handle and hence holds the same value, which makes the code below sort of confusing:

    handle = zram_get_handle()
    if (!handle)
        element = zram_get_element()

Element doesn't really simplify the code, so let's just remove it. We already re-purpose handle to store the block id of a written-back page.

Link: https://lkml.kernel.org/r/20241218063513.297475-3-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
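[Editor's note: a simplified sketch of why `element` was redundant - it shared an anonymous union with `handle`, so both names aliased the same storage (field layout reduced for illustration; not the full struct zram_table_entry).]

    struct zram_table_entry {
            union {
                    unsigned long handle;   /* zsmalloc handle, or writeback block id */
                    unsigned long element;  /* same bits under a second name */
            };
            unsigned long flags;
    };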
2025-01-25  zram: free slot memory early during write  (Sergey Senozhatsky)
Patch series "zram: split page type read/write handling", v2. This is a subset of [1] series which contains only fixes and improvements (no new features, as ZRAM_HUGE split is still under consideration). The motivation for factoring out is that zram_write_page() gets more and more complex all the time, because it tries to handle too many scenarios: ZRAM_SAME store, ZRAM_HUGE store, compress page store with zs_malloc allocation slowpath and conditional recompression, etc. Factor those out and make things easier to handle. Addition of cond_resched() is simply a fix, I can trigger watchdog from zram writeback(). And early slot free is just a reasonable thing to do. [1] https://lore.kernel.org/linux-kernel/20241119072057.3440039-1-senozhatsky@chromium.org This patch (of 7): In the current implementation entry's previously allocated memory is released in the very last moment, when we already have allocated a new memory for new data. This, basically, temporarily increases memory usage for no good reason. For example, consider the case when both old (stale) and new entry data are incompressible so such entry will temporarily use two physical pages - one for stale (old) data and one for new data. We can release old memory as soon as we get a write request for entry. Link: https://lkml.kernel.org/r/20241218063513.297475-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20241218063513.297475-2-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  mm/swap_cgroup: decouple swap cgroup recording and clearing  (Kairui Song)
The current implementation of swap cgroup tracking is a bit complex and fragile:

On the charging path, swap_cgroup_record always records an actual memcg id, and it depends on the caller to make sure all entries passed in belong to one single folio. As folios are always charged or uncharged as a whole, and always charged and uncharged in order, swap_cgroup doesn't need an extra lock.

On the uncharging path, swap_cgroup_record always sets the record to zero. These entries won't be charged again until uncharging is done, so there is no extra lock needed either. Worth noting that swap cgroup clearing may happen without a folio involved, e.g. exiting processes will zap their page tables without swapin.

The xchg/cmpxchg provides atomic operations and barriers to ensure no tearing or synchronization issues with these swap cgroup records. It works, but is quite error-prone.

Things can be much clearer and more robust by decoupling recording and clearing into two helpers. Recording takes the actual folio being charged as an argument, clearing always sets the record to zero, and the debug sanity checks are refined to better reflect their usage.

Benchmarks even showed a very slight improvement, as it saved some extra arguments and lookups:

make -j96 with defconfig on tmpfs in 1.5G memory cgroup using 4k folios:
  Before: sys 9617.23 (stdev 37.764062)
  After:  sys 9541.54 (stdev 42.973976)

make -j96 with defconfig on tmpfs in 2G memory cgroup using 64k folios:
  Before: sys 7358.98 (stdev 54.927593)
  After:  sys 7337.82 (stdev 39.398956)

Link: https://lkml.kernel.org/r/20241218114633.85196-5-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Suggested-by: Chris Li <chrisl@kernel.org>
Cc: Barry Song <baohua@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  mm/swap_cgroup: remove global swap cgroup lock  (Kairui Song)
commit e9e58a4ec3b1 ("memcg: avoid use cmpxchg in swap cgroup maintainance") replaced the cmpxchg/xchg with a global irq spinlock because some archs don't support 2-byte cmpxchg/xchg. Clearly this won't scale well, and as commented in swap_cgroup.c, this lock is not needed for map synchronization.

Emulating a 2-byte xchg with atomic cmpxchg isn't hard, so implement it to get rid of this lock. Two helpers are introduced for doing so, and they can be easily dropped if a generic 2-byte xchg is ever supported.

Testing using a 64G brd device, building the kernel with make -j96 in a 1.5G memory cgroup using 4k folios, showed the improvement below (6 test runs):

Before this series:
  Sys time:  10782.29 (stdev 42.353886)
  Real time: 171.49 (stdev 0.595541)

After this commit:
  Sys time:  9617.23 (stdev 37.764062), -10.81%
  Real time: 159.65 (stdev 0.587388), -6.90%

With 64k folios and a 2G memcg:

Before this series:
  Sys time:  8176.94 (stdev 26.414712)
  Real time: 141.98 (stdev 0.797382)

After this commit:
  Sys time:  7358.98 (stdev 54.927593), -10.00%
  Real time: 134.07 (stdev 0.757463), -5.57%

Sequential swapout of 8G 64k zero folios with madvise (24 test runs):
  Before this series: 5461409.12 us (stdev 183957.827084)
  After this commit:  5420447.26 us (stdev 196419.240317)

Sequential swapin of 8G 4k zero folios (24 test runs):
  Before this series: 19736958.916667 us (stdev 189027.246676)
  After this commit:  19662182.629630 us (stdev 172717.640614)

Performance is better, or at least not worse, for all tests above.

Link: https://lkml.kernel.org/r/20241218114633.85196-4-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
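[Editor's note: a sketch of the general emulation technique - a 16-bit atomic xchg built from a word-sized cmpxchg loop on the containing aligned 32-bit word. This is illustrative, not the actual helpers from the patch; the shift computation assumes a little-endian layout and 2-byte alignment.]

    static unsigned short xchg16(unsigned short *ptr, unsigned short new)
    {
            unsigned long addr = (unsigned long)ptr;
            u32 *word = (u32 *)(addr & ~3UL);       /* containing aligned word */
            int shift = (addr & 2) * 8;             /* 0 or 16 (little-endian) */
            u32 mask = 0xffffU << shift;
            u32 old = READ_ONCE(*word), val;

            do {
                    /* splice the new halfword into the word */
                    val = (old & ~mask) | ((u32)new << shift);
            } while (!try_cmpxchg(word, &old, val)); /* old is reloaded on failure */

            return (old & mask) >> shift;
    }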
2025-01-25  mm/swap_cgroup: remove swap_cgroup_cmpxchg  (Kairui Song)
This function is never used after commit 6b611388b626 ("memcg-v1: remove charge move code").

Link: https://lkml.kernel.org/r/20241218114633.85196-3-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Barry Song <baohua@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  mm, memcontrol: avoid duplicated memcg enable check  (Kairui Song)
Patch series "mm/swap_cgroup: remove global swap cgroup lock", v3. This series removes the global swap cgroup lock. The critical section of this lock is very short but it's still a bottle neck for mass parallel swap workloads. Up to 10% performance gain for tmpfs build kernel test on a 48c96t system under memory pressure, and no regression for other cases: This patch (of 3): mem_cgroup_uncharge_swap() includes a mem_cgroup_disabled() check, so the caller doesn't need to check that. Link: https://lkml.kernel.org/r/20241218114633.85196-1-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20241218114633.85196-2-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Chris Li <chrisl@kernel.org> Cc: Barry Song <baohua@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  test_maple_tree: test exhausted upper limit of mtree_alloc_cyclic()  (Liam R. Howlett)
When the upper bound of the search is exhausted, the maple state may be returned in an error state of -EBUSY. This means the maple state needs to be reset before the second search in mas_alloc_cyclic() to ensure the search happens. This test ensures the issue is not recreated.

Link: https://lkml.kernel.org/r/20241216190113.1226145-3-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
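[Editor's note: a hedged sketch of the scenario the test exercises - exhausting the upper bound so a cyclic allocation must wrap - assuming mtree_alloc_cyclic() keeps its current signature; the actual test in lib/test_maple_tree.c may differ.]

    unsigned long location = 0, next = ULONG_MAX;   /* start near the top */
    int ret;

    /* The search from `next` exhausts the upper bound; the maple state
     * must be reset (instead of staying -EBUSY) so the retry from
     * range_lo can succeed. */
    ret = mtree_alloc_cyclic(&tree, &location, xa_mk_value(0),
                             0, ULONG_MAX - 1, &next, GFP_KERNEL);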
2025-01-25  mm/page_idle: constify 'struct bin_attribute'  (Thomas Weißschuh)
The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications.

Link: https://lkml.kernel.org/r/20241216-sysfs-const-bin_attr-page_idle-v1-1-cc01ecc55196@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
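[Editor's note: a sketch of the pattern using the generic __BIN_ATTR() helper; the callback names here are placeholders and the exact page_idle definition may differ.]

    /* The instance can now live in rodata. */
    static const struct bin_attribute page_idle_bitmap_attr =
            __BIN_ATTR(bitmap, 0600, bitmap_read, bitmap_write, 0);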
2025-01-25  mm/huge_memory.c: rename shadowed local  (Andrew Morton)
split_huge_pages_write() has a local `buf' which shadows the incoming arg `buf'. Reviewer confusion resulted. Rename the inner local to `tok_buf'.

Cc: Leo Stone <leocstone@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
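[Editor's note: an illustrative reduction of the shadowing situation the rename avoids; this is not the actual kernel function.]

    static ssize_t demo_write(struct file *file, const char __user *buf,
                              size_t count, loff_t *ppos)
    {
            if (count > 0) {
                    char buf[64];   /* shadows the `buf' parameter: confusing */
                    /* ... renamed to `tok_buf' in the real code ... */
            }
            return count;
    }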
2025-01-25  tools: testing: add simple __mmap_region() userland test  (Lorenzo Stoakes)
Introduce a demonstrative, basic __mmap_region() test upon which we can base further work moving forwards. This simply asserts that mappings can be made and that merges occur as expected.

As part of this change, fix the security_vm_enough_memory_mm() stub, which was previously incorrectly implemented.

Link: https://lkml.kernel.org/r/20241213162409.41498-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  mm: unexport apply_to_existing_page_range  (Christoph Hellwig)
apply_to_existing_page_range() is only used by non-modular code.

Link: https://lkml.kernel.org/r/20241212073423.1439954-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  mm: fix outdated incorrect code comments for handle_mm_fault()  (Jinliang Zheng)
[akpm@linux-foundation.org: s/mmap_Lock/mmap_lock/, per Liam]
Link: https://lkml.kernel.org/r/20241213031820.778342-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  mm/early_ioremap: add null pointer checks to prevent NULL-pointer dereference  (Guo Weikang)
The early_ioremap interface can fail and return NULL in certain cases. To prevent NULL-pointer dereference crashes, fix issues in the acpi_extlog and copy_early_mem interfaces, improving robustness when handling early memory.

Link: https://lkml.kernel.org/r/20241212101004.1544070-1-guoweikang.kernel@gmail.com
Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Julian Stecklina <julian.stecklina@cyberus-technology.de>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
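[Editor's note: a sketch of the defensive pattern involved; the surrounding variable names are placeholders, not the actual acpi_extlog or copy_early_mem code.]

    void __iomem *p = early_ioremap(phys_addr, size);

    if (!p)
            return -ENOMEM;         /* was previously dereferenced unchecked */
    memcpy_fromio(buf, p, size);
    early_iounmap(p, size);
    return 0;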
2025-01-13  mm: add comments to do_mmap(), mmap_region() and vm_mmap()  (Lorenzo Stoakes)
It isn't always entirely clear to users the difference between do_mmap(), mmap_region() and vm_mmap(), so add comments to clarify what's going on in each.

This is compounded by the fact that we actually allow callers external to mm to invoke both do_mmap() and mmap_region() (!), the latter of which is really, strictly speaking, an internal memory mapping implementation detail.

Link: https://lkml.kernel.org/r/20241212113152.28849-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  mm: assert mmap write lock held on do_mmap(), mmap_region()  (Lorenzo Stoakes)
Both of these functions can be invoked outside of mm, so it is probably a good idea to assert that the required lock is held. This will only have an impact if CONFIG_DEBUG_VM is set; otherwise it amounts to no change at all.

Link: https://lkml.kernel.org/r/20241212114841.55185-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
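[Editor's note: a minimal sketch of the assertion pattern using the existing mmap_assert_write_locked() helper; the function shown is illustrative, not the actual do_mmap() signature.]

    static unsigned long demo_map_region(struct mm_struct *mm,
                                         unsigned long addr, unsigned long len)
    {
            /* Caller must hold the mmap write lock; this is a cheap no-op
             * unless CONFIG_DEBUG_VM (or lockdep) is enabled. */
            mmap_assert_write_locked(mm);
            /* ... perform the mapping ... */
            return addr;
    }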
2025-01-13  MAINTAINERS: update MEMORY MAPPING section  (Lorenzo Stoakes)
Update the MEMORY MAPPING section to contain VMA logic, as it makes no sense to have these two sections separate.

Additionally, add files which permit changes to the attributes and/or ranges spanned by memory mappings - in essence, anything which might alter the output of /proc/$pid/[s]maps.

This is necessarily fuzzy, as there is not quite as good a separation of concerns as we would ideally like in the kernel. However, each of these files interacts with the VMA and memory mapping logic in such a way as to be inseparable from it, and it is important that they are maintained in conjunction with it.

Link: https://lkml.kernel.org/r/20241211105315.21756-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  memcg/hugetlb: remove memcg hugetlb try-commit-cancel protocol  (Joshua Hahn)
This patch fully removes the mem_cgroup_{try, commit, cancel}_charge functions, as well as their hugetlb variants.

Link: https://lkml.kernel.org/r/20241211203951.764733-4-joshua.hahnjy@gmail.com
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  memcg/hugetlb: introduce mem_cgroup_charge_hugetlb  (Joshua Hahn)
This patch introduces mem_cgroup_charge_hugetlb, which combines the logic of mem_cgroup_hugetlb_try_charge / mem_cgroup_hugetlb_commit_charge and removes the need for mem_cgroup_hugetlb_cancel_charge. It also reduces the footprint of memcg in hugetlb code and consolidates all memcg-related error paths into one.

Link: https://lkml.kernel.org/r/20241211203951.764733-3-joshua.hahnjy@gmail.com
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  memcg/hugetlb: introduce memcg_accounts_hugetlb  (Joshua Hahn)
Patch series "memcg/hugetlb: Rework memcg hugetlb charging", v3. This series cleans up memcg's hugetlb charging logic by deprecating the current memcg hugetlb try-charge + {commit, cancel} logic present in alloc_hugetlb_folio. A single function mem_cgroup_charge_hugetlb takes its place instead. This makes the code more maintainable by simplifying the error path and reduces memcg's footprint in hugetlb logic. This patch introduces a few changes in the hugetlb folio allocation error path: (a) Instead of having multiple return points, we consolidate them to two: one for reaching the memcg limit or running out of memory (-ENOMEM) and one for hugetlb allocation fails / limit being reached (-ENOSPC). (b) Previously, the memcg limit was checked before the folio is acquired, meaning the hugeTLB folio isn't acquired if the limit is reached. This patch performs the charging after the folio is reached, meaning if memcg's limit is reached, the acquired folio is freed right away. This patch builds on two earlier patch series: [2] which adds memcg hugeTLB counters, and [3] which deprecates charge moving and removes the last references to mem_cgroup_cancel_charge. The request for this cleanup can be found in [2]. [1] https://lore.kernel.org/all/20231006184629.155543-1-nphamcs@gmail.com/ [2] https://lore.kernel.org/all/20241101204402.1885383-1-joshua.hahnjy@gmail.com/ [3] https://lore.kernel.org/linux-mm/20241025012304.2473312-1-shakeel.butt@linux.dev/ This patch (of 3): This patch isolates the check for whether memcg accounts hugetlb. This condition can only be true if the memcg mount option memory_hugetlb_accounting is on, which includes hugetlb usage in memory.current. Link: https://lkml.kernel.org/r/20241211203951.764733-1-joshua.hahnjy@gmail.com Link: https://lkml.kernel.org/r/20241211203951.764733-2-joshua.hahnjy@gmail.com Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  mm/migrate: remove slab checks in isolate_movable_page()  (Hyeonggon Yoo)
Commit 8b8817630ae8 ("mm/migrate: make isolate_movable_page() skip slab pages") introduced slab checks to prevent mis-identification of slab pages as movable kernel pages.

However, after Matthew's frozen folio series, these slab checks became unnecessary, as the migration logic fails to increase the reference count for frozen slab folios. Remove these redundant slab checks and associated memory barriers.

Link: https://lkml.kernel.org/r/20241210124807.8584-1-42.hyeyoo@gmail.com
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  samples/damon/prcl: implement schemes setup  (SeongJae Park)
Implement a proactive cold memory regions reclaiming logic for the prcl sample module using DAMOS. The logic treats memory regions that have not been accessed at all for five or more seconds as cold, and reclaims those as soon as they are found.

Link: https://lkml.kernel.org/r/20241210215030.85675-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  samples/damon: introduce a skeleton of a sample DAMON module for proactive reclamation  (SeongJae Park)

DAMON is not only for monitoring of access patterns, but also for access-aware system operations. For the system operations, DAMON provides a feature called DAMOS (Data Access Monitoring-based Operation Schemes). There is no sample API usage of DAMOS, though.

Copy the working set size estimation sample module with changed names of the module and symbols, to use it as a skeleton for a sample module showing the DAMOS API usage. The following commit will make it proactively reclaim cold memory of the given process, using DAMOS.

Link: https://lkml.kernel.org/r/20241210215030.85675-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  samples/damon/wsse: implement working set size estimation and logging  (SeongJae Park)
Implement the DAMON-based working set size estimation logic. The logic iterates over the memory regions in the DAMON-generated access pattern snapshot every aggregation interval, and gets the total sum of the sizes of all regions having an 'nr_accesses' count of one or higher. That is, it assumes any region with one or more 'nr_accesses' to be part of the working set. The estimated value is reported to the user by printing it to the kernel log.

Link: https://lkml.kernel.org/r/20241210215030.85675-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
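[Editor's note: a hedged sketch of the estimation pass using the DAMON region-iteration macros; the callback name and wiring are assumptions, not the actual sample module code.]

    static int wsse_after_aggregation(struct damon_ctx *c)
    {
            struct damon_target *t;
            unsigned long wss = 0;

            damon_for_each_target(t, c) {
                    struct damon_region *r;

                    /* any region with nr_accesses >= 1 counts as working set */
                    damon_for_each_region(r, t) {
                            if (r->nr_accesses > 0)
                                    wss += r->ar.end - r->ar.start;
                    }
            }
            pr_info("estimated wss: %lu bytes\n", wss);
            return 0;
    }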
2025-01-13  samples/damon/wsse: start and stop DAMON as the user requests  (SeongJae Park)
Start running DAMON to monitor accesses of a process that the user specified via the 'target_pid' parameter, when 'y' is passed to the 'enable' parameter. Stop running DAMON when 'n' is passed to the 'enable' parameter. Estimating the working set size from DAMON's monitoring results and reporting it to the user will be implemented by the following commit.

Link: https://lkml.kernel.org/r/20241210215030.85675-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  samples: add a skeleton of a sample DAMON module for working set size estimation  (SeongJae Park)
Patch series "mm/damon: add sample modules". Implement a proactive cold memory regions reclaiming logic of prcl sample module using DAMOS. The logic treats memory regions that not accessed at all for five or more seconds as cold, and reclaim those as soon as found. This patch (of 5): Add a skeleton for a sample DAMON static module that can be used for estimating working set size of a given process. Note that it is a static module since DAMON is not exporting symbols to loadable modules for now. It exposes two module parameters, namely 'pid' and 'enable'. 'pid' will specify the process that the module will estimate the working set size of. 'enable' will receive whether to start or stop the estimation. Because this is just a skeleton, the parameters do nothing, though. The functionalities will be implemented by following commits. Link: https://lkml.kernel.org/r/20241210215030.85675-1-sj@kernel.org Link: https://lkml.kernel.org/r/20241210215030.85675-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  selftests/mm: remove X permission from sigaltstack mapping  (Kevin Brodsky)
There is no reason why the alternate signal stack should be mapped as RWX. Map it as RW instead.

Link: https://lkml.kernel.org/r/20241209095019.1732120-15-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
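[Editor's note: a sketch of mapping an alternate signal stack RW rather than RWX; illustrative, not the exact selftest code.]

    #include <signal.h>
    #include <sys/mman.h>

    static int setup_sigaltstack(void)
    {
            stack_t ss = { .ss_size = SIGSTKSZ, .ss_flags = 0 };

            /* PROT_READ | PROT_WRITE only -- no PROT_EXEC */
            ss.ss_sp = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (ss.ss_sp == MAP_FAILED)
                    return -1;
            return sigaltstack(&ss, NULL);
    }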
2025-01-13  selftests/mm: skip pkey_sighandler_tests if support is missing  (Kevin Brodsky)
The pkey_sighandler_tests are bound to fail if either the kernel or the CPU doesn't support pkeys. Skip the tests if pkeys support is missing.

Link: https://lkml.kernel.org/r/20241209095019.1732120-14-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
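[Editor's note: a minimal sketch of the skip logic; is_pkeys_supported() is the helper named later in this series, and the exact call site in the test is an assumption.]

    /* at test startup */
    if (!is_pkeys_supported())
            ksft_exit_skip("pkeys not supported by kernel/CPU\n");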
2025-01-13  selftests/mm: rename pkey register macro  (Kevin Brodsky)
PKEY_ALLOW_ALL is meant to represent the pkey register value that allows all accesses (enables all pkeys). However, its current naming suggests that the value applies to *one* key only (like PKEY_DISABLE_ACCESS, for instance).

Rename PKEY_ALLOW_ALL to PKEY_REG_ALLOW_ALL to avoid such misunderstanding. This is consistent with the PKEY_REG_ALLOW_NONE macro introduced by commit 6e182dc9f268 ("selftests/mm: Use generic pkey register manipulation").

Link: https://lkml.kernel.org/r/20241209095019.1732120-13-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  selftests/mm: use sys_pkey helpers consistently  (Kevin Brodsky)
sys_pkey_alloc, sys_pkey_free and sys_mprotect_pkey are currently used in protection_keys.c, while pkey_sighandler_tests.c calls the libc wrappers directly (e.g. pkey_mprotect()). This is probably OK when using glibc (those symbols appeared a while ago), but Musl does not currently provide them. The logging in the helpers from pkey-helpers.h can also come in handy.

Make things more consistent by using the sys_pkey helpers in pkey_sighandler_tests.c too. To that end, their implementation is moved to a common .c file (pkey_util.c). This also enables calling is_pkeys_supported() outside of protection_keys.c, since it relies on sys_pkey_{alloc,free}.

[kevin.brodsky@arm.com: fix dependency on pkey_util.c]
Link: https://lkml.kernel.org/r/20241216092849.2140850-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20241209095019.1732120-12-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
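[Editor's note: the raw-syscall wrappers presumably look something like the sketch below (illustrative; the real pkey_util.c may differ). Going through syscall() directly is what makes the tests independent of libc wrappers such as pkey_alloc().]

    #include <sys/syscall.h>
    #include <unistd.h>

    int sys_pkey_alloc(unsigned long flags, unsigned long init_val)
    {
            return syscall(SYS_pkey_alloc, flags, init_val);
    }

    int sys_pkey_free(int pkey)
    {
            return syscall(SYS_pkey_free, pkey);
    }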
2025-01-13  selftests/mm: ensure non-global pkey symbols are marked static  (Kevin Brodsky)
The pkey tests define a whole lot of functions and some global variables. A few are truly global (declared in pkey-helpers.h), but the majority are file-scoped. Make sure those are labelled static.

Some of the pkey_{access,write}_{allow,deny} helpers are not called, or only called when building for some architectures. Mark them __maybe_unused to suppress compiler warnings.

Link: https://lkml.kernel.org/r/20241209095019.1732120-11-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  selftests/mm: remove empty pkey helper definition  (Kevin Brodsky)
Some of the functions declared in pkey-helpers.h are actually defined in protection_keys.c, meaning they can only be called from protection_keys.c. This is less than ideal, but it is hard to avoid, as these helpers are themselves called from inline functions in pkey-<arch>.h. Let's at least add a comment clarifying that.

We can also remove the empty definition in pkey_sighandler_tests.c: expected_pkey_fault() is not meant to be called from there.

Link: https://lkml.kernel.org/r/20241209095019.1732120-10-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  selftests/mm: ensure pkey-*.h define inline functions only  (Kevin Brodsky)
Headers should not define non-inline functions, as this prevents them from being included more than once in a given program. pkey-helpers.h and the arch-specific headers it includes currently define multiple such non-inline functions.

In most cases those functions can simply be made inline - this patch does just that. read_ptr() is an exception, as it must not be inlined. Since it is only called from protection_keys.c, we just move it there.

Link: https://lkml.kernel.org/r/20241209095019.1732120-9-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
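[Editor's note: a generic illustration of why header-defined helpers must be `static inline` (not the actual pkey helpers).]

    /* demo.h */
    static inline int demo_ok(int x)    /* fine: each TU gets its own copy */
    {
            return x + 1;
    }

    int demo_bad(int x)                 /* breaks: duplicate symbol at link
                                         * time if included in two TUs */
    {
            return x + 1;
    }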
2025-01-13  selftests/mm: define types using typedef in pkey-helpers.h  (Kevin Brodsky)
Using #define to define types should be avoided. Use typedef instead. Also ensure that __u* types are actually defined by including <linux/types.h>.

Link: https://lkml.kernel.org/r/20241209095019.1732120-8-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
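[Editor's note: a tiny illustration of the change; the type name is a plausible example, not necessarily the one in pkey-helpers.h.]

    #include <linux/types.h>    /* ensures __u64 is defined */

    typedef __u64 pkey_reg_t;   /* preferred: a real type the compiler tracks */
    /* #define pkey_reg_t __u64    avoided: textual substitution only */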
2025-01-13  selftests/mm: remove unused pkey helpers  (Kevin Brodsky)
Commit 5f23f6d082a9 ("x86/pkeys: Add self-tests") introduced a number of helpers and functions that don't seem to have ever been used. Let's remove them.

Link: https://lkml.kernel.org/r/20241209095019.1732120-7-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  selftests/mm: build with -O2  (Kevin Brodsky)
The mm kselftests are currently built with no optimisation (-O0). It's unclear why, and besides being obviously suboptimal, this also prevents the pkeys tests from working as intended. Let's build all the tests with -O2.

[kevin.brodsky@arm.com: silence unused-result warnings]
Link: https://lkml.kernel.org/r/20250107170110.2819685-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20241209095019.1732120-6-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  selftests/mm: fix -Warray-bounds warnings in pkey_sighandler_tests  (Kevin Brodsky)
GCC doesn't like dereferencing a pointer set to 0x1 (when building at -O2):

    pkey_sighandler_tests.c:166:9: warning: array subscript 0 is outside array bounds of 'int[0]' [-Warray-bounds=]
      166 |         *(int *) (0x1) = 1;
          |         ^~~~~~~~~~~~~~
    cc1: note: source object is likely at address zero

Using NULL instead seems to make it happy. This should make no difference in practice (SIGSEGV with SEGV_MAPERR will be the outcome regardless); we just need to update the expected si_addr.

[kevin.brodsky@arm.com: fix clang dereferencing-null issue]
Link: https://lkml.kernel.org/r/20241218153615.2267571-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20241209095019.1732120-5-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
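[Editor's note: a sketch of the change's shape (illustrative; the commit also notes a follow-up was needed to keep clang quiet about the deliberate null dereference).]

    static void raise_segv(void)
    {
            /* Previously: *(int *)0x1 = 1;  (GCC -Warray-bounds at -O2) */
            *(int *)NULL = 1;  /* still SIGSEGV/SEGV_MAPERR; si_addr is now NULL */
    }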
2025-01-13  selftests/mm: fix strncpy() length  (Kevin Brodsky)
GCC complains (with -O2) that the length is equal to the destination size, which is indeed invalid. Subtract 1 from the size of the array to leave room for '\0'.

Link: https://lkml.kernel.org/r/20241209095019.1732120-4-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
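[Editor's note: a sketch of the general pattern (buffer name and size are placeholders, not the actual test code).]

    char dst[256];

    strncpy(dst, src, sizeof(dst) - 1); /* leave room for the terminator */
    dst[sizeof(dst) - 1] = '\0';        /* strncpy() may not NUL-terminate */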
2025-01-13  selftests/mm: fix -Wmaybe-uninitialized warnings  (Kevin Brodsky)
A few -Wmaybe-uninitialized warnings show up when building the mm tests with -O2. None of them looks worrying; silence them by initialising the problematic variables.

Link: https://lkml.kernel.org/r/20241209095019.1732120-3-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Keith Lucas <keith.lucas@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  selftests/mm: fix condition in uffd_move_test_common()  (Kevin Brodsky)
Patch series "pkeys kselftests improvements". This series brings various cleanups and fixes for the mm (mostly pkeys) kselftests. The original goal was to make the pkeys tests work out of the box and without build warning - it turned out to be more involved than expected. The most important change is enabling -O2 when building all mm kselftests (patch 5). This is actually needed for the pkeys tests to run successfully (see gcc command line at the top of protection_keys.c and pkey_sighandler_tests.c), and seems to have no negative impact on the other tests. It certainly can't hurt performance! The following patches address a few obvious issues in the pkeys tests (unused code, bad scope for functions/variables, etc.) and finally make a couple of small improvements. There is one ugliness that this series does not fix: some functions in pkey-<arch>.h call functions that are actually defined in protection_keys.c. For instance, expect_fault_on_read_execonly_key() in pkey-x86.h calls expected_pkey_fault(). This means that other test programs that use pkey-helpers.h (namely pkey_sighandler_tests) would fail to link if they called such functions defined in pkey-<arch>.h. Fixing this would require a more comprehensive reorganisation of the pkey-* headers, which doesn't seem worth it (patch 9 adds a comment to pkey-helpers.h to clarify the situation). Some more details on the patches: - Patch 1 is an unrelated fix that was revealed by inspecting a warning. It seems fairly harmless though, so I thought I'd just post it as part of this series. - Patch 2-5 fix various warnings that come up by building the mm tests at -O2 and finally enable -O2. - Patch 6-12 are various cleanups for the pkeys tests. Patch 11 in particular enables is_pkeys_supported() to be called from outside protection_keys.c (patch 13 relies on this). - Patch 13-14 are small improvements to pkey_sighandler_tests.c. Many thanks to Ryan Roberts for checking that the mm tests still run fine on arm64 with those patches applied. I've also checked that the pkeys tests run fine on arm64 and x86. This patch (of 14): area_src and area_dst are saved at the beginning of the function if chunk_size > page_size. The intention is quite clearly to restore them at the end based on the same condition, but step_size is considered instead of chunk_size. Considering that step_size is a number of pages, the condition is likely to be false. Use the same condition as when saving so that the globals are restored as intended. Link: https://lkml.kernel.org/r/20241209095019.1732120-1-kevin.brodsky@arm.com Link: https://lkml.kernel.org/r/20241209095019.1732120-2-kevin.brodsky@arm.com Fixes: a2bf6a9ca805 ("selftests/mm: add UFFDIO_MOVE ioctl test") Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13  mm/memory_hotplug: don't use __GFP_HARDWALL when migrating pages via memory offlining  (David Hildenbrand)

We'll migrate pages allocated by another context; respecting the cpuset of the memory offlining context when allocating a migration target does not make sense. Drop the __GFP_HARDWALL by using GFP_KERNEL.

Note that in an ideal world, migration code could figure out the cpuset of the original context and take that into consideration.

Link: https://lkml.kernel.org/r/20241205090508.2095225-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>