summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-19mm: move mm_count into its own cache lineMathieu Desnoyers
The mm_struct mm_count field is frequently updated by mmgrab/mmdrop performed by context switch. This causes false-sharing for surrounding mm_struct fields which are read-mostly. This has been observed on a 2sockets/112core/224cpu Intel Sapphire Rapids server running hackbench, and by the kernel test robot will-it-scale testcase. Move the mm_count field into its own cache line to prevent false-sharing with other mm_struct fields. Move mm_count to the first field of mm_struct to minimize the amount of padding required: rather than adding padding before and after the mm_count field, padding is only added after mm_count. Note that I noticed this odd comment in mm_struct: commit 2e3025434a6b ("mm: relocate 'write_protect_seq' in struct mm_struct") /* * With some kernel config, the current mmap_lock's offset * inside 'mm_struct' is at 0x120, which is very optimal, as * its two hot fields 'count' and 'owner' sit in 2 different * cachelines, and when mmap_lock is highly contended, both * of the 2 fields will be accessed frequently, current layout * will help to reduce cache bouncing. * * So please be careful with adding new fields before * mmap_lock, which can easily push the 2 fields into one * cacheline. */ struct rw_semaphore mmap_lock; This comment is rather odd for a few reasons: - It requires addition/removal of mm_struct fields to carefully consider field alignment of _other_ fields, - It expresses the wish to keep an "optimal" alignment for a specific kernel config. I suspect that the author of this comment may want to revisit this topic and perhaps introduce a split-struct approach for struct rw_semaphore, if the need is to place various fields of this structure in different cache lines. Link: https://lkml.kernel.org/r/20230515143536.114960-1-mathieu.desnoyers@efficios.com Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid") Fixes: af7f588d8f73 ("sched: Introduce per-memory-map concurrency ID") Link: https://lore.kernel.org/lkml/7a0c1db1-103d-d518-ed96-1584a28fbf32@efficios.com Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/oe-lkp/202305151017.27581d75-yujie.liu@intel.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Aaron Lu <aaron.lu@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Olivier Dion <odion@efficios.com> Cc: <michael.christie@oracle.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm/memcg: remove return value of mem_cgroup_scan_tasks()ZhangPeng
No user checks the return value of mem_cgroup_scan_tasks(). Make the return value void. Link: https://lkml.kernel.org/r/20230616063030.977586-1-zhangpeng362@huawei.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm/damon/core-test: add a test for damon_set_attrs()SeongJae Park
Commit 5ff6e2fff88e ("mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp()") fixed a bug by adding arguments validation in damon_set_attrs(). Add a unit test for the added validation to ensure the bug cannot occur again. Link: https://lkml.kernel.org/r/20230615183323.87561-1-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: remove is_longterm_pinnable_page() and reimplement ↵Vishal Moola (Oracle)
folio_is_longterm_pinnable() folio_is_longterm_pinnable() already exists as a wrapper function. Now that the whole implementation of is_longterm_pinnable_page() can be implemented using folios, folio_is_longterm_pinnable() can be made its own standalone function - and we can remove is_longterm_pinnable_page(). Link: https://lkml.kernel.org/r/20230614021312.34085-6-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm/gup.c: reorganize try_get_folio()Vishal Moola (Oracle)
try_get_folio() takes in a page, then chooses to do some folio operations based on the flags (either FOLL_GET or FOLL_PIN). We can rewrite this function to be more purpose oriented. After calling try_get_folio(), if neither FOLL_GET nor FOLL_PIN are set, warn and fail. If FOLL_GET is set we can return the result. If FOLL_GET is not set then FOLL_PIN is set, so we pin the folio. This change assists with folio conversions, and makes the function more readable. Link: https://lkml.kernel.org/r/20230614021312.34085-5-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm/gup_test.c: convert verify_dma_pinned() to us foliosVishal Moola (Oracle)
verify_dma_pinned() checks that pages are dma-pinned. We can convert this to use folios. Link: https://lkml.kernel.org/r/20230614021312.34085-4-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mmzone: introduce folio_migratetype()Vishal Moola (Oracle)
Introduce folio_migratetype() as a folio equivalent for get_pageblock_migratetype(). This function intends to return the migratetype the folio is located in, hence the name choice. Link: https://lkml.kernel.org/r/20230614021312.34085-3-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mmzone: introduce folio_is_zone_movable()Vishal Moola (Oracle)
Patch series "Replace is_longterm_pinnable_page()", v2. This patchset introduces some more helper functions for the folio conversions, and converts all callers of is_longterm_pinnable_page() to use folios. This patch (of 5): Introduce folio_is_zone_movable() to act as a folio equivalent for is_zone_movable_page(). This is to assist in later folio conversions. Link: https://lkml.kernel.org/r/20230614021312.34085-1-vishal.moola@gmail.com Link: https://lkml.kernel.org/r/20230614021312.34085-2-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19kasan: add support for kasan.fault=panic_on_writeMarco Elver
KASAN's boot time kernel parameter 'kasan.fault=' currently supports 'report' and 'panic', which results in either only reporting bugs or also panicking on reports. However, some users may wish to have more control over when KASAN reports result in a kernel panic: in particular, KASAN reported invalid _writes_ are of special interest, because they have greater potential to corrupt random kernel memory or be more easily exploited. To panic on invalid writes only, introduce 'kasan.fault=panic_on_write', which allows users to choose to continue running on invalid reads, but panic only on invalid writes. Link: https://lkml.kernel.org/r/20230614095158.1133673-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Aleksandr Nogikh <nogikh@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Taras Madan <tarasmadan@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19zram: further limit recompression thresholdSergey Senozhatsky
Recompression threshold should be below huge-size-class watermark. Any object larger than huge-size-class is a "huge object" and occupies a whole physical page on the zsmalloc side, in other words it's incompressible, as far as zsmalloc is concerned. Link: https://lkml.kernel.org/r/20230614141338.3480029-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Suggested-by: Brian Geffon <bgeffon@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: zswap: invaldiate entry after writebackDomenico Cerasuolo
When an entry started writeback, it used to be invalidated with ref count logic alone, meaning that it would stay on the tree until all references were put. The problem with this behavior is that as soon as the writeback started, the ownership of the data held by the entry is passed to the swapcache and should not be left in zswap too. Currently there are no known issues because of this, but this change explicitly invalidates an entry that started writeback to reduce opportunities for future bugs. This patch is a follow up on the series titled "mm: zswap: move writeback LRU from zpool to zswap" + commit f090b7949768("mm: zswap: support exclusive loads"). Link: https://lkml.kernel.org/r/20230614143122.74471-1-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: kill lock|unlock_page_memcg()Kefeng Wang
Since commit c7c3dec1c9db ("mm: rmap: remove lock_page_memcg()"), no more user, kill lock_page_memcg() and unlock_page_memcg(). Link: https://lkml.kernel.org/r/20230614143612.62575-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm/page_owner/cma: show pfn in cma/page_owner with hex formatKassey Li
cma: display pfn as well as pfn_to_page(pfn) page_owner: display pfn in hex rather than decimal Link: https://lkml.kernel.org/r/20230613092533.15449-1-quic_yingangl@quicinc.com Signed-off-by: Kassey Li <quic_yingangl@quicinc.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19buffer: convert block_truncate_page() to use a folioMatthew Wilcox (Oracle)
Support large folios in block_truncate_page() and avoid three hidden calls to compound_head(). [willy@infradead.org: fix check of filemap_grab_folio() return value] Link: https://lkml.kernel.org/r/ZItZOt+XxV12HtzL@casper.infradead.org Link: https://lkml.kernel.org/r/20230612210141.730128-15-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19buffer: use a folio in __find_get_block_slow()Matthew Wilcox (Oracle)
Saves a call to compound_head() and may be needed to support block size > PAGE_SIZE. Link: https://lkml.kernel.org/r/20230612210141.730128-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19buffer: convert link_dev_buffers to take a folioMatthew Wilcox (Oracle)
Its one caller already has a folio, so switch it to use the folio API. Removes a hidden call to compound_head(). Link: https://lkml.kernel.org/r/20230612210141.730128-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19buffer: convert init_page_buffers() to folio_init_buffers()Matthew Wilcox (Oracle)
Use the folio API and pass the folio from both callers. Saves a hidden call to compound_head(). Link: https://lkml.kernel.org/r/20230612210141.730128-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19buffer: convert grow_dev_page() to use a folioMatthew Wilcox (Oracle)
Get a folio from the page cache instead of a page, then use the folio API throughout. Removes a few calls to compound_head() and may be needed to support block size > PAGE_SIZE. Link: https://lkml.kernel.org/r/20230612210141.730128-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19buffer: convert page_zero_new_buffers() to folio_zero_new_buffers()Matthew Wilcox (Oracle)
Most of the callers already have a folio; convert reiserfs_write_end() to have a folio. Removes a couple of hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20230612210141.730128-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19buffer: convert __block_commit_write() to take a folioMatthew Wilcox (Oracle)
This removes a hidden call to compound_head() inside __block_commit_write() and moves it to those callers which are still page based. Also make block_write_end() safe for large folios. Link: https://lkml.kernel.org/r/20230612210141.730128-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19buffer: convert block_page_mkwrite() to use a folioMatthew Wilcox (Oracle)
If any page in a folio is dirtied, dirty the entire folio. Removes a number of hidden calls to compound_head() and references to page->mapping and page->index. Fixes a pre-existing bug where we could mark a folio as dirty if the file is truncated to a multiple of the page size just as we take the page fault. I don't believe this bug has any bad effect, it's just inefficient. Link: https://lkml.kernel.org/r/20230612210141.730128-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19buffer: make block_write_full_page() handle large folios correctlyMatthew Wilcox (Oracle)
Keep the interface as struct page, but work entirely on the folio internally. Removes several PAGE_SIZE assumptions and removes some references to page->index and page->mapping. Link: https://lkml.kernel.org/r/20230612210141.730128-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19gfs2: support ludicrously large folios in gfs2_trans_add_databufs()Matthew Wilcox (Oracle)
We may someday support folios larger than 4GB, so use a size_t for the byte count within a folio to prevent unpleasant truncations. Link: https://lkml.kernel.org/r/20230612210141.730128-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19buffer: convert __block_write_full_page() to __block_write_full_folio()Matthew Wilcox (Oracle)
Remove nine hidden calls to compound_head() by using a folio instead of a page. Link: https://lkml.kernel.org/r/20230612210141.730128-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19gfs2: convert gfs2_write_jdata_page() to gfs2_write_jdata_folio()Matthew Wilcox (Oracle)
Add support for large folios and remove some accesses to page->mapping and page->index. Link: https://lkml.kernel.org/r/20230612210141.730128-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19gfs2: pass a folio to __gfs2_jdata_write_folio()Matthew Wilcox (Oracle)
Remove a couple of folio->page conversions in the callers, and two calls to compound_head() in the function itself. Rename it from __gfs2_jdata_writepage() to __gfs2_jdata_write_folio(). Link: https://lkml.kernel.org/r/20230612210141.730128-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19gfs2: use a folio inside gfs2_jdata_writepage()Matthew Wilcox (Oracle)
Patch series "gfs2/buffer folio changes for 6.5", v3. This kind of started off as a gfs2 patch series, then became entwined with buffer heads once I realised that gfs2 was the only remaining caller of __block_write_full_page(). For those not in the gfs2 world, the big point of this series is that block_write_full_page() should now handle large folios correctly. This patch (of 14): Replace a few implicit calls to compound_head() with one explicit one. Link: https://lkml.kernel.org/r/20230612210141.730128-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230612210141.730128-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm/khugepaged: use DEFINE_READ_MOSTLY_HASHTABLE macroNick Desaulniers
These are equivalent, but DEFINE_READ_MOSTLY_HASHTABLE exists to define a hashtable in the .data..read_mostly section. Link: https://lkml.kernel.org/r/20230609-khugepage-v1-1-dad4e8382298@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false ↵Yu Ma
sharing When running UnixBench/Execl throughput case, false sharing is observed due to frequent read on base_addr and write on free_bytes, chunk_md. UnixBench/Execl represents a class of workload where bash scripts are spawned frequently to do some short jobs. It will do system call on execl frequently, and execl will call mm_init to initialize mm_struct of the process. mm_init will call __percpu_counter_init for percpu_counters initialization. Then pcpu_alloc is called to read the base_addr of pcpu_chunk for memory allocation. Inside pcpu_alloc, it will call pcpu_alloc_area to allocate memory from a specified chunk. This function will update "free_bytes" and "chunk_md" to record the rest free bytes and other meta data for this chunk. Correspondingly, pcpu_free_area will also update these 2 members when free memory. Call trace from perf is as below: + 57.15% 0.01% execl [kernel.kallsyms] [k] __percpu_counter_init + 57.13% 0.91% execl [kernel.kallsyms] [k] pcpu_alloc - 55.27% 54.51% execl [kernel.kallsyms] [k] osq_lock - 53.54% 0x654278696e552f34 main __execve entry_SYSCALL_64_after_hwframe do_syscall_64 __x64_sys_execve do_execveat_common.isra.47 alloc_bprm mm_init __percpu_counter_init pcpu_alloc - __mutex_lock.isra.17 In current pcpu_chunk layout, `base_addr' is in the same cache line with `free_bytes' and `chunk_md', and `base_addr' is at the last 8 bytes. This patch moves `bound_map' up to `base_addr', to let `base_addr' locate in a new cacheline. With this change, on Intel Sapphire Rapids 112C/224T platform, based on v6.4-rc4, the 160 parallel score improves by 24%. The pcpu_chunk struct is a backing data structure per chunk, so the additional memory should not be dramatic. A chunk covers ballpark between 64kb and 512kb memory depending on some config and boot time stuff, so I believe the additional memory used here is nominal at best. Working the #s on my desktop: Percpu: 58624 kB 28 cores -> ~2.1MB of percpu memory. At say ~128KB per chunk -> 33 chunks, generously 40 chunks. Adding alignment might bump the chunk size ~64 bytes, so in total ~2KB of overhead? I believe we can do a little better to avoid eating that full padding, so likely less than that. [dennis@kernel.org: changelog details] Link: https://lkml.kernel.org/r/20230610030730.110074-1-yu.ma@intel.com Signed-off-by: Yu Ma <yu.ma@intel.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Acked-by: Dennis Zhou <dennis@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19memory tier: remove unneeded !IS_ENABLED(CONFIG_MIGRATION) checkMiaohe Lin
establish_demotion_targets() is defined while CONFIG_MIGRATION is enabled. There's no need to check it again. Link: https://lkml.kernel.org/r/20230610034114.981861-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: compaction: mark kcompactd_run() and kcompactd_stop() __meminitMiaohe Lin
Add __meminit to kcompactd_run() and kcompactd_stop() to ensure they're default to __init when memory hotplug is not enabled. Link: https://lkml.kernel.org/r/20230610034615.997813-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: remove unused vma_init_lock()YueHaibing
commit c7f8f31c00d1 ("mm: separate vma->lock from vm_area_struct") left this behind. Link: https://lkml.kernel.org/r/20230610101956.20592-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19kernel: pid_namespace: remove unused set_memfd_noexec_scope()YueHaibing
This inline function is unused, remove it. Link: https://lkml.kernel.org/r/20230610102858.31488-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Cc: Jeff Xu <jeffxu@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19userfaultfd: fix regression in userfaultfd_unmap_prep()Liam R. Howlett
Android reported a performance regression in the userfaultfd unmap path. A closer inspection on the userfaultfd_unmap_prep() change showed that a second tree walk would be necessary in the reworked code. Fix the regression by passing each VMA that will be unmapped through to the userfaultfd_unmap_prep() function as they are added to the unmap list, instead of re-walking the tree for the VMA. Link: https://lkml.kernel.org/r/20230601015402.2819343-1-Liam.Howlett@oracle.com Fixes: 69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm/folio: replace set_compound_order with folio_set_orderTarun Sahu
The patch ("mm/folio: Avoid special handling for order value 0 in folio_set_order") [1] removed the need for special handling of order = 0 in folio_set_order. Now, folio_set_order and set_compound_order becomes similar function. This patch removes the set_compound_order and uses folio_set_order instead. [1] https://lore.kernel.org/all/20230609183032.13E08C433D2@smtp.kernel.org/ Link: https://lkml.kernel.org/r/20230612093514.689846-1-tsahu@linux.ibm.com Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> Reviewed-by Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: zswap: remove zswap_headerDomenico Cerasuolo
Previously, zswap_header served the purpose of storing the swpentry within zpool pages. This allowed zpool implementations to pass relevant information to the writeback function. However, with the current implementation, writeback is directly handled within zswap. Consequently, there is no longer a necessity for zswap_header, as the swp_entry_t can be stored directly in zswap_entry. Link: https://lkml.kernel.org/r/20230612093815.133504-8-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Tested-by: Yosry Ahmed <yosryahmed@google.com> Suggested-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: zswap: simplify writeback functionDomenico Cerasuolo
zswap_writeback_entry() used to be a callback for the backends, which don't know about struct zswap_entry. Now that the only user is the generic zswap LRU reclaimer, it can be simplified: pass the pinned zswap_entry directly, and consolidate the refcount management in the shrink function. Link: https://lkml.kernel.org/r/20230612093815.133504-7-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Tested-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: zswap: remove shrink from zpool interfaceDomenico Cerasuolo
Now that all three zswap backends have removed their shrink code, it is no longer necessary for the zpool interface to include shrink/writeback endpoints. Link: https://lkml.kernel.org/r/20230612093815.133504-6-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: zswap: remove page reclaim logic from zsmallocDomenico Cerasuolo
Switch zsmalloc to the new generic zswap LRU and remove its custom implementation. Link: https://lkml.kernel.org/r/20230612093815.133504-5-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Tested-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: zswap: remove page reclaim logic from z3foldDomenico Cerasuolo
Switch z3fold to the new generic zswap LRU and remove its custom implementation. Link: https://lkml.kernel.org/r/20230612093815.133504-4-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: zswap: remove page reclaim logic from zbudDomenico Cerasuolo
Switch zbud to the new generic zswap LRU and remove its custom implementation. Link: https://lkml.kernel.org/r/20230612093815.133504-3-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: zswap: add pool shrinking mechanismDomenico Cerasuolo
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3. This series aims to improve the zswap reclaim mechanism by reorganizing the LRU management. In the current implementation, the LRU is maintained within each zpool driver, resulting in duplicated code across the three drivers. The proposed change consists in moving the LRU management from the individual implementations up to the zswap layer. The primary objective of this refactoring effort is to simplify the codebase. By unifying the reclaim loop and consolidating LRU handling within zswap, we can eliminate redundant code and improve maintainability. Additionally, this change enables the reclamation of stored pages in their actual LRU order. Presently, the zpool drivers link backing pages in an LRU, causing compressed pages with different LRU positions to be written back simultaneously. The series consists of several patches. The first patch implements the LRU and the reclaim loop in zswap, but it is not used yet because all three driver implementations are marked as zpool_evictable. The following three commits modify each zpool driver to be not zpool_evictable, allowing the use of the reclaim loop in zswap. As the drivers removed their shrink functions, the zpool interface is then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink. Finally, the code in zswap is further cleaned up by simplifying the writeback function and removing the now unnecessary zswap_header. This patch (of 7): Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink function, which is called from zpool_shrink. However, with this commit, a unified shrink function is added to zswap. The ultimate goal is to eliminate the need for zpool_shrink once all zpool implementations have dropped their shrink code. To ensure the functionality of each commit, this change focuses solely on adding the mechanism itself. No modifications are made to the backends, meaning that functionally, there are no immediate changes. The zswap mechanism will only come into effect once the backends have removed their shrink code. The subsequent commits will address the modifications needed in the backends. Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Tested-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19selftests: mm: remove duplicate unneeded definesMuhammad Usama Anjum
Remove all defines which aren't needed after correctly including the kernel header files. Link: https://lkml.kernel.org/r/20230612095347.996335-2-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Stefan Roesch <shr@devkernel.io> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19selftests: mm: remove wrong kernel header inclusionMuhammad Usama Anjum
It is wrong to include unprocessed user header files directly. They are processed to "<source_tree>/usr/include" by running "make headers" and they are included in selftests by kselftest makefiles automatically with help of KHDR_INCLUDES variable. These headers should always bulilt first before building kselftests. Link: https://lkml.kernel.org/r/20230612095347.996335-1-usama.anjum@collabora.com Fixes: 07115fcc15b4 ("selftests/mm: add new selftests for KSM") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Stefan Roesch <shr@devkernel.io> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: ptep_get() conversionRyan Roberts
Convert all instances of direct pte_t* dereferencing to instead use ptep_get() helper. This means that by default, the accesses change from a C dereference to a READ_ONCE(). This is technically the correct thing to do since where pgtables are modified by HW (for access/dirty) they are volatile and therefore we should always ensure READ_ONCE() semantics. But more importantly, by always using the helper, it can be overridden by the architecture to fully encapsulate the contents of the pte. Arch code is deliberately not converted, as the arch code knows best. It is intended that arch code (arm64) will override the default with its own implementation that can (e.g.) hide certain bits from the core code, or determine young/dirty status by mixing in state from another source. Conversion was done using Coccinelle: ---- // $ make coccicheck \ // COCCI=ptepget.cocci \ // SPFLAGS="--include-headers" \ // MODE=patch virtual patch @ depends on patch @ pte_t *v; @@ - *v + ptep_get(v) ---- Then reviewed and hand-edited to avoid multiple unnecessary calls to ptep_get(), instead opting to store the result of a single call in a variable, where it is correct to do so. This aims to negate any cost of READ_ONCE() and will benefit arch-overrides that may be more complex. Included is a fix for an issue in an earlier version of this patch that was pointed out by kernel test robot. The issue arose because config MMU=n elides definition of the ptep helper functions, including ptep_get(). HUGETLB_PAGE=n configs still define a simple huge_ptep_clear_flush() for linking purposes, which dereferences the ptep. So when both configs are disabled, this caused a build error because ptep_get() is not defined. Fix by continuing to do a direct dereference when MMU=n. This is safe because for this config the arch code cannot be trying to virtualize the ptes because none of the ptep helpers are defined. Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: move ptep_get() and pmdp_get() helpersRyan Roberts
There are many call sites that directly dereference a pte_t pointer. This makes it very difficult to properly encapsulate a page table in the arch code without having to allocate shadow page tables. We will shortly solve this by replacing all the call sites with ptep_get() calls. But there are call sites above the function definition in the header file, so let's move ptep_get() to an earlier location to solve that problem. And move pmdp_get() at the same time to keep it close to ptep_get(). Link: https://lkml.kernel.org/r/20230612151545.3317766-3-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: kernel test robot <lkp@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: ptdump should use ptep_get_lockless()Ryan Roberts
Patch series "Encapsulate PTE contents from non-arch code", v3. A series to improve the encapsulation of pte entries by disallowing non-arch code from directly dereferencing pte_t pointers. This means that by default, the accesses change from a C dereference to a READ_ONCE(). This is technically the correct thing to do since where pgtables are modified by HW (for access/dirty) they are volatile and therefore we should always ensure READ_ONCE() semantics. But more importantly, by always using the helper, it can be overridden by the architecture to fully encapsulate the contents of the pte. Arch code is deliberately not converted, as the arch code knows best. It is intended that arch code (arm64) will override the default with its own implementation that can (e.g.) hide certain bits from the core code, or determine young/dirty status by mixing in state from another source. This patch (of 3): The page table dumper uses walk_page_range_novma() to walk the page tables, which does not lock the PTL before calling the pte_entry() callback. Therefore, the page table dumper's callback must use ptep_get_lockless() rather than ptep_get() to ensure that the pte it reads is not torn or otherwise corrupt when racing with writers. Link: https://lkml.kernel.org/r/20230612151545.3317766-1-ryan.roberts@arm.com Link: https://lkml.kernel.org/r/20230612151545.3317766-2-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yu Zhao <yuzhao@google.com> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19sh: move the ARCH_DMA_MINALIGN definition to asm/cache.hCatalin Marinas
The sh architecture defines ARCH_DMA_MINALIGN in asm/page.h. Move it to asm/cache.h to allow a generic ARCH_DMA_MINALIGN definition in linux/cache.h without redefine errors/warnings. Link: https://lkml.kernel.org/r/20230613155245.1228274-4-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19microblaze: move the ARCH_{DMA,SLAB}_MINALIGN definitions to asm/cache.hCatalin Marinas
The microblaze architecture defines ARCH_DMA_MINALIGN in asm/page.h. Move it to asm/cache.h to allow a generic ARCH_DMA_MINALIGN definition in linux/cache.h without redefine errors/warnings. While at it, also move ARCH_SLAB_MINALIGN to asm/cache.h for consistency. Link: https://lkml.kernel.org/r/20230613155245.1228274-3-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Rich Felker <dalias@libc.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19powerpc: move the ARCH_DMA_MINALIGN definition to asm/cache.hCatalin Marinas
Patch series "Move the ARCH_DMA_MINALIGN definition to asm/cache.h". The ARCH_KMALLOC_MINALIGN reduction series defines a generic ARCH_DMA_MINALIGN in linux/cache.h: https://lore.kernel.org/r/20230612153201.554742-2-catalin.marinas@arm.com/ Unfortunately, this causes a duplicate definition warning for microblaze, powerpc (32-bit only) and sh as these architectures define ARCH_DMA_MINALIGN in a different file than asm/cache.h. Move the macro to asm/cache.h to avoid this issue and also bring them in line with the other architectures. This patch (of 3): The powerpc architecture defines ARCH_DMA_MINALIGN in asm/page_32.h and only if CONFIG_NOT_COHERENT_CACHE is enabled (32-bit platforms only). Move this macro to asm/cache.h to allow a generic ARCH_DMA_MINALIGN definition in linux/cache.h without redefine errors/warnings. Link: https://lkml.kernel.org/r/20230613155245.1228274-1-catalin.marinas@arm.com Link: https://lkml.kernel.org/r/20230613155245.1228274-2-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306131053.1ybvRRhO-lkp@intel.com/ Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Michal Simek <monstr@monstr.eu> Cc: Rich Felker <dalias@libc.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>