Date  Commit message  Author
2024-11-07  mm: mass constification of folio/page pointers  [Matthew Wilcox (Oracle)]
Now that page_pgoff() takes const pointers, we can constify the pointers to a lot of functions. Link: https://lkml.kernel.org/r/20241005200121.3231142-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  mm: renovate page_address_in_vma()  [Matthew Wilcox (Oracle)]
This function doesn't modify any of its arguments, so if we make a few other functions take const pointers, we can make page_address_in_vma() take const pointers too. All of its callers have the containing folio already, so pass that in as an argument instead of recalculating it. Also add kernel-doc. Link: https://lkml.kernel.org/r/20241005200121.3231142-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
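For reference, a minimal sketch of the reworked interface described above (the const qualifiers land with the later constification patch in this series; the exact parameter order shown here is an assumption, not taken from the patch):

	unsigned long page_address_in_vma(const struct folio *folio,
			const struct page *page, const struct vm_area_struct *vma);

	/* Callers that already hold the folio avoid recomputing it:
	 *	addr = page_address_in_vma(folio, page, vma);
	 */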
2024-11-07  mm: use page_pgoff() in more places  [Matthew Wilcox (Oracle)]
There are several places which currently open-code page_pgoff(), convert them to call it. Link: https://lkml.kernel.org/r/20241005200121.3231142-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  mm: convert page_to_pgoff() to page_pgoff()  [Matthew Wilcox (Oracle)]
Patch series "page->index removals in mm", v2. As part of shrinking struct page, we need to stop using page->index. This patchset gets rid of most of the remaining references to page->index in mm, as well as increasing the number of functions which take a const folio/page pointer. It shrinks the text segment of mm by a few hundred bytes in my test config, probably mostly from removing calls to compound_head() in page_to_pgoff(). This patch (of 7): Change the function signature to pass in the folio as all three callers have it. This removes a reference to page->index, which we're trying to get rid of. And add kernel-doc. Link: https://lkml.kernel.org/r/20241005200121.3231142-1-willy@infradead.org Link: https://lkml.kernel.org/r/20241005200121.3231142-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  mm/zsmalloc: use memcpy_from/to_page wherever possible  [Pintu Kumar]
As part of "zsmalloc: replace kmap_atomic with kmap_local_page" [1] we replaced kmap/kunmap_atomic() with kmap_local_page()/kunmap_local(). But later it was found that some of the code could be replaced with already available apis in highmem.h, such as memcpy_from_page()/memcpy_to_page(). Also, update the comments with correct api naming. [1] https://lkml.kernel.org/r/20241001175358.12970-1-quic_pintu@quicinc.com Link: https://lkml.kernel.org/r/20241010175143.27262-1-quic_pintu@quicinc.com Signed-off-by: Pintu Kumar <quic_pintu@quicinc.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Joe Perches <joe@perches.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Pintu Agarwal <pintu.ping@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  zsmalloc: replace kmap_atomic with kmap_local_page  [Pintu Kumar]
The use of kmap_atomic/kunmap_atomic is deprecated. Replace it with kmap_local_page/kunmap_local all over the place. Also fix the missing SPDX license header:

	WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
	WARNING: Deprecated use of 'kmap_atomic', prefer 'kmap_local_page' instead
	+ vaddr = kmap_atomic(page);

Link: https://lkml.kernel.org/r/20241001175358.12970-1-quic_pintu@quicinc.com Signed-off-by: Pintu Kumar <quic_pintu@quicinc.com> Cc: Joe Perches <joe@perches.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Pintu Agarwal <pintu.ping@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
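The conversion pattern, roughly (illustrative function, not taken from the patch):

	#include <linux/highmem.h>

	static void zero_first_byte(struct page *page)
	{
		void *vaddr;

		/* was: vaddr = kmap_atomic(page); ... kunmap_atomic(vaddr); */
		vaddr = kmap_local_page(page);	/* CPU-local mapping; does not disable preemption */
		((char *)vaddr)[0] = 0;
		kunmap_local(vaddr);
	}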
2024-11-07  mm/codetag: uninline and move pgalloc_tag_copy and pgalloc_tag_split  [Suren Baghdasaryan]
pgalloc_tag_copy() and pgalloc_tag_split() are sizable and outside of any performance-critical paths, so it should be fine to uninline them. Also move their declarations into pgalloc_tag.h which seems like a more appropriate place for them. No functional changes other than uninlining. Link: https://lkml.kernel.org/r/20241024162318.1640781-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Sourav Panda <souravpanda@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  alloc_tag: support for page allocation tag compression  [Suren Baghdasaryan]
Implement support for storing page allocation tag references directly in the page flags instead of page extensions. The sysctl.vm.mem_profiling boot parameter is extended to provide a way for a user to request this mode. Enabling compression eliminates memory overhead caused by page_ext and results in better performance for page allocations. However, this mode will not work if the number of available page flag bits is insufficient to address all kernel allocations. Such a condition can happen during boot or when loading a module. If this condition is detected, memory allocation profiling gets disabled with an appropriate warning. By default, compression mode is disabled. Link: https://lkml.kernel.org/r/20241023170759.999909-7-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiongwei Song <xiongwei.song@windriver.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  alloc_tag: introduce pgtag_ref_handle to abstract page tag references  [Suren Baghdasaryan]
To simplify later changes to page tag references, introduce new pgtag_ref_handle type. This allows easy replacement of page_ext as a storage of page allocation tags. Link: https://lkml.kernel.org/r/20241023170759.999909-6-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiongwei Song <xiongwei.song@windriver.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  alloc_tag: populate memory for module tags as needed  [Suren Baghdasaryan]
The memory reserved for module tags does not need to be backed by physical pages until there are tags to store there. Change the way we reserve this memory to allocate only virtual area for the tags and populate it with physical pages as needed when we load a module. [surenb@google.com: avoid execmem_vmap() when !MMU] Link: https://lkml.kernel.org/r/20241031233611.3833002-1-surenb@google.com Link: https://lkml.kernel.org/r/20241023170759.999909-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiongwei Song <xiongwei.song@windriver.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  alloc_tag: load module tags into separate contiguous memory  [Suren Baghdasaryan]
When a module gets unloaded there is a possibility that some of the allocations it made are still used and therefore the allocation tags corresponding to these allocations are still referenced. As such, the memory for these tags can't be freed. This is currently handled as an abnormal situation and module's data section is not being unloaded. To handle this situation without keeping module's data in memory, allow codetags with longer lifespan than the module to be loaded into their own separate memory. The in-use memory areas and gaps after module unloading in this separate memory are tracked using maple trees. Allocation tags arrange their separate memory so that it is virtually contiguous and that will allow simple allocation tag indexing later on in this patchset. The size of this virtually contiguous memory is set to store up to 100000 allocation tags. [surenb@google.com: fix empty codetag module section handling] Link: https://lkml.kernel.org/r/20241101000017.3856204-1-surenb@google.com [akpm@linux-foundation.org: update comment, per Dan] Link: https://lkml.kernel.org/r/20241023170759.999909-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiongwei Song <xiongwei.song@windriver.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  alloc_tag: introduce shutdown_mem_profiling helper function  [Suren Baghdasaryan]
Implement a helper function to disable memory allocation profiling and use it when creation of /proc/allocinfo fails. Ensure /proc/allocinfo does not get created when memory allocation profiling is disabled. Link: https://lkml.kernel.org/r/20241023170759.999909-3-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiongwei Song <xiongwei.song@windriver.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  maple_tree: add mas_for_each_rev() helper  [Suren Baghdasaryan]
Patch series "page allocation tag compression", v4. This patchset implements several improvements: 1. Gracefully handles module unloading while there are used allocations allocated from that module; 2. Provides an option to store page allocation tag references in the page flags, removing dependency on page extensions and eliminating the memory overhead from storing page allocation references (~0.2% of total system memory). This also improves page allocation performance when CONFIG_MEM_ALLOC_PROFILING is enabled by eliminating page extension lookup. Page allocation performance overhead is reduced from 41% to 5.5%. Patch #1 introduces mas_for_each_rev() helper function. Patch #2 introduces shutdown_mem_profiling() helper function to be used when disabling memory allocation profiling. Patch #3 copies module tags into virtually contiguous memory which serves two purposes: - Lets us deal with the situation when module is unloaded while there are still live allocations from that module. Since we are using a copy version of the tags we can safely unload the module. Space and gaps in this contiguous memory are managed using a maple tree. - Enables simple indexing of the tags in the later patches. Patch #4 changes the way we allocate virtually contiguous memory for module tags to reserve only vitrual area and populate physical pages only as needed at module load time. Patch #5 abstracts page allocation tag reference to simplify later changes. Patch #6 adds compression option to the sysctl.vm.mem_profiling boot parameter for storing page allocation tag references inside page flags if they fit. If the number of available page flag bits is insufficient to address all kernel allocations, memory allocation profiling gets disabled with an appropriate warning. This patch (of 6): Add mas_for_each_rev() function to iterate maple tree nodes in reverse order. Link: https://lkml.kernel.org/r/20241023170759.999909-1-surenb@google.com Link: https://lkml.kernel.org/r/20241023170759.999909-2-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. 
McKenney <paulmck@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiongwei Song <xiongwei.song@windriver.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
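A hedged usage sketch of the new helper, assuming the usual mas_for_each() conventions (the walk is bounded below by 'min' and the caller holds the tree lock or RCU read lock); the wrapper function itself is made up for the example:

	#include <linux/maple_tree.h>

	/* Return the highest-indexed entry in [min, max], or NULL. */
	static void *mt_find_highest(struct maple_tree *mt,
			unsigned long min, unsigned long max)
	{
		MA_STATE(mas, mt, max, max);	/* start the walk at the upper bound */
		void *entry;

		mas_for_each_rev(&mas, entry, min)
			return entry;		/* first hit is the highest index */
		return NULL;
	}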
2024-11-07  x86/module: enable ROX caches for module text on 64 bit  [Mike Rapoport (Microsoft)]
Enable execmem's cache of PMD_SIZE'ed pages mapped as ROX for module text allocations on 64 bit. Link: https://lkml.kernel.org/r/20241023162711.2579610-9-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  execmem: add support for cache of large ROX pages  [Mike Rapoport (Microsoft)]
Using large pages to map text areas reduces iTLB pressure and improves performance. Extend execmem_alloc() with the ability to use huge pages with ROX permissions as a cache for smaller allocations. To populate the cache, a writable large page is allocated from vmalloc with VM_ALLOW_HUGE_VMAP, filled with invalid instructions and then remapped as ROX. The direct map alias of that large page is excluded from the direct map. Portions of that large page are handed out to execmem_alloc() callers without any changes to the permissions. When the memory is freed with execmem_free() it is invalidated again so that it won't contain stale instructions. An architecture has to implement the execmem_fill_trapping_insns() callback and select the ARCH_HAS_EXECMEM_ROX configuration option to be able to use the ROX cache. The cache is enabled on a per-range basis when an architecture sets the EXECMEM_ROX_CACHE flag in the definition of an execmem_range. Link: https://lkml.kernel.org/r/20241023162711.2579610-8-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
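A hedged sketch of how an architecture might opt a range into the cache, per the description above; the struct layout, field names and values here are assumptions for illustration, not the actual x86 code, and the arch must also select ARCH_HAS_EXECMEM_ROX and provide execmem_fill_trapping_insns():

	static struct execmem_info execmem_info __ro_after_init;

	struct execmem_info __init *execmem_arch_setup(void)
	{
		execmem_info = (struct execmem_info){
			.ranges = {
				[EXECMEM_MODULE_TEXT] = {
					.start		= MODULES_VADDR,
					.end		= MODULES_END,
					.pgprot		= PAGE_KERNEL_ROX,
					.alignment	= PMD_SIZE,
					/* opt this range into the large-page ROX cache */
					.flags		= EXECMEM_ROX_CACHE,
				},
			},
		};
		return &execmem_info;
	}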
2024-11-07  x86/module: prepare module loading for ROX allocations of text  [Mike Rapoport (Microsoft)]
When module text memory will be allocated with ROX permissions, the memory at the actual address where the module will live will contain invalid instructions and there will be a writable copy that contains the actual module code. Update relocations and alternatives patching to deal with it. [rppt@kernel.org: fix writable address in cfi_rewrite_endbr()] Link: https://lkml.kernel.org/r/ZysRwR29Ji8CcbXc@kernel.org Link: https://lkml.kernel.org/r/20241023162711.2579610-7-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Tested-by: Nathan Chancellor <nathan@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  arch: introduce set_direct_map_valid_noflush()  [Mike Rapoport (Microsoft)]
Add an API that will allow updates of the direct/linear map for a set of physically contiguous pages. It will be used in the following patches. Link: https://lkml.kernel.org/r/20241023162711.2579610-6-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
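The shape of the interface being introduced, as described above (the exact signature is assumed from the description; "noflush" means the caller is responsible for TLB flushing):

	/* Make 'nr' physically contiguous pages (in)valid in the direct map,
	 * without flushing the TLB. */
	int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid);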
2024-11-07  module: prepare to handle ROX allocations for text  [Mike Rapoport (Microsoft)]
In order to support ROX allocations for module text, it is necessary to handle modifications to the code, such as relocations and alternatives patching, without write access to that memory. One option is to use text patching, but this would make module loading extremely slow and would expose executable code that is not yet in its final form. A better way is to have memory allocated with ROX permissions contain invalid instructions and keep a writable, but not executable, copy of the module text. The relocations and alternative patches would be done on the writable copy using the addresses of the ROX memory. Once the module is completely ready, the updated text will be copied to ROX memory using text patching in one go and the writable copy will be freed. Add support for that to the module initialization code and provide the necessary interfaces in execmem. Link: https://lkml.kernel.org/r/20241023162711.2579610-5-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  asm-generic: introduce text-patching.h  [Mike Rapoport (Microsoft)]
Several architectures support text patching, but they name the header files that declare patching functions differently. Make all such headers consistently named text-patching.h and add an empty header in asm-generic for architectures that do not support text patching. Link: https://lkml.kernel.org/r/20241023162711.2579610-4-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations  [Mike Rapoport (Microsoft)]
vmalloc allocations with VM_ALLOW_HUGE_VMAP that do not explicitly specify node ID will use huge pages only if size_per_node is larger than a huge page. Still the actual allocated memory is not distributed between nodes and there is no advantage in such approach. On the contrary, BPF allocates SZ_2M * num_possible_nodes() for each new bpf_prog_pack, while it could do with a single huge page per pack. Don't account for number of nodes for VM_ALLOW_HUGE_VMAP with NUMA_NO_NODE and use huge pages whenever the requested allocation size is larger than a huge page. Link: https://lkml.kernel.org/r/20241023162711.2579610-3-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07  mm: vmalloc: group declarations depending on CONFIG_MMU together  [Mike Rapoport (Microsoft)]
Patch series "x86/module: use large ROX pages for text allocations", v7. These patches add support for using large ROX pages for allocations of executable memory on x86. They address Andy's comments [1] about having executable mappings for code that was not completely formed. The approach taken is to allocate ROX memory along with writable but not executable memory and use the writable copy to perform relocations and alternatives patching. After the module text gets into its final shape, the contents of the writable memory is copied into the actual ROX location using text poking. The allocations of the ROX memory use vmalloc(VMAP_ALLOW_HUGE_MAP) to allocate PMD aligned memory, fill that memory with invalid instructions and in the end remap it as ROX. Portions of these large pages are handed out to execmem_alloc() callers without any changes to the permissions. When the memory is freed with execmem_free() it is invalidated again so that it won't contain stale instructions. The module memory allocation, x86 code dealing with relocations and alternatives patching take into account the existence of the two copies, the writable memory and the ROX memory at the actual allocated virtual address. [1] https://lore.kernel.org/all/a17c65c6-863f-4026-9c6f-a04b659e9ab4@app.fastmail.com This patch (of 8): There are a couple of declarations that depend on CONFIG_MMU in include/linux/vmalloc.h spread all over the file. Group them all together to improve code readability. No functional changes. Link: https://lkml.kernel.org/r/20241023162711.2579610-1-rppt@kernel.org Link: https://lkml.kernel.org/r/20241023162711.2579610-2-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  memcg: workingset: remove folio_memcg_rcu usage  [Shakeel Butt]
The function workingset_activation() is called from folio_mark_accessed() with the guarantee that the given folio can not be freed under us in workingset_activation(). In addition, the association of the folio and its memcg can not be broken here because charge migration is no more. There is no need to use folio_memcg_rcu. Simply use folio_memcg_charged() because that is what this function cares about. [akpm@linux-foundation.org: provide folio_memcg_charged stub for CONFIG_MEMCG=n] Link: https://lkml.kernel.org/r/20241026163707.2479526-1-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Suggested-by: Yu Zhao <yuzhao@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Hugh Dickins <hughd@google.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
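Roughly, the simplification looks like the sketch below (the surrounding workingset logic is elided and the exact body is an assumption; the point is that the memcg check collapses to folio_memcg_charged()):

	static void workingset_activation(struct folio *folio)
	{
		/* was: rcu_read_lock(); memcg = folio_memcg_rcu(folio); ... */
		if (mem_cgroup_disabled() || folio_memcg_charged(folio))
			workingset_age_nonresident(folio_lruvec(folio),
						   folio_nr_pages(folio));
	}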
2024-11-06  mm/vma: the pgoff is correct if can_merge_right  [Wei Yang]
By this point can_vma_merge_right() must have returned true, which implies can_vma_merge_before() also returned true, which already asserts that the pgoff is as expected for a merge with the following VMA, thus this assignment is redundant. Below is a more detailed explanation. The current definition of can_vma_merge_right() is:

	static bool can_vma_merge_right(struct vma_merge_struct *vmg,
					bool can_merge_left)
	{
		if (!vmg->next || vmg->end != vmg->next->vm_start ||
		    !can_vma_merge_before(vmg))
			return false;
		...
	}

And:

	static bool can_vma_merge_before(struct vma_merge_struct *vmg)
	{
		pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
		...
		if (vmg->next->vm_pgoff == vmg->pgoff + pglen)
			return true;
		...
	}

Which implies vmg->pgoff == vmg->next->vm_pgoff - pglen. None of these values are changed between the check and the prior assignment, so this was an entirely redundant assignment. [akpm@linux-foundation.org: remove now-unused local] [lorenzo.stoakes@oracle.com: rephrase the changelog] Link: https://lkml.kernel.org/r/20241024093347.18057-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  mm: defer second attempt at merge on mmap()  [Lorenzo Stoakes]
Rather than trying to merge again when ostensibly allocating a new VMA, instead defer until the VMA is added and attempt to merge the existing range. This way we have no complicated unwinding logic midway through the process of mapping the VMA. In addition, this removes the limitation that the VMA could not be the first in the virtual memory address space, which was previously implicitly required. In theory, for this very same reason, we should unconditionally attempt a merge here; however, this is likely to have a performance impact, so it is better to avoid it given the unlikely outcome of a merge. [lorenzo.stoakes@oracle.com: remove unnecessary indirection] Link: https://lkml.kernel.org/r/5106696d-e625-4d8a-8545-9d1430301730@lucifer.local Link: https://lkml.kernel.org/r/d4f84502605d7651ac114587f507395c0fc76004.1729858176.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  mm: remove unnecessary reset state logic on merge new VMA  [Lorenzo Stoakes]
The only place where this was used was in mmap_region(), which we have now adjusted to not require this to be performed (we reset ourselves in effect). It also created a dangerous assumption that VMG state could be safely reused after a merge, at which point it may have been mutated in unexpected ways, leading to subtle bugs. Note that it was discovered by Wei Yang that there was also an error in this code - we are comparing vmg->vma with prev after setting it to NULL. This however had no impact, as we previously reset VMA iterator state before attempting merge again, but it was useless effort. In any case, this patch removes all of the logic so also eliminates this wasted effort. Link: https://lkml.kernel.org/r/5d9a59eee6498ae017cc87d89aa723de7179f75d.1729858176.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  mm: refactor __mmap_region()  [Lorenzo Stoakes]
We have seen bugs and resource leaks arise from the complexity of the __mmap_region() function. This, and the generally deeply fragile error handling logic and complexity which makes understanding the function difficult make it highly desirable to refactor it into something readable. Achieve this by separating the function into smaller logical parts which are easier to understand and follow, and which importantly very significantly simplify the error handling. Note that we now call vms_abort_munmap_vmas() in more error paths than we used to, however in cases where no abort need occur, vms->nr_pages will be equal to zero and we simply exit this function without doing more than we would have done previously. Importantly, the invocation of the driver mmap hook via mmap_file() now has very simple and obvious handling (this was previously the most problematic part of the mmap() operation). Use a generalised stack-based 'mmap state' to thread through values and also retrieve state as needed. Also avoid ever relying on vma merge (vmg) state after a merge is attempted, instead maintain meaningful state in the mmap state and establish vmg state as and when required. This avoids any subtle bugs arising from merge logic mutating this state and mmap_region() logic later relying upon it. Link: https://lkml.kernel.org/r/25bd2edc3275450f448cbfe0756ce2a7cd06810f.1729858176.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  mm: isolate mmap internal logic to mm/vma.c  [Lorenzo Stoakes]
In previous commits we effected improvements to the mmap() logic in mmap_region() and its newly introduced internal implementation function __mmap_region(). However as these changes are intended to be backported, we kept the delta as small as is possible and made as few changes as possible to the newly introduced mm/vma.* files. Take the opportunity to move this logic to mm/vma.c which not only isolates it, but also makes it available for later userland testing which can help us catch such logic errors far earlier. Link: https://lkml.kernel.org/r/93fc2c3aa37dd30590b7e4ee067dfd832007bf7e.1729858176.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  tools: testing: add additional vma_internal.h stubs  [Lorenzo Stoakes]
Patch series "fix error handling in mmap_region() and refactor", v3. The mmap_region() function is somewhat terrifying, with spaghetti-like control flow and numerous means by which issues can arise and incomplete state, memory leaks and other unpleasantness can occur. This series goes to great lengths to simplify how mmap_region() works and to avoid unwinding errors late on in the process of setting up the VMA for the new mapping, and equally avoids such operations occurring while the VMA is in an inconsistent state. This series builds on the previously submitted hotfix patches (see link to v2 below) which addresses the most critical issues around mmap_region(), and further works to improve mmap_region() complexity, stability, and testability. This series moves the code to mm/vma.c to render it userland testable, refactors and simplifies it into smaller functions that are significantly more readable. It additionally avoids performing an attempt at a second merge mid-way through allocating a new VMA, a dubious proposition at best and one that is highly subject to subtle bugs. Rather than do this, we simply note that we ought to retry the merge and do this as a final step. This patch (of 3): Add some additional vma_internal.h stubs in preparation for __mmap_region() being moved to mm/vma.c. Without these the move would result in the tests no longer compiling. Link: https://lkml.kernel.org/r/cover.1729858176.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/74b27e159e261d2ac1fe66a130edad1d61fdc176.1729858176.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  memcg-v1: remove memcg move locking code  [Shakeel Butt]
The memcg v1 charge move feature has been deprecated. All the places using the memcg move lock have stopped using it, as they don't need the protection any more. Let's proceed to remove all the locking code related to charge moving. Link: https://lkml.kernel.org/r/20241025012304.2473312-7-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  memcg-v1: no need for memcg locking for MGLRU  [Shakeel Butt]
While updating the generation of the folios, MGLRU requires that the folio's memcg association remains stable. With the charge migration deprecated, there is no need for MGLRU to acquire locks to keep the folio and memcg association stable. [yuzhao@google.com: remove !rcu_read_lock_held() assertion] Link: https://lkml.kernel.org/r/ZykEtcHrQRq-KrBC@google.com Link: https://syzkaller.appspot.com/bug?extid=24f45b8beab9788e467e Link: https://lore.kernel.org/lkml/67294349.050a0220.701a.0010.GAE@google.com/ [akpm@linux-foundation.org: remove now-unused local] [shakeel.butt@linux.dev: folio_rcu() fixup, per Yu Zhao] Link: https://lkml.kernel.org/r/iwmabnye3nl4merealrawt3bdvfii2pwavwrddrqpraoveet7h@ezrsdhjwwej7 Link: https://lkml.kernel.org/r/20241025012304.2473312-6-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  memcg-v1: no need for memcg locking for writeback tracking  [Shakeel Butt]
During the era of memcg charge migration, the kernel had to make sure that the writeback stat updates did not race with the charge migration. Otherwise it might have updated the writeback stats of the wrong memcg. Now with the memcg charge migration gone, there is no more race for writeback stat updates and the previous locking can be removed. Link: https://lkml.kernel.org/r/20241025012304.2473312-5-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  memcg-v1: no need for memcg locking for dirty tracking  [Shakeel Butt]
During the era of memcg charge migration, the kernel had to make sure that the dirty stat updates did not race with the charge migration. Otherwise it might have updated the dirty stats of the wrong memcg. Now with the memcg charge migration gone, there is no more race for dirty stat updates and the previous locking can be removed. Link: https://lkml.kernel.org/r/20241025012304.2473312-4-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  memcg-v1: remove charge move code  [Shakeel Butt]
The memcg-v1 charge move feature has been deprecated completely and let's remove the relevant code as well. Link: https://lkml.kernel.org/r/20241025012304.2473312-3-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  memcg-v1: fully deprecate move_charge_at_immigrate  [Shakeel Butt]
Patch series "memcg-v1: fully deprecate charge moving". The memcg v1's charge moving feature has been deprecated for almost 2 years and the kernel warns if someone try to use it. This warning has been backported to all stable kernel and there have not been any report of the warning or the request to support this feature anymore. Let's proceed to fully deprecate this feature. This patch (of 6): Proceed with the complete deprecation of memcg v1's charge moving feature. The deprecation warning has been in the kernel for almost two years and has been ported to all stable kernel since. Now is the time to fully deprecate this feature. Link: https://lkml.kernel.org/r/20241025012304.2473312-1-shakeel.butt@linux.dev Link: https://lkml.kernel.org/r/20241025012304.2473312-2-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06  mm: shmem: fallback to page size splice if large folio has poisoned pages  [Baolin Wang]
tmpfs already supports PMD-sized large folios, and splice() cannot read any pages if the large folio has a poisoned page, which is not good as Matthew pointed out in a previous email[1]: "so if we have hwpoison set on one page in a folio, we now can't read bytes from any page in the folio? That seems like we've made a bad situation worse." Thus add a fallback to PAGE_SIZE splice(), which still allows reading normal pages if the large folio has hwpoisoned pages. [1] https://lore.kernel.org/all/Zw_d0EVAJkpNJEbA@casper.infradead.org/ [baolin.wang@linux.alibaba.com: code layout cleanup, per dhowells] Link: https://lkml.kernel.org/r/32dd938c-3531-49f7-93e4-b7ff21fec569@linux.alibaba.com Link: https://lkml.kernel.org/r/e3737fbd5366c4de4337bf5f2044817e77a5235b.1729915173.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm/damon/vaddr: add 'nr_piece == 1' check in damon_va_evenly_split_region()Zheng Yejian
As discussed in [1], damon_va_evenly_split_region() is called to size-evenly split a region into 'nr_pieces' small regions; when nr_pieces == 1, no actual split is required. Check that case for better code readability and add a simple kunit testcase. [1] https://lore.kernel.org/all/20241021163316.12443-1-sj@kernel.org/ Link: https://lkml.kernel.org/r/20241022083927.3592237-3-zhengyejian@huaweicloud.com Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Fernand Sieber <sieberf@amazon.com> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Ye Weihua <yeweihua4@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
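The added check amounts to an early return before any splitting work is done, roughly (a sketch, assuming the function reports success with 0):

	/* splitting into one piece leaves the region untouched */
	if (nr_pieces == 1)
		return 0;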
2024-11-06mm/damon/vaddr: fix issue in damon_va_evenly_split_region()Zheng Yejian
Patch series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()". v2. According to the logic of damon_va_evenly_split_region(), currently following split case would not meet the expectation: Suppose DAMON_MIN_REGION=0x1000, Case: Split [0x0, 0x3000) into 2 pieces, then the result would be acutually 3 regions: [0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000) but NOT the expected 2 regions: [0x0, 0x1000), [0x1000, 0x3000) !!! The root cause is that when calculating size of each split piece in damon_va_evenly_split_region(): `sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION);` both the dividing and the ALIGN_DOWN may cause loss of precision, then each time split one piece of size 'sz_piece' from origin 'start' to 'end' would cause more pieces are split out than expected!!! To fix it, count for each piece split and make sure no more than 'nr_pieces'. In addition, add above case into damon_test_split_evenly(). And add 'nr_piece == 1' check in damon_va_evenly_split_region() for better code readability and add a corresponding kunit testcase. This patch (of 2): According to the logic of damon_va_evenly_split_region(), currently following split case would not meet the expectation: Suppose DAMON_MIN_REGION=0x1000, Case: Split [0x0, 0x3000) into 2 pieces, then the result would be acutually 3 regions: [0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000) but NOT the expected 2 regions: [0x0, 0x1000), [0x1000, 0x3000) !!! The root cause is that when calculating size of each split piece in damon_va_evenly_split_region(): `sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION);` both the dividing and the ALIGN_DOWN may cause loss of precision, then each time split one piece of size 'sz_piece' from origin 'start' to 'end' would cause more pieces are split out than expected!!! To fix it, count for each piece split and make sure no more than 'nr_pieces'. In addition, add above case into damon_test_split_evenly(). After this patch, damon-operations test passed: # ./tools/testing/kunit/kunit.py run damon-operations [...] ============== damon-operations (6 subtests) =============== [PASSED] damon_test_three_regions_in_vmas [PASSED] damon_test_apply_three_regions1 [PASSED] damon_test_apply_three_regions2 [PASSED] damon_test_apply_three_regions3 [PASSED] damon_test_apply_three_regions4 [PASSED] damon_test_split_evenly ================ [PASSED] damon-operations ================= Link: https://lkml.kernel.org/r/20241022083927.3592237-1-zhengyejian@huaweicloud.com Link: https://lkml.kernel.org/r/20241022083927.3592237-2-zhengyejian@huaweicloud.com Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces") Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Fernand Sieber <sieberf@amazon.com> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Ye Weihua <yeweihua4@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm/page_alloc: use str_off_on() helper in build_all_zonelists()Thorsten Blum
Remove hard-coded strings by using the str_off_on() helper function. Link: https://lkml.kernel.org/r/20241021091340.5243-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
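For reference, the helper replaces an open-coded ternary; the sketch below assumes str_off_on() (from <linux/string_choices.h>) returns "off" for true, matching the page_group_by_mobility_disabled message in build_all_zonelists():

	/* before */
	pr_info("mobility grouping %s.\n",
		page_group_by_mobility_disabled ? "off" : "on");

	/* after */
	pr_info("mobility grouping %s.\n",
		str_off_on(page_group_by_mobility_disabled));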
2024-11-06mm/memcontrol: fix seq_buf size to save memory when PAGE_SIZE is largeRyan Roberts
Previously the seq_buf used for accumulating the memory.stat output was sized at PAGE_SIZE. But the amount of output is invariant to PAGE_SIZE; if 4K is enough on a 4K page system, then it should also be enough on a 64K page system, so we can save 60K on the static buffer used in mem_cgroup_print_oom_meminfo(). Let's make it so. This also has the beneficial side effect of removing a place in the code that assumed PAGE_SIZE is a compile-time constant. So this helps our quest towards supporting boot-time page size selection. Link: https://lkml.kernel.org/r/20241021130027.3615969-1-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
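Conceptually, the change sizes the buffer by the amount of output rather than by the page size, along these lines (the constant name is illustrative, not necessarily the one used in the patch):

	/* memory.stat output fits in 4K regardless of the kernel page size */
	#define MEMCG_STAT_BUF_SIZE SZ_4K

	static char buf[MEMCG_STAT_BUF_SIZE];	/* was: buf[PAGE_SIZE] */
	struct seq_buf s;

	seq_buf_init(&s, buf, sizeof(buf));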
2024-11-06mm: add missing mmu_notifier_clear_young for !MMU_NOTIFIERJames Houghton
Remove the now unnecessary ifdef in mm/damon/vaddr.c as well. Link: https://lkml.kernel.org/r/20241021160212.9935-1-jthoughton@google.com Signed-off-by: James Houghton <jthoughton@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
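The missing piece is a no-op stub for !CONFIG_MMU_NOTIFIER builds, presumably along the lines of the existing stubs in <linux/mmu_notifier.h> (a sketch; exact placement and wording may differ):

	#ifndef CONFIG_MMU_NOTIFIER
	static inline int mmu_notifier_clear_young(struct mm_struct *mm,
						   unsigned long start,
						   unsigned long end)
	{
		return 0;	/* no notifiers registered, nothing to clear */
	}
	#endif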
2024-11-06tools/mm: free the allocated memoryLiu Jing
The comm_str memory needs to be freed if the search_pattern function call fails in get_comm. [akpm@linux-foundation.org: fix whitespace] Link: https://lkml.kernel.org/r/20241022012526.7597-1-liujing@cmss.chinamobile.com Signed-off-by: Liu Jing <liujing@cmss.chinamobile.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm/page-writeback: raise wb_thresh to prevent write blocking with strictlimitJim Zhao
With the strictlimit flag, wb_thresh acts as a hard limit in balance_dirty_pages() and wb_position_ratio(). When device write operations are inactive, wb_thresh can drop to 0, causing writes to be blocked. The issue occasionally occurs in fuse filesystems, particularly with network backends, where the write thread is frequently blocked for a period of time. To address it, this patch raises the minimum wb_thresh to a controllable level, similar to the non-strictlimit case. Link: https://lkml.kernel.org/r/20241023100032.62952-1-jimzhao.ai@gmail.com Signed-off-by: Jim Zhao <jimzhao.ai@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
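In essence, the fix gives wb_thresh a floor when the device has been idle instead of letting it decay to zero, conceptually (wb_min_thresh is a hypothetical name used only for illustration):

	/* under strictlimit, never let an idle device's threshold collapse to 0 */
	wb_thresh = max(wb_thresh, wb_min_thresh);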
2024-11-06mm/memory.c: simplify pfnmap_lockdep_assertManas
Use local `mapping' to reduce the pointer chasing. akpm: extracted from a bugfix which Linus fixed with b1b46751671be ("mm: fix follow_pfnmap API lockdep assert"). Link: https://lkml.kernel.org/r/20241004-fix-null-deref-v4-1-d0a8ec01ac85@iiitd.ac.in Signed-off-by: Manas <manas18244@iiitd.ac.in> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Anup Sharma <anupnewsmail@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm/codetag: fix arg in pgalloc_tag_copy alloc_tag_subSourav Panda
alloc_tag_sub() takes bytes as opposed to number of pages as argument. Currently pgalloc_tag_copy() passes the number of pages. This fix passes the correct unit, which is the number of bytes allocated. Link: https://lkml.kernel.org/r/20241022232440.334820-1-souravpanda@google.com Fixes: e0a955bf7f61 ("mm/codetag: add pgalloc_tag_copy()") Signed-off-by: Sourav Panda <souravpanda@google.com> Acked-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Wei Xu <weixugc@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
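The unit mismatch looks roughly like this (a sketch of the pattern only; the actual call site is pgalloc_tag_copy() and the surrounding code is elided):

	/* alloc_tag_sub() accounts in bytes, not pages */
	alloc_tag_sub(&ref, nr_pages);			/* wrong: passes a page count */
	alloc_tag_sub(&ref, nr_pages << PAGE_SHIFT);	/* right: passes bytes */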
2024-11-06maple_tree: fix outdated flag name in commentJann Horn
MAPLE_USE_RCU was renamed to MT_FLAGS_USE_RCU at some point, fix up the comment. Link: https://lkml.kernel.org/r/20241007-maple-tree-doc-fix-v1-1-6bbf89c1153d@google.com Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm: shmem: improve the tmpfs large folio read performanceBaolin Wang
tmpfs already supports PMD-sized large folios, but the tmpfs read operation still performs copying at PAGE_SIZE granularity, which is unreasonable. This patch changes tmpfs to copy data at folio granularity, which can improve the read performance, as well as changing to use folio related functions. Moreover, if a large folio has a subpage that is hwpoisoned, it will still fall back to page granularity copying. Use 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, and I can see about 20% performance improvement, and no regression with bs=4k. Before the patch: READ: bw=10.0GiB/s After the patch: READ: bw=12.0GiB/s Link: https://lkml.kernel.org/r/2129a21a5b9f77d3bb7ddec152c009ce7c5653c4.1729218573.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
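At its core the change swaps per-page copying for per-folio copying in the read loop, roughly as below (a sketch, not the exact shmem_file_read_iter() diff):

	/* before: at most one PAGE_SIZE chunk per iteration */
	ret = copy_page_to_iter(page, offset, nr, to);

	/* after: copy up to the end of the (possibly PMD-sized) folio at once */
	ret = copy_folio_to_iter(folio, offset_in_folio(folio, pos), nr, to);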
2024-11-06mm: shmem: update iocb->ki_pos directly to simplify tmpfs read logicBaolin Wang
Patch series "Improve the tmpfs large folio read performance", v2. tmpfs already supports PMD-sized large folios, but the tmpfs read operation still performs copying at PAGE_SIZE granularity, which is not perfect. This patchset changes tmpfs to copy data at the folio granularity, which can improve the read performance. Use 'fio bs=64k' to read a 1G tmpfs file populated with 2M THPs, and I can see about 20% performance improvement, and no regression with bs=4k. I also did some functional testing with the xfstests suite, and I did not find any regressions with the following xfstests config: FSTYP=tmpfs export TEST_DIR=/mnt/tempfs_mnt export TEST_DEV=/mnt/tempfs_mnt export SCRATCH_MNT=/mnt/scratchdir export SCRATCH_DEV=/mnt/scratchdir This patch (of 2): Using iocb->ki_pos to check if the read bytes exceeds the file size and to calculate the bytes to be read can help simplify the code logic. Meanwhile, this is also a preparation for improving tmpfs large folios read performance in the following patch. Link: https://lkml.kernel.org/r/cover.1729218573.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/e8863e289577e0dc1e365b5419bf2d1c9a24ae3d.1729218573.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm: remove unused has_isolate_pageblockLuoxi Li
has_isolate_pageblock() has been unused since commit 55612e80e722 ("mm: page_alloc: close migratetype race between freeing and stealing"). Remove it. Link: https://lkml.kernel.org/r/20241018092235.2764859-1-kaixa@kiloview.com Signed-off-by: Luoxi Li <kaixa@kiloview.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm: remove redundant condition for THP folioDev Jain
folio_test_pmd_mappable() implies folio_test_large(), therefore, simplify the expression for is_thp. Link: https://lkml.kernel.org/r/20241018094151.3458-1-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
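The resulting simplification is a one-liner; a before/after sketch:

	/* before */
	is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio);

	/* after: folio_test_pmd_mappable() already implies folio_test_large() */
	is_thp = folio_test_pmd_mappable(folio);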
2024-11-06mm/mremap: remove goto from mremap_to()Liam R. Howlett
mremap_to() has a goto label at the end that doesn't unwind anything. Removing the label makes the code cleaner. This commit also adds documentation to the function. Link: https://lkml.kernel.org/r/20241018174114.2871880-3-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Pedro Falcato <pedro.falcato@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>