Age | Commit message (Collapse) | Author |
|
Right now these are performed in kernel/fork.c which is odd and a
violation of separation of concerns, as well as preventing us from
integrating this and related logic into userland VMA testing going
forward.
There is a fly in the ointment - nommu - mmap.c is not compiled if
CONFIG_MMU not set, and neither is vma.c.
To square the circle, let's add a new file - vma_init.c. This will be
compiled for both CONFIG_MMU and nommu builds, and will also form part of
the VMA userland testing.
This allows us to de-duplicate code, while maintaining separation of
concerns and the ability for us to userland test this logic.
Update the VMA userland tests accordingly, additionally adding a
detach_free_vma() helper function to correctly detach VMAs before freeing
them in test code, as this change was triggering the assert for this.
[akpm@linux-foundation.org: remove stray newline, per Liam]
Link: https://lkml.kernel.org/r/f97b3a85a6da0196b28070df331b99e22b263be8.1745853549.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "move all VMA allocation, freeing and duplication logic to
mm", v3.
Currently VMA allocation, freeing and duplication exist in kernel/fork.c,
which is a violation of separation of concerns, and leaves these functions
exposed to the rest of the kernel when they are in fact internal
implementation details.
Resolve this by moving this logic to mm, and making it internal to vma.c,
vma.h.
This also allows us, in future, to provide userland testing around this
functionality.
We additionally abstract dup_mmap() to mm, being careful to ensure
kernel/fork.c acceses this via the mm internal header so it is not exposed
elsewhere in the kernel.
As part of this change, also abstract initial stack allocation performed
in __bprm_mm_init() out of fs code into mm via the
create_init_stack_vma(), as this code uses vm_area_alloc() and
vm_area_free().
In order to do so sensibly, we introduce a new mm/vma_exec.c file, which
contains the code that is shared by mm and exec. This file is added to
both memory mapping and exec sections in MAINTAINERS so both sets of
maintainers can maintain oversight.
As part of this change, we also move relocate_vma_down() to mm/vma_exec.c
so all shared mm/exec functionality is kept in one place.
We add code shared between nommu and mmu-enabled configurations in order
to share VMA allocation, freeing and duplication code correctly while also
keeping these functions available in userland VMA testing.
This is achieved by adding a mm/vma_init.c file which is also compiled by
the userland tests.
This patch (of 4):
There is functionality that overlaps the exec and memory mapping
subsystems. While it properly belongs in mm, it is important that exec
maintainers maintain oversight of this functionality correctly.
We can establish both goals by adding a new mm/vma_exec.c file which
contains these 'glue' functions, and have fs/exec.c import them.
As a part of this change, to ensure that proper oversight is achieved, add
the file to both the MEMORY MAPPING and EXEC & BINFMT API, ELF sections.
scripts/get_maintainer.pl can correctly handle files in multiple entries
and this neatly handles the cross-over.
[akpm@linux-foundation.org: fix comment typo]
Link: https://lkml.kernel.org/r/80f0d0c6-0b68-47f9-ab78-0ab7f74677fc@lucifer.local
Link: https://lkml.kernel.org/r/cover.1745853549.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/91f2cee8f17d65214a9d83abb7011aa15f1ea690.1745853549.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "fix incorrectly disallowed anonymous VMA merges", v2.
It appears that we have been incorrectly rejecting merge cases for 15
years, apparently by mistake.
Imagine a range of anonymous mapped momemory divided into two VMAs like
this, with incompatible protection bits:
RW RWX
unfaulted faulted
|-----------|-----------|
| prev | vma |
|-----------|-----------|
mprotect(RW)
Now imagine mprotect()'ing vma so it is RW. This appears as if it should
merge, it does not.
Neither does this case, again mprotect()'ing vma RW:
RWX RW
faulted unfaulted
|-----------|-----------|
| vma | next |
|-----------|-----------|
mprotect(RW)
Nor:
RW RWX RW
unfaulted faulted unfaulted
|-----------|-----------|-----------|
| prev | vma | next |
|-----------|-----------|-----------|
mprotect(RW)
What's going on here?
In commit 5beb49305251 ("mm: change anon_vma linking to fix multi-process
server scalability issue"), from 2010, Rik von Riel took careful care to
account for these cases - commenting that '[this is] easily overlooked:
when mprotect shifts the boundary, make sure the expanding vma has
anon_vma set if the shrinking vma had, to cover any anon pages imported.'
However, commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs")
introduced a little over a year later, appears to have accidentally
disallowed this.
By adjusting the is_mergeable_anon_vma() function to avoid lock contention
across large trees of forked anon_vma's, this commit wrongly assumed the
VMA being checked (the ostensible merge 'target') should be faulted, that
is, have an anon_vma, and thus an anon_vma_chain list established, but
only of length 1.
This appears to have been unintentional, as disallowing empty target VMAs
like this across the board makes no sense.
We already have logic that accounts for this case, the same logic Rik
introduced in 2010, now via dup_anon_vma() (and ultimately
anon_vma_clone()), so there is no problem permitting this.
This series fixes this mistake and also ensures that scalability concerns
remain addressed by explicitly checking that whatever VMA is being merged
has not been forked.
A full set of self tests which reproduce the issue are provided, as well
as updating userland VMA tests to assert this behaviour.
The self tests additionally assert scalability concerns are addressed.
This patch (of 3):
anon_vma_chain's were introduced by Rik von Riel in commit 5beb49305251
("mm: change anon_vma linking to fix multi-process server scalability
issue").
This patch was introduced in March 2010. As part of this change, careful
attention was made to the instance of mprotect() causing a VMA merge, with
one faulted (i.e. having anon_vma set) and another not:
/*
* Easily overlooked: when mprotect shifts the boundary,
* make sure the expanding vma has anon_vma set if the
* shrinking vma had, to cover any anon pages imported.
*/
In the modern VMA code, this is handled in dup_anon_vma() (and ultimately
anon_vma_clone()).
This case is one of the three configurations of adjacent VMA anon_vma
state that we might encounter on merge (where dst is the VMA which will be
merged into and src the one being merged into dst):
1. dst->anon_vma, src->anon_vma - These must be equal, no-op.
2. dst->anon_vma, !src->anon_vma - We simply use dst->anon_vma, no-op.
3. !dst->anon_vma, src->anon_vma - The case in question here.
In case 3, the instance addressed here - we duplicate the AVC connections
from src and place into dst.
However, in practice, we very often do NOT do this.
This appears to be due to an inadvertent consequence of the change
introduced by commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs"),
introduced in May 2011.
This implies that this merge case was functional only for a little over a
year, and has since been broken for ~15 years.
Here, lock scalability concerns lead to us restricting anonymous merges
only to those VMAs with 1 entry in their vma->anon_vma_chain, that is, a
VMA that is not connected to any parent process's anon_vma.
The mergeability test looks like this:
static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1,
struct anon_vma *anon_vma2, struct vm_area_struct *vma)
{
if ((!anon_vma1 || !anon_vma2) && (!vma ||
!vma->anon_vma || list_is_singular(&vma->anon_vma_chain)))
return true;
return anon_vma1 == anon_vma2;
}
However, we have a problem here - typically the vma passed here is the
destination VMA.
For instance in vma_merge_existing_range() we invoke:
can_vma_merge_left()
-> [ check that there is an immediately adjacent prior VMA ]
-> can_vma_merge_after()
-> is_mergeable_vma() for general attribute check
-> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev)
So if we were considering a target unfaulted 'prev':
unfaulted faulted
|-----------|-----------|
| prev | vma |
|-----------|-----------|
This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev).
The list_is_singular() check for vma->anon_vma_chain, an empty list on
fault, would cause this merge to _fail_ even though all else indicates a
merge.
Equally a simple merge into a next VMA would hit the same problem:
faulted unfaulted
|-----------|-----------|
| vma | next |
|-----------|-----------|
can_vma_merge_right()
-> [ check that there is an immediately adjacent succeeding VMA ]
-> can_vma_merge_before()
-> is_mergeable_vma() for general attribute check
-> is_mergeable_anon_vma([ proposed anon_vma ], next->anon_vma, next)
For a 3-way merge, we'd also hit the same problem if it was configured like
this for instance:
unfaulted faulted unfaulted
|-----------|-----------|-----------|
| prev | vma | next |
|-----------|-----------|-----------|
As we'd call can_vma_merge_left() for prev, and can_vma_merge_right() for
next, both of which would fail.
vma_merge_new_range() (and relatedly, vma_expand()) are not impacted, as
the new VMA would never already be faulted (it is a proposed new range).
Because we already handle each of the aforementioned merge cases, and can
absolutely therefore deal with an existing VMA merge with !dst->anon_vma,
src->anon_vma, there is absolutely no reason to disallow this kind of
merge.
It seems that the intention of this patch is to ensure that, in the
instance of merging unfaulted VMAs with faulted ones, we never wish to do
so with those with multiple AVCs due to the fact that anon_vma lock's are
held across both parent and child anon_vma's (actually, the 'root' parent
anon_vma's lock is used).
In fact, the original commit alludes to this - "find_mergeable_anon_vma()
already considers this case".
In find_mergeable_anon_vma() however, we check the anon_vma which will be
merged from, if it is set, then we check
list_is_singular(vma->anon_vma_chain).
So to match this logic, update is_mergeable_anon_vma() to perform this
scalability check on the VMA whose anon_vma we ultimately merge into.
This matches existing behaviour with forked VMAs, only we no longer
wrongly disallow ALL empty target merges.
So we both allow merge cases and ensure the scalability check is correctly
applied.
We may wish to revisit these lock scalability concerns at a later date and
ensure they are still valid.
Additionally, correct userland VMA tests which were mistakenly not
asserting these cases correctly previously to now correctly assert this,
and to ensure vmg->anon_vma state is always consistent to account for
newly introduced asserts.
Link: https://lkml.kernel.org/r/cover.1744104124.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/18c756fc9eaf7ad082a710c91133b8346f8cd9a8.1744104124.git.lorenzo.stoakes@oracle.com
Fixes: 965f55dea0e3 ("mmap: avoid merging cloned VMAs")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
vma_iter_store() functions can be used both when adding a new vma and when
updating an existing one. However for existing ones we do not need to
mark them attached as they are already marked that way. With
vma->detached being a separate flag, double-marking a vmas as attached or
detached is not an issue because the flag will simply be overwritten with
the same value. However once we fold this flag into the refcount later in
this series, re-attaching or re-detaching a vma becomes an issue since
these operations will be incrementing/decrementing a refcount.
Introduce vma_iter_store_new() and vma_iter_store_overwrite() to replace
vma_iter_store() and avoid re-attaching a vma during vma update. Add
assertions in vma_mark_attached()/vma_mark_detached() to catch invalid
usage. Update vma tests to check for vma detached state correctness.
Link: https://lkml.kernel.org/r/20250213224655.1680278-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Introduce internal vmg->__adjust_middle_start and vmg->__adjust_next_start
merge flags, enabling us to indicate to commit_merge() that we are
performing a merge which either spans only part of vmg->middle, or part of
vmg->next respectively.
In the former instance, we change the start of vmg->middle to match the
attributes of vmg->prev, without spanning all of vmg->middle.
This implies that vmg->prev->vm_end and vmg->middle->vm_start are both
increased to form the new merged VMA (vmg->prev) and the new subsequent
VMA (vmg->middle).
In the latter case, we change the end of vmg->middle to match the
attributes of vmg->next, without spanning all of vmg->next.
This implies that vmg->middle->vm_end and vmg->next->vm_start are both
decreased to form the new merged VMA (vmg->next) and the new prior VMA
(vmg->middle).
Since we now have a stable set of prev, middle, next VMAs threaded through
vmg and with these flags set know what is happening, we can perform the
calculation in commit_merge() instead.
This allows us to drop the confusing adj_start parameter and instead pass
semantic information to commit_merge().
In the latter case the -(middle->vm_end - start) calculation becomes
-(middle->vm-end - vmg->end), however this is correct as vmg->end is set
to the start parameter.
This is because in this case (rather confusingly), we manipulate
vmg->middle, but ultimately return vmg->next, whose range will be
correctly specified. At this point vmg->start, end is the new range for
the prior VMA rather than the merged one.
This patch has no change in functional behaviour.
Link: https://lkml.kernel.org/r/bcec0cd980b373a5eb02236cb033034ce1effe42.1738326519.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The current VMA merge mechanism contains a number of confusing mechanisms
around removal of VMAs on merge and the shrinking of the VMA adjacent to
vma->target in the case of merges which result in a partial merge with
that adjacent VMA.
Since we now have a STABLE set of VMAs - prev, middle, next - we are now
able to have the caller of commit_merge() explicitly tell us which VMAs
need deleting, using newly introduced internal VMA merge flags.
Doing so allows us to embed this state within the VMG and remove the
confusing remove, remove2 parameters from commit_merge().
We additionally are able to eliminate the highly confusing and misleading
'expanded' parameter - a parameter that in reality refers to whether or
not the return VMA is the target one or the one immediately adjacent.
We can infer which is the case from whether or not the adj_start parameter
is negative. This also allows us to simplify further logic around
iterator configuration and VMA iterator stores.
Doing so means we can also eliminate the adjust parameter, as we are able
to infer which VMA ought to be adjusted from adj_start - a positive value
implies we adjust the start of 'middle', a negative one implies we adjust
the start of 'next'.
We are then able to have commit_merge() explicitly return the target VMA,
or NULL on inability to pre-allocate memory. Errors were previously
filtered so behaviour does not change.
We additionally move from the slightly odd use of a bitwise-flag enum
vmg->merge_flags field to vmg bitfields.
This patch has no change in functional behaviour.
Link: https://lkml.kernel.org/r/7bf2ed24af68aac18672b7acebbd9102f48c5b03.1738326519.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm: further simplify VMA merge operation", v3.
While significant efforts have been made to improve the VMA merge
operation, there remains remnants of the bad (or rather confusing) old
days, which make the code difficult to understand, more bug prone and thus
harder to modify.
This series attempts to significantly improve matters in a number of
respects - with a focus on simplifying the commit_merge() function which
actually actions the merge operation - and importantly, adjusting the two
most confusing merge cases - those in which we 'adjust' the VMA
immediately adjacent to the one being merged.
One source of confusion are the VMAs being threaded through the operation
themselves - vmg->prev, vmg->vma and vmg->next.
At the start of the operation, vmg->vma is either NULL if a new VMA is
propose to be added, or if not then a pointer to an existing VMA being
modified, and prev/next are (perhaps not present) VMAs sat immediately
before and after the range specified in vmg->start, end, respectively.
However, during the VMA merge operation, we change vmg->start, end and
pgoff to span the newly merged range and vmg->vma to either be:
a. The ultimately returned VMA (in most cases) or b. A VMA which we will
manipulate, but ultimately instead return vmg->next.
Case b. especially here is confusing for somebody reading this code, but
the fact we update this state, along with vmg->start, end, pgoff only
makes matters worse.
We simplify things by replacing vmg->vma with vmg->middle and never
changing it - this is always either NULL (for a new VMA) or the VMA being
modified between vmg->prev and vmg->next.
We further simplify by placing the merged VMA in a new vmg->target field -
whether case b. above is the case or not. The reader of the code can now
simply rely on vmg->middle being the middle VMA and vmg->target being the
ultimately merged VMA.
We additionally tackle the confusing cases where we 'adjust' VMAs other
than the one we ultimately return as the merged VMA (this includes case b.
above). These are:
(1)
merge
<----------->
|------||--------| |------------|---|
| prev || middle | -> | target | m |
|------||--------| |------------|---|
In which case middle must be adjusted so middle->vm_start is increased as
well as performing the merge.
(2) (equivalent to case b. above)
<------------->
|---------||------| |---|-------------|
| middle || next | -> | m | target |
|---------||------| |---|-------------|
In which case next must be adjusted so next->vm_start is decreased as well
as performing the merge.
This cases have previously been performed by calculating and passing
around a dubious and confusing 'adj_start' parameter along side a pointer
to an 'adjust' VMA indicating which VMA requires additional adjustment
(middle in case 1 and next in case 2).
With the VMG structure in place we are able to avoid this by simply
setting a merge flag to describe each case:
(1) Sets the vmg->__adjust_middle_start flag
(2) Sets the vmg->__adjust_next_start flag
By doing so it turns out we can vastly simplify the logic and calculate
what is required to perform the operation.
Taken together the refactorings make it far easier to understand what is
being done even in these more confusing cases, make the code far more
maintainable, debuggable, and testable, providing more internal state
indicating what is happening in the merge operation.
The changes have no functional net impact on the merge operation and
everything should still behave as it did before.
This patch (of 5):
The merge code, while much improved, still has a number of points of
confusion. As part of a broader series cleaning this up to make this more
maintainable, we start by addressing some confusion around
vma_merge_struct fields.
So far, the caller either provides no vmg->vma (a new VMA) or supplies the
existing VMA which is being altered, setting vmg->start,end,pgoff to the
proposed VMA dimensions.
vmg->vma is then updated, as are vmg->start,end,pgoff as the merge process
proceeds and the appropriate merge strategy is determined.
This is rather confusing, as vmg->vma starts off as the 'middle' VMA
between vmg->prev,next, but becomes the 'target' VMA, except in one
specific edge case (merge next, shrink middle).
Int his patch we introduce vmg->middle to describe the VMA that is between
vmg->prev and vmg->next, and does NOT change during the merge operation.
We replace vmg->vma with vmg->target, and use this only during the merge
operation itself.
Aside from the merge right, shrink middle case, this becomes the VMA that
forms the basis of the VMA that is returned. This edge case can be
addressed in a future commit.
We also add a number of comments to explain what is going on.
Finally, we adjust the ASCII diagrams showing each merge case in
vma_merge_existing_range() to be clearer - the arrow range previously
showed the vmg->start, end spanned area, but it is clearer to change this
to show the final merged VMA.
This patch has no change in functional behaviour.
Link: https://lkml.kernel.org/r/cover.1738326519.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/4dfe60f1419d55e5d0516f56349695d73a57184c.1738326519.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Introduce demonstrative, basic, __mmap_region() test upon which we can
base further work upon moving forwards.
This simply asserts that mappings can be made and merges occur as
expected.
As part of this change, fix the security_vm_enough_memory_mm() stub which
was previously incorrectly implemented.
Link: https://lkml.kernel.org/r/20241213162409.41498-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Convert mm_lock_seq to be seqcount_t and change all mmap_write_lock
variants to increment it, in-line with the usual seqcount usage pattern.
This lets us check whether the mmap_lock is write-locked by checking
mm_lock_seq.sequence counter (odd=locked, even=unlocked). This will be
used when implementing mmap_lock speculation functions.
As a result vm_lock_seq is also change to be unsigned to match the type
of mm_lock_seq.sequence.
Link: https://lkml.kernel.org/r/20241122174416.1367052-2-surenb@google.com
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We build on previous work making expand_downwards() an entirely internal
function.
This logic is subtle and so it is highly useful to get it into vma.c so we
can then userland unit test.
We must additionally move acct_stack_growth() to vma.c as it is a helper
function used by both expand_downwards() and expand_upwards().
We are also then able to mark anon_vma_interval_tree_pre_update_vma() and
anon_vma_interval_tree_post_update_vma() static as these are no longer
used by anything else.
Link: https://lkml.kernel.org/r/0feb104eff85922019d4fb29280f3afb130c5204.1733248985.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We want to be able to unit test the unmapped area logic, so move it to
mm/vma.c. The wrappers which invoke this remain in place in mm/mmap.c.
In addition, naturally, update the existing test code to enable this to be
compiled in userland.
Link: https://lkml.kernel.org/r/53a57a52a64ea54e9d129d2e2abca3a538022379.1733248985.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The phys_addr_t size is predicated on whether CONFIG_PHYS_ADDR_T_64BIT is
set or not.
In the VMA tests, virt_to_phys() from tools/include/linux casts a volatile
void * pointer to phys_addr_t, if CONFIG_PHYS_ADDR_T_64BIT is not set,
this will be 32-bit and trigger a warning.
Obviously this might also lead to truncation, which we would rather avoid.
Fix this by adjusting the generation of generated/bit-length.h to generate
a CONFIG_PHYS_ADDR_T{bits}BIT define.
This does result in the generation of the useless CONFIG_PHYS_ADDR_T_32BIT
define for 32-bit systems, but this should have no effect, and makes
implementation of this easier.
This resolves the issue and the warning.
[lorenzo.stoakes@oracle.com: VMA tests not properly importing bit-length.h]
Link: https://lkml.kernel.org/r/a6183df9-3108-4d59-8128-4fc6c14e22a5@lucifer.local
Link: https://lkml.kernel.org/r/20241017165638.95602-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a test to assert that VMG_FLAG_JUST_EXPAND functions as expected - that
is, when the VMA iterator is positioned at the previous VMA and no VMAs
proceed it, we observe an expansion with all state as expected.
Explicitly place a prior VMA that would otherwise fail this test if the
mode were not enabled (as it would traverse to the previous-previous VMA).
Link: https://lkml.kernel.org/r/d2f88330254a6448092412bf7dfe077a579ab0dc.1729174352.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In commit 714965ca8252 ("mm/mmap: start distinguishing if vma can be
removed in mergeability test") we relaxed the VMA merge rules for VMAs
possessing a vm_ops->close() hook, permitting this operation in instances
where we wouldn't delete the VMA as part of the merge operation.
This was later corrected in commit fc0c8f9089c2 ("mm, mmap: fix
vma_merge() case 7 with vma_ops->close") to account for a subtle case that
the previous commit had not taken into account.
In both instances, we first rely on is_mergeable_vma() to determine
whether we might be dealing with a VMA that might be removed, taking
advantage of the fact that a 'previous' VMA will never be deleted, only
VMAs that follow it.
The second patch corrects the instance where a merge of the previous VMA
into a subsequent one did not correctly check whether the subsequent VMA
had a vm_ops->close() handler.
Both changes prevent merge cases that are actually permissible (for
instance a merge of a VMA into a following VMA with a vm_ops->close(), but
with no previous VMA, which would result in the next VMA being extended,
not deleted).
In addition, both changes fail to consider the case where a VMA that would
otherwise be merged with the previous and next VMA might have
vm_ops->close(), on the assumption that for this to be the case, all three
would have to have the same vma->vm_file to be mergeable and thus the same
vm_ops.
And in addition both changes operate at 50,000 feet, trying to guess
whether a VMA will be deleted.
As we have majorly refactored the VMA merge operation and de-duplicated
code to the point where we know precisely where deletions will occur, this
patch removes the aforementioned checks altogether and instead explicitly
checks whether a VMA will be deleted.
In cases where a reduced merge is still possible (where we merge both
previous and next VMA but the next VMA has a vm_ops->close hook, meaning
we could just merge the previous and current VMA), we do so, otherwise the
merge is not permitted.
We take advantage of our userland testing to assert that this functions
correctly - replacing the previous limited vm_ops->close() tests with
tests for every single case where we delete a VMA.
We also update all testing for both new and modified VMAs to set
vma->vm_ops->close() in every single instance where this would not prevent
the merge, to assert that we never do so.
Link: https://lkml.kernel.org/r/9f96b8cfeef3d14afabddac3d6144afdfbef2e22.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The existing vma_merge() function is no longer required to handle what
were previously referred to as cases 1-3 (i.e. the merging of a new VMA),
as this is now handled by vma_merge_new_vma().
Additionally, simplify the convoluted control flow of the original,
maintaining identical logic only expressed more clearly and doing away
with a complicated set of cases, rather logically examining each possible
outcome - merging of both the previous and subsequent VMA, merging of the
previous VMA and merging of the subsequent VMA alone.
We now utilise the previously implemented commit_merge() function to share
logic with vma_expand() de-duplicating code and providing less surface
area for bugs and confusion. In order to do so, we adjust this function
to accept parameters specific to merging existing ranges.
Link: https://lkml.kernel.org/r/2cf6016b7bfcc4965fc3cde10827560c42e4f12c.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Abstract vma_merge_new_vma() to use vma_merge_struct and rename the
resultant function vma_merge_new_range() to be clear what the purpose of
this function is - a new VMA is desired in the specified range, and we
wish to see if it is possible to 'merge' surrounding VMAs into this range
rather than having to allocate a new VMA.
Note that this function uses vma_extend() exclusively, so adopts its
requirement that the iterator point at or before the gap. We add an
assert to this effect.
This is as opposed to vma_merge_existing_range(), which will be introduced
in a subsequent commit, and provide the same functionality for cases in
which we are modifying an existing VMA.
In mmap_region() and do_brk_flags() we open code scenarios where we prefer
to use vma_expand() rather than invoke a full vma_merge() operation.
Abstract this logic and eliminate all of the open-coding, and also use the
same logic for all cases where we add new VMAs to, rather than ultimately
use vma_merge(), rather use vma_expand().
Doing so removes duplication and simplifies VMA merging in all such cases,
laying the ground for us to eliminate the merging of new VMAs in
vma_merge() altogether.
Also add the ability for the vmg to track state, and able to report
errors, allowing for us to differentiate a failed merge from an inability
to allocate memory in callers.
This makes it far easier to understand what is happening in these cases
avoiding confusion, bugs and allowing for future optimisation.
Also introduce vma_iter_next_rewind() to allow for retrieval of the next,
and (optionally) the prev VMA, rewinding to the start of the previous gap.
Introduce are_anon_vmas_compatible() to abstract individual VMA anon_vma
comparison for the case of merging on both sides where the anon_vma of the
VMA being merged maybe compatible with prev and next, but prev and next's
anon_vma's may not be compatible with each other.
Finally also introduce can_vma_merge_left() / can_vma_merge_right() to
check adjacent VMA compatibility and that they are indeed adjacent.
Link: https://lkml.kernel.org/r/49d37c0769b6b9dc03b27fe4d059173832556392.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Mark Brown <broonie@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The purpose of the vmg is to thread merge state through functions and
avoid egregious parameter lists. We expand this to vma_expand(), which is
used for a number of merge cases.
Accordingly, adjust its callers, mmap_region() and relocate_vma_down(), to
use a vmg.
An added purpose of this change is the ability in a future commit to
perform all new VMA range merging using vma_expand().
Link: https://lkml.kernel.org/r/4bc8c9dbc9ca52452ef8e587b28fe555854ceb38.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Rather than passing around huge numbers of parameters to numerous helper
functions, abstract them into a single struct that we thread through the
operation, the vma_merge_struct ('vmg').
Adjust vma_merge() and vma_modify() to accept this parameter, as well as
predicate functions can_vma_merge_before(), can_vma_merge_after(), and the
vma_modify_...() helper functions.
Also introduce VMG_STATE() and VMG_VMA_STATE() helper macros to allow for
easy vmg declaration.
We additionally remove the requirement that vma_merge() is passed a VMA
object representing the candidate new VMA. Previously it used this to
obtain the mm_struct, file and anon_vma properties of the proposed range
(a rather confusing state of affairs), which are now provided by the vmg
directly.
We also remove the pgoff calculation previously performed vma_modify(),
and instead calculate this in VMG_VMA_STATE() via the vma_pgoff_offset()
helper.
Link: https://lkml.kernel.org/r/a955aad09d81329f6fbeb636b2dd10cde7b73dab.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a variety of VMA merge unit tests to assert that the behaviour of VMA
merge is correct at an abstract level and VMAs are merged or not merged as
expected.
These are intentionally added _before_ we start refactoring vma_merge() in
order that we can continually assert correctness throughout the rest of
the series.
In order to reduce churn going forward, we backport the vma_merge_struct
data type to the test code which we introduce and use in a future commit,
and add wrappers around the merge new and existing VMA cases.
Link: https://lkml.kernel.org/r/1c7a0b43cfad2c511a6b1b52f3507696478ff51a.1725040657.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Establish a new userland VMA unit testing implementation under
tools/testing which utilises existing logic providing maple tree support
in userland utilising the now-shared code previously exclusive to radix
tree testing.
This provides fundamental VMA operations whose API is defined in mm/vma.h,
while stubbing out superfluous functionality.
This exists as a proof-of-concept, with the test implementation functional
and sufficient to allow userland compilation of vma.c, but containing only
cursory tests to demonstrate basic functionality.
Link: https://lkml.kernel.org/r/533ffa2eec771cbe6b387dd049a7f128a53eb616.1722251717.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: SeongJae Park <sj@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Gow <davidgow@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kees Cook <kees@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Rae Moar <rmoar@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|