Age | Commit message (Collapse) | Author |
|
Pull more kvm updates from Paolo Bonzini:
Generic:
- Clean up locking of all vCPUs for a VM by using the *_nest_lock()
family of functions, and move duplicated code to virt/kvm/. kernel/
patches acked by Peter Zijlstra
- Add MGLRU support to the access tracking perf test
ARM fixes:
- Make the irqbypass hooks resilient to changes in the GSI<->MSI
routing, avoiding behind stale vLPI mappings being left behind. The
fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
and nuking the vLPI mapping upon a routing change
- Close another VGIC race where vCPU creation races with VGIC
creation, leading to in-flight vCPUs entering the kernel w/o
private IRQs allocated
- Fix a build issue triggered by the recently added workaround for
Ampere's AC04_CPU_23 erratum
- Correctly sign-extend the VA when emulating a TLBI instruction
potentially targeting a VNCR mapping
- Avoid dereferencing a NULL pointer in the VGIC debug code, which
can happen if the device doesn't have any mapping yet
s390:
- Fix interaction between some filesystems and Secure Execution
- Some cleanups and refactorings, preparing for an upcoming big
series
x86:
- Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE
to fix a race between AP destroy and VMRUN
- Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for
the VM
- Refine and harden handling of spurious faults
- Add support for ALLOWED_SEV_FEATURES
- Add #VMGEXIT to the set of handlers special cased for
CONFIG_RETPOLINE=y
- Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing
features that utilize those bits
- Don't account temporary allocations in sev_send_update_data()
- Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock
Threshold
- Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU
IBPB, between SVM and VMX
- Advertise support to userspace for WRMSRNS and PREFETCHI
- Rescan I/O APIC routes after handling EOI that needed to be
intercepted due to the old/previous routing, but not the
new/current routing
- Add a module param to control and enumerate support for device
posted interrupts
- Fix a potential overflow with nested virt on Intel systems running
32-bit kernels
- Flush shadow VMCSes on emergency reboot
- Add support for SNP to the various SEV selftests
- Add a selftest to verify fastops instructions via forced emulation
- Refine and optimize KVM's software processing of the posted
interrupt bitmap, and share the harvesting code between KVM and the
kernel's Posted MSI handler"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
rtmutex_api: provide correct extern functions
KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer
KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race
KVM: arm64: Unmap vLPIs affected by changes to GSI routing information
KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()
KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock
KVM: arm64: Use lock guard in vgic_v4_set_forwarding()
KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue
KVM: s390: Simplify and move pv code
KVM: s390: Refactor and split some gmap helpers
KVM: s390: Remove unneeded srcu lock
s390: Remove unneeded includes
s390/uv: Improve splitting of large folios that cannot be split while dirty
s390/uv: Always return 0 from s390_wiggle_split_folio() if successful
s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful
rust: add helper for mutex_trylock
RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs
KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs
x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation
...
|
|
Move a handful of helpers out of cgroup_util.c and into test_memcontrol.c
that have nothing to with cgroups in general, in anticipation of making
cgroup_util.c a generic library that can be used by other selftests.
Make read_text() and write_text() non-static so test_memcontrol.c can
use them.
Signed-off-by: James Houghton <jthoughton@google.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Link: https://lore.kernel.org/r/20250508184649.2576210-4-jthoughton@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
test_memcg_protection()
The test_memcg_protection() function is used for the test_memcg_min and
test_memcg_low sub-tests. This function generates a set of parent/child
cgroups like:
parent: memory.min/low = 50M
child 0: memory.min/low = 75M, memory.current = 50M
child 1: memory.min/low = 25M, memory.current = 50M
child 2: memory.min/low = 0, memory.current = 50M
After applying memory pressure, the function expects the following actual
memory usages.
parent: memory.current ~= 50M
child 0: memory.current ~= 29M
child 1: memory.current ~= 21M
child 2: memory.current ~= 0
In reality, the actual memory usages can differ quite a bit from the
expected values. It uses an error tolerance of 10% with the
values_close() helper.
Both the test_memcg_min and test_memcg_low sub-tests can fail sporadically
because the actual memory usage exceeds the 10% error tolerance. Below
are a sample of the usage data of the tests runs that fail.
Child Actual usage Expected usage %err
----- ------------ -------------- ----
1 16990208 22020096 -12.9%
1 17252352 22020096 -12.1%
0 37699584 30408704 +10.7%
1 14368768 22020096 -21.0%
1 16871424 22020096 -13.2%
The current 10% error tolerenace might be right at the time
test_memcontrol.c was first introduced in v4.18 kernel, but memory reclaim
have certainly evolved quite a bit since then which may result in a bit
more run-to-run variation than previously expected.
Increase the error tolerance to 15% for child 0 and 20% for child 1 to
minimize the chance of this type of failure. The tolerance is bigger for
child 1 because an upswing in child 0 corresponds to a smaller %err than a
similar downswing in child 1 due to the way %err is used in
values_close().
Before this patch, a 100 test runs of test_memcontrol produced the
following results:
17 not ok 1 test_memcg_min
22 not ok 2 test_memcg_low
After applying this patch, there were no test failure for test_memcg_min
and test_memcg_low in 100 test runs. However, these tests may still fail
once in a while if the memory usage goes beyond the newly extended range.
Link: https://lkml.kernel.org/r/20250502010443.106022-3-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "memcg: Fix test_memcg_min/low test failures", v8.
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test (with memory_recursiveprot enabled) and sporadically fails its
test_memcg_min sub-test. This patchset fixes the test_memcg_min and
test_memcg_low failures by adjusting the test_memcontrol selftest to fix
these test failures.
This patch (of 8):
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test due to the fact that its 3rd test child cgroup which have a
memmory.low of 0 have low event count. This happens when
memory_recursiveprot mount option is enabled which is the default setting
used by systemd to mount cgroup2 filesystem.
This issue was originally fixed by commit cdc69458a5f3 ("cgroup: account
for memory_recursiveprot in test_memcg_low()"). It was later reverted by
commit 1d09069f5313 ("selftests: memcg: expect no low events in
unprotected sibling") expecting the memory reclaim code would be fixed.
However, it turns out the unprotected cgroup may still have some residual
effective memory.low protection depending on the memory.low settings in
its parent and its siblings. As a result, low events may still be
triggered.
One way to fix the test failure is to revert the revert commit. However,
Michal suggested that it might be better to ignore the low event count
with memory_recursiveprot enabled as low event may or may not happen
depending on the actual test configuration.
Modify the test_memcontrol.c to ignore low event in the 3rd child cgroup
with memory_recursiveprot on.
The 4th child cgroup has no memory usage and so has an effective low of 0.
It has no low event count because the mem_cgroup_below_low() check in
shrink_node_memcgs() is skipped as mem_cgroup_below_min() returns true.
If we ever change mem_cgroup_below_min() in such a way that it no longer
skips the no usage case, we will have to add code to explicitly skip it.
With this patch applied, the test_memcg_low sub-test finishes successfully
without failure in most cases. Though both test_memcg_low and
test_memcg_min sub-tests may still fail occasionally if the memory.current
values fall outside of the expected ranges.
Link: https://lkml.kernel.org/r/20250502010443.106022-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20250502010443.106022-2-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Extend two existing tests to cover extracting memory usage through the
newly mutable memory.peak and memory.swap.peak handlers.
In particular, make sure to exercise adding and removing watchers with
overlapping lifetimes so the less-trivial logic gets tested.
The new/updated tests attempt to detect a lack of the write handler by
fstat'ing the memory.peak and memory.swap.peak files and skip the tests if
that's the case. Additionally, skip if the file doesn't exist at all.
[davidf@vimeo.com: update tests]
Link: https://lkml.kernel.org/r/20240730231304.761942-3-davidf@vimeo.com
Link: https://lkml.kernel.org/r/20240729143743.34236-3-davidf@vimeo.com
Signed-off-by: David Finkel <davidf@vimeo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit c1457d9aad5ee2feafcf85aa9a58ab50500159d2.
The framework change to add D_GNU_SOURCE to KHDR_INCLUDES
to Makefile, lib.mk, and kselftest_harness.h is reverted
as it is causing build failures and warnings.
Revert this change as this change depends on the framework
change.
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
_GNU_SOURCE is provided by lib.mk, so it should be dropped to prevent
redefinition warnings.
Signed-off-by: Edward Liaw <edliaw@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
First of all, in order to build with clang at all, one must first apply
Valentin Obst's build fix for LLVM [1]. Once that is done, then when
building with clang, via:
make LLVM=1 -C tools/testing/selftests
...clang warns about fd being used uninitialized, in
test_memcg_reclaim()'s error handling path.
Fix this by initializing fd to -1.
[1] https://lore.kernel.org/all/20240329-selftests-libmk-llvm-rfc-v1-1-2f9ed7d1c49f@valentinobst.de/
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
without nsdelegate
The test case test_cgcore_lesser_ns_open only tasks effect when cgroup2
is mounted with "nsdelegate" mount option. If it misses this option, or
is remounted without "nsdelegate", the test case will fail. For example,
running bpf/test_cgroup_storage first, and then run cgroup/test_core will
fail on test_cgcore_lesser_ns_open. Skip it if "nsdelegate" is not
detected in cgroup2 mount options.
Fixes: bf35a7879f1d ("selftests: cgroup: Test open-time cgroup namespace usage for migration checks")
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Before server got a client connection, there were some memory allocations
in the test memcg, such as user stack. So do not count those allocations
which are not related to socket when checking socket memory accounting.
Link: https://lkml.kernel.org/r/20230619124735.2124-1-haifeng.xu@shopee.com
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since commit f079a020ba95 ("selftests: memcg: factor out common parts of
memory.{low,min} tests"), the value used in second alloc_anon has changed
from 148M to 170M. Because memory.low allows reclaiming page cache in
child cgroups, so the memory.current is close to 30M instead of 50M.
Therefore, adjust the expected value of parent cgroup.
Link: https://lkml.kernel.org/r/20230522095233.4246-2-haifeng.xu@shopee.com
Fixes: f079a020ba95 ("selftests: memcg: factor out common parts of memory.{low,min} tests")
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There are several 'malloc' calls in test_memcontrol, which can be
unsuccessful. This patch will add 'malloc' failures checking to
give more details about test's fail reasons and avoid possible
undefined behavior during the future null dereference (like the
one in alloc_anon_50M_check_swap function).
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Make sure that we ignore protection of a memcg that is the target of memcg
reclaim.
Link: https://lkml.kernel.org/r/20221202031512.1365483-4-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chris Down <chris@chrisdown.name>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Refactor the code that drives writing to memory.reclaim (retrying, error
handling, etc) from test_memcg_reclaim() to a helper called
reclaim_until(), which proactively reclaims from a memcg until its usage
reaches a certain value.
While we are at it, refactor and simplify the reclaim loop.
This will be used in a following patch in another test.
Link: https://lkml.kernel.org/r/20221202031512.1365483-3-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Suggested-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chris Down <chris@chrisdown.name>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The memory protection test setup and runtime is almost equal for
memory.low and memory.min cases.
It makes modification of the common parts prone to mistakes, since the
protections are similar not only in setup but also in principle, factor
the common part out.
Past exceptions between the tests:
- missing memory.min is fine (kept),
- test_memcg_low protected orphaned pagecache (adapted like
test_memcg_min and we keep the processes of protected memory running).
The evaluation in two tests is different (OOM of allocator vs low events
of protégés), this is kept different.
Link: https://lkml.kernel.org/r/20220518161859.21565-6-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
CC: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: David Vernet <void@manifault.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The reclaim is triggered by memory limit in a subtree, therefore the
testcase does not need configured protection against external reclaim.
Also, correct respective comments.
Link: https://lkml.kernel.org/r/20220518161859.21565-5-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: David Vernet <void@manifault.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The numbers are not easy to derive in a closed form (certainly mere
protections ratios do not apply), therefore use a simulation to obtain
expected numbers.
Link: https://lkml.kernel.org/r/20220518161859.21565-4-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: David Vernet <void@manifault.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This is effectively a revert of commit cdc69458a5f3 ("cgroup: account for
memory_recursiveprot in test_memcg_low()"). The case test_memcg_low will
fail with memory_recursiveprot until resolved in reclaim code.
However, this patch preserves the existing helpers and variables for later
uses.
Link: https://lkml.kernel.org/r/20220518161859.21565-3-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: David Vernet <void@manifault.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "memcontrol selftests fixups", v2.
Flushing the patches to make memcontrol selftests check the events
behavior we had consensus about (test_memcg_low fails).
(test_memcg_reclaim, test_memcg_swap_max fail for me now but it's present
even before the refactoring.)
The two bigger changes are:
- adjustment of the protected values to make tests succeed with the given
tolerance,
- both test_memcg_low and test_memcg_min check protection of memory in
populated cgroups (actually as per Documentation/admin-guide/cgroup-v2.rst
memory.min should not apply to empty cgroups, which is not the case
currently. Therefore I unified tests with the populated case in order to to
bring more broken tests).
This patch (of 5):
This fixes mis-applied changes from commit 72b1e03aa725 ("cgroup: account
for memory_localevents in test_memcg_oom_group_leaf_events()").
Link: https://lkml.kernel.org/r/20220518161859.21565-1-mkoutny@suse.com
Link: https://lkml.kernel.org/r/20220518161859.21565-2-mkoutny@suse.com
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: David Vernet <void@manifault.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Richard Palethorpe <rpalethorpe@suse.de>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If the first goto is taken, 'fd' is not opened yet (and is un-initialized).
So a direct return is safer.
Link: https://lkml.kernel.org/r/628312312eb40e0e39463a2c06415fde5295c716.1653229120.git.christophe.jaillet@wanadoo.fr
Fixes: c1a31a2f7a9c ("cgroup: fix racy check in alloc_pagecache_max_30M() helper function")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: David Vernet <void@manifault.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
alloc_pagecache_max_30M() in the cgroup memcg tests performs a 50MB
pagecache allocation, which it expects to be capped at 30MB due to the
calling process having a memory.high setting of 30MB. After the
allocation, the function contains a check that verifies that MB(29) <
memory.current <= MB(30). This check can actually fail
non-deterministically.
The testcases that use this function are test_memcg_high() and
test_memcg_max(), which set memory.min and memory.max to 30MB respectively
for the cgroup under test. The allocation can slightly exceed this number
in both cases, and for memory.max, the process performing the allocation
will not have the OOM killer invoked as it's performing a pagecache
allocation. This patchset therefore updates the above check to instead
use the verify_close() helper function.
Link: https://lkml.kernel.org/r/20220423155619.3669555-6-void@manifault.com
Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
test_memcg_sock() in the cgroup memcg tests, verifies expected memory
accounting for sockets. The test forks a process which functions as a TCP
server, and sends large buffers back and forth between itself (as the TCP
client) and the forked TCP server. While doing so, it verifies that
memory.current and memory.stat.sock look correct.
There is currently a check in tcp_client() which asserts memory.current >=
memory.stat.sock. This check is racy, as between memory.current and
memory.stat.sock being queried, a packet could come in which causes
mem_cgroup_charge_skmem() to be invoked. This could cause
memory.stat.sock to exceed memory.current. Reversing the order of
querying doesn't address the problem either, as memory may be reclaimed
between the two calls. Instead, this patch just removes that assertion
altogether, and instead relies on the values_close() check that follows to
validate the expected accounting.
Link: https://lkml.kernel.org/r/20220423155619.3669555-5-void@manifault.com
Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The test_memcg_oom_group_leaf_events() testcase in the cgroup memcg tests
validates that processes in a group that perform allocations exceeding
memory.oom.group are killed. It also validates that the
memory.events.oom_kill events are properly propagated in this case.
Commit 06e11c907ea4 ("kselftests: memcg: update the oom group leaf events
test") fixed test_memcg_oom_group_leaf_events() to account for the fact
that the memory.events.oom_kill events in a child cgroup is propagated up
to its parent. This behavior can actually be configured by the
memory_localevents mount option, so this patch updates the testcase to
properly account for the possible presence of this mount option.
Link: https://lkml.kernel.org/r/20220423155619.3669555-4-void@manifault.com
Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The test_memcg_low() testcase in test_memcontrol.c verifies the expected
behavior of groups using the memory.low knob. Part of the testcase
verifies that a group with memory.low that experiences reclaim due to
memory pressure elsewhere in the system, observes memory.events.low events
as a result of that reclaim.
In commit 8a931f801340 ("mm: memcontrol: recursive memory.low
protection"), the memory controller was updated to propagate memory.low
and memory.min protection from a parent group to its children via a
configurable memory_recursiveprot mount option. This unfortunately broke
the memcg tests, which asserts that a sibling that experienced reclaim but
had a memory.low value of 0, would not observe any memory.low events.
This patch updates test_memcg_low() to account for the new behavior
introduced by memory_recursiveprot.
So as to make the test resilient to multiple configurations, the patch
also adds a new proc_mount_contains() helper that checks for a string in
/proc/mounts, and is used to toggle behavior based on whether the default
memory_recursiveprot was present.
Link: https://lkml.kernel.org/r/20220423155619.3669555-3-void@manifault.com
Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Fix bugs in memcontroller cgroup tests", v2.
tools/testing/selftests/cgroup/test_memcontrol.c contains a set of
testcases which validate expected behavior of the cgroup memory
controller. Roman Gushchin recently sent out a patchset that fixed a few
issues in the test. This patchset continues that effort by fixing a few
more issues that were causing non-deterministic failures in the suite.
With this patchset, I'm unable to reproduce any more errors after running
the tests in a continuous loop for many iterations. Before, I was able to
reproduce at least one of the errors fixed in this patchset with just one
or two runs.
This patch (of 5):
In test_memcg_min() and test_memcg_low(), there is an array of four
sibling cgroups. All but one of these sibling groups does a 50MB
allocation, and the group that does no allocation is the third of four in
the array. This is not a problem per se, but makes it a bit tricky to do
some assertions in test_memcg_low(), as we want to make assertions on the
siblings based on whether or not they performed allocations. Having a
static index before which all groups have performed an allocation makes
this cleaner.
This patch therefore reorders the sibling groups so that the group that
performs no allocations is the last in the array. A follow-on patch will
leverage this to fix a bug in the test that incorrectly asserts that a
sibling group that had performed an allocation, but only had protection
from its parent, will not observe any memory.events.low events during
reclaim.
Link: https://lkml.kernel.org/r/20220423155619.3669555-1-void@manifault.com
Link: https://lkml.kernel.org/r/20220423155619.3669555-2-void@manifault.com
Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a new test for memory.reclaim that verifies that the interface
correctly reclaims memory as intended, from both anon and file pages.
Link: https://lkml.kernel.org/r/20220425190040.2475377-5-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, alloc_anon_noexit() calls alloc_anon() which instantly frees
the allocated memory. alloc_anon_noexit() is usually used with
cg_run_nowait() to run a process in the background that allocates
memory. It makes sense for the background process to keep the memory
allocated and not instantly free it (otherwise there is no point of
running it in the background).
Link: https://lkml.kernel.org/r/20220425190040.2475377-4-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
After commit 0e4b01df8659 ("mm, memcg: throttle allocators when failing
reclaim over memory.high") allocating memory over memory.high became very
time consuming. But it's exactly what the memory.high test from cgroup
kselftests is doing: it tries to allocate 100M with 30M memory.high value.
It takes forever to complete.
In order to keep it passing (or failing) in a reasonable amount of time
let's try to allocate only a little over 30M: 31M to be precise.
With this change test_memcontrol finishes in a reasonable amount of
time:
$ time ./test_memcontrol
ok 1 test_memcg_subtree_control
ok 2 test_memcg_current
ok 3 test_memcg_min
ok 4 test_memcg_low
ok 5 test_memcg_high
ok 6 test_memcg_max
ok 7 test_memcg_oom_events
ok 8 test_memcg_swap_max
ok 9 test_memcg_sock
ok 10 test_memcg_oom_group_leaf_events
ok 11 test_memcg_oom_group_parent_events
ok 12 test_memcg_oom_group_score_events
real 0m2.273s
user 0m0.064s
sys 0m0.739s
Link: https://lkml.kernel.org/r/20220415000133.3955987-3-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: David Vernet <void@manifault.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm: memcg kselftests fixes".
This patch (of 4):
Commit 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events") made
memory.events recursive: all events are propagated upwards by the tree.
It was a change in semantics.
It broke the oom group leaf events test: it assumes that after an OOM the
oom_kill counter is zero on parent's level.
Let's adjust the test: it should have similar expectations for the child
and parent levels.
The test passes after this fix.
Link: https://lkml.kernel.org/r/20220415000133.3955987-2-roman.gushchin@linux.dev
Link: https://lkml.kernel.org/r/20220415000133.3955987-1-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: David Vernet <void@manifault.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Test the enforcement of memory.high limit for large amount of memory
allocation within a single kernel entry. There are valid use-cases
where the application can trigger large amount of memory allocation
within a single syscall e.g. mlock() or mmap(MAP_POPULATE).
Make sure memory.high limit enforcement works for such use-cases.
Link: https://lkml.kernel.org/r/20220211064917.2028469-4-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The cgroup testing relies on the root cgroup's subtree_control setting,
If the 'memory' controller isn't set, all test cases will be failed
as following:
$ sudo ./test_memcontrol
not ok 1 test_memcg_subtree_control
not ok 2 test_memcg_current
ok 3 # skip test_memcg_min
not ok 4 test_memcg_low
not ok 5 test_memcg_high
not ok 6 test_memcg_max
not ok 7 test_memcg_oom_events
ok 8 # skip test_memcg_swap_max
not ok 9 test_memcg_sock
not ok 10 test_memcg_oom_group_leaf_events
not ok 11 test_memcg_oom_group_parent_events
not ok 12 test_memcg_oom_group_score_events
To correct this unexpected failure, this patch write the 'memory' to
subtree_control of root to get a right result.
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Jay Kamat <jgkamat@fb.com>
Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Dan reported, that cleanup path in test_memcg_subtree_control()
triggers a static checker warning:
./tools/testing/selftests/cgroup/test_memcontrol.c:76 \
test_memcg_subtree_control()
error: uninitialized symbol 'child2'.
Fix this by initializing child2 and parent2 variables and
split the cleanup path into few stages.
Signed-off-by: Roman Gushchin <guro@fb.com>
Fixes: 84092dbcf901 ("selftests: cgroup: add memory controller self-tests")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
|
Add tests for memory.oom.group for the following cases:
- Killing all processes in a leaf cgroup, but leaving the
parent untouched
- Killing all processes in a parent and leaf cgroup
- Keeping processes marked by OOM_SCORE_ADJ_MIN alive when considered
for being killed by the group oom killer.
Signed-off-by: Jay Kamat <jgkamat@fb.com>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
|
|
The test verifies that when there is active TCP connection, the
memory.stat.sock and memory.current values are close.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
|
|
The new test verifies that memory.swap.max and memory.swap.current behave
as expected for simple allocation scenarios
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
|
|
Cgroups are used for controlling the physical resource distribution
(memory, CPU, io, etc) and often are used as basic building blocks
for large distributed computing systems. Even small differences
in the actual behavior may lead to significant incidents.
The codebase is under the active development, which will unlikely
stop at any time soon. Also it's scattered over different kernel
subsystems, which makes regressions more probable.
Given that, the lack of any tests is crying.
This patch implements some basic tests for the memory controller,
as well as a minimal required framework. It doesn't pretend for a
very good coverage, but pretends to be a starting point.
Hopefully, any following significant changes will include corresponding
tests.
Tests for CPU and io controllers, as well as cgroup core
are next in the todo list.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: kernel-team@fb.com
Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
|