linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2025-06-02	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull more kvm updates from Paolo Bonzini: Generic: - Clean up locking of all vCPUs for a VM by using the _nest_lock() family of functions, and move duplicated code to virt/kvm/. kernel/ patches acked by Peter Zijlstra - Add MGLRU support to the access tracking perf test ARM fixes: - Make the irqbypass hooks resilient to changes in the GSI<->MSI routing, avoiding behind stale vLPI mappings being left behind. The fix is to resolve the VGIC IRQ using the host IRQ (which is stable) and nuking the vLPI mapping upon a routing change - Close another VGIC race where vCPU creation races with VGIC creation, leading to in-flight vCPUs entering the kernel w/o private IRQs allocated - Fix a build issue triggered by the recently added workaround for Ampere's AC04_CPU_23 erratum - Correctly sign-extend the VA when emulating a TLBI instruction potentially targeting a VNCR mapping - Avoid dereferencing a NULL pointer in the VGIC debug code, which can happen if the device doesn't have any mapping yet s390: - Fix interaction between some filesystems and Secure Execution - Some cleanups and refactorings, preparing for an upcoming big series x86: - Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM - Refine and harden handling of spurious faults - Add support for ALLOWED_SEV_FEATURES - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits - Don't account temporary allocations in sev_send_update_data() - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX - Advertise support to userspace for WRMSRNS and PREFETCHI - Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing - Add a module param to control and enumerate support for device posted interrupts - Fix a potential overflow with nested virt on Intel systems running 32-bit kernels - Flush shadow VMCSes on emergency reboot - Add support for SNP to the various SEV selftests - Add a selftest to verify fastops instructions via forced emulation - Refine and optimize KVM's software processing of the posted interrupt bitmap, and share the harvesting code between KVM and the kernel's Posted MSI handler" tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) rtmutex_api: provide correct extern functions KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race KVM: arm64: Unmap vLPIs affected by changes to GSI routing information KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding() KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock KVM: arm64: Use lock guard in vgic_v4_set_forwarding() KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue KVM: s390: Simplify and move pv code KVM: s390: Refactor and split some gmap helpers KVM: s390: Remove unneeded srcu lock s390: Remove unneeded includes s390/uv: Improve splitting of large folios that cannot be split while dirty s390/uv: Always return 0 from s390_wiggle_split_folio() if successful s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful rust: add helper for mutex_trylock RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation ...
2025-05-16	cgroup: selftests: Move memcontrol specific helpers out of common cgroup_util.c	Sean Christopherson
	Move a handful of helpers out of cgroup_util.c and into test_memcontrol.c that have nothing to with cgroups in general, in anticipation of making cgroup_util.c a generic library that can be used by other selftests. Make read_text() and write_text() non-static so test_memcontrol.c can use them. Signed-off-by: James Houghton <jthoughton@google.com> Acked-by: Michal Koutný <mkoutny@suse.com> Link: https://lore.kernel.org/r/20250508184649.2576210-4-jthoughton@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-13	selftests: memcg: increase error tolerance of child memory.current check in ↵	Waiman Long
	test_memcg_protection() The test_memcg_protection() function is used for the test_memcg_min and test_memcg_low sub-tests. This function generates a set of parent/child cgroups like: parent: memory.min/low = 50M child 0: memory.min/low = 75M, memory.current = 50M child 1: memory.min/low = 25M, memory.current = 50M child 2: memory.min/low = 0, memory.current = 50M After applying memory pressure, the function expects the following actual memory usages. parent: memory.current ~= 50M child 0: memory.current ~= 29M child 1: memory.current ~= 21M child 2: memory.current ~= 0 In reality, the actual memory usages can differ quite a bit from the expected values. It uses an error tolerance of 10% with the values_close() helper. Both the test_memcg_min and test_memcg_low sub-tests can fail sporadically because the actual memory usage exceeds the 10% error tolerance. Below are a sample of the usage data of the tests runs that fail. Child Actual usage Expected usage %err ----- ------------ -------------- ---- 1 16990208 22020096 -12.9% 1 17252352 22020096 -12.1% 0 37699584 30408704 +10.7% 1 14368768 22020096 -21.0% 1 16871424 22020096 -13.2% The current 10% error tolerenace might be right at the time test_memcontrol.c was first introduced in v4.18 kernel, but memory reclaim have certainly evolved quite a bit since then which may result in a bit more run-to-run variation than previously expected. Increase the error tolerance to 15% for child 0 and 20% for child 1 to minimize the chance of this type of failure. The tolerance is bigger for child 1 because an upswing in child 0 corresponds to a smaller %err than a similar downswing in child 1 due to the way %err is used in values_close(). Before this patch, a 100 test runs of test_memcontrol produced the following results: 17 not ok 1 test_memcg_min 22 not ok 2 test_memcg_low After applying this patch, there were no test failure for test_memcg_min and test_memcg_low in 100 test runs. However, these tests may still fail once in a while if the memory usage goes beyond the newly extended range. Link: https://lkml.kernel.org/r/20250502010443.106022-3-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-13	selftests: memcg: allow low event with no memory.low and memory_recursiveprot on	Waiman Long
	Patch series "memcg: Fix test_memcg_min/low test failures", v8. The test_memcontrol selftest consistently fails its test_memcg_low sub-test (with memory_recursiveprot enabled) and sporadically fails its test_memcg_min sub-test. This patchset fixes the test_memcg_min and test_memcg_low failures by adjusting the test_memcontrol selftest to fix these test failures. This patch (of 8): The test_memcontrol selftest consistently fails its test_memcg_low sub-test due to the fact that its 3rd test child cgroup which have a memmory.low of 0 have low event count. This happens when memory_recursiveprot mount option is enabled which is the default setting used by systemd to mount cgroup2 filesystem. This issue was originally fixed by commit cdc69458a5f3 ("cgroup: account for memory_recursiveprot in test_memcg_low()"). It was later reverted by commit 1d09069f5313 ("selftests: memcg: expect no low events in unprotected sibling") expecting the memory reclaim code would be fixed. However, it turns out the unprotected cgroup may still have some residual effective memory.low protection depending on the memory.low settings in its parent and its siblings. As a result, low events may still be triggered. One way to fix the test failure is to revert the revert commit. However, Michal suggested that it might be better to ignore the low event count with memory_recursiveprot enabled as low event may or may not happen depending on the actual test configuration. Modify the test_memcontrol.c to ignore low event in the 3rd child cgroup with memory_recursiveprot on. The 4th child cgroup has no memory usage and so has an effective low of 0. It has no low event count because the mem_cgroup_below_low() check in shrink_node_memcgs() is skipped as mem_cgroup_below_min() returns true. If we ever change mem_cgroup_below_min() in such a way that it no longer skips the no usage case, we will have to add code to explicitly skip it. With this patch applied, the test_memcg_low sub-test finishes successfully without failure in most cases. Though both test_memcg_low and test_memcg_min sub-tests may still fail occasionally if the memory.current values fall outside of the expected ranges. Link: https://lkml.kernel.org/r/20250502010443.106022-1-longman@redhat.com Link: https://lkml.kernel.org/r/20250502010443.106022-2-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Suggested-by: Michal Koutný <mkoutny@suse.com> Acked-by: Michal Koutný <mkoutny@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Shuah Khan <shuah@kernel.org> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01	mm, memcg: cg2 memory{.swap,}.peak write tests	David Finkel
	Extend two existing tests to cover extracting memory usage through the newly mutable memory.peak and memory.swap.peak handlers. In particular, make sure to exercise adding and removing watchers with overlapping lifetimes so the less-trivial logic gets tested. The new/updated tests attempt to detect a lack of the write handler by fstat'ing the memory.peak and memory.swap.peak files and skip the tests if that's the case. Additionally, skip if the file doesn't exist at all. [davidf@vimeo.com: update tests] Link: https://lkml.kernel.org/r/20240730231304.761942-3-davidf@vimeo.com Link: https://lkml.kernel.org/r/20240729143743.34236-3-davidf@vimeo.com Signed-off-by: David Finkel <davidf@vimeo.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Shuah Khan <shuah@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-20	Revert "selftests/cgroup: Drop define _GNU_SOURCE"	Shuah Khan
	This reverts commit c1457d9aad5ee2feafcf85aa9a58ab50500159d2. The framework change to add D_GNU_SOURCE to KHDR_INCLUDES to Makefile, lib.mk, and kselftest_harness.h is reverted as it is causing build failures and warnings. Revert this change as this change depends on the framework change. Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-05-13	selftests/cgroup: Drop define _GNU_SOURCE	Edward Liaw
	_GNU_SOURCE is provided by lib.mk, so it should be dropped to prevent redefinition warnings. Signed-off-by: Edward Liaw <edliaw@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-03	selftests/cgroup: fix clang warnings: uninitialized fd variable	John Hubbard
	First of all, in order to build with clang at all, one must first apply Valentin Obst's build fix for LLVM [1]. Once that is done, then when building with clang, via: make LLVM=1 -C tools/testing/selftests ...clang warns about fd being used uninitialized, in test_memcg_reclaim()'s error handling path. Fix this by initializing fd to -1. [1] https://lore.kernel.org/all/20240329-selftests-libmk-llvm-rfc-v1-1-2f9ed7d1c49f@valentinobst.de/ Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-04-03	selftests: cgroup: skip test_cgcore_lesser_ns_open when cgroup2 mounted ↵	Tianchen Ding
	without nsdelegate The test case test_cgcore_lesser_ns_open only tasks effect when cgroup2 is mounted with "nsdelegate" mount option. If it misses this option, or is remounted without "nsdelegate", the test case will fail. For example, running bpf/test_cgroup_storage first, and then run cgroup/test_core will fail on test_cgcore_lesser_ns_open. Skip it if "nsdelegate" is not detected in cgroup2 mount options. Fixes: bf35a7879f1d ("selftests: cgroup: Test open-time cgroup namespace usage for migration checks") Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-06-23	selftests: cgroup: fix unexpected failure on test_memcg_sock	Haifeng Xu
	Before server got a client connection, there were some memory allocations in the test memcg, such as user stack. So do not count those allocations which are not related to socket when checking socket memory accounting. Link: https://lkml.kernel.org/r/20230619124735.2124-1-haifeng.xu@shopee.com Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09	selftests: cgroup: fix unexpected failure on test_memcg_low	Haifeng Xu
	Since commit f079a020ba95 ("selftests: memcg: factor out common parts of memory.{low,min} tests"), the value used in second alloc_anon has changed from 148M to 170M. Because memory.low allows reclaiming page cache in child cgroups, so the memory.current is close to 30M instead of 50M. Therefore, adjust the expected value of parent cgroup. Link: https://lkml.kernel.org/r/20230522095233.4246-2-haifeng.xu@shopee.com Fixes: f079a020ba95 ("selftests: memcg: factor out common parts of memory.{low,min} tests") Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28	selftests: cgroup: Add 'malloc' failures checks in test_memcontrol	Ivan Orlov
	There are several 'malloc' calls in test_memcontrol, which can be unsuccessful. This patch will add 'malloc' failures checking to give more details about test's fail reasons and avoid possible undefined behavior during the future null dereference (like the one in alloc_anon_50M_check_swap function). Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-12-11	selftests: cgroup: make sure reclaim target memcg is unprotected	Yosry Ahmed
	Make sure that we ignore protection of a memcg that is the target of memcg reclaim. Link: https://lkml.kernel.org/r/20221202031512.1365483-4-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Chris Down <chris@chrisdown.name> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vasily Averin <vasily.averin@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11	selftests: cgroup: refactor proactive reclaim code to reclaim_until()	Yosry Ahmed
	Refactor the code that drives writing to memory.reclaim (retrying, error handling, etc) from test_memcg_reclaim() to a helper called reclaim_until(), which proactively reclaims from a memcg until its usage reaches a certain value. While we are at it, refactor and simplify the reclaim loop. This will be used in a following patch in another test. Link: https://lkml.kernel.org/r/20221202031512.1365483-3-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Suggested-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Chris Down <chris@chrisdown.name> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vasily Averin <vasily.averin@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27	selftests: memcg: factor out common parts of memory.{low,min} tests	Michal Koutný
	The memory protection test setup and runtime is almost equal for memory.low and memory.min cases. It makes modification of the common parts prone to mistakes, since the protections are similar not only in setup but also in principle, factor the common part out. Past exceptions between the tests: - missing memory.min is fine (kept), - test_memcg_low protected orphaned pagecache (adapted like test_memcg_min and we keep the processes of protected memory running). The evaluation in two tests is different (OOM of allocator vs low events of protégés), this is kept different. Link: https://lkml.kernel.org/r/20220518161859.21565-6-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> CC: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Richard Palethorpe <rpalethorpe@suse.de> Cc: David Vernet <void@manifault.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27	selftests: memcg: remove protection from top level memcg	Michal Koutný
	The reclaim is triggered by memory limit in a subtree, therefore the testcase does not need configured protection against external reclaim. Also, correct respective comments. Link: https://lkml.kernel.org/r/20220518161859.21565-5-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: David Vernet <void@manifault.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Richard Palethorpe <rpalethorpe@suse.de> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27	selftests: memcg: adjust expected reclaim values of protected cgroups	Michal Koutný
	The numbers are not easy to derive in a closed form (certainly mere protections ratios do not apply), therefore use a simulation to obtain expected numbers. Link: https://lkml.kernel.org/r/20220518161859.21565-4-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: David Vernet <void@manifault.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Richard Palethorpe <rpalethorpe@suse.de> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27	selftests: memcg: expect no low events in unprotected sibling	Michal Koutný
	This is effectively a revert of commit cdc69458a5f3 ("cgroup: account for memory_recursiveprot in test_memcg_low()"). The case test_memcg_low will fail with memory_recursiveprot until resolved in reclaim code. However, this patch preserves the existing helpers and variables for later uses. Link: https://lkml.kernel.org/r/20220518161859.21565-3-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: David Vernet <void@manifault.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Richard Palethorpe <rpalethorpe@suse.de> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27	selftests: memcg: fix compilation	Michal Koutný
	Patch series "memcontrol selftests fixups", v2. Flushing the patches to make memcontrol selftests check the events behavior we had consensus about (test_memcg_low fails). (test_memcg_reclaim, test_memcg_swap_max fail for me now but it's present even before the refactoring.) The two bigger changes are: - adjustment of the protected values to make tests succeed with the given tolerance, - both test_memcg_low and test_memcg_min check protection of memory in populated cgroups (actually as per Documentation/admin-guide/cgroup-v2.rst memory.min should not apply to empty cgroups, which is not the case currently. Therefore I unified tests with the populated case in order to to bring more broken tests). This patch (of 5): This fixes mis-applied changes from commit 72b1e03aa725 ("cgroup: account for memory_localevents in test_memcg_oom_group_leaf_events()"). Link: https://lkml.kernel.org/r/20220518161859.21565-1-mkoutny@suse.com Link: https://lkml.kernel.org/r/20220518161859.21565-2-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: David Vernet <void@manifault.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Richard Palethorpe <rpalethorpe@suse.de> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25	cgroup: fix an error handling path in alloc_pagecache_max_30M()	Christophe JAILLET
	If the first goto is taken, 'fd' is not opened yet (and is un-initialized). So a direct return is safer. Link: https://lkml.kernel.org/r/628312312eb40e0e39463a2c06415fde5295c716.1653229120.git.christophe.jaillet@wanadoo.fr Fixes: c1a31a2f7a9c ("cgroup: fix racy check in alloc_pagecache_max_30M() helper function") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Shuah Khan <shuah@kernel.org> Cc: David Vernet <void@manifault.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13	cgroup: fix racy check in alloc_pagecache_max_30M() helper function	David Vernet
	alloc_pagecache_max_30M() in the cgroup memcg tests performs a 50MB pagecache allocation, which it expects to be capped at 30MB due to the calling process having a memory.high setting of 30MB. After the allocation, the function contains a check that verifies that MB(29) < memory.current <= MB(30). This check can actually fail non-deterministically. The testcases that use this function are test_memcg_high() and test_memcg_max(), which set memory.min and memory.max to 30MB respectively for the cgroup under test. The allocation can slightly exceed this number in both cases, and for memory.max, the process performing the allocation will not have the OOM killer invoked as it's performing a pagecache allocation. This patchset therefore updates the above check to instead use the verify_close() helper function. Link: https://lkml.kernel.org/r/20220423155619.3669555-6-void@manifault.com Signed-off-by: David Vernet <void@manifault.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13	cgroup: remove racy check in test_memcg_sock()	David Vernet
	test_memcg_sock() in the cgroup memcg tests, verifies expected memory accounting for sockets. The test forks a process which functions as a TCP server, and sends large buffers back and forth between itself (as the TCP client) and the forked TCP server. While doing so, it verifies that memory.current and memory.stat.sock look correct. There is currently a check in tcp_client() which asserts memory.current >= memory.stat.sock. This check is racy, as between memory.current and memory.stat.sock being queried, a packet could come in which causes mem_cgroup_charge_skmem() to be invoked. This could cause memory.stat.sock to exceed memory.current. Reversing the order of querying doesn't address the problem either, as memory may be reclaimed between the two calls. Instead, this patch just removes that assertion altogether, and instead relies on the values_close() check that follows to validate the expected accounting. Link: https://lkml.kernel.org/r/20220423155619.3669555-5-void@manifault.com Signed-off-by: David Vernet <void@manifault.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13	cgroup: account for memory_localevents in test_memcg_oom_group_leaf_events()	David Vernet
	The test_memcg_oom_group_leaf_events() testcase in the cgroup memcg tests validates that processes in a group that perform allocations exceeding memory.oom.group are killed. It also validates that the memory.events.oom_kill events are properly propagated in this case. Commit 06e11c907ea4 ("kselftests: memcg: update the oom group leaf events test") fixed test_memcg_oom_group_leaf_events() to account for the fact that the memory.events.oom_kill events in a child cgroup is propagated up to its parent. This behavior can actually be configured by the memory_localevents mount option, so this patch updates the testcase to properly account for the possible presence of this mount option. Link: https://lkml.kernel.org/r/20220423155619.3669555-4-void@manifault.com Signed-off-by: David Vernet <void@manifault.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13	cgroup: account for memory_recursiveprot in test_memcg_low()	David Vernet
	The test_memcg_low() testcase in test_memcontrol.c verifies the expected behavior of groups using the memory.low knob. Part of the testcase verifies that a group with memory.low that experiences reclaim due to memory pressure elsewhere in the system, observes memory.events.low events as a result of that reclaim. In commit 8a931f801340 ("mm: memcontrol: recursive memory.low protection"), the memory controller was updated to propagate memory.low and memory.min protection from a parent group to its children via a configurable memory_recursiveprot mount option. This unfortunately broke the memcg tests, which asserts that a sibling that experienced reclaim but had a memory.low value of 0, would not observe any memory.low events. This patch updates test_memcg_low() to account for the new behavior introduced by memory_recursiveprot. So as to make the test resilient to multiple configurations, the patch also adds a new proc_mount_contains() helper that checks for a string in /proc/mounts, and is used to toggle behavior based on whether the default memory_recursiveprot was present. Link: https://lkml.kernel.org/r/20220423155619.3669555-3-void@manifault.com Signed-off-by: David Vernet <void@manifault.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13	cgroups: refactor children cgroups in memcg tests	David Vernet
	Patch series "Fix bugs in memcontroller cgroup tests", v2. tools/testing/selftests/cgroup/test_memcontrol.c contains a set of testcases which validate expected behavior of the cgroup memory controller. Roman Gushchin recently sent out a patchset that fixed a few issues in the test. This patchset continues that effort by fixing a few more issues that were causing non-deterministic failures in the suite. With this patchset, I'm unable to reproduce any more errors after running the tests in a continuous loop for many iterations. Before, I was able to reproduce at least one of the errors fixed in this patchset with just one or two runs. This patch (of 5): In test_memcg_min() and test_memcg_low(), there is an array of four sibling cgroups. All but one of these sibling groups does a 50MB allocation, and the group that does no allocation is the third of four in the array. This is not a problem per se, but makes it a bit tricky to do some assertions in test_memcg_low(), as we want to make assertions on the siblings based on whether or not they performed allocations. Having a static index before which all groups have performed an allocation makes this cleaner. This patch therefore reorders the sibling groups so that the group that performs no allocations is the last in the array. A follow-on patch will leverage this to fix a bug in the test that incorrectly asserts that a sibling group that had performed an allocation, but only had protection from its parent, will not observe any memory.events.low events during reclaim. Link: https://lkml.kernel.org/r/20220423155619.3669555-1-void@manifault.com Link: https://lkml.kernel.org/r/20220423155619.3669555-2-void@manifault.com Signed-off-by: David Vernet <void@manifault.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29	selftests: cgroup: add a selftest for memory.reclaim	Yosry Ahmed
	Add a new test for memory.reclaim that verifies that the interface correctly reclaims memory as intended, from both anon and file pages. Link: https://lkml.kernel.org/r/20220425190040.2475377-5-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: David Rientjes <rientjes@google.com> Cc: Chen Wandun <chenwandun@huawei.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Wei Xu <weixugc@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29	selftests: cgroup: fix alloc_anon_noexit() instantly freeing memory	Yosry Ahmed
	Currently, alloc_anon_noexit() calls alloc_anon() which instantly frees the allocated memory. alloc_anon_noexit() is usually used with cg_run_nowait() to run a process in the background that allocates memory. It makes sense for the background process to keep the memory allocated and not instantly free it (otherwise there is no point of running it in the background). Link: https://lkml.kernel.org/r/20220425190040.2475377-4-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Chen Wandun <chenwandun@huawei.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Wei Xu <weixugc@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28	kselftests: memcg: speed up the memory.high test	Roman Gushchin
	After commit 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high") allocating memory over memory.high became very time consuming. But it's exactly what the memory.high test from cgroup kselftests is doing: it tries to allocate 100M with 30M memory.high value. It takes forever to complete. In order to keep it passing (or failing) in a reasonable amount of time let's try to allocate only a little over 30M: 31M to be precise. With this change test_memcontrol finishes in a reasonable amount of time: $ time ./test_memcontrol ok 1 test_memcg_subtree_control ok 2 test_memcg_current ok 3 test_memcg_min ok 4 test_memcg_low ok 5 test_memcg_high ok 6 test_memcg_max ok 7 test_memcg_oom_events ok 8 test_memcg_swap_max ok 9 test_memcg_sock ok 10 test_memcg_oom_group_leaf_events ok 11 test_memcg_oom_group_parent_events ok 12 test_memcg_oom_group_score_events real 0m2.273s user 0m0.064s sys 0m0.739s Link: https://lkml.kernel.org/r/20220415000133.3955987-3-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: David Vernet <void@manifault.com> Cc: Chris Down <chris@chrisdown.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28	kselftests: memcg: update the oom group leaf events test	Roman Gushchin
	Patch series "mm: memcg kselftests fixes". This patch (of 4): Commit 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events") made memory.events recursive: all events are propagated upwards by the tree. It was a change in semantics. It broke the oom group leaf events test: it assumes that after an OOM the oom_kill counter is zero on parent's level. Let's adjust the test: it should have similar expectations for the child and parent levels. The test passes after this fix. Link: https://lkml.kernel.org/r/20220415000133.3955987-2-roman.gushchin@linux.dev Link: https://lkml.kernel.org/r/20220415000133.3955987-1-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: David Vernet <void@manifault.com> Cc: Chris Down <chris@chrisdown.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-03-22	selftests: memcg: test high limit for single entry allocation	Shakeel Butt
	Test the enforcement of memory.high limit for large amount of memory allocation within a single kernel entry. There are valid use-cases where the application can trigger large amount of memory allocation within a single syscall e.g. mlock() or mmap(MAP_POPULATE). Make sure memory.high limit enforcement works for such use-cases. Link: https://lkml.kernel.org/r/20220211064917.2028469-4-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Chris Down <chris@chrisdown.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-30	kselftest/cgroup: fix unexpected testing failure on test_memcontrol	Alex Shi
	The cgroup testing relies on the root cgroup's subtree_control setting, If the 'memory' controller isn't set, all test cases will be failed as following: $ sudo ./test_memcontrol not ok 1 test_memcg_subtree_control not ok 2 test_memcg_current ok 3 # skip test_memcg_min not ok 4 test_memcg_low not ok 5 test_memcg_high not ok 6 test_memcg_max not ok 7 test_memcg_oom_events ok 8 # skip test_memcg_swap_max not ok 9 test_memcg_sock not ok 10 test_memcg_oom_group_leaf_events not ok 11 test_memcg_oom_group_parent_events not ok 12 test_memcg_oom_group_score_events To correct this unexpected failure, this patch write the 'memory' to subtree_control of root to get a right result. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Jay Kamat <jgkamat@fb.com> Cc: linux-kselftest@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-04-08	selftests: cgroup: fix cleanup path in test_memcg_subtree_control()	Roman Gushchin
	Dan reported, that cleanup path in test_memcg_subtree_control() triggers a static checker warning: ./tools/testing/selftests/cgroup/test_memcontrol.c:76 \ test_memcg_subtree_control() error: uninitialized symbol 'child2'. Fix this by initializing child2 and parent2 variables and split the cleanup path into few stages. Signed-off-by: Roman Gushchin <guro@fb.com> Fixes: 84092dbcf901 ("selftests: cgroup: add memory controller self-tests") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Shuah Khan <shuah@kernel.org>
2018-09-07	Add tests for memory.oom.group	Jay Kamat
	Add tests for memory.oom.group for the following cases: - Killing all processes in a leaf cgroup, but leaving the parent untouched - Killing all processes in a parent and leaf cgroup - Keeping processes marked by OOM_SCORE_ADJ_MIN alive when considered for being killed by the group oom killer. Signed-off-by: Jay Kamat <jgkamat@fb.com> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
2018-05-30	selftests: cgroup/memcontrol: add basic test for socket accounting	Mike Rapoport
	The test verifies that when there is active TCP connection, the memory.stat.sock and memory.current values are close. Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
2018-05-30	selftests: cgroup/memcontrol: add basic test for swap controls	Mike Rapoport
	The new test verifies that memory.swap.max and memory.swap.current behave as expected for simple allocation scenarios Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
2018-05-30	selftests: cgroup: add memory controller self-tests	Roman Gushchin
	Cgroups are used for controlling the physical resource distribution (memory, CPU, io, etc) and often are used as basic building blocks for large distributed computing systems. Even small differences in the actual behavior may lead to significant incidents. The codebase is under the active development, which will unlikely stop at any time soon. Also it's scattered over different kernel subsystems, which makes regressions more probable. Given that, the lack of any tests is crying. This patch implements some basic tests for the memory controller, as well as a minimal required framework. It doesn't pretend for a very good coverage, but pretends to be a starting point. Hopefully, any following significant changes will include corresponding tests. Tests for CPU and io controllers, as well as cgroup core are next in the todo list. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: kernel-team@fb.com Cc: linux-kselftest@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>