linux/linux-stable.git - Linux kernel stable tree

Age	Commit message	Author
2024-09-03	sched_ext: Don't call put_prev_task_scx() before picking the next task	Tejun Heo
	fd03c5b85855 ("sched: Rework pick_next_task()") changed the definition of pick_next_task() from: pick_next_task() := pick_task() + set_next_task(.first = true) to: pick_next_task(prev) := pick_task() + put_prev_task() + set_next_task(.first = true) making invoking put_prev_task() pick_next_task()'s responsibility. This reordering allows pick_task() to be shared between regular and core-sched paths and put_prev_task() to know the next task. sched_ext depended on put_prev_task_scx() enqueueing the current task before pick_next_task_scx() is called. While pulling sched/core changes, 70cc76aa0d80 ("Merge branch 'tip/sched/core' into for-6.12") added an explicit put_prev_task_scx() call for SCX tasks in pick_next_task_scx() before picking the first task as a workaround. Clean it up and adopt the conventions that other sched classes are following. The operation of keeping running the current task was spread and required the task to be put on the local DSQ before picking: - balance_one() used SCX_TASK_BAL_KEEP to indicate that the task is still runnable, hasn't exhausted its slice, and thus should keep running. - put_prev_task_scx() enqueued the task to local DSQ if SCX_TASK_BAL_KEEP is set. It also called do_enqueue_task() with SCX_ENQ_LAST if it is the only runnable task. do_enqueue_task() in turn decided whether to use the local DSQ depending on SCX_OPS_ENQ_LAST. Consolidate the logic in balance_one() as it always knows whether it is going to keep the current task. balance_one() now considers all conditions where the current task should be kept and uses SCX_TASK_BAL_KEEP to tell pick_next_task_scx() to keep the current task instead of picking one from the local DSQ. Accordingly, SCX_ENQ_LAST handling is removed from put_prev_task_scx() and do_enqueue_task() and pick_next_task_scx() is updated to pick the current task if SCX_TASK_BAL_KEEP is set. The workaround put_prev_task[_scx]() calls are replaced with put_prev_set_next_task(). 
This causes two behavior changes observable from the BPF scheduler: - When a task keeps running, it no longer goes through an enqueue/dequeue cycle and thus ops.stopping/running() transitions. The new behavior is better and all the existing schedulers should be able to handle the new behavior. - The BPF scheduler cannot keep executing the current task by enqueueing SCX_ENQ_LAST task to the local DSQ. If SCX_OPS_ENQ_LAST is specified, the BPF scheduler is responsible for resuming execution after each SCX_ENQ_LAST. SCX_OPS_ENQ_LAST is mostly useful for cases where scheduling decisions are not made on the local CPU - e.g. central or userspace-driven scheduling - and the new behavior is more logical and shouldn't pose any problems. SCX_OPS_ENQ_LAST demonstration from scx_qmap is dropped as it doesn't fit that well anymore and the last task handling is moved to the end of qmap_dispatch(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David Vernet <void@manifault.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Changwoo Min <multics69@gmail.com> Cc: Daniel Hodges <hodges.daniel.scott@gmail.com> Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
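The composition described in the sched_ext commit above, pick_next_task(prev) := pick_task() + put_prev_task() + set_next_task(), can be sketched as a small user-space toy. This is not kernel code: every `toy_` name, the tiny FIFO standing in for the local DSQ, and the keep condition (runnable with slice left, or the only runnable task) are simplifying assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BAL_KEEP 0x1u            /* hypothetical stand-in for SCX_TASK_BAL_KEEP */
#define TOY_DSQ_CAP  8

struct toy_task {
	int id;
	bool runnable;
	int slice;                   /* remaining time slice */
	unsigned int flags;
	int put_count;               /* times put_prev ran on this task */
	int set_count;               /* times set_next ran on this task */
};

struct toy_rq {
	struct toy_task *curr;
	struct toy_task *dsq[TOY_DSQ_CAP];   /* local DSQ modelled as a small FIFO */
	size_t nr;
};

static void toy_dsq_push(struct toy_rq *rq, struct toy_task *p)
{
	assert(rq->nr < TOY_DSQ_CAP);
	rq->dsq[rq->nr++] = p;
}

static struct toy_task *toy_dsq_pop(struct toy_rq *rq)
{
	struct toy_task *p;
	size_t i;

	if (rq->nr == 0)
		return NULL;
	p = rq->dsq[0];
	for (i = 1; i < rq->nr; i++)
		rq->dsq[i - 1] = rq->dsq[i];
	rq->nr--;
	return p;
}

/* The balance step alone decides whether the current task keeps running. */
static void toy_balance_one(struct toy_rq *rq)
{
	struct toy_task *prev = rq->curr;

	if (!prev)
		return;
	prev->flags &= ~TOY_BAL_KEEP;
	if (prev->runnable && (prev->slice > 0 || rq->nr == 0))
		prev->flags |= TOY_BAL_KEEP;
}

/* Picking honours the keep flag instead of pulling from the local DSQ. */
static struct toy_task *toy_pick_task(struct toy_rq *rq)
{
	if (rq->curr && (rq->curr->flags & TOY_BAL_KEEP))
		return rq->curr;
	return toy_dsq_pop(rq);
}

/* put_prev + set_next happen after the pick, and only on a real switch. */
static void toy_put_prev_set_next(struct toy_rq *rq, struct toy_task *prev,
				  struct toy_task *next)
{
	if (next == prev)
		return;
	if (prev) {
		prev->put_count++;
		if (prev->runnable)
			toy_dsq_push(rq, prev);
	}
	if (next)
		next->set_count++;
	rq->curr = next;
}

static struct toy_task *toy_pick_next_task(struct toy_rq *rq)
{
	struct toy_task *prev = rq->curr;
	struct toy_task *next;

	toy_balance_one(rq);
	next = toy_pick_task(rq);
	toy_put_prev_set_next(rq, prev, next);
	return next;
}
```

In this toy, a kept task sees neither a put nor a set, which mirrors the first behavior change listed in the commit message: a task that keeps running no longer goes through an enqueue/dequeue cycle.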
2024-09-03	selftests/damon: add execute permissions to test scripts	SeongJae Park
	Some test scripts are missing executable permissions. It causes warnings that make the test output unnecessarily verbose. Add executable permissions. Link: https://lkml.kernel.org/r/20240827030336.7930-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03	selftests/damon: cleanup __pycache__/ with 'make clean'	SeongJae Park
	Python-based tests create a __pycache__/ directory. Remove it with 'make clean' by defining it as EXTRA_CLEAN. Link: https://lkml.kernel.org/r/20240827030336.7930-3-sj@kernel.org Fixes: b5906f5f7359 ("selftests/damon: add a test for update_schemes_tried_regions sysfs command") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03	selftests/damon: add access_memory_even to .gitignore	SeongJae Park
	Patch series "misc fixups for DAMON {self,kunit} tests". This patchset is for minor fixups of DAMON selftests and kunit tests. First three patches make DAMON selftests more cleanly maintained (patches 1 and 2) without unnecessary warnings (patch 3). Following six patches remove an unnecessary test case (patch 4), handle config combinations that can make tests fail (patches 5-7), reorganize the test files following the new guideline (patch 8), and add reference kunitconfig for DAMON kunit tests (patch 9). This patch (of 9): DAMON selftests build access_memory_even, but it's not on the .gitignore list. Add it to make 'git status' output cleaner. Link: https://lkml.kernel.org/r/20240827030336.7930-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240827030336.7930-2-sj@kernel.org Fixes: c94df805c774 ("selftests/damon: implement a program for even-numbered memory regions access") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03	mm: rework vm_ops->close() handling on VMA merge	Lorenzo Stoakes
	In commit 714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in mergeability test") we relaxed the VMA merge rules for VMAs possessing a vm_ops->close() hook, permitting this operation in instances where we wouldn't delete the VMA as part of the merge operation. This was later corrected in commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case 7 with vma_ops->close") to account for a subtle case that the previous commit had not taken into account. In both instances, we first rely on is_mergeable_vma() to determine whether we might be dealing with a VMA that might be removed, taking advantage of the fact that a 'previous' VMA will never be deleted, only VMAs that follow it. The second patch corrects the instance where a merge of the previous VMA into a subsequent one did not correctly check whether the subsequent VMA had a vm_ops->close() handler. Both changes prevent merge cases that are actually permissible (for instance a merge of a VMA into a following VMA with a vm_ops->close(), but with no previous VMA, which would result in the next VMA being extended, not deleted). In addition, both changes fail to consider the case where a VMA that would otherwise be merged with the previous and next VMA might have vm_ops->close(), on the assumption that for this to be the case, all three would have to have the same vma->vm_file to be mergeable and thus the same vm_ops. And in addition both changes operate at 50,000 feet, trying to guess whether a VMA will be deleted. As we have majorly refactored the VMA merge operation and de-duplicated code to the point where we know precisely where deletions will occur, this patch removes the aforementioned checks altogether and instead explicitly checks whether a VMA will be deleted. In cases where a reduced merge is still possible (where we merge both previous and next VMA but the next VMA has a vm_ops->close hook, meaning we could just merge the previous and current VMA), we do so, otherwise the merge is not permitted. 
We take advantage of our userland testing to assert that this functions correctly - replacing the previous limited vm_ops->close() tests with tests for every single case where we delete a VMA. We also update all testing for both new and modified VMAs to set vma->vm_ops->close() in every single instance where this would not prevent the merge, to assert that we never do so. Link: https://lkml.kernel.org/r/9f96b8cfeef3d14afabddac3d6144afdfbef2e22.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
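The rule described in the vm_ops->close() commit above (only refuse a merge when a VMA with a close hook would actually be deleted, and fall back to a smaller merge where one exists) can be sketched as a decision function. This is a toy model under stated assumptions: the `toy_` names, the boolean inputs, and the enum are hypothetical and do not reproduce the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_merge { TOY_MERGE_NONE, TOY_MERGE_LEFT, TOY_MERGE_RIGHT, TOY_MERGE_BOTH };

struct toy_vma {
	bool has_close;              /* models a non-NULL vm_ops->close hook */
};

/* A VMA with a close hook must never be deleted by a merge. */
static bool toy_can_remove(const struct toy_vma *vma)
{
	return !vma->has_close;
}

/*
 * left_side/right_side: the modified range starts/ends exactly at the
 * boundaries of the existing VMA. prev_ok/next_ok: the neighbour is
 * adjacent and otherwise compatible.
 */
static enum toy_merge toy_decide_merge(bool left_side, bool right_side,
				       bool prev_ok, bool next_ok,
				       const struct toy_vma *vma,
				       const struct toy_vma *next)
{
	bool merge_left = left_side && prev_ok;
	bool merge_right = right_side && next_ok;
	bool merge_both = merge_left && merge_right;
	/* The VMA itself disappears only if the range covers all of it. */
	bool will_delete_vma = left_side && right_side;

	assert(vma != NULL);
	assert(!next_ok || next != NULL);

	if (!merge_left && !merge_right)
		return TOY_MERGE_NONE;
	if (will_delete_vma && !toy_can_remove(vma))
		return TOY_MERGE_NONE;
	/* Merging both sides would delete next; reduce to a left merge. */
	if (merge_both && !toy_can_remove(next)) {
		merge_both = false;
		merge_right = false;
	}
	if (merge_both)
		return TOY_MERGE_BOTH;
	return merge_left ? TOY_MERGE_LEFT : TOY_MERGE_RIGHT;
}
```

Note that a right-only merge into a next VMA that has a close hook is still allowed here, since next is extended rather than deleted, which is the permissible case the commit message calls out.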
2024-09-03	mm: refactor vma_merge() into modify-only vma_merge_existing_range()	Lorenzo Stoakes
	The existing vma_merge() function is no longer required to handle what were previously referred to as cases 1-3 (i.e. the merging of a new VMA), as this is now handled by vma_merge_new_vma(). Additionally, simplify the convoluted control flow of the original, maintaining identical logic only expressed more clearly and doing away with a complicated set of cases, rather logically examining each possible outcome - merging of both the previous and subsequent VMA, merging of the previous VMA and merging of the subsequent VMA alone. We now utilise the previously implemented commit_merge() function to share logic with vma_expand() de-duplicating code and providing less surface area for bugs and confusion. In order to do so, we adjust this function to accept parameters specific to merging existing ranges. Link: https://lkml.kernel.org/r/2cf6016b7bfcc4965fc3cde10827560c42e4f12c.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03	mm: avoid using vma_merge() for new VMAs	Lorenzo Stoakes
	Abstract vma_merge_new_vma() to use vma_merge_struct and rename the resultant function vma_merge_new_range() to be clear what the purpose of this function is - a new VMA is desired in the specified range, and we wish to see if it is possible to 'merge' surrounding VMAs into this range rather than having to allocate a new VMA. Note that this function uses vma_expand() exclusively, so adopts its requirement that the iterator point at or before the gap. We add an assert to this effect. This is as opposed to vma_merge_existing_range(), which will be introduced in a subsequent commit, and provide the same functionality for cases in which we are modifying an existing VMA. In mmap_region() and do_brk_flags() we open code scenarios where we prefer to use vma_expand() rather than invoke a full vma_merge() operation. Abstract this logic and eliminate all of the open-coding, and also use the same logic for all cases where we add new VMAs, so that these use vma_expand() rather than ultimately using vma_merge(). Doing so removes duplication and simplifies VMA merging in all such cases, laying the ground for us to eliminate the merging of new VMAs in vma_merge() altogether. Also add the ability for the vmg to track state and to report errors, allowing us to differentiate a failed merge from an inability to allocate memory in callers. This makes it far easier to understand what is happening in these cases, avoiding confusion and bugs and allowing for future optimisation. Also introduce vma_iter_next_rewind() to allow for retrieval of the next, and (optionally) the prev VMA, rewinding to the start of the previous gap. Introduce are_anon_vmas_compatible() to abstract individual VMA anon_vma comparison for the case of merging on both sides where the anon_vma of the VMA being merged may be compatible with prev and next, but prev and next's anon_vmas may not be compatible with each other. 
Finally also introduce can_vma_merge_left() / can_vma_merge_right() to check adjacent VMA compatibility and that they are indeed adjacent. Link: https://lkml.kernel.org/r/49d37c0769b6b9dc03b27fe4d059173832556392.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Tested-by: Mark Brown <broonie@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
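The anon_vma point in the commit above (the range being merged can be compatible with prev and with next individually while prev and next are not compatible with each other) is easy to miss, so here is a minimal sketch. It assumes a simplified rule in which an unset anon_vma is compatible with anything and two set ones must be identical; the `toy_` names are hypothetical and the real kernel check considers more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_anon {
	int id;
};

/* Simplified pairwise rule: NULL matches anything, otherwise must be equal. */
static bool toy_anon_compatible(const struct toy_anon *a,
				const struct toy_anon *b)
{
	return !a || !b || a == b;
}

/*
 * Merging on both sides joins prev, the middle range and next into one
 * VMA, so pairwise compatibility with the middle is not sufficient:
 * prev and next must also be compatible with each other.
 */
static bool toy_can_merge_both(const struct toy_anon *prev,
			       const struct toy_anon *mid,
			       const struct toy_anon *next)
{
	return toy_anon_compatible(mid, prev) &&
	       toy_anon_compatible(mid, next) &&
	       toy_anon_compatible(prev, next);
}
```

With a middle range that has no anon_vma yet, both neighbours pass the pairwise check, but a three-way merge is only allowed in this sketch when the neighbours also agree with each other.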
2024-09-03	mm: abstract vma_expand() to use vma_merge_struct	Lorenzo Stoakes
	The purpose of the vmg is to thread merge state through functions and avoid egregious parameter lists. We expand this to vma_expand(), which is used for a number of merge cases. Accordingly, adjust its callers, mmap_region() and relocate_vma_down(), to use a vmg. An added purpose of this change is the ability in a future commit to perform all new VMA range merging using vma_expand(). Link: https://lkml.kernel.org/r/4bc8c9dbc9ca52452ef8e587b28fe555854ceb38.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03	mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()	Lorenzo Stoakes
	Rather than passing around huge numbers of parameters to numerous helper functions, abstract them into a single struct that we thread through the operation, the vma_merge_struct ('vmg'). Adjust vma_merge() and vma_modify() to accept this parameter, as well as predicate functions can_vma_merge_before(), can_vma_merge_after(), and the vma_modify_...() helper functions. Also introduce VMG_STATE() and VMG_VMA_STATE() helper macros to allow for easy vmg declaration. We additionally remove the requirement that vma_merge() is passed a VMA object representing the candidate new VMA. Previously it used this to obtain the mm_struct, file and anon_vma properties of the proposed range (a rather confusing state of affairs), which are now provided by the vmg directly. We also remove the pgoff calculation previously performed in vma_modify(), and instead calculate this in VMG_VMA_STATE() via the vma_pgoff_offset() helper. Link: https://lkml.kernel.org/r/a955aad09d81329f6fbeb636b2dd10cde7b73dab.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
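The idea of threading one state struct through the merge helpers, with the page offset computed once in a declaration macro, might look roughly like the following. This is a sketch only: the field set, the `toy_` names, the macro and the 4 KiB page size are assumptions chosen for illustration and are far smaller than the kernel's vma_merge_struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12            /* assumes 4 KiB pages */

struct toy_vma {
	unsigned long start, end;    /* covers [start, end) */
	unsigned long pgoff;         /* file offset of start, in pages */
	unsigned long flags;
};

/* One struct carries the merge state instead of a long parameter list. */
struct toy_vmg {
	struct toy_vma *prev, *next;
	unsigned long start, end;
	unsigned long pgoff;
	unsigned long flags;
};

/* Page offset of addr, derived from the VMA that contains it. */
static unsigned long toy_pgoff_offset(const struct toy_vma *vma,
				      unsigned long addr)
{
	return vma->pgoff + ((addr - vma->start) >> TOY_PAGE_SHIFT);
}

/* Declare a vmg for a sub-range of an existing VMA. */
#define TOY_VMG_VMA_STATE(name, vma_, start_, end_)		\
	struct toy_vmg name = {					\
		.start = (start_),				\
		.end = (end_),					\
		.pgoff = toy_pgoff_offset((vma_), (start_)),	\
		.flags = (vma_)->flags,				\
	}

/* Predicate takes only the vmg: can the range be appended to prev? */
static bool toy_can_merge_after(const struct toy_vmg *vmg)
{
	const struct toy_vma *prev = vmg->prev;
	unsigned long prev_pages;

	if (!prev || prev->end != vmg->start || prev->flags != vmg->flags)
		return false;
	prev_pages = (prev->end - prev->start) >> TOY_PAGE_SHIFT;
	return prev->pgoff + prev_pages == vmg->pgoff;
}
```

A caller would declare the state with `TOY_VMG_VMA_STATE(vmg, &vma, start, end);`, fill in `vmg.prev` and `vmg.next`, and then pass `&vmg` to each predicate, so adding a new piece of merge state later means adding a field rather than touching every signature.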
2024-09-03	tools: add VMA merge tests	Lorenzo Stoakes
	Add a variety of VMA merge unit tests to assert that the behaviour of VMA merge is correct at an abstract level and VMAs are merged or not merged as expected. These are intentionally added _before_ we start refactoring vma_merge() in order that we can continually assert correctness throughout the rest of the series. In order to reduce churn going forward, we backport the vma_merge_struct data type to the test code which we introduce and use in a future commit, and add wrappers around the merge new and existing VMA cases. Link: https://lkml.kernel.org/r/1c7a0b43cfad2c511a6b1b52f3507696478ff51a.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03	tools: improve vma test Makefile	Lorenzo Stoakes
	Patch series "mm: remove vma_merge()", v3. The infamous vma_merge() function has been the cause of a great deal of pain, bugs and confusion for a very long time. It is subtle, contains many corner cases, tries to do far too much and is as a result very fragile. The fact that the function requires there to be a numbering system to cover each possible eventuality with references to each in the many branches of its implementation as to which case you are looking at speaks to all this. Some of this complexity is inherent - unfortunately there is no getting away from the need to figure out precisely how to execute the merge, whether we need to remove VMAs, whether it is safe to do so, what constitutes a mergeable VMA and so on. However, a lot of the complexity is not inherent but instead a product of the function's 'organic' development. Liam has gone to great lengths to improve the situation as a part of his maple tree implementation, greatly improving the readability of the code, and Vlastimil and myself have additionally gone to lengths to try to improve things further. However, with the availability of userland VMA testing, it now becomes possible to perform a rather more significant refactoring while maintaining confidence in its correct operation. An attempt was previously made by Vlastimil [0] to eliminate vma_merge(), however it was rather - brutal - and an astute reader might refer to the date of that patch for insight as to its intent. This series instead divides merge operations into two natural kinds - merges which occur when a NEW vma is being added to the address space, and merges which occur when a vma is being MODIFIED. Happily, the vma_expand() function introduced by Liam, which has the capacity for also deleting a subsequent VMA, covers each of the NEW vma cases. 
By abstracting the actual final commit of changes to a VMA to its own function, commit_merge() and writing a wrapper around vma_expand() for new VMA cases vma_merge_new_range(), we can avoid having to use vma_merge() for these instances altogether. By doing so we are also able to then de-duplicate all existing merge logic in mmap_region() and do_brk_flags() and have everything invoke this new function, so we universally take the same approach to merging new VMAs. Having done so, we can then completely rework vma_merge() into vma_merge_existing_range() and use this for the instances where a merge is proposed for a region of an existing VMA. This eliminates vma_merge() and its numbered cases and instead divides things into logical cases - merge both, merge left, merge right (the latter 2 being either partial or full merges). The code is heavily annotated with ASCII diagrams and greatly simplified in comparison to the existing vma_merge() function. Having made this change, we take the opportunity to address an issue with merging VMAs possessing a vm_ops->close() hook - commit 714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in mergeability test") and commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case 7 with vma_ops->close") make efforts to relax how we handle these, making assumptions about which VMAs might end up deleted (and thus, if possessing a vm_ops->close() hook, cannot be). This refactor means we do not need to guess, so instead explicitly only disallow merge in instances where a VMA with a vm_ops->close() hook would be deleted (and try a smaller merge in cases where this is possible). In addition to these changes, we introduce a new vma_merge_struct abstraction to allow VMA merge state to be threaded through the operation neatly. There is heavy unit testing provided for all merge functionality, added prior to the refactoring, allowing for before/after testing. 
The vm_ops->close() change also introduces exhaustive testing to demonstrate that this functions as expected, and in addition to this the reproduction code from commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case 7 with vma_ops->close") was tested and confirmed passing. [0]:https://lore.kernel.org/linux-mm/20240401192623.18575-2-vbabka@suse.cz/ This patch (of 10): Have vma.o depend on its source dependencies explicitly, as previously these were simply being ignored as existing object files were up to date. This now correctly re-triggers the build if mm/ source is changed as well as local source code. Also set clean as a phony rule. Link: https://lkml.kernel.org/r/cover.1725040657.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/e3ea58f08364ae5432c9a074de0195a7c7e0b04a.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03	selftests: test_zswap: add test for hierarchical zswap.writeback	Mike Yuan
	Ensure that zswap.writeback check goes up the cgroup tree, i.e. is hierarchical. Create a subcgroup which has zswap.writeback set to 1, and the upper hierarchy's restrictions shall apply. Link: https://lkml.kernel.org/r/20240823162506.12117-2-me@yhndnzj.com Signed-off-by: Mike Yuan <me@yhndnzj.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
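What "hierarchical" means for the check exercised by the test above can be sketched as a walk from a cgroup up to the root: writeback is effectively enabled only if no ancestor disables it. The struct and function below are hypothetical user-space stand-ins, not the memcg implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_cgroup {
	const struct toy_cgroup *parent;   /* NULL for the root */
	bool zswap_writeback;              /* this level's own setting */
};

/* Enabled only if this cgroup and every ancestor allow writeback. */
static bool toy_zswap_writeback_enabled(const struct toy_cgroup *cg)
{
	for (; cg; cg = cg->parent)
		if (!cg->zswap_writeback)
			return false;
	return true;
}
```

This matches the scenario in the commit message: a subcgroup with the setting at 1 under a parent that restricts writeback is still treated as restricted.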
2024-09-03	selftests/mm: fix charge_reserved_hugetlb.sh test	David Hildenbrand
	Currently, running the charge_reserved_hugetlb.sh selftest we can sometimes observe something like: $ ./charge_reserved_hugetlb.sh -cgroup-v2 ... write_result is 0 After write: hugetlb_usage=0 reserved_usage=10485760 killing write_to_hugetlbfs Received 2. Deleting the memory Detach failure: Invalid argument umount: /mnt/huge: target is busy. Both cases are issues in the test. While the unmount error seems to be racy, it will make the test fail: $ ./run_vmtests.sh -t hugetlb ... # [FAIL] not ok 10 charge_reserved_hugetlb.sh -cgroup-v2 # exit=32 The issue is that we are not waiting for the write_to_hugetlbfs process to quit. So it might still have a hugetlbfs file open, about which umount is not happy. Fix that by making "killall" wait for the process to quit. The other error ("Detach failure: Invalid argument") does not seem to result in a test error, but is misleading. Turns out write_to_hugetlbfs.c unconditionally tries to cleanup using shmdt(), even when we only mmap()'ed a hugetlb file. Even worse, shmaddr is never even set for the SHM case. Fix that as well. With this change it seems to work as expected. Link: https://lkml.kernel.org/r/20240821123115.2068812-1-david@redhat.com Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
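The "Detach failure: Invalid argument" message described above comes from releasing memory with a call that does not match how it was obtained. A minimal sketch of the matching-cleanup idea follows; it is not the code from write_to_hugetlbfs.c, and the `toy_` names and struct layout are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/shm.h>

enum toy_method { TOY_METHOD_MMAP, TOY_METHOD_SHM };

struct toy_region {
	enum toy_method method;      /* how addr was obtained */
	void *addr;                  /* NULL when nothing is mapped */
	size_t len;                  /* only used for the mmap() case */
};

/*
 * Release a region with the call that matches its allocation:
 * shmdt() for shmat()-attached segments, munmap() for mmap()ed ranges.
 * Returns 0 on success, -1 on failure with errno set by the call.
 */
static int toy_region_release(struct toy_region *r)
{
	int ret;

	assert(r != NULL);
	if (!r->addr)
		return 0;
	if (r->method == TOY_METHOD_SHM)
		ret = shmdt(r->addr);
	else
		ret = munmap(r->addr, r->len);
	if (ret == 0)
		r->addr = NULL;
	return ret;
}
```

Recording the method and the address together when the mapping is created keeps the cleanup path from guessing, and clearing `addr` after a successful release makes a second call harmless.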
2024-09-03	x86: remove PG_uncached	Matthew Wilcox (Oracle)
	Convert x86 to use PG_arch_2 instead of PG_uncached and remove PG_uncached. Link: https://lkml.kernel.org/r/20240821193445.2294269-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03	mm: rename PG_mappedtodisk to PG_owner_2	Matthew Wilcox (Oracle)
	This flag has similar constraints to PG_owner_priv_1 -- it is ignored by core code, and is entirely for the use of the code which allocated the folio. Since the pagecache does not use it, individual filesystems can use it. The bufferhead code does use it, so filesystems which use the buffer cache must not use it for another purpose. Link: https://lkml.kernel.org/r/20240821193445.2294269-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03	selftests/mm: add more mseal traversal tests	Pedro Falcato
	Add more mseal traversal tests across VMAs, where we could possibly screw up sealing checks. These test more across-vma traversal for mprotect, munmap and madvise. Particularly, we test for the case where a regular VMA is followed by a sealed VMA. [akpm@linux-foundation.org: remove incorrect comment, per review] [akpm@linux-foundation.org: remove the correct comment, per Pedro] [pedro.falcato@gmail.com: fix mseal's length] Link: https://lkml.kernel.org/r/vc4czyuemmu3kylqb4ctaga6y5yvondlyabimx6jvljlw2fkea@djawlllf45xa Link: https://lkml.kernel.org/r/20240817-mseal-depessimize-v3-7-d8d2e037df30@gmail.com Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Kees Cook <kees@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03	selftests: mm: support shmem mTHP collapse testing	Baolin Wang
	Add shmem mTHP collapse testing. Similar to the anonymous page, users can use the '-s' parameter to specify the shmem mTHP size for testing. Link: https://lkml.kernel.org/r/fa44bfa20ca5b9fd6f9163a048f3d3c1e53cd0a8.1724140601.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03	selftests/mm: remove unnecessary ia64 code and comment	Jinjiang Tu
	IA64 has gone with commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture"), so remove unnecessary ia64 special mm code and comment in selftests too. Link: https://lkml.kernel.org/r/20240819130609.3386195-1-tujinjiang@huawei.com Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03	cxl/pci: Rename cxl_setup_parent_dport() and cxl_dport_map_regs()	Li Ming
	The name of cxl_setup_parent_dport() is unclear: the function initializes AER and RAS capabilities on a dport. Rename it to cxl_dport_init_ras_reporting() so that it is easier to understand what the function does. Also reorder the function parameters: the subject of cxl_dport_init_ras_reporting() is a cxl dport, so struct cxl_dport is better placed as the first parameter. cxl_dport_map_regs() maps the CXL RAS capability on a cxl dport, so rename it to cxl_dport_map_ras(). Signed-off-by: Li Ming <ming4.li@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240830061308.2327065-1-ming4.li@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-03	selftests: mptcp: pm_nl_ctl: remove re-definition	Matthieu Baerts (NGI0)
	'MPTCP_PM_NAME' is defined in 'linux/mptcp_pm.h', included in 'linux/mptcp.h', no need to re-define it. 'MPTCP_PM_EVENTS' is not defined in 'linux/mptcp.h', but 'MPTCP_PM_EV_GRP_NAME' is, with the same value. We can then use the latter, and drop the other one. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-11-d3e0f3773b90@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03	selftests: mptcp: join: simplify checksum_tests	Geliang Tang
	The four checksum tests are similar, only one line is different. So a for-loop can be used to simplify these tests. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-10-d3e0f3773b90@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03	selftests: mptcp: join: mute errors when ran in the background	Matthieu Baerts (NGI0)
	The test is supposed to be killed before the end, which will likely cause "Connection reset by peer" errors. It is confusing, especially because in case of real transfer errors, the test will not be marked as failed. But that's OK, there are many other tests checking that. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-9-d3e0f3773b90@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03	selftests: mptcp: join: specify host being checked	Matthieu Baerts (NGI0)
	Instead of displaying 'invert' when looking at some events like MP_FAIL, MP_FASTCLOSE, MP_RESET, RM_ADDR, which is a bit vague because they are not traditionally sent from one side, the host being checked is now printed. For the ADD_ADDR, only display the host when it is the client sending it, which is more unusual. Also before, the 'invert' message was printed after a few checks, but it was not clear which ones exactly. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-8-d3e0f3773b90@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03	selftests: mptcp: join: more explicit check name	Matthieu Baerts (NGI0)
	Before, the check names had to be very short. It is no longer the case now that these checks are printed on a dedicated line. Then, it looks better to have more explicit names. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-7-d3e0f3773b90@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03	selftests: mptcp: join: validate MPJ SYN TX MIB counters	Matthieu Baerts (NGI0)
	A few new MPJoinSynTx MIB counters have been added in a previous commit. They are validated here in the mptcp_join.sh selftest, each time the number of received MPJ is checked. Most of the time, the number of sent SYN+MPJ is the same as the number received. But sometimes there are more, because some are dropped or there are errors. While at it, the "no MPC reuse with single endpoint" subtest has been modified to force a bind() error. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-6-d3e0f3773b90@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03	selftests: mptcp: join: one line for join check	Matthieu Baerts (NGI0)
	Most tests are checking if the expected number of SYN/SYN+ACK/ACK JOINs have been received, each of them on one line. More Join related tests are going to be checked soon, no need to add 5 new lines per test in case of success, just one is enough. In case of issue, the errors will still be reported like before. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-5-d3e0f3773b90@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03	selftests: mptcp: join: reduce join_nr params	Matthieu Baerts (NGI0)
	chk_join_nr() currently takes 9 positional parameters, 6 of them are optional. It makes it hard to read: chk_join_nr 1 1 1 1 0 1 1 0 4 Naming these vars helps to make it easier to read: join_csum_ns1=1 join_csum_ns2=0 \ join_fail_nr=1 join_rst_nr=1 join_infi_nr=0 \ join_corrupted_pkts=4 \ chk_join_nr 1 1 1 It will then be easier to add new optional parameters. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240902-net-next-mptcp-mib-mpjtx-misc-v1-4-d3e0f3773b90@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03	tools/testing/cxl: Use dev_is_platform()	Kunwu Chan
	Use dev_is_platform() instead of checking bus type directly. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240827095123.168696-1-kunwu.chan@linux.dev Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-03	perf parse-events: Pass cpu_list as a perf_cpu_map in __add_event()	Ian Rogers
	Previously the cpu_list was a string, and typically no cpu_list was passed to __add_event(). Making events have their CPUs distinct from the PMU means that on more occasions we want to pass a cpu_list. If we're reading this from sysfs, it is easier to read a perf_cpu_map than to allocate and pass around strings that will later be parsed. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Gautham Shenoy <gautham.shenoy@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20240718003025.1486232-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf pmu: Merge boolean sysfs event option parsing	Ian Rogers
	Merge perf_pmu__parse_per_pkg() and perf_pmu__parse_snapshot() that do the same parsing except for the file suffix used. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Gautham Shenoy <gautham.shenoy@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20240718003025.1486232-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf sched timehist: Add --prio option	Yang Jihong
	The --prio option is used to only show events for the given task priority(ies). The default is to show events for all priority tasks, which is consistent with the previous behavior. Testcase: # perf sched record nice -n 9 perf bench sched messaging -l 10000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 3.435 [sec] [ perf record: Woken up 270 times to write data ] [ perf record: Captured and wrote 618.688 MB perf.data (5729036 samples) ] # perf sched timehist -h Usage: perf sched timehist [<options>] -C, --cpu <cpu> list of cpus to profile -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -g, --call-graph Display call chains if present (default on) -I, --idle-hist Show idle events only -i, --input <file> input file name -k, --vmlinux <file> vmlinux pathname -M, --migrations Show migration events -n, --next Show next task -p, --pid <pid[,pid...]> analyze events only for given process id(s) -s, --summary Show only syscall summary with statistics -S, --with-summary Show all syscalls and summary with statistics -t, --tid <tid[,tid...]> analyze events only for given thread id(s) -V, --cpu-visual Add CPU visual -v, --verbose be more verbose (show symbol address, etc) -w, --wakeups Show wakeup events --kallsyms <file> kallsyms pathname --max-stack <n> Maximum number of functions to display backtrace. --prio <prio> analyze events only for given task priority(ies) --show-prio Show task priority --state Show task state when sched-out --symfs <directory> Look for files with symbols relative to this directory --time <str> Time span for analysis (start,stop) # perf sched timehist --prio 140 Samples of sched_switch event do not have callchains. Invalid prio string # perf sched timehist --show-prio --prio 129 Samples of sched_switch event do not have callchains. 
time cpu task name prio wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ -------- --------- --------- --------- 2090450.765421 [0002] sched-messaging[1229618] 129 0.000 0.000 0.029 2090450.765445 [0007] sched-messaging[1229616] 129 0.000 0.062 0.043 2090450.765448 [0014] sched-messaging[1229619] 129 0.000 0.000 0.032 2090450.765478 [0013] sched-messaging[1229617] 129 0.000 0.065 0.048 2090450.765503 [0014] sched-messaging[1229622] 129 0.000 0.000 0.017 2090450.765550 [0002] sched-messaging[1229624] 129 0.000 0.000 0.021 2090450.765562 [0007] sched-messaging[1229621] 129 0.000 0.071 0.028 2090450.765570 [0005] sched-messaging[1229620] 129 0.000 0.064 0.066 2090450.765583 [0001] sched-messaging[1229625] 129 0.000 0.001 0.031 2090450.765595 [0013] sched-messaging[1229623] 129 0.000 0.060 0.028 2090450.765637 [0014] sched-messaging[1229628] 129 0.000 0.000 0.019 2090450.765665 [0007] sched-messaging[1229627] 129 0.000 0.038 0.030 <SNIP> # perf sched timehist --show-prio --prio 0,120-129 Samples of sched_switch event do not have callchains. 
time cpu task name prio wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ -------- --------- --------- --------- 2090450.763231 [0000] perf[1229608] 120 0.000 0.000 0.000 2090450.763235 [0000] migration/0[15] 0 0.000 0.001 0.003 2090450.763263 [0001] perf[1229608] 120 0.000 0.000 0.000 2090450.763268 [0001] migration/1[21] 0 0.000 0.001 0.004 2090450.763302 [0002] perf[1229608] 120 0.000 0.000 0.000 2090450.763309 [0002] migration/2[27] 0 0.000 0.001 0.007 2090450.763338 [0003] perf[1229608] 120 0.000 0.000 0.000 2090450.763343 [0003] migration/3[33] 0 0.000 0.001 0.004 2090450.763459 [0004] perf[1229608] 120 0.000 0.000 0.000 2090450.763469 [0004] migration/4[39] 0 0.000 0.002 0.010 2090450.763496 [0005] perf[1229608] 120 0.000 0.000 0.000 2090450.763501 [0005] migration/5[45] 0 0.000 0.001 0.004 2090450.763613 [0006] perf[1229608] 120 0.000 0.000 0.000 2090450.763622 [0006] migration/6[51] 0 0.000 0.001 0.008 2090450.763652 [0007] perf[1229608] 120 0.000 0.000 0.000 2090450.763660 [0007] migration/7[57] 0 0.000 0.001 0.008 <SNIP> 2090450.765665 [0001] <idle> 120 0.031 0.031 0.081 2090450.765665 [0007] sched-messaging[1229627] 129 0.000 0.038 0.030 2090450.765667 [0000] s1-perf[8235/7168] 120 0.008 0.000 0.004 2090450.765684 [0013] <idle> 120 0.028 0.028 0.088 2090450.765685 [0001] sched-messaging[1229630] 129 0.000 0.001 0.020 2090450.765688 [0000] <idle> 120 0.004 0.004 0.020 2090450.765689 [0002] <idle> 120 0.021 0.021 0.138 2090450.765691 [0005] sched-messaging[1229626] 129 0.000 0.085 0.029 Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland 
<mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240819033016.2427235-3-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
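The --prio argument in the test case above accepts single values and ranges ("129", "0,120-129") and rejects "140". A minimal sketch of how such a list could be parsed into a lookup table follows. It is not the code from the perf patch: the names `parse_prio_list`, `parse_num` and `PRIO_LIMIT` are hypothetical, and the valid range 0..139 is an assumption inferred from the example where 140 is reported as an invalid prio string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: valid priorities are 0..139, since "140" is rejected above. */
#define PRIO_LIMIT 140

/*
 * Hypothetical helper: parse one non-negative decimal number at *s and
 * advance *s past it. Returns -1 if no digit is present or if the value
 * is outside 0..PRIO_LIMIT-1 (this also covers strtol() overflow).
 */
static long parse_num(const char **s)
{
	char *end;
	long v;

	if (**s < '0' || **s > '9')
		return -1;
	v = strtol(*s, &end, 10);
	if (v >= PRIO_LIMIT)
		return -1;
	*s = end;
	return v;
}

/*
 * Hypothetical parser for strings such as "129" or "0,120-129".
 * On success, map[p] is true for every selected priority p and 0 is
 * returned. On a malformed string, -1 is returned and the contents of
 * map are unspecified.
 */
int parse_prio_list(const char *str, bool map[PRIO_LIMIT])
{
	const char *p = str;

	assert(str != NULL && map != NULL);
	memset(map, 0, PRIO_LIMIT * sizeof(map[0]));
	if (*p == '\0')
		return -1;

	for (;;) {
		long lo = parse_num(&p);
		long hi = lo;

		if (lo < 0)
			return -1;
		if (*p == '-') {
			p++;
			hi = parse_num(&p);
			if (hi < lo)	/* also catches a parse error (-1) */
				return -1;
		}
		for (long i = lo; i <= hi; i++)
			map[i] = true;

		if (*p == '\0')
			return 0;
		if (*p != ',')
			return -1;
		p++;
	}
}
```

A filter built on this table would then test `map[prio]` once per sample, which keeps the per-event cost constant regardless of how many ranges were given.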
2024-09-03	perf sched timehist: Add --show-prio option	Yang Jihong
	The --show-prio option is used to display the priority of task. It is disabled by default, which is consistent with original behavior. The display format is xxx (priority does not change during task running) or xxx->yyy (priority changes during task running) Testcase: # perf sched record nice -n 9 true [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 0.497 MB perf.data ] # perf sched timehist -h Usage: perf sched timehist [<options>] -C, --cpu <cpu> list of cpus to profile -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -g, --call-graph Display call chains if present (default on) -I, --idle-hist Show idle events only -i, --input <file> input file name -k, --vmlinux <file> vmlinux pathname -M, --migrations Show migration events -n, --next Show next task -p, --pid <pid[,pid...]> analyze events only for given process id(s) -s, --summary Show only syscall summary with statistics -S, --with-summary Show all syscalls and summary with statistics -t, --tid <tid[,tid...]> analyze events only for given thread id(s) -V, --cpu-visual Add CPU visual -v, --verbose be more verbose (show symbol address, etc) -w, --wakeups Show wakeup events --kallsyms <file> kallsyms pathname --max-stack <n> Maximum number of functions to display backtrace. --show-prio Show task priority --state Show task state when sched-out --symfs <directory> Look for files with symbols relative to this directory --time <str> Time span for analysis (start,stop) # perf sched timehist Samples of sched_switch event do not have callchains. 
time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 23952.006537 [0000] perf[534] 0.000 0.000 0.000 23952.006593 [0000] migration/0[19] 0.000 0.014 0.056 23952.006899 [0001] perf[534] 0.000 0.000 0.000 23952.006947 [0001] migration/1[22] 0.000 0.015 0.047 23952.007138 [0002] perf[534] 0.000 0.000 0.000 <SNIP> # perf sched timehist --show-prio Samples of sched_switch event do not have callchains. time cpu task name prio wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ -------- --------- --------- --------- 23952.006537 [0000] perf[534] 120 0.000 0.000 0.000 23952.006593 [0000] migration/0[19] 0 0.000 0.014 0.056 23952.006899 [0001] perf[534] 120 0.000 0.000 0.000 <SNIP> 23952.034843 [0003] nice[535] 120->129 0.189 0.024 23.314 <SNIP> 23952.053838 [0005] rcu_preempt[16] 120 3.993 0.000 0.023 23952.053990 [0005] <idle> 120 0.023 0.023 0.152 23952.054137 [0006] <idle> 120 1.427 1.427 17.855 23952.054278 [0007] <idle> 120 0.506 0.506 1.650 Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240819033016.2427235-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf sched timehist: Remove redundant BUG_ON in timehist_sched_change_event()	Yang Jihong
	The BUG_ON(thread__tid(thread) != 0) in timehist_sched_change_event() is redundant, remove it. No functional change. Fixes: 07235f84ece6b66f ("perf sched timehist: Add -I/--idle-hist option") Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240812132606.3126490-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf sched timehist: Skip print non-idle task samples when only show idle events	Yang Jihong
	When only idle events are shown, the runtime stats of non-idle tasks are not updated and stay at 0, so there is no need to print non-idle samples. Before: # perf sched timehist -I Samples of sched_switch event do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 2090450.763235 [0000] migration/0[15] 0.000 0.000 0.000 2090450.763268 [0001] migration/1[21] 0.000 0.000 0.000 2090450.763309 [0002] migration/2[27] 0.000 0.000 0.000 2090450.763343 [0003] migration/3[33] 0.000 0.000 0.000 2090450.763469 [0004] migration/4[39] 0.000 0.000 0.000 2090450.763501 [0005] migration/5[45] 0.000 0.000 0.000 2090450.763622 [0006] migration/6[51] 0.000 0.000 0.000 2090450.763660 [0007] migration/7[57] 0.000 0.000 0.000 2090450.763741 [0009] migration/9[69] 0.000 0.000 0.000 2090450.763862 [0010] migration/10[75] 0.000 0.000 0.000 2090450.763894 [0011] migration/11[81] 0.000 0.000 0.000 2090450.764021 [0012] migration/12[87] 0.000 0.000 0.000 2090450.764056 [0013] migration/13[93] 0.000 0.000 0.000 2090450.764135 [0014] migration/14[99] 0.000 0.000 0.000 2090450.764163 [0015] migration/15[105] 0.000 0.000 0.000 2090450.764292 [0016] migration/16[111] 0.000 0.000 0.000 2090450.764371 [0017] migration/17[117] 0.000 0.000 0.000 2090450.764422 [0018] migration/18[123] 0.000 0.000 0.000 2090450.764490 [0000] <idle> 0.000 0.000 1.255 2090450.764505 [0000] s1-perf[8235/7168] 0.000 0.000 0.000 2090450.764571 [0016] <idle> 0.000 0.000 0.278 2090450.764588 [0010] <idle> 0.000 0.000 0.725 2090450.764590 [0016] s1-agent[7179/7162] 0.000 0.000 0.000 2090450.764635 [0000] <idle> 0.015 0.015 0.129 2090450.764637 [0017] <idle> 0.000 0.000 0.266 2090450.764639 [0000] s1-perf[8235/7168] 0.000 0.000 0.000 2090450.764668 [0017] s1-agent[7180/7162] 0.000 0.000 0.000 2090450.764669 [0000] <idle> 0.003 0.003 0.029 2090450.764672 [0000] s1-perf[8235/7168] 0.000 0.000 0.000 
2090450.764683 [0000] <idle> 0.003 0.003 0.010 After: # perf sched timehist -I Samples of sched_switch event do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 2090450.764490 [0000] <idle> 0.000 0.000 1.255 2090450.764571 [0016] <idle> 0.000 0.000 0.278 2090450.764588 [0010] <idle> 0.000 0.000 0.725 2090450.764635 [0000] <idle> 0.015 0.015 0.129 2090450.764637 [0017] <idle> 0.000 0.000 0.266 2090450.764669 [0000] <idle> 0.003 0.003 0.029 2090450.764683 [0000] <idle> 0.003 0.003 0.010 2090450.764688 [0016] <idle> 0.019 0.019 0.097 2090450.764694 [0000] <idle> 0.001 0.001 0.009 2090450.764706 [0000] <idle> 0.001 0.001 0.010 2090450.764725 [0002] <idle> 0.000 0.000 1.415 2090450.764728 [0000] <idle> 0.002 0.002 0.019 2090450.764823 [0000] <idle> 0.003 0.003 0.091 2090450.764838 [0019] <idle> 0.000 0.000 0.154 2090450.764865 [0002] <idle> 0.109 0.109 0.029 2090450.764866 [0000] <idle> 0.012 0.012 0.030 2090450.764880 [0002] <idle> 0.013 0.013 0.001 2090450.764880 [0000] <idle> 0.002 0.002 0.011 2090450.764896 [0000] <idle> 0.001 0.001 0.013 2090450.764903 [0019] <idle> 0.063 0.063 0.002 2090450.764908 [0019] <idle> 0.003 0.003 0.001 Fixes: 07235f84ece6b66f ("perf sched timehist: Add -I/--idle-hist option") Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240812132606.3126490-1-yangjihong@bytedance.com Reviewed-and-tested-by: Madadi Vineeth Reddy 
<vineethr@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	tools/iio: Add memory allocation failure check for trigger_name	Zhu Jun
	Added a check to handle memory allocation failure for `trigger_name` and return `-ENOMEM`. Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com> Link: https://patch.msgid.link/20240828093129.3040-1-zhujun2@cmss.chinamobile.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
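The pattern described above, checking an allocation before using it and returning `-ENOMEM` on failure, can be sketched as follows. This is an illustration only, not the tools/iio code: the helper name `build_trigger_name`, its parameters, and the "%s-dev%d" name format are assumptions made for the example.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the actual tools/iio function): build a
 * trigger name of the assumed form "<device_name>-dev<dev_num>".
 *
 * Returns 0 and stores a malloc()ed string in *out on success.
 * Returns -ENOMEM if the allocation fails, or -EINVAL if the name
 * cannot be formatted. The caller must free(*out).
 */
int build_trigger_name(const char *device_name, int dev_num, char **out)
{
	char *trigger_name;
	int len;

	/* First pass: compute the required length without writing anything. */
	len = snprintf(NULL, 0, "%s-dev%d", device_name, dev_num);
	if (len < 0)
		return -EINVAL;

	trigger_name = malloc((size_t)len + 1);
	if (!trigger_name)
		return -ENOMEM;	/* the check the commit message describes */

	snprintf(trigger_name, (size_t)len + 1, "%s-dev%d", device_name, dev_num);
	*out = trigger_name;
	return 0;
}
```

Returning a negative errno value lets the caller distinguish an out-of-memory condition from other failures and unwind through its normal error path instead of dereferencing a NULL pointer.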
2024-09-03	selftests: filesystems: fix warn_unused_result build warnings	Abhinav Jain
	Add return value checks for read & write calls in test_listmount_ns function. This patch resolves below compilation warnings: ``` statmount_test_ns.c: In function ‘test_listmount_ns’: statmount_test_ns.c:322:17: warning: ignoring return value of ‘write’ declared with attribute ‘warn_unused_result’ [-Wunused-result] statmount_test_ns.c:323:17: warning: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’ [-Wunused-result] ``` Signed-off-by: Abhinav Jain <jain.abhinav177@gmail.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
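A minimal sketch of the kind of fix described above: the results of `write()` and `read()` are compared against the expected byte count rather than discarded, which is what silences `-Wunused-result`. The function `send_and_receive_value` and the use of a pipe are hypothetical and chosen only to keep the example self-contained; the actual selftest code is not reproduced here.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical example (not the statmount_test_ns.c code): pass one int
 * through a pipe, checking every return value.
 *
 * Returns 0 on success and stores the value read in *received.
 * Returns -1 if pipe(), write() or read() fails or transfers fewer
 * bytes than expected.
 */
int send_and_receive_value(int value, int *received)
{
	int fds[2];
	ssize_t n;
	int ret = -1;

	if (pipe(fds) < 0)
		return -1;

	/* A short or failed write is treated as an error, not ignored. */
	n = write(fds[1], &value, sizeof(value));
	if (n != (ssize_t)sizeof(value))
		goto out;

	/* Same for the read: anything but a full int is a failure. */
	n = read(fds[0], received, sizeof(*received));
	if (n != (ssize_t)sizeof(*received))
		goto out;

	ret = 0;
out:
	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

In a selftest, the failure branch would typically report the error through the test harness instead of returning silently; that reporting is left out here.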
2024-09-03	perf script: Minimize "not reaching sample" for '-F +brstackinsn'	Andi Kleen
	In some situations 'perf script -F +brstackinsn' sees a lot of "not reaching sample" messages. This happens when the last LBR block before the sample contains a branch that is not in the LBR, and the instruction dumping stops. $ perf record -b emacs -Q --batch '()' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.396 MB perf.data (443 samples) ] $ perf script -F +brstackinsn ... 00007f0ab2d171a4 insn: 41 0f 94 c0 00007f0ab2d171a8 insn: 83 fa 01 00007f0ab2d171ab insn: 74 d3 # PRED 6 cycles [313] 1.00 IPC 00007f0ab2d17180 insn: 45 84 c0 00007f0ab2d17183 insn: 74 28 ... not reaching sample ... $ perf script -F +brstackinsn | grep -c reach 136 $ This is a problem for further analysis that wants to see the full code up to the sample. There are two common cases where the message is bogus: - The LBR only logs taken branches, but the branch might be a conditional branch that is not taken (that is the most common case actually) - The LBR sampling uses a filter ignoring some branches, but the perf script check checks for all branches. This patch fixes these two conditions, by only checking for conditional branches, as well as checking the perf_event_attr's branch filter attributes. For the test case above it fixes all the messages: $ ./perf script -F +brstackinsn | grep -c reach 0 Note that there are still conditions when the message is hit -- sometimes there can be an unconditional branch that misses the LBR update before the sample -- but they are much rarer now. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240229161828.386397-1-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf record offcpu: Constify control data for BPF	Namhyung Kim
	The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf record --off-cpu ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.807 MB perf.data (5645 samples) ] root@x1:~# perf evlist cpu_atom/cycles/P cpu_core/cycles/P offcpu-time dummy:u root@x1:~# perf evlist -v cpu_atom/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000000, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1 cpu_core/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000000, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1 offcpu-time: type: 1 (software), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1 dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|IDENTIFIER, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 root@x1:~# perf trace -e bpf --max-events 5 perf record --off-cpu 0.000 ( 0.015 ms): :2949124/2949124 bpf(cmd: 36, uattr: 0x7ffefc6dbe30, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.031 ( 0.115 ms): :2949124/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbb60, size: 148) = 14 0.159 ( 0.037 ms): :2949124/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbc20, size: 148) = 14 23.868 ( 0.144 ms): perf/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbad0, size: 148) = 14 
24.027 ( 0.014 ms): perf/2949124 bpf(uattr: 0x7ffefc6dbc80, size: 80) = 14 root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf lock contention: Constify control data for BPF	Namhyung Kim
	The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf lock contention --use-bpf contended total wait max wait avg wait type caller 5 31.57 us 14.93 us 6.31 us mutex btrfs_delayed_update_inode+0x43 1 16.91 us 16.91 us 16.91 us rwsem:R btrfs_tree_read_lock_nested+0x1b 1 15.13 us 15.13 us 15.13 us spinlock btrfs_getattr+0xd1 1 6.65 us 6.65 us 6.65 us rwsem:R btrfs_tree_read_lock_nested+0x1b 1 4.34 us 4.34 us 4.34 us spinlock process_one_work+0x1a9 root@x1:~# root@x1:~# perf trace -e bpf --max-events 10 perf lock contention --use-bpf 0.000 ( 0.013 ms): :2948281/2948281 bpf(cmd: 36, uattr: 0x7ffd5f12d730, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.024 ( 0.120 ms): :2948281/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d460, size: 148) = 16 0.158 ( 0.034 ms): :2948281/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d520, size: 148) = 16 26.653 ( 0.154 ms): perf/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d3d0, size: 148) = 16 26.825 ( 0.014 ms): perf/2948281 bpf(uattr: 0x7ffd5f12d580, size: 80) = 16 87.924 ( 0.038 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d400, size: 40) = 16 87.988 ( 0.006 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d470, size: 40) = 16 88.019 ( 0.006 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d250, size: 40) = 16 88.029 ( 0.172 ms): perf/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d320, size: 148) = 17 88.217 ( 0.005 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d4d0, size: 40) = 16 root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: 
https://lore.kernel.org/r/20240902200515.2103769-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf kwork: Constify control data for BPF	Namhyung Kim
	The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf kwork report --use-bpf Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- (w)intel_atomic_commit_work [ | 0009 | 18.680 ms | 2 | 18.553 ms | 362410.681580 s | 362410.700133 s | (w)pm_runtime_work | 0007 | 13.300 ms | 1 | 13.300 ms | 362410.254996 s | 362410.268295 s | (w)intel_atomic_commit_work [ | 0009 | 9.846 ms | 2 | 9.717 ms | 362410.172352 s | 362410.182069 s | (w)acpi_ec_event_processor | 0002 | 8.106 ms | 1 | 8.106 ms | 362410.463187 s | 362410.471293 s | (s)SCHED:7 | 0000 | 1.351 ms | 106 | 0.063 ms | 362410.658017 s | 362410.658080 s | i915:157 | 0008 | 0.994 ms | 13 | 0.361 ms | 362411.222125 s | 362411.222486 s | (s)SCHED:7 | 0001 | 0.703 ms | 98 | 0.047 ms | 362410.245004 s | 362410.245051 s | (s)SCHED:7 | 0005 | 0.674 ms | 42 | 0.074 ms | 362411.483039 s | 362411.483113 s | (s)NET_RX:3 | 0001 | 0.556 ms | 10 | 0.079 ms | 362411.066388 s | 362411.066467 s | <SNIP> root@x1:~# perf trace -e bpf --max-events 5 perf kwork report --use-bpf 0.000 ( 0.016 ms): perf/2948007 bpf(cmd: 36, uattr: 0x7ffededa6660, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.026 ( 0.106 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6390, size: 148) = 12 0.152 ( 0.032 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6450, size: 148) = 12 26.247 ( 0.138 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6300, size: 148) = 12 26.396 ( 0.012 ms): perf/2948007 bpf(uattr: 0x7ffededa64b0, size: 80) = 12 Starting trace, Hit <Ctrl+C> to stop and report root@x1:~# Signed-off-by: Namhyung Kim 
<namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240902200515.2103769-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf ftrace latency: Constify control data for BPF	Namhyung Kim
	The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf ftrace latency --use-bpf -T schedule ^C# DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 0 \| \| 1 - 2 us \| 0 \| \| 2 - 4 us \| 0 \| \| 4 - 8 us \| 0 \| \| 8 - 16 us \| 1 \| \| 16 - 32 us \| 5 \| \| 32 - 64 us \| 2 \| \| 64 - 128 us \| 6 \| \| 128 - 256 us \| 7 \| \| 256 - 512 us \| 5 \| \| 512 - 1024 us \| 22 \| # \| 1 - 2 ms \| 36 \| ## \| 2 - 4 ms \| 68 \| ##### \| 4 - 8 ms \| 22 \| # \| 8 - 16 ms \| 91 \| ####### \| 16 - 32 ms \| 11 \| \| 32 - 64 ms \| 26 \| ## \| 64 - 128 ms \| 213 \| ################# \| 128 - 256 ms \| 19 \| # \| 256 - 512 ms \| 14 \| # \| 512 - 1024 ms \| 5 \| \| 1 - ... s \| 8 \| \| root@x1:~# root@x1:~# perf trace -e bpf perf ftrace latency --use-bpf -T schedule 0.000 ( 0.015 ms): perf/2944525 bpf(cmd: 36, uattr: 0x7ffe80de7b40, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.025 ( 0.102 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7870, size: 148) = 8 0.136 ( 0.026 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7930, size: 148) = 8 0.174 ( 0.026 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de77e0, size: 148) = 8 0.205 ( 0.010 ms): perf/2944525 bpf(uattr: 0x7ffe80de7990, size: 80) = 8 0.227 ( 0.011 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7810, size: 40) = 8 0.244 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7880, size: 40) = 8 0.257 ( 0.006 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7660, size: 40) = 8 0.265 ( 0.058 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7730, size: 148) = 9 0.330 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de78e0, size: 40) = 8 0.337 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7890, size: 40) = 8 0.343 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7880, size: 40) = 8 0.349 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, 
uattr: 0x7ffe80de78b0, size: 40) = 8
  0.355 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7890, size: 40) = 8
  0.361 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de78b0, size: 40) = 8
  0.367 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7880, size: 40) = 8
  0.373 ( 0.014 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7a00, size: 40) = 8
  0.390 ( 0.358 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9
  0.763 ( 0.014 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9
  0.783 ( 0.011 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9
  0.798 ( 0.017 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9
  0.819 ( 0.003 ms): perf/2944525 bpf(uattr: 0x7ffe80de7700, size: 80) = 9
  0.824 ( 0.047 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de76c0, size: 148) = 10
  0.878 ( 0.008 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9
  0.891 ( 0.014 ms): perf/2944525 bpf(cmd: MAP_UPDATE_ELEM, uattr: 0x7ffe80de79e0, size: 32) = 0
  0.910 ( 0.103 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7880, size: 148) = 9
  1.016 ( 0.143 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7880, size: 148) = 10
  3.777 ( 0.068 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7570, size: 148) = 12
  3.848 ( 0.003 ms): perf/2944525 bpf(cmd: LINK_CREATE, uattr: 0x7ffe80de7550, size: 64) = -1 EBADF (Bad file descriptor)
  3.859 ( 0.006 ms): perf/2944525 bpf(cmd: LINK_CREATE, uattr: 0x7ffe80de77c0, size: 64) = 12
  6.504 ( 0.010 ms): perf/2944525 bpf(cmd: LINK_CREATE, uattr: 0x7ffe80de77c0, size: 64) = 14
  ^C
#      DURATION | COUNT | GRAPH           |
       0 - 1 us |     0 |                 |
       1 - 2 us |     0 |                 |
       2 - 4 us |     1 |                 |
       4 - 8 us |     3 |                 |
      8 - 16 us |     3 |                 |
     16 - 32 us |    11 |                 |
     32 - 64 us |     9 |                 |
    64 - 128 us |    17 |                 |
   128 - 256 us |    30 | #               |
   256 - 512 us |    20 |                 |
  512 - 1024 us |    42 | #               |
       1 - 2 ms |   151 | ######          |
       2 - 4 ms |   106 | ####            |
       4 - 8 ms |    18 |                 |
      8 - 16 ms |   149 | ######          |
     16 - 32 ms |    30 | #               |
     32 - 64 ms |    17 |                 |
    64 - 128 ms |   360 | ############### |
   128 - 256 ms |    52 | ##              |
   256 - 512 ms |    18 |                 |
  512 - 1024 ms |    28 | #               |
      1 - ... s |     5 |                 |
  root@x1:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf stat: Constify control data for BPF	Namhyung Kim
	The control knobs set before loading BPF programs should be declared as 'const volatile' so that they can be optimized by the BPF core.
Committer testing:
  root@x1:~# perf stat --bpf-counters -e cpu_core/cycles/,cpu_core/instructions/ sleep 1
  Performance counter stats for 'sleep 1':
  2,442,583 cpu_core/cycles/
  2,494,425 cpu_core/instructions/
  1.002687372 seconds time elapsed
  0.001126000 seconds user
  0.001166000 seconds sys
  root@x1:~# perf trace -e bpf --max-events 10 perf stat --bpf-counters -e cpu_core/cycles/,cpu_core/instructions/ sleep 1
  0.000 ( 0.019 ms): perf/2944119 bpf(cmd: OBJ_GET, uattr: 0x7fffdf5cdd40, size: 20) = 5
  0.021 ( 0.002 ms): perf/2944119 bpf(cmd: OBJ_GET_INFO_BY_FD, uattr: 0x7fffdf5cdcd0, size: 16) = 0
  0.030 ( 0.005 ms): perf/2944119 bpf(cmd: MAP_LOOKUP_ELEM, uattr: 0x7fffdf5ceda0, size: 32) = 0
  0.037 ( 0.004 ms): perf/2944119 bpf(cmd: LINK_GET_FD_BY_ID, uattr: 0x7fffdf5ced80, size: 12) = -1 ENOENT (No such file or directory)
  0.189 ( 0.004 ms): perf/2944119 bpf(cmd: 36, uattr: 0x7fffdf5cec10, size: 8) = -1 EOPNOTSUPP (Operation not supported)
  0.201 ( 0.095 ms): perf/2944119 bpf(cmd: PROG_LOAD, uattr: 0x7fffdf5ce940, size: 148) = 10
  0.305 ( 0.026 ms): perf/2944119 bpf(cmd: PROG_LOAD, uattr: 0x7fffdf5cea00, size: 148) = 10
  0.347 ( 0.012 ms): perf/2944119 bpf(cmd: BTF_LOAD, uattr: 0x7fffdf5ce8e0, size: 40) = 10
  0.364 ( 0.004 ms): perf/2944119 bpf(cmd: BTF_LOAD, uattr: 0x7fffdf5ce950, size: 40) = 10
  0.376 ( 0.006 ms): perf/2944119 bpf(cmd: BTF_LOAD, uattr: 0x7fffdf5ce730, size: 40) = 10
  root@x1:~#
  Performance counter stats for 'sleep 1':
  271,221 cpu_core/cycles/
  139,150 cpu_core/instructions/
  1.002881677 seconds time elapsed
  0.001318000 seconds user
  0.001314000 seconds sys
  root@x1:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf test: Make watchpoint data 32-bits on i386	Ian Rogers
	i386 only supports watchpoints up to 4 bytes in size; using 8 bytes causes extra counts and test failures. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240831070415.506194-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf test: Skip uprobe test if probe command isn't present	Ian Rogers
	The probe command is dependent on libelf. Skip the test if the required probe command isn't present. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240831070415.506194-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf time-utils: Fix 32-bit nsec parsing	Ian Rogers
	The "time utils" test fails in 32-bit builds:
  ...
  parse_nsec_time("18446744073.709551615")
  Failed. ptime 4294967295709551615 expected 18446744073709551615
  ...
Switch strtoul to strtoull, as an unsigned long isn't 64 bits wide in a 32-bit build.
Fixes: c284d669a20d408b ("perf tools: Move parse_nsec_time to time-utils.c") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240831070415.506194-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
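To illustrate the "perf time-utils: Fix 32-bit nsec parsing" entry above, here is a minimal sketch of a "sec[.nsec]" parser that uses strtoull throughout. The names sketch_parse_nsec_time and SKETCH_NSEC_PER_SEC are hypothetical and this is not perf's parse_nsec_time(); it only assumes the same input shape as the failing test. Overflow of the final multiplication is not checked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper (not perf's parse_nsec_time()): parse "sec[.nsec]"
 * into a 64-bit nanosecond count. Returns 0 on success, -1 on bad input.
 */
static int sketch_parse_nsec_time(const char *str, uint64_t *ptime)
{
	char nsec_buf[10];
	char *end;
	uint64_t sec, nsec = 0;
	size_t len;

	assert(str && ptime);

	/* unsigned long long is at least 64 bits wide on every C ABI. */
	sec = strtoull(str, &end, 10);
	if (*end != '.' && *end != '\0')
		return -1;

	if (*end == '.') {
		end++;
		len = strlen(end);
		if (len > 9)
			return -1;
		/* Right-pad the fraction with zeros to nanosecond precision. */
		memset(nsec_buf, '0', 9);
		memcpy(nsec_buf, end, len);
		nsec_buf[9] = '\0';
		nsec = strtoull(nsec_buf, &end, 10);
		if (*end != '\0')
			return -1;
	}

	*ptime = sec * SKETCH_NSEC_PER_SEC + nsec;
	return 0;
}
```

With strtoul and a 32-bit unsigned long, the seconds part 18446744073 would saturate at ULONG_MAX (4294967295), which is consistent with the 4294967295709551615 value reported by the failing test.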
2024-09-03	perf pmus: Fix name comparisons on 32-bit systems	Ian Rogers
	The hex PMU suffix may be 64-bit, but the comparisons used "unsigned long", which is 32-bit on 32-bit systems. This was causing the "PMU name comparison" test to fail in a 32-bit build. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240831070415.506194-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
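The idea behind the "perf pmus: Fix name comparisons on 32-bit systems" entry above can be sketched as follows: parse the hex suffix into a fixed 64-bit type and compare those values. The helper names and the "name_<hex>" layout are assumptions made for illustration, not perf's actual PMU name handling. Note that strtoull also accepts a sign or "0x" prefix, which this sketch does not reject.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse the hex digits after the last '_' in a name.
 * Returns 0 on success, -1 if there is no fully hexadecimal suffix.
 */
static int sketch_hex_suffix(const char *name, uint64_t *val)
{
	const char *us = strrchr(name, '_');
	char *end;

	assert(val);
	if (!us || us[1] == '\0')
		return -1;
	/* strtoull keeps all 64 bits, whatever the width of unsigned long. */
	*val = strtoull(us + 1, &end, 16);
	return *end == '\0' ? 0 : -1;
}

/* Order two names by their suffix values, compared as 64-bit integers. */
static int sketch_suffix_cmp(const char *a, const char *b)
{
	uint64_t va, vb;

	/* Fall back to a plain string comparison when a suffix is missing. */
	if (sketch_hex_suffix(a, &va) || sketch_hex_suffix(b, &vb))
		return strcmp(a, b);
	return va < vb ? -1 : va > vb;
}
```

For example, the suffixes 100000001 and 100000002 both exceed 32 bits. strtoul with a 32-bit unsigned long would saturate both at ULONG_MAX and they would compare equal, while the 64-bit comparison keeps them distinct.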
2024-09-03	perf annotate: LLVM-based disassembler	Steinar H. Gunderson
	Support using LLVM as a disassembler method, allowing helperless annotation in non-distro builds. (It is also much faster than using libbfd or bfd objdump on binaries with a lot of debug information.) This is nearly identical to the output of llvm-objdump; there are some very rare whitespace differences, some minor changes to demangling (since we use perf's regular demangling and not LLVM's own) and the occasional case where llvm-objdump makes a different choice when multiple symbols share the same address. It should work across all of LLVM's supported architectures, although I've only tested 64-bit x86, and finding the right triple from perf's idea of machine architecture can sometimes be a bit tricky. Ideally, we should have some way of finding the triplet just from the file itself.
Committer notes: Address this on 32-bit systems by using PRIu64 from inttypes.h:
  3 17.58 almalinux:9-i386 : FAIL gcc version 11.4.1 20231218 (Red Hat 11.4.1-3) (GCC)
  util/llvm-c-helpers.cpp: In function ‘char* make_symbol_relative_string(dso*, const char*, u64, u64)’:
  util/llvm-c-helpers.cpp:150:52: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘u64’ {aka ‘long long unsigned int’} [-Werror=format=]
    150 | snprintf(buf, sizeof(buf), "%s+0x%lx",
        |                                  ~~^
        |                                    |
        |                                    long unsigned int
        |                                  %llx
    151 |          demangled ? demangled : sym_name, addr - base_addr);
        |                                            ~~~~~~~~~~~~~~~~
        |                                                 |
        |                                                 u64 {aka long long unsigned int}
  cc1plus: all warnings being treated as errors
Signed-off-by: Steinar H. Gunderson <sesse@google.com> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240803152008.2818485-3-sesse@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
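The format mismatch in the "perf annotate: LLVM-based disassembler" committer note above comes from printing a 64-bit value with "%lx", which only matches when unsigned long is 64 bits wide. A minimal sketch of the portable form follows. The committer note names PRIu64; because the conversion in the diagnostic is hexadecimal, this sketch assumes its hex counterpart PRIx64 from the same header. The function name sketch_format_sym_offset is hypothetical and is not the helper in util/llvm-c-helpers.cpp.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "name+0xoffset" into buf. PRIx64 expands to
 * the conversion that matches uint64_t on the target ABI ("lx" where long
 * is 64 bits, "llx" on i386), so the same source builds cleanly with
 * -Werror=format on both.
 */
static int sketch_format_sym_offset(char *buf, size_t len,
				    const char *sym_name,
				    uint64_t addr, uint64_t base_addr)
{
	assert(buf && len > 0 && sym_name);
	return snprintf(buf, len, "%s+0x%" PRIx64, sym_name, addr - base_addr);
}
```

Calling it with sym_name "main", addr 0x100000010 and base_addr 0x10 fills the buffer with "main+0x100000000".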
2024-09-03	perf annotate: Split out read_symbol()	Steinar H. Gunderson
	The Capstone disassembler code has a useful code snippet to read the bytes for a given code symbol into memory. Split it out into its own function, so that the LLVM disassembler can use it in the next patch. Signed-off-by: Steinar H. Gunderson <sesse@google.com> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240803152008.2818485-2-sesse@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	perf report: Support LLVM for addr2line()	Steinar H. Gunderson
	In addition to the existing support for libbfd and calling out to an external addr2line command, add support for using libllvm directly. This is both faster than libbfd, and can be enabled in distro builds (the LLVM license has an explicit provision for GPLv2 compatibility). Thus, it is set as the primary choice if available. As an example, running 'perf report' on a medium-size profile with DWARF-based backtraces took 58 seconds with LLVM, 78 seconds with libbfd, 153 seconds with external llvm-addr2line, and I got tired and aborted the test after waiting for 55 minutes with external bfd addr2line (which is the default for perf as compiled by distributions today). Evidently, for this case, the bfd addr2line process needs 18 seconds (on a 5.2 GHz Zen 3) to load the .debug ELF in question, hits the 1-second timeout and gets killed during initialization, getting restarted anew every time. Having an in-process addr2line makes this much more robust. As future extensions, libllvm can be used in many other places where we currently use libbfd or other libraries: - Symbol enumeration (in particular, for PE binaries). - Demangling (including non-Itanium demangling, e.g. Microsoft or Rust). - Disassembling (perf annotate). However, these are much less pressing; most people don't profile PE binaries, and perf has non-bfd paths for ELF. The same with demangling; the default _cxa_demangle path works fine for most users, and while bfd objdump can be slow on large binaries, it is possible to use --objdump=llvm-objdump to get the speed benefits. (It appears LLVM-based demangling is very simple, should we want that.) Tested with LLVM 14, 15, 16, 18 and 19. For some reason, LLVM 12 was not correctly detected using feature_check, and thus was not tested. 
Committer notes: Added the name and a __maybe_unused to address:
  1 13.50 almalinux:8 : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-22) (GCC)
  util/srcline.c: In function 'dso__free_a2l':
  util/srcline.c:184:20: error: parameter name omitted
  void dso__free_a2l(struct dso *)
                     ^~~~~~~~~~~~
  make[3]: *** [/git/perf-6.11.0-rc3/tools/build/Makefile.build:158: util] Error 2
Signed-off-by: Steinar H. Gunderson <sesse@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240803152008.2818485-1-sesse@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03	selftests: netfilter: nft_queue.sh: fix spurious timeout on debug kernel	Florian Westphal
	The sctp selftest is very slow on debug kernels. It's possible that the nf_queue listener program exits due to timeout before the first sctp packet is processed. In this case socat hangs until the script times out. Fix this by removing the -t option where possible and killing the test program once the file transfer/socat has exited. -t sets SO_RCVTIMEO; it's intended for the 'ping' part of the selftest where we want to make sure that packets get reinjected properly without skipping a second queue request. While at it, add a helper to compare the (binary) files instead of diff. The 'diff' part was copied from another sub-test that compares text. Let the helper dump file sizes on error so we can see the progress made. Tested on an old 2010-ish box with a debug kernel and 100 iterations. This is a followup to the earlier filesize reduction change. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20240829080109.GB30766@breakpoint.cc/ Fixes: 0a8b08c554da ("selftests: netfilter: nft_queue.sh: reduce test file size for debug build") Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20240830092254.8029-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>