summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2025-05-09perf symbol-elf: Integrate rust-v0 demanglingIan Rogers
Use the demangle-rust-v0 APIs to see if symbol is Rust mangled and demangle if so. The API requires a pre-allocated output buffer, some estimation and retrying are added for this. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Ariel Ben-Yehuda <ariel.byd@gmail.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Bill Wendling <morbo@google.com> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Daniel Xu <dxu@dxuuu.xyz> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Trevor Gross <tmgross@umich.edu> Link: https://lore.kernel.org/r/20250430004128.474388-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09perf demangle-rust: Add rustc-demangle C demanglerIan Rogers
Imported at commit 80e40f57d99f ("add comment about finding latest version of code") from: https://github.com/rust-lang/rustc-demangle/blob/main/crates/native-c/src/demangle.c https://github.com/rust-lang/rustc-demangle/blob/main/crates/native-c/include/demangle.h There is discussion of this issue motivating the import in: https://github.com/rust-lang/rust/issues/60705 https://lore.kernel.org/lkml/20250129193037.573431-1-irogers@google.com/ The SPDX lines reflect the dual license Apache-2 or MIT in: https://github.com/rust-lang/rustc-demangle/blob/main/README.md Following Migual Ojeda's suggestion comments were added on copyright and keeping the code in sync with upstream. The files are renamed as perf supports multiple demanglers and so demangle as a name would be overloaded. The work here was done by Ariel Ben-Yehuda <ariel.byd@gmail.com> and I am merely importing it as discussed in the rust-lang issue. Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Ariel Ben-Yehuda <ariel.byd@gmail.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Bill Wendling <morbo@google.com> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Daniel Xu <dxu@dxuuu.xyz> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Trevor Gross <tmgross@umich.edu> Link: https://lore.kernel.org/r/20250430004128.474388-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09perf test amd ibs: Fix spelling mistake "Asssuming" -> "Assuming"Colin Ian King
There is a spelling mistake ina pr_debug message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250507082421.188848-1-colin.i.king@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09perf pmu: Use available core PMU for raw eventsNamhyung Kim
When it finds a matching PMU for a legacy event, it should look for core PMUs. The raw events also refers to core events so it should be handled similarly. On x86, PERF_TYPE_RAW should match with the existing cpu PMU. But on ARM, there's no PMU with the matching type so it'll pick the first core PMU for it. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250507215939.54399-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09perf lock contention: Add -J/--inject-delay optionNamhyung Kim
This is to slow down lock acquistion (on contention locks) deliberately. A possible use case is to estimate impact on application performance by optimization of kernel locking behavior. By delaying the lock it can simulate the worse condition as a control group, and then compare with the current behavior as a optimized condition. The syntax is 'time@function' and the time can have unit suffix like "us" and "ms". For example, I ran a simple test like below. $ sudo perf lock con -abl -L tasklist_lock -- \ sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 92 1.18 ms 199.54 us 12.79 us ffffffff8a806080 tasklist_lock (rwlock) The contention count was 92 and the average wait time was around 10 us. But if I add 100 usec of delay to the tasklist_lock, $ sudo perf lock con -abl -L tasklist_lock -J 100us@tasklist_lock -- \ sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 190 15.67 ms 230.10 us 82.46 us ffffffff8a806080 tasklist_lock (rwlock) The contention count increased and the average wait time was up closed to 100 usec. If I increase the delay even more, $ sudo perf lock con -abl -L tasklist_lock -J 1ms@tasklist_lock -- \ sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 1002 2.80 s 3.01 ms 2.80 ms ffffffff8a806080 tasklist_lock (rwlock) Now every sleep process had contention and the wait time was more than 1 msec. This is on my 4 CPU laptop so I guess one CPU has the lock while other 3 are waiting for it mostly. For simplicity, it only supports global locks for now. Committer testing: root@number:~# grep -m1 'model name' /proc/cpuinfo model name : AMD Ryzen 9 9950X3D 16-Core Processor root@number:~# perf lock con -abl -L tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 142 453.85 us 25.39 us 3.20 us ffffffffae808080 tasklist_lock (rwlock) root@number:~# perf lock con -abl -L tasklist_lock -J 100us@tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 1040 2.39 s 3.11 ms 2.30 ms ffffffffae808080 tasklist_lock (rwlock) root@number:~# perf lock con -abl -L tasklist_lock -J 1ms@tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 1025 24.72 s 31.01 ms 24.12 ms ffffffffae808080 tasklist_lock (rwlock) root@number:~# Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250509171950.183591-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09perf tests: Fix 'perf report' tests installationMichael Petlan
There was a copy-paste mistake in the installation commands. Also, we need to install stderr-whitelist.txt file, which contains allowed messages that are printed on stderr and should not cause test fail. Fixes: 097fe67df1aa9cc7 ("perf testsuite: Install perf-report tests in the 'make install-tests -C tools/perf' target") Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250113182605.130719-6-vmolnaro@redhat.com Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf trace: Fix leaks of 'struct thread' in set_filter_loop_pids()Namhyung Kim
I've found some leaks from 'perf trace -a'. It seems there are more leaks but this is what I can find for now. Fixes: 082ab9a18e532864 ("perf trace: Filter out 'sshd' in the tracer ancestry in syswide tracing") Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250403054213.7021-1-namhyung@kernel.org [ split from a larget patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf trace: Fix leaks of 'struct thread' in fprintf_sys_enter()Namhyung Kim
I've found some leaks from 'perf trace -a'. It seems there are more leaks but this is what I can find for now. Fixes: 70351029b55677eb ("perf thread: Add support for reading the e_machine type for a thread") Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250403054213.7021-1-namhyung@kernel.org [ split from a larget patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf parse-events: Add debug dump of evlist if reorderedIan Rogers
Add debug verbose output to show how evsels were reordered by parse_events__sort_events_and_fix_groups(). For example: ``` $ perf record -v -e '{instructions,cycles}' true Using CPUID GenuineIntel-6-B7-1 WARNING: events were regrouped to match PMUs evlist after sorting/fixing: '{cpu_atom/instructions/,cpu_atom/cycles/},{cpu_core/instructions/,cpu_core/cycles/}' ``` Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf evlist: Make groups visible in evlist__format_evsels() outputIan Rogers
Make groups visible in output: Before: {cycles,instructions} -> cpu_atom/cycles/,cpu_atom/instructions/,cpu_core/cycles/,cpu_core/instructions/ After: {cycles,instructions} -> {cpu_atom/cycles/,cpu_atom/instructions/},{cpu_core/cycles/,cpu_core/instructions/} Committer testing: Before: root@number:~# perf record -e '{cycles,instructions,cache-misses}' /tmp/bla Failed to collect 'cycles,instructions,cache-misses' for the '/tmp/bla' workload: Permission denied root@number:~# After: root@number:~# perf record -e '{cycles,instructions,cache-misses}' /tmp/bla Failed to collect '{cycles,instructions,cache-misses}' for the '/tmp/bla' workload: Permission denied root@number:~# Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf evlist: Refactor evlist__scnprintf_evsels()Ian Rogers
Switch output to using a strbuf so the storage can be resized. Add a maximum size argument to avoid too much output that may happen for uncore events. Rename as scnprintf is no longer used. Committer testing: With the patch applied: root@number:~# perf probe -x ~/bin/perf evlist__format_evsels Added new event: probe_perf:evlist_format_evsels (on evlist__format_evsels in /home/acme/bin/perf) You can now use it in all perf tools, such as: perf record -e probe_perf:evlist_format_evsels -aR sleep 1 root@number:~# perf probe -l probe_perf:evlist_format_evsels (on evlist__format_evsels@util/evlist.c in /home/acme/bin/perf) root@number:~# perf trace -e probe_perf:*/max-stack=10/ perf record -e cycles,instructions,cache-misses /tmp/bla Failed to collect 'cycles,instructions,cache-misses' for the '/tmp/bla' workload: Permission denied 0.000 perf/3893011 probe_perf:evlist_format_evsels(__probe_ip: 6183397) evlist__format_evsels (/home/acme/bin/perf) __cmd_record (/home/acme/bin/perf) cmd_record (/home/acme/bin/perf) run_builtin (/home/acme/bin/perf) handle_internal_command (/home/acme/bin/perf) run_argv (/home/acme/bin/perf) main (/home/acme/bin/perf) __libc_start_call_main (/usr/lib64/libc.so.6) __libc_start_main@@GLIBC_2.34 (/usr/lib64/libc.so.6) _start (/home/acme/bin/perf) root@number:~# Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf stat: Remove print_mixed_hw_group_errorIan Rogers
print_mixed_hw_group_error will print a warning when a group of events uses different PMUs. This isn't possible to happen as parse_events__sort_events_and_fix_groups() will break groups when this happens, adding the warning at the start of perf of: WARNING: events were regrouped to match PMUs As the previous mixed group warning can never happen, remove the associated code. Committer testing: Before/after: acme@five:~$ perf stat -e '{cpu_core/cycles/,cpu_atom/cycles/}' sleep 1 WARNING: events were regrouped to match PMUs Performance counter stats for 'sleep 1': 424,895 cpu_atom/cycles/u <not counted> cpu_core/cycles/u (0.00%) 1.011862314 seconds time elapsed 0.000000000 seconds user 0.003166000 seconds sys acme@five:~$ Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf stat: Better hybrid support for the NMI watchdog warningIan Rogers
Prior to this patch evlist__has_hybrid would return false if the processor wasn't hybrid or the evlist didn't contain any core events. If the only PMU used by events was cpu_core then it would true even though there are no cpu_atom events. For example: ``` $ perf stat --cputype=cpu_core -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}' true Performance counter stats for 'true': <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) 0.001981900 seconds time elapsed 0.002311000 seconds user 0.000000000 seconds sys ``` This patch changes evlist__has_hybrid to return true only if the evlist contains events from >1 core PMU. This means the NMI watchdog warning is shown for the case above. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf trace: Add missing thread__put() in thread__e_machine()Ian Rogers
Add missing thread__put() of the found parent thread in thread__e_machine(). Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250401202715.3493567-1-irogers@google.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf trace: Free the files.max entry in files->tableIan Rogers
The files.max is the maximum valid fd in the files array and so freeing the values needs to be inclusive of the max value. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250401202715.3493567-1-irogers@google.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-07Merge remote-tracking branch 'torvalds/master' into perf-tools-nextArnaldo Carvalho de Melo
To pick up fixes from the latest perf-tools pull request from Namhyung and get perf-tools-next in line with thinngs in other areas it uses, like tools/lib/bpf, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf test: Add direct off-cpu testsHoward Chu
Since we added --off-cpu-thresh, add tests for when a sample's off-cpu time is above the threshold, and when it's below the threshold. Note that the basic test performed in test_offcpu_basic() collects a direct sample now, since sleep 1 has duration of 1000ms, higher than the default value of --off-cpu-thresh of 500ms, resulting in a direct sample. An example: $ sudo perf test offcpu 124: perf record offcpu profiling tests : Ok $ Committer testing: root@number:~# perf test offcpu 126: perf record offcpu profiling tests : Ok root@number:~# perf test -v offcpu 126: perf record offcpu profiling tests : Ok root@number:~# perf test -vv offcpu 126: perf record offcpu profiling tests: --- start --- test child forked, pid 1410791 Checking off-cpu privilege Basic off-cpu test Basic off-cpu test [Success] Child task off-cpu test Child task off-cpu test [Success] Threshold test (above threshold) Threshold test (above threshold) [Success] Threshold test (below threshold) Threshold test (below threshold) [Success] ---- end(0) ---- 126: perf record offcpu profiling tests : Ok root@number:~# Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250501022809.449767-11-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Add --off-cpu-thresh optionHoward Chu
Specify the threshold for dumping offcpu samples with --off-cpu-thresh, the unit is milliseconds. Default value is 500ms. Example: perf record --off-cpu --off-cpu-thresh 824 The example above collects direct off-cpu samples where the off-cpu time is longer than 824ms. Committer testing: After commenting out the end off-cpu dump to have just the ones that are added right after the task is scheduled back, and using a threshould of 1000ms, we see some periods (the 5th column, just before "offcpu-time" in the 'perf script' output) that are over 1000.000.000 nanoseconds: root@number:~# perf record --off-cpu --off-cpu-thresh 10000 ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 3.902 MB perf.data (34335 samples) ] root@number:~# perf script <SNIP> Isolated Web Co 59932 [028] 63839.594437: 1000049427 offcpu-time: 7fe63c7976c2 __syscall_cancel_arch_end+0x0 (/usr/lib64/libc.so.6) 7fe63c78c04c __futex_abstimed_wait_common+0x7c (/usr/lib64/libc.so.6) 7fe63c78e928 pthread_cond_timedwait@@GLIBC_2.3.2+0x178 (/usr/lib64/libc.so.6) 5599974a9fe7 mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&)+0xe7 (/usr/lib64/fir> 100000000 [unknown] ([unknown]) swapper 0 [025] 63839.594459: 195724 cycles:P: ffffffffac328270 read_tsc+0x0 ([kernel.kallsyms]) Isolated Web Co 59932 [010] 63839.594466: 1000055278 offcpu-time: 7fe63c7976c2 __syscall_cancel_arch_end+0x0 (/usr/lib64/libc.so.6) 7fe63c78ba24 __syscall_cancel+0x14 (/usr/lib64/libc.so.6) 7fe63c804c4e __poll+0x1e (/usr/lib64/libc.so.6) 7fe633b0d1b8 PollWrapper(_GPollFD*, unsigned int, int) [clone .lto_priv.0]+0xf8 (/usr/lib64/firefox/libxul.so) 10000002c [unknown] ([unknown]) swapper 0 [027] 63839.594475: 134433 cycles:P: ffffffffad4c45d9 irqentry_enter+0x19 ([kernel.kallsyms]) swapper 0 [028] 63839.594499: 215838 cycles:P: ffffffffac39199a switch_mm_irqs_off+0x10a ([kernel.kallsyms]) MediaPD~oder #1 1407676 [027] 63839.594514: 134433 cycles:P: 7f982ef5e69f dct_IV(int*, int, int*)+0x24f (/usr/lib64/libfdk-aac.so.2.0.0) swapper 0 [024] 63839.594524: 267411 cycles:P: ffffffffad4c6ee6 poll_idle+0x56 ([kernel.kallsyms]) MediaSu~sor #75 1093827 [026] 63839.594555: 332652 cycles:P: 55be753ad030 moz_xmalloc+0x200 (/usr/lib64/firefox/firefox) swapper 0 [027] 63839.594616: 160548 cycles:P: ffffffffad144840 menu_select+0x570 ([kernel.kallsyms]) Isolated Web Co 14019 [027] 63839.595120: 1000050178 offcpu-time: 7fc9537cc6c2 __syscall_cancel_arch_end+0x0 (/usr/lib64/libc.so.6) 7fc9537c104c __futex_abstimed_wait_common+0x7c (/usr/lib64/libc.so.6) 7fc9537c3928 pthread_cond_timedwait@@GLIBC_2.3.2+0x178 (/usr/lib64/libc.so.6) 7fc95372a3c8 pt_TimedWait+0xb8 (/usr/lib64/libnspr4.so) 7fc95372a8d8 PR_WaitCondVar+0x68 (/usr/lib64/libnspr4.so) 7fc94afb1f7c WatchdogMain(void*)+0xac (/usr/lib64/firefox/libxul.so) 7fc947498660 [unknown] ([unknown]) 7fc9535fce88 [unknown] ([unknown]) 7fc94b620e60 WatchdogManager::~WatchdogManager()+0x0 (/usr/lib64/firefox/libxul.so) fff8548387f8b48 [unknown] ([unknown]) swapper 0 [003] 63839.595712: 212948 cycles:P: ffffffffacd5b865 acpi_os_read_port+0x55 ([kernel.kallsyms]) <SNIP> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Suggested-by: Ian Rogers <irogers@google.com> Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-2-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-10-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Dump the remaining PERF_SAMPLE_ in sample_type from ↵Howard Chu
BPF's stack trace map Dump the remaining PERF_SAMPLE_ data, as if it is dumping a direct sample. Put the stack trace, tid, off-cpu time and cgroup id into the raw_data section, just like a direct off-cpu sample coming from BPF's bpf_perf_event_output(). This ensures that evsel__parse_sample() correctly parses both direct samples and accumulated samples. Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-10-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-9-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf script: Display off-cpu samples correctlyHoward Chu
No PERF_SAMPLE_CALLCHAIN in sample_type, but 'perf script' needs to display a callchain, have to specify manually. Also, prefer displaying a callchain: gvfs-afc-volume 2267 [001] 3829232.955656: 1001115340 offcpu-time: 77f05292603f __pselect+0xbf (/usr/lib/x86_64-linux-gnu/libc.so.6) 77f052a1801c [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0) 77f052a18d45 [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0) 77f05289ca94 start_thread+0x384 (/usr/lib/x86_64-linux-gnu/libc.so.6) 77f052929c3c clone3+0x2c (/usr/lib/x86_64-linux-gnu/libc.so.6) to a raw binary BPF output: BPF output: 0000: dd 08 00 00 db 08 00 00 <DD>...<DB>... 0008: cc ce ab 3b 00 00 00 00 <CC>Ϋ;.... 0010: 06 00 00 00 00 00 00 00 ........ 0018: 00 fe ff ff ff ff ff ff .<FE><FF><FF><FF><FF><FF><FF> 0020: 3f 60 92 52 f0 77 00 00 ?`.R<F0>w.. 0028: 1c 80 a1 52 f0 77 00 00 ..<A1>R<F0>w.. 0030: 45 8d a1 52 f0 77 00 00 E.<A1>R<F0>w.. 0038: 94 ca 89 52 f0 77 00 00 .<CA>.R<F0>w.. 0040: 3c 9c 92 52 f0 77 00 00 <..R<F0>w.. 0048: 00 00 00 00 00 00 00 00 ........ 0050: 00 00 00 00 .... Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-9-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-8-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Disable perf_event's callchain collectionHoward Chu
There is a check in evsel.c that does this: if (evsel__is_offcpu_event(evsel)) evsel->core.attr.sample_type &= OFFCPU_SAMPLE_TYPES; This along with: #define OFFCPU_SAMPLE_TYPES (PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_IP | \ PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \ PERF_SAMPLE_ID | PERF_SAMPLE_CPU | \ PERF_SAMPLE_PERIOD | PERF_SAMPLE_CALLCHAIN | \ PERF_SAMPLE_CGROUP) will tell perf_event to collect callchain. We don't need the callchain from perf_event when collecting off-cpu samples, because it's prev's callchain, not next's callchain. (perf_event) (task_storage) (needed) prev next | | ---sched_switch----> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-8-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-7-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf evsel: Assemble off-cpu samplesHoward Chu
Use the data in bpf-output samples, to assemble off-cpu samples. In evsel__is_offcpu_event(), check if sample_type is PERF_SAMPLE_RAW to support off-cpu sample data created by an older version of perf. Testing compatibility on off-cpu samples collected by perf before this patch series: See below, the sample_type still uses PERF_SAMPLE_CALLCHAIN $ perf script --header -i ./perf.data.ptn | grep "event : name = offcpu-time" # event : name = offcpu-time, , id = { 237917, 237918, 237919, 237920 }, type = 1 (software), size = 136, config = 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1 The output is correct. $ perf script -i ./perf.data.ptn | grep offcpu-time gmain 2173 [000] 18446744069.414584: 100102015 offcpu-time: NetworkManager 901 [000] 18446744069.414584: 5603579 offcpu-time: Web Content 1183550 [000] 18446744069.414584: 46278 offcpu-time: gnome-control-c 2200559 [000] 18446744069.414584: 11998247014 offcpu-time: <SNIP> $ And after this patch series: $ perf script --header -i ./perf.data.off-cpu-v9 | grep "event : name = offcpu-time" # event : name = offcpu-time, , id = { 237959, 237960, 237961, 237962 }, type = 1 (software), size = 136, config = 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1 $ ./perf script -i ./perf.data.off-cpu-v9 | grep offcpu-time gnome-shell 1875 [001] 4789616.361225: 100097057 offcpu-time: gnome-shell 1875 [001] 4789616.461419: 100107463 offcpu-time: firefox 2206821 [002] 4789616.475690: 255257245 offcpu-time: $ Committer testing: The command to record those samples: root@number:~# perf record --off-cpu -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.092 MB perf.data (1552 samples) ] root@number:~# Then, before this patch series, the sample_type for the "offcpu-time" event is: root@number:~# perf evlist -v | grep offcpu-time offcpu-time: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, freq: 1, sample_id_all: 1 root@number:~# And after it, after recording it again: root@number:~# perf record --off-cpu -a sleep 1 ; perf evlist -v | grep offcpu-time [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.151 MB perf.data (2843 samples) ] offcpu-time: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, sample_id_all: 1 root@number:~# Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-7-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-6-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Dump off-cpu samples in BPFHoward Chu
Collect tid, period, callchain, and cgroup id and dump them when off-cpu time threshold is reached. We don't collect the off-cpu time twice (the delta), it's either in direct samples, or accumulated samples that are dumped at the end of perf.data. Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-6-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-5-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Preparation of off-cpu BPF programHoward Chu
Set the perf_event map in BPF for dumping off-cpu samples, and set the offcpu_thresh to specify the threshold. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-5-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-4-howardchu95@gmail.com [ Added some missing iteration variables to off_cpu_config() and fixed up a manually edited patch hunk line boundary line ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Parse off-cpu eventHoward Chu
Parse the off-cpu event using parse_event(), as bpf-output. Call evlist__enable_evsel() on off-cpu event. This fixes the inability to collect direct off-cpu samples on a workload, as reported by Arnaldo Carvalho de Melo <acme@redhat.com>. The reason being, workload sets enable_on_exec instead of calling evlist__enable(), but off-cpu event does not attach to an executable and execve won't be called, so the fds from perf_event_open() are not enabled. no-inherit should be set to 1, here's the reason: We update the BPF perf_event map for direct off-cpu sample dumping (in following patches), it executes as follows: bpf_map_update_value() bpf_fd_array_map_update_elem() perf_event_fd_array_get_ptr() perf_event_read_local() In perf_event_read_local(), there is: int perf_event_read_local(struct perf_event *event, u64 *value, u64 *enabled, u64 *running) { ... /* * It must not be an event with inherit set, we cannot read * all child counters from atomic context. */ if (event->attr.inherit) { ret = -EOPNOTSUPP; goto out; } Which means no-inherit has to be true for updating the BPF perf_event map. Moreover, for bpf-output events, we primarily want a system-wide event instead of a per-task event. The reason is that in BPF's bpf_perf_event_output(), BPF uses the CPU index to retrieve the perf_event file descriptor it outputs to. Making a bpf-output event system-wide naturally satisfies this requirement by mapping CPU appropriately. Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-4-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-3-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf evsel: Expose evsel__is_offcpu_event() for future useHoward Chu
Expose evsel__is_offcpu_event() so it can be used in off_cpu_config(), evsel__parse_sample() and 'perf script'. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-3-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-2-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-04Merge tag 'perf-tools-fixes-for-v6.15-2025-05-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "Just a couple of build fixes on arm64" * tag 'perf-tools-fixes-for-v6.15-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf tools: Fix in-source libperf build perf tools: Fix arm64 build by generating unistd_64.h
2025-05-03Merge tag 'sound-6.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A bunch of small fixes. Mostly driver specific. - An OOB access fix in core UMP rawmidi conversion code - Fix for ASoC DAPM hw_params widget sequence - Make retry of usb_set_interface() errors for flaky devices - Fix redundant USB MIDI name strings - Quirks for various HP and ASUS models with HD-audio, and Jabra Evolve 65 USB-audio - Cirrus Kunit test fixes - Various fixes for ASoC Intel, stm32, renesas, imx-card, and simple-card" * tag 'sound-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (30 commits) ASoC: amd: ps: fix for irq handler return status ASoC: simple-card-utils: Fix pointer check in graph_util_parse_link_direction ASoC: intel/sdw_utils: Add volume limit to cs35l56 speakers ASoC: intel/sdw_utils: Add volume limit to cs42l43 speakers ASoC: stm32: sai: add a check on minimal kernel frequency ASoC: stm32: sai: skip useless iterations on kernel rate loop ALSA: hda/realtek - Add more HP laptops which need mute led fixup ALSA: hda/realtek: Fix built-mic regression on other ASUS models ASoC: Intel: catpt: avoid type mismatch in dev_dbg() format ALSA: usb-audio: Fix duplicated name in MIDI substream names ALSA: ump: Fix buffer overflow at UMP SysEx message conversion ALSA: usb-audio: Add second USB ID for Jabra Evolve 65 headset ALSA: hda/realtek: Add quirk for HP Spectre x360 15-df1xxx ALSA: hda: Apply volume control on speaker+lineout for HP EliteStudio AIO ASoC: Intel: bytcr_rt5640: Add DMI quirk for Acer Aspire SW3-013 ASoC: amd: acp: Fix devm_snd_soc_register_card(acp-pdm-mach) failure ASoC: amd: acp: Fix NULL pointer deref in acp_i2s_set_tdm_slot ASoC: amd: acp: Fix NULL pointer deref on acp resume path ASoC: renesas: rz-ssi: Use NOIRQ_SYSTEM_SLEEP_PM_OPS() ASoC: soc-acpi-intel-ptl-match: add empty item to ptl_cs42l43_l3[] ...
2025-05-02perf symbol-minimal: Fix double free in filename__read_build_idIan Rogers
Running the "perf script task-analyzer tests" with address sanitizer showed a double free: ``` FAIL: "test_csv_extended_times" Error message: "Failed to find required string:'Out-Out;'." ================================================================= ==19190==ERROR: AddressSanitizer: attempting double-free on 0x50b000017b10 in thread T0: #0 0x55da9601c78a in free (perf+0x26078a) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a) #1 0x55da96640c63 in filename__read_build_id tools/perf/util/symbol-minimal.c:221:2 0x50b000017b10 is located 0 bytes inside of 112-byte region [0x50b000017b10,0x50b000017b80) freed by thread T0 here: #0 0x55da9601ce40 in realloc (perf+0x260e40) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a) #1 0x55da96640ad6 in filename__read_build_id tools/perf/util/symbol-minimal.c:204:10 previously allocated by thread T0 here: #0 0x55da9601ca23 in malloc (perf+0x260a23) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a) #1 0x55da966407e7 in filename__read_build_id tools/perf/util/symbol-minimal.c:181:9 SUMMARY: AddressSanitizer: double-free (perf+0x26078a) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a) in free ==19190==ABORTING FAIL: "invocation of perf script report task-analyzer --csv-summary csvsummary --summary-extended command failed" Error message: "" FAIL: "test_csvsummary_extended" Error message: "Failed to find required string:'Out-Out;'." ---- end(-1) ---- 132: perf script task-analyzer tests : FAILED! ``` The buf_size if always set to phdr->p_filesz, but that may be 0 causing a free and realloc to return NULL. This is treated in filename__read_build_id like a failure and the buffer is freed again. To avoid this problem only grow buf, meaning the buf_size will never be 0. This also reduces the number of memory (re)allocations. Fixes: b691f64360ecec49 ("perf symbols: Implement poor man's ELF parser") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250501070003.22251-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf mem: Add 'dtlb' output fieldNamhyung Kim
This is a breakdown of perf_mem_data_src.mem_dtlb values. It assumes PMU drivers would set PERF_MEM_TLB_HIT bit with an appropriate level. And having PERF_MEM_TLB_MISS means that it failed to find one in any levels of TLB. For now, it doesn't use PERF_MEM_TLB_{WK,OS} bits. Also it seems Intel machines don't distinguish L1 or L2 precisely. So I added ANY_HIT (printed as "L?-Hit") to handle the case. $ perf mem report -F overhead,dtlb,dso --stdio ... # --- D-TLB ---- # Overhead L?-Hit Miss Shared Object # ........ .............. ................. # 67.03% 99.5% 0.5% [unknown] 31.23% 99.2% 0.8% [kernel.kallsyms] 1.08% 97.8% 2.2% [i915] 0.36% 100.0% 0.0% [JIT] tid 6853 0.12% 100.0% 0.0% [drm] 0.05% 100.0% 0.0% [drm_kms_helper] 0.05% 100.0% 0.0% [ext4] 0.02% 100.0% 0.0% [aesni_intel] 0.02% 100.0% 0.0% [crc32c_intel] 0.02% 100.0% 0.0% [dm_crypt] ... Committer testing: # perf report --header | grep cpudesc # cpudesc : AMD Ryzen 9 9950X3D 16-Core Processor # perf mem report -F overhead,dtlb,dso --stdio | head -20 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2K of event 'cycles:P' # Total weight : 2637 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc # # ---------- D-TLB ----------- # Overhead L1-Hit L2-Hit Miss Other Shared Object # ........ ............................ ................................. # 77.47% 18.4% 0.1% 0.6% 80.9% [kernel.kallsyms] 5.61% 36.5% 0.7% 1.4% 61.5% libxul.so 2.77% 39.7% 0.0% 12.3% 47.9% libc.so.6 2.01% 34.0% 1.9% 1.9% 62.3% libglib-2.0.so.0.8400.1 1.93% 31.4% 2.0% 2.0% 64.7% [amdgpu] 1.63% 48.8% 0.0% 0.0% 51.2% [JIT] tid 60168 1.14% 3.3% 0.0% 0.0% 96.7% [vdso] # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf mem: Add 'snoop' output fieldNamhyung Kim
This is a breakdown of perf_mem_data_src.mem_snoop values. For now, it doesn't use mem_snoopx values like FWD and PEER. $ perf mem report -F overhead,snoop,comm --stdio ... # ---------- Snoop ----------- # Overhead Hit HitM Miss Other Command # ........ ............................ ............... # 34.24% 0.6% 0.0% 0.0% 99.4% gnome-shell 12.02% 1.0% 0.0% 0.0% 99.0% chrome 9.32% 1.0% 0.0% 0.3% 98.7% Isolated Web Co 6.85% 1.0% 0.3% 0.0% 98.6% swapper 6.30% 0.8% 0.8% 0.0% 98.5% Xorg 3.02% 2.4% 0.0% 0.0% 97.6% VizCompositorTh 2.35% 0.0% 0.0% 0.0% 100.0% firefox-esr 2.04% 0.0% 0.0% 0.0% 100.0% JS Helper 1.51% 3.2% 0.0% 0.0% 96.8% threaded-ml 1.44% 0.0% 0.0% 0.0% 100.0% AudioIP~allback ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-11-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf mem: Add 'cache' and 'memory' output fieldsNamhyung Kim
This is a breakdown of perf_mem_data_src.mem_lvl_num. But it's also divided into two parts because the combination is bigger than 8. Since there are many entries for different cache levels, 'cache' field focuses on them. I generalized buffers like LFB, MAB and MHB to L1-buf and L2-buf. The rest goes to 'memory' field which can be RAM, CXL, PMEM, IO, etc. $ perf mem report -F cache,mem,dso --stdio ... # # -------------- Cache -------------- --- Memory --- # L1 L2 L3 L1-buf Other RAM Other Shared Object # ................................... .............. .................................... # 53.9% 3.6% 16.2% 21.6% 4.8% 4.8% 95.2% [kernel.kallsyms] 64.7% 1.7% 3.5% 17.4% 12.8% 12.8% 87.2% chrome (deleted) 78.3% 2.8% 0.0% 1.0% 17.9% 17.9% 82.1% libc.so.6 39.6% 1.5% 0.0% 5.7% 53.2% 53.2% 46.8% libxul.so 26.2% 0.0% 0.0% 0.0% 73.8% 73.8% 26.2% [unknown] 85.5% 0.0% 0.0% 14.5% 0.0% 0.0% 100.0% libspa-audioconvert.so 66.3% 4.4% 0.0% 29.4% 0.0% 0.0% 100.0% libglib-2.0.so.0.8200.1 (deleted) 1.9% 0.0% 0.0% 0.0% 98.1% 98.1% 1.9% libmutter-cogl-15.so.0.0.0 (deleted) 10.6% 0.0% 0.0% 89.4% 0.0% 0.0% 100.0% libpulsecommon-16.1.so 0.0% 0.0% 0.0% 100.0% 0.0% 0.0% 100.0% libfreeblpriv3.so (deleted) ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-10-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Hide unused mem stat columnsNamhyung Kim
Some mem_stat types don't use all 8 columns. And there are cases only samples in certain kinds of mem_stat types are available only. For that case hide columns which has no samples. The new output for the previous data would be: $ perf mem report -F overhead,op,comm --stdio ... # ------ Mem Op ------- # Overhead Load Store Other Command # ........ ..................... ............... # 44.85% 21.1% 30.7% 48.3% swapper 26.82% 98.8% 0.3% 0.9% netsli-prober 7.19% 51.7% 13.7% 34.6% perf 5.81% 89.7% 2.2% 8.1% qemu-system-ppc 4.77% 100.0% 0.0% 0.0% notifications_c 1.77% 95.9% 1.2% 3.0% MemoryReleaser 0.77% 71.6% 4.1% 24.3% DefaultEventMan 0.19% 66.7% 22.2% 11.1% gnome-shell ... On Intel machines, the event is only for loads or stores so it'll have only one column: # Mem Op # Overhead Load Command # ........ ....... ............... # 20.55% 100.0% swapper 17.13% 100.0% chrome 9.02% 100.0% data-loop.0 6.26% 100.0% pipewire-pulse 5.63% 100.0% threaded-ml 5.47% 100.0% GraphRunner 5.37% 100.0% AudioIP~allback 5.30% 100.0% Chrome_ChildIOT 3.17% 100.0% Isolated Web Co ... Committer testing: # grep "model name" -m1 /proc/cpuinfo model name : AMD Ryzen 9 9950X3D 16-Core Processo # perf mem report -F overhead,op,comm --stdio # Total Lost Samples: 0 # # Samples: 2K of event 'cycles:P' # Total weight : 2637 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc # # ------ Mem Op ------- # Overhead Load Store Other Command # ........ ..................... ............... # 61.02% 14.4% 25.5% 60.1% swapper 5.61% 26.4% 13.5% 60.1% Isolated Web Co 5.50% 21.4% 29.7% 49.0% perf 4.74% 27.2% 15.2% 57.6% gnome-shell 4.63% 33.6% 11.5% 54.9% mdns_service 4.29% 28.3% 12.4% 59.3% ptyxis 2.16% 24.6% 19.3% 56.1% DOM Worker 0.99% 23.1% 34.6% 42.3% firefox 0.72% 26.3% 15.8% 57.9% IPC I/O Parent 0.61% 12.5% 12.5% 75.0% kworker/u130:20 0.61% 37.5% 18.8% 43.8% podman 0.57% 33.3% 6.7% 60.0% Timer 0.53% 14.3% 7.1% 78.6% KMS thread 0.49% 30.8% 7.7% 61.5% kworker/u130:3- 0.46% 41.7% 33.3% 25.0% IPDL Background Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf mem: Add 'op' output fieldNamhyung Kim
This is an actual example of the he_mem_stat based sample breakdown. It uses 'mem_op' field of union perf_mem_data_src which means memory operations. It'd have basically 'load' or 'store' which can be useful if PMU doesn't have separate events for them like IBS or SPE. In addition, there's an entry in case load and store happen at the same time. Also adds entries for prefetching and execution. $ perf mem report -F +op -s comm --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 4K of event 'ibs_op//' # Total weight : 9559 # Sort order : comm # # --------------------- Mem Op ---------------------- # Overhead Samples Load Store Ld+St Pfetch Exec Other N/A N/A Command # ........ ....... ................................................... ............... # 44.85% 4077 21.1% 30.7% 0.0% 0.0% 0.0% 48.3% 0.0% 0.0% swapper 26.82% 45 98.8% 0.3% 0.0% 0.0% 0.0% 0.9% 0.0% 0.0% netsli-prober 7.19% 442 51.7% 13.7% 0.0% 0.0% 0.0% 34.6% 0.0% 0.0% perf 5.81% 75 89.7% 2.2% 0.0% 0.0% 0.0% 8.1% 0.0% 0.0% qemu-system-ppc 4.77% 1 100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% notifications_c 1.77% 10 95.9% 1.2% 0.0% 0.0% 0.0% 3.0% 0.0% 0.0% MemoryReleaser 0.77% 32 71.6% 4.1% 0.0% 0.0% 0.0% 24.3% 0.0% 0.0% DefaultEventMan 0.19% 10 66.7% 22.2% 0.0% 0.0% 0.0% 11.1% 0.0% 0.0% gnome-shell Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Implement output fields for mem statsNamhyung Kim
This is a preparation for later changes to support mem_stat output. The new fields will need two lines for the header - the first line will show type of mem stat and the second line will show the name of each item which is returned by mem_stat_name(). Each element in the mem_stat array will be printed in percentage for the hist_entry and their sum would be 100%. Add new output field dimension only for SORT_MODE__MEM using mem_stat. To handle possible name conflict with existing sort keys, move the order of checking output field dimensions after the sort dimensions when it looks for sort keys. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Basic support for mem_stat accountingNamhyung Kim
Add a logic to account he->mem_stat based on mem_stat_type in hists. Each mem_stat entry will have different meaning based on the type so the index in the array is calculated at runtime using the corresponding value in the sample.data_src. Still hists has no mem_stat_types yet so this code won't work for now. Later hists->mem_stat_types will be allocated based on what users want in the output actually. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Add struct he_mem_statNamhyung Kim
The 'struct he_mem_stat' is to save detailed information about memory instruction. It'll be used to show breakdown of various data from PERF_SAMPLE_DATA_SRC. Note that this structure is generic and the contents will be different depending on actual data it'll use later. The information about the actual data will be saved in 'struct hists' and its length is in nr_mem_stats. This commit just adds ground works and does nothing since hists->nr_mem_stats is 0 for now. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Support multi-line headerNamhyung Kim
This is a preparation to support multi-line headers in 'perf mem report'. Normal sort keys and output fields that don't have contents for multi- line will print the header string at the last line only. As we don't use multi-line headers normally, it should not have any changes in the output. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf record: Add --sample-mem-info optionNamhyung Kim
There's no way to enable PERF_SAMPLE_DATA_SRC without PERF_SAMPLE_ADDR which brings a lot of overhead due to the number of MMAP[2] records. Let's add a new option to enable this information separately. Committer testing: # perf record -a --sample-mem-info ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.815 MB perf.data (2637 samples) ] # # perf evlist -v cycles:P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER|DATA_SRC, read_format: ID|LOST, disabled: 1, freq: 1, precise_ip: 2, sample_id_all: 1 dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|IDENTIFIER|DATA_SRC, read_format: ID|LOST, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # # perf report -D |& grep -w PERF_RECORD_SAMPLE -A3 -m1 0 44675164447282 0x1a7590 [0x40]: PERF_RECORD_SAMPLE(IP, 0x4001): 107299/107299: 0xffffffffac4a5e11 period: 144 addr: 0 . data_src: 0x229080142 ... thread: perf:107299 ...... dso: /lib/modules/6.15.0-rc4+/build/vmlinux # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Remove output field from sort-list properlyNamhyung Kim
When it removes an output format for cancelled children or latency, it should delete itself from the sort list as well. Otherwise assertion in fmt_free() will fire. $ perf report -H --stdio perf: ui/hist.c:603: fmt_free: Assertion `!(!list_empty(&fmt->sort_list))' failed. Aborted (core dumped) Also convert to perf_hpp__column_unregister() for the same open codes. Committer notes: Before this patch: # perf test hierarchy 83: perf report --hierarchy : FAILED! # perf test -v hierarchy --- start --- test child forked, pid 102242 perf report --hierarchy Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.025 MB /tmp/perf-test-report.HX0N85TlPq/perf-report-hierarchy-perf.data (6 samples) ] perf: ui/hist.c:603: fmt_free: Assertion `!(!list_empty(&fmt->sort_list))' failed. /home/acme/libexec/perf-core/tests/shell/perf-report-hierarchy.sh: line 34: 102250 Aborted (core dumped) perf report --hierarchy > /dev/null --- Cleaning up --- ---- end(-1) ---- 83: perf report --hierarchy : FAILED! # After: # perf test hierarchy 83: perf report --hierarchy : Ok # Fixes: dbd11b6bdab12f60 ("perf hist: Remove formats in hierarchy when cancel children") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250430180321.736939-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf test perf-report-hierarchy: Add new testArnaldo Carvalho de Melo
Super simple test to check that at least we're not segfaulting when trying to use 'perf report --hierarchy', more subtests should be added to make sure the output is the expected one. This is being merged right before a fix for that that this test detects: # perf test hierarchy 83: perf report --hierarchy : FAILED! # perf test -v hierarchy --- start --- test child forked, pid 102242 perf report --hierarchy Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.025 MB /tmp/perf-test-report.HX0N85TlPq/perf-report-hierarchy-perf.data (6 samples) ] perf: ui/hist.c:603: fmt_free: Assertion `!(!list_empty(&fmt->sort_list))' failed. /home/acme/libexec/perf-core/tests/shell/perf-report-hierarchy.sh: line 34: 102250 Aborted (core dumped) perf report --hierarchy > /dev/null --- Cleaning up --- ---- end(-1) ---- 83: perf report --hierarchy : FAILED! # Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/20250430180321.736939-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02Merge tag 'block-6.15-20250502' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - fix queue unquiesce check on PCI slot_reset (Keith Busch) - fix premature queue removal and I/O failover in nvme-tcp (Michael Liang) - don't restore null sk_state_change (Alistair Francis) - select CONFIG_TLS where needed (Alistair Francis) - always free derived key data (Hannes Reinecke) - more quirks (Wentao Guan) - ublk zero copy fix - ublk selftest fix for UBLK_F_NEED_GET_DATA * tag 'block-6.15-20250502' of git://git.kernel.dk/linux: nvmet-auth: always free derived key data nvmet-tcp: don't restore null sk_state_change nvmet-tcp: select CONFIG_TLS from CONFIG_NVME_TARGET_TCP_TLS nvme-tcp: select CONFIG_TLS from CONFIG_NVME_TCP_TLS nvme-tcp: fix premature queue removal and I/O failover nvme-pci: add quirks for WDC Blue SN550 15b7:5009 nvme-pci: add quirks for device 126f:1001 nvme-pci: fix queue unquiesce check on slot_reset ublk: remove the check of ublk_need_req_ref() from __ublk_check_and_get_req ublk: enhance check for register/unregister io buffer command ublk: decouple zero copy from user copy selftests: ublk: fix UBLK_F_NEED_GET_DATA
2025-05-01Merge tag 'net-6.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Happy May Day. Things have calmed down on our end (knock on wood), no outstanding investigations. Including fixes from Bluetooth and WiFi. Current release - fix to a fix: - igc: fix lock order in igc_ptp_reset Current release - new code bugs: - Revert "wifi: iwlwifi: make no_160 more generic", fixes regression to Killer line of devices reported by a number of people - Revert "wifi: iwlwifi: add support for BE213", initial FW is too buggy - number of fixes for mld, the new Intel WiFi subdriver Previous releases - regressions: - wifi: mac80211: restore monitor for outgoing frames - drv: vmxnet3: fix malformed packet sizing in vmxnet3_process_xdp - eth: bnxt_en: fix timestamping FIFO getting out of sync on reset, delivering stale timestamps - use sock_gen_put() in the TCP fraglist GRO heuristic, don't assume every socket is a full socket Previous releases - always broken: - sched: adapt qdiscs for reentrant enqueue cases, fix list corruptions - xsk: fix race condition in AF_XDP generic RX path, shared UMEM can't be protected by a per-socket lock - eth: mtk-star-emac: fix spinlock recursion issues on rx/tx poll - btusb: avoid NULL pointer dereference in skb_dequeue() - dsa: felix: fix broken taprio gate states after clock jump" * tag 'net-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits) net: vertexcom: mse102x: Fix RX error handling net: vertexcom: mse102x: Add range check for CMD_RTS net: vertexcom: mse102x: Fix LEN_MASK net: vertexcom: mse102x: Fix possible stuck of SPI interrupt net: hns3: defer calling ptp_clock_register() net: hns3: fixed debugfs tm_qset size net: hns3: fix an interrupt residual problem net: hns3: store rx VLAN tag offload state for VF octeon_ep: Fix host hang issue during device reboot net: fec: ERR007885 Workaround for conventional TX net: lan743x: Fix memleak issue when GSO enabled ptp: ocp: Fix NULL dereference in Adva board SMA sysfs operations net: use sock_gen_put() when sk_state is TCP_TIME_WAIT bnxt_en: fix module unload sequence bnxt_en: Fix ethtool -d byte order for 32-bit values bnxt_en: Fix out-of-bound memcpy() during ethtool -w bnxt_en: Fix coredump logic to free allocated buffer bnxt_en: delay pci_alloc_irq_vectors() in the AER path bnxt_en: call pci_alloc_irq_vectors() after bnxt_reserve_rings() bnxt_en: Add missing skb_mark_for_recycle() in bnxt_rx_vlan() ...
2025-05-01ASoC: stm32: sai: fix kernel rate configurationMark Brown
Merge series from Olivier Moysan <olivier.moysan@foss.st.com>: This patchset adds some checks on kernel minimum rate requirements. This avoids potential clock rate misconfiguration, when setting the kernel frequency on STM32MP2 SoCs.
2025-04-29perf test amd ibs: Add sample period unit testRavi Bangoria
IBS Fetch and IBS Op PMUs has various constraints on supported sample periods. Add perf unit tests to test those. Running it in parallel with other tests causes intermittent failures. Mark it exclusive to force it to run sequentially. Sample output on a Zen5 machine: Without kernel fixes: $ sudo ./perf test -vv 112 112: AMD IBS sample period: --- start --- test child forked, pid 8774 Using CPUID AuthenticAMD-26-2-1 IBS config tests: ----------------- Fetch PMU tests: 0xffff : Ok (nr samples: 1078) 0x1000 : Ok (nr samples: 17030) 0xff : Ok (nr samples: 41068) 0x1 : Ok (nr samples: 40543) 0x0 : Ok 0x10000 : Ok Op PMU tests: 0x0 : Ok 0x1 : Fail 0x8 : Fail 0x9 : Ok (nr samples: 40543) 0xf : Ok (nr samples: 40543) 0x1000 : Ok (nr samples: 18736) 0xffff : Ok (nr samples: 1168) 0x10000 : Ok 0x100000 : Fail (nr samples: 14) 0xf00000 : Fail (nr samples: 1) 0xf0ffff : Fail (nr samples: 1) 0x1f0ffff : Fail (nr samples: 1) 0x7f0ffff : Fail (nr samples: 0) 0x8f0ffff : Ok 0x17f0ffff : Ok IBS sample period constraint tests: ----------------------------------- Fetch PMU test: freq 0, sample_freq 0: Ok freq 0, sample_freq 1: Fail freq 0, sample_freq 15: Fail freq 0, sample_freq 16: Ok (nr samples: 1604) freq 0, sample_freq 17: Ok (nr samples: 1604) freq 0, sample_freq 143: Ok (nr samples: 1604) freq 0, sample_freq 144: Ok (nr samples: 1604) freq 0, sample_freq 145: Ok (nr samples: 1604) freq 0, sample_freq 1234: Ok (nr samples: 1566) freq 0, sample_freq 4103: Ok (nr samples: 1119) freq 0, sample_freq 65520: Ok (nr samples: 2264) freq 0, sample_freq 65535: Ok (nr samples: 2263) freq 0, sample_freq 65552: Ok (nr samples: 1166) freq 0, sample_freq 8388607: Ok (nr samples: 268) freq 0, sample_freq 268435455: Ok (nr samples: 8) freq 1, sample_freq 0: Ok freq 1, sample_freq 1: Ok (nr samples: 4) freq 1, sample_freq 15: Ok (nr samples: 4) freq 1, sample_freq 16: Ok (nr samples: 4) freq 1, sample_freq 17: Ok (nr samples: 4) freq 1, sample_freq 143: Ok (nr samples: 5) freq 1, sample_freq 144: Ok (nr samples: 5) freq 1, sample_freq 145: Ok (nr samples: 5) freq 1, sample_freq 1234: Ok (nr samples: 7) freq 1, sample_freq 4103: Ok (nr samples: 35) freq 1, sample_freq 65520: Ok (nr samples: 642) freq 1, sample_freq 65535: Ok (nr samples: 636) freq 1, sample_freq 65552: Ok (nr samples: 651) freq 1, sample_freq 8388607: Ok Op PMU test: freq 0, sample_freq 0: Ok freq 0, sample_freq 1: Fail freq 0, sample_freq 15: Fail freq 0, sample_freq 16: Fail freq 0, sample_freq 17: Fail freq 0, sample_freq 143: Fail freq 0, sample_freq 144: Ok (nr samples: 1604) freq 0, sample_freq 145: Ok (nr samples: 1604) freq 0, sample_freq 1234: Ok (nr samples: 1604) freq 0, sample_freq 4103: Ok (nr samples: 1604) freq 0, sample_freq 65520: Ok (nr samples: 2227) freq 0, sample_freq 65535: Ok (nr samples: 2296) freq 0, sample_freq 65552: Ok (nr samples: 2213) freq 0, sample_freq 8388607: Ok (nr samples: 250) freq 0, sample_freq 268435455: Ok (nr samples: 8) freq 1, sample_freq 0: Ok freq 1, sample_freq 1: Fail (nr samples: 4) freq 1, sample_freq 15: Fail (nr samples: 4) freq 1, sample_freq 16: Fail (nr samples: 4) freq 1, sample_freq 17: Fail (nr samples: 4) freq 1, sample_freq 143: Fail (nr samples: 5) freq 1, sample_freq 144: Fail (nr samples: 5) freq 1, sample_freq 145: Fail (nr samples: 5) freq 1, sample_freq 1234: Fail (nr samples: 8) freq 1, sample_freq 4103: Fail (nr samples: 33) freq 1, sample_freq 65520: Fail (nr samples: 546) freq 1, sample_freq 65535: Fail (nr samples: 544) freq 1, sample_freq 65552: Fail (nr samples: 555) freq 1, sample_freq 8388607: Ok IBS ioctl() tests: ------------------ Fetch PMU tests ioctl(period = 0x0 ): Ok ioctl(period = 0x1 ): Fail ioctl(period = 0xf ): Fail ioctl(period = 0x10 ): Ok ioctl(period = 0x11 ): Fail ioctl(period = 0x1f ): Fail ioctl(period = 0x20 ): Ok ioctl(period = 0x80 ): Ok ioctl(period = 0x8f ): Fail ioctl(period = 0x90 ): Ok ioctl(period = 0x91 ): Fail ioctl(period = 0x100 ): Ok ioctl(period = 0xfff0 ): Ok ioctl(period = 0xffff ): Fail ioctl(period = 0x10000 ): Ok ioctl(period = 0x1fff0 ): Ok ioctl(period = 0x1fff5 ): Fail ioctl(freq = 0x0 ): Ok ioctl(freq = 0x1 ): Ok ioctl(freq = 0xf ): Ok ioctl(freq = 0x10 ): Ok ioctl(freq = 0x11 ): Ok ioctl(freq = 0x1f ): Ok ioctl(freq = 0x20 ): Ok ioctl(freq = 0x80 ): Ok ioctl(freq = 0x8f ): Ok ioctl(freq = 0x90 ): Ok ioctl(freq = 0x91 ): Ok ioctl(freq = 0x100 ): Ok Op PMU tests ioctl(period = 0x0 ): Ok ioctl(period = 0x1 ): Fail ioctl(period = 0xf ): Fail ioctl(period = 0x10 ): Fail ioctl(period = 0x11 ): Fail ioctl(period = 0x1f ): Fail ioctl(period = 0x20 ): Fail ioctl(period = 0x80 ): Fail ioctl(period = 0x8f ): Fail ioctl(period = 0x90 ): Ok ioctl(period = 0x91 ): Fail ioctl(period = 0x100 ): Ok ioctl(period = 0xfff0 ): Ok ioctl(period = 0xffff ): Fail ioctl(period = 0x10000 ): Ok ioctl(period = 0x1fff0 ): Ok ioctl(period = 0x1fff5 ): Fail ioctl(freq = 0x0 ): Ok ioctl(freq = 0x1 ): Ok ioctl(freq = 0xf ): Ok ioctl(freq = 0x10 ): Ok ioctl(freq = 0x11 ): Ok ioctl(freq = 0x1f ): Ok ioctl(freq = 0x20 ): Ok ioctl(freq = 0x80 ): Ok ioctl(freq = 0x8f ): Ok ioctl(freq = 0x90 ): Ok ioctl(freq = 0x91 ): Ok ioctl(freq = 0x100 ): Ok IBS freq (negative) tests: -------------------------- freq 1, sample_freq 200000: Fail IBS L3MissOnly test: (takes a while) -------------------- Fetch L3MissOnly: Fail (nr_samples: 1213) Op L3MissOnly: Ok (nr_samples: 1193) ---- end(-1) ---- 112: AMD IBS sample period : FAILED! With kernel fixes: $ sudo ./perf test -vv 112 112: AMD IBS sample period: --- start --- test child forked, pid 6939 Using CPUID AuthenticAMD-26-2-1 IBS config tests: ----------------- Fetch PMU tests: 0xffff : Ok (nr samples: 969) 0x1000 : Ok (nr samples: 15540) 0xff : Ok (nr samples: 40555) 0x1 : Ok (nr samples: 40543) 0x0 : Ok 0x10000 : Ok Op PMU tests: 0x0 : Ok 0x1 : Ok 0x8 : Ok 0x9 : Ok (nr samples: 40543) 0xf : Ok (nr samples: 40543) 0x1000 : Ok (nr samples: 19156) 0xffff : Ok (nr samples: 1169) 0x10000 : Ok 0x100000 : Ok (nr samples: 1151) 0xf00000 : Ok (nr samples: 76) 0xf0ffff : Ok (nr samples: 73) 0x1f0ffff : Ok (nr samples: 33) 0x7f0ffff : Ok (nr samples: 10) 0x8f0ffff : Ok 0x17f0ffff : Ok IBS sample period constraint tests: ----------------------------------- Fetch PMU test: freq 0, sample_freq 0: Ok freq 0, sample_freq 1: Ok freq 0, sample_freq 15: Ok freq 0, sample_freq 16: Ok (nr samples: 1203) freq 0, sample_freq 17: Ok (nr samples: 1604) freq 0, sample_freq 143: Ok (nr samples: 1604) freq 0, sample_freq 144: Ok (nr samples: 1604) freq 0, sample_freq 145: Ok (nr samples: 1604) freq 0, sample_freq 1234: Ok (nr samples: 1604) freq 0, sample_freq 4103: Ok (nr samples: 1343) freq 0, sample_freq 65520: Ok (nr samples: 2254) freq 0, sample_freq 65535: Ok (nr samples: 2136) freq 0, sample_freq 65552: Ok (nr samples: 1158) freq 0, sample_freq 8388607: Ok (nr samples: 257) freq 0, sample_freq 268435455: Ok (nr samples: 8) freq 1, sample_freq 0: Ok freq 1, sample_freq 1: Ok (nr samples: 4) freq 1, sample_freq 15: Ok (nr samples: 4) freq 1, sample_freq 16: Ok (nr samples: 4) freq 1, sample_freq 17: Ok (nr samples: 4) freq 1, sample_freq 143: Ok (nr samples: 5) freq 1, sample_freq 144: Ok (nr samples: 5) freq 1, sample_freq 145: Ok (nr samples: 5) freq 1, sample_freq 1234: Ok (nr samples: 8) freq 1, sample_freq 4103: Ok (nr samples: 34) freq 1, sample_freq 65520: Ok (nr samples: 458) freq 1, sample_freq 65535: Ok (nr samples: 628) freq 1, sample_freq 65552: Ok (nr samples: 396) freq 1, sample_freq 8388607: Ok Op PMU test: freq 0, sample_freq 0: Ok freq 0, sample_freq 1: Ok freq 0, sample_freq 15: Ok freq 0, sample_freq 16: Ok freq 0, sample_freq 17: Ok freq 0, sample_freq 143: Ok freq 0, sample_freq 144: Ok (nr samples: 1604) freq 0, sample_freq 145: Ok (nr samples: 1604) freq 0, sample_freq 1234: Ok (nr samples: 1604) freq 0, sample_freq 4103: Ok (nr samples: 1604) freq 0, sample_freq 65520: Ok (nr samples: 2250) freq 0, sample_freq 65535: Ok (nr samples: 2158) freq 0, sample_freq 65552: Ok (nr samples: 2296) freq 0, sample_freq 8388607: Ok (nr samples: 243) freq 0, sample_freq 268435455: Ok (nr samples: 6) freq 1, sample_freq 0: Ok freq 1, sample_freq 1: Ok (nr samples: 4) freq 1, sample_freq 15: Ok (nr samples: 4) freq 1, sample_freq 16: Ok (nr samples: 4) freq 1, sample_freq 17: Ok (nr samples: 4) freq 1, sample_freq 143: Ok (nr samples: 4) freq 1, sample_freq 144: Ok (nr samples: 5) freq 1, sample_freq 145: Ok (nr samples: 4) freq 1, sample_freq 1234: Ok (nr samples: 6) freq 1, sample_freq 4103: Ok (nr samples: 27) freq 1, sample_freq 65520: Ok (nr samples: 542) freq 1, sample_freq 65535: Ok (nr samples: 550) freq 1, sample_freq 65552: Ok (nr samples: 552) freq 1, sample_freq 8388607: Ok IBS ioctl() tests: ------------------ Fetch PMU tests ioctl(period = 0x0 ): Ok ioctl(period = 0x1 ): Ok ioctl(period = 0xf ): Ok ioctl(period = 0x10 ): Ok ioctl(period = 0x11 ): Ok ioctl(period = 0x1f ): Ok ioctl(period = 0x20 ): Ok ioctl(period = 0x80 ): Ok ioctl(period = 0x8f ): Ok ioctl(period = 0x90 ): Ok ioctl(period = 0x91 ): Ok ioctl(period = 0x100 ): Ok ioctl(period = 0xfff0 ): Ok ioctl(period = 0xffff ): Ok ioctl(period = 0x10000 ): Ok ioctl(period = 0x1fff0 ): Ok ioctl(period = 0x1fff5 ): Ok ioctl(freq = 0x0 ): Ok ioctl(freq = 0x1 ): Ok ioctl(freq = 0xf ): Ok ioctl(freq = 0x10 ): Ok ioctl(freq = 0x11 ): Ok ioctl(freq = 0x1f ): Ok ioctl(freq = 0x20 ): Ok ioctl(freq = 0x80 ): Ok ioctl(freq = 0x8f ): Ok ioctl(freq = 0x90 ): Ok ioctl(freq = 0x91 ): Ok ioctl(freq = 0x100 ): Ok Op PMU tests ioctl(period = 0x0 ): Ok ioctl(period = 0x1 ): Ok ioctl(period = 0xf ): Ok ioctl(period = 0x10 ): Ok ioctl(period = 0x11 ): Ok ioctl(period = 0x1f ): Ok ioctl(period = 0x20 ): Ok ioctl(period = 0x80 ): Ok ioctl(period = 0x8f ): Ok ioctl(period = 0x90 ): Ok ioctl(period = 0x91 ): Ok ioctl(period = 0x100 ): Ok ioctl(period = 0xfff0 ): Ok ioctl(period = 0xffff ): Ok ioctl(period = 0x10000 ): Ok ioctl(period = 0x1fff0 ): Ok ioctl(period = 0x1fff5 ): Ok ioctl(freq = 0x0 ): Ok ioctl(freq = 0x1 ): Ok ioctl(freq = 0xf ): Ok ioctl(freq = 0x10 ): Ok ioctl(freq = 0x11 ): Ok ioctl(freq = 0x1f ): Ok ioctl(freq = 0x20 ): Ok ioctl(freq = 0x80 ): Ok ioctl(freq = 0x8f ): Ok ioctl(freq = 0x90 ): Ok ioctl(freq = 0x91 ): Ok ioctl(freq = 0x100 ): Ok IBS freq (negative) tests: -------------------------- freq 1, sample_freq 200000: Ok IBS L3MissOnly test: (takes a while) -------------------- Fetch L3MissOnly: Ok (nr_samples: 1301) Op L3MissOnly: Ok (nr_samples: 1590) ---- end(0) ---- 112: AMD IBS sample period : Ok Committer notes: Avoid using PAGE_SIZE as that define is also in sys/user.h Make it a variable not to call sysconf() multiple times. Also cast func to void * when passing it as the first arg to memcpy to avoid this with some versions of clang: arch/x86/tests/amd-ibs-period.c:81:3: error: no matching function for call to 'memcpy' memcpy(func, insn1, sizeof(insn1)); ^~~~~~ /usr/include/string.h:27:7: note: candidate function not viable: no known conversion from 'int (*)(void)' to 'void *' for 1st argument void *memcpy (void *__restrict, const void *__restrict, size_t); ^ /usr/include/fortify/string.h:40:27: note: candidate function not viable: no known conversion from 'int (*)(void)' to 'void *const' for 1st argument _FORTIFY_FN(memcpy) void *memcpy(void * _FORTIFY_POS0 __od, ^ arch/x86/tests/amd-ibs-period.c:87:3: error: no matching function for call to 'memcpy' This one, for instance: Alpine clang version 19.1.4 Target: x86_64-alpine-linux-musl Thread model: posix InstalledDir: /usr/lib/llvm19/bin Configuration file: /etc/clang19/x86_64-alpine-linux-musl.cfg System configuration file directory: /etc/clang19 Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20250429035938.1301-5-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-29perf mem/c2c amd: Add ldlat supportRavi Bangoria
'perf mem/c2c' uses IBS Op PMU on AMD platforms. IBS Op PMU on Zen5 uarch has added support for Load Latency filtering. Implement 'perf mem/c2c' --ldlat using IBS Op Load Latency filtering capability. Some subtle differences between AMD and other arch: o --ldlat is disabled by default on AMD o Supported values are 128 to 2048. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20250429035938.1301-4-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-29perf amd ibs: Incorporate Zen5 DTLB and PageSize informationRavi Bangoria
IBS Op PMU on Zen5 reports DTLB and page size information differently compared to prior generation. IBS_OP_DATA3 Zen3/4 Zen5 ---------------------------------------------------------------- 19 IbsDcL2TlbHit1G Reserved ---------------------------------------------------------------- 6 IbsDcL2tlbHit2M Reserved ---------------------------------------------------------------- 5 IbsDcL1TlbHit1G PageSize: 4 IbsDcL1TlbHit2M 0 - 4K 1 - 2M 2 - 1G 3 - Reserved Valid only if IbsDcPhyAddrValid = 1 ---------------------------------------------------------------- 3 IbsDcL2TlbMiss IbsDcL2TlbMiss Valid only if IbsDcPhyAddrValid = 1 ---------------------------------------------------------------- 2 IbsDcL1tlbMiss IbsDcL1tlbMiss Valid only if IbsDcPhyAddrValid = 1 ---------------------------------------------------------------- Kernel expose this change as "dtlb_pgsize" capability in PMU sysfs. Change IBS register raw-dump logic according to new bit definitions. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20250429035938.1301-3-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-29perf amd ibs: Add Load Latency bits in raw dumpRavi Bangoria
IBS OP PMU on Zen5 supports Load Latency filtering. Decode and dump Load Latency filtering related bits into perf script raw dump. Also add oneliner example in the perf-amd-ibs man page. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20250429035938.1301-2-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-29perf symbols: Handle 'u' and 'l' symbols in /proc/kallsymsArnaldo Carvalho de Melo
I started seeing this in recent Fedora 42 kernels: # uname -a Linux number 6.14.3-300.fc42.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Apr 20 16:08:39 UTC 2025 x86_64 GNU/Linux # # perf test vmlinux 1: vmlinux symtab matches kallsyms : FAILED! # Where we have Rust enabled: # grep CONFIG_RUST /boot/config-6.14.3-300.fc42.x86_64 CONFIG_RUSTC_VERSION=108600 CONFIG_RUST_IS_AVAILABLE=y CONFIG_RUSTC_LLVM_VERSION=200101 CONFIG_RUSTC_HAS_COERCE_POINTEE=y CONFIG_RUST=y CONFIG_RUSTC_VERSION_TEXT="rustc 1.86.0 (05f9846f8 2025-03-31) (Fedora 1.86.0-1.fc42)" CONFIG_RUST_FW_LOADER_ABSTRACTIONS=y CONFIG_RUST_PHYLIB_ABSTRACTIONS=y # CONFIG_RUST_DEBUG_ASSERTIONS is not set CONFIG_RUST_OVERFLOW_CHECKS=y # CONFIG_RUST_BUILD_ASSERT_ALLOW is not set # Looking at the reason for the failure: # perf test -v vmlinux |& grep ^ERR ERR : 0xffffffff99efc7d0: __pfx__RNCINvNtNtNtCsf5tcb0XGUW4_4core4iter8adapters3map12map_try_foldjNtCsagR6JbSOIa9_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ not on kallsyms ERR : 0xffffffff99efc7e0: _RNCINvNtNtNtCsf5tcb0XGUW4_4core4iter8adapters3map12map_try_foldjNtCsagR6JbSOIa9_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ not on kallsyms # But: # grep -w u /proc/kallsyms ffffffff99efc7d0 u __pfx__RNCINvNtNtNtCsf5tcb0XGUW4_4core4iter8adapters3map12map_try_foldjNtCsagR6JbSOIa9_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ ffffffff99efc7e0 u _RNCINvNtNtNtCsf5tcb0XGUW4_4core4iter8adapters3map12map_try_foldjNtCsagR6JbSOIa9_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ # The test checks that "vmlinux symtab matches kallsyms", so it finds those two symbols in vmlinux: # pahole --running_kernel_vmlinux /usr/lib/debug/lib/modules/6.14.3-300.fc42.x86_64/vmlinux # # readelf -sW /usr/lib/debug/lib/modules/6.14.3-300.fc42.x86_64/vmlinux | grep -Ew '(__pfx__RNCINvNtNtNtCsf5tcb0XGUW4_4core4iter8adapters3map12map_try_foldjNtCsagR6JbSOIa9_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_|_RNCINvNtNtNtCsf5tcb0XGUW4_4core4iter8adapters3map12map_try_foldjNtCsagR6JbSOIa9_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_)' 81844: ffffffff81efc7e0 524 FUNC LOCAL DEFAULT 1 _RNCINvNtNtNtCsf5tcb0XGUW4_4core4iter8adapters3map12map_try_foldjNtCsagR6JbSOIa9_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ 144259: ffffffff81efc7d0 16 FUNC LOCAL DEFAULT 1 __pfx__RNCINvNtNtNtCsf5tcb0XGUW4_4core4iter8adapters3map12map_try_foldjNtCsagR6JbSOIa9_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ # It is there. From the nm documentation we can see that: "U" The symbol is undefined. "u" The symbol is a unique global symbol. This is a GNU extension to the standard set of ELF symbol bindings. For such a symbol the dynamic linker will make sure that in the entire process there is just one symbol with this name and type in use. So lets consider 'u' symbols in /proc/kallsyms when loading it to cover this case. Fedora:40 shows this as a 'l' symbol, so consider that as well. With this patch 'perf test 1' is happy again: # perf test vmlinux 1: vmlinux symtab matches kallsyms : Ok # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/aBE_n0PGl3g6h-cS@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-29selftests: net: tc_taprio: new testVladimir Oltean
Add a forwarding path test for tc-taprio, based on isochron. This is specifically intended for NICs with an offloaded data path (switchdev/DSA) and requires taprio 'flags 2'. Also, $h1 and $h2 must support hardware timestamping, and $h1 tc-etf offload, for isochron to work. Packets received by a switch while the egress port has a taprio schedule with an open gate for the traffic class must be sent right away. Packets received by the switch while the traffic class gate must be delayed until it opens. Packets received by the switch must be dropped if the gate for the traffic class never opens. Packets should pass if the maximum SDU for the traffic class allows it, and should be dropped otherwise. The schedule should auto-update itself if clock jumps take place while taprio is installed. Repeat most of the above tests after forcing two clock jumps, one backwards (in Jan 1970) and one back into the present. Symlink it from tools/testing/selftests/drivers/net/dsa, because usually DSA ports have the same MAC address, and we need STABLE_MAC_ADDRS=yes from its forwarding.config for the test to run successfully. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250426144859.3128352-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>