summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2023-09-26perf pmu: Fix perf stat output with correct scale and unitWyes Karny
The perf_pmu__parse_* functions for the sysfs files of pmu event’s scale, unit, per-pkg and snapshot were updated in commit 7b723dbb96e8 ("perf pmu: Be lazy about loading event info files from sysfs"). However, the paths for these sysfs files were incorrect. This resulted in perf stat reporting values with wrong scaling and missing units. This is fixed by correcting the paths for these sysfs files. Before this fix: $sudo perf stat -e power/energy-pkg/ -- sleep 2 Performance counter stats for 'system wide': 351,217,188,864 power/energy-pkg/ 2.004127961 seconds time elapsed After this fix: $sudo perf stat -e power/energy-pkg/ -- sleep 2 Performance counter stats for 'system wide': 80.58 Joules power/energy-pkg/ 2.004009749 seconds time elapsed Fixes: 7b723dbb96e8 ("perf pmu: Be lazy about loading event info files from sysfs") Signed-off-by: Wyes Karny <wyes.karny@amd.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: ravi.bangoria@amd.com Cc: sandipan.das@amd.com Cc: james.clark@arm.com Cc: kan.liang@linux.intel.com Link: https://lore.kernel.org/r/20230920122349.418673-1-wyes.karny@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-21perf test lock_contention.sh: Skip test if not enough CPUsVeronika Molnarova
Machines with less then 4 CPUs weren't consistently triggering lock events required for the test. Skip the test on those machines. The limit of 4 CPUs is set as it generates around 100 lock events for a test. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Acked-by: Michael Petlan <mpetlan@redhat.com> Link: https://lore.kernel.org/r/20230919150419.23193-2-vmolnaro@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-21perf test stat+shadow_stat.sh: Add threshold for rounding errorsVeronika Molnarova
The test was failing in specific scenarios due to imperfection of FP arithmetics. The `bc` command wasn't correctly rounding the result of division causing the failure. Replace the `bc` with `awk` which should work with more decimal places and add a threshold to catch any possible rounding errors. The acceptable rounding error is set to 0.01 when the test passes with a warning message. Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Acked-by: Michael Petlan <mpetlan@redhat.com> Link: https://lore.kernel.org/r/20230919150419.23193-1-vmolnaro@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-19perf jevents: fix no member named 'entries' issueXu Yang
The struct "pmu_events_table" has been changed after commit 2e255b4f9f41 (perf jevents: Group events by PMU, 2023-08-23). So there doesn't exist 'entries' in pmu_events_table anymore. This will align the members with that commit. Othewise, below errors will be printed when run jevent.py: pmu-events/pmu-events.c:5485:26: error: ‘struct pmu_metrics_table’ has no member named ‘entries’ 5485 | .entries = pmu_metrics__freescale_imx8dxl_sys, Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20230919080929.3807123-1-xu.yang_2@nxp.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-18perf parse-events: Fix tracepoint name memory leakIan Rogers
Fuzzing found that an invalid tracepoint name would create a memory leak with an address sanitizer build: ``` $ perf stat -e '*:o/' true event syntax error: '*:o/' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events ================================================================= ==59380==ERROR: LeakSanitizer: detected memory leaks Direct leak of 4 byte(s) in 2 object(s) allocated from: #0 0x7f38ac07077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 #1 0x55f2f41be73b in str util/parse-events.l:49 #2 0x55f2f41d08e8 in parse_events_lex util/parse-events.l:338 #3 0x55f2f41dc3b1 in parse_events_parse util/parse-events-bison.c:1464 #4 0x55f2f410b8b3 in parse_events__scanner util/parse-events.c:1822 #5 0x55f2f410d1b9 in __parse_events util/parse-events.c:2094 #6 0x55f2f410e57f in parse_events_option util/parse-events.c:2279 #7 0x55f2f4427b56 in get_value tools/lib/subcmd/parse-options.c:251 #8 0x55f2f4428d98 in parse_short_opt tools/lib/subcmd/parse-options.c:351 #9 0x55f2f4429d80 in parse_options_step tools/lib/subcmd/parse-options.c:539 #10 0x55f2f442acb9 in parse_options_subcommand tools/lib/subcmd/parse-options.c:654 #11 0x55f2f3ec99fc in cmd_stat tools/perf/builtin-stat.c:2501 #12 0x55f2f4093289 in run_builtin tools/perf/perf.c:322 #13 0x55f2f40937f5 in handle_internal_command tools/perf/perf.c:375 #14 0x55f2f4093bbd in run_argv tools/perf/perf.c:419 #15 0x55f2f409412b in main tools/perf/perf.c:535 SUMMARY: AddressSanitizer: 4 byte(s) leaked in 2 allocation(s). ``` Fix by adding the missing destructor. Fixes: 865582c3f48e ("perf tools: Adds the tracepoint name parsing support") Signed-off-by: Ian Rogers <irogers@google.com> Cc: He Kuang <hekuang@huawei.com> Link: https://lore.kernel.org/r/20230914164028.363220-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-18perf test: Detect off-cpu support from build optionsIan Rogers
Use perf version to detect whether BPF skeletons were enabled in a build rather than a failing perf record. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Patrice Duroux <patrice.duroux@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230914211948.814999-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-18perf test: Ensure EXTRA_TESTS is covered in build testIan Rogers
Add to run variable. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Patrice Duroux <patrice.duroux@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230914211948.814999-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-18perf test: Update build test for changed BPF skeleton defaultsIan Rogers
Fix a target name and set BUILD_BPF_SKEL to 0 rather than 1. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Patrice Duroux <patrice.duroux@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230914211948.814999-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-18perf build: Default BUILD_BPF_SKEL, warn/disable for missing depsIan Rogers
LIBBPF is dependent on zlib so move the NO_ZLIB and feature check early to avoid statically building when zlib is disabled. This avoids a linkage failure with perf and static libbpf when zlib isn't specified. Move BUILD_BPF_SKEL logic to one place and if not defined set BUILD_BPF_SKEL to 1. Detect dependencies of building with BPF skeletons and warn/disable if the dependencies aren't present. Change Makefile.perf to contain BPF skeleton logic dependent on the Makefile.config result and refresh the comment about BUILD_BPF_SKEL. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Patrice Duroux <patrice.duroux@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230914211948.814999-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-18perf version: Add status of bpf skeletonsIan Rogers
Add status for BPF skeletons, to see if a build has them enabled: ``` $ perf version --build-options perf version 6.6.rc1.g0381ae36d1a6 dwarf: [ OFF ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ OFF ] # HAVE_DWARF_GETLOCATIONS_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ OFF ] # HAVE_LIBBFD_SUPPORT debuginfod: [ OFF ] # HAVE_DEBUGINFOD_SUPPORT libelf: [ OFF ] # HAVE_LIBELF_SUPPORT libnuma: [ OFF ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ OFF ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ OFF ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ OFF ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ OFF ] # HAVE_LIBBPF_SUPPORT aio: [ on ] # HAVE_AIO_SUPPORT zstd: [ on ] # HAVE_ZSTD_SUPPORT libpfm4: [ on ] # HAVE_LIBPFM libtraceevent: [ on ] # HAVE_LIBTRACEEVENT bpf_skeletons: [ OFF ] # HAVE_BPF_SKEL ``` Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Patrice Duroux <patrice.duroux@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230914211948.814999-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-18perf kwork top: Simplify bool conversionYang Li
./tools/perf/util/bpf_kwork_top.c:120:53-58: WARNING: conversion to bool not needed here Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230915063832.120274-1-yang.lee@linux.alibaba.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-17perf jevent: fix core dump on software events on s390Thomas Richter
Running commands such as # ./perf stat -e cs -- true Segmentation fault (core dumped) # ./perf stat -e cpu-clock-- true Segmentation fault (core dumped) # dump core. This should not happen as these events are defined even when no hardware PMU is available. Debugging this reveals this call chain: perf_pmus__find_by_type(type=1) +--> pmu_read_sysfs(core_only=false) +--> perf_pmu__find2(dirfd=3, name=0x152a113 "software") +--> perf_pmu__lookup(pmus=0x14f0568 <other_pmus>, dirfd=3, lookup_name=0x152a113 "software") +--> perf_pmu__find_events_table (pmu=0x1532130) Now the pmu is "software" and it tries to find a proper table generated by the pmu-event generation process for s390: # cd pmu-events/ # ./jevents.py s390 all /root/linux/tools/perf/pmu-events/arch |\ grep -E '^const struct pmu_table_entry' const struct pmu_table_entry pmu_events__cf_z10[] = { const struct pmu_table_entry pmu_events__cf_z13[] = { const struct pmu_table_entry pmu_metrics__cf_z13[] = { const struct pmu_table_entry pmu_events__cf_z14[] = { const struct pmu_table_entry pmu_metrics__cf_z14[] = { const struct pmu_table_entry pmu_events__cf_z15[] = { const struct pmu_table_entry pmu_metrics__cf_z15[] = { const struct pmu_table_entry pmu_events__cf_z16[] = { const struct pmu_table_entry pmu_metrics__cf_z16[] = { const struct pmu_table_entry pmu_events__cf_z196[] = { const struct pmu_table_entry pmu_events__cf_zec12[] = { const struct pmu_table_entry pmu_metrics__cf_zec12[] = { const struct pmu_table_entry pmu_events__test_soc_cpu[] = { const struct pmu_table_entry pmu_metrics__test_soc_cpu[] = { const struct pmu_table_entry pmu_events__test_soc_sys[] = { # However event "software" is not listed, as can be seen in the generated const struct pmu_events_map pmu_events_map[]. So in function perf_pmu__find_events_table(), the variable table is initialized to NULL, but never set to a proper value. The function scans all generated &pmu_events_map[] tables, but no table matches, because the tables are s390 CPU Measurement unit specific: i = 0; for (;;) { const struct pmu_events_map *map = &pmu_events_map[i++]; if (!map->arch) break; --> the maps are there because the build generated them if (!strcmp_cpuid_str(map->cpuid, cpuid)) { table = &map->event_table; break; } --> Since no matching CPU string the table var remains 0x0 } free(cpuid); if (!pmu) return table; --> The pmu is "software" so it exists and no return --> and here perf dies because table is 0x0 for (i = 0; i < table->num_pmus; i++) { ... } return NULL; Fix this and do not access the table variable. Instead return 0x0 which is the same return code when the for-loop was not successful. Output after: # ./perf stat -e cs -- true Performance counter stats for 'true': 0 cs 0.000853105 seconds time elapsed 0.000061000 seconds user 0.000827000 seconds sys # ./perf stat -e cpu-clock -- true Performance counter stats for 'true': 0.25 msec cpu-clock # 0.341 CPUs utilized 0.000728383 seconds time elapsed 0.000055000 seconds user 0.000706000 seconds sys # ./perf stat -e cycles -- true Performance counter stats for 'true': <not supported> cycles 0.000767298 seconds time elapsed 0.000055000 seconds user 0.000739000 seconds sys # Fixes: 7c52f10c0d4d8 ("perf pmu: Cache JSON events table") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: dengler@linux.ibm.com Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Cc: sumanthk@linux.ibm.com Cc: svens@linux.ibm.com Link: https://lore.kernel.org/r/20230913125157.2790375-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-17perf pmu: Ensure all alias variables are initializedIan Rogers
Fix an error detected by memory sanitizer: ``` ==4033==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55fb0fbedfc7 in read_alias_info tools/perf/util/pmu.c:457:6 #1 0x55fb0fbea339 in check_info_data tools/perf/util/pmu.c:1434:2 #2 0x55fb0fbea339 in perf_pmu__check_alias tools/perf/util/pmu.c:1504:9 #3 0x55fb0fbdca85 in parse_events_add_pmu tools/perf/util/parse-events.c:1429:32 #4 0x55fb0f965230 in parse_events_parse tools/perf/util/parse-events.y:299:6 #5 0x55fb0fbdf6b2 in parse_events__scanner tools/perf/util/parse-events.c:1822:8 #6 0x55fb0fbdf8c1 in __parse_events tools/perf/util/parse-events.c:2094:8 #7 0x55fb0fa8ffa9 in parse_events tools/perf/util/parse-events.h:41:9 #8 0x55fb0fa8ffa9 in test_event tools/perf/tests/parse-events.c:2393:8 #9 0x55fb0fa8f458 in test__pmu_events tools/perf/tests/parse-events.c:2551:15 #10 0x55fb0fa6d93f in run_test tools/perf/tests/builtin-test.c:242:9 #11 0x55fb0fa6d93f in test_and_print tools/perf/tests/builtin-test.c:271:8 #12 0x55fb0fa6d082 in __cmd_test tools/perf/tests/builtin-test.c:442:5 #13 0x55fb0fa6d082 in cmd_test tools/perf/tests/builtin-test.c:564:9 #14 0x55fb0f942720 in run_builtin tools/perf/perf.c:322:11 #15 0x55fb0f942486 in handle_internal_command tools/perf/perf.c:375:8 #16 0x55fb0f941dab in run_argv tools/perf/perf.c:419:2 #17 0x55fb0f941dab in main tools/perf/perf.c:535:3 ``` Fixes: 7b723dbb96e8 ("perf pmu: Be lazy about loading event info files from sysfs") Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230914022425.1489035-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-17perf jevents metric: Fix type of strcmp_cpuid_strIan Rogers
The parser wraps all strings as Events, so the input is an Event. Using a string would be bad as functions like Simplify are called on the arguments, which wouldn't be present on a string. Fixes: 9d5da30e4ae9 ("perf jevents: Add a new expression builtin strcmp_cpuid_str()") Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230914022204.1488383-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-17perf trace: Avoid compile error wrt redefining boolIan Rogers
Make part of an existing TODO conditional to avoid the following build error: ``` tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c:26:14: error: cannot combine with previous 'char' declaration specifier 26 | typedef char bool; | ^ include/stdbool.h:20:14: note: expanded from macro 'bool' 20 | #define bool _Bool | ^ tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c:26:1: error: typedef requires a name [-Werror,-Wmissing-declarations] 26 | typedef char bool; | ^~~~~~~~~~~~~~~~~ 2 errors generated. ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230913184957.230076-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-17perf bpf-prologue: Remove unused fileIan Rogers
Commit 3d6dfae88917 ("perf parse-events: Remove BPF event support") removed building bpf-prologue.c but failed to remove the actual file. Fixes: 3d6dfae88917 ("perf parse-events: Remove BPF event support") Signed-off-by: Ian Rogers <irogers@google.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230913184534.227961-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-16perf test: Fix test-record-dummy-C0 failure for supported PERF_FORMAT_LOST ↵Yang Jihong
feature kernel For kernel that supports PERF_FORMAT_LOST, attr->read_format has PERF_FORMAT_LOST bit. Update expected value of attr->read_format of test-record-dummy-C0 for this scenario. Before: # ./perf test 17 -vv 17: Setup struct perf_event_attr : --- start --- test child forked, pid 1609441 <SNIP> running './tests/attr/test-record-dummy-C0' 'PERF_TEST_ATTR=/tmp/tmpm3s60aji ./perf record -o /tmp/tmpm3s60aji/perf.data --no-bpf-event -e dummy -C 0 kill >/dev/null 2>&1' ret '1', expected '1' expected read_format=4, got 20 FAILED './tests/attr/test-record-dummy-C0' - match failure test child finished with -1 ---- end ---- Setup struct perf_event_attr: FAILED! After: # ./perf test 17 -vv 17: Setup struct perf_event_attr : --- start --- test child forked, pid 1609441 <SNIP> running './tests/attr/test-record-dummy-C0' 'PERF_TEST_ATTR=/tmp/tmppa9vxcb7 ./perf record -o /tmp/tmppa9vxcb7/perf.data --no-bpf-event -e dummy -C 0 kill >/dev/null 2>&1' ret '1', expected '1' <SNIP> test child finished with 0 ---- end ---- Setup struct perf_event_attr: Ok Reported-and-Tested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20230916091641.776031-1-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-15perf kwork: Fix spelling mistake "COMMMAND" -> "COMMAND"Colin Ian King
There is a spelling mistake in a literal string. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: kernel-janitors@vger.kernel.org Link: https://lore.kernel.org/r/20230915090910.30182-1-colin.i.king@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-15perf annotate: Add more x86 mov instruction casesNamhyung Kim
Instructions with sign- and zero- extention like movsbl and movzwq were not handled properly. As it can check different size suffix (-b, -w, -l or -q) we can omit that and add the common parts even though some combinations are not possible. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20230908052216.566148-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-15perf pmu: Remove unused functionJames Clark
pmu_events_table__find() is no longer used so remove it and its Arm specific version. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230913153355.138331-4-james.clark@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-15perf pmus: Simplify perf_pmus__find_core_pmu()James Clark
Currently the while loop always either exits on the first iteration with a core PMU, or exits with NULL on heterogeneous systems or when not all CPUs are online. Both of the latter behaviors are undesirable for platforms other than Arm so simplify it to always return the first core PMU, or NULL if none exist. This behavior was depended on by the Arm version of pmu_metrics_table__find(), so the logic has been moved there instead. Signed-off-by: James Clark <james.clark@arm.com> Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230913153355.138331-3-james.clark@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-15perf pmu: Move pmu__find_core_pmu() to pmus.cJames Clark
pmu__find_core_pmu() more logically belongs in pmus.c because it iterates over all PMUs, so move it to pmus.c At the same time rename it to perf_pmus__find_core_pmu() to match the naming convention in this file. list_prepare_entry() can't be used in perf_pmus__scan_core() anymore now that it's called from the same compilation unit. This is with -O2 (specifically -O1 -ftree-vrp -finline-functions -finline-small-functions) which allow the bounds of the array access to be determined at compile time. list_prepare_entry() subtracts the offset of the 'list' member in struct perf_pmu from &core_pmus, which isn't a struct perf_pmu. The compiler sees that pmu results in &core_pmus - 8 and refuses to compile. At runtime this works because list_for_each_entry_continue() always adds the offset back again before dereferencing ->next, but it's technically undefined behavior. With -fsanitize=undefined an additional warning is generated. Using list_first_entry_or_null() to get the first entry here avoids doing &core_pmus - 8 but has the same result and fixes both the compile warning and the undefined behavior warning. There are other uses of list_prepare_entry() in pmus.c, but the compiler doesn't seem to be able to see that they can also be called with &core_pmus, so I won't change any at this time. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230913153355.138331-2-james.clark@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-15perf symbol: Avoid an undefined behavior warningIan Rogers
The node (nd) may be NULL and pointer arithmetic on NULL is undefined behavior. Move the computation of next below the NULL check on the node. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20230914044233.1550195-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-13perf bench sched-seccomp-notify: Use the tools copy of seccomp.h UAPIArnaldo Carvalho de Melo
To keep perf building in systems where types and defines used in this new benchmark are not available, such as: 12 13.46 centos:stream : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-20) (GCC) bench/sched-seccomp-notify.c: In function 'user_notif_syscall': bench/sched-seccomp-notify.c:55:27: error: 'SECCOMP_RET_USER_NOTIF' undeclared (first use in this function); did you mean 'SECCOMP_RET_ERRNO'? BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF), ^~~~~~~~~~~~~~~~~~~~~~ /git/perf-6.6.0-rc1/tools/include/uapi/linux/filter.h:49:59: note: in definition of macro 'BPF_STMT' #define BPF_STMT(code, k) { (unsigned short)(code), 0, 0, k } ^ bench/sched-seccomp-notify.c:55:27: note: each undeclared identifier is reported only once for each function it appears in BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF), ^~~~~~~~~~~~~~~~~~~~~~ /git/perf-6.6.0-rc1/tools/include/uapi/linux/filter.h:49:59: note: in definition of macro 'BPF_STMT' #define BPF_STMT(code, k) { (unsigned short)(code), 0, 0, k } ^ bench/sched-seccomp-notify.c:55:3: error: missing initializer for field 'k' of 'struct sock_filter' [-Werror=missing-field-initializers] BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF), ^~~~~~~~ In file included from bench/sched-seccomp-notify.c:5: /git/perf-6.6.0-rc1/tools/include/uapi/linux/filter.h:28:8: note: 'k' declared here __u32 k; /* Generic multiuse field */ ^ bench/sched-seccomp-notify.c: In function 'user_notification_sync_loop': bench/sched-seccomp-notify.c:70:28: error: storage size of 'resp' isn't known struct seccomp_notif_resp resp; ^~~~ bench/sched-seccomp-notify.c:71:23: error: storage size of 'req' isn't known struct seccomp_notif req; ^~~ bench/sched-seccomp-notify.c:76:23: error: 'SECCOMP_IOCTL_NOTIF_RECV' undeclared (first use in this function); did you mean 'SECCOMP_MODE_STRICT'? if (ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req)) ^~~~~~~~~~~~~~~~~~~~~~~~ SECCOMP_MODE_STRICT bench/sched-seccomp-notify.c:86:23: error: 'SECCOMP_IOCTL_NOTIF_SEND' undeclared (first use in this function); did you mean 'SECCOMP_RET_ACTION'? if (ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp)) ^~~~~~~~~~~~~~~~~~~~~~~~ SECCOMP_RET_ACTION bench/sched-seccomp-notify.c:71:23: error: unused variable 'req' [-Werror=unused-variable] struct seccomp_notif req; ^~~ bench/sched-seccomp-notify.c:70:28: error: unused variable 'resp' [-Werror=unused-variable] struct seccomp_notif_resp resp; ^~~~ 14 11.31 debian:10 : FAIL gcc version 8.3.0 (Debian 8.3.0-6) Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrei Vagin <avagin@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Kook <keescook@chromium.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/ZQGhjaojgOGtSNk6@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-13tools headers UAPI: Copy seccomp.h to be able to build 'perf bench' in older ↵Arnaldo Carvalho de Melo
systems The new 'perf bench' for sched-seccomp-notify uses defines and types not available in older systems where we want to have perf available, so grab a copy of this UAPI from the kernel sources to allow that. This will be checked in the future for drift from the original when we build the perf tool, that will warn when that happens like: make: Entering directory '/var/home/acme/git/perf-tools/tools/perf' BUILD: Doing 'make -j32' parallel build Warning: Kernel ABI header differences: Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrei Vagin <avagin@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Kook <keescook@chromium.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/ZQGhMXtwX7RvV3ya@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-13tools headers UAPI: Sync files changed by new fchmodat2 and map_shadow_stack ↵Arnaldo Carvalho de Melo
syscalls with the kernel sources To pick the changes in these csets: c35559f94ebc3e3b ("x86/shstk: Introduce map_shadow_stack syscall") 78252deb023cf087 ("arch: Register fchmodat2, usually as syscall 452") That add support for this new syscall in tools such as 'perf trace'. For instance, this is now possible: # perf trace -v -e fchmodat*,map_shadow_stack --max-events=4 Using CPUID AuthenticAMD-25-21-0 Reusing "openat" BPF sys_enter augmenter for "fchmodat" event qualifier tracepoint filter: (common_pid != 3499340 && common_pid != 11259) && (id == 268 || id == 452 || id == 453) ^C# And it'll work as with other syscalls, for instance openat: # perf trace -e openat* --max-events=4 0.000 ( 0.015 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 11 0.068 ( 0.019 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.pressure", flags: RDONLY|CLOEXEC) = 11 0.119 ( 0.008 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.current", flags: RDONLY|CLOEXEC) = 11 0.138 ( 0.006 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.min", flags: RDONLY|CLOEXEC) = 11 # That is the filter expression attached to the raw_syscalls:sys_{enter,exit} tracepoints. $ find tools/perf/arch/ -name "syscall*tbl" | xargs grep -E fchmodat\|sys_map_shadow_stack tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:258 n64 fchmodat sys_fchmodat tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:452 n64 fchmodat2 sys_fchmodat2 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:297 common fchmodat sys_fchmodat tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:452 common fchmodat2 sys_fchmodat2 tools/perf/arch/s390/entry/syscalls/syscall.tbl:299 common fchmodat sys_fchmodat sys_fchmodat tools/perf/arch/s390/entry/syscalls/syscall.tbl:452 common fchmodat2 sys_fchmodat2 sys_fchmodat2 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:268 common fchmodat sys_fchmodat tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:452 common fchmodat2 sys_fchmodat2 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:453 64 map_shadow_stack sys_map_shadow_stack $ $ grep -Ew map_shadow_stack\|fchmodat2 /tmp/build/perf-tools/arch/x86/include/generated/asm/syscalls_64.c [452] = "fchmodat2", [453] = "map_shadow_stack", $ This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://lore.kernel.org/lkml/ZP8bE7aXDBu%2Fdrak@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf bpf-filter: Add YYDEBUGIan Rogers
YYDEBUG enables line numbers and other error helpers in the generated bpf-filter-bison.c. Conditionally enabled only for debug builds. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230911170559.4037734-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf pmu: Add YYDEBUGIan Rogers
YYDEBUG enables line numbers and other error helpers in the generated pmu-bison.c. Conditionally enabled only for debug builds. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230911170559.4037734-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf expr: Make YYDEBUG dependent on doing a debug buildIan Rogers
YYDEBUG enables line numbers and other error helpers in the generated expr-bison.c. These shouldn't be generated when debugging isn't enabled. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230911170559.4037734-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf parse-events: Make YYDEBUG dependent on doing a debug buildIan Rogers
YYDEBUG enables line numbers and other error helpers in the generated parse-events-bison.c. These shouldn't be generated when debugging isn't enabled. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230911170559.4037734-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf parse-events: Remove unused header filesIan Rogers
The fnmatch header is now used in the PMU matching logic in pmu.c. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230911170559.4037734-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf tools: Add includes for detected configs in Makefile.perfAthira Rajeev
Makefile.perf uses "CONFIG_*" checks in the code. Example the config for libtraceevent is used to set PYTHON_EXT_SRCS ifeq ($(CONFIG_LIBTRACEEVENT),y) PYTHON_EXT_SRCS := $(shell grep -v ^\# util/python-ext-sources) else PYTHON_EXT_SRCS := $(shell grep -v '^\#\|util/trace-event.c' util/python-ext-sources) endif But this is not picking the value for CONFIG_LIBTRACEEVENT that is set using the settings in Makefile.config. Include the file ".config-detected" so that make will use the system detected configuration in the CONFIG checks. This will fix isues that could arise when other "CONFIG_*" checks are added to Makefile.perf in future as well. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230912063807.74250-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf test: Update cs_etm testcase for Arm ETERuidong Tian
Add ETE as one of the supported device types in perf cs_etm testcase. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230911065541.91293-1-tianruidong@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf vendor events arm64: Add V1 metrics using Arm telemetry repoJames Clark
Metrics for V1 weren't previously included in the Perf Jsons, so add them using the telemetry source [1]. After generation any parts identical to the default metrics in sbsa.json were manually removed. [1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-v1.json Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230831161618.134738-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf vendor events arm64: Update V1 events using Arm telemetry repoJames Clark
The new data [1] includes descriptions that may have product specific details and new groupings that will be consistent with other products. The following command was used to generate the jsons: $ telemetry-solution/tools/perf_json_generator/generate.py \ linux/tools/perf/ --telemetry-files \ telemetry-solution/data/pmu/cpu/neoverse/neoverse-v1.json [1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-v1.json Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230831161618.134738-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf test: Add a test for strcmp_cpuid_str() expressionJames Clark
Test that the new expression builtin returns a match when the current escaped CPU ID is given, and that it doesn't match when "0x0" is given. The CPU ID in test__expr() has to be changed to perf_pmu__getcpuid() which returns the CPU ID string, rather than the raw CPU ID that get_cpuid() returns because that can't be used with strcmp_cpuid_str(). It doesn't affect the is_intel test because both versions contain "Intel". Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chen Zhongjin <chenzhongjin@huawei.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230904095104.1162928-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf util: Add a function for replacing characters in a stringJames Clark
It finds all occurrences of a single character and replaces them with a multi character string. This will be used in a test in a following commit. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chen Zhongjin <chenzhongjin@huawei.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230904095104.1162928-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf jevents: Remove unused keywordJames Clark
'cpuid_not_more_than' was the working title of the new 'strcmp_cpuid_str' keyword and was accidentally left in. It was never used so tidying it up has no effect. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chen Zhongjin <chenzhongjin@huawei.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230904095104.1162928-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf test: Check result of has_event(cycles) testJames Clark
Currently the function always returns 0, so even when the has_event() test fails, the test still passes. Fix it by returning ret instead. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chen Zhongjin <chenzhongjin@huawei.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230904095104.1162928-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf list pfm: Retry supported test with exclude_kernelIan Rogers
With paranoia set at 2 evsel__open will fail with EACCES for non-root users. To avoid this stopping libpfm4 events from being printed, retry with exclude_kernel enabled - copying the regular is_event_supported test. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kang Minchul <tegongkang@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230906234416.3472339-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf list: Avoid a hardcoded cpu PMU nameIan Rogers
Use the first core PMU instead. On a Raspberry Pi, before: $ perf list ... cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor] [(see 'man perf-list' on how to encode it)] ... After: $ perf list ... armv8_cortex_a72/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor] [(see 'man perf-list' on how to encode it)] ... ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kang Minchul <tegongkang@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230906234416.3472339-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf test shell lock_contention: Add cgroup aggregation and filter testsNamhyung Kim
Add cgroup aggregation and filter tests. $ sudo ./perf test -v contention 84: kernel lock contention analysis test : --- start --- test child forked, pid 222423 Testing perf lock record and perf lock contention Testing perf lock contention --use-bpf Testing perf lock record and perf lock contention at the same time Testing perf lock contention --threads Testing perf lock contention --lock-addr Testing perf lock contention --lock-cgroup Testing perf lock contention --type-filter (w/ spinlock) Testing perf lock contention --lock-filter (w/ tasklist_lock) Testing perf lock contention --callstack-filter (w/ unix_stream) Testing perf lock contention --callstack-filter with task aggregation Testing perf lock contention --cgroup-filter Testing perf lock contention CSV output test child finished with 0 ---- end ---- kernel lock contention analysis test: Ok Committer testing: [root@quaco ~]# uname -a Linux quaco 6.4.10-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 11 12:20:29 UTC 2023 x86_64 GNU/Linux [root@quaco ~]# perf test -v contention 84: kernel lock contention analysis test : --- start --- test child forked, pid 452625 Testing perf lock record and perf lock contention Testing perf lock contention --use-bpf Testing perf lock record and perf lock contention at the same time Testing perf lock contention --threads Testing perf lock contention --lock-addr Testing perf lock contention --lock-cgroup Testing perf lock contention --type-filter (w/ spinlock) Testing perf lock contention --lock-filter (w/ tasklist_lock) Testing perf lock contention --callstack-filter (w/ unix_stream) Testing perf lock contention --callstack-filter with task aggregation Testing perf lock contention --cgroup-filter Testing perf lock contention CSV output test child finished with 0 ---- end ---- kernel lock contention analysis test: Ok [root@quaco ~]# Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230906174903.346486-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf lock contention: Add -G/--cgroup-filter optionNamhyung Kim
The -G/--cgroup-filter is to limit lock contention collection on the tasks in the specific cgroups only. $ sudo ./perf lock con -abt -G /user.slice/.../vte-spawn-52221fb8-b33f-4a52-b5c3-e35d1e6fc0e0.scope \ ./perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.174 [sec] contended total wait max wait avg wait pid comm 4 114.45 us 60.06 us 28.61 us 214847 sched-messaging 2 111.40 us 60.84 us 55.70 us 214848 sched-messaging 2 106.09 us 59.42 us 53.04 us 214837 sched-messaging 1 81.70 us 81.70 us 81.70 us 214709 sched-messaging 68 78.44 us 6.83 us 1.15 us 214633 sched-messaging 69 73.71 us 2.69 us 1.07 us 214632 sched-messaging 4 72.62 us 60.83 us 18.15 us 214850 sched-messaging 2 71.75 us 67.60 us 35.88 us 214840 sched-messaging 2 69.29 us 67.53 us 34.65 us 214804 sched-messaging 2 69.00 us 68.23 us 34.50 us 214826 sched-messaging ... Export cgroup__new() function as it's needed from outside. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230906174903.346486-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf lock contention: Add --lock-cgroup optionNamhyung Kim
The --lock-cgroup option shows lock contention stats break down by cgroups. Add LOCK_AGGR_CGROUP mode and use it instead of use_cgroup field. $ sudo ./perf lock con -ab --lock-cgroup sleep 1 contended total wait max wait avg wait cgroup 8 15.70 us 6.34 us 1.96 us / 2 1.48 us 747 ns 738 ns /user.slice/.../app.slice/app-gnome-google\x2dchrome-6442.scope 1 848 ns 848 ns 848 ns /user.slice/.../session.slice/org.gnome.Shell@x11.service 1 220 ns 220 ns 220 ns /user.slice/.../session.slice/pipewire-pulse.service For now, the cgroup mode only works with BPF (-b). Committer notes: Remove -g as it is used in the other tools with a clear meaning of collect/show callchains. As agreed with Namhyung off list. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230906174903.346486-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf lock contention: Prepare to handle cgroupsNamhyung Kim
Save cgroup info and display cgroup names if requested. This is a preparation for the next patch. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230906174903.346486-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf tools: Add read_all_cgroups() and __cgroup_find()Namhyung Kim
The read_all_cgroups() is to build a tree of cgroups in the system and users can look up a cgroup using __cgroup_find(). Committer notes: Had to do this to cover that #else block: -static inline u64 __read_cgroup_id(const char *path) { return -1ULL; } +static inline u64 __read_cgroup_id(const char *path __maybe_unused) { return -1ULL; } Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230906174903.346486-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf kwork top: Add BPF-based statistics on softirq event supportYang Jihong
Use BPF to collect statistics on softirq events based on perf BPF skeletons. Example usage: # perf kwork top -b Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 135445.704 ms, 8 cpus %Cpu(s): 28.35% id, 0.00% hi, 0.25% si %Cpu0 [|||||||||||||||||||| 69.85%] %Cpu1 [|||||||||||||||||||||| 74.10%] %Cpu2 [||||||||||||||||||||| 71.18%] %Cpu3 [|||||||||||||||||||| 69.61%] %Cpu4 [|||||||||||||||||||||| 74.05%] %Cpu5 [|||||||||||||||||||| 69.33%] %Cpu6 [|||||||||||||||||||| 69.71%] %Cpu7 [|||||||||||||||||||||| 73.77%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 30.43 5271.005 ms [swapper/5] 0 0 30.17 5226.644 ms [swapper/3] 0 0 30.08 5210.257 ms [swapper/6] 0 0 29.89 5177.177 ms [swapper/0] 0 0 28.51 4938.672 ms [swapper/2] 0 0 25.93 4223.464 ms [swapper/7] 0 0 25.69 4181.411 ms [swapper/4] 0 0 25.63 4173.804 ms [swapper/1] 16665 16265 2.16 360.600 ms sched-messaging 16537 16265 2.05 356.275 ms sched-messaging 16503 16265 2.01 343.063 ms sched-messaging 16424 16265 1.97 336.876 ms sched-messaging 16580 16265 1.94 323.658 ms sched-messaging 16515 16265 1.92 321.616 ms sched-messaging 16659 16265 1.91 325.538 ms sched-messaging 16634 16265 1.88 327.766 ms sched-messaging 16454 16265 1.87 326.843 ms sched-messaging 16382 16265 1.87 322.591 ms sched-messaging 16642 16265 1.86 320.506 ms sched-messaging 16582 16265 1.86 320.164 ms sched-messaging 16315 16265 1.86 326.872 ms sched-messaging 16637 16265 1.85 323.766 ms sched-messaging 16506 16265 1.82 311.688 ms sched-messaging 16512 16265 1.81 304.643 ms sched-messaging 16560 16265 1.80 314.751 ms sched-messaging 16320 16265 1.80 313.405 ms sched-messaging 16442 16265 1.80 314.403 ms sched-messaging 16626 16265 1.78 295.380 ms sched-messaging 16600 16265 1.77 309.444 ms sched-messaging 16550 16265 1.76 301.161 ms sched-messaging 16525 16265 1.75 296.560 ms sched-messaging 16314 16265 1.75 298.338 ms sched-messaging 16595 16265 1.74 304.390 ms sched-messaging 16555 16265 1.74 287.564 ms sched-messaging 16520 16265 1.74 295.734 ms sched-messaging 16507 16265 1.73 293.956 ms sched-messaging 16593 16265 1.72 296.443 ms sched-messaging 16531 16265 1.72 299.950 ms sched-messaging 16281 16265 1.72 301.339 ms sched-messaging <SNIP> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-17-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf kwork top: Add BPF-based statistics on hardirq event supportYang Jihong
Use BPF to collect statistics on hardirq events based on perf BPF skeletons. Example usage: # perf kwork top -k sched,irq -b Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 136717.945 ms, 8 cpus %Cpu(s): 17.10% id, 0.01% hi, 0.00% si %Cpu0 [||||||||||||||||||||||||| 84.26%] %Cpu1 [||||||||||||||||||||||||| 84.77%] %Cpu2 [|||||||||||||||||||||||| 83.22%] %Cpu3 [|||||||||||||||||||||||| 80.37%] %Cpu4 [|||||||||||||||||||||||| 81.49%] %Cpu5 [||||||||||||||||||||||||| 84.68%] %Cpu6 [||||||||||||||||||||||||| 84.48%] %Cpu7 [|||||||||||||||||||||||| 80.21%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 19.78 3482.833 ms [swapper/7] 0 0 19.62 3454.219 ms [swapper/3] 0 0 18.50 3258.339 ms [swapper/4] 0 0 16.76 2842.749 ms [swapper/2] 0 0 15.71 2627.905 ms [swapper/0] 0 0 15.51 2598.206 ms [swapper/6] 0 0 15.31 2561.820 ms [swapper/5] 0 0 15.22 2548.708 ms [swapper/1] 13253 13018 2.95 513.108 ms sched-messaging 13092 13018 2.67 454.167 ms sched-messaging 13401 13018 2.66 454.790 ms sched-messaging 13240 13018 2.64 454.587 ms sched-messaging 13251 13018 2.61 442.273 ms sched-messaging 13075 13018 2.61 438.932 ms sched-messaging 13220 13018 2.60 443.245 ms sched-messaging 13235 13018 2.59 443.268 ms sched-messaging 13222 13018 2.50 426.344 ms sched-messaging 13410 13018 2.49 426.191 ms sched-messaging 13228 13018 2.46 425.121 ms sched-messaging 13379 13018 2.38 409.950 ms sched-messaging 13236 13018 2.37 413.159 ms sched-messaging 13095 13018 2.36 396.572 ms sched-messaging 13325 13018 2.35 408.089 ms sched-messaging 13242 13018 2.32 394.750 ms sched-messaging 13386 13018 2.31 396.997 ms sched-messaging 13046 13018 2.29 383.833 ms sched-messaging 13109 13018 2.28 388.482 ms sched-messaging 13388 13018 2.28 393.576 ms sched-messaging 13238 13018 2.26 388.487 ms sched-messaging <SNIP> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-16-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf kwork top: Implements BPF-based cpu usage statisticsYang Jihong
Use BPF to collect statistics on the CPU usage based on perf BPF skeletons. Example usage: # perf kwork top -h Usage: perf kwork top [<options>] -b, --use-bpf Use BPF to measure task cpu usage -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): rate, runtime, tid --time <str> Time span for analysis (start,stop) # # perf kwork -k sched top -b Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 160702.425 ms, 8 cpus %Cpu(s): 36.00% id, 0.00% hi, 0.00% si %Cpu0 [|||||||||||||||||| 61.66%] %Cpu1 [|||||||||||||||||| 61.27%] %Cpu2 [||||||||||||||||||| 66.40%] %Cpu3 [|||||||||||||||||| 61.28%] %Cpu4 [|||||||||||||||||| 61.82%] %Cpu5 [||||||||||||||||||||||| 77.41%] %Cpu6 [|||||||||||||||||| 61.73%] %Cpu7 [|||||||||||||||||| 63.25%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 38.72 8089.463 ms [swapper/1] 0 0 38.71 8084.547 ms [swapper/3] 0 0 38.33 8007.532 ms [swapper/0] 0 0 38.26 7992.985 ms [swapper/6] 0 0 38.17 7971.865 ms [swapper/4] 0 0 36.74 7447.765 ms [swapper/7] 0 0 33.59 6486.942 ms [swapper/2] 0 0 22.58 3771.268 ms [swapper/5] 9545 9351 2.48 447.136 ms sched-messaging 9574 9351 2.09 418.583 ms sched-messaging 9724 9351 2.05 372.407 ms sched-messaging 9531 9351 2.01 368.804 ms sched-messaging 9512 9351 2.00 362.250 ms sched-messaging 9514 9351 1.95 357.767 ms sched-messaging 9538 9351 1.86 384.476 ms sched-messaging 9712 9351 1.84 386.490 ms sched-messaging 9723 9351 1.83 380.021 ms sched-messaging 9722 9351 1.82 382.738 ms sched-messaging 9517 9351 1.81 354.794 ms sched-messaging 9559 9351 1.79 344.305 ms sched-messaging 9725 9351 1.77 365.315 ms sched-messaging <SNIP> # perf kwork -k sched top -b -n perf Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 151563.332 ms, 8 cpus %Cpu(s): 26.49% id, 0.00% hi, 0.00% si %Cpu0 [ 0.01%] %Cpu1 [ 0.00%] %Cpu2 [ 0.00%] %Cpu3 [ 0.00%] %Cpu4 [ 0.00%] %Cpu5 [ 0.00%] %Cpu6 [ 0.00%] %Cpu7 [ 0.00%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 9754 9754 0.01 2.303 ms perf # # perf kwork -k sched top -b -C 2,3,4 Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 48016.721 ms, 3 cpus %Cpu(s): 27.82% id, 0.00% hi, 0.00% si %Cpu2 [|||||||||||||||||||||| 74.68%] %Cpu3 [||||||||||||||||||||| 71.06%] %Cpu4 [||||||||||||||||||||| 70.91%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 29.08 4734.998 ms [swapper/4] 0 0 28.93 4710.029 ms [swapper/3] 0 0 25.31 3912.363 ms [swapper/2] 10248 10158 1.62 264.931 ms sched-messaging 10253 10158 1.62 265.136 ms sched-messaging 10158 10158 1.60 263.013 ms bash 10360 10158 1.49 243.639 ms sched-messaging 10413 10158 1.48 238.604 ms sched-messaging 10531 10158 1.47 234.067 ms sched-messaging 10400 10158 1.47 240.631 ms sched-messaging 10355 10158 1.47 230.586 ms sched-messaging 10377 10158 1.43 234.835 ms sched-messaging 10526 10158 1.42 232.045 ms sched-messaging 10298 10158 1.41 222.396 ms sched-messaging 10410 10158 1.38 221.853 ms sched-messaging 10364 10158 1.38 226.042 ms sched-messaging 10480 10158 1.36 213.633 ms sched-messaging 10370 10158 1.36 223.620 ms sched-messaging 10553 10158 1.34 217.169 ms sched-messaging 10291 10158 1.34 211.516 ms sched-messaging 10251 10158 1.34 218.813 ms sched-messaging 10522 10158 1.33 218.498 ms sched-messaging 10288 10158 1.33 216.787 ms sched-messaging <SNIP> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-15-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf kwork top: Add -C/--cpu -i/--input -n/--name -s/--sort --time optionsYang Jihong
Provide the following options for perf kwork top: 1. -C, --cpu <cpu> list of cpus to profile 2. -i, --input <file> input file name 3. -n, --name <name> event name to profile 4. -s, --sort <key[,key2...]> sort by key(s): rate, runtime, tid 5. --time <str> Time span for analysis (start,stop) Example usage: # perf kwork top -h Usage: perf kwork top [<options>] -C, --cpu <cpu> list of cpus to profile -i, --input <file> input file name -n, --name <name> event name to profile -s, --sort <key[,key2...]> sort by key(s): rate, runtime, tid --time <str> Time span for analysis (start,stop) # perf kwork top -C 2,4,5 Total : 51226.940 ms, 3 cpus %Cpu(s): 92.59% id, 0.00% hi, 0.09% si %Cpu2 [| 4.61%] %Cpu4 [ 0.01%] %Cpu5 [||||| 17.31%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.98 17073.515 ms swapper/4 0 95.17 16250.874 ms swapper/2 0 82.62 14108.577 ms swapper/5 4342 21.70 3708.358 ms perf 16 0.13 22.296 ms rcu_preempt 75 0.02 4.261 ms kworker/2:1 98 0.01 2.540 ms jbd2/sda-8 61 0.01 3.404 ms kcompactd0 87 0.00 0.145 ms kworker/5:1H 73 0.00 0.596 ms kworker/5:1 41 0.00 0.041 ms ksoftirqd/5 40 0.00 0.718 ms migration/5 64 0.00 0.115 ms kworker/4:1 35 0.00 0.556 ms migration/4 353 0.00 1.143 ms sshd 26 0.00 1.665 ms ksoftirqd/2 25 0.00 0.662 ms migration/2 # perf kwork top -i perf.data Total : 136601.588 ms, 8 cpus %Cpu(s): 95.66% id, 0.04% hi, 0.05% si %Cpu0 [ 0.02%] %Cpu1 [ 0.01%] %Cpu2 [| 4.61%] %Cpu3 [ 0.04%] %Cpu4 [ 0.01%] %Cpu5 [||||| 17.31%] %Cpu6 [ 0.51%] %Cpu7 [||| 11.42%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.98 17073.515 ms swapper/4 0 99.98 17072.173 ms swapper/1 0 99.93 17064.229 ms swapper/3 0 99.62 17011.013 ms swapper/0 0 99.47 16985.180 ms swapper/6 0 95.17 16250.874 ms swapper/2 0 88.51 15111.684 ms swapper/7 0 82.62 14108.577 ms swapper/5 4342 33.00 5644.045 ms perf 4344 0.43 74.351 ms perf 16 0.13 22.296 ms rcu_preempt 4345 0.05 10.093 ms perf 4343 0.05 8.769 ms perf 4341 0.02 4.882 ms perf 4095 0.02 4.605 ms kworker/7:1 75 0.02 4.261 ms kworker/2:1 120 0.01 1.909 ms systemd-journal 98 0.01 2.540 ms jbd2/sda-8 61 0.01 3.404 ms kcompactd0 667 0.01 2.542 ms kworker/u16:2 4340 0.00 1.052 ms kworker/7:2 97 0.00 0.489 ms kworker/7:1H 51 0.00 0.209 ms ksoftirqd/7 50 0.00 0.646 ms migration/7 76 0.00 0.753 ms kworker/6:1 45 0.00 0.572 ms migration/6 87 0.00 0.145 ms kworker/5:1H 73 0.00 0.596 ms kworker/5:1 41 0.00 0.041 ms ksoftirqd/5 40 0.00 0.718 ms migration/5 64 0.00 0.115 ms kworker/4:1 35 0.00 0.556 ms migration/4 353 0.00 2.600 ms sshd 74 0.00 0.205 ms kworker/3:1 33 0.00 1.576 ms kworker/3:0H 30 0.00 0.996 ms migration/3 26 0.00 1.665 ms ksoftirqd/2 25 0.00 0.662 ms migration/2 397 0.00 0.057 ms kworker/1:1 20 0.00 1.005 ms migration/1 2909 0.00 1.053 ms kworker/0:2 17 0.00 0.720 ms migration/0 15 0.00 0.039 ms ksoftirqd/0 # perf kwork top -n perf Total : 136601.588 ms, 8 cpus %Cpu(s): 95.66% id, 0.04% hi, 0.05% si %Cpu0 [ 0.01%] %Cpu1 [ 0.00%] %Cpu2 [| 4.44%] %Cpu3 [ 0.00%] %Cpu4 [ 0.00%] %Cpu5 [ 0.00%] %Cpu6 [ 0.49%] %Cpu7 [||| 11.38%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 4342 15.74 2695.516 ms perf 4344 0.43 74.351 ms perf 4345 0.05 10.093 ms perf 4343 0.05 8.769 ms perf 4341 0.02 4.882 ms perf # perf kwork top -s tid Total : 136601.588 ms, 8 cpus %Cpu(s): 95.66% id, 0.04% hi, 0.05% si %Cpu0 [ 0.02%] %Cpu1 [ 0.01%] %Cpu2 [| 4.61%] %Cpu3 [ 0.04%] %Cpu4 [ 0.01%] %Cpu5 [||||| 17.31%] %Cpu6 [ 0.51%] %Cpu7 [||| 11.42%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.62 17011.013 ms swapper/0 0 99.98 17072.173 ms swapper/1 0 95.17 16250.874 ms swapper/2 0 99.93 17064.229 ms swapper/3 0 99.98 17073.515 ms swapper/4 0 82.62 14108.577 ms swapper/5 0 99.47 16985.180 ms swapper/6 0 88.51 15111.684 ms swapper/7 15 0.00 0.039 ms ksoftirqd/0 16 0.13 22.296 ms rcu_preempt 17 0.00 0.720 ms migration/0 20 0.00 1.005 ms migration/1 25 0.00 0.662 ms migration/2 26 0.00 1.665 ms ksoftirqd/2 30 0.00 0.996 ms migration/3 33 0.00 1.576 ms kworker/3:0H 35 0.00 0.556 ms migration/4 40 0.00 0.718 ms migration/5 41 0.00 0.041 ms ksoftirqd/5 45 0.00 0.572 ms migration/6 50 0.00 0.646 ms migration/7 51 0.00 0.209 ms ksoftirqd/7 61 0.01 3.404 ms kcompactd0 64 0.00 0.115 ms kworker/4:1 73 0.00 0.596 ms kworker/5:1 74 0.00 0.205 ms kworker/3:1 75 0.02 4.261 ms kworker/2:1 76 0.00 0.753 ms kworker/6:1 87 0.00 0.145 ms kworker/5:1H 97 0.00 0.489 ms kworker/7:1H 98 0.01 2.540 ms jbd2/sda-8 120 0.01 1.909 ms systemd-journal 353 0.00 2.600 ms sshd 397 0.00 0.057 ms kworker/1:1 667 0.01 2.542 ms kworker/u16:2 2909 0.00 1.053 ms kworker/0:2 4095 0.02 4.605 ms kworker/7:1 4340 0.00 1.052 ms kworker/7:2 4341 0.02 4.882 ms perf 4342 33.00 5644.045 ms perf 4343 0.05 8.769 ms perf 4344 0.43 74.351 ms perf 4345 0.05 10.093 ms perf # perf kwork top --time 128800, Total : 53495.122 ms, 8 cpus %Cpu(s): 94.71% id, 0.09% hi, 0.09% si %Cpu0 [ 0.07%] %Cpu1 [ 0.04%] %Cpu2 [|| 8.49%] %Cpu3 [ 0.09%] %Cpu4 [ 0.02%] %Cpu5 [ 0.06%] %Cpu6 [ 0.12%] %Cpu7 [|||||| 21.24%] PID %CPU RUNTIME COMMMAND ---------------------------------------------------- 0 99.96 3981.363 ms swapper/4 0 99.94 3978.955 ms swapper/1 0 99.91 9329.375 ms swapper/5 0 99.87 4906.829 ms swapper/3 0 99.86 9028.064 ms swapper/6 0 98.67 3928.161 ms swapper/0 0 91.17 8388.432 ms swapper/2 0 78.65 7125.602 ms swapper/7 4342 29.42 2675.198 ms perf 16 0.18 16.817 ms rcu_preempt 4345 0.09 8.183 ms perf 4344 0.04 4.290 ms perf 4343 0.03 2.844 ms perf 353 0.03 2.600 ms sshd 4095 0.02 2.702 ms kworker/7:1 120 0.02 1.909 ms systemd-journal 98 0.02 2.540 ms jbd2/sda-8 61 0.02 1.886 ms kcompactd0 667 0.02 1.011 ms kworker/u16:2 75 0.02 2.693 ms kworker/2:1 4341 0.01 1.838 ms perf 30 0.01 0.788 ms migration/3 26 0.01 1.665 ms ksoftirqd/2 20 0.01 0.752 ms migration/1 2909 0.01 0.604 ms kworker/0:2 4340 0.00 0.635 ms kworker/7:2 97 0.00 0.214 ms kworker/7:1H 51 0.00 0.209 ms ksoftirqd/7 50 0.00 0.646 ms migration/7 76 0.00 0.602 ms kworker/6:1 45 0.00 0.366 ms migration/6 87 0.00 0.145 ms kworker/5:1H 40 0.00 0.446 ms migration/5 35 0.00 0.318 ms migration/4 74 0.00 0.205 ms kworker/3:1 33 0.00 0.080 ms kworker/3:0H 25 0.00 0.448 ms migration/2 397 0.00 0.057 ms kworker/1:1 17 0.00 0.365 ms migration/0 Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20230812084917.169338-14-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>