path: root/tools/perf
Age | Commit message | Author
2022-11-03perf session: Change type to avoid undefined behaviour in a signal handlerIan Rogers
The 'session_done' variable is written to inside the signal handler of 'perf report' and 'perf script'. Switch its type to avoid undefined behavior. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20221024181913.630986-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-03perf ftrace: Use sig_atomic_t to avoid UBIan Rogers
Use sig_atomic_t for a variable written to in a signal handler and read elsewhere. Using a plain, non-atomic type for such a variable is undefined behavior as per: https://wiki.sei.cmu.edu/confluence/display/c/SIG31-C.+Do+not+access+shared+objects+in+signal+handlers Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20221024181913.630986-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-03perf daemon: Use sig_atomic_t to avoid UBIan Rogers
Use sig_atomic_t for a variable written to in a signal handler and read elsewhere. Using a plain, non-atomic type for such a variable is undefined behavior as per: https://wiki.sei.cmu.edu/confluence/display/c/SIG31-C.+Do+not+access+shared+objects+in+signal+handlers Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20221024181913.630986-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-03perf record: Use sig_atomic_t for signal handlersIan Rogers
This removes undefined behavior as described in: https://wiki.sei.cmu.edu/confluence/display/c/SIG31-C.+Do+not+access+shared+objects+in+signal+handlers Suggested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20221024181913.630986-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
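The pattern these sig_atomic_t commits describe can be sketched as follows. This is a minimal illustration, not the perf code: the flag name `done` and both helper functions are hypothetical, and ISO C `signal()` is used only to keep the sketch portable.

```c
#include <signal.h>

/* Hypothetical flag name; perf's own variables are named differently. */
static volatile sig_atomic_t done;

/* The handler does nothing except store to a volatile sig_atomic_t object. */
static void on_signal(int sig)
{
	(void)sig;
	done = 1;
}

/* Install the handler for signo; returns 0 on success, -1 on failure. */
int install_done_handler(int signo)
{
	return signal(signo, on_signal) == SIG_ERR ? -1 : 0;
}

/* Polled from the main loop, outside the handler. */
int is_done(void)
{
	return done != 0;
}
```

A main loop would call `install_done_handler(SIGINT)` once and then test `is_done()` on each iteration.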
2022-11-03perf build: Update C standard to gnu11Ian Rogers
C11 has become the standard for mainstream kernel development [1]. Allowing it in the perf build means headers like stdatomic.h can be assumed to be present. This came up in the context of [2]. [1] https://lore.kernel.org/lkml/CAHk-=whWbENRz-vLY6vpESDLj6kGUTKO3khGtVfipHqwewh2HQ@mail.gmail.com/ [2] https://lore.kernel.org/lkml/20221024011024.462518-1-irogers@google.com/ Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20221024181913.630986-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-03perf probe: Fix to get declared file name from clang DWARF5Masami Hiramatsu (Google)
Fix getting the declared file name even when it uses file index 0 in DWARF5, by using a custom die_get_decl_file() function. Strictly, the DWARF5 standard says file index 0 of DW_AT_decl_file is invalid [1], but this is under discussion and may be updated [2]. In any case, clang generates such DWARF5 for the Linux kernel, so it must be handled. Without this, 'perf probe' returns an error: $ ./perf probe -k $BIN_PATH/vmlinux -s $SRC_PATH -L vfs_read:10 Debuginfo analysis failed. Error: Failed to show lines. With this, it can handle the case correctly: $ ./perf probe -k $BIN_PATH/vmlinux -s $SRC_PATH -L vfs_read:10 <vfs_read@$SRC_PATH/fs/read_write.c:10> 11 ret = rw_verify_area(READ, file, pos, count); 12 if (ret) return ret; [1] DWARF5 specification 2.14 says "The value 0 indicates that no source file has been specified." [2] http://wiki.dwarfstd.org/index.php?title=DWARF5_Line_Table_File_Numbers Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/166731052936.2100653.13380621874859467731.stgit@devnote3 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
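The idea of a version-aware, NULL-safe file lookup might be sketched like this. It is not the die_get_decl_file() implementation and does not use libdw: `struct file_table` and both functions are hypothetical, and accepting index 0 for version 5 is an assumption that mirrors what the commit says clang emits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified file table; not a libdw type. */
struct file_table {
	int dwarf_version;		/* e.g. 4 or 5 */
	const char *const *names;	/* names[i] is the entry for file index i */
	size_t nr;
};

/* Return the name for a DW_AT_decl_file-style index, or NULL if there is none. */
const char *decl_file_name(const struct file_table *t, size_t idx)
{
	/* Before DWARF5, index 0 means "no source file specified". */
	if (t->dwarf_version < 5 && idx == 0)
		return NULL;
	if (idx >= t->nr)
		return NULL;
	return t->names[idx];
}

/* NULL-safe comparison, so a missing file name cannot crash the caller. */
int decl_file_is(const struct file_table *t, size_t idx, const char *want)
{
	const char *name = decl_file_name(t, idx);

	return name != NULL && strcmp(name, want) == 0;
}
```

Callers compare through `decl_file_is()` rather than passing a possibly NULL name straight to `strcmp()`.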
2022-11-03perf probe: Use dwarf_attr_integrate as generic DWARF attr accessorMasami Hiramatsu (Google)
Use dwarf_attr_integrate() instead of dwarf_attr() for generic attribute accessor functions, so that it can find the specified attribute from the abstract origin DIE etc. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/166731051988.2100653.13595339994343449770.stgit@devnote3 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-03perf probe: Fix to avoid crashing if DW_AT_decl_file is NULLMasami Hiramatsu (Google)
Since clang generates DWARF5 which sets DW_AT_decl_file as 0, dwarf_decl_file() thinks that is invalid and returns NULL. In that case 'perf probe' SIGSEGVs because it doesn't expect a NULL decl_file. This adds a dwarf_decl_file() return value check to avoid such SEGV with clang generated DWARF5 info. Without this, 'perf probe' crashes: $ perf probe -k $BIN_PATH/vmlinux -s $SRC_PATH -L vfs_read:10 Segmentation fault $ With this, it just warns about it: $ perf probe -k $BIN_PATH/vmlinux -s $SRC_PATH -L vfs_read:10 Debuginfo analysis failed. Error: Failed to show lines. $ Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/166731051077.2100653.15626653369345128302.stgit@devnote3 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf lock contention: Increase default stack skip to 4Namhyung Kim
In most configurations, it works well with skipping 4 entries by default. If some systems still have 3 BPF internal stack frames, the next frame should be in a lock function which will be skipped later when it tries to find a caller. So increasing to 4 won't affect such systems either. With --stack-skip=0, I can see something like this: 24 49.84 us 7.41 us 2.08 us mutex bpf_prog_e1b85959d520446c_contention_begin+0x12e 0xffffffffc045040e bpf_prog_e1b85959d520446c_contention_begin+0x12e 0xffffffffc045040e bpf_prog_e1b85959d520446c_contention_begin+0x12e 0xffffffff82ea2071 bpf_trace_run2+0x51 0xffffffff82de775b __bpf_trace_contention_begin+0xb 0xffffffff82c02045 __mutex_lock+0x245 0xffffffff82c019e3 __mutex_lock_slowpath+0x13 0xffffffff82c019c0 mutex_lock+0x20 0xffffffff830a083c kernfs_iop_permission+0x2c Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20221028180128.3311491-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
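The reasoning above (skip a fixed number of frames, then skip lock functions while looking for the caller) can be sketched as below. This is an illustration under assumptions, not the perf implementation: `find_caller()` is a hypothetical name, and the substring test standing in for "is a lock function" is deliberately crude.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: treat any symbol containing "mutex_lock" as a lock function. */
static int is_lock_function(const char *sym)
{
	return strstr(sym, "mutex_lock") != NULL;
}

/*
 * Skip the first stack_skip entries unconditionally, then skip lock
 * functions. Returns the index of the caller frame, or -1 if none is left.
 */
int find_caller(const char *const *stack, size_t nr, size_t stack_skip)
{
	for (size_t i = stack_skip; i < nr; i++) {
		if (!is_lock_function(stack[i]))
			return (int)i;
	}
	return -1;
}
```

With a stack shaped like the trace above, skipping 4 frames lands on `kernfs_iop_permission`. With only 3 BPF frames the fourth skipped frame is a lock function that would have been skipped anyway, so the result is the same caller.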
2022-10-31perf lock contention: Avoid variable length arraysNamhyung Kim
MSan also warns about the use of a VLA for the stack_trace variable. We can allocate it dynamically instead. While at it, simplify the error handling a bit (and fix bugs). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221028180128.3311491-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
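Replacing a variable length array with a heap buffer could look roughly like this. Both function names are hypothetical and the "count non-zero frames" work is only a placeholder for whatever the real code does with the buffer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: heap-allocate instead of declaring `uint64_t buf[max_stack]`. */
uint64_t *stack_trace_alloc(int max_stack)
{
	if (max_stack <= 0)
		return NULL;
	return calloc((size_t)max_stack, sizeof(uint64_t));
}

/* Example user: returns 0 on success, -1 if the buffer could not be allocated. */
int count_nonzero_frames(const uint64_t *src, int max_stack, int *count)
{
	uint64_t *trace = stack_trace_alloc(max_stack);
	int n = 0;

	if (trace == NULL)
		return -1;

	for (int i = 0; i < max_stack; i++) {
		trace[i] = src[i];
		if (trace[i] != 0)
			n++;
	}

	*count = n;
	free(trace);	/* released on the single success path */
	return 0;
}
```

The allocation can fail, which a VLA cannot report, so the caller gets an explicit error return to check.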
2022-10-31perf lock contention: Check --max-stack optionNamhyung Kim
The --max-stack option is used to allocate the BPF stack map and stack trace array in the userspace. Check the value properly before using. Practically it cannot be greater than the sysctl_perf_event_max_stack. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221028180128.3311491-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
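A range check of this kind might be sketched as follows. The function name is hypothetical, the limit is passed in as a parameter instead of being read from the sysctl, and rejecting non-positive values is an assumption of this sketch.

```c
/*
 * Hypothetical validation helper. `limit` stands for the value of
 * sysctl_perf_event_max_stack, supplied by the caller in this sketch.
 * Returns 0 if max_stack is usable, -1 otherwise.
 */
int check_max_stack(int max_stack, int limit)
{
	if (max_stack <= 0 || max_stack > limit)
		return -1;
	return 0;
}
```

Checking before use keeps both the BPF stack map size and the userspace array size within a known bound.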
2022-10-31perf lock contention: Fix memory sanitizer issueNamhyung Kim
The msan reported a use-of-uninitialized-value warning for the struct lock_contention_data in lock_contention_read(). While it'd be filled by bpf_map_lookup_elem(), let's just initialize it to silence the warning. ==12524==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x562b0f16b1cd in lock_contention_read util/bpf_lock_contention.c:139:7 #1 0x562b0ef65ec6 in __cmd_contention builtin-lock.c:1737:3 #2 0x562b0ef65ec6 in cmd_lock builtin-lock.c:1992:8 #3 0x562b0ee7f50b in run_builtin perf.c:322:11 #4 0x562b0ee7efc1 in handle_internal_command perf.c:376:8 #5 0x562b0ee7e1e9 in run_argv perf.c:420:2 #6 0x562b0ee7e1e9 in main perf.c:550:3 #7 0x7f065f10e632 in __libc_start_main (/usr/lib64/libc.so.6+0x61632) #8 0x562b0edf2fa9 in _start (perf+0xfa9) SUMMARY: MemorySanitizer: use-of-uninitialized-value (perf+0xe15160) in lock_contention_read Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221028180128.3311491-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf test: Parse events workaround for dash/minusIan Rogers
Skip an event configuration for event names with a dash/minus in them. Events with a dash/minus in their name cause parsing issues as legacy encoding of events would use a dash/minus as a separator. The parser separates events with dashes into prefixes and suffixes and then recombines them. Unfortunately if an event has part of its name that matches a legacy token then the recombining fails. This is seen for branch-brs where branch is a legacy token. branch-brs was introduced to sysfs in: https://lore.kernel.org/all/20220322221517.2510440-5-eranian@google.com/ The failure is shown below as well as the workaround to use a config where the dash/minus isn't treated specially: ``` $ perf stat -e branch-brs true event syntax error: 'branch-brs' \___ parser error $ perf stat -e cpu/branch-brs/ true Performance counter stats for 'true': 46,179 cpu/branch-brs/ ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20221013011205.3151391-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
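The workaround shown in the commit (detect a dash in the name, and use the `pmu/event/` form where the dash is not treated specially) can be sketched like this. Both helper names are hypothetical and this is not the test code itself.

```c
#include <stdio.h>
#include <string.h>

/* Non-zero if the event name contains a dash/minus. */
int event_has_dash(const char *name)
{
	return strchr(name, '-') != NULL;
}

/*
 * Write "pmu/event/" into buf. Returns 0 on success, -1 if the result
 * did not fit or formatting failed.
 */
int format_pmu_event(char *buf, size_t size, const char *pmu, const char *event)
{
	int n = snprintf(buf, size, "%s/%s/", pmu, event);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For `branch-brs` on the `cpu` PMU this produces `cpu/branch-brs/`, the form used in the second command above.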
2022-10-31perf evlist: Add missing util/event.h headerArnaldo Carvalho de Melo
Needed to get the event_attr_init() and perf_event_paranoid() prototypes that were being obtained indirectly, by sheer luck. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf mmap: Remove several unneeded includes from util/mmap.hArnaldo Carvalho de Melo
Those headers are not needed in util/mmap.h, remove them. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf tests: Add missing event.h includeArnaldo Carvalho de Melo
The tests use things like perf_event__name() but were not including event.h, where its prototype lives. Fix it. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf thread: Move thread__resolve() from event.hArnaldo Carvalho de Melo
It's a thread method, so move it to thread.h. This way some places that were using event.h just to get this prototype may stop doing so, speeding up the build and disentangling the header dependency graph. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf symbol: Move addr_location__put() from event.hArnaldo Carvalho de Melo
It's an addr_location method, so move it to symbol.h, where 'struct addr_location' is. This way some places that were using event.h just to get this prototype may stop doing so, speeding up the build and disentangling the header dependency graph. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf machine: Move machine__resolve() from event.hArnaldo Carvalho de Melo
It's a machine method, so move it to machine.h. This way some places that were using event.h just to get this prototype may stop doing so, speeding up the build and disentangling the header dependency graph. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf kwork: Remove includes not needed in kwork.hArnaldo Carvalho de Melo
Leave just some forward declarations for pointers, move the includes to where they are really needed. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf tools: Move 'struct perf_sample' to a separate header file to disentangle headersArnaldo Carvalho de Melo
Some places were including event.h just to get 'struct perf_sample', move it to a separate place so that we speed up a bit the build. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf branch: Remove some needless headers, add a needed oneArnaldo Carvalho de Melo
map_symbol.h is needed because we have structs that contains 'struct addr_map_symbol', so add it, remove the others. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf bpf: No need to include headers just use forward declarationsArnaldo Carvalho de Melo
In the bpf-prologue.h header we are just using pointers, so no need to include headers for that, just provide forward declarations for those types. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
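The forward-declaration technique used throughout these header cleanups can be illustrated in a single file. The type and function names are hypothetical. The first half plays the role of a header that only passes pointers around, the second half plays the role of the source file that needs the full definition.

```c
#include <stddef.h>
#include <string.h>

/* "Header" part: only pointers are used, so an incomplete type is enough. */
struct probe_arg;	/* hypothetical type name */
size_t probe_arg__name_len(const struct probe_arg *arg);

/* "Source" part: the full definition is needed only where members are accessed. */
struct probe_arg {
	const char *name;
};

size_t probe_arg__name_len(const struct probe_arg *arg)
{
	return (arg != NULL && arg->name != NULL) ? strlen(arg->name) : 0;
}
```

Files that include only the "header" part no longer pull in whatever the full structure definition depends on.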
2022-10-27perf bpf: No need to include compiler.h when HAVE_LIBBPF_SUPPORT is trueArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf tools: Make quiet mode consistent between toolsJames Clark
Use the global quiet variable everywhere so that all tools hide warnings in quiet mode and update the documentation to reflect this. 'perf probe' claimed that errors are not printed in quiet mode but I don't see this so remove it from the docs. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221018094137.783081-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf tools: Fix "kernel lock contention analysis" test by not printing warnings in quiet modeJames Clark
Especially when CONFIG_LOCKDEP and other debug configs are enabled, Perf can print the following warning when running the "kernel lock contention analysis" test: Warning: Processed 1378918 events and lost 4 chunks! Check IO/CPU overload! Warning: Processed 4593325 samples and lost 70.00%! The test already supplies -q to run in quiet mode, so extend quiet mode to perf_stdio__warning() and also ui__warning() for consistency. This fixes the following failure due to the extra lines counted: perf test "lock cont" -vvv 82: kernel lock contention analysis test : --- start --- test child forked, pid 3125 Testing perf lock record and perf lock contention [Fail] Recorded result count is not 1: 9 test child finished with -1 ---- end ---- kernel lock contention analysis test: FAILED! Fixes: ec685de25b6718f8 ("perf test: Add kernel lock contention test") Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221018094137.783081-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
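Gating a warning helper on a global quiet flag could be sketched like this. The names `quiet` and `sketch_warning()` are stand-ins for illustration only. The real helpers named in the commit are perf_stdio__warning() and ui__warning(), whose signatures are not reproduced here.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the global quiet flag the commit refers to. */
int quiet;

/*
 * Print "Warning: ..." to out unless quiet is set.
 * Returns the number of characters written, 0 when suppressed, -1 on error.
 */
int sketch_warning(FILE *out, const char *fmt, ...)
{
	va_list ap;
	int n, m;

	if (quiet)
		return 0;

	n = fprintf(out, "Warning: ");
	va_start(ap, fmt);
	m = vfprintf(out, fmt, ap);
	va_end(ap);

	return (n < 0 || m < 0) ? -1 : n + m;
}
```

With the flag set, a test that counts output lines sees none of the warning lines.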
2022-10-27perf test: Do not set TEST_SKIP for record subtestsNamhyung Kim
It now has 4 sub tests and at least one of them should run. But once the TEST_SKIP (= 2) return value is set, it won't be overwritten unless there's a failure. I think we should return success when one or more tests are skipped but the remaining subtests are passed. So update the test code not to set the err variable when it skips the test. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221020172643.3458767-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
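The result handling described above can be sketched as follows. The real test is a shell script, so this C version only illustrates the logic. TEST_SKIP = 2 comes from the commit message, while the other two values and the function name are assumptions.

```c
#include <stddef.h>

/* TEST_SKIP = 2 is from the commit message; the other values are assumed. */
enum { TEST_OK = 0, TEST_FAIL = 1, TEST_SKIP = 2 };

/*
 * Combine subtest results: a skipped subtest leaves the overall result
 * untouched, so only a real failure can make the whole test fail.
 */
int combine_results(const int *results, size_t nr)
{
	int err = TEST_OK;

	for (size_t i = 0; i < nr; i++) {
		if (results[i] == TEST_SKIP)
			continue;
		if (results[i] != TEST_OK)
			err = results[i];
	}
	return err;
}
```

With this rule, a run in which some subtests are skipped and the rest pass reports success.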
2022-10-27perf test: Test record with --threads optionNamhyung Kim
The --threads option changed the 'perf record' behavior significantly, so it'd be nice if we test it separately. Add --threads options with different argument in each test supported and check the result. Also update the cleanup routine because threads recording produces data in a directory. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221020172643.3458767-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf test: Add target workload test in 'perf record' testsNamhyung Kim
Add a subtest which profiles the given workload on the command line. As it's a minimal requirement, the test should run ok so it doesn't skip the test even if it failed to run the 'perf record' command. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221020172643.3458767-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf test: Add system-wide mode in 'perf record' testsNamhyung Kim
Add system wide recording test with the same pattern. It'd skip the test when it fails to run 'perf record'. For system-wide mode, it needs to avoid build-id collection and synthesis because the test only cares about the test program and kernel would generate the necessary events as the process starts. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221020172643.3458767-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf test: Wait for a new thread when testing --per-thread recordNamhyung Kim
Just running the target program is not enough to test multi-thread target because it'd be racy perf vs target startup. I used the initial delay but it cannot guarantee for perf to see the thread. Instead, use wait_for_threads helper from shell/lib/waiting.sh to make sure it starts the sibling thread first. Then perf record can use -p option to profile the target process. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221020172643.3458767-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf test: Use a test program in 'perf record' testsNamhyung Kim
If the system has cc it could build a test program with two threads and then use it for more detailed testing. Also it accepts an option to run a thread forever to ensure multi-thread runs. If cc is not found, it falls back to use the default value 'true'. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221020172643.3458767-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf test: Fix shellcheck issues in the record testNamhyung Kim
Basically there are 3 issues: 1. quote shell expansion 2. do not use egrep 3. use upper case letters for signal names Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221020172643.3458767-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf test: Do not use instructions:u explicitlyNamhyung Kim
I think it was there to support non-root user tests. But perf record can handle that case and fall back to a software event (cpu-clock). Practically this matters when it's run in a VM, but there seems to be no reason to prevent running the test in a guest. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221020172643.3458767-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf scripts python: intel-pt-events.py: Add ability to interleave outputAdrian Hunter
Intel PT timestamps are not provided for every branch, let alone every instruction, so there can be many samples with the same timestamp. With per-cpu contexts, decoding is done for each CPU in turn, which can make it difficult to see what is happening on different CPUs at the same time. Currently the interleaving from perf script --itrace=i0ns is quite coarse grained. There are often long stretches executing on one CPU and nothing on another. Some people are interested in seeing what happened on multiple CPUs before a crash to debug races etc. To improve perf script interleaving for parallel execution, the intel-pt-events.py script has been enhanced to enable interleaving the output with the same timestamp from different CPUs. It is understood that interleaving is not perfect or causal. Add parameter --interleave [<n>] to interleave sample output for the same timestamp so that no more than n samples for a CPU are displayed in a row. 'n' defaults to 4. Note this only affects the order of output, and only when the timestamp is the same. Example: $ perf script intel-pt-events.py --insn-trace --interleave 3 ... 
bash 2267/2267 [004] 9323.692625625 563caa3c86f0 jz 0x563caa3c89c7 run_pending_traps+0x30 (/usr/bin/bash) IPC: 1.52 (38/25) bash 2267/2267 [004] 9323.692625625 563caa3c89c7 movq 0x118(%rsp), %rax run_pending_traps+0x307 (/usr/bin/bash) bash 2267/2267 [004] 9323.692625625 563caa3c89cf subq %fs:0x28, %rax run_pending_traps+0x30f (/usr/bin/bash) bash 2270/2270 [007] 9323.692625625 55dc58cabf02 jz 0x55dc58cabf48 unquoted_glob_pattern_p+0x102 (/usr/bin/bash) IPC: 1.56 (25/16) bash 2270/2270 [007] 9323.692625625 55dc58cabf04 cmp $0x5d, %al unquoted_glob_pattern_p+0x104 (/usr/bin/bash) bash 2270/2270 [007] 9323.692625625 55dc58cabf06 jnz 0x55dc58cabf10 unquoted_glob_pattern_p+0x106 (/usr/bin/bash) bash 2264/2264 [001] 9323.692625625 7fd556a4376c jbe 0x7fd556a43ac8 round_and_return+0x3fc (/usr/lib/x86_64-linux-gnu/libc.so.6) IPC: 4.30 (43/10) bash 2264/2264 [001] 9323.692625625 7fd556a43772 and $0x8, %edx round_and_return+0x402 (/usr/lib/x86_64-linux-gnu/libc.so.6) bash 2264/2264 [001] 9323.692625625 7fd556a43775 jnz 0x7fd556a43ac8 round_and_return+0x405 (/usr/lib/x86_64-linux-gnu/libc.so.6) bash 2267/2267 [004] 9323.692625625 563caa3c89d8 jnz 0x563caa3c8b11 run_pending_traps+0x318 (/usr/bin/bash) bash 2267/2267 [004] 9323.692625625 563caa3c89de add $0x128, %rsp run_pending_traps+0x31e (/usr/bin/bash) bash 2267/2267 [004] 9323.692625625 563caa3c89e5 popq %rbx run_pending_traps+0x325 (/usr/bin/bash) ... Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221020152509.5298-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
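The interleaving rule (for samples sharing one timestamp, show at most n samples from one CPU before moving to the next) can be sketched as a round-robin over per-CPU runs. The actual change lives in the Python script and may be implemented differently. The `struct sample` layout and the function name here are hypothetical, and the input is assumed to be grouped per CPU in decode order.

```c
#include <stddef.h>

struct sample {
	int cpu;
	int seq;	/* position within that CPU's decode order */
};

/*
 * Reorder nr samples that share one timestamp. `in` holds one contiguous
 * run per CPU; `out` receives up to n samples from each run in turn.
 */
void interleave(const struct sample *in, size_t nr, size_t n, struct sample *out)
{
	size_t o = 0;

	if (n == 0)
		n = 4;	/* default given in the commit message */

	for (size_t round = 0; o < nr; round++) {
		size_t i = 0;

		while (i < nr) {
			size_t start = i;

			/* Find the end of this CPU's run. */
			while (i < nr && in[i].cpu == in[start].cpu)
				i++;

			size_t len = i - start;
			size_t from = round * n;

			/* Emit this round's slice of the run, if any is left. */
			for (size_t k = from; k < len && k < from + n; k++)
				out[o++] = in[start + k];
		}
	}
}
```

With six samples on CPU 4 followed by three each on CPUs 7 and 1 and n = 3, the output order is 4, 4, 4, 7, 7, 7, 1, 1, 1, 4, 4, 4, the same shape as the example output above. Once only one CPU has samples left, its remaining samples necessarily appear back to back.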
2022-10-27perf event: Drop perf_regs.h include, not needed anymoreArnaldo Carvalho de Melo
Since commit c897899752478d4c ("perf tools: Prevent out-of-bounds access to registers") the util/event.h header doesn't use anything from util/perf_regs.h, so drop it to untangle the header dependency tree a bit, speeding up compilation. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf scripting python: Add missing util/perf_regs.h include to get perf_reg_name() prototypeArnaldo Carvalho de Melo
It was getting it via event.h, that doesn't need that include anymore and will drop it. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf arch x86: Add missing stdlib.h to get free() prototypeArnaldo Carvalho de Melo
It was getting it indirectly, by luck; add it. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf unwind arm64: Remove needless event.h & thread.h includesArnaldo Carvalho de Melo
To reduce compile time and header dependency chains just add forward declarations for pointer types and include linux/types.h for u64. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf config: Add missing newline on pr_warning() call in home_perfconfig()Yang Jihong
Add missing newline on pr_warning() call in home_perfconfig(). Before: # perf record File /home/yangjihong/.perfconfig not owned by current user or root, ignoring it.Couldn't synthesize bpf events. After: # perf record File /home/yangjihong/.perfconfig not owned by current user or root, ignoring it. Couldn't synthesize bpf events. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221022092735.114967-4-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf daemon: Complete list of supported subcommand in help messageYang Jihong
perf daemon supports start, signal, stop and ping subcommands, complete it Before: # perf daemon -h Usage: perf daemon start [<options>] or: perf daemon [<options>] -v, --verbose be more verbose -x, --field-separator[=<field separator>] print counts with custom separator --base <directory> base directory --config <config file> config file path After: # perf daemon -h Usage: perf daemon {start|signal|stop|ping} [<options>] or: perf daemon [<options>] -v, --verbose be more verbose -x, --field-separator[=<field separator>] print counts with custom separator --base <directory> base directory --config <config file> config file path Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221022092735.114967-3-yangjihong1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf stat: Remove unused perf_counts.aggr fieldNamhyung Kim
The aggr field in the struct perf_counts is to keep the aggregated value in the AGGR_GLOBAL for the old code. But it's not used anymore. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-21-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27  perf stat: Display percore events properly  Namhyung Kim
The recent change in perf stat broke the percore event display. Note that the aggr counts are already processed so that every sibling thread in the same core gets the per-core counter values. Check percore evsels and skip the sibling threads in the display. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-20-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27  perf stat: Display event stats using aggr counts  Namhyung Kim
Now the aggr counts are ready for use. Convert the display routines to use the aggr counts and update the shadow stats with them. The display no longer needs to aggregate counts or collect aliases. Get rid of the now unused struct perf_aggr_thread_value. Note that the display order differs among the aggr modes. For per-core/die/socket/node aggregation, it shows relevant events in the same unit together, whereas for global/thread/no aggregation it shows the same events for different units together. So it still uses separate code paths to display them due to the ordering. One more thing to note is that this breaks the per-core event display for now. The next patch will fix it so that the output is identical to the current one. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-19-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27  perf stat: Add perf_stat_process_shadow_stats()  Namhyung Kim
This function updates the shadow stats uniformly using the aggregated counts, since it uses the aggr_counts for every aggr mode. For now there will be duplicate shadow stats for each item, since the display routines update them once again. But that is fine as it shows the average values, and the duplication will go away eventually. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-18-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27  perf stat: Add perf_stat_process_percore()  Namhyung Kim
perf_stat_process_percore() aggregates counts for an event per-core even if the aggr_mode is AGGR_NONE. This is enabled when the user requests it on the command line. To handle that, it keeps the per-cpu counts at first. Then it aggregates the counts that have the same core id in aggr->counts and writes the aggregated values back to each cpu. Later, per-core events will skip one of the CPUs unless the percore-show-thread option is given. In that case, it can simply print all cpu stats with the updated (per-core) values. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-17-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
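The per-core write-back described in this commit message can be sketched roughly as follows. This is only an illustration, not the code from the patch: struct cpu_count and percore_aggregate() are hypothetical names, and the real implementation works on perf's own evsel and aggr structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified per-CPU record; not perf's real data structure. */
struct cpu_count {
	int      core_id; /* core this CPU (hardware thread) belongs to */
	uint64_t val;     /* per-CPU value on input, per-core value on output */
};

/*
 * Sum the values of all CPUs that share a core id, then write the per-core
 * total back to every CPU of that core. Quadratic for brevity.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated.
 */
int percore_aggregate(struct cpu_count *c, size_t n)
{
	uint64_t *total = calloc(n ? n : 1, sizeof(*total));
	size_t i, j;

	if (!total)
		return -1;

	/* Pass 1: compute totals from the still untouched per-CPU values. */
	for (i = 0; i < n; i++)
		for (j = 0; j < n; j++)
			if (c[j].core_id == c[i].core_id)
				total[i] += c[j].val;

	/* Pass 2: update each CPU with the total of its core. */
	for (i = 0; i < n; i++)
		c[i].val = total[i];

	free(total);
	return 0;
}
```

Keeping the totals in a scratch buffer until the second pass avoids counting an already updated sibling twice. After the write-back, a display routine could print every CPU or only one CPU per core and show the same per-core value either way.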
2022-10-27  perf stat: Add perf_stat_merge_counters()  Namhyung Kim
perf_stat_merge_counters() aggregates the same events in different PMUs, as in the uncore or hybrid cases. The same logic is in the stat-display routines, but I think it should be handled when the event counters are processed. As it works on the aggr_counters, it doesn't change the output yet. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-16-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
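A minimal sketch of merging the same event across PMUs at processing time is shown below. It assumes that two entries are "the same event" when their names match, which is a simplification chosen for illustration. struct pmu_event and merge_same_events() are hypothetical and do not appear in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one event opened on one PMU. */
struct pmu_event {
	const char *name;   /* event name, shared across PMUs */
	const char *pmu;    /* PMU the event was opened on */
	uint64_t    val;    /* aggregated counter value */
	bool        merged; /* true once folded into an earlier entry */
};

/*
 * Fold every later entry with the same name into the first one and mark
 * it as merged, so that a display routine can skip it.
 * Assumption: name equality alone decides whether two events match.
 */
void merge_same_events(struct pmu_event *ev, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		if (ev[i].merged)
			continue;
		for (j = i + 1; j < n; j++) {
			if (ev[j].merged || strcmp(ev[i].name, ev[j].name))
				continue;
			ev[i].val += ev[j].val;
			ev[j].val = 0;
			ev[j].merged = true;
		}
	}
}
```

Doing the merge before display means the display code only has to skip entries flagged as merged, instead of collecting aliases itself.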
2022-10-27  perf stat: Split process_counters() to share it with process_stat_round_event()  Namhyung Kim
It will do more processing with aggregation. Let's split the function so that it can be shared with process_stat_round_event() too. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-15-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27  perf stat: Reset aggr counts for each interval  Namhyung Kim
The evsel->stats->aggr->count should be reset for interval processing since we want to use the values directly for display. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-14-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
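Why the reset matters for interval mode can be illustrated with the sketch below. The names struct aggr_count, reset_aggr_counts() and process_interval() are hypothetical, as is the assumption that each CPU maps to one aggregation slot through an index array. The patch itself resets evsel->stats->aggr->count.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical aggregation slot; the real counts carry more fields. */
struct aggr_count {
	uint64_t val;
};

/* Clear all aggregation slots before a new interval is accumulated. */
void reset_aggr_counts(struct aggr_count *aggr, size_t nr_aggr)
{
	memset(aggr, 0, nr_aggr * sizeof(*aggr));
}

/*
 * Accumulate one interval. aggr_idx[cpu] names the slot a CPU contributes
 * to and delta[cpu] is that CPU's count for this interval only. Without the
 * reset, the slots would keep growing across intervals and could not be
 * displayed directly.
 */
void process_interval(struct aggr_count *aggr, size_t nr_aggr,
		      const size_t *aggr_idx, const uint64_t *delta,
		      size_t nr_cpus)
{
	size_t cpu;

	reset_aggr_counts(aggr, nr_aggr);
	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (aggr_idx[cpu] < nr_aggr)
			aggr[aggr_idx[cpu]].val += delta[cpu];
}
```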
2022-10-27  perf stat: Allocate aggr counts for recorded data  Namhyung Kim
process_stat_config_event() sets the aggr_mode, which means the earlier evlist__alloc_stats() cannot allocate the aggr counts because the aggr_mode is not known yet. Do it after setting the aggr_map using evlist__alloc_aggr_stats(). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221018020227.85905-13-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>