Age | Commit message (Collapse) | Author |
|
Similarly to other subcommands (like report, top), it would be handy to
be able to provide a path to the addr2line command.
Signed-off-by: Martin Liska <martin.liska@hey.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/eadc3e36-029d-4848-9d69-272fe5a83a26@foxlink.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The --skip-empty option hides dummy events in a group. Like the other
output modes in 'perf report' and 'perf annotate', the data-type
profiling output should support the option.
Committer testing:
With dummy:
root@number:~# perf annotate --stdio --group --data-type | head -24
Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples):
event[0] = cpu_atom/mem-loads,ldlat=30/P
event[1] = cpu_atom/mem-stores/P
event[2] = dummy:u
============================================================================
Percent offset size field
100.00 100.00 0.00 0 40 pthread_mutex_t {
100.00 100.00 0.00 0 40 struct __pthread_mutex_s __data {
45.21 84.54 0.00 0 4 int __lock;
0.00 0.00 0.00 4 4 unsigned int __count;
0.00 1.83 0.00 8 4 int __owner;
5.19 10.65 0.00 12 4 unsigned int __nusers;
49.61 2.97 0.00 16 4 int __kind;
0.00 0.00 0.00 20 2 short int __spins;
0.00 0.00 0.00 22 2 short int __elision;
0.00 0.00 0.00 24 16 __pthread_list_t __list {
0.00 0.00 0.00 24 8 struct __pthread_internal_list* __prev;
0.00 0.00 0.00 32 8 struct __pthread_internal_list* __next;
};
};
0.00 0.00 0.00 0 0 char[] __size;
45.21 84.54 0.00 0 8 long int __align;
};
Skipping it:
root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24
Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples):
event[0] = cpu_atom/mem-loads,ldlat=30/P
event[1] = cpu_atom/mem-stores/P
============================================================================
Percent offset size field
100.00 100.00 0 40 pthread_mutex_t {
100.00 100.00 0 40 struct __pthread_mutex_s __data {
45.21 84.54 0 4 int __lock;
0.00 0.00 4 4 unsigned int __count;
0.00 1.83 8 4 int __owner;
5.19 10.65 12 4 unsigned int __nusers;
49.61 2.97 16 4 int __kind;
0.00 0.00 20 2 short int __spins;
0.00 0.00 22 2 short int __elision;
0.00 0.00 24 16 __pthread_list_t __list {
0.00 0.00 24 8 struct __pthread_internal_list* __prev;
0.00 0.00 32 8 struct __pthread_internal_list* __next;
};
};
0.00 0.00 0 0 char[] __size;
45.21 84.54 0 8 long int __align;
};
Annotate type: 'pthread_mutexattr_t' in /usr/lib64/libc.so.6 (1 samples):
root@number:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240807061713.1642924-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In that case we have a set of placeholder functions, and one of them uses a
'Dwarf_Addr' type that is not available, as it is defined in the missing
DWARF libraries, so provide a placeholder typedef for that as well.
The build error before this patch:
In file included from util/annotate.c:28:
util/debuginfo.h:44:46: error: unknown type name ‘Dwarf_Addr’
44 | Dwarf_Addr *offs __maybe_unused,
| ^~~~~~~~~~
make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: util/annotate.o] Error 1
make[6]: *** Waiting for unfinished jobs....
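For illustration, a minimal sketch of the kind of placeholder involved,
assuming debuginfo.h guards the real definitions with HAVE_DWARF_SUPPORT
(the guard and stub shown here are assumptions, not the literal patch):
```
#ifdef HAVE_DWARF_SUPPORT
#include <elfutils/libdw.h>	/* provides the real Dwarf_Addr */
#else
/* placeholder so the stub prototypes in debuginfo.h still compile */
typedef unsigned long Dwarf_Addr;
#endif
```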
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/lkml/CAM9d7ciushSwEfj7yW4rtDEJBTcCB991V4cswwFEL+cv6QF2pg@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For example, when using the Alder Lake PMU memory load event, the
instruction latency is stored in 'ins_lat', while the cache latency
is stored in 'weight'.
This patch reports the 'ins_lat' field for Python scripting.
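As a hedged sketch of the kind of change this implies in the Python
scripting engine (the pydict_set_item_string_decref() helper follows the
pattern used in trace-event-python.c; treat the exact call site as an
assumption):
```
static void set_sample_ins_lat(PyObject *dict_sample,
			       const struct perf_sample *sample)
{
	/* expose the instruction latency next to the existing weight field */
	pydict_set_item_string_decref(dict_sample, "ins_lat",
			PyLong_FromUnsignedLong(sample->ins_lat));
}
```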
Committer testing:
On a Rocket Lake Refresh Intel machine (14th gen):
root@number:~# grep -m1 'model name' /proc/cpuinfo
model name : Intel(R) Core(TM) i7-14700K
root@number:~# perf mem record -a sleep 5
Memory events are enabled on a subset of CPUs: 16-27
[ perf record: Woken up 85 times to write data ]
[ perf record: Captured and wrote 41.236 MB perf.data (191390 samples) ]
root@number:~# perf evlist -v
cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
root@number:~#
Now generate a Python script and edit it to dump the dictionary, which now
needs to include that 'ins_lat' field:
root@number:~# perf script --gen python
generated Python script: perf-script.py
root@number:~# vim perf-script.py
root@number:~# perf script -s perf-script.py | head -40
in trace_begin
in trace_end
root@number:~# vim perf-script.py
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Zixian Cai <fzczx123@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240809080137.3590148-1-fzczx123@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps
and map members are reference counted. Ensure these values use a _get
routine to increment the reference counts and use map_symbol__exit() to
release the reference counts.
Do the same for 'struct thread's prev_lbr_cursor, but save the size of
the prev_lbr_cursor array so that it may be iterated.
Ensure that when stitch_nodes are placed on the free list the
map_symbols are exited.
Fix resolve_lbr_callchain_sample() by replacing list_replace_init() with
list_splice_init(), so the whole list is moved and nodes aren't leaked.
The memory leaks can be reproduced with a leak sanitizer build using the
following 'perf report' commands:
```
$ perf record -e cycles --call-graph lbr perf test -w thloop
$ perf report --stitch-lbr
```
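For illustration, the ownership rule the fix enforces looks roughly like
the sketch below (maps__get(), map__get() and map_symbol__exit() are the
existing helpers; the wrapper functions are illustrative):
```
static void copy_map_symbol(struct map_symbol *dst, const struct map_symbol *src)
{
	dst->maps = maps__get(src->maps);	/* take a reference for the copy */
	dst->map  = map__get(src->map);
	dst->sym  = src->sym;			/* symbols are not reference counted here */
}

static void release_map_symbol(struct map_symbol *ms)
{
	map_symbol__exit(ms);			/* drops the maps/map references */
}
```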
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Fixes: ff165628d72644e3 ("perf callchain: Stitch LBR call stack")
Signed-off-by: Ian Rogers <irogers@google.com>
[ Basic tests after applying the patch, repeating the example above ]
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240808054644.1286065-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
die_get_typename() resolves typedefs and gets to the original
type. But sometimes the original type is a struct without a name, which
makes the output confusing and hard to read.
This is a diff of perf report -s type before and after the change.
New types such as atomic{,64}_t and sigset_t appeared and the portion
of unnamed structs was reduced. Also u32, u64 and size_t were split
from the base types.
--- b 2024-08-01 17:02:34.307809952 -0700
+++ a 2024-08-07 14:17:05.245853999 -0700
- 2.40% long unsigned int
+ 2.26% long unsigned int
- 1.56% unsigned int
+ 1.27% unsigned int
- 0.98% struct
- 0.79% long long unsigned int
+ 0.58% long long unsigned int
+ 0.36% struct
+ 0.27% atomic64_t
+ 0.22% u32
+ 0.21% u64
+ 0.19% atomic_t
+ 0.13% size_t
- 0.08% struct seqcount_spinlock
+ 0.08% seqcount_spinlock_t
+ 0.08% sigset_t
+ 0.08% __poll_t
Let's use the typedef name directly and the resolved type to get the size
of the type.
Committer testing:
root@x1:~# diff -u before after | head -30
--- before 2024-08-08 09:35:13.917325041 -0300
+++ after 2024-08-08 09:37:35.312257905 -0300
@@ -10,25 +10,27 @@
# ........ .........
#
79.40% (unknown)
- 2.28% union
1.96% (stack operation)
- 1.24% struct
+ 1.87% pthread_mutex_t
0.99% u32[]
- 0.92% unsigned int
0.77% struct task_struct
+ 0.75% U32
0.75% struct pcpu_hot
0.63% struct qspinlock
+ 0.61% atomic_t
0.59% struct list_head
- 0.58% int
0.53% struct cfs_rq
0.51% BYTE*
- 0.48% unsigned char
+ 0.48% BYTE
0.48% long unsigned int
0.46% struct rq
0.41% struct worker
0.41% struct memcg_vmstats_percpu
+ 0.41% pthread_cond_t
0.37% _Bool
+ 0.36% int
root@x1:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240807223129.1738004-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In find_data_type(), it creates and deletes a debug info whenever it
tries to find the data type for a sample. This is inefficient and it most
likely accesses the same binary again and again.
Let's add a single-entry cache of the debug info structure for the last DSO.
Depending on sample data, it usually gives me 2~3x (and sometimes more)
speed ups.
Note that this will introduce a little difference in the output due to
the order of checking stack operations. It used to check the stack ops
before checking the availability of debug info but I moved it after the
symbol check. So it'll report stack operations in DSOs without debug
info as unknown. But I think it's ok and better to have the checking
near the caching logic.
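A minimal sketch of the single-entry cache idea (the variable names and the
wrapper are assumptions; debuginfo__new()/debuginfo__delete() are the
existing perf APIs):
```
static struct dso *debuginfo_cache_dso;
static struct debuginfo *debuginfo_cache;

static struct debuginfo *get_cached_debuginfo(struct dso *dso)
{
	if (dso == debuginfo_cache_dso)
		return debuginfo_cache;		/* same binary as last time: reuse it */

	debuginfo__delete(debuginfo_cache);	/* drop the stale entry */
	debuginfo_cache_dso = dso;
	debuginfo_cache = debuginfo__new(dso__long_name(dso));
	return debuginfo_cache;
}
```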
Committer testing:
root@x1:~# perf mem record -a sleep 5s
root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~# diff -u before after
--- before 2024-08-08 09:33:53.880780784 -0300
+++ after 2024-08-08 09:35:13.917325041 -0300
@@ -81,8 +81,8 @@
# Overhead Data Type
# ........ .........
#
- 55.43% (unknown)
- 11.61% (stack operation)
+ 55.56% (unknown)
+ 11.48% (stack operation)
4.93% struct pcpu_hot
3.26% unsigned int
2.48% struct
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240805234648.1453689-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
iter_finish_branch_entry() doesn't put the branch_info from/to map
elements, creating memory leaks. This can be seen with:
```
$ perf record -e cycles -b perf test -w noploop
$ perf report -D
...
Direct leak of 984344 byte(s) in 123043 object(s) allocated from:
#0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x564d3400d10b in map__get util/map.h:186
#2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981
#3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151
#4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898
#5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238
#6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334
#7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655
#8 0x564d3403ba52 in do_flush util/ordered-events.c:245
#9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324
#10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708
#11 0x564d34032480 in perf_session__process_event util/session.c:1877
#12 0x564d340336ad in reader__read_event util/session.c:2399
#13 0x564d34033fdc in reader__process_events util/session.c:2448
#14 0x564d34033fdc in __perf_session__process_events util/session.c:2495
#15 0x564d34033fdc in perf_session__process_events util/session.c:2661
#16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065
#17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805
#18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350
#19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403
#20 0x564d33cdd827 in run_argv tools/perf/perf.c:447
#21 0x564d33cdd827 in main tools/perf/perf.c:561
...
```
Cleaning up the map_symbols properly exposes maps reference count
issues, so resolve those too. Resolving this issue doesn't improve peak
heap consumption for the test above.
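A hedged sketch of the cleanup this implies, using the existing
map_symbol__exit() helper (the wrapper itself is illustrative):
```
static void branch_info__exit(struct branch_info *bi)
{
	map_symbol__exit(&bi->from.ms);		/* puts the maps/map references */
	map_symbol__exit(&bi->to.ms);
}
```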
Committer testing:
$ sudo dnf install libasan
$ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Yanteng Si <siyanteng@loongson.cn>
Link: https://lore.kernel.org/r/20240807065136.1039977-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Like in 'perf report', we want to hide empty events in the 'perf annotate'
output. This keeps the output consistent when the option is set in 'perf report'.
For example, the following command would use 3 events including dummy.
$ perf mem record -a -- perf test -w noploop
$ perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
dummy:u
Just using perf annotate with --group will show all 3 events.
$ perf annotate --group --stdio | head
Percent | Source code & Disassembly of ...
--------------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 0.00 : e060: pushq %rbp
0.00 0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 0.00 : e064: pushq %r15
0.00 0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 0.00 : e069: pushq %r14
0.00 0.00 0.00 : e06b: pushq %r13
0.00 0.00 0.00 : e06d: movl %edx, %r13d
Now with --skip-empty, it'll hide the last dummy event.
$ perf annotate --group --stdio --skip-empty | head
Percent | Source code & Disassembly of ...
------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 : e060: pushq %rbp
0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 : e064: pushq %r15
0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 : e069: pushq %r14
0.00 0.00 : e06b: pushq %r13
0.00 0.00 : e06d: movl %edx, %r13d
Committer testing:
root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~#
Before:
root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25
Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period]
do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
Percent 0x9900 <do_lookup_x>:
pushq %rbp
movq %rsp,%rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $0x88,%rsp
movq %rdi,-0x50(%rbp)
movl 8(%r9),%edi
movq 0x10(%rbp),%r12
movq 0x28(%rbp),%r10
movq %rdx,-0x70(%rbp)
movq %rcx,-0x58(%rbp)
movq %rdi,%r11
0.00 5.73 0.00 movq %r8,-0x68(%rbp)
movq (%r9),%r8
movl %esi,%eax
8.30 0.00 0.00 movl 0x30(%rbp),%r9d
movl %esi,%r15d
shrl $6, %eax
movq %r8,%r13
root@x1:~#
After:
root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25
Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period]
do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
Percent 0x9900 <do_lookup_x>:
pushq %rbp
movq %rsp,%rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $0x88,%rsp
movq %rdi,-0x50(%rbp)
movl 8(%r9),%edi
movq 0x10(%rbp),%r12
movq 0x28(%rbp),%r10
movq %rdx,-0x70(%rbp)
movq %rcx,-0x58(%rbp)
movq %rdi,%r11
0.00 5.73 movq %r8,-0x68(%rbp)
movq (%r9),%r8
movl %esi,%eax
8.30 0.00 movl 0x30(%rbp),%r9d
movl %esi,%r15d
shrl $6, %eax
movq %r8,%r13
root@x1:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This is a preparation to support skipping empty events.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
annotation__pcnt_width() calculates the screen width for the
overhead (percent) area, taking event groups into account. Use this
function consistently so that we can make sure it produces similar output
in the different modes. But there's a difference between the stdio and TUI
output: stdio uses 8 columns per percent value while TUI uses 7.
Let's use 8 and adjust the print width in __annotation_line__write()
accordingly.
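As a rough sketch of the unified rule (field names here are assumptions,
not necessarily the patch):
```
static inline int annotation__pcnt_width(struct annotation *notes)
{
	/* 8 columns per percent value, once per event in the group */
	return (symbol_conf.show_total_period ? 12 : 8) * notes->src->nr_events;
}
```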
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We want to use it in different places, so make sure it is set properly
in symbol__annotate() before creating the disasm lines.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
data_nr keeps the number of entries in al->data[], so it should be used
when iterating the array. notes->src->nr_events should have
the same number, but it's more natural to use al->data_nr.
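A minimal sketch of the intended iteration (only the loop shape matters;
the wrapper function is illustrative):
```
static void walk_line_data(struct annotation_line *al)
{
	for (int i = 0; i < al->data_nr; i++) {
		struct annotation_data *data = &al->data[i];

		/* ... print or accumulate data->percent[...] ... */
	}
}
```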
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Some sort keys are meaningful only in a specific mode - like branch
stack and memory (data-src). Add the mode to skip unnecessary ones.
This will be used for 'perf mem report' later.
While at it, change the prefix for the -F/--fields option to remove
the duplicate part.
Before:
$ perf report -F
Error: switch `F' requires a value
Usage: perf report [<options>]
-F, --fields <key[,keys...]>
output field(s): overhead period sample overhead overhead_sys
overhead_us overhead_guest_sys overhead_guest_us overhead_children
sample period weight1 weight2 weight3 ins_lat retire_lat
...
After:
$ perf report -F
Error: switch `F' requires a value
Usage: perf report [<options>]
-F, --fields <key[,keys...]>
output field(s): overhead overhead_sys overhead_us
overhead_guest_sys overhead_guest_us overhead_children
sample period weight1 weight2 weight3 ins_lat retire_lat
...
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The 'struct mem_info' is created by iter_prepare_mem_entry() at the
beginning and destroyed by iter_finish_mem_entry() at the end.
So if it's used in a new hist_entry, it should be cloned.
Simplify (hopefully) the logic by adding some helper functions and by
not holding the refcount in the temporary entry.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240731235505.710436-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When perf code was compiled one way for the binary and another for the
python module, the PYTHON_PERF ifdef was used to remove some code from
the python module.
Since switching to building the perf code as a series of libraries, with
the same libraries being used for the python module, the ifdefs became
unused as PYTHON_PERF is never defined. As such remove the ifdefs.
Fixes: 9dabf4003423c8d3 ("perf python: Switch module to linking libraries from building source")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240731230005.12295-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
capstone bpf headers
There is a clash of the libbpf and capstone libraries, that ends up
with:
In file included from /usr/include/capstone/capstone.h:325,
from util/disasm.c:1513:
/usr/include/capstone/bpf.h:94:14: error: ‘bpf_insn’ defined as wrong kind of tag
94 | typedef enum bpf_insn {
So far we're just trying to avoid this by not having both headers
included in the same .c or .h file; do it one more time by moving the
BPF disassembly routines from util/disasm.c to util/disasm_bpf.c.
This is only hit when building with BUILD_NONDISTRO=1, i.e. building with
binutils-devel, which isn't in the default build due to a licensing clash.
We need to reimplement what is now isolated in util/disasm_bpf.c using
some other library to have the BPF annotation feature, which now is only
available with BUILD_NONDISTRO=1.
Fixes: 6d17edc113de1e21 ("perf annotate: Use libcapstone to disassemble")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZqpUSKPxMwaQKORr@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
As the BPF filter is shared with other processes, each invocation should
have its own counter. Add a new array map (lost_count) to
save the count using the same index as the filter. The count is cleared
before running the filter.
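A minimal sketch of the BPF side in the usual libbpf BTF-defined map style
(names, types and sizes are assumptions based on the description above):
```
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);	/* resized to the number of filter entries at load time */
	__type(key, int);
	__type(value, __u64);
} lost_count SEC(".maps");

static inline void clear_lost_count(int idx)
{
	__u64 zero = 0;

	/* each invocation starts from a clean per-index counter */
	bpf_map_update_elem(&lost_count, &idx, &zero, BPF_ANY);
}
```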
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
And use the pinned objects for unprivileged users to profile their own
tasks. The BPF objects need to be pinned in the BPF-fs by root first;
that will be handled in a later patch.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
If the target is a list of tasks, it can use a shared hash map for
filter expressions. The key of the filter map is an integer index like
in an array. A separate pid_hash map is added to get the index for the
filter map using the tgid.
System-wide mode, including per-cpu or per-user targets, is handled
by the single-entry map as before.
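A hedged sketch of the lookup path described above, using the same headers
and BTF map style as the earlier sketch (map, macro and field names are
assumptions):
```
#define MAX_FILTERS	64	/* illustrative bound, not the patch's value */

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, MAX_FILTERS);
	__type(key, int);	/* tgid */
	__type(value, int);	/* index into the per-task filter map */
} pid_hash SEC(".maps");

static inline int lookup_filter_idx(void)
{
	int tgid = bpf_get_current_pid_tgid() >> 32;
	int *idx = bpf_map_lookup_elem(&pid_hash, &tgid);

	return idx ? *idx : -1;	/* -1: no filter installed for this task */
}
```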
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This is needed to prepare for target-specific actions in a later patch.
We want to reuse the pinned BPF program and map for regular users to
profile their own processes.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
And the value is now an array. This is to support multiple filter
entries in the map later.
No functional changes intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When executing the command "perf list", I met "Error: failed to open
tracing events directory" twice: the first time because there is no
"/sys/kernel/tracing/events" directory, since the kernel tracing
infrastructure was not enabled with CONFIG_FTRACE; the second time
because of missing root privileges.
Add the error string to tell the users what happened and what to do,
and also call put_tracing_file() to free events_path a little
later to avoid messy code in the error message.
At the same time, remove the redundant "/" from the file path in
the function get_tracing_file(), otherwise it shows something like
"/sys/kernel/tracing//events".
Before:
$ ./perf list
Error: failed to open tracing events directory
After:
(1) Without CONFIG_FTRACE
$ ./perf list
Error: failed to open tracing events directory
/sys/kernel/tracing/events: No such file or directory
(2) With CONFIG_FTRACE but no root privileges
$ ./perf list
Error: failed to open tracing events directory
/sys/kernel/tracing/events: Permission denied
Committer testing:
Redirect stdout to null to quickly test the patch:
Before:
$ perf list > /dev/null
Error: failed to open tracing events directory
$
After:
$ perf list > /dev/null
Error: failed to open tracing events directory
/sys/kernel/tracing/events: Permission denied
$
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/20240730062301.23244-3-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The 'perf ftrace profile' command gets function execution profiles
using the function-graph tracer so that users can easily see the total,
average and max execution times as well as the number of invocations.
The following is a profile for the perf_event_open syscall.
$ sudo perf ftrace profile -G __x64_sys_perf_event_open -- \
perf stat -e cycles -C1 true 2> /dev/null | head
# Total (us) Avg (us) Max (us) Count Function
65.611 65.611 65.611 1 __x64_sys_perf_event_open
30.527 30.527 30.527 1 anon_inode_getfile
30.260 30.260 30.260 1 __anon_inode_getfile
29.700 29.700 29.700 1 alloc_file_pseudo
17.578 17.578 17.578 1 d_alloc_pseudo
17.382 17.382 17.382 1 __d_alloc
16.738 16.738 16.738 1 kmem_cache_alloc_lru
15.686 15.686 15.686 1 perf_event_alloc
14.012 7.006 11.264 2 obj_cgroup_charge
#
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Changbin Du <changbin.du@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/lkml/20240729004127.238611-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The 'graph-tail' option prints the function name as a comment at the closing brace.
This is useful when a large function is interleaved with other functions
(possibly from different CPUs).
For example,
$ sudo perf ftrace -- perf stat true
...
1) | get_unused_fd_flags() {
1) | alloc_fd() {
1) 0.178 us | _raw_spin_lock();
1) 0.187 us | expand_files();
1) 0.169 us | _raw_spin_unlock();
1) 1.211 us | }
1) 1.503 us | }
$ sudo perf ftrace --graph-opts tail -- perf stat true
...
1) | get_unused_fd_flags() {
1) | alloc_fd() {
1) 0.099 us | _raw_spin_lock();
1) 0.083 us | expand_files();
1) 0.081 us | _raw_spin_unlock();
1) 0.601 us | } /* alloc_fd */
1) 0.751 us | } /* get_unused_fd_flags */
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Changbin Du <changbin.du@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/lkml/20240729004127.238611-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
evsel__is_aux_event() identifies AUX area tracing selected events.
S390_CPUMSF uses a raw event type (PERF_TYPE_RAW, refer to
s390_cpumsf_evsel_is_auxtrace()), not a PMU type value that could be checked
in evsel__is_aux_event(). However it does set needs_auxtrace_mmap (refer to
auxtrace_record__init()), so check that first.
Currently, the features that use evsel__is_aux_event() are used only by
Intel PT, but that may change in the future.
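A hedged sketch of the resulting check order (field and helper names follow
the usual perf structures; treat the details as assumptions):
```
bool evsel__is_aux_event(const struct evsel *evsel)
{
	struct perf_pmu *pmu;

	if (evsel->needs_auxtrace_mmap)		/* covers S390_CPUMSF's raw event */
		return true;

	pmu = evsel__find_pmu(evsel);
	return pmu && pmu->auxtrace;
}
```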
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240715160712.127117-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Normally exception packets don't directly output a branch sample, but
if they're the last record in a buffer then they will. Because they
don't have addresses set we'll see the placeholder value
CS_ETM_INVAL_ADDR (0xdeadbeef) in the output.
Since commit 6035b6804bdf ("perf cs-etm: Support dummy address value for
CS_ETM_TRACE_ON packet") we've used 0 as an externally visible "not set"
address value. For consistency reasons and to not make exceptions look
like an error, change them to use 0 too.
This is particularly visible when doing userspace only tracing because
trace is disabled when jumping to the kernel, causing the flush and then
forcing the last exception packet to be emitted as a branch. With kernel
trace included, there is no flush so exception packets don't generate
samples until the next range packet and they'll pick up the correct
address.
Before:
$ perf record -e cs_etm//u -- stress -i 1 -t 1
$ perf script -F comm,ip,addr,flags
stress syscall ffffb7eedbc0 => deadbeefdeadbeef
stress syscall ffffb7f14a14 => deadbeefdeadbeef
stress syscall ffffb7eedbc0 => deadbeefdeadbeef
After:
stress syscall ffffb7eedbc0 => 0
stress syscall ffffb7f14a14 => 0
stress syscall ffffb7eedbc0 => 0
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: gankulkarni@os.amperecomputing.com
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240722152756.59453-2-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
instruction
Since the "ins.name" is not set while using raw instruction,
'perf annotate' with insn-stat gives wrong data:
Result from "./perf annotate --data-type --insn-stat":
Annotate Instruction stats
total 615, ok 419 (68.1%), bad 196 (31.9%)
Name : Good Bad
-----------------------------------------------------------
: 419 196
This patch sets "dl->ins.name" in arch specific function
"check_ppc_insn" while initialising "struct disasm_line".
Also update "ins_find" function to pass "struct disasm_line" as a
parameter so as to set its name field in arch specific call.
With the patch changes:
Annotate Instruction stats
total 609, ok 446 (73.2%), bad 163 (26.8%)
Name/opcode : Good Bad
-----------------------------------------------------------
58 : 323 80
32 : 49 43
34 : 33 11
OP_31_XOP_LDX : 8 20
40 : 23 0
OP_31_XOP_LWARX : 5 1
OP_31_XOP_LWZX : 2 3
OP_31_XOP_LDARX : 3 0
33 : 0 2
OP_31_XOP_LBZX : 0 1
OP_31_XOP_LWAX : 0 1
OP_31_XOP_LHZX : 0 1
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-16-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf now uses the capstone library to disassemble instructions on x86.
capstone is used (if available) to speed up 'perf annotate'.
Currently it only supports the x86 architecture.
This patch includes changes to enable this on powerpc.
For now this method is used only for the data type sort keys, and only the
binary code (raw instruction) is read. This is because the powerpc approach
to understanding instructions and register fields uses the raw instruction.
"cs_disasm" is currently not enabled: while attempting to use
cs_disasm, the observation was that some instructions were not
identified (ex: extswsli, maddld) and it had to fall back to objdump.
Hence enabling "cs_disasm" is noted in a comment as a TODO for
powerpc.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-15-atrajeev@linux.vnet.ibm.com
[ Use dso__nsinfo(dso) as required to match EXTRA_CFLAGS=-DREFCNT_CHECKING=1 build expectations ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
capstone_init is made available for all archs to use and updated to
enable support for CS_ARCH_PPC as well. The patch removes
open_capstone_handle and uses capstone_init everywhere.
Committer notes:
Avoid including capstone/capstone.h from print_insn.h to not break the
build in builtin-script.c due to the namespace clash with libbpf:
/usr/include/capstone/bpf.h:94:14: error: 'bpf_insn' defined as wrong kind of tag
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-14-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
symbol disassemble
symbol__disassemble_capstone in util/disasm.c calls
open_capstone_handle to open/init capstone.
We already have a capstone_init function in "util/print_insn.c", but it
is defined as a static function there.
Change this and also declare the function in print_insn.h.
open_capstone_handle checks the disassembler_style option from
annotation_options to decide whether to set CS_OPT_SYNTAX_ATT.
Add that logic to capstone_init as well, and set it to true by default.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-13-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add instruction tracking function "update_insn_state_powerpc" for
powerpc. Example sequence in powerpc:
ld r10,264(r3)
mr r31,r3
<<after some sequence>
ld r9,312(r31)
Consider the sample is pointing to: "ld r9,312(r31)".
Here the memory reference is hit at "312(r31)" where 312 is the offset
and r31 is the source register.
Previous instruction sequence shows that register state of r3 is moved
to r31.
So to identify the data type for r31 access, the previous instruction
("mr") needs to be tracked and the state type entry has to be updated.
Current instruction tracking support in perf tools infrastructure is
specific to x86. Patch adds this support for powerpc as well.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-12-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
instruction tracking in powerpc
Data-type profiling has the concept of instruction tracking.
Example sequence in powerpc:
ld r10,264(r3)
mr r31,r3
<<after some sequence>
ld r9,312(r31)
or differently
lwz r10,264(r3)
add r31, r3, RB
lwz r9, 0(r31)
If a sample hits at "lwz r9, 0(r31)", the data type of the r31 access
depends on the previous instruction sequence here. So to track the previous
instructions, the patch adds changes to identify some of the arithmetic
instructions that have opcode 31.
Since memory instructions also have cases with opcode 31, use bits
22:30 to filter out the arithmetic instructions here.
There are also instructions with just two operands, like "addme" and "addze".
This patch adds a new instruction ops "arithmetic_ops" to handle these.
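As a hedged sketch of the filtering idea (macro names and the chosen
sub-opcode values are illustrative; the real list lives in the arch-specific
tables):
```
#define PPC_OP(insn)	(((insn) >> 26) & 0x3f)	/* primary opcode, bits 0-5 */
#define PPC_22_30(insn)	(((insn) >> 1) & 0x1ff)	/* XO field, bits 22-30 */

static bool is_arithmetic_op31(unsigned int insn)
{
	if (PPC_OP(insn) != 31)
		return false;

	switch (PPC_22_30(insn)) {
	case 266:	/* add   */
	case 40:	/* subf  */
	case 234:	/* addme */
		return true;	/* map these to arithmetic_ops */
	default:
		return false;
	}
}
```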
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-10-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
powerpc
There are memory instructions in powerpc with opcode 31.
Example: "ldx RT,RA,RB". Its X form is as below:
______________________________________
| 31 | RT | RA | RB | 21 |/|
--------------------------------------
0 6 11 16 21 30 31
The opcode for "ldx" is 31. There are other instructions also with
opcode 31 which are memory insn like ldux, stbx, lwzx, lhaux
But all instructions with opcode 31 are not memory. Example is add
instruction: "add RT,RA,RB"
The value in bit 21-30 [ 21 for ldx ] is different for these
instructions. Patch uses this value to assign instruction ops for these
cases. The naming convention and value to identify these are picked from
defines in "arch/powerpc/include/asm/ppc-opcode.h"
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-9-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Use the raw instruction code and macros to identify memory instructions and
extract the register fields and the offset.
The implementation addresses the D-form, X-form and DS-form instructions.
Two main functions are added.
A new parse function "load_store__parse" acts as the instruction ops parser
for memory instructions.
Unlike other parsers (like mov__parse), this one fills in the
"multi_regs" field for source/target and the newly added "mem_ref" field. No
other fields are set because there is no need to parse the
disassembled code here; arch-specific macros will take care of extracting
the offset and regs, which is easier and precise.
In powerpc, all instructions with a primary opcode from 32 to 63
are memory instructions. Update the "ins__find" function to also take
"raw_insn" as a parameter.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-8-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
instruction on powerpc
Use the raw instruction code and macros to identify memory instructions and
extract the register fields and the offset.
The implementation addresses the D-form, X-form and DS-form instructions.
Add a "mem_ref" field to check whether the source/target has a memory
reference.
Add a function "get_powerpc_regs" which will set these fields: reg1, reg2
and offset, depending on whether it is the source or target ops.
Update the "parse" callback of "struct ins_ops" to also pass "struct
disasm_line" as an argument. This is needed in parse functions where the
opcode is used to determine whether to set multi_regs and other fields.
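A hedged sketch of extracting the fields from a D-form load such as
"lwz r10,0(r9)" (the macro and function names are illustrative, not the
patch's):
```
#define PPC_RT(insn)	(((insn) >> 21) & 0x1f)	/* target register, bits 6-10  */
#define PPC_RA(insn)	(((insn) >> 16) & 0x1f)	/* base register,   bits 11-15 */
#define PPC_D(insn)	((insn) & 0xffff)	/* 16-bit displacement         */

static void get_dform_regs(unsigned int insn, int *reg1, int *reg2, int *offset)
{
	*reg1 = PPC_RA(insn);		/* the register holding the memory reference */
	*reg2 = PPC_RT(insn);
	*offset = (short)PPC_D(insn);	/* sign-extend the displacement */
}
```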
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-7-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
using dso__data_read_offset utility
Add support to capture and parse raw instructions on powerpc.
Currently, the perf tool infrastructure uses two ways to disassemble
and understand an instruction: one is objdump and the other option is
via libcapstone.
The perf tool infrastructure uses the "--no-show-raw-insn" option
with "objdump" while disassembling. An example from powerpc with this option
for an instruction address is this snippet from:
objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux>
c0000000010224b4: lwz r10,0(r9)
This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset. Also to find whether there is a memory
reference in the operands, "memory_ref_char" field of objdump is used.
For x86, "(" is used as memory_ref_char to tackle instructions of the
form "mov (%rax), %rcx".
In case of powerpc, not all instructions using "(" are the only memory
instructions. Example, above instruction can also be of extended form (X
form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
and extract the source/target registers, patch adds support to use raw
instruction for powerpc. Approach used is to read the raw instruction
directly from the DSO file using "dso__data_read_offset" utility which
is already implemented in perf infrastructure in "util/dso.c".
Example:
38 01 81 e8 ld r4,312(r1)
Here "38 01 81 e8" is the raw instruction representation. In powerpc,
this translates to instruction form: "ld RT,DS(RA)" and binary code
as:
| 58 | RT | RA | DS | |
-------------------------------------
0 6 11 16 30 31
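Read straight out of the DSO, the mechanics boil down to something like the
following sketch (dso__data_read_offset() is the existing utility; the
wrapper and the little-endian byte assembly are illustrative):
```
static int read_raw_insn(struct dso *dso, struct machine *machine,
			 u64 offset, u32 *raw_insn)
{
	u8 buf[4];

	if (dso__data_read_offset(dso, machine, offset, buf, sizeof(buf)) != sizeof(buf))
		return -1;

	/* "38 01 81 e8" assembles to the 32-bit word 0xe8810138 */
	*raw_insn = buf[0] | buf[1] << 8 | buf[2] << 16 | (u32)buf[3] << 24;
	return 0;
}
```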
Function "symbol__disassemble_dso" is updated to read raw instruction
directly from DSO using dso__data_read_offset utility. In case of
above example, this captures:
line: 38 01 81 e8
The above works well when 'perf report' is invoked with only sort keys
for data type ie type and typeoff.
Because there is no instruction level annotation needed if only data
type information is requested for.
For annotating sample, along with type and typeoff sort key, "sym" sort
key is also needed. And by default invoking just "perf report" uses sort
key "sym" that displays the symbol information.
With approach changes in powerpc which first reads DSO for raw
instruction, "perf annotate" and "perf report" + a key breaks since
it doesn't do the instruction level disassembly.
Snippet of result from 'perf report':
Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238
do_work /usr/bin/pmlogger [Percent: local period]
Percent│ ea230010
│ 3a550010
│ 3a600000
│ 38f60001
│ 39490008
│ 42400438
51.44 │ 81290008
│ 7d485378
Here, raw instruction is displayed in the output instead of human
readable annotated form.
One way to get the appropriate data is to specify "--objdump path", by
which code annotation will be done. But that would change the default
behaviour. To fix this breakage, check if the "sym" sort key is set; if so,
fall back and use the libcapstone/objdump way of disassembling the sample.
With the changes, "perf report" shows:
Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238
do_work /usr/bin/pmlogger [Percent: local period]
Percent│ ld r17,16(r3)
│ addi r18,r21,16
│ li r19,0
│ 8b0: rldicl r10,r10,63,33
│ addi r10,r10,1
│ mtctr r10
│ ↓ b 8e4
│ 8c0: addi r7,r22,1
│ addi r10,r9,8
│ ↓ bdz d00
51.44 │ lwz r9,8(r9)
│ mr r8,r10
│ cmpw r20,r9
Committer notes:
Just add the extern for 'sort_order' in disasm.c so that we don't end up
breaking the build due to this type collision with capstone and libbpf:
In file included from /usr/include/capstone/capstone.h:325,
from /git/perf-6.10.0/tools/perf/util/print_insn.h:23,
from builtin-script.c:38:
/usr/include/capstone/bpf.h:94:14: error: 'bpf_insn' defined as wrong kind of tag
94 | typedef enum bpf_insn {
I reported this to the bpf mailing list, see one of the links below.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-6-atrajeev@linux.vnet.ibm.com
Link: https://lore.kernel.org/bpf/ZqOltPk9VQGgJZAA@x1/T/#u
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently, the perf tool infrastructure uses the disasm_line__parse
function to parse a disassembled line.
Example snippet from objdump:
objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux>
c0000000010224b4: lwz r10,0(r9)
This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset.
In powerpc, the approach for data type profiling uses raw instruction
instead of result from objdump to identify the instruction category and
extract the source/target registers.
Example: 38 01 81 e8 ld r4,312(r1)
Here "38 01 81 e8" is the raw instruction representation. Add function
"disasm_line__parse_powerpc" to handle parsing of raw instruction.
Also update "struct disasm_line" to save the binary code/
With the change, function captures:
line -> "38 01 81 e8 ld r4,312(r1)"
raw instruction "38 01 81 e8"
The raw instruction is used later to extract the reg/offset fields. Macros
are added to extract the opcode and register fields. "struct disasm_line"
is updated to carry a union of "bytes" and a 32-bit "raw_insn" holding the
raw code (raw).
The function "disasm_line__parse_powerpc" fills in the raw instruction hex
value and can use the macros to get the opcode. There are no changes in the
existing code paths which parse the disassembled code. The size of the raw
instruction depends on the architecture.
In the case of powerpc, parsing the disasm line needs to handle both reading
the binary code directly from the DSO and parsing the objdump result. Hence
the logic is added in a separate function instead of updating
"disasm_line__parse". Architectures using the instruction name and the
present approach are not altered. Since this approach targets powerpc, the
macro implementation is added only for powerpc as of now.
Since disasm_line__parse is used in other cases (perf annotate) and not only
for data type profiling, the powerpc callback includes changes to work with
the binary code as well as the mnemonic representation.
Also, in case the DSO read fails and libcapstone is not supported, the
approach falls back to using objdump. Hence the patch has changes to ensure
the objdump option also works well.
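A hedged sketch of the layout the commit describes (the member order and the
flexible-array constraint follow the existing struct; exact details are
assumptions):
```
struct disasm_line {
	struct ins		ins;
	struct ins_operands	ops;
	union {
		u8	bytes[4];
		u32	raw_insn;
	} raw;
	/* must stay last: annotation_line ends in a flexible array */
	struct annotation_line	al;
};
```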
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-5-atrajeev@linux.vnet.ibm.com
[ Add check for strndup() result ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
TYPE_STATE_MAX_REGS is arch-dependent. Currently it is defined to be 16.
When checking whether a reg is valid in has_reg_type(), the maximum value
is checked against TYPE_STATE_MAX_REGS.
Define it conditionally for powerpc.
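A minimal sketch of the idea; the 32-register value for powerpc, the guard
macro and the struct layout are assumptions for illustration, not the
literal patch:
```
#include <stdbool.h>

/* Sketch: pick a per-arch size for the tracked-register array. */
#ifdef __powerpc__
#define TYPE_STATE_MAX_REGS  32
#else
#define TYPE_STATE_MAX_REGS  16
#endif

struct type_state_reg { int type; };   /* placeholder for the real contents */

struct type_state {
	struct type_state_reg regs[TYPE_STATE_MAX_REGS];
};

static inline bool has_reg_type(struct type_state *state, int reg)
{
	(void)state;
	/* reject register numbers outside the tracked range */
	return (unsigned int)reg < TYPE_STATE_MAX_REGS;
}
```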
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-4-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
specific instruction tracking
Add an "update_insn_state" callback to "struct arch" to handle instruction
tracking. Currently updating the instruction state is handled by the static
function "update_insn_state_x86", which is defined in "annotate-data.c".
Make this a callback for the specific arch and move it to the arch-specific
file "arch/x86/annotate/instructions.c". This will help to add helper
functions for other platforms in the file
"arch/<platform>/annotate/instructions.c" and make changes/updates
easier.
Define the "update_insn_state" callback as part of "struct arch", and also
make some of the debug functions non-static so that they can be referenced
from other places.
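A minimal sketch of the callback wiring described above; the signature and
registration are approximations for illustration, not necessarily the exact
ones from the patch:
```
/* Forward declarations so the sketch stands alone; real definitions live
 * elsewhere in the perf sources. */
struct type_state;
struct data_loc_info;
struct disasm_line;
typedef struct Dwarf_Die Dwarf_Die;

struct arch {
	const char *name;
	/* arch hook: update the register type state for one disassembled insn */
	void (*update_insn_state)(struct type_state *state,
				  struct data_loc_info *dloc,
				  Dwarf_Die *cu_die,
				  struct disasm_line *dl);
	/* ... other members elided ... */
};

/* arch/x86/annotate/instructions.c: the previously static helper moves here */
static void update_insn_state_x86(struct type_state *state,
				  struct data_loc_info *dloc,
				  Dwarf_Die *cu_die,
				  struct disasm_line *dl)
{
	(void)state; (void)dloc; (void)cu_die; (void)dl;
	/* existing x86 tracking logic, unchanged, lives here */
}

/* illustrative registration; perf wires this up through its arch table */
static struct arch arch_x86_sketch = {
	.name              = "x86",
	.update_insn_state = update_insn_state_x86,
};
```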
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Data type profiling uses instruction tracking by checking each
instruction and updating the register type state in some data
structures.
This is useful to find the data type in cases where the register state
gets transferred from one reg to another.
For example, the "mov" instruction in x86 and the "mr" instruction in
powerpc.
Currently these structures are defined in annotate-data.c and
instruction tracking is implemented only for x86.
Move these data structures to the "annotate-data.h" header file so that
other arch implementations can use them in arch-specific files as well.
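A toy sketch (not the patch itself) of how a register-to-register move
propagates the tracked type state; the structure contents are placeholders:
```
#include <stdbool.h>

struct type_state_reg {
	int  type_die_off;  /* stand-in for the DWARF DIE describing the type */
	bool ok;            /* whether this register currently has a known type */
};

struct type_state {
	struct type_state_reg regs[32];
};

/* After "mov %rax, %rbx" (x86) or "mr r4, r3" (powerpc) the destination
 * register holds the same value, so it inherits the source's type state. */
static void transfer_reg_type(struct type_state *state, int src, int dst)
{
	state->regs[dst] = state->regs[src];
}
```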
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
With 0dd5041c9a0e ("perf addr_location: Add init/exit/copy functions"),
when cpumode is 3 (macro PERF_RECORD_MISC_HYPERVISOR),
thread__find_map() could return with al->maps being NULL.
The path below could add a callchain_cursor_node with NULL ms.maps.
add_callchain_ip()
thread__find_symbol(.., &al)
thread__find_map(.., &al) // al->maps becomes NULL
ms.maps = maps__get(al.maps)
callchain_cursor_append(..., &ms, ...)
node->ms.maps = maps__get(ms->maps)
Then the path below would dereference the NULL maps and segfault.
fill_callchain_info()
maps__machine(node->ms.maps);
Fix it by checking if maps is NULL in fill_callchain_info().
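A minimal sketch of the guard this describes; fill_callchain_info() lives in
tools/perf/util/callchain.c and its real signature and body differ:
```
#include <stddef.h>

struct maps;     /* opaque here */
struct machine;
struct map_symbol           { struct maps *maps; };
struct callchain_cursor_node { struct map_symbol ms; };

/* assumed to exist in perf; declared here only so the sketch compiles */
struct machine *maps__machine(struct maps *maps);

static int fill_callchain_info_sketch(struct callchain_cursor_node *node)
{
	if (node->ms.maps == NULL)
		return 0;  /* PERF_RECORD_MISC_HYPERVISOR samples may have no maps */

	struct machine *machine = maps__machine(node->ms.maps);
	(void)machine;     /* ... real code resolves map/symbol info here ... */
	return 0;
}
```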
Fixes: 0dd5041c9a0e ("perf addr_location: Add init/exit/copy functions")
Signed-off-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: yzhong@purestorage.com
Link: https://lore.kernel.org/r/20240722211548.61455-1-cachen@purestorage.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Now that symsrc_filename is always accessed through an accessor, we also
need a free() function for it to avoid the following compilation error:
util/unwind-libunwind-local.c:416:12: error: lvalue required as unary
‘&’ operand
416 | zfree(&dso__symsrc_filename(dso));
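A minimal sketch of the kind of helper this implies; the member name and the
free helper's name are illustrative, not necessarily the ones the patch adds:
```
#include <stdlib.h>

struct dso {
	char *symsrc_filename;
	/* ... */
};

/* Accessor: callers read the field only through this, so they cannot take
 * its address (which is what broke zfree(&dso__symsrc_filename(dso))). */
static inline const char *dso__symsrc_filename(const struct dso *dso)
{
	return dso->symsrc_filename;
}

/* Dedicated free helper: touches the real field directly, so there is no
 * "lvalue required" problem at the call sites. */
static inline void dso__free_symsrc_filename(struct dso *dso)
{
	free(dso->symsrc_filename);
	dso->symsrc_filename = NULL;
}
```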
Fixes: 1553419c3c10 ("perf dso: Fix address sanitizer build")
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240715094715.3914813-1-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This is a bug found when implementing pretty-printing for the
landlock_add_rule system call. I decided to send this patch separately
because it is a serious bug that should be fixed fast.
I wrote a test program that calls the landlock_add_rule syscall in a loop,
yet perf trace -e landlock_add_rule freezes, giving no output.
This bug is caused by a misunderstanding of the variable "key"
below:
```
for (key = 0; key < trace->sctbl->syscalls.nr_entries; ++key) {
struct syscall *sc = trace__syscall_info(trace, NULL, key);
...
}
```
The code above seems right at first, but when looking at
syscalltbl.c, I found these lines:
```
for (i = 0; i <= syscalltbl_native_max_id; ++i)
if (syscalltbl_native[i])
++nr_entries;
entries = tbl->syscalls.entries = malloc(sizeof(struct syscall) * nr_entries);
...
for (i = 0, j = 0; i <= syscalltbl_native_max_id; ++i) {
if (syscalltbl_native[i]) {
entries[j].name = syscalltbl_native[i];
entries[j].id = i;
++j;
}
}
```
meaning the key is merely an index for traversing the syscall table,
not the actual syscall id of that particular syscall.
So if one uses key to do trace__syscall_info(trace, NULL, key), the
traversal only goes up to trace->sctbl->syscalls.nr_entries (373 on my
X86_64 machine, for example) and ends up neglecting all the remaining
syscalls; in my case, everything after `rseq`, because the traversal stops
at 373 and `rseq` is the last syscall whose id is lower than 373, as can be
seen in tools/perf/arch/x86/include/generated/asm/syscalls_64.c:
```
...
[334] = "rseq",
[424] = "pidfd_send_signal",
...
```
The reason why the key is scrambled yet perf trace still works well is
that key is used in trace__syscall_info(trace, NULL, key) to do
trace->syscalls.table[id]; this makes sure that the struct syscall
returned actually has an id with the same value as key, making the later
bpf_prog matching all correct.
After fixing this bug, I can do perf trace on 38 more syscalls, and
because more syscalls are visible, we get 8 more syscalls that can be
augmented.
before:
perf $ perf trace -vv --max-events=1 |& grep Reusing
Reusing "open" BPF sys_enter augmenter for "stat"
Reusing "open" BPF sys_enter augmenter for "lstat"
Reusing "open" BPF sys_enter augmenter for "access"
Reusing "connect" BPF sys_enter augmenter for "accept"
Reusing "sendto" BPF sys_enter augmenter for "recvfrom"
Reusing "connect" BPF sys_enter augmenter for "bind"
Reusing "connect" BPF sys_enter augmenter for "getsockname"
Reusing "connect" BPF sys_enter augmenter for "getpeername"
Reusing "open" BPF sys_enter augmenter for "execve"
Reusing "open" BPF sys_enter augmenter for "truncate"
Reusing "open" BPF sys_enter augmenter for "chdir"
Reusing "open" BPF sys_enter augmenter for "mkdir"
Reusing "open" BPF sys_enter augmenter for "rmdir"
Reusing "open" BPF sys_enter augmenter for "creat"
Reusing "open" BPF sys_enter augmenter for "link"
Reusing "open" BPF sys_enter augmenter for "unlink"
Reusing "open" BPF sys_enter augmenter for "symlink"
Reusing "open" BPF sys_enter augmenter for "readlink"
Reusing "open" BPF sys_enter augmenter for "chmod"
Reusing "open" BPF sys_enter augmenter for "chown"
Reusing "open" BPF sys_enter augmenter for "lchown"
Reusing "open" BPF sys_enter augmenter for "mknod"
Reusing "open" BPF sys_enter augmenter for "statfs"
Reusing "open" BPF sys_enter augmenter for "pivot_root"
Reusing "open" BPF sys_enter augmenter for "chroot"
Reusing "open" BPF sys_enter augmenter for "acct"
Reusing "open" BPF sys_enter augmenter for "swapon"
Reusing "open" BPF sys_enter augmenter for "swapoff"
Reusing "open" BPF sys_enter augmenter for "delete_module"
Reusing "open" BPF sys_enter augmenter for "setxattr"
Reusing "open" BPF sys_enter augmenter for "lsetxattr"
Reusing "openat" BPF sys_enter augmenter for "fsetxattr"
Reusing "open" BPF sys_enter augmenter for "getxattr"
Reusing "open" BPF sys_enter augmenter for "lgetxattr"
Reusing "openat" BPF sys_enter augmenter for "fgetxattr"
Reusing "open" BPF sys_enter augmenter for "listxattr"
Reusing "open" BPF sys_enter augmenter for "llistxattr"
Reusing "open" BPF sys_enter augmenter for "removexattr"
Reusing "open" BPF sys_enter augmenter for "lremovexattr"
Reusing "fsetxattr" BPF sys_enter augmenter for "fremovexattr"
Reusing "open" BPF sys_enter augmenter for "mq_open"
Reusing "open" BPF sys_enter augmenter for "mq_unlink"
Reusing "fsetxattr" BPF sys_enter augmenter for "add_key"
Reusing "fremovexattr" BPF sys_enter augmenter for "request_key"
Reusing "fremovexattr" BPF sys_enter augmenter for "inotify_add_watch"
Reusing "fremovexattr" BPF sys_enter augmenter for "mkdirat"
Reusing "fremovexattr" BPF sys_enter augmenter for "mknodat"
Reusing "fremovexattr" BPF sys_enter augmenter for "fchownat"
Reusing "fremovexattr" BPF sys_enter augmenter for "futimesat"
Reusing "fremovexattr" BPF sys_enter augmenter for "newfstatat"
Reusing "fremovexattr" BPF sys_enter augmenter for "unlinkat"
Reusing "fremovexattr" BPF sys_enter augmenter for "linkat"
Reusing "open" BPF sys_enter augmenter for "symlinkat"
Reusing "fremovexattr" BPF sys_enter augmenter for "readlinkat"
Reusing "fremovexattr" BPF sys_enter augmenter for "fchmodat"
Reusing "fremovexattr" BPF sys_enter augmenter for "faccessat"
Reusing "fremovexattr" BPF sys_enter augmenter for "utimensat"
Reusing "connect" BPF sys_enter augmenter for "accept4"
Reusing "fremovexattr" BPF sys_enter augmenter for "name_to_handle_at"
Reusing "fremovexattr" BPF sys_enter augmenter for "renameat2"
Reusing "open" BPF sys_enter augmenter for "memfd_create"
Reusing "fremovexattr" BPF sys_enter augmenter for "execveat"
Reusing "fremovexattr" BPF sys_enter augmenter for "statx"
after:
perf $ perf trace -vv --max-events=1 |& grep Reusing
Reusing "open" BPF sys_enter augmenter for "stat"
Reusing "open" BPF sys_enter augmenter for "lstat"
Reusing "open" BPF sys_enter augmenter for "access"
Reusing "connect" BPF sys_enter augmenter for "accept"
Reusing "sendto" BPF sys_enter augmenter for "recvfrom"
Reusing "connect" BPF sys_enter augmenter for "bind"
Reusing "connect" BPF sys_enter augmenter for "getsockname"
Reusing "connect" BPF sys_enter augmenter for "getpeername"
Reusing "open" BPF sys_enter augmenter for "execve"
Reusing "open" BPF sys_enter augmenter for "truncate"
Reusing "open" BPF sys_enter augmenter for "chdir"
Reusing "open" BPF sys_enter augmenter for "mkdir"
Reusing "open" BPF sys_enter augmenter for "rmdir"
Reusing "open" BPF sys_enter augmenter for "creat"
Reusing "open" BPF sys_enter augmenter for "link"
Reusing "open" BPF sys_enter augmenter for "unlink"
Reusing "open" BPF sys_enter augmenter for "symlink"
Reusing "open" BPF sys_enter augmenter for "readlink"
Reusing "open" BPF sys_enter augmenter for "chmod"
Reusing "open" BPF sys_enter augmenter for "chown"
Reusing "open" BPF sys_enter augmenter for "lchown"
Reusing "open" BPF sys_enter augmenter for "mknod"
Reusing "open" BPF sys_enter augmenter for "statfs"
Reusing "open" BPF sys_enter augmenter for "pivot_root"
Reusing "open" BPF sys_enter augmenter for "chroot"
Reusing "open" BPF sys_enter augmenter for "acct"
Reusing "open" BPF sys_enter augmenter for "swapon"
Reusing "open" BPF sys_enter augmenter for "swapoff"
Reusing "open" BPF sys_enter augmenter for "delete_module"
Reusing "open" BPF sys_enter augmenter for "setxattr"
Reusing "open" BPF sys_enter augmenter for "lsetxattr"
Reusing "openat" BPF sys_enter augmenter for "fsetxattr"
Reusing "open" BPF sys_enter augmenter for "getxattr"
Reusing "open" BPF sys_enter augmenter for "lgetxattr"
Reusing "openat" BPF sys_enter augmenter for "fgetxattr"
Reusing "open" BPF sys_enter augmenter for "listxattr"
Reusing "open" BPF sys_enter augmenter for "llistxattr"
Reusing "open" BPF sys_enter augmenter for "removexattr"
Reusing "open" BPF sys_enter augmenter for "lremovexattr"
Reusing "fsetxattr" BPF sys_enter augmenter for "fremovexattr"
Reusing "open" BPF sys_enter augmenter for "mq_open"
Reusing "open" BPF sys_enter augmenter for "mq_unlink"
Reusing "fsetxattr" BPF sys_enter augmenter for "add_key"
Reusing "fremovexattr" BPF sys_enter augmenter for "request_key"
Reusing "fremovexattr" BPF sys_enter augmenter for "inotify_add_watch"
Reusing "fremovexattr" BPF sys_enter augmenter for "mkdirat"
Reusing "fremovexattr" BPF sys_enter augmenter for "mknodat"
Reusing "fremovexattr" BPF sys_enter augmenter for "fchownat"
Reusing "fremovexattr" BPF sys_enter augmenter for "futimesat"
Reusing "fremovexattr" BPF sys_enter augmenter for "newfstatat"
Reusing "fremovexattr" BPF sys_enter augmenter for "unlinkat"
Reusing "fremovexattr" BPF sys_enter augmenter for "linkat"
Reusing "open" BPF sys_enter augmenter for "symlinkat"
Reusing "fremovexattr" BPF sys_enter augmenter for "readlinkat"
Reusing "fremovexattr" BPF sys_enter augmenter for "fchmodat"
Reusing "fremovexattr" BPF sys_enter augmenter for "faccessat"
Reusing "fremovexattr" BPF sys_enter augmenter for "utimensat"
Reusing "connect" BPF sys_enter augmenter for "accept4"
Reusing "fremovexattr" BPF sys_enter augmenter for "name_to_handle_at"
Reusing "fremovexattr" BPF sys_enter augmenter for "renameat2"
Reusing "open" BPF sys_enter augmenter for "memfd_create"
Reusing "fremovexattr" BPF sys_enter augmenter for "execveat"
Reusing "fremovexattr" BPF sys_enter augmenter for "statx"
TL;DR:
These are the new syscalls that can be augmented:
Reusing "openat" BPF sys_enter augmenter for "open_tree"
Reusing "openat" BPF sys_enter augmenter for "openat2"
Reusing "openat" BPF sys_enter augmenter for "mount_setattr"
Reusing "openat" BPF sys_enter augmenter for "move_mount"
Reusing "open" BPF sys_enter augmenter for "fsopen"
Reusing "openat" BPF sys_enter augmenter for "fspick"
Reusing "openat" BPF sys_enter augmenter for "faccessat2"
Reusing "openat" BPF sys_enter augmenter for "fchmodat2"
as for the perf trace output:
before:
perf $ perf trace -e faccessat2 --max-events=1
[no output]
after:
perf $ ./perf trace -e faccessat2 --max-events=1
0.000 ( 0.037 ms): waybar/958 faccessat2(dfd: 40, filename: "uevent") = 0
P.S. The reason why this bug was not found in the past five years is
probably that it only happens to the newer syscalls whose ids are
greater, for instance faccessat2, with id 439, which not many people
care about when using perf trace.
[Arnaldo]: notes
That and the fact that the BPF code was hidden before having to use -e,
that got changed kinda recently when we switched to using BPF skels for
augmenting syscalls in 'perf trace':
⬢[acme@toolbox perf-tools-next]$ git log --oneline tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
a9f4c6c999008c92 perf trace: Collect sys_nanosleep first argument
29d16de26df17e94 perf augmented_raw_syscalls.bpf: Move 'struct timespec64' to vmlinux.h
5069211e2f0b47e7 perf trace: Use the right bpf_probe_read(_str) variant for reading user data
33b725ce7b988756 perf trace: Avoid compile error wrt redefining bool
7d9642311b6d9d31 perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(augmented_arg->value) is a power of two.
262b54b6c9396823 perf bpf augmented_raw_syscalls: Add an assert to make sure sizeof(saddr) is a power of two.
1836480429d173c0 perf bpf_skel augmented_raw_syscalls: Cap the socklen parameter using &= sizeof(saddr)
cd2cece61ac5f900 perf trace: Tidy comments related to BPF + syscall augmentation
5e6da6be3082f77b perf trace: Migrate BPF augmentation to use a skeleton
⬢[acme@toolbox perf-tools-next]$
⬢[acme@toolbox perf-tools-next]$ git show --oneline --pretty=reference 5e6da6be3082f77b | head -1
5e6da6be3082f77b (perf trace: Migrate BPF augmentation to use a skeleton, 2023-08-10)
⬢[acme@toolbox perf-tools-next]$
I.e. from August, 2023.
One also had to ask for BUILD_BPF_SKEL=1, which is now the default if
everything it needs is available on the system.
I simplified the code to not expose the 'struct syscall' outside of
tools/perf/util/syscalltbl.c, instead providing a function to go from
the index to the syscall id:
int syscalltbl__id_at_idx(struct syscalltbl *tbl, int idx);
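A self-contained toy model of the index-versus-id confusion and of the fix
the helper above enables; the table and names here are illustrative, not the
real perf code:
```
#include <stdio.h>

/* Sparse "id -> name" table like syscalltbl_native[], compacted into a dense
 * array of entries.  Iterating with the dense index and treating it as an id
 * silently drops every syscall whose real id is >= nr_entries. */
static const char *native[512] = {
	[0]   = "read",
	[334] = "rseq",
	[439] = "faccessat2",   /* high id: lost if index is confused with id */
};

struct entry { const char *name; int id; };

int main(void)
{
	struct entry entries[16];
	int nr_entries = 0;

	for (int i = 0; i < 512; i++) {
		if (native[i]) {
			entries[nr_entries].name = native[i];
			entries[nr_entries].id   = i;
			nr_entries++;
		}
	}

	for (int idx = 0; idx < nr_entries; idx++) {
		/* The fix: translate the dense index back to the real id, the
		 * way syscalltbl__id_at_idx() does, instead of using idx. */
		printf("idx=%d -> id=%d (%s)\n",
		       idx, entries[idx].id, entries[idx].name);
	}
	return 0;
}
```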
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/lkml/ZmhlAxbVcAKoPTg8@x1
Link: https://lore.kernel.org/r/20240705132059.853205-2-howardchu95@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Various files had been missed when accessor functions were added for
the sake of dso reference count checking. Add the accessor function
calls and the missing dso accessor functions.
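For illustration, the shape of such an accessor pair; the field and function
names here are examples, not necessarily the ones this patch adds:
```
#include <stdint.h>

typedef uint64_t u64;

struct dso { u64 text_offset; /* ... */ };

/* Call sites use these helpers instead of touching dso members directly, so
 * the reference-count checking machinery can wrap the underlying object. */
static inline u64 dso__text_offset(const struct dso *dso)
{
	return dso->text_offset;
}

static inline void dso__set_text_offset(struct dso *dso, u64 val)
{
	dso->text_offset = val;
}
```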
Fixes: ee756ef7491e ("perf dso: Add reference count checking and accessor functions")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240704011745.1021288-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It is possible that memory events are not supported on all CPUs.
Print a warning by dumping the enabled CPU map in this case.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20240706152035.86983-3-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
dsos__add would append to the end of the dso array, possibly requiring a
later find to re-sort the array. Patterns of find-then-add were
becoming O(n*log n) due to the sorts. Change the add routine to be
O(n) rather than O(1), but maintain the sortedness of the dsos
array so that later finds don't need the O(n*log n) sort.
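A generic sketch of an in-order insert that keeps a sorted pointer array
sorted; the names and comparison are illustrative, not the exact perf code:
```
#include <string.h>

/* Find the insertion point, then shift the tail by one slot.  O(n) per add,
 * but it avoids re-sorting (O(n log n)) before every later binary-search
 * find. */
static void sorted_insert(void **arr, unsigned int *nr, void *item,
			  int (*cmp)(const void *, const void *))
{
	unsigned int pos = 0;

	while (pos < *nr && cmp(arr[pos], item) < 0)
		pos++;

	memmove(&arr[pos + 1], &arr[pos], (*nr - pos) * sizeof(*arr));
	arr[pos] = item;
	(*nr)++;
}
```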
Fixes: 3f4ac23a9908 ("perf dsos: Switch backing storage to array from rbtree/list")
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Steinar Gunderson <sesse@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Matt Fleming <matt@readmodwrite.com>
Link: https://lore.kernel.org/r/20240703172117.810918-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The array is sorted, so just move the elements and insert in order.
Fixes: 13ca628716c6 ("perf comm: Add reference count checking to 'struct comm_str'")
Reported-by: Matt Fleming <matt@readmodwrite.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Matt Fleming <matt@readmodwrite.com>
Cc: Steinar Gunderson <sesse@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240703172117.810918-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
./tools/perf/util/pmu.c:1776:49-50: Unneeded semicolon
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9443
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20240628053049.44521-1-yang.lee@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It didn't use the passed field separator (set via the -x option) when
printing the metric headers and always put "," between the fields.
Before:
$ sudo ./perf stat -a -x : --per-core -M tma_core_bound --metric-only true
core,cpus,% tma_core_bound: <<<--- here: "core,cpus," but ":" expected
S0-D0-C0:2:10.5:
S0-D0-C1:2:14.8:
S0-D0-C2:2:9.9:
S0-D0-C3:2:13.2:
After:
$ sudo ./perf stat -a -x : --per-core -M tma_core_bound --metric-only true
core:cpus:% tma_core_bound:
S0-D0-C0:2:10.5:
S0-D0-C1:2:15.0:
S0-D0-C2:2:16.5:
S0-D0-C3:2:12.5:
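A toy illustration of the fix: honor the configured separator when printing
the metric-only header instead of hard-coding ",". The real code lives in
tools/perf/util/stat-display.c and differs in detail:
```
#include <stdio.h>

static void print_metric_header(FILE *out, const char *csv_sep,
				const char *const *cols, int ncols)
{
	/* use the -x separator for every column; fall back to "," if unset */
	for (int i = 0; i < ncols; i++)
		fprintf(out, "%s%s", cols[i], csv_sep ? csv_sep : ",");
	fputc('\n', out);
}

int main(void)
{
	const char *cols[] = { "core", "cpus", "% tma_core_bound" };

	print_metric_header(stdout, ":", cols, 3);   /* perf stat -x : */
	return 0;
}
```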
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20240628000604.1296808-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|