summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-13perf annotate: Add --code-with-type option.Namhyung Kim
This option is to show data type info in the regular (code) annotation. It tries to find data type for each (memory) instruction in the function. It'd be useful to see function-level memory access pattern and also to debug the data type profiling result. The output would be added at the end of the line and have "# data-type:" prefix. For now, it only works with --stdio mode for simplicity. I can work on enabling it for TUI later. $ perf annotate --stdio --code-with-type Percent | Source code & Disassembly of vmlinux for cpu/mem-loads/ppk (253 samples, percent: local period) --------------------------------------------------------------------------------------------------------------- : 0 0xffffffff81baa000 <check_preemption_disabled>: 0.00 : ffffffff81baa000: pushq %r12 # data-type: (stack operation) 0.00 : ffffffff81baa002: pushq %rbp # data-type: (stack operation) 0.00 : ffffffff81baa003: pushq %rbx # data-type: (stack operation) 0.00 : ffffffff81baa004: subq $0x8, %rsp 18.00 : ffffffff81baa008: movl %gs:0x7e48893d(%rip), %ebx # 0x3294c <pcpu_hot+0xc> # data-type: struct pcpu_hot +0xc (cpu_number) 12.58 : ffffffff81baa00f: movl %gs:0x7e488932(%rip), %eax # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count) 0.00 : ffffffff81baa016: testl $0x7fffffff, %eax 0.00 : ffffffff81baa01b: je 0xffffffff81baa02c <check_preemption_disabled+0x2c> 0.00 : ffffffff81baa01d: addq $0x8, %rsp 0.00 : ffffffff81baa021: movl %ebx, %eax 14.19 : ffffffff81baa023: popq %rbx # data-type: (stack operation) 18.86 : ffffffff81baa024: popq %rbp # data-type: (stack operation) 12.10 : ffffffff81baa025: popq %r12 # data-type: (stack operation) 17.78 : ffffffff81baa027: jmp 0xffffffff81bc1170 <__x86_return_thunk> 6.49 : ffffffff81baa02c: callq *0xc9139e(%rip) # 0xffffffff8283b3d0 <pv_ops+0xf0> # data-type: (stack operation) 0.00 : ffffffff81baa032: testb $0x2, %ah 0.00 : ffffffff81baa035: je 0xffffffff81baa01d <check_preemption_disabled+0x1d> 0.00 : ffffffff81baa037: movq %rdi, %rbp 0.00 : ffffffff81baa03a: movq %gs:0x32940, %rax # data-type: struct pcpu_hot +0 (current_task) 0.00 : ffffffff81baa043: testb $0x4, 0x2f(%rax) # data-type: struct task_struct +0x2f (flags) 0.00 : ffffffff81baa047: je 0xffffffff81baa052 <check_preemption_disabled+0x52> 0.00 : ffffffff81baa049: cmpl $0x1, 0x3d0(%rax) # data-type: struct task_struct +0x3d0 (nr_cpus_allowed) 0.00 : ffffffff81baa050: je 0xffffffff81baa01d <check_preemption_disabled+0x1d> 0.00 : ffffffff81baa052: movq %gs:0x32940, %r12 # data-type: struct pcpu_hot +0 (current_task) 0.00 : ffffffff81baa05b: cmpw $0x0, 0x7f0(%r12) # data-type: struct task_struct +0x7f0 (migration_disabled) 0.00 : ffffffff81baa065: movq %rsi, (%rsp) 0.00 : ffffffff81baa069: jne 0xffffffff81baa01d <check_preemption_disabled+0x1d> 0.00 : ffffffff81baa06b: movl 0xe8dd13(%rip), %eax # 0xffffffff82a37d84 <system_state> # data-type: enum system_states +0 0.00 : ffffffff81baa071: testl %eax, %eax 0.00 : ffffffff81baa073: je 0xffffffff81baa01d <check_preemption_disabled+0x1d> 0.00 : ffffffff81baa075: incl %gs:0x7e4888cc(%rip) # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count) 0.00 : ffffffff81baa07c: movq $-0x7e14a100, %rdi 0.00 : ffffffff81baa083: callq 0xffffffff81148c40 <__printk_ratelimit> # data-type: (stack operation) 0.00 : ffffffff81baa088: testl %eax, %eax 0.00 : ffffffff81baa08a: je 0xffffffff81baa0d5 <check_preemption_disabled+0xd5> 0.00 : ffffffff81baa08c: movl 0x958(%r12), %r9d # data-type: struct task_struct +0x958 (pid) 0.00 : ffffffff81baa094: movq (%rsp), %rdx # data-type: char* +0 0.00 : ffffffff81baa098: movq %rbp, %rsi 0.00 : ffffffff81baa09b: leaq 0xb88(%r12), %r8 # data-type: struct task_struct +0xb88 (comm) 0.00 : ffffffff81baa0a3: movl %gs:0x7e48889e(%rip), %ecx # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count) 0.00 : ffffffff81baa0aa: andl $0x7fffffff, %ecx 0.00 : ffffffff81baa0b0: movq $-0x7dd3cdf0, %rdi 0.00 : ffffffff81baa0b7: subl $0x1, %ecx 0.00 : ffffffff81baa0ba: callq 0xffffffff81149340 <_printk> # data-type: (stack operation) 0.00 : ffffffff81baa0bf: movq 0x20(%rsp), %rsi 0.00 : ffffffff81baa0c4: movq $-0x7ddb8c7e, %rdi 0.00 : ffffffff81baa0cb: callq 0xffffffff81149340 <_printk> # data-type: (stack operation) 0.00 : ffffffff81baa0d0: callq 0xffffffff81b7ab60 <dump_stack> # data-type: (stack operation) 0.00 : ffffffff81baa0d5: decl %gs:0x7e48886c(%rip) # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count) 0.00 : ffffffff81baa0dc: jmp 0xffffffff81baa01d <check_preemption_disabled+0x1d> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-8-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13perf annotate: Implement code + data type annotationNamhyung Kim
Sometimes it's useful to see both instructions and their data type together. Let's extend the annotate code to use data type profiling functions. To make it easy to pass more argument, introduce a struct to carry necessary information together. Also add a new annotation_option called 'code_with_type' to control the behavior. This is not enabled yet but it'll be set later from the command line. For simplicity, this is implemented for --stdio only. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13perf annotate: Factor out __hist_entry__get_data_type()Namhyung Kim
So that it can only handle a single disasm_linme and hopefully make the code simpler. This is also a preparation to be called from different places later. The NO_TYPE macro was added to distinguish when it failed or needs retry. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-6-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13perf annotate: Pass hist_entry to annotate functionsNamhyung Kim
It's a prepartion to support code annotation and data type annotation at the same time. Data type annotation needs more information in the hist_entry so it needs to be passed deeper. Also rename a function with the same name in the builtin-annotate.c to hist_entry__stdio_annotate since it matches better to the command line option. And change the condition inside to be simpler. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13perf annotate: Pass annotation_options to annotation_line__print()Namhyung Kim
The annotation_line__print() has many arguments. But min_percent, max_lines and percent_type are from struct annotaion_options. So let's pass a pointer to the option instead of passing them separately to reduce the number of function arguments. Actually it has a recursive call if 'queue' is set. Add a new option instance to pass different values for the case. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13perf annotate: Remove unused len parameter from annotation_line__print()Namhyung Kim
It's not used anywhere, let's get rid of it. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13perf annotate-data: Add annotated_data_type__get_member_name()Namhyung Kim
Factor out a function to get the name of member field at the given offset. This will be used in other places. Also update the output of typeoff sort key a little bit. As we know that some special types like (stack operation), (stack canary) and (unknown) won't have fields, skip printing the offset and field. For example, the following change is expected. "(stack operation) +0 (no field)" ==> "(stack operation)" Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13perf ftrace: Use atomic inc to update histogram in BPFNamhyung Kim
It should use an atomic instruction to update even if the histogram is keyed by delta as it's also used for stats. Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20250227191223.1288473-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13perf ftrace: Remove an unnecessary condition check in BPFNamhyung Kim
The bucket_num is set based on the {max,min}_latency already in cmd_ftrace(), so no need to check it again in BPF. Also I found that it didn't pass the max_latency to BPF. :) No functional changes intended. Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20250227191223.1288473-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13perf ftrace: Fix latency stats with BPFNamhyung Kim
When BPF collects the stats for the latency in usec, it first divides the time by 1000. But that means it would have 0 if the delta is small and won't update the total time properly. Let's keep the stats in nsec always and adjust to usec before printing. Before: $ sudo ./perf ftrace latency -ab -T mutex_lock --hide-empty -- sleep 0.1 # DURATION | COUNT | GRAPH | 0 - 1 us | 765 | ############################################# | 1 - 2 us | 10 | | 2 - 4 us | 2 | | 4 - 8 us | 5 | | # statistics (in usec) total time: 0 <<<--- (here) avg time: 0 max time: 6 min time: 0 count: 782 After: $ sudo ./perf ftrace latency -ab -T mutex_lock --hide-empty -- sleep 0.1 # DURATION | COUNT | GRAPH | 0 - 1 us | 880 | ############################################ | 1 - 2 us | 13 | | 2 - 4 us | 8 | | 4 - 8 us | 3 | | # statistics (in usec) total time: 268 <<<--- (here) avg time: 0 max time: 6 min time: 0 count: 904 Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20250227191223.1288473-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf test stat: Additional topdown grouping testsIan Rogers
Add a loop and helper function to avoid repetition, the loop uses arrays so switch the shell to bash. Add additional topdown group tests where a topdown event needs to be moved beyond others and the slots event isn't first in the target group. This replicates issues that occur on hybrid systems where the other events are for the cpu_atom PMU. Test with both PMU and software events. Place the slots event later in the event list. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250307023906.1135613-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf x86 evlist: Update comments on topdown regroupingDapeng Mi
Update to remove comments about groupings not working and with the: ``` perf stat -e "{instructions,slots},{cycles,topdown-retiring}" ``` case that now works. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250307023906.1135613-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf parse-events: Corrections to topdown sortingIan Rogers
In the case of '{instructions,slots},faults,topdown-retiring' the first event that must be grouped, slots, is ignored causing the topdown-retiring event not to be adjacent to the group it needs to be inserted into. Don't ignore the group members when computing the force_grouped_index. Make the force_grouped_index be for the leader of the group it is within and always use it first rather than a group leader index so that topdown events may be sorted from one group into another. As the PMU name comparison applies to moving events in the same group ensure the name ordering is always respected. Change the group splitting logic to not group if there are no other topdown events and to fix cases where the force group leader wasn't being grouped with the other members of its group. Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Closes: https://lore.kernel.org/lkml/20250224083306.71813-2-dapeng1.mi@linux.intel.com/ Closes: https://lore.kernel.org/lkml/f7e4f7e8-748c-4ec7-9088-0e844392c11a@linux.intel.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250307023906.1135613-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf x86/topdown: Fix topdown leader sampling test error on hybridDapeng Mi
When running topdown leader smapling test on Intel hybrid platforms, such as LNL/ARL, we see the below error. Topdown leader sampling test Topdown leader sampling [Failed topdown events not reordered correctly] It indciates the below command fails. perf record -o "${perfdata}" -e "{instructions,slots,topdown-retiring}:S" true The root cause is that perf tool creats a perf event for each PMU type if it can create. As for this command, there would be 5 perf events created, cpu_atom/instructions/,cpu_atom/topdown_retiring/, cpu_core/slots/,cpu_core/instructions/,cpu_core/topdown-retiring/ For these 5 events, the 2 cpu_atom events are in a group and the other 3 cpu_core events are in another group. When arch_topdown_sample_read() traverses all these 5 events, events cpu_atom/instructions/ and cpu_core/slots/ don't have a same group leade, and then return false directly and lead to cpu_core/slots/ event is used to sample and this is not allowed by PMU driver. It's a overkill to return false directly if "evsel->core.leader != leader->core.leader" since there could be multiple groups in the event list. Just "continue" instead of "return false" to fix this issue. Fixes: 1e53e9d1787b ("perf x86/topdown: Correct leader selection with sample_read enabled") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250307023906.1135613-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf tools: Improve handling of hybrid PMUs in perf_event_attr__fprintfIan Rogers
Support the PMU name from the legacy hardware and hw_cache PMU extended types. Remove some macros and make variables more intention revealing, rather than just being called "value". Before: ``` $ perf stat -vv -e instructions true ... ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0xa00000001 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid 181636 cpu -1 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0x400000001 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid 181636 cpu -1 group_fd -1 flags 0x8 = 6 ... ``` After: ``` $ perf stat -vv -e instructions true ... ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0xa00000001 (cpu_atom/PERF_COUNT_HW_INSTRUCTIONS/) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 ------------------------------------------------------------ sys_perf_event_open: pid 181724 cpu -1 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0x400000001 (cpu_core/PERF_COUNT_HW_INSTRUCTIONS/) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 ------------------------------------------------------------ sys_perf_event_open: pid 181724 cpu -1 group_fd -1 flags 0x8 = 6 ... ``` Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250307023906.1135613-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf python tracepoint: Switch to using parse_eventsIan Rogers
Rather than manually configuring an evsel, switch to using parse_events for greater commonality with the rest of the perf code. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf python: Add evlist.config to set up record optionsIan Rogers
Add access to evlist__config that is used to configure an evlist with record options. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf python: Add evlist all_cpus accessorIan Rogers
Add a means to get the reference counted all_cpus CPU map from an evlist in its python form. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf python: Avoid duplicated code in get_tracepoint_fieldIan Rogers
The code replicates computations done in evsel__tp_format, reuse evsel__tp_format to simplify the python C code. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf python: Update ungrouped evsel leader in cloneIan Rogers
evsels are cloned in the python code as they form part of the Python object pyrf_evsel. The cloning doesn't update the evsel's leader, do this for the case of an evsel being ungrouped. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf python: Add optional cpus and threads arguments to parse_eventsIan Rogers
Used for the evlist initialization. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf python: Add member access to a number of evsel variablesIan Rogers
Most variables are part of the perf_event_attr, so that they may be queried and modified. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf python: Add evlist enable and disable methodsIan Rogers
By default the evsels from parse_events will be disabled. Add access to the evlist functions so they can be enabled/disabled. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf evsel: tp_format accessing improvementsIan Rogers
Ensure evsel__clone copies the tp_sys and tp_name variables. In evsel__tp_format, if tp_sys isn't set, use the config value to find the tp_format. This succeeds in python code where pyrf__tracepoint has already found the format. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-4-irogers@google.com Fixes: 6c8310e8380d472c ("perf evsel: Allow evsel__newtp without libtraceevent") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf evlist: Add success path to evlist__create_syswide_mapsIan Rogers
Over various refactorings evlist__create_syswide_maps has been made to only ever return with -ENOMEM. Fix this so that when perf_evlist__set_maps is successfully called, 0 is returned. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-3-irogers@google.com Fixes: 8c0498b6891d7ca5 ("perf evlist: Fix create_syswide_maps() not propagating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf debug: Avoid stack overflow in recursive error messageIan Rogers
In debug_file, pr_warning_once is called on error. As that function calls debug_file the function will yield a stack overflow. Switch the location of the call so the recursion is avoided. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-2-irogers@google.com Fixes: ec49230cf6dda704 ("perf debug: Expose debug file") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf symbol: Support .gnu_debugdata for symbolsStephen Brennan
Fedora introduced a "MiniDebuginfo" feature, in which an LZMA-compressed ELF file is placed inside a section named ".gnu_debugdata". This file contains nothing but a symbol table, which can be used to supplement the .dynsym section which only contains required symbols for runtime. It is supported by GDB for stack traces, but it should be useful for tracing as well. Implement support for loading symbols from .gnu_debugdata. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250307232206.2102440-4-stephen.s.brennan@oracle.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf tools: Add LZMA decompression from FILEStephen Brennan
Internally lzma_decompress_to_file() creates a FILE from the filename. Add an API that takes an existing FILE directly. This allows decompressing already-open files and even buffers opened by fmemopen(). It is necessary for supporting .gnu_debugdata in the next patch. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250307232206.2102440-3-stephen.s.brennan@oracle.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf tools: Add dummy functions for !HAVE_LZMA_SUPPORTStephen Brennan
This allows us to use them without needing to ifdef the calling code. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250307232206.2102440-2-stephen.s.brennan@oracle.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf mem: Don't leak mem event namesIan Rogers
When preparing the mem events for the argv copies are intentionally made. These copies are leaked and cause runs of perf using address sanitizer to fail. Rather than leak the memory allocate a chunk of memory for the mem event names upfront and build the strings in this - the storage is sized larger than the previous buffer size. The caller is then responsible for clearing up this memory. As part of this change, remove the mem_loads_name and mem_stores_name global buffers then change the perf_pmu__mem_events_name to write to an out argument buffer. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250308012853.1384762-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf vendor events riscv: Add SiFive P650 eventsEric Lin
The SiFive Performance P650 core (including the vector-enabled P670 and area-optimized P450/P470 variants) updates the P550 microarchitecture. It brings in the debug, trace, and counter events from newer Bullet cores, and adds new events for iTLB and dTLB multi-hits. All other PMU events are unchanged from the P550 core. Signed-off-by: Eric Lin <eric.lin@sifive.com> Co-developed-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-8-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf vendor events riscv: Add SiFive P550 eventsEric Lin
The SiFive Performance P550 core features an out-of-order microarchitecture which exposes the same PMU events as Bullet, plus events for UTLB hits and PTE cache misses/hits. Add support for specifying these events using symbolic names. Signed-off-by: Eric Lin <eric.lin@sifive.com> Co-developed-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-7-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf vendor events riscv: Add SiFive Bullet version 0x0d eventsEric Lin
SiFive Bullet microarchitecture cores with mimpid values starting with 0x0d or greater add new PMU events to count TLB miss stall cycles. All other PMU events are unchanged from earlier Bullet cores. Signed-off-by: Eric Lin <eric.lin@sifive.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-6-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf vendor events riscv: Add SiFive Bullet version 0x07 eventsEric Lin
SiFive Bullet microarchitecture cores with mimpid values starting with 0x07 or greater add new PMU events to support debug, trace, and counter sampling and filtering (Sscofpmf). All other PMU events are unchanged from earlier Bullet cores. Signed-off-by: Eric Lin <eric.lin@sifive.com> Co-developed-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-5-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf vendor events riscv: Update SiFive Bullet eventsEric Lin
Regenerate the event lists from the original hardware description. This makes them consistent with the event lists for newer versions of the hardware, allowing most files to be reused across hardware versions. Signed-off-by: Eric Lin <eric.lin@sifive.com> Co-developed-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-4-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf vendor events riscv: Remove leading zeroesSamuel Holland
The EventCode field (as stored in the mhpmeventN CSRs) is actually 56 bits wide, but there is no need to keep leading zeroes in the JSON files. Remove them to simplify review of the following change, which regenerates the files in a way that does not include leading zeroes. This change was performed automatically with `sed -i "s/0x0*/0x/"`. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-3-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf vendor events riscv: Rename U74 to BulletSamuel Holland
This set of PMU event descriptions applies not only to the SiFive U74 core configuration, but also to other SiFive cores that implement the Bullet microarchitecture (such as U64, P270, and X280). Rename the directory to be more generic. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250213220341.3215660-2-samuel.holland@sifive.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf util: Remove unused perf_config__refreshDr. David Alan Gilbert
perf_config__refresh() was added in 2016 by commit 8a0a9c7e9146 ("perf config: Introduce new init() and exit()") but has remained unused. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250305023120.155420-7-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf util: Remove unused perf_pmus__default_pmu_nameDr. David Alan Gilbert
perf_pmus__default_pmu_name() last use was removed by 2023's commit e3edd6cf6399 ("perf pmu-events: Reduce processed events by passing PMU") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250305023120.155420-6-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf util: Remove unused perf_data__update_dirDr. David Alan Gilbert
perf_data__update_dir() was added in 2019's commit e8be135751f2 ("perf data: Add perf_data__update_dir() function") but has never been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250305023120.155420-5-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf util: Remove unused pstack__popDr. David Alan Gilbert
The last use of pstack__pop() was removed in 2015 by commit 6422184b087f ("perf hists browser: Simplify zooming code using pstack_peek()") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250305023120.155420-4-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10perf util: Remove unused perf_color_default_configDr. David Alan Gilbert
perf_color_default_config() was added in 2009 by commit 8fc0321f1ad0 ("perf_counter tools: Add color terminal output support") but has remained unused. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250305023120.155420-3-linux@treblig.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-07perf tests: Fix data symbol test with LTO buildsIan Rogers
With LTO builds, although regular builds could also see this as all the code is in one file, the datasym workload can realize the buf1.reserved data is never accessed. The compiler moves the variable to bss and only keeps the data1 and data2 parts as separate variables. This causes the symbol check to fail in the test. Make the variable volatile to disable the more aggressive optimization. Rename the variable to make which buf1 in perf is being referred to. Before: $ perf test -vv "data symbol" 126: Test data symbol: --- start --- test child forked, pid 299808 perf does not have symbol 'buf1' perf is missing symbols - skipping test ---- end(-2) ---- 126: Test data symbol : Skip $ nm perf|grep buf1 0000000000a5fa40 b buf1.0 0000000000a5fa48 b buf1.1 After: $ nm perf|grep buf1 0000000000a53a00 d buf1 $ perf test -vv "data symbol"126: Test data symbol: --- start --- test child forked, pid 302166 a53a00-a53a39 l buf1 perf does have symbol 'buf1' Recording workload... Waiting for "perf record has started" message OK Cleaning up files... ---- end(0) ---- 126: Test data symbol : Ok Fixes: 3dfc01fe9d12 ("perf test: Add 'datasym' test workload") Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250226230109.314580-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-07perf report: Fix memory leaks in the hierarchy modeNamhyung Kim
Ian told me that there are many memory leaks in the hierarchy mode. I can easily reproduce it with the follwing command. $ make DEBUG=1 EXTRA_CFLAGS=-fsanitize=leak $ perf record --latency -g -- ./perf test -w thloop $ perf report -H --stdio ... Indirect leak of 168 byte(s) in 21 object(s) allocated from: #0 0x7f3414c16c65 in malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75 #1 0x55ed3602346e in map__get util/map.h:189 #2 0x55ed36024cc4 in hist_entry__init util/hist.c:476 #3 0x55ed36025208 in hist_entry__new util/hist.c:588 #4 0x55ed36027c05 in hierarchy_insert_entry util/hist.c:1587 #5 0x55ed36027e2e in hists__hierarchy_insert_entry util/hist.c:1638 #6 0x55ed36027fa4 in hists__collapse_insert_entry util/hist.c:1685 #7 0x55ed360283e8 in hists__collapse_resort util/hist.c:1776 #8 0x55ed35de0323 in report__collapse_hists /home/namhyung/project/linux/tools/perf/builtin-report.c:735 #9 0x55ed35de15b4 in __cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1119 #10 0x55ed35de43dc in cmd_report /home/namhyung/project/linux/tools/perf/builtin-report.c:1867 #11 0x55ed35e66767 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:351 #12 0x55ed35e66a0e in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:404 #13 0x55ed35e66b67 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:448 #14 0x55ed35e66eb0 in main /home/namhyung/project/linux/tools/perf/perf.c:556 #15 0x7f340ac33d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 ... $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak' 93 I found that hist_entry__delete() missed to release child entries in the hierarchy tree (hroot_{in,out}). It needs to iterate the child entries and call hist_entry__delete() recursively. After this change: $ perf report -H --stdio 2>&1 | grep -c '^Indirect leak' 0 Reported-by: Ian Rogers <irogers@google.com> Tested-by Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250307061250.320849-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-07perf report: Use map_symbol__copy() when copying callchainsNamhyung Kim
It seems there are places to miss updating refcount of maps. Let's use map_symbol__copy() helper to properly copy them with refcounts updated. Link: https://lore.kernel.org/r/20250307061250.320849-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-06perf annotate: Return errors from disasm_line__parse_powerpc()Athira Rajeev
In disasm_line__parse_powerpc() , return code from function disasm_line__parse() is ignored. This will result in bad results if the disasm_line__parse() fails to disasm the line. Use the return code to fix this. Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Link: https://lore.kernel.org/r/20250304154114.62093-2-atrajeev@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-06perf annotate: Add annotation_options.disassembler_usedAthira Rajeev
When doing "perf annotate", perf tool provides option to use specific disassembler like llvm/objdump/capstone. The order picked is to use llvm first and if that fails fallback to objdump ie to use PERF_DISASM_LLVM, PERF_DISASM_CAPSTONE and PERF_DISASM_OBJDUMP In powerpc, when using "data type" sort keys, first preferred approach is to read the raw instruction from the DSO. In objdump is specified in "--objdump" option, it picks the symbol disassemble using objdump. Currently disasm_line__parse_powerpc() function uses length of the "line" to determine if objdump is used. But there are few cases, where if objdump doesn't recognise the instruction, the disassembled string will be empty. Example: 134cdc: c4 05 82 41 beq 1352a0 <getcwd+0x6e0> 134ce0: ac 00 8e 40 bne cr3,134d8c <getcwd+0x1cc> 134ce4: 0f 00 10 04 pld r9,1028308 ====>134ce8: d4 b0 20 e5 134cec: 16 00 40 39 li r10,22 134cf0: 48 01 21 ea ld r17,328(r1) So depending on length of line will give bad results. Add a new filed to annotation options structure, "struct annotation_options" to save the disassembler used. Use this info to determine if disassembly is done while parsing the disasm line. Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Link: https://lore.kernel.org/r/20250304154114.62093-1-atrajeev@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-06perf report: Do not process non-JIT BPF ksymbol eventsNamhyung Kim
The length of PERF_RECORD_KSYMBOL for BPF is a size of JITed code so it'd be 0 when it's not JITed. The ksymbol is needed to symbolize the code when it gets samples in the region but non-JITed code cannot get samples. Thus it'd be ok to ignore them. Actually it caused a performance issue in the perf tools on old ARM kernels where it can refuse to JIT some BPF codes. It ended up splitting the existing kernel map (kallsyms). And later lookup for a kernel symbol would create a new kernel map from kallsyms and then split it again and again. :( Probably there's a bug in the kernel map/symbol handling in perf tools. But I think we need to fix this anyway. Reported-by: Kevin Nomura <nomurak@google.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250305232838.128692-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-06perf test: Fix leak in "Synthesize attr update" testIan Rogers
The own_cpus map variable may be non-NULL and hold a reference, in particular on hybrid machines. Do a put before overwriting the variable to avoid a memory leak. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250305191931.604764-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf machine: Fix insertion of PERF_RECORD_KSYMBOL related kernel mapsNamhyung Kim
This was detected at the end of a 'perf record' session when build-id collection was enabled and thus the BPF programs put in place while the session was running, some even put in place by perf itself were processed and inserted, with some overlaps related to BPF trampolines and programs took place. Using maps__fixup_overlap_and_insert() instead of maps__insert() "fixes" the problem, in the sense that overlaps will be dealt with and then the consistency will be kept, but it would be interesting to fully understand why such overlaps take place and how to deal with them when doing symbol resolution. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Suggested-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/CAP-5=fXEEMFgPF2aZhKsfrY_En+qoqX20dWfuE_ad73Uxf0ZHQ@mail.gmail.com Link: https://lore.kernel.org/r/20250228211734.33781-7-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>