summaryrefslogtreecommitdiff
path: root/tools/perf/util/evsel.c
AgeCommit message (Collapse)Author
2025-05-14perf parse-events: Use wildcard processing to set an event to merge intoIan Rogers
The merge stat code fails for uncore events if they are repeated twice, for example `perf stat -e clockticks,clockticks -I 1000` as the counts of the second set of uncore events will be merged into the first counter. Reimplement the logic to have a first_wildcard_match so that merged later events correctly merge into the first wildcard event that they will be aggregated into. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Chun-Tse Shao <ctshao@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250513215401.2315949-3-ctshao@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-14perf evlist: Make uniquifying counter names consistentIan Rogers
'perf stat' has different uniquification logic to 'perf record' and perf top. In the case of perf record and 'perf top' all hybrid event names are uniquified. 'perf stat' is more disciplined respecting name config terms, libpfm4 events, etc. 'perf stat' will uniquify hybrid events and the non-core PMU cases shouldn't apply to perf record or 'perf top'. For consistency, remove the uniquification for 'perf record' and 'perf top' and reuse the 'perf stat' uniquification, making the code more globally visible for this. Fix the detection of cross-PMU for disabling uniquify by correctly setting last_pmu. When setting uniquify on an evsel, make sure the PMUs between the 2 considered events differ otherwise the uniquify isn't adding value. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Chun-Tse Shao <ctshao@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250513215401.2315949-2-ctshao@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12perf evsel: Add per-thread warning for EOPNOTSUPP open failuesIan Rogers
The mrvl_ddr_pmu will return EOPNOTSUPP if opened in per-thread mode. Give a warning for this similar to EINVAL. Doing this better supports metric testing with limited permissions when the mrvl_ddr_pmu is present, as the failure to open causes the test to skip and not fail. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250412004704.2297939-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf evsel: Assemble off-cpu samplesHoward Chu
Use the data in bpf-output samples, to assemble off-cpu samples. In evsel__is_offcpu_event(), check if sample_type is PERF_SAMPLE_RAW to support off-cpu sample data created by an older version of perf. Testing compatibility on off-cpu samples collected by perf before this patch series: See below, the sample_type still uses PERF_SAMPLE_CALLCHAIN $ perf script --header -i ./perf.data.ptn | grep "event : name = offcpu-time" # event : name = offcpu-time, , id = { 237917, 237918, 237919, 237920 }, type = 1 (software), size = 136, config = 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1 The output is correct. $ perf script -i ./perf.data.ptn | grep offcpu-time gmain 2173 [000] 18446744069.414584: 100102015 offcpu-time: NetworkManager 901 [000] 18446744069.414584: 5603579 offcpu-time: Web Content 1183550 [000] 18446744069.414584: 46278 offcpu-time: gnome-control-c 2200559 [000] 18446744069.414584: 11998247014 offcpu-time: <SNIP> $ And after this patch series: $ perf script --header -i ./perf.data.off-cpu-v9 | grep "event : name = offcpu-time" # event : name = offcpu-time, , id = { 237959, 237960, 237961, 237962 }, type = 1 (software), size = 136, config = 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1 $ ./perf script -i ./perf.data.off-cpu-v9 | grep offcpu-time gnome-shell 1875 [001] 4789616.361225: 100097057 offcpu-time: gnome-shell 1875 [001] 4789616.461419: 100107463 offcpu-time: firefox 2206821 [002] 4789616.475690: 255257245 offcpu-time: $ Committer testing: The command to record those samples: root@number:~# perf record --off-cpu -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.092 MB perf.data (1552 samples) ] root@number:~# Then, before this patch series, the sample_type for the "offcpu-time" event is: root@number:~# perf evlist -v | grep offcpu-time offcpu-time: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, freq: 1, sample_id_all: 1 root@number:~# And after it, after recording it again: root@number:~# perf record --off-cpu -a sleep 1 ; perf evlist -v | grep offcpu-time [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.151 MB perf.data (2843 samples) ] offcpu-time: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, sample_id_all: 1 root@number:~# Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-7-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-6-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Parse off-cpu eventHoward Chu
Parse the off-cpu event using parse_event(), as bpf-output. Call evlist__enable_evsel() on off-cpu event. This fixes the inability to collect direct off-cpu samples on a workload, as reported by Arnaldo Carvalho de Melo <acme@redhat.com>. The reason being, workload sets enable_on_exec instead of calling evlist__enable(), but off-cpu event does not attach to an executable and execve won't be called, so the fds from perf_event_open() are not enabled. no-inherit should be set to 1, here's the reason: We update the BPF perf_event map for direct off-cpu sample dumping (in following patches), it executes as follows: bpf_map_update_value() bpf_fd_array_map_update_elem() perf_event_fd_array_get_ptr() perf_event_read_local() In perf_event_read_local(), there is: int perf_event_read_local(struct perf_event *event, u64 *value, u64 *enabled, u64 *running) { ... /* * It must not be an event with inherit set, we cannot read * all child counters from atomic context. */ if (event->attr.inherit) { ret = -EOPNOTSUPP; goto out; } Which means no-inherit has to be true for updating the BPF perf_event map. Moreover, for bpf-output events, we primarily want a system-wide event instead of a per-task event. The reason is that in BPF's bpf_perf_event_output(), BPF uses the CPU index to retrieve the perf_event file descriptor it outputs to. Making a bpf-output event system-wide naturally satisfies this requirement by mapping CPU appropriately. Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-4-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-3-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf evsel: Expose evsel__is_offcpu_event() for future useHoward Chu
Expose evsel__is_offcpu_event() so it can be used in off_cpu_config(), evsel__parse_sample() and 'perf script'. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-3-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-2-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf record: Add --sample-mem-info optionNamhyung Kim
There's no way to enable PERF_SAMPLE_DATA_SRC without PERF_SAMPLE_ADDR which brings a lot of overhead due to the number of MMAP[2] records. Let's add a new option to enable this information separately. Committer testing: # perf record -a --sample-mem-info ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.815 MB perf.data (2637 samples) ] # # perf evlist -v cycles:P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER|DATA_SRC, read_format: ID|LOST, disabled: 1, freq: 1, precise_ip: 2, sample_id_all: 1 dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|IDENTIFIER|DATA_SRC, read_format: ID|LOST, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # # perf report -D |& grep -w PERF_RECORD_SAMPLE -A3 -m1 0 44675164447282 0x1a7590 [0x40]: PERF_RECORD_SAMPLE(IP, 0x4001): 107299/107299: 0xffffffffac4a5e11 period: 144 addr: 0 . data_src: 0x229080142 ... thread: perf:107299 ...... dso: /lib/modules/6.15.0-rc4+/build/vmlinux # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-25perf record: Retirement latency cleanup in evsel__configIan Rogers
'perf record' will fail with retirement latency events as the open doesn't do a perf_event_open system call. Use evsel__config() to set up such events for recording by removing the flag and enabling sample weights - the sample weights containing the retirement latency. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250414174134.3095492-17-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-25perf intel-tpebs: Don't close record on readIan Rogers
Factor sending record control fd code into its own function. Rather than killing the record process send it a ping when reading. Timeouts were witnessed if done too frequently, so only ping for the first tpebs events. Don't kill the record command send it a stop command. As close isn't reliably called also close on evsel__exit. Add extra checks on the record being terminated to avoid warnings. Adjust the locking as needed and incorporate extra -Wthread-safety checks. Check to do six 500ms poll timeouts when sending commands, rather than the larger 3000ms, to allow the record process terminating to be better witnessed. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250414174134.3095492-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-25perf intel-tpebs: Add support for updating counts in evsel__tpebs_readIan Rogers
Rename to reflect evsel argument and for consistency with other tpebs functions. Update count from prev_raw_counts when available. Eventually this will allow inteval mode support. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250414174134.3095492-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-25perf intel-tpebs: Refactor tpebs_results listIan Rogers
evsel names and metric-ids are used for matching but this can be problematic, for example, multiple occurrences of the same retirement latency event become a single event for the record. Change the name of the record events so they are unique and reflect the evsel of the retirement latency event that opens them (the retirement latency event's evsel address is embedded within them). This allows an evsel based close to close the event when the retirement latency event is closed. This is important as 'perf stat' has an evlist and the session listen to the record events has an evlist, knowing which event should remove the tpebs_retire_lat can't be tied to an evlist list as there is more than 1, so closing which evlist should cause the tpebs to stop? Using the evsel and the last one out doing the tpebs_stop is cleaner. Committer notes: Fix the build on 32-bit systems by using unsigned long when converting pointers to integers instead of uint64_t. Fixes: 20 4.97 debian:experimental-x-mips : FAIL gcc version 14.2.0 (Debian 14.2.0-13) util/intel-tpebs.c: In function 'tpebs_retire_lat__find': util/intel-tpebs.c:377:21: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] 377 | if ((uint64_t)t->evsel == num) | ^ cc1: all warnings being treated as errors Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250414174134.3095492-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-25perf intel-tpebs: Rename tpebs_start to evsel__tpebs_openIan Rogers
Try to add more consistency to evsel by having tpebs_start renamed to evsel__tpebs_open, passing the evsel that is being opened. The unusual behavior of evsel__tpebs_open opening all events on the evlist is kept and will be cleaned up further in later patches. The comments are cleaned up as tpebs_start isn't called from evlist. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250414174134.3095492-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-11perf tools: Remove evsel__handle_error_quirks()Namhyung Kim
The evsel__handle_error_quirks() is to fixup invalid event attributes on some architecture based on the error code. Currently it's only used for AMD to disable precise_ip not to use IBS which has more restrictions. But the commit c33aea446bf555ab changed call evsel__precise_ip_fallback for any errors so there's no difference with the above function. To make matter worse, it caused a problem with branch stack on Zen3. The IBS doesn't support branch stack so it should use a regular core PMU event. The default event is set precise_max and it starts with 3. And evsel__precise_ip_fallback() tries with it and reduces the level one by one. At last it tries with 0 but it also failed on Zen3 since the branch stack is not supported for the cycles event. At this point, evsel__precise_ip_fallback() restores the original precise_ip value (3) in the hope that it can succeed with other modifier (like exclude_kernel). Then evsel__handle_error_quirks() see it has precise_ip != 0 and make it retry with 0. This created an infinite loop. Before: $ perf record -b -vv |& grep removing removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD ... After: $ perf record -b true Error: Failure to open event 'cycles:P' on PMU 'cpu' which will be removed. Invalid event (cycles:P) in per-thread mode, enable system wide with '-a'. Error: Failure to open any events for recording. Fixes: c33aea446bf555ab ("perf tools: Fix precise_ip fallback logic") Tested-by: Chun-Tse Shao <ctshao@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250410010252.402221-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11perf evsel: tp_format accessing improvementsIan Rogers
Ensure evsel__clone copies the tp_sys and tp_name variables. In evsel__tp_format, if tp_sys isn't set, use the config value to find the tp_format. This succeeds in python code where pyrf__tracepoint has already found the format. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-4-irogers@google.com Fixes: 6c8310e8380d472c ("perf evsel: Allow evsel__newtp without libtraceevent") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-12perf sample: Make user_regs and intr_regs optionalIan Rogers
The struct dump_regs contains 512 bytes of cache_regs, meaning the two values in perf_sample contribute 1088 bytes of its total 1384 bytes size. Initializing this much memory has a cost reported by Tavian Barnes <tavianator@tavianator.com> as about 2.5% when running `perf script --itrace=i0`: https://lore.kernel.org/lkml/d841b97b3ad2ca8bcab07e4293375fb7c32dfce7.1736618095.git.tavianator@tavianator.com/ Adrian Hunter <adrian.hunter@intel.com> replied that the zero initialization was necessary and couldn't simply be removed. This patch aims to strike a middle ground of still zeroing the perf_sample, but removing 79% of its size by make user_regs and intr_regs optional pointers to zalloc-ed memory. To support the allocation accessors are created for user_regs and intr_regs. To support correct cleanup perf_sample__init and perf_sample__exit functions are created and added throughout the code base. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250113194345.1537821-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-04perf evsel: Reduce scanning core PMUs in is_hybridIan Rogers
evsel__is_hybrid returns true if there are multiple core PMUs and the evsel is for a core PMU. Determining the number of core PMUs can require loading/scanning PMUs. There's no point doing the scanning if evsel for the is_hybrid test isn't core so reorder the tests to reduce PMU scanning. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-29perf evsel: Add pmu_name helperIan Rogers
Add helper to get the name of the evsel's PMU. This handles the case where there's no sysfs PMU via parse_events event_type helper. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Tested-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250109222109.567031-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-08perf evsel: Improve the evsel__open_strerror() for EBUSYIan Rogers
The existing EBUSY strerror message is: The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (intel_bts//). "dmesg | grep -i perf" may provide additional information. The dmesg won't be useful. What is more useful is knowing what processes are potentially using the PMU, which some procfs scanning can reveal. When parallel testing tests/shell/stat_all_pmu.sh this yields: Testing intel_bts// Error: The PMU intel_bts counters are busy and in use by another process. Possible processes: 2585882 perf list 2585902 perf list -j -o /tmp/__perf_test.list_output.json.KF9MY 2585904 perf list 2585911 perf record -e task-clock --filter period > 1 -o /dev/null --quiet true 2585912 perf list 2585915 perf list 2586042 /tmp/perf/perf record -asdg -e cpu-clock -o /tmp/perftool-testsuite_report.dIF/perf_report/perf.data -- sleep 2 2589078 perf record -g -e task-clock:u -o - perf test -w noploop 2589148 /tmp/perf/perf record --control=fifo:control,ack -e cpu-clock -m 1 sleep 10 2589379 perf --buildid-dir /tmp/perf.debug.Umx record --buildid-all -o /tmp/perf.data.YBm /tmp/perf.ex.MD5.ZQW 2589568 perf record -o /tmp/__perf_test.program.mtcZH/perf.data --branch-filter any,save_type,u -- perf test -w brstack 2589649 perf record --per-thread -o /tmp/__perf_test.perf.data.5d3dc perf test -w thloop 2589898 perf record -o /tmp/perf-test-script.BX2b27Dcnj/pp-perf.data --sample-cpu uname Which gets a little closer to finding the issue. Committer testing: root@number:~# root@number:~# grep -m1 "model name" /proc/cpuinfo model name : Intel(R) Core(TM) i7-14700K root@number:~# Before: root@number:~# perf stat -e intel_bts// & [1] 197954 root@number:~# perf test "perf all PMU test" 124: perf all PMU test : FAILED! root@number:~# perf test -v "perf all PMU test" |& tail Testing i915/vecs0-busy/ Testing i915/vecs0-sema/ Testing i915/vecs0-wait/ Testing intel_bts// Unexpected signal in main Error: The sys_perf_event_open() syscall returned with 16 (Device or resource busy) for event (intel_bts//). "dmesg | grep -i perf" may provide additional information. ---- end(-1) ---- 124: perf all PMU test : FAILED! root@number:~# After: root@number:~# perf stat -e intel_bts// & [1] 200195 root@number:~# perf test "perf all PMU test" 123: perf all PMU test : FAILED! root@number:~# perf test -v "perf all PMU test" |& tail Testing i915/vecs0-wait/ Testing intel_bts// Unexpected signal in main Error: The PMU intel_bts counters are busy and in use by another process. Possible processes: 200195 perf stat -e intel_bts// 2319766 /root/bin/perf top --stdio ---- end(-1) ---- 123: perf all PMU test : FAILED! root@number:~# Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Ze Gao <zegao2021@gmail.com> Change-Id: Ie1ed8688286c44e8f44a35e98fed8be3e2a344df Link: https://lore.kernel.org/r/20241106003007.2112584-1-ctshao@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-20perf script: Cache the output typeArnaldo Carvalho de Melo
Right now every time we need to figure out the type of an evsel for output purposes we do a quick sequence of ifs, but there are new cases where there is a need to do more complex iterations over multiple data structures, sso allow for caching this operation on a hole of 'struct evsel'. This should really be done on the evsel->priv area that 'perf script' sets up, but more work is needed to make sure that it is allocated when we need it, right now it is only used for conditionally, add some comments so that we move this to that 'perf script' specific area when the conditions are in place for that. Acked-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/lkml/Z2XCi3PgstSrV0SE@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-18perf python: Add parse_events functionIan Rogers
Add basic parse_events function that takes a string and returns an evlist. As the python evlist is embedded in a pyrf_evlist, and the evsels are embedded in pyrf_evsels, copy the parsed data into those structs and update evsel__clone to enable this. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-20-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-18perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resumeAdrian Hunter
Display "feature is not supported" error message if aux_start_paused, aux_pause or aux_resume result in a perf_event_open() error. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-18perf tools: Parse aux-actionAdrian Hunter
Add parsing for aux-action to accept "pause", "resume" or "start-paused" values. "start-paused" is valid only for AUX area events. "pause" and "resume" are valid only for events grouped with an AUX area event as the group leader. However, like with aux-output, the events will be automatically grouped if they are not currently in a group, and the AUX area event precedes the other events. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-18perf tools: Add aux-action config termAdrian Hunter
Add a new common config term "aux-action" to use for configuring AUX area trace pause / resume. The value is a string that will be parsed in a subsequent patch. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-13Merge remote-tracking branch 'torvalds/master' into perf-tools-nextArnaldo Carvalho de Melo
To get the fixes that went thru perf-tools for v6.13. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf evsel: Allow evsel__newtp without libtraceeventIan Rogers
Switch from reading the tracepoint format to reading the id directly for the evsel config. This avoids the need to initialize libtraceevent, plugins, etc. It is sufficient for many tracepoint commands to work like: $ perf stat -e sched:sched_switch true To populate evsel->tp_format, do lazy initialization using libtraceevent in the evsel__tp_format function (the sys and name are saved in evsel__newtp_idx for this purpose). Reading the id should be indicative of the format failing to load, but if not an error is reported in evsel__tp_format. This could happen for a tracepoint with a format that fails to parse. As tracepoints can be parsed without libtraceevent with this, remove the associated #ifdefs in parse-events.c. By only lazily parsing the tracepoint format information it is hoped this will help improve the performance of code using tracepoints but not the format information. It also cuts down on the build and ifdef logic. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf evsel: Add/use accessor for tp_formatIan Rogers
Add an accessor function for tp_format. Rather than search+replace uses try to use a variable and reuse it. Add additional NULL checks when accessing/using the value. Make sure the PTR_ERR is nulled out on error path in evsel__newtp_idx. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Zixian Cai <fzczx123@gmail.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118225345.889810-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-05perf tools: Fix precise_ip fallback logicNamhyung Kim
Sometimes it returns other than EOPNOTSUPP for invalid precise_ip so it cannot check the error code. Let's move the fallback after the missing feature checks so that it can handle EINVAL as well. This also aligns well with the existing behavior which blindly turns off the precise_ip but we check the missing features correctly now. Fixes: af954f76eea56453 ("perf tools: Check fallback error and order") Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Closes: https://lore.kernel.org/oe-lkp/202411301431.799e5531-lkp@intel.com Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/Z1DV0lN8qHSysX7f@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-11-09perf pmu: Add calls enabling the hwmon_pmuIan Rogers
Add the base PMU calls necessary for hwmon_pmu(s) to be created/deleted and events found, listed, opened and read. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241109003759.473460-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-11-08perf build: Include libtraceevent headers directly indicated by pkg-configYicong Yang
Currently the libtraceevent's found by pkg-config, which give the include path as: [root@localhost tmp]# pkg-config --cflags libtraceevent -I/usr/local/include/traceevent So we should include the libtraceevent headers directly without "traceevent/" prefix. Update all the users. Fixes: 0f0e1f445690 ("perf build: Use pkg-config for feature check for libtrace{event,fs}") Suggested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/linux-perf-users/ZyF5_Hf1iL01kldE@google.com/ Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: leo.yan@arm.com Cc: amadio@gentoo.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20241105105649.45399-1-yangyicong@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22perf tools: Check fallback error and orderNamhyung Kim
The perf_event_open might fail due to various reasons, so blindly reducing precise_ip level might not be the best way to deal with it. It seems the kernel return -EOPNOTSUPP when PMU doesn't support the given precise level. Let's try again with the correct error code. This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but user events with exclude_kernel=1 cannot make progress. Let's add the evsel__handle_error_quirks() to this case specially. I plan to work on the kernel side to improve this situation but it'd still need some special handling for IBS. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-8-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22perf tools: Detect missing kernel features properlyNamhyung Kim
The evsel__detect_missing_features() is to check if the attributes of the evsel is supported or not. But it checks the attribute based on the given evsel, it might miss something if the attr doesn't have the bit or give incorrect results if the event is special. Also it maintains the order of the feature that was added to the kernel which means it can assume older features should be supported once it detects the current feature is working. To minimized the confusion and to accurately check the kernel features, I think it's better to use a software event and go through all the features at once. Also make the function static since it's only used in evsel.c. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-6-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22perf tools: Simplify evsel__add_modifier()Namhyung Kim
Since it doesn't set the exclude_guest, no need to special handle the bit and simply show only if one of host or guest bit is set. Now the default event name might not have :H prefix anymore so change the dlfilter test not to compare the ":" at the end. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22perf tools: Add fallback for exclude_guestNamhyung Kim
Commit 7b100989b4f6bce70 ("perf evlist: Remove __evlist__add_default") changed to parse "cycles:P" event instead of creating a new cycles event for perf record. But it also changed the way how modifiers are handled so it doesn't set the exclude_guest bit by default. It seems Apple M1 PMU requires exclude_guest set and returns EOPNOTSUPP if not. Let's add a fallback so that it can work with default events. Also update perf stat hybrid tests to handle possible u or H modifiers. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-2-namhyung@kernel.org Fixes: 7b100989b4f6bce70 ("perf evlist: Remove __evlist__add_default") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-17perf test: Remove C test wrapper for attr.pyIan Rogers
Remove the C wrapper now a shell script wrapper exists. Move perf_event_attr dumping functions to evsel.c and reduce the scope of variables/defines. Use fprintf to avoid snprintf complexities in WRITE_ASS. Add __SANE_USERSPACE_TYPES__ to evsel.c to fix format flag issues on PowerPC triggered by moving attr.c functions to evsel.c. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241015000158.871828-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-14perf evsel: Fix missing inherit + sample read checkNamhyung Kim
It should not clear the inherit bit simply because the kernel doesn't support the sample read with it. IOW the inherit bit should be kept when the sample read is not requested for the event. Fixes: 90035d3cd876cb71 ("tools/perf: Allow inherit + PERF_SAMPLE_READ when opening events") Acked-by: Ben Gainey <ben.gainey@arm.com> Link: https://lore.kernel.org/r/20241009062250.730192-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-10perf tool_pmu: Rename perf_tool_event__* to tool_pmu__*Ian Rogers
Now the events are associated with the tool PMU, rename the functions to reflect this. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-10perf tool_pmu: Rename enum perf_tool_event to tool_pmu_eventIan Rogers
To better reflect the events listed are from the tool PMU. Rename the enum values from PERF_TOOL_* to TOOL_PMU__EVENT_*. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-10perf tool_pmu: Factor tool events into their own PMUIan Rogers
Rather than treat tool events as a special kind of event, create a tool only PMU where the events/aliases match the existing duration_time, user_time and system_time events. Remove special parsing and printing support for the tool events, but add function calls for when PMU functions are called on a tool_pmu. Move the tool PMU code in evsel into tool_pmu.c to better encapsulate the tool event behavior in that file. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241002032016.333748-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-02tools/perf: Allow inherit + PERF_SAMPLE_READ when opening eventsBen Gainey
The "perf record" tool will now default to this new mode if the user specifies a sampling group when not in system-wide mode, and when "--no-inherit" is not specified. This change updates evsel to allow the combination of inherit and PERF_SAMPLE_READ. A fallback is implemented for kernel versions where this feature is not supported. Signed-off-by: Ben Gainey <ben.gainey@arm.com> Cc: james.clark@arm.com Link: https://lore.kernel.org/r/20241001121505.1009685-3-ben.gainey@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-09-30perf evsel: Reduce a variables scopeIan Rogers
In __evsel__config_callchain avoid computing arch until code path that uses it. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240918223116.127386-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-09-26perf evsel: Remove pmu_nameIan Rogers
"evsel->pmu_name" is only ever assigned a strdup of "pmu->name", a strdup of "evsel->pmu_name" or NULL. As such, prefer to use "pmu->name" directly and even to directly compare PMUs than PMU names. For safety, add some additional NULL tests. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> [ Fix arm-spe.c usage of pmu_name and empty PMU name ] Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-6-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-09-26perf evsel: Add alternate_hw_config and use in evsel__matchIan Rogers
There are cases where we want to match events like instructions and cycles with legacy hardware values, in particular in stat-shadow's hard coded metrics. An evsel's name isn't a good point of reference as it gets altered, strstr would be too imprecise and re-parsing the event from its name is silly. Instead, hold the legacy hardware event name, determined during parsing, in the evsel for this matching case. Inline evsel__match2 that is only used in builtin-diff. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: ak@linux.intel.com Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240926144851.245903-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-09-24perf evsel: display dmesg command of showing a hardcoded pathMasum Reza
In non-FHS compliant distros like NixOS, nothing resides in `/bin` and `/usr/bin`. Instead dynamically symlinked into `/run/current-system/sw/bin/`, the executable resides in `/nix/store`. With this patch,`/bin` prefix from the dmesg command in the error message is stripped. Link: https://github.com/NixOS/nixpkgs/pull/258027 Signed-off-by: Masum Reza <masumrezarock100@gmail.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240922112619.149429-1-masumrezarock100@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-09-11perf evsel: Add accessor for tool_eventIan Rogers
Currently tool events use a dedicated variable within the evsel. Later changes will move this to the unused struct perf_event_attr config for these events. Add an accessor to allow the later change to be well typed and avoid changing all uses. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240907050830.6752-4-irogers@google.com Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-19perf evsel: Constify evsel__id_hdr_size() argumentIan Rogers
Allows evsel__id_hdr_size() to be used when the evsel is const. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Casey Chen <cachen@purestorage.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jann Horn <jannh@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yunseong Kim <yskelg@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Link: https://lore.kernel.org/r/20240817064442.2152089-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-14perf evlist: Save branch counters informationKan Liang
The branch counters logging (A.K.A LBR event logging) introduces a per-counter indication of precise event occurrences in LBRs. The kernel only dumps the number of occurrences into a record. The perf tool has to map the number to the corresponding event. Add evlist__update_br_cntr() to go through the evlist to pick the events that are configured to be logged. Assign a logical idx to track them, and add the total number of the events in the leader event. The total number will be used to allocate the space to save the branch counters for a block. The logical idx will be used to locate the corresponding event quickly in the following patches. It only needs to iterate the evlist once. The evsel__has_branch_counters() is also optimized. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240813160208.2493643-4-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-14perf report: Remove the first overflow check for branch countersKan Liang
A false overflow warning is triggered if a sample doesn't have any LBRs recorded and the branch counters feature is enabled. The current code does OVERFLOW_CHECK_u64() at the very beginning when reading the information of branch counters. It assumes that there is at least one LBR in the PEBS record. But it is a valid case that 0 LBR is recorded especially in a high context switch. Remove the OVERFLOW_CHECK_u64(). The later OVERFLOW_CHECK() should be good enough to check the overflow when reading the information of the branch counters. Fixes: 9fbb4b02302b0ae6 ("perf tools: Add branch counter knob") Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240813160208.2493643-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-13perf stat: Fork and launch 'perf record' when 'perf stat' needs to get ↵Weilin Wang
retire latency value for a metric. When retire_latency value is used in a metric formula, evsel would fork a 'perf record' process with "-e" and "-W" options. 'perf record' will collect required retire_latency values in parallel while 'perf stat' is collecting counting values. At the point of time that 'perf stat' stops counting, evsel would stop 'perf record' by sending sigterm signal to 'perf record' process. Sampled data will be processed to get retire latency value. Another thread is required to synchronize between 'perf stat' and 'perf record' when we pass data through pipe. Retire_latency evsel is not opened for 'perf stat' so that there is no counter wasted on it. This commit includes code suggested by Namhyung to adjust reading size for groups that include retire_latency evsels. In current :R parsing implementation, the parser would recognize events with retire_latency modifier and insert them into the evlist like a normal event. Ideally, we need to avoid counting these events. In this commit, at the time when a retire_latency evsel is read, set the retire latency value processed from the sampled data to count value. This sampled retire latency value will be used for metric calculation and final event count print out. No special metric calculation and event print out code required for retire_latency events. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Link: https://lore.kernel.org/r/20240720062102.444578-4-weilin.wang@intel.com [ Squashed the 3rd and 4th commit in the series to keep it building patch by patch ] [ Constified the 'struct perf_tool' pointer in process_sample_event() ] [ Use perf_tool__init(&tool, false) to address a segfault I reported and Ian/Weilin diagnosed ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-06-25util: constant -1 with expression of type charYunseong Kim
This patch resolve following warning. tools/perf/util/evsel.c:1620:9: error: result of comparison of constant -1 with expression of type 'char' is always false -Werror,-Wtautological-constant-out-of-range-compare 1620 | if (c == -1) | ~ ^ ~~ Signed-off-by: Yunseong Kim <yskelg@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Austin Kim <austindh.kim@gmail.com> Cc: shjy180909@gmail.com Cc: Ze Gao <zegao2021@gmail.com> Cc: Leo Yan <leo.yan@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240619203428.6330-2-yskelg@gmail.com
2024-06-15perf hist: Honor symbol_conf.skip_emptyNamhyung Kim
So that it can skip events with no sample according to the config value. This can omit the dummy event in the output of perf report --group. An example output: $ sudo perf mem record -a sleep 1 $ sudo perf report --group Before) # # Samples: 232 of events 'cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u' # Event count (approx.): 3089861 # # Overhead Command Shared Object Symbol # ........................ ........... ................. ..................................... # 9.29% 0.00% 0.00% swapper [kernel.kallsyms] [k] update_blocked_averages 5.26% 0.15% 0.00% swapper [kernel.kallsyms] [k] __update_load_avg_se 4.15% 0.00% 0.00% perf-exec [kernel.kallsyms] [k] slab_update_freelist.isra.0 3.87% 0.00% 0.00% perf-exec [kernel.kallsyms] [k] memcg_slab_post_alloc_hook 3.79% 0.17% 0.00% swapper [kernel.kallsyms] [k] enqueue_task_fair 3.63% 0.00% 0.00% sleep [kernel.kallsyms] [k] next_uptodate_page 2.86% 0.00% 0.00% swapper [kernel.kallsyms] [k] __update_load_avg_cfs_rq 2.78% 0.00% 0.00% swapper [kernel.kallsyms] [k] __schedule 2.34% 0.00% 0.00% swapper [kernel.kallsyms] [k] intel_idle 2.32% 0.97% 0.00% swapper [kernel.kallsyms] [k] psi_group_change After) # # Samples: 232 of events 'cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P' # Event count (approx.): 3089861 # # Overhead Command Shared Object Symbol # ................ ........... ................. ..................................... # 9.29% 0.00% swapper [kernel.kallsyms] [k] update_blocked_averages 5.26% 0.15% swapper [kernel.kallsyms] [k] __update_load_avg_se 4.15% 0.00% perf-exec [kernel.kallsyms] [k] slab_update_freelist.isra.0 3.87% 0.00% perf-exec [kernel.kallsyms] [k] memcg_slab_post_alloc_hook 3.79% 0.17% swapper [kernel.kallsyms] [k] enqueue_task_fair 3.63% 0.00% sleep [kernel.kallsyms] [k] next_uptodate_page 2.86% 0.00% swapper [kernel.kallsyms] [k] __update_load_avg_cfs_rq 2.78% 0.00% swapper [kernel.kallsyms] [k] __schedule 2.34% 0.00% swapper [kernel.kallsyms] [k] intel_idle 2.32% 0.97% swapper [kernel.kallsyms] [k] psi_group_change Now it doesn't have a column for the dummy event. Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607202918.2357459-5-namhyung@kernel.org