Age | Commit message (Collapse) | Author |
|
Since 9ffa6c7512ca ("perf machine thread: Remove exited threads by
default") perf cleans exited threads up, but as said, sometimes they
are necessary to be kept. The mentioned commit does not cover all the
cases, we also need the information to construct the summary table in
perf-trace.
Before:
# perf trace -s true
Summary of events:
After:
# perf trace -s -- true
Summary of events:
true (383382), 64 events, 91.4%
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
mmap 8 0 0.150 0.013 0.019 0.031 11.90%
mprotect 3 0 0.045 0.014 0.015 0.017 6.47%
openat 2 0 0.014 0.006 0.007 0.007 9.73%
munmap 1 0 0.009 0.009 0.009 0.009 0.00%
access 1 1 0.009 0.009 0.009 0.009 0.00%
pread64 4 0 0.006 0.001 0.001 0.002 4.53%
fstat 2 0 0.005 0.001 0.002 0.003 37.59%
arch_prctl 2 1 0.003 0.001 0.002 0.002 25.91%
read 1 0 0.003 0.003 0.003 0.003 0.00%
close 2 0 0.003 0.001 0.001 0.001 3.86%
brk 1 0 0.002 0.002 0.002 0.002 0.00%
rseq 1 0 0.001 0.001 0.001 0.001 0.00%
prlimit64 1 0 0.001 0.001 0.001 0.001 0.00%
set_robust_list 1 0 0.001 0.001 0.001 0.001 0.00%
set_tid_address 1 0 0.001 0.001 0.001 0.001 0.00%
execve 1 0 0.000 0.000 0.000 0.000 0.00%
[namhyung: simplified the condition]
Fixes: 9ffa6c7512ca ("perf machine thread: Remove exited threads by default")
Reported-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20240927151926.399474-1-mpetlan@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Command perf test 86 fails on s390:
# perf test -F 86
ping 868299 [007] 28248.013596: probe_libc:inet_pton_1: (3ff95948020)
3ff95948020 inet_pton+0x0 (inlined)
3ff9595e6e7 text_to_binary_address+0x1007 (inlined)
3ff9595e6e7 gaih_inet+0x1007 (inlined)
FAIL: expected backtrace entry \
"main\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$"
got "3ff9595e6e7 gaih_inet+0x1007 (inlined)"
86: probe libc's inet_pton & backtrace it with ping : FAILED!
#
The root cause is a new stack layout, two functions have been added
as seen below.
# perf script | tac | grep -m1 '^ping' -B9 | tac
ping 866856 [007] 25979.494921: probe_libc:inet_pton: (3ff8ec48020)
3ff8ec48020 inet_pton+0x0 (inlined)
new --> 3ff8ec5e6e7 text_to_binary_address+0x1007 (inlined)
new --> 3ff8ec5e6e7 gaih_inet+0x1007 (inlined)
3ff8ec5e6e7 getaddrinfo+0x1007 (/usr/lib64/libc.so.6)
2aa3fe04bf5 main+0xff5 (/usr/bin/ping)
3ff8eb34a5b __libc_start_call_main+0x8b (/usr/lib64/libc.so.6)
3ff8eb34b5d __libc_start_main@GLIBC_2.2+0xad (inlined)
2aa3fe06a1f [unknown] (/usr/bin/ping)
#
The new functions in the call chain are:
- text_to_binary_address()
- gaih_inet().
Both functions are inlined and do not show up in the output
of the nm command:
# nm -a /usr/lib64/libc.so.6 | \
grep -E '(text_to_binary_address|gaih_inet)$'
#
There is no possibility to add these 2 functions depending on their
existance in the C library.
Add text_to_binary_address() and gaih_inet() to the list of
expected functions in an compatible way and extend the regular
expression. On s390 the backtrace can now be
Before After
probe_libc:inet_pton probe_libc:inet_pton
inet_pton inet_pton
getaddrinfo getaddrinfo | text_to_binary_address
main main | gaih_inet
Output after:
# perf test -F 86
86: probe libc's inet_pton & backtrace it with ping : Ok
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: agordeev@linux.ibm.com
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Cc: sumanthk@linux.ibm.com
Link: https://lore.kernel.org/r/20241001124224.3370306-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The "perf record" tool will now default to this new mode if the user
specifies a sampling group when not in system-wide mode, and when
"--no-inherit" is not specified.
This change updates evsel to allow the combination of inherit
and PERF_SAMPLE_READ.
A fallback is implemented for kernel versions where this feature is not
supported.
Signed-off-by: Ben Gainey <ben.gainey@arm.com>
Cc: james.clark@arm.com
Link: https://lore.kernel.org/r/20241001121505.1009685-3-ben.gainey@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Sample period calculation in deliver_sample_value is updated to
calculate the per-thread period delta for events that are inherit +
PERF_SAMPLE_READ. When the sampling event has this configuration, the
read_format.id is used with the tid from the sample to lookup the
storage of the previously accumulated counter total before calculating
the delta. All existing valid configurations where read_format.value
represents some global value continue to use just the read_format.id to
locate the storage of the previously accumulated total.
perf_sample_id is modified to support tracking per-thread
values, along with the existing global per-id values. In the
per-thread case, values are stored in a hash by tid within the
perf_sample_id, and are dynamically allocated as the number is not known
ahead of time.
Signed-off-by: Ben Gainey <ben.gainey@arm.com>
Cc: james.clark@arm.com
Link: https://lore.kernel.org/r/20241001121505.1009685-2-ben.gainey@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Clean up return value to be TEST_* rather than unspecific integer. Add
test case skip reason. Skip test if EACCES comes back from
evsel__newtp.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241001052327.7052-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Clean up return value to be TEST_* rather than unspecific integer. Add
test case skip reason. Skip test if EACCES comes back from
evsel__newtp.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241001052327.7052-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
These error paths occur without sufficient permissions. Fix the memory
leaks to make leak sanitizer happier.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241001052327.7052-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Missed cleanup when an error occurs.
Fixes: 49de179577e7 ("perf stat: No need to setup affinities when starting a workload")
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241001052327.7052-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The "perf all PMU test" fails on a Coffee Lake machine.
The failure is caused by the below change in the commit e2641db83f18
("perf vendor events: Add/update skylake events/metrics").
+ {
+ "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles",
+ "Counter": "FIXED",
+ "EventCode": "0xff",
+ "EventName": "UNC_CLOCK.SOCKET",
+ "PerPkg": "1",
+ "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.",
+ "Unit": "cbox_0"
}
The other cbox events have the unit name "CBOX", while the fixed counter
has a unit name "cbox_0". So the events_table will maintain separate
entries for cbox and cbox_0.
The perf_pmus__print_pmu_events() calculates the total number of events,
allocate an aliases buffer, store all the events into the buffer, sort,
and print all the aliases one by one.
The problem is that the calculated total number of events doesn't match
the stored events in the aliases buffer.
The perf_pmu__num_events() is used to calculate the number of events. It
invokes the pmu_events_table__num_events() to go through the entire
events_table to find all events. Because of the
pmu_uncore_alias_match(), the suffix of uncore PMU will be ignored. So
the events for cbox and cbox_0 are all counted.
When storing events into the aliases buffer, the
perf_pmu__for_each_event() only process the events for cbox.
Since a bigger buffer was allocated, the last entry are all 0.
When printing all the aliases, null will be outputted, and trigger the
failure.
The mismatch was introduced from the commit e3edd6cf6399 ("perf
pmu-events: Reduce processed events by passing PMU"). The
pmu_events_table__for_each_event() stops immediately once a pmu is set.
But for uncore, especially this case, the method is wrong and mismatch
what perf does in the perf_pmu__num_events().
With the patch,
$ perf list pmu | grep -A 1 clock.socket
unc_clock.socket
[This 48-bit fixed counter counts the UCLK cycles. Unit: uncore_cbox_0
$ perf test "perf all PMU test"
107: perf all PMU test : Ok
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/all/202407101021.2c8baddb-oliver.sang@intel.com/
Fixes: e3edd6cf6399 ("perf pmu-events: Reduce processed events by passing PMU")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241001021431.814811-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
|
|
If one builds perf with DEBUG=1, captures data on multiple CPUs and
finally runs 'perf report -C <cpu>' for only one of the cpus, assert()
aborts the program. This happens because there are empty queues with
format set.
This patch changes the condition to abort only if a queue is not empty
and if the format is unset.
$ make -C tools/perf DEBUG=1 CORESIGHT=1 CSLIBS=/usr/lib CSINCLUDES=/usr/include install
$ perf record -o kcore --kcore -e cs_etm/timestamp/k -s -C 0-1 dd if=/dev/zero of=/dev/null bs=1M count=1
$ perf report --input kcore/data --vmlinux=/home/ikoskine/projects/linux/vmlinux -C 1
Aborted (core dumped)
Fixes: 57880a7966be510c ("perf: cs-etm: Allocate queues for all CPUs")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240924233930.5193-1-ilkka@os.amperecomputing.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For libdw versions below 0.177, need to link libdl.a in addition to
libbebl.a during static compilation, otherwise
feature-dwarf_getlocations compilation will fail.
Before:
$ make LDFLAGS=-static
BUILD: Doing 'make -j20' parallel build
<SNIP>
Makefile.config:483: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157
<SNIP>
$ cat ../build/feature/test-dwarf_getlocations.make.output
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libebl.a(eblclosebackend.o): in function `ebl_closebackend':
(.text+0x20): undefined reference to `dlclose'
collect2: error: ld returned 1 exit status
After:
$ make LDFLAGS=-static
<SNIP>
Auto-detecting system features:
... dwarf: [ on ]
<SNIP>
$ ./perf probe
Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
or: perf probe [<options>] --del '[GROUP:]EVENT' ...
or: perf probe --list [GROUP:]EVENT ...
<SNIP>
Fixes: 536661da6ea18fe6 ("perf: build: Only link libebl.a for old libdw")
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240919013513.118527-3-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
If libdw is not installed in build environment, the output of
'pkg-config --modversion libdw' is empty, causing LIBDW_VERSION_2 to be
empty and the shell test will have the following error:
/bin/sh: 1: test: -lt: unexpected operator
Before:
$ pkg-config --modversion libdw
Package libdw was not found in the pkg-config search path.
Perhaps you should add the directory containing `libdw.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libdw' found
$ make LDFLAGS=-static -j16
BUILD: Doing 'make -j20' parallel build
<SNIP>
Package libdw was not found in the pkg-config search path.
Perhaps you should add the directory containing `libdw.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libdw' found
/bin/sh: 1: test: -lt: unexpected operator
After:
1. libdw is not installed:
$ pkg-config --modversion libdw
Package libdw was not found in the pkg-config search path.
Perhaps you should add the directory containing `libdw.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libdw' found
$ make LDFLAGS=-static -j16
BUILD: Doing 'make -j20' parallel build
<SNIP>
Package libdw was not found in the pkg-config search path.
Perhaps you should add the directory containing `libdw.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libdw' found
Makefile.config:473: No libdw DWARF unwind found, Please install elfutils-devel/libdw-dev >= 0.158 and/or set LIBDW_DIR
2. libdw version is lower than 0.177
$ pkg-config --modversion libdw
0.176
$ make LDFLAGS=-static -j16
BUILD: Doing 'make -j20' parallel build
<SNIP>
Auto-detecting system features:
... dwarf: [ on ]
<SNIP>
INSTALL libsubcmd_headers
INSTALL libapi_headers
INSTALL libperf_headers
INSTALL libsymbol_headers
INSTALL libbpf_headers
LINK perf
3. libdw version is higher than 0.177
$ pkg-config --modversion libdw
0.186
$ make LDFLAGS=-static -j16
BUILD: Doing 'make -j20' parallel build
<SNIP>
Auto-detecting system features:
... dwarf: [ on ]
<SNIP>
CC util/bpf-utils.o
CC util/pfm.o
LD util/perf-util-in.o
LD perf-util-in.o
AR libperf-util.a
LINK perf
Fixes: 536661da6ea18fe6 ("perf: build: Only link libebl.a for old libdw")
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240919013513.118527-2-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The linked fixes commit added an #include "dwarf-aux.h" to disasm.h
which gets picked up in a lot of places. Without
HAVE_DWARF_GETLOCATIONS_SUPPORT the stubs return an errno, so include
errno.h to fix the following build error:
In file included from util/disasm.h:8,
from util/annotate.h:16,
from builtin-top.c:23:
util/dwarf-aux.h: In function 'die_get_var_range':
util/dwarf-aux.h:183:10: error: 'ENOTSUP' undeclared (first use in this function)
183 | return -ENOTSUP;
| ^~~~~~~
Fixes: 782959ac248ac3cb ("perf annotate: Add "update_insn_state" callback function to handle arch specific instruction tracking")
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241001123625.1063153-1-james.clark@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
With 6d74e1e371d43a7b ("tools/lib/list_sort: remove redundant code for
cond_resched handling") we need to use the newly added hunk based
exceptions when comparing the copy we carry in tools/lib/ to the
original file, do it by adding the hunks that we know will be the
expected diff.
If at some point the original file is updated in other parts, then we
should flag and check the file for update.
Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/lkml/20240930202136.16904-3-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
With 6d74e1e371d43a7b ("tools/lib/list_sort: remove redundant code for
cond_resched handling") we end up with a multi-line variation in the
merge_final() implementation, one that the simple line based exceptions
we had so far can't cope.
Thus this check has been failing:
Warning: Kernel ABI header differences:
diff -u tools/lib/list_sort.c lib/list_sort.c
So add a new check routine that uses grep -vf to exclude some hunks that
we store in the tools/perf/check-header_ignore_hunks/ directory.
This first patch is just the new check routine, the next one will use it
to check lib/list_sort.c.
Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/lkml/20240930202136.16904-2-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add more test cases to cover all supported topdown events regroup cases.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240913084712.13861-7-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add counting and leader sampling tests to verify topdown events including
raw format can be reordered correctly.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240913084712.13861-6-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add leader sampling test to validate event counts are captured into
record and the count value is consistent.
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240913084712.13861-5-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
when running below perf command, we say error is reported.
perf record -e "{slots,instructions,topdown-retiring}:S" -vv -C0 sleep 1
------------------------------------------------------------
perf_event_attr:
type 4 (cpu)
size 168
config 0x400 (slots)
sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
read_format ID|GROUP|LOST
disabled 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 4 (cpu)
size 168
config 0x8000 (topdown-retiring)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
read_format ID|GROUP|LOST
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8
sys_perf_event_open failed, error -22
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for
event (topdown-retiring).
The reason of error is that the events are regrouped and
topdown-retiring event is moved to closely after the slots event and
topdown-retiring event needs to do the sampling, but Intel PMU driver
doesn't support to sample topdown metrics events.
For topdown metrics events, it just requires to be in a group which has
slots event as leader. It doesn't require topdown metrics event must be
closely after slots event. Thus it's a overkill to move topdown metrics
event closely after slots event in events regrouping and furtherly cause
the above issue.
Thus don't move topdown metrics events forward if they are already in a
group.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240913084712.13861-4-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Addresses an issue where, in the absence of a topdown metrics event
within a sampling group, the slots event was incorrectly bypassed as
the sampling leader when sample_read was enabled.
perf record -e '{slots,branches}:S' -c 10000 -vv sleep 1
In this case, the slots event should be sampled as leader but the
branches event is sampled in fact like the verbose output shows.
perf_event_attr:
type 4 (cpu)
size 168
config 0x400 (slots)
sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
read_format ID|GROUP|LOST
disabled 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 168
config 0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
{ sample_period, sample_freq } 10000
sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
read_format ID|GROUP|LOST
sample_id_all 1
exclude_guest 1
The sample period of slots event instead of branches event is reset to
0.
This fix ensures the slots event remains the leader under these
conditions.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240913084712.13861-3-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It's not complete to check whether an event is a topdown slots or
topdown metrics event by only comparing the event name since user
may assign the event by RAW format, e.g.
perf stat -e '{instructions,cpu/r400/,cpu/r8300/}' sleep 1
Performance counter stats for 'sleep 1':
<not counted> instructions
<not counted> cpu/r400/
<not supported> cpu/r8300/
1.002917796 seconds time elapsed
0.002955000 seconds user
0.000000000 seconds sys
The RAW format slots and topdown-be-bound events are not recognized and
not regroup the events, and eventually cause error.
Thus add two helpers arch_is_topdown_slots()/arch_is_topdown_metrics()
to detect whether an event is topdown slots/metrics event by comparing
the event config directly, and use these two helpers to replace the
original event name comparisons.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20240913084712.13861-2-dapeng1.mi@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In __evsel__config_callchain avoid computing arch until code path that
uses it.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Link: https://lore.kernel.org/r/20240918223116.127386-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
HiSilicon HIP08 does support Topdown metrics but perf tool complains
when trying to count Topdown metrics:
[root@localhost tracing]# perf stat --topdown
Topdown requested but the topdown metric groups aren't present.
(See perf list the metric groups have names like TopdownL1)
It's because tool's using "Topdown" as the metric group name[1] rather
than "TopDown", so follow the convention. This is introduced by [2]
which allows to use json metrics to support --topdown function.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/builtin-stat.c?h=v6.11-rc1#n1994
[2] commit 1647cd5b8802 ("perf stat: Implement --topdown using json metrics")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: prime.zeng@hisilicon.com
Cc: hejunhao3@huawei.com
Cc: linuxarm@huawei.com
Cc: shameerali.kolothum.thodi@huawei.com
Link: https://lore.kernel.org/r/20240912063903.31460-1-yangyicong@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
To pick the changes in:
8f0b3cc9a4c102c2 ("tcp: RX path for devmem TCP")
That don't result in any changes in the tables generated from that
header.
But while updating I noticed we need to support the new MSG_SOCK_DEVMEM
flag in the hard coded table for the msg flags table, add it.
This silences this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
Please see tools/include/uapi/README for details.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZvrO_eT9e_41xrNv@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
the kernel sources
To pick up the change in:
a1fab3e69d9d0e9b ("x86/irq: Fix comment on IRQ vector layout")
That just adds some comments, so no changes in perf tooling, just
silences this build warning:
diff -u tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h arch/x86/include/asm/irq_vectors.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/ZvrKT7oQc1AOv6Vk@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Picking the changes from:
4356d575ef0f39a3 ("fhandle: expose u64 mount id to name_to_handle_at(2)")
b4fef22c2fb97fa2 ("uapi: explain how per-syscall AT_* flags should be allocated")
820a185896b77814 ("fcntl: add F_CREATED_QUERY")
It just moves AT_REMOVEDIR around, and adds a bunch more AT_ for
renameat2() and name_to_handle_at(). We need to improve this situation,
as not all AT_ defines are applicable to all fs flags...
This adds support for those new AT_ defines, addressing this build
warning:
diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h
Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/lkml/ZvrIKL3cREoRHIQd@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Use of macro ARRAY_SIZE to calculate array size minimizes
the redundant code and improves code reusability.
./tools/perf/tests/demangle-java-test.c:31:34-35: WARNING: Use ARRAY_SIZE.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=11173
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240929093045.10136-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Picking the changes from:
f0e1a0643a59bf1f ("sched_ext: Implement BPF extensible scheduler class")
The inclusion of the SCHED_EXT define doesn't cause any change in
behaviour in tools/perf.
This just silences this perf tools build warning:
diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/lkml/ZvrDShNVXotZpiwk@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Picking the changes from:
37745918e0e7575b ("ALSA: timer: Introduce virtual userspace-driven timers")
Which entails no changes in the tooling side as it only introduces new
SNDRV_TIMER_IOCTL_ ioctls, and the ones tracked by scripts in
tools/perf/trace/beauty/ are only SNDRV_PCM_IOCTL_ and SNDRV_CTL_IOCTL_,
we still need to support SNDRV_TIMER_IOCTL_ ones, but that probably will
be one of the first for a BTF enumeration based approach :-)
This silences this perf tools build warning:
diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ivan Orlov <ivan.orlov0322@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/lkml/ZvrB-g_E7g2ArlYW@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
If the dso type doesn't match then NULL is returned but the dso should
be put first.
Fixes: f649ed80f3cabbf1 ("perf dsos: Tidy reference counting and locking")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240912182757.762369-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf test 70 takes a long time. One culprit is the output of command
perf annotate. Per default enabled are
- demangle symbol names
- interleave source code with assembly code.
Disable demangle of symbols and abort the annotation
after the first 250 lines.
This speeds up the test case considerable, for example
on s390:
Output before:
# time perf test 70
70: perf annotate basic tests : Ok
.....
real 2m7.467s
user 1m26.869s
sys 0m34.086s
#
Output after:
# time perf test 70
70: perf annotate basic tests : Ok
real 0m3.341s
user 0m1.606s
sys 0m0.362s
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: sumanthk@linux.ibm.com
Link: https://lore.kernel.org/r/20240917085706.249691-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
With commit 8ec9497d3ef34 ("tools/include: Sync uapi/linux/perf.h
with the kernel sources"), 'perf mem report' gives an incorrect memory
access string.
...
0.02% 1 3644 L5 hit [.] 0x0000000000009b0e mlc [.] 0x00007fce43f59480
...
This occurs because, if no entry exists in mem_lvlnum, perf_mem__lvl_scnprintf
will default to 'L%d, lvl', which in this case for PERF_MEM_LVLNUM_L2_MHB is 0x05.
Add entries for PERF_MEM_LVLNUM_L2_MHB and PERF_MEM_LVLNUM_MSC to mem_lvlnum,
so that the correct strings are printed.
...
0.02% 1 3644 L2 MHB hit [.] 0x0000000000009b0e mlc [.] 0x00007fce43f59480
...
Fixes: 8ec9497d3ef34 ("tools/include: Sync uapi/linux/perf.h with the kernel sources")
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20240926144040.77897-1-thomas.falcon@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The sleep_sem semaphore and the specific_wait field (member of sched_atom)
are initialized but not used anywhere in the code, so this patch removes
them.
The SCHED_EVENT_MIGRATION case in perf_sched__process_event() is currently
not used and is also removed.
Additionally, prev_state in add_sched_event_sleep() is marked with
__maybe_unused and is not utilized anywhere in the function. This patch
removes the parameter.
If the task_state parameter was intended for future use, it can be
reintroduced when needed.
No functionality change intended.
Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240917090100.42783-1-vineethr@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Test that one cycles event is opened for each core PMU when "perf stat"
is run without arguments.
The event line can either be output as "pmu/cycles/" or just "cycles" if
there is only one PMU. Include 2 spaces for padding in the one PMU case
to avoid matching when the word cycles is included in metric
descriptions.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-8-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
PMUs aren't listed in /sys/devices/ on DT devices, so change the search
directory to /sys/bus/event_source/devices which works everywhere. Also
add armv8_cortex_* as a known PMU type to search for to make the test
run on more devices.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-7-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
"evsel->pmu_name" is only ever assigned a strdup of "pmu->name", a
strdup of "evsel->pmu_name" or NULL. As such, prefer to use
"pmu->name" directly and even to directly compare PMUs than PMU
names. For safety, add some additional NULL tests.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
[ Fix arm-spe.c usage of pmu_name and empty PMU name ]
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-6-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Use PMU interface to better detect core PMU for legacy events. Look
for slots event on core PMU if it is appropriate for the event.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-5-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
add_default_atttributes would add evsels by having pre-created
perf_event_attr, however, this needed fixing for hybrid as the
extended PMU type was necessary for each core PMU. The logic for this
was in an arch specific x86 function and wasn't present for ARM,
meaning that default events weren't being opened on all PMUs on
ARM. Change the creation of the default events to use parse_events and
strings as that will open the events on all PMUs.
Rather than try to detect events on PMUs before parsing, parse the
event but skip its output in stat-display.
The previous order of hardware events was: cycles,
stalled-cycles-frontend, stalled-cycles-backend, instructions. As
instructions is a more fundamental concept the order is changed to:
instructions, cycles, stalled-cycles-frontend, stalled-cycles-backend.
Closes: https://lore.kernel.org/lkml/CAP-5=fVABSBZnsmtRn1uF-k-G1GWM-L5SgiinhPTfHbQsKXb_g@mail.gmail.com/
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
[Don't display unsupported default events except 'cycles']
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-4-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Without aggregation on Intel:
```
$ perf stat -e instructions,cycles ...
```
Will use "cycles" for the name of the legacy cycles event but as
"instructions" has a sysfs name it will and a "[cpu]" PMU suffix. This
often breaks things as the space between the event and the PMU name
look like an extra column. The existing uniquify logic was also
uniquifying in cases when all events are core and not with uncore
events, it was not correctly handling modifiers, etc.
Change the logic so that an initial pass that can disable
uniquification is run. For individual counters, disable uniquification
in more cases such as for consistency with legacy events or for
libpfm4 events. Don't use the "[pmu]" style suffix in uniquification,
always use "pmu/.../". Change how modifiers/terms are handled in the
uniquification so that they look like parse-able events.
This fixes "102: perf stat metrics (shadow stat) test:" that has been
failing due to "instructions [cpu]" breaking its column/awk logic when
values aren't aggregated. This started happening when instructions
could match a sysfs rather than a legacy event, so the fixes tag
reflects this.
Fixes: 617824a7f0f7 ("perf parse-events: Prefer sysfs/JSON hardware events over legacy")
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
[ Fix Intel TPEBS counting mode test ]
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-3-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
There are cases where we want to match events like instructions and
cycles with legacy hardware values, in particular in stat-shadow's
hard coded metrics. An evsel's name isn't a good point of reference as
it gets altered, strstr would be too imprecise and re-parsing the
event from its name is silly. Instead, hold the legacy hardware event
name, determined during parsing, in the evsel for this matching case.
Inline evsel__match2 that is only used in builtin-diff.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-2-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Refactor code to have some more error diagnosis on traps, etc. and to
do less work on each line. Add an ignore situation for security failures.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240925173013.12789-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When it loads symbols from an ELF file, it loads label symbols which is
0 size. Sometimes it has the same address with other symbols and might
shadow the original symbols because it fixes up the size of the symbol.
For example, in my system __do_softirq is shadowed and only accepts the
__softirqentry_text_start instead. But it should accept __do_softirq.
$ readelf -sW vmlinux | grep -e __do_softirq -e __softirqentry_text_start
105089: ffffffff82000000 814 FUNC GLOBAL DEFAULT 1 __do_softirq
111954: ffffffff82000000 0 NOTYPE GLOBAL DEFAULT 1 __softirqentry_text_start
$ perf annotate --stdio __do_softirq
Error:
The perf.data data has no samples!
$ perf annotate --stdio __softirqentry_text_start | head
Percent | Source code & Disassembly of vmlinux for cycles (26 samples, percent: local period)
---------------------------------------------------------------------------------------------------
: 0 0xffffffff82000000 <__softirqentry_text_start>:
0.00 : ffffffff82000000: nopl (%rax,%rax)
30.77 : ffffffff82000005: pushq %rbp
3.85 : ffffffff82000006: movq %rsp, %rbp
0.00 : ffffffff82000009: pushq %r15
3.85 : ffffffff8200000b: pushq %r14
3.85 : ffffffff8200000d: pushq %r13
0.00 : ffffffff8200000f: pushq %r12
We can ignore NOTYPE symbols in the symbols__fixup_end() so that it can
pick the __do_softirq() in choose_best_symbol(). This should be fine
since most symbols have either STT_FUNC or STT_OBJECT.
Link: https://lore.kernel.org/r/20240912224208.3360116-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Except lpddr5, i.MX95 also support lpddr4x. This will add a metric for
lpddr4x.
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Cc: shawnguo@kernel.org
Cc: will@kernel.org
Cc: james.clark@linaro.org
Cc: mike.leach@linaro.org
Cc: imx@lists.linux.dev
Cc: john.g.garry@oracle.com
Cc: kernel@pengutronix.de
Cc: s.hauer@pengutronix.de
Link: https://lore.kernel.org/r/20240924030812.3211029-1-xu.yang_2@nxp.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Exit when run_perf_stat() returns an error to avoid continuously
repeating the same error message. It's not expected that COUNTER_FATAL
or internal errors are recoverable so there's no point in retrying.
This fixes the following flood of error messages for permission issues,
for example when perf_event_paranoid==3:
perf stat -r 1044 -- false
Error:
Access to performance monitoring and observability operations is limited.
...
Error:
Access to performance monitoring and observability operations is limited.
...
(repeating for 1044 times).
Signed-off-by: Levi Yun <yeoreum.yun@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: nd@arm.com
Cc: howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240925132022.2650180-3-yeoreum.yun@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When create_perf_stat_counter() failed, it doesn't close workload.cork_fd
open in evlist__prepare_workload(). This could make too many open file
error while __run_perf_stat() repeats.
Introduce evlist__cancel_workload to close workload.cork_fd and
wait workload.child_pid until exit to clear child process
when create_perf_stat_counter() is failed.
Signed-off-by: Levi Yun <yeoreum.yun@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: nd@arm.com
Cc: howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240925132022.2650180-2-yeoreum.yun@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In non-FHS compliant distros like NixOS, nothing resides in `/bin`
and `/usr/bin`. Instead dynamically symlinked into
`/run/current-system/sw/bin/`, the executable resides in `/nix/store`.
With this patch,`/bin` prefix from the dmesg command in the error
message is stripped.
Link: https://github.com/NixOS/nixpkgs/pull/258027
Signed-off-by: Masum Reza <masumrezarock100@gmail.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20240922112619.149429-1-masumrezarock100@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Run a few samples through the disassembly script and check to see that
at least one branch instruction is printed.
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ruidong Tian <tianruidong@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: John Garry <john.g.garry@oracle.com>
Cc: scclevenger@os.amperecomputing.com
Link: https://lore.kernel.org/r/20240916135743.1490403-8-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Make it possible to only disassemble a range of timestamps or sample
indexes. This will be used by the test to limit the runtime, but it's
also useful for users.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ruidong Tian <tianruidong@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: John Garry <john.g.garry@oracle.com>
Cc: scclevenger@os.amperecomputing.com
Link: https://lore.kernel.org/r/20240916135743.1490403-7-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Make vmlinux detection automatic and use Perf's default objdump
when -d is specified. This will make it easier for a test to use the
script without having to provide arguments. And similarly for users.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ruidong Tian <tianruidong@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: John Garry <john.g.garry@oracle.com>
Cc: scclevenger@os.amperecomputing.com
Link: https://lore.kernel.org/r/20240916135743.1490403-6-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|