Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
Pull more perf tools updates from Namhyung Kim:
"These are remaining changes and fixes for this cycle.
Build:
- Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
and skip if the vmlinux has no BTF.
- Replace deprecated clang -target xxx option by --target=xxx.
perf record:
- Print event attributes with well known type and config symbols in
the debug output like below:
# perf record -e cycles,cpu-clock -C0 -vv true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
- Update AMD IBS event error message since it now support per-process
profiling but no priviledge filters.
$ sudo perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
perf lock contention:
- Support CSV style output using -x option
$ sudo perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
- Add --output option to save the data to a file not to be interfered
by other debug messages.
Test:
- Fix event parsing test on ARM where there's no raw PMU nor supports
PERF_PMU_CAP_EXTENDED_HW_TYPE.
- Update the lock contention test case for CSV output.
- Fix a segfault in the daemon command test.
Vendor events (JSON):
- Add has_event() to check if the given event is available on system
at runtime. On Intel machines, some transaction events may not be
present when TSC extensions are disabled.
- Update Intel event metrics.
Misc:
- Sort symbols by name using an external array of pointers instead of
a rbtree node in the symbol. This will save 16-bytes or 24-bytes
per symbol whether the sorting is actually requested or not.
- Fix unwinding DWARF callstacks using libdw when --symfs option is
used"
* tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
perf test: Fix event parsing test on Arm
perf evsel amd: Fix IBS error message
perf: unwind: Fix symfs with libdw
perf symbol: Fix uninitialized return value in symbols__find_by_name()
perf test: Test perf lock contention CSV output
perf lock contention: Add --output option
perf lock contention: Add -x option for CSV style output
perf lock: Remove stale comments
perf vendor events intel: Update tigerlake to 1.13
perf vendor events intel: Update skylakex to 1.31
perf vendor events intel: Update skylake to 57
perf vendor events intel: Update sapphirerapids to 1.14
perf vendor events intel: Update icelakex to 1.21
perf vendor events intel: Update icelake to 1.19
perf vendor events intel: Update cascadelakex to 1.19
perf vendor events intel: Update meteorlake to 1.03
perf vendor events intel: Add rocketlake events/metrics
perf vendor metrics intel: Make transaction metrics conditional
perf jevents: Support for has_event function
...
|
|
AMD IBS can do per-process profiling[1] and is no longer restricted to
per-cpu or systemwide only. Remove stale error message. Also, checking
just exclude_kernel is not sufficient since IBS does not support any
privilege filters. So include all exclude_* checks. And finally, move
these checks under tools/perf/arch/x86/ from generic code.
Before:
$ sudo ./perf record -e ibs_op//k -C 0
Error:
AMD IBS may only be available in system-wide/per-cpu mode. Try
using -a, or -C and workload affinity
After:
$ sudo ./perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
[1] https://git.kernel.org/torvalds/c/30093056f7b2
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: ananth.narayan@amd.com
Cc: sandipan.das@amd.com
Cc: santosh.shukla@amd.com
Cc: irogers@google.com
Cc: peterz@infradead.org
Cc: adrian.hunter@intel.com
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: https://lore.kernel.org/r/20230630085230.437-1-ravi.bangoria@amd.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Pass the full path including the symfs (if any) to libdw. Without this
unwinding fails with errors like this when a symfs is used:
unwind: failed with 'No such file or directory'"
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: kernel@axis.com
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230630-perf-libdw-symfs-v2-1-469760dd4d5b@axis.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
found_idx and s aren't initialized, so if no symbol is found then the
assert at the end will index off the end of the array causing a
segfault. The function also doesn't return NULL when the symbol isn't
found even if the assert passes. Fix it by initializing the values and
only setting them when something is found.
Fixes the following test failure:
$ perf test 1
1: vmlinux symtab matches kallsyms : FAILED!
Fixes: 259dce914e93 ("perf symbol: Remove symbol_name_rb_node")
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230630153840.858668-1-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
Pull perf tools updates from Namhyung Kim:
"Internal cleanup:
- Refactor PMU data management to handle hybrid systems in a generic
way.
Do more work in the lexer so that legacy event types parse more
easily. A side-effect of this is that if a PMU is specified,
scanning sysfs is avoided improving start-up time.
- Fix hybrid metrics, for example, the TopdownL1 works for both
performance and efficiency cores on Intel machines. To support
this, sort and regroup events after parsing.
- Add reference count checking for the 'thread' data structure.
- Lots of fixes for memory leaks in various places thanks to the ASAN
and Ian's refcount checker.
- Reduce the binary size by replacing static variables with local or
dynamically allocated memory.
- Introduce shared_mutex for annotate data to reduce memory
footprint.
- Make filesystem access library functions more thread safe.
Test:
- Organize cpu_map tests into a single suite.
- Add metric value validation test to check if the values are within
correct value ranges.
- Add perf stat stdio output test to check if event and metric names
match.
- Add perf data converter JSON output test.
- Fix a lot of issues reported by shellcheck(1). This is a
preparation to enable shellcheck by default.
- Make the large x86 new instructions test optional at build time
using EXTRA_TESTS=1.
- Add a test for libpfm4 events.
perf script:
- Add 'dsoff' outpuf field to display offset from the DSO.
$ perf script -F comm,pid,event,ip,dsoff
ls 2695501 cycles: 152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5)
ls 2695501 cycles: ffffffff99045b3e ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968e107 ([kernel.kallsyms])
ls 2695501 cycles: ffffffffc1f54afb ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968382f ([kernel.kallsyms])
ls 2695501 cycles: ffffffff99e00094 ([kernel.kallsyms])
ls 2695501 cycles: 152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0)
ls 2695501 cycles: ffffffff992a6db0 ([kernel.kallsyms])
- Adjust width for large PID/TID values.
perf report:
- Robustify reading addr2line output for srcline by checking sentinel
output before the actual data and by using timeout of 1 second.
- Allow config terms (like 'name=ABC') with breakpoint events.
$ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1
perf annotate:
- Handle x86 instruction suffix like 'l' in 'movl' generally.
- Parse instruction operands properly even with a whitespace. This is
needed for llvm-objdump output.
- Support RISC-V binutils lookup using the triplet prefixes.
- Add '<' and '>' key to navigate to prev/next symbols in TUI.
- Fix instruction association and parsing for LoongArch.
perf stat:
- Add --per-cache aggregation option, optionally specify a cache
level like `--per-cache=L2`.
$ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\
taskset -c 0-15,64-79,128-143,192-207\
perf bench sched messaging -p -t -l 100000 -g 8
# Running 'sched/messaging' benchmark:
# 20 sender and receiver threads per group
# 8 groups == 320 threads run
Total time: 7.648 [sec]
Performance counter stats for 'system wide':
S0-D0-L3-ID0 16 17,145,912 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID8 16 14,977,628 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID16 16 262,539 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID24 16 3,140 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID32 16 27,403 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID40 16 17,026 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID48 16 7,292 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID56 16 2,464 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID64 16 22,489,306 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID72 16 21,455,257 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID80 16 11,619 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID88 16 30,978 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID96 16 37,628 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID104 16 13,594 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID112 16 10,164 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID120 16 11,259 ls_dmnd_fills_from_sys.ext_cache_remote
7.779171484 seconds time elapsed
- Change default (no event/metric) formatting for default metrics so
that events are hidden and the metric and group appear.
Performance counter stats for 'ls /':
1.85 msec task-clock # 0.594 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
97 page-faults # 52.517 K/sec
2,187,173 cycles # 1.184 GHz
2,474,459 instructions # 1.13 insn per cycle
531,584 branches # 287.805 M/sec
13,626 branch-misses # 2.56% of all branches
TopdownL1 # 23.5 % tma_backend_bound
# 11.5 % tma_bad_speculation
# 39.1 % tma_frontend_bound
# 25.9 % tma_retiring
- Allow --cputype option to have any PMU name (not just hybrid).
- Fix output value not to added when it runs multiple times with -r
option.
perf list:
- Show metricgroup description from JSON file called
metricgroups.json.
- Allow 'pfm' argument to list only libpfm4 events and check each
event is supported before showing it.
JSON vendor events:
- Avoid event grouping using "NO_GROUP_EVENTS" constraints. The
topdown events are correctly grouped even if no group exists.
- Add "Default" metric group to print it in the default output. And
use "DefaultMetricgroupName" to indicate the real metric group
name.
- Add AmpereOne core PMU events.
Misc:
- Define man page date correctly.
- Track exception level properly on ARM CoreSight ETM.
- Allow anonymous struct, union or enum when retrieving type names
from DWARF.
- Fix incorrect filename when calling `perf inject --jit`.
- Handle PLT size correctly on LoongArch"
* tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (269 commits)
perf test: Skip metrics w/o event name in stat STD output linter
perf test: Reorder event name checks in stat STD output linter
perf pmu: Remove a hard coded cpu PMU assumption
perf pmus: Add notion of default PMU for JSON events
perf unwind: Fix map reference counts
perf test: Set PERF_EXEC_PATH for script execution
perf script: Initialize buffer for regs_map()
perf tests: Fix test_arm_callgraph_fp variable expansion
perf symbol: Add LoongArch case in get_plt_sizes()
perf test: Remove x permission from lib/stat_output.sh
perf test: Rerun failed metrics with longer workload
perf test: Add skip list for metrics known would fail
perf test: Add metric value validation test
perf jit: Fix incorrect file name in DWARF line table
perf annotate: Fix instruction association and parsing for LoongArch
perf annotation: Switch lock from a mutex to a sharded_mutex
perf sharded_mutex: Introduce sharded_mutex
tools: Fix incorrect calculation of object size by sizeof
perf subcmd: Fix missing check for return value of malloc() in add_cmdname()
perf parse-events: Remove unneeded semicolon
...
|
|
Some events are dependent on firmware/kernel enablement. Allow such
events to be detected when the metric is parsed so that the metric's
event parsing doesn't fail.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Sohom Datta <sohomdatta1@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623151016.4193660-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The thread__find_map() is to find a map for a given address in the
given thread's address space. It searches maps based on the cpu mode
and fills various information in the addr_location data structure.
It might change al->maps and al->map, but not al->thread. Then I
think no reason to not set the al->thread at the beginning.
Also get rid of the duplicate 'al->map = NULL' part.
Fixes: 0dd5041c9a0ea ("perf addr_location: Add init/exit/copy functions")
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: James Clark <james.clark@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
If loading a core PMU fails, legacy hardware/cache events may segv due
to there being no PMU. Create a placeholder empty PMU for this
case. This was discussed in:
https://lore.kernel.org/lkml/20230614151625.2077-1-yangjihong1@huawei.com/
Reported-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230627182834.117565-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Pull arm64 documentation move from Jonathan Corbet:
"Move the arm64 architecture documentation under Documentation/arch/.
This brings some order to the documentation directory, declutters the
top-level directory, and makes the documentation organization more
closely match that of the source"
* tag 'docs-arm64-move' of git://git.lwn.net/linux:
perf arm-spe: Fix a dangling Documentation/arm64 reference
mm: Fix a dangling Documentation/arm64 reference
arm64: Fix dangling references to Documentation/arm64
dt-bindings: fix dangling Documentation/arm64 reference
docs: arm64: Move arm64 documentation under Documentation/arch/
|
|
-target has been deprecated since Clang 3.4 in 2013. Use the preferred
--target=bpf form instead. This matches how we use --target= in
scripts/Makefile.clang.
Signed-off-by: Fangrui Song <maskray@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: llvm@lists.linux.dev
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://github.com/llvm/llvm-project/commit/274b6f0c87a6a1798de0a68135afc7f95def6277
Link: https://lore.kernel.org/r/20230624002708.1907962-1-maskray@google.com
[ resolved a conflict with GEN_VMLINUX_H changes ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The original logic was to check is_pmu_hybrid() like in the below.
It just checks the name of PMU specifically for Intel hybrid systems
which means uncore PMU events should return false.
https://lore.kernel.org/all/20230527072210.2900565-35-irogers@google.com/
The is_pmu_hybrid() was replaced by arch-agnostic way but with the
incorrect condition which was fixed for core PMUs but not uncore.
This change fixes both.
Fixes: e23421426e13 ("perf pmu: Correct perf_pmu__auto_merge_stats() affecting hybrid")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/all/CAP-5=fXOi=xQ4=j5xAq+jWLR9n7uvfsWK+PzXkY1MZ3Fz-xccw@mail.gmail.com/
Link: https://lore.kernel.org/r/20230626053048.257959-1-irogers@google.com
[ rephrase the commit log a bit ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
perf_event_attr__fprintf()
When printing perf_event_attr, always display perf_event_attr config and
its symbol to improve the readability of debugging information.
Before:
# perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true
<SNIP>
------------------------------------------------------------
perf_event_attr:
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
------------------------------------------------------------
perf_event_attr:
type 2
size 136
config 0x143
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
type 3
size 136
config 0x10005
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9
------------------------------------------------------------
perf_event_attr:
type 4
size 136
config 0x101
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10
------------------------------------------------------------
perf_event_attr:
type 5
size 136
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
bp_type 3
{ bp_len, config2 } 0x4
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 11
<SNIP>
After:
# perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
------------------------------------------------------------
perf_event_attr:
type 2 (PERF_TYPE_TRACEPOINT)
size 136
config 0x143 (sched:sched_switch)
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
type 3 (PERF_TYPE_HW_CACHE)
size 136
config 0x10005 (PERF_COUNT_HW_CACHE_RESULT_MISS | PERF_COUNT_HW_CACHE_OP_READ | PERF_COUNT_HW_CACHE_BPU)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9
------------------------------------------------------------
perf_event_attr:
type 4 (PERF_TYPE_RAW)
size 136
config 0x101
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10
------------------------------------------------------------
perf_event_attr:
type 5 (PERF_TYPE_BREAKPOINT)
size 136
config 0
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
bp_type 3
{ bp_len, config2 } 0x4
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 11
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0x9 (PERF_COUNT_SW_DUMMY)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
inherit 1
mmap 1
comm 1
freq 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 12
<SNIP>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: anshuman.khandual@arm.com
Cc: mark.rutland@arm.com
Cc: irogers@google.com
Cc: jesussanp@google.com
Cc: peterz@infradead.org
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: mingo@redhat.com
Link: https://lore.kernel.org/r/20230623054416.160858-5-yangjihong1@huawei.com
[ fix perf import test by adding a dummy tracepoint_id__to_name() ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
perf_event_attr__fprintf()
When printing perf_event_attr, always display perf_event_attr type and
its symbol to improve the readability of debugging information.
Before:
# perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true
<SNIP>
------------------------------------------------------------
perf_event_attr:
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
------------------------------------------------------------
perf_event_attr:
type 2
size 136
config 0x143
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
type 3
size 136
config 0x10005
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9
------------------------------------------------------------
perf_event_attr:
type 4
size 136
config 0x101
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10
------------------------------------------------------------
perf_event_attr:
type 5
size 136
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
bp_type 3
{ bp_len, config2 } 0x4
------------------------------------------------------------
<SNIP>
After:
# perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6
------------------------------------------------------------
perf_event_attr:
type 2 (PERF_TYPE_TRACEPOINT)
size 136
config 0x143
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
type 3 (PERF_TYPE_HW_CACHE)
size 136
config 0x10005
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9
------------------------------------------------------------
perf_event_attr:
type 4 (PERF_TYPE_RAW)
size 136
config 0x101
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10
------------------------------------------------------------
perf_event_attr:
type 5 (PERF_TYPE_BREAKPOINT)
size 136
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|CPU|IDENTIFIER
read_format ID
disabled 1
inherit 1
sample_id_all 1
exclude_guest 1
bp_type 3
{ bp_len, config2 } 0x4
------------------------------------------------------------
<SNIP>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: anshuman.khandual@arm.com
Cc: mark.rutland@arm.com
Cc: irogers@google.com
Cc: jesussanp@google.com
Cc: peterz@infradead.org
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: mingo@redhat.com
Link: https://lore.kernel.org/r/20230623054416.160858-4-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When printing attr, members whose value is 0 will not be printed, we want
to print the case where attr->type is 0(PERF_TYPE_HARDWARE), add `_a`
param to PRINT_ATTRf macro to always print member when it is true
No functional change.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: anshuman.khandual@arm.com
Cc: mark.rutland@arm.com
Cc: irogers@google.com
Cc: jesussanp@google.com
Cc: peterz@infradead.org
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: mingo@redhat.com
Link: https://lore.kernel.org/r/20230623054416.160858-3-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add tracepoint_id_to_name() helper to search for the trace events directory
by given event id and return the corresponding tracepoint.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: anshuman.khandual@arm.com
Cc: mark.rutland@arm.com
Cc: irogers@google.com
Cc: jesussanp@google.com
Cc: peterz@infradead.org
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: mingo@redhat.com
Link: https://lore.kernel.org/r/20230623054416.160858-2-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Previously used to specify symbol_name_rb_node was in use.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jason Wang <wangborong@cdjrlc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20230623054520.4118442-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Most perf commands want to sort symbols by name and this is done via
an invasive rbtree that on 64-bit systems costs 24 bytes. Sorting the
symbols in a DSO by name is optional and not done by default, however,
if sorting is requested the 24 bytes is allocated for every
symbol.
This change removes the rbtree and uses a sorted array of symbol
pointers instead (costing 8 bytes per symbol). As the array is created
on demand then there are further memory savings. The complexity of
sorting the array and using the rbtree are the same.
To support going to the next symbol, the index of the current symbol
needs to be passed around as a pair with the current symbol. This
requires some API changes.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jason Wang <wangborong@cdjrlc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20230623054520.4118442-3-irogers@google.com
[ minimize change in symbols__sort_by_name() ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Determine if symbols are sorted, set the sorted flag and sort under
the dso lock. Done in the interest of thread safety.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jason Wang <wangborong@cdjrlc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20230623054520.4118442-2-irogers@google.com
[ handle the similar code in util/probe-event.c ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
struct rq is defined in vmlinux.h when the vmlinux.h is generated,
this causes a redefinition failure if it is declared in
lock_contention.bpf.c. Move the definition to vmlinux.h for
consistency with the generated version.
Fixes: 760ebc45746b ("perf lock contention: Add empty 'struct rq' to satisfy libbpf 'runqueue' type verification")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230623041405.4039475-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Commit a887466562b4 ("perf bpf skels: Stop using vmlinux.h generated
from BTF, use subset of used structs + CO-RE") made it so that
vmlinux.h was uncondtionally included from
tools/perf/util/vmlinux.h. This change reverts part of that change (so
that vmlinux.h is once again generated) and makes it so that the
vmlinux.h used at build time is selected from the VMLINUX_H
variable. By default the VMLINUX_H variable is set to the vmlinux.h
added in change a887466562b4, but if GEN_VMLINUX_H=1 is passed on the
build command line then the previous generation behavior kicks in.
The build with GEN_VMLINUX_H=1 currently fails with:
util/bpf_skel/lock_contention.bpf.c:419:8: error: redefinition of 'rq'
struct rq {};
^
/tmp/perf/util/bpf_skel/.tmp/../vmlinux.h:45630:8: note: previous definition is here
struct rq {
^
1 error generated.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230623041405.4039475-2-irogers@google.com
[ Format the error message and add a comment for GEN_VMLINUX_H ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The property of "cpu" when it has no cpu map is true on S390 with the
PMU cpum_cf. Rather than maintain a list of such PMUs, reuse the
is_core test result from the caller.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20230623043843.4080180-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
JSON events created in pmu-events.c by jevents.py may not specify a
PMU they are associated with, in which case it is implied that it is
the first core PMU. Care is needed to select this for regular 'cpu',
s390 'cpum_cf' and ARMs many names as at the point the name is first
needed the core PMUs list hasn't been initialized. Add a helper in
perf_pmus to create this value, in the worst case by scanning sysfs.
v2. Add missing close if fdopendir fails.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623043843.4080180-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The result of thread__find_map is the map in the passed in
addr_location. Calling addr_location__exit puts that map and so copies
need to do a map__get. Add in the corresponding map__puts.
v2. Add missing map__put when dso is missing.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ivan Babrou <ivan@cloudflare.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623043107.4077510-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The buffer is used to save register mapping in a sample. Normally
perf samples don't have any register so the string should be empty.
But it missed to initialize the buffer when the size is 0. And it's
passed to PyUnicode_FromString() with a garbage data.
So it returns NULL due to invalid input (instead of an empty unicode
string object) which causes a segfault like below:
Thread 2.1 "perf" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7c83780 (LWP 193775)]
0x00007ffff6dbca2e in PyDict_SetItem () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
(gdb) bt
#0 0x00007ffff6dbca2e in PyDict_SetItem () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
#1 0x00007ffff6dbf848 in PyDict_SetItemString () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
#2 0x000055555575824d in pydict_set_item_string_decref (val=0x0, key=0x5555557f96e3 "iregs", dict=0x7ffff5f7f780)
at util/scripting-engines/trace-event-python.c:145
#3 set_regs_in_dict (evsel=0x555555efc370, sample=0x7fffffffb870, dict=0x7ffff5f7f780)
at util/scripting-engines/trace-event-python.c:776
#4 get_perf_sample_dict (sample=sample@entry=0x7fffffffb870, evsel=evsel@entry=0x555555efc370, al=al@entry=0x7fffffffb2e0,
addr_al=addr_al@entry=0x0, callchain=callchain@entry=0x7ffff63ef440) at util/scripting-engines/trace-event-python.c:923
#5 0x0000555555758ec1 in python_process_tracepoint (sample=0x7fffffffb870, evsel=0x555555efc370, al=0x7fffffffb2e0, addr_al=0x0)
at util/scripting-engines/trace-event-python.c:1044
#6 0x00005555555c5db8 in process_sample_event (tool=<optimized out>, event=<optimized out>, sample=<optimized out>,
evsel=0x555555efc370, machine=0x555555ef4d68) at builtin-script.c:2421
#7 0x00005555556b7793 in perf_session__deliver_event (session=0x555555ef4b60, event=0x7ffff62ff7d0, tool=0x7fffffffc150,
file_offset=30672, file_path=0x555555efb8a0 "perf.data") at util/session.c:1639
#8 0x00005555556bc864 in do_flush (show_progress=true, oe=0x555555efb700) at util/ordered-events.c:245
#9 __ordered_events__flush (oe=oe@entry=0x555555efb700, how=how@entry=OE_FLUSH__FINAL, timestamp=timestamp@entry=0)
at util/ordered-events.c:324
#10 0x00005555556bd06e in ordered_events__flush (oe=oe@entry=0x555555efb700, how=how@entry=OE_FLUSH__FINAL)
at util/ordered-events.c:342
#11 0x00005555556b9d63 in __perf_session__process_events (session=0x555555ef4b60) at util/session.c:2465
#12 perf_session__process_events (session=0x555555ef4b60) at util/session.c:2627
#13 0x00005555555cb1d0 in __cmd_script (script=0x7fffffffc150) at builtin-script.c:2839
#14 cmd_script (argc=<optimized out>, argv=<optimized out>) at builtin-script.c:4365
#15 0x0000555555650811 in run_builtin (p=p@entry=0x555555ed8948 <commands+456>, argc=argc@entry=4, argv=argv@entry=0x7fffffffe240)
at perf.c:323
#16 0x0000555555597eb3 in handle_internal_command (argv=0x7fffffffe240, argc=4) at perf.c:377
#17 run_argv (argv=<synthetic pointer>, argcp=<synthetic pointer>) at perf.c:421
#18 main (argc=4, argv=0x7fffffffe240) at perf.c:537
Fixes: 51cfe7a3e87e ("perf python: Avoid 2 leak sanitizer issues")
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
We can see the following definitions in bfd/elfnn-loongarch.c:
#define PLT_HEADER_INSNS 8
#define PLT_HEADER_SIZE (PLT_HEADER_INSNS * 4)
#define PLT_ENTRY_INSNS 4
#define PLT_ENTRY_SIZE (PLT_ENTRY_INSNS * 4)
so plt header size is 32 and plt entry size is 16 on LoongArch,
let us add LoongArch case in get_plt_sizes().
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: loongarch@lists.linux.dev
Cc: loongson-kernel@lists.loongnix.cn
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfnn-loongarch.c
Link: https://lore.kernel.org/r/1684835873-15956-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The arm64 documentation has moved under Documentation/arch/. Fix up a
dangling reference to match.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
Fixes an issue where an incorrect filename was added in the DWARF line table of
an ELF object file when calling 'perf inject --jit' due to not checking the
filename of a debug entry against the repeated name marker (/xff/0).
The marker is mentioned in the tools/perf/util/jitdump.h header, which describes
the jitdump binary format, and indicitates that the filename in a debug entry
is the same as the previous enrty.
In the function emit_lineno_info(), in the file tools/perf/util/genelf-debug.c,
the debug entry filename gets compared to the previous entry filename. If they
are not the same, a new filename is added to the DWARF line table. However,
since there is no check against '\xff\0', in some cases '\xff\0' is inserted
as the filename into the DWARF line table.
This can be seen with `objdump --dwarf=line` on the ELF file after `perf inject --jit`.
It also makes no source code information show up in 'perf annotate'.
Signed-off-by: Elisabeth Panholzer <elisabeth@leaningtech.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230602123815.255001-1-paniii94@gmail.com
[ Fixed a trailing white space, removed a subject prefix ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In the perf annotate view for LoongArch, there is no arrowed line
pointing to the target from the branch instruction. This issue is
caused by incorrect instruction association and parsing.
$ perf record alloc-6276705c94ad1398 # rust benchmark
$ perf report
0.28 │ ori $a1, $zero, 0x63
│ move $a2, $zero
10.55 │ addi.d $a3, $a2, 1(0x1)
│ sltu $a4, $a3, $s7
9.53 │ masknez $a4, $s7, $a4
│ sub.d $a3, $a3, $a4
12.12 │ st.d $a1, $fp, 24(0x18)
│ st.d $a3, $fp, 16(0x10)
16.29 │ slli.d $a2, $a2, 0x2
│ ldx.w $a2, $s8, $a2
12.77 │ st.w $a2, $sp, 724(0x2d4)
│ st.w $s0, $sp, 720(0x2d0)
7.03 │ addi.d $a2, $sp, 720(0x2d0)
│ addi.d $a1, $a1, -1(0xfff)
12.03 │ move $a2, $a3
│ → bne $a1, $s3, -52(0x3ffcc) # 82ce8 <test::bench::Bencher::iter+0x3f4>
2.50 │ addi.d $a0, $a0, 1(0x1)
This patch fixes instruction association issues, such as associating
branch instructions with jump_ops instead of call_ops, and corrects
false instruction matches. It also implements branch instruction parsing
specifically for LoongArch. With this patch, we will be able to see the
arrowed line.
0.79 │3ec: ori $a1, $zero, 0x63
│ move $a2, $zero
10.32 │3f4:┌─→addi.d $a3, $a2, 1(0x1)
│ │ sltu $a4, $a3, $s7
10.44 │ │ masknez $a4, $s7, $a4
│ │ sub.d $a3, $a3, $a4
14.17 │ │ st.d $a1, $fp, 24(0x18)
│ │ st.d $a3, $fp, 16(0x10)
13.15 │ │ slli.d $a2, $a2, 0x2
│ │ ldx.w $a2, $s8, $a2
11.00 │ │ st.w $a2, $sp, 724(0x2d4)
│ │ st.w $s0, $sp, 720(0x2d0)
8.00 │ │ addi.d $a2, $sp, 720(0x2d0)
│ │ addi.d $a1, $a1, -1(0xfff)
11.99 │ │ move $a2, $a3
│ └──bne $a1, $s3, 3f4
3.17 │ addi.d $a0, $a0, 1(0x1)
Signed-off-by: WANG Rui <wangrui@loongson.cn>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: loongarch@lists.linux.dev
Cc: loongson-kernel@lists.loongnix.cn
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Link: https://lore.kernel.org/r/20230620132025.105563-1-wangrui@loongson.cn
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Remove the "struct mutex lock" variable from annotation that is
allocated per symbol. This removes in the region of 40 bytes per
symbol allocation. Use a sharded mutex where the number of shards is
set to the number of CPUs. Assuming good hashing of the annotation
(done based on the pointer), this means in order to contend there
needs to be more threads than CPUs, which is not currently true in any
perf command. Were contention an issue it is straightforward to
increase the number of shards in the mutex.
On my Debian/glibc based machine, this reduces the size of struct
annotation from 136 bytes to 96 bytes, or nearly 30%.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andres Freund <andres@anarazel.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yuan Can <yuancan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230615040715.2064350-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Per object mutexes may come with significant memory cost while a
global mutex can suffer from unnecessary contention. A sharded mutex
is a compromise where objects are hashed and then a particular mutex
for the hash of the object used. Contention can be controlled by the
number of shards.
v2. Use hashmap.h's hash_bits in case of contention from alignment of
objects.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andres Freund <andres@anarazel.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yuan Can <yuancan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230615040715.2064350-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
What we need to calculate is the size of the object, not the size of the
pointer.
Fixed: 51cfe7a3e87e ("perf python: Avoid 2 leak sanitizer issues")
Signed-off-by: Li Dong <lidong@vivo.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: opensource.kernel@vivo.com
Link: https://lore.kernel.org/r/20230619082036.410-1-lidong@vivo.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
./tools/perf/util/parse-events.c:1466:2-3: Unneeded semicolon
Signed-off-by: Mingtong Bao <baomingtong001@208suo.com>
Link: https://lore.kernel.org/r/2c733a91717eae93119ba2226420fd8f@208suo.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
evsel__compute_group_pmu_name()
The newline is missing for pr_debug message in
evsel__compute_group_pmu_name(), fix it.
Before:
# perf --debug verbose=2 record -e cpu-clock true
<SNIP>
No PMU found for 'cycles:u'No PMU found for 'instructions:u'------------------------------------------------------------
perf_event_attr:
type 1
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
read_format ID|LOST
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
<SNIP>
After:
# perf --debug verbose=2 record -e cpu-clock true
<SNIP>
No PMU found for 'cycles:u'
No PMU found for 'instructions:u'
------------------------------------------------------------
perf_event_attr:
type 1
size 136
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
read_format ID|LOST
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
<SNIP>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: mark.rutland@arm.com
Cc: irogers@google.com
Cc: peterz@infradead.org
Cc: adrian.hunter@intel.com
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: kan.liang@linux.intel.com
Cc: mingo@redhat.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20230616024515.80814-1-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In some architectures we can't encode the PMU number in
perf_event_attr.type and thus can't just ask for the same event in
multiple CPUs (and thus PMUs), that is what we want in hybrid systems
but we can't when that encoding isn't understood by the kernel, such as
in ARM64's big.LITTLE.
If that is the case, fallback to the previous behaviour till we find a
better solution to have consistent output accross architectures with
hybrid CPU configurations.
Co-developed-with: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/linux-perf-users/ZIzYgImv61OGK1wA@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Will be used when checking if we can encode the PMU number in
perf_event_attr.type, part of the logic to use in hybrid systems
(multiple types of CPUs, such as Intel's (Alder Lake, etc) or ARM's
big.LITTLE).
Co-developed-with: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/linux-perf-users/ZIzYgImv61OGK1wA@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Scanning only core PMUs is not sufficient on platforms like AMD since
perf mem on AMD uses IBS OP PMU, which is independent of core PMU.
Scan all PMUs instead of just core PMUs. There should be negligible
performance overhead because of scanning all PMUs, so we should be okay.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20230615051700.1833-4-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf mem/c2c on AMD internally uses IBS OP PMU, not the core PMU. Also,
AMD platforms does not have heterogeneous PMUs.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20230615051700.1833-3-ravi.bangoria@amd.com
[ Added the improved comment for perf_pmus__num_mem_pmus() as b4 didn't from the per-patch (not series) newer version ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Notion of 'core_pmus' and 'other_pmus' are independent of hw core and
uncore pmus. For example, AMD IBS PMUs are present in each SMT-thread
but they belongs to 'other_pmus'. Add a comment describing what these
list contains and how they are treated.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20230615051700.1833-2-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When -r option is used, perf stat runs the command multiple times and
update stats in the evsel->stats.res_stats for global aggregation. But
the value is never used and the value it prints at the end is just the
value from the last run. I think we should print the average number of
multiple runs.
Add evlist__copy_res_stats() to update the aggr counter (for display)
using the values in the evsel->stats.res_stats.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230616073211.1057936-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In linux-next tree the many test cases fail on s390x when running the
perf test suite, sometime the perf tool dumps core.
Output before:
6.1: Test event parsing : FAILED!
10.3: Parsing of PMU event table metrics : FAILED!
10.4: Parsing of PMU event table metrics with fake PMUs: FAILED!
17: Setup struct perf_event_attr : FAILED!
24: Number of exit events of a simple workload : FAILED!
26: Object code reading : FAILED!
28: Use a dummy software event to keep tracking : FAILED!
35: Track with sched_switch : FAILED!
42.3: BPF prologue generation : FAILED!
66: Parse and process metrics : FAILED!
68: Event expansion for cgroups : FAILED!
69.2: Perf time to TSC : FAILED!
74: build id cache operations : FAILED!
86: Zstd perf.data compression/decompression : FAILED!
87: perf record tests : FAILED!
106: Test java symbol : FAILED!
The reason for all these failure is a missing PMU. On s390x the PMU is
named cpum_cf which is not detected as core PMU. A similar patch was
added before, see commit 9bacbced0e32204d ("perf list: Add s390 support
for detailed PMU event description") which got lost during the recent
reworks. Add it again.
Output after:
10.2: PMU event map aliases : FAILED!
42.3: BPF prologue generation : FAILED!
Most test cases now work and there is not core dump anymore.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20230616081437.1932003-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It is currently possible to use --symfs along with a vmlinux which lies
outside of the symfs by passing an absolute path to --vmlinux, thanks to
the check in dso__load_vmlinux() which handles this explicitly.
However, the annotate code lacks this check and thus 'perf annotate'
does not work ("Internal error: Invalid -1 error code") for kernel
functions with this combination. Add the missing handling.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel@axis.com
Link: https://lore.kernel.org/r/20221125114210.2353820-1-vincent.whitchurch@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In the default mode, the current output of the metricgroup include both
events and metrics, which is not necessary and just makes the output
hard to read. Since different ARCHs (even different generations in the
same ARCH) may use different events. The output also vary on different
platforms.
For a metricgroup, only outputting the value of each metric is good
enough.
Add a new field default_metricgroup in evsel to indicate an event of the
default metricgroup. For those events, printout() should print the
metricgroup name rather than each event.
Add perf_stat__skip_metric_event() to skip the evsel in the Default
metricgroup, if it's not running or not the metric event.
Add print_metricgroup_header_t to pass the functions which print the
display name of each metricgroup in the Default metricgroup. Support all
three output methods.
Factor out perf_stat__print_shadow_stats_metricgroup() to print out each
metrics.
On SPR:
Before:
./perf_old stat sleep 1
Performance counter stats for 'sleep 1':
0.54 msec task-clock:u # 0.001 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
68 page-faults:u # 125.445 K/sec
540,970 cycles:u # 0.998 GHz
556,325 instructions:u # 1.03 insn per cycle
123,602 branches:u # 228.018 M/sec
6,889 branch-misses:u # 5.57% of all branches
3,245,820 TOPDOWN.SLOTS:u # 18.4 % tma_backend_bound
# 17.2 % tma_retiring
# 23.1 % tma_bad_speculation
# 41.4 % tma_frontend_bound
564,859 topdown-retiring:u
1,370,999 topdown-fe-bound:u
603,271 topdown-be-bound:u
744,874 topdown-bad-spec:u
12,661 INT_MISC.UOP_DROPPING:u # 23.357 M/sec
1.001798215 seconds time elapsed
0.000193000 seconds user
0.001700000 seconds sys
After:
$ ./perf stat sleep 1
Performance counter stats for 'sleep 1':
0.51 msec task-clock:u # 0.001 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
68 page-faults:u # 132.683 K/sec
545,228 cycles:u # 1.064 GHz
555,509 instructions:u # 1.02 insn per cycle
123,574 branches:u # 241.120 M/sec
6,957 branch-misses:u # 5.63% of all branches
TopdownL1 # 17.5 % tma_backend_bound
# 22.6 % tma_bad_speculation
# 42.7 % tma_frontend_bound
# 17.1 % tma_retiring
TopdownL2 # 21.8 % tma_branch_mispredicts
# 11.5 % tma_core_bound
# 13.4 % tma_fetch_bandwidth
# 29.3 % tma_fetch_latency
# 2.7 % tma_heavy_operations
# 14.5 % tma_light_operations
# 0.8 % tma_machine_clears
# 6.1 % tma_memory_bound
1.001712086 seconds time elapsed
0.000151000 seconds user
0.001618000 seconds sys
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230616031420.3751973-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The new default mode will print the metrics as a metric group. The
metrics from the same metric group must be adjacent to each other in the
metric list. But the metric_list_cmp() sorts metrics by the number of
events.
Add a new sort for the Default metricgroup, which sorts by
default_metricgroup_name and metric_name.
Add is_default in the struct metric_event to indicate that it's from
the Default metricgroup.
Store the displayed metricgroup name of the Default metricgroup into
the metric expr for output.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230616031420.3751973-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Introduce a new metricgroup, Default, to tag all the metric groups which
will be collected in the default mode.
Add a new field, DefaultMetricgroupName, in the JSON file to indicate
the real metric group name. It will be printed in the default output
to replace the event names.
There is nothing changed for the output format.
On SPR, both TopdownL1 and TopdownL2 are displayed in the default
output.
On ARM, Intel ICL and later platforms (before SPR), only TopdownL1 is
displayed in the default output.
Suggested-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230615135315.3662428-4-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The annotation for hardware events is wrong on hybrid. For example,
# ./perf stat -a sleep 1
Performance counter stats for 'system wide':
32,148.85 msec cpu-clock # 32.000 CPUs utilized
374 context-switches # 11.633 /sec
33 cpu-migrations # 1.026 /sec
295 page-faults # 9.176 /sec
18,979,960 cpu_core/cycles/ # 590.378 K/sec
261,230,783 cpu_atom/cycles/ # 8.126 M/sec (54.21%)
17,019,732 cpu_core/instructions/ # 529.404 K/sec
38,020,470 cpu_atom/instructions/ # 1.183 M/sec (63.36%)
3,296,743 cpu_core/branches/ # 102.546 K/sec
6,692,338 cpu_atom/branches/ # 208.167 K/sec (63.40%)
96,421 cpu_core/branch-misses/ # 2.999 K/sec
1,016,336 cpu_atom/branch-misses/ # 31.613 K/sec (63.38%)
The hardware events have extended type on hybrid, but the evsel__match()
doesn't take it into account.
Filter the config on hybrid before checking.
With the patch,
# ./perf stat -a sleep 1
Performance counter stats for 'system wide':
32,139.90 msec cpu-clock # 32.003 CPUs utilized
343 context-switches # 10.672 /sec
32 cpu-migrations # 0.996 /sec
73 page-faults # 2.271 /sec
13,712,841 cpu_core/cycles/ # 0.000 GHz
258,301,691 cpu_atom/cycles/ # 0.008 GHz (54.20%)
12,428,163 cpu_core/instructions/ # 0.91 insn per cycle
37,786,557 cpu_atom/instructions/ # 2.76 insn per cycle (63.35%)
2,418,826 cpu_core/branches/ # 75.259 K/sec
6,965,962 cpu_atom/branches/ # 216.739 K/sec (63.38%)
72,150 cpu_core/branch-misses/ # 2.98% of all branches
1,032,746 cpu_atom/branch-misses/ # 42.70% of all branches (63.35%)
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230615135315.3662428-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We write an address then a ',' to addr2line. With inline data we
generally get back (// are my comments):
0x1234 // address
foo // function name
foo.c:123 // filename:line
bar // function name
bar.c:123 // filename:line
0x000000000000000 // sentinel address created by ','
?? // unknown function name
??:0 // unknown filename:line
The code was assuming the inline data also had the address, which is
incorrect. This means the first inline function name (bar above) needs
to be checked to see if it is the sentinel, otherwise to be treated as
a function name. The regression was caused by the addition of
addresses as the kernel is reporting a symbol at address 0 (used by
GNU binutils when it interprets ',').
Committer testing:
Using:
# perf trace --call-graph=dwarf -e lock:contention_*
<SNIP>
1244.615 TaskCon~ller #/2645281 lock:contention_begin(lock_addr: 0xffff8e6748da5ab0, flags: 2)
__preempt_count_dec_and_test (inlined)
trace_contention_begin (inlined)
trace_contention_begin (inlined)
rwsem_down_read_slowpath ([kernel.kallsyms])
__preempt_count_dec_and_test (inlined)
trace_contention_begin (inlined)
trace_contention_begin (inlined)
rwsem_down_read_slowpath ([kernel.kallsyms])
__down_read_common (inlined)
__down_read (inlined)
down_read ([kernel.kallsyms])
arch_static_branch (inlined)
static_key_false (inlined)
__mmap_lock_trace_acquire_returned (inlined)
mmap_read_lock (inlined)
do_user_addr_fault ([kernel.kallsyms])
arch_local_irq_disable (inlined)
handle_page_fault (inlined)
exc_page_fault ([kernel.kallsyms])
asm_exc_page_fault ([kernel.kallsyms])
[0x4def008] (/usr/lib64/firefox/libxul.so)
1244.619 TaskCon~ller #/2645281 lock:contention_end(lock_addr: 0xffff8e6748da5ab0)
__preempt_count_dec_and_test (inlined)
trace_contention_end (inlined)
trace_contention_end (inlined)
rwsem_down_read_slowpath ([kernel.kallsyms])
__preempt_count_dec_and_test (inlined)
trace_contention_end (inlined)
trace_contention_end (inlined)
rwsem_down_read_slowpath ([kernel.kallsyms])
__down_read_common (inlined)
__down_read (inlined)
down_read ([kernel.kallsyms])
arch_static_branch (inlined)
static_key_false (inlined)
__mmap_lock_trace_acquire_returned (inlined)
mmap_read_lock (inlined)
do_user_addr_fault ([kernel.kallsyms])
arch_local_irq_disable (inlined)
handle_page_fault (inlined)
exc_page_fault ([kernel.kallsyms])
asm_exc_page_fault ([kernel.kallsyms])
<SNIP>
Fixes: 8dc26b6f718a8118 ("perf srcline: Make sentinel reading for binutils addr2line more robust")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: llvm@lists.linux.dev
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230615025041.1982072-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
addr2line may fail to send expected values causing perf to wait
indefinitely. Add a 1 second timeout (twice the timeout for reading from
/proc/pid/maps) so that such reads don't cause perf to appear to lock
up.
There are already checks that the file for addr2line contains a debug
section but this isn't always sufficient. The problem was observed when
a valid elf file would set the configuration for binutils addr2line,
then a later read of vmlinux with ELF debug sections would cause a
failing write/read which would block indefinitely.
As a service to future readers, if the io hits eof or an error, cleanup
the addr2line process.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230608061812.3715566-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
There's no need to read the string ':' or '/' for PE_BP_COLON or
PE_BP_SLASH and doing so causes parse-events.y to leak memory.
The original patch has a committer note about not using these tokens
presumably as yacc spotted they were a memory leak because no
%destructor could be run. Remove the unused token workaround as there
is now no value associated with these tokens.
Fixes: f0617f526cb0c482 ("perf parse: Allow config terms with breakpoints")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230613182629.1500317-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The no group check fails if there is more than one meticgroup in the
metricgroup_no_group.
The first parameter of the match_metric() should be the string, while
the substring should be the second parameter.
Fixes: ccc66c6092802d68 ("perf metric: JSON flag to not group events if gathering a metric group")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230607162700.3234712-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The addr2line process is sent an address then multiple function,
filename:line "records" are read. To detect the end of output a ',' is
sent and for llvm-addr2line a ',' is then read back showing the end of
addrline's output.
For binutils addr2line the ',' translates to address 0 and we expect the
bogus filename marker "??:0" (see filename_split) to be sent from
addr2line.
For some kernels address 0 may have a mapping and so a seemingly valid
inline output is given and breaking the sentinel discovery:
```
$ addr2line -e vmlinux -f -i
,
__per_cpu_start
./arch/x86/kernel/cpu/common.c:1850
```
To avoid this problem enable the address dumping for addr2line (the -a
option). If an address of 0x0000000000000000 is read then this is the
sentinel value working around the problem above.
The filename_split still needs to check for "??:0" as bogus non-zero
addresses also need handling.
Reported-by: Changbin Du <changbin.du@huawei.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Changbin Du <changbin.du@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230613034817.1356114-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|