summaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)Author
2018-08-08perf annotate: Add PERCENT_PERIOD_LOCAL percent valueJiri Olsa
Adding and computing local period percent value for annotation line. Storing it in struct annotation_data percent array under new PERCENT_PERIOD_LOCAL index. At the moment it's not displayed, it's coming in following patches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20180804130521.11408-11-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08perf annotate: Add PERCENT_HITS_GLOBAL percent valueJiri Olsa
Adding and computing global hits percent value for annotation line. Storing it in struct annotation_data percent array under new PERCENT_HITS_GLOBAL index. At the moment it's not displayed, it's coming in following patches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20180804130521.11408-10-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08perf annotate: Switch struct annotation_data::percent to arrayJiri Olsa
So we can hold multiple percent values for annotation line. The first member of this array is current local hits percent value (PERCENT_HITS_LOCAL index), so no functional change is expected. Adding annotation_data__percent function to return requested percent value from struct annotation_data. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20180804130521.11408-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08perf annotate: Loop group events directly in annotation__calc_percent()Jiri Olsa
We need to bring in 'struct hists' object and for that we need 'struct perf_evsel' object in the scope. Switching the group data loop with the evsel group loop. It does the same thing, but it brings evsel object, that we can use later get the 'struct hists' object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20180804130521.11408-8-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08perf annotate: Rename hist to sym_hist in annotation__calc_percentJiri Olsa
We will need to bring in 'struct hists' variable in this scope, so it's better we do this rename first. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20180804130521.11408-7-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08perf annotate: Rename local sample variables to dataJiri Olsa
Based on previous rename, changing also the local variable names to fit properly. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20180804130521.11408-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08perf annotate: Rename struct annotation_line::samples* to data*Jiri Olsa
The name 'samples*' is little confusing because we have nested 'struct sym_hist_entry' under annotation_line struct, which holds 'nr_samples' as well. Also the holding struct name is 'annotation_data' so the 'data' name fits better. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20180804130521.11408-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08perf annotate: Get rid of annotation__scnprintf_samples_period()Jiri Olsa
We have more current function tto get the title for annotation, which is hists__scnprintf_title. They both have same output as far as the annotation's header line goes. They differ in counting of the nr_samples, hists__scnprintf_title provides more accurate number based on the setup of the symbol_conf.filter_relative variable. Plus it also displays any uid/thread/dso/socket filters/zooms if there are set any, which annotation__scnprintf_samples_period does not. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20180804130521.11408-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08perf annotate: Make annotation_line__max_percent staticJiri Olsa
There's no outside user of it. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20180804130521.11408-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08perf annotate: Make symbol__annotate_fprintf2() localJiri Olsa
There's no outside user of it. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lkml.kernel.org/r/20180804130521.11408-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08perf tools: Drop unneeded bitmap_zero() callsYury Norov
bitmap_zero() is called after bitmap_alloc() in perf code. But bitmap_alloc() internally uses calloc() which guarantees that allocated area is zeroed. So following bitmap_zero is unneeded. Drop it. This happened because of confusing name for bitmap allocator. It should has name bitmap_zalloc instead of bitmap_alloc. This series: https://lkml.org/lkml/2018/6/18/841 introduces a new API for bitmap allocations in kernel, and functions there are named correctly. Following patch propogates the API to tools, and fixes naming issue. Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Link: http://lkml.kernel.org/r/20180623073502.16321-1-ynorov@caviumnetworks.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08perf report: Add GUI report support for s390 auxiliary traceThomas Richter
Add support for s390 auxiliary trace support. Use 'perf record -e rbd000 -- ls' to create the perf.data file. Use 'perf report' to display the auxiliary trace data. Output before: [root@s35lp76 perf]# ./perf report --stdio 0x128 [0x10]: failed to process type: 70 Error: failed to process sample [root@s35lp76 perf]# Output after: [root@s35lp76 perf]# ./perf report --stdio 18.21% 18.21% ls [kernel.kallsyms] [k] ftrace_likely_update 9.52% 9.52% ls [kernel.kallsyms] [k] lock_acquire 9.38% 9.38% ls [kernel.kallsyms] [k] lock_release 3.45% 3.45% ls [kernel.kallsyms] [k] lock_acquired 2.88% 2.88% ls [kernel.kallsyms] [k] link_path_walk 2.63% 2.63% ls [kernel.kallsyms] [k] __d_lookup 2.38% 2.38% ls [kernel.kallsyms] [k] __d_lookup_rcu 2.04% 2.04% ls [kernel.kallsyms] [k] ___might_sleep 1.83% 1.83% ls [kernel.kallsyms] [k] debug_lockdep_rcu_enabled 1.44% 1.44% ls [kernel.kallsyms] [k] dput .... Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180802074622.13641-4-tmricht@linux.ibm.com [ Use PRI[xd]64 to fix the build on debian:experimental-x-mips (gcc 8.1.0) and others ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08perf report: Add raw report support for s390 auxiliary traceThomas Richter
Add support for s390 auxiliary trace support. Use 'perf record -e rbd000' to create the perf.data file. The event also has the symbolic name SF_CYCLES_BASIC_DIAG, using 'perf record -e SF_CYCLES_BASIC_DIAG' is equivalent. Use 'perf report -D' to display the auxiliary trace data. Output before: 0 0 0x25a66 [0x30]: PERF_RECORD_AUXTRACE size: 0x40000 offset: 0 ref: 0 idx: 4 tid: -1 cpu: 4 Nothing else Output after: 0 0 0x25a66 [0x30]: PERF_RECORD_AUXTRACE size: 0x40000 offset: 0 ref: 0 idx: 4 tid: -1 cpu: 4 . . ... s390 AUX data: size 262144 bytes [00000000] Basic Def:0001 Inst:0000 TW AS:3 ASN:0xffff IA:0x0000000000c2f1bc CL:1 HPP:0x8000000000000000 GPP:000000000000000000 [0x000020] Diag Def:8005 [0x0000bf] Basic Def:0001 Inst:0000 TW AS:3 ASN:0xffff IA:0x0000000000c2f1bc CL:1 HPP:0x8000000000000000 GPP:000000000000000000 [0x0000df] Diag Def:8005 [0x00017e] Basic Def:0001 Inst:0000 TW AS:3 ASN:0xffff IA:0x0000000000c2f1bc CL:1 HPP:0x8000000000000000 GPP:000000000000000000 .... [0x000fc0] Trailer F T bsdes:32 dsdes:159 Overflow:0 Time:0xd4ab59a8450fa108 C:1 TOD:0xd4ab4ec98ceb3832 1:0x8000000000000000 2:0xd4ab4ec98ceb3832 This output is shown for every sampled data block. The output contains the - basic-sampling data entry - diagnostic-sampling data entry - trailer entry The basic sampling entry and diagnostic sampling entry sizes can be extracted using the trailer entries in the SDB. On older hardware these values (bsdes and dsdes in the trailer entry) are reserved and zero. Older hardware use hard coded values based on the s390 machine type. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Link: http://lkml.kernel.org/r/20180802074622.13641-3-tmricht@linux.ibm.com Link: http://lkml.kernel.org/r/eda2632e-7919-5ffd-5f68-821e77d216fa@linux.ibm.com [ Merged a fix for a 'tipe puned' problem reported by Michael Ellerman see last Link tag. ] [ Removed __packed from two structs, they're already naturally packed and having that. ] [ attribute breaks the build in gcc 8.1.1 mips, 4.4.7 x86_64, 7.1.1 ARCompact ISA, etc) ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-03perf auxtrace: Support for perf report -D for s390Thomas Richter
Add initial support for s390 auxiliary traces using the CPU-Measurement Sampling Facility. Support and ignore PERF_REPORT_AUXTRACE_INFO records in the perf data file. Later patches will show the contents of the auxiliary traces. Setup the auxtrace queues and data structures for s390. A raw dump of the perf.data file now does not show an error when an auxtrace event is encountered. Output before: [root@s35lp76 perf]# ./perf report -D -i perf.data.auxtrace 0x128 [0x10]: failed to process type: 70 Error: failed to process sample 0x128 [0x10]: event: 70 . . ... raw event: size 16 bytes . 0000: 00 00 00 46 00 00 00 10 00 00 00 00 00 00 00 00 ...F............ 0x128 [0x10]: PERF_RECORD_AUXTRACE_INFO type: 0 [root@s35lp76 perf]# Output after: # ./perf report -D -i perf.data.auxtrace |fgrep PERF_RECORD_AUXTRACE 0 0 0x128 [0x10]: PERF_RECORD_AUXTRACE_INFO type: 5 0 0 0x25a66 [0x30]: PERF_RECORD_AUXTRACE size: 0x40000 offset: 0 ref: 0 idx: 4 tid: -1 cpu: 4 .... Additional notes about the underlying hardware and software implementation, provided by Hendrik Brueckner (see Link: below). ============================================================================= The CPU-Measurement Facility (CPU-MF) provides a set of functions to obtain performance information on the mainframe. Basically, it was introduced with System z10 years ago for the z/Architecture, that means, 64-bit. For Linux, there are two facilities of interest, counter facility and sampling facility. The counter facility provides hardware counters for instructions, cycles, crypto-activities, and many more. The sampling facility is a hardware sampler that when started will write samples at a particular interval into a sampling buffer. At some point, for example, if a sample block is full, it generates an interrupt to collect samples (while the sampler continues to run). Few years ago, I started to provide the a perf PMU to use the counter and sampling facilities. Recently, the device driver was updated to also "export" the sampling buffer into the AUX area. Thomas now completed the related perf work to interpret and process these AUX data. If people are more interested in the sampling facility, they can have a look into: - The Load-Program-Parameter and the CPU-Measurement Facilities, SA23-2260-05 http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a and to learn how-to use it for Linux on Z, have look at chapter 54, "Using the CPU-measurement facilities" in the: - Device Drivers, Features, and Commands, SC33-8411-34 http://public.dhe.ibm.com/software/dw/linux390/docu/l416dd34.pdf ============================================================================= Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Link: http://lkml.kernel.org/r/20180803100758.GA28475@linux.ibm.com Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180802074622.13641-2-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf bpf: Show better message when failing to load an objectArnaldo Carvalho de Melo
Before: libbpf: license of tools/perf/examples/bpf/etcsnoop.c is GPL libbpf: section(6) version, size 4, link 0, flags 3, type=1 libbpf: kernel version of tools/perf/examples/bpf/etcsnoop.c is 41200 libbpf: section(7) .symtab, size 120, link 1, flags 0, type=2 bpf: config program 'syscalls:sys_enter_openat' libbpf: load bpf program failed: Operation not permitted libbpf: failed to load program 'syscalls:sys_enter_openat' libbpf: failed to load object 'tools/perf/examples/bpf/etcsnoop.c' bpf: load objects failed After: (just the last line changes) bpf: load objects failed: err=-4009: (Incorrect kernel version) Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-wi44iid0yjfht3lcvplc75fm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf list: Unify metric group description format with PMU event descriptionMichael Petlan
PMU event descriptions use 7 spaces + '[' or 8 spaces as indentation. Metric groups used a tab + '['. This patch unifies it to the way PMU event descriptions are indented. BEFORE: $ perf list [...] Metric Groups: DSB: DSB_Coverage [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)] [...] AFTER: $ perf list [...] Metric Groups: DSB: DSB_Coverage [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)] [...] Signed-off-by: Michael Petlan <mpetlan@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Kim Phillips <kim.phillips@arm.com> LPU-Reference: 771439042.22924766.1532986504631.JavaMail.zimbra@redhat.com Link: https://lkml.kernel.org/n/tip-mlo850517m6u1rbjndvd1bwr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packetLeo Yan
CS_ETM_TRACE_ON packet itself can give the info that there have a discontinuity in the trace, this patch is to add branch sample for CS_ETM_TRACE_ON packet if it is inserted in the middle of CS_ETM_RANGE packets; as result we can have hint for the trace discontinuity. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1531295145-596-7-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON packetLeo Yan
If one CS_ETM_TRACE_ON packet is inserted, we miss to generate branch sample for the previous CS_ETM_RANGE packet. This patch is to generate branch sample when receiving a CS_ETM_TRACE_ON packet, so this can save complete info for the previous CS_ETM_RANGE packet just before CS_ETM_TRACE_ON packet. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1531295145-596-6-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packetLeo Yan
For CS_ETM_TRACE_ON packet, its fields 'packet->start_addr' and 'packet->end_addr' equal to 0xdeadbeefdeadbeefUL which are emitted in the decoder layer as dummy value, but the dummy value is pointless for branch sample when we use 'perf script' command to check program flow. This patch is a preparation to support CS_ETM_TRACE_ON packet for branch sample, it converts the dummy address value to zero for more readable; this is accomplished by cs_etm__last_executed_instr() and cs_etm__first_executed_instr(). The later one is a new function introduced by this patch. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1531295145-596-5-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf cs-etm: Fix start tracing packet handlingLeo Yan
Usually the start tracing packet is a CS_ETM_TRACE_ON packet, this packet is passed to cs_etm__flush(); cs_etm__flush() will check the condition 'prev_packet->sample_type == CS_ETM_RANGE' but 'prev_packet' is allocated by zalloc() so 'prev_packet->sample_type' is zero in initialization and this condition is false. So cs_etm__flush() will directly bail out without handling the start tracing packet. This patch is to introduce a new sample type CS_ETM_EMPTY, which is used to indicate the packet is an empty packet. cs_etm__flush() will swap packets when it finds the previous packet is empty, so this can record the start tracing packet into 'etmq->prev_packet'. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1531295145-596-4-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31perf evlist: Fix error out while applying initial delay and LBRKan Liang
'perf record' will error out if both --delay and LBR are applied. For example: # perf record -D 1000 -a -e cycles -j any -- sleep 2 Error: dummy:HG: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' # A dummy event is added implicitly for initial delay, which has the same configurations as real sampling events. The dummy event is a software event. If LBR is configured, perf must error out. The dummy event will only be used to track PERF_RECORD_MMAP while perf waits for the initial delay to enable the real events. The BRANCH_STACK bit can be safely cleared for the dummy event. After applying the patch: # perf record -D 1000 -a -e cycles -j any -- sleep 2 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.054 MB perf.data (828 samples) ] # Reported-by: Sunil K Pandey <sunil.k.pandey@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1531145722-16404-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-31Merge remote-tracking branch 'tip/perf/urgent' into perf/coreArnaldo Carvalho de Melo
To pick up fixes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-30perf tools: Fix the build on the alpine:edge distroArnaldo Carvalho de Melo
The UAPI file byteorder/little_endian.h uses the __always_inline define without including the header where it is defined, linux/stddef.h, this ends up working in all the other distros because that file gets included seemingly by luck from one of the files included from little_endian.h. But not on Alpine:edge, that fails for all files where perf_event.h is included but linux/stddef.h isn't include before that. Adding the missing linux/stddef.h file where it breaks on Alpine:edge to fix that, in all other distros, that is just a very small header anyway. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-9r1pifftxvuxms8l7ir73p5l@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf stat: Get rid of extra clock display functionJiri Olsa
There's no reason to have separate function to display clock events. It's only purpose was to convert the nanosecond value into microseconds. We do that now in generic code, if the unit and scale values are properly set, which this patch do for clock events. The output differs in the unit field being displayed in its columns rather than having it added as a suffix of the event name. Plus the value is rounded into 2 decimal numbers as for any other event. Before: # perf stat -e cpu-clock,task-clock -C 0 sleep 3 Performance counter stats for 'CPU(s) 0': 3001.123137 cpu-clock (msec) # 1.000 CPUs utilized 3001.133250 task-clock (msec) # 1.000 CPUs utilized 3.001159813 seconds time elapsed Now: # perf stat -e cpu-clock,task-clock -C 0 sleep 3 Performance counter stats for 'CPU(s) 0': 3,001.05 msec cpu-clock # 1.000 CPUs utilized 3,001.05 msec task-clock # 1.000 CPUs utilized 3.001077794 seconds time elapsed There's a small difference in csv output, as we now output the unit field, which was empty before. It's in the proper spot, so there's no compatibility issue. Before: # perf stat -e cpu-clock,task-clock -C 0 -x, sleep 3 3001.065177,,cpu-clock,3001064187,100.00,1.000,CPUs utilized 3001.077085,,task-clock,3001077085,100.00,1.000,CPUs utilized # perf stat -e cpu-clock,task-clock -C 0 -x, sleep 3 3000.80,msec,cpu-clock,3000799026,100.00,1.000,CPUs utilized 3000.80,msec,task-clock,3000799550,100.00,1.000,CPUs utilized Add perf_evsel__is_clock to replace nsec_counter. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180720110036.32251-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf tools: Use perf_evsel__match instead of open coded equivalentJiri Olsa
Use perf_evsel__match() helper in perf_evsel__is_bpf_output(). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180720110036.32251-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf tools: Fix struct comm_str removal crashJiri Olsa
We occasionaly hit following assert failure in 'perf top', when processing the /proc info in multiple threads. perf: ...include/linux/refcount.h:109: refcount_inc: Assertion `!(!refcount_inc_not_zero(r))' failed. The gdb backtrace looks like this: [Switching to Thread 0x7ffff11ba700 (LWP 13749)] 0x00007ffff50839fb in raise () from /lib64/libc.so.6 (gdb) #0 0x00007ffff50839fb in raise () from /lib64/libc.so.6 #1 0x00007ffff5085800 in abort () from /lib64/libc.so.6 #2 0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6 #3 0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6 #4 0x0000000000535373 in refcount_inc (r=0x7fffdc009be0) at ...include/linux/refcount.h:109 #5 0x00000000005354f1 in comm_str__get (cs=0x7fffdc009bc0) at util/comm.c:24 #6 0x00000000005356bd in __comm_str__findnew (str=0x7fffd000b260 ":2", root=0xbed5c0 <comm_str_root>) at util/comm.c:72 #7 0x000000000053579e in comm_str__findnew (str=0x7fffd000b260 ":2", root=0xbed5c0 <comm_str_root>) at util/comm.c:95 #8 0x000000000053582e in comm__new (str=0x7fffd000b260 ":2", timestamp=0, exec=false) at util/comm.c:111 #9 0x00000000005363bc in thread__new (pid=2, tid=2) at util/thread.c:57 #10 0x0000000000523da0 in ____machine__findnew_thread (machine=0xbfde38, threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:457 #11 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38, ... The failing assertion is this one: REFCOUNT_WARN(!refcount_inc_not_zero(r), ... The problem is that we keep global comm_str_root list, which is accessed by multiple threads during the 'perf top' startup and following 2 paths can race: thread 1: ... thread__new comm__new comm_str__findnew down_write(&comm_str_lock); __comm_str__findnew comm_str__get thread 2: ... comm__override or comm__free comm_str__put refcount_dec_and_test down_write(&comm_str_lock); rb_erase(&cs->rb_node, &comm_str_root); Because thread 2 first decrements the refcnt and only after then it removes the struct comm_str from the list, the thread 1 can find this object on the list with refcnt equls to 0 and hit the assert. This patch fixes the thread 1 __comm_str__findnew path, by ignoring objects that already dropped the refcnt to 0. For the rest of the objects we take the refcnt before comparing its name and release it afterwards with comm_str__put, which can also release the object completely. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20180720101740.GA27176@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf machine: Use last_match threads cache only in single thread modeJiri Olsa
There's an issue with using threads::last_match in multithread mode which is enabled during the perf top synthesize. It might crash with following assertion: perf: ...include/linux/refcount.h:109: refcount_inc: Assertion `!(!refcount_inc_not_zero(r))' failed. The gdb backtrace looks like this: 0x00007ffff50839fb in raise () from /lib64/libc.so.6 (gdb) #0 0x00007ffff50839fb in raise () from /lib64/libc.so.6 #1 0x00007ffff5085800 in abort () from /lib64/libc.so.6 #2 0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6 #3 0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6 #4 0x0000000000535ff9 in refcount_inc (r=0x7fffe8009a70) at ...include/linux/refcount.h:109 #5 0x0000000000536771 in thread__get (thread=0x7fffe8009a40) at util/thread.c:115 #6 0x0000000000523cd0 in ____machine__findnew_thread (machine=0xbfde38, threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:432 #7 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38, pid=2, tid=2) at util/machine.c:489 #8 0x0000000000523f24 in machine__findnew_thread (machine=0xbfde38, pid=2, tid=2) at util/machine.c:499 #9 0x0000000000526fbe in machine__process_fork_event (machine=0xbfde38, ... The failing assertion is this one: REFCOUNT_WARN(!refcount_inc_not_zero(r), ... the problem is that we don't serialize access to threads::last_match. We serialize the access to the threads tree, but we don't care how's threads::last_match being accessed. Both locked/unlocked paths use that data and can set it. In multithreaded mode we can end up with invalid object in thread__get call, like in following paths race: thread 1 ... machine__findnew_thread down_write(&threads->lock); __machine__findnew_thread ____machine__findnew_thread th = threads->last_match; if (th->tid == tid) { thread__get thread 2 ... machine__find_thread down_read(&threads->lock); __machine__findnew_thread ____machine__findnew_thread th = threads->last_match; if (th->tid == tid) { thread__get thread 3 ... machine__process_fork_event machine__remove_thread __machine__remove_thread threads->last_match = NULL thread__put thread__put Thread 1 and 2 might got stale last_match, before thread 3 clears it. Thread 1 and 2 then race with thread 3's thread__put and they might trigger the refcnt == 0 assertion above. The patch is disabling the last_match cache for multiple thread mode. It was originally meant for single thread scenarios, where it's common to have multiple sequential searches of the same thread. In multithread mode this does not make sense, because top's threads processes different /proc entries and so the 'struct threads' object is queried for various threads. Moreover we'd need to add more locks to make it work. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20180719143345.12963-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf machine: Add threads__set_last_match functionJiri Olsa
Separating threads::last_match cache set into separate threads__set_last_match function. This will be useful in following patch. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20180719143345.12963-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf machine: Add threads__get_last_match functionJiri Olsa
Separating threads::last_match cache read/check into separate threads__get_last_match function. This will be useful in following patch. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20180719143345.12963-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf tools: Synthesize GROUP_DESC feature in pipe modeJiri Olsa
Stephan reported, that pipe mode does not carry the group information and thus the piped report won't display the grouped output for following command: # perf record -e '{cycles,instructions,branches}' -a sleep 4 | perf report It has no idea about the group setup, so it will display events separately: # Overhead Command Shared Object ... # ........ ............... ....................... # 6.71% swapper [kernel.kallsyms] 2.28% offlineimap libpython2.7.so.1.0 0.78% perf [kernel.kallsyms] ... Fix GROUP_DESC feature record to be synthesized in pipe mode, so the report output is grouped if there are groups defined in record: # Overhead Command Shared ... # ........................ ............... ....... # 7.57% 0.16% 0.30% swapper [kernel 1.87% 3.15% 2.46% offlineimap libpyth 1.33% 0.00% 0.00% perf [kernel ... Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180712135202.14774-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf script: Show correct offsets for DWARF-based unwindingSandipan Das
When perf/data is recorded with the dwarf call-graph option, the callchain shown by 'perf script' still shows the binary offsets of the userspace symbols instead of their virtual addresses. Since the symbol offset calculation is based on using virtual address as the ip, we see incorrect offsets as well. The use of virtual addresses affects the ability to find out the line number in the corresponding source file to which an address maps to as described in commit 67540759151a ("perf unwind: Use addr_location::addr instead of ip for entries"). This has also been addressed by temporarily converting the virtual address to the correponding binary offset so that it can be mapped to the source line number correctly. This is a follow-up for commit 19610184693c ("perf script: Show virtual addresses instead of offsets"). This can be verified on a powerpc64le system running Fedora 27 as shown below: # perf probe -x /usr/lib64/libc-2.26.so -a inet_pton # perf record -e probe_libc:inet_pton --call-graph=dwarf ping -6 -c 1 ::1 Before: # perf report --stdio --no-children -s sym,srcline -g address # Samples: 1 of event 'probe_libc:inet_pton' # Event count (approx.): 1 # # Overhead Symbol Source:Line # ........ .................... ........... # 100.00% [.] __GI___inet_pton inet_pton.c | ---gaih_inet getaddrinfo.c:537 (inlined) __GI_getaddrinfo getaddrinfo.c:2304 (inlined) main ping.c:519 generic_start_main libc-start.c:308 (inlined) __libc_start_main libc-start.c:102 ... # perf script -F comm,ip,sym,symoff,srcline,dso ping 15af28 __GI___inet_pton+0xffff000099160008 (/usr/lib64/libc-2.26.so) libc-2.26.so[ffff80004ca0af28] 10fa53 gaih_inet+0xffff000099160f43 libc-2.26.so[ffff80004c9bfa53] (inlined) 1105b3 __GI_getaddrinfo+0xffff000099160163 libc-2.26.so[ffff80004c9c05b3] (inlined) 2d6f main+0xfffffffd9f1003df (/usr/bin/ping) ping[fffffffecf882d6f] 2369f generic_start_main+0xffff00009916013f libc-2.26.so[ffff80004c8d369f] (inlined) 23897 __libc_start_main+0xffff0000991600b7 (/usr/lib64/libc-2.26.so) libc-2.26.so[ffff80004c8d3897] After: # perf report --stdio --no-children -s sym,srcline -g address # Samples: 1 of event 'probe_libc:inet_pton' # Event count (approx.): 1 # # Overhead Symbol Source:Line # ........ .................... ........... # 100.00% [.] __GI___inet_pton inet_pton.c | ---gaih_inet.constprop.7 getaddrinfo.c:537 getaddrinfo getaddrinfo.c:2304 main ping.c:519 generic_start_main.isra.0 libc-start.c:308 __libc_start_main libc-start.c:102 ... # perf script -F comm,ip,sym,symoff,srcline,dso ping 7fffb38aaf28 __GI___inet_pton+0x8 (/usr/lib64/libc-2.26.so) inet_pton.c:68 7fffb385fa53 gaih_inet.constprop.7+0xf43 (/usr/lib64/libc-2.26.so) getaddrinfo.c:537 7fffb38605b3 getaddrinfo+0x163 (/usr/lib64/libc-2.26.so) getaddrinfo.c:2304 130782d6f main+0x3df (/usr/bin/ping) ping.c:519 7fffb377369f generic_start_main.isra.0+0x13f (/usr/lib64/libc-2.26.so) libc-start.c:308 7fffb3773897 __libc_start_main+0xb7 (/usr/lib64/libc-2.26.so) libc-start.c:102 Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Fixes: 67540759151a ("perf unwind: Use addr_location::addr instead of ip for entries") Link: http://lkml.kernel.org/r/20180703120555.32971-1-sandipan@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf trace arm64: Use generated syscall tableKim Phillips
This should speed up accessing new system calls introduced with the kernel rather than waiting for libaudit updates to include them. It also enables users to specify wildcards, for example, perf trace -e 'open*', just like was already possible on x86, s390, and powerpc, which means arm64 can now pass the "Check open filename arg using perf trace + vfs_getname" test. Signed-off-by: Kim Phillips <kim.phillips@arm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20180706163454.f714b9ab49ecc8566a0b3565@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf stat: Add transaction flag (-T) support for s390Thomas Richter
The 'perf stat' command line flag -T to display transaction counters is currently supported for x86 only. Add support for s390. It is based on the metrics flag -M transaction using the architecture dependent JSON files. This requires a metric named "transaction" in the JSON files for the platform. Introduce a new function metricgroup__has_metric() to check for the existence of a metric_name transaction. As suggested by Andi Kleen, this is the new approach to support transactions counters. Other architectures will follow. Output before: [root@p23lp27 perf]# ./perf stat -T -- sleep 1 Cannot set up transaction events [root@p23lp27 perf]# Output after: [root@s35lp76 perf]# ./perf stat -T -- ~/mytesttx 1 >/tmp/111 Performance counter stats for '/root/mytesttx 1': 1 tx_c_tend # 13.0 transaction 1 tx_nc_tend 11 tx_nc_tabort 0 tx_c_tabort_special 0 tx_c_tabort_no_special 0.001070109 seconds time elapsed [root@s35lp76 perf]# Suggested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180626071701.58190-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24Revert "perf list: Add s390 support for detailed/verbose PMU event description"Thomas Richter
This reverts commit 038586c34301578e538f6c5aa79ca82bce1b9152. Fix the support of detailed/verbose PMU event description by using the "Unit": keyword in the json files to address event names refering to the /sys/devices/cpum_[cs]f devices. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180621080452.61012-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf cs-etm: Bail out immediately for instruction sample failureLeo Yan
If the instruction sample failure has happened, it isn't necessary to execute to the end of the function cs_etm__flush(). This commit is to bail out immediately and return the error code. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1529298599-3876-3-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf cs-etm: Introduce invalid address macroLeo Yan
This patch introduces invalid address macro and uses it to replace dummy value '0xdeadbeefdeadbeefUL'. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1529298599-3876-2-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24perf hists: Clarify callchain disabling when availableArnaldo Carvalho de Melo
We want to allow having mixed events with/without callchains, not using a global flag to show callchains, but allowing supressing callchains when they are present. So invert the logic of the last parameter to hists__fprint() to that effect. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-ohqyisr6qge79qa95ojslptx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-11perf script python: Fix dict reference countingJanne Huttunen
The dictionaries are attached to the parameter tuple that steals the references and takes care of releasing them when appropriate. The code should not decrement the reference counts explicitly. E.g. if libpython has been built with reference debugging enabled, the superfluous DECREFs will trigger this error when running perf script: Fatal Python error: Objects/tupleobject.c:238 object at 0x7f10f2041b40 has negative ref count -1 Aborted (core dumped) If the reference debugging is not enabled, the superfluous DECREFs might cause the dict objects to be silently released while they are still in use. This may trigger various other assertions or just cause perf crashes and/or weird and unexpected data changes in the stored Python objects. Signed-off-by: Janne Huttunen <janne.huttunen@nokia.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jaroslav Skarvada <jskarvad@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1531133990-17485-1-git-send-email-janne.huttunen@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-11perf llvm-utils: Remove bashism from kernel include fetch scriptKim Phillips
Like system(), popen() calls /bin/sh, which may/may not be bash. Script when run on dash and encounters the line, yields: exit: Illegal number: -1 checkbashisms report on script content: possible bashism (exit|return with negative status code): exit -1 Remove the bashism and use the more portable non-zero failure status code 1. Signed-off-by: Kim Phillips <kim.phillips@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan@linux.vnet.ibm.com> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20180629124652.8d0af7e2281fd3fd8262cacc@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-11perf tools: Generate a Python script compatible with Python 2 and 3Jeremy Cline
When generating a Python script with "perf script -g python", produce one that is compatible with Python 2 and 3. The difference between the two generated scripts is: --- python2-perf-script.py 2018-05-08 15:35:00.865889705 -0400 +++ python3-perf-script.py 2018-05-08 15:34:49.019789564 -0400 @@ -7,6 +7,8 @@ # be retrieved using Python functions of the form common_*(context). # See the perf-script-python Documentation for the list of available functions. +from __future__ import print_function + import os import sys @@ -18,10 +20,10 @@ def trace_begin(): - print "in trace_begin" + print("in trace_begin") def trace_end(): - print "in trace_end" + print("in trace_end") def raw_syscalls__sys_enter(event_name, context, common_cpu, common_secs, common_nsecs, common_pid, common_comm, @@ -29,26 +31,26 @@ print_header(event_name, common_cpu, common_secs, common_nsecs, common_pid, common_comm) - print "id=%d, args=%s" % \ - (id, args) + print("id=%d, args=%s" % \ + (id, args)) - print 'Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}' + print('Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}') for node in common_callchain: if 'sym' in node: - print "\t[%x] %s" % (node['ip'], node['sym']['name']) + print("\t[%x] %s" % (node['ip'], node['sym']['name'])) else: - print " [%x]" % (node['ip']) + print(" [%x]" % (node['ip'])) - print "\n" + print() def trace_unhandled(event_name, context, event_fields_dict, perf_sample_dict): - print get_dict_as_string(event_fields_dict) - print 'Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}' + print(get_dict_as_string(event_fields_dict)) + print('Sample: {'+get_dict_as_string(perf_sample_dict['sample'], ', ')+'}') def print_header(event_name, cpu, secs, nsecs, pid, comm): - print "%-20s %5u %05u.%09u %8u %-20s " % \ - (event_name, cpu, secs, nsecs, pid, comm), + print("%-20s %5u %05u.%09u %8u %-20s " % \ + (event_name, cpu, secs, nsecs, pid, comm), end="") def get_dict_as_string(a_dict, delimiter=' '): return delimiter.join(['%s=%s'%(k,str(v))for k,v in sorted(a_dict.items())]) Signed-off-by: Jeremy Cline <jeremy@jcline.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Herton Krzesinski <herton@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/0100016341a7278a-d178c724-2b0f-49ca-be93-80a7d51aaa0d-000000@email.amazonses.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-26Merge tag 'perf-urgent-for-mingo-4.18-20180625' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: perf bench: (Jiri Olsa): - Fix NUMA report output code handling of less than 1s runtimes. perf script: (Ravi Bangoria) - Add missing output fields in a 'perf script -h' hint. - Fix crash because of missing evsel->priv. - Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE], which is just a end of features header marker. perf stat: (Thomas Richter) - Remove duplicate event counting perf test: - Wire parsing error handling in 'parse events' test (Jiri Olsa) - Fix 'session topology' test on s/390 (Thomas Richter) eBPF: (Yonghong Song) - Fix a clang 7.0 compilation error when building perf linking with libclang intel-pt: (Adrian Hunter) - Fix packet decoding of CYC packets. Copies of kernel files: (Arnaldo Carvalho de Melo) - Synchronize drm/drm.h UAPI - Update x86's syscall_64.tbl, adding support for 'io_pgetevents' and 'rseq' in 'perf trace'. - Update powerpc uapi/asm/unistd.h, adding support for the 'rseq' syscall. - Update if_link.h and bpf.h, no effect on tool features. PowerPC: (Sandipan Das) - Fix crash if callchain is empty. s/390: (Thomas Richter) - Support random socked_id assignment in the perf header. - Support s390 random socket_id assignment in perf.data file. - Make PMU alias definitions taken from sysfs and JSON files comparable by normalizing them wrt spaces and newlines. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-25perf tools: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE]Ravi Bangoria
perf_event__process_feature() accesses feat_ops[HEADER_LAST_FEATURE] which is not defined and thus perf is crashing. HEADER_LAST_FEATURE is used as an end marker for the perf report but it's unused for perf script/annotate. Ignore HEADER_LAST_FEATURE for perf script/annotate, just like it is done in 'perf report'. Before: # perf record -o - ls | perf script <SNIP 'ls' output> Segmentation fault (core dumped) # After: # perf record -o - ls | perf script <SNIP 'ls' output> Segmentation fault (core dumped) ls 7031 4392.099856: 250000 cpu-clock:uhH: 7f5e0ce7cd60 ls 7031 4392.100355: 250000 cpu-clock:uhH: 7f5e0c706ef7 # Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 57b5de463925 ("perf report: Support forced leader feature in pipe mode") Link: http://lkml.kernel.org/r/20180625124220.6434-4-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-25perf stat: Remove duplicate event countingThomas Richter
'perf stat' shows a mismatch in perf stat regarding counter names on s390: Run command: [root@s35lp76 perf]# ./perf stat -e tx_nc_tend -v -- ~/mytesttx 1 >/tmp/111 tx_nc_tend: 1 573146 573146 tx_nc_tend: 1 573146 573146 Performance counter stats for '/root/mytesttx 1': 3 tx_nc_tend 0.001037252 seconds time elapsed [root@s35lp76 perf]# shows transaction counter tx_nc_tend with value 3 but it was triggered only once as seen by the output of mytesttx. When looking up the event name tx_nc_tend the following function sequence is called: parse_events_multi_pmu_add() +--> perf_pmu__scan() being called with NULL argument +--> pmu_read_sysfs() scans directory ../devices/ for all PMUs +--> perf_pmu__find() tries to find a PMU in the global pmu list. +--> pmu_lookup() called to read all file entries when not in global list. pmu_lookup() causes the issue. It calls +---> pmu_aliases() to read all the entries in the PMU directory. On s390 this is named /sys/devices/cpum_cf/events. +--> pmu_aliases_parse() reads all files and creates an alias for each file name. So we end up with first entry created by reading the sysfs file [root@s35lp76 perf]# cat /sys/devices/cpum_cf /events/TX_NC_TEND event=0x008d [root@s35lp76 perf]# Debug output shows this entry tx_nc_tend -> 'cpum_cf'/'event=0x008d '/ After all files in this directory have been read and aliases created this function is called: +--> pmu_add_cpu_aliases() This function looks up the CPU tables created by the json files. With json files for s390 now available all the aliases are added to the PMU alias list a second time. The second entry is added by reading the json file converted by jevent resulting in file pmu-events/pmu-events.c: { .name = "tx_nc_tend", .event = "event=0x8d", .desc = "Unit: cpum_cf Completed TEND \ instructions \ in non-constrained TX mode", .topic = "extended", .long_desc = "A TEND instruction has \ completed in a \ non-constrained \ transactional-execution mode", .pmu = "cpum_cf", }, Debug output shows this entry tx_nc_tend -> 'cpum_cf'/'event=0x8d'/ Function pmu_aliases_parse() and pmu_add_cpu_aliases() both use __perf_pmu__new_alias() to add an alias to the PMU alias list. There is no check if an alias already exist So we end up with 2 entries for tx_nc_tend in the PMU alias list. Having set up the PMU alias list for this PMU now parse_events_multi_add_pmu() reads the complete alias list and adds each alias with parse_events_add_pmu() to the global perfev_list. This causes the alias to be added multiple times to the event list. Fix this by making __perf_pmu__new_alias() to merge alias definitions if an alias is already on the alias list. Also print a debug message when the alias has mismatches in some fields. Output before: [root@s35lp76 perf]# ./perf stat -e tx_nc_tend -v \ -- ~/mytesttx 1 >/tmp/111 tx_nc_tend: 1 551446 551446 Performance counter stats for '/root/mytesttx 1': 3 tx_nc_tend 0.000961134 seconds time elapsed [root@s35lp76 perf]# Output after: [root@s35lp76 perf]# ./perf stat -e tx_nc_tend -v \ -- ~/mytesttx 1 >/tmp/111 tx_nc_tend: 1 551446 551446 Performance counter stats for '/root/mytesttx 1': 1 tx_nc_tend 0.000961134 seconds time elapsed [root@s35lp76 perf]# Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180615101105.47047-3-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-25perf alias: Rebuild alias expression string to make it comparableThomas Richter
PMU alias definitions in sysfs files may have spaces, newlines and numbers with leading zeroes. Some alias definitions may also appear in JSON files without spaces, etc. Scan alias definitions and remove leading zeroes, spaces, newlines, etc and rebuild string to make alias->str member comparable. s390 for example has terms specified as event=0x0091 (read from files ../<PMU>/events/<FILE> and terms specified as event=0x91 (read from JSON files). Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180615101105.47047-2-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-25perf alias: Remove trailing newline when reading sysfs filesThomas Richter
Remove a trailing newline when reading sysfs file contents such as /sys/devices/cpum_cf/events/TX_NC_TEND. This shows when verbose option -v is used. Output before: tx_nc_tend -> 'cpum_cf'/'event=0x008d '/ Output after: tx_nc_tend -> 'cpum_cf'/'event=0x8d'/ Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180615101105.47047-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-25perf tools: Fix a clang 7.0 compilation errorYonghong Song
Arnaldo reported the perf build failure with latest llvm/clang compiler (7.0). $ make LIBCLANGLLVM=1 -C tools/perf/ <SNIP> CC /tmp/tmp.t53Qo38zci/tests/kmod-path.o util/c++/clang.cpp: In function ‘std::unique_ptr<llvm::SmallVectorImpl<char> > perf::getBPFObjectFromModule(llvm::Module*)’: util/c++/clang.cpp:150:43: error: no matching function for call to ‘llvm::TargetMachine::addPassesToEmitFile(llvm::legacy::PassManager&, llvm::raw_svector_ostream&, llvm::TargetMachine::CodeGenFileType)’ TargetMachine::CGFT_ObjectFile)) { ^ In file included from util/c++/clang.cpp:25:0: /usr/local/include/llvm/Target/TargetMachine.h:254:16: note: candidate: virtual bool llvm::TargetMachine::addPassesToEmitFile( llvm::legacy::PassManagerBase&, llvm::raw_pwrite_stream&, llvm::raw_pwrite_stream*, llvm::TargetMachine::CodeGenFileType, bool, llvm::MachineModuleInfo*) virtual bool addPassesToEmitFile(PassManagerBase &, raw_pwrite_stream &, ^~~~~~~~~~~~~~~~~~~ /usr/local/include/llvm/Target/TargetMachine.h:254:16: note: candidate expects 6 arguments, 3 provided mv: cannot stat '/tmp/tmp.t53Qo38zci/util/c++/.clang.o.tmp': No such file or directory make[7]: *** [/home/acme/git/perf/tools/build/Makefile.build:101: /tmp/tmp.t53Qo38zci/util/c++/clang.o] Error 1 make[6]: *** [/home/acme/git/perf/tools/build/Makefile.build:139: c++] Error 2 make[5]: *** [/home/acme/git/perf/tools/build/Makefile.build:139: util] Error 2 make[5]: *** Waiting for unfinished jobs.... CC /tmp/tmp.t53Qo38zci/tests/thread-map.o The function addPassesToEmitFile signature changed in llvm 7.0 and such a change caused the failure. This patch fixed the issue with using proper function signatures under different compiler versions. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Yonghong Song <yhs@fb.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20180616174739.1076733-1-yhs@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-25perf intel-pt: Fix packet decoding of CYC packetsAdrian Hunter
Use a 64-bit type so that the cycle count is not limited to 32-bits. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1528371002-8862-1-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-25perf record: Support s390 random socket_id assignmentThomas Richter
On s390 the socket identifier assigned to a CPU identifier is random and (depending on the configuration of the LPAR) may be higher than the CPU identifier. This is currently not supported. Fix this by allowing arbitrary socket identifiers being assigned to CPU id. Output before: [root@p23lp27 perf]# ./perf report --header -I -v ... socket_id number is too big.You may need to upgrade the perf tool. Error: The perf.data file has no samples! # ======== # captured on : Tue May 29 09:29:57 2018 # header version : 1 ... # Core ID and Socket ID information is not available ... [root@p23lp27 perf]# Output after: [root@p23lp27 perf]# ./perf report --header -I -v ... Error: The perf.data file has no samples! # ======== # captured on : Tue May 29 09:29:57 2018 # header version : 1 ... # CPU 0: Core ID 0, Socket ID 6 # CPU 1: Core ID 1, Socket ID 3 # CPU 2: Core ID -1, Socket ID -1 ... [root@p23lp27 perf]# Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180611073153.15592-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-24Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A pile of perf updates: Kernel side: - Remove an incorrect warning in uprobe_init_insn() when insn_get_length() fails. The error return code is handled at the call site. - Move the inline keyword to the right place in the perf ringbuffer code to address a W=1 build warning. Tooling: perf stat: - Fix metric column header display alignment - Improve error messages for default attributes, providing better output for error in command line. - Add --interval-clear option, to provide a 'watch' like printing perf script: - Show hw-cache events too perf c2c: - Fix data dependency problem in layout of 'struct c2c_hist_entry' Core: - Do not blindly assume that 'struct perf_evsel' can be obtained via a straight forward container_of() as there are call sites which hand in a plain 'struct hist' which is not part of a container. - Fix error index in the PMU event parser, so that error messages can point to the problematic token" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Move the inline keyword at the beginning of the function declaration uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn() perf script: Show hw-cache events perf c2c: Keep struct hist_entry at the end of struct c2c_hist_entry perf stat: Add event parsing error handling to add_default_attributes perf stat: Allow to specify specific metric column len perf stat: Fix metric column header display alignment perf stat: Use only color_fprintf call in print_metric_only perf stat: Add --interval-clear option perf tools: Fix error index for pmu event parser perf hists: Reimplement hists__has_callchains() perf hists browser gtk: Use hist_entry__has_callchains() perf hists: Make hist_entry__has_callchains() work with 'perf c2c' perf hists: Save the callchain_size in struct hist_entry
2018-06-15docs: Fix some broken referencesMauro Carvalho Chehab
As we move stuff around, some doc references are broken. Fix some of them via this script: ./scripts/documentation-file-ref-check --fix Manually checked if the produced result is valid, removing a few false-positives. Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>