summaryrefslogtreecommitdiff
path: root/tools/perf/Documentation
AgeCommit message (Collapse)Author
2021-02-08perf stat: Support L2 Topdown eventsKan Liang
The TMA method level 2 metrics is supported from the Intel Sapphire Rapids server, which expose four L2 Topdown metrics events to user space. There are eight L2 events in total. The other four L2 Topdown metrics events are calculated from the corresponding L1 and the exposed L2 events. Now, the --topdown prints the complete top-down metrics that supported by the CPU. For the Intel Sapphire Rapids server, there are 4 L1 events and 8 L2 events displyed in one line. Add a new option, --td-level, to display the top-down statistics that equal to or lower than the input level. The L2 event is marked only when both its L1 parent event and itself crosse the threshold. Here is an example: $ perf stat --topdown --td-level=2 --no-metric-only sleep 1 Topdown accuracy may decrease when measuring long periods. Please print the result regularly, e.g. -I1000 Performance counter stats for 'sleep 1': 16,734,390 slots 2,100,001 topdown-retiring # 12.6% retiring 2,034,376 topdown-bad-spec # 12.3% bad speculation 4,003,128 topdown-fe-bound # 24.1% frontend bound 328,125 topdown-heavy-ops # 2.0% heavy operations # 10.6% light operations 1,968,751 topdown-br-mispredict # 11.9% branch mispredict # 0.4% machine clears 2,953,127 topdown-fetch-lat # 17.8% fetch latency # 6.3% fetch bandwidth 5,906,255 topdown-mem-bound # 35.6% memory bound # 15.4% core bound Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-9-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08perf report: Support instruction latencyKan Liang
The instruction latency information can be recorded on some platforms, e.g., the Intel Sapphire Rapids server. With both memory latency (weight) and the new instruction latency information, users can easily locate the expensive load instructions, and also understand the time spent in different stages. The users can optimize their applications in different pipeline stages. The 'weight' field is shared among different architectures. Reusing the 'weight' field may impacts other architectures. Add a new field to store the instruction latency. Like the 'weight' support, introduce a 'ins_lat' for the global instruction latency, and a 'local_ins_lat' for the local instruction latency version. Add new sort functions, INSTR Latency and Local INSTR Latency, accordingly. Add local_ins_lat to the default_mem_sort_order[]. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-7-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08perf tools: Support data block and addr blockKan Liang
Two new data source fields, to indicate the block reasons of a load instruction, are introduced on the Intel Sapphire Rapids server. The fields can be used by the memory profiling. Add a new sort function, SORT_MEM_BLOCKED, for the two fields. For the previous platforms or the block reason is unknown, print "N/A" for the block reason. Add blocked as a default mem sort key for perf report and perf mem report. Committer testing: So in machines without this capability we get a "N/A" filling the new "Blocked" column: $ perf mem record ls arch certs CREDITS Documentation include ipc Kconfig lib MAINTAINERS mm samples security usr block COPYING crypto drivers fs init Kbuild kernel LICENSES Makefile net README scripts sound tools virt [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ] $ $ perf mem report --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 6 of event 'cpu/mem-loads,ldlat=30/Pu' # Total weight : 1381 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked # ........ ....... ............ .................... ....................... ............. ...................... ............ ..... ............ ...... ....... # 32.87% 1 454 Local RAM or RAM hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91cef3078 libc-2.31.so Hit L1 or L2 hit No N/A 25.56% 1 353 LFB or LFB hit [.] strcmp ld-2.31.so [.] 0x00005586973855ca ls None L1 or L2 hit No N/A 22.59% 1 312 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0e3b18 ld.so.cache None L1 or L2 hit No N/A 8.47% 1 117 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceee570 libc-2.31.so None L1 or L2 hit No N/A 6.88% 1 95 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceed490 libc-2.31.so None L1 or L2 hit No N/A 3.62% 1 50 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0ebe60 ld.so.cache None L1 or L2 hit No N/A # Samples: 11 of event 'cpu/mem-stores/Pu' # Total weight : 11 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked # ........ ....... ............ ............. ....................... ............. ...................... ........... ..... .......... ...... ....... # 9.09% 1 0 L1 hit [.] __strcoll_l libc-2.31.so [.] 0x00007fffe5648fc8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_lookup_symbol_x ld-2.31.so [.] 0x00007fffe56490b8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_name_match_p ld-2.31.so [.] 0x00007fffe56487d8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_start ld-2.31.so [.] start_time+0x0 ld-2.31.so N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] _dl_sysdep_start ld-2.31.so [.] 0x00007fffe56494b8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5648ff8 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5649064 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5649130 [stack] N/A N/A N/A N/A 9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] _rtld_global+0xaf8 ld-2.31.so N/A N/A N/A N/A 9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] _rtld_global+0xc28 ld-2.31.so N/A N/A N/A N/A 9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] 0x00007fffe56495b8 [stack] N/A N/A N/A N/A # (Tip: Show user configuration overrides: perf config --user --list) $ Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-4-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03perf script: Support DSO filter like in other perf toolsJin Yao
Other perf tool builtins already supported a DSO filter. For example: $ perf report --dsos a,b,c which only considers symbols in these dsos. Now the DSO filter is supported in 'perf script': root@kbl-ppc:~# ./perf script --dsos "[kernel.kallsyms]" perf 18123 [000] 6142863.075104: 1 cycles: ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms]) perf 18123 [000] 6142863.075107: 1 cycles: ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms]) perf 18123 [000] 6142863.075108: 10 cycles: ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms]) perf 18123 [000] 6142863.075109: 273 cycles: ffffffff9ca7730a native_write_msr+0xa ([kernel.kallsyms]) perf 18123 [000] 6142863.075110: 7684 cycles: ffffffff9ca3c9c0 native_sched_clock+0x50 ([kernel.kallsyms]) perf 18123 [000] 6142863.075112: 213017 cycles: ffffffff9d765a92 syscall_exit_to_user_mode+0x32 ([kernel.kallsyms]) perf 18123 [001] 6142863.075156: 1 cycles: ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms]) perf 18123 [001] 6142863.075158: 1 cycles: ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms]) perf 18123 [001] 6142863.075159: 17 cycles: ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms]) Committer testing: $ perf script ls 2364888 29303.010949: 1 cycles:u: ffffffffa4bbc6a9 [unknown] ([unknown]) ls 2364888 29303.010957: 1 cycles:u: ffffffffa429ef48 [unknown] ([unknown]) ls 2364888 29303.010961: 1 cycles:u: ffffffffa4260133 [unknown] ([unknown]) ls 2364888 29303.010964: 5 cycles:u: ffffffffa429efad [unknown] ([unknown]) ls 2364888 29303.010967: 41 cycles:u: ffffffffa42a4586 [unknown] ([unknown]) ls 2364888 29303.010972: 435 cycles:u: ffffffffa429efe0 [unknown] ([unknown]) ls 2364888 29303.010978: 5142 cycles:u: 7f9b95bc2abf __GI___tunables_init+0x11f (/usr/lib64/ld-2.32.so) ls 2364888 29303.011006: 38551 cycles:u: ffffffffa4290f61 [unknown] ([unknown]) ls 2364888 29303.011486: 238234 cycles:u: 7f9b95bb7741 _dl_relocate_object+0xa71 (/usr/lib64/ld-2.32.so) ls 2364888 29303.011937: 415870 cycles:u: 7f9b95a1c80e __strcoll_l+0xe (/usr/lib64/libc-2.32.so) $ Before: $ perf script --dsos /usr/lib64/libc-2.32.so |& head -5 Error: unknown option `dsos' Usage: perf script [<options>] or: perf script [<options>] record <script> [<record-options>] <command> or: perf script [<options>] report <script> [script-args] $ After: $ perf script --dsos /usr/lib64/libc-2.32.so ls 2364888 29303.011937: 415870 cycles:u: 7f9b95a1c80e __strcoll_l+0xe (/usr/lib64/libc-2.32.so) $ Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210124232750.19170-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf tools: Add 'ping' control commandJiri Olsa
Add a control 'ping' command to detect if perf is up and its control interface is operational. It will be used in following daemon patches to synchronize with record session - when control interface is up and running, we know that perf record is monitoring and ready to receive signals. Example session: terminal 1: # mkfifo control ack # perf record --control=fifo:control,ack terminal 2: # echo ping > control # cat ack ack Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201226232038.390883-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf tools: Add 'stop' control commandJiri Olsa
Adding control 'stop' command to stop perf record. When it is received, perf will set the 'done' variable to 1 to stop its mmap ring buffer reading loop. Example session: terminal 1: # mkfifo control ack # perf record --control=fifo:control,ack terminal 2: # echo stop > control terminal 1: [ perf record: Woken up 7 times to write data ] [ perf record: Captured and wrote 3.214 MB perf.data (38280 samples) ] # Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201226232038.390883-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf tools: Add 'evlist' control commandJiri Olsa
Add a new 'evlist' control command to display all the evlist events. When it is received, perf will scan and print current evlist into perf record terminal. The interface string for control file is: evlist [-v|-g|-F] The syntax follows perf evlist command: -F Show just the sample frequency used for each event. -v Show all fields. -g Show event group information. Example session: terminal 1: # mkfifo control ack # perf record --control=fifo:control,ack -e '{cycles,instructions}' terminal 2: # echo evlist > control terminal 1: cycles instructions dummy:HG terminal 2: # echo 'evlist -v' > control terminal 1: cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: \ IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1, \ sample_id_all: 1, exclude_guest: 1 instructions: size: 120, config: 0x1, { sample_period, sample_freq }: 4000, \ sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, inherit: 1, freq: 1, \ sample_id_all: 1, exclude_guest: 1 dummy:HG: type: 1, size: 120, config: 0x9, { sample_period, sample_freq }: 4000, \ sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, inherit: 1, mmap: 1, \ comm: 1, freq: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, \ bpf_event: 1 terminal 2: # echo 'evlist -g' > control terminal 1: {cycles,instructions} dummy:HG terminal 2: # echo 'evlist -F' > control terminal 1: cycles: sample_freq=4000 instructions: sample_freq=4000 dummy:HG: sample_freq=4000 This new evlist command is handy to get real event names when wildcards are used. Adding evsel_fprintf.c object to python/perf.so build, because it's now evlist.c dependency. Adding PYTHON_PERF define for python/perf.so compilation, so we can use it to compile in only evsel__fprintf from evsel_fprintf.c object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201226232038.390883-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf tools: Allow to enable/disable events via control fileJiri Olsa
Adding new control events to enable/disable specific event. The interface string for control file are: 'enable <EVENT NAME>' 'disable <EVENT NAME>' when received the command, perf will scan the current evlist for <EVENT NAME> and if found it's enabled/disabled. Example session: terminal 1: # mkfifo control ack perf.pipe # perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:*' -o - > perf.pipe terminal 2: # cat perf.pipe | perf --no-pager script -i - terminal 1: Events disabled NOTE Above message will show only after read side of the pipe ('>') is started on 'terminal 2'. The 'terminal 1's bash does not execute perf before that, hence the delyaed perf record message. terminal 3: # echo 'enable sched:sched_process_fork' > control terminal 1: event sched:sched_process_fork enabled terminal 2: bash 33349 [034] 149587.674295: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34056 bash 33349 [034] 149588.239521: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34057 terminal 3: # echo 'enable sched:sched_wakeup_new' > control terminal 1: event sched:sched_wakeup_new enabled terminal 2: bash 33349 [034] 149632.228023: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34059 bash 33349 [034] 149632.228050: sched:sched_wakeup_new: bash:34059 [120] success=1 CPU:036 bash 33349 [034] 149633.950005: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34060 bash 33349 [034] 149633.950030: sched:sched_wakeup_new: bash:34060 [120] success=1 CPU:036 Committer testing: If I use 'sched:*' and then enable all events, I can't get 'perf record' to react to further commands, so I tested it with: [root@five ~]# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:sched_process_*' -o - > perf.pipe Events disabled Events enabled Events disabled And then it works as expected, so we need to fix this pre-existing problem. Another issue, we need to check if a event is already enabled or disabled and change the message to be clearer, i.e.: [root@five ~]# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:sched_process_*' -o - > perf.pipe Events disabled If we receive a 'disable' command, then it should say: [root@five ~]# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:sched_process_*' -o - > perf.pipe Events disabled Events already disabled Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201226232038.390883-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf report: Add support for PERF_SAMPLE_CODE_PAGE_SIZEStephane Eranian
Add a new sort dimension "code_page_size" for common sort. With this option applied, perf can sort and report by sample's code page size. For example: # perf report --stdio --sort=comm,symbol,code_page_size # To display the perf.data header info, please use # --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 3K of event 'mem-loads:uP' # Event count (approx.): 1470769 # # Overhead Command Symbol Code Page Size IPC [IPC Coverage] # ........ ....... ............................ .............. .................... # 69.56% dtlb [.] GetTickCount 4K - - 17.93% dtlb [.] Calibrate 4K - - 11.40% dtlb [.] __gettimeofday 4K - - # Signed-off-by: Stephane Eranian <eranian@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210105195752.43489-6-kan.liang@linux.intel.com Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf script: Add support for PERF_SAMPLE_CODE_PAGE_SIZEStephane Eranian
Display sampled code page sizes when PERF_SAMPLE_CODE_PAGE_SIZE was set. For example: # perf script --fields comm,event,ip,code_page_size dtlb mem-loads:uP: 445777 4K dtlb mem-loads:uP: 40f724 4K dtlb mem-loads:uP: 474926 4K dtlb mem-loads:uP: 401075 4K dtlb mem-loads:uP: 401095 4K dtlb mem-loads:uP: 401095 4K dtlb mem-loads:uP: 4010cc 4K dtlb mem-loads:uP: 440b6f 4K # Signed-off-by: Stephane Eranian <eranian@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210105195752.43489-5-kan.liang@linux.intel.com Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf record: Add support for PERF_SAMPLE_CODE_PAGE_SIZEKan Liang
Adds the infrastructure to sample the code address page size. Introduce a new --code-page-size option for perf record. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Originally-by: Stephane Eranian <eranian@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210105195752.43489-4-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf mem: Support data page sizeKan Liang
Add option --data-page-size in "perf mem" to record/report data page size. Here are some examples: # perf mem --phys-data --data-page-size report -D # PID, TID, IP, ADDR, PHYS ADDR, DATA PAGE SIZE, LOCAL WEIGHT, DSRC, SYMBOL 20134 20134 0xffffffffb5bd2fd0 0x016ffff9a274e96a308 0x000000044e96a308 4K 1168 0x5080144 /lib/modules/4.18.0-rc7+/build/vmlinux:perf_ctx_unlock 20134 20134 0xffffffffb63f645c 0xffffffffb752b814 0xcfb52b814 2M 225 0x26a100142 /lib/modules/4.18.0-rc7+/build/vmlinux:_raw_spin_lock 20134 20134 0xffffffffb660300c 0xfffffe00016b8bb0 0x0 4K 0 0x5080144 /lib/modules/4.18.0-rc7+/build/vmlinux:__x86_indirect_thunk_rax # # perf mem --phys-data --data-page-size report --stdio # To display the perf.data header info, please use # --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 5K of event 'cpu/mem-loads,ldlat=30/P' # Total weight : 281234 # Sort order : # mem,sym,dso,symbol_daddr,dso_daddr,tlb,locked,phys_daddr,data_page_size # # Overhead Samples Memory access Symbol Shared Object Data Symbol Data Object TLB access Locked Data Physical Address Data Page Size # ........ ....... ............. ............................ ................ ...................... ........... ............ ...... ...................... .............. 28.54% 1826 L1 or L1 hit [k] __x86_indirect_thunk_rax [kernel.vmlinux] [k] 0xffffb0df31b0ff28 [unknown] L1 or L2 hit No [k] 0x0000000000000000 4K 6.02% 256 L1 or L1 hit [.] touch_buffer dtlb [.] 0x00007ffd50109da8 [stack] L1 or L2 hit No [.] 0x000000042454ada8 4K 3.23% 5 L1 or L1 hit [k] clear_huge_page [kernel.vmlinux] [k] 0xffff9a2753b8ce60 [unknown] L1 or L2 hit No [k] 0x0000000453b8ce60 2M 2.98% 4 L1 or L1 hit [k] clear_page_erms [kernel.vmlinux] [k] 0xffffb0df31b0fd00 [unknown] L1 or L2 hit No [k] 0x0000000000000000 4K Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210105195752.43489-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20perf stat: Enable counting events for BPF programsSong Liu
Introduce 'perf stat -b' option, which counts events for BPF programs, like: [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000 1.487903822 115,200 ref-cycles 1.487903822 86,012 cycles 2.489147029 80,560 ref-cycles 2.489147029 73,784 cycles 3.490341825 60,720 ref-cycles 3.490341825 37,797 cycles 4.491540887 37,120 ref-cycles 4.491540887 31,963 cycles The example above counts 'cycles' and 'ref-cycles' of BPF program of id 254. This is similar to bpftool-prog-profile command, but more flexible. 'perf stat -b' creates per-cpu perf_event and loads fentry/fexit BPF programs (monitor-progs) to the target BPF program (target-prog). The monitor-progs read perf_event before and after the target-prog, and aggregate the difference in a BPF map. Then the user space reads data from these maps. A new 'struct bpf_counter' is introduced to provide a common interface that uses BPF programs/maps to count perf events. Committer notes: Removed all but bpf_counter.h includes from evsel.h, not needed at all. Also BPF map lookups for PERCPU_ARRAYs need to have as its value receive buffer passed to the kernel libbpf_num_possible_cpus() entries, not evsel__nr_cpus(evsel), as the former uses /sys/devices/system/cpu/possible while the later uses /sys/devices/system/cpu/online, which may be less than the 'possible' number making the bpf map lookup overwrite memory and cause hard to debug memory corruption. We need to continue using evsel__nr_cpus(evsel) when accessing the perf_counts array tho, not to overwrite another are of memory :-) Signed-off-by: Song Liu <songliubraving@fb.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/lkml/20210120163031.GU12699@kernel.org/ Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@fb.com Link: http://lore.kernel.org/lkml/20201229214214.3413833-4-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28perf buildid-cache: Add --debuginfod option to specify a server to fetch ↵Jiri Olsa
debug files Add the --debuginfod option to specify debuginfod URL and support to do that through config file as well. Use the following in ~/.perfconfig file: [buildid-cache] debuginfod=http://192.168.122.174:8002 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201214105457.543111-14-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28perf record: Add --buildid-mmap option to enable PERF_RECORD_MMAP2's build idJiri Olsa
Add --buildid-mmap option to enable build id in PERF_RECORD_MMAP2 events. It will only work if there's kernel support for that and it disables build id cache (implies --no-buildid). It's also possible to enable it permanently via config option in ~/.perfconfig file: [record] build-id=mmap Also added build_id bit in the verbose output for perf_event_attr: # perf record --buildid-mmap -vv ... perf_event_attr: type 1 size 120 ... build_id 1 Adding also missing text_poke bit. Committer testing: $ perf record -h build Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -B, --no-buildid do not collect buildids in perf.data -N, --no-buildid-cache do not update the buildid cache --buildid-all Record build-id of all DSOs regardless of hits --buildid-mmap Record build-id in map events $ $ perf record --buildid-mmap sleep 1 Failed: no support to record build id in mmap events, update your kernel. $ After adding the needed kernel bits in a test kernel: $ perf record -vv --buildid-mmap sleep 1 |& grep -m1 build Enabling build id in mmap2 events. $ perf evlist -v cycles:u: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, build_id: 1 $ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201214105457.543111-16-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-19perf sort: Add sort option for data page sizeKan Liang
Add a new sort option "data_page_size" for --mem-mode sort. With this option applied, perf can sort and report by sample's data page size. Here is an example: perf report --stdio --mem-mode --sort=comm,symbol,phys_daddr,data_page_size # To display the perf.data header info, please use # --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 9K of event 'mem-loads:uP' # Total weight : 9028 # Sort order : comm,symbol,phys_daddr,data_page_size # # Overhead Command Symbol Data Physical # Address # Data Page Size # ........ ....... ............................ # ...................... ...................... # 11.19% dtlb [.] touch_buffer [.] 0x00000003fec82ea8 4K 8.61% dtlb [.] GetTickCount [.] 0x00000003c4f2c8a8 4K 4.52% dtlb [.] GetTickCount [.] 0x00000003fec82f58 4K 4.33% dtlb [.] __gettimeofday [.] 0x00000003fec82f48 4K 4.32% dtlb [.] GetTickCount [.] 0x00000003fec82f78 4K 4.28% dtlb [.] GetTickCount [.] 0x00000003fec82f50 4K 4.23% dtlb [.] GetTickCount [.] 0x00000003fec82f70 4K 4.11% dtlb [.] GetTickCount [.] 0x00000003fec82f68 4K 4.00% dtlb [.] Calibrate [.] 0x00000003fec82f98 4K 3.91% dtlb [.] Calibrate [.] 0x00000003fec82f90 4K 3.43% dtlb [.] touch_buffer [.] 0x00000003fec82e98 4K 3.42% dtlb [.] touch_buffer [.] 0x00000003fec82e90 4K 0.09% dtlb [.] DoDependentLoads [.] 0x000000036ea084c0 2M 0.08% dtlb [.] DoDependentLoads [.] 0x000000032b010b80 2M Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Link: http://lore.kernel.org/lkml/20201216185805.9981-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-19perf script: Support data page sizeKan Liang
Display the data page size if it is available and asked by the user: Can be configured by the user, for example: perf script --fields comm,event,phys_addr,data_page_size dtlb mem-loads:uP: 3fec82ea8 4K dtlb mem-loads:uP: 3fec82e90 4K dtlb mem-loads:uP: 3e23700a4 4K dtlb mem-loads:uP: 3fec82f20 4K dtlb mem-loads:uP: 3e23700a4 4K dtlb mem-loads:uP: 3b4211bec 4K dtlb mem-loads:uP: 382205dc0 2M dtlb mem-loads:uP: 36fa082c0 2M dtlb mem-loads:uP: 377607340 2M dtlb mem-loads:uP: 330010180 2M dtlb mem-loads:uP: 33200fd80 2M dtlb mem-loads:uP: 31b012b80 2M Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Link: http://lore.kernel.org/lkml/20201216185805.9981-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17perf tools: Reformat record's control fd man textJiri Olsa
Adding available control commands in separate paragraph, so it's more readable and easier to add new commands. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201216083914.47215-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17perf config: Fix example command in manpage to conform to syntax specified ↵Nick Thompson
in the SYNOPSIS section. Committer testing: With the previously documented example: $ perf config --user report sort-order=srcline The config variable does not contain a section name: report $ With the fixed example line: $ perf config --user report.sort-order=srcline $ perf config --user report.sort-order report.sort-order=srcline $ Signed-off-by: Nick Thompson <nathompson7@protonmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/linux-perf-users/20201217142619.GA14524@redhat.com/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17perf record: Support new sample type for data page sizeKan Liang
Support new sample type PERF_SAMPLE_DATA_PAGE_SIZE for page size. Add new option --data-page-size to record sample data page size. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Link: http://lore.kernel.org/lkml/20201130172803.2676-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11perf auxtrace: Add itrace option '-M' for memory eventsLeo Yan
This patch is to add itrace option '-M' to synthesize memory event. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201106094853.21082-7-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf stat: Add --quiet optionAndi Kleen
Add a new --quiet option to 'perf stat'. This is useful with 'perf stat record' to write the data only to the perf.data file, which can lower measurement overhead because the data doesn't need to be formatted. On my 4C desktop: % time ./perf stat record -e $(python -c 'print ",".join(["cycles"]*1000)') -a -I 1000 sleep 5 ... real 0m5.377s user 0m0.238s sys 0m0.452s % time ./perf stat record --quiet -e $(python -c 'print ",".join(["cycles"]*1000)') -a -I 1000 sleep 5 real 0m5.452s user 0m0.183s sys 0m0.423s In this example it cuts the user time by 20%. On systems with more cores the savings are higher. Signed-off-by: Andi Kleen <andi@firstfloor.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Link: http://lore.kernel.org/lkml/20201027002737.30942-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04perf stat: Support regex pattern in --for-each-cgroupNamhyung Kim
To make the command line even more compact with cgroups, support regex pattern matching in cgroup names. $ perf stat -a -e cpu-clock,cycles --for-each-cgroup ^foo sleep 1 3,000.73 msec cpu-clock foo # 2.998 CPUs utilized 12,530,992,699 cycles foo # 7.517 GHz (100.00%) 1,000.61 msec cpu-clock foo/bar # 1.000 CPUs utilized 4,178,529,579 cycles foo/bar # 2.506 GHz (100.00%) 1,000.03 msec cpu-clock foo/baz # 0.999 CPUs utilized 4,176,104,315 cycles foo/baz # 2.505 GHz (100.00%) 1.000892614 seconds time elapsed Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201027072855.655449-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-15perf c2c: Update documentation for metrics reorganizationLeo Yan
The output format for metrics has been reorganized, update documentation to reflect the changes for it. Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Al Grant <al.grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20201015144548.18482-10-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14perf diff: Support hot streams comparisonJin Yao
This patch enables perf-diff with "--stream" option. "--stream": Enable hot streams comparison Now let's see example. perf record -b ... Generate perf.data.old with branch data perf record -b ... Generate perf.data with branch data perf diff --stream [ Matched hot streams ] hot chain pair 1: cycles: 1, hits: 27.77% cycles: 1, hits: 9.24% --------------------------- -------------------------- main div.c:39 main div.c:39 main div.c:44 main div.c:44 hot chain pair 2: cycles: 34, hits: 20.06% cycles: 27, hits: 16.98% --------------------------- -------------------------- __random_r random_r.c:360 __random_r random_r.c:360 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:380 __random_r random_r.c:380 __random_r random_r.c:357 __random_r random_r.c:357 __random random.c:293 __random random.c:293 __random random.c:293 __random random.c:293 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:288 __random random.c:288 rand rand.c:27 rand rand.c:27 rand rand.c:26 rand rand.c:26 rand@plt rand@plt rand@plt rand@plt compute_flag div.c:25 compute_flag div.c:25 compute_flag div.c:22 compute_flag div.c:22 main div.c:40 main div.c:40 main div.c:40 main div.c:40 main div.c:39 main div.c:39 hot chain pair 3: cycles: 9, hits: 4.48% cycles: 6, hits: 4.51% --------------------------- -------------------------- __random_r random_r.c:360 __random_r random_r.c:360 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:380 __random_r random_r.c:380 [ Hot streams in old perf data only ] hot chain 1: cycles: 18, hits: 6.75% -------------------------- __random_r random_r.c:360 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:380 __random_r random_r.c:357 __random random.c:293 __random random.c:293 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:288 rand rand.c:27 rand rand.c:26 rand@plt rand@plt compute_flag div.c:25 compute_flag div.c:22 main div.c:40 hot chain 2: cycles: 29, hits: 2.78% -------------------------- compute_flag div.c:22 main div.c:40 main div.c:40 main div.c:39 [ Hot streams in new perf data only ] hot chain 1: cycles: 4, hits: 4.54% -------------------------- main div.c:42 compute_flag div.c:28 hot chain 2: cycles: 5, hits: 3.51% -------------------------- main div.c:39 main div.c:44 main div.c:42 compute_flag div.c:28 Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20201009022845.13141-8-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14perf intel-pt: Improve PT documentation slightlyAndi Kleen
Document the higher level --insn-trace etc. perf script options. Include the howto how to build xed into the manpage Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20201014035346.4772-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14perf tools: Add support for exclusive groups/eventsAndi Kleen
Peter suggested that using the exclusive mode in perf could avoid some problems with bad scheduling of groups. Exclusive is implemented in the kernel, but wasn't exposed by the perf tool, so hard to use without custom low level API users. Add support for marking groups or events with :e for exclusive in the perf tool. The implementation is basically the same as the existing pinned attribute. Committer testing: # perf test "parse event" 6: Parse event definition strings : Ok # perf test -v "parse event" |& grep :u*e running test 56 'instructions:uep' running test 57 '{cycles,cache-misses,branch-misses}:e' # # # grep "model name" -m1 /proc/cpuinfo model name : AMD Ryzen 9 3900X 12-Core Processor # # perf stat -a -e '{cycles,cache-misses,branch-misses}:e' sleep 1 Performance counter stats for 'system wide': <not counted> cycles (0.00%) <not counted> cache-misses (0.00%) <not counted> branch-misses (0.00%) 1.001269893 seconds time elapsed Some events weren't counted. Try disabling the NMI watchdog: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog # echo 0 > /proc/sys/kernel/nmi_watchdog # perf stat -a -e '{cycles,cache-misses,branch-misses}:e' sleep 1 Performance counter stats for 'system wide': 1,298,663,141 cycles 30,962,215 cache-misses 5,325,150 branch-misses 1.001474934 seconds time elapsed # # The output for asking for precise events on AMD needs to improve, it # supposedly works only for system wide or per CPU # # perf stat -a -e '{cycles,cache-misses,branch-misses}:uep' sleep 1 Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles). /bin/dmesg | grep -i perf may provide additional information. # perf stat -a -e '{cycles,cache-misses,branch-misses}:ue' sleep 1 Performance counter stats for 'system wide': 746,363,126 cycles 16,881,611 cache-misses 2,871,259 branch-misses 1.001636066 seconds time elapsed # Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20201014144255.22699-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-13perf inject: Add --buildid-all optionNamhyung Kim
Like 'perf record', we can even more speedup build-id processing by just using all DSOs. Then we don't need to look at all the sample events anymore. The following patch will update 'perf bench' to show the result of the --buildid-all option too. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Original-patch-by: Stephane Eranian <eranian@google.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201012070214.2074921-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-28perf stat: Add --for-each-cgroup optionNamhyung Kim
The --for-each-cgroup option is a syntax sugar to monitor large number of cgroups easily. Current command line requires to list all the events and cgroups even if users want to monitor same events for each cgroup. This patch addresses that usage by copying given events for each cgroup on user's behalf. For instance, if they want to monitor 6 events for 200 cgroups each they should write 1200 event names (with -e) AND 1200 cgroup names (with -G) on the command line. But with this change, they can just specify 6 events and 200 cgroups with a new option. A simpler example below: It wants to measure 3 events for 2 cgroups ('A' and 'B'). The result is that total 6 events are counted like below. $ perf stat -a -e cpu-clock,cycles,instructions --for-each-cgroup A,B sleep 1 Performance counter stats for 'system wide': 988.18 msec cpu-clock A # 0.987 CPUs utilized 3,153,761,702 cycles A # 3.200 GHz (100.00%) 8,067,769,847 instructions A # 2.57 insn per cycle (100.00%) 982.71 msec cpu-clock B # 0.982 CPUs utilized 3,136,093,298 cycles B # 3.182 GHz (99.99%) 8,109,619,327 instructions B # 2.58 insn per cycle (99.99%) 1.001228054 seconds time elapsed Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200924124455.336326-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-17perf docs: Improve help information in perf.txtZejiang Tang
perf has many undocumented options, such as:-vv, --exec-path, --html-path, -p, --paginate,--no-pager, --debugfs-dir, --buildid-dir, --list-cmds, --list-opts. Add entris for these options in perf.txt. Signed-off-by: Zejiang Tang <tangzejiang@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/1599645194-8438-1-git-send-email-tangzejiang@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-17perf tools: Add documentation for topdown metricsAndi Kleen
Add some documentation how to use the topdown metrics in ring 3. Signed-off-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200911144808.27603-5-kan.liang@linux.intel.com Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-17perf stat: Support new per thread TopDown metricsAndi Kleen
Icelake has support for reporting per thread TopDown metrics. These are reported differently than the previous TopDown support, each metric is standalone, but scaled to pipeline "slots". We don't need to do anything special for HyperThreading anymore. Teach perf stat --topdown to handle these new metrics and print them in the same way as the previous TopDown metrics. The restrictions of only being able to report information per core is gone. Signed-off-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200911144808.27603-4-kan.liang@linux.intel.com Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-04perf: ftrace: Add filter support for option -F/--funcsChangbin Du
Same as 'perf probe -F', this patch adds filter support for the ftrace subcommand option '-F, --funcs <[FILTER]>'. Here is an example that only lists functions which start with 'vfs_': $ sudo perf ftrace -F vfs_* vfs_fadvise vfs_fallocate vfs_truncate vfs_open vfs_setpos vfs_llseek vfs_readf vfs_writef ... Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Changbin Du <changbin.du@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200904152357.6053-1-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-04perf intel-pt: Document snapshot control commandAdrian Hunter
The documentation describes snapshot mode. Update it to include the new snapshot control command. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200901093758.32293-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-04perf annotate: Allow configuring the 'disassembler_style' knob via 'perf config'Arnaldo Carvalho de Melo
# perf annotate --stdio2 acpi_processor_ffh_cstate_enter > default # perf config annotate.disassembler_style=intel # perf config annotate.disassembler_style annotate.disassembler_style=intel # perf annotate --stdio2 acpi_processor_ffh_cstate_enter > intel # diff -u default intel --- default 2020-09-04 13:09:26.019205732 -0300 +++ intel 2020-09-04 13:09:52.823795081 -0300 @@ -1,42 +1,42 @@ Samples: 1K of event 'cycles', 4000 Hz, Event count (approx.): 990065316, [percent: local period] acpi_processor_ffh_cstate_enter() /lib/modules/5.9.0-rc3/build/vmlinux -Percent → callq __fentry__ - mov cpu_number,%edx - mov %edx,%edx - mov cpu_cstate_entry,%rax - add -0x7dbe9700(,%rdx,8),%rax - movzbl 0x9(%rdi),%edx - mov 0x4(%rax,%rdx,8),%edi - mov (%rax,%rdx,8),%esi - → jmpq 137ccc6 - 2d: → jmpq 137ccd8 +Percent → call __fentry__ + mov edx,DWORD PTR gs:[rip+0x7e541d74] + mov edx,edx + mov rax,QWORD PTR [rip+0x152b8fb] + add rax,QWORD PTR [rdx*8-0x7dbe9700] + movzx edx,BYTE PTR [rdi+0x9] + mov edi,DWORD PTR [rax+rdx*8+0x4] + mov esi,DWORD PTR [rax+rdx*8] + → jmp 137ccc6 + 2d: → jmp 137ccd8 mfence - mov %gs:0x17bc0,%rax - clflush (%rax) + mov rax,QWORD PTR gs:0x17bc0 + clflush BYTE PTR [rax] mfence - xor %edx,%edx - mov %rdx,%rcx - mov %gs:0x17bc0,%rax - 0.00 monitor %rax,%ecx,%edx - mov (%rax),%rax - test $0x8,%al + xor edx,edx + mov rcx,rdx + mov rax,QWORD PTR gs:0x17bc0 + 0.00 monitor + mov rax,QWORD PTR [rax] + test al,0x8 ↓ jne 71 - ↓ jmpq 68 - verw 0x538b08(%rip) # ffffffff82008150 <ds.0> - 68: mov %rsi,%rax - mov %rdi,%rcx -100.00 mwait %eax,%ecx - 71: mov %gs:0x17bc0,%rax - lock andb $0xdf,0x2(%rax) - lock addl $0x0,-0x4(%rsp) - mov (%rax),%rax - test $0x8,%al + ↓ jmp 68 + verw WORD PTR [rip+0x538b08] # ffffffff82008150 <ds.0> + 68: mov rax,rsi + mov rcx,rdi +100.00 mwait + 71: mov rax,QWORD PTR gs:0x17bc0 + lock and BYTE PTR [rax+0x2],0xdf + lock add DWORD PTR [rsp-0x4],0x0 + mov rax,QWORD PTR [rax] + test al,0x8 ↓ je 97 - andl $0x7fffffff,__preempt_count - 97: ← retq - mov %gs:0x17bc0,%rax - lock orb $0x20,0x2(%rax) - mov (%rax),%rax - test $0x8,%al + and DWORD PTR gs:[rip+0x7e548509],0x7fffffff + 97: ret + mov rax,QWORD PTR gs:0x17bc0 + lock or BYTE PTR [rax+0x2],0x20 + mov rax,QWORD PTR [rax] + test al,0x8 ↑ jne 71 - ↑ jmpq 2d + ↑ jmp 2d # Requested-by: Matt P. Dziubinski <matdzb@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-04perf record: Add 'snapshot' control commandAdrian Hunter
Add 'snapshot' control command to create an AUX area tracing snapshot the same as if sending SIGUSR2. The advantage of the FIFO is that access is governed by access to the FIFO. Example: $ mkfifo perf.control $ mkfifo perf.ack $ cat perf.ack & [1] 15235 $ sudo ~/bin/perf record --control fifo:perf.control,perf.ack -S -e intel_pt//u -- sleep 60 & [2] 15243 $ ps -e | grep perf 15244 pts/1 00:00:00 perf $ kill -USR2 15244 bash: kill: (15244) - Operation not permitted $ echo snapshot > perf.control ack $ Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200901093758.32293-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-04perf tools: Add FIFO file names as alternative options to --controlAdrian Hunter
Enable the --control option to accept file names as an alternative to file descriptors. Example: $ mkfifo perf.control $ mkfifo perf.ack $ cat perf.ack & [1] 6808 $ perf record --control fifo:perf.control,perf.ack -- sleep 300 & [2] 6810 $ echo disable > perf.control $ Events disabled ack $ echo enable > perf.control $ Events enabled ack $ echo disable > perf.control $ Events disabled ack $ kill %2 [ perf record: Woken up 4 times to write data ] $ [ perf record: Captured and wrote 0.018 MB perf.data (7 samples) ] [1]- Done cat perf.ack [2]+ Terminated perf record --control fifo:perf.control,perf.ack -- sleep 300 $ Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200902105707.11491-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-04perf tools: Use AsciiDoc formatting for --control option documentationAdrian Hunter
The --control option does not display well in man pages unless AsciiDoc formatting is used. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200901093758.32293-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-03perf record/stat: Explicitly call out event modifiers in the documentationKim Phillips
Event modifiers are not mentioned in the perf record or perf stat manpages. Add them to orient new users more effectively by pointing them to the perf list manpage for details. Fixes: 2055fdaf8703 ("perf list: Document precise event sampling for AMD IBS") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tony Jones <tonyj@suse.de> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20200901215853.276234-1-kim.phillips@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-03perf stat: Turn off summary for interval mode by defaultJin Yao
There's a risk that outputting interval mode summaries by default breaks CSV consumers. It already broke pmu-tools/toplev. So now we turn off the summary by default but we create a new option '--summary' to enable the summary. This is active even when not using CSV mode. Before: root@kbl-ppc:~# perf stat -I1000 --interval-count 2 # time counts unit events 1.000265904 8,005.73 msec cpu-clock # 8.006 CPUs utilized 1.000265904 601 context-switches # 0.075 K/sec 1.000265904 10 cpu-migrations # 0.001 K/sec 1.000265904 0 page-faults # 0.000 K/sec 1.000265904 66,746,521 cycles # 0.008 GHz 1.000265904 71,874,398 instructions # 1.08 insn per cycle 1.000265904 13,356,781 branches # 1.668 M/sec 1.000265904 298,756 branch-misses # 2.24% of all branches 2.001857667 8,012.52 msec cpu-clock # 8.013 CPUs utilized 2.001857667 164 context-switches # 0.020 K/sec 2.001857667 10 cpu-migrations # 0.001 K/sec 2.001857667 2 page-faults # 0.000 K/sec 2.001857667 5,822,188 cycles # 0.001 GHz 2.001857667 2,186,170 instructions # 0.38 insn per cycle 2.001857667 442,378 branches # 0.055 M/sec 2.001857667 44,750 branch-misses # 10.12% of all branches Performance counter stats for 'system wide': 16,018.25 msec cpu-clock # 7.993 CPUs utilized 765 context-switches # 0.048 K/sec 20 cpu-migrations # 0.001 K/sec 2 page-faults # 0.000 K/sec 72,568,709 cycles # 0.005 GHz 74,060,568 instructions # 1.02 insn per cycle 13,799,159 branches # 0.861 M/sec 343,506 branch-misses # 2.49% of all branches 2.004118489 seconds time elapsed After: root@kbl-ppc:~# perf stat -I1000 --interval-count 2 # time counts unit events 1.001336393 8,013.28 msec cpu-clock # 8.013 CPUs utilized 1.001336393 82 context-switches # 0.010 K/sec 1.001336393 8 cpu-migrations # 0.001 K/sec 1.001336393 0 page-faults # 0.000 K/sec 1.001336393 4,199,121 cycles # 0.001 GHz 1.001336393 1,373,991 instructions # 0.33 insn per cycle 1.001336393 270,681 branches # 0.034 M/sec 1.001336393 31,659 branch-misses # 11.70% of all branches 2.003905006 8,020.52 msec cpu-clock # 8.021 CPUs utilized 2.003905006 184 context-switches # 0.023 K/sec 2.003905006 8 cpu-migrations # 0.001 K/sec 2.003905006 2 page-faults # 0.000 K/sec 2.003905006 5,446,190 cycles # 0.001 GHz 2.003905006 2,312,547 instructions # 0.42 insn per cycle 2.003905006 451,691 branches # 0.056 M/sec 2.003905006 37,925 branch-misses # 8.40% of all branches root@kbl-ppc:~# perf stat -I1000 --interval-count 2 --summary # time counts unit events 1.001313128 8,013.20 msec cpu-clock # 8.013 CPUs utilized 1.001313128 83 context-switches # 0.010 K/sec 1.001313128 8 cpu-migrations # 0.001 K/sec 1.001313128 0 page-faults # 0.000 K/sec 1.001313128 4,470,950 cycles # 0.001 GHz 1.001313128 1,440,045 instructions # 0.32 insn per cycle 1.001313128 283,222 branches # 0.035 M/sec 1.001313128 33,576 branch-misses # 11.86% of all branches 2.003857385 8,020.34 msec cpu-clock # 8.020 CPUs utilized 2.003857385 154 context-switches # 0.019 K/sec 2.003857385 8 cpu-migrations # 0.001 K/sec 2.003857385 2 page-faults # 0.000 K/sec 2.003857385 4,515,676 cycles # 0.001 GHz 2.003857385 2,180,449 instructions # 0.48 insn per cycle 2.003857385 435,254 branches # 0.054 M/sec 2.003857385 31,179 branch-misses # 7.16% of all branches Performance counter stats for 'system wide': 16,033.53 msec cpu-clock # 7.992 CPUs utilized 237 context-switches # 0.015 K/sec 16 cpu-migrations # 0.001 K/sec 2 page-faults # 0.000 K/sec 8,986,626 cycles # 0.001 GHz 3,620,494 instructions # 0.40 insn per cycle 718,476 branches # 0.045 M/sec 64,755 branch-misses # 9.01% of all branches 2.006124542 seconds time elapsed Fixes: c7e5b328a8d4 ("perf stat: Report summary for interval mode") Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200903010113.32232-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add option --tid to filter by thread idChangbin Du
This allows us to trace single thread instead of the whole process. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-17-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add option -D/--delay to delay tracingChangbin Du
This adds an option '-D/--delay' to allow us to start tracing some times later after workload is launched. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-16-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf: ftrace: Allow set graph depth by '--graph-opts'Changbin Du
This is to have a consistent view of all graph tracer options. The original option '--graph-depth' is marked as deprecated. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-15-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add support for trace option tracing_threshChangbin Du
This adds an option '--graph-opts thresh' to setup trace duration threshold for funcgraph tracer. $ sudo ./perf ftrace -G '*' --graph-opts thresh=100 3) ! 184.060 us | } /* schedule */ 3) ! 185.600 us | } /* exit_to_usermode_loop */ 2) ! 225.989 us | } /* schedule_idle */ 2) # 4140.051 us | } /* do_idle */ Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-14-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add option 'verbose' to show more info for graph tracerChangbin Du
Sometimes we want ftrace display more and longer information about the trace. $ sudo perf ftrace -G '*' 2) 0.979 us | mutex_unlock(); 2) 1.540 us | __fsnotify_parent(); 2) 0.433 us | fsnotify(); $ sudo perf ftrace -G '*' --graph-opts verbose 14160.770883 | 0) <...>-47814 | .... | 1.289 us | mutex_unlock(); 14160.770886 | 0) <...>-47814 | .... | 1.624 us | __fsnotify_parent(); 14160.770887 | 0) <...>-47814 | .... | 0.636 us | fsnotify(); 14160.770888 | 0) <...>-47814 | .... | 0.328 us | __sb_end_write(); 14160.770888 | 0) <...>-47814 | d... | 0.430 us | fpregs_assert_state_consistent(); 14160.770889 | 0) <...>-47814 | d... | | do_syscall_64() { 14160.770889 | 0) <...>-47814 | .... | | __x64_sys_close() { Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-13-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add support for tracing option 'irq-info'Changbin Du
This adds support to display irq context info for function tracer. To do this, just specify a '--func-opts irq-info' option. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-12-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add support for trace option funcgraph-irqsChangbin Du
This adds an option '--graph-opts noirqs' to filter out functions executed in irq context. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-11-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add support for trace option sleep-timeChangbin Du
This adds an option '--graph-opts nosleep-time' which allow us only to measure on-CPU time. This option is function_graph tracer only. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-10-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add support for tracing option 'func_stack_trace'Changbin Du
This adds support to display call trace for function tracer. To do this, just specify a '--func-opts call-graph' option. Example: $ sudo perf ftrace -T vfs_read --func-opts call-graph iio-sensor-prox-855 [003] 6168.369657: vfs_read <-ksys_read iio-sensor-prox-855 [003] 6168.369677: <stack trace> => vfs_read => ksys_read => __x64_sys_read => do_syscall_64 => entry_SYSCALL_64_after_hwframe ... Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-9-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add option '--inherit' to trace children processesChangbin Du
This adds an option '--inherit' to allow us trace children processes spawned by our target. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-7-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>