summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2020-04-30perf tools: Remove unneeded semicolonsZou Wei
Fixes coccicheck warnings: tools/perf/builtin-diff.c:1565:2-3: Unneeded semicolon tools/perf/builtin-lock.c:778:2-3: Unneeded semicolon tools/perf/builtin-mem.c:126:2-3: Unneeded semicolon tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c:555:2-3: Unneeded semicolon tools/perf/util/ordered-events.c:317:2-3: Unneeded semicolon tools/perf/util/synthetic-events.c:1131:2-3: Unneeded semicolon tools/perf/util/trace-event-read.c:78:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1588065523-71423-1-git-send-email-zou_wei@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30perf c2c: Remove unneeded semicolonZou Wei
Fixes coccicheck warnings: tools/perf/builtin-c2c.c:1712:2-3: Unneeded semicolon tools/perf/builtin-c2c.c:1928:2-3: Unneeded semicolon tools/perf/builtin-c2c.c:2962:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1588064336-70456-1-git-send-email-zou_wei@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30libtraceevent: Remove unneeded semicolonZou Wei
Fixes coccicheck warning: tools/lib/traceevent/kbuffer-parse.c:441:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/1588065121-71236-1-git-send-email-zou_wei@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30perf script: Remove extraneous newline in perf_sample__fprintf_regs()Stephane Eranian
When printing iregs, there was a double newline printed because perf_sample__fprintf_regs() was printing its own and then at the end of all fields, perf script was adding one. This was causing blank line in the output: Before: $ perf script -Fip,iregs 401b8d ABI:2 DX:0x100 SI:0x4a8340 DI:0x4a9340 401b8d ABI:2 DX:0x100 SI:0x4a9340 DI:0x4a8340 401b8d ABI:2 DX:0x100 SI:0x4a8340 DI:0x4a9340 401b8d ABI:2 DX:0x100 SI:0x4a9340 DI:0x4a8340 After: $ perf script -Fip,iregs 401b8d ABI:2 DX:0x100 SI:0x4a8340 DI:0x4a9340 401b8d ABI:2 DX:0x100 SI:0x4a9340 DI:0x4a8340 401b8d ABI:2 DX:0x100 SI:0x4a8340 DI:0x4a9340 Committer testing: First we need to figure out how to request that registers be recorded, so we use: # perf record -h reg Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -I, --intr-regs[=<any register>] sample selected machine registers on interrupt, use '-I?' to list register names --buildid-all Record build-id of all DSOs regardless of hits --user-regs[=<any register>] sample selected machine registers on interrupt, use '--user-regs=?' to list register names # Ok, now lets ask for them all: # perf record -a --intr-regs --user-regs sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 4.105 MB perf.data (2760 samples) ] # Lets look at the first 6 output lines: # perf script -Fip,iregs | head -6 ffffffff8a06f2f4 ABI:2 AX:0xffffd168fee0a980 BX:0xffff8a23b087f000 CX:0xfffeb69aaeb25d73 DX:0xffff8a253e8310f0 SI:0xfffffff9bafe7359 DI:0xffffb1690204fb10 BP:0xffffd168fee0a950 SP:0xffffb1690204fb88 IP:0xffffffff8a06f2f4 FLAGS:0x4e CS:0x10 SS:0x18 R8:0x1495f0a91129a R9:0xffff8a23b087f000 R10:0x1 R11:0xffffffff R12:0x0 R13:0xffff8a253e827e00 R14:0xffffd168fee0aa5c R15:0xffffd168fee0a980 ffffffff8a06f2f4 ABI:2 AX:0x0 BX:0xffffd168fee0a950 CX:0x5684cc1118491900 DX:0x0 SI:0xffffd168fee0a9d0 DI:0x202 BP:0xffffb1690204fd70 SP:0xffffb1690204fd20 IP:0xffffffff8a06f2f4 FLAGS:0x24e CS:0x10 SS:0x18 R8:0x0 R9:0xffffd168fee0a9d0 R10:0x1 R11:0xffffffff R12:0xffffffff8a23e480 R13:0xffff8a23b087f240 R14:0xffff8a23b087f000 R15:0xffffd168fee0a950 ffffffff8a06f2f4 ABI:2 AX:0x0 BX:0x0 CX:0x7f25f334335b DX:0x0 SI:0x2400 DI:0x4 BP:0x7fff5f264570 SP:0x7fff5f264538 IP:0xffffffff8a06f2f4 FLAGS:0x24e CS:0x10 SS:0x2b R8:0x0 R9:0x2312d20 R10:0x0 R11:0x246 R12:0x22cc0e0 R13:0x0 R14:0x0 R15:0x22d0780 # Reproduced, apply the patch and: [root@five ~]# perf script -Fip,iregs | head -6 ffffffff8a06f2f4 ABI:2 AX:0xffffd168fee0a980 BX:0xffff8a23b087f000 CX:0xfffeb69aaeb25d73 DX:0xffff8a253e8310f0 SI:0xfffffff9bafe7359 DI:0xffffb1690204fb10 BP:0xffffd168fee0a950 SP:0xffffb1690204fb88 IP:0xffffffff8a06f2f4 FLAGS:0x4e CS:0x10 SS:0x18 R8:0x1495f0a91129a R9:0xffff8a23b087f000 R10:0x1 R11:0xffffffff R12:0x0 R13:0xffff8a253e827e00 R14:0xffffd168fee0aa5c R15:0xffffd168fee0a980 ffffffff8a06f2f4 ABI:2 AX:0x0 BX:0xffffd168fee0a950 CX:0x5684cc1118491900 DX:0x0 SI:0xffffd168fee0a9d0 DI:0x202 BP:0xffffb1690204fd70 SP:0xffffb1690204fd20 IP:0xffffffff8a06f2f4 FLAGS:0x24e CS:0x10 SS:0x18 R8:0x0 R9:0xffffd168fee0a9d0 R10:0x1 R11:0xffffffff R12:0xffffffff8a23e480 R13:0xffff8a23b087f240 R14:0xffff8a23b087f000 R15:0xffffd168fee0a950 ffffffff8a06f2f4 ABI:2 AX:0x0 BX:0x0 CX:0x7f25f334335b DX:0x0 SI:0x2400 DI:0x4 BP:0x7fff5f264570 SP:0x7fff5f264538 IP:0xffffffff8a06f2f4 FLAGS:0x24e CS:0x10 SS:0x2b R8:0x0 R9:0x2312d20 R10:0x0 R11:0x246 R12:0x22cc0e0 R13:0x0 R14:0x0 R15:0x22d0780 ffffffff8a24074b ABI:2 AX:0xcb BX:0xcb CX:0x0 DX:0x0 SI:0xffffb1690204ff58 DI:0xcb BP:0xffffb1690204ff58 SP:0xffffb1690204ff40 IP:0xffffffff8a24074b FLAGS:0x24e CS:0x10 SS:0x18 R8:0x0 R9:0x0 R10:0x0 R11:0x0 R12:0x0 R13:0x0 R14:0x0 R15:0x0 ffffffff8a310600 ABI:2 AX:0x0 BX:0xffffffff8b8c39a0 CX:0x0 DX:0xffff8a2503890300 SI:0xffffb1690204ff20 DI:0xffff8a23e4080000 BP:0xffff8a23e4080000 SP:0xffffb1690204fec0 IP:0xffffffff8a310600 FLAGS:0x28e CS:0x10 SS:0x18 R8:0x0 R9:0x0 R10:0x0 R11:0x0 R12:0xffffffffffffffea R13:0xffff8a23e4080020 R14:0x0 R15:0x0 ffffffff8a11b688 ABI:2 AX:0x0 BX:0xffff8a237b7c8800 CX:0xffffb1690204fae0 DX:0x78 SI:0xffff8a237b7c8800 DI:0xffffb1690204fa10 BP:0xffffb1690204fb00 SP:0xffffb1690204fa00 IP:0xffffffff8a11b688 FLAGS:0x8a CS:0x10 SS:0x18 R8:0x1495f0a917eba R9:0xffffd168fde19a48 R10:0xffffb1690204fd98 R11:0xffff8a253e82afb0 R12:0xffff8a237b7c8800 R13:0xffffb1690204fb00 R14:0x0 R15:0xffff8a237b7c8800 [root@five ~]# To see it more clearly, lets get just two of those registers by sample: # perf record -a --intr-regs=ax,bx --user-regs=cx,dx sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 3.502 MB perf.data (1653 samples) ] # Extra info, lets see what gets setup in that 'struct perf_event_attr': # perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|REGS_USER|REGS_INTR, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, sample_regs_user: 0xc, sample_regs_intr: 0x3 # Cook, some PERF_SAMPLE_REGS_USER|PERF_SAMPLE_REGS_INTR + attr.sample_regs_user and attr.sample_regs_intr register masks, now lets see if those newlines are gone in a more compact fashion: # perf script -Fip,iregs,uregs ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a29b78d ABI:2 AX:0x2a20ffcd6000 BX:0x2ec7d9000 ABI:2 CX:0x7f204460e49b DX:0xf42920 # And where was that? # perf script -Fip,iregs,uregs,sym,dso ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a29b78d __vma_link_rb (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0x2a20ffcd6000 BX:0x2ec7d9000 ABI:2 CX:0x7f204460e49b DX:0xf42920 # Signed-off-by: Stephane Eranian <eranian@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200418231908.152212-1-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30perf synthetic events: Remove use of sscanf from /proc readingIan Rogers
The synthesize benchmark, run on a single process and thread, shows perf_event__synthesize_mmap_events as the hottest function with fgets and sscanf taking the majority of execution time. fscanf performs similarly well. Replace the scanf call with manual reading of each field of the /proc/pid/maps line, and remove some unnecessary buffering. This change also addresses potential, but unlikely, buffer overruns for the string values read by scanf. Performance before is: $ sudo perf bench internals synthesize -m 16 -M 16 -s -t \# Running 'internals/synthesize' benchmark: Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself: Average synthesis took: 102.810 usec (+- 0.027 usec) Average num. events: 17.000 (+- 0.000) Average time per event 6.048 usec Average data synthesis took: 106.325 usec (+- 0.018 usec) Average num. events: 89.000 (+- 0.000) Average time per event 1.195 usec Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 16 Average synthesis took: 68103.100 usec (+- 441.234 usec) Average num. events: 30703.000 (+- 0.730) Average time per event 2.218 usec And after is: $ sudo perf bench internals synthesize -m 16 -M 16 -s -t \# Running 'internals/synthesize' benchmark: Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself: Average synthesis took: 50.388 usec (+- 0.031 usec) Average num. events: 17.000 (+- 0.000) Average time per event 2.964 usec Average data synthesis took: 52.693 usec (+- 0.020 usec) Average num. events: 89.000 (+- 0.000) Average time per event 0.592 usec Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 16 Average synthesis took: 45022.400 usec (+- 552.740 usec) Average num. events: 30624.200 (+- 10.037) Average time per event 1.470 usec On a Intel Xeon 6154 compiling with Debian gcc 9.2.1. Committer testing: On a AMD Ryzen 5 3600X 6-Core Processor: Before: # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt # Running 'internals/synthesize' benchmark: Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself: Average synthesis took: 267.491 usec (+- 0.176 usec) Average num. events: 56.000 (+- 0.000) Average time per event 4.777 usec Average data synthesis took: 277.257 usec (+- 0.169 usec) Average num. events: 287.000 (+- 0.000) Average time per event 0.966 usec Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 12 Average synthesis took: 81599.500 usec (+- 346.315 usec) Average num. events: 36096.100 (+- 2.523) Average time per event 2.261 usec # After: # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt # Running 'internals/synthesize' benchmark: Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself: Average synthesis took: 110.125 usec (+- 0.080 usec) Average num. events: 56.000 (+- 0.000) Average time per event 1.967 usec Average data synthesis took: 118.518 usec (+- 0.057 usec) Average num. events: 287.000 (+- 0.000) Average time per event 0.413 usec Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 12 Average synthesis took: 43490.700 usec (+- 284.527 usec) Average num. events: 37028.500 (+- 0.563) Average time per event 1.175 usec # Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30tools api: Add a lightweight buffered reading apiIan Rogers
The synthesize benchmark shows the majority of execution time going to fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered reading library that will be used to replace these calls in a follow-up CL. Add tests for the library to perf test. Committer tests: $ perf test api 63: Test api io : Ok $ Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200415054050.31645-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30perf bench: Add a multi-threaded synthesize benchmarkIan Rogers
By default this isn't run as it reads /proc and may not have access. For consistency, modify the single threaded benchmark to compute an average time per event. Committer testing: $ grep -m1 "model name" /proc/cpuinfo model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz $ grep "model name" /proc/cpuinfo | wc -l 8 $ $ perf bench internals synthesize -h # Running 'internals/synthesize' benchmark: Usage: perf bench internals synthesize <options> -I, --multi-iterations <n> Number of iterations used to compute multi-threaded average -i, --single-iterations <n> Number of iterations used to compute single-threaded average -M, --max-threads <n> Maximum number of threads in multithreaded bench -m, --min-threads <n> Minimum number of threads in multithreaded bench -s, --st Run single threaded benchmark -t, --mt Run multi-threaded benchmark $ $ perf bench internals synthesize -t # Running 'internals/synthesize' benchmark: Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 1 Average synthesis took: 65449.000 usec (+- 586.442 usec) Average num. events: 9405.400 (+- 0.306) Average time per event 6.959 usec Number of synthesis threads: 2 Average synthesis took: 37838.300 usec (+- 130.259 usec) Average num. events: 9501.800 (+- 20.469) Average time per event 3.982 usec Number of synthesis threads: 3 Average synthesis took: 48551.400 usec (+- 225.686 usec) Average num. events: 9544.000 (+- 0.000) Average time per event 5.087 usec Number of synthesis threads: 4 Average synthesis took: 29632.500 usec (+- 50.808 usec) Average num. events: 9544.000 (+- 0.000) Average time per event 3.105 usec Number of synthesis threads: 5 Average synthesis took: 33920.400 usec (+- 284.509 usec) Average num. events: 9544.000 (+- 0.000) Average time per event 3.554 usec Number of synthesis threads: 6 Average synthesis took: 27604.100 usec (+- 72.344 usec) Average num. events: 9548.000 (+- 0.000) Average time per event 2.891 usec Number of synthesis threads: 7 Average synthesis took: 25406.300 usec (+- 933.371 usec) Average num. events: 9545.500 (+- 0.167) Average time per event 2.662 usec Number of synthesis threads: 8 Average synthesis took: 24110.400 usec (+- 73.229 usec) Average num. events: 9551.000 (+- 0.000) Average time per event 2.524 usec $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-26objtool: Fix infinite loop in for_offset_range()Josh Poimboeuf
Randy reported that objtool got stuck in an infinite loop when processing drivers/i2c/busses/i2c-parport.o. It was caused by the following code: 00000000000001fd <line_set>: 1fd: 48 b8 00 00 00 00 00 movabs $0x0,%rax 204: 00 00 00 1ff: R_X86_64_64 .rodata-0x8 207: 41 55 push %r13 209: 41 89 f5 mov %esi,%r13d 20c: 41 54 push %r12 20e: 49 89 fc mov %rdi,%r12 211: 55 push %rbp 212: 48 89 d5 mov %rdx,%rbp 215: 53 push %rbx 216: 0f b6 5a 01 movzbl 0x1(%rdx),%ebx 21a: 48 8d 34 dd 00 00 00 lea 0x0(,%rbx,8),%rsi 221: 00 21e: R_X86_64_32S .rodata 222: 48 89 f1 mov %rsi,%rcx 225: 48 29 c1 sub %rax,%rcx find_jump_table() saw the .rodata reference and tried to find a jump table associated with it (though there wasn't one). The -0x8 rela addend is unusual. It caused find_jump_table() to send a negative table_offset (unsigned 0xfffffffffffffff8) to find_rela_by_dest(). The negative offset should have been harmless, but it actually threw for_offset_range() for a loop... literally. When the mask value got incremented past the end value, it also wrapped to zero, causing the loop exit condition to remain true forever. Prevent this scenario from happening by ensuring the incremented value is always >= the starting value. Fixes: 74b873e49d92 ("objtool: Optimize find_rela_by_dest_range()") Reported-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Julien Thierry <jthierry@redhat.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/02b719674b031800b61e33c30b2e823183627c19.1587842122.git.jpoimboe@redhat.com
2020-04-25Merge tag 'objtool-urgent-2020-04-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar: "Two fixes: fix an off-by-one bug, and fix 32-bit builds on 64-bit systems" * tag 'objtool-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix off-by-one in symbol_by_offset() objtool: Fix 32bit cross builds
2020-04-25objtool: Fix stack offset tracking for indirect CFAsJosh Poimboeuf
When the current frame address (CFA) is stored on the stack (i.e., cfa->base == CFI_SP_INDIRECT), objtool neglects to adjust the stack offset when there are subsequent pushes or pops. This results in bad ORC data at the end of the ENTER_IRQ_STACK macro, when it puts the previous stack pointer on the stack and does a subsequent push. This fixes the following unwinder warning: WARNING: can't dereference registers at 00000000f0a6bdba for ip interrupt_entry+0x9f/0xa0 Fixes: 627fce14809b ("objtool: Add ORC unwind table generation") Reported-by: Vince Weaver <vincent.weaver@maine.edu> Reported-by: Dave Jones <dsj@fb.com> Reported-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Joe Mario <jmario@redhat.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/853d5d691b29e250333332f09b8e27410b2d9924.1587808742.git.jpoimboe@redhat.com
2020-04-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix memory leak in netfilter flowtable, from Roi Dayan. 2) Ref-count leaks in netrom and tipc, from Xiyu Yang. 3) Fix warning when mptcp socket is never accepted before close, from Florian Westphal. 4) Missed locking in ovs_ct_exit(), from Tonghao Zhang. 5) Fix large delays during PTP synchornization in cxgb4, from Rahul Lakkireddy. 6) team_mode_get() can hang, from Taehee Yoo. 7) Need to use kvzalloc() when allocating fw tracer in mlx5 driver, from Niklas Schnelle. 8) Fix handling of bpf XADD on BTF memory, from Jann Horn. 9) Fix BPF_STX/BPF_B encoding in x86 bpf jit, from Luke Nelson. 10) Missing queue memory release in iwlwifi pcie code, from Johannes Berg. 11) Fix NULL deref in macvlan device event, from Taehee Yoo. 12) Initialize lan87xx phy correctly, from Yuiko Oshino. 13) Fix looping between VRF and XFRM lookups, from David Ahern. 14) etf packet scheduler assumes all sockets are full sockets, which is not necessarily true. From Eric Dumazet. 15) Fix mptcp data_fin handling in RX path, from Paolo Abeni. 16) fib_select_default() needs to handle nexthop objects, from David Ahern. 17) Use GFP_ATOMIC under spinlock in mac80211_hwsim, from Wei Yongjun. 18) vxlan and geneve use wrong nlattr array, from Sabrina Dubroca. 19) Correct rx/tx stats in bcmgenet driver, from Doug Berger. 20) BPF_LDX zero-extension is encoded improperly in x86_32 bpf jit, fix from Luke Nelson. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (100 commits) selftests/bpf: Fix a couple of broken test_btf cases tools/runqslower: Ensure own vmlinux.h is picked up first bpf: Make bpf_link_fops static bpftool: Respect the -d option in struct_ops cmd selftests/bpf: Add test for freplace program with expected_attach_type bpf: Propagate expected_attach_type when verifying freplace programs bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd bpf, x86_32: Fix logic error in BPF_LDX zero-extension bpf, x86_32: Fix clobbering of dst for BPF_JSET bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension bpf: Fix reStructuredText markup net: systemport: suppress warnings on failed Rx SKB allocations net: bcmgenet: suppress warnings on failed Rx SKB allocations macsec: avoid to set wrong mtu mac80211: sta_info: Add lockdep condition for RCU list usage mac80211: populate debugfs only after cfg80211 init net: bcmgenet: correct per TX/RX ring statistics net: meth: remove spurious copyright text net: phy: bcm84881: clear settings on link down chcr: Fix CPU hard lockup ...
2020-04-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2020-04-24 The following pull-request contains BPF updates for your *net* tree. We've added 17 non-merge commits during the last 5 day(s) which contain a total of 19 files changed, 203 insertions(+), 85 deletions(-). The main changes are: 1) link_update fix, from Andrii. 2) libbpf get_xdp_id fix, from David. 3) xadd verifier fix, from Jann. 4) x86-32 JIT fixes, from Luke and Wang. 5) test_btf fix, from Stanislav. 6) freplace verifier fix, from Toke. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-24selftests/bpf: Fix a couple of broken test_btf casesStanislav Fomichev
Commit 51c39bb1d5d1 ("bpf: Introduce function-by-function verification") introduced function linkage flag and changed the error message from "vlen != 0" to "Invalid func linkage" and broke some fake BPF programs. Adjust the test accordingly. AFACT, the programs don't really need any arguments and only look at BTF for maps, so let's drop the args altogether. Before: BTF raw test[103] (func (Non zero vlen)): do_test_raw:3703:FAIL expected err_str:vlen != 0 magic: 0xeb9f version: 1 flags: 0x0 hdr_len: 24 type_off: 0 type_len: 72 str_off: 72 str_len: 10 btf_total_size: 106 [1] INT (anon) size=4 bits_offset=0 nr_bits=32 encoding=SIGNED [2] INT (anon) size=4 bits_offset=0 nr_bits=32 encoding=(none) [3] FUNC_PROTO (anon) return=0 args=(1 a, 2 b) [4] FUNC func type_id=3 Invalid func linkage BTF libbpf test[1] (test_btf_haskv.o): libbpf: load bpf program failed: Invalid argument libbpf: -- BEGIN DUMP LOG --- libbpf: Validating test_long_fname_2() func#1... Arg#0 type PTR in test_long_fname_2() is not supported yet. processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 libbpf: -- END LOG -- libbpf: failed to load program 'dummy_tracepoint' libbpf: failed to load object 'test_btf_haskv.o' do_test_file:4201:FAIL bpf_object__load: -4007 BTF libbpf test[2] (test_btf_newkv.o): libbpf: load bpf program failed: Invalid argument libbpf: -- BEGIN DUMP LOG --- libbpf: Validating test_long_fname_2() func#1... Arg#0 type PTR in test_long_fname_2() is not supported yet. processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 libbpf: -- END LOG -- libbpf: failed to load program 'dummy_tracepoint' libbpf: failed to load object 'test_btf_newkv.o' do_test_file:4201:FAIL bpf_object__load: -4007 BTF libbpf test[3] (test_btf_nokv.o): libbpf: load bpf program failed: Invalid argument libbpf: -- BEGIN DUMP LOG --- libbpf: Validating test_long_fname_2() func#1... Arg#0 type PTR in test_long_fname_2() is not supported yet. processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 libbpf: -- END LOG -- libbpf: failed to load program 'dummy_tracepoint' libbpf: failed to load object 'test_btf_nokv.o' do_test_file:4201:FAIL bpf_object__load: -4007 Fixes: 51c39bb1d5d1 ("bpf: Introduce function-by-function verification") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200422003753.124921-1-sdf@google.com
2020-04-24tools/runqslower: Ensure own vmlinux.h is picked up firstAndrii Nakryiko
Reorder include paths to ensure that runqslower sources are picking up vmlinux.h, generated by runqslower's own Makefile. When runqslower is built from selftests/bpf, due to current -I$(BPF_INCLUDE) -I$(OUTPUT) ordering, it might pick up not-yet-complete vmlinux.h, generated by selftests Makefile, which could lead to compilation errors like [0]. So ensure that -I$(OUTPUT) goes first and rely on runqslower's Makefile own dependency chain to ensure vmlinux.h is properly completed before source code relying on it is compiled. [0] https://travis-ci.org/github/libbpf/libbpf/jobs/677905925 Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200422012407.176303-1-andriin@fb.com
2020-04-24bpftool: Respect the -d option in struct_ops cmdMartin KaFai Lau
In the prog cmd, the "-d" option turns on the verifier log. This is missed in the "struct_ops" cmd and this patch fixes it. Fixes: 65c93628599d ("bpftool: Add struct_ops support") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200424182911.1259355-1-kafai@fb.com
2020-04-24selftests/bpf: Add test for freplace program with expected_attach_typeToke Høiland-Jørgensen
This adds a new selftest that tests the ability to attach an freplace program to a program type that relies on the expected_attach_type of the target program to pass verification. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158773526831.293902.16011743438619684815.stgit@toke.dk
2020-04-24bpf: Fix reStructuredText markupJakub Wilk
The patch fixes: $ scripts/bpf_helpers_doc.py > bpf-helpers.rst $ rst2man bpf-helpers.rst > bpf-helpers.7 bpf-helpers.rst:1105: (WARNING/2) Inline strong start-string without end-string. Signed-off-by: Jakub Wilk <jwilk@jwilk.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20200422082324.2030-1-jwilk@jwilk.net
2020-04-24Merge tag 'pm-5.7-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Restore an optimization related to asynchronous suspend and resume of devices during system-wide power transitions that was disabled by mistake (Kai-Heng Feng) and update the pm-graph suite of power management utilities (Todd Brandt)" * tag 'pm-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: sleep: core: Switch back to async_schedule_dev() pm-graph v5.6
2020-04-24selftests/ftrace: Check the first record for kprobe_args_type.tcXiao Yang
It is possible to get multiple records from trace during test and then more than 4 arguments are assigned to ARGS. This situation results in the failure of kprobe_args_type.tc. For example: ----------------------------------------------------------- grep testprobe trace ftracetest-5902 [001] d... 111195.682227: testprobe: (_do_fork+0x0/0x460) arg1=334823024 arg2=334823024 arg3=0x13f4fe70 arg4=7 pmlogger-5949 [000] d... 111195.709898: testprobe: (_do_fork+0x0/0x460) arg1=345308784 arg2=345308784 arg3=0x1494fe70 arg4=7 grep testprobe trace sed -e 's/.* arg1=\(.*\) arg2=\(.*\) arg3=\(.*\) arg4=\(.*\)/\1 \2 \3 \4/' ARGS='334823024 334823024 0x13f4fe70 7 345308784 345308784 0x1494fe70 7' ----------------------------------------------------------- We don't care which process calls do_fork so just check the first record to fix the issue. Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-04-23selftests: add build/cross-build dependency check scriptShuah Khan
Add build/cross-build dependency check script kselftest_deps.sh This script does the following: Usage: ./kselftest_deps.sh -[p] <compiler> [test_name] kselftest_deps.sh [-p] gcc kselftest_deps.sh [-p] gcc vm kselftest_deps.sh [-p] aarch64-linux-gnu-gcc kselftest_deps.sh [-p] aarch64-linux-gnu-gcc vm - Should be run in selftests directory in the kernel repo. - Checks if Kselftests can be built/cross-built on a system. - Parses all test/sub-test Makefile to find library dependencies. - Runs compile test on a trivial C file with LDLIBS specified in the test Makefiles to identify missing library dependencies. - Prints suggested target list for a system filtering out tests failed the build dependency check from the TARGETS in Selftests the main Makefile when optional -p is specified. - Prints pass/fail dependency check for each tests/sub-test. - Prints pass/fail targets and libraries. - Default: runs dependency checks on all tests. - Optional test name can be specified to check dependencies for it. To make LDLIBS parsing easier - change gpio and memfd Makefiles to use the same temporary variable used to find and add libraries to LDLIBS. - simlify LDLIBS append logic in intel_pstate/Makefile. Results from run on x86_64 system (trimmed detailed pass/fail list): ======================================================== Kselftest Dependency Check for [./kselftest_deps.sh gcc ] results... ======================================================== Checked tests defining LDLIBS dependencies -------------------------------------------------------- Total tests with Dependencies: 55 Pass: 53 Fail: 2 -------------------------------------------------------- Targets passed build dependency check on system: bpf capabilities filesystems futex gpio intel_pstate membarrier memfd mqueue net powerpc ptp rseq rtc safesetid timens timers vDSO vm -------------------------------------------------------- FAIL: netfilter/Makefile dependency check: -lmnl FAIL: gpio/Makefile dependency check: -lmount -------------------------------------------------------- Targets failed build dependency check on system: gpio netfilter -------------------------------------------------------- Missing libraries system -lmnl -lmount -------------------------------------------------------- ======================================================== Results from run on x86_64 system with aarch64-linux-gnu-gcc: (trimmed detailed pass/fail list): ======================================================== Kselftest Dependency Check for [./kselftest_deps.sh aarch64-linux-gnu-gcc ] results... ======================================================== Checked tests defining LDLIBS dependencies -------------------------------------------------------- Total tests with Dependencies: 55 Pass: 41 Fail: 14 -------------------------------------------------------- Targets failed build dependency check on system: bpf capabilities filesystems futex gpio intel_pstate membarrier memfd mqueue net powerpc ptp rseq rtc timens timers vDSO vm -------------------------------------------------------- -------------------------------------------------------- Targets failed build dependency check on system: bpf capabilities gpio memfd mqueue net netfilter safesetid vm -------------------------------------------------------- Missing libraries system -lcap -lcap-ng -lelf -lfuse -lmnl -lmount -lnuma -lpopt -lz -------------------------------------------------------- ======================================================== Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-04-23selftests/ftrace: Check required filter files before running testXiao Yang
Without CONFIG_DYNAMIC_FTRACE, some tests get failure because required filter files(set_ftrace_filter/available_filter_functions/stack_trace_filter) are missing. So implement check_filter_file() and make all related tests check required filter files by it. BTW: set_ftrace_filter and available_filter_functions are introduced together so just check either of them. Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-04-23perf record: Add num-synthesize-threads optionStephane Eranian
To control degree of parallelism of the synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be time consuming. Mimic perf top way of handling the option. If not specified will default to 1 thread, i.e. default behavior before this option. On a desktop computer the processing of /proc/PID/task/PID/maps isn't slow enough to warrant parallel processing and the thread creation has some cost - hence the default of 1. On a loaded server with >100 cores it is possible to see synthesis times in the order of seconds and in this case having the option is desirable. As the processing is a synchronization point, it is legitimate to worry if Amdahl's law will apply to this patch. Profiling with this patch in place: https://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com/ shows: ... - 32.59% __perf_event__synthesize_threads - 32.54% __event__synthesize_thread + 22.13% perf_event__synthesize_mmap_events + 6.68% perf_event__get_comm_ids.constprop.0 + 1.49% process_synthesized_event + 1.29% __GI___readdir64 + 0.60% __opendir ... That is the processing is 1.49% of execution time and there is plenty to make parallel. This is shown in the benchmark in this patch: https://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com/ Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 1 Average synthesis took: 127729.000 usec (+- 3372.880 usec) Average num. events: 21548.600 (+- 0.306) Average time per event 5.927 usec Number of synthesis threads: 2 Average synthesis took: 88863.500 usec (+- 385.168 usec) Average num. events: 21552.800 (+- 0.327) Average time per event 4.123 usec Number of synthesis threads: 3 Average synthesis took: 83257.400 usec (+- 348.617 usec) Average num. events: 21553.200 (+- 0.327) Average time per event 3.863 usec Number of synthesis threads: 4 Average synthesis took: 75093.000 usec (+- 422.978 usec) Average num. events: 21554.200 (+- 0.200) Average time per event 3.484 usec Number of synthesis threads: 5 Average synthesis took: 64896.600 usec (+- 353.348 usec) Average num. events: 21558.000 (+- 0.000) Average time per event 3.010 usec Number of synthesis threads: 6 Average synthesis took: 59210.200 usec (+- 342.890 usec) Average num. events: 21560.000 (+- 0.000) Average time per event 2.746 usec Number of synthesis threads: 7 Average synthesis took: 54093.900 usec (+- 306.247 usec) Average num. events: 21562.000 (+- 0.000) Average time per event 2.509 usec Number of synthesis threads: 8 Average synthesis took: 48938.700 usec (+- 341.732 usec) Average num. events: 21564.000 (+- 0.000) Average time per event 2.269 usec Where average time per synthesized event goes from 5.927 usec with 1 thread to 2.269 usec with 8. This isn't a linear speed up as not all of synthesize code has been made parallel. If the synthesis time was about 10 seconds then using 8 threads may bring this down to less than 4. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Jones <tonyj@suse.de> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: http://lore.kernel.org/lkml/20200422155038.9380-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-23perf test session topology: Fix data pathTommi Rantala
Commit 2d4f27999b88 ("perf data: Add global path holder") missed path conversion in tests/topology.c, causing the "Session topology" testcase to "hang" (waits forever for input from stdin) when doing "ssh $VM perf test". Can be reproduced by running "cat | perf test topo", and crashed by replacing cat with true: $ true | perf test -v topo 40: Session topology : --- start --- test child forked, pid 3638 templ file: /tmp/perf-test-QPvAch incompatible file format incompatible file format (rerun with -v to learn more) free(): invalid pointer test child interrupted ---- end ---- Session topology: FAILED! Committer testing: Reproduced the above result before the patch and after it is back working: # true | perf test -v topo 41: Session topology : --- start --- test child forked, pid 19374 templ file: /tmp/perf-test-YOTEQg CPU 0, core 0, socket 0 CPU 1, core 1, socket 0 CPU 2, core 2, socket 0 CPU 3, core 3, socket 0 CPU 4, core 0, socket 0 CPU 5, core 1, socket 0 CPU 6, core 2, socket 0 CPU 7, core 3, socket 0 test child finished with 0 ---- end ---- Session topology: Ok # Fixes: 2d4f27999b88 ("perf data: Add global path holder") Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200423115341.562782-1-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-23perf stat: Improve runtime stat for interval modeJin Yao
For interval mode, the metric is printed after the '#' character if it exists. But it's not calculated by the counts generated in this interval. See the following examples: root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2 # time counts unit events 1.000422803 764,809 inst_retired.any # 2.9 CPI 1.000422803 2,234,932 cycles 2.001464585 1,960,061 inst_retired.any # 1.6 CPI 2.001464585 4,022,591 cycles The second CPI should not be 1.6 (4,022,591/1,960,061 is 2.1) root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2 # time counts unit events 1.000429493 2,869,311 cycles 1.000429493 816,875 instructions # 0.28 insn per cycle 2.001516426 9,260,973 cycles 2.001516426 5,250,634 instructions # 0.87 insn per cycle The second 'insn per cycle' should not be 0.87 (5,250,634/9,260,973 is 0.57). The current code uses a global variable 'rt_stat' for tracking and updating the std dev of runtime stat. Unlike the counts, 'rt_stat' is not reset for interval. While the counts are reset for interval. perf_stat_process_counter() { if (config->interval) init_stats(ps->res_stats); } So for interval mode, the 'rt_stat' variable should be reset too. This patch resets 'rt_stat' before read_counters(), so the runtime stat is only calculated by the counts generated in this interval. With this patch: root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2 # time counts unit events 1.000420924 2,408,818 inst_retired.any # 2.1 CPI 1.000420924 5,010,111 cycles 2.001448579 2,798,407 inst_retired.any # 1.6 CPI 2.001448579 4,599,861 cycles root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2 # time counts unit events 1.000428555 2,769,714 cycles 1.000428555 774,462 instructions # 0.28 insn per cycle 2.001471562 3,595,904 cycles 2.001471562 1,243,703 instructions # 0.35 insn per cycle Now the second 'insn per cycle' and CPI are calculated by the counts generated in this interval. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-By: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200420145417.6864-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22libbpf: Only check mode flags in get_xdp_idDavid Ahern
The commit in the Fixes tag changed get_xdp_id to only return prog_id if flags is 0, but there are other XDP flags than the modes - e.g., XDP_FLAGS_UPDATE_IF_NOEXIST. Since the intention was only to look at MODE flags, clear other ones before checking if flags is 0. Fixes: f07cbad29741 ("libbpf: Fix bpf_get_link_xdp_id flags handling") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrey Ignatov <rdna@fb.com>
2020-04-22ipv4: Update fib_select_default to handle nexthop objectsDavid Ahern
A user reported [0] hitting the WARN_ON in fib_info_nh: [ 8633.839816] ------------[ cut here ]------------ [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381 ... [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381 ... [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286 [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000 [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000 [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0 [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60 [ 8633.839857] FS: 00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000 [ 8633.839858] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0 [ 8633.839867] Call Trace: [ 8633.839871] ip_route_output_key_hash_rcu+0x421/0x890 [ 8633.839873] ip_route_output_key_hash+0x5e/0x80 [ 8633.839876] ip_route_output_flow+0x1a/0x50 [ 8633.839878] __ip4_datagram_connect+0x154/0x310 [ 8633.839880] ip4_datagram_connect+0x28/0x40 [ 8633.839882] __sys_connect+0xd6/0x100 ... The WARN_ON is triggered in fib_select_default which is invoked when there are multiple default routes. Update the function to use fib_info_nhc and convert the nexthop checks to use fib_nh_common. Add test case that covers the affected code path. [0] https://github.com/FRRouting/frr/issues/6089 Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22objtool: Fix off-by-one in symbol_by_offset()Julien Thierry
Sometimes, WARN_FUNC() and other users of symbol_by_offset() will associate the first instruction of a symbol with the symbol preceding it. This is because symbol->offset + symbol->len is already outside of the symbol's range. Fixes: 2a362ecc3ec9 ("objtool: Optimize find_symbol_*() and read_symbols()") Signed-off-by: Julien Thierry <jthierry@redhat.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-04-22objtool: Fix 32bit cross buildsPeter Zijlstra
Apparently there's people doing 64bit builds on 32bit machines. Fixes: 74b873e49d92 ("objtool: Optimize find_rela_by_dest_range()") Reported-by: youling257@gmail.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-04-22selftests: Fix suppress test in fib_tests.shDavid Ahern
fib_tests is spewing errors: ... Cannot open network namespace "ns1": No such file or directory Cannot open network namespace "ns1": No such file or directory Cannot open network namespace "ns1": No such file or directory Cannot open network namespace "ns1": No such file or directory ping: connect: Network is unreachable Cannot open network namespace "ns1": No such file or directory Cannot open network namespace "ns1": No such file or directory ... Each test entry in fib_tests is supposed to do its own setup and cleanup. Right now the $IP commands in fib_suppress_test are failing because there is no ns1. Add the setup/cleanup and logging expected for each test. Fixes: ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule") Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22perf stat: Zero all the 'ena' and 'run' array slot stats for interval modeJin Yao
As the code comments in perf_stat_process_counter() say, we calculate counter's data every interval, and the display code shows ps->res_stats avg value. We need to zero the stats for interval mode. But the current code only zeros the res_stats[0], it doesn't zero the res_stats[1] and res_stats[2], which are for ena and run of counter. This patch zeros the whole res_stats[] for interval mode. Fixes: 51fd2df1e882 ("perf stat: Fix interval output values") Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200409070755.17261-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22Merge tag 'linux-kselftest-5.7-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "This consists of fixes to runner scripts and individual test run-time bugs. Includes fixes to tpm2 and memfd test run-time regressions" * tag 'linux-kselftest-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/ipc: Fix test failure seen after initial test run Revert "Kernel selftests: tpm2: check for tpm support" selftests/ftrace: Add CONFIG_SAMPLE_FTRACE_DIRECT=m kconfig selftests/seccomp: allow clock_nanosleep instead of nanosleep kselftest/runner: allow to properly deliver signals to tests selftests/harness: fix spelling mistake "SIGARLM" -> "SIGALRM" selftests: Fix memfd test run-time regression selftests: vm: Fix 64-bit test builds for powerpc64le selftests: vm: Do not override definition of ARCH
2020-04-22perf script: Avoid NULL dereference on symbolIan Rogers
al->sym may be NULL given current if conditions and may cause a segv. Fixes: d2bedb7863e9 ("perf script: Allow --symbol to accept hexadecimal addresses") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200421004329.43109-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22perf evlist: Remove duplicate headersJagadeesh Pagadala
Code cleanup: Remove duplicate headers which are included twice. Signed-off-by: Jagadeesh Pagadala <jagdsh.linux@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/1587276836-17088-1-git-send-email-jagdsh.linux@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22perf bench: Fix div-by-zero if runtime is zeroTommi Rantala
Fix div-by-zero if runtime is zero: $ perf bench futex hash --runtime=0 # Running 'futex/hash' benchmark: Run summary [PID 12090]: 4 threads, each operating on 1024 [private] futexes for 0 secs. Floating point exception (core dumped) Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200417132330.119407-4-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22perf cgroup: Avoid needless closing of unopened fdTommi Rantala
Do not bother with close() if fd is not valid, just to silence valgrind: $ valgrind ./perf script ==59169== Memcheck, a memory error detector ==59169== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==59169== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info ==59169== Command: ./perf script ==59169== ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200417132330.119407-1-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22Merge tag 'perf-core-for-mingo-5.8-20200420' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo: kernel + tools/perf: Alexey Budankov: - Introduce CAP_PERFMON to kernel and user space. callchains: Adrian Hunter: - Allow using Intel PT to synthesize callchains for regular events. Kan Liang: - Stitch LBR records from multiple samples to get deeper backtraces, there are caveats, see the csets for details. perf script: Andreas Gerstmayr: - Add flamegraph.py script BPF: Jiri Olsa: - Synthesize bpf_trampoline/dispatcher ksymbol events. perf stat: Arnaldo Carvalho de Melo: - Honour --timeout for forked workloads. Stephane Eranian: - Force error in fallback on :k events, to avoid counting nothing when the user asks for kernel events but is not allowed to. perf bench: Ian Rogers: - Add event synthesis benchmark. tools api fs: Stephane Eranian: - Make xxx__mountpoint() more scalable libtraceevent: He Zhe: - Handle return value of asprintf. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-21Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "15 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: tools/vm: fix cross-compile build coredump: fix null pointer dereference on coredump mm: shmem: disable interrupt when acquiring info->lock in userfaultfd_copy path shmem: fix possible deadlocks on shmlock_user_lock vmalloc: fix remap_vmalloc_range() bounds checks mm/shmem: fix build without THP mm/ksm: fix NULL pointer dereference when KSM zero page is enabled tools/build: tweak unused value workaround checkpatch: fix a typo in the regex for $allocFunctions mm, gup: return EINTR when gup is interrupted by fatal signals mm/hugetlb: fix a addressing exception caused by huge_pte_offset MAINTAINERS: add an entry for kfifo mm/userfaultfd: disable userfaultfd-wp on x86_32 slub: avoid redzone when choosing freepointer location sh: fix build error in mm/init.c
2020-04-21Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes and cleanups from Michael Tsirkin: - Some bug fixes - Cleanup a couple of issues that surfaced meanwhile - Disable vhost on ARM with OABI for now - to be fixed fully later in the cycle or in the next release. * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (24 commits) vhost: disable for OABI virtio: drop vringh.h dependency virtio_blk: add a missing include virtio-balloon: Avoid using the word 'report' when referring to free page hinting virtio-balloon: make virtballoon_free_page_report() static vdpa: fix comment of vdpa_register_device() vdpa: make vhost, virtio depend on menu vdpa: allow a 32 bit vq alignment drm/virtio: fix up for include file changes remoteproc: pull in slab.h rpmsg: pull in slab.h virtio_input: pull in slab.h remoteproc: pull in slab.h virtio-rng: pull in slab.h virtgpu: pull in uaccess.h tools/virtio: make asm/barrier.h self contained tools/virtio: define aligned attribute virtio/test: fix up after IOTLB changes vhost: Create accessors for virtqueues private_data vdpasim: Return status in vdpasim_get_status ...
2020-04-21tools/vm: fix cross-compile buildLucas Stach
Commit 7ed1c1901fe5 ("tools: fix cross-compile var clobbering") moved the setup of the CC variable to tools/scripts/Makefile.include to make the behavior consistent across all the tools Makefiles. As the vm tools missed the include we end up with the wrong CC in a cross-compiling evironment. Fixes: 7ed1c1901fe5 (tools: fix cross-compile var clobbering) Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Martin Kelly <martin@martingkelly.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200416104748.25243-1-l.stach@pengutronix.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-21tools/build: tweak unused value workaroundGeorge Burgess IV
Clang has -Wself-assign enabled by default under -Wall, which always gets -Werror'ed on this file, causing sync-compare-and-swap to be disabled by default. The generally-accepted way to spell "this value is intentionally unused," is casting it to `void`. This is accepted by both GCC and Clang with -Wall enabled: https://godbolt.org/z/qqZ9r3 Signed-off-by: George Burgess IV <gbiv@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Link: http://lkml.kernel.org/r/20200414195638.156123-1-gbiv@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-20bpf, selftests: Add test for BPF_STX BPF_B storing R10Luke Nelson
This patch adds a test to test_verifier that writes the lower 8 bits of R10 (aka FP) using BPF_B to an array map and reads the result back. The expected behavior is that the result should be the same as first copying R10 to R9, and then storing / loading the lower 8 bits of R9. This test catches a bug that was present in the x86-64 JIT that caused an incorrect encoding for BPF_STX BPF_B when the source operand is R10. Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200418232655.23870-2-luke.r.nels@gmail.com
2020-04-20bpf: Forbid XADD on spilled pointers for unprivileged usersJann Horn
When check_xadd() verifies an XADD operation on a pointer to a stack slot containing a spilled pointer, check_stack_read() verifies that the read, which is part of XADD, is valid. However, since the placeholder value -1 is passed as `value_regno`, check_stack_read() can only return a binary decision and can't return the type of the value that was read. The intent here is to verify whether the value read from the stack slot may be used as a SCALAR_VALUE; but since check_stack_read() doesn't check the type, and the type information is lost when check_stack_read() returns, this is not enforced, and a malicious user can abuse XADD to leak spilled kernel pointers. Fix it by letting check_stack_read() verify that the value is usable as a SCALAR_VALUE if no type information is passed to the caller. To be able to use __is_pointer_value() in check_stack_read(), move it up. Fix up the expected unprivileged error message for a BPF selftest that, until now, assumed that unprivileged users can use XADD on stack-spilled pointers. This also gives us a test for the behavior introduced in this patch for free. In theory, this could also be fixed by forbidding XADD on stack spills entirely, since XADD is a locked operation (for operations on memory with concurrency) and there can't be any concurrency on the BPF stack; but Alexei has said that he wants to keep XADD on stack slots working to avoid changes to the test suite [1]. The following BPF program demonstrates how to leak a BPF map pointer as an unprivileged user using this bug: // r7 = map_pointer BPF_LD_MAP_FD(BPF_REG_7, small_map), // r8 = launder(map_pointer) BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_7, -8), BPF_MOV64_IMM(BPF_REG_1, 0), ((struct bpf_insn) { .code = BPF_STX | BPF_DW | BPF_XADD, .dst_reg = BPF_REG_FP, .src_reg = BPF_REG_1, .off = -8 }), BPF_LDX_MEM(BPF_DW, BPF_REG_8, BPF_REG_FP, -8), // store r8 into map BPF_MOV64_REG(BPF_REG_ARG1, BPF_REG_7), BPF_MOV64_REG(BPF_REG_ARG2, BPF_REG_FP), BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG2, -4), BPF_ST_MEM(BPF_W, BPF_REG_ARG2, 0, 0), BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem), BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), BPF_EXIT_INSN(), BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_8, 0), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN() [1] https://lore.kernel.org/bpf/20200416211116.qxqcza5vo2ddnkdq@ast-mbp.dhcp.thefacebook.com/ Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200417000007.10734-1-jannh@google.com
2020-04-20pm-graph v5.6Todd Brandt
sleepgraph: - force usage of python3 instead of using system default - fix bugzilla 204773 (https://bugzilla.kernel.org/show_bug.cgi?id=204773) - fix issue of platform info not being reset in -multi (logs fill up) - change -ftop call to "pm_suspend", this is one level below state_store - add -wificheck command to read out the current wifi device details - change -wifi behavior to poll /proc/net/wireless for wifi connect - add wifi reconnect time to timeline, include time in summary column - add "fail on wifi_resume" to timeline and summary when wifi fails - add a set of commands to collect data before/after suspend in the log - add "-cmdinfo" command which prints out all the data collected - check for cmd info tools at start, print found/missing in green/red - fix kernel suspend time calculation: tool used to look for start of pm_suspend_console, but the order has changed. latest kernel starts with ksys_sync, use this instead - include time spent in mem/disk in the header (same as freeze/standby) - ignore turbostat 32-bit capability warnings - print to result.txt when -skiphtml is used, just say result: pass - don't exit on SIGTSTP, it's a ctrl-Z and the tool may come back - -multi argument supports duration as well as count: hours, minutes, seconds - update the -multi status output to be more informative - -maxfail sets maximum consecutive fails before a -multi run is aborted - in -summary, ignore dmesg/ftrace/html files that are 0 size bootgraph: - force usage of python3 instead of using system default README: - add endurance testing instructions Makefile: - remove pycache on uninstall Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-04-19Merge tag 'x86-urgent-2020-04-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 and objtool fixes from Thomas Gleixner: "A set of fixes for x86 and objtool: objtool: - Ignore the double UD2 which is emitted in BUG() when CONFIG_UBSAN_TRAP is enabled. - Support clang non-section symbols in objtool ORC dump - Fix switch table detection in .text.unlikely - Make the BP scratch register warning more robust. x86: - Increase microcode maximum patch size for AMD to cope with new CPUs which have a larger patch size. - Fix a crash in the resource control filesystem when the removal of the default resource group is attempted. - Preserve Code and Data Prioritization enabled state accross CPU hotplug. - Update split lock cpu matching to use the new X86_MATCH macros. - Change the split lock enumeration as Intel finaly decided that the IA32_CORE_CAPABILITIES bits are not architectural contrary to what the SDM claims. !@#%$^! - Add Tremont CPU models to the split lock detection cpu match. - Add a missing static attribute to make sparse happy" * tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/split_lock: Add Tremont family CPU models x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architectural x86/resctrl: Preserve CDP enable over CPU hotplug x86/resctrl: Fix invalid attempt at removing the default resource group x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL() x86/umip: Make umip_insns static x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE objtool: Make BP scratch register warning more robust objtool: Fix switch table detection in .text.unlikely objtool: Support Clang non-section symbols in ORC generation objtool: Support Clang non-section symbols in ORC dump objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings
2020-04-19Merge tag 'perf-urgent-2020-04-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf tooling fixes and updates from Thomas Gleixner: - Fix the header line of perf stat output for '--metric-only --per-socket' - Fix the python build with clang - The usual tools UAPI header synchronization * tag 'perf-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools headers: Synchronize linux/bits.h with the kernel sources tools headers: Adopt verbatim copy of compiletime_assert() from kernel sources tools headers: Update x86's syscall_64.tbl with the kernel sources tools headers UAPI: Sync drm/i915_drm.h with the kernel sources tools headers UAPI: Update tools's copy of drm.h headers tools headers kvm: Sync linux/kvm.h with the kernel sources tools headers UAPI: Sync linux/fscrypt.h with the kernel sources tools include UAPI: Sync linux/vhost.h with the kernel sources tools arch x86: Sync asm/cpufeatures.h with the kernel sources tools headers UAPI: Sync linux/mman.h with the kernel tools headers UAPI: Sync sched.h with the kernel tools headers: Update linux/vdso.h and grab a copy of vdso/const.h perf stat: Fix no metric header if --per-socket and --metric-only set perf python: Check if clang supports -fno-semantic-interposition tools arch x86: Sync the msr-index.h copy with the kernel sources
2020-04-18perf hist: Add fast path for duplicate entries checkKan Liang
Perf checks the duplicate entries in a callchain before adding an entry. However the check is very slow especially with deeper call stack. Almost ~50% elapsed time of perf report is spent on the check when the call stack is always depth of 32. The hist_entry__cmp() is used to compare the new entry with the old entries. It will go through all the available sorts in the sort_list, and call the specific cmp of each sort, which is very slow. Actually, for most cases, there are no duplicate entries in callchain. The symbols are usually different. It's much faster to do a quick check for symbols first. Only do the full cmp when the symbols are exactly the same. The quick check is only to check symbols, not dso. Export _sort__sym_cmp. $ perf record --call-graph lbr ./tchain_edit_64 Without the patch $time perf report --stdio real 0m21.142s user 0m21.110s sys 0m0.033s With the patch $time perf report --stdio real 0m10.977s user 0m10.948s sys 0m0.027s Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-18-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-18perf c2c: Add option to enable the LBR stitching approachKan Liang
With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handing such as setjmp/longjmp. Also, it may impact the processing time especially when the number of samples with stitched LBRs are huge. Add an option to enable the approach. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-17-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-18perf top: Add option to enable the LBR stitching approachKan Liang
With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handing such as setjmp/longjmp. Also, it may impact the processing time especially when the number of samples with stitched LBRs are huge. Add an option to enable the approach. The option must be used with --call-graph lbr. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-16-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-18perf script: Add option to enable the LBR stitching approachKan Liang
With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handing such as setjmp/longjmp. Also, it may impact the processing time especially when the number of samples with stitched LBRs are huge. Add an option to enable the approach. Committer testing: Using the same perf.data as with the latest cset committer testing section: $ perf script --stitch-lbr <SNIP> tchain_edit 11131 15164.984292: 437491 cycles:u: 401106 f43+0x0 (/wb/tchain_edit) 40114c f42+0x18 (/wb/tchain_edit) 401172 f41+0xe (/wb/tchain_edit) 401194 f40+0x0 (/wb/tchain_edit) 40119b f39+0x0 (/wb/tchain_edit) 4011a2 f38+0x0 (/wb/tchain_edit) 4011a9 f37+0x0 (/wb/tchain_edit) 4011b0 f36+0x0 (/wb/tchain_edit) 4011b7 f35+0x0 (/wb/tchain_edit) 4011be f34+0x0 (/wb/tchain_edit) 4011c5 f33+0x0 (/wb/tchain_edit) 4011cc f32+0x0 (/wb/tchain_edit) 401207 f31+0x34 (/wb/tchain_edit) 401212 f30+0x0 (/wb/tchain_edit) 401219 f29+0x0 (/wb/tchain_edit) 401220 f28+0x0 (/wb/tchain_edit) 401227 f27+0x0 (/wb/tchain_edit) 40122e f26+0x0 (/wb/tchain_edit) 401235 f25+0x0 (/wb/tchain_edit) 40123c f24+0x0 (/wb/tchain_edit) 401243 f23+0x0 (/wb/tchain_edit) 40124a f22+0x0 (/wb/tchain_edit) 401251 f21+0x0 (/wb/tchain_edit) 401258 f20+0x0 (/wb/tchain_edit) 40125f f19+0x0 (/wb/tchain_edit) 401266 f18+0x0 (/wb/tchain_edit) 40126d f17+0x0 (/wb/tchain_edit) 401274 f16+0x0 (/wb/tchain_edit) 40127b f15+0x0 (/wb/tchain_edit) 401282 f14+0x0 (/wb/tchain_edit) 401289 f13+0x0 (/wb/tchain_edit) 401290 f12+0x0 (/wb/tchain_edit) 401297 f11+0x0 (/wb/tchain_edit) 40129e f10+0x0 (/wb/tchain_edit) 4012a5 f9+0x0 (/wb/tchain_edit) 4012ac f8+0x0 (/wb/tchain_edit) 4012b3 f7+0x0 (/wb/tchain_edit) 4012ba f6+0x0 (/wb/tchain_edit) 4012c1 f5+0x0 (/wb/tchain_edit) 4012c8 f4+0x0 (/wb/tchain_edit) 4012cf f3+0x0 (/wb/tchain_edit) 4012d6 f2+0x0 (/wb/tchain_edit) 4012dd f1+0x0 (/wb/tchain_edit) 4012e4 main+0x0 (/wb/tchain_edit) 7f41a5016f41 __libc_start_main+0xf1 (/usr/lib64/libc-2.29.so) <SNIP> $ Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-15-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-18perf report: Add option to enable the LBR stitching approachKan Liang
With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handing such as setjmp/longjmp. Also, it may impact the processing time especially when the number of samples with stitched LBRs are huge. Add an option to enable the approach. # To display the perf.data header info, please use # --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6K of event 'cycles' # Event count (approx.): 6492797701 # # Children Self Command Shared Object Symbol # ........ ........ ............... .................. # ................................. # 99.99% 99.99% tchain_edit tchain_edit [.] f43 | ---main f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20 f21 f22 f23 f24 f25 f26 f27 f28 f29 f30 f31 | --99.65%--f32 f33 f34 f35 f36 f37 f38 f39 f40 f41 f42 f43 Committer testing: $ perf record --call-graph lbr /wb/tchain_edit [ perf record: Woken up 23 times to write data ] [ perf record: Captured and wrote 5.578 MB perf.data (6839 samples) ] $ perf report --header-only | egrep 'cpu(desc|.*capabilities)' # cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake $ Before: $ perf report --no-children --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6K of event 'cycles:u' # Event count (approx.): 6459523879 # # Overhead Command Shared Object Symbol # ........ ........... ................ ....................... # 99.95% tchain_edit tchain_edit [.] f43 | --99.92%--f43 f42 f41 f40 f39 f38 f37 f36 f35 f34 f33 f32 f31 f30 f29 f28 f27 f26 f25 f24 f23 f22 f21 f20 f19 f18 f17 f16 f15 f14 f13 f12 f11 0.03% tchain_edit tchain_edit [.] f42 0.01% tchain_edit tchain_edit [.] f41 0.00% tchain_edit tchain_edit [.] f31 0.00% tchain_edit ld-2.29.so [.] _dl_relocate_object 0.00% tchain_edit ld-2.29.so [.] memmove 0.00% tchain_edit [unknown] [k] 0xffffffff93a00b17 After: $ perf report --stitch-lbr --no-children --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6K of event 'cycles:u' # Event count (approx.): 6459496645 # # Overhead Command Shared Object Symbol # ........ ........... ................ ........................ # 99.97% tchain_edit tchain_edit [.] f43 | --99.93%--f43 f42 f41 f40 f39 f38 f37 f36 f35 f34 f33 f32 f31 f30 f29 f28 f27 f26 f25 f24 f23 f22 f21 f20 f19 f18 f17 f16 f15 f14 f13 f12 f11 f10 f9 f8 f7 f6 f5 f4 f3 f2 f1 main __libc_start_main 0.02% tchain_edit [unknown] [k] 0xffffffff93a00b17 0.01% tchain_edit tchain_edit [.] f31 0.00% tchain_edit ld-2.29.so [.] _dl_important_hwcaps Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-14-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>