summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2015-09-28perf intel-pt: Add mispred-all config option to aid use with autofdoAdrian Hunter
autofdo incorrectly expects branch flags to include either mispred or predicted. In fact mispred = predicted = 0 is valid and means the flags are not supported, which they aren't by Intel PT. To make autofdo work, add a config option which will cause Intel PT decoder to set the mispred flag on all branches. Below is an example of using Intel PT with autofdo. The example is also added to the Intel PT documentation. It requires autofdo (https://github.com/google/autofdo) and gcc version 5. The bubble sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial) amended to take the number of elements as a parameter. $ gcc-5 -O3 sort.c -o sort_optimized $ ./sort_optimized 30000 Bubble sorting array of 30000 elements 2254 ms $ cat ~/.perfconfig [intel-pt] mispred-all $ perf record -e intel_pt//u ./sort 3000 Bubble sorting array of 3000 elements 58 ms [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 3.939 MB perf.data ] $ perf inject -i perf.data -o inj --itrace=i100usle --strip $ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1 $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo $ ./sort_autofdo 30000 Bubble sorting array of 30000 elements 2155 ms Note there is currently no advantage to using Intel PT instead of LBR, but that may change in the future if greater use is made of the data. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-26-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf inject: Add --strip option to strip out non-synthesized eventsAdrian Hunter
Add a new option --strip which is used with --itrace to strip out non-synthesized events. This results in a perf.data file that is simpler for external tools to parse. In particular, this can be used to prepare a perf.data file for consumption by autofdo. A subsequent patch makes a change to Intel PT also to enable use with autofdo and gives an example of that use. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-25-git-send-email-adrian.hunter@intel.com [ Made it use perf_evlist__remove() + perf_evsel__delete() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf inject: Remove more aux-related stuff when processing instruction tracesAdrian Hunter
perf inject can process instruction traces (using the --itrace option) which removes aux-related events and replaces them with the requested synthesized events. However there are still some leftovers, namely PERF_RECORD_ITRACE_START events and the original evsel (selected event) e.g. intel_pt// For the sake of completeness, remove them too. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-24-git-send-email-adrian.hunter@intel.com [ Made it use perf_evlist__remove() + perf_evsel__delete() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf evlist: Add perf_evlist__remove()Adrian Hunter
Add a counterpart to perf_evlist__add() that does the opposite and deletes the evsel. This will be used by perf inject to remove unwanted evsels. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-23-git-send-email-adrian.hunter@intel.com [ Renamed it from perf_evlist__del() to perf_evlist__remove() and removed the perf_evsel__delete() call ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf evlist: Add perf_evlist__id2evsel_strict()Adrian Hunter
perf_evlist__id2evsel_strict() is the same as perf_evlist__id2evsel() except that it ensures that the id must match. This will be used by perf inject to find a specific evsel that is to be deleted, hence the need to match exactly. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-22-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf script: Make scripting_max_stack value allow for synthesized callchainsAdrian Hunter
perf script has a setting to set the maximum stack depth when processing callchains. The setting defaults to the hard-coded maximum definition PERF_MAX_STACK_DEPTH which is 127. It is possible, when processing instruction traces, to synthesize callchains. Synthesized callchains do not have the kernel size limitation and are whatever size the user requests, although validation presently prevents the user requested a value greater that 1024. The default value is 16. To allow for synthesized callchains, make the scripting_max_stack value at least the same size as the synthesized callchain size. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-21-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTHAdrian Hunter
Use the scripting_max_stack value to allow for values greater than PERF_MAX_STACK_DEPTH. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-20-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf script: Add a setting for maximum stack depthAdrian Hunter
Add a setting for maximum stack depth in preparation for allowing for synthesized callchains. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-19-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTHAdrian Hunter
Use the max_stack value instead of PERF_MAX_STACK_DEPTH so that arbitrary-sized callchains can be supported. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-17-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf report: Make max_stack value allow for synthesized callchainsAdrian Hunter
perf report has an option (--max-stack) to set the maximum stack depth when processing callchains. The option defaults to the hard-coded maximum definition PERF_MAX_STACK_DEPTH which is 127. The intention of the option is to allow the user to reduce the processing time by reducing the amount of the callchain that is processed. It is also possible, when processing instruction traces, to synthesize callchains. Synthesized callchains do not have the kernel size limitation and are whatever size the user requests, although validation presently prevents the user requested a value greater that 1024. The default value is 16. To allow for synthesized callchains, make the max_stack value at least the same size as the synthesized callchain size. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-16-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf intel-pt: Support generating branch stackAdrian Hunter
Add support for generating branch stack context for PT samples. The decoder reports a configurable number of branches as branch context for each sample. Internally it keeps track of them by using a simple sliding window. We also flush the last branch buffer on each sample to avoid overlapping intervals. This is useful for: - Reporting accurate basic block edge frequencies through the perf report branch view - Using with --branch-history to get the wider context of samples - Other users of LBRs Also the Documentation is updated. Examples: Record with Intel PT: perf record -e intel_pt//u ls Branch stacks are used by default if synthesized so: perf report --itrace=ile is the same as: perf report --itrace=ile -b Branch history can be requested also: perf report --itrace=igle --branch-history Based-on-patch-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-15-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf intel-pt: Move branch filter logicAdrian Hunter
intel_pt_synth_branch_sample() skips synthesizing if the branch does not match the branch filter. That logic was sitting in the middle of the function but is more efficiently placed at the start of the function, so move it. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-14-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf inject: Set branch stack feature flag when synthesizing branch stacksAdrian Hunter
The branch stack feature flag is set by 'perf record' when recording data that contains branch stacks. Consequently, when 'perf inject' synthesizes branch stacks, the feature flag should be set also. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-13-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf report: Skip events with null branch stacksAdrian Hunter
A non-synthesized event might not have a branch stack if branch stacks have been synthesized (using itrace options). An example of that is when Intel PT records sched_switch events for decoding purposes. Those sched_switch events do not have branch stacks even though the Intel PT decoder may be synthesizing other events that do due to the itrace options. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-12-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf report: Also do default setup for synthesized branch stacksAdrian Hunter
The 'perf report' tool will default to displaying branch stacks (-b option) if they are present. Make that also happen for synthesized branch stacks. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-11-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf report: Adjust sample type validation for synthesized branch stacksAdrian Hunter
perf report looks at event sample types to determine if branch stacks have been sampled. Adjust the validation to know about instruction tracing options. This change allows the use of the -b option which otherwise would complain with an error like: Error: Selected -b but no branch data. Did you call perf record without -b? # To display the perf.data header info, # please use --header/--header-only options. # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-10-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf auxtrace: Add option to synthesize branch stacks on samplesAdrian Hunter
Add AUX area tracing option 'l' to synthesize branch stacks on samples just like sample type PERF_SAMPLE_BRANCH_STACK. This is taken into use by Intel PT in a subsequent patch. Based-on-patch-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-9-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf tools: Add more documentation to export-to-postgresql.py scriptAdrian Hunter
Add some comments to the script and some 'views' to the created database that better illustrate the database structure and how it can be used. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-8-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf session: Warn when AUX data has been lostAdrian Hunter
By default 'perf record' will postprocess the perf.data file to determine build-ids. When that happens, the number of lost perf events is displayed. Make that also happen for AUX events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-7-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf script: Allow time to be displayed in nanosecondsAdrian Hunter
Add option --ns to display time to 9 decimal places. That is useful in some cases, for example when using Intel PT cycle accurate mode. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-6-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf intel-pt: Make logging slightly more efficientAdrian Hunter
Logging is only used for debugging. Use macros to save calling into the functions only to return immediately when logging is not enabled. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-5-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf intel-pt: Fix potential loop foreverAdrian Hunter
TSC packets contain only 7 bytes of TSC. The 8th byte is assumed to change so infrequently that its value can be inferred. However the logic must cater for a 7 byte wraparound, which it does by adding 1 to the top byte. The existing code was doing that with a while loop even though the addition should only need to be done once. That logic won't work (will loop forever) if TSC wraps around at the 8th byte. Theoretically that would take at least 10 years, unless something else went wrong. And what else could go wrong. Well, if the chunks of trace data are processed out of order, it will make it look like the 7-byte TSC has gone backwards (i.e. wrapped). If that happens 256 times then stuck in the while loop it will be. Fix that by getting rid of the unnecessary while loop. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf report: Fix sample type validation for synthesized callchainsAdrian Hunter
Processing instruction tracing data (e.g. Intel PT) can synthesize callchains e.g. $ perf record -e intel_pt//u uname $ perf report --stdio --itrace=ige However perf report's callgraph option gets extra validation, so: $ perf report --stdio --itrace=ige -gflat Error: Selected -g or --branch-history but no callchain data. Did you call 'perf record' without -g? # To display the perf.data header info, # please use --header/--header-only options. # Fix the validation to know about instruction tracing options so above command works. A side-effect of the change is that the default option to accumulate the callchain of child functions comes into force. To get the previous behaviour the --no-children option can be used e.g. $ perf report --stdio --itrace=ige -gflat --no-children Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf auxtrace: Fix 'instructions' period of zeroAdrian Hunter
Instruction tracing options (i.e. --itrace) include an option for sampling instructions at an arbitrary period. e.g. --itrace=i10us means make an 'instructions' sample for every 10us of trace. Currently the logic does not distinguish between a period of zero and no period being specified at all, so it gets treated as the default period which is 100000. That doesn't really make sense. Fix it so that zero period is accepted and treated as meaning "as often as possible". In the case of Intel PT that is the same as a period of 1 and a unit of 'instructions' (i.e. --itrace=i1i). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-2-git-send-email-adrian.hunter@intel.com [ Add a few lines describing this in the Documentation/intel-pt.txt file ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28tools build: Build fixdep helper from perf and basic libsJiri Olsa
Adding the fixdep target into the Makefile.include to ease up building of fixdep helper, that needs to be built before we dive in to the build itself. The user can invoke the fixdep target to build the helper. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1443004442-32660-8-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf tools: Rename the 'single_dep' target to 'prepare'Jiri Olsa
And use the new 'prepare' target for the $(PERF_IN) target. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1443004442-32660-7-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28tools build: Make the fixdep helper part of the build processJiri Olsa
Making the fixdep helper to be invoked within dep-cmd. Each user of the build framework needs to make sure fixdep exists before executing the build itself. If the build doesn't find fixdep, it falls back to the old style dependency tracking. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1443004442-32660-6-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28tools build: Move dependency copy into functionJiri Olsa
So it's easier to add more functionality in the following commit. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1443004442-32660-5-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28tools build: Add fixdep dependency helperJiri Olsa
For dependency tracking we currently use targets that fall out of the gcc -MD command. We store this info in the .cmd file and include as makefile during the build. This format put object as target and all the c and header files as dependencies, like: util/abspath.o: util/abspath.c /usr/include/stdc-predef.h util/cache.h \ /usr/include/bits/wordsize.h /usr/include/gnu/stubs.h \ ... If any of those dependency header files (krava.h below) is removed the build fails on: make[1]: *** No rule to make target 'krava.h', needed by 'inc.o'. Stop. This patch adds fixdep helper, that is used by kbuild to alter the shape of the object dependencies like: source_util/abspath.o := util/abspath.c deps_util/abspath.o := \ /usr/include/stdc-predef.h \ util/cache.h \ ... util/abspath.o: $(deps_util/abspath.o) $(deps_util/abspath.o): With this format the header removal won't make the build fail, because it'll be picked up by the last empty target defined for each header. As previously mentioned the fixdep tool is taken from kbuild. It's not complete backport, only the part that alters the standard dependency info was taken, the part that adds the CONFIG_* dependency logic will be probably taken later on. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Kai Germaschewski <kai.germaschewski@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1443004442-32660-4-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28tools build: Add test for missing includeJiri Olsa
The current build framework fails to cope with header file removal. The reason is that the removed header file stays in the .cmd file target rule and forces the build to fail. This issue is fixed and explained in the following patches. Adding a new build test that simulates header removal. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1443004442-32660-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28tools build: Add Makefile.includeJiri Olsa
To ease up build framework code setup for users. More shared code will be added in the following patches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1443004442-32660-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28tools lib api fs: Store tracing mountpoint for better error messageJiri Olsa
Storing the actual tracing path mountpoint to display correct error message hint ('Hint:' line). The error hint rediscovers mountpoints, but it could be different from what we actually used in tracing path. Before we'd display debugfs mount even though tracefs was used: $ perf record -e sched:sched_krava ls event syntax error: 'sched:sched_krava' \___ can't access trace events Error: No permissions to read /sys/kernel/debug/tracing/events/sched/sched_krava Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug' ... After this change, correct mountpoint is displayed: $ perf record -e sched:sched_krava ls event syntax error: 'sched:sched_krava' \___ can't access trace events Error: No permissions to read /sys/kernel/debug/tracing/events/sched/sched_krava Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing' ... Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Raphael Beamonte <raphael.beamonte@gmail.com> Link: http://lkml.kernel.org/r/1442674027-19427-1-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf tools: Use __map__is_kernel() when synthesizing kernel module mmap recordsArnaldo Carvalho de Melo
Equivalent and removes one more case of using dso->kernel. # perf record -a usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.768 MB perf.data (30 samples) ] Before: [root@zoo ~]# perf script --show-task --show-mmap | head -3 swapper 0 [0] 0.0: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text swapper 0 [0] 0.0: PERF_RECORD_MMAP -1/0: [0xffffffffa0000000(0xa000) @ 0]: x /lib/modules/4.3.0-rc1+/kernel/drivers/acpi/video.ko swapper 0 [0] 0.0: PERF_RECORD_MMAP -1/0: [0xffffffffa000a000(0x5000) @ 0]: x /lib/modules/4.3.0-rc1+/kernel/drivers/i2c/algos/i2c-algo-bit.ko # # perf script --show-task --show-mmap | head -3 swapper 0 [0] 0.0: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text swapper 0 [0] 0.0: PERF_RECORD_MMAP -1/0: [0xffffffffa0000000(0xa000) @ 0]: x /lib/modules/4.3.0-rc1+/kernel/drivers/acpi/video.ko swapper 0 [0] 0.0: PERF_RECORD_MMAP -1/0: [0xffffffffa000a000(0x5000) @ 0]: x /lib/modules/4.3.0-rc1+/kernel/drivers/i2c/algos/i2c-algo-bit.ko # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-b65xe578dwq22mzmmj5y94wr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf hists browser: Use the map to determine if a DSO is being used as a kernelArnaldo Carvalho de Melo
The map is what should say if an ELF (or some other format) image is being used for some particular purpose, as a kernel, host or guest. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-zufousvfar0710p4qj71c32d@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf top: Filter symbols based on __map__is_kernel(map)Arnaldo Carvalho de Melo
Instead of using dso->kernel, this is equivalent at the moment, and helps in reducing the accesses to dso->kernel. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-1pc2v63iphtifovw3bv0bo1v@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28Merge branch 'linus' into perf/core, to pick up fixes before applying new ↵Ingo Molnar
changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-27Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Another pile of fixes for perf: - Plug overflows and races in the core code - Sanitize the flow of the perf syscall so we error out before handling the more complex and hard to undo setups - Improve and fix Broadwell and Skylake hardware support - Revert a fix which broke what it tried to fix in perf tools - A couple of smaller fixes in various places of perf tools" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Fix copying of /proc/kcore perf intel-pt: Remove no_force_psb from documentation perf probe: Use existing routine to look for a kernel module by dso->short_name perf/x86: Change test_aperfmperf() and test_intel() to static tools lib traceevent: Fix string handling in heterogeneous arch environments perf record: Avoid infinite loop at buildid processing with no samples perf: Fix races in computing the header sizes perf: Fix u16 overflows perf: Restructure perf syscall point of no return perf/x86/intel: Fix Skylake FRONTEND MSR extrareg mask perf/x86/intel/pebs: Add PEBS frontend profiling for Skylake perf/x86/intel: Make the CYCLE_ACTIVITY.* constraint on Broadwell more specific perf tools: Bool functions shouldn't return -1 tools build: Add test for presence of __get_cpuid() gcc builtin tools build: Add test for presence of numa_num_possible_cpus() in libnuma Revert "perf symbols: Fix mismatched declarations for elf_getphdrnum" perf stat: Fix per-pkg event reporting bug
2015-09-27tools: usb: testusb: change the default value for length from 512 to 1024Peter Chen
For ctrl out test, it needs length > vary, so in order to run it with default parameters, we do this change. Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Felipe Balbi <balbi@ti.com>
2015-09-27tools: usb: testusb: change the help textPeter Chen
The 'length' is the transfer length, not the packet size, so change the help text. Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Felipe Balbi <balbi@ti.com>
2015-09-27Merge branch 'turbostat' of ↵Rafael J. Wysocki
https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-tools Pull turbostat updates for v4.3 from Len Brown. * 'turbostat' of https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbosat: update version number tools/power turbostat: SKL: Adjust for TSC difference from base frequency tools/power turbostat: KNL workaround for %Busy and Avg_MHz tools/power turbostat: IVB Xeon: fix --debug regression
2015-09-26tools/power turbosat: update version numberLen Brown
Signed-off-by: Len Brown <len.brown@intel.com>
2015-09-26tools/power turbostat: SKL: Adjust for TSC difference from base frequencyLen Brown
On a Skylake with 1500MHz base frequency, the TSC runs at 1512MHz. This is because the TSC is no longer in the n*100 MHz BCLK domain, but is now in the m*24MHz crystal clock domain. (24 MHz * 63 = 1512 MHz) This adds error to several calculations in turbostat, unless the TSC sample sizes are adjusted for this difference. Note that calculations in the time domain are immune from this issue, as the timing sub-system has already calibrated the TSC against a known wall clock. AVG_MHz = APERF_delta/measurement_interval need no adjustment. APERF_delta is in the BCLK domain, and measurement_interval is in the time domain. TSC_MHz = TSC_delta/measurement_interval needs no adjustment -- as we really do want to report the actual measured TSC delta here, and measurement_interval is in the accurate time domain. %Busy = MPERF_delta/TSC_delta needs adjustment to use TSC_BCLK_DOMAIN_delta. TSC_BCLK_DOMAIN_delta = TSC_delta * base_hz / tsc_hz Bzy_MHz = TSC_delta/APERF_delta/MPERF_delta/measurement_interval need adjustment as above. No other metrics in turbostat need to be adjusted. Before: CPU Avg_MHz %Busy Bzy_MHz TSC_MHz - 550 24.84 2216 1512 0 2191 98.73 2219 1514 2 0 0.01 2130 1512 1 9 0.43 2016 1512 3 2 0.08 2016 1512 After: CPU Avg_MHz %Busy Bzy_MHz TSC_MHz - 550 25.05 2198 1512 0 2190 99.62 2199 1512 2 0 0.01 2152 1512 1 9 0.46 2000 1512 3 2 0.10 2000 1512 Note that in this example, the "Before" Bzy_MHz was reported as exceeding the 2200 max turbo rate. Also, even a pinned spin loop would not be reported as over 99% busy. Signed-off-by: Len Brown <len.brown@intel.com>
2015-09-26tools/power turbostat: KNL workaround for %Busy and Avg_MHzHubert Chrzaniuk
KNL increments APERF and MPERF every 1024 clocks. This is compliant with the architecture specification, which requires that only the ratio of APERF/MPERF need be valid. However, turbostat takes advantage of the fact that these two MSRs increment every un-halted clock at the actual and base frequency: AVG_MHz = APERF_delta/measurement_interval %Busy = MPERF_delta/TSC_delta This quirk is needed for these calculations to also work on KNL, which would otherwise show a value 1024x smaller than expected. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2015-09-26tools/power turbostat: IVB Xeon: fix --debug regressionLen Brown
Staring in Linux-4.3-rc1, commit 6fb3143b561c ("tools/power turbostat: dump CONFIG_TDP") touches MSR 0x648, which is not supported on IVB-Xeon. This results in "turbostat --debug" exiting on those systems: turbostat: /dev/cpu/2/msr offset 0x648 read failed: Input/output error Remove IVB-Xeon from the list of machines supporting with that MSR. Signed-off-by: Len Brown <len.brown@intel.com>
2015-09-25perf tools: Fix copying of /proc/kcoreAdrian Hunter
A copy of /proc/kcore containing the kernel text can be made to the buildid cache. e.g. perf buildid-cache -v -k /proc/kcore To workaround objdump limitations, a copy is also made when annotating against /proc/kcore. The copying process stops working from libelf about v1.62 onwards (the problem was found with v1.63). The cause is that a call to gelf_getphdr() in kcore__add_phdr() fails because additional validation has been added to gelf_getphdr(). The use of gelf_getphdr() is a misguided attempt to get default initialization of the Gelf_Phdr structure. That should not be necessary because every member of the Gelf_Phdr structure is subsequently assigned. So just remove the call to gelf_getphdr(). Similarly, a call to gelf_getehdr() in gelf_kcore__init() can be removed also. Committer notes: Note to stable@kernel.org, from Adrian in the cover letter for this patchkit: The "Fix copying of /proc/kcore" problem goes back to v3.13 if you think it is important enough for stable. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1443089122-19082-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-25perf intel-pt: Remove no_force_psb from documentationAdrian Hunter
no_force_psb was dropped as a late change to the kernel driver. Consequently, remove it from the documentation. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443089122-19082-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-25perf probe: Use existing routine to look for a kernel module by dso->short_nameArnaldo Carvalho de Melo
We have map_groups__find_by_name() to look at the list of modules that are in place for a given machine, so use it instead of traversing the machine dso list, which also includes DSOs for userspace. When merging the user and kernel DSO lists a bug was introduced where 'perf probe' stopped being able to add probes to modules using its short name: # perf probe -m usbnet --add usbnet_start_xmit usbnet_start_xmit is out of .text, skip it. Error: Failed to add events. # With this fix it works again: # perf probe -m usbnet --add usbnet_start_xmit Added new event: probe:usbnet_start_xmit (on usbnet_start_xmit in usbnet) You can now use it in all perf tools, such as: perf record -e probe:usbnet_start_xmit -aR sleep 1 # Reported-by: Wang Nan <wangnan0@huawei.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Fixes: 3d39ac538629 ("perf machine: No need to have two DSOs lists") Link: http://lkml.kernel.org/r/20150924015008.GE1897@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-25Merge branch 'linus' into x86/asm, to refresh the tree before applying new ↵Ingo Molnar
changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-23Merge tag 'perf-core-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: User visible changes: - Fix a segfault in 'perf probe' when removing uprobe events. (Masami Hiramatsu) - Synthesize COMM event for workloads started from the command line in 'perf record' so that we can have the pid->comm mapping before we get the real PERF_RECORD_COMM switching from perf to the workload. (Namhyung Kim) - Fix build tools/vm/ due to removal of tools/lib/api/fs/debugfs.h. (Arnaldo Carvalho de Melo) Infrastructure changes: - Fix the make tarball targets by including the recently added err.h header in the perf MANIFEST file. (Jiri Olsa) - Don't assume that the event parser returns a non empty evlist. (Wang Nan) - Add way to disambiguate feature detection state files, needed to use tools/build feature detection for multiple components in a single O= output dir, which will be the case with tools/perf/ and tools/lib/bpf/. (Arnaldo Carvalho de Melo) - Fixup FEATURE_{TESTS,DISPLAY} inversion in tools/lib/bpf/. (Arnaldo Carvalho de Melo) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-23Merge branch 'perf/urgent' into perf/core to pick up fixes before pulling ↵Ingo Molnar
new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>