summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu/perf_event_intel.c
AgeCommit message (Collapse)Author
2015-04-02perf/x86/intel: Limit to half counters when the HT workaround is enabled, to ↵Stephane Eranian
avoid exclusive mode starvation This patch limits the number of counters available to each CPU when the HT bug workaround is enabled. This is necessary to avoid situation of counter starvation. Such can arise from configuration where one HT thread, HT0, is using all 4 counters with corrupting events which require exclusion the the sibling HT, HT1. In such case, HT1 would not be able to schedule any event until HT0 is done. To mitigate this problem, this patch artificially limits the number of counters to 2. That way, we can gurantee that at least 2 counters are not in exclusive mode and therefore allow the sibling thread to schedule events of the same type (system vs. per-thread). The 2 counters are not determined in advance. We simply set the limit to two events per HT. This helps mitigate starvation in case of events with specific counter constraints such a PREC_DIST. Note that this does not elimintate the starvation is all cases. But it is better than not having it. (Solution suggested by Peter Zjilstra.) Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-11-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02perf/x86/intel: Fix intel_get_event_constraints() for dynamic constraintsStephane Eranian
With dynamic constraint, we need to restart from the static constraints each time the intel_get_event_constraints() is called. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-10-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02perf/x86/intel: Enforce HT bug workaround for SNB/IVB/HSWMaria Dimakopoulou
This patches activates the HT bug workaround for the SNB/IVB/HSW processors. This covers non-PEBS mode. Activation is done thru the constraint tables. Both client and server processors needs this workaround. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-8-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02perf/x86/intel: Implement cross-HT corruption bug workaroundMaria Dimakopoulou
This patch implements a software workaround for a HW erratum on Intel SandyBridge, IvyBridge and Haswell processors with Hyperthreading enabled. The errata are documented for each processor in their respective specification update documents: - SandyBridge: BJ122 - IvyBridge: BV98 - Haswell: HSD29 The bug causes silent counter corruption across hyperthreads only when measuring certain memory events (0xd0, 0xd1, 0xd2, 0xd3). Counters measuring those events may leak counts to the sibling counter. For instance, counter 0, thread 0 measuring event 0xd0, may leak to counter 0, thread 1, regardless of the event measured there. The size of the leak is not predictible. It all depends on the workload and the state of each sibling hyper-thread. The corrupting events do undercount as a consequence of the leak. The leak is compensated automatically only when the sibling counter measures the exact same corrupting event AND the workload is on the two threads is the same. Given, there is no way to guarantee this, a work-around is necessary. Furthermore, there is a serious problem if the leaked count is added to a low-occurrence event. In that case the corruption on the low occurrence event can be very large, e.g., orders of magnitude. There is no HW or FW workaround for this problem. The bug is very easy to reproduce on a loaded system. Here is an example on a Haswell client, where CPU0, CPU4 are siblings. We load the CPUs with a simple triad app streaming large floating-point vector. We use 0x81d0 corrupting event (MEM_UOPS_RETIRED:ALL_LOADS) and 0x20cc (ROB_MISC_EVENTS:LBR_INSERTS). Given we are not using the LBR, the 0x20cc event should be zero. $ taskset -c 0 triad & $ taskset -c 4 triad & $ perf stat -a -C 0 -e r81d0 sleep 100 & $ perf stat -a -C 4 -r20cc sleep 10 Performance counter stats for 'system wide': 139 277 291 r20cc 10,000969126 seconds time elapsed In this example, 0x81d0 and r20cc ar eusing sinling counters on CPU0 and CPU4. 0x81d0 leaks into 0x20cc and corrupts it from 0 to 139 millions occurrences. This patch provides a software workaround to this problem by modifying the way events are scheduled onto counters by the kernel. The patch forces cross-thread mutual exclusion between counters in case a corrupting event is measured by one of the hyper-threads. If thread 0, counter 0 is measuring event 0xd0, then nothing can be measured on counter 0, thread 1. If no corrupting event is measured on any hyper-thread, event scheduling proceeds as before. The same example run with the workaround enabled, yield the correct answer: $ taskset -c 0 triad & $ taskset -c 4 triad & $ perf stat -a -C 0 -e r81d0 sleep 100 & $ perf stat -a -C 4 -r20cc sleep 10 Performance counter stats for 'system wide': 0 r20cc 10,000969126 seconds time elapsed The patch does provide correctness for all non-corrupting events. It does not "repatriate" the leaked counts back to the leaking counter. This is planned for a second patch series. This patch series makes this repatriation more easy by guaranteeing the sibling counter is not measuring any useful event. The patch introduces dynamic constraints for events. That means that events which did not have constraints, i.e., could be measured on any counters, may now be constrained to a subset of the counters depending on what is going on the sibling thread. The algorithm is similar to a cache coherency protocol. We call it XSU in reference to Exclusive, Shared, Unused, the 3 possible states of a PMU counter. As a consequence of the workaround, users may see an increased amount of event multiplexing, even in situtations where there are fewer events than counters measured on a CPU. Patch has been tested on all three impacted processors. Note that when HT is off, there is no corruption. However, the workaround is still enabled, yet not costing too much. Adding a dynamic detection of HT on turned out to be complex are requiring too much to code to be justified. This patch addresses the issue when PEBS is not used. A subsequent patch fixes the problem when PEBS is used. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> [spinlock_t -> raw_spinlock_t] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-7-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02perf/x86/intel: Add cross-HT counter exclusion infrastructureMaria Dimakopoulou
This patch adds a new shared_regs style structure to the per-cpu x86 state (cpuc). It is used to coordinate access between counters which must be used with exclusion across HyperThreads on Intel processors. This new struct is not needed on each PMU, thus is is allocated on demand. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> [peterz: spinlock_t -> raw_spinlock_t] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-6-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02perf/x86: Add 'index' param to get_event_constraint() callbackStephane Eranian
This patch adds an index parameter to the get_event_constraint() x86_pmu callback. It is expected to represent the index of the event in the cpuc->event_list[] array. When the callback is used for fake_cpuc (evnet validation), then the index must be -1. The motivation for passing the index is to use it to index into another cpuc array. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-5-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02perf/x86: Vectorize cpuc->kfree_on_onlineStephane Eranian
Make the cpuc->kfree_on_online a vector to accommodate more than one entry and add the second entry to be used by a later patch. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02perf/x86: Rename x86_pmu::er_flags to 'flags'Stephane Eranian
Because it will be used for more than just tracking the presence of extra registers. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02Merge branch 'perf/urgent' into perf/core, before applying dependent patchesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02perf/x86/intel/bts: Add BTS PMU driverAlexander Shishkin
Add support for Branch Trace Store (BTS) via kernel perf event infrastructure. The difference with the existing implementation of BTS support is that this one is a separate PMU that exports events' trace buffers to userspace by means of AUX area of the perf buffer, which is zero-copy mapped into userspace. The immediate benefit is that the buffer size can be much bigger, resulting in fewer interrupts and no kernel side copying is involved and little to no trace data loss. Also, kernel code can be traced with this driver. The old way of collecting BTS traces still works. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1422614435-114702-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02perf/x86/intel/pt: Add Intel PT PMU driverAlexander Shishkin
Add support for Intel Processor Trace (PT) to kernel's perf events. PT is an extension of Intel Architecture that collects information about software execuction such as control flow, execution modes and timings and formats it into highly compressed binary packets. Even being compressed, these packets are generated at hundreds of megabytes per second per core, which makes it impractical to decode them on the fly in the kernel. This driver exports trace data by through AUX space in the perf ring buffer, which is zero-copy mapped into userspace for faster data retrieval. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1422614392-114498-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02perf/x86: Mark Intel PT and LBR/BTS as mutually exclusiveAlexander Shishkin
Intel PT cannot be used at the same time as LBR or BTS and will cause a general protection fault if they are used together. In order to avoid fixing up GPs in the fast path, instead we disallow creating LBR/BTS events when PT events are present and vice versa. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1421237903-181015-12-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02perf/x86/intel: Fix Haswell CYCLE_ACTIVITY.* counter constraintsAndi Kleen
Some of the CYCLE_ACTIVITY.* events can only be scheduled on counter 2. Due to a typo Haswell matched those with INTEL_EVENT_CONSTRAINT, which lead to the events never matching as the comparison does not expect anything in the umask too. Fix the typo. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1425925222-32361-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02perf/x86/intel: Filter branches for PEBS eventKan Liang
For supporting Intel LBR branches filtering, Intel LBR sharing logic mechanism is introduced from commit b36817e88630 ("perf/x86: Add Intel LBR sharing logic"). It modifies __intel_shared_reg_get_constraints() to config lbr_sel, which is finally used to set LBR_SELECT. However, the intel_shared_regs_constraints() function is called after intel_pebs_constraints(). The PEBS event will return immediately after intel_pebs_constraints(). So it's impossible to filter branches for PEBS events. This patch moves intel_shared_regs_constraints() ahead of intel_pebs_constraints(). We can safely do that because the intel_shared_regs_constraints() function only returns empty constraint if its rejecting the event, otherwise it returns NULL such that we continue calling intel_pebs_constraints() and x86_get_event_constraint(). Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1427467105-9260-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27perf/x86/intel: Add INST_RETIRED.ALL workaroundsAndi Kleen
On Broadwell INST_RETIRED.ALL cannot be used with any period that doesn't have the lowest 6 bits cleared. And the period should not be smaller than 128. This is erratum BDM11 and BDM55: http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf BDM11: When using a period < 100; we may get incorrect PEBS/PMI interrupts and/or an invalid counter state. BDM55: When bit0-5 of the period are !0 we may get redundant PEBS records on overflow. Add a new callback to enforce this, and set it for Broadwell. How does this handle the case when an app requests a specific period with some of the bottom bits set? Short answer: Any useful instruction sampling period needs to be 4-6 orders of magnitude larger than 128, as an PMI every 128 instructions would instantly overwhelm the system and be throttled. So the +-64 error from this is really small compared to the period, much smaller than normal system jitter. Long answer (by Peterz): IFF we guarantee perf_event_attr::sample_period >= 128. Suppose we start out with sample_period=192; then we'll set period_left to 192, we'll end up with left = 128 (we truncate the lower bits). We get an interrupt, find that period_left = 64 (>0 so we return 0 and don't get an overflow handler), up that to 128. Then we trigger again, at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get an overflow). We increment with sample_period so we get left = 128. We fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an overflow). And on and on. So while the individual interrupts are 'wrong' we get then with interval=256,128 in exactly the right ratio to average out at 192. And this works for everything >=128. So the num_samples*fixed_period thing is still entirely correct +- 127, which is good enough I'd say, as you already have that error anyhow. So no need to 'fix' the tools, al we need to do is refuse to create INST_RETIRED:ALL events with sample_period < 128. Signed-off-by: Andi Kleen <ak@linux.intel.com> [ Updated comments and changelog a bit. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27perf/x86/intel: Add Broadwell core supportAndi Kleen
Add Broadwell support for Broadwell to perf. The basic support is very similar to Haswell. We use the new cache event list added for Haswell earlier. The only differences are a few bits related to remote nodes. To avoid an extra, mostly identical, table these are patched up in the initialization code. The constraint list has one new event that needs to be handled over Haswell. Includes code and testing from Kan Liang. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27perf/x86/intel: Add new cache events table for HaswellAndi Kleen
Haswell offcore events are quite different from Sandy Bridge. Add a new table to handle Haswell properly. Note that the offcore bits listed in the SDM are not quite correct (this is currently being fixed). An uptodate list of bits is in the patch. The basic setup is similar to Sandy Bridge. The prefetch columns have been removed, as prefetch counting is not very reliable on Haswell. One L1 event that is not in the event list anymore has been also removed. - data reads do not include code reads (comparable to earlier Sandy Bridge tables) - data counts include speculative execution (except L1 write, dtlb, bpu) - remote node access includes both remote memory, remote cache, remote mmio. - prefetches are not included in the counts for consistency (different from Sandy Bridge, which includes prefetches in the remote node) Signed-off-by: Andi Kleen <ak@linux.intel.com> [ Removed the HSM30 comments; we don't have them for SNB/IVB either. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18perf: Simplify the branch stack checkYan, Zheng
Use event->attr.branch_sample_type to replace intel_pmu_needs_lbr_smpl() for avoiding duplicated code that implicitly enables the LBR. Currently, branch stack can be enabled by user explicitly requesting branch sampling or implicit branch sampling to correct PEBS skid. For user explicitly requested branch sampling, the branch_sample_type is explicitly set by user. For PEBS case, the branch_sample_type is also implicitly set to PERF_SAMPLE_BRANCH_ANY in x86_pmu_hw_config. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-11-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18perf/x86/intel: Add basic Haswell LBR call stack supportYan, Zheng
Haswell has a new feature that utilizes the existing LBR facility to record call chains. To enable this feature, bits (JCC, NEAR_IND_JMP, NEAR_REL_JMP, FAR_BRANCH, EN_CALLSTACK) in LBR_SELECT must be set to 1, bits (NEAR_REL_CALL, NEAR-IND_CALL, NEAR_RET) must be cleared. Due to a hardware bug of Haswell, this feature doesn't work well with FREEZE_LBRS_ON_PMI. When the call stack feature is enabled, the LBR stack will capture unfiltered call data normally, but as return instructions are executed, the last captured branch record is flushed from the on-chip registers in a last-in first-out (LIFO) manner. Thus, branch information relative to leaf functions will not be captured, while preserving the call stack information of the main line execution path. This patch defines a separate lbr_sel map for Haswell. The map contains a new entry for the call stack feature. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-5-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18perf/x86/intel: Use context switch callback to flush LBR stackYan, Zheng
Previous commit introduces context switch callback, its function overlaps with the flush branch stack callback. So we can use the context switch callback to flush LBR stack. This patch adds code that uses the flush branch callback to flush the LBR stack when task is being scheduled in. The callback is enabled only when there are events use the LBR hardware. This patch also removes all old flush branch stack code. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: eranian@google.com Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1415156173-10035-4-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28perf/x86/intel: Add model number for AirmontKan Liang
Intel Airmont supports the same architectural and non-architectural performance monitoring events as Silvermont. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421913053-99803-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-29perf/x86/intel: Revert incomplete and undocumented Broadwell client supportIngo Molnar
These patches: 86a349a28b24 ("perf/x86/intel: Add Broadwell core support") c46e665f0377 ("perf/x86: Add INST_RETIRED.ALL workarounds") fdda3c4aacec ("perf/x86/intel: Use Broadwell cache event list for Haswell") introduced magic constants and unexplained changes: https://lkml.org/lkml/2014/10/28/1128 https://lkml.org/lkml/2014/10/27/325 https://lkml.org/lkml/2014/8/27/546 https://lkml.org/lkml/2014/10/28/546 Peter Zijlstra has attempted to help out, to clean up the mess: https://lkml.org/lkml/2014/10/28/543 But has not received helpful and constructive replies which makes me doubt wether it can all be finished in time until v3.18 is released. Despite various review feedback the author (Andi Kleen) has answered only few of the review questions and has generally been uncooperative, only giving replies when prompted repeatedly, and only giving minimal answers instead of constructively explaining and helping along the effort. That kind of behavior is not acceptable. There's also a boot crash on Intel E5-1630 v3 CPUs reported for another commit from Andi Kleen: e735b9db12d7 ("perf/x86/intel/uncore: Add Haswell-EP uncore support") https://lkml.org/lkml/2014/10/22/730 Which is not yet resolved. The uncore driver is independent in theory, but the crash makes me worry about how well all these patches were tested and makes me uneasy about the level of interminging that the Broadwell and Haswell code has received by the commits above. As a first step to resolve the mess revert the Broadwell client commits back to the v3.17 version, before we run out of time and problematic code hits a stable upstream kernel. ( If the Haswell-EP crash is not resolved via a simple fix then we'll have to revert the Haswell-EP uncore driver as well. ) The Broadwell client series has to be submitted in a clean fashion, with single, well documented changes per patch. If they are submitted in time and are accepted during review then they can possibly go into v3.19 but will need additional scrutiny due to the rocky history of this patch set. Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-15Merge branch 'for-3.18-consistent-ops' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_*() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...
2014-09-24perf/x86/intel: Use Broadwell cache event list for HaswellAndi Kleen
Use the newly added Broadwell cache event list for Haswell too. All Haswell and Broadwell events and offcore masks used in these lists are identical. However Haswell is very different from the Sandy Bridge list that was used previously. That fixes a wide range of mis-counting cache events. The node events are now only for retired memory events, so prefetching and speculative memory accesses are not included. They are PEBS capable now, which makes it much easier to sample for them, plus it's possible to create address maps with -d. The prefetch events are gone now. They way the hardware counts them is very misleading (some prefetches included, others not), so it seemed best to leave them out. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1409683455-29168-5-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24perf/x86: Add INST_RETIRED.ALL workaroundsAndi Kleen
On Broadwell INST_RETIRED.ALL cannot be used with any period that doesn't have the lowest 6 bits cleared. And the period should not be smaller than 128. Add a new callback to enforce this, and set it for Broadwell. This is erratum BDM57 and BDM11. How does this handle the case when an app requests a specific period with some of the bottom bits set The apps thinks it is sampling at X occurences per sample, when it is in fact at X - 63 (worst case). Short answer: Any useful instruction sampling period needs to be 4-6 orders of magnitude larger than 128, as an PMI every 128 instructions would instantly overwhelm the system and be throttled. So the +-64 error from this is really small compared to the period, much smaller than normal system jitter. Long answer: <write up by Peter:> IFF we guarantee perf_event_attr::sample_period >= 128. Suppose we start out with sample_period=192; then we'll set period_left to 192, we'll end up with left = 128 (we truncate the lower bits). We get an interrupt, find that period_left = 64 (>0 so we return 0 and don't get an overflow handler), up that to 128. Then we trigger again, at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get an overflow). We increment with sample_period so we get left = 128. We fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an overflow). And on and on. So while the individual interrupts are 'wrong' we get then with interval=256,128 in exactly the right ratio to average out at 192. And this works for everything >=128. So the num_samples*fixed_period thing is still entirely correct +- 127, which is good enough I'd say, as you already have that error anyhow. So no need to 'fix' the tools, al we need to do is refuse to create INST_RETIRED:ALL events with sample_period < 128. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: Mark Davies <junk@eslaf.co.uk> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1409683455-29168-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24perf/x86/intel: Add Broadwell core supportAndi Kleen
Add Broadwell support for Broadwell Client to perf. This is very similar to Haswell. It uses a new cache event table, because there were various changes there. The constraint list has one new event that needs to be handled over Haswell. The PEBS event list is the same, so we reuse Haswell's. [fengguang.wu: make intel_bdw_event_constraints[] static] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24perf/x86/intel: Document all Haswell modelsAndi Kleen
Add names for each Haswell model as requested by Peter. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1409683455-29168-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24perf/x86/intel: Remove incorrect model number from Haswell perfAndi Kleen
71 is a Broadwell, not a Haswell. The model number was added by mistake earlier. Remove it for now, until it can be re-added later with real Broadwell support. In practice it does not cause a lot of issues because the Broadwell PMU is very similar to Haswell, but some details were wrong, and it's better to handle it correctly. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Link: http://lkml.kernel.org/r/1409683455-29168-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-26x86: Replace __get_cpu_var usesChristoph Lameter
__get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-13perf/x86: Use extended offcore mask on HaswellAndi Kleen
HSW-EP has a larger offcore mask than the client Haswell CPUs. It is the same mask as on Sandy/IvyBridge-EP. All of Haswell was using the client mask, so some bits were missing. On the client parts some bits were also missing compared to Sandy/IvyBridge, in particular the bits to match on a L4 cache hit. The Haswell core in both client and server incarnations accepts the same bits (but some are nops), so we can use the same mask. So use the snbep extended mask, which is a superset of the client and the server, for all of Haswell. This allows specifying a number of extra offcore events, like for example for HSW-EP. % perf stat -e cpu/event=0xb7,umask=0x1,offcore_rsp=0x3fffc00100,name=offcore_response_pf_l3_rfo_l3_miss_any_response/ true which were <not supported> before. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: eranian@google.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Link: http://lkml.kernel.org/r/1406840722-25416-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-13perf/x86/intel: Update Intel modelsPeter Zijlstra
The model number descriptions got a bit messy, clean them up. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-oo3xclxdoy8s7ubssn929vaj@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16perf/x86/intel: Protect LBR and extra_regs against KVM lyingKan Liang
With -cpu host, KVM reports LBR and extra_regs support, if the host has support. When the guest perf driver tries to access LBR or extra_regs MSR, it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support. So check the related MSRs access right once at initialization time to avoid the error access at runtime. For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y (for host kernel). And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel). Start the guest with -cpu host. Run perf record with --branch-any or --branch-filter in guest to trigger LBR Run perf stat offcore events (E.g. LLC-loads/LLC-load-misses ...) in guest to trigger offcore_rsp #GP Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: Mark Davies <junk@eslaf.co.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/1405365957-20202-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16perf/x86/intel: Use proper dTLB-load-misses event on IvyBridgeVince Weaver
This was discussed back in February: https://lkml.org/lkml/2014/2/18/956 But I never saw a patch come out of it. On IvyBridge we share the SandyBridge cache event tables, but the dTLB-load-miss event is not compatible. Patch it up after the fact to the proper DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK Signed-off-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1407141528200.17214@vincent-weaver-1.umelst.maine.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-02perf/x86/intel: ignore CondChgd bit to avoid false NMI handlingHATAYAMA Daisuke
Currently, any NMI is falsely handled by a NMI handler of NMI watchdog if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set. For example, we use external NMI to make system panic to get crash dump, but in this case, the external NMI is falsely handled do to the issue. This commit deals with the issue simply by ignoring CondChgd bit. Here is explanation in detail. On x86 NMI watchdog uses performance monitoring feature to periodically signal NMI each time performance counter gets overflowed. intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI handler of NMI watchdog, perf_event_nmi_handler(). It identifies an owner of a given NMI by looking at overflow status bits in MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it handles the given NMI as its own NMI. The problem is that the intel_pmu_handle_irq() doesn't distinguish CondChgd bit from other bits. Unlike the other status bits, CondChgd bit doesn't represent overflow status for performance counters. Thus, CondChgd bit cannot be thought of as a mark indicating a given NMI is NMI watchdog's. As a result, if CondChgd bit is set, any NMI is falsely handled by the NMI handler of NMI watchdog. Also, if type of the falsely handled NMI is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding action is never performed until CondChgd bit is cleared. I noticed this behavior on systems with Ivy Bridge processors: Intel Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems, CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set in the beginning at boot. Then the CondChgd bit is immediately cleared by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0. On the other hand, on older processors such as Nehalem, Xeon E7540, CondChgd bit is not set in the beginning at boot. I'm not sure about exact behavior of CondChgd bit, in particular when this bit is set. Although I read Intel System Programmer's Manual to figure out that, the descriptions I found are: In 18.9.1: "The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to indicate changes to the state of performancmonitoring hardware" In Table 35-2 IA-32 Architectural MSRs 63 CondChg: status bits of this register has changed. These are different from the bahviour I see on the actual system as I explained above. At least, I think ignoring CondChgd bit should be enough for NMI watchdog perspective. Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07perf/x86/intel: Fix Silvermont's event constraintsYan, Zheng
Event 0x013c is not the same as fixed counter2, remove it from Silvermont's event constraints. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1398755081-12471-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-21perf/x86: Correctly use FEATURE_PDCMPeter Zijlstra
The current code simply assumes Intel Arch PerfMon v2+ to have the IA32_PERF_CAPABILITIES MSR; the SDM specifies that we should check CPUID[1].ECX[15] (aka, FEATURE_PDCM) instead. This was found by KVM which implements v2+ but didn't provide the capabilities MSR. Change the code to DTRT; KVM will also implement the MSR and return 0. Cc: pbonzini@redhat.com Reported-by: "Michael S. Tsirkin" <mst@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140203132903.GI8874@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21perf, nmi: Fix unknown NMI warningMarkus Metzger
When using BTS on Core i7-4*, I get the below kernel warning. $ perf record -c 1 -e branches:u ls Message from syslogd@labpc1501 at Nov 11 15:49:25 ... kernel:[ 438.317893] Uhhuh. NMI received for unknown reason 31 on CPU 2. Message from syslogd@labpc1501 at Nov 11 15:49:25 ... kernel:[ 438.317920] Do you have a strange power saving mode enabled? Message from syslogd@labpc1501 at Nov 11 15:49:25 ... kernel:[ 438.317945] Dazed and confused, but trying to continue Make intel_pmu_handle_irq() take the full exit path when returning early. Cc: eranian@google.com Cc: peterz@infradead.org Cc: mingo@kernel.org Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392425048-5309-1-git-send-email-andi@firstfloor.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-10-04perf/x86: Suppress duplicated abort LBR recordsAndi Kleen
Haswell always give an extra LBR record after every TSX abort. Suppress the extra record. This only works when the abort is visible in the LBR If the original abort has already left the 16 LBR entries the extra entry will will stay. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1379688044-14173-7-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-04Merge branch 'perf/urgent' into perf/coreIngo Molnar
Pick up the latest fixes before applying new patches. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-23perf/x86/intel: Add model number for Avoton SilvermontYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Cc: a.p.zijlstra@chello.nl Cc: eranian@google.com Cc: ak@linux.intel.com Link: http://lkml.kernel.org/r/1379837953-17755-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12perf/x86/intel: Clean up EVENT_ATTR_STR() muckIngo Molnar
Make the code a bit more readable by removing stray whitespaces et al. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/n/tip-lzEnychz1ylqy8zjenxOmeht@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12perf/x86/intel: Clean up checkpoint-interrupt bitsPeter Zijlstra
Clean up the weird CP interrupt exception code by keeping a CP mask. Andi suggested this implementation but weirdly didn't actually implement it himself, do so now because it removes the conditional in the interrupt handler and avoids the assumption its only on cnt2. Suggested-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-dvb4q0rydkfp00kqat4p5bah@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12perf/x86/intel: Add Haswell TSX event aliasesAndi Kleen
Add TSX event aliases, and export them from the kernel to perf. These are used by perf stat -T and to allow more user friendly access to events. The events are designed to be fairly generic and may also apply to other architectures implementing HTM. They all cover common situations that happens during tuning of transactional code. For Haswell we have to separate the HLE and RTM events, as they are separate in the PMU. This adds the following events: tx-start Count start transaction (used by perf stat -T) tx-commit Count commit of transaction tx-abort Count all aborts tx-conflict Count aborts due to conflict with another CPU. tx-capacity Count capacity aborts (transaction too large) Then matching el-* events for HLE cycles-t Transactional cycles (used by perf stat -T) * also exists on POWER8 cycles-ct Transactional cycles commited (used by perf stat -T) * according to Michael Ellerman POWER8 has a cycles-transactional-committed, * perf stat -T handles both cases Note for useful abort profiling often precise has to be set, as Haswell can only report the point inside the transaction with precise=2. For some classes of aborts, like conflicts, this is not needed, as it makes more sense to look at the complete critical section. This gives a clean set of generalized events to examine transaction success and aborts. Haswell has additional events for TSX, but those are more specialized for very specific situations. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1378438661-24765-4-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12perf/x86/intel: Avoid checkpointed counters causing excessive TSX abortsAndi Kleen
With checkpointed counters there can be a situation where the counter is overflowing, aborts the transaction, is set back to a non overflowing checkpoint, causes interupt. The interrupt doesn't see the overflow because it has been checkpointed. This is then a spurious PMI, typically with a ugly NMI message. It can also lead to excessive aborts. Avoid this problem by: - Using the full counter width for counting counters (earlier patch) - Forbid sampling for checkpointed counters. It's not too useful anyways, checkpointing is mainly for counting. The check is approximate (to still handle KVM), but should catch the majority of cases. - On a PMI always set back checkpointed counters to zero. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1378438661-24765-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12perf/x86/intel: Fix Silvermont offcore masksPeter Zijlstra
Fengguang Wu reported: > sparse warnings: (new ones prefixed by >>) > > >> arch/x86/kernel/cpu/perf_event_intel.c:901:9: sparse: constant 0x768005ffff is so big it is long > >> arch/x86/kernel/cpu/perf_event_intel.c:902:9: sparse: constant 0x768005ffff is so big it is long > > vim +901 arch/x86/kernel/cpu/perf_event_intel.c > > 895 }, > 896 }; > 897 > 898 static struct extra_reg intel_slm_extra_regs[] __read_mostly = > 899 { > 900 /* must define OFFCORE_RSP_X first, see intel_fixup_er() */ > > 901 INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x768005ffff, RSP_0), > > 902 INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x768005ffff, RSP_1), > 903 EVENT_EXTRA_END > 904 }; > 905 Extend those constants to 64 bits. Reported-by: fengguang.wu@intel.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130909112636.GQ31370@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDINGStephane Eranian
The IvyBridge event CYCLE_ACTIVITY:CYCLES_LDM_PENDING can only be measured on counters 0-3 when HT is off. When HT is on, you only have counters 0-3. If you program it on the eight counters for 1s on a 3GHz IVB laptop running a noploop, you see: 2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING Clearly the last 4 values are bogus. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: ak@linux.intel.com Cc: zheng.z.yan@intel.com Cc: dhsharp@google.com Link: http://lkml.kernel.org/r/20130911152222.GA28761@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02perf/x86: Add Silvermont (22nm Atom) supportYan, Zheng
Compared to old atom, Silvermont has offcore and has more events that support PEBS. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374138144-17278-2-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_XYan, Zheng
Silvermont (22nm Atom) has two offcore response configuration MSRs, unlike other Intel CPU, its event code for MSR_OFFCORE_RSP_1 is 0x02b7. To avoid complicating intel_fixup_er(), use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X. So intel_fixup_er() can find the event code for OFFCORE_RSP_N by x86_pmu.extra_regs[N].event. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374138144-17278-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12perf/x86: Add Haswell ULT model number used in Macbook Air and other systemsAndi Kleen
This one was missed earlier. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1376007983-31616-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26perf/x86: Fix shared register mutual exclusion enforcementStephane Eranian
This patch fixes a problem with the shared registers mutual exclusion code and incremental event scheduling by the generic perf_event code. There was a bug whereby the mutual exclusion on the shared registers was not enforced because of incremental scheduling abort due to event constraints. As an example on Intel Nehalem, consider the following events: group1= L1D_CACHE_LD:E_STATE,OFFCORE_RESPONSE_0:PF_RFO,L1D_CACHE_LD:I_STATE group2= L1D_CACHE_LD:I_STATE The L1D_CACHE_LD event can only be measured by 2 counters. Yet, there are 3 instances here. The first group can be scheduled and is committed. Then, the generic code tries to schedule group2 and this fails (because there is no more counter to support the 3rd instance of L1D_CACHE_LD). But in x86_schedule_events() error path, put_event_contraints() is invoked on ALL the events and not just the ones that just failed. That causes the "lock" on the shared offcore_response MSR to be released. Yet the first group is actually scheduled and is exposed to reprogramming of that shared msr by the sibling HT thread. In other words, there is no guarantee on what is measured. This patch fixes the problem by tagging committed events with the PERF_X86_EVENT_COMMITTED tag. In the error path of x86_schedule_events(), only the events NOT tagged have their constraint released. The tag is eventually removed when the event in descheduled. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130620164254.GA3556@quad Signed-off-by: Ingo Molnar <mingo@kernel.org>