summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-09-02perf trace: Add beautifier for lseek's whence argArnaldo Carvalho de Melo
[root@zoo ~]# perf trace -a -e lseek | head -1 546.922 ( 0.004 ms): 1184 lseek(fd: 26, offset: 0, whence: CUR) = 2 [root@zoo ~]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-2eiuhwz9jbnhj80q6jaqeji4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02vfs: use lockref_get_not_zero() for optimistic lockless dget_parent()Waiman Long
A valid parent pointer is always going to have a non-zero reference count, but if we look up the parent optimistically without locking, we have to protect against the (very unlikely) race against renaming changing the parent from under us. We do that by using lockref_get_not_zero(), and then re-checking the parent pointer after getting a valid reference. [ This is a re-implementation of a chunk from the original patch by Waiman Long: "dcache: Enable lockless update of dentry's refcount". I've completely rewritten the patch-series and split it up, but I'm attributing this part to Waiman as it's close enough to his earlier patch - Linus ] Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-02lockref: add 'lockref_get_or_lock() helperLinus Torvalds
This behaves like "lockref_get_not_zero()", but instead of doing nothing if the count was zero, it returns with the lock held. This allows callers to revalidate the lockref-protected data structure if required even if the count was zero to begin with, and possibly increment the count if it passes muster. In particular, the dentry code wants this when it wants to turn an RCU-protected dentry into a stable refcounted one: if the dentry count it zero, but the sequence number still validates the dentry, we can take a reference to it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-02perf tools: Fix symbol offset computation for some dsosDavid Ahern
For some dsos (e.g., libc, libpthread, kernel modules) the symbol offset is huge. e.g., qemu-kvm 17238/17242 [007] 762235.640311: ffffffff816288a1 __schedule+0x451 ([kernel.kallsyms]) ffffffff81629609 schedule+0x29 ([kernel.kallsyms]) ffffffffa00a6ded kvm_vcpu_block+0xffffffffa00a106d (/lib/modules/3.11.0-rc1+/kernel/arch/x86/kvm/kvm.ko) ffffffffa00bae6b kvm_arch_vcpu_ioctl_run+0xffffffffa00a118b (/lib/modules/3.11.0-rc1+/kernel/arch/x86/kvm/kvm.ko) ffffffffa00a4d7a kvm_vcpu_ioctl+0xffffffffa00a141a (/lib/modules/3.11.0-rc1+/kernel/arch/x86/kvm/kvm.ko) ffffffff811a7bdb do_vfs_ioctl+0x8b ([kernel.kallsyms]) ffffffff811a80c1 sys_ioctl+0x91 ([kernel.kallsyms]) ffffffff81633182 system_call+0x72 ([kernel.kallsyms]) 7f882a97af27 __GI___ioctl+0x7f882a891007 (/lib64/libc-2.14.90.so) 100000002 [unknown] ([unknown]) It seems to be maps with a non-0 start. Taking that into account the offsets are correct: qemu-kvm 17238/17242 [007] 762235.640311: ffffffff816288a1 __schedule+0x451 ([kernel.kallsyms]) ffffffff81629609 schedule+0x29 ([kernel.kallsyms]) ffffffffa00a6ded kvm_vcpu_block+0x6d (/lib/modules/3.11.0-rc1+/kernel/arch/x86/kvm/kvm.ko) ffffffffa00bae6b kvm_arch_vcpu_ioctl_run+0x18b (/lib/modules/3.11.0-rc1+/kernel/arch/x86/kvm/kvm.ko) ffffffffa00a4d7a kvm_vcpu_ioctl+0x41a (/lib/modules/3.11.0-rc1+/kernel/arch/x86/kvm/kvm.ko) ffffffff811a7bdb do_vfs_ioctl+0x8b ([kernel.kallsyms]) ffffffff811a80c1 sys_ioctl+0x91 ([kernel.kallsyms]) ffffffff81633182 system_call+0x72 ([kernel.kallsyms]) 7f882a97af27 __GI___ioctl+0x7 (/lib64/libc-2.14.90.so) 100000002 [unknown] ([unknown]) Signed-off-by: David Ahern <dsahern@gmail.com> Link: http://lkml.kernel.org/r/1375026512-45826-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02perf list: Skip unsupported eventsNamhyung Kim
Some hardware events might not be supported on a system. Listing those events seems meaningless and confusing to users. Let's skip them. Before: $ perf list cache | wc -l 33 After: $ perf list cache | wc -l 27 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1377571313-14722-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02perf tests: Add 'keep tracking' testAdrian Hunter
Add a test for the newly added PERF_COUNT_SW_DUMMY event. The test checks that tracking events continue when an event is disabled but a dummy software event is not disabled. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1377975053-3811-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02perf tools: Add support for PERF_COUNT_SW_DUMMYAdrian Hunter
Add support for the new dummy software event PERF_COUNT_SW_DUMMY. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1377975053-3811-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02perf: Add a dummy software event to keep trackingAdrian Hunter
When an event is disabled the "tracking" events selected by the 'mmap', 'comm' and 'task' bits of struct perf_event_attr, are also disabled. However, the information those events provide is necessary to resolve symbols for when the main event is re-enabled. The "tracking" events can be kept enabled by putting them on another event, but that requires an event that otherwise does nothing. A new software event PERF_COUNT_SW_DUMMY is added for that purpose. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1377975053-3811-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02perf trace: Add beautifier for futex 'operation' parmArnaldo Carvalho de Melo
That uses the arg mask mechanism just introduced to suppress ignored arguments according to the futex operation. Based on an initial patch from David Ahern that showed the need for some way to allow args to tell how many further args should be shown. Initial-patch-by: David Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-0k30it46r4hv5eanefbdmj5t@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02perf trace: Allow syscall arg formatters to mask argsArnaldo Carvalho de Melo
The futex syscall ignores some arguments according to the 'operation' arg, so allow arg formatters to mask those. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-abqrg3oldgfsdnltfrvso9f7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-02Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "This is a bug fix for the pm80xx driver. It turns out that when the new hardware support was added in 3.10 the IO command size was kept at the old hard coded value. This means that the driver attaches to some new cards and then simply hangs the system" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: [SCSI] pm80xx: fix Adaptec 71605H hang
2013-09-02Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot fix from Peter Anvin: "A single very small boot fix for very large memory systems (> 0.5T)" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAM
2013-09-02Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds
Pull slave-dma fix from Vinod Koul: "A fix for resolving TI_EDMA driver's build error in allmodconfig to have filter function built in"" * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma: dma/Kconfig: TI_EDMA needs to be boolean
2013-09-02perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()Joe Perches
Use the convenience function instead of __GFP_ZERO. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/f58599ae1a8d7b32d37e9cf283e95fba6452f7f6.1377809875.git.joe@perches.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02perf: Export struct perf_branch_entry to userspaceVince Weaver
If PERF_SAMPLE_BRANCH_STACK is enabled then samples are returned with the format { u64 from, to, flags } but the flags layout is not specified. This field has the type struct perf_branch_entry; move this definition into include/uapi/linux/perf_event.h so users can access these fields. This is similar to the existing inclusion of perf_mem_data_src in the include/uapi/linux/perf_event.h file. Signed-off-by: Vince Weaver <vincent.weaver@maine.edu> Acked-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1308231544420.1889@vincent-weaver-1.um.maine.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02perf: Add attr->mmap2 attribute to an eventStephane Eranian
Adds a new PERF_RECORD_MMAP2 record type which is essence an expanded version of PERF_RECORD_MMAP. Used to request mmap records with more information about the mapping, including device major, minor and the inode number and generation for mappings associated with files or shared memory segments. Works for code and data (with attr->mmap_data set). Existing PERF_RECORD_MMAP record is unmodified by this patch. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/1377079825-19057-2-git-send-email-eranian@google.com [ Added Al to the Cc:. Are the ino, maj/min exports of vma->vm_file OK? ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02perf/x86: Add Silvermont (22nm Atom) supportYan, Zheng
Compared to old atom, Silvermont has offcore and has more events that support PEBS. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374138144-17278-2-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_XYan, Zheng
Silvermont (22nm Atom) has two offcore response configuration MSRs, unlike other Intel CPU, its event code for MSR_OFFCORE_RSP_1 is 0x02b7. To avoid complicating intel_fixup_er(), use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X. So intel_fixup_er() can find the event code for OFFCORE_RSP_N by x86_pmu.extra_regs[N].event. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374138144-17278-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02sched/fair: Fix the sd_parent_degenerate() codePeter Zijlstra
I found that on my WSM box I had a redundant domain: [ 0.949769] CPU0 attaching sched-domain: [ 0.953765] domain 0: span 0,12 level SIBLING [ 0.958335] groups: 0 (cpu_power = 587) 12 (cpu_power = 588) [ 0.964548] domain 1: span 0-5,12-17 level MC [ 0.969206] groups: 0,12 (cpu_power = 1175) 1,13 (cpu_power = 1176) 2,14 (cpu_power = 1176) 3,15 (cpu_power = 1176) 4,16 (cpu_power = 1176) 5,17 (cpu_power = 1176) [ 0.984993] domain 2: span 0-5,12-17 level CPU [ 0.989822] groups: 0-5,12-17 (cpu_power = 7055) [ 0.995049] domain 3: span 0-23 level NUMA [ 0.999620] groups: 0-5,12-17 (cpu_power = 7055) 6-11,18-23 (cpu_power = 7056) Note how domain 2 has only a single group and spans the same CPUs as domain 1. We should not keep such domains and do in fact have code to prune these. It turns out that the 'new' SD_PREFER_SIBLING flag causes this, it makes sd_parent_degenerate() fail on the CPU domain. We can easily fix this by 'ignoring' the SD_PREFER_SIBLING bit and transfering it to whatever domain ends up covering the span. With this patch the domains now look like this: [ 0.950419] CPU0 attaching sched-domain: [ 0.954454] domain 0: span 0,12 level SIBLING [ 0.959039] groups: 0 (cpu_power = 587) 12 (cpu_power = 588) [ 0.965271] domain 1: span 0-5,12-17 level MC [ 0.969936] groups: 0,12 (cpu_power = 1175) 1,13 (cpu_power = 1176) 2,14 (cpu_power = 1176) 3,15 (cpu_power = 1176) 4,16 (cpu_power = 1176) 5,17 (cpu_power = 1176) [ 0.985737] domain 2: span 0-23 level NUMA [ 0.990231] groups: 0-5,12-17 (cpu_power = 7055) 6-11,18-23 (cpu_power = 7056) Reviewed-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-ys201g4jwukj0h8xcamakxq1@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02sched/fair: Rework and comment the group_imb codePeter Zijlstra
Rik reported some weirdness due to the group_imb code. As a start to looking at it, clean it up a little and add a few explanatory comments. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-caeeqttnla4wrrmhp5uf89gp@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02sched/fair: Optimize find_busiest_queue()Peter Zijlstra
Use for_each_cpu_and() and thereby avoid computing the capacity for CPUs we know we're not interested in. Reviewed-by: Paul Turner <pjt@google.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-lppceyv6kb3a19g8spmrn20b@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02sched/fair: Make group power more consistentPeter Zijlstra
For easier access, less dereferences and more consistent value, store the group power in update_sg_lb_stats() and use it thereafter. The actual value in sched_group::sched_group_power::power can change throughout the load-balance pass if we're unlucky. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-739xxqkyvftrhnh9ncudutc7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02sched/fair: Remove duplicate load_per_task computationsPeter Zijlstra
Since we already compute (but don't store) the sgs load_per_task value in update_sg_lb_stats() we might as well store it and not re-compute it later on. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-ym1vmljiwbzgdnnrwp9azftq@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02sched/fair: Shrink sg_lb_stats and play memset gamesPeter Zijlstra
We can shrink sg_lb_stats because rq::nr_running is an unsigned int and cpu numbers are 'int' Before: sgs: /* size: 72, cachelines: 2, members: 10 */ sds: /* size: 184, cachelines: 3, members: 7 */ After: sgs: /* size: 56, cachelines: 1, members: 10 */ sds: /* size: 152, cachelines: 3, members: 7 */ Further we can avoid clearing all of sds since we do a total clear/assignment of sg_stats in update_sg_lb_stats() with exception of busiest_stat.avg_load which is referenced in update_sd_pick_busiest(). Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-0klzmz9okll8wc0nsudguc9p@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02sched: Clean-up struct sd_lb_statJoonsoo Kim
There is no reason to maintain separate variables for this_group and busiest_group in sd_lb_stat, except saving some space. But this structure is always allocated in stack, so this saving isn't really benificial [peterz: reducing stack space is good; in this case readability increases enough that I think its still beneficial] This patch unify these variables, so IMO, readability may be improved. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> [ Rename this to local -- avoids confusion between this_cpu and the C++ this pointer. ] Reviewed-by: Paul Turner <pjt@google.com> [ Lots of style edits, a few fixes and a rename. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1375778203-31343-4-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02sched: Factor out code to should_we_balance()Joonsoo Kim
Now checking whether this cpu is appropriate to balance or not is embedded into update_sg_lb_stats() and this checking has no direct relationship to this function. There is not enough reason to place this checking at update_sg_lb_stats(), except saving one iteration for sched_group_cpus. In this patch, I factor out this checking to should_we_balance() function. And before doing actual work for load_balancing, check whether this cpu is appropriate to balance via should_we_balance(). If this cpu is not a candidate for balancing, it quit the work immediately. With this change, we can save two memset cost and can expect better compiler optimization. Below is result of this patch. * Vanilla * text data bss dec hex filename 34499 1136 116 35751 8ba7 kernel/sched/fair.o * Patched * text data bss dec hex filename 34243 1136 116 35495 8aa7 kernel/sched/fair.o In addition, rename @balance to @continue_balancing in order to represent its purpose more clearly. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> [ s/should_balance/continue_balancing/g ] Reviewed-by: Paul Turner <pjt@google.com> [ Made style changes and a fix in should_we_balance(). ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1375778203-31343-3-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02sched: Remove one division operation in find_busiest_queue()Joonsoo Kim
Remove one division operation in find_busiest_queue() by using crosswise multiplication: wl_i / power_i > wl_j / power_j := wl_i * power_j > wl_j * power_i Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> [ Expanded the changelog. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1375778203-31343-2-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02perf: Prevent race in unthrottling codeJiri Olsa
The current throttling code triggers WARN below via following workload (only hit on AMD machine with 48 CPUs): # while [ 1 ]; do perf record perf bench sched messaging; done WARNING: at arch/x86/kernel/cpu/perf_event.c:1054 x86_pmu_start+0xc6/0x100() SNIP Call Trace: <IRQ> [<ffffffff815f62d6>] dump_stack+0x19/0x1b [<ffffffff8105f531>] warn_slowpath_common+0x61/0x80 [<ffffffff8105f60a>] warn_slowpath_null+0x1a/0x20 [<ffffffff810213a6>] x86_pmu_start+0xc6/0x100 [<ffffffff81129dd2>] perf_adjust_freq_unthr_context.part.75+0x182/0x1a0 [<ffffffff8112a058>] perf_event_task_tick+0xc8/0xf0 [<ffffffff81093221>] scheduler_tick+0xd1/0x140 [<ffffffff81070176>] update_process_times+0x66/0x80 [<ffffffff810b9565>] tick_sched_handle.isra.15+0x25/0x60 [<ffffffff810b95e1>] tick_sched_timer+0x41/0x60 [<ffffffff81087c24>] __run_hrtimer+0x74/0x1d0 [<ffffffff810b95a0>] ? tick_sched_handle.isra.15+0x60/0x60 [<ffffffff81088407>] hrtimer_interrupt+0xf7/0x240 [<ffffffff81606829>] smp_apic_timer_interrupt+0x69/0x9c [<ffffffff8160569d>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff81129f74>] ? __perf_event_task_sched_in+0x184/0x1a0 [<ffffffff814dd937>] ? kfree_skbmem+0x37/0x90 [<ffffffff815f2c47>] ? __slab_free+0x1ac/0x30f [<ffffffff8118143d>] ? kfree+0xfd/0x130 [<ffffffff81181622>] kmem_cache_free+0x1b2/0x1d0 [<ffffffff814dd937>] kfree_skbmem+0x37/0x90 [<ffffffff814e03c4>] consume_skb+0x34/0x80 [<ffffffff8158b057>] unix_stream_recvmsg+0x4e7/0x820 [<ffffffff814d5546>] sock_aio_read.part.7+0x116/0x130 [<ffffffff8112c10c>] ? __perf_sw_event+0x19c/0x1e0 [<ffffffff814d5581>] sock_aio_read+0x21/0x30 [<ffffffff8119a5d0>] do_sync_read+0x80/0xb0 [<ffffffff8119ac85>] vfs_read+0x145/0x170 [<ffffffff8119b699>] SyS_read+0x49/0xa0 [<ffffffff810df516>] ? __audit_syscall_exit+0x1f6/0x2a0 [<ffffffff81604a19>] system_call_fastpath+0x16/0x1b ---[ end trace 622b7e226c4a766a ]--- The reason is a race in perf_event_task_tick() throttling code. The race flow (simplified code): - perf_throttled_count is per cpu variable and is CPU throttling flag, here starting with 0 - perf_throttled_seq is sequence/domain for allowed count of interrupts within the tick, gets increased each tick on single CPU (CPU bounded event): ... workload perf_event_task_tick: | | T0 inc(perf_throttled_seq) | T1 needs_unthr = xchg(perf_throttled_count, 0) == 0 tick gets interrupted: ... event gets throttled under new seq ... T2 last NMI comes, event is throttled - inc(perf_throttled_count) back to tick: | perf_adjust_freq_unthr_context: | | T3 unthrottling is skiped for event (needs_unthr == 0) | T4 event is stop and started via freq adjustment | tick ends ... workload ... no sample is hit for event ... perf_event_task_tick: | | T5 needs_unthr = xchg(perf_throttled_count, 0) != 0 (from T2) | T6 unthrottling is done on event (interrupts == MAX_INTERRUPTS) | event is already started (from T4) -> WARN Fixing this by not checking needs_unthr again and thus check all events for unthrottling. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Reported-by: Jan Stancek <jstancek@redhat.com> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1377355554-8934-1-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-01Introduce [compat_]save_altstack_ex() to unbreak x86 SMAPAl Viro
For performance reasons, when SMAP is in use, SMAP is left open for an entire put_user_try { ... } put_user_catch(); block, however, calling __put_user() in the middle of that block will close SMAP as the STAC..CLAC constructs intentionally do not nest. Furthermore, using __put_user() rather than put_user_ex() here is bad for performance. Thus, introduce new [compat_]save_altstack_ex() helpers that replace __[compat_]save_altstack() for x86, being currently the only architecture which supports put_user_try { ... } put_user_catch(). Reported-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.8+ Link: http://lkml.kernel.org/n/tip-es5p6y64if71k8p5u08agv9n@git.kernel.org
2013-09-01x86, smap: Handle csum_partial_copy_*_user()H. Peter Anvin
Add SMAP annotations to csum_partial_copy_to/from_user(). These functions legitimately access user space and thus need to set the AC flag. TODO: add explicit checks that the side with the kernel space pointer really points into kernel space. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-2aps0u00eer658fd5xyanan7@git.kernel.org Cc: <stable@vger.kernel.org> # v3.7+
2013-09-01Merge remote-tracking branch 'regulator/topic/tps65912' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/ti-abb' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/sec' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/ramp' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/pfuze100' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/palmas' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/optional' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/max8660' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/lp8755' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/lp872x' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/linear-range' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/kconfig' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/helpers' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/fan53555' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/da9063' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/core' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/as3711' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'regulator/topic/88pm800' into regulator-nextMark Brown
2013-09-01Merge remote-tracking branch 'spi/topic/txx9' into spi-nextMark Brown
2013-09-01Merge remote-tracking branch 'spi/topic/topcliff' into spi-nextMark Brown