summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2016-06-15PM / sleep: Make pm_prepare_console() return voidBorislav Petkov
Nothing is using its return value so change it to return void. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-14rcutorture: Fix error return code in rcu_perf_init()Wei Yongjun
Fix to return a negative error code -ENOMEM from kcalloc() error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14rcuperf: Don't treat gp_exp mis-setting as a WARNBoqun Feng
0day found a boot warning triggered in rcu_perf_writer() on !SMP kernel: WARN_ON(rcu_gp_is_normal() && gp_exp); , the root cause of which is trying to measure expedited grace periods(by setting gp_exp to true by default) when all the grace periods are normal(TINY RCU only has normal grace periods). However, such a mis-setting would only result in failing to measure the performance for a specific kind of grace periods, therefore using a WARN_ON to check this is a little overkilling. We could handle this inside rcuperf module via some error messages to tell users about the mis-settings. Therefore this patch removes the WARN_ON in rcu_perf_writer() and handles those checkings in rcu_perf_init() with plain if() code. Moreover, this patch changes the default value of gp_exp to 1) align with rcutorture tests and 2) make the default setting work for all RCU implementations by default. Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Fixes: http://lkml.kernel.org/r/57411b10.mFvG0+AgcrMXGtcj%fengguang.wu@intel.com Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14torture: Stop onoff task if there is only one cpuBoqun Feng
If the whole system has only one cpu, that cpu won't be able to be offlined, so there is no need onoff task is stil running. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14torture: Break online and offline functions out of torture_onoff()Paul E. McKenney
This commit breaks torture_online() and torture_offline() out of torture_onoff() in preparation for allowing waketorture finer-grained control of its CPU-hotplug workload. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify codePaul E. McKenney
This commit removes CONFIG_RCU_TORTURE_TEST_RUNNABLE in favor of the already-existing rcutorture.torture_runnable kernel boot parameter. It also converts an #ifdef into IS_ENABLED(), saving a few lines of code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14torture: Simplify code, eliminate RCU_PERF_TEST_RUNNABLEPaul E. McKenney
This commit applies the infamous IS_ENABLED() macro to eliminate a #ifdef. It also eliminates the RCU_PERF_TEST_RUNNABLE Kconfig option in favor of the already-existing rcuperf.perf_runnable kernel boot parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14rcu: Move expedited code from tree_plugin.h to tree_exp.hPaul E. McKenney
People have been having some difficulty finding their way around the RCU code. This commit therefore pulls some of the expedited grace-period code from tree_plugin.h to a new tree_exp.h file. This commit is strictly code movement. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14rcu: Move expedited code from tree.c to tree_exp.hPaul E. McKenney
People have been having some difficulty finding their way around the RCU code. This commit therefore pulls some of the expedited grace-period code from tree.c to a new tree_exp.h file. This commit is strictly code movement, with the exception of a forward declaration that was added for the sync_sched_exp_online_cleanup() function. A subsequent commit will move the remaining expedited grace-period code from tree_plugin.h to tree_exp.h. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14rcu: Remove some superfluous linesPeter Zijlstra
I think you'll find this condition is superfluous, as the whole function is under #ifdef of that same. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14rcu: Fix outdated hotplug-exclusion comment in rcu_gp_init()Paul E. McKenney
In the past, RCU grace-period initialization excluded CPU-hotplug operations, but this is no longer the case. This commit therefore removed an outdated comment in rcu_gp_init() claiming that these are excluded. Reported-by: Lihao Liang <lihao.liang@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14rcu: Fix outdated rcu_scheduler_active commentPaul E. McKenney
The comment header for rcu_scheduler_active states that it is used to optimize synchronize_sched() at early boot. This is incorrect. The synchronize_sched() function instead checks the number of online CPUs. This commit therefore replaces the comment's synchronize_sched() with synchronize_rcu(), which really does use rcu_scheduler_active for this purpose. Reported-by: Lihao Liang <lihao.liang@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14seccomp: recheck the syscall after RET_TRACEKees Cook
When RET_TRACE triggers, a tracer may change a syscall into something that should be filtered by seccomp. This re-runs seccomp after a trace event to make sure things continue to pass. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org>
2016-06-14seccomp: remove 2-phase APIKees Cook
Since nothing is using the 2-phase API, and it adds more complexity than benefit, remove it. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org>
2016-06-14seccomp: Add a seccomp_data parameter secure_computing()Andy Lutomirski
Currently, if arch code wants to supply seccomp_data directly to seccomp (which is generally much faster than having seccomp do it using the syscall_get_xyz() API), it has to use the two-phase seccomp hooks. Add it to the easy hooks, too. Cc: linux-arch@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2016-06-14kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while ↵Andrey Ryabinin
processing sysrq-w Lengthy output of sysrq-w may take a lot of time on slow serial console. Currently we reset NMI-watchdog on the current CPU to avoid spurious lockup messages. Sometimes this doesn't work since softlockup watchdog might trigger on another CPU which is waiting for an IPI to proceed. We reset softlockup watchdogs on all CPUs, but we do this only after listing all tasks, and this may be too late on a busy system. So, reset watchdogs CPUs earlier, in for_each_process_thread() loop. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1465474805-14641-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14sched/debug: Fix deadlock when enabling sched eventsJosh Poimboeuf
I see a hang when enabling sched events: echo 1 > /sys/kernel/debug/tracing/events/sched/enable The printk buffer shows: BUG: spinlock recursion on CPU#1, swapper/1/0 lock: 0xffff88007d5d8c00, .magic: dead4ead, .owner: swapper/1/0, .owner_cpu: 1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc2+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 ... Call Trace: <IRQ> [<ffffffff8143d663>] dump_stack+0x85/0xc2 [<ffffffff81115948>] spin_dump+0x78/0xc0 [<ffffffff81115aea>] do_raw_spin_lock+0x11a/0x150 [<ffffffff81891471>] _raw_spin_lock+0x61/0x80 [<ffffffff810e5466>] ? try_to_wake_up+0x256/0x4e0 [<ffffffff810e5466>] try_to_wake_up+0x256/0x4e0 [<ffffffff81891a0a>] ? _raw_spin_unlock_irqrestore+0x4a/0x80 [<ffffffff810e5705>] wake_up_process+0x15/0x20 [<ffffffff810cebb4>] insert_work+0x84/0xc0 [<ffffffff810ced7f>] __queue_work+0x18f/0x660 [<ffffffff810cf9a6>] queue_work_on+0x46/0x90 [<ffffffffa00cd95b>] drm_fb_helper_dirty.isra.11+0xcb/0xe0 [drm_kms_helper] [<ffffffffa00cdac0>] drm_fb_helper_sys_imageblit+0x30/0x40 [drm_kms_helper] [<ffffffff814babcd>] soft_cursor+0x1ad/0x230 [<ffffffff814ba379>] bit_cursor+0x649/0x680 [<ffffffff814b9d30>] ? update_attr.isra.2+0x90/0x90 [<ffffffff814b5e6a>] fbcon_cursor+0x14a/0x1c0 [<ffffffff81555ef8>] hide_cursor+0x28/0x90 [<ffffffff81558b6f>] vt_console_print+0x3bf/0x3f0 [<ffffffff81122c63>] call_console_drivers.constprop.24+0x183/0x200 [<ffffffff811241f4>] console_unlock+0x3d4/0x610 [<ffffffff811247f5>] vprintk_emit+0x3c5/0x610 [<ffffffff81124bc9>] vprintk_default+0x29/0x40 [<ffffffff811e965b>] printk+0x57/0x73 [<ffffffff810f7a9e>] enqueue_entity+0xc2e/0xc70 [<ffffffff810f7b39>] enqueue_task_fair+0x59/0xab0 [<ffffffff8106dcd9>] ? kvm_sched_clock_read+0x9/0x20 [<ffffffff8103fb39>] ? sched_clock+0x9/0x10 [<ffffffff810e3fcc>] activate_task+0x5c/0xa0 [<ffffffff810e4514>] ttwu_do_activate+0x54/0xb0 [<ffffffff810e5cea>] sched_ttwu_pending+0x7a/0xb0 [<ffffffff810e5e51>] scheduler_ipi+0x61/0x170 [<ffffffff81059e7f>] smp_trace_reschedule_interrupt+0x4f/0x2a0 [<ffffffff81893ba6>] trace_reschedule_interrupt+0x96/0xa0 <EOI> [<ffffffff8106e0d6>] ? native_safe_halt+0x6/0x10 [<ffffffff8110fb1d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff81040ac0>] default_idle+0x20/0x1a0 [<ffffffff8104147f>] arch_cpu_idle+0xf/0x20 [<ffffffff81102f8f>] default_idle_call+0x2f/0x50 [<ffffffff8110332e>] cpu_startup_entry+0x37e/0x450 [<ffffffff8105af70>] start_secondary+0x160/0x1a0 Note the hang only occurs when echoing the above from a physical serial console, not from an ssh session. The bug is caused by a deadlock where the task is trying to grab the rq lock twice because printk()'s aren't safe in sched code. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default") Link: http://lkml.kernel.org/r/20160613073209.gdvdybiruljbkn3p@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14locking/spinlock: Update spin_unlock_wait() usersPeter Zijlstra
With the modified semantics of spin_unlock_wait() a number of explicit barriers can be removed. Also update the comment for the do_exit() usecase, as that was somewhat stale/obscure. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14locking/barriers: Introduce smp_acquire__after_ctrl_dep()Peter Zijlstra
Introduce smp_acquire__after_ctrl_dep(), this construct is not uncommon, but the lack of this barrier is. Use it to better express smp_rmb() uses in WRITE_ONCE(), the IPC semaphore code and the qspinlock code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14locking/barriers: Replace smp_cond_acquire() with smp_cond_load_acquire()Peter Zijlstra
This new form allows using hardware assisted waiting. Some hardware (ARM64 and x86) allow monitoring an address for changes, so by providing a pointer we can use this to replace the cpu_relax() with hardware optimized methods in the future. Requested-by: Will Deacon <will.deacon@arm.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14Merge branch 'linus' into locking/core, to pick up fixes before merging new ↵Ingo Molnar
changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86Andi Kleen
The NMI watchdog uses either the fixed cycles or a generic cycles counter. This causes a lot of conflicts with users of the PMU who want to run a full group including the cycles fixed counter, for example the --topdown support recently added to perf stat. The code needs to fall back to not use groups, which can cause measurement inaccuracy due to multiplexing errors. This patch switches the NMI watchdog to use reference cycles on Intel systems. This is actually more accurate than cycles, because cycles can tick faster than the measured CPU Frequency due to Turbo mode. The ref cycles always tick at their frequency, or slower when the system is idling. That means the NMI watchdog can never expire too early, unlike with cycles. The reference cycles tick roughly at the frequency of the TSC, so the same period computation can be used. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: http://lkml.kernel.org/r/1465478079-19993-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14Merge branch 'linus' into perf/core, to pick up fixes before merging new changesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14sched/cputime: Add steal time support to full dynticks CPU time accountingWanpeng Li
This patch adds guest steal-time support to full dynticks CPU time accounting. After the following commit: ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity") ... time sampling became jiffy based, even if we do the sampling from the context tracking code, so steal_account_process_tick() can be reused to account how many 'ticks' are stolen-time, after the last accumulation. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1465813966-3116-4-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14sched/cputime: Fix prev steal time accouting during CPU hotplugWanpeng Li
Commit: e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug") ... set rq->prev_* to 0 after a CPU hotplug comes back, in order to fix the case where (after CPU hotplug) steal time is smaller than rq->prev_steal_time. However, this should never happen. Steal time was only smaller because of the KVM-specific bug fixed by the previous patch. Worse, the previous patch triggers a bug on CPU hot-unplug/plug operation: because rq->prev_steal_time is cleared, all of the CPU's past steal time will be accounted again on hot-plug. Since the root cause has been fixed, we can just revert commit e9532e69b8d1. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")' Link: http://lkml.kernel.org/r/1465813966-3116-3-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14Merge branch 'sched/urgent' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14sched/fair: Fix post_init_entity_util_avg() serializationPeter Zijlstra
Chris Wilson reported a divide by 0 at: post_init_entity_util_avg(): > 725 if (cfs_rq->avg.util_avg != 0) { > 726 sa->util_avg = cfs_rq->avg.util_avg * se->load.weight; > -> 727 sa->util_avg /= (cfs_rq->avg.load_avg + 1); > 728 > 729 if (sa->util_avg > cap) > 730 sa->util_avg = cap; > 731 } else { Which given the lack of serialization, and the code generated from update_cfs_rq_load_avg() is entirely possible: if (atomic_long_read(&cfs_rq->removed_load_avg)) { s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0); sa->load_avg = max_t(long, sa->load_avg - r, 0); sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0); removed_load = 1; } turns into: ffffffff81087064: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax ffffffff8108706b: 48 85 c0 test %rax,%rax ffffffff8108706e: 74 40 je ffffffff810870b0 ffffffff81087070: 4c 89 f8 mov %r15,%rax ffffffff81087073: 49 87 85 98 00 00 00 xchg %rax,0x98(%r13) ffffffff8108707a: 49 29 45 70 sub %rax,0x70(%r13) ffffffff8108707e: 4c 89 f9 mov %r15,%rcx ffffffff81087081: bb 01 00 00 00 mov $0x1,%ebx ffffffff81087086: 49 83 7d 70 00 cmpq $0x0,0x70(%r13) ffffffff8108708b: 49 0f 49 4d 70 cmovns 0x70(%r13),%rcx Which you'll note ends up with 'sa->load_avg - r' in memory at ffffffff8108707a. By calling post_init_entity_util_avg() under rq->lock we're sure to be fully serialized against PELT updates and cannot observe intermediate state like this. Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yuyang Du <yuyang.du@intel.com> Cc: bsegall@google.com Cc: morten.rasmussen@arm.com Cc: pjt@google.com Cc: steve.muckle@linaro.org Fixes: 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a bounded value") Link: http://lkml.kernel.org/r/20160609130750.GQ30909@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14PM / Hibernate: Don't let kasan instrument snapshot.cJames Morse
Kasan causes the compiler to instrument C code and is used at runtime to detect accesses to memory that has been freed, or not yet allocated. The code in snapshot.c saves and restores memory when hibernating. This will access whole pages in the slab cache that have both free and allocated areas, resulting in a large number of false positives from Kasan. Disable instrumentation of this file. Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-13Merge back earlier cpufreq changes for v4.8.Rafael J. Wysocki
2016-06-13Merge tag 'irqchip-for-4.8' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core First drop of irqchip updates for 4.8 from Marc Zyngier: - Fix a few bugs in configuring the default trigger from the irqdomain layer - Make the genirq layer PM aware - Add PM capability to the ARM GIC driver - Add support for 2-level translation tables to the GICv3 ITS driver
2016-06-13genirq: Add runtime power management support for IRQ chipsJon Hunter
Some IRQ chips may be located in a power domain outside of the CPU subsystem and hence will require device specific runtime power management. In order to support such IRQ chips, add a pointer for a device structure to the irq_chip structure, and if this pointer is populated by the IRQ chip driver and CONFIG_PM is selected in the kernel configuration, then the pm_runtime_get/put APIs for this chip will be called when an IRQ is requested/freed, respectively. Reviewed-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-06-13irqdomain: Don't set type when mapping an IRQJon Hunter
Some IRQ chips, such as GPIO controllers or secondary level interrupt controllers, may require require additional runtime power management control to ensure they are accessible. For such IRQ chips, it makes sense to enable the IRQ chip when interrupts are requested and disabled them again once all interrupts have been freed. When mapping an IRQ, the IRQ type settings are read and then programmed. The mapping of the IRQ happens before the IRQ is requested and so the programming of the type settings occurs before the IRQ is requested. This is a problem for IRQ chips that require additional power management control because they may not be accessible yet. Therefore, when mapping the IRQ, don't program the type settings, just save them and then program these saved settings when the IRQ is requested (so long as if they are not overridden via the call to request the IRQ). Add a stub function for irq_domain_free_irqs() to avoid any compilation errors when CONFIG_IRQ_DOMAIN_HIERARCHY is not selected. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-06-13genirq: Look-up percpu trigger type if not specified by callerMarc Zyngier
As we now do for non-percpu interrupt, perform a lookup of the interrupt trigger if the user doesn't supply one. The difference here is that we can only do it at enable time (trigger configuration can be per-cpu as well). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-06-13genirq: Look-up trigger type if not specified by callerJon Hunter
For some devices the IRQ trigger type for a device is read from firmware, such as device-tree. The IRQ trigger type is typically read when the mapping for IRQ is created, which is before the IRQ is requested. Hence, the IRQ trigger type is programmed when mapping the IRQ and not when requesting the IRQ. Although this works for most cases, in order to support IRQ chips which require runtime power management, which may not be accessible prior to requesting the IRQ, it is desirable to look-up the IRQ trigger type when it is requested. Therefore, if the IRQ trigger type is not specified when __setup_irq() is called, look-up the saved IRQ trigger type. This will allow us to defer the programming of the trigger type from when the IRQ is mapped to when it is actually requested. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-06-13irqdomain: Fix handling of type settings for existing mappingsJon Hunter
When mapping an IRQ, it is possible that a mapping for the IRQ already exists. If mapping does exist then there are the following issues with regard to the handling of the IRQ type settings ... 1. If the domain is part of a hierarchy, then: a. We do not check that the type settings for the existing mapping match those of the new mapping. b. We do not check to see if the type settings have been programmed yet (and they might not have been) and so we may never set the type. 2. If the domain is NOT part of a hierarchy, we will overwrite the current type settings programmed if they are different from the previous mapping. Please note that irq_create_mapping() calls irq_find_mapping() to check if a mapping already exists. Although, it may be unlikely that the type settings for a shared interrupt would not match, nonetheless we should check for this. Therefore, to fix this check if a mapping exists (regardless of whether the domain is part of a hierarchy or not) and if it does then: 1. Return the IRQ number if the type settings match or are not specified. 2. Program the type settings and return the IRQ number if the type settings have not been programmed yet. 3. Otherwise if the type setting do not match, then print a warning and don't return the IRQ number. Furthermore, add a warning if the type return by irq_domain_translate() has bits outside the sense mask set and then clear these bits. If these bits are not cleared then this will cause the comparision of the type settings for an existing mapping to fail with that of the new mapping even if the sense bit themselves match. The reason being is that the existing type settings are read by calling irq_get_trigger_type() which will clear any bits outside the sense mask. This will allow us to detect irqchips that are not correctly clearing these bits and fix them. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-06-10Merge branch 'stacking-fixes' (vfs stacking fixes from Jann)Linus Torvalds
Merge filesystem stacking fixes from Jann Horn. * emailed patches from Jann Horn <jannh@google.com>: sched: panic on corrupted stack end ecryptfs: forbid opening files without mmap handler proc: prevent stacking filesystems on top
2016-06-10sched: panic on corrupted stack endJann Horn
Until now, hitting this BUG_ON caused a recursive oops (because oops handling involves do_exit(), which calls into the scheduler, which in turn raises an oops), which caused stuff below the stack to be overwritten until a panic happened (e.g. via an oops in interrupt context, caused by the overwritten CPU index in the thread_info). Just panic directly. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-10Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two scheduler debugging fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/debug: Fix 'schedstats=enable' cmdline option sched/debug: Fix /proc/sched_debug regression
2016-06-10Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A handful of tooling fixes, two PMU driver fixes and a cleanup of redundant code that addresses a security analyzer false positive" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Remove a redundant check perf/x86/intel/uncore: Remove SBOX support for Broadwell server perf ctf: Convert invalid chars in a string before set value perf record: Fix crash when kptr is restricted perf symbols: Check kptr_restrict for root perf/x86/intel/rapl: Fix pmus free during cleanup
2016-06-10Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Misc fixes: - a file-based futex fix - one more spin_unlock_wait() fix - a ww-mutex deadlock detection improvement/fix - and a raw_read_seqcount_latch() barrier fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Calculate the futex key based on a tail page for file-based futexes locking/qspinlock: Fix spin_unlock_wait() some more locking/ww_mutex: Report recursive ww_mutex locking early locking/seqcount: Re-fix raw_read_seqcount_latch()
2016-06-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) nfnetlink timestamp taken from wrong skb, fix from Florian Westphal. 2) Revert some msleep conversions in rtlwifi as these spots are in atomic context, from Larry Finger. 3) Validate that NFTA_SET_TABLE attribute is actually specified when we call nf_tables_getset(). From Phil Turnbull. 4) Don't do mdio_reset in stmmac driver with spinlock held as that can sleep, from Vincent Palatin. 5) sk_filter() does things other than run a BPF filter, so we should not elide it's call just because sk->sk_filter is NULL. Fix from Eric Dumazet. 6) Fix missing backlog updates in several packet schedulers, from Cong Wang. 7) bnx2x driver should allow VLAN add/remove while the interface is down, from Michal Schmidt. 8) Several RDS/TCP race fixes from Sowmini Varadhan. 9) fq_codel scheduler doesn't return correct queue length in dumps, from Eric Dumazet. 10) Fix TCP stats for tail loss probe and early retransmit in ipv6, from Yuchung Cheng. 11) Properly initialize udp_tunnel_socket_cfg in l2tp_tunnel_create(), from Guillaume Nault. 12) qfq scheduler leaks SKBs if a kzalloc fails, fix from Florian Westphal. 13) sock_fprog passed into PACKET_FANOUT_DATA needs compat handling, from Willem de Bruijn. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits) vmxnet3: segCnt can be 1 for LRO packets packet: compat support for sock_fprog stmmac: fix parameter to dwmac4_set_umac_addr() net/mlx5e: Fix blue flame quota logic net/mlx5e: Use ndo_stop explicitly at shutdown flow net/mlx5: E-Switch, always set mc_promisc for allmulti vports net/mlx5: E-Switch, Modify node guid on vf set MAC net/mlx5: E-Switch, Fix vport enable flow net/mlx5: E-Switch, Use the correct error check on returned pointers net/mlx5: E-Switch, Use the correct free() function net/mlx5: Fix E-Switch flow steering capabilities check net/mlx5: Fix flow steering NIC capabilities check net/mlx5: Fix root flow table update net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly net/mlx5: Fix masking of reserved bits in XRCD number net/mlx5: Fix the size of modify QP mailbox mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name() mlxsw: spectrum: Make split flow match firmware requirements wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel cfg80211: remove get/set antenna and tx power warnings ...
2016-06-10Merge tag 'pm-4.7-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Stable-candidate fixes for the intel_pstate driver and the cpuidle core. Specifics: - Fix two intel_pstate initialization issues, one of which was introduced during the 4.4 cycle (Srinivas Pandruvada) - Fix kernel build with CONFIG_UBSAN set and CONFIG_CPU_IDLE unset (Catalin Marinas)" * tag 'pm-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy() cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE
2016-06-10genirq: Remove unnecessary memset() callsWeongyo Jeong
sprintf() and snprintf() implementation of kernel guarantees that its result is terminated with null byte if size is larger than 0. So we don't need to call memset() at all. Signed-off-by: Weongyo Jeong <weongyo.linux@gmail.com> Link: http://lkml.kernel.org/r/1459451703-5744-1-git-send-email-weongyo.linux@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-06-10genirq: Remove redundant NULL check of irq_descJianyu Zhan
for_each_irq_desc() macro has already skipped NULL irq_desc, don't bother to check it again. Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Cc: mingo@kernel.org Cc: yhlu.kernel@gmail.com Link: http://lkml.kernel.org/r/1458395959-7046-1-git-send-email-nasa4836@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-06-10hrtimer: Remove redundant #ifdef blockPratyush Patel
Only need CONFIG_NO_HZ_COMMON as this block is already in a CONFIG_SMP block. Signed-off-by: Pratyush Patel <pratyushpatel.1995@gmail.com> Link: http://lkml.kernel.org/r/20160301172849.GA18152@cyborg Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-06-10timers: Clarify usleep_range() function commentBjorn Helgaas
Update the usleep_range() function comment to make it clear that it can only be used in non-atomic context. Previously we claimed usleep_range() was a drop-in replacement for udelay() where wakeup is flexible. But that's only true in non-atomic contexts, where it's possible to sleep instead of delay. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20160531212302.28502.44995.stgit@bhelgaas-glaptop2.roam.corp.google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-06-09Merge branches 'pm-cpufreq-fixes' and 'pm-cpuidle'Rafael J. Wysocki
* pm-cpufreq-fixes: cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy() * pm-cpuidle: cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE
2016-06-09kernel/relay.c: fix potential memory leakZhouyi Zhou
When relay_open_buf() fails in relay_open(), code will goto free_bufs, but chan is nowhere freed. Link: http://lkml.kernel.org/r/1464777927-19675-1-git-send-email-yizhouzhou@ict.ac.cn Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-09block: add a separate operation type for secure eraseChristoph Hellwig
Instead of overloading the discard support with the REQ_SECURE flag. Use the opportunity to rename the queue flag as well, and remove the dead checks for this flag in the RAID 1 and RAID 10 drivers that don't claim support for secure erase. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-08futex: Calculate the futex key based on a tail page for file-based futexesMel Gorman
Mike Galbraith reported that the LTP test case futex_wake04 was broken by commit 65d8fc777f6d ("futex: Remove requirement for lock_page() in get_futex_key()"). This test case uses futexes backed by hugetlbfs pages and so there is an associated inode with a futex stored on such pages. The problem is that the key is being calculated based on the head page index of the hugetlbfs page and not the tail page. Prior to the optimisation, the page lock was used to stabilise mappings and pin the inode is file-backed which is overkill. If the page was a compound page, the head page was automatically looked up as part of the page lock operation but the tail page index was used to calculate the futex key. After the optimisation, the compound head is looked up early and the page lock is only relied upon to identify truncated pages, special pages or a shmem page moving to swapcache. The head page is looked up because without the page lock, special care has to be taken to pin the inode correctly. However, the tail page is still required to calculate the futex key so this patch records the tail page. On vanilla 4.6, the output of the test case is; futex_wake04 0 TINFO : Hugepagesize 2097152 futex_wake04 1 TFAIL : futex_wake04.c:126: Bug: wait_thread2 did not wake after 30 secs. With the patch applied futex_wake04 0 TINFO : Hugepagesize 2097152 futex_wake04 1 TPASS : Hi hydra, thread2 awake! Fixes: 65d8fc777f6d "futex: Remove requirement for lock_page() in get_futex_key()" Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20160608132522.GM2469@suse.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>