summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2015-10-12Merge back earlier 'pm-sleep' material for v4.4.Rafael J. Wysocki
2015-10-12workqueue: Allocate the unbound pool using local node memoryXunlei Pang
Currently, get_unbound_pool() uses kzalloc() to allocate the worker pool. Actually, we can use the right node to do the allocation, achieving local memory access. This patch selects target node first, and uses kzalloc_node() instead. Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2015-10-12Merge tag 'v4.3-rc5' into timers/core, to pick up fixes before applying new ↵Ingo Molnar
changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-12sched, tracing: Stop/start critical timings around the idle=poll idle loopDaniel Bristot de Oliveira
When using idle=poll, the preemptoff tracer is always showing the idle task as the culprit for long latencies. That happens because critical timings are not stopped before idle loop. This patch stops critical timings before entering the idle loop, starting it again after the idle loop. This problem does not affect the irqsoff tracer because interruptions are enabled before entering the idle loop. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Reviewed-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/10fc3705874aef11dbe152a068b591a7be1899b4.1444314899.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-11timers: Use __fls in apply_slack()Rasmus Villemoes
In apply_slack(), find_last_bit() is applied to a bitmask consisting of precisely BITS_PER_LONG bits. Since mask is non-zero, we might as well eliminate the function call and use __fls() directly. On x86_64, this shaves 23 bytes of the only caller, mod_timer(). This also gets rid of Coverity CID 1192106, but that is a false positive: Coverity is not aware that mask != 0 implies that find_last_bit will not return BITS_PER_LONG. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1443771931-6284-1-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-11clocksource: Remove return statement from void functionsGuillaume Gomez
Signed-off-by: Guillaume Gomez <guillaume1.gomez@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/CAAOQCfSDgmqSWDBsetau%2ByF8x0%2BDagCF_pfFw0p5xH_BKkKEog@mail.gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-11Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "Fix a long standing state race in finish_task_switch()" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix TASK_DEAD race in finish_task_switch()
2015-10-11bpf: fix cb access in socket filter programsAlexei Starovoitov
eBPF socket filter programs may see junk in 'u32 cb[5]' area, since it could have been used by protocol layers earlier. For socket filter programs used in af_packet we need to clean 20 bytes of skb->cb area if it could be used by the program. For programs attached to TCP/UDP sockets we need to save/restore these 20 bytes, since it's used by protocol layers. Remove SK_RUN_FILTER macro, since it's no longer used. Long term we may move this bpf cb area to per-cpu scratch, but that requires addition of new 'per-cpu load/store' instructions, so not suitable as a short term fix. Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11genirq: Add flag to force mask in disable_irq[_nosync]()Thomas Gleixner
If an irq chip does not implement the irq_disable callback, then we use a lazy approach for disabling the interrupt. That means that the interrupt is marked disabled, but the interrupt line is not immediately masked in the interrupt chip. It only becomes masked if the interrupt is raised while it's marked disabled. We use this to avoid possibly expensive mask/unmask operations for common case operations. Unfortunately there are devices which do not allow the interrupt to be disabled easily at the device level. They are forced to use disable_irq_nosync(). This can result in taking each interrupt twice. Instead of enforcing the non lazy mode on all interrupts of a irq chip, provide a settings flag, which can be set by the driver for that particular interrupt line. Reported-and-tested-by: Duc Dang <dhdang@apm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Jason Cooper <jason@lakedaemon.net> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1510092348370.6097@nanos
2015-10-09pmem, memremap: convert to numa aware allocationsDan Williams
Given that pmem ranges come with numa-locality hints, arrange for the resulting driver objects to be obtained from node-local memory. Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-10-09devm_memremap_pages: use numa_mem_idDan Williams
Hint to closest numa node for the placement of newly allocated pages. As that is where the device's other allocations will originate by default when it does not specify a NUMA node. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-10-09devm_memremap: convert to return ERR_PTRDan Williams
Make devm_memremap consistent with the error return scheme of devm_memremap_pages to remove special casing in the pmem driver. Cc: Christoph Hellwig <hch@lst.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-10-09devm_memunmap: use devres_release()Dan Williams
Remove open coded call to memunmap. Cc: Christoph Hellwig <hch@lst.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-10-09genirq: Make irq_set_vcpu_affinity available for CONFIG_SMP=nFeng Wu
irq_set_vcpu_affinity() is needed when CONFIG_SMP=n, so move the definition out of "#ifdef CONFIG_SMP" Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Feng Wu <feng.wu@intel.com> Cc: jiang.liu@linux.intel.com Cc: pbonzini@redhat.com Link: http://lkml.kernel.org/r/1443860438-144926-1-git-send-email-feng.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-09genirq: Allow migration of chained interrupts by installing default actionMika Westerberg
When a CPU is offlined all interrupts that have an action are migrated to other still online CPUs. However, if the interrupt has chained handler installed this is not done. Chained handlers are used by GPIO drivers which support interrupts, for instance. When the affinity is not corrected properly we end up in situation where most interrupts are not arriving to the online CPUs anymore. For example on Intel Braswell system which has SD-card card detection signal connected to a GPIO the IO-APIC routing entries look like below after CPU1 is offlined: pin30, enabled , level, low , V(52), IRR(0), S(0), logical , D(03), M(1) pin31, enabled , level, low , V(42), IRR(0), S(0), logical , D(03), M(1) pin32, enabled , level, low , V(62), IRR(0), S(0), logical , D(03), M(1) pin5b, enabled , level, low , V(72), IRR(0), S(0), logical , D(03), M(1) The problem here is that the destination mask still contains both CPUs even if CPU1 is already offline. This means that the IO-APIC still routes interrupts to the other CPU as well. We solve the problem by providing a default action for chained interrupts. This action allows the migration code to correct affinity (as it finds desc->action != NULL). Also make the default action handler to emit a warning if for some reason a chained handler ends up calling it. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Link: http://lkml.kernel.org/r/1444039935-30475-1-git-send-email-mika.westerberg@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-09Merge branch 'irq/for-arm' of ↵Catalin Marinas
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'irq/for-arm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Introduce generic irq migration for cpu hotunplug
2015-10-09genirq: Fix handle_bad_irq kerneldoc commentArnd Bergmann
A recent cleanup removed the 'irq' parameter from many functions, but left the documentation for this in place for at least one function. This removes it. Fixes: bd0b9ac405e1 ("genirq: Remove irq argument from irq flow handlers") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Tony Lindgren <tony@atomide.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: kbuild-all@01.org Cc: Austin Schuh <austin@peloton-tech.com> Cc: Santosh Shilimkar <ssantosh@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/5400000.cD19rmgWjV@wuerfel Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-09genirq: Export handle_bad_irqArnd Bergmann
A cleanup of the omap gpio driver introduced a use of the handle_bad_irq() function in a device driver that can be a loadable module. This broke the ARM allmodconfig build: ERROR: "handle_bad_irq" [drivers/gpio/gpio-omap.ko] undefined! This patch exports the handle_bad_irq symbol in order to allow the use in modules. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Santosh Shilimkar <ssantosh@kernel.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Austin Schuh <austin@peloton-tech.com> Cc: Tony Lindgren <tony@atomide.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/5847725.4IBopItaOr@wuerfel Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-08bpf: split state from prandom_u32() and consolidate {c, e}BPF prngsDaniel Borkmann
While recently arguing on a seccomp discussion that raw prandom_u32() access shouldn't be exposed to unpriviledged user space, I forgot the fact that SKF_AD_RANDOM extension actually already does it for some time in cBPF via commit 4cd3675ebf74 ("filter: added BPF random opcode"). Since prandom_u32() is being used in a lot of critical networking code, lets be more conservative and split their states. Furthermore, consolidate eBPF and cBPF prandom handlers to use the new internal PRNG. For eBPF, bpf_get_prandom_u32() was only accessible for priviledged users, but should that change one day, we also don't want to leak raw sequences through things like eBPF maps. One thought was also to have own per bpf_prog states, but due to ABI reasons this is not easily possible, i.e. the program code currently cannot access bpf_prog itself, and copying the rnd_state to/from the stack scratch space whenever a program uses the prng seems not really worth the trouble and seems too hacky. If needed, taus113 could in such cases be implemented within eBPF using a map entry to keep the state space, or get_random_bytes() could become a second helper in cases where performance would not be critical. Both sides can trigger a one-time late init via prandom_init_once() on the shared state. Performance-wise, there should even be a tiny gain as bpf_user_rnd_u32() saves one function call. The PRNG needs to live inside the BPF core since kernels could have a NET-less config as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Chema Gonzalez <chema@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08Merge branch 'perf/urgent' into perf/core, before pulling new changesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07Merge branches 'doc.2015.10.06a', 'percpu-rwsem.2015.10.06a' and ↵Paul E. McKenney
'torture.2015.10.06a' into HEAD doc.2015.10.06a: Documentation updates. percpu-rwsem.2015.10.06a: Optimization of per-CPU reader-writer semaphores. torture.2015.10.06a: Torture-test updates.
2015-10-07Merge branches 'fixes.2015.10.06a' and 'exp.2015.10.07a' into HEADPaul E. McKenney
exp.2015.10.07a: Reduce OS jitter of RCU-sched expedited grace periods. fixes.2015.10.06a: Miscellaneous fixes.
2015-10-07rcu: Better hotplug handling for synchronize_sched_expedited()Paul E. McKenney
Earlier versions of synchronize_sched_expedited() can prematurely end grace periods due to the fact that a CPU marked as cpu_is_offline() can still be using RCU read-side critical sections during the time that CPU makes its last pass through the scheduler and into the idle loop and during the time that a given CPU is in the process of coming online. This commit therefore eliminates this window by adding additional interaction with the CPU-hotplug operations. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07rcu: Enable stall warnings for synchronize_rcu_expedited()Paul E. McKenney
This commit redirects synchronize_rcu_expedited()'s wait to synchronize_sched_expedited_wait(), thus enabling RCU CPU stall warnings. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07rcu: Add tasks to expedited stall-warning messagesPaul E. McKenney
This commit adds task-print ability to the expedited RCU CPU stall warning messages in preparation for adding stall warnings to synchornize_rcu_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07rcu: Add online/offline info to expedited stall warning messagePaul E. McKenney
This commit makes the RCU CPU stall warning message print online/offline indications immediately after the CPU number. A "O" indicates global offline, a "." global online, and a "o" indicates RCU believes that the CPU is offline for the current grace period and "." otherwise, and an "N" indicates that RCU believes that the CPU will be offline for the next grace period, and "." otherwise, all right after the CPU number. So for CPU 10, you would normally see "10-...:" indicating that everything believes that the CPU is online. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07rcu: Consolidate expedited CPU selectionPaul E. McKenney
Now that sync_sched_exp_select_cpus() and sync_rcu_exp_select_cpus() are identical aside from the the argument to smp_call_function_single(), this commit consolidates them with a functional argument. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07rcu: Prepare for consolidating expedited CPU selectionPaul E. McKenney
This commit brings sync_sched_exp_select_cpus() into alignment with sync_rcu_exp_select_cpus(), as a first step towards consolidating them into one function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07cpu: Remove try_get_online_cpus()Paul E. McKenney
Now that synchronize_sched_expedited() no longer uses it, there are no users of try_get_online_cpus() in mainline. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2015-10-07rcu: Stop excluding CPU hotplug in synchronize_sched_expedited()Paul E. McKenney
Now that synchronize_sched_expedited() uses IPIs, a hook in rcu_sched_qs(), and the ->expmask field in the rcu_node combining tree, it is no longer necessary to exclude CPU hotplug. Any races with CPU hotplug will be detected when attempting to send the IPI. This commit therefore removes the code excluding CPU hotplug operations. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07rcu: Stop silencing lockdep false positive for expedited grace periodsPaul E. McKenney
This reverts commit af859beaaba4 (rcu: Silence lockdep false positive for expedited grace periods). Because synchronize_rcu_expedited() no longer invokes synchronize_sched_expedited(), ->exp_funnel_mutex acquisition is no longer nested, so the false positive no longer happens. This commit therefore removes the extra lockdep data structures, as they are no longer needed.
2015-10-07rcu: Switch synchronize_sched_expedited() to IPIPaul E. McKenney
This commit switches synchronize_sched_expedited() from stop_one_cpu_nowait() to smp_call_function_single(), thus moving from an IPI and a pair of context switches to an IPI and a single pass through the scheduler. Of course, if the scheduler actually does decide to switch to a different task, there will still be a pair of context switches, but there would likely have been a pair of context switches anyway, just a bit later. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06locktorture: Fix module unwind when bad torture_type specifiedPaul E. McKenney
The locktorture module has a list of torture types, and specifying a type not on this list is supposed to cleanly fail the module load. Unfortunately, the "fail" happens without the "cleanly". This commit therefore adds the needed clean-up after an incorrect torture_type. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06rcutorture: Fix unused-function warning for torturing_tasks()Paul E. McKenney
The torturing_tasks() function is used only in kernels built with CONFIG_PROVE_RCU=y, so the second definition can result in unused-function compiler warnings. This commit adds __maybe_unused to suppress these warnings. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06rcutorture: Fix module unwind when bad torture_type specifiedPaul E. McKenney
The rcutorture module has a list of torture types, and specifying a type not on this list is supposed to cleanly fail the module load. Unfortunately, the "fail" happens without the "cleanly". This commit therefore adds the needed clean-up after an incorrect torture_type. Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: David Miller <davem@davemloft.net> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06rcu_sync: Cleanup the CONFIG_PROVE_RCU checksOleg Nesterov
1. Rename __rcu_sync_is_idle() to rcu_sync_lockdep_assert() and change it to use rcu_lockdep_assert(). 2. Change rcu_sync_is_idle() to return rsp->gp_state == GP_IDLE unconditonally, this way we can remove the same check from rcu_sync_lockdep_assert() and clearly isolate the debugging code. Note: rcu_sync_enter()->wait_event(gp_state == GP_PASSED) needs another CONFIG_PROVE_RCU check, the same as is done in ->sync(); but this needs some simple preparations in the core RCU code to avoid the code duplication. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06locking/percpu-rwsem: Clean up the lockdep annotations in percpu_down_read()Oleg Nesterov
Based on Peter Zijlstra's earlier patch. Change percpu_down_read() to use __down_read(), this way we can do rwsem_acquire_read() unconditionally at the start to make this code more symmetric and clean. Originally-From: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06locking/percpu-rwsem: Fix the comments outdated by rcu_syncOleg Nesterov
Update the comments broken by the previous change. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06locking/percpu-rwsem: Make use of the rcu_sync infrastructureOleg Nesterov
Currently down_write/up_write calls synchronize_sched_expedited() twice, which is evil. Change this code to rely on rcu-sync primitives. This avoids the _expedited "big hammer", and this can be faster in the contended case or even in the case when a single thread does down_write/up_write in a loop. Of course, a single down_write() will take more time, but otoh it will be much more friendly to the whole system. To simplify the review this patch doesn't update the comments, fixed by the next change. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06locking/percpu-rwsem: Make percpu_free_rwsem() after kzalloc() safeOleg Nesterov
This is the temporary ugly hack which will be reverted later. We only need it to ensure that the next patch will not break "change sb_writers to use percpu_rw_semaphore" patches routed via the VFS tree. The alloc_super()->destroy_super() error path assumes that it is safe to call percpu_free_rwsem() after kzalloc() without percpu_init_rwsem(), so let's not disappoint it. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06rcu_sync: Introduce rcu_sync_dtor()Oleg Nesterov
This commit allows rcu_sync structures to be safely deallocated, The trick is to add a new ->wait field to the gp_ops array. This field is a pointer to the rcu_barrier() function corresponding to the flavor of RCU in question. This allows a new rcu_sync_dtor() to wait for any outstanding callbacks before freeing the rcu_sync structure. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06rcu_sync: Add CONFIG_PROVE_RCU checksOleg Nesterov
This commit validates that the caller of rcu_sync_is_idle() holds the corresponding type of RCU read-side lock, but only in kernels built with CONFIG_PROVE_RCU=y. This validation is carried out via a new rcu_sync_ops->held() method that is checked within rcu_sync_is_idle(). Note that although this does add code to the fast path, it only does so in kernels built with CONFIG_PROVE_RCU=y. Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06rcu_sync: Simplify rcu_sync using new rcu_sync_ops structureOleg Nesterov
This commit adds the new struct rcu_sync_ops which holds sync/call methods, and turns the function pointers in rcu_sync_struct into an array of struct rcu_sync_ops. This simplifies the "init" helpers by collapsing a switch statement and explicit multiple definitions into a simple assignment and a helper macro, respectively. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06rcu: Create rcu_sync infrastructureOleg Nesterov
The rcu_sync infrastructure can be thought of as infrastructure to be used to implement reader-writer primitives having extremely lightweight readers during times when there are no writers. The first use is in the percpu_rwsem used by the VFS subsystem. This infrastructure is functionally equivalent to struct rcu_sync_struct { atomic_t counter; }; /* Check possibility of fast-path read-side operations. */ static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss) { return atomic_read(&rss->counter) == 0; } /* Tell readers to use slowpaths. */ static inline void rcu_sync_enter(struct rcu_sync_struct *rss) { atomic_inc(&rss->counter); synchronize_sched(); } /* Allow readers to once again use fastpaths. */ static inline void rcu_sync_exit(struct rcu_sync_struct *rss) { synchronize_sched(); atomic_dec(&rss->counter); } The main difference is that it records the state and only calls synchronize_sched() if required. At least some of the calls to synchronize_sched() will be optimized away when rcu_sync_enter() and rcu_sync_exit() are invoked repeatedly in quick succession. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06torture: Consolidate cond_resched_rcu_qs() into stutter_wait()Paul E. McKenney
This commit moves cond_resched_rcu_qs() into stutter_wait(), saving a line and also avoiding RCU CPU stall warnings from all torture loops containing a stutter_wait(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06locktorture: Add torture tests for percpu_rwsemPaul E. McKenney
This commit adds percpu_rwsem tests based on the earlier rwsem tests. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06locking/percpu-rwsem: Export symbols for locktorturePaul E. McKenney
This commit exports percpu_down_read(), percpu_down_write(), __percpu_init_rwsem(), percpu_up_read(), and percpu_up_write() to allow locktorture to test them when built as a module. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06locktorture: Support rtmutex torturingDavidlohr Bueso
Real time mutexes is one of the few general primitives that we do not have in locktorture. Address this -- a few considerations: o To spice things up, enable competing thread(s) to become rt, such that we can stress different prio boosting paths in the rtmutex code. Introduce a ->task_boost callback, only used by rtmutex-torturer. Tasks will boost/deboost around every 50k (arbitrarily) lock/unlock operations. o Hold times are similar to what we have for other locks: only occasionally having longer hold times (per ~200k ops). So we roughly do two full rt boost+deboosting ops with short hold times. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06rcu: Correct comment for values of ->gp_state fieldPaul E. McKenney
This commit corrects the comment for the values of the ->gp_state field, which previously incorrectly said that these were for the ->gp_flags field. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06rcu: Finish folding ->fqs_state into ->gp_statePetr Mladek
Commit commit 4cdfc175c25c89ee ("rcu: Move quiescent-state forcing into kthread") started the process of folding the old ->fqs_state into ->gp_state, but did not complete it. This situation does not cause any malfunction, but can result in extremely confusing trace output. This commit completes this task of eliminating ->fqs_state in favor of ->gp_state. The old ->fqs_state was also used to decide when to collect dyntick-idle snapshots. For this purpose, we add a boolean variable into the kthread, which is set on the first call to rcu_gp_fqs() for a given grace period and clear otherwise. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>