summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2013-08-14context_tracking: Ground setup for static key useFrederic Weisbecker
Prepare for using a static key in the context tracking subsystem. This will help optimizing the off case on its many users: * user_enter, user_exit, exception_enter, exception_exit, guest_enter, guest_exit, vtime_*() Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13microblaze: fix clone syscallMichal Simek
Fix inadvertent breakage in the clone syscall ABI for Microblaze that was introduced in commit f3268edbe6fe ("microblaze: switch to generic fork/vfork/clone"). The Microblaze syscall ABI for clone takes the parent tid address in the 4th argument; the third argument slot is used for the stack size. The incorrectly-used CLONE_BACKWARDS type assigned parent tid to the 3rd slot. This commit restores the original ABI so that existing userspace libc code will work correctly. All kernel versions from v3.8-rc1 were affected. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-13cpuset: remove an unncessary forward declarationLi Zefan
Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-08-13cgroup: RCU protect each cgroup_subsys_state releaseTejun Heo
With the planned unified hierarchy, individual css's will be created and destroyed dynamically across the lifetime of a cgroup. To enable such usages, css destruction is being decoupled from cgroup destruction. Most of the destruction path has been decoupled but the actual free of css still depends on cgroup free path. When all css refs are drained, css_release() kicks off css_free_work_fn() which puts the cgroup. When the cgroup refcnt reaches zero, cgroup_diput() is invoked which in turn schedules RCU free of the cgroup. After a grace period, all css's are freed along with the cgroup itself. This patch moves the RCU grace period and css freeing from cgroup release path to css release path. css_release(), instead of kicking off css_free_work_fn() directly, schedules RCU callback css_free_rcu_fn() which in turn kicks off css_free_work_fn() after a RCU grace period. css_free_work_fn() is updated to free the css directly. The five-way punting - percpu ref kill confirmation, a work item, percpu ref release, RCU grace period, and again a work item - is quite hairy but the work items are there only to provide process context and the actual sequence is kill confirm -> release -> RCU free, which isn't simple but not too crazy. This removes cgroup_css() usage after offline_css() allowing clearing cgroup->subsys[] from offline_css(), which makes it consistent with online_css() and brings it closer to proper lifetime management for individual css's. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13cgroup: move subsys file removal to kill_css()Tejun Heo
With the planned unified hierarchy, individual css's will be created and destroyed dynamically across the lifetime of a cgroup. To enable such usages, css destruction is being decoupled from cgroup destruction. This patch moves subsys file removal from cgroup_destroy_locked() to kill_css(). While this changes the order of destruction operations, the changes shouldn't be noticeable to cgroup subsystems or userland. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13cgroup: factor out kill_css()Tejun Heo
Factor out css ref killing from cgroup_destroy_locked() into kill_css(). We're gonna add more to the path and the factored out function will eventually be called from other places too. While at it, replace open coded percpu_ref_get() with css_get() for consistency. This shouldn't cause any functional difference as the function is not used for root cgroups. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13cgroup: decouple cgroup_subsys_state destruction from cgroup destructionTejun Heo
Currently, css (cgroup_subsys_state) lifetime is tied to that of the associated cgroup. css's are created when the associated cgroup is created and destroyed when it gets destroyed. Also, individual css's aren't RCU protected but the whole cgroup is. With the planned unified hierarchy, css's will need to be dynamically created and destroyed within the lifetime of a cgroup. To enable such usages, this patch decouples css destruction from cgroup destruction - offline_css() invocation and the final css_put() are moved from cgroup_destroy_css_killed() to css_killed_work_fn(). Now each css is individually offlined and put as its reference count is killed instead of waiting for all css's attached to the cgroup to finish refcnt killing and then proceeding to offlining and putting them together. While this changes the order of destruction operations, the changes shouldn't be noticeable to cgroup subsystems or userland. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13cgroup: replace cgroup->css_kill_cnt with ->nr_cssTejun Heo
Currently, css (cgroup_subsys_state) lifetime is tied to that of the associated cgroup. With the planned unified hierarchy, css's will be dynamically created and destroyed within the lifetime of a cgroup. To enable such usages, css's will be individually RCU protected instead of being tied to the cgroup. cgroup->css_kill_cnt is used during cgroup destruction to wait for css reference count disable; however, this model doesn't work once css's lifetimes are managed separately from cgroup's. This patch replaces it with cgroup->nr_css which is an cgroup_mutex protected integer counting the number of attached css's. The count is incremented from online_css() and decremented after refcnt kill is confirmed. If the count reaches zero and the cgroup is marked dead, the second stage of cgroup destruction is kicked off. If a cgroup doesn't have any css attached at the time of rmdir, cgroup_destroy_locked() now invokes the second stage directly as no css kill confirmation would happen. cgroup_offline_fn() - the second step of cgroup destruction - is renamed to cgroup_destroy_css_killed() and now expects to be called with cgroup_mutex held. While this patch changes how css destruction is punted to work items, it shouldn't change any visible behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13cgroup: bounce cgroup_subsys_state ref kill confirmation to a work itemTejun Heo
css (cgroup_subsys_state) offlining, which requires process context, will be moved to ref kill confirmation. In preparation, bounce css_killed handling through css->destroy_work. css_ref_killed_fn() is renamed to css_killed_ref_fn() so that it's consistent with the new css_killed_work_fn(). This patch adds an additional work item bouncing but doesn't change the actual logic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13cgroup: move cgroup->subsys[] assignment to online_css()Tejun Heo
Currently, css (cgroup_subsys_state) lifetime is tied to that of the associated cgroup. With the planned unified hierarchy, css's will be dynamically created and destroyed within the lifetime of a cgroup. To enable such usages, css's will be individually RCU protected instead of being tied to the cgroup. In preparation, this patch moves cgroup->subsys[] assignment from init_css() to online_css(). As this means that a newly initialized css should be remembered separately and that cgroup_css() returns NULL between init and online, cgroup_create() is updated so that it stores newly created css's in a local array css_ar[] and cgroup_init/load_subsys() are updated to use local variable @css instead of using cgroup_css(). This change also slightly simplifies error path of cgroup_create(). While this patch changes when cgroup->subsys[] is initialized, this change isn't visible to subsystems or userland. v2: This patch wasn't updated accordingly after the previous "cgroup: reorganize css init / exit paths" was updated leading to missing a css_ar[] conversion in cgroup_create() and thus boot failure. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Docbook fixes that make 99% of the diffstat, plus a oneliner fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Ensure update_cfs_shares() is called for parents of continuously-running tasks sched: Fix some kernel-doc warnings
2013-08-14PM / QoS: Fix workqueue deadlock when using pm_qos_update_request_timeout()Stephen Boyd
pm_qos_update_request_timeout() updates a qos and then schedules a delayed work item to bring the qos back down to the default after the timeout. When the work item runs, pm_qos_work_fn() will call pm_qos_update_request() and deadlock because it tries to cancel itself via cancel_delayed_work_sync(). Future callers of that qos will also hang waiting to cancel the work that is canceling itself. Let's extract the little bit of code that does the real work of pm_qos_update_request() and call it from the work function so that we don't deadlock. Before ed1ac6e (PM: don't use [delayed_]work_pending()) this didn't happen because the work function wouldn't try to cancel itself. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: 3.9+ <stable@vger.kernel.org> # 3.9+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-13sched: fix the theoretical signal_wake_up() vs schedule() raceOleg Nesterov
This is only theoretical, but after try_to_wake_up(p) was changed to check p->state under p->pi_lock the code like __set_current_state(TASK_INTERRUPTIBLE); schedule(); can miss a signal. This is the special case of wait-for-condition, it relies on try_to_wake_up/schedule interaction and thus it does not need mb() between __set_current_state() and if(signal_pending). However, this __set_current_state() can move into the critical section protected by rq->lock, now that try_to_wake_up() takes another lock we need to ensure that it can't be reordered with "if (signal_pending(current))" check inside that section. The patch is actually one-liner, it simply adds smp_wmb() before spin_lock_irq(rq->lock). This is what try_to_wake_up() already does by the same reason. We turn this wmb() into the new helper, smp_mb__before_spinlock(), for better documentation and to allow the architectures to change the default implementation. While at it, kill smp_mb__after_lock(), it has no callers. Perhaps we can also add smp_mb__before/after_spinunlock() for prepare_to_wait(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-13cgroup: reorganize css init / exit pathsTejun Heo
css (cgroup_subsys_state) lifetime management is about to be restructured. In prepartion, make the following mostly trivial changes. * init_cgroup_css() is renamed to init_css() so that it's consistent with other css handling functions. * alloc_css_id(), online_css() and offline_css() updated to take @css instead of cgroups and subsys IDs. This patch doesn't make any functional changes. v2: v1 merged two for_each_root_subsys() loops in cgroup_create() but Li Zefan pointed out that it breaks error path. Dropped. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13cgroup: add __rcu modifier to cgroup->subsys[]Tejun Heo
For the planned unified hierarchy, each css (cgroup_subsys_state) will be RCU protected so that it can be created and destroyed individually while allowing RCU accesses. Previous changes ensured that all cgroup->subsys[] accesses use the cgroup_css() accessor. This patch adds __rcu modifier to cgroup->subsys[], add matching RCU dereference in cgroup_css() and convert all assignments to either rcu_assign_pointer() or RCU_INIT_POINTER(). This change prepares for the actual RCUfication of css's and doesn't introduce any visible behavior change. The conversion is verified with sparse and all accesses are properly RCU annotated. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13cgroup: make cgroup_file_open() rcu_read_lock() around cgroup_css() and add ↵Tejun Heo
cfent->css For the planned unified hierarchy, each css (cgroup_subsys_state) will be RCU protected so that it can be created and destroyed individually while allowing RCU accesses, and cgroup_css() will soon require either holding cgroup_mutex or RCU read lock. This patch updates cgroup_file_open() such that it acquires the associated css under rcu_read_lock(). While cgroup_file_css() usages in other file operations are safe due to the reference from open, cgroup_css() wouldn't know that and will still trigger warnings. It'd be cleanest to store the acquired css in file->prvidate_data for further file operations but that's already used by seqfile. This patch instead adds cfent->css to cache the associated css. Note that while this field is initialized during cfe init, it should only be considered valid while the file is open. This patch doesn't change visible behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13cgroup: cgroup_css_from_dir() now should be called with RCU read lockedTejun Heo
cgroup->subsys[] will become RCU protected and thus all cgroup_css() usages should either be under RCU read lock or cgroup_mutex. This patch updates cgroup_css_from_dir() which returns the matching cgroup_subsys_state given a directory file and subsys_id so that it requires RCU read lock and updates its sole user perf_cgroup_connect(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com>
2013-08-13cgroup: add cgroup_subsys_state->parentTejun Heo
With the planned unified hierarchy, css's (cgroup_subsys_state) will be RCU protected and allowed to be attached and detached dynamically over the course of a cgroup's lifetime. This means that css's will stay accessible after being detached from its cgroup - the matching pointer in cgroup->subsys[] cleared - for ref draining and RCU grace period. cgroup core still wants to guarantee that the parent css is never destroyed before its children and css_parent() always returns the parent regardless of the state of the child css as long as it's accessible. This patch makes css's hold onto their parents and adds css->parent so that the parent css is never detroyed before its children and can be determined without consulting the cgroups. cgroup->dummy_css is also updated to point to the parent dummy_css; however, it doesn't need to worry about object lifetime as the parent cgroup is already pinned by the child. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13cgroup: rename cgroup_subsys_state->dput_work and its callback functionTejun Heo
css (cgroup_subsys_state) will become RCU protected and there will be two stages which require punting to work item during release. To prepare for using the work item for multiple times, rename css->dput_work to css->destroy_work and css_dput_fn() to css_free_work_fn() and move work item initialization from css init to right before the actual usage. This reorganization doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13cgroup: always use cgroup_css()Tejun Heo
cgroup_css() is the accessor for cgroup->subsys[] but is not used consistently. cgroup->subsys[] will become RCU protected and cgroup_css() will grow synchronization sanity checks. In preparation, make all cgroup->subsys[] dereferences use cgroup_css() consistently. This patch doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-13cpuset: fix the return value of cpuset_write_u64()Li Zefan
Writing to this file always returns -ENODEV: # echo 1 > cpuset.memory_pressure_enabled -bash: echo: write error: No such device Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: <stable@vger.kernel.org> # 3.9+ Signed-off-by: Tejun Heo <tj@kernel.org>
2013-08-13ACPI / processor: Acquire writer lock to update CPU mapsToshi Kani
CPU system maps are protected with reader/writer locks. The reader lock, get_online_cpus(), assures that the maps are not updated while holding the lock. The writer lock, cpu_hotplug_begin(), is used to udpate the cpu maps along with cpu_maps_update_begin(). However, the ACPI processor handler updates the cpu maps without holding the the writer lock. acpi_map_lsapic() is called from acpi_processor_hotadd_init() to update cpu_possible_mask and cpu_present_mask. acpi_unmap_lsapic() is called from acpi_processor_remove() to update cpu_possible_mask. Currently, they are either unprotected or protected with the reader lock, which is not correct. For example, the get_online_cpus() below is supposed to assure that cpu_possible_mask is not changed while the code is iterating with for_each_possible_cpu(). get_online_cpus(); for_each_possible_cpu(cpu) { : } put_online_cpus(); However, this lock has no protection with CPU hotplug since the ACPI processor handler does not use the writer lock when it updates cpu_possible_mask. The reader lock does not serialize within the readers. This patch protects them with the writer lock with cpu_hotplug_begin() along with cpu_maps_update_begin(), which must be held before calling cpu_hotplug_begin(). It also protects arch_register_cpu() / arch_unregister_cpu(), which creates / deletes a sysfs cpu device interface. For this purpose it changes cpu_hotplug_begin() and cpu_hotplug_done() to global and exports them in cpu.h. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-13context_tracking: Remove full dynticks' hacky dependency on wide context ↵Frederic Weisbecker
tracking Now that the full dynticks subsystem only enables the context tracking on full dynticks CPUs, lets remove the dependency on CONTEXT_TRACKING_FORCE This dependency was a hack to enable the context tracking widely for the full dynticks susbsystem until the latter becomes able to enable it in a more CPU-finegrained fashion. Now CONTEXT_TRACKING_FORCE only stands for testing on archs that work on support for the context tracking while full dynticks can't be used yet due to unmet dependencies. It simulates a system where all CPUs are full dynticks so that RCU user extended quiescent states and dynticks cputime accounting can be tested on the given arch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13nohz: Only enable context tracking on full dynticks CPUsFrederic Weisbecker
The context tracking subsystem has the ability to selectively enable the tracking on any defined subset of CPU. This means that we can define a CPU range that doesn't run the context tracking and another range that does. Now what we want in practice is to enable the tracking on full dynticks CPUs only. In order to perform this, we just need to pass our full dynticks CPU range selection from the full dynticks subsystem to the context tracking. This way we can spare the overhead of RCU user extended quiescent state and vtime maintainance on the CPUs that are outside the full dynticks range. Just keep in mind the raw context tracking itself is still necessary everywhere. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13context_tracking: Fix runtime CPU off-caseFrederic Weisbecker
As long as the context tracking is enabled on any CPU, even a single one, all other CPUs need to keep track of their user <-> kernel boundaries cross as well. This is because a task can sleep while servicing an exception that happened in the kernel or in userspace. Then when the task eventually wakes up and return from the exception, the CPU needs to know if we resume in userspace or in the kernel. exception_exit() get this information from exception_enter() that saved the previous state. If the CPU where the exception happened didn't keep track of these informations, exception_exit() doesn't know which state tracking to restore on the CPU where the task got migrated and we may return to userspace with the context tracking subsystem thinking that we are in kernel mode. This can be fixed in the long term if we move our context tracking probes on very low level arch fast path user <-> kernel boundary, although even that is worrisome as an exception can still happen in the few instructions between the probe and the actual iret. Also we are not yet ready to set these probes in the fast path given the potential overhead problem it induces. So let's fix this by always enable context tracking even on CPUs that are not in the full dynticks range. OTOH we can spare the rcu_user_*() and vtime_user_*() calls there because the tick runs on these CPUs and we can handle RCU state machine and cputime accounting through it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13vtime: Update a few commentsFrederic Weisbecker
Update a stale comment from the old vtime era and document some locking that might be non obvious. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13context_tracing: Fix guest accounting with native vtimeFrederic Weisbecker
1) If context tracking is enabled with native vtime accounting (which combo is useless except for dev testing), we call vtime_guest_enter() and vtime_guest_exit() on host <-> guest switches. But those are stubs in this configurations. As a result, cputime is not correctly flushed on kvm context switches. 2) If context tracking runs but is disabled on some CPUs, those CPUs end up calling __guest_enter/__guest_exit which in turn call vtime_account_system(). We don't want to call this because we run in tick based accounting for these CPUs. Refactor the guest_enter/guest_exit code such that all combinations finally work. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
2013-08-13sched: Consolidate open coded preemptible() checksFrederic Weisbecker
preempt_schedule() and preempt_schedule_context() open code their preemptability checks. Use the standard API instead for consolidation. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Alex Shi <alex.shi@intel.com> Cc: Paul Turner <pjt@google.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Vincent Guittot <vincent.guittot@linaro.org>
2013-08-12Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull w/w mutex deadlock injection fix from Ingo Molnar. This bug made the CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y option largely useless, but wouldn't affect normal users. * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mutex: Fix w/w mutex deadlock injection
2013-08-12Merge branch 'fortglx/3.11/time' of ↵Ingo Molnar
git://git.linaro.org/people/jstultz/linux into timers/urgent Pull small fix for v3.11 from John Stultz. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-09jump_label: Split jumplabel ratelimitAndrew Jones
Commit b202952075f62603bea9bfb6ebc6b0420db11949 ("perf, core: Rate limit perf_sched_events jump_label patching") introduced rate limiting for jump label disabling. The changes were made in the jump label code in order to be more widely available and to keep things tidier. This is all fine, except now jump_label.h includes linux/workqueue.h, which makes it impossible to include jump_label.h from anything that workqueue.h needs. For example, it's now impossible to include jump_label.h from asm/spinlock.h, which is done in proposed pv-ticketlock patches. This patch splits out the rate limiting related changes from jump_label.h into a new file, jump_label_ratelimit.h, to resolve the issue. Signed-off-by: Andrew Jones <drjones@redhat.com> Link: http://lkml.kernel.org/r/1376058122-8248-10-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-08cgroup: make css_for_each_descendant() and friends include the origin css in ↵Tejun Heo
the iteration Previously, all css descendant iterators didn't include the origin (root of subtree) css in the iteration. The reasons were maintaining consistency with css_for_each_child() and that at the time of introduction more use cases needed skipping the origin anyway; however, given that css_is_descendant() considers self to be a descendant, omitting the origin css has become more confusing and looking at the accumulated use cases rather clearly indicates that including origin would result in simpler code overall. While this is a change which can easily lead to subtle bugs, cgroup API including the iterators has recently gone through major restructuring and no out-of-tree changes will be applicable without adjustments making this a relatively acceptable opportunity for this type of change. The conversions are mostly straight-forward. If the iteration block had explicit origin handling before or after, it's moved inside the iteration. If not, if (pos == origin) continue; is added. Some conversions add extra reference get/put around origin handling by consolidating origin handling and the rest. While the extra ref operations aren't strictly necessary, this shouldn't cause any noticeable difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com>
2013-08-08cgroup: unexport cgroup_css()Tejun Heo
cgroup_css() no longer has any user left outside cgroup.c proper and we don't want subsystems to grow new usages of the function. cgroup core should always provide the css to use to the subsystems, which will make dynamic creation and destruction of css's across the lifetime of a cgroup much more manageable than exposing the cgroup directly to subsystems and let them dereference css's from it. Make cgroup_css() a static function in cgroup.c. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08cgroup: make cgroup_taskset deal with cgroup_subsys_state instead of cgroupTejun Heo
cgroup is in the process of converting to css (cgroup_subsys_state) from cgroup as the principal subsystem interface handle. This is mostly to prepare for the unified hierarchy support where css's will be created and destroyed dynamically but also helps cleaning up subsystem implementations as css is usually what they are interested in anyway. cgroup_taskset which is used by the subsystem attach methods is the last cgroup subsystem API which isn't using css as the handle. Update cgroup_taskset_cur_cgroup() to cgroup_taskset_cur_css() and cgroup_taskset_for_each() to take @skip_css instead of @skip_cgrp. The conversions are pretty mechanical. One exception is cpuset::cgroup_cs(), which lost its last user and got removed. This patch shouldn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-08-08cgroup: make cftype->[un]register_event() deal with cgroup_subsys_state ↵Tejun Heo
instead of cgroup cgroup is in the process of converting to css (cgroup_subsys_state) from cgroup as the principal subsystem interface handle. This is mostly to prepare for the unified hierarchy support where css's will be created and destroyed dynamically but also helps cleaning up subsystem implementations as css is usually what they are interested in anyway. cftype->[un]register_event() is among the remaining couple interfaces which still use struct cgroup. Convert it to cgroup_subsys_state. The conversion is mostly mechanical and removes the last users of mem_cgroup_from_cont() and cg_to_vmpressure(), which are removed. v2: indentation update as suggested by Li Zefan. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com>
2013-08-08cgroup: make task iterators deal with cgroup_subsys_state instead of cgroupTejun Heo
cgroup is in the process of converting to css (cgroup_subsys_state) from cgroup as the principal subsystem interface handle. This is mostly to prepare for the unified hierarchy support where css's will be created and destroyed dynamically but also helps cleaning up subsystem implementations as css is usually what they are interested in anyway. This patch converts task iterators to deal with css instead of cgroup. Note that under unified hierarchy, different sets of tasks will be considered belonging to a given cgroup depending on the subsystem in question and making the iterators deal with css instead cgroup provides them with enough information about the iteration. While at it, fix several function comment formats in cpuset.c. This patch doesn't introduce any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com>
2013-08-08cgroup: remove struct cgroup_scannerTejun Heo
cgroup_scan_tasks() takes a pointer to struct cgroup_scanner as its sole argument and the only function of that struct is packing the arguments of the function call which are consisted of five fields. It's not too unusual to pack parameters into a struct when the number of arguments gets excessive or the whole set needs to be passed around a lot, but neither holds here making it just weird. Drop struct cgroup_scanner and pass the params directly to cgroup_scan_tasks(). Note that struct cpuset_change_nodemask_arg was added to cpuset.c to pass both ->cs and ->newmems pointer to cpuset_change_nodemask() using single data pointer. This doesn't make any functional differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08cgroup: make cgroup_task_iter remember the cgroup being iteratedTejun Heo
Currently all cgroup_task_iter functions require @cgrp to be passed in, which is superflous and increases chance of usage error. Make cgroup_task_iter remember the cgroup being iterated and drop @cgrp argument from next and end functions. This patch doesn't introduce any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com>
2013-08-08cgroup: rename cgroup_iter to cgroup_task_iterTejun Heo
cgroup now has multiple iterators and it's quite confusing to have something which walks over tasks of a single cgroup named cgroup_iter. Let's rename it to cgroup_task_iter. While at it, reformat / update comments and replace the overview comment above the interface function decls with proper function comments. Such overview can be useful but function comments should be more than enough here. This is pure rename and doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com>
2013-08-08cgroup: relocate cgroup_advance_iter()Tejun Heo
For some reason, cgroup_advance_iter() is standing lonely all away from its iter comrades. Relocate it. This is cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08cgroup: make hierarchy iterators deal with cgroup_subsys_state instead of cgroupTejun Heo
cgroup is currently in the process of transitioning to using css (cgroup_subsys_state) as the primary handle instead of cgroup in subsystem API. For hierarchy iterators, this is beneficial because * In most cases, css is the only thing subsystems care about anyway. * On the planned unified hierarchy, iterations for different subsystems will need to skip over different subtrees of the hierarchy depending on which subsystems are enabled on each cgroup. Passing around css makes it unnecessary to explicitly specify the subsystem in question as css is intersection between cgroup and subsystem * For the planned unified hierarchy, css's would need to be created and destroyed dynamically independent from cgroup hierarchy. Having cgroup core manage css iteration makes enforcing deref rules a lot easier. Most subsystem conversions are straight-forward. Noteworthy changes are * blkio: cgroup_to_blkcg() is no longer used. Removed. * freezer: cgroup_freezer() is no longer used. Removed. * devices: cgroup_to_devcgroup() is no longer used. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Jens Axboe <axboe@kernel.dk>
2013-08-08cgroup: always use cgroup_next_child() to walk the children listTejun Heo
There are several places where the children list is accessed directly. This patch converts those places to use cgroup_next_child(). This will help updating the hierarchy iterators to use @css instead of @cgrp. While cgroup_next_child() can be heavy in pathological cases - e.g. a lot of dead children, this shouldn't cause any noticeable behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08cgroup: convert cgroup_next_sibling() to cgroup_next_child()Tejun Heo
cgroup is transitioning to using css (cgroup_subsys_state) as the main subsys interface handle instead of cgroup and the iterators will be updated to use css too. The iterators need to walk the cgroup hierarchy and return the css's matching the origin css, which is a bit cumbersome to open code. This patch converts cgroup_next_sibling() to cgroup_next_child() so that it can handle all steps of direct child iteration. This will be used to update iterators to take @css instead of @cgrp. In addition to the new iteration init handling, cgroup_next_child() is restructured so that the different branches share the end of iteration condition check. This patch doesn't change any behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08cgroup: pass around cgroup_subsys_state instead of cgroup in file methodsTejun Heo
cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup. Please see the previous commit which converts the subsystem methods for rationale. This patch converts all cftype file operations to take @css instead of @cgroup. cftypes for the cgroup core files don't have their subsytem pointer set. These will automatically use the dummy_css added by the previous patch and can be converted the same way. Most subsystem conversions are straight forwards but there are some interesting ones. * freezer: update_if_frozen() is also converted to take @css instead of @cgroup for consistency. This will make the code look simpler too once iterators are converted to use css. * memory/vmpressure: mem_cgroup_from_css() needs to be exported to vmpressure while mem_cgroup_from_cont() can be made static. Updated accordingly. * cpu: cgroup_tg() doesn't have any user left. Removed. * cpuacct: cgroup_ca() doesn't have any user left. Removed. * hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left. Removed. * net_cls: cgrp_cls_state() doesn't have any user left. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-08-08cgroup: add cgroup->dummy_cssTejun Heo
cgroup subsystem API is being converted to use css (cgroup_subsys_state) as the main handle, which makes things a bit awkward for subsystem agnostic core features - the "cgroup.*" interface files and various iterations - a bit awkward as they don't have a css to use. This patch adds cgroup->dummy_css which has NULL ->ss and whose only role is pointing back to the cgroup. This will be used to support subsystem agnostic features on the coming css based API. css_parent() is updated to handle dummy_css's. Note that css will soon grow its own ->parent field and css_parent() will be made trivial. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08cgroup: pin cgroup_subsys_state when opening a cgroupfs fileTejun Heo
Previously, each file read/write operation relied on the inode reference count pinning the cgroup and simply checked whether the cgroup was marked dead before proceeding to invoke the per-subsystem callback. This was rather silly as it didn't have any synchronization or css pinning around the check and the cgroup may be removed and all css refs drained between the DEAD check and actual method invocation. This patch pins the css between open() and release() so that it is guaranteed to be alive for all file operations and remove the silly DEAD checks from cgroup_file_read/write(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08cgroup: add subsys backlink pointer to cftypeTejun Heo
cgroup is transitioning to using css (cgroup_subsys_state) instead of cgroup as the primary subsystem handle. The cgroupfs file interface will be converted to use css's which requires finding out the subsystem from cftype so that the matching css can be determined from the cgroup. This patch adds cftype->ss which points to the subsystem the file belongs to. The field is initialized while a cftype is being registered. This makes it unnecessary to explicitly specify the subsystem for other cftype handling functions. @ss argument dropped from various cftype handling functions. This patch shouldn't introduce any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Jens Axboe <axboe@kernel.dk>
2013-08-08cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methodsTejun Heo
cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup * in subsystem implementations for the following reasons. * With unified hierarchy, subsystems will be dynamically bound and unbound from cgroups and thus css's (cgroup_subsys_state) may be created and destroyed dynamically over the lifetime of a cgroup, which is different from the current state where all css's are allocated and destroyed together with the associated cgroup. This in turn means that cgroup_css() should be synchronized and may return NULL, making it more cumbersome to use. * Differing levels of per-subsystem granularity in the unified hierarchy means that the task and descendant iterators should behave differently depending on the specific subsystem the iteration is being performed for. * In majority of the cases, subsystems only care about its part in the cgroup hierarchy - ie. the hierarchy of css's. Subsystem methods often obtain the matching css pointer from the cgroup and don't bother with the cgroup pointer itself. Passing around css fits much better. This patch converts all cgroup_subsys methods to take @css instead of @cgroup. The conversions are mostly straight-forward. A few noteworthy changes are * ->css_alloc() now takes css of the parent cgroup rather than the pointer to the new cgroup as the css for the new cgroup doesn't exist yet. Knowing the parent css is enough for all the existing subsystems. * In kernel/cgroup.c::offline_css(), unnecessary open coded css dereference is replaced with local variable access. This patch shouldn't cause any behavior differences. v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced with local variable @css as suggested by Li Zefan. Rebased on top of new for-3.12 which includes for-3.11-fixes so that ->css_free() invocation added by da0a12caff ("cgroup: fix a leak when percpu_ref_init() fails") is converted too. Suggested by Li Zefan. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-08-08cgroup: add css_parent()Tejun Heo
Currently, controllers have to explicitly follow the cgroup hierarchy to find the parent of a given css. cgroup is moving towards using cgroup_subsys_state as the main controller interface construct, so let's provide a way to climb the hierarchy using just csses. This patch implements css_parent() which, given a css, returns its parent. The function is guarnateed to valid non-NULL parent css as long as the target css is not at the top of the hierarchy. freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices are converted to use css_parent() instead of accessing cgroup->parent directly. * __parent_ca() is dropped from cpuacct and its usage is replaced with parent_ca(). The only difference between the two was NULL test on cgroup->parent which is now embedded in css_parent() making the distinction moot. Note that eventually a css->parent field will be added to css and the NULL check in css_parent() will go away. This patch shouldn't cause any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08cgroup: add/update accessors which obtain subsys specific data from cssTejun Heo
css (cgroup_subsys_state) is usually embedded in a subsys specific data structure. Subsystems either use container_of() directly to cast from css to such data structure or has an accessor function wrapping such cast. As cgroup as whole is moving towards using css as the main interface handle, add and update such accessors to ease dealing with css's. All accessors explicitly handle NULL input and return NULL in those cases. While this looks like an extra branch in the code, as all controllers specific data structures have css as the first field, the casting doesn't involve any offsetting and the compiler can trivially optimize out the branch. * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such accessor. Added. * memory, hugetlb and devices already had one but didn't explicitly handle NULL input. Updated. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>