summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2012-01-26time: Move common updates to a functionThomas Gleixner
CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-26time: Reorder so the hot data is togetherThomas Gleixner
Keep all the interesting data in a single cache line. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-26time: Remove most of xtime_lock usage in timekeeping.cJohn Stultz
Now that ntp.c's locking is reworked, we can remove most of the xtime_lock usage in timekeeping.c The remaining xtime_lock presence is really for jiffies access and the global load calculation. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-26ntp: Add ntp_lock to replace xtime_lockingJohn Stultz
Use a ntp_lock spin lock to replace xtime_lock locking in ntp.c CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-26ntp: Access tick_length variable via ntp_tick_length()John Stultz
Currently the NTP managed tick_length value is accessed globally, in preparations for locking cleanups, make sure it is accessed via a function and mark it as static. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-26ntp: Cleanup timex.hJohn Stultz
Move ntp_sycned to ntp.c and mark time_status as static. Also yank function declaration for non-existant function. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-26time: Add timekeeper lockJohn Stultz
Now that all the timekeeping variables are stored in the timekeeper structure, add a new lock to protect the structure. For now, this lock nests under the xtime_lock for writes. For readers, we don't need to take xtime_lock anymore. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-26time: Cleanup global variables and move them to the topJohn Stultz
Move global xtime_lock and timekeeping_suspended values up to the top of timekeeping.c CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-26time: Move raw_time into timekeeper structureJohn Stultz
In preparation for locking cleanups, move raw_time into timekeeper structure. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-26time: Move xtime into timekeeeper structureJohn Stultz
In preparation for locking cleanups, move xtime into timekeeper structure. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-26time: Move wall_to_monotonic into the timekeeper structureJohn Stultz
In preparation for locking cleanups, move wall_to_monotonic into the timekeeper structure. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-26time: Move total_sleep_time into the timekeeper structureJohn Stultz
Move total_sleep_time into the timekeeper structure in preparation for locking cleanups CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-01-26Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Call perf_cgroup_event_time() directly perf: Don't call release_callchain_buffers() if allocation fails
2012-01-26Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Add missing __cpuinit annotation in rcutorture code sched: Add "const" to is_idle_task() parameter rcu: Make rcutorture bool parameters really bool (core code) memblock: Fix alloc failure due to dumb underflow protection in memblock_find_in_range_node()
2012-01-26bugs, x86: Fix printk levels for panic, softlockups and stack dumpsPrarit Bhargava
rsyslog will display KERN_EMERG messages on a connected terminal. However, these messages are useless/undecipherable for a general user. For example, after a softlockup we get: Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ... kernel:Stack: Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ... kernel:Call Trace: Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ... kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 <e8> ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89 This happens because the printk levels for these messages are incorrect. Only an informational message should be displayed on a terminal. I modified the printk levels for various messages in the kernel and tested the output by using the drivers/misc/lkdtm.c kernel modules (ie, softlockups, panics, hard lockups, etc.) and confirmed that the console output was still the same and that the output to the terminals was correct. For example, in the case of a softlockup we now see the much more informative: Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ... BUG: soft lockup - CPU4 stuck for 60s! instead of the above confusing messages. AFAICT, the messages no longer have to be KERN_EMERG. In the most important case of a panic we set console_verbose(). As for the other less severe cases the correct data is output to the console and /var/log/messages. Successfully tested by me using the drivers/misc/lkdtm.c module. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: dzickus@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1327586134-11926-1-git-send-email-prarit@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-26sched/nohz: Fix nohz cpu idle load balancing state with cpu hotplugSuresh Siddha
With the recent nohz scheduler changes, rq's nohz flag 'NOHZ_TICK_STOPPED' and its associated state doesn't get cleared immediately after the cpu exits idle. This gets cleared as part of the next tick seen on that cpu. For the cpu offline support, we need to clear this state manually. Fix it by registering a cpu notifier, which clears the nohz idle load balance state for this rq explicitly during the CPU_DYING notification. There won't be any nohz updates for that cpu, after the CPU_DYING notification. But lets be extra paranoid and skip updating the nohz state in the select_nohz_load_balancer() if the cpu is not in active state anymore. Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Reviewed-and-tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1327026538.16150.40.camel@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-26sched/s390: Fix compile error in sched/core.cChristian Borntraeger
Commit 029632fbb7b7c9d85063cc9eb470de6c54873df3 ("sched: Make separate sched*.c translation units") removed the include of asm/mutex.h from sched.c. This breaks the combination of: CONFIG_MUTEX_SPIN_ON_OWNER=yes CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX=yes like s390 without mutex debugging: CC kernel/sched/core.o kernel/sched/core.c: In function ‘mutex_spin_on_owner’: kernel/sched/core.c:3287: error: implicit declaration of function ‘arch_mutex_cpu_relax’ Lets re-add the include to kernel/sched/core.c Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1326268696-30904-1-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-26sched: Fix rq->nr_uninterruptible update racePeter Zijlstra
KOSAKI Motohiro noticed the following race: > CPU0 CPU1 > -------------------------------------------------------- > deactivate_task() > task->state = TASK_UNINTERRUPTIBLE; > activate_task() > rq->nr_uninterruptible--; > > schedule() > deactivate_task() > rq->nr_uninterruptible++; > Kosaki-San's scenario is possible when CPU0 runs __sched_setscheduler() against CPU1's current @task. __sched_setscheduler() does a dequeue/enqueue in order to move the task to its new queue (position) to reflect the newly provided scheduling parameters. However it should be completely invariant to nr_uninterruptible accounting, sched_setscheduler() doesn't affect readyness to run, merely policy on when to run. So convert the inappropriate activate/deactivate_task usage to enqueue/dequeue_task, which avoids the nr_uninterruptible accounting. Also convert the two other sites: __migrate_task() and normalize_task() that still use activate/deactivate_task. These sites aren't really a problem since __migrate_task() will only be called on non-running task (and therefore are immume to the described problem) and normalize_task() isn't ever used on regular systems. Also remove the comments from activate/deactivate_task since they're misleading at best. Reported-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1327486224.2614.45.camel@laptop Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-26Merge branch 'sigtrace' of git://github.com/utrace/linux into perf/coreIngo Molnar
2012-01-25irq: make SPARSE_IRQ an optionally hidden optionRob Herring
On ARM, we don't want SPARSE_IRQ to be a user visible option. Make SPARSE_IRQ visible based on MAY_HAVE_SPARSE_IRQ instead of depending on HAVE_SPARSE_IRQ. With this, SPARSE_IRQ is not visible on C6X and ARM. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-c6x-dev@linux-c6x.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-sh@vger.kernel.org
2012-01-24sysctl: Move the implementation into fs/proc/proc_sysctl.cEric W. Biederman
Move the core sysctl code from kernel/sysctl.c and kernel/sysctl_check.c into fs/proc/proc_sysctl.c. Currently sysctl maintenance is hampered by the sysctl implementation being split across 3 files with artificial layering between them. Consolidate the entire sysctl implementation into 1 file so that it is easier to see what is going on and hopefully allowing for simpler maintenance. For functions that are now only used in fs/proc/proc_sysctl.c remove their declarations from sysctl.h and make them static in fs/proc/proc_sysctl.c Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-01-24sysctl: Register the base sysctl table like any other sysctl table.Eric W. Biederman
Simplify the code by treating the base sysctl table like any other sysctl table and register it with register_sysctl_table. To ensure this table is registered early enough to avoid problems call sysctl_init from proc_sys_init. Rename sysctl_net.c:sysctl_init() to net_sysctl_init() to avoid name conflicts now that kernel/sysctl.c:sysctl_init() is no longer static. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-01-24sysctl: Consolidate !CONFIG_SYSCTL handlingEric W. Biederman
- In sysctl.h move functions only available if CONFIG_SYSCL is defined inside of #ifdef CONFIG_SYSCTL - Move the stub function definitions for !CONFIG_SYSCTL into sysctl.h and make them static inlines. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-01-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Davem says: 1) Fix JIT code generation on x86-64 for divide by zero, from Eric Dumazet. 2) tg3 header length computation correction from Eric Dumazet. 3) More build and reference counting fixes for socket memory cgroup code from Glauber Costa. 4) module.h snuck back into a core header after all the hard work we did to remove that, from Paul Gortmaker and Jesper Dangaard Brouer. 5) Fix PHY naming regression and add some new PCI IDs in stmmac, from Alessandro Rubini. 6) Netlink message generation fix in new team driver, should only advertise the entries that changed during events, from Jiri Pirko. 7) SRIOV VF registration and unregistration fixes, and also add a missing PCI ID, from Roopa Prabhu. 8) Fix infinite loop in tx queue flush code of brcmsmac, from Stanislaw Gruszka. 9) ftgmac100/ftmac100 build fix, missing interrupt.h include. 10) Memory leak fix in net/hyperv do_set_mutlicast() handling, from Wei Yongjun. 11) Off by one fix in netem packet scheduler, from Vijay Subramanian. 12) TCP loss detection fix from Yuchung Cheng. 13) TCP reset packet MD5 calculation uses wrong address, fix from Shawn Lu. 14) skge carrier assertion and DMA mapping fixes from Stephen Hemminger. 15) Congestion recovery undo performed at the wrong spot in BIC and CUBIC congestion control modules, fix from Neal Cardwell. 16) Ethtool ETHTOOL_GSSET_INFO is unnecessarily restrictive, from Michał Mirosław. 17) Fix triggerable race in ipv6 sysctl handling, from Francesco Ruggeri. 18) Statistics bug fixes in mlx4 from Eugenia Emantayev. 19) rds locking bug fix during info dumps, from your's truly. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits) rds: Make rds_sock_lock BH rather than IRQ safe. netprio_cgroup.h: dont include module.h from other includes net: flow_dissector.c missing include linux/export.h team: send only changed options/ports via netlink net/hyperv: fix possible memory leak in do_set_multicast() drivers/net: dsa/mv88e6xxx.c files need linux/module.h stmmac: added PCI identifiers llc: Fix race condition in llc_ui_recvmsg stmmac: fix phy naming inconsistency dsa: Add reporting of silicon revision for Marvell 88E6123/88E6161/88E6165 switches. tg3: fix ipv6 header length computation skge: add byte queue limit support mv643xx_eth: Add Rx Discard and Rx Overrun statistics bnx2x: fix compilation error with SOE in fw_dump bnx2x: handle CHIP_REVISION during init_one bnx2x: allow user to change ring size in ISCSI SD mode bnx2x: fix Big-Endianess in ethtool -t bnx2x: fixed ethtool statistics for MF modes bnx2x: credit-leakage fixup on vlan_mac_del_all macvlan: fix a possible use after free ...
2012-01-23Merge tag 'pm-fixes-for-3.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Power management fixes for 3.3 Two fixes for regressions introduced during the merge window, one fix for a long-standing obscure issue in the computation of hibernate image size and two small PM documentation fixes. * tag 'pm-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / Sleep: Fix read_unlock_usermodehelper() call. PM / Hibernate: Rewrite unlock_system_sleep() to fix s2disk regression PM / Hibernate: Correct additional pages number calculation PM / Documentation: Fix minor issue in freezing_of_tasks.txt PM / Documentation: Fix spelling mistake in basic-pm-debugging.txt
2012-01-23Merge branch 'kernel-doc' from Randy DunlapLinus Torvalds
The usual kernel-doc fixups from Randy. Some of them David acked as merged in his tree, this is the random left-overs. * kernel-doc: docbook: fix sched source file names in device-drivers book docbook: change iomap source filename in deviceiobook docbook: don't use serial_core.h in device-drivers book kernel-doc: fix kernel-doc warnings in sched kernel-doc: fix new warnings in cfg80211.h kernel-doc: fix new warning in usb.h kernel-doc: fix new warnings in device.h kernel-doc: fix new warnings in debugfs kernel-doc: fix new warning in regulator core kernel-doc: fix new warnings in pci kernel-doc: fix new warnings in driver-core kernel-doc: fix new warnings in auditsc.c scripts/kernel-doc: fix fatal error caused by cfg80211.h
2012-01-23kernel-doc: fix kernel-doc warnings in schedRandy Dunlap
Fix new kernel-doc notation warnings: Warning(include/linux/sched.h:2094): No description found for parameter 'p' Warning(include/linux/sched.h:2094): Excess function parameter 'tsk' description in 'is_idle_task' Warning(kernel/sched/cpupri.c:139): No description found for parameter 'newpri' Warning(kernel/sched/cpupri.c:139): Excess function parameter 'pri' description in 'cpupri_set' Warning(kernel/sched/cpupri.c:208): Excess function parameter 'bootmem' description in 'cpupri_init' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-23kernel-doc: fix new warnings in auditsc.cRandy Dunlap
Fix new kernel-doc warnings in auditsc.c: Warning(kernel/auditsc.c:1875): No description found for parameter 'success' Warning(kernel/auditsc.c:1875): No description found for parameter 'return_code' Warning(kernel/auditsc.c:1875): Excess function parameter 'pt_regs' description in '__audit_syscall_exit' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-23kprobes: initialize before using a hlistAnanth N Mavinakayanahalli
Commit ef53d9c5e ("kprobes: improve kretprobe scalability with hashed locking") introduced a bug where we can potentially leak kretprobe_instances since we initialize a hlist head after having used it. Initialize the hlist head before using it. Reported by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Srinivasa D S <srinivasa@in.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-22net: introduce res_counter_charge_nofail() for socket allocationsGlauber Costa
There is a case in __sk_mem_schedule(), where an allocation is beyond the maximum, but yet we are allowed to proceed. It happens under the following condition: sk->sk_wmem_queued + size >= sk->sk_sndbuf The network code won't revert the allocation in this case, meaning that at some point later it'll try to do it. Since this is never communicated to the underlying res_counter code, there is an inbalance in res_counter uncharge operation. I see two ways of fixing this: 1) storing the information about those allocations somewhere in memcg, and then deducting from that first, before we start draining the res_counter, 2) providing a slightly different allocation function for the res_counter, that matches the original behavior of the network code more closely. I decided to go for #2 here, believing it to be more elegant, since #1 would require us to do basically that, but in a more obscure way. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> CC: Tejun Heo <tj@kernel.org> CC: Li Zefan <lizf@cn.fujitsu.com> CC: Laurent Chavey <chavey@google.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-21perf: Call perf_cgroup_event_time() directlyNamhyung Kim
The perf_event_time() will call perf_cgroup_event_time() if @event is a cgroup event. Just do it directly and avoid the extra check.. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/1327021966-27688-2-git-send-email-namhyung.kim@lge.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-21perf: Don't call release_callchain_buffers() if allocation failsNamhyung Kim
When alloc_callchain_buffers() fails, it frees all of entries before return. In addition, calling the release_callchain_buffers() will cause a NULL pointer dereference since callchain_cpu_entries is not set. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/1327021966-27688-1-git-send-email-namhyung.kim@lge.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-20cgroup: replace tasklist_lock with rcu_read_lockMandeep Singh Baines
We can replace the tasklist_lock in cgroup_attach_proc with an rcu_read_lock(). Changes in V4: * https://lkml.org/lkml/2011/12/23/284 (Frederic Weisbecker) * Minimize size of rcu_read_lock critical section * Add comment * https://lkml.org/lkml/2011/12/26/136 (Li Zefan) * Split into two patches Changes in V3: * https://lkml.org/lkml/2011/12/22/419 (Frederic Weisbecker) * Add an rcu_read_lock to protect against exit Changes in V2: * https://lkml.org/lkml/2011/12/22/86 (Tejun Heo) * Use a goto instead of returning -EAGAIN Suggested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: containers@lists.linux-foundation.org Cc: cgroups@vger.kernel.org Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Menage <paul@paulmenage.org>
2012-01-20cgroup: simplify double-check locking in cgroup_attach_procMandeep Singh Baines
To keep the complexity of the double-check locking in one place, move the thread_group_leader check up into attach_task_by_pid(). This allows us to use a goto instead of returning -EAGAIN. While at it, convert a couple of returns to gotos and use rcu for the !pid case also in order to simplify the logic. Changes in V2: * https://lkml.org/lkml/2011/12/22/86 (Tejun Heo) * Use a goto instead of returning -EAGAIN Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: containers@lists.linux-foundation.org Cc: cgroups@vger.kernel.org Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Menage <paul@paulmenage.org>
2012-01-20cgroup: move struct cgroup_pidlist out from the header fileLi Zefan
It's internally used only. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-01-19Merge branches 'sched-urgent-for-linus', 'perf-urgent-for-linus' and ↵Linus Torvalds
'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/accounting, proc: Fix /proc/stat interrupts sum * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT x86/kprobes: Add arch/x86/tools/insn_sanity to .gitignore x86/kprobes: Fix typo transferred from Intel manual * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, syscall: Need __ARCH_WANT_SYS_IPC for 32 bits x86, tsc: Fix SMI induced variation in quick_pit_calibrate() x86, opcode: ANDN and Group 17 in x86-opcode-map.txt x86/kconfig: Move the ZONE_DMA entry under a menu x86/UV2: Add accounting for BAU strong nacks x86/UV2: Ack BAU interrupt earlier x86/UV2: Remove stale no-resources test for UV2 BAU x86/UV2: Work around BAU bug x86/UV2: Fix BAU destination timeout initialization x86/UV2: Fix new UV2 hardware by using native UV2 broadcast mode x86: Get rid of dubious one-bit signed bitfield
2012-01-19PM / Hibernate: Correct additional pages number calculationNamhyung Kim
The struct bm_block is allocated by chain_alloc(), so it'd better counting it in LINKED_PAGE_DATA_SIZE. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-01-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits) audit: no leading space in audit_log_d_path prefix audit: treat s_id as an untrusted string audit: fix signedness bug in audit_log_execve_info() audit: comparison on interprocess fields audit: implement all object interfield comparisons audit: allow interfield comparison between gid and ogid audit: complex interfield comparison helper audit: allow interfield comparison in audit rules Kernel: Audit Support For The ARM Platform audit: do not call audit_getname on error audit: only allow tasks to set their loginuid if it is -1 audit: remove task argument to audit_set_loginuid audit: allow audit matching on inode gid audit: allow matching on obj_uid audit: remove audit_finish_fork as it can't be called audit: reject entry,always rules audit: inline audit_free to simplify the look of generic code audit: drop audit_set_macxattr as it doesn't do anything audit: inline checks for not needing to collect aux records audit: drop some potentially inadvisable likely notations ... Use evil merge to fix up grammar mistakes in Kconfig file. Bad speling and horrible grammar (and copious swearing) is to be expected, but let's keep it to commit messages and comments, rather than expose it to users in config help texts or printouts.
2012-01-17audit: no leading space in audit_log_d_path prefixKees Cook
audit_log_d_path() injects an additional space before the prefix, which serves no purpose and doesn't mix well with other audit_log*() functions that do not sneak extra characters into the log. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: fix signedness bug in audit_log_execve_info()Xi Wang
In the loop, a size_t "len" is used to hold the return value of audit_log_single_execve_arg(), which returns -1 on error. In that case the error handling (len <= 0) will be bypassed since "len" is unsigned, and the loop continues with (p += len) being wrapped. Change the type of "len" to signed int to fix the error handling. size_t len; ... for (...) { len = audit_log_single_execve_arg(...); if (len <= 0) break; p += len; } Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: comparison on interprocess fieldsPeter Moody
This allows audit to specify rules in which we compare two fields of a process. Such as is the running process uid != to the running process euid? Signed-off-by: Peter Moody <pmoody@google.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: implement all object interfield comparisonsPeter Moody
This completes the matrix of interfield comparisons between uid/gid information for the current task and the uid/gid information for inodes. aka I can audit based on differences between the euid of the process and the uid of fs objects. Signed-off-by: Peter Moody <pmoody@google.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: allow interfield comparison between gid and ogidEric Paris
Allow audit rules to compare the gid of the running task to the gid of the inode in question. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: complex interfield comparison helperEric Paris
Rather than code the same loop over and over implement a helper function which uses some pointer magic to make it generic enough to be used numerous places as we implement more audit interfield comparisons Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: allow interfield comparison in audit rulesEric Paris
We wish to be able to audit when a uid=500 task accesses a file which is uid=0. Or vice versa. This patch introduces a new audit filter type AUDIT_FIELD_COMPARE which takes as an 'enum' which indicates which fields should be compared. At this point we only define the task->uid vs inode->uid, but other comparisons can be added. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: do not call audit_getname on errorEric Paris
Just a code cleanup really. We don't need to make a function call just for it to return on error. This also makes the VFS function even easier to follow and removes a conditional on a hot path. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: only allow tasks to set their loginuid if it is -1Eric Paris
At the moment we allow tasks to set their loginuid if they have CAP_AUDIT_CONTROL. In reality we want tasks to set the loginuid when they log in and it be impossible to ever reset. We had to make it mutable even after it was once set (with the CAP) because on update and admin might have to restart sshd. Now sshd would get his loginuid and the next user which logged in using ssh would not be able to set his loginuid. Systemd has changed how userspace works and allowed us to make the kernel work the way it should. With systemd users (even admins) are not supposed to restart services directly. The system will restart the service for them. Thus since systemd is going to loginuid==-1, sshd would get -1, and sshd would be allowed to set a new loginuid without special permissions. If an admin in this system were to manually start an sshd he is inserting himself into the system chain of trust and thus, logically, it's his loginuid that should be used! Since we have old systems I make this a Kconfig option. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: remove task argument to audit_set_loginuidEric Paris
The function always deals with current. Don't expose an option pretending one can use it for something. You can't. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: allow audit matching on inode gidEric Paris
Much like the ability to filter audit on the uid of an inode collected, we should be able to filter on the gid of the inode. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: allow matching on obj_uidEric Paris
Allow syscall exit filter matching based on the uid of the owner of an inode used in a syscall. aka: auditctl -a always,exit -S open -F obj_uid=0 -F perm=wa Signed-off-by: Eric Paris <eparis@redhat.com>