summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-02-23mm: vmscan: clean up get_scan_count()Johannes Weiner
Reclaim pressure balance between anon and file pages is calculated through a tuple of numerators and a shared denominator. Exceptional cases that want to force-scan anon or file pages configure the numerators and denominator such that one list is preferred, which is not necessarily the most obvious way: fraction[0] = 1; fraction[1] = 0; denominator = 1; goto out; Make this easier by making the force-scan cases explicit and use the fractionals only in case they are calculated from reclaim history. [akpm@linux-foundation.org: avoid using unintialized_var()] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Satoru Moriya <satoru.moriya@hds.com> Cc: Simon Jeons <simon.jeons@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23mm: vmscan: improve comment on low-page cache handlingJohannes Weiner
Fix comment style and elaborate on why anonymous memory is force-scanned when file cache runs low. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Satoru Moriya <satoru.moriya@hds.com> Cc: Simon Jeons <simon.jeons@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23mm: vmscan: clarify how swappiness, highest priority, memcg interactJohannes Weiner
A swappiness of 0 has a slightly different meaning for global reclaim (may swap if file cache really low) and memory cgroup reclaim (never swap, ever). In addition, global reclaim at highest priority will scan all LRU lists equal to their size and ignore other balancing heuristics. UNLESS swappiness forbids swapping, then the lists are balanced based on recent reclaim effectiveness. UNLESS file cache is running low, then anonymous pages are force-scanned. This (total mess of a) behaviour is implicit and not obvious from the way the code is organized. At least make it apparent in the code flow and document the conditions. It will be it easier to come up with sane semantics later. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Satoru Moriya <satoru.moriya@hds.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Simon Jeons <simon.jeons@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23mm: vmscan: save work scanning (almost) empty LRU listsJohannes Weiner
In certain cases (kswapd reclaim, memcg target reclaim), a fixed minimum amount of pages is scanned from the LRU lists on each iteration, to make progress. Do not make this minimum bigger than the respective LRU list size, however, and save some busy work trying to isolate and reclaim pages that are not there. Empty LRU lists are quite common with memory cgroups in NUMA environments because there exists a set of LRU lists for each zone for each memory cgroup, while the memory of a single cgroup is expected to stay on just one node. The number of expected empty LRU lists is thus memcgs * (nodes - 1) * lru types Each attempt to reclaim from an empty LRU list does expensive size comparisons between lists, acquires the zone's lru lock etc. Avoid that. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Satoru Moriya <satoru.moriya@hds.com> Cc: Simon Jeons <simon.jeons@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23mm: memcg: only evict file pages when we have plentyJohannes Weiner
Commit e9868505987a ("mm, vmscan: only evict file pages when we have plenty") makes a point of not going for anonymous memory while there is still enough inactive cache around. The check was added only for global reclaim, but it is just as useful to reduce swapping in memory cgroup reclaim: 200M-memcg-defconfig-j2 vanilla patched Real time 454.06 ( +0.00%) 453.71 ( -0.08%) User time 668.57 ( +0.00%) 668.73 ( +0.02%) System time 128.92 ( +0.00%) 129.53 ( +0.46%) Swap in 1246.80 ( +0.00%) 814.40 ( -34.65%) Swap out 1198.90 ( +0.00%) 827.00 ( -30.99%) Pages allocated 16431288.10 ( +0.00%) 16434035.30 ( +0.02%) Major faults 681.50 ( +0.00%) 593.70 ( -12.86%) THP faults 237.20 ( +0.00%) 242.40 ( +2.18%) THP collapse 241.20 ( +0.00%) 248.50 ( +3.01%) THP splits 157.30 ( +0.00%) 161.40 ( +2.59%) Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Satoru Moriya <satoru.moriya@hds.com> Cc: Simon Jeons <simon.jeons@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23CMA: make putback_lru_pages() call conditionalSrinivas Pandruvada
As per documentation and other places calling putback_lru_pages(), putback_lru_pages() is called on error only. Make the CMA code behave consistently. [akpm@linux-foundation.org: remove a test-n-branch in the wrapup code] Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23mm/hugetlb.c: convert to pr_foo()Andrew Morton
Cc: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23mm/memcontrol.c: convert printk(KERN_FOO) to pr_foo()Andrew Morton
Acked-by: Sha Zhengju <handai.szj@taobao.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23memcg, oom: provide more precise dump info while memcg oom happeningSha Zhengju
Currently when a memcg oom is happening the oom dump messages is still global state and provides few useful info for users. This patch prints more pointed memcg page statistics for memcg-oom and take hierarchy into consideration: Based on Michal's advice, we take hierarchy into consideration: supppose we trigger an OOM on A's limit root_memcg | A (use_hierachy=1) / \ B C | D then the printed info will be: Memory cgroup stats for /A:... Memory cgroup stats for /A/B:... Memory cgroup stats for /A/C:... Memory cgroup stats for /A/B/D:... Following are samples of oom output: (1) Before change: mal-80 invoked oom-killer:gfp_mask=0xd0, order=0, oom_score_adj=0 mal-80 cpuset=/ mems_allowed=0 Pid: 2976, comm: mal-80 Not tainted 3.7.0+ #10 Call Trace: [<ffffffff8167fbfb>] dump_header+0x83/0x1ca ..... (call trace) [<ffffffff8168a818>] page_fault+0x28/0x30 <<<<<<<<<<<<<<<<<<<<< memcg specific information Task in /A/B/D killed as a result of limit of /A memory: usage 101376kB, limit 101376kB, failcnt 57 memory+swap: usage 101376kB, limit 101376kB, failcnt 0 kmem: usage 0kB, limit 9007199254740991kB, failcnt 0 <<<<<<<<<<<<<<<<<<<<< print per cpu pageset stat Mem-Info: Node 0 DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 ...... CPU 3: hi: 0, btch: 1 usd: 0 Node 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 173 ...... CPU 3: hi: 186, btch: 31 usd: 130 <<<<<<<<<<<<<<<<<<<<< print global page state active_anon:92963 inactive_anon:40777 isolated_anon:0 active_file:33027 inactive_file:51718 isolated_file:0 unevictable:0 dirty:3 writeback:0 unstable:0 free:729995 slab_reclaimable:6897 slab_unreclaimable:6263 mapped:20278 shmem:35971 pagetables:5885 bounce:0 free_cma:0 <<<<<<<<<<<<<<<<<<<<< print per zone page state Node 0 DMA free:15836kB ... all_unreclaimable? no lowmem_reserve[]: 0 3175 3899 3899 Node 0 DMA32 free:2888564kB ... all_unrelaimable? no lowmem_reserve[]: 0 0 724 724 lowmem_reserve[]: 0 0 0 0 Node 0 DMA: 1*4kB (U) ... 3*4096kB (M) = 15836kB Node 0 DMA32: 41*4kB (UM) ... 702*4096kB (MR) = 2888316kB 120710 total pagecache pages 0 pages in swap cache <<<<<<<<<<<<<<<<<<<<< print global swap cache stat Swap cache stats: add 0, delete 0, find 0/0 Free swap = 499708kB Total swap = 499708kB 1040368 pages RAM 58678 pages reserved 169065 pages shared 173632 pages non-shared [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 2693] 0 2693 6005 1324 17 0 0 god [ 2754] 0 2754 6003 1320 16 0 0 god [ 2811] 0 2811 5992 1304 18 0 0 god [ 2874] 0 2874 6005 1323 18 0 0 god [ 2935] 0 2935 8720 7742 21 0 0 mal-30 [ 2976] 0 2976 21520 17577 42 0 0 mal-80 Memory cgroup out of memory: Kill process 2976 (mal-80) score 665 or sacrifice child Killed process 2976 (mal-80) total-vm:86080kB, anon-rss:69964kB, file-rss:344kB We can see that messages dumped by show_free_areas() are longsome and can provide so limited info for memcg that just happen oom. (2) After change mal-80 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 mal-80 cpuset=/ mems_allowed=0 Pid: 2704, comm: mal-80 Not tainted 3.7.0+ #10 Call Trace: [<ffffffff8167fd0b>] dump_header+0x83/0x1d1 .......(call trace) [<ffffffff8168a918>] page_fault+0x28/0x30 Task in /A/B/D killed as a result of limit of /A <<<<<<<<<<<<<<<<<<<<< memcg specific information memory: usage 102400kB, limit 102400kB, failcnt 140 memory+swap: usage 102400kB, limit 102400kB, failcnt 0 kmem: usage 0kB, limit 9007199254740991kB, failcnt 0 Memory cgroup stats for /A: cache:32KB rss:30984KB mapped_file:0KB swap:0KB inactive_anon:6912KB active_anon:24072KB inactive_file:32KB active_file:0KB unevictable:0KB Memory cgroup stats for /A/B: cache:0KB rss:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Memory cgroup stats for /A/C: cache:0KB rss:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Memory cgroup stats for /A/B/D: cache:32KB rss:71352KB mapped_file:0KB swap:0KB inactive_anon:6656KB active_anon:64696KB inactive_file:16KB active_file:16KB unevictable:0KB [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 2260] 0 2260 6006 1325 18 0 0 god [ 2383] 0 2383 6003 1319 17 0 0 god [ 2503] 0 2503 6004 1321 18 0 0 god [ 2622] 0 2622 6004 1321 16 0 0 god [ 2695] 0 2695 8720 7741 22 0 0 mal-30 [ 2704] 0 2704 21520 17839 43 0 0 mal-80 Memory cgroup out of memory: Kill process 2704 (mal-80) score 669 or sacrifice child Killed process 2704 (mal-80) total-vm:86080kB, anon-rss:71016kB, file-rss:340kB This version provides more pointed info for memcg in "Memory cgroup stats for XXX" section. Signed-off-by: Sha Zhengju <handai.szj@taobao.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23drivers/md/persistent-data/dm-transaction-manager.c: rename HASH_SIZEAndrew Morton
Fix the warning: drivers/md/persistent-data/dm-transaction-manager.c:28:1: warning: "HASH_SIZE" redefined In file included from include/linux/elevator.h:5, from include/linux/blkdev.h:216, from drivers/md/persistent-data/dm-block-manager.h:11, from drivers/md/persistent-data/dm-transaction-manager.h:10, from drivers/md/persistent-data/dm-transaction-manager.c:6: include/linux/hashtable.h:22:1: warning: this is the location of the previous definition Cc: Alasdair Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "So from the depth of frozen Minnesota, here's the powerpc pull request for 3.9. It has a few interesting highlights, in addition to the usual bunch of bug fixes, minor updates, embedded device tree updates and new boards: - Hand tuned asm implementation of SHA1 (by Paulus & Michael Ellerman) - Support for Doorbell interrupts on Power8 (kind of fast thread-thread IPIs) by Ian Munsie - Long overdue cleanup of the way we handle relocation of our open firmware trampoline (prom_init.c) on 64-bit by Anton Blanchard - Support for saving/restoring & context switching the PPR (Processor Priority Register) on server processors that support it. This allows the kernel to preserve thread priorities established by userspace. By Haren Myneni. - DAWR (new watchpoint facility) support on Power8 by Michael Neuling - Ability to change the DSCR (Data Stream Control Register) which controls cache prefetching on a running process via ptrace by Alexey Kardashevskiy - Support for context switching the TAR register on Power8 (new branch target register meant to be used by some new specific userspace perf event interrupt facility which is yet to be enabled) by Ian Munsie. - Improve preservation of the CFAR register (which captures the origin of a branch) on various exception conditions by Paulus. - Move the Bestcomm DMA driver from arch powerpc to drivers/dma where it belongs by Philippe De Muyter - Support for Transactional Memory on Power8 by Michael Neuling (based on original work by Matt Evans). For those curious about the feature, the patch contains a pretty good description." (See commit db8ff907027b: "powerpc: Documentation for transactional memory on powerpc" for the mentioned description added to the file Documentation/powerpc/transactional_memory.txt) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (140 commits) powerpc/kexec: Disable hard IRQ before kexec powerpc/85xx: l2sram - Add compatible string for BSC9131 platform powerpc/85xx: bsc9131 - Correct typo in SDHC device node powerpc/e500/qemu-e500: enable coreint powerpc/mpic: allow coreint to be determined by MPIC version powerpc/fsl_pci: Store the pci ctlr device ptr in the pci ctlr struct powerpc/85xx: Board support for ppa8548 powerpc/fsl: remove extraneous DIU platform functions arch/powerpc/platforms/85xx/p1022_ds.c: adjust duplicate test powerpc: Documentation for transactional memory on powerpc powerpc: Add transactional memory to pseries and ppc64 defconfigs powerpc: Add config option for transactional memory powerpc: Add transactional memory to POWER8 cpu features powerpc: Add new transactional memory state to the signal context powerpc: Hook in new transactional memory code powerpc: Routines for FP/VSX/VMX unavailable during a transaction powerpc: Add transactional memory unavaliable execption handler powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes powerpc: Add FP/VSX and VMX register load functions for transactional memory powerpc: Add helper functions for transactional memory context switching ...
2013-02-24powerpc/kexec: Disable hard IRQ before kexecPhileas Fogg
Disable hard IRQ before kexec a new kernel image. Not doing it can result in corrupted data in the memory segments reserved for the new kernel. Signed-off-by: Phileas Fogg <phileas-fogg@mail.ru> CC: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin Pull small blackfin update from Bob Liu. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin: blackfin: time-ts: Remove duplicate assignment blackfin: pm: fix build error blackfin: sync data in blackfin write buffer blackfin: use bitmap library functions blackfin: mem_init: update dmc config register
2013-02-22Merge branch 'parisc-3.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller. The bulk of this is optimized page coping/clearing and cache flushing (virtual caches are lovely) by John David Anglin. * 'parisc-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: (31 commits) arch/parisc/include/asm: use ARRAY_SIZE macro in mmzone.h parisc: remove empty lines and unnecessary #ifdef coding in include/asm/signal.h parisc: sendfile and sendfile64 syscall cleanups parisc: switch to available compat_sched_rr_get_interval implementation parisc: fix fallocate syscall parisc: fix error return codes for rt_sigaction and rt_sigprocmask parisc: convert msgrcv and msgsnd syscalls to use compat layer parisc: correctly wire up mq_* functions for CONFIG_COMPAT case parisc: fix personality on 32bit kernel parisc: wire up process_vm_readv, process_vm_writev, kcmp and finit_module syscalls parisc: led driver requires CONFIG_VM_EVENT_COUNTERS parisc: remove unused compat_rt_sigframe.h header parisc/mm/fault.c: Port OOM changes to do_page_fault parisc: space register variables need to be in native length (unsigned long) parisc: fix ptrace breakage parisc: always detect multiple physical ranges parisc: ensure that mmapped shared pages are aligned at SHMLBA addresses parisc: disable preemption while flushing D- or I-caches through TMPALIAS region parisc: remove IRQF_DISABLED parisc: fixes and cleanups in page cache flushing (4/4) ...
2013-02-22Merge tag 'please-pull-fix-ia64-build' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull ia64 build breakage fix from Tony Luck. * tag 'please-pull-fix-ia64-build' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: sched: move RR_TIMESLICE from sysctl.h to rt.h
2013-02-22Merge branch 'core-locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "The biggest change is the rwsem lock-steal improvements, both to the assembly optimized and the spinlock based variants. The other notable change is the clean up of the seqlock implementation to be based on the seqcount infrastructure. The rest is assorted smaller debuggability, cleanup and continued -rt locking changes." * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rwsem-spinlock: Implement writer lock-stealing for better scalability futex: Revert "futex: Mark get_robust_list as deprecated" generic: Use raw local irq variant for generic cmpxchg lockdep: Selftest: convert spinlock to raw spinlock seqlock: Use seqcount infrastructure seqlock: Remove unused functions ntp: Make ntp_lock raw intel_idle: Convert i7300_idle_lock to raw_spinlock locking: Various static lock initializer fixes lockdep: Print more info when MAX_LOCK_DEPTH is exceeded rwsem: Implement writer lock-stealing for better scalability lockdep: Silence warning if CONFIG_LOCKDEP isn't set watchdog: Use local_clock for get_timestamp() lockdep: Rename print_unlock_inbalance_bug() to print_unlock_imbalance_bug() locking/stat: Fix a typo
2013-02-22Merge branch 'x86/microcode' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading update from Peter Anvin: "This patchset lets us update the CPU microcode very, very early in initialization if the BIOS fails to do so (never happens, right?) This is handy for dealing with things like the Atom erratum where we have to run without PSE because microcode loading happens too late. As I mentioned in the x86/mm push request it depends on that infrastructure but it is otherwise a standalone feature." * 'x86/microcode' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Kconfig: Make early microcode loading a configuration feature x86/mm/init.c: Copy ucode from initrd image to kernel memory x86/head64.c: Early update ucode in 64-bit x86/head_32.S: Early update ucode in 32-bit x86/microcode_intel_early.c: Early update ucode on Intel's CPU x86/tlbflush.h: Define __native_flush_tlb_global_irq_disabled() x86/microcode_intel_lib.c: Early update ucode on Intel's CPU x86/microcode_core_early.c: Define interfaces for early loading ucode x86/common.c: load ucode in 64 bit or show loading ucode info in 32 bit on AP x86/common.c: Make have_cpuid_p() a global function x86/microcode_intel.h: Define functions and macros for early loading ucode x86, doc: Documentation for early microcode loading
2013-02-22x86-64, xen, mmu: Provide an early version of write_cr3.Konrad Rzeszutek Wilk
With commit 8170e6bed465 ("x86, 64bit: Use a #PF handler to materialize early mappings on demand") we started hitting an early bootup crash where the Xen hypervisor would inform us that: (XEN) d7:v0: unhandled page fault (ec=0000) (XEN) Pagetable walk from ffffea000005b2d0: (XEN) L4[0x1d4] = 0000000000000000 ffffffffffffffff (XEN) domain_crash_sync called from entry.S (XEN) Domain 7 (vcpu#0) crashed on cpu#3: (XEN) ----[ Xen-4.2.0 x86_64 debug=n Not tainted ]---- .. that Xen was unable to context switch back to dom0. Looking at the calling stack we find: [<ffffffff8103feba>] xen_get_user_pgd+0x5a <-- [<ffffffff8103feba>] xen_get_user_pgd+0x5a [<ffffffff81042d27>] xen_write_cr3+0x77 [<ffffffff81ad2d21>] init_mem_mapping+0x1f9 [<ffffffff81ac293f>] setup_arch+0x742 [<ffffffff81666d71>] printk+0x48 We are trying to figure out whether we need to up-date the user PGD as well. Please keep in mind that under 64-bit PV guests we have a limited amount of rings: 0 for the Hypervisor, and 1 for both the Linux kernel and user-space. As such the Linux pvops'fied version of write_cr3 checks if it has to update the user-space cr3 as well. That clearly is not needed during early bootup. The recent changes (see above git commit) streamline the x86 page table allocation to be much simpler (And also incidentally the #PF handler ends up in spirit being similar to how the Xen toolstack sets up the initial page-tables). The fix is to have an early-bootup version of cr3 that just loads the kernel %cr3. The later version - which also handles user-page modifications will be used after the initial page tables have been setup. [ hpa: removed a redundant #ifdef and made the new function __init. Also note that x86-32 already has such an early xen_write_cr3. ] Tested-by: "H. Peter Anvin" <hpa@zytor.com> Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1361579812-23709-1-git-send-email-konrad.wilk@oracle.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22x86-64: don't set the early IDT to point directly to 'early_idt_handler'Linus Torvalds
The code requires the use of the proper per-exception-vector stub functions (set up as the early_idt_handlers[] array - note the 's') that make sure to set up the error vector number. This is true regardless of whether CONFIG_EARLY_PRINTK is set or not. Why? The stack offset for the comparison of __KERNEL_CS won't be right otherwise, nor will the new check (from commit 8170e6bed465: "x86, 64bit: Use a #PF handler to materialize early mappings on demand") for the page fault exception vector. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22sched: move RR_TIMESLICE from sysctl.h to rt.hClark Williams
Originally submitted by Clark Williams as part of a cleanup, but happens also to fix an ia64 build problem: arch/ia64/kernel/init_task.c:38: error: 'RR_TIMESLICE' undeclared here (not in a function) Signed-off-by: Clark Williams <clark.williams@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-02-21Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Peter Anvin: "This is a huge set of several partly interrelated (and concurrently developed) changes, which is why the branch history is messier than one would like. The *really* big items are two humonguous patchsets mostly developed by Yinghai Lu at my request, which completely revamps the way we create initial page tables. In particular, rather than estimating how much memory we will need for page tables and then build them into that memory -- a calculation that has shown to be incredibly fragile -- we now build them (on 64 bits) with the aid of a "pseudo-linear mode" -- a #PF handler which creates temporary page tables on demand. This has several advantages: 1. It makes it much easier to support things that need access to data very early (a followon patchset uses this to load microcode way early in the kernel startup). 2. It allows the kernel and all the kernel data objects to be invoked from above the 4 GB limit. This allows kdump to work on very large systems. 3. It greatly reduces the difference between Xen and native (Xen's equivalent of the #PF handler are the temporary page tables created by the domain builder), eliminating a bunch of fragile hooks. The patch series also gets us a bit closer to W^X. Additional work in this pull is the 64-bit get_user() work which you were also involved with, and a bunch of cleanups/speedups to __phys_addr()/__pa()." * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits) x86, mm: Move reserving low memory later in initialization x86, doc: Clarify the use of asm("%edx") in uaccess.h x86, mm: Redesign get_user with a __builtin_choose_expr hack x86: Be consistent with data size in getuser.S x86, mm: Use a bitfield to mask nuisance get_user() warnings x86/kvm: Fix compile warning in kvm_register_steal_time() x86-32: Add support for 64bit get_user() x86-32, mm: Remove reference to alloc_remap() x86-32, mm: Remove reference to resume_map_numa_kva() x86-32, mm: Rip out x86_32 NUMA remapping code x86/numa: Use __pa_nodebug() instead x86: Don't panic if can not alloc buffer for swiotlb mm: Add alloc_bootmem_low_pages_nopanic() x86, 64bit, mm: hibernate use generic mapping_init x86, 64bit, mm: Mark data/bss/brk to nx x86: Merge early kernel reserve for 32bit and 64bit x86: Add Crash kernel low reservation x86, kdump: Remove crashkernel range find limit for 64bit memblock: Add memblock_mem_size() x86, boot: Not need to check setup_header version for setup_data ...
2013-02-21Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Peter Anvin: "This is a corrected attempt at the x86/cpu branch, this time with the fixes in that makes it not break on KVM (current or past), or any other virtualizer which traps on this configuration. Again, the biggest change here is enabling the WC+ memory type on AMD processors, if the BIOS doesn't." * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, kvm: Add MSR_AMD64_BU_CFG2 to the list of ignored MSRs x86, cpu, amd: Fix WC+ workaround for older virtual hosts x86, AMD: Enable WC+ memory type on family 10 processors x86, AMD: Clean up init_amd() x86/process: Change %8s to %s for pr_warn() in release_thread() x86/cpu/hotplug: Remove CONFIG_EXPERIMENTAL dependency
2013-02-21Merge tag 'please-pull-misc-3.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull misc ia64 bits from Tony Luck. * tag 'please-pull-misc-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: MAINTAINERS: update SGI & ia64 Altix stuff sysctl: Enable IA64 "ignore-unaligned-usertrap" to be used cross-arch
2013-02-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 update from Martin Schwidefsky: "The most prominent change in this patch set is the software dirty bit patch for s390. It removes __HAVE_ARCH_PAGE_TEST_AND_CLEAR_DIRTY and the page_test_and_clear_dirty primitive which makes the common memory management code a bit less obscure. Heiko fixed most of the PCI related fallout, more often than not missing GENERIC_HARDIRQS dependencies. Notable is one of the 3270 patches which adds an export to tty_io to be able to resize a tty. The rest is the usual bunch of cleanups and bug fixes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits) s390/module: Add missing R_390_NONE relocation type drivers/gpio: add missing GENERIC_HARDIRQ dependency drivers/input: add couple of missing GENERIC_HARDIRQS dependencies s390/cleanup: rename SPP to LPP s390/mm: implement software dirty bits s390/mm: Fix crst upgrade of mmap with MAP_FIXED s390/linker skript: discard exit.data at runtime drivers/media: add missing GENERIC_HARDIRQS dependency s390/bpf,jit: add vlan tag support drivers/net,AT91RM9200: add missing GENERIC_HARDIRQS dependency iucv: fix kernel panic at reboot s390/Kconfig: sort list of arch selected config options phylib: remove !S390 dependeny from Kconfig uio: remove !S390 dependency from Kconfig dasd: fix sysfs cleanup in dasd_generic_remove s390/pci: fix hotplug module init s390/pci: cleanup clp page allocation s390/pci: cleanup clp inline assembly s390/perf: cpum_cf: fallback to software sampling events s390/mm: provide PAGE_SHARED define ...
2013-02-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID subsystem updates from Jiri Kosina: "HID subsystem and drivers update. Highlights: - new support of a group of Win7/Win8 multitouch devices, from Benjamin Tissoires - fix for compat interface brokenness in uhid, from Dmitry Torokhov - conversion of drivers to use hid_driver helper, by H Hartley Sweeten - HID over I2C transport received ACPI enumeration support, written by Mika Westerberg - there is an ongoing effort to make HID sensor hubs independent of USB transport. The first self-contained part of this work is provided here, done by Mika Westerberg - a few smaller fixes here and there, support for a couple new devices added" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (43 commits) HID: Correct Logitech order in hid-ids.h HID: LG4FF: Remove unnecessary deadzone code HID: LG: Prevent the Logitech Gaming Wheels deadzone HID: LG: Fix detection of Logitech Speed Force Wireless (WiiWheel) HID: LG: Add support for Logitech Momo Force (Red) Wheel HID: hidraw: print message when succesfully initialized HID: logitech: split accel, brake for Driving Force wheel HID: logitech: add report descriptor for Driving Force wheel HID: add ThingM blink(1) USB RGB LED support HID: uhid: make creating devices work on 64/32 systems HID: wiimote: fix nunchuck button parser HID: blacklist Velleman data acquisition boards HID: sensor-hub: don't limit the driver only to USB bus HID: sensor-hub: get rid of unused sensor_hub_grabbed_usages[] table HID: extend autodetect to handle I2C sensors as well HID: ntrig: use input_configured() callback to set the name HID: multitouch: do not use pointers towards hid-core HID: add missing GENERIC_HARDIRQ dependency HID: multitouch: make MT_CLS_ALWAYS_TRUE the new default class HID: multitouch: fix protocol for Elo panels ...
2013-02-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "Assorted tiny fixes queued in trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (22 commits) DocBook: update EXPORT_SYMBOL entry to point at export.h Documentation: update top level 00-INDEX file with new additions ARM: at91/ide: remove unsused at91-ide Kconfig entry percpu_counter.h: comment code for better readability x86, efi: fix comment typo in head_32.S IB: cxgb3: delay freeing mem untill entirely done with it net: mvneta: remove unneeded version.h include time: x86: report_lost_ticks doesn't exist any more pcmcia: avoid static analysis complaint about use-after-free fs/jfs: Fix typo in comment : 'how may' -> 'how many' of: add missing documentation for of_platform_populate() btrfs: remove unnecessary cur_trans set before goto loop in join_transaction sound: soc: Fix typo in sound/codecs treewide: Fix typo in various drivers btrfs: fix comment typos Update ibmvscsi module name in Kconfig. powerpc: fix typo (utilties -> utilities) of: fix spelling mistake in comment h8300: Fix home page URL in h8300/README xtensa: Fix home page URL in Kconfig ...
2013-02-21Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge misc patches from Andrew Morton: - Florian has vanished so I appear to have become fbdev maintainer again :( - Joel and Mark are distracted to welcome to the new OCFS2 maintainer - The backlight queue - Small core kernel changes - lib/ updates - The rtc queue - Various random bits * akpm: (164 commits) rtc: rtc-davinci: use devm_*() functions rtc: rtc-max8997: use devm_request_threaded_irq() rtc: rtc-max8907: use devm_request_threaded_irq() rtc: rtc-da9052: use devm_request_threaded_irq() rtc: rtc-wm831x: use devm_request_threaded_irq() rtc: rtc-tps80031: use devm_request_threaded_irq() rtc: rtc-lp8788: use devm_request_threaded_irq() rtc: rtc-coh901331: use devm_clk_get() rtc: rtc-vt8500: use devm_*() functions rtc: rtc-tps6586x: use devm_request_threaded_irq() rtc: rtc-imxdi: use devm_clk_get() rtc: rtc-cmos: use dev_warn()/dev_dbg() instead of printk()/pr_debug() rtc: rtc-pcf8583: use dev_warn() instead of printk() rtc: rtc-sun4v: use pr_warn() instead of printk() rtc: rtc-vr41xx: use dev_info() instead of printk() rtc: rtc-rs5c313: use pr_err() instead of printk() rtc: rtc-at91rm9200: use dev_dbg()/dev_err() instead of printk()/pr_debug() rtc: rtc-rs5c372: use dev_dbg()/dev_warn() instead of printk()/pr_debug() rtc: rtc-ds2404: use dev_err() instead of printk() rtc: rtc-efi: use dev_err()/dev_warn()/pr_err() instead of printk() ...
2013-02-21rtc: rtc-davinci: use devm_*() functionsJingoo Han
Use devm_*() functions to make cleanup paths more simple. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-max8997: use devm_request_threaded_irq()Jingoo Han
Use devm_request_threaded_irq() to make cleanup paths more simple. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-max8907: use devm_request_threaded_irq()Jingoo Han
Use devm_request_threaded_irq() to make cleanup paths more simple. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-da9052: use devm_request_threaded_irq()Jingoo Han
Use devm_request_threaded_irq() to make cleanup paths more simple. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-wm831x: use devm_request_threaded_irq()Jingoo Han
Use devm_request_threaded_irq() to make cleanup paths more simple. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-tps80031: use devm_request_threaded_irq()Jingoo Han
Use devm_request_threaded_irq() to make cleanup paths more simple. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-lp8788: use devm_request_threaded_irq()Jingoo Han
Use devm_request_threaded_irq() to make cleanup paths more simple. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-coh901331: use devm_clk_get()Jingoo Han
Use devm_clk_get() to make cleanup paths more simple. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-vt8500: use devm_*() functionsJingoo Han
Use devm_*() functions to make cleanup paths more simple. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-tps6586x: use devm_request_threaded_irq()Jingoo Han
Use devm_request_threaded_irq() to make cleanup paths more simple. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-imxdi: use devm_clk_get()Jingoo Han
Use devm_clk_get() to make cleanup paths more simple. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-cmos: use dev_warn()/dev_dbg() instead of printk()/pr_debug()Jingoo Han
Fix the checkpatch warning as below: WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-pcf8583: use dev_warn() instead of printk()Jingoo Han
Fix the checkpatch warning as below: WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-sun4v: use pr_warn() instead of printk()Jingoo Han
Fix the checkpatch warning as below: WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-vr41xx: use dev_info() instead of printk()Jingoo Han
Fix the checkpatch warning as below: WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-rs5c313: use pr_err() instead of printk()Jingoo Han
Fix the checkpatch warning as below: WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-at91rm9200: use dev_dbg()/dev_err() instead of printk()/pr_debug()Jingoo Han
Fix the checkpatch warning as below: WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-rs5c372: use dev_dbg()/dev_warn() instead of printk()/pr_debug()Jingoo Han
Fix the checkpatch warning as below: WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-ds2404: use dev_err() instead of printk()Jingoo Han
Fix the checkpatch warning as below: WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: rtc-efi: use dev_err()/dev_warn()/pr_err() instead of printk()Jingoo Han
Fix the checkpatch warnings as below: WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... WARNING: please, no space before tabs Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: max77686: use dev_info() instead of printk()Jingoo Han
Fix the checkpatch warning as below: WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21rtc: use dev_warn()/dev_dbg()/pr_err() instead of printk()Jingoo Han
Fix the checkpatch warning as below: WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21drivers/rtc/rtc-sa1100.c: move clock enable/disable to probe/removeChao Xie
The original sa1100_rtc_open/sa1100_rtc_release will be called when the /dev/rtc0 is opened or closed. In fact, these two functions will enable/disable the clock. Disabling clock will make rtc not work. So only enable/disable clock when probe/remove the device. Signed-off-by: Chao Xie <chao.xie@marvell.com> Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com> Cc: Leo Song <liangs@marvell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>