Age | Commit message (Collapse) | Author |
|
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Guest visible debug register and hardware visible debug registers are
same, so ther is no need to have arch->shadow_dbg_reg, instead use
arch->dbg_reg.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
This patch adds "rfdi" instruction emulation which is required for
guest debug hander on BOOKE-HV
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
This reverts commit c822e73731fce3b49a4887140878d084d8a44c08.
This commit conflicted with a bitmap allocator change that partially
accomplishes the same thing, but which does so more correctly. Revert
this one until it can be respun on top of the correct change.
Signed-off-by: Scott Wood <scottwood@freescale.com>
|
|
The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.
Lets introduce arch_irq_work_has_interrupt() that archs can override to
tell about their support for this ability.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
This patch wires up three new syscalls for powerpc. The three
new syscalls are seccomp, getrandom and memfd_create.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
|
|
ABIv2 kernels are failing to backtrace through the kernel. An example:
39.30% readseek2_proce [kernel.kallsyms] [k] find_get_entry
|
--- find_get_entry
__GI___libc_read
The problem is in valid_next_sp() where we check that the new stack
pointer is at least STACK_FRAME_OVERHEAD below the previous one.
ABIv1 has a minimum stack frame size of 112 bytes consisting of 48 bytes
and 64 bytes of parameter save area. ABIv2 changes that to 32 bytes
with no paramter save area.
STACK_FRAME_OVERHEAD is in theory the minimum stack frame size,
but we over 240 uses of it, some of which assume that it includes
space for the parameter area.
We need to work through all our stack defines and rationalise them
but let's fix perf now by creating STACK_FRAME_MIN_SIZE and using
in valid_next_sp(). This fixes the issue:
30.64% readseek2_proce [kernel.kallsyms] [k] find_get_entry
|
--- find_get_entry
pagecache_get_page
generic_file_read_iter
new_sync_read
vfs_read
sys_read
syscall_exit
__GI___libc_read
Cc: stable@vger.kernel.org # 3.16+
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
|
|
Since commit 469d62be9263b92f2c3329540cbb1c076111f4f3, SPRG2 is used as a
scratch register just like SPRG0 and SPRG1. So Declare it as such and fix
the comment which is not valid anymore since that commit.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
|
|
Allocate msis such that each time a new interrupt is requested,
the SRS (MSIR register select) to be used is allocated in a
round-robin fashion.
The end result is that the msi interrupts will be spread across
distinct MSIRs with the main benefit that now users can set
affinity to each msi int through the mpic irq backing up the
MSIR register.
This is achieved with the help of a newly introduced msi bitmap
api that allows specifying the starting point when searching
for a free msi interrupt.
Signed-off-by: Laurentiu Tudor <Laurentiu.Tudor@freescale.com>
Cc: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
|
|
Platform code can call limit_zone_pfn() to set appropriate limits
for ZONE_DMA and ZONE_DMA32, and dma_direct_alloc_coherent() will
select a suitable zone based on a device's mask and the pfn limits that
platform code has configured.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
|
|
In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.
Remove unnecessary arguments that stem from this.
Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Using static inline is going to save few bytes and cycles.
For example on powerpc, the difference is 700 B after stripping.
(5 kB before)
This patch also deals with two overlooked empty functions:
kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c
2df72e9bc KVM: split kvm_arch_flush_shadow
and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c.
e790d9ef6 KVM: add kvm_arch_sched_in
Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid
"'struct foo' declared inside parameter list" warnings (and consequent
breakage due to conflicting types).
Move them from individual files to a generic place in linux/kvm_types.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This reverts commit 5828f666c069af74e00db21559f1535103c9f79a due to
build failure after merging with pending powerpc changes.
Link: http://lkml.kernel.org/g/20140827142243.6277eaff@canb.auug.org.au
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.
The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e. using a global
register that may be set to the per cpu base.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
tj: Folded a fix patch.
http://lkml.kernel.org/g/alpine.DEB.2.11.1408172143020.9652@gentwo.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.
The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e. using a global
register that may be set to the per cpu base.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Many of the atomic op implementations are the same except for one
instruction; fold the lot into a few CPP macros and reduce LoC.
Requires asm_op because PPC asm is weird :-)
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20140508135852.713980957@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
On ppc64 we support 4K hash pte with 64K page size. That requires
us to track the hash pte slot information on a per 4k basis. We do that
by storing the slot details in the second half of pte page. The pte bit
_PAGE_COMBO is used to indicate whether the second half need to be
looked while building real_pte. We need to use read memory barrier while
doing that so that load of hidx is not reordered w.r.t _PAGE_COMBO
check. On the store side we already do a lwsync in __hash_page_4K
CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
If we changed base page size of the segment, either via sub_page_protect
or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash
table entries. We do a lazy hash page table flush for all mapped pages
in the demoted segment. This happens when we handle hash page fault for
these pages.
We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a
pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte,
that implies that we could possibly have older 64K hash pte entries in
the hash page table and we need to invalidate those entries.
Use _PAGE_COMBO to determine the page size with which we should
invalidate the hash table entries on unmap.
CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The segment identifier and segment size will remain the same in
the loop, So we can compute it outside. We also change the
hugepage_invalidate interface so that we can use it the later patch
CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
It appears that commits 7f06f21d40a6 ("powerpc/tm: Add checking to
treclaim/trechkpt") and e4e38121507a ("KVM: PPC: Book3S HV: Add
transactional memory support") both added definitions of TEXASR_FS.
Remove one of them. At the same time, fix the alignment of the remaining
definition (should be tab-separated like the rest of the #defines).
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
PowerNV platform is capable of capturing host memory region when system
crashes (because of host/firmware). We have new OPAL API to register/
unregister memory region to be captured when system crashes.
This patch adds support for new API. Also during boot time we register
kernel log buffer and unregister before doing kexec.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
We have been a bit slack about updating the CPU_FTRS_POSSIBLE and
CPU_FTRS_ALWAYS masks. When we added POWER8, and also POWER8E we forgot
to update the ALWAYS mask. And when we added POWER8_DD1 we forgot to
update both the POSSIBLE and ALWAYS masks.
Luckily this hasn't caused any actual bugs AFAICS. Failing to update the
ALWAYS mask just forgoes a potential optimisation opportunity. Failing
to update the POSSIBLE mask for POWER8_DD1 is also OK because it only
removes a bit rather than adding any.
Regardless they should all be in both masks so as to avoid any future
bugs when the set of ALWAYS/POSSIBLE bits changes, or the masks
themselves change.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Michael Neuling <mikey@neuling.org>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The kernel defines the function spin_is_locked(), which can be used to
check if a spinlock is currently locked.
Using spin_is_locked() on a lock you don't hold is obviously racy. That
is, even though you may observe that the lock is unlocked, it may become
locked at any time.
There is (at least) one exception to that, which is if two locks are
used as a pair, and the holder of each checks the status of the other
before doing any update.
Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic
value:
The first CPU does:
spin_lock(*A)
if spin_is_locked(*B)
# nothing
else
smp_mb()
LOAD r = *COUNTER
r++
STORE *COUNTER = r
spin_unlock(*A)
And the second CPU does:
spin_lock(*B)
if spin_is_locked(*A)
# nothing
else
smp_mb()
LOAD r = *COUNTER
r++
STORE *COUNTER = r
spin_unlock(*B)
Although this is a strange locking construct, it should work.
It seems to be understood, but not documented, that spin_is_locked() is
not a memory barrier, so in the examples above and below the caller
inserts its own memory barrier before acting on the result of
spin_is_locked().
For now we assume spin_is_locked() is implemented as below, and we break
it out in our examples:
bool spin_is_locked(*LOCK) {
LOAD l = *LOCK
return l.locked
}
Our intuition is that there should be no problem even if the two code
sequences run simultaneously such as:
CPU 0 CPU 1
==================================================
spin_lock(*A) spin_lock(*B)
LOAD b = *B LOAD a = *A
if b.locked # true if a.locked # true
# nothing # nothing
spin_unlock(*A) spin_unlock(*B)
If one CPU gets the lock before the other then it will do the update and
the other CPU will back off:
CPU 0 CPU 1
==================================================
spin_lock(*A)
LOAD b = *B
spin_lock(*B)
if b.locked # false LOAD a = *A
else if a.locked # true
smp_mb() # nothing
LOAD r1 = *COUNTER spin_unlock(*B)
r1++
STORE *COUNTER = r1
spin_unlock(*A)
However in reality spin_lock() itself is not indivisible. On powerpc we
implement it as a load-and-reserve and store-conditional.
Ignoring the retry logic for the lost reservation case, it boils down to:
spin_lock(*LOCK) {
LOAD l = *LOCK
l.locked = true
STORE *LOCK = l
ACQUIRE_BARRIER
}
The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as
defined in memory-barriers.txt:
This acts as a one-way permeable barrier. It guarantees that all
memory operations after the ACQUIRE operation will appear to happen
after the ACQUIRE operation with respect to the other components of
the system.
On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is
also know as "lightweight sync", or "sync 1".
As described in Power ISA v2.07 section B.2.1.1, in this scenario the
lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to
act as the barrier, preventing any loads or stores in the locked region
from occurring prior to the load of *LOCK.
Whether this behaviour is in accordance with the definition of ACQUIRE
semantics in memory-barriers.txt is open to discussion, we may switch to
a different barrier in future.
What this means in practice is that the following can occur:
CPU 0 CPU 1
==================================================
LOAD a = *A LOAD b = *B
a.locked = true b.locked = true
LOAD b = *B LOAD a = *A
STORE *A = a STORE *B = b
if b.locked # false if a.locked # false
else else
smp_mb() smp_mb()
LOAD r1 = *COUNTER LOAD r2 = *COUNTER
r1++ r2++
STORE *COUNTER = r1
STORE *COUNTER = r2 # Lost update
spin_unlock(*A) spin_unlock(*B)
That is, the load of *B can occur prior to the store that makes *A
visibly locked. And similarly for CPU 1. The result is both CPUs hold
their lock and believe the other lock is unlocked.
The easiest fix for this is to add a full memory barrier to the start of
spin_is_locked(), so adding to our previous definition would give us:
bool spin_is_locked(*LOCK) {
smp_mb()
LOAD l = *LOCK
return l.locked
}
The new barrier orders the store to the lock we are locking vs the load
of the other lock:
CPU 0 CPU 1
==================================================
LOAD a = *A LOAD b = *B
a.locked = true b.locked = true
STORE *A = a STORE *B = b
smp_mb() smp_mb()
LOAD b = *B LOAD a = *A
if b.locked # true if a.locked # true
# nothing # nothing
spin_unlock(*A) spin_unlock(*B)
Although the above example is theoretical, there is code similar to this
example in sem_lock() in ipc/sem.c. This commit in addition to the next
commit appears to be a fix for crashes we are seeing in that code where
we believe this race happens in practice.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Merge more incoming from Andrew Morton:
"Two new syscalls:
memfd_create in "shm: add memfd_create() syscall"
kexec_file_load in "kexec: implementation of new syscall kexec_file_load"
And:
- Most (all?) of the rest of MM
- Lots of the usual misc bits
- fs/autofs4
- drivers/rtc
- fs/nilfs
- procfs
- fork.c, exec.c
- more in lib/
- rapidio
- Janitorial work in filesystems: fs/ufs, fs/reiserfs, fs/adfs,
fs/cramfs, fs/romfs, fs/qnx6.
- initrd/initramfs work
- "file sealing" and the memfd_create() syscall, in tmpfs
- add pci_zalloc_consistent, use it in lots of places
- MAINTAINERS maintenance
- kexec feature work"
* emailed patches from Andrew Morton <akpm@linux-foundation.org: (193 commits)
MAINTAINERS: update nomadik patterns
MAINTAINERS: update usb/gadget patterns
MAINTAINERS: update DMA BUFFER SHARING patterns
kexec: verify the signature of signed PE bzImage
kexec: support kexec/kdump on EFI systems
kexec: support for kexec on panic using new system call
kexec-bzImage64: support for loading bzImage using 64bit entry
kexec: load and relocate purgatory at kernel load time
purgatory: core purgatory functionality
purgatory/sha256: provide implementation of sha256 in purgaotory context
kexec: implementation of new syscall kexec_file_load
kexec: new syscall kexec_file_load() declaration
kexec: make kexec_segment user buffer pointer a union
resource: provide new functions to walk through resources
kexec: use common function for kimage_normal_alloc() and kimage_crash_alloc()
kexec: move segment verification code in a separate function
kexec: rename unusebale_pages to unusable_pages
kernel: build bin2c based on config option CONFIG_BUILD_BIN2C
bin2c: move bin2c in scripts/basic
shm: wait for pins to be released when sealing
...
|
|
The core mm code will provide a default gate area based on
FIXADDR_USER_START and FIXADDR_USER_END if
!defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR).
This default is only useful for ia64. arm64, ppc, s390, sh, tile, 64-bit
UML, and x86_32 have their own code just to disable it. arm, 32-bit UML,
and x86_64 have gate areas, but they have their own implementations.
This gets rid of the default and moves the code into ia64.
This should save some code on architectures without a gate area: it's now
possible to inline the gate_area functions in the default case.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle]
Acked-by: Richard Weinberger <richard@nod.at> [for um]
Acked-by: Will Deacon <will.deacon@arm.com> [for arm64]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nathan Lynch <Nathan_Lynch@mentor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Rather than have architectures #define ARCH_HAS_SG_CHAIN in an
architecture specific scatterlist.h, make it a proper Kconfig option and
use that instead. At same time, remove the header files are are now
mostly useless and just include asm-generic/scatterlist.h.
[sfr@canb.auug.org.au: powerpc files now need asm/dma.h]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [powerpc]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull second round of KVM changes from Paolo Bonzini:
"Here are the PPC and ARM changes for KVM, which I separated because
they had small conflicts (respectively within KVM documentation, and
with 3.16-rc changes). Since they were all within the subsystem, I
took care of them.
Stephen Rothwell reported some snags in PPC builds, but they are all
fixed now; the latest linux-next report was clean.
New features for ARM include:
- KVM VGIC v2 emulation on GICv3 hardware
- Big-Endian support for arm/arm64 (guest and host)
- Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)
And for PPC:
- Book3S: Good number of LE host fixes, enable HV on LE
- Book3S HV: Add in-guest debug support
This release drops support for KVM on the PPC440. As a result, the
PPC merge removes more lines than it adds. :)
I also included an x86 change, since Davidlohr tied it to an
independent bug report and the reporter quickly provided a Tested-by;
there was no reason to wait for -rc2"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (122 commits)
KVM: Move more code under CONFIG_HAVE_KVM_IRQFD
KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
KVM: nVMX: Fix nested vmexit ack intr before load vmcs01
KVM: PPC: Enable IRQFD support for the XICS interrupt controller
KVM: Give IRQFD its own separate enabling Kconfig option
KVM: Move irq notifier implementation into eventfd.c
KVM: Move all accesses to kvm::irq_routing into irqchip.c
KVM: irqchip: Provide and use accessors for irq routing table
KVM: Don't keep reference to irq routing table in irqfd struct
KVM: PPC: drop duplicate tracepoint
arm64: KVM: fix 64bit CP15 VM access for 32bit guests
KVM: arm64: GICv3: mandate page-aligned GICV region
arm64: KVM: GICv3: move system register access to msr_s/mrs_s
KVM: PPC: PR: Handle FSCR feature deselects
KVM: PPC: HV: Remove generic instruction emulation
KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
KVM: PPC: Remove DCR handling
KVM: PPC: Expose helper functions for data/inst faults
KVM: PPC: Separate loadstore emulation from priv emulation
KVM: PPC: Handle magic page in kvmppc_ld/st
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
"This is the powerpc new goodies for 3.17. The short story:
The biggest bit is Michael removing all of pre-POWER4 processor
support from the 64-bit kernel. POWER3 and rs64. This gets rid of a
ton of old cruft that has been bitrotting in a long while. It was
broken for quite a few versions already and nobody noticed. Nobody
uses those machines anymore. While at it, he cleaned up a bunch of
old dusty cabinets, getting rid of a skeletton or two.
Then, we have some base VFIO support for KVM, which allows assigning
of PCI devices to KVM guests, support for large 64-bit BARs on
"powernv" platforms, support for HMI (Hardware Management Interrupts)
on those same platforms, some sparse-vmemmap improvements (for memory
hotplug),
There is the usual batch of Freescale embedded updates (summary in the
merge commit) and fixes here or there, I think that's it for the
highlights"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (102 commits)
powerpc/eeh: Export eeh_iommu_group_to_pe()
powerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API
powerpc: Reduce scariness of interrupt frames in stack traces
powerpc: start loop at section start of start in vmemmap_populated()
powerpc: implement vmemmap_free()
powerpc: implement vmemmap_remove_mapping() for BOOK3S
powerpc: implement vmemmap_list_free()
powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE
powerpc/book3s: Fix endianess issue for HMI handling on napping cpus.
powerpc/book3s: handle HMIs for cpus in nap mode.
powerpc/powernv: Invoke opal call to handle hmi.
powerpc/book3s: Add basic infrastructure to handle HMI in Linux.
powerpc/iommu: Fix comments with it_page_shift
powerpc/powernv: Handle compound PE in config accessors
powerpc/powernv: Handle compound PE for EEH
powerpc/powernv: Handle compound PE
powerpc/powernv: Split ioda_eeh_get_state()
powerpc/powernv: Allow to freeze PE
powerpc/powernv: Enable M64 aperatus for PHB3
powerpc/eeh: Aux PE data for error log
...
|
|
Patch queue for ppc - 2014-08-01
Highlights in this release include:
- BookE: Rework instruction fetch, not racy anymore now
- BookE HV: Fix ONE_REG accessors for some in-hardware registers
- Book3S: Good number of LE host fixes, enable HV on LE
- Book3S: Some misc bug fixes
- Book3S HV: Add in-guest debug support
- Book3S HV: Preload cache lines on context switch
- Remove 440 support
Alexander Graf (31):
KVM: PPC: Book3s PR: Disable AIL mode with OPAL
KVM: PPC: Book3s HV: Fix tlbie compile error
KVM: PPC: Book3S PR: Handle hyp doorbell exits
KVM: PPC: Book3S PR: Fix ABIv2 on LE
KVM: PPC: Book3S PR: Fix sparse endian checks
PPC: Add asm helpers for BE 32bit load/store
KVM: PPC: Book3S HV: Make HTAB code LE host aware
KVM: PPC: Book3S HV: Access guest VPA in BE
KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE
KVM: PPC: Book3S HV: Access XICS in BE
KVM: PPC: Book3S HV: Fix ABIv2 on LE
KVM: PPC: Book3S HV: Enable for little endian hosts
KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct
KVM: PPC: Deflect page write faults properly in kvmppc_st
KVM: PPC: Book3S: Stop PTE lookup on write errors
KVM: PPC: Book3S: Add hack for split real mode
KVM: PPC: Book3S: Make magic page properly 4k mappable
KVM: PPC: Remove 440 support
KVM: Rename and add argument to check_extension
KVM: Allow KVM_CHECK_EXTENSION on the vm fd
KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode
KVM: PPC: Implement kvmppc_xlate for all targets
KVM: PPC: Move kvmppc_ld/st to common code
KVM: PPC: Remove kvmppc_bad_hva()
KVM: PPC: Use kvm_read_guest in kvmppc_ld
KVM: PPC: Handle magic page in kvmppc_ld/st
KVM: PPC: Separate loadstore emulation from priv emulation
KVM: PPC: Expose helper functions for data/inst faults
KVM: PPC: Remove DCR handling
KVM: PPC: HV: Remove generic instruction emulation
KVM: PPC: PR: Handle FSCR feature deselects
Alexey Kardashevskiy (1):
KVM: PPC: Book3S: Fix LPCR one_reg interface
Aneesh Kumar K.V (4):
KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
KVM: PPC: BOOK3S: PR: Emulate instruction counter
KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page
Anton Blanchard (2):
KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue
KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()
Bharat Bhushan (10):
kvm: ppc: bookehv: Added wrapper macros for shadow registers
kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1
kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR
kvm: ppc: booke: Add shared struct helpers of SPRN_ESR
kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7
kvm: ppc: Add SPRN_EPR get helper function
kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit
KVM: PPC: Booke-hv: Add one reg interface for SPRG9
KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer
KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
Michael Neuling (1):
KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling
Mihai Caraman (8):
KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
KVM: PPC: e500: Fix default tlb for victim hint
KVM: PPC: e500: Emulate power management control SPR
KVM: PPC: e500mc: Revert "add load inst fixup"
KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1
KVM: PPC: Book3s: Remove kvmppc_read_inst() function
KVM: PPC: Allow kvmppc_get_last_inst() to fail
KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
Paul Mackerras (4):
KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling
KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled
KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call
KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication
Stewart Smith (2):
Split out struct kvmppc_vcore creation to separate function
Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8
Conflicts:
Documentation/virtual/kvm/api.txt
|
|
remap_4k_pfn() silently truncates upper bits of input 4K PFN
if it cannot be contained in PTE. This leads invalid memory mapping and could
result in a system crash when the memory is accessed. This patch fails
remap_4k_pfn() and returns -EINVAL if the input 4K PFN cannot be contained in
PTE.
V3 : Added parentheses to protect 'pfn' and entire macro as suggested by Brian.
V2 : Rewritten to avoid helper function as suggested by Stephen Rothwell.
Signed-off-by: Madhusudanan Kandasamy <kmadhu@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
When we hit the HMI in Linux, invoke opal call to handle/recover from HMI
errors in real mode and then in virtual mode during check_irq_replay()
invoke opal_poll_events()/opal_do_notifier() to retrieve HMI event from
OPAL and act accordingly.
Now that we are ready to handle HMI interrupt directly in linux, remove
the HMI interrupt registration with firmware.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Handle Hypervisor Maintenance Interrupt (HMI) in Linux. This patch implements
basic infrastructure to handle HMI in Linux host. The design is to invoke
opal handle hmi in real mode for recovery and set irq_pending when we hit HMI.
During check_irq_replay pull opal hmi event and print hmi info on console.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The patch synchronizes header file with firmware to have new OPAL
API opal_pci_eeh_freeze_set(), which is used to freeze the specified
PE in order to support "compound" PE.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch enables M64 aperatus for PHB3.
We already had platform hook (ppc_md.pcibios_window_alignment) to affect
the PCI resource assignment done in PCI core so that each PE's M32 resource
was built on basis of M32 segment size. Similarly, we're using that for
M64 assignment on basis of M64 segment size.
* We're using last M64 BAR to cover M64 aperatus, and it's shared by all
256 PEs.
* We don't support P7IOC yet. However, some function callbacks are added
to (struct pnv_phb) so that we can reuse them on P7IOC in future.
* PE, corresponding to PCI bus with large M64 BAR device attached, might
span multiple M64 segments. We introduce "compound" PE to cover the case.
The compound PE is a list of PEs and the master PE is used as before.
The slave PEs are just for MMIO isolation.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The patch allows PE (struct eeh_pe) instance to have auxillary data,
whose size is configurable on basis of platform. For PowerNV, the
auxillary data will be used to cache PHB diag-data for that PE
(frozen PE or fenced PHB). In turn, we can retrieve the diag-data
at any later points.
It's useful for the case of VFIO PCI devices where the error log
should be cached, and then be retrieved by the guest at later point.
Also, it can avoid PHB diag-data overwritting if another frozen PE
reported and the previous diag-data isn't fetched by guest.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
It's followup of commit ddf0322a ("powerpc/powernv: Fix endianness
problems in EEH"). The patch helps to get non-endian-dependent
diag-data.
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
According to the experiment I did, PCI config access is blocked
on P7IOC frozen PE by hardware, but PHB3 doesn't do that. That
means we always get 0xFF's while dumping PCI config space of the
frozen PE on P7IOC. We don't have the problem on PHB3. So we have
to enable I/O prioir to collecting error log. Otherwise, meaningless
0xFF's are always returned.
The patch fixes it by EEH flag (EEH_ENABLE_IO_FOR_LOG), which is
selectively set to indicate the case for: P7IOC on PowerNV platform,
pSeries platform.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
There are multiple global EEH flags. Almost each flag has its own
accessor, which doesn't make sense. The patch refactors EEH flag
accessors so that they look unified:
eeh_add_flag(): Add EEH flag
eeh_clear_flag(): Clear EEH flag
eeh_has_flag(): Check if one specific flag has been set
eeh_enabled(): Check if EEH functionality has been enabled
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The patch exports functions to be used by new VFIO ioctl command,
which will be introduced in subsequent patch, to support EEH
functinality for VFIO PCI devices.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
We must not handle EEH error on devices which are passed to somebody
else. Instead, we expect that the frozen device owner detects an EEH
error and recovers from it.
This avoids EEH error handling on passed through devices so the device
owner gets a chance to handle them.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Scott writes:
Highlights include e6500 hardware threading support, an e6500 TLB erratum
workaround, corenet error reporting, support for a new board, and some
minor fixes.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"The main changes in this cycle are:
- big rtmutex and futex cleanup and robustification from Thomas
Gleixner
- mutex optimizations and refinements from Jason Low
- arch_mutex_cpu_relax() removal and related cleanups
- smaller lockdep tweaks"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
arch, locking: Ciao arch_mutex_cpu_relax()
locking/lockdep: Only ask for /proc/lock_stat output when available
locking/mutexes: Optimize mutex trylock slowpath
locking/mutexes: Try to acquire mutex only if it is unlocked
locking/mutexes: Delete the MUTEX_SHOW_NO_WAITER macro
locking/mutexes: Correct documentation on mutex optimistic spinning
rtmutex: Make the rtmutex tester depend on BROKEN
futex: Simplify futex_lock_pi_atomic() and make it more robust
futex: Split out the first waiter attachment from lookup_pi_state()
futex: Split out the waiter check from lookup_pi_state()
futex: Use futex_top_waiter() in lookup_pi_state()
futex: Make unlock_pi more robust
rtmutex: Avoid pointless requeueing in the deadlock detection chain walk
rtmutex: Cleanup deadlock detector debug logic
rtmutex: Confine deadlock logic to futex
rtmutex: Simplify remove_waiter()
rtmutex: Document pi chain walk
rtmutex: Clarify the boost/deboost part
rtmutex: No need to keep task ref for lock owner check
rtmutex: Simplify and document try_to_take_rtmutex()
...
|
|
We handle FSCR feature bits (well, TAR only really today) lazily when the guest
starts using them. So when a guest activates the bit and later uses that feature
we enable it for real in hardware.
However, when the guest stops using that bit we don't stop setting it in
hardware. That means we can potentially lose a trap that the guest expects to
happen because it thinks a feature is not active.
This patch adds support to drop TAR when then guest turns it off in FSCR. While
at it it also restricts FSCR access to 64bit systems - 32bit ones don't have it.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
This are not specific to e500hv but applicable for bookehv
(As per comment from Scott Wood on my patch
"kvm: ppc: bookehv: Added wrapper macros for shadow registers")
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
The general idea is that each core will release all of its
threads into the secondary thread startup code, which will
eventually wait in the secondary core holding area, for the
appropriate bit in the PACA to be set. The kick_cpu function
pointer will set that bit in the PACA, and thus "release"
the core/thread to boot. We also need to do a few things that
U-Boot normally does for CPUs (like enable branch prediction).
Signed-off-by: Andy Fleming <afleming@freescale.com>
[scottwood@freescale.com: various changes, including only enabling
threads if Linux wants to kick them]
Signed-off-by: Scott Wood <scottwood@freescale.com>
|
|
This ensures that all MSR definitions are consistently unsigned long,
and that MSR_CM does not become 0xffffffff80000000 (this is usually
harmless because MSR is 32-bit on booke and is mainly noticeable when
debugging, but still I'd rather avoid it).
Signed-off-by: Scott Wood <scottwood@freescale.com>
|
|
DCR handling was only needed for 440 KVM. Since we removed it, we can also
remove handling of DCR accesses.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We're going to implement guest code interpretation in KVM for some rare
corner cases. This code needs to be able to inject data and instruction
faults into the guest when it encounters them.
Expose generic APIs to do this in a reasonably subarch agnostic fashion.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Today the instruction emulator can get called via 2 separate code paths. It
can either be called by MMIO emulation detection code or by privileged
instruction traps.
This is bad, as both code paths prepare the environment differently. For MMIO
emulation we already know the virtual address we faulted on, so instructions
there don't have to actually fetch that information.
Split out the two separate use cases into separate files.
Signed-off-by: Alexander Graf <agraf@suse.de>
|