path: root/arch/powerpc/include
Age | Commit message | Author
2019-03-02 | powerpc/64s: Fix unrelocated interrupt trampoline address test | Nicholas Piggin
The recent commit got this test wrong, it declared the assembler symbols the wrong way, and also used the wrong symbol name (xxx_start rather than start_xxx, see asm/head-64.h). Fixes: ccd477028a ("powerpc/64s: Fix HV NMI vs HV interrupt recoverability test") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-01 | KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char() | Suraj Jitindar Singh
Add KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST & KVM_PPC_CPU_BEHAV_FLUSH_COUNT_CACHE to the characteristics returned from the H_GET_CPU_CHARACTERISTICS H-CALL, as queried from either the hypervisor or the device tree. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-02-27 | KVM: PPC: Fix compilation when KVM is not enabled | Paul Mackerras
Compiling with CONFIG_PPC_POWERNV=y and KVM disabled currently gives an error like this:

      CC      arch/powerpc/kernel/dbell.o
    In file included from arch/powerpc/kernel/dbell.c:20:0:
    arch/powerpc/include/asm/kvm_ppc.h: In function ‘xics_on_xive’:
    arch/powerpc/include/asm/kvm_ppc.h:625:9: error: implicit declaration of function ‘xive_enabled’ [-Werror=implicit-function-declaration]
      return xive_enabled() && cpu_has_feature(CPU_FTR_HVMODE);
             ^
    cc1: all warnings being treated as errors
    scripts/Makefile.build:276: recipe for target 'arch/powerpc/kernel/dbell.o' failed
    make[3]: *** [arch/powerpc/kernel/dbell.o] Error 1

Fix this by making the xics_on_xive() definition conditional on the same symbol (CONFIG_KVM_BOOK3S_64_HANDLER) that determines whether we include <asm/xive.h> or not, since that's the header that defines xive_enabled().

Fixes: 03f953329bd8 ("KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVE")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
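The guard described in this commit can be sketched outside the kernel roughly as follows. This is only an illustration: the stub variables and cpu_has_hvmode() are hypothetical stand-ins for xive_enabled()'s real backing state and cpu_has_feature(CPU_FTR_HVMODE), and the fallback that returns false is an assumption, since the commit message only says the definition becomes conditional.

```c
#include <stdbool.h>

/*
 * Leave CONFIG_KVM_BOOK3S_64_HANDLER undefined to model the configuration
 * that failed to build; define it to model a KVM-enabled kernel.
 */
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER

/* Hypothetical stand-ins for what <asm/xive.h> and the CPU feature code provide. */
static bool stub_xive_enabled = true;
static bool stub_hv_mode = true;

static inline bool xive_enabled(void) { return stub_xive_enabled; }
static inline bool cpu_has_hvmode(void) { return stub_hv_mode; }

/* Only defined when the header that declares xive_enabled() is also included. */
static inline bool xics_on_xive(void)
{
	return xive_enabled() && cpu_has_hvmode();
}

#else

/* Assumed fallback: without the handler there is no XICS-on-XIVE emulation. */
static inline bool xics_on_xive(void)
{
	return false;
}

#endif
```

With the symbol undefined, nothing in the translation unit references xive_enabled(), so the implicit-declaration error cannot occur.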
2019-02-26 | powerpc/powernv: move OPAL call wrapper tracing and interrupt handling to C | Nicholas Piggin
The OPAL call wrapper gets interrupt disabling wrong. It disables interrupts just by clearing MSR[EE], which has two problems: - It doesn't call into the IRQ tracing subsystem, which means tracing across OPAL calls does not always notice IRQs have been disabled. - It doesn't go through the IRQ soft-mask code, which causes a minor bug. MSR[EE] can not be restored by saving the MSR then clearing MSR[EE], because a racing interrupt while soft-masked could clear MSR[EE] between the two steps. This can cause MSR[EE] to be incorrectly enabled when the OPAL call returns. Fortunately that should only result in another masked interrupt being taken to disable MSR[EE] again, but it's a bit sloppy. The existing code also saves MSR to PACA, which is not re-entrant if there is a nested OPAL call from different MSR contexts, which can happen these days with SRESET interrupts on bare metal. To fix these issues, move the tracing and IRQ handling code to C, and call into asm just for the low level call when everything is ready to go. Save the MSR on stack rather than PACA. Performance cost is kept to a minimum with a few optimisations: - The endian switch upon return is combined with the MSR restore, which avoids an expensive context synchronizing operation for LE kernels. This makes up for the additional mtmsrd to enable interrupts with local_irq_enable(). - blr is now used to return from the opal_* functions that are called as C functions, to avoid link stack corruption. This requires a skiboot fix as well to keep the call stack balanced. A NULL call is more costly after this, (410ns->430ns on POWER9), but OPAL calls are generally not performance critical at this scale. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26 | powerpc/64s: Fix HV NMI vs HV interrupt recoverability test | Nicholas Piggin
HV interrupts that use HSRR registers do not enter with MSR[RI] clear, but their entry code is not recoverable vs NMI, due to shared use of HSPRG1 as a scratch register to save r13. This means that a system reset or machine check that hits in HSRR interrupt entry can cause r13 to be silently corrupted. Fix this by marking NMIs non-recoverable if they land in HV interrupt ranges. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26 | powerpc: sstep: Add support for maddhd, maddhdu, maddld instructions | Sandipan Das
This adds emulation support for the following integer instructions: * Multiply-Add High Doubleword (maddhd) * Multiply-Add High Doubleword Unsigned (maddhdu) * Multiply-Add Low Doubleword (maddld) As suggested by Michael, this uses a raw .long for specifying the instruction word when using inline assembly to retain compatibility with older binutils. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
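As a rough illustration of what these three instructions compute, the sketch below models them in portable C using the 128-bit integer extension available in GCC and Clang. The function names simply mirror the mnemonics; this is not the sstep emulation code, and it assumes the usual definition where the product of RA and RB is added to RC (sign-extended for maddhd, zero-extended for maddhdu) and either the high or low doubleword of the 128-bit sum is returned.

```c
#include <stdint.h>

/* Multiply-Add High Doubleword: signed product plus sign-extended addend. */
static uint64_t maddhd(int64_t ra, int64_t rb, int64_t rc)
{
	__int128 sum = (__int128)ra * rb + rc;

	return (uint64_t)((unsigned __int128)sum >> 64);
}

/* Multiply-Add High Doubleword Unsigned: everything is zero-extended. */
static uint64_t maddhdu(uint64_t ra, uint64_t rb, uint64_t rc)
{
	unsigned __int128 sum = (unsigned __int128)ra * rb + rc;

	return (uint64_t)(sum >> 64);
}

/* Multiply-Add Low Doubleword: the low 64 bits are the same signed or unsigned. */
static uint64_t maddld(uint64_t ra, uint64_t rb, uint64_t rc)
{
	return ra * rb + rc; /* wraps modulo 2^64 */
}
```

The low doubleword needs no 128-bit arithmetic at all, which is why a single maddld covers both signed and unsigned operands.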
2019-02-23 | powerpc/64: Replace CURRENT_THREAD_INFO with PACA_THREAD_INFO | Christophe Leroy
Now that current_thread_info is located at the beginning of 'current' task struct, CURRENT_THREAD_INFO macro is not really needed any more. This patch replaces it by loads of the value at PACA_THREAD_INFO(r13). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Add PACA_THREAD_INFO rather than using PACACURRENT] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU | Christophe Leroy
Now that thread_info is similar to task_struct, its address is in r2 so CURRENT_THREAD_INFO() macro is useless. This patch removes it. This patch also moves the 'tovirt(r2, r2)' down just before the reactivation of MMU translation, so that we keep the physical address of 'current' in r2 until then. It avoids a few calls to tophys(). At the same time, as the 'cpu' field is not anymore in thread_info, TI_CPU is renamed TASK_CPU by this patch. It also allows to get rid of a couple of '#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE' as ACCOUNT_CPU_USER_ENTRY() and ACCOUNT_CPU_USER_EXIT() are empty when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not defined. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Fix a missed conversion of TI_CPU idle_6xx.S] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc: 'current_set' is now a table of task_struct pointers | Christophe Leroy
The table of pointers 'current_set' has been used for retrieving the stack and current. They used to be thread_info pointers as they were pointing to the stack and current was taken from the 'task' field of the thread_info. Now the pointers in the 'current_set' table are both pointers to task_struct and pointers to thread_info. As they are used to get current, and the stack pointer is retrieved from current's stack field, this patch changes their type to task_struct, and renames secondary_ti to secondary_current. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc: regain entire stack space | Christophe Leroy
thread_info is no longer in the stack, so the entire stack can now be used. There is also no longer any risk of corrupting task_cpu(p) with a stack overflow, so the patch removes the test. When doing this, an explicit test for a NULL stack pointer is needed in validate_sp() as it is no longer implicitly covered by the sizeof(thread_info) gap. In the meantime, with the previous patch all pointers to the stacks are no longer pointers to thread_info, so this patch changes them to void*. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
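One way to read the remark about the NULL test is sketched below. It assumes a simplified range check and a hypothetical situation where a stack base is still zero (for example, not yet allocated); once the lower bound no longer skips a thread_info-sized gap, a zero stack pointer would pass the range test on its own. The names, the THREAD_SIZE value, and the helper split are illustrative, not the kernel's validate_sp().

```c
#include <stdbool.h>
#include <stdint.h>

#define THREAD_SIZE 16384UL /* assumed stack size for this sketch */

/* Range test once the whole stack is usable: no thread_info gap at the base. */
static bool sp_in_stack(uintptr_t sp, uintptr_t stack_page, unsigned long nbytes)
{
	return sp >= stack_page && sp <= stack_page + THREAD_SIZE - nbytes;
}

/* Sketch of the validation with the explicit NULL test added in front. */
static bool validate_sp_sketch(uintptr_t sp, uintptr_t stack_page,
			       unsigned long nbytes)
{
	if (!sp)
		return false; /* previously rejected implicitly by the gap */

	return sp_in_stack(sp, stack_page, nbytes);
}
```

With a zero base, sp_in_stack(0, 0, 16) is true, which is the case the explicit check is there to reject.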
2019-02-23 | powerpc: Activate CONFIG_THREAD_INFO_IN_TASK | Christophe Leroy
This patch activates CONFIG_THREAD_INFO_IN_TASK which moves the thread_info into task_struct. Moving thread_info into task_struct has the following advantages: - It protects thread_info from corruption in the case of stack overflows. - Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult. This has the following consequences: - thread_info is now located at the beginning of task_struct. - The 'cpu' field is now in task_struct, and only exists when CONFIG_SMP is active. - thread_info doesn't have anymore the 'task' field. This patch: - Removes all recopy of thread_info struct when the stack changes. - Changes the CURRENT_THREAD_INFO() macro to point to current. - Selects CONFIG_THREAD_INFO_IN_TASK. - Modifies raw_smp_processor_id() to get ->cpu from current without including linux/sched.h to avoid circular inclusion and without including asm/asm-offsets.h to avoid symbol names duplication between ASM constants and C constants. - Modifies klp_init_thread_info() to take a task_struct pointer argument. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add task_stack.h to livepatch.h to fix build fails] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
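The layout consequence listed above (thread_info at the beginning of task_struct, 'cpu' living in task_struct) can be illustrated with a reduced sketch. The structures here are heavily simplified and the *_sketch helpers are hypothetical; the point is only that a task pointer and its thread_info pointer become the same address, so no separate lookup through the stack is needed.

```c
#include <stddef.h>

/* Heavily reduced stand-ins for the kernel structures. */
struct thread_info {
	unsigned long flags;
	int preempt_count;
};

struct task_struct {
	struct thread_info thread_info; /* must remain the first member */
	unsigned int cpu;               /* in the kernel, only with CONFIG_SMP */
	void *stack;
};

/* The cast below is only valid while thread_info sits at offset 0. */
_Static_assert(offsetof(struct task_struct, thread_info) == 0,
	       "thread_info must be first in task_struct");

/* With thread_info embedded first, getting it from a task is a plain cast. */
static struct thread_info *current_thread_info_sketch(struct task_struct *cur)
{
	return (struct thread_info *)cur;
}

/* The CPU number is read from the task itself, not from thread_info. */
static unsigned int task_cpu_sketch(const struct task_struct *t)
{
	return t->cpu;
}
```

Because the stack no longer holds thread_info, an overflow that runs off the bottom of the stack cannot overwrite these fields.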
2019-02-23 | powerpc: Use task_stack_page() in current_pt_regs() | Christophe Leroy
Change current_pt_regs() to use task_stack_page() rather than current_thread_info() so that it keeps working once we enable THREAD_INFO_IN_TASK. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Split out of large patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
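The idea can be sketched in user space as follows: the register frame is taken to sit at the very top of the task's stack area, so it is located from the stack base rather than from thread_info. The structure contents, the THREAD_SIZE value, and the task_top_pt_regs() name are assumptions made for this illustration.

```c
#include <stdint.h>

#define THREAD_SIZE 16384UL /* assumed stack size for this sketch */

/* Reduced stand-ins for the kernel structures. */
struct pt_regs {
	unsigned long gpr[32];
	unsigned long nip;
	unsigned long msr;
};

struct task_struct {
	void *stack; /* base (lowest address) of the task's stack area */
};

static void *task_stack_page(const struct task_struct *t)
{
	return t->stack;
}

/* The frame is the last sizeof(struct pt_regs) bytes of the stack area. */
static struct pt_regs *task_top_pt_regs(const struct task_struct *t)
{
	return (struct pt_regs *)((uintptr_t)task_stack_page(t) + THREAD_SIZE) - 1;
}
```

Nothing here depends on where thread_info lives, which is why the computation keeps working after thread_info moves into task_struct.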
2019-02-23 | powerpc: Use linux/thread_info.h in processor.h | Christophe Leroy
When we enable THREAD_INFO_IN_TASK we will remove our definition of current_thread_info(). Instead it will come from linux/thread_info.h So switch processor.h to include the latter, so that it can continue to find current_thread_info(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc: Use sizeof(struct thread_info) in INIT_SP_LIMIT | Christophe Leroy
Currently INIT_SP_LIMIT uses sizeof(init_thread_info), but that symbol won't exist when we enable THREAD_INFO_IN_TASK. So just use the sizeof the type which is the same value but will continue to work. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc: Update comments in preparation for THREAD_INFO_IN_TASK | Christophe Leroy
Update a few comments that talk about current_thread_info() in preparation for THREAD_INFO_IN_TASK. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc: call_do_[soft]irq() takes a pointer to the stack | Christophe Leroy
The purpose of the pointer given to call_do_softirq() and call_do_irq() is to point the new stack. Currently that's the same thing as the thread_info, but won't be with THREAD_INFO_IN_TASK. So change the parameter to void* and rename it 'sp'. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc: Avoid circular header inclusion in mmu-hash.h | Christophe Leroy
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes asm/current.h. This generates a circular dependency. To avoid that, asm/processor.h shall not be included in mmu-hash.h. In order to do that, this patch moves into a new header called asm/task_size_64/32.h all the TASK_SIZE related constants, which can then be included in mmu-hash.h directly. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out all the TASK_SIZE constants not just 64-bit ones] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWX | Christophe Leroy
This patch implements handling of STRICT_KERNEL_RWX with large TLBs directly in the TLB miss handlers. To do so, etext and sinittext are aligned on 512kB boundaries and the miss handlers use 512kB pages instead of 8Mb pages for addresses close to the boundaries. It sets RO PP flags for addresses under sinittext. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX | Christophe Leroy
Today, STRICT_KERNEL_RWX is based on the use of regular pages to map kernel pages.

On Book3s 32, it has three consequences:
- Using pages instead of BAT for mapping kernel linear memory severely impacts performance.
- Exec protection is not effective because no-execute cannot be set at page level (except on 603 which doesn't have hash tables)
- Write protection is not effective because PP bits do not provide RO mode for kernel-only pages (except on 603 which handles it in software via PAGE_DIRTY)

On the 603+, we have:
- Independent IBAT and DBAT allowing limitation of exec parts.
- NX bit can be set in segment registers to forbid execution on memory mapped by pages.
- RO mode on DBATs even for kernel-only blocks.

On the 601, there is nothing much we can do other than warn the user about it, because:
- BATs are common to instructions and data.
- BATs do not provide RO mode for kernel-only blocks.
- Segment registers don't have the NX bit.

In order to use IBAT for exec protection, this patch:
- Aligns _etext to BAT block sizes (128kb)
- Sets the NX bit in kernel segment registers (except on the vmalloc area when CONFIG_MODULES is selected)
- Maps kernel text with IBATs.

In order to use DBAT for write protection, this patch:
- Aligns RW DATA to BAT block sizes (4M)
- Maps kernel RO area with write prohibited DBATs
- Maps remaining memory with remaining DBATs

Here is what we get with this patch on a 832x when activating STRICT_KERNEL_RWX:

Symbols:
    c0000000 T _stext
    c0680000 R __start_rodata
    c0680000 R _etext
    c0800000 T __init_begin
    c0800000 T _sinittext

    ~# cat /sys/kernel/debug/block_address_translation
    ---[ Instruction Block Address Translation ]---
    0: 0xc0000000-0xc03fffff 0x00000000 Kernel EXEC coherent
    1: 0xc0400000-0xc05fffff 0x00400000 Kernel EXEC coherent
    2: 0xc0600000-0xc067ffff 0x00600000 Kernel EXEC coherent
    3: -
    4: -
    5: -
    6: -
    7: -

    ---[ Data Block Address Translation ]---
    0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent
    1: 0xc0800000-0xc0ffffff 0x00800000 Kernel RW coherent
    2: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent
    3: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent
    4: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent
    5: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent
    6: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
    7: -

    ~# cat /sys/kernel/debug/segment_registers
    ---[ User Segments ]---
    0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xa085d0
    0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xa086e1
    0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xa087f2
    0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xa08903
    0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xa08a14
    0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xa08b25
    0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xa08c36
    0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xa08d47
    0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xa08e58
    0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xa08f69
    0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xa0907a
    0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xa0918b

    ---[ Kernel Segments ]---
    0xc0000000-0xcfffffff Kern key 0 User key 1 No Exec VSID 0x000ccc
    0xd0000000-0xdfffffff Kern key 0 User key 1 No Exec VSID 0x000ddd
    0xe0000000-0xefffffff Kern key 0 User key 1 No Exec VSID 0x000eee
    0xf0000000-0xffffffff Kern key 0 User key 1 No Exec VSID 0x000fff

Aligning _etext to 128kb allows mapping up to 32Mb of text with 8 IBATs:
16Mb + 8Mb + 4Mb + 2Mb + 1Mb + 512kb + 256kb + 128kb (+ 128kb) = 32Mb
(A 9th IBAT is unneeded as 32Mb would need only a single 32Mb block)

Aligning data to 4M allows mapping up to 512Mb of data with 8 DBATs:
16Mb + 8Mb + 4Mb + 4Mb + 32Mb + 64Mb + 128Mb + 256Mb = 512Mb

Because some processors only have 4 BATs and because some targets need DBATs for mapping other areas, the following patch will allow modifying the _etext and data alignment.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
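The block sizes in the debugfs dump of this commit are consistent with a greedy split: at each step, take the largest power-of-two block that is naturally aligned at the current base and still fits below the end of the region. The sketch below implements that split under stated assumptions (128 KiB minimum and 256 MiB maximum block size, 128 KiB aligned inputs). The function names are hypothetical and this is not the kernel's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BAT_MIN_BLOCK (128u << 10) /* 128 KiB, assumed smallest BAT block */
#define BAT_MAX_BLOCK (256u << 20) /* 256 MiB, assumed largest BAT block */

/* Largest power-of-two block that is aligned at base and fits below top. */
static uint32_t bat_block_size(uint32_t base, uint32_t top)
{
	uint32_t size = BAT_MAX_BLOCK;

	assert(top > base && top - base >= BAT_MIN_BLOCK);

	/* Halve until the block is both aligned at base and small enough. */
	while (size > BAT_MIN_BLOCK &&
	       ((base & (size - 1)) || size > top - base))
		size >>= 1;

	return size;
}

/*
 * Split [base, top) into BAT-sized blocks, storing up to max sizes.
 * Returns the number of blocks stored; the split is truncated if more
 * than max blocks would be needed.
 */
static size_t bat_split(uint32_t base, uint32_t top, uint32_t *sizes, size_t max)
{
	size_t n = 0;

	assert(base % BAT_MIN_BLOCK == 0 && top % BAT_MIN_BLOCK == 0);

	while (base < top && n < max) {
		uint32_t size = bat_block_size(base, top);

		sizes[n++] = size;
		base += size;
	}

	return n;
}
```

Splitting the text range 0xc0000000 to 0xc0680000 this way gives 4M + 2M + 512k, and the RW range 0xc0800000 to 0xe0000000 gives 8M, 16M, 32M, 64M, 128M and 256M, matching the IBAT and DBAT lines shown in the dump.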
2019-02-23 | powerpc/mm/32s: add setibat() clearibat() and update_bats() | Christophe Leroy
setibat() and clearibat() allow manipulating IBATs independently of DBATs. update_bats() allows updating BATs after init. This is done with the MMU off. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc/kconfig: define PAGE_SHIFT inside Kconfig | Christophe Leroy
This patch defines CONFIG_PPC_PAGE_SHIFT in order to be able to use the PAGE_SHIFT value inside Kconfig. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc/mmu: add is_strict_kernel_rwx() helper | Christophe Leroy
Add a helper to know whether STRICT_KERNEL_RWX is enabled. This is based on rodata_enabled flag which is defined only when CONFIG_STRICT_KERNEL_RWX is selected. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc/32: add helper to write into segment registers | Christophe Leroy
This patch adds a helper which wraps the 'mtsrin' instruction to write into segment registers. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | powerpc: sstep: Add tests for addc[.] instruction | Sandipan Das
This adds test cases for the addc[.] instruction. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23 | Revert "powerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling" | Michael Ellerman
This reverts commit 78ca1108b10927b3d068c8da91352b0f4cd01fc5. It is causing boot failures with qemu mac99 in at least some configurations.
2019-02-22 | Merge tag 'kvm-ppc-next-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-next | Paolo Bonzini
PPC KVM update for 5.1. There are no major new features this time, just a collection of bug fixes and improvements in various areas, including machine check handling and context switching of protection-key-related registers.
2019-02-22 | Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next | Paul Mackerras
This merges in the "ppc-kvm" topic branch of the powerpc tree to get a series of commits that touch both general arch/powerpc code and KVM code. These commits will be merged both via the KVM tree and the powerpc tree. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-02-22 | powerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling | Christophe Leroy
For pages without _PAGE_USER, the PP field is 00. For pages with _PAGE_USER, the PP field is 10 for RW and 11 for RO. This patch sets _PAGE_USER to 0x002 and _PAGE_RW to 0x001 in order to simplify TLB handling by reducing the number of shifts. The location of _PAGE_PRESENT and _PAGE_HASHPTE doesn't matter as they are only SW related flags. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
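To see why this bit placement avoids shifts, the mapping described in this commit can be written out in C. This is only a sketch of the arithmetic: the kernel does the equivalent work in assembly, and the SKETCH_ names are hypothetical copies of the flag values given in the commit message.

```c
#include <stdint.h>

/* Flag values as stated in the commit message. */
#define SKETCH_PAGE_RW   0x001u
#define SKETCH_PAGE_USER 0x002u

/*
 * Derive the 2-bit PP field from the Linux PTE flags:
 *   no USER      -> 00
 *   USER and RW  -> 10
 *   USER, no RW  -> 11
 * Because USER and RW already occupy the two low bits, the result is the
 * low two bits with the RW bit inverted, and no shifting is required.
 */
static unsigned int pte_flags_to_pp(uint32_t flags)
{
	if (!(flags & SKETCH_PAGE_USER))
		return 0x0;

	return (flags & (SKETCH_PAGE_USER | SKETCH_PAGE_RW)) ^ SKETCH_PAGE_RW;
}
```

With the flags in other bit positions, the same mapping would need an extra shift to bring them down to the PP field.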
2019-02-22 | powerpc/6xx: Store PGDIR physical address in a SPRG | Christophe Leroy
Use SPRN_SPRG2 to store the current thread PGDIR and avoid reading thread_struct.pgdir at every TLB miss. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22 | powerpc/6xx: Don't use SPRN_SPRG2 for storing stack pointer while in RTAS | Christophe Leroy
When calling RTAS, the stack pointer is stored in SPRN_SPRG2 in order to be able to restore it in case of a machine check in RTAS. As machine check is not a performance critical path, this patch frees SPRN_SPRG2 by using a field in the thread struct instead. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22 | powerpc: simplify BDI switch | Christophe Leroy
There is no reason to re-read each time the pointer at location 0xf0 as it is fixed and known. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22 | powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit | Paul Mackerras
Commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug", 2017-07-21) added two calls to opal_slw_set_reg() inside pnv_cpu_offline(), with the aim of changing the LPCR value in the SLW image to disable wakeups from the decrementer while a CPU is offline. However, pnv_cpu_offline() gets called each time a secondary CPU thread is woken up to participate in running a KVM guest, that is, not just when a CPU is offlined. Since opal_slw_set_reg() is a very slow operation (with observed execution times around 20 milliseconds), this means that an offline secondary CPU can often be busy doing the opal_slw_set_reg() call when the primary CPU wants to grab all the secondary threads so that it can run a KVM guest. This leads to messages like "KVM: couldn't grab CPU n" being printed and guest execution failing. There is no need to reprogram the SLW image on every KVM guest entry and exit. So that we do it only when a CPU is really transitioning between online and offline, this moves the calls to pnv_program_cpu_hotplug_lpcr() into pnv_smp_cpu_kill_self(). Fixes: 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22 | powerpc/book3s: Remove pgd/pud/pmd_set() interfaces | Aneesh Kumar K.V
When updating page tables, we need to make sure we fill the page table entry valid bits. We do this by or'ing in one of PGD/PUD/PMD_VAL_BITS. The page table 'set' interfaces allow updating the raw value of page table entries without setting the valid bits, so remove those interfaces to avoid incorrect usage in future. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Reword commit message based on mailing list discussion] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22 | powerpc/eeh: Add eeh_force_recover to debugfs | Oliver O'Halloran
This patch adds a debugfs interface to force scheduling a recovery event. This can be used to recover a specific PE or schedule a "special" recovery event that checks for errors at the PHB level.

To force a recovery of a normal PE, use:

    echo '<#pe>:<#phb>' > /sys/kernel/debug/powerpc/eeh_force_recover

To force a scan for broken PHBs:

    echo 'hwcheck' > /sys/kernel/debug/powerpc/eeh_force_recover

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22 | powerpc/eeh: Allow disabling recovery | Oliver O'Halloran
Currently when we detect an error we automatically invoke the EEH recovery handler. This can be annoying when debugging EEH problems, or when working on EEH itself so this patch adds a debugfs knob that will prevent a recovery event from being queued up when an issue is detected. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22 | powerpc/pci: Add pci_find_controller_for_domain() | Oliver O'Halloran
Add a helper to find the pci_controller structure based on the domain number / phb id. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22 | powerpc/eeh_cache: Add a way to dump the EEH address cache | Oliver O'Halloran
Adds a debugfs file that can be read to view the contents of the EEH address cache. This is pretty similar to the existing eeh_addr_cache_print() function, but that function is intended to debug issues inside of the kernel since it's #ifdef`ed out by default, and writes into the kernel log. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22 | powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes | Oliver O'Halloran
There's no need for the custom getter/setter functions so we should remove them in favour of using the generic one. While we're here, change the type of eeh_max_freeze to u32 and print the value in decimal rather than hex because printing it in hex makes no sense. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22 | powerpc: drop unused GENERIC_CSUM Kconfig item | Christophe Leroy
Commit d4fde568a34a ("powerpc/64: Use optimized checksum routines on little-endian") converted last powerpc user of GENERIC_CSUM. This patch does a final cleanup dropping the Kconfig GENERIC_CSUM option which is always 'n', and associated piece of code in asm/checksum.h Fixes: d4fde568a34a ("powerpc/64: Use optimized checksum routines on little-endian") Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22 | powerpc/mm/hash: Increase vmalloc space to 512T with hash MMU | Michael Ellerman
This patch updates the kernel non-linear virtual map to 512TB when we're built with 64K page size and are using the hash MMU. We allocate one context for the vmalloc region and hence the max virtual area size is limited by the context map size (512TB for 64K and 64TB for 4K page size). This patch fixes boot failures with large amounts of system RAM where we need large vmalloc space to handle per cpu allocations. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
2019-02-22 | Merge branch 'topic/ppc-kvm' into next | Michael Ellerman
Merge commits we're sharing with kvm-ppc tree.
2019-02-21 | powerpc/64s: Better printing of machine check info for guest MCEs | Paul Mackerras
This adds an "in_guest" parameter to machine_check_print_event_info() so that we can avoid trying to translate guest NIP values into symbolic form using the host kernel's symbol table. Reviewed-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-21 | KVM: PPC: Book3S HV: Simplify machine check handling | Paul Mackerras
This makes the handling of machine check interrupts that occur inside a guest simpler and more robust, with less done in assembler code and in real mode. Now, when a machine check occurs inside a guest, we always get the machine check event struct and put a copy in the vcpu struct for the vcpu where the machine check occurred. We no longer call machine_check_queue_event() from kvmppc_realmode_mc_power7(), because on POWER8, when a vcpu is running on an offline secondary thread and we call machine_check_queue_event(), that calls irq_work_queue(), which doesn't work because the CPU is offline, but instead triggers the WARN_ON(lazy_irq_pending()) in pnv_smp_cpu_kill_self() (which fires again and again because nothing clears the condition). All that machine_check_queue_event() actually does is to cause the event to be printed to the console. For a machine check occurring in the guest, we now print the event in kvmppc_handle_exit_hv() instead. The assembly code at label machine_check_realmode now just calls C code and then continues exiting the guest. We no longer either synthesize a machine check for the guest in assembly code or return to the guest without a machine check. The code in kvmppc_handle_exit_hv() is extended to handle the case where the guest is not FWNMI-capable. In that case we now always synthesize a machine check interrupt for the guest. Previously, if the host thinks it has recovered the machine check fully, it would return to the guest without any notification that the machine check had occurred. If the machine check was caused by some action of the guest (such as creating duplicate SLB entries), it is much better to tell the guest that it has caused a problem. Therefore we now always generate a machine check interrupt for guests that are not FWNMI-capable. 
Reviewed-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-21 | Merge branch 'topic/dma' into next | Michael Ellerman
Merge hch's big DMA rework series. This is in a topic branch in case he wants to merge it to minimise conflicts.
2019-02-20 | KVM: Call kvm_arch_memslots_updated() before updating memslots | Sean Christopherson
kvm_arch_memslots_updated() is at this point in time an x86-specific hook for handling MMIO generation wraparound. x86 stashes 19 bits of the memslots generation number in its MMIO sptes in order to avoid full page fault walks for repeat faults on emulated MMIO addresses. Because only 19 bits are used, wrapping the MMIO generation number is possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that the generation has changed so that it can invalidate all MMIO sptes in case the effective MMIO generation has wrapped so as to avoid using a stale spte, e.g. a (very) old spte that was created with generation==0. Given that the purpose of kvm_arch_memslots_updated() is to prevent consuming stale entries, it needs to be called before the new generation is propagated to memslots. Invalidating the MMIO sptes after updating memslots means that there is a window where a vCPU could dereference the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO spte that was created with (pre-wrap) generation==0. Fixes: e59dbe09f8e6 ("KVM: Introduce kvm_arch_memslots_updated()") Cc: <stable@vger.kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
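The wraparound hazard described in this commit comes from comparing only the low 19 bits of the generation number. A minimal model of that comparison is sketched below; the macro and function names are hypothetical and the real x86 code packs the generation bits into the spte differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define MMIO_GEN_BITS 19
#define MMIO_GEN_MASK ((UINT64_C(1) << MMIO_GEN_BITS) - 1)

/* Only the low 19 bits of the memslots generation are stashed in an MMIO spte. */
static uint64_t mmio_gen(uint64_t memslots_gen)
{
	return memslots_gen & MMIO_GEN_MASK;
}

/* An MMIO spte is treated as current if its stashed bits match the live ones. */
static bool mmio_spte_looks_current(uint64_t spte_gen, uint64_t memslots_gen)
{
	return spte_gen == mmio_gen(memslots_gen);
}
```

An spte created at generation 0 still looks current once the memslots generation reaches 2^19, which is why all MMIO sptes have to be invalidated when the masked value wraps, and why that has to happen before a vCPU can observe the new generation.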
2019-02-20 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
Two easily resolvable overlapping change conflicts, one in TCP and one in the eBPF verifier. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-19 | Merge branch 'fixes' into next | Michael Ellerman
There's a few important fixes in our fixes branch, in particular the pgd/pud_present() one, so merge it now.
2019-02-19 | KVM: PPC: Book3S HV: Add KVM stat largepages_[2M/1G] | Suraj Jitindar Singh
This adds an entry to the kvm_stats_debugfs directory which provides the number of large (2M or 1G) pages which have been used to setup the guest mappings, for radix guests. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-02-19 | KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVE | Paul Mackerras
Currently, the KVM code assumes that if the host kernel is using the XIVE interrupt controller (the new interrupt controller that first appeared in POWER9 systems), then the in-kernel XICS emulation will use the XIVE hardware to deliver interrupts to the guest. However, this only works when the host is running in hypervisor mode and has full access to all of the XIVE functionality. It doesn't work in any nested virtualization scenario, either with PR KVM or nested-HV KVM, because the XICS-on-XIVE code calls directly into the native-XIVE routines, which are not initialized and cannot function correctly because they use OPAL calls, and OPAL is not available in a guest. This means that using the in-kernel XICS emulation in a nested hypervisor that is using XIVE as its interrupt controller will cause a (nested) host kernel crash. To fix this, we change most of the places where the current code calls xive_enabled() to select between the XICS-on-XIVE emulation and the plain XICS emulation to call a new function, xics_on_xive(), which returns false in a guest. However, there is a further twist. The plain XICS emulation has some functions which are used in real mode and access the underlying XICS controller (the interrupt controller of the host) directly. In the case of a nested hypervisor, this means doing XICS hypercalls directly. When the nested host is using XIVE as its interrupt controller, these hypercalls will fail. Therefore this also adds checks in the places where the XICS emulation wants to access the underlying interrupt controller directly, and if that is XIVE, makes the code use the virtual mode fallback paths, which call generic kernel infrastructure rather than doing direct XICS access. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-02-19 | KVM: PPC: Book3S PR: Add emulation for slbfee. instruction | Paul Mackerras
Recent kernels, since commit e15a4fea4dee ("powerpc/64s/hash: Add some SLB debugging tests", 2018-10-03) use the slbfee. instruction, which PR KVM currently does not have code to emulate. Consequently recent kernels fail to boot under PR KVM. This adds emulation of slbfee., enabling these kernels to boot successfully. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>