summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-06-10KVM: PPC: Book3S HV P9: Allow all P9 processors to enable nested HVNicholas Piggin
All radix guests go via the P9 path now, so there is no need to limit nested HV to processors that support "mixed mode" MMU. Remove the restriction. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-27-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV: Remove unused nested HV tests in XICS emulationNicholas Piggin
Commit f3c18e9342a44 ("KVM: PPC: Book3S HV: Use XICS hypercalls when running as a nested hypervisor") added nested HV tests in XICS hypercalls, but not all are required. * icp_eoi is only called by kvmppc_deliver_irq_passthru which is only called by kvmppc_check_passthru which is only caled by kvmppc_read_one_intr. * kvmppc_read_one_intr is only called by kvmppc_read_intr which is only called by the L0 HV rmhandlers code. * kvmhv_rm_send_ipi is called by: - kvmhv_interrupt_vcore which is only called by kvmhv_commence_exit which is only called by the L0 HV rmhandlers code. - icp_send_hcore_msg which is only called by icp_rm_set_vcpu_irq. - icp_rm_set_vcpu_irq which is only called by icp_rm_try_update - icp_rm_set_vcpu_irq is not nested HV safe because it writes to LPCR directly without a kvmhv_on_pseries test. Nested handlers should not in general be using the rm handlers. The important test seems to be in kvmppc_ipi_thread, which sends the virt-mode H_IPI handler kick to use smp_call_function rather than msgsnd. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-26-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV: Remove virt mode checks from real mode handlersNicholas Piggin
Now that the P7/8 path no longer supports radix, real-mode handlers do not need to deal with being called in virt mode. This change effectively reverts commit acde25726bc6 ("KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers"). It removes a few more real-mode tests in rm hcall handlers, which allows the indirect ops for the xive module to be removed from the built-in xics rm handlers. kvmppc_h_random is renamed to kvmppc_rm_h_random to be a bit more descriptive and consistent with other rm handlers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-25-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV: Remove radix guest support from P7/8 pathNicholas Piggin
The P9 path now runs all supported radix guest combinations, so remove radix guest support from the P7/8 path. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-24-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV: Remove support for dependent threads mode on P9Nicholas Piggin
Dependent-threads mode is the normal KVM mode for pre-POWER9 SMT processors, where all threads in a core (or subcore) would run the same partition at the same time, or they would run the host. This design was mandated by MMU state that is shared between threads in a processor, so the synchronisation point is in hypervisor real-mode that has essentially no shared state, so it's safe for multiple threads to gather and switch to the correct mode. It is implemented by having the host unplug all secondary threads and always run in SMT1 mode, and host QEMU threads essentially represent virtual cores that wake these secondary threads out of unplug when the ioctl is called to run the guest. This happens via a side-path that is mostly invisible to the rest of the Linux host and the secondary threads still appear to be unplugged. POWER9 / ISA v3.0 has a more flexible MMU design that is independent per-thread and allows a much simpler KVM implementation. Before the new "P9 fast path" was added that began to take advantage of this, POWER9 support was implemented in the existing path which has support to run in the dependent threads mode. So it was not much work to add support to run POWER9 in this dependent threads mode. The mode is not required by the POWER9 MMU (although "mixed-mode" hash / radix MMU limitations of early processors were worked around using this mode). But it is one way to run SMT guests without running different guests or guest and host on different threads of the same core, so it could avoid or reduce some SMT attack surfaces without turning off SMT entirely. This security feature has some real, if indeterminate, value. However the old path is lagging in features (nested HV), and with this series the new P9 path adds remaining missing features (radix prefetch bug and hash support, in later patches), so POWER9 dependent threads mode support would be the only remaining reason to keep that code in and keep supporting POWER9/POWER10 in the old path. So here we make the call to drop this feature. Remove dependent threads mode support for POWER9 and above processors. Systems can still achieve this security by disabling SMT entirely, but that would generally come at a larger performance cost for guests. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-23-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV: Implement radix prefetch workaround by disabling MMUNicholas Piggin
Rather than partition the guest PID space + flush a rogue guest PID to work around this problem, instead fix it by always disabling the MMU when switching in or out of guest MMU context in HV mode. This may be a bit less efficient, but it is a lot less complicated and allows the P9 path to trivally implement the workaround too. Newer CPUs are not subject to this issue. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-22-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Switch to guest MMU context as late as possibleNicholas Piggin
Move MMU context switch as late as reasonably possible to minimise code running with guest context switched in. This becomes more important when this code may run in real-mode, with later changes. Move WARN_ON as early as possible so program check interrupts are less likely to tangle everything up. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-21-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Add helpers for OS SPR handlingNicholas Piggin
This is a first step to wrapping supervisor and user SPR saving and loading up into helpers, which will then be called independently in bare metal and nested HV cases in order to optimise SPR access. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-20-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Move SPR loading after expiry time checkNicholas Piggin
This is wasted work if the time limit is exceeded. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-19-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Improve exit timing accounting coverageNicholas Piggin
The C conversion caused exit timing to become a bit cramped. Expand it to cover more of the entry and exit code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-18-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Read machine check registers while MSR[RI] is 0Nicholas Piggin
SRR0/1, DAR, DSISR must all be protected from machine check which can clobber them. Ensure MSR[RI] is clear while they are live. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-17-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: inline kvmhv_load_hv_regs_and_go into ↵Nicholas Piggin
__kvmhv_vcpu_entry_p9 Now the initial C implementation is done, inline more HV code to make rearranging things easier. And rename __kvmhv_vcpu_entry_p9 to drop the leading underscores as it's now C, and is now a more complete vcpu entry. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-16-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in CNicholas Piggin
Almost all logic is moved to C, by introducing a new in_guest mode for the P9 path that branches very early in the KVM interrupt handler to P9 exit code. The main P9 entry and exit assembly is now only about 160 lines of low level stack setup and register save/restore, plus a bad-interrupt handler. There are two motivations for this, the first is just make the code more maintainable being in C. The second is to reduce the amount of code running in a special KVM mode, "realmode". In quotes because with radix it is no longer necessarily real-mode in the MMU, but it still has to be treated specially because it may be in real-mode, and has various important registers like PID, DEC, TB, etc set to guest. This is hostile to the rest of Linux and can't use arbitrary kernel functionality or be instrumented well. This initial patch is a reasonably faithful conversion of the asm code, but it does lack any loop to return quickly back into the guest without switching out of realmode in the case of unimportant or easily handled interrupts. As explained in previous changes, handling HV interrupts very quickly in this low level realmode is not so important for P9 performance, and are important to avoid for security, observability, debugability reasons. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-15-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 pathNicholas Piggin
In the interest of minimising the amount of code that is run in "real-mode", don't handle hcalls in real mode in the P9 path. This requires some new handlers for H_CEDE and xics-on-xive to be added before xive is pulled or cede logic is checked. This introduces a change in radix guest behaviour where radix guests that execute 'sc 1' in userspace now get a privilege fault whereas previously the 'sc 1' would be reflected as a syscall interrupt to the guest kernel. That reflection is only required for hash guests that run PR KVM. Background: In POWER8 and earlier processors, it is very expensive to exit from the HV real mode context of a guest hypervisor interrupt, and switch to host virtual mode. On those processors, guest->HV interrupts reach the hypervisor with the MMU off because the MMU is loaded with guest context (LPCR, SDR1, SLB), and the other threads in the sub-core need to be pulled out of the guest too. Then the primary must save off guest state, invalidate SLB and ERAT, and load up host state before the MMU can be enabled to run in host virtual mode (~= regular Linux mode). Hash guests also require a lot of hcalls to run due to the nature of the MMU architecture and paravirtualisation design. The XICS interrupt controller requires hcalls to run. So KVM traditionally tries hard to avoid the full exit, by handling hcalls and other interrupts in real mode as much as possible. By contrast, POWER9 has independent MMU context per-thread, and in radix mode the hypervisor is in host virtual memory mode when the HV interrupt is taken. Radix guests do not require significant hcalls to manage their translations, and xive guests don't need hcalls to handle interrupts. So it's much less important for performance to handle hcalls in real mode on POWER9. One caveat is that the TCE hcalls are performance critical, real-mode variants introduced for POWER8 in order to achieve 10GbE performance. Real mode TCE hcalls were found to be less important on POWER9, which was able to drive 40GBe networking without them (using the virt mode hcalls) but performance is still important. These hcalls will benefit from subsequent guest entry/exit optimisation including possibly a faster "partial exit" that does not entirely switch to host context to handle the hcall. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-14-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Move radix MMU switching instructions togetherNicholas Piggin
Switching the MMU from radix<->radix mode is tricky particularly as the MMU can remain enabled and requires a certain sequence of SPR updates. Move these together into their own functions. This also includes the radix TLB check / flush because it's tied in to MMU switching due to tlbiel getting LPID from LPIDR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-13-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Move xive vcpu context management into ↵Nicholas Piggin
kvmhv_p9_guest_entry Move the xive management up so the low level register switching can be pushed further down in a later patch. XIVE MMIO CI operations can run in higher level code with machine checks, tracing, etc., available. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-12-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Reduce irq_work vs guest decrementer racesNicholas Piggin
irq_work's use of the DEC SPR is racy with guest<->host switch and guest entry which flips the DEC interrupt to guest, which could lose a host work interrupt. This patch closes one race, and attempts to comment another class of races. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-11-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Move setting HDEC after switching to guest LPCRNicholas Piggin
LPCR[HDICE]=0 suppresses hypervisor decrementer exceptions on some processors, so it must be enabled before HDEC is set. Rather than set it in the host LPCR then setting HDEC, move the HDEC update to after the guest MMU context (including LPCR) is loaded. There shouldn't be much concern with delaying HDEC by some 10s or 100s of nanoseconds by setting it a bit later. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-10-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: implement kvmppc_xive_pull_vcpu in CNicholas Piggin
This is more symmetric with kvmppc_xive_push_vcpu, and has the advantage that it runs with the MMU on. The extra test added to the asm will go away with a future change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-9-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: Minimise hcall handler calling convention differencesNicholas Piggin
This sets up the same calling convention from interrupt entry to KVM interrupt handler for system calls as exists for other interrupt types. This is a better API, it uses a save area rather than SPR, and it has more registers free to use. Using a single common API helps maintain it, and it becomes easier to use in C in a later patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-8-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: move bad_host_intr check to HV handlerNicholas Piggin
The bad_host_intr check will never be true with PR KVM, move it to HV code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-7-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: Move interrupt early register setup to KVMNicholas Piggin
Like the earlier patch for hcalls, KVM interrupt entry requires a different calling convention than the Linux interrupt handlers set up. Move the code that converts from one to the other into KVM. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-6-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: Move hcall early register setup to KVMNicholas Piggin
System calls / hcalls have a different calling convention than other interrupts, so there is code in the KVMTEST to massage these into the same form as other interrupt handlers. Move this work into the KVM hcall handler. This means teaching KVM a little more about the low level interrupt handler setup, PACA save areas, etc., although that's not obviously worse than the current approach of coming up with an entirely different interrupt register / save convention. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-5-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: add hcall interrupt handlerNicholas Piggin
Add a separate hcall entry point. This can be used to deal with the different calling convention. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-4-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVMNicholas Piggin
Move the GUEST_MODE_SKIP logic into KVM code. This is quite a KVM internal detail that has no real need to be in common handlers. Add a comment explaining the what and why of KVM "skip" interrupts. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-3-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: move KVM interrupt entry to a common entry pointNicholas Piggin
Rather than bifurcate the call depending on whether or not HV is possible, and have the HV entry test for PR, just make a single common point which does the demultiplexing. This makes it simpler to add another type of exit handler. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-2-npiggin@gmail.com
2021-06-10powerpc/ps3: Add dma_mask to ps3_dma_regionGeoff Levand
Commit f959dcd6ddfd29235030e8026471ac1b022ad2b0 (dma-direct: Fix potential NULL pointer dereference) added a null check on the dma_mask pointer of the kernel's device structure. Add a dma_mask variable to the ps3_dma_region structure and set the device structure's dma_mask pointer to point to this new variable. Fixes runtime errors like these: # WARNING: Fixes tag on line 10 doesn't match correct format # WARNING: Fixes tag on line 10 doesn't match correct format ps3_system_bus_match:349: dev=8.0(sb_01), drv=8.0(ps3flash): match WARNING: CPU: 0 PID: 1 at kernel/dma/mapping.c:151 .dma_map_page_attrs+0x34/0x1e0 ps3flash sb_01: ps3stor_setup:193: map DMA region failed Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/562d0c9ea0100a30c3b186bcc7adb34b0bbd2cd7.1622746428.git.geoff@infradead.org
2021-06-10powerpc/ps3: Warn on PS3 device errorsGeoff Levand
To aid debugging PS3 boot problems change the log level of several PS3 device errors from pr_debug to pr_warn. Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/eb5c1c10da0bbdeb27c8b069187b4f58e429e837.1622746428.git.geoff@infradead.org
2021-06-10powerpc/ps3: Add CONFIG_PS3_VERBOSE_RESULT optionGeoff Levand
To aid debugging, add a new PS3 kernel config option PS3_VERBOSE_RESULT that, when enabled, will print more verbose messages for the result of LV1 hypercalls. Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0ce4b6969a08094a747bd382dbfd30b72ebc192d.1622746428.git.geoff@infradead.org
2021-06-10powerpc/ps3: Re-align DTB in imageGeoff Levand
Change the PS3 linker script to align the DTB at 8 bytes, the same alignment as that of the of the 'generic' powerpc linker script. Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/245897ed65e402686a4b114ba618e935cb5c6506.1622822173.git.geoff@infradead.org
2021-06-10powerpc/ps3: Add firmware version to sysfsGeoff Levand
Add a new sysfs entry /sys/firmware/ps3/fw-version that exports the PS3's firmware version. The firmware version is available through an LV1 hypercall, and we've been printing it to the boot log, but haven't provided an easy way for user utilities to get it. Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/41509b2da647cd34b1331cc4756c8774b1e284eb.1622822173.git.geoff@infradead.org
2021-06-10powerpc/barrier: Avoid collision with clang's __lwsync macroNathan Chancellor
A change in clang 13 results in the __lwsync macro being defined as __builtin_ppc_lwsync, which emits 'lwsync' or 'msync' depending on what the target supports. This breaks the build because of -Werror in arch/powerpc, along with thousands of warnings: In file included from arch/powerpc/kernel/pmc.c:12: In file included from include/linux/bug.h:5: In file included from arch/powerpc/include/asm/bug.h:109: In file included from include/asm-generic/bug.h:20: In file included from include/linux/kernel.h:12: In file included from include/linux/bitops.h:32: In file included from arch/powerpc/include/asm/bitops.h:62: arch/powerpc/include/asm/barrier.h:49:9: error: '__lwsync' macro redefined [-Werror,-Wmacro-redefined] #define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") ^ <built-in>:308:9: note: previous definition is here #define __lwsync __builtin_ppc_lwsync ^ 1 error generated. Undefine this macro so that the runtime patching introduced by commit 2d1b2027626d ("powerpc: Fixup lwsync at runtime") continues to work properly with clang and the build no longer breaks. Cc: stable@vger.kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/1386 Link: https://github.com/llvm/llvm-project/commit/62b5df7fe2b3fda1772befeda15598fbef96a614 Link: https://lore.kernel.org/r/20210528182752.1852002-1-nathan@kernel.org
2021-06-06powerpc/mem: Add back missing header to fix 'no previous prototype' errorChristophe Leroy
Commit b26e8f27253a ("powerpc/mem: Move cache flushing functions into mm/cacheflush.c") removed asm/sparsemem.h which is required when CONFIG_MEMORY_HOTPLUG is selected to get the declaration of create_section_mapping(). Add it back. Fixes: b26e8f27253a ("powerpc/mem: Move cache flushing functions into mm/cacheflush.c") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3e5b63bb3daab54a1eb9c20221c2e9528c4db9b3.1622883330.git.christophe.leroy@csgroup.eu
2021-06-04KVM: PPC: Book3S HV: Save host FSCR in the P7/8 pathNicholas Piggin
Similar to commit 25edcc50d76c ("KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path"), ensure the P7/8 path saves and restores the host FSCR. The logic explained in that patch actually applies there to the old path well: a context switch can be made before kvmppc_vcpu_run_hv restores the host FSCR and returns. Now both the p9 and the p7/8 paths now save and restore their FSCR, it no longer needs to be restored at the end of kvmppc_vcpu_run_hv Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs") Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
2021-06-01Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs"Frederic Barrat
This reverts commit 3c0468d4451eb6b4f6604370639f163f9637a479. That commit was breaking alignment guarantees for the DMA address when allocating coherent mappings, as described in Documentation/core-api/dma-api-howto.rst It was also noticed by Mellanox' driver: [ 1515.763621] mlx5_core c002:01:00.0: mlx5_frag_buf_alloc_node:146:(pid 13402): unexpected map alignment: 0x0800000000c61000, page_shift=16 [ 1515.763635] mlx5_core c002:01:00.0: mlx5_cqwq_create:181:(pid 13402): mlx5_frag_buf_alloc_node() failed, -12 Fixes: 3c0468d4451e ("powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs") Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210526144540.117795-1-fbarrat@linux.ibm.com
2021-05-28KVM: PPC: Book3S HV: Save host FSCR in the P7/8 pathNicholas Piggin
Similar to commit 25edcc50d76c ("KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path"), ensure the P7/8 path saves and restores the host FSCR. The logic explained in that patch actually applies there to the old path well: a context switch can be made before kvmppc_vcpu_run_hv restores the host FSCR and returns. Now both the p9 and the p7/8 paths now save and restore their FSCR, it no longer needs to be restored at the end of kvmppc_vcpu_run_hv Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs") Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
2021-05-28powerpc: Fix reverse map real-mode address lookup with huge vmallocNicholas Piggin
real_vmalloc_addr() does not currently work for huge vmalloc, which is what the reverse map can be allocated with for radix host, hash guest. Extract the hugepage aware equivalent from eeh code into a helper, and convert existing sites including this one to use it. Fixes: 8abddd968a30 ("powerpc/64s/radix: Enable huge vmalloc mappings") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210526120005.3432222-1-npiggin@gmail.com
2021-05-28powerpc/kprobes: Fix validation of prefixed instructions across page boundaryNaveen N. Rao
When checking if the probed instruction is the suffix of a prefixed instruction, we access the instruction at the previous word. If the probed instruction is the very first word of a module, we can end up trying to access an invalid page. Fix this by skipping the check for all instructions at the beginning of a page. Prefixed instructions cannot cross a 64-byte boundary and as such, we don't expect to encounter a suffix as the very first word in a page for kernel text. Even if there are prefixed instructions crossing a page boundary (from a module, for instance), the instruction will be illegal, so preventing probing on the suffix of such prefix instructions isn't worthwhile. Fixes: b4657f7650ba ("powerpc/kprobes: Don't allow breakpoints on suffixes") Cc: stable@vger.kernel.org # v5.8+ Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0df9a032a05576a2fa8e97d1b769af2ff0eafbd6.1621416666.git.naveen.n.rao@linux.vnet.ibm.com
2021-05-23tty: hvc: udbg_hvc: retry putc on -EAGAINNathan Lynch
hvterm_raw_put_chars() calls hvc_put_chars(), which may return -EAGAIN when the underlying hcall returns a "busy" status, but udbg_hvc_putc() doesn't handle this. When using xmon on a PowerVM guest, this can result in incomplete or garbled output when printing relatively large amounts of data quickly, such as when dumping the kernel log buffer. Call again on -EAGAIN. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210514214422.3019105-1-nathanl@linux.ibm.com
2021-05-23powerpc/xmon: make dumping log buffer contents more reliableNathan Lynch
Log buffer entries that are too long for dump_log_buf()'s small local buffer are: * silently discarded when a single-line entry is too long; kmsg_dump_get_line() returns true but sets &len to 0. * silently truncated to the last fitting new line when a multi-line entry is too long, e.g. register dumps from __show_regs(); this seems undetectable via the kmsg_dump API. xmon_printf()'s internal buffer is already 1KB; enlarge dump_log_buf()'s own buffer to match and make it statically allocated. Verified that this allows complete printing of register dumps on ppc64le with both CONFIG_PRINTK_TIME=y and CONFIG_PRINTK_CALLER=y. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210514162420.2911458-1-nathanl@linux.ibm.com
2021-05-23powerpc/kprobes: Replace ppc_optinsn by common optinsnChristophe Leroy
Commit 51c9c0843993 ("powerpc/kprobes: Implement Optprobes") implemented a powerpc specific version of optinsn in order to workaround the 32Mb limitation for direct branches. Instead of implementing a dedicated powerpc version, use the common optinsn and override the allocation and freeing functions. This also indirectly remove the CLANG warning about is_kprobe_ppc_optinsn_slot() not being use, and the powerpc will now benefit from commit 5b485629ba0d ("kprobes, extable: Identify kprobes trampolines as kernel text area") Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ec5e85f9f9abcfecc959a03495f4a7858eb4d203.1620896780.git.christophe.leroy@csgroup.eu
2021-05-23kprobes: Allow architectures to override optinsn page allocationChristophe Leroy
Some architectures like powerpc require a non standard allocation of optinsn page, because module pages are too far from the kernel for direct branches. Define weak alloc_optinsn_page() and free_optinsn_page(), that fall back on alloc_insn_page() and free_insn_page() when not overridden by the architecture. Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/40a43d6df1fdf41ade36e9a46e60a4df774ca9f6.1620896780.git.christophe.leroy@csgroup.eu
2021-05-23powerpc: Kconfig: disable CONFIG_COMPAT for clang < 12Nick Desaulniers
Until clang-12, clang would attempt to assemble 32b powerpc assembler in 64b emulation mode when using a 64b target triple with -m32, leading to errors during the build of the compat VDSO. Simply disable all of CONFIG_COMPAT; users should upgrade to the latest release of clang for proper support. Suggested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/1160 Link: https://github.com/llvm/llvm-project/commits/2288319733cd5f525bf7e24dece08bfcf9d0ff9e Link: https://groups.google.com/g/clang-built-linux/c/ayNmi3HoNdY/m/XJAGj_G2AgAJ Link: https://lore.kernel.org/r/20210518205858.2440344-1-ndesaulniers@google.com
2021-05-23powerpc/powernv/pci: fix header guardNick Desaulniers
While looking at -Wundef warnings, the #if CONFIG_EEH stood out as a possible candidate to convert to #ifdef CONFIG_EEH. It seems that based on Kconfig dependencies it's not possible to build this file without CONFIG_EEH enabled, but based on upstream discussion, it's not clear yet that CONFIG_EEH should be enabled by default. For now, simply fix the -Wundef warning. Suggested-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/570 Link: https://lore.kernel.org/lkml/67f6cd269684c9aa8463ff4812c3b4605e6739c3.camel@perches.com/ Link: https://lore.kernel.org/lkml/CAOSf1CGoN5R0LUrU=Y=UWho1Z_9SLgCX8s3SbFJXwJXc5BYz4A@mail.gmail.com/ Link: https://lore.kernel.org/r/20210518204044.2390064-1-ndesaulniers@google.com
2021-05-23powerpc/sstep: Add tests for setb instructionSathvika Vasireddy
This adds selftests for setb instruction. Signed-off-by: Sathvika Vasireddy <sathvika@linux.vnet.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b05b61ccb5f10279d46fed490796f32ea2ccc270.1620727160.git.sathvika@linux.vnet.ibm.com
2021-05-23powerpc/sstep: Add emulation support for ‘setb’ instructionSathvika Vasireddy
This adds emulation support for the following instruction: * Set Boolean (setb) Signed-off-by: Sathvika Vasireddy <sathvika@linux.vnet.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7b735b0c898da0db2af8628a64df2f5114596f22.1620727160.git.sathvika@linux.vnet.ibm.com
2021-05-23powerpc/Makefile: Add ppc32/ppc64_randconfig targetsMichael Ellerman
Make it easier to generate a 32 or 64-bit specific randconfig. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Requested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210428132700.3426100-1-mpe@ellerman.id.au
2021-05-23powerpc/pseries: minor enhancements in dlpar_memory_remove_by_ic()Daniel Henrique Barboza
We don't need the 'lmbs_available' variable to count the valid LMBs and to check if we have less than 'lmbs_to_remove'. We must ensure that the entire LMB range must be removed, so we can error out immediately if any LMB in the range is marked as reserved. Add a couple of comments explaining the reasoning behind the differences we have in this function in contrast to what it is done in its sister function, dlpar_memory_remove_by_count(). Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210512202809.95363-5-danielhb413@gmail.com
2021-05-23powerpc/pseries: break early in dlpar_memory_remove_by_count() loopsDaniel Henrique Barboza
After marking the LMBs as reserved depending on dlpar_remove_lmb() rc, we evaluate whether we need to add the LMBs back or if we can release the LMB DRCs. In both cases, a for_each_drmem_lmb() loop without a break condition is used. This means that we're going to cycle through all LMBs of the partition even after we're done with what we were going to do. This patch adds break conditions in both loops to avoid this. The 'lmbs_removed' variable was renamed to 'lmbs_reserved', and it's now being decremented each time a lmb reservation is removed, indicating that the operation we're doing (adding back LMBs or releasing DRCs) is completed. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210512202809.95363-4-danielhb413@gmail.com
2021-05-23powerpc/pseries: check DRCONF_MEM_RESERVED in lmb_is_removable()Daniel Henrique Barboza
DRCONF_MEM_RESERVED is a flag that represents the "Reserved Memory" status in LOPAR v2.10, section 4.2.8. If a LMB is marked as reserved, quoting LOPAR, "is not to be used or altered by the base OS". This flag is read only in the kernel, being set by the firmware/hypervisor in the DT. As an example, QEMU will set this flag in hw/ppc/spapr.c, spapr_dt_dynamic_memory(). lmb_is_removable() does not check for DRCONF_MEM_RESERVED. This function is used in dlpar_remove_lmb() as a guard before the removal logic. Since it is failing to check for !RESERVED, dlpar_remove_lmb() will fail in a later stage instead of failing in the validation when receiving a reserved LMB as input. lmb_is_removable() is also used in dlpar_memory_remove_by_count() to evaluate if we have enough LMBs to complete the request. The missing !RESERVED check in this case is causing dlpar_memory_remove_by_count() to miscalculate the number of elegible LMBs for the removal, and can make it error out later on instead of failing in the validation with the 'not enough LMBs to satisfy request' message. Making a DRCONF_MEM_RESERVED check in lmb_is_removable() fixes all these issues. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210512202809.95363-3-danielhb413@gmail.com