path: root/arch/powerpc/kvm/book3s_64_mmu_hv.c
Age  Commit message  Author
2024-10-25  KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV  (Sean Christopherson)
Replace Book3s HV's homebrewed fault-in logic with __kvm_faultin_pfn(), which functionally does pretty much the exact same thing. Note, when the code was written, KVM indeed didn't do fast GUP without "!atomic && !async", but that has long since changed (KVM tries fast GUP for all writable mappings). Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-61-seanjc@google.com>
2024-10-25  KVM: Drop unused "hva" pointer from __gfn_to_pfn_memslot()  (Sean Christopherson)
Drop @hva from __gfn_to_pfn_memslot() now that all callers pass NULL. No functional change intended. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-19-seanjc@google.com>
2024-10-25  KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs  (Sean Christopherson)
Drop @atomic from the myriad "to_pfn" APIs now that all callers pass "false", and remove a comment blurb about KVM running only the "GUP fast" part in atomic context. No functional change intended. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-13-seanjc@google.com>
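A rough before/after of the prototype implied by these two cleanups (parameter list reconstructed from the commit text; treat it as an approximation rather than the exact tree state):

    /* Before (approximate): callers had to pass placeholder arguments. */
    kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
                                   bool atomic, bool interruptible, bool *async,
                                   bool write_fault, bool *writable, hva_t *hva);

    /* After: @hva (all callers passed NULL) and @atomic (all callers
     * passed false) are gone; no functional change intended. */
    kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
                                   bool interruptible, bool *async,
                                   bool write_fault, bool *writable);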
2024-04-11  KVM: delete .change_pte MMU notifier callback  (Paolo Bonzini)
The .change_pte() MMU notifier callback was intended as an optimization. The original point of it was that KSM could tell KVM to flip its secondary PTE to a new location without having to first zap it. At the time there was also an .invalidate_page() callback; both of them were *not* bracketed by calls to mmu_notifier_invalidate_range_{start,end}(), and .invalidate_page() also doubled as a fallback implementation of .change_pte(). Later on, however, both callbacks were changed to occur within an invalidate_range_start/end() block. In the case of .change_pte(), commit 6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with invalidate_range_start and invalidate_range_end", 2012-10-09) did so to remove the fallback from .invalidate_page() to .change_pte() and allow sleepable .invalidate_page() hooks. This however made KVM's usage of the .change_pte() callback completely moot, because KVM unmaps the sPTEs during .invalidate_range_start() and therefore .change_pte() has no hope of finding a sPTE to change. Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as well as all the architecture specific implementations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Anup Patel <anup@brainfault.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by: Bibo Mao <maobibo@loongson.cn> Message-ID: <20240405115815.3226315-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-14  KVM: PPC: Book3s HV: Hold LPIDs in an unsigned long  (Jordan Niethe)
The LPID register is 32 bits long. The host keeps the lpids for each guest in an unsigned word in struct kvm_arch. Currently, LPIDs are already limited by mmu_lpid_bits and KVM_MAX_NESTED_GUESTS_SHIFT. The nestedv2 API returns a 64 bit "Guest ID" to be used by the L1 host for each L2 guest. This value is used as an lpid, e.g. it is the parameter used by H_RPT_INVALIDATE. To minimize needless special casing it makes sense to keep this "Guest ID" in struct kvm_arch::lpid. This means that struct kvm_arch::lpid is too small, so prepare for this and make it an unsigned long. This is not a problem for the KVM-HV and nestedv1 cases as their lpid values are already limited to valid ranges, so in those contexts the lpid can be used as an unsigned word safely as needed. In the PAPR, the H_RPT_INVALIDATE pid/lpid parameter is already specified as an unsigned long, so change pseries_rpt_invalidate() to match that. Update the callers of pseries_rpt_invalidate() to also take an unsigned long if they take an lpid value. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230914030600.16993-10-jniethe5@gmail.com
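A minimal sketch of the type change described above (illustrative only; the real struct kvm_arch has many more members):

    struct kvm_arch {
            /* ... */
            unsigned long lpid;     /* was an unsigned int ("unsigned word");
                                     * now wide enough to also hold a
                                     * nestedv2 64-bit Guest ID */
            /* ... */
    };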
2023-09-14  KVM: PPC: Book3S HV: Introduce low level MSR accessor  (Jordan Niethe)
kvmppc_get_msr() and kvmppc_set_msr_fast() serve as accessors for the MSR. However, because the MSR is kept in the shared regs they include a conditional check for kvmppc_shared_big_endian() and endian conversion. Within the Book3S HV specific code there are direct reads and writes of shregs::msr. In preparation for Nested APIv2 these accesses need to be replaced with accessor functions so it is possible to extend their behavior. However, using the kvmppc_get_msr() and kvmppc_set_msr_fast() functions is undesirable because it would introduce a conditional branch and endian conversion that is not currently present. kvmppc_set_msr_hv() already exists; it is used for the kvmppc_ops::set_msr callback. Introduce a low level accessor __kvmppc_{s,g}et_msr_hv() that simply gets and sets shregs::msr. This will be extended for Nested APIv2 support. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230914030600.16993-8-jniethe5@gmail.com
2023-08-16  powerpc: Make virt_to_pfn() a static inline  (Linus Walleij)
Making virt_to_pfn() a static inline taking a strongly typed (const void *) makes the contract of passing a pointer of that type to the function explicit and exposes any misuse of the macro virt_to_pfn() acting polymorphic and accepting many types such as (void *), (uintptr_t) or (unsigned long) as arguments without warnings. Move the virt_to_pfn() and related functions below the declaration of __pa() so it compiles. For symmetry do the same with pfn_to_kaddr(). As the file is included right into the linker file, we need to surround the functions with ifndef __ASSEMBLY__ so we don't cause compilation errors. The conversion moreover exposes the fact that pmd_page_vaddr() was returning an unsigned long rather than a const void * as could be expected, so all the sites defining pmd_page_vaddr() had to be augmented as well. Finally the KVM code in book3s_64_mmu_hv.c was passing an unsigned int to virt_to_phys(), so fix that up with a cast so the result compiles. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> [mpe: Fixup kfence.h, simplify pfn_to_kaddr() & pmd_page_vaddr()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230809-virt-to-phys-powerpc-v1-1-12e912a7d439@linaro.org
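The conversion it describes has roughly this shape (a sketch based on the commit text; the real helper lives in arch/powerpc/include/asm/page.h):

    /* Old: a macro, silently accepting void *, unsigned long, uintptr_t, ... */
    #define virt_to_pfn(kaddr)      (__pa(kaddr) >> PAGE_SHIFT)

    /* New: a static inline with a strongly typed parameter, so passing
     * anything other than a pointer now produces a compiler warning. */
    static inline unsigned long virt_to_pfn(const void *kaddr)
    {
            return __pa(kaddr) >> PAGE_SHIFT;
    }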
2023-05-01  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)
Pull kvm updates from Paolo Bonzini: "s390: - More phys_to_virt conversions - Improvement of AP management for VSIE (nested virtualization) ARM64: - Numerous fixes for the pathological lock inversion issue that plagued KVM/arm64 since... forever. - New framework allowing SMCCC-compliant hypercalls to be forwarded to userspace, hopefully paving the way for some more features being moved to VMMs rather than be implemented in the kernel. - Large rework of the timer code to allow a VM-wide offset to be applied to both virtual and physical counters as well as a per-timer, per-vcpu offset that complements the global one. This last part allows the NV timer code to be implemented on top. - A small set of fixes to make sure that we don't change anything affecting the EL1&0 translation regime just after having taken an exception to EL2 until we have executed a DSB. This ensures that speculative walks started in EL1&0 have completed. - The usual selftest fixes and improvements. x86: - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled, and by giving the guest control of CR0.WP when EPT is enabled on VMX (VMX-only because SVM doesn't support per-bit controls) - Add CR0/CR4 helpers to query single bits, and clean up related code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return as a bool - Move AMD_PSFD to cpufeatures.h and purge KVM's definition - Avoid unnecessary writes+flushes when the guest is only adding new PTEs - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations when emulating invalidations - Clean up the range-based flushing APIs - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle changed SPTE" overhead associated with writing the entire entry - Track the number of "tail" entries in a pte_list_desc to avoid having to walk (potentially) all descriptors during insertion and deletion, which gets quite expensive if the guest is spamming fork() - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features - Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest - AMD SVM: - Add support for virtual NMIs - Fixes for edge cases related to virtual interrupts - Intel AMX: - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is not being reported due to userspace not opting in via prctl() - Fix a bug in emulation of ENCLS in compatibility mode - Allow emulation of NOP and PAUSE for L2 - AMX selftests improvements - Misc cleanups MIPS: - Constify MIPS's internal callbacks (a leftover from the hardware enabling rework that landed in 6.3) Generic: - Drop unnecessary casts from "void *" throughout kvm_main.c - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct size by 8 bytes on 64-bit kernels by utilizing a padding hole Documentation: - Fix goof introduced by the conversion to rST" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits) KVM: s390: pci: fix virtual-physical confusion on module unload/load KVM: s390: vsie: clarifications on setting the APCB KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init() KVM: selftests: Test the PMU event "Instructions retired" KVM: selftests: Copy full counter values from guest in PMU event filter test KVM: selftests: Use error codes to signal errors in PMU event filter test KVM: selftests: Print detailed info in PMU event filter asserts KVM: selftests: Add helpers for PMC asserts in PMU event filter test KVM: selftests: Add a common helper for the PMU event filter guest code KVM: selftests: Fix spelling mistake "perrmited" -> "permitted" KVM: arm64: vhe: Drop extra isb() on guest exit KVM: arm64: vhe: Synchronise with page table walker on MMU update KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc() KVM: arm64: nvhe: Synchronise with page table walker on TLBI KVM: arm64: Handle 32bit CNTPCTSS traps KVM: arm64: nvhe: Synchronise with page table walker on vcpu run KVM: arm64: vgic: Don't acquire its_lock before config_lock KVM: selftests: Add test to verify KVM's supported XCR0 ...
2023-04-03  KVM: PPC: Fetch prefixed instructions from the guest  (Paul Mackerras)
In order to handle emulation of prefixed instructions in the guest, this first makes vcpu->arch.last_inst be an unsigned long, i.e. 64 bits on 64-bit platforms. For prefixed instructions, the upper 32 bits are used for the prefix and the lower 32 bits for the suffix, and both halves are byte-swapped if the guest endianness differs from the host. Next, vcpu->arch.emul_inst is now 64 bits wide, to match the HEIR register on POWER10. Like HEIR, for a prefixed instruction it is defined to have the prefix in the top 32 bits and the suffix in the bottom 32 bits, with both halves in the correct byte order. kvmppc_get_last_inst is extended on 64-bit machines to put the prefix and suffix in the right places in the ppc_inst_t being returned. kvmppc_load_last_inst now returns the instruction in an unsigned long in the same format as vcpu->arch.last_inst. It makes the decision about whether to fetch a suffix based on the SRR1_PREFIXED bit in the MSR image stored in the vcpu struct, which generally comes from SRR1 or HSRR1 on an interrupt. This bit is defined in Power ISA v3.1B to be set if the interrupt occurred due to a prefixed instruction and cleared otherwise for all interrupts except for instruction storage interrupt, which does not come to the hypervisor. It is set to zero for asynchronous interrupts such as external interrupts. In previous ISA versions it was always set to 0 for all interrupts except instruction storage interrupt. The code in book3s_hv_rmhandlers.S that loads the faulting instruction on a HDSI is only used on POWER8 and therefore doesn't ever need to load a suffix. [npiggin@gmail.com - check that the is-prefixed bit in SRR1 matches the type of instruction that was fetched.] Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/ZAgsq9h1CCzouQuV@cleo
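The packing described for vcpu->arch.last_inst and emul_inst boils down to something like this (hypothetical helper name; each 32-bit half is assumed to already be in the correct byte order):

    /* Prefix in the top 32 bits, suffix in the bottom 32 bits. */
    static inline u64 pack_prefixed_inst(u32 prefix, u32 suffix)
    {
            return ((u64)prefix << 32) | suffix;
    }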
2023-04-03  KVM: PPC: Make kvmppc_get_last_inst() produce a ppc_inst_t  (Paul Mackerras)
This changes kvmppc_get_last_inst() so that the instruction it fetches is returned in a ppc_inst_t variable rather than a u32. This will allow us to return a 64-bit prefixed instruction on those 64-bit machines that implement Power ISA v3.1 or later, such as POWER10. On 32-bit platforms, ppc_inst_t is 32 bits wide, and is turned back into a u32 by ppc_inst_val, which is an identity operation on those platforms. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/ZAgsiPlL9O7KnlZZ@cleo
2023-03-16  KVM: PPC: Standardize on "int" return types in the powerpc KVM code  (Thomas Huth)
Most functions that are related to kvm_arch_vm_ioctl() already use "int" as return type to pass error values back to the caller. Some outlier functions use "long" instead for no good reason (they do not really require long values here). Let's standardize on "int" here to avoid casting the values back and forth between the two types. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230208140105.655814-2-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-19  Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds)
Pull powerpc updates from Michael Ellerman: - Add powerpc qspinlock implementation optimised for large system scalability and paravirt. See the merge message for more details - Enable objtool to be built on powerpc to generate mcount locations - Use a temporary mm for code patching with the Radix MMU, so the writable mapping is restricted to the patching CPU - Add an option to build the 64-bit big-endian kernel with the ELFv2 ABI - Sanitise user registers on interrupt entry on 64-bit Book3S - Many other small features and fixes Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang, Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A. R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng, XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu, and Wolfram Sang. * tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits) powerpc/code-patching: Fix oops with DEBUG_VM enabled powerpc/qspinlock: Fix 32-bit build powerpc/prom: Fix 32-bit build powerpc/rtas: mandate RTAS syscall filtering powerpc/rtas: define pr_fmt and convert printk call sites powerpc/rtas: clean up includes powerpc/rtas: clean up rtas_error_log_max initialization powerpc/pseries/eeh: use correct API for error log size powerpc/rtas: avoid scheduling in rtas_os_term() powerpc/rtas: avoid device tree lookups in rtas_os_term() powerpc/rtasd: use correct OF API for event scan rate powerpc/rtas: document rtas_call() powerpc/pseries: unregister VPA when hot unplugging a CPU powerpc/pseries: reset the RCU watchdogs after a LPM powerpc: Take in account addition CPU node when building kexec FDT powerpc: export the CPU node count powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state powerpc/dts/fsl: Fix pca954x i2c-mux node names cxl: Remove unnecessary cxl_pci_window_alignment() selftests/powerpc: Fix resource leaks ...
2022-11-24  KVM: PPC: Use __func__ to get function's name  (XueBing Chen)
Prefer using '"%s...", __func__' to get current function's name in output messages. Signed-off-by: XueBing Chen <chenxuebing@jari.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/13b2c857.beb.181725bad35.Coremail.chenxuebing@jari.cn
2022-11-09  kvm: Add interruptible flag to __gfn_to_pfn_memslot()  (Peter Xu)
Add a new "interruptible" flag showing that the caller is willing to be interrupted by signals during the __gfn_to_pfn_memslot() request. Wire it up with a FOLL_INTERRUPTIBLE flag that we've just introduced. This prepares KVM to be able to respond to SIGUSR1 (for QEMU that's the SIGIPI) even during e.g. handling an userfaultfd page fault. No functional change intended. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221011195809.557016-4-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-19  KVM: Rename mmu_notifier_* to mmu_invalidate_*  (Chao Peng)
The motivation for this renaming is to make these variables and related helper functions less mmu_notifier-bound, so they can also be used for non-mmu_notifier-based page invalidation. mmu_invalidate_* was chosen to better describe the purpose of 'invalidating' a page that those variables are used for. - mmu_notifier_seq/range_start/range_end are renamed to mmu_invalidate_seq/range_start/range_end. - mmu_notifier_retry{_hva} helper functions are renamed to mmu_invalidate_retry{_hva}. - mmu_notifier_count is renamed to mmu_invalidate_in_progress to avoid confusion with mn_active_invalidate_count. - While here, also update kvm_inc/dec_notifier_count() to kvm_mmu_invalidate_begin/end() to match the change for mmu_notifier_count. No functional change intended. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-19  Merge branch 'topic/ppc-kvm' into next  (Michael Ellerman)
Merge our KVM topic branch.
2022-05-18  KVM: PPC: Book3S HV: Use consistent type for return value of kvm_age_rmapp()  (Bo Liu)
The return type in the declaration of kvm_age_rmapp() is "bool", but the implementation of the function returns an "int". Change the implementation's return value type to "bool". Signed-off-by: Bo Liu <liubo03@inspur.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220401065252.36472-1-liubo03@inspur.com
2022-05-13  KVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDS  (Nicholas Piggin)
KVMPPC_NR_LPIDS no longer represents any size restriction on the LPID space and can be removed. A CPU with more than 12 LPID bits implemented will now be able to create more than 4095 guests. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-7-npiggin@gmail.com
2022-05-13  KVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum  (Nicholas Piggin)
Rather than tie this to KVMPPC_NR_LPIDS which is becoming more dynamic, fix it to 4096 (12-bits) explicitly for now. kvmhv_get_nested() does not have to check against KVM_MAX_NESTED_GUESTS because the L1 partition table registration hcall already did that, and it checks against the partition table size. This patch also puts all the partition table size calculations into the same form, using 12 for the architected size field shift and 4 for the shift corresponding to the partition table entry size. Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-6-npiggin@gmail.com
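As a worked example of the two shifts mentioned above (function and parameter names are hypothetical): the architected size field encodes a table of 2^(12 + field) bytes, and each partition-table entry is 2^4 = 16 bytes, so:

    /* Number of partition-table entries for a given size field value:
     * 2^(12 + size_field) bytes / 2^4 bytes per entry. */
    static inline unsigned long patb_entries(unsigned long size_field)
    {
            return 1UL << (size_field + 12 - 4);
    }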
2022-05-13  KVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested  (Nicholas Piggin)
The LPID allocator init is changed to: - use mmu_lpid_bits rather than hard-coding; - use KVM_MAX_NESTED_GUESTS for nested hypervisors; - not reserve the top LPID on POWER9 and newer CPUs. The reserved LPID is made a POWER7/8-specific detail. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-3-npiggin@gmail.com
2022-05-13  KVM: PPC: Remove kvmppc_claim_lpid  (Nicholas Piggin)
Removing kvmppc_claim_lpid makes the lpid allocator API a bit simpler, which will make it easier to change the underlying implementation in a future patch. The host LPID is always 0, so that can be a detail of the allocator. If the allocator range is restricted, that can reserve LPIDs at the top of the range. This allows kvmppc_claim_lpid to be removed. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-2-npiggin@gmail.com
2022-05-05  powerpc: fix typos in comments  (Julia Lawall)
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
2022-02-02  KVM: PPC: Merge powerpc's debugfs entry content into generic entry  (Alexey Kardashevskiy)
At the moment KVM on PPC creates 4 types of entries under the kvm debugfs: 1) "%pid-%fd" per a KVM instance (for all platforms); 2) "vm%pid" (for PPC Book3s HV KVM); 3) "vm%u_vcpu%u_timing" (for PPC Book3e KVM); 4) "kvm-xive-%p" (for XIVE PPC Book3s KVM, the same for XICS). The problem with this is that multiple VMs per process is not allowed for 2) and 3), which makes it possible for userspace to trigger errors when creating duplicated debugfs entries. This merges all these into 1). This defines kvm_arch_create_kvm_debugfs() similar to kvm_arch_create_vcpu_debugfs(). This defines 2 hooks in kvmppc_ops that allow specific KVM implementations to add necessary entries; this adds the _e500 suffix to kvmppc_create_vcpu_debugfs_e500() to make it clear what platform it is for. This makes use of the already existing kvm_arch_create_vcpu_debugfs() on PPC. This removes the no longer used debugfs_dir pointers from PPC kvm_arch structs. This stops removing vcpu entries as once created vcpus stay around for the entire life of a VM and are removed when the KVM instance is closed, see commit d56f5136b010 ("KVM: let kvm_destroy_vm_debugfs clean up vCPU debugfs directories"). Suggested-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220111005404.162219-1-aik@ozlabs.ru
2021-12-08  KVM: Keep memslots in tree-based structures instead of array-based ones  (Maciej S. Szmigiero)
The current memslot code uses a (reverse gfn-ordered) memslot array for keeping track of them. Because the memslot array that is currently in use cannot be modified, every memslot management operation (create, delete, move, change flags) has to make a copy of the whole array so it has a scratch copy to work on. Strictly speaking, however, it is only necessary to make a copy of the memslot that is being modified; copying all the memslots currently present is just a limitation of the array-based memslot implementation. Two memslot sets, however, are still needed so the VM continues to run on the currently active set while the requested operation is being performed on the second, currently inactive one. In order to have two memslot sets, but only one copy of actual memslots, it is necessary to split out the memslot data from the memslot sets. The memslots themselves should be also kept independent of each other so they can be individually added or deleted. These two memslot sets should normally point to the same set of memslots. They can, however, be desynchronized when performing a memslot management operation by replacing the memslot to be modified by its copy. After the operation is complete, both memslot sets once again point to the same, common set of memslot data. This commit implements the aforementioned idea. For tracking of gfns an ordinary rbtree is used since memslots cannot overlap in the guest address space and so this data structure is sufficient for ensuring that lookups are done quickly. The "last used slot" mini-caches (both the per-slot-set one and the per-vCPU one), that keep track of the last found-by-gfn memslot, are still present in the new code. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <17c0cf3663b760a0d3753d4ac08c0753e941b811.1638817641.git.maciej.szmigiero@oracle.com>
2021-05-12  KVM: PPC: Book3S HV: Fix kvm_unmap_gfn_range_hv() for Hash MMU  (Michael Ellerman)
Commit 32b48bf8514c ("KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks") fixed kvm_unmap_gfn_range_hv() by adding a for loop over each gfn in the range. But for the Hash MMU it repeatedly calls kvm_unmap_rmapp() with the first gfn of the range, rather than iterating through the range. This exhibits as strange guest behaviour, sometimes crashing in firmware, or booting and then guest userspace crashing unexpectedly. Fix it by passing the iterator, gfn, to kvm_unmap_rmapp(). Fixes: 32b48bf8514c ("KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210511105459.800788-1-mpe@ellerman.id.au
2021-05-06  KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks  (Nicholas Piggin)
Commit b1c5356e873c ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks") causes unmap_gfn_range and age_gfn callbacks to only work on the first gfn in the range. It also makes the aging callbacks call into both radix and hash aging functions for radix guests. Fix this. Add warnings for the single-gfn calls that have been converted to range callbacks, in case they ever receive ranges greater than 1. Fixes: b1c5356e873c ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks") Reported-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210505121509.1470207-1-npiggin@gmail.com
2021-04-17  KVM: PPC: Convert to the gfn-based MMU notifier callbacks  (Sean Christopherson)
Move PPC to the gfn-based MMU notifier APIs, and update all 15 bajillion PPC-internal hooks to work with gfns instead of hvas. No meaningful functional change intended, though the exact order of operations is slightly different since the memslot lookups occur before calling into arch code. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210402005658.3024832-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-26  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)
Pull more KVM updates from Paolo Bonzini: "x86: - take into account HVA before retrying on MMU notifier race - fixes for nested AMD guests without NPT - allow INVPCID in guest without PCID - disable PML in hardware when not in use - MMU code cleanups: * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) KVM: SVM: Fix nested VM-Exit on #GP interception handling KVM: vmx/pmu: Fix dummy check if lbr_desc->event is created KVM: x86/mmu: Consider the hva in mmu_notifier retry KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault KVM: Documentation: rectify rst markup in KVM_GET_SUPPORTED_HV_CPUID KVM: nSVM: prepare guest save area while is_guest_mode is true KVM: x86/mmu: Remove a variety of unnecessary exports KVM: x86: Fold "write-protect large" use case into generic write-protect KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging KVM: x86: Further clarify the logic and comments for toggling log dirty KVM: x86: Move MMU's PML logic to common code KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect() KVM: nVMX: Disable PML in hardware when running L2 KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs KVM: x86/mmu: Pass the memslot to the rmap callbacks KVM: x86/mmu: Split out max mapping level calculation to helper KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages KVM: nVMX: no need to undo inject_page_fault change on nested vmexit ...
2021-02-22  KVM: x86/mmu: Consider the hva in mmu_notifier retry  (David Stevens)
Track the range being invalidated by mmu_notifier and skip page fault retries if the fault address is not affected by the in-progress invalidation. Handle concurrent invalidations by finding the minimal range which includes all ranges being invalidated. Although the combined range may include unrelated addresses and cannot be shrunk as individual invalidation operations complete, it is unlikely the marginal gains of proper range tracking are worth the additional complexity. The primary benefit of this change is the reduction in the likelihood of extreme latency when handling a page fault due to another thread having been preempted while modifying host virtual addresses. Signed-off-by: David Stevens <stevensd@chromium.org> Message-Id: <20210222024522.1751719-3-stevensd@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
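The "minimal covering range" bookkeeping it describes can be sketched like this (struct and function names are hypothetical, not the kernel's actual fields):

    struct inv_range {
            int count;                      /* invalidations in progress */
            unsigned long start, end;       /* minimal range covering them all */
    };

    static void inv_range_begin(struct inv_range *r,
                                unsigned long start, unsigned long end)
    {
            if (r->count++ == 0) {
                    r->start = start;
                    r->end = end;
            } else {
                    /* Concurrent invalidations: widen to the minimal superset;
                     * the range never shrinks while any are still in flight. */
                    r->start = start < r->start ? start : r->start;
                    r->end = end > r->end ? end : r->end;
            }
    }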
2021-01-30  KVM: PPC: Book3S HV: Include prototypes  (Cédric Le Goater)
It fixes these W=1 compile errors : CC [M] arch/powerpc/kvm/book3s_64_mmu_hv.o ../arch/powerpc/kvm/book3s_64_mmu_hv.c:879:5: error: no previous prototype for ‘kvm_unmap_hva_range_hv’ [-Werror=missing-prototypes] 879 | int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, unsigned long end) | ^~~~~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_64_mmu_hv.c:888:6: error: no previous prototype for ‘kvmppc_core_flush_memslot_hv’ [-Werror=missing-prototypes] 888 | void kvmppc_core_flush_memslot_hv(struct kvm *kvm, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_64_mmu_hv.c:970:5: error: no previous prototype for ‘kvm_age_hva_hv’ [-Werror=missing-prototypes] 970 | int kvm_age_hva_hv(struct kvm *kvm, unsigned long start, unsigned long end) | ^~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_64_mmu_hv.c:1011:5: error: no previous prototype for ‘kvm_test_age_hva_hv’ [-Werror=missing-prototypes] 1011 | int kvm_test_age_hva_hv(struct kvm *kvm, unsigned long hva) | ^~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_64_mmu_hv.c:1019:6: error: no previous prototype for ‘kvm_set_spte_hva_hv’ [-Werror=missing-prototypes] 1019 | void kvm_set_spte_hva_hv(struct kvm *kvm, unsigned long hva, pte_t pte) | ^~~~~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-20-clg@kaod.org
2020-07-21  KVM: PPC: Book3S HV: Increase KVMPPC_NR_LPIDS on POWER8 and POWER9  (Cédric Le Goater)
POWER8 and POWER9 have 12-bit LPIDs. Change LPID_RSVD to support up to (4096 - 2) guests on these processors. POWER7 is kept the same with a limitation of (1024 - 2), but it might be time to drop KVM support for POWER7. Tested with 2048 guests * 4 vCPUs on a witherspoon system with 512G RAM and a bit of swap. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-06-12  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)
Pull more KVM updates from Paolo Bonzini: "The guest side of the asynchronous page fault work has been delayed to 5.9 in order to sync with Thomas's interrupt entry rework, but here's the rest of the KVM updates for this merge window. MIPS: - Loongson port PPC: - Fixes ARM: - Fixes x86: - KVM_SET_USER_MEMORY_REGION optimizations - Fixes - Selftest fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (62 commits) KVM: x86: do not pass poisoned hva to __kvm_set_memory_region KVM: selftests: fix sync_with_host() in smm_test KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected KVM: async_pf: Cleanup kvm_setup_async_pf() kvm: i8254: remove redundant assignment to pointer s KVM: x86: respect singlestep when emulating instruction KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check KVM: nVMX: Consult only the "basic" exit reason when routing nested exit KVM: arm64: Move hyp_symbol_addr() to kvm_asm.h KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts KVM: arm64: Remove host_cpu_context member from vcpu structure KVM: arm64: Stop sparse from moaning at __hyp_this_cpu_ptr KVM: arm64: Handle PtrAuth traps early KVM: x86: Unexport x86_fpu_cache and make it static KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K KVM: arm64: Save the host's PtrAuth keys in non-preemptible context KVM: arm64: Stop save/restoring ACTLR_EL1 KVM: arm64: Add emulation for 32bit guests accessing ACTLR2 ...
2020-06-08  mm/gup.c: convert to use get_user_{page|pages}_fast_only()  (Souptick Joarder)
API __get_user_pages_fast() renamed to get_user_pages_fast_only() to align with pin_user_pages_fast_only(). As part of this we will get rid of the write parameter. Instead, callers will pass FOLL_WRITE to get_user_pages_fast_only(). This will not change any existing functionality of the API. All the callers are changed to pass FOLL_WRITE. Also introduce get_user_page_fast_only(), and use it in a few places that hard-code nr_pages to 1. Updated the documentation of the API. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> [arch/powerpc/kvm] Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michal Suchanek <msuchanek@suse.de> Link: http://lkml.kernel.org/r/1590396812-31277-1-git-send-email-jrdr.linux@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-27  KVM: PPC: Clean up redundant 'kvm_run' parameters  (Tianjia Zhang)
In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu' structure. For historical reasons, many kvm-related functions retain both 'kvm_run' and 'kvm_vcpu' parameters at the same time. This patch does a unified cleanup of these remaining redundant parameters. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-05-05  powerpc/kvm/book3s: Use find_kvm_host_pte in h_enter  (Aneesh Kumar K.V)
Since kvmppc_do_h_enter can get called in realmode, use the low level arch_spin_lock, which is safe to be called in realmode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-15-aneesh.kumar@linux.ibm.com
2020-05-05  powerpc/kvm/book3s: Use find_kvm_host_pte in page fault handler  (Aneesh Kumar K.V)
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-14-aneesh.kumar@linux.ibm.com
2020-05-05  Merge tag 'kvm-ppc-fixes-5.7-1' into topic/ppc-kvm  (Michael Ellerman)
This brings in a fix from the kvm-ppc tree that was merged to mainline after rc2, and so isn't in the base of our topic branch. We'd like it in the topic branch because it interacts with patches we plan to carry in this branch.
2020-04-21  Merge tag 'kvm-ppc-fixes-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master  (Paolo Bonzini)
PPC KVM fix for 5.7 - Fix a regression introduced in the last merge window, which results in guests in HPT mode dying randomly.
2020-04-21  KVM: PPC: Book3S HV: Handle non-present PTEs in page fault functions  (Paul Mackerras)
Since cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler", it's been possible in fairly rare circumstances to load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a guest on a POWER8 host. Because that case wasn't checked for, we could misinterpret the non-present PTE as being a cache-inhibited PTE. That could mismatch with the corresponding hash PTE, which would cause the function to fail with -EFAULT a little further down. That would propagate up to the KVM_RUN ioctl() generally causing the KVM userspace (usually qemu) to fall over. This addresses the problem by catching that case and returning to the guest instead. For completeness, this fixes the radix page fault handler in the same way. For radix this didn't cause any obvious misbehaviour, because we ended up putting the non-present PTE into the guest's partition-scoped page tables, leading immediately to another hypervisor data/instruction storage interrupt, which would go through the page fault path again and fix things up. Fixes: cd758a9b57ee "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler" Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402 Reported-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-04-05  Merge tag 'powerpc-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds)
Pull powerpc updates from Michael Ellerman: "Slightly late as I had to rebase mid-week to insert a bug fix: - A large series from Nick for 64-bit to further rework our exception vectors, and rewrite portions of the syscall entry/exit and interrupt return in C. The result is much easier to follow code that is also faster in general. - Cleanup of our ptrace code to split various parts out that had become badly intertwined with #ifdefs over the years. - Changes to our NUMA setup under the PowerVM hypervisor which should hopefully avoid non-sensical topologies which can lead to warnings from the workqueue code and other problems. - MAINTAINERS updates to remove some of our old orphan entries and update the status of others. - Quite a few other small changes and fixes all over the map. Thanks to: Abdul Haleem, afzal mohammed, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Balamuruhan S, Cédric Le Goater, Chen Zhou, Christophe JAILLET, Christophe Leroy, Christoph Hellwig, Clement Courbet, Daniel Axtens, David Gibson, Douglas Miller, Fabiano Rosas, Fangrui Song, Ganesh Goudar, Gautham R. Shenoy, Greg Kroah-Hartman, Greg Kurz, Gustavo Luiz Duarte, Hari Bathini, Ilie Halip, Jan Kara, Joe Lawrence, Joe Perches, Kajol Jain, Larry Finger, Laurentiu Tudor, Leonardo Bras, Libor Pechacek, Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Masami Hiramatsu, Mauricio Faria de Oliveira, Michael Neuling, Michal Suchanek, Mike Rapoport, Nageswara R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Po-Hsu Lin, Pratik Rajesh Sampat, Rasmus Villemoes, Ravi Bangoria, Roman Bolshakov, Sam Bobroff, Sandipan Das, Santosh S, Sedat Dilek, Segher Boessenkool, Shilpasri G Bhat, Sourabh Jain, Srikar Dronamraju, Stephen Rothwell, Tyrel Datwyler, Vaibhav Jain, YueHaibing" * tag 'powerpc-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits) powerpc: Make setjmp/longjmp signature standard powerpc/cputable: Remove unnecessary copy of cpu_spec->oprofile_type powerpc: Suppress .eh_frame generation powerpc: Drop -fno-dwarf2-cfi-asm powerpc/32: drop unused ISA_DMA_THRESHOLD powerpc/powernv: Add documentation for the opal sensor_groups sysfs interfaces selftests/powerpc: Fix try-run when source tree is not writable powerpc/vmlinux.lds: Explicitly retain .gnu.hash powerpc/ptrace: move ptrace_triggered() into hw_breakpoint.c powerpc/ptrace: create ppc_gethwdinfo() powerpc/ptrace: create ptrace_get_debugreg() powerpc/ptrace: split out ADV_DEBUG_REGS related functions. powerpc/ptrace: move register viewing functions out of ptrace.c powerpc/ptrace: split out TRANSACTIONAL_MEM related functions. powerpc/ptrace: split out SPE related functions. powerpc/ptrace: split out ALTIVEC related functions. powerpc/ptrace: split out VSX related functions. powerpc/ptrace: drop PARAMETER_SAVE_AREA_OFFSET powerpc/ptrace: drop unnecessary #ifdefs CONFIG_PPC64 powerpc/ptrace: remove unused header includes ...
2020-03-19  KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler  (Paul Mackerras)
This makes the same changes in the page fault handler for HPT guests that commits 31c8b0d0694a ("KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot() in page fault handler", 2018-03-01), 71d29f43b633 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size", 2018-09-11) and 6579804c4317 ("KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault", 2018-10-04) made for the page fault handler for radix guests. In summary, where we used to call get_user_pages_fast() and then do special handling for VM_PFNMAP vmas, we now call __get_user_pages_fast() and then __gfn_to_pfn_memslot() if that fails, followed by reading the Linux PTE to get the host PFN, host page size and mapping attributes. This also brings in the change from SetPageDirty() to set_page_dirty_lock() which was done for the radix page fault handler in commit c3856aeb2940 ("KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler", 2018-02-23). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
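In outline, the reworked fault path reads something like the following pseudocode (argument lists elided; helper names are taken from the commit text, not verified against the tree):

    page = NULL;
    if (__get_user_pages_fast(hva, 1, writing, &page) != 1) {
            /* GUP-fast failed: fall back to the memslot-based lookup... */
            pfn = __gfn_to_pfn_memslot(/* memslot, gfn, ... */);
            /* ...then read the Linux PTE to get the host PFN, host page
             * size and mapping attributes. */
    } else {
            pfn = page_to_pfn(page);
    }
    /* When dirtying: set_page_dirty_lock(page) instead of SetPageDirty(). */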
2020-03-04  powerpc/kvm: no need to check return value of debugfs_create functions  (Greg Kroah-Hartman)
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Because of this cleanup, we get to remove a few fields in struct kvm_arch that are now unused. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [mpe: Fix build error in kvm/timing.c, adapt kvmppc_remove_cpu_debugfs()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200209105901.1620958-2-gregkh@linuxfoundation.org
2020-01-17  KVM: PPC: Book3S: Replace current->mm by kvm->mm  (Leonardo Bras)
Given that in kvm_create_vm() there is: kvm->mm = current->mm; And that on every kvm_*_ioctl we have: if (kvm->mm != current->mm) return -EIO; I see no reason to keep using current->mm instead of kvm->mm. By doing so, we would reduce the use of 'global' variables in the code, relying more on the contents of the kvm struct. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-11-01  Merge tag 'kvm-ppc-next-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD  (Paolo Bonzini)
KVM PPC update for 5.5 * Add capability to tell userspace whether we can single-step the guest. * Improve the allocation of XIVE virtual processor IDs, to reduce the risk of running out of IDs when running many VMs on POWER9. * Rewrite interrupt synthesis code to deliver interrupts in virtual mode when appropriate. * Minor cleanups and improvements.
2019-10-22  KVM: Add separate helper for putting borrowed reference to kvm  (Sean Christopherson)
Add a new helper, kvm_put_kvm_no_destroy(), to handle putting a borrowed reference[*] to the VM when installing a new file descriptor fails. KVM expects the refcount to remain valid in this case, as the in-progress ioctl() has an explicit reference to the VM. The primary motivation for the helper is to document that the 'kvm' pointer is still valid after putting the borrowed reference, e.g. to document that doing mutex_lock(&kvm->lock) immediately after putting a ref to kvm isn't broken. [*] When exposing a new object to userspace via a file descriptor, e.g. a new vcpu, KVM grabs a reference to itself (the VM) prior to making the object visible to userspace to avoid prematurely freeing the VM in the scenario where userspace immediately closes the file descriptor. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
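The pattern the helper documents looks roughly like this (create_thing_fd() is a hypothetical stand-in for whatever installs the new file descriptor):

    kvm_get_kvm(kvm);                  /* ref for the soon-to-exist fd */
    fd = create_thing_fd(kvm);         /* hypothetical */
    if (fd < 0)
            /* Installing the fd failed, but the in-progress ioctl() still
             * holds its own reference, so the VM must stay alive here. */
            kvm_put_kvm_no_destroy(kvm);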
2019-10-22  KVM: PPC: Book3S: Replace reset_msr mmu op with inject_interrupt arch op  (Nicholas Piggin)
reset_msr sets the MSR for interrupt injection, but it's cleaner and more flexible to provide a single op to set both MSR and PC for the interrupt. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22  KVM: PPC: Reduce calls to get current->mm by storing the value locally  (Leonardo Bras)
Reduces the number of calls to get_current() in order to get the value of current->mm by doing it once and storing the value, since it is not supposed to change inside the same process. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-06-05  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 266  (Thomas Gleixner)
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation 51 franklin street fifth floor boston ma 02110 1301 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 67 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141333.953658117@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-29  KVM: PPC: Book3S HV: Use new mutex to synchronize MMU setup  (Paul Mackerras)
Currently the HV KVM code uses kvm->lock in conjunction with a flag, kvm->arch.mmu_ready, to synchronize MMU setup and hold off vcpu execution until the MMU-related data structures are ready. However, this means that kvm->lock is being taken inside vcpu->mutex, which is contrary to Documentation/virtual/kvm/locking.txt and results in lockdep warnings. To fix this, we add a new mutex, kvm->arch.mmu_setup_lock, which nests inside the vcpu mutexes, and is taken in the places where kvm->lock was taken that are related to MMU setup. Additionally we take the new mutex in the vcpu creation code at the point where we are creating a new vcore, in order to provide mutual exclusion with kvmppc_update_lpcr() and ensure that an update to kvm->arch.lpcr doesn't get missed, which could otherwise lead to a stale vcore->lpcr value. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-05-14  mm/gup: change GUP fast to use flags rather than a write 'bool'  (Ira Weiny)
To facilitate additional options to get_user_pages_fast() change the singular write parameter to be gup_flags. This patch does not change any functionality. New functionality will follow in subsequent patches. Some of the get_user_pages_fast() call sites were unchanged because they already passed FOLL_WRITE or 0 for the write parameter. NOTE: It was suggested to change the ordering of the get_user_pages_fast() arguments to ensure that callers were converted. This breaks the current GUP call site convention of having the returned pages be the final parameter. So the suggestion was rejected. Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marshall <hubcap@omnibond.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
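A before/after view of a typical call site affected by this conversion (illustrative values):

    /* Before: the third parameter was a write bool. */
    ret = get_user_pages_fast(start, 1, 1, pages);

    /* After: callers pass gup_flags; FOLL_WRITE replaces write = 1,
     * and 0 replaces write = 0. */
    ret = get_user_pages_fast(start, 1, FOLL_WRITE, pages);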