summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm/arm.c
AgeCommit message (Collapse)Author
2022-05-04KVM: arm64: Rename the KVM_REQ_SLEEP handlerOliver Upton
The naming of the kvm_req_sleep function is confusing: the function itself sleeps the vCPU, it does not request such an event. Rename the function to make its purpose more clear. No functional change intended. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-5-oupton@google.com
2022-05-04KVM: arm64: Track vCPU power state using MP state valuesOliver Upton
A subsequent change to KVM will add support for additional power states. Store the MP state by value rather than keeping track of it as a boolean. No functional change intended. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-4-oupton@google.com
2022-05-04KVM: arm64: Dedupe vCPU power off helpersOliver Upton
vcpu_power_off() and kvm_psci_vcpu_off() are equivalent; rename the former and replace all callsites to the latter. No functional change intended. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-3-oupton@google.com
2022-05-03KVM: arm64: Setup a framework for hypercall bitmap firmware registersRaghavendra Rao Ananta
KVM regularly introduces new hypercall services to the guests without any consent from the userspace. This means, the guests can observe hypercall services in and out as they migrate across various host kernel versions. This could be a major problem if the guest discovered a hypercall, started using it, and after getting migrated to an older kernel realizes that it's no longer available. Depending on how the guest handles the change, there's a potential chance that the guest would just panic. As a result, there's a need for the userspace to elect the services that it wishes the guest to discover. It can elect these services based on the kernels spread across its (migration) fleet. To remedy this, extend the existing firmware pseudo-registers, such as KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space for all the hypercall services available. These firmware registers are categorized based on the service call owners, but unlike the existing firmware pseudo-registers, they hold the features supported in the form of a bitmap. During the VM initialization, the registers are set to upper-limit of the features supported by the corresponding registers. It's expected that the VMMs discover the features provided by each register via GET_ONE_REG, and write back the desired values using SET_ONE_REG. KVM allows this modification only until the VM has started. Some of the standard features are not mapped to any bits of the registers. But since they can recreate the original problem of making it available without userspace's consent, they need to be explicitly added to the case-list in kvm_hvc_call_default_allowed(). Any function-id that's not enabled via the bitmap, or not listed in kvm_hvc_call_default_allowed, will be returned as SMCCC_RET_NOT_SUPPORTED to the guest. Older userspace code can simply ignore the feature and the hypercall services will be exposed unconditionally to the guests, thus ensuring backward compatibility. In this patch, the framework adds the register only for ARM's standard secure services (owner value 4). Currently, this includes support only for ARM True Random Number Generator (TRNG) service, with bit-0 of the register representing mandatory features of v1.0. Other services are momentarily added in the upcoming patches. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> [maz: reduced the scope of some helpers, tidy-up bitmap max values, dropped error-only fast path] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220502233853.1233742-3-rananta@google.com
2022-05-02KVM: Add max_vcpus field in common 'struct kvm'Sean Christopherson
For TDX guests, the maximum number of vcpus needs to be specified when the TDX guest VM is initialized (creating the TDX data corresponding to TDX guest) before creating vcpu. It needs to record the maximum number of vcpus on VM creation (KVM_CREATE_VM) and return error if the number of vcpus exceeds it Because there is already max_vcpu member in arm64 struct kvm_arch, move it to common struct kvm and initialize it to KVM_MAX_VCPUS before kvm_arch_init_vm() instead of adding it to x86 struct kvm_arch. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Message-Id: <e53234cdee6a92357d06c80c03d77c19cdefb804.1646422845.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_highAlexandru Elisei
When userspace is debugging a VM, the kvm_debug_exit_arch part of the kvm_run struct contains arm64 specific debug information: the ESR_EL2 value, encoded in the field "hsr", and the address of the instruction that caused the exception, encoded in the field "far". Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately kvm_debug_exit_arch.hsr cannot be changed because that would change the memory layout of the struct on big endian machines: Current layout: | Layout with "hsr" extended to 64 bits: | offset 0: ESR_EL2[31:0] (hsr) | offset 0: ESR_EL2[61:32] (hsr[61:32]) offset 4: padding | offset 4: ESR_EL2[31:0] (hsr[31:0]) offset 8: FAR_EL2[61:0] (far) | offset 8: FAR_EL2[61:0] (far) which breaks existing code. The padding is inserted by the compiler because the "far" field must be aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1], page 18), and the struct itself must be aligned to 8 bytes (the struct must be aligned to the maximum alignment of its fields - aapcs64, page 18), which means that "hsr" must be aligned to 8 bytes as it is the first field in the struct. To avoid changing the struct size and layout for the existing fields, add a new field, "hsr_high", which replaces the existing padding. "hsr_high" will be used to hold the ESR_EL2[61:32] bits of the register. The memory layout, both on big and little endian machine, becomes: offset 0: ESR_EL2[31:0] (hsr) offset 4: ESR_EL2[61:32] (hsr_high) offset 8: FAR_EL2[61:0] (far) The padding that the compiler inserts for the current struct layout is unitialized. To prevent an updated userspace running on an old kernel mistaking the padding for a valid "hsr_high" value, add a new flag, KVM_DEBUG_ARCH_HSR_HIGH_VALID, to kvm_run->flags to let userspace know that "hsr_high" holds a valid ESR_EL2[61:32] value. [1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220425114444.368693-6-alexandru.elisei@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-28KVM: arm64: Add guard pages for KVM nVHE hypervisor stackKalesh Singh
Map the stack pages in the flexible private VA range and allocate guard pages below the stack as unbacked VA space. The stack is aligned so that any valid stack address has PAGE_SHIFT bit as 1 - this is used for overflow detection (implemented in a subsequent patch in the series). Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220420214317.3303360-4-kaleshsingh@google.com
2022-04-20KVM: arm64: Handle blocking WFIT instructionMarc Zyngier
When trapping a blocking WFIT instruction, take it into account when computing the deadline of the background timer. The state is tracked with a new vcpu flag, and is gated by a new CPU capability, which isn't currently enabled. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220419182755.601427-6-maz@kernel.org
2022-04-20KVM: arm64: Simplify kvm_cpu_has_pending_timer()Marc Zyngier
kvm_cpu_has_pending_timer() ends up checking all the possible timers for a wake-up cause. However, we already check for pending interrupts whenever we try to wake-up a vcpu, including the timer interrupts. Obviously, doing the same work twice is once too many. Reduce this helper to almost nothing, but keep it around, as we are going to make use of it soon. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220419182755.601427-4-maz@kernel.org
2022-03-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Proper emulation of the OSLock feature of the debug architecture - Scalibility improvements for the MMU lock when dirty logging is on - New VMID allocator, which will eventually help with SVA in VMs - Better support for PMUs in heterogenous systems - PSCI 1.1 support, enabling support for SYSTEM_RESET2 - Implement CONFIG_DEBUG_LIST at EL2 - Make CONFIG_ARM64_ERRATUM_2077057 default y - Reduce the overhead of VM exit when no interrupt is pending - Remove traces of 32bit ARM host support from the documentation - Updated vgic selftests - Various cleanups, doc updates and spelling fixes RISC-V: - Prevent KVM_COMPAT from being selected - Optimize __kvm_riscv_switch_to() implementation - RISC-V SBI v0.3 support s390: - memop selftest - fix SCK locking - adapter interruptions virtualization for secure guests - add Claudio Imbrenda as maintainer - first step to do proper storage key checking x86: - Continue switching kvm_x86_ops to static_call(); introduce static_call_cond() and __static_call_ret0 when applicable. - Cleanup unused arguments in several functions - Synthesize AMD 0x80000021 leaf - Fixes and optimization for Hyper-V sparse-bank hypercalls - Implement Hyper-V's enlightened MSR bitmap for nested SVM - Remove MMU auditing - Eager splitting of page tables (new aka "TDP" MMU only) when dirty page tracking is enabled - Cleanup the implementation of the guest PGD cache - Preparation for the implementation of Intel IPI virtualization - Fix some segment descriptor checks in the emulator - Allow AMD AVIC support on systems with physical APIC ID above 255 - Better API to disable virtualization quirks - Fixes and optimizations for the zapping of page tables: - Zap roots in two passes, avoiding RCU read-side critical sections that last too long for very large guests backed by 4 KiB SPTEs. - Zap invalid and defunct roots asynchronously via concurrency-managed work queue. - Allowing yielding when zapping TDP MMU roots in response to the root's last reference being put. - Batch more TLB flushes with an RCU trick. Whoever frees the paging structure now holds RCU as a proxy for all vCPUs running in the guest, i.e. to prolongs the grace period on their behalf. It then kicks the the vCPUs out of guest mode before doing rcu_read_unlock(). Generic: - Introduce __vcalloc and use it for very large allocations that need memcg accounting" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) KVM: use kvcalloc for array allocations KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2 kvm: x86: Require const tsc for RT KVM: x86: synthesize CPUID leaf 0x80000021h if useful KVM: x86: add support for CPUID leaf 0x80000021 KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()" kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU KVM: arm64: fix typos in comments KVM: arm64: Generalise VM features into a set of flags KVM: s390: selftests: Add error memop tests KVM: s390: selftests: Add more copy memop tests KVM: s390: selftests: Add named stages for memop test KVM: s390: selftests: Add macro as abstraction for MEM_OP KVM: s390: selftests: Split memop tests KVM: s390x: fix SCK locking RISC-V: KVM: Implement SBI HSM suspend call RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function RISC-V: Add SBI HSM suspend related defines RISC-V: KVM: Implement SBI v0.3 SRST extension ...
2022-03-18KVM: arm64: fix typos in commentsJulia Lawall
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220318103729.157574-24-Julia.Lawall@inria.fr
2022-03-18KVM: arm64: Generalise VM features into a set of flagsMarc Zyngier
We currently deal with a set of booleans for VM features, while they could be better represented as set of flags contained in an unsigned long, similarily to what we are doing on the CPU side. Signed-off-by: Marc Zyngier <maz@kernel.org> [Oliver: Flag-ify the 'ran_once' boolean] Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220311174001.605719-2-oupton@google.com
2022-03-14Merge branch 'for-next/spectre-bhb' into for-next/coreWill Deacon
Merge in the latest Spectre mess to fix up conflicts with what was already queued for 5.18 when the embargo finally lifted. * for-next/spectre-bhb: (21 commits) arm64: Do not include __READ_ONCE() block in assembly files arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting arm64: Use the clearbhb instruction in mitigations KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated arm64: Mitigate spectre style branch history side channels arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2 arm64: Add percpu vectors for EL1 arm64: entry: Add macro for reading symbol addresses from the trampoline arm64: entry: Add vectors that have the bhb mitigation sequences arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations arm64: entry: Allow the trampoline text to occupy multiple pages arm64: entry: Make the kpti trampoline's kpti sequence optional arm64: entry: Move trampoline macros out of ifdef'd section arm64: entry: Don't assume tramp_vectors is the start of the vectors arm64: entry: Allow tramp_alias to access symbols after the 4K boundary arm64: entry: Move the trampoline data page before the text page arm64: entry: Free up another register on kpti's tramp_exit path arm64: entry: Make the trampoline cleanup optional KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit ...
2022-03-09Merge branch kvm-arm64/misc-5.18 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-5.18: : . : Misc fixes for KVM/arm64 5.18: : : - Drop unused kvm parameter to kvm_psci_version() : : - Implement CONFIG_DEBUG_LIST at EL2 : : - Make CONFIG_ARM64_ERRATUM_2077057 default y : : - Only do the interrupt dance if we have exited because of an interrupt : : - Remove traces of 32bit ARM host support from the documentation : . Documentation: KVM: Update documentation to indicate KVM is arm64-only KVM: arm64: Only open the interrupt window on exit due to an interrupt KVM: arm64: Enable Cortex-A510 erratum 2077057 by default Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-03-04KVM: arm64: Only open the interrupt window on exit due to an interruptMarc Zyngier
Now that we properly account for interrupts taken whilst the guest was running, it becomes obvious that there is no need to open this accounting window if we didn't exit because of an interrupt. This saves a number of system register accesses and other barriers if we exited for any other reason (such as a trap, for example). Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220304135914.1464721-1-maz@kernel.org
2022-02-25arm64: Add support of PAuth QARMA3 architected algorithmVladimir Murzin
QARMA3 is relaxed version of the QARMA5 algorithm which expected to reduce the latency of calculation while still delivering a suitable level of security. Support for QARMA3 can be discovered via ID_AA64ISAR2_EL1 APA3, bits [15:12] Indicates whether the QARMA3 algorithm is implemented in the PE for address authentication in AArch64 state. GPA3, bits [11:8] Indicates whether the QARMA3 algorithm is implemented in the PE for generic code authentication in AArch64 state. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220224124952.119612-4-vladimir.murzin@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-02-15KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3AJames Morse
CPUs vulnerable to Spectre-BHB either need to make an SMC-CC firmware call from the vectors, or run a sequence of branches. This gets added to the hyp vectors. If there is no support for arch-workaround-1 in firmware, the indirect vector will be used. kvm_init_vector_slots() only initialises the two indirect slots if the platform is vulnerable to Spectre-v3a. pKVM's hyp_map_vectors() only initialises __hyp_bp_vect_base if the platform is vulnerable to Spectre-v3a. As there are about to more users of the indirect vectors, ensure their entries in hyp_spectre_vector_selector[] are always initialised, and __hyp_bp_vect_base defaults to the regular VA mapping. The Spectre-v3a check is moved to a helper kvm_system_needs_idmapped_vectors(), and merged with the code that creates the hyp mappings. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
2022-02-08Merge branch kvm-arm64/pmu-bl into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pmu-bl: : . : Improve PMU support on heterogeneous systems, courtesy of Alexandru Elisei : . KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute KVM: arm64: Keep a list of probed PMUs KVM: arm64: Keep a per-VM pointer to the default PMU perf: Fix wrong name in comment for struct perf_cpu_context KVM: arm64: Do not change the PMU event filter after a VCPU has run Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-02-08KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPUAlexandru Elisei
Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU device ioctl. If the VCPU is scheduled on a physical CPU which has a different PMU, the perf events needed to emulate a guest PMU won't be scheduled in and the guest performance counters will stop counting. Treat it as an userspace error and refuse to run the VCPU in this situation. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-7-alexandru.elisei@arm.com
2022-02-08KVM: arm64: Do not change the PMU event filter after a VCPU has runMarc Zyngier
Userspace can specify which events a guest is allowed to use with the KVM_ARM_VCPU_PMU_V3_FILTER attribute. The list of allowed events can be identified by a guest from reading the PMCEID{0,1}_EL0 registers. Changing the PMU event filter after a VCPU has run can cause reads of the registers performed before the filter is changed to return different values than reads performed with the new event filter in place. The architecture defines the two registers as read-only, and this behaviour contradicts that. Keep track when the first VCPU has run and deny changes to the PMU event filter to prevent this from happening. Signed-off-by: Marc Zyngier <maz@kernel.org> [ Alexandru E: Added commit message, updated ioctl documentation ] Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-2-alexandru.elisei@arm.com
2022-02-08KVM: arm64: Make active_vmids invalid on vCPU schedule outShameer Kolothum
Like ASID allocator, we copy the active_vmids into the reserved_vmids on a rollover. But it's unlikely that every CPU will have a vCPU as current task and we may end up unnecessarily reserving the VMID space. Hence, set active_vmids to an invalid one when scheduling out a vCPU. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211122121844.867-5-shameerali.kolothum.thodi@huawei.com
2022-02-08KVM: arm64: Align the VMID allocation with the arm64 ASIDJulien Grall
At the moment, the VMID algorithm will send an SGI to all the CPUs to force an exit and then broadcast a full TLB flush and I-Cache invalidation. This patch uses the new VMID allocator. The benefits are:    - Aligns with arm64 ASID algorithm.    - CPUs are not forced to exit at roll-over. Instead, the VMID will be marked reserved and context invalidation is broadcasted. This will reduce the IPIs traffic.   - More flexible to add support for pinned KVM VMIDs in the future.     With the new algo, the code is now adapted:     - The call to update_vmid() will be done with preemption disabled as the new algo requires to store information per-CPU. Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211122121844.867-4-shameerali.kolothum.thodi@huawei.com
2022-02-05Merge tag 'kvmarm-fixes-5.17-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.17, take #2 - A couple of fixes when handling an exception while a SError has been delivered - Workaround for Cortex-A510's single-step[ erratum
2022-02-01kvm/arm64: rework guest entry logicMark Rutland
In kvm_arch_vcpu_ioctl_run() we enter an RCU extended quiescent state (EQS) by calling guest_enter_irqoff(), and unmasked IRQs prior to exiting the EQS by calling guest_exit(). As the IRQ entry code will not wake RCU in this case, we may run the core IRQ code and IRQ handler without RCU watching, leading to various potential problems. Additionally, we do not inform lockdep or tracing that interrupts will be enabled during guest execution, which caan lead to misleading traces and warnings that interrupts have been enabled for overly-long periods. This patch fixes these issues by using the new timing and context entry/exit helpers to ensure that interrupts are handled during guest vtime but with RCU watching, with a sequence: guest_timing_enter_irqoff(); guest_state_enter_irqoff(); < run the vcpu > guest_state_exit_irqoff(); < take any pending IRQs > guest_timing_exit_irqoff(); Since instrumentation may make use of RCU, we must also ensure that no instrumented code is run during the EQS. I've split out the critical section into a new kvm_arm_enter_exit_vcpu() helper which is marked noinstr. Fixes: 1b3d546daf85ed2b ("arm/arm64: KVM: Properly account for guest CPU time") Reported-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Message-Id: <20220201132926.3301912-3-mark.rutland@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "RISCV: - Use common KVM implementation of MMU memory caches - SBI v0.2 support for Guest - Initial KVM selftests support - Fix to avoid spurious virtual interrupts after clearing hideleg CSR - Update email address for Anup and Atish ARM: - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update s390: - fix sigp sense/start/stop/inconsistency - cleanups x86: - Clean up some function prototypes more - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery - completely remove potential TOC/TOU races in nested SVM consistency checks - update some PMCs on emulated instructions - Intel AMX support (joint work between Thomas and Intel) - large MMU cleanups - module parameter to disable PMU virtualization - cleanup register cache - first part of halt handling cleanups - Hyper-V enlightened MSR bitmap support for nested hypervisors Generic: - clean up Makefiles - introduce CONFIG_HAVE_KVM_DIRTY_RING - optimize memslot lookup using a tree - optimize vCPU array usage by converting to xarray" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits) x86/fpu: Fix inline prefix warnings selftest: kvm: Add amx selftest selftest: kvm: Move struct kvm_x86_state to header selftest: kvm: Reorder vcpu_load_state steps for AMX kvm: x86: Disable interception for IA32_XFD on demand x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state() kvm: selftests: Add support for KVM_CAP_XSAVE2 kvm: x86: Add support for getting/setting expanded xstate buffer x86/fpu: Add uabi_size to guest_fpu kvm: x86: Add CPUID support for Intel AMX kvm: x86: Add XCR0 support for Intel AMX kvm: x86: Disable RDMSR interception of IA32_XFD_ERR kvm: x86: Emulate IA32_XFD_ERR for guest kvm: x86: Intercept #NM for saving IA32_XFD_ERR x86/fpu: Prepare xfd_err in struct fpu_guest kvm: x86: Add emulation for IA32_XFD x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2 x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM x86/fpu: Add guest support to xfd_enable_feature() ...
2022-01-12Merge tag 'perf_core_for_v5.17_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Borislav Petkov: "Cleanup of the perf/kvm interaction." * tag 'perf_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Drop guest callback (un)register stubs KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c KVM: arm64: Hide kvm_arm_pmu_available behind CONFIG_HW_PERF_EVENTS=y KVM: arm64: Convert to the generic perf callbacks KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c KVM: Move x86's perf guest info callbacks to generic KVM KVM: x86: More precisely identify NMI from guest when handling PMI KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable perf/core: Use static_call to optimize perf_guest_info_callbacks perf: Force architectures to opt-in to guest callbacks perf: Add wrappers for invoking guest callbacks perf/core: Rework guest callbacks to prepare for static_call support perf: Drop dead and useless guest "support" from arm, csky, nds32 and riscv perf: Stop pretending that perf can handle multiple guest callbacks KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest KVM: x86: Register perf callbacks after calling vendor's hardware_setup() perf: Protect perf_guest_cbs with RCU
2022-01-07Merge tag 'kvmarm-5.17' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.16 - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update
2021-12-16KVM: arm64: pkvm: Unshare guest structs during teardownQuentin Perret
Make use of the newly introduced unshare hypercall during guest teardown to unmap guest-related data structures from the hyp stage-1. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-15-qperret@google.com
2021-12-16KVM: arm64: Introduce kvm_share_hyp()Quentin Perret
The create_hyp_mappings() function can currently be called at any point in time. However, its behaviour in protected mode changes widely depending on when it is being called. Prior to KVM init, it is used to create the temporary page-table used to bring-up the hypervisor, and later on it is transparently turned into a 'share' hypercall when the kernel has lost control over the hypervisor stage-1. In order to prepare the ground for also unsharing pages with the hypervisor during guest teardown, introduce a kvm_share_hyp() function to make it clear in which places a share hypercall should be expected, as we will soon need a matching unshare hypercall in all those places. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-7-qperret@google.com
2021-12-08KVM: Add helpers to wake/query blocking vCPUSean Christopherson
Add helpers to wake and query a blocking vCPU. In addition to providing nice names, the helpers reduce the probability of KVM neglecting to use kvm_arch_vcpu_get_wait(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Rename kvm_vcpu_block() => kvm_vcpu_halt()Sean Christopherson
Rename kvm_vcpu_block() to kvm_vcpu_halt() in preparation for splitting the actual "block" sequences into a separate helper (to be named kvm_vcpu_block()). x86 will use the standalone block-only path to handle non-halt cases where the vCPU is not runnable. Rename block_ns to halt_ns to match the new function name. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: arm64: Move vGIC v4 handling for WFI out arch callback hookSean Christopherson
Move the put and reload of the vGIC out of the block/unblock callbacks and into a dedicated WFI helper. Functionally, this is nearly a nop as the block hook is called at the very beginning of kvm_vcpu_block(), and the only code in kvm_vcpu_block() after the unblock hook is to update the halt-polling controls, i.e. can only affect the next WFI. Back when the arch (un)blocking hooks were added by commits 3217f7c25bca ("KVM: Add kvm_arch_vcpu_{un}blocking callbacks) and d35268da6687 ("arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block"), the hooks were invoked only when KVM was about to "block", i.e. schedule out the vCPU. The use case at the time was to schedule a timer in the host based on the earliest timer in the guest in order to wake the blocking vCPU when the emulated guest timer fired. Commit accb99bcd0ca ("KVM: arm/arm64: Simplify bg_timer programming") reworked the timer logic to be even more precise, by waiting until the vCPU was actually scheduled out, and so move the timer logic from the (un)blocking hooks to vcpu_load/put. In the meantime, the hooks gained usage for enabling vGIC v4 doorbells in commit df9ba95993b9 ("KVM: arm/arm64: GICv4: Use the doorbell interrupt as an unblocking source"), and added related logic for the VMCR in commit 5eeaf10eec39 ("KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block"). Finally, commit 07ab0f8d9a12 ("KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence") hoisted the (un)blocking hooks so that they wrapped KVM's halt-polling logic in addition to the core "block" logic. In other words, the original need for arch hooks to take action _only_ in the block path is long since gone. Cc: Oliver Upton <oupton@google.com> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s indexMarc Zyngier
Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu index. Unfortunately, we're about to move rework the iterator, which requires this to be upgrade to an unsigned long. Let's bite the bullet and repaint all of it in one go. Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-7-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Move wiping of the kvm->vcpus array to common codeMarc Zyngier
All architectures have similar loops iterating over the vcpus, freeing one vcpu at a time, and eventually wiping the reference off the vcpus array. They are also inconsistently taking the kvm->lock mutex when wiping the references from the array. Make this code common, which will simplify further changes. The locking is dropped altogether, as this should only be called when there is no further references on the kvm structure. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-2-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-01Merge branch kvm-arm64/fpsimd-tracking into kvmarm-master/nextMarc Zyngier
* kvm-arm64/fpsimd-tracking: : . : Simplify the handling of both the FP/SIMD and SVE state by : removing the need for mapping the thread at EL2, and by : dropping the tracking of the host's SVE state which is : always invalid by construction. : . arm64/fpsimd: Document the use of TIF_FOREIGN_FPSTATE by KVM KVM: arm64: Stop mapping current thread_info at EL2 KVM: arm64: Introduce flag shadowing TIF_FOREIGN_FPSTATE KVM: arm64: Remove unused __sve_save_state KVM: arm64: Get rid of host SVE tracking/saving KVM: arm64: Reorder vcpu flag definitions Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-12-01KVM: arm64: Drop vcpu->arch.has_run_once for vcpu->pidMarc Zyngier
With the transition to kvm_arch_vcpu_run_pid_change() to handle the "run once" activities, it becomes obvious that has_run_once is now an exact shadow of vcpu->pid. Replace vcpu->arch.has_run_once with a new vcpu_has_run_once() helper that directly checks for vcpu->pid, and get rid of the now unused field. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-12-01KVM: arm64: Merge kvm_arch_vcpu_run_pid_change() and kvm_vcpu_first_run_init()Marc Zyngier
The kvm_arch_vcpu_run_pid_change() helper gets called on each PID change. The kvm_vcpu_first_run_init() helper gets run on the... first run(!) of a vcpu. As it turns out, the first run of a vcpu also triggers a PID change event (vcpu->pid is initially NULL). Use this property to merge these two helpers and get rid of another arm64-specific oddity. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-12-01KVM: arm64: Restructure the point where has_run_once is advertisedMarc Zyngier
Restructure kvm_vcpu_first_run_init() to set the has_run_once flag after having completed all the "run once" activities. This includes moving the flip of the userspace irqchip static key to a point where nothing can fail. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-12-01KVM: arm64: Move kvm_arch_vcpu_run_pid_change() out of lineMarc Zyngier
Having kvm_arch_vcpu_run_pid_change() inline doesn't bring anything to the table. Move it next to kvm_vcpu_first_run_init(), which will be convenient for what is next to come. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-11-22KVM: arm64: Introduce flag shadowing TIF_FOREIGN_FPSTATEMarc Zyngier
We currently have to maintain a mapping the thread_info structure at EL2 in order to be able to check the TIF_FOREIGN_FPSTATE flag. In order to eventually get rid of this, start with a vcpu flag that shadows the thread flag on each entry into the hypervisor. Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-11-18KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()Vitaly Kuznetsov
Generally, it doesn't make sense to return the recommended maximum number of vCPUs which exceeds the maximum possible number of vCPUs. Note: ARM64 is special as the value returned by KVM_CAP_MAX_VCPUS differs depending on whether it is a system-wide ioctl or a per-VM one. Previously, KVM_CAP_NR_VCPUS didn't have this difference and it seems preferable to keep the status quo. Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus() which is what gets returned by system-wide KVM_CAP_MAX_VCPUS. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211116163443.88707-2-vkuznets@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-17KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.cSean Christopherson
Call KVM's (un)register perf callbacks helpers directly from arm.c and delete perf.c No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211111020738.2512932-17-seanjc@google.com
2021-11-17KVM: Move x86's perf guest info callbacks to generic KVMSean Christopherson
Move x86's perf guest callbacks into common KVM, as they are semantically identical to arm64's callbacks (the only other such KVM callbacks). arm64 will convert to the common versions in a future patch. Implement the necessary arm64 arch hooks now to avoid having to provide stubs or a temporary #define (from x86) to avoid arm64 compilation errors when CONFIG_GUEST_PERF_EVENTS=y. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211111020738.2512932-13-seanjc@google.com
2021-11-12Merge tag 'kvmarm-fixes-5.16-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 5.16, take #1 - Fix the host S2 finalization by solely iterating over the memblocks instead of the whole IPA space - Tighten the return value of kvm_vcpu_preferred_target() now that 32bit support is long gone - Make sure the extraction of ESR_ELx.EC is limited to the architected bits - Comment fixups
2021-11-08KVM: arm64: Change the return type of kvm_vcpu_preferred_target()YueHaibing
kvm_vcpu_preferred_target() always return 0 because kvm_target_cpu() never returns a negative error code. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211105011500.16280-1-yuehaibing@huawei.com
2021-10-31Merge tag 'kvmarm-5.16' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.16 - More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation. - Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated - Fixes for the vgic placement in the IPA space, together with a bunch of selftests - More memcg accounting of the memory allocated on behalf of a guest - Timer and vgic selftests - Workarounds for the Apple M1 broken vgic implementation - KConfig cleanups - New kvmarm.mode=none option, for those who really dislike us
2021-10-18Merge branch kvm-arm64/pkvm/fixed-features into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm/fixed-features: (22 commits) : . : Add the pKVM fixed feature that allows a bunch of exceptions : to either be forbidden or be easily handled at EL2. : . KVM: arm64: pkvm: Give priority to standard traps over pvm handling KVM: arm64: pkvm: Pass vpcu instead of kvm to kvm_get_exit_handler_array() KVM: arm64: pkvm: Move kvm_handle_pvm_restricted around KVM: arm64: pkvm: Consolidate include files KVM: arm64: pkvm: Preserve pending SError on exit from AArch32 KVM: arm64: pkvm: Handle GICv3 traps as required KVM: arm64: pkvm: Drop sysregs that should never be routed to the host KVM: arm64: pkvm: Drop AArch32-specific registers KVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WI KVM: arm64: pkvm: Use a single function to expose all id-regs KVM: arm64: Fix early exit ptrauth handling KVM: arm64: Handle protected guests at 32 bits KVM: arm64: Trap access to pVM restricted features KVM: arm64: Move sanitized copies of CPU features KVM: arm64: Initialize trap registers for protected VMs KVM: arm64: Add handlers for protected VM System Registers KVM: arm64: Simplify masking out MTE in feature id reg KVM: arm64: Add missing field descriptor for MDCR_EL2 KVM: arm64: Pass struct kvm to per-EC handlers KVM: arm64: Move early handlers to per-EC handlers ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-17Merge branch kvm-arm64/memory-accounting into kvmarm-master/nextMarc Zyngier
* kvm-arm64/memory-accounting: : . : Sprinkle a bunch of GFP_KERNEL_ACCOUNT all over the code base : to better track memory allocation made on behalf of a VM. : . KVM: arm64: Add memcg accounting to KVM allocations KVM: arm64: vgic: Add memcg accounting to vgic allocations Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-17KVM: arm64: Add memcg accounting to KVM allocationsJia He
Inspired by commit 254272ce6505 ("kvm: x86: Add memcg accounting to KVM allocations"), it would be better to make arm64 KVM consistent with common kvm codes. The memory allocations of VM scope should be charged into VM process cgroup, hence change GFP_KERNEL to GFP_KERNEL_ACCOUNT. There remain a few cases since these allocations are global, not in VM scope. Signed-off-by: Jia He <justin.he@arm.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210907123112.10232-3-justin.he@arm.com
2021-10-11KVM: arm64: Initialize trap registers for protected VMsFuad Tabba
Protected VMs have more restricted features that need to be trapped. Moreover, the host should not be trusted to set the appropriate trapping registers and their values. Initialize the trapping registers, i.e., hcr_el2, mdcr_el2, and cptr_el2 at EL2 for protected guests, based on the values of the guest's feature id registers. No functional change intended as trap handlers introduced in the previous patch are still not hooked in to the guest exit handlers. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-9-tabba@google.com