path: root/arch/arm64/include/asm/kvm_host.h
2024-02-19  KVM: arm64: Add Fine-Grained UNDEF tracking information  (Marc Zyngier)
In order to efficiently handle system register access being disabled, and this resulting in an UNDEF exception being injected, we introduce the (slightly dubious) concept of Fine-Grained UNDEF, modeled after the architectural Fine-Grained Traps. For each FGT group, we keep a 64 bit word that has the exact same bit assignment as the corresponding FGT register, where a 1 indicates that trapping this register should result in an UNDEF exception being reinjected. So far, nothing populates this information, nor sets the corresponding trap bits. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-18-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
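A minimal sketch of how such per-group UNDEF words could be kept and queried; the types, group names and helper below are invented for illustration and are not the kernel's actual definitions:

  #include <stdbool.h>
  #include <stdint.h>

  enum fgt_group_id {                 /* hypothetical FGT group identifiers */
      HFGxTR_GROUP,
      HFGITR_GROUP,
      HDFGRTR_GROUP,
      __NR_FGT_GROUP_IDS__,
  };

  struct fgu_state {
      /* one 64-bit word per group, same bit layout as the FGT register */
      uint64_t fgu[__NR_FGT_GROUP_IDS__];
  };

  /* should this particular trap be turned into an UNDEF for the guest? */
  static bool fgu_triggers_undef(const struct fgu_state *s,
                                 enum fgt_group_id grp, uint64_t trap_bit)
  {
      return s->fgu[grp] & trap_bit;
  }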
2024-02-19  KVM: arm64: Register AArch64 system register entries with the sysreg xarray  (Marc Zyngier)
In order to reduce the number of lookups that we have to perform when handling a sysreg, register each AArch64 sysreg descriptor with the global xarray. The index of the descriptor is stored as a 10 bit field in the data word. Subsequent patches will retrieve and use the stored index. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-15-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
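As a rough illustration of packing a descriptor index into a 10-bit field of a per-entry data word; the field placement and helper names are assumptions, not the kernel's:

  #include <stdint.h>

  #define SR_IDX_SHIFT    0
  #define SR_IDX_MASK     0x3ffu      /* 10 bits -> up to 1024 descriptors */

  /* store a descriptor index into the data word kept in the xarray entry */
  static inline uint32_t sr_pack_index(uint32_t data, uint32_t idx)
  {
      return (data & ~(SR_IDX_MASK << SR_IDX_SHIFT)) |
             ((idx & SR_IDX_MASK) << SR_IDX_SHIFT);
  }

  /* retrieve the stored index again when handling a trap */
  static inline uint32_t sr_unpack_index(uint32_t data)
  {
      return (data >> SR_IDX_SHIFT) & SR_IDX_MASK;
  }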
2024-02-19  KVM: arm64: nv: Add sanitising to VNCR-backed sysregs  (Marc Zyngier)
VNCR-backed "registers" are actually only memory. Which means that there is zero control over what the guest can write, and that it is the hypervisor's job to actually sanitise the content of the backing store. Yeah, this is fun. In order to preserve some form of sanity, add a repainting mechanism that makes use of a per-VM set of RES0/RES1 masks, one pair per VNCR register. These masks get applied on access to the backing store via __vcpu_sys_reg(), ensuring that the state that is consumed by KVM is correct. So far, nothing populates these masks, but stay tuned. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20240214131827.2856277-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
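The repainting itself boils down to applying the two masks on every read of the backing store; a minimal sketch, assuming one RES0/RES1 pair per VNCR-backed register (names illustrative):

  #include <stdint.h>

  struct vncr_masks {
      uint64_t res0;    /* bits that must read as 0 */
      uint64_t res1;    /* bits that must read as 1 */
  };

  /* sanitise a raw value from the guest-writable backing store */
  static inline uint64_t sanitise_vncr(uint64_t raw, const struct vncr_masks *m)
  {
      return (raw & ~m->res0) | m->res1;
  }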
2024-02-19  KVM: arm64: Add feature checking helpers  (Marc Zyngier)
In order to make it easier to check whether a particular feature is exposed to a guest, add a new set of helpers, with kvm_has_feat() being the most useful. Let's start making use of them in the PMU code (courtesy of Oliver). Follow-up changes will introduce additional use patterns. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Co-developed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
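The underlying idea of such a helper is to extract a field from the VM's sanitised ID register copy and compare it against a minimum value; a simplified, unsigned-field-only sketch (not the exact kernel macro):

  #include <stdbool.h>
  #include <stdint.h>

  /* ID register fields are 4 bits wide */
  static inline uint64_t idreg_field(uint64_t reg, unsigned int shift)
  {
      return (reg >> shift) & 0xf;
  }

  /* "does this VM expose at least version min_val of the feature?" */
  static inline bool vm_has_feat(uint64_t idreg, unsigned int shift,
                                 uint64_t min_val)
  {
      return idreg_field(idreg, shift) >= min_val;
  }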
2024-01-08  Merge tag 'kvmarm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini)
KVM/arm64 updates for Linux 6.8

- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base granule sizes. Branch shared with the arm64 tree.

- Large Fine-Grained Trap rework, bringing some sanity to the feature, although there is more to come. This comes with a prefix branch shared with the arm64 tree.

- Some additional Nested Virtualization groundwork, mostly introducing the NV2 VNCR support and retargeting the NV support to that version of the architecture.

- A small set of vgic fixes and associated cleanups.
2023-12-19  Merge branch kvm-arm64/nv-6.8-prefix into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/nv-6.8-prefix:
  : Nested Virtualization support update, focussing on the
  : NV2 support (VNCR mapping and such).
  KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  KVM: arm64: nv: Map VNCR-capable registers to a separate page
  KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers
  KVM: arm64: Introduce a bad_trap() primitive for unexpected trap handling
  KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
  KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
  KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR
  KVM: arm64: nv: Compute NV view of idregs as a one-off
  KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt()
  arm64: cpufeatures: Restrict NV support to FEAT_NV2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-19  KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()  (Marc Zyngier)
KVM internally uses accessor functions when reading or writing the guest's system registers. This takes care of accessing either the stored copy or using the "live" EL1 system registers when the host uses VHE. With the introduction of virtual EL2 we add a bunch of EL2 system registers, which now must also be taken care of:

- If the guest is running in vEL2, and we access an EL1 sysreg, we must revert to the stored version of that, and not use the CPU's copy.

- If the guest is running in vEL1, and we access an EL2 sysreg, we must also use the stored version, since the CPU carries the EL1 copy.

- Some EL2 system registers are supposed to affect the current execution of the system, so we need to put them into their respective EL1 counterparts. For this we need to define a mapping between the two.

- Some EL2 system registers have a different format than their EL1 counterpart, so we need to translate them before writing them to the CPU. This is done using an (optional) translate function in the map.

All of these cases are now wrapped into the existing accessor functions, so KVM users wouldn't need to care whether they access EL2 or EL1 registers and also which state the guest is in.

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Co-developed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
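A bare-bones sketch of the mapping described above, with invented names, showing the optional translate hook for registers whose EL2 and EL1 formats differ:

  #include <stdint.h>

  struct el2_to_el1_map {
      int       el2_reg;                        /* index of the vEL2 register */
      int       el1_reg;                        /* EL1 register it shadows    */
      uint64_t  (*translate)(uint64_t el2_val); /* optional format conversion */
  };

  /* value to actually program into the EL1 counterpart on the CPU */
  static uint64_t map_to_hw(const struct el2_to_el1_map *m, uint64_t el2_val)
  {
      return m->translate ? m->translate(el2_val) : el2_val;
  }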
2023-12-19  KVM: arm64: nv: Map VNCR-capable registers to a separate page  (Marc Zyngier)
With ARMv8.4-NV, registers that can be directly accessed in memory by the guest have to live at architected offsets in a special page. Let's annotate the sysreg enum to reflect the offset at which they are in this page, with a little twist: If running on HW that doesn't have the ARMv8.4-NV feature, or even a VM that doesn't use NV, we store all the system registers in the usual sys_regs array. The only difference with the pre-8.4 situation is that VNCR-capable registers are at a "similar" offset as in the VNCR page (we can compute the actual offset at compile time), and that the sys_regs array is both bigger and sparse. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
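A toy illustration of deriving an array slot from an architected VNCR page offset at compile time; the register offset and macro below are hypothetical, not the kernel's actual encoding:

  #include <stdint.h>

  #define VNCR_OFF_EXAMPLE_REG   0x100   /* hypothetical architected offset */

  /* each VNCR slot is 8 bytes; turn a page offset into an array index */
  #define VNCR_IDX(off)          ((off) / sizeof(uint64_t))

  _Static_assert(VNCR_IDX(VNCR_OFF_EXAMPLE_REG) == 0x20,
                 "offset/index mismatch in the example");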
2023-12-19  KVM: arm64: nv: Compute NV view of idregs as a one-off  (Marc Zyngier)
Now that we have a full copy of the idregs for each VM, there is no point in repainting the sysregs on each access. Instead, we can simply perform the transformation as a one-off and be done with it. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-19  KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt()  (Marc Zyngier)
A rather common idiom when writing NV code as part of KVM is to have things such as: if (vcpu_has_nv(vcpu) && is_hyp_ctxt(vcpu)) { [...] } to check that we are in a hyp-related context. The second part of the conjunction would be enough, but the first one contains a static key that allows the rest of the checks to be elided when in a non-NV environment. Rewrite is_hyp_ctxt() to directly use vcpu_has_nv(). The result is the same, and the code is easier to read. The one occurrence of this that is already merged is rewritten in the process. In order to avoid nasty circular dependencies between kvm_emulate.h and kvm_nested.h, vcpu_has_feature() is itself hoisted into kvm_host.h, at the cost of some #deferry... Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
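The shape of the change, using simplified stand-ins for the real helpers (the kernel version hides a static key behind vcpu_has_nv()):

  #include <stdbool.h>

  struct vcpu {
      bool has_nv;       /* NV feature enabled for this vCPU        */
      bool el2_regime;   /* currently in a (virtual) EL2 context    */
  };

  static inline bool vcpu_has_nv(const struct vcpu *v)
  {
      return v->has_nv;
  }

  static inline bool is_hyp_ctxt(const struct vcpu *v)
  {
      /* the cheap NV gate is now folded in, so callers drop the "&&" idiom */
      return vcpu_has_nv(v) && v->el2_regime;
  }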
2023-12-18  KVM: arm64: Handle HAFGRTR_EL2 trapping in nested virt  (Fuad Tabba)
Add the encodings to fine grain trapping fields for HAFGRTR_EL2 and add the associated handling code in nested virt. Based on DDI0601 2023-09. Add the missing field definitions as well, both to generate the correct RES0 mask and to be able to toggle their FGT bits. Also add the code for handling FGT trapping, reading of the register, to nested virt. Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231214100158.2305400-10-tabba@google.com
2023-11-14  Merge branch 'kvm-guestmemfd' into HEAD  (Paolo Bonzini)
Introduce several new KVM uAPIs to ultimately create a guest-first memory subsystem within KVM, a.k.a. guest_memfd. Guest-first memory allows KVM to provide features, enhancements, and optimizations that are kludgy or outright impossible to implement in a generic memory subsystem.

The core KVM ioctl() for guest_memfd is KVM_CREATE_GUEST_MEMFD, which, similar to the generic memfd_create(), creates an anonymous file and returns a file descriptor that refers to it. Again like "regular" memfd files, guest_memfd files live in RAM, have volatile storage, and are automatically released when the last reference is dropped. The key difference between memfd files (and every other memory subsystem) is that guest_memfd files are bound to their owning virtual machine, cannot be mapped, read, or written by userspace, and cannot be resized. guest_memfd files do however support PUNCH_HOLE, which can be used to convert a guest memory area between the shared and guest-private states.

A second KVM ioctl(), KVM_SET_MEMORY_ATTRIBUTES, allows userspace to specify attributes for a given page of guest memory. In the long term, it will likely be extended to allow userspace to specify per-gfn RWX protections, including allowing memory to be writable in the guest without it also being writable in host userspace.

The immediate and driving use case for guest_memfd is Confidential (CoCo) VMs, specifically AMD's SEV-SNP, Intel's TDX, and KVM's own pKVM. For such use cases, being able to map memory into KVM guests without requiring said memory to be mapped into the host is a hard requirement. While SEV+ and TDX prevent untrusted software from reading guest private data by encrypting guest memory, pKVM provides confidentiality and integrity *without* relying on memory encryption. In addition, with SEV-SNP and especially TDX, accessing guest private memory can be fatal to the host, i.e. KVM must prevent host userspace from accessing guest memory irrespective of hardware behavior.

Long term, guest_memfd may be useful for use cases beyond CoCo VMs, for example hardening userspace against unintentional accesses to guest memory. As mentioned earlier, KVM's ABI uses userspace VMA protections to define the allowed guest protections (with an exception granted to mapping guest memory executable), and similarly KVM currently requires the guest mapping size to be a strict subset of the host userspace mapping size. Decoupling the mapping sizes would allow userspace to precisely map only what is needed and with the required permissions, without impacting guest performance.

A guest-first memory subsystem also provides clearer line of sight to things like a dedicated memory pool (for slice-of-hardware VMs) and elimination of "struct page" (for offload setups where userspace _never_ needs to DMA from or into guest memory).

guest_memfd is the result of 3+ years of development and exploration; taking on memory management responsibilities in KVM was not the first, second, or even third choice for supporting CoCo VMs. But after many failed attempts to avoid KVM-specific backing memory, and looking at where things ended up, it is quite clear that of all approaches tried, guest_memfd is the simplest, most robust, and most extensible, and the right thing to do for KVM and the kernel at-large.

The "development cycle" for this version is going to be very short; ideally, next week I will merge it as is in kvm/next, taking this through the KVM tree for 6.8 immediately after the end of the merge window.

The series is still based on 6.6 (plus KVM changes for 6.7) so it will require a small fixup for changes to get_file_rcu() introduced in 6.7 by commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU"). The fixup will be done as part of the merge commit, and most of the text above will become the commit message for the merge.

Pending post-merge work includes:
- hugepage support
- looking into using the restrictedmem framework for guest memory
- introducing a testing mechanism to poison memory, possibly using the same memory attributes introduced here
- SNP and TDX support

There are two non-KVM patches buried in the middle of this series:
  fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
  mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
The first is small and mostly suggested-by Christian Brauner; the second a bit less so but it was written by an mm person (Vlastimil Babka).
2023-11-13  KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIER  (Sean Christopherson)
Convert KVM_ARCH_WANT_MMU_NOTIFIER into a Kconfig and select it where appropriate to effectively maintain existing behavior. Using a proper Kconfig will simplify building more functionality on top of KVM's mmu_notifier infrastructure.

Add a forward declaration of kvm_gfn_range to kvm_types.h so that including arch/powerpc/include/asm/kvm_ppc.h's with CONFIG_KVM=n doesn't generate warnings due to kvm_gfn_range being undeclared. PPC defines hooks for PR vs. HV without guarding them via #ifdeffery, e.g.

  bool (*unmap_gfn_range)(struct kvm *kvm, struct kvm_gfn_range *range);
  bool (*age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
  bool (*test_age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
  bool (*set_spte_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);

Alternatively, PPC could forward declare kvm_gfn_range, but there's no good reason not to define it in common KVM.

Acked-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-02  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)
Pull kvm updates from Paolo Bonzini:
"ARM:

- Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest

- Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table

- Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM

- Guest support for memory operation instructions (FEAT_MOPS)

- Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code

- Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use

- Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations

- Miscellaneous kernel and selftest fixes

LoongArch:

- New architecture for kvm. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now.

RISC-V:

- Support for the Smstateen and Zicond extensions

- Support for virtualizing senvcfg

- Support for virtualized SBI debug console (DBCN)

S390:

- Nested page table management can be monitored through tracepoints and statistics

x86:

- Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ

- Avoid WARN on systems with Intel IPI virtualization

- Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead.

- Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier).

- Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace.

- Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads.

- "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022.

- Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes.

- Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled.

- Advertise "support" for non-serializing FS/GS base MSR writes as appropriate.

- Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs.

- Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.

- Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace.

- Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag.

- Optimize injection of PMU interrupts that are simultaneous with NMIs.

- Usual handful of fixes for typos and other warts.

x86 - MTRR/PAT fixes and optimizations:

- Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.

- Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype.

- Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y. This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT.

x86 - SEV fixes:

- Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest.

- Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing.

Documentation:

- Various updates and fixes, mostly for x86

- MTRR and PAT fixes and optimizations"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits)
  KVM: selftests: Avoid using forced target for generating arm64 headers
  tools headers arm64: Fix references to top srcdir in Makefile
  KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
  KVM: arm64: selftest: Perform ISB before reading PAR_EL1
  KVM: arm64: selftest: Add the missing .guest_prepare()
  KVM: arm64: Always invalidate TLB for stage-2 permission faults
  KVM: x86: Service NMI requests after PMI requests in VM-Enter path
  KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
  KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
  KVM: arm64: Refine _EL2 system register list that require trap reinjection
  arm64: Add missing _EL2 encodings
  arm64: Add missing _EL12 encodings
  KVM: selftests: aarch64: vPMU test for validating user accesses
  KVM: selftests: aarch64: vPMU register test for unimplemented counters
  KVM: selftests: aarch64: vPMU register test for implemented counters
  KVM: selftests: aarch64: Introduce vpmu_counter_access test
  tools: Import arm_pmuv3.h
  KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
  KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
  KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  ...
2023-10-30  Merge branch kvm-arm64/pmu_pmcr_n into kvmarm/next  (Oliver Upton)
* kvm-arm64/pmu_pmcr_n:
  : User-defined PMC limit, courtesy Raghavendra Rao Ananta
  :
  : Certain VMMs may want to reserve some PMCs for host use while running a
  : KVM guest. This was a bit difficult before, as KVM advertised all
  : supported counters to the guest. Userspace can now limit the number of
  : advertised PMCs by writing to PMCR_EL0.N, as KVM's sysreg and PMU
  : emulation enforce the specified limit for handling guest accesses.
  KVM: selftests: aarch64: vPMU test for validating user accesses
  KVM: selftests: aarch64: vPMU register test for unimplemented counters
  KVM: selftests: aarch64: vPMU register test for implemented counters
  KVM: selftests: aarch64: Introduce vpmu_counter_access test
  tools: Import arm_pmuv3.h
  KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
  KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
  KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMU
  KVM: arm64: PMU: Add a helper to read a vCPU's PMCR_EL0
  KVM: arm64: Select default PMU in KVM_ARM_VCPU_INIT handler
  KVM: arm64: PMU: Introduce helpers to set the guest's PMU

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30  Merge branch kvm-arm64/writable-id-regs into kvmarm/next  (Oliver Upton)
* kvm-arm64/writable-id-regs:
  : Writable ID registers, courtesy of Jing Zhang
  :
  : This series significantly expands the architectural feature set that
  : userspace can manipulate via the ID registers. A new ioctl is defined
  : that makes the mutable fields in the ID registers discoverable to
  : userspace.
  KVM: selftests: Avoid using forced target for generating arm64 headers
  tools headers arm64: Fix references to top srcdir in Makefile
  KVM: arm64: selftests: Test for setting ID register from usersapce
  tools headers arm64: Update sysreg.h with kernel sources
  KVM: selftests: Generate sysreg-defs.h and add to include path
  perf build: Generate arm64's sysreg-defs.h and add to include path
  tools: arm64: Add a Makefile for generating sysreg-defs.h
  KVM: arm64: Document vCPU feature selection UAPIs
  KVM: arm64: Allow userspace to change ID_AA64ZFR0_EL1
  KVM: arm64: Allow userspace to change ID_AA64PFR0_EL1
  KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1
  KVM: arm64: Allow userspace to change ID_AA64ISAR{0-2}_EL1
  KVM: arm64: Bump up the default KVM sanitised debug version to v8p8
  KVM: arm64: Reject attempts to set invalid debug arch version
  KVM: arm64: Advertise selected DebugVer in DBGDIDR.Version
  KVM: arm64: Use guest ID register values for the sake of emulation
  KVM: arm64: Document KVM_ARM_GET_REG_WRITABLE_MASKS
  KVM: arm64: Allow userspace to get the writable masks for feature ID registers

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30  Merge branch kvm-arm64/sgi-injection into kvmarm/next  (Oliver Upton)
* kvm-arm64/sgi-injection:
  : vSGI injection improvements + fixes, courtesy Marc Zyngier
  :
  : Avoid linearly searching for vSGI targets using a compressed MPIDR to
  : index a cache. While at it, fix some egregious bugs in KVM's mishandling
  : of vcpuid (user-controlled value) and vcpu_idx.
  KVM: arm64: Clarify the ordering requirements for vcpu/RD creation
  KVM: arm64: vgic-v3: Optimize affinity-based SGI injection
  KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available
  KVM: arm64: Build MPIDR to vcpu index cache at runtime
  KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff()
  KVM: arm64: Use vcpu_idx for invalidation tracking
  KVM: arm64: vgic: Use vcpu_idx for the debug information
  KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id
  KVM: arm64: vgic-v3: Refactor GICv3 SGI generation
  KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id
  KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30  Merge branch kvm-arm64/stage2-vhe-load into kvmarm/next  (Oliver Upton)
* kvm-arm64/stage2-vhe-load:
  : Setup stage-2 MMU from vcpu_load() for VHE
  :
  : Unlike nVHE, there is no need to switch the stage-2 MMU around on guest
  : entry/exit in VHE mode as the host is running at EL2. Despite this KVM
  : reloads the stage-2 on every guest entry, which is needless.
  :
  : This series moves the setup of the stage-2 MMU context to vcpu_load()
  : when running in VHE mode. This is likely to be a win across the board,
  : but also allows us to remove an ISB on the guest entry path for systems
  : with one of the speculative AT errata.
  KVM: arm64: Move VTCR_EL2 into struct s2_mmu
  KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()
  KVM: arm64: Rename helpers for VHE vCPU load/put
  KVM: arm64: Reload stage-2 for VMID change on VHE
  KVM: arm64: Restore the stage-2 context in VHE's __tlb_switch_to_host()
  KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host()

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30  Merge branch kvm-arm64/smccc-filter-cleanups into kvmarm/next  (Oliver Upton)
* kvm-arm64/smccc-filter-cleanups:
  : Cleanup the management of KVM's SMCCC maple tree
  :
  : Avoid the cost of maintaining the SMCCC filter maple tree if userspace
  : hasn't written a rule to the filter. While at it, rip out the now
  : unnecessary VM flag to indicate whether or not the SMCCC filter was
  : configured.
  KVM: arm64: Use mtree_empty() to determine if SMCCC filter configured
  KVM: arm64: Only insert reserved ranges when SMCCC filter is used
  KVM: arm64: Add a predicate for testing if SMCCC filter is configured

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24  KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMU  (Raghavendra Rao Ananta)
The number of PMU event counters is indicated in PMCR_EL0.N. For a vCPU with PMUv3 configured, the value is set to the same value as the current PE on every vCPU reset. Unless the vCPU is pinned to PEs that have the PMU associated to the guest from the initial vCPU reset, the value might be different from the PMU's PMCR_EL0.N on heterogeneous PMU systems. Fix this by setting the vCPU's PMCR_EL0.N to the PMU's PMCR_EL0.N value. Track the PMCR_EL0.N per guest, as only one PMU can be set for the guest (PMCR_EL0.N must be the same for all vCPUs of the guest), and it is convenient for updating the value. To achieve this, the patch introduces a helper, kvm_arm_pmu_get_max_counters(), that reads the maximum number of counters from the arm_pmu associated to the VM. Make the function global as upcoming patches will be interested to know the value while setting the PMCR.N of the guest from userspace. KVM does not yet support userspace modifying PMCR_EL0.N. The following patch will add support for that. Reviewed-by: Sebastian Ott <sebott@redhat.com> Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20231020214053.2144305-5-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
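A sketch of what programming the tracked value into PMCR_EL0 on vCPU reset could look like; the struct and helper names are assumed, while the N field position reflects the architectural bits [15:11]:

  #include <stdint.h>

  #define ARMV8_PMU_PMCR_N_SHIFT  11
  #define ARMV8_PMU_PMCR_N_MASK   0x1f

  struct vm_pmu_state {
      uint8_t pmcr_n;   /* counters advertised to the guest, tracked per VM */
  };

  /* replace the reset PE's N value with the VM-wide tracked one */
  static uint64_t reset_pmcr(uint64_t pmcr, const struct vm_pmu_state *pmu)
  {
      pmcr &= ~((uint64_t)ARMV8_PMU_PMCR_N_MASK << ARMV8_PMU_PMCR_N_SHIFT);
      pmcr |= (uint64_t)(pmu->pmcr_n & ARMV8_PMU_PMCR_N_MASK)
              << ARMV8_PMU_PMCR_N_SHIFT;
      return pmcr;
  }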
2023-10-23  KVM: arm64: Move VTCR_EL2 into struct s2_mmu  (Marc Zyngier)
We currently have a global VTCR_EL2 value for each guest, even if the guest uses NV. This implies that the guest's own S2 must fit in the host's. This is odd, for multiple reasons:

- the PARange values and the number of IPA bits don't necessarily match: you can have 33 bits of IPA space, and yet you can only describe 32 or 36 bits of PARange

- When userspace sets the IPA space, it creates a contract with the kernel saying "this is the IPA space I'm prepared to handle". At no point does it constrain the guest's own IPA space as long as the guest doesn't try to use a [I]PA outside of the IPA space set by userspace

- We don't even try to hide the value of ID_AA64MMFR0_EL1.PARange.

And then there is the consequence of the above: if a guest tries to create a S2 that has for input address something that is larger than the IPA space defined by the host, we inject a fatal exception.

This is no good. For all intents and purposes, a guest should be able to have the S2 it really wants, as long as the *output* address of that S2 isn't outside of the IPA space.

For that, we need to have a per-s2_mmu VTCR_EL2 setting, which allows us to represent the full PARange. Move the vtcr field into the s2_mmu structure, which has no impact whatsoever, except for NV.

Note that once we are able to override ID_AA64MMFR0_EL1.PARange from userspace, we'll also be able to restrict the size of the shadow S2 that NV uses.

Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012205108.3937270-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20  KVM: arm64: Rename helpers for VHE vCPU load/put  (Oliver Upton)
The names for the helpers we expose to the 'generic' KVM code are a bit imprecise; we switch the EL0 + EL1 sysreg context and setup trap controls that do not need to change for every guest entry/exit. Rename + shuffle things around a bit in preparation for loading the stage-2 MMU context on vcpu_load(). Link: https://lore.kernel.org/r/20231018233212.2888027-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20  KVM: arm64: Reload stage-2 for VMID change on VHE  (Marc Zyngier)
Naturally, a change to the VMID for an MMU implies a new value for VTTBR. Reload on VMID change in anticipation of loading stage-2 on vcpu_load() instead of every guest entry. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231018233212.2888027-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-16  arm64: kvm: Use cpus_have_final_cap() explicitly  (Mark Rutland)
Much of the arm64 KVM code uses cpus_have_const_cap() to check for cpucaps, but this is unnecessary and it would be preferable to use cpus_have_final_cap().

For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching.

KVM is initialized after cpucaps have been finalized and alternatives have been patched. Since commit:

  d86de40decaa14e6 ("arm64: cpufeature: upgrade hyp caps to final")

... use of cpus_have_const_cap() in hyp code is automatically converted to use cpus_have_final_cap():

| static __always_inline bool cpus_have_const_cap(int num)
| {
|         if (is_hyp_code())
|                 return cpus_have_final_cap(num);
|         else if (system_capabilities_finalized())
|                 return __cpus_have_const_cap(num);
|         else
|                 return cpus_have_cap(num);
| }

Thus, converting hyp code to use cpus_have_final_cap() directly will not result in any functional change.

Non-hyp KVM code is also not executed until cpucaps have been finalized, and it would be preferable to extend the same treatment to this code and use cpus_have_final_cap() directly.

This patch converts instances of cpus_have_const_cap() in KVM-only code over to cpus_have_final_cap(). As all of this code runs after cpucaps have been finalized, there should be no functional change as a result of this patch, but the redundant instructions generated by cpus_have_const_cap() will be removed from the non-hyp KVM code.

Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-05  KVM: arm64: Use mtree_empty() to determine if SMCCC filter configured  (Oliver Upton)
The smccc_filter maple tree is only populated if userspace attempted to configure it. Use the state of the maple tree to determine if the filter has been configured, eliminating the VM flag. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231004234947.207507-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04  KVM: arm64: Allow userspace to get the writable masks for feature ID registers  (Jing Zhang)
While the Feature ID range is well defined and pretty large, it isn't inconceivable that the architecture will eventually grow some other ranges that will need to similarly be described to userspace. Add a VM ioctl to allow userspace to get writable masks for feature ID registers in below system register space: op0 = 3, op1 = {0, 1, 3}, CRn = 0, CRm = {0 - 7}, op2 = {0 - 7} This is used to support mix-and-match userspace and kernels for writable ID registers, where userspace may want to know upfront whether it can actually tweak the contents of an idreg or not. Add a new capability (KVM_CAP_ARM_SUPPORTED_FEATURE_ID_RANGES) that returns a bitmap of the valid ranges, which can subsequently be retrieved, one at a time by setting the index of the set bit as the range identifier. Suggested-by: Marc Zyngier <maz@kernel.org> Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-09-30  KVM: arm64: Build MPIDR to vcpu index cache at runtime  (Marc Zyngier)
The MPIDR_EL1 register contains a unique value that identifies the CPU. The only problem with it is that it is stupidly large (32 bits, once the useless stuff is removed). Trying to obtain a vcpu from an MPIDR value is a fairly common, yet costly operation: we iterate over all the vcpus until we find the correct one. While this is cheap for small VMs, it is pretty expensive on large ones, especially if you are trying to get to the one that's at the end of the list... In order to help with this, it is important to realise that the MPIDR values are actually structured, and that implementations tend to use a small number of significant bits in the 32bit space. We can use this fact to our advantage by computing a small hash table that uses the "compression" of the significant MPIDR bits as an index, giving us the vcpu index as a result. Given that the MPIDR values can be supplied by userspace, and that an evil VMM could decide to make *all* bits significant, resulting in a 4G-entry table, we only use this method if the resulting table fits in a single page. Otherwise, we fallback to the good old iterative method. Nothing uses that table just yet, but keep your eyes peeled. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
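An illustration of the "compression" trick with made-up names: gather only the MPIDR affinity bits that actually vary into a dense value and use it as an index into a small table of vCPU indices.

  #include <stdint.h>

  struct mpidr_cache {
      uint64_t mask;            /* significant MPIDR affinity bits           */
      uint16_t index[1 << 9];   /* vcpu_idx per compressed value (example)   */
  };

  /* squeeze the bits selected by mask down into a contiguous value */
  static unsigned int compress_mpidr(uint64_t mpidr, uint64_t mask)
  {
      unsigned int out = 0, pos = 0;

      for (unsigned int bit = 0; bit < 64; bit++) {
          if (!(mask & (1ULL << bit)))
              continue;
          out |= ((mpidr >> bit) & 1) << pos;
          pos++;
      }
      return out;
  }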
2023-09-21  KVM: arm64: Get rid of vCPU-scoped feature bitmap  (Oliver Upton)
The vCPU-scoped feature bitmap was left in place a couple of releases ago in case the change to VM-scoped vCPU features broke anyone. Nobody has complained and the interop between VM and vCPU bitmaps is pretty gross. Throw it out. Link: https://lore.kernel.org/r/20230920195036.1169791-9-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-09-21  KVM: arm64: Remove unused return value from kvm_reset_vcpu()  (Oliver Upton)
Get rid of the return value for kvm_reset_vcpu() as there are no longer any cases where it returns a nonzero value. Link: https://lore.kernel.org/r/20230920195036.1169791-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-08-28  Merge branch kvm-arm64/6.6/misc into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/6.6/misc:
  : Misc KVM/arm64 updates for 6.6:
  :
  : - Don't unnecessarily align non-stack allocations in the EL2 VA space
  :
  : - Drop HCR_VIRT_EXCP_MASK, which was never used...
  :
  : - Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu parameter instead
  :
  : - Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()
  :
  : - Remove prototypes without implementations
  KVM: arm64: Remove size-order align in the nVHE hyp private VA range
  KVM: arm64: Remove unused declarations
  KVM: arm64: Remove redundant kvm_set_pfn_accessed() from user_mem_abort()
  KVM: arm64: Drop HCR_VIRT_EXCP_MASK
  KVM: arm64: Use the known cpu id instead of smp_processor_id()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-08-28  Merge branch kvm-arm64/6.6/pmu-fixes into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/6.6/pmu-fixes:
  : Another set of PMU fixes, courtesy of Reiji Watanabe.
  : From the cover letter:
  :
  : "This series fixes a couple of PMUver related handling of
  : vPMU support.
  :
  : On systems where the PMUVer is not uniform across all PEs,
  : KVM currently does not advertise PMUv3 to the guest,
  : even if userspace successfully runs KVM_ARM_VCPU_INIT with
  : KVM_ARM_VCPU_PMU_V3."
  :
  : Additionally, a fix for an obscure counter oversubscription
  : issue happening when the host profiles the guest's EL0.
  KVM: arm64: pmu: Guard PMU emulation definitions with CONFIG_KVM
  KVM: arm64: pmu: Resync EL0 state on counter rotation
  KVM: arm64: PMU: Don't advertise STALL_SLOT_{FRONTEND,BACKEND}
  KVM: arm64: PMU: Don't advertise the STALL_SLOT event
  KVM: arm64: PMU: Avoid inappropriate use of host's PMUVer
  KVM: arm64: PMU: Disallow vPMU on non-uniform PMUVer

Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-08-28  Merge branch kvm-arm64/tlbi-range into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/tlbi-range:
  : FEAT_TLBIRANGE support, courtesy of Raghavendra Rao Ananta.
  : From the cover letter:
  :
  : "In certain code paths, KVM/ARM currently invalidates the entire VM's
  : page-tables instead of just invalidating a necessary range. For example,
  : when collapsing a table PTE to a block PTE, instead of iterating over
  : each PTE and flushing them, KVM uses 'vmalls12e1is' TLBI operation to
  : flush all the entries. This is inefficient since the guest would have
  : to refill the TLBs again, even for the addresses that aren't covered
  : by the table entry. The performance impact would scale poorly if many
  : addresses in the VM are going through this remapping.
  :
  : For architectures that implement FEAT_TLBIRANGE, KVM can replace such
  : inefficient paths by performing the invalidations only on the range of
  : addresses that are in scope. This series tries to achieve the same in
  : the areas of stage-2 map, unmap and write-protecting the pages."
  KVM: arm64: Use TLBI range-based instructions for unmap
  KVM: arm64: Invalidate the table entries upon a range
  KVM: arm64: Flush only the memslot after write-protect
  KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
  KVM: arm64: Define kvm_tlb_flush_vmid_range()
  KVM: arm64: Implement __kvm_tlb_flush_vmid_range()
  arm64: tlb: Implement __flush_s2_tlb_range_op()
  arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range
  KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code
  KVM: Allow range-based TLB invalidation from common code
  KVM: Remove CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
  KVM: arm64: Use kvm_arch_flush_remote_tlbs()
  KVM: Declare kvm_arch_flush_remote_tlbs() globally
  KVM: Rename kvm_arch_flush_remote_tlb() to kvm_arch_flush_remote_tlbs()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-08-28  Merge branch kvm-arm64/nv-trap-forwarding into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/nv-trap-forwarding: (30 commits)
  : This implements the so called "trap forwarding" infrastructure, which
  : gets used when we take a trap from an L2 guest and the L1 guest
  : wants to see the trap for itself.
  KVM: arm64: nv: Add trap description for SPSR_EL2 and ELR_EL2
  KVM: arm64: nv: Select XARRAY_MULTI to fix build error
  KVM: arm64: nv: Add support for HCRX_EL2
  KVM: arm64: Move HCRX_EL2 switch to load/put on VHE systems
  KVM: arm64: nv: Expose FGT to nested guests
  KVM: arm64: nv: Add switching support for HFGxTR/HDFGxTR
  KVM: arm64: nv: Expand ERET trap forwarding to handle FGT
  KVM: arm64: nv: Add SVC trap forwarding
  KVM: arm64: nv: Add trap forwarding for HDFGxTR_EL2
  KVM: arm64: nv: Add trap forwarding for HFGITR_EL2
  KVM: arm64: nv: Add trap forwarding for HFGxTR_EL2
  KVM: arm64: nv: Add fine grained trap forwarding infrastructure
  KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2
  KVM: arm64: nv: Add trap forwarding for MDCR_EL2
  KVM: arm64: nv: Expose FEAT_EVT to nested guests
  KVM: arm64: nv: Add trap forwarding for HCR_EL2
  KVM: arm64: nv: Add trap forwarding infrastructure
  KVM: arm64: Restructure FGT register switching
  KVM: arm64: nv: Add FGT registers
  KVM: arm64: Add missing HCR_EL2 trap bits
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-08-22  KVM: arm64: pmu: Resync EL0 state on counter rotation  (Marc Zyngier)
Huang Shijie reports that, when profiling a guest from the host with a number of events that exceeds the number of available counters, the reported counts are wildly inaccurate. Without the counter oversubscription, the reported counts are correct. Their investigation indicates that upon counter rotation (which takes place on the back of a timer interrupt), we fail to re-apply the guest EL0 enabling, leading to the counting of host events instead of guest events. In order to solve this, add yet another hook between the host PMU driver and KVM, re-applying the guest EL0 configuration if the right conditions apply (the host is VHE, we are in interrupt context, and we interrupted a running vcpu). This triggers a new vcpu request which will apply the correct configuration on guest reentry. With this, we have the correct counts, even when the counters are oversubscribed. Reported-by: Huang Shijie <shijie@os.amperecomputing.com> Suggested-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Huang Shijie <shijie@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230809013953.7692-1-shijie@os.amperecomputing.com Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230820090108.177817-1-maz@kernel.org
2023-08-17  KVM: arm64: nv: Add support for HCRX_EL2  (Marc Zyngier)
HCRX_EL2 has an interesting effect on HFGITR_EL2, as it conditions the traps of TLBI*nXS. Expand the FGT support to add a new Fine Grained Filter that will get checked when the instruction gets trapped, allowing the shadow register to override the trap as needed. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20230815183903.2735724-29-maz@kernel.org
2023-08-17  KVM: arm64: nv: Add trap forwarding infrastructure  (Marc Zyngier)
A significant part of what a NV hypervisor needs to do is to decide whether a trap from a L2+ guest has to be forwarded to a L1 guest or handled locally. This is done by checking for the trap bits that the guest hypervisor has set and acting accordingly, as described by the architecture. A previous approach was to sprinkle a bunch of checks in all the system register accessors, but this is pretty error prone and doesn't help getting an overview of what is happening. Instead, implement a set of global tables that describe a trap bit, combinations of trap bits, behaviours on trap, and what bits must be evaluated on a system register trap. Although this is painful to describe, this allows us to specify each and every control bit in a static manner. To make it efficient, the table is inserted in an xarray that is global to the system, and checked each time we trap a system register while running a L2 guest. Add the basic infrastructure for now, while additional patches will implement configuration registers. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Jing Zhang <jingzhangos@google.com> Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Link: https://lore.kernel.org/r/20230815183903.2735724-15-maz@kernel.org
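A rough sketch of what one entry of such a trap description table might carry; the field names are invented, and the real tables are keyed by sysreg encoding in an xarray:

  #include <stdint.h>

  enum trap_behaviour {
      BEHAVE_HANDLE_LOCALLY,    /* L0 emulates the access               */
      BEHAVE_FORWARD_TO_L1,     /* reinject the trap into the L1 guest  */
  };

  struct trap_desc {
      uint32_t             sysreg_enc;   /* trapped register encoding     */
      int                  ctl_reg;      /* e.g. the virtual HCR_EL2 copy */
      uint64_t             ctl_bits;     /* trap bit(s) to evaluate       */
      enum trap_behaviour  behaviour;    /* action when the bits are set  */
  };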
2023-08-17  KVM: arm64: nv: Add FGT registers  (Marc Zyngier)
Add the 5 registers covering FEAT_FGT. The AMU-related registers are currently left out as we don't have a plan for them. Yet. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Jing Zhang <jingzhangos@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230815183903.2735724-13-maz@kernel.org
2023-08-17  KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()  (Raghavendra Rao Ananta)
Implement kvm_arch_flush_remote_tlbs_range() for arm64 to invalidate the given range in the TLB. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230811045127.3308641-12-rananta@google.com
2023-08-17  KVM: arm64: Use kvm_arch_flush_remote_tlbs()  (Raghavendra Rao Ananta)
Stop depending on CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL and opt to standardize on kvm_arch_flush_remote_tlbs() since it avoids duplicating the generic TLB stats across architectures that implement their own remote TLB flush. This adds an extra function call to the ARM64 kvm_flush_remote_tlbs() path, but that is a small cost in comparison to flushing remote TLBs. In addition, instead of just incrementing remote_tlb_flush_requests stat, the generic interface would also increment the remote_tlb_flush stat. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230811045127.3308641-4-rananta@google.com
2023-08-15  KVM: arm64: Remove unused declarations  (Yue Haibing)
Commit 53692908b0f5 ("KVM: arm/arm64: vgic: Fix source vcpu issues for GICv2 SGI") removed vgic_v2_set_npie()/vgic_v3_set_npie() but not the declarations. Commit 29eb5a3c57f7 ("KVM: arm64: Handle PtrAuth traps early") left behind kvm_arm_vcpu_ptrauth_trap(), remove it. Commit 2a0c343386ae ("KVM: arm64: Initialize trap registers for protected VMs") declared but never implemented kvm_init_protected_traps() and commit cf5d318865e2 ("arm/arm64: KVM: Turn off vcpus on PSCI shutdown/reboot") declared but never implemented force_vm_exit(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230814140636.45988-1-yuehaibing@huawei.com
2023-07-28  Merge branch kvm-arm64/6.6/generic-vcpu into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/6.6/generic-vcpu:
  : Cleanup the obsolete vcpu target abstraction, courtesy of Oliver.
  : From the cover letter:
  :
  : "kvm_vcpu_init::target is quite useless at this point. We don't do any
  : uarch-specific emulation in the first place, and require userspace
  : select the 'generic' vCPU target on all but a few implementations.
  :
  : Small series to (1) clean up usage of the target value in the kernel and
  : (2) switch to the 'generic' target on implementations that previously
  : had their own target values. The implementation-specific values are
  : still tolerated, though, to avoid UAPI breakage."
  KVM: arm64: Always return generic v8 as the preferred target
  KVM: arm64: Replace vCPU target with a configuration flag
  KVM: arm64: Remove pointless check for changed init target
  KVM: arm64: Delete pointless switch statement in kvm_reset_vcpu()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-07-13  KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption  (Marc Zyngier)
Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when running a preemptible kernel, as it is possible that a vCPU is blocked without requesting a doorbell interrupt. The issue is that any preemption that occurs between vgic_v4_put() and schedule() on the block path will mark the vPE as nonresident and *not* request a doorbell irq. This occurs because when the vcpu thread is resumed on its way to block, vcpu_load() will make the vPE resident again. Once the vcpu actually blocks, we don't request a doorbell anymore, and the vcpu won't be woken up on interrupt delivery. Fix it by tracking that we're entering WFI, and key the doorbell request on that flag. This allows us not to make the vPE resident when going through a preempt/schedule cycle, meaning we don't lose any state. Cc: stable@vger.kernel.org Fixes: 8e01d9a396e6 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Suggested-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Co-developed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-11  KVM: arm64: Always return generic v8 as the preferred target  (Oliver Upton)
Userspace selecting an implementation-specific vCPU target has been completely useless for a very long time. Let's go whole hog and start returning the generic v8 target across all implementations as the preferred target. Uphold the pre-existing behavior by tolerating either the generic target or an implementation-specific target if the vCPU happens to be running on one of the lucky few parts. Acked-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230710193140.1706399-5-oliver.upton@linux.dev
2023-07-11  KVM: arm64: Replace vCPU target with a configuration flag  (Oliver Upton)
The value of kvm_vcpu_arch::target has been used to determine if a vCPU has actually been initialized. Storing this as an integer is needless at this point, as KVM doesn't do any microarch-specific emulation in the first place. Instead, all we care about is whether or not the vCPU has been initialized. Delete the field in favor of a vCPU configuration flag indicating if KVM_ARM_VCPU_INIT has completed for the vCPU. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230710193140.1706399-4-oliver.upton@linux.dev
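A minimal sketch of the replacement, with an illustrative flag bit rather than the kernel's actual flag machinery:

  #include <stdbool.h>
  #include <stdint.h>

  #define VCPU_INITIALIZED  (1u << 0)   /* illustrative bit assignment */

  struct vcpu_cfg {
      uint32_t cflags;    /* per-vCPU configuration flags */
  };

  /* "has KVM_ARM_VCPU_INIT completed for this vCPU?" */
  static inline bool vcpu_initialized(const struct vcpu_cfg *cfg)
  {
      return cfg->cflags & VCPU_INITIALIZED;
  }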
2023-07-03  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)
Pull kvm updates from Paolo Bonzini:
"ARM64:

- Eager page splitting optimization for dirty logging, optionally allowing for a VM to avoid the cost of hugepage splitting in the stage-2 fault path.

- Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with services that live in the Secure world. pKVM intervenes on FF-A calls to guarantee the host doesn't misuse memory donated to the hyp or a pKVM guest.

- Support for running the split hypervisor with VHE enabled, known as 'hVHE' mode. This is extremely useful for testing the split hypervisor on VHE-only systems, and paves the way for new use cases that depend on having two TTBRs available at EL2.

- Generalized framework for configurable ID registers from userspace. KVM/arm64 currently prevents arbitrary CPU feature set configuration from userspace, but the intent is to relax this limitation and allow userspace to select a feature set consistent with the CPU.

- Enable the use of Branch Target Identification (FEAT_BTI) in the hypervisor.

- Use a separate set of pointer authentication keys for the hypervisor when running in protected mode, as the host is untrusted at runtime.

- Ensure timer IRQs are consistently released in the init failure paths.

- Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps (FEAT_EVT), as it is a register commonly read from userspace.

- Erratum workaround for the upcoming AmpereOne part, which has broken hardware A/D state management.

RISC-V:

- Redirect AMO load/store misaligned traps to KVM guest

- Trap-n-emulate AIA in-kernel irqchip for KVM guest

- Svnapot support for KVM Guest

s390:

- New uvdevice secret API

- CMM selftest and fixes

- fix racy access to target CPU for diag 9c

x86:

- Fix missing/incorrect #GP checks on ENCLS

- Use standard mmu_notifier hooks for handling APIC access page

- Drop now unnecessary TR/TSS load after VM-Exit on AMD

- Print more descriptive information about the status of SEV and SEV-ES during module load

- Add a test for splitting and reconstituting hugepages during and after dirty logging

- Add support for CPU pinning in demand paging test

- Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes included along the way

- Add a "nx_huge_pages=never" option to effectively avoid creating NX hugepage recovery threads (because nx_huge_pages=off can be toggled at runtime)

- Move handling of PAT out of MTRR code and dedup SVM+VMX code

- Fix output of PIC poll command emulation when there's an interrupt

- Add a maintainer's handbook to document KVM x86 processes, preferred coding style, testing expectations, etc.

- Misc cleanups, fixes and comments

Generic:

- Miscellaneous bugfixes and cleanups

Selftests:

- Generate dependency files so that partial rebuilds work as expected"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits)
  Documentation/process: Add a maintainer handbook for KVM x86
  Documentation/process: Add a label for the tip tree handbook's coding style
  KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index
  RISC-V: KVM: Remove unneeded semicolon
  RISC-V: KVM: Allow Svnapot extension for Guest/VM
  riscv: kvm: define vcpu_sbi_ext_pmu in header
  RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip
  RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC
  RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip
  RISC-V: KVM: Add in-kernel emulation of AIA APLIC
  RISC-V: KVM: Implement device interface for AIA irqchip
  RISC-V: KVM: Skeletal in-kernel AIA irqchip support
  RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero
  RISC-V: KVM: Add APLIC related defines
  RISC-V: KVM: Add IMSIC related defines
  RISC-V: KVM: Implement guest external interrupt line management
  KVM: x86: Remove PRIx* definitions as they are solely for user space
  s390/uv: Update query for secret-UVCs
  s390/uv: replace scnprintf with sysfs_emit
  s390/uvdevice: Add 'Lock Secret Store' UVC
  ...
2023-07-01  Merge tag 'kvmarm-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini)
KVM/arm64 updates for 6.5

- Eager page splitting optimization for dirty logging, optionally allowing for a VM to avoid the cost of block splitting in the stage-2 fault path.

- Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with services that live in the Secure world. pKVM intervenes on FF-A calls to guarantee the host doesn't misuse memory donated to the hyp or a pKVM guest.

- Support for running the split hypervisor with VHE enabled, known as 'hVHE' mode. This is extremely useful for testing the split hypervisor on VHE-only systems, and paves the way for new use cases that depend on having two TTBRs available at EL2.

- Generalized framework for configurable ID registers from userspace. KVM/arm64 currently prevents arbitrary CPU feature set configuration from userspace, but the intent is to relax this limitation and allow userspace to select a feature set consistent with the CPU.

- Enable the use of Branch Target Identification (FEAT_BTI) in the hypervisor.

- Use a separate set of pointer authentication keys for the hypervisor when running in protected mode, as the host is untrusted at runtime.

- Ensure timer IRQs are consistently released in the init failure paths.

- Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps (FEAT_EVT), as it is a register commonly read from userspace.

- Erratum workaround for the upcoming AmpereOne part, which has broken hardware A/D state management.

As a consequence of the hVHE series reworking the arm64 software features framework, the for-next/module-alloc branch from the arm64 tree comes along for the ride.
2023-06-26Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "Notable features are user-space support for the memcpy/memset instructions and the permission indirection extension.

  - Support for the Armv8.9 Permission Indirection Extensions. While this feature doesn't add new functionality, it enables future support for Guarded Control Stacks (GCS) and Permission Overlays
  - User-space support for the Armv8.8 memcpy/memset instructions
  - arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and cleanups
  - Removal of superfluous ISBs on context switch (following retrospective architecture tightening)
  - Decode the ISS2 register during faults for additional information to help with debugging
  - KPTI clean-up/simplification of the trampoline exit code
  - Addressing several -Wmissing-prototype warnings
  - Kselftest improvements for signal handling and ptrace
  - Fix TPIDR2_EL0 restoring on sigreturn
  - Clean-up, robustness improvements of the module allocation code
  - More sysreg conversions to the automatic register/bitfields generation
  - CPU capabilities handling cleanup
  - Arm documentation updates: ACPI, ptdump"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (124 commits)
  kselftest/arm64: Add a test case for TPIDR2 restore
  arm64/signal: Restore TPIDR2 register rather than memory state
  arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe
  Documentation/arm64: Add ptdump documentation
  arm64: hibernate: remove WARN_ON in save_processor_state
  kselftest/arm64: Log signal code and address for unexpected signals
  docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst
  arm64/fpsimd: Exit streaming mode when flushing tasks
  docs: perf: Add new description for HiSilicon UC PMU
  drivers/perf: hisi: Add support for HiSilicon UC PMU driver
  drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver
  perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE
  perf/arm-cmn: Add sysfs identifier
  perf/arm-cmn: Revamp model detection
  perf/arm_dmc620: Add cpumask
  arm64: mm: fix VA-range sanity check
  arm64/mm: remove now-superfluous ISBs from TTBR writes
  Documentation/arm64: Update ACPI tables from BBR
  Documentation/arm64: Update references in arm-acpi
  Documentation/arm64: Update ARM and arch reference
  ...
2023-06-23Merge branch 'for-next/feat_s1pie' into for-next/coreCatalin Marinas
* for-next/feat_s1pie:
  : Support for the Armv8.9 Permission Indirection Extensions (stage 1 only)
  KVM: selftests: get-reg-list: add Permission Indirection registers
  KVM: selftests: get-reg-list: support ID register features
  arm64: Document boot requirements for PIE
  arm64: transfer permission indirection settings to EL2
  arm64: enable Permission Indirection Extension (PIE)
  arm64: add encodings of PIRx_ELx registers
  arm64: disable EL2 traps for PIE
  arm64: reorganise PAGE_/PROT_ macros
  arm64: add PTE_WRITE to PROT_SECT_NORMAL
  arm64: add PTE_UXN/PTE_WRITE to SWAPPER_*_FLAGS
  KVM: arm64: expose ID_AA64MMFR3_EL1 to guests
  KVM: arm64: Save/restore PIE registers
  KVM: arm64: Save/restore TCR2_EL1
  arm64: cpufeature: add Permission Indirection Extension cpucap
  arm64: cpufeature: add TCR2 cpucap
  arm64: cpufeature: add system register ID_AA64MMFR3
  arm64/sysreg: add PIR*_ELx registers
  arm64/sysreg: update HCRX_EL2 register
  arm64/sysreg: add system registers TCR2_ELx
  arm64/sysreg: Add ID register ID_AA64MMFR3
2023-06-15Merge branch kvm-arm64/configurable-id-regs into kvmarm/nextOliver Upton
* kvm-arm64/configurable-id-regs:
  : Configurable ID register infrastructure, courtesy of Jing Zhang
  :
  : Create generalized infrastructure for allowing userspace to select the supported feature set for a VM, so long as the feature set is a subset of what hardware + KVM allows. This does not add any new features that are user-configurable, and instead focuses on the necessary refactoring to enable future work.
  :
  : As a consequence of the series, feature asymmetry is now deliberately disallowed for KVM. It is unlikely that VMMs ever configured VMs with asymmetry, nor does it align with the kernel's overall stance that features must be uniform across all cores in the system.
  :
  : Furthermore, KVM incorrectly advertised an IMP_DEF PMU to guests for some time. Migrations from affected kernels were supported by explicitly allowing such an ID register value from userspace, and forwarding that along to the guest. KVM now allows an IMP_DEF PMU version to be restored through the ID register interface, but reinterprets the user value as not implemented (0). (A userspace sketch of the ID register flow follows this entry.)
  KVM: arm64: Rip out the vestiges of the 'old' ID register scheme
  KVM: arm64: Handle ID register reads using the VM-wide values
  KVM: arm64: Use generic sanitisation for ID_AA64PFR0_EL1
  KVM: arm64: Use generic sanitisation for ID_(AA64)DFR0_EL1
  KVM: arm64: Use arm64_ftr_bits to sanitise ID register writes
  KVM: arm64: Save ID registers' sanitized value per guest
  KVM: arm64: Reuse fields of sys_reg_desc for idreg
  KVM: arm64: Rewrite IMPDEF PMU version as NI
  KVM: arm64: Make vCPU feature flags consistent VM-wide
  KVM: arm64: Relax invariance of KVM_ARM_VCPU_POWER_OFF
  KVM: arm64: Separate out feature sanitisation and initialisation

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
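As a rough illustration of the userspace interface this series builds on (not code from the series itself), a VMM reads an ID register with KVM_GET_ONE_REG, clears a feature field, and writes the result back with KVM_SET_ONE_REG before running the vCPU; KVM then sanitises the write against what hardware and KVM support. The field position below comes from the Arm ARM layout of ID_AA64DFR0_EL1, whether a given field is writable depends on the kernel version, and error handling is omitted.

/* Hedged sketch: hide the PMU from a guest by zeroing ID_AA64DFR0_EL1.PMUVer
 * through the ONE_REG interface. Assumes a vCPU fd that has already been
 * initialised with KVM_ARM_VCPU_INIT.
 */
#include <linux/kvm.h>
#include <asm/kvm.h>		/* arm64 uapi header: ARM64_SYS_REG() */
#include <sys/ioctl.h>
#include <stdint.h>

#define ID_AA64DFR0_EL1_ID	ARM64_SYS_REG(3, 0, 0, 5, 0)
#define PMUVER_SHIFT		8	/* ID_AA64DFR0_EL1.PMUVer, bits [11:8] */
#define PMUVER_MASK		(0xfUL << PMUVER_SHIFT)

static int hide_guest_pmu(int vcpu_fd)
{
	uint64_t val;
	struct kvm_one_reg reg = {
		.id   = ID_AA64DFR0_EL1_ID,
		.addr = (uint64_t)(unsigned long)&val,
	};

	if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))
		return -1;

	val &= ~PMUVER_MASK;	/* PMUVer == 0b0000: PMU not implemented */

	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}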
2023-06-15Merge branch kvm-arm64/ffa-proxy into kvmarm/nextOliver Upton
* kvm-arm64/ffa-proxy:
  : pKVM FF-A Proxy, courtesy of Will Deacon and Andrew Walbran
  :
  : From the cover letter:
  :
  : pKVM's primary goal is to protect guest pages from a compromised host by enforcing access control restrictions using stage-2 page-tables. Sadly, this cannot prevent TrustZone from accessing non-secure memory, and a compromised host could, for example, perform a 'confused deputy' attack by asking TrustZone to use pages that have been donated to protected guests. This would effectively allow the host to have TrustZone exfiltrate guest secrets on its behalf, hence breaking the isolation that pKVM intends to provide.
  :
  : This series addresses this problem by providing pKVM with the ability to monitor SMCs following the Arm FF-A protocol. FF-A provides (among other things) a set of memory management APIs allowing the Normal World to share, donate or lend pages with Secure. By monitoring these SMCs, pKVM can ensure that the pages that are shared, lent or donated to Secure by the host kernel are only pages that it owns. (An illustrative dispatch sketch follows this entry.)
  KVM: arm64: pkvm: Add support for fragmented FF-A descriptors
  KVM: arm64: Handle FFA_FEATURES call from the host
  KVM: arm64: Handle FFA_MEM_LEND calls from the host
  KVM: arm64: Handle FFA_MEM_RECLAIM calls from the host
  KVM: arm64: Handle FFA_MEM_SHARE calls from the host
  KVM: arm64: Add FF-A helpers to share/unshare memory with secure world
  KVM: arm64: Handle FFA_RXTX_MAP and FFA_RXTX_UNMAP calls from the host
  KVM: arm64: Allocate pages for hypervisor FF-A mailboxes
  KVM: arm64: Probe FF-A version and host/hyp partition ID during init
  KVM: arm64: Block unsafe FF-A calls from the host

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
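To make the "monitor and allow-list" idea concrete, here is an illustrative, self-contained sketch of the dispatch shape such a proxy takes. It is not the pKVM hypervisor code: the enum values, FFA_RET_DENIED, ffa_host_owns_range() and forward_to_secure() are hypothetical stand-ins, and the real handlers use the kernel's FF-A definitions and host stage-2 state.

/* Illustrative sketch only: a default-deny filter over host-issued FF-A SMCs.
 * All identifiers are hypothetical; the stubs exist only so this compiles.
 */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the FF-A function IDs of interest. */
enum ffa_func_id {
	HYP_FFA_FEATURES,
	HYP_FFA_RXTX_MAP,
	HYP_FFA_MEM_SHARE,
	HYP_FFA_MEM_LEND,
	HYP_FFA_MEM_RECLAIM,
};

#define FFA_RET_DENIED	(-1L)

/* Stub: the real check walks the host's stage-2 ownership state. */
static bool ffa_host_owns_range(uint64_t ipa, uint64_t size)
{
	(void)ipa; (void)size;
	return true;
}

/* Stub: the real path re-issues the vetted SMC towards the Secure world. */
static long forward_to_secure(enum ffa_func_id id, uint64_t arg0, uint64_t arg1)
{
	(void)id; (void)arg0; (void)arg1;
	return 0;
}

/* Default-deny dispatch: only vetted memory-management calls reach Secure. */
long handle_host_ffa_call(enum ffa_func_id id, uint64_t ipa, uint64_t size)
{
	switch (id) {
	case HYP_FFA_MEM_SHARE:
	case HYP_FFA_MEM_LEND:
		/* Refuse to share/lend pages the host does not own, e.g.
		 * pages already donated to a protected guest. */
		if (!ffa_host_owns_range(ipa, size))
			return FFA_RET_DENIED;
		return forward_to_secure(id, ipa, size);
	case HYP_FFA_MEM_RECLAIM:
	case HYP_FFA_FEATURES:
	case HYP_FFA_RXTX_MAP:
		/* Calls the proxy understands are forwarded after any
		 * bookkeeping (elided here). */
		return forward_to_secure(id, ipa, size);
	default:
		/* Anything unknown is blocked rather than passed through. */
		return FFA_RET_DENIED;
	}
}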