summaryrefslogtreecommitdiff
path: root/arch/arm64/include/asm/sysreg.h
AgeCommit message (Collapse)Author
3 daysMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ...
2025-07-08irqchip/gic-v5: Add GICv5 LPI/IPI supportLorenzo Pieralisi
An IRS supports Logical Peripheral Interrupts (LPIs) and implement Linux IPIs on top of it. LPIs are used for interrupt signals that are translated by a GICv5 ITS (Interrupt Translation Service) but also for software generated IRQs - namely interrupts that are not driven by a HW signal, ie IPIs. LPIs rely on memory storage for interrupt routing and state. LPIs state and routing information is kept in the Interrupt State Table (IST). IRSes provide support for 1- or 2-level IST tables configured to support a maximum number of interrupts that depend on the OS configuration and the HW capabilities. On systems that provide 2-level IST support, always allow the maximum number of LPIs; On systems with only 1-level support, limit the number of LPIs to 2^12 to prevent wasting memory (presumably a system that supports a 1-level only IST is not expecting a large number of interrupts). On a 2-level IST system, L2 entries are allocated on demand. The IST table memory is allocated using the kmalloc() interface; the allocation required may be smaller than a page and must be made up of contiguous physical pages if larger than a page. On systems where the IRS is not cache-coherent with the CPUs, cache mainteinance operations are executed to clean and invalidate the allocated memory to the point of coherency making it visible to the IRS components. On GICv5 systems, IPIs are implemented using LPIs. Add an LPI IRQ domain and implement an IPI-specific IRQ domain created as a child/subdomain of the LPI domain to allocate the required number of LPIs needed to implement the IPIs. IPIs are backed by LPIs, add LPIs allocation/de-allocation functions. The LPI INTID namespace is managed using an IDA to alloc/free LPI INTIDs. Associate an IPI irqchip with IPI IRQ descriptors to provide core code with the irqchip.ipi_send_single() method required to raise an IPI. Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Co-developed-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-22-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08irqchip/gic-v5: Add GICv5 IRS/SPI supportLorenzo Pieralisi
The GICv5 Interrupt Routing Service (IRS) component implements interrupt management and routing in the GICv5 architecture. A GICv5 system comprises one or more IRSes, that together handle the interrupt routing and state for the system. An IRS supports Shared Peripheral Interrupts (SPIs), that are interrupt sources directly connected to the IRS; they do not rely on memory for storage. The number of supported SPIs is fixed for a given implementation and can be probed through IRS IDR registers. SPI interrupt state and routing are managed through GICv5 instructions. Each core (PE in GICv5 terms) in a GICv5 system is identified with an Interrupt AFFinity ID (IAFFID). An IRS manages a set of cores that are connected to it. Firmware provides a topology description that the driver uses to detect to which IRS a CPU (ie an IAFFID) is associated with. Use probeable information and firmware description to initialize the IRSes and implement GICv5 IRS SPIs support through an SPI-specific IRQ domain. The GICv5 IRS driver: - Probes IRSes in the system to detect SPI ranges - Associates an IRS with a set of cores connected to it - Adds an IRQchip structure for SPI handling SPIs priority is set to a value corresponding to the lowest permissible priority in the system (taking into account the implemented priority bits of the IRS and CPU interface). Since all IRQs are set to the same priority value, the value itself does not matter as long as it is a valid one. Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Co-developed-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-21-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08irqchip/gic-v5: Add GICv5 PPI supportLorenzo Pieralisi
The GICv5 CPU interface implements support for PE-Private Peripheral Interrupts (PPI), that are handled (enabled/prioritized/delivered) entirely within the CPU interface hardware. To enable PPI interrupts, implement the baseline GICv5 host kernel driver infrastructure required to handle interrupts on a GICv5 system. Add the exception handling code path and definitions for GICv5 instructions. Add GICv5 PPI handling code as a specific IRQ domain to: - Set-up PPI priority - Manage PPI configuration and state - Manage IRQ flow handler - IRQs allocation/free - Hook-up a PPI specific IRQchip to provide the relevant methods PPI IRQ priority is chosen as the minimum allowed priority by the system design (after probing the number of priority bits implemented by the CPU interface). Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Co-developed-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-20-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08arm64: Add support for GICv5 GSB barriersLorenzo Pieralisi
The GICv5 architecture introduces two barriers instructions (GSB SYS, GSB ACK) that are used to manage interrupt effects. Rework macro used to emit the SB barrier instruction and implement the GSB barriers on top of it. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-19-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08arm64/sysreg: Add BRBE registers and fieldsAnshuman Khandual
This patch adds definitions related to the Branch Record Buffer Extension (BRBE) as per ARM DDI 0487K.a. These will be used by KVM and a BRBE driver in subsequent patches. Some existing BRBE definitions in asm/sysreg.h are replaced with equivalent generated definitions. Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: James Clark <james.clark@linaro.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> tested-by: Adam Young <admiyo@os.amperecomputing.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-1-e7775563036e@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-05-30arm64: sysreg: Drag linux/kconfig.h to work around vdso build issueMarc Zyngier
Broonie reports that fed55f49fad18 ("arm64: errata: Work around AmpereOne's erratum AC04_CPU_23") breaks one of the vdso selftests (vdso_test_chacha) as it indirectly drags asm/sysreg.h. It is rather unfortunate (and worrying) that userspace gets built with non-UAPI headers. In any case, paper over the issue by dragging linux/kconfig.h in asm/sysreg.h. It is the right thing to do, at least from the kernel perspective. Reported-by: Mark Brown <broonie@kernel.org> Fixes: fed55f49fad18 ("arm64: errata: Work around AmpereOne's erratum AC04_CPU_23") Link: https://lore.kernel.org/r/aDCDGZ-G-nCP3hJI@finisterre.sirena.org.uk Cc: D Scott Phillips <scott@os.amperecomputing.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250523170208.530818-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/misc-6.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-6.16: : . : Misc changes and improvements for 6.16: : : - Add a new selftest for the SVE host state being corrupted by a guest : : - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, : ensuring that the window for interrupts is slightly bigger, and avoiding : a pretty bad erratum on the AmpereOne HW : : - Replace a couple of open-coded on/off strings with str_on_off() : : - Get rid of the pKVM memblock sorting, which now appears to be superflous : : - Drop superflous clearing of ICH_LR_EOI in the LR when nesting : : - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from : a pretty bad case of TLB corruption unless accesses to HCR_EL2 are : heavily synchronised : : - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables : in a human-friendly fashion : . KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables arm64: errata: Work around AmpereOne's erratum AC04_CPU_23 KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1 KVM: arm64: Drop sort_memblock_regions() KVM: arm64: selftests: Add test for SVE host corruption KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode KVM: arm64: Replace ternary flags with str_on_off() helper Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/nv-nv into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-nv: : . : Flick the switch on the NV support by adding the missing piece : in the form of the VNCR page management. From the cover letter: : : "This is probably the most interesting bit of the whole NV adventure. : So far, everything else has been a walk in the park, but this one is : where the real fun takes place. : : With FEAT_NV2, most of the NV support revolves around tricking a guest : into accessing memory while it tries to access system registers. The : hypervisor's job is to handle the context switch of the actual : registers with the state in memory as needed." : . KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating KVM: arm64: Document NV caps and vcpu flags KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2* KVM: arm64: nv: Remove dead code from ERET handling KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2 KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2 KVM: arm64: nv: Handle VNCR_EL2-triggered faults KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2 KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2 KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting KVM: arm64: nv: Move TLBI range decoding to a helper KVM: arm64: nv: Snapshot S1 ASID tagging information during walk KVM: arm64: nv: Extract translation helper from the AT code KVM: arm64: nv: Allocate VNCR page when required arm64: sysreg: Add layout for VNCR_EL2 Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19arm64: errata: Work around AmpereOne's erratum AC04_CPU_23D Scott Phillips
On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous translations for data addresses initiated by load/store instructions. Only instruction initiated translations are vulnerable, not translations from prefetches for example. A DSB before the store to HCR_EL2 is sufficient to prevent older instructions from hitting the window for corruption, and an ISB after is sufficient to prevent younger instructions from hitting the window for corruption. Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19arm64: sysreg: Add layout for VNCR_EL2Marc Zyngier
Now that we're about to emulate VNCR_EL2, we need its full layout. Add it to the sysreg file. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06arm64: Remove duplicated sysreg encodingsMarc Zyngier
A bunch of sysregs are now generated from the sysreg file, so no need to carry separate definitions. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06arm64: sysreg: Add system instructions trapped by HFGIRT2_EL2Marc Zyngier
Add the new CMOs trapped by HFGITR2_EL2. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06arm64: sysreg: Add registers trapped by HDFG{R,W}TR2_EL2Marc Zyngier
Bulk addition of all the system registers trapped by HDFG{R,W}TR2_EL2. The descriptions are extracted from the BSD-licenced JSON file part of the 2025-03 drop from ARM. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-03-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Nested virtualization support for VGICv3, giving the nested hypervisor control of the VGIC hardware when running an L2 VM - Removal of 'late' nested virtualization feature register masking, making the supported feature set directly visible to userspace - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers - Paravirtual interface for discovering the set of CPU implementations where a VM may run, addressing a longstanding issue of guest CPU errata awareness in big-little systems and cross-implementation VM migration - Userspace control of the registers responsible for identifying a particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1), allowing VMs to be migrated cross-implementation - pKVM updates, including support for tracking stage-2 page table allocations in the protected hypervisor in the 'SecPageTable' stat - Fixes to vPMU, ensuring that userspace updates to the vPMU after KVM_RUN are reflected into the backing perf events LoongArch: - Remove unnecessary header include path - Assume constant PGD during VM context switch - Add perf events support for guest VM RISC-V: - Disable the kernel perf counter during configure - KVM selftests improvements for PMU - Fix warning at the time of KVM module removal x86: - Add support for aging of SPTEs without holding mmu_lock. Not taking mmu_lock allows multiple aging actions to run in parallel, and more importantly avoids stalling vCPUs. This includes an implementation of per-rmap-entry locking; aging the gfn is done with only a per-rmap single-bin spinlock taken, whereas locking an rmap for write requires taking both the per-rmap spinlock and the mmu_lock. Note that this decreases slightly the accuracy of accessed-page information, because changes to the SPTE outside aging might not use atomic operations even if they could race against a clear of the Accessed bit. This is deliberate because KVM and mm/ tolerate false positives/negatives for accessed information, and testing has shown that reducing the latency of aging is far more beneficial to overall system performance than providing "perfect" young/old information. - Defer runtime CPUID updates until KVM emulates a CPUID instruction, to coalesce updates when multiple pieces of vCPU state are changing, e.g. as part of a nested transition - Fix a variety of nested emulation bugs, and add VMX support for synthesizing nested VM-Exit on interception (instead of injecting #UD into L2) - Drop "support" for async page faults for protected guests that do not set SEND_ALWAYS (i.e. that only want async page faults at CPL3) - Bring a bit of sanity to x86's VM teardown code, which has accumulated a lot of cruft over the years. Particularly, destroy vCPUs before the MMU, despite the latter being a VM-wide operation - Add common secure TSC infrastructure for use within SNP and in the future TDX - Block KVM_CAP_SYNC_REGS if guest state is protected. It does not make sense to use the capability if the relevant registers are not available for reading or writing - Don't take kvm->lock when iterating over vCPUs in the suspend notifier to fix a largely theoretical deadlock - Use the vCPU's actual Xen PV clock information when starting the Xen timer, as the cached state in arch.hv_clock can be stale/bogus - Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as KVM's suspend notifier only accounts for kvmclock, and there's no evidence that the flag is actually supported by Xen guests - Clean up the per-vCPU "cache" of its reference pvclock, and instead only track the vCPU's TSC scaling (multipler+shift) metadata (which is moderately expensive to compute, and rarely changes for modern setups) - Don't write to the Xen hypercall page on MSR writes that are initiated by the host (userspace or KVM) to fix a class of bugs where KVM can write to guest memory at unexpected times, e.g. during vCPU creation if userspace has set the Xen hypercall MSR index to collide with an MSR that KVM emulates - Restrict the Xen hypercall MSR index to the unofficial synthetic range to reduce the set of possible collisions with MSRs that are emulated by KVM (collisions can still happen as KVM emulates Hyper-V MSRs, which also reside in the synthetic range) - Clean up and optimize KVM's handling of Xen MSR writes and xen_hvm_config - Update Xen TSC leaves during CPUID emulation instead of modifying the CPUID entries when updating PV clocks; there is no guarantee PV clocks will be updated between TSC frequency changes and CPUID emulation, and guest reads of the TSC leaves should be rare, i.e. are not a hot path x86 (Intel): - Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1 - Pass XFD_ERR as the payload when injecting #NM, as a preparatory step for upcoming FRED virtualization support - Decouple the EPT entry RWX protection bit macros from the EPT Violation bits, both as a general cleanup and in anticipation of adding support for emulating Mode-Based Execution Control (MBEC) - Reject KVM_RUN if userspace manages to gain control and stuff invalid guest state while KVM is in the middle of emulating nested VM-Enter - Add a macro to handle KVM's sanity checks on entry/exit VMCS control pairs in anticipation of adding sanity checks for secondary exit controls (the primary field is out of bits) x86 (AMD): - Ensure the PSP driver is initialized when both the PSP and KVM modules are built-in (the initcall framework doesn't handle dependencies) - Use long-term pins when registering encrypted memory regions, so that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and don't lead to excessive fragmentation - Add macros and helpers for setting GHCB return/error codes - Add support for Idle HLT interception, which elides interception if the vCPU has a pending, unmasked virtual IRQ when HLT is executed - Fix a bug in INVPCID emulation where KVM fails to check for a non-canonical address - Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is invalid, e.g. because the vCPU was "destroyed" via SNP's AP Creation hypercall - Reject SNP AP Creation if the requested SEV features for the vCPU don't match the VM's configured set of features Selftests: - Fix again the Intel PMU counters test; add a data load and do CLFLUSH{OPT} on the data instead of executing code. The theory is that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters - Fix a flaw in the Intel PMU counters test where it asserts that an event is counting correctly without actually knowing what the event counts on the underlying hardware - Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and improve its coverage by collecting all dirty entries on each iteration - Fix a few minor bugs related to handling of stats FDs - Add infrastructure to make vCPU and VM stats FDs available to tests by default (open the FDs during VM/vCPU creation) - Relax an assertion on the number of HLT exits in the xAPIC IPI test when running on a CPU that supports AMD's Idle HLT (which elides interception of HLT if a virtual IRQ is pending and unmasked)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits) RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed RISC-V: KVM: Teardown riscv specific bits after kvm_exit LoongArch: KVM: Register perf callbacks for guest LoongArch: KVM: Implement arch-specific functions for guest perf LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel() LoongArch: KVM: Remove PGD saving during VM context switch LoongArch: KVM: Remove unnecessary header include path KVM: arm64: Tear down vGIC on failed vCPU creation KVM: arm64: PMU: Reload when resetting KVM: arm64: PMU: Reload when user modifies registers KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs KVM: arm64: PMU: Assume PMU presence in pmu-emul.c KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu KVM: arm64: Factor out pKVM hyp vcpu creation to separate function KVM: arm64: Initialize HCRX_EL2 traps in pKVM KVM: arm64: Factor out setting HCRX_EL2 traps into separate function KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected KVM: x86: Add infrastructure for secure TSC KVM: x86: Push down setting vcpu.arch.user_set_tsc ...
2025-03-10arm64/sysreg: Move POR_EL0_INIT to asm/por.hKevin Brodsky
The value of POR_EL0_INIT is not architectural, it is a software decision. Since we have a dedicated header for POR_ELx, we might as well define POR_EL0_INIT there. While at it also define POR_EL0_INIT using POR_ELx_PERM_PREP(), making it clearer that we are setting permissions for POIndex/pkey 0. Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Link: https://lore.kernel.org/r/20250219164029.2309119-4-kevin.brodsky@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-03-10arm64/sysreg: Rename POE_RXW to POE_RWXKevin Brodsky
It is customary to list R, W, X permissions in that order. In fact this is already the case for PIE constants (PIE_RWX). Rename POE_RXW accordingly, as well as POE_XW (currently unused). While at it also swap the W/X lines in compute_s1_overlay_permissions() to follow the R, W, X order. Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Link: https://lore.kernel.org/r/20250219164029.2309119-3-kevin.brodsky@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-03-10arm64/sysreg: Improve PIR/POR helpersKevin Brodsky
We currently have one helper to set a PIRx_ELx's permission field to a given value, PIRx_ELx_PERM(), and another helper to extract a permission field from POR_ELx, POR_ELx_IDX(). The naming is pretty confusing - it isn't clear at all that "_PERM" corresponds to a setter and "_IDX" to a getter. This patch aims at improving the situation by using the same suffixes as FIELD_PREP()/FIELD_GET(), which we have already adopted for SYS_FIELD_{PREP,GET}(): * PIRx_ELx_PERM_PREP(), POR_ELx_PERM_PREP() create a register value where the permission field for a given index is set to a given value. * POR_ELx_PERM_GET() extracts the permission field from a given register value for a given index. These helpers are not implemented using FIELD_PREP()/FIELD_GET() because the mask may not be constant, and they need to be usable in assembly. They are all defined in asm/sysreg.h, as one would expect for basic sysreg-related helpers. Finally the new POR_ELx_PERM_* macros are used for existing calculations in signal.c and mmu.c. Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Link: https://lore.kernel.org/r/20250219164029.2309119-2-kevin.brodsky@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-03-03arm64: sysreg: Add layout for ICH_MISR_EL2Marc Zyngier
The ICH_MISR_EL2-related macros are missing a number of status bits that we are about to handle. Take this opportunity to fully describe the layout of that register as part of the automatic generation infrastructure. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03arm64: sysreg: Add layout for ICH_VTR_EL2Marc Zyngier
The ICH_VTR_EL2-related macros are missing a number of config bits that we are about to handle. Take this opportunity to fully describe the layout of that register as part of the automatic generation infrastructure. This results in a bit of churn to repaint constants that are now generated with a different format. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03arm64: sysreg: Add layout for ICH_HCR_EL2Marc Zyngier
The ICH_HCR_EL2-related macros are missing a number of control bits that we are about to handle. Take this opportunity to fully describe the layout of that register as part of the automatic generation infrastructure. This results in a bit of churn, unfortunately. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-01-17Merge branch kvm-arm64/coresight-6.14 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/coresight-6.14: : . : Trace filtering update from James Clark. From the cover letter: : : "The guest filtering rules from the Perf session are now honored for both : nVHE and VHE modes. This is done by either writing to TRFCR_EL12 at the : start of the Perf session and doing nothing else further, or caching the : guest value and writing it at guest switch for nVHE. In pKVM, trace is : now be disabled for both protected and unprotected guests." : . KVM: arm64: Fix selftests after sysreg field name update coresight: Pass guest TRFCR value to KVM KVM: arm64: Support trace filtering for guests KVM: arm64: coresight: Give TRBE enabled state to KVM coresight: trbe: Remove redundant disable call arm64/sysreg/tools: Move TRFCR definitions to sysreg tools: arm64: Update sysreg.h header files Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-12arm64/sysreg/tools: Move TRFCR definitions to sysregJames Clark
Convert TRFCR to automatic generation. Add separate definitions for ELx and EL2 as TRFCR_EL1 doesn't have CX. This also mirrors the previous definition so no code change is required. Also add TRFCR_EL12 which will start to be used in a later commit. Unfortunately, to avoid breaking the Perf build with duplicate definition errors, the tools copy of the sysreg.h header needs to be updated at the same time rather than the usual second commit. This is because the generated version of sysreg (arch/arm64/include/generated/asm/sysreg-defs.h), is currently shared and tools/ does not have its own copy. Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250106142446.628923-4-james.clark@linaro.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02KVM: arm64: nv: Add handling of EL2-specific timer registersMarc Zyngier
Add the required handling for EL2 and EL02 registers, as well as EL1 registers used in the E2H context. This includes handling the virtual timer accesses when CNTHCTL_EL2.EL1TVT or CNTHCTL_EL2.EL1TVCT are set. Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-11-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "The biggest change here is eliminating the awful idea that KVM had of essentially guessing which pfns are refcounted pages. The reason to do so was that KVM needs to map both non-refcounted pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP VMAs that contain refcounted pages. However, the result was security issues in the past, and more recently the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by struct page but is not refcounted. In particular this broke virtio-gpu blob resources (which directly map host graphics buffers into the guest as "vram" for the virtio-gpu device) with the amdgpu driver, because amdgpu allocates non-compound higher order pages and the tail pages could not be mapped into KVM. This requires adjusting all uses of struct page in the per-architecture code, to always work on the pfn whenever possible. The large series that did this, from David Stevens and Sean Christopherson, also cleaned up substantially the set of functions that provided arch code with the pfn for a host virtual addresses. The previous maze of twisty little passages, all different, is replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the non-__ versions of these two, and kvm_prefetch_pages) saving almost 200 lines of code. ARM: - Support for stage-1 permission indirection (FEAT_S1PIE) and permission overlays (FEAT_S1POE), including nested virt + the emulated page table walker - Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call was introduced in PSCIv1.3 as a mechanism to request hibernation, similar to the S4 state in ACPI - Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As part of it, introduce trivial initialization of the host's MPAM context so KVM can use the corresponding traps - PMU support under nested virtualization, honoring the guest hypervisor's trap configuration and event filtering when running a nested guest - Fixes to vgic ITS serialization where stale device/interrupt table entries are not zeroed when the mapping is invalidated by the VM - Avoid emulated MMIO completion if userspace has requested synchronous external abort injection - Various fixes and cleanups affecting pKVM, vCPU initialization, and selftests LoongArch: - Add iocsr and mmio bus simulation in kernel. - Add in-kernel interrupt controller emulation. - Add support for virtualization extensions to the eiointc irqchip. PPC: - Drop lingering and utterly obsolete references to PPC970 KVM, which was removed 10 years ago. - Fix incorrect documentation references to non-existing ioctls RISC-V: - Accelerate KVM RISC-V when running as a guest - Perf support to collect KVM guest statistics from host side s390: - New selftests: more ucontrol selftests and CPU model sanity checks - Support for the gen17 CPU model - List registers supported by KVM_GET/SET_ONE_REG in the documentation x86: - Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve documentation, harden against unexpected changes. Even if the hardware A/D tracking is disabled, it is possible to use the hardware-defined A/D bits to track if a PFN is Accessed and/or Dirty, and that removes a lot of special cases. - Elide TLB flushes when aging secondary PTEs, as has been done in x86's primary MMU for over 10 years. - Recover huge pages in-place in the TDP MMU when dirty page logging is toggled off, instead of zapping them and waiting until the page is re-accessed to create a huge mapping. This reduces vCPU jitter. - Batch TLB flushes when dirty page logging is toggled off. This reduces the time it takes to disable dirty logging by ~3x. - Remove the shrinker that was (poorly) attempting to reclaim shadow page tables in low-memory situations. - Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE. - Advertise CPUIDs for new instructions in Clearwater Forest - Quirk KVM's misguided behavior of initialized certain feature MSRs to their maximum supported feature set, which can result in KVM creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to a non-zero value results in the vCPU having invalid state if userspace hides PDCM from the guest, which in turn can lead to save/restore failures. - Fix KVM's handling of non-canonical checks for vCPUs that support LA57 to better follow the "architecture", in quotes because the actual behavior is poorly documented. E.g. most MSR writes and descriptor table loads ignore CR4.LA57 and operate purely on whether the CPU supports LA57. - Bypass the register cache when querying CPL from kvm_sched_out(), as filling the cache from IRQ context is generally unsafe; harden the cache accessors to try to prevent similar issues from occuring in the future. The issue that triggered this change was already fixed in 6.12, but was still kinda latent. - Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM over-advertises SPEC_CTRL when trying to support cross-vendor VMs. - Minor cleanups - Switch hugepage recovery thread to use vhost_task. These kthreads can consume significant amounts of CPU time on behalf of a VM or in response to how the VM behaves (for example how it accesses its memory); therefore KVM tried to place the thread in the VM's cgroups and charge the CPU time consumed by that work to the VM's container. However the kthreads did not process SIGSTOP/SIGCONT, and therefore cgroups which had KVM instances inside could not complete freezing. Fix this by replacing the kthread with a PF_USER_WORKER thread, via the vhost_task abstraction. Another 100+ lines removed, with generally better behavior too like having these threads properly parented in the process tree. - Revert a workaround for an old CPU erratum (Nehalem/Westmere) that didn't really work; there was really nothing to work around anyway: the broken patch was meant to fix nested virtualization, but the PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the erratum. - Fix 6.12 regression where CONFIG_KVM will be built as a module even if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is 'y'. x86 selftests: - x86 selftests can now use AVX. Documentation: - Use rST internal links - Reorganize the introduction to the API document Generic: - Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead of RCU, so that running a vCPU on a different task doesn't encounter long due to having to wait for all CPUs become quiescent. In general both reads and writes are rare, but userspace that supports confidential computing is introducing the use of "helper" vCPUs that may jump from one host processor to another. Those will be very happy to trigger a synchronize_rcu(), and the effect on performance is quite the disaster" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits) KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD KVM: x86: add back X86_LOCAL_APIC dependency Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()" KVM: x86: switch hugepage recovery thread to vhost_task KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest Documentation: KVM: fix malformed table irqchip/loongson-eiointc: Add virt extension support LoongArch: KVM: Add irqfd support LoongArch: KVM: Add PCHPIC user mode read and write functions LoongArch: KVM: Add PCHPIC read and write functions LoongArch: KVM: Add PCHPIC device support LoongArch: KVM: Add EIOINTC user mode read and write functions LoongArch: KVM: Add EIOINTC read and write functions LoongArch: KVM: Add EIOINTC device support LoongArch: KVM: Add IPI user mode read and write function LoongArch: KVM: Add IPI read and write function LoongArch: KVM: Add IPI device support LoongArch: KVM: Add iocsr and mmio bus simulation in kernel KVM: arm64: Pass on SVE mapping failures ...
2024-10-31arm64/sysreg: Convert existing MPAM sysregs and add the remaining entriesJames Morse
Move the existing MPAM system register defines from sysreg.h to tools/sysreg and add the remaining system registers. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-2-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-04arm64/sysreg: Add definitions for architected GCS capsMark Brown
The architecture defines a format for guarded control stack caps, used to mark the top of an unused GCS in order to limit the potential for exploitation via stack switching. Add definitions associated with these. Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-8-222b78d87eee@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-09-16Merge tag 'for-linus-non-x86' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "These are the non-x86 changes (mostly ARM, as is usually the case). The generic and x86 changes will come later" ARM: - New Stage-2 page table dumper, reusing the main ptdump infrastructure - FP8 support - Nested virtualization now supports the address translation (FEAT_ATS1A) family of instructions - Add selftest checks for a bunch of timer emulation corner cases - Fix multiple cases where KVM/arm64 doesn't correctly handle the guest trying to use a GICv3 that wasn't advertised - Remove REG_HIDDEN_USER from the sysreg infrastructure, making things little simpler - Prevent MTE tags being restored by userspace if we are actively logging writes, as that's a recipe for disaster - Correct the refcount on a page that is not considered for MTE tag copying (such as a device) - When walking a page table to split block mappings, synchronize only at the end the walk rather than on every store - Fix boundary check when transfering memory using FFA - Fix pKVM TLB invalidation, only affecting currently out of tree code but worth addressing for peace of mind LoongArch: - Revert qspinlock to test-and-set simple lock on VM. - Add Loongson Binary Translation extension support. - Add PMU support for guest. - Enable paravirt feature control from VMM. - Implement function kvm_para_has_feature(). RISC-V: - Fix sbiret init before forwarding to userspace - Don't zero-out PMU snapshot area before freeing data - Allow legacy PMU access from guest - Fix to allow hpmcounter31 from the guest" * tag 'for-linus-non-x86' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (64 commits) LoongArch: KVM: Implement function kvm_para_has_feature() LoongArch: KVM: Enable paravirt feature control from VMM LoongArch: KVM: Add PMU support for guest KVM: arm64: Get rid of REG_HIDDEN_USER visibility qualifier KVM: arm64: Simplify visibility handling of AArch32 SPSR_* KVM: arm64: Simplify handling of CNTKCTL_EL12 LoongArch: KVM: Add vm migration support for LBT registers LoongArch: KVM: Add Binary Translation extension support LoongArch: KVM: Add VM feature detection function LoongArch: Revert qspinlock to test-and-set simple lock on VM KVM: arm64: Register ptdump with debugfs on guest creation arm64: ptdump: Don't override the level when operating on the stage-2 tables arm64: ptdump: Use the ptdump description from a local context arm64: ptdump: Expose the attribute parsing functionality KVM: arm64: Add memory length checks and remove inline in do_ffa_mem_xfer KVM: arm64: Move pagetable definitions to common header KVM: arm64: nv: Add support for FEAT_ATS1A KVM: arm64: nv: Plumb handling of AT S1* traps from EL2 KVM: arm64: nv: Make AT+PAN instructions aware of FEAT_PAN3 KVM: arm64: nv: Sanitise SCTLR_EL1.EPAN according to VM configuration ...
2024-09-12Merge branch 'for-next/poe' into for-next/coreWill Deacon
* for-next/poe: (31 commits) arm64: pkeys: remove redundant WARN kselftest/arm64: Add test case for POR_EL0 signal frame records kselftest/arm64: parse POE_MAGIC in a signal frame kselftest/arm64: add HWCAP test for FEAT_S1POE selftests: mm: make protection_keys test work on arm64 selftests: mm: move fpregs printing kselftest/arm64: move get_header() arm64: add Permission Overlay Extension Kconfig arm64: enable PKEY support for CPUs with S1POE arm64: enable POE and PIE to coexist arm64/ptrace: add support for FEAT_POE arm64: add POE signal support arm64: implement PKEYS support arm64: add pte_access_permitted_no_overlay() arm64: handle PKEY/POE faults arm64: mask out POIndex when modifying a PTE arm64: convert protection key into vm_flags and pgprot values arm64: add POIndex defines arm64: re-order MTE VM_ flags arm64: enable the Permission Overlay Extension for EL0 ...
2024-09-04arm64: context switch POR_EL0 registerJoey Gouly
POR_EL0 is a register that can be modified by userspace directly, so it must be context switched. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240822151113.1479789-7-joey.gouly@arm.com [will: Dropped unnecessary isb()s] Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30KVM: arm64: nv: Add support for FEAT_ATS1AMarc Zyngier
Handling FEAT_ATS1A (which provides the AT S1E{1,2}A instructions) is pretty easy, as it is just the usual AT without the permission check. This basically amounts to plumbing the instructions in the various dispatch tables, and handling FEAT_ATS1A being disabled in the ID registers. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30arm64: Add system register encoding for PSTATE.PANMarc Zyngier
Although we already have the primitives to set PSTATE.PAN with an immediate, we don't have a way to read the current state nor set it ot an arbitrary value (i.e. we can generally save/restore it). Thankfully, all that is missing for this is the definition for the PAN pseudo system register, here named SYS_PSTATE_PAN. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-30arm64: Add PAR_EL1 field descriptionMarc Zyngier
As KVM is about to grow a full emulation for the AT instructions, add the layout of the PAR_EL1 register in its non-D128 configuration. Note that the constants are a bit ugly, as the register has two layouts, based on the state of the F bit. Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-16KVM: arm64: pmu: Use generated define for PMSELR_EL0.SEL accessRob Herring (Arm)
ARMV8_PMU_COUNTER_MASK is really a mask for the PMSELR_EL0.SEL register field. Make that clear by adding a standard sysreg definition for the register, and using it instead. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-4-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...
2024-07-04arm64/cpufeature: Replace custom macros with fields from ID_AA64PFR0_EL1Anshuman Khandual
This replaces custom macros usage (i.e ID_AA64PFR0_EL1_ELx_64BIT_ONLY and ID_AA64PFR0_EL1_ELx_32BIT_64BIT) and instead directly uses register fields from ID_AA64PFR0_EL1 sysreg definition. Finally let's drop off both these custom macros as they are now redundant. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20240613102710.3295108-3-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-06-19KVM: arm64: nv: Handle EL2 Stage-1 TLB invalidationMarc Zyngier
Due to the way FEAT_NV2 suppresses traps when accessing EL2 system registers, we can't track when the guest changes its HCR_EL2.TGE setting. This means we always trap EL1 TLBIs, even if they don't affect any L2 guest. Given that invalidating the EL2 TLBs doesn't require any messing with the shadow stage-2 page-tables, we can simply emulate the instructions early and return directly to the guest. This is conditioned on the instruction being an EL1 one and the guest's HCR_EL2.{E2H,TGE} being {1,1} (indicating that the instruction targets the EL2 S1 TLBs), or the instruction being one of the EL2 ones (which are not ambiguous). EL1 TLBIs issued with HCR_EL2.{E2H,TGE}={1,0} are not handled here, and cause a full exit so that they can be handled in the context of a VMID. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-7-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-04-28arm64/sysreg: Update PIE permission encodingsShiqi Liu
Fix left shift overflow issue when the parameter idx is greater than or equal to 8 in the calculation of perm in PIRx_ELx_PERM macro. Fix this by modifying the encoding to use a long integer type. Signed-off-by: Shiqi Liu <shiqiliu@hust.edu.cn> Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240421063328.29710-1-shiqiliu@hust.edu.cn Signed-off-by: Will Deacon <will@kernel.org>
2024-02-08arm64: Add macro to compose a sysreg field valueMarc Zyngier
A common idiom is to compose a tupple (reg, field, val) into a symbol matching an autogenerated definition. Add a help performing the concatenation and replace it when open-coded implementations exist. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240122181344.258974-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-01-04Merge branch 'for-next/sysregs' into for-next/coreWill Deacon
* for-next/sysregs: arm64/sysreg: Add missing system instruction definitions for FGT arm64/sysreg: Add missing system register definitions for FGT arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1 arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1 arm64/sysreg: Add new system registers for GCS arm64/sysreg: Add definition for FPMR arm64/sysreg: Update HCRX_EL2 definition for DDI0601 2023-09 arm64/sysreg: Update SCTLR_EL1 for DDI0601 2023-09 arm64/sysreg: Update ID_AA64SMFR0_EL1 definition for DDI0601 2023-09 arm64/sysreg: Add definition for ID_AA64FPFR0_EL1 arm64/sysreg: Add definition for ID_AA64ISAR3_EL1 arm64/sysreg: Update ID_AA64ISAR2_EL1 defintion for DDI0601 2023-09 arm64/sysreg: Add definition for ID_AA64PFR2_EL1 arm64/sysreg: update CPACR_EL1 register arm64/sysreg: add system register POR_EL{0,1} arm64/sysreg: Add definition for HAFGRTR_EL2 arm64/sysreg: Update HFGITR_EL2 definiton to DDI0601 2023-09
2023-12-17arm64/sysreg: Add missing system instruction definitions for FGTFuad Tabba
Add the definitions of missing system instructions that are trappable by fine grain traps. The definitions are based on DDI0602 2023-09. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20231214100158.2305400-5-tabba@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-12-11arm64/sysreg: add system register POR_EL{0,1}Joey Gouly
Add POR_EL{0,1} according to DDI0601 2023-03. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20231209-b4-arm64-sysreg-additions-v1-3-45284e538474@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-11-27arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]Anshuman Khandual
PAGE_SIZE support is tested against possible minimum and maximum values for its respective ID_AA64MMFR0.TGRAN field, depending on whether it is signed or unsigned. But then FEAT_LPA2 implementation needs to be validated for 4K and 16K page sizes via feature specific ID_AA64MMFR0.TGRAN values. Hence it adds FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] values per ARM ARM (0487G.A). Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231127111737.1897081-5-ryan.roberts@arm.com
2023-10-25KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WIMarc Zyngier
When trapping accesses from a NV guest that tries to access SPSR_{irq,abt,und,fiq}, make sure we handle them as RAZ/WI, as if AArch32 wasn't implemented. This involves a bit of repainting to make the visibility handler more generic. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231023095444.1587322-6-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25arm64: Add missing _EL2 encodingsMiguel Luis
Some _EL2 encodings are missing. Add them. Signed-off-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> [maz: dropped secure encodings] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231023095444.1587322-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25arm64: Add missing _EL12 encodingsMiguel Luis
Some _EL12 encodings are missing. Add them. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231023095444.1587322-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-09-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Clean up vCPU targets, always returning generic v8 as the preferred target - Trap forwarding infrastructure for nested virtualization (used for traps that are taken from an L2 guest and are needed by the L1 hypervisor) - FEAT_TLBIRANGE support to only invalidate specific ranges of addresses when collapsing a table PTE to a block PTE. This avoids that the guest refills the TLBs again for addresses that aren't covered by the table PTE. - Fix vPMU issues related to handling of PMUver. - Don't unnecessary align non-stack allocations in the EL2 VA space - Drop HCR_VIRT_EXCP_MASK, which was never used... - Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu parameter instead - Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort() - Remove prototypes without implementations RISC-V: - Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest - Added ONE_REG interface for SATP mode - Added ONE_REG interface to enable/disable multiple ISA extensions - Improved error codes returned by ONE_REG interfaces - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V - Added get-reg-list selftest for KVM RISC-V s390: - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch) Allows a PV guest to use crypto cards. Card access is governed by the firmware and once a crypto queue is "bound" to a PV VM every other entity (PV or not) looses access until it is not bound anymore. Enablement is done via flags when creating the PV VM. - Guest debug fixes (Ilya) x86: - Clean up KVM's handling of Intel architectural events - Intel bugfixes - Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use debug registers and generate/handle #DBs - Clean up LBR virtualization code - Fix a bug where KVM fails to set the target pCPU during an IRTE update - Fix fatal bugs in SEV-ES intrahost migration - Fix a bug where the recent (architecturally correct) change to reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it) - Retry APIC map recalculation if a vCPU is added/enabled - Overhaul emergency reboot code to bring SVM up to par with VMX, tie the "emergency disabling" behavior to KVM actually being loaded, and move all of the logic within KVM - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC ratio MSR cannot diverge from the default when TSC scaling is disabled up related code - Add a framework to allow "caching" feature flags so that KVM can check if the guest can use a feature without needing to search guest CPUID - Rip out the ancient MMU_DEBUG crud and replace the useful bits with CONFIG_KVM_PROVE_MMU - Fix KVM's handling of !visible guest roots to avoid premature triple fault injection - Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the API surface that is needed by external users (currently only KVMGT), and fix a variety of issues in the process Generic: - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass action specific data without needing to constantly update the main handlers. - Drop unused function declarations Selftests: - Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs - Add support for printf() in guest code and covert all guest asserts to use printf-based reporting - Clean up the PMU event filter test and add new testcases - Include x86 selftests in the KVM x86 MAINTAINERS entry" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits) KVM: x86/mmu: Include mmu.h in spte.h KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots KVM: x86/mmu: Disallow guest from using !visible slots for page tables KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page KVM: x86/mmu: Harden new PGD against roots without shadow pages KVM: x86/mmu: Add helper to convert root hpa to shadow page drm/i915/gvt: Drop final dependencies on KVM internal details KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers KVM: x86/mmu: Drop @slot param from exported/external page-track APIs KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled KVM: x86/mmu: Assert that correct locks are held for page write-tracking KVM: x86/mmu: Rename page-track APIs to reflect the new reality KVM: x86/mmu: Drop infrastructure for multiple page-track modes KVM: x86/mmu: Use page-track notifiers iff there are external users KVM: x86/mmu: Move KVM-only page-track declarations to internal header KVM: x86: Remove the unused page-track hook track_flush_slot() drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() KVM: x86: Add a new page-track hook to handle memslot deletion drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot KVM: x86: Reject memslot MOVE operations if KVMGT is attached ...
2023-08-17KVM: arm64: Correctly handle ACCDATA_EL1 trapsMarc Zyngier
As we blindly reset some HFGxTR_EL2 bits to 0, we also randomly trap unsuspecting sysregs that have their trap bits with a negative polarity. ACCDATA_EL1 is one such register that can be accessed by the guest, causing a splat on the host as we don't have a proper handler for it. Adding such handler addresses the issue, though there are a number of other registers missing as the current architecture documentation doesn't describe them yet. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Jing Zhang <jingzhangos@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230815183903.2735724-11-maz@kernel.org
2023-08-17arm64: Add HDFGRTR_EL2 and HDFGWTR_EL2 layoutsMarc Zyngier
As we're about to implement full support for FEAT_FGT, add the full HDFGRTR_EL2 and HDFGWTR_EL2 layouts. Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Jing Zhang <jingzhangos@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230815183903.2735724-9-maz@kernel.org
2023-08-17arm64: Add missing BRB/CFP/DVP/CPP instructionsMarc Zyngier
HFGITR_EL2 traps a bunch of instructions for which we don't have encodings yet. Add them. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Jing Zhang <jingzhangos@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230815183903.2735724-8-maz@kernel.org