summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-11-20KVM: arm64: vgic: Make vgic_get_irq() more robustMarc Zyngier
vgic_get_irq() has an awkward signature, as it takes both a kvm *and* a vcpu, where the vcpu is allowed to be NULL if the INTID being looked up is a global interrupt (SPI or LPI). This leads to potentially problematic situations where the INTID passed is a private interrupt, but that there is no vcpu. In order to make things less ambiguous, let have *two* helpers instead: - vgic_get_irq(struct kvm *kvm, u32 intid), which is only concerned with *global* interrupts, as indicated by the lack of vcpu. - vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid), which can return *any* interrupt class, but must have of course a non-NULL vcpu. Most of the code nicely falls under one or the other situations, except for a couple of cases (close to the UABI or in the debug code) where we have to distinguish between the two cases. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241117165757.247686-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIRMarc Zyngier
Make sure we filter out non-LPI invalidation when handling writes to GICR_INVLPIR. Fixes: 4645d11f4a553 ("KVM: arm64: vgic-v3: Implement MMIO-based LPI invalidation") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241117165757.247686-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-12KVM: arm64: Pass on SVE mapping failuresJames Clark
This function can fail but its return value isn't passed onto the caller. Presumably this could result in a broken state. Fixes: 66d5b53e20a6 ("KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Fuad Tabba <tabba@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241112105604.795809-1-james.clark@linaro.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11Merge branch kvm-arm64/vgic-its-fixes into kvmarm/nextOliver Upton
* kvm-arm64/vgic-its-fixes: : Fixes for vgic-its save/restore, courtesy of Kunkun Jiang and Jing Zhang : : Address bugs where restoring an ITS consumes a stale DTE/ITE, which : may lead to either garbage mappings in the ITS or the overall restore : ioctl failing. The fix in both cases is to zero a DTE/ITE when its : translation has been invalidated by the guest. KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device KVM: arm64: vgic-its: Add a data length check in vgic_its_save_* Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITEKunkun Jiang
When DISCARD frees an ITE, it does not invalidate the corresponding ITE. In the scenario of continuous saves and restores, there may be a situation where an ITE is not saved but is restored. This is unreasonable and may cause restore to fail. This patch clears the corresponding ITE when DISCARD frees an ITE. Cc: stable@vger.kernel.org Fixes: eff484e0298d ("KVM: arm64: vgic-its: ITT save and restore") Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> [Jing: Update with entry write helper] Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20241107214137.428439-6-jingzhangos@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a deviceKunkun Jiang
vgic_its_save_device_tables will traverse its->device_list to save DTE for each device. vgic_its_restore_device_tables will traverse each entry of device table and check if it is valid. Restore if valid. But when MAPD unmaps a device, it does not invalidate the corresponding DTE. In the scenario of continuous saves and restores, there may be a situation where a device's DTE is not saved but is restored. This is unreasonable and may cause restore to fail. This patch clears the corresponding DTE when MAPD unmaps a device. Cc: stable@vger.kernel.org Fixes: 57a9a117154c ("KVM: arm64: vgic-its: Device table save/restore") Co-developed-by: Shusen Li <lishusen2@huawei.com> Signed-off-by: Shusen Li <lishusen2@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> [Jing: Update with entry write helper] Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20241107214137.428439-5-jingzhangos@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*Jing Zhang
In all the vgic_its_save_*() functinos, they do not check whether the data length is 8 bytes before calling vgic_write_guest_lock. This patch adds the check. To prevent the kernel from being blown up when the fault occurs, KVM_BUG_ON() is used. And the other BUG_ON()s are replaced together. Cc: stable@vger.kernel.org Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> [Jing: Update with the new entry read/write helpers] Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20241107214137.428439-4-jingzhangos@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11Merge branch kvm-arm64/nv-pmu into kvmarm/nextOliver Upton
* kvm-arm64/nv-pmu: : Support for vEL2 PMU controls : : Align the vEL2 PMU support with the current state of non-nested KVM, : including: : : - Trap routing, with the annoying complication of EL2 traps that apply : in Host EL0 : : - PMU emulation, using the correct configuration bits depending on : whether a counter falls in the hypervisor or guest range of PMCs : : - Perf event swizzling across nested boundaries, as the event filtering : needs to be remapped to cope with vEL2 KVM: arm64: nv: Reprogram PMU events affected by nested transition KVM: arm64: nv: Apply EL2 event filtering when in hyp context KVM: arm64: nv: Honor MDCR_EL2.HLP KVM: arm64: nv: Honor MDCR_EL2.HPME KVM: arm64: Add helpers to determine if PMC counts at a given EL KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN KVM: arm64: Rename kvm_pmu_valid_counter_mask() KVM: arm64: nv: Advertise support for FEAT_HPMN0 KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN KVM: arm64: nv: Honor MDCR_EL2.{TPM, TPMCR} in Host EL0 KVM: arm64: nv: Reinject traps that take effect in Host EL0 KVM: arm64: nv: Rename BEHAVE_FORWARD_ANY KVM: arm64: nv: Allow coarse-grained trap combos to use complex traps KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2 arm64: sysreg: Add new definitions for ID_AA64DFR0_EL1 arm64: sysreg: Migrate MDCR_EL2 definition to table arm64: sysreg: Describe ID_AA64DFR2_EL1 fields Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11Merge branch kvm-arm64/mmio-sea into kvmarm/nextOliver Upton
* kvm-arm64/mmio-sea: : Fix for SEA injection in response to MMIO : : Fix + test coverage for SEA injection in response to an unhandled MMIO : exit to userspace. Naturally, if userspace decides to abort an MMIO : instruction KVM shouldn't continue with instruction emulation... KVM: arm64: selftests: Add tests for MMIO external abort injection KVM: arm64: selftests: Convert to kernel's ESR terminology tools: arm64: Grab a copy of esr.h from kernel KVM: arm64: Don't retire aborted MMIO instruction Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11Merge branch kvm-arm64/misc into kvmarm/nextOliver Upton
* kvm-arm64/misc: : Miscellaneous updates : : - Drop useless check against vgic state in ICC_CLTR_EL1.SEIS read : emulation : : - Fix trap configuration for pKVM : : - Close the door on initialization bugs surrounding userspace irqchip : static key by removing it. KVM: selftests: Don't bother deleting memslots in KVM when freeing VMs KVM: arm64: Get rid of userspace_irqchip_in_use KVM: arm64: Initialize trap register values in hyp in pKVM KVM: arm64: Initialize the hypervisor's VM state at EL2 KVM: arm64: Refactor kvm_vcpu_enable_ptrauth() for hyp use KVM: arm64: Move pkvm_vcpu_init_traps() to init_pkvm_hyp_vcpu() KVM: arm64: Don't map 'kvm_vgic_global_state' at EL2 with pKVM KVM: arm64: Just advertise SEIS as 0 when emulating ICC_CTLR_EL1 Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11KVM: selftests: Don't bother deleting memslots in KVM when freeing VMsSean Christopherson
When freeing a VM, don't call into KVM to manually remove each memslot, simply cleanup and free any userspace assets associated with the memory region. KVM is ultimately responsible for ensuring kernel resources are freed when the VM is destroyed, deleting memslots one-by-one is unnecessarily slow, and unless a test is already leaking the VM fd, the VM will be destroyed when kvm_vm_release() is called. Not deleting KVM's memslot also allows cleaning up dead VMs without having to care whether or not the to-be-freed VM is dead or alive. Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/kvmarm/Zy0bcM0m-N18gAZz@google.com/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11Merge branch kvm-arm64/mpam-ni into kvmarm/nextOliver Upton
* kvm-arm64/mpam-ni: : Hiding FEAT_MPAM from KVM guests, courtesy of James Morse + Joey Gouly : : Fix a longstanding bug where FEAT_MPAM was accidentally exposed to KVM : guests + the EL2 trap configuration was not explicitly configured. As : part of this, bring in skeletal support for initialising the MPAM CPU : context so KVM can actually set traps for its guests. : : Be warned -- if this series leads to boot failures on your system, : you're running on turd firmware. : : As an added bonus (that builds upon the infrastructure added by the MPAM : series), allow userspace to configure CTR_EL0.L1Ip, courtesy of Shameer : Kolothum. KVM: arm64: Make L1Ip feature in CTR_EL0 writable from userspace KVM: arm64: selftests: Test ID_AA64PFR0.MPAM isn't completely ignored KVM: arm64: Disable MPAM visibility by default and ignore VMM writes KVM: arm64: Add a macro for creating filtered sys_reg_descs entries KVM: arm64: Fix missing traps of guest accesses to the MPAM registers arm64: cpufeature: discover CPU support for MPAM arm64: head.S: Initialise MPAM EL2 registers and disable traps arm64/sysreg: Convert existing MPAM sysregs and add the remaining entries Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11Merge branch kvm-arm64/psci-1.3 into kvmarm/nextOliver Upton
* kvm-arm64/psci-1.3: : PSCI v1.3 support, courtesy of David Woodhouse : : Bump KVM's PSCI implementation up to v1.3, with the added bonus of : implementing the SYSTEM_OFF2 call. Like other system-scoped PSCI calls, : this gets relayed to userspace for further processing with a new : KVM_SYSTEM_EVENT_SHUTDOWN flag. : : As an added bonus, implement client-side support for hibernation with : the SYSTEM_OFF2 call. arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call KVM: selftests: Add test for PSCI SYSTEM_OFF2 KVM: arm64: Add support for PSCI v1.2 and v1.3 KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation firmware/psci: Add definitions for PSCI v1.3 specification Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11Merge branch kvm-arm64/nv-s1pie-s1poe into kvmarm/nextOliver Upton
* kvm-arm64/nv-s1pie-s1poe: (36 commits) : NV support for S1PIE/S1POE, courtesy of Marc Zyngier : : Complete support for S1PIE/S1POE at vEL2, including: : : - Save/restore of the vEL2 sysreg context : : - Use the S1PIE/S1POE context for fast-path AT emulation : : - Enlightening the software walker to the behavior of S1PIE/S1POE : : - Like any other good NV series, some trap routing descriptions KVM: arm64: Handle WXN attribute KVM: arm64: Handle stage-1 permission overlays KVM: arm64: Make PAN conditions part of the S1 walk context KVM: arm64: Disable hierarchical permissions when POE is enabled KVM: arm64: Add POE save/restore for AT emulation fast-path KVM: arm64: Add save/restore support for POR_EL2 KVM: arm64: Add basic support for POR_EL2 KVM: arm64: Add kvm_has_s1poe() helper KVM: arm64: Subject S1PIE/S1POE registers to HCR_EL2.{TVM,TRVM} KVM: arm64: Drop bogus CPTR_EL2.E0POE trap routing arm64: Add encoding for POR_EL2 KVM: arm64: Rely on visibility to let PIR*_ELx/TCR2_ELx UNDEF KVM: arm64: Hide S1PIE registers from userspace when disabled for guests KVM: arm64: Hide TCR2_EL1 from userspace when disabled for guests KVM: arm64: Define helper for EL2 registers with custom visibility KVM: arm64: Add a composite EL2 visibility helper KVM: arm64: Implement AT S1PIE support KVM: arm64: Disable hierarchical permissions when S1PIE is enabled KVM: arm64: Split S1 permission evaluation into direct and hierarchical parts KVM: arm64: Add AT fast-path support for S1PIE ... Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11KVM: arm64: Make L1Ip feature in CTR_EL0 writable from userspaceShameer Kolothum
Only allow userspace to set VIPT(0b10) or PIPT(0b11) for L1Ip based on what hardware reports as both AIVIVT (0b01) and VPIPT (0b00) are documented as reserved. Using a VIPT for Guest where hardware reports PIPT may lead to over invalidation, but is still correct. Hence, we can allow downgrading PIPT to VIPT, but not the other way around. Reviewed-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20241022073943.35764-1-shameerali.kolothum.thodi@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Get rid of userspace_irqchip_in_useRaghavendra Rao Ananta
Improper use of userspace_irqchip_in_use led to syzbot hitting the following WARN_ON() in kvm_timer_update_irq(): WARNING: CPU: 0 PID: 3281 at arch/arm64/kvm/arch_timer.c:459 kvm_timer_update_irq+0x21c/0x394 Call trace: kvm_timer_update_irq+0x21c/0x394 arch/arm64/kvm/arch_timer.c:459 kvm_timer_vcpu_reset+0x158/0x684 arch/arm64/kvm/arch_timer.c:968 kvm_reset_vcpu+0x3b4/0x560 arch/arm64/kvm/reset.c:264 kvm_vcpu_set_target arch/arm64/kvm/arm.c:1553 [inline] kvm_arch_vcpu_ioctl_vcpu_init arch/arm64/kvm/arm.c:1573 [inline] kvm_arch_vcpu_ioctl+0x112c/0x1b3c arch/arm64/kvm/arm.c:1695 kvm_vcpu_ioctl+0x4ec/0xf74 virt/kvm/kvm_main.c:4658 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl fs/ioctl.c:893 [inline] __arm64_sys_ioctl+0x108/0x184 fs/ioctl.c:893 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x78/0x1b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0xe8/0x1b0 arch/arm64/kernel/syscall.c:132 do_el0_svc+0x40/0x50 arch/arm64/kernel/syscall.c:151 el0_svc+0x54/0x14c arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 The following sequence led to the scenario: - Userspace creates a VM and a vCPU. - The vCPU is initialized with KVM_ARM_VCPU_PMU_V3 during KVM_ARM_VCPU_INIT. - Without any other setup, such as vGIC or vPMU, userspace issues KVM_RUN on the vCPU. Since the vPMU is requested, but not setup, kvm_arm_pmu_v3_enable() fails in kvm_arch_vcpu_run_pid_change(). As a result, KVM_RUN returns after enabling the timer, but before incrementing 'userspace_irqchip_in_use': kvm_arch_vcpu_run_pid_change() ret = kvm_arm_pmu_v3_enable() if (!vcpu->arch.pmu.created) return -EINVAL; if (ret) return ret; [...] if (!irqchip_in_kernel(kvm)) static_branch_inc(&userspace_irqchip_in_use); - Userspace ignores the error and issues KVM_ARM_VCPU_INIT again. Since the timer is already enabled, control moves through the following flow, ultimately hitting the WARN_ON(): kvm_timer_vcpu_reset() if (timer->enabled) kvm_timer_update_irq() if (!userspace_irqchip()) ret = kvm_vgic_inject_irq() ret = vgic_lazy_init() if (unlikely(!vgic_initialized(kvm))) if (kvm->arch.vgic.vgic_model != KVM_DEV_TYPE_ARM_VGIC_V2) return -EBUSY; WARN_ON(ret); Theoretically, since userspace_irqchip_in_use's functionality can be simply replaced by '!irqchip_in_kernel()', get rid of the static key to avoid the mismanagement, which also helps with the syzbot issue. Cc: <stable@vger.kernel.org> Reported-by: syzbot <syzkaller@googlegroups.com> Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Reprogram PMU events affected by nested transitionOliver Upton
Start reprogramming PMU events at nested boundaries now that everything is in place to handle the EL2 event filter. Only repaint events where the filter differs between EL1 and EL2 as a slight optimization. PMU now 'works' for nested VMs, albeit slow. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182559.3364829-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Apply EL2 event filtering when in hyp contextOliver Upton
It hopefully comes as no surprise when I say that vEL2 actually runs at EL1. So, the guest hypervisor's EL2 event filter (NSH) needs to actually be applied to EL1 in the perf event. In addition to this, the disable bit for the guest counter range (HPMD) needs to have the effect of stopping the affected counters. Do exactly that by stuffing ::exclude_kernel with the combined effect of these controls. This isn't quite enough yet, as the backing perf events need to be reprogrammed upon nested ERET/exception entry to remap the effective filter onto ::exclude_kernel. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-18-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Honor MDCR_EL2.HLPOliver Upton
Counters that fall in the hypervisor range (i.e. N >= HPMN) have a separate control for enabling 64 bit overflow. Take it into account. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-17-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Honor MDCR_EL2.HPMEOliver Upton
When the PMU is configured with split counter ranges, HPME becomes the enable bit for the counters reserved for EL2. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-16-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add helpers to determine if PMC counts at a given ELOliver Upton
Checking the exception level filters for a PMC is a minor annoyance to open code. Add helpers to check if an event counts at EL0 and EL1, which will prove useful in a subsequent change. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-15-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Adjust range of accessible PMCs according to HPMNOliver Upton
The value of MDCR_EL2.HPMN controls the number of event counters made visible to EL0 and EL1. This means it is possible for the guest hypervisor to allow direct access to event counters to the L2. Rework KVM's PMU register emulation to take the effects of HPMN into account when handling a trap. For bitmask-style registers, writes only affect accessible registers. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-14-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Rename kvm_pmu_valid_counter_mask()Oliver Upton
Nested PMU support requires dynamically changing the visible range of PMU counters based on the exception level and value of MDCR_EL2.HPMN. At the same time, the PMU emulation code needs to know the absolute number of implemented counters, regardless of context. Rename the existing helper to make it obvious that it returns the number of implemented counters and not anything else. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-13-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Advertise support for FEAT_HPMN0Oliver Upton
Everything is in place now for KVM to actually handle MDCR_EL2.HPMN. Not only that, the emulation is capable of doing FEAT_HPMN0. Advertise support for the feature in the VM's ID registers. It is possible to emulate FEAT_HPMN0 on hardware that doesn't support it since KVM currently traps all PMU registers. Having said that, let's only advertise the feature on supporting hardware in case KVM ever provides 'direct' PMU support to VMs w/o involving host perf. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-12-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMNOliver Upton
MDCR_EL2.HPMN splits the PMU event counters into two ranges: the first range is accessible from all ELs, and the second range is accessible only to EL2/3. Supposing the guest hypervisor allows direct access to the PMU counters from the L2, KVM needs to locally handle those accesses. Add a new complex trap configuration for HPMN that checks if the counter index is accessible to the current context. As written, the architecture suggests HPMN only causes PMEVCNTR<n>_EL0 to trap, though intuition (and the pseudocode) suggest that the trap applies to PMEVTYPER<n>_EL0 as well. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-11-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Honor MDCR_EL2.{TPM, TPMCR} in Host EL0Oliver Upton
TPM and TPMCR trap bits also affect Host EL0. How fun. Mark these two trap bits as such and take advantage of the new infrastructure for dealing w/ EL0 traps. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-10-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Reinject traps that take effect in Host EL0Oliver Upton
Wire up the other end of traps that affect host EL0 by actually injecting them into the guest hypervisor. Skip over FGT entirely, as a cursory glance suggests no FGT is effective in host EL0. Note that kvm_inject_nested() is already equipped for handling exceptions while the VM is already in a host context. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-9-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Rename BEHAVE_FORWARD_ANYOliver Upton
BEHAVE_FORWARD_ANY is slightly ambiguous, especially since we're about to cram some more information into the enum. Rephrase it. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Allow coarse-grained trap combos to use complex trapsOliver Upton
KVM uses a sanity-check to avoid infinite recursion in trap combinations that could potentially depend on itself. Narrow the scope of this sanity check to the exact CGT IDs that correspond w/ trap combos, opening the door to using 'complex' traps as part of a combination. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2Oliver Upton
Add support for sanitising MDCR_EL2 and describe the RES0/RES1 bits according to the feature set exposed to the VM. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64: sysreg: Add new definitions for ID_AA64DFR0_EL1Oliver Upton
Align the field definitions w/ DDI0601 2024-09 and opportunistically declare MTPMU as a signed field. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64: sysreg: Migrate MDCR_EL2 definition to tableOliver Upton
Migrate MDCR_EL2 over to the sysreg table and align definitions with DDI0601 2024-09. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64: sysreg: Describe ID_AA64DFR2_EL1 fieldsOliver Upton
Describe the new ID register in line with DDI0601 2024-09. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Initialize trap register values in hyp in pKVMFuad Tabba
Handle the initialization of trap registers at the hypervisor in pKVM, even for non-protected guests. The host is not trusted with the values of the trap registers, regardless of the VM type. Therefore, when switching between the host and the guests, only flush the HCR_EL2 TWI and TWE bits. The host is allowed to configure these for opportunistic scheduling, as neither affects the protection of VMs or the hypervisor. Reported-by: Will Deacon <will@kernel.org> Fixes: 814ad8f96e92 ("KVM: arm64: Drop trapping of PAuth instructions/keys") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241018074833.2563674-5-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Initialize the hypervisor's VM state at EL2Fuad Tabba
Do not trust the state of the VM as provided by the host when initializing the hypervisor's view of the VM sate. Initialize it instead at EL2 to a known good and safe state, as pKVM already does with hypervisor VCPU states. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241018074833.2563674-4-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Refactor kvm_vcpu_enable_ptrauth() for hyp useFuad Tabba
Move kvm_vcpu_enable_ptrauth() to a shared header to be used by hypervisor code in protected mode. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241018074833.2563674-3-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Move pkvm_vcpu_init_traps() to init_pkvm_hyp_vcpu()Fuad Tabba
Move pkvm_vcpu_init_traps() to the initialization of the hypervisor's vcpu state in init_pkvm_hyp_vcpu(), and remove the associated hypercall. In protected mode, traps need to be initialized whenever a VCPU is initialized anyway, and not only for protected VMs. This also saves an unnecessary hypercall. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241018074833.2563674-2-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: selftests: Test ID_AA64PFR0.MPAM isn't completely ignoredJames Morse
The ID_AA64PFR0.MPAM bit was previously accidentally exposed to guests, and is ignored by KVM. KVM will always present the guest with 0 here, and trap the MPAM system registers to inject an undef. But, this value is still needed to prevent migration when the value is incompatible with the target hardware. Add a kvm unit test to try and write multiple values to ID_AA64PFR0.MPAM. Only the hardware value previously exposed should be ignored, all other values should be rejected. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-8-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Disable MPAM visibility by default and ignore VMM writesJames Morse
commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests, but didn't add trap handling. A previous patch supplied the missing trap handling. Existing VMs that have the MPAM field of ID_AA64PFR0_EL1 set need to be migratable, but there is little point enabling the MPAM CPU interface on new VMs until there is something a guest can do with it. Clear the MPAM field from the guest's ID_AA64PFR0_EL1 and on hardware that supports MPAM, politely ignore the VMMs attempts to set this bit. Guests exposed to this bug have the sanitised value of the MPAM field, so only the correct value needs to be ignored. This means the field can continue to be used to block migration to incompatible hardware (between MPAM=1 and MPAM=5), and the VMM can't rely on the field being ignored. Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-7-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add a macro for creating filtered sys_reg_descs entriesJames Morse
The sys_reg_descs array holds function pointers and reset value for managing the user-space and guest view of system registers. These are mostly created by a set of macro's as only some combinations of behaviour are needed. If a register needs special treatment, its sys_reg_descs entry is open-coded. This is true of some id registers where the value provided by user-space is validated by some helpers. Before adding another one of these, add a helper that covers the existing special cases. 'ID_FILTERED' expects helpers to set the user-space value, and retrieve the modified reset value. Like ID_WRITABLE() this uses id_visibility(), which should have no functional change for the registers converted to use ID_FILTERED(). read_sanitised_id_aa64dfr0_el1() and read_sanitised_id_aa64pfr0_el1() have been refactored to be called from kvm_read_sanitised_id_reg(), to try be consistent with ID_WRITABLE(). Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-6-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Fix missing traps of guest accesses to the MPAM registersJames Morse
commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests, but didn't add trap handling. If you are unlucky, this results in an MPAM aware guest being delivered an undef during boot. The host prints: | kvm [97]: Unsupported guest sys_reg access at: ffff800080024c64 [00000005] | { Op0( 3), Op1( 0), CRn(10), CRm( 5), Op2( 0), func_read }, Which results in: | Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP | Modules linked in: | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc7-00559-gd89c186d50b2 #14616 | Hardware name: linux,dummy-virt (DT) | pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : test_has_mpam+0x18/0x30 | lr : test_has_mpam+0x10/0x30 | sp : ffff80008000bd90 ... | Call trace: | test_has_mpam+0x18/0x30 | update_cpu_capabilities+0x7c/0x11c | setup_cpu_features+0x14/0xd8 | smp_cpus_done+0x24/0xb8 | smp_init+0x7c/0x8c | kernel_init_freeable+0xf8/0x280 | kernel_init+0x24/0x1e0 | ret_from_fork+0x10/0x20 | Code: 910003fd 97ffffde 72001c00 54000080 (d538a500) | ---[ end trace 0000000000000000 ]--- | Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b | ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- Add the support to enable the traps, and handle the three guest accessible registers by injecting an UNDEF. This stops KVM from spamming the host log, but doesn't yet hide the feature from the id registers. With MPAM v1.0 we can trap the MPAMIDR_EL1 register only if ARM64_HAS_MPAM_HCR, with v1.1 an additional MPAM2_EL2.TIDR bit traps MPAMIDR_EL1 on platforms that don't have MPAMHCR_EL2. Enable one of these if either is supported. If neither is supported, the guest can discover that the CPU has MPAM support, and how many PARTID etc the host has ... but it can't influence anything, so its harmless. Fixes: 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register") CC: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/linux-arm-kernel/20200925160102.118858-1-james.morse@arm.com/ Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-5-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64: cpufeature: discover CPU support for MPAMJames Morse
ARMv8.4 adds support for 'Memory Partitioning And Monitoring' (MPAM) which describes an interface to cache and bandwidth controls wherever they appear in the system. Add support to detect MPAM. Like SVE, MPAM has an extra id register that describes some more properties, including the virtualisation support, which is optional. Detect this separately so we can detect mismatched/insane systems, but still use MPAM on the host even if the virtualisation support is missing. MPAM needs enabling at the highest implemented exception level, otherwise the register accesses trap. The 'enabled' flag is accessible to lower exception levels, but its in a register that traps when MPAM isn't enabled. The cpufeature 'matches' hook is extended to test this on one of the CPUs, so that firmware can emulate MPAM as disabled if it is reserved for use by secure world. Secondary CPUs that appear late could trip cpufeature's 'lower safe' behaviour after the MPAM properties have been advertised to user-space. Add a verify call to ensure late secondaries match the existing CPUs. (If you have a boot failure that bisects here its likely your CPUs advertise MPAM in the id registers, but firmware failed to either enable or MPAM, or emulate the trap as if it were disabled) Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-4-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64: head.S: Initialise MPAM EL2 registers and disable trapsJames Morse
Add code to head.S's el2_setup to detect MPAM and disable any EL2 traps. This register resets to an unknown value, setting it to the default parititons/pmg before we enable the MMU is the best thing to do. Kexec/kdump will depend on this if the previous kernel left the CPU configured with a restrictive configuration. If linux is booted at the highest implemented exception level el2_setup will clear the enable bit, disabling MPAM. This code can't be enabled until a subsequent patch adds the Kconfig and cpufeature boiler plate. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-3-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64/sysreg: Convert existing MPAM sysregs and add the remaining entriesJames Morse
Move the existing MPAM system register defines from sysreg.h to tools/sysreg and add the remaining system registers. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241030160317.2528209-2-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernateDavid Woodhouse
The PSCI v1.3 specification adds support for a SYSTEM_OFF2 function which is analogous to ACPI S4 state. This will allow hosting environments to determine that a guest is hibernated rather than just powered off, and handle that state appropriately on subsequent launches. Since commit 60c0d45a7f7a ("efi/arm64: use UEFI for system reset and poweroff") the EFI shutdown method is deliberately preferred over PSCI or other methods. So register a SYS_OFF_MODE_POWER_OFF handler which *only* handles the hibernation, leaving the original PSCI SYSTEM_OFF as a last resort via the legacy pm_power_off function pointer. The hibernation code already exports a system_entering_hibernation() function which is be used by the higher-priority handler to check for hibernation. That existing function just returns the value of a static boolean variable from hibernate.c, which was previously only set in the hibernation_platform_enter() code path. Set the same flag in the simpler code path around the call to kernel_power_off() too. An alternative way to hook SYSTEM_OFF2 into the hibernation code would be to register a platform_hibernation_ops structure with an ->enter() method which makes the new SYSTEM_OFF2 call. But that would have the unwanted side-effect of making hibernation take a completely different code path in hibernation_platform_enter(), invoking a lot of special dpm callbacks. Another option might be to add a new SYS_OFF_MODE_HIBERNATE mode, with fallback to SYS_OFF_MODE_POWER_OFF. Or to use the sys_off_data to indicate whether the power off is for hibernation. But this version works and is relatively simple. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20241019172459.2241939-7-dwmw2@infradead.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Handle WXN attributeMarc Zyngier
Until now, we didn't really care about WXN as it didn't have an effect on the R/W permissions (only the execution could be droppped), and therefore not of interest for AT. However, with S1POE, WXN can revoke the Write permission if an overlay is active and that execution is allowed. This *is* relevant to AT. Add full handling of WXN so that we correctly handle this case. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-38-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Handle stage-1 permission overlaysMarc Zyngier
We now have the intrastructure in place to emulate S1POE: - direct permissions are always overlay-capable - indirect permissions are overlay-capable if the permissions are in the 0b0xxx range - the overlays are strictly substractive Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-37-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Make PAN conditions part of the S1 walk contextMarc Zyngier
Move the conditions describing PAN as part of the s1_walk_info structure, in an effort to declutter the permission processing. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-36-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Disable hierarchical permissions when POE is enabledMarc Zyngier
The hierarchical permissions must be disabled when POE is enabled in the translation regime used for a given table walk. We store the two enable bits in the s1_walk_info structure so that they can be retrieved down the line, as they will be useful. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-35-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add POE save/restore for AT emulation fast-pathMarc Zyngier
Just like the other extensions affecting address translation, we must save/restore POE so that an out-of-context translation context can be restored and used with the AT instructions. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-34-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>