Age | Commit message (Collapse) | Author |
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #1
- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts.
- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface.
- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on hardware
that previously advertised it unconditionally.
- Map supporting endpoints with cacheable memory attributes on systems
with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
maintenance on the address range.
- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
hypervisor to inject external aborts into an L2 VM and take traps of
masked external aborts to the hypervisor.
- Convert more system register sanitization to the config-driven
implementation.
- Fixes to the visibility of EL2 registers, namely making VGICv3 system
registers accessible through the VGIC device instead of the ONE_REG
vCPU ioctls.
- Various cleanups and minor fixes.
|
|
KVM IRQ changes for 6.17
- Rework irqbypass to track/match producers and consumers via an xarray
instead of a linked list. Using a linked list leads to O(n^2) insertion
times, which is hugely problematic for use cases that create large numbers
of VMs. Such use cases typically don't actually use irqbypass, but
eliminating the pointless registration is a future problem to solve as it
likely requires new uAPI.
- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
to avoid making a simple concept unnecessarily difficult to understand.
- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC,
and PIT emulation at compile time.
- Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c.
- Fix a variety of flaws and bugs in the AVIC device posted IRQ code.
- Inhibited AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation.
- Extend enable_ipiv module param support to SVM, by simply leaving IsRunning
clear in the vCPU's physical ID table entry.
- Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
erratum #1235, to allow (safely) enabling AVIC on such CPUs.
- Dedup x86's device posted IRQ code, as the vast majority of functionality
can be shared verbatime between SVM and VMX.
- Harden the device posted IRQ code against bugs and runtime errors.
- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
instead of O(n).
- Generate GA Log interrupts if and only if the target vCPU is blocking, i.e.
only if KVM needs a notification in order to wake the vCPU.
- Decouple device posted IRQs from VFIO device assignment, as binding a VM to
a VFIO group is not a requirement for enabling device posted IRQs.
- Clean up and document/comment the irqfd assignment code.
- Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e.
ensure an eventfd is bound to at most one irqfd through the entire host,
and add a selftest to verify eventfd:irqfd bindings are globally unique.
|
|
* kvm-arm64/doublefault2: (33 commits)
: NV Support for FEAT_RAS + DoubleFault2
:
: Delegate the vSError context to the guest hypervisor when in a nested
: state, including registers related to ESR propagation. Additionally,
: catch up KVM's external abort infrastructure to the architecture,
: implementing the effects of FEAT_DoubleFault2.
:
: This has some impact on non-nested guests, as SErrors deemed unmasked at
: the time they're made pending are now immediately injected with an
: emulated exception entry rather than using the VSE bit.
KVM: arm64: Make RAS registers UNDEF when RAS isn't advertised
KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context
KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state
KVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately
KVM: arm64: selftests: Test ESR propagation for vSError injection
KVM: arm64: Populate ESR_ELx.EC for emulated SError injection
KVM: arm64: selftests: Catch up set_id_regs with the kernel
KVM: arm64: selftests: Add SCTLR2_EL1 to get-reg-list
KVM: arm64: selftests: Test SEAs are taken to SError vector when EASE=1
KVM: arm64: selftests: Add basic SError injection test
KVM: arm64: Don't retire MMIO instruction w/ pending (emulated) SError
KVM: arm64: Advertise support for FEAT_DoubleFault2
KVM: arm64: Advertise support for FEAT_SCTLR2
KVM: arm64: nv: Enable vSErrors when HCRX_EL2.TMEA is set
KVM: arm64: nv: Honor SError routing effects of SCTLR2_ELx.NMEA
KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set
KVM: arm64: Route SEAs to the SError vector when EASE is set
KVM: arm64: nv: Ensure Address size faults affect correct ESR
KVM: arm64: Factor out helper for selecting exception target EL
KVM: arm64: Describe SCTLR2_ELx RESx masks
...
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
To date KVM has used HCR_EL2.VSE to track the state of a pending SError
for the guest. With this bit set, hardware respects the EL1 exception
routing / masking rules and injects the vSError when appropriate.
This isn't correct for NV guests as hardware is oblivious to vEL2's
intentions for SErrors. Better yet, with FEAT_NV2 the guest can change
the routing behind our back as HCR_EL2 is redirected to memory. Cope
with this mess by:
- Using a flag (instead of HCR_EL2.VSE) to track the pending SError
state when SErrors are unconditionally masked for the current context
- Resampling the routing / masking of a pending SError on every guest
entry/exit
- Emulating exception entry when SError routing implies a translation
regime change
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Per R_VRLPB, a pending SError is a WFI wakeup event regardless of
PSTATE.A, meaning that the vCPU is runnable. Sample VSE in addition to
the other IRQ lines.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
A common idiom in the KVM code is to check if we are currently
dealing with a "nested" context, defined as having NV enabled,
but being in the EL1&0 translation regime.
This is usually expressed as:
if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu) ... )
which is a mouthful and a bit hard to read, specially when followed
by additional conditions.
Introduce a new helper that encapsulate these two terms, allowing
the above to be written as
if (is_nested_context(vcpu) ... )
which is both shorter and easier to read, and makes more obvious
the potential for simplification on some code paths.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Introduce a new KVM capability to expose to the userspace whether
cacheable mapping of PFNMAP is supported.
The ability to safely do the cacheable mapping of PFNMAP is contingent
on S2FWB and ARM64_HAS_CACHE_DIC. S2FWB allows KVM to avoid flushing
the D cache, ARM64_HAS_CACHE_DIC allows KVM to avoid flushing the icache
and turns icache_inval_pou() into a NOP. The cap would be false if
those requirements are missing and is checked by making use of
kvm_arch_supports_cacheable_pfnmap.
This capability would allow userspace to discover the support.
It could for instance be used by userspace to prevent live-migration
across FWB and non-FWB hosts.
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Jason Gunthorpe <jgg@nvidia.com>
CC: Oliver Upton <oliver.upton@linux.dev>
CC: David Hildenbrand <david@redhat.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250705071717.5062-7-ankita@nvidia.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Historically KVM hyp code saved the host's FPSIMD state into the hosts's
fpsimd_state memory, and so it was necessary to map this into the hyp
Stage-1 mappings before running a vCPU.
This is no longer necessary as of commits:
* fbc7e61195e2 ("KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state")
* 8eca7f6d5100 ("KVM: arm64: Remove host FPSIMD saving for non-protected KVM")
Since those commits, we eagerly save the host's FPSIMD state before
calling into hyp to run a vCPU, and hyp code never reads nor writes the
host's fpsimd_state memory. There's no longer any need to map the host's
fpsimd_state memory into the hyp Stage-1, and kvm_arch_vcpu_run_map_fp()
is unnecessary but benign.
Remove kvm_arch_vcpu_run_map_fp(). Currently there is no code to perform
a corresponding unmap, and we never mapped the host's SVE or SME state
into the hyp Stage-1, so no other code needs to be removed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Cc: kvmarm@lists.linux.dev
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250619134817.4075340-1-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Marc reported that enabling protected mode on a device with GICv2
doesn't fail gracefully as one would expect, and leads to a host
kernel crash.
As it turns out, the first half of pKVM init happens before the vgic
probe, and so by the time we find out we have a GICv2 we're already
committed to keeping the pKVM vectors installed at EL2 -- pKVM rejects
stub HVCs for obvious security reasons. However, the error path on KVM
init leads to teardown_hyp_mode() which unconditionally frees hypervisor
allocations (including the EL2 stacks and per-cpu pages) under the
assumption that a previous cpu_hyp_uninit() execution has reset the
vectors back to the stubs, which is false with pKVM.
Interestingly, host stage-2 protection is not enabled yet at this point,
so this use-after-free may go unnoticed for a while. The issue becomes
more obvious after the finalize_pkvm() call.
Fix this by keeping track of the CPUs on which pKVM is initialized in
the kvm_hyp_initialized per-cpu variable, and use it from
teardown_hyp_mode() to skip freeing pages that are in fact used.
Fixes: a770ee80e662 ("KVM: arm64: pkvm: Disable GICv2 support")
Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250626101014.1519345-1-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In the unlikely case pKVM failed to allocate carveout, the error path
tries to access NULL ptr when it de-reference the SVE state from the
uninitialized nVHE per-cpu base.
[ 1.575420] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 1.576010] pc : teardown_hyp_mode+0xe4/0x180
[ 1.576920] lr : teardown_hyp_mode+0xd0/0x180
[ 1.577308] sp : ffff8000826fb9d0
[ 1.577600] x29: ffff8000826fb9d0 x28: 0000000000000000 x27: ffff80008209b000
[ 1.578383] x26: ffff800081dde000 x25: ffff8000820493c0 x24: ffff80008209eb00
[ 1.579180] x23: 0000000000000040 x22: 0000000000000001 x21: 0000000000000000
[ 1.579881] x20: 0000000000000002 x19: ffff800081d540b8 x18: 0000000000000000
[ 1.580544] x17: ffff800081205230 x16: 0000000000000152 x15: 00000000fffffff8
[ 1.581183] x14: 0000000000000008 x13: fff00000ff7f6880 x12: 000000000000003e
[ 1.581813] x11: 0000000000000002 x10: 00000000000000ff x9 : 0000000000000000
[ 1.582503] x8 : 0000000000000000 x7 : 7f7f7f7f7f7f7f7f x6 : 43485e525851ff30
[ 1.583140] x5 : fff00000ff6e9030 x4 : fff00000ff6e8f80 x3 : 0000000000000000
[ 1.583780] x2 : 0000000000000000 x1 : 0000000000000002 x0 : 0000000000000000
[ 1.584526] Call trace:
[ 1.584945] teardown_hyp_mode+0xe4/0x180 (P)
[ 1.585578] init_hyp_mode+0x920/0x994
[ 1.586005] kvm_arm_init+0xb4/0x25c
[ 1.586387] do_one_initcall+0xe0/0x258
[ 1.586819] do_initcall_level+0xa0/0xd4
[ 1.587224] do_initcalls+0x54/0x94
[ 1.587606] do_basic_setup+0x1c/0x28
[ 1.587998] kernel_init_freeable+0xc8/0x130
[ 1.588409] kernel_init+0x20/0x1a4
[ 1.588768] ret_from_fork+0x10/0x20
[ 1.589568] Code: f875db48 8b1c0109 f100011f 9a8903e8 (f9463100)
[ 1.590332] ---[ end trace 0000000000000000 ]---
As Quentin pointed, the order of free is also wrong, we need to free
SVE state first before freeing the per CPU ptrs.
I initially observed this on 6.12, but I could also repro in master.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Fixes: 66d5b53e20a6 ("KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM")
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250625123058.875179-1-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Fold kvm_arch_irqfd_route_changed() into kvm_arch_update_irqfd_routing().
Calling arch code to know whether or not to call arch code is absurd.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250611224604.313496-35-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Don't bother WARNing if updating an IRTE route fails now that vendor code
provides much more precise WARNs. The generic WARN doesn't provide enough
information to actually debug the problem, and has obviously done nothing
to surface the myriad bugs in KVM x86's implementation.
Drop all of the associated return code plumbing that existed just so that
common KVM could WARN.
Link: https://lore.kernel.org/r/20250611224604.313496-34-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When updating IRTEs in response to a GSI routing or IRQ bypass change,
pass the new/current routing information along with the associated irqfd.
This will allow KVM x86 to harden, simplify, and deduplicate its code.
Since adding/removing a bypass producer is now conveniently protected with
irqfds.lock, i.e. can't run concurrently with kvm_irq_routing_update(),
use the routing information cached in the irqfd instead of looking up
the information in the current GSI routing tables.
Opportunistically convert an existing printk() to pr_info() and put its
string onto a single line (old code that strictly adhered to 80 chars).
Link: https://lore.kernel.org/r/20250611224604.313496-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Explicitly treat type differences as GSI routing changes, as comparing MSI
data between two entries could get a false negative, e.g. if userspace
changed the type but left the type-specific data as-
Note, the same bug was fixed in x86 by commit bcda70c56f3e ("KVM: x86:
Explicitly treat routing entry type changes as changes").
Fixes: 4bf3693d36af ("KVM: arm64: Unmap vLPIs affected by changes to GSI routing information")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250611224604.313496-3-seanjc@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.16, take #1
- Make the irqbypass hooks resilient to changes in the GSI<->MSI
routing, avoiding behind stale vLPI mappings being left behind. The
fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
and nuking the vLPI mapping upon a routing change.
- Close another VGIC race where vCPU creation races with VGIC
creation, leading to in-flight vCPUs entering the kernel w/o private
IRQs allocated.
- Fix a build issue triggered by the recently added workaround for
Ampere's AC04_CPU_23 erratum.
- Correctly sign-extend the VA when emulating a TLBI instruction
potentially targeting a VNCR mapping.
- Avoid dereferencing a NULL pointer in the VGIC debug code, which can
happen if the device doesn't have any mapping yet.
|
|
KVM's interrupt infrastructure is dodgy at best, allowing for some ugly
'off label' usage of the various UAPIs. In one example, userspace can
change the routing entry of a particular "GSI" after configuring
irqbypass with KVM_IRQFD. KVM/arm64 is oblivious to this, and winds up
preserving the stale translation in cases where vLPIs are configured.
Honor userspace's intentions and tear down the vLPI mapping if affected
by a "GSI" routing change. Make no attempt to reconstruct vLPIs if the
new target is an MSI and just fall back to software injection.
Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250523194722.4066715-5-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The virtual mapping and "GSI" routing of a particular vLPI is subject to
change in response to the guest / userspace. This can be pretty annoying
to deal with when KVM needs to track the physical state that's managed
for vLPI direct injection.
Make vgic_v4_unset_forwarding() resilient by using the host IRQ to
resolve the vgic IRQ. Since this uses the LPI xarray directly, finding
the ITS by doorbell address + grabbing it's its_lock is no longer
necessary. Note that matching the right ITS / ITE is already handled in
vgic_v4_set_forwarding(), and unless there's a bug in KVM's VGIC ITS
emulation the virtual mapping that should remain stable for the lifetime
of the vLPI mapping.
Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250523194722.4066715-4-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Use kvm_trylock_all_vcpus instead of a custom implementation when locking
all vCPUs of a VM, to avoid triggering a lockdep warning, in the case in
which the VM is configured to have more than MAX_LOCK_DEPTH vCPUs.
This fixes the following false lockdep warning:
[ 328.171264] BUG: MAX_LOCK_DEPTH too low!
[ 328.175227] turning off the locking correctness validator.
[ 328.180726] Please attach the output of /proc/lock_stat to the bug report
[ 328.187531] depth: 48 max: 48!
[ 328.190678] 48 locks held by qemu-kvm/11664:
[ 328.194957] #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0
[ 328.204048] #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[ 328.212521] #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[ 328.220991] #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[ 328.229463] #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[ 328.237934] #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[ 328.246405] #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-ID: <20250512180407.659015-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
* kvm-arm64/nv-nv:
: .
: Flick the switch on the NV support by adding the missing piece
: in the form of the VNCR page management. From the cover letter:
:
: "This is probably the most interesting bit of the whole NV adventure.
: So far, everything else has been a walk in the park, but this one is
: where the real fun takes place.
:
: With FEAT_NV2, most of the NV support revolves around tricking a guest
: into accessing memory while it tries to access system registers. The
: hypervisor's job is to handle the context switch of the actual
: registers with the state in memory as needed."
: .
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
KVM: arm64: Document NV caps and vcpu flags
KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
KVM: arm64: nv: Remove dead code from ERET handling
KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
KVM: arm64: nv: Handle VNCR_EL2-triggered faults
KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
KVM: arm64: nv: Move TLBI range decoding to a helper
KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
KVM: arm64: nv: Extract translation helper from the AT code
KVM: arm64: nv: Allocate VNCR page when required
arm64: sysreg: Add layout for VNCR_EL2
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/fgt-masks: (43 commits)
: .
: Large rework of the way KVM deals with trap bits in conjunction with
: the CPU feature registers. It now draws a direct link between which
: the feature set, the system registers that need to UNDEF to match
: the configuration and bits that need to behave as RES0 or RES1 in
: the trap registers that are visible to the guest.
:
: Best of all, these definitions are mostly automatically generated
: from the JSON description published by ARM under a permissive
: license.
: .
KVM: arm64: Handle TSB CSYNC traps
KVM: arm64: Add FGT descriptors for FEAT_FGT2
KVM: arm64: Allow sysreg ranges for FGT descriptors
KVM: arm64: Add context-switch for FEAT_FGT2 registers
KVM: arm64: Add trap routing for FEAT_FGT2 registers
KVM: arm64: Add sanitisation for FEAT_FGT2 registers
KVM: arm64: Add FEAT_FGT2 registers to the VNCR page
KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
KVM: arm64: Allow kvm_has_feat() to take variable arguments
KVM: arm64: Use FGT feature maps to drive RES0 bits
KVM: arm64: Validate FGT register descriptions against RES0 masks
KVM: arm64: Switch to table-driven FGU configuration
KVM: arm64: Handle PSB CSYNC traps
KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
KVM: arm64: Remove hand-crafted masks for FGT registers
KVM: arm64: Use computed FGT masks to setup FGT registers
KVM: arm64: Propagate FGT masks to the nVHE hypervisor
KVM: arm64: Unconditionally configure fine-grain traps
KVM: arm64: Use computed masks as sanitisers for FGT registers
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/pkvm-selftest-6.16:
: .
: pKVM selftests covering the memory ownership transitions by
: Quentin Perret. From the initial cover letter:
:
: "We have recently found a bug [1] in the pKVM memory ownership
: transitions by code inspection, but it could have been caught with a
: test.
:
: Introduce a boot-time selftest exercising all the known pKVM memory
: transitions and importantly checks the rejection of illegal transitions.
:
: The new test is hidden behind a new Kconfig option separate from
: CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
: transition checks ([1] doesn't reproduce with EL2 debug enabled).
:
: [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/"
: .
KVM: arm64: Extend pKVM selftest for np-guests
KVM: arm64: Selftest for pKVM transitions
KVM: arm64: Don't WARN from __pkvm_host_share_guest()
KVM: arm64: Add .hyp.data section
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Just like the FEAT_FGT registers, treat the FGT2 variant the same
way. THis is a large update, but a fairly mechanical one.
The config dependencies are extracted from the 2025-03 JSON drop.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Since we're (almost) feature complete, let's allow userspace to
request KVM_ARM_VCPU_EL2* by bumping KVM_VCPU_MAX_FEATURES up.
We also now advertise the features to userspace with new capabilities.
It's going to be great...
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20250514103501.2225951-17-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
FEAT_NV2 introduces an interesting problem for NV, as VNCR_EL2.BADDR
is a virtual address in the EL2&0 (or EL2, but we thankfully ignore
this) translation regime.
As we need to replicate such mapping in the real EL2, it means that
we need to remember that there is such a translation, and that any
TLBI affecting EL2 can possibly affect this translation.
It also means that any invalidation driven by an MMU notifier must
be able to shoot down any such mapping.
All in all, we need a data structure that represents this mapping,
and that is extremely close to a TLB. Given that we can only use
one of those per vcpu at any given time, we only allocate one.
No effort is made to keep that structure small. If we need to
start caching multiple of them, we may want to revisit that design
point. But for now, it is kept simple so that we can reason about it.
Oh, and add a braindump of how things are supposed to work, because
I will definitely page this out at some point. Yes, pun intended.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-8-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The nVHE hypervisor needs to have access to its own view of the FGT
masks, which unfortunately results in a bit of data duplication.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The hypervisor has not needed its own .data section because all globals
were either .rodata or .bss. To avoid having to initialize future
data-structures at run-time, let's introduce add a .data section to the
hypervisor.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
kvm_arch_has_irq_bypass() is a small function and even though it does
not appear in any *really* hot paths, it's also not entirely rare.
Make it inline---it also works out nicely in preparation for using it in
kvm-intel.ko and kvm-amd.ko, since the function is not currently exported.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
* kvm-arm64/pmu-fixes:
: vPMU fixes for 6.15 courtesy of Akihiko Odaki
:
: Various fixes to KVM's vPMU implementation, notably ensuring
: userspace-directed changes to the PMCs are reflected in the backing perf
: events.
KVM: arm64: PMU: Reload when resetting
KVM: arm64: PMU: Reload when user modifies registers
KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/pkvm-6.15:
: pKVM updates for 6.15
:
: - SecPageTable stats for stage-2 table pages allocated by the protected
: hypervisor (Vincent Donnefort)
:
: - HCRX_EL2 trap + vCPU initialization fixes for pKVM (Fuad Tabba)
KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
KVM: arm64: Initialize HCRX_EL2 traps in pKVM
KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
KVM: arm64: Count pKVM stage-2 usage in secondary pagetable stats
KVM: arm64: Distinct pKVM teardown memcache for stage-2
KVM: arm64: Add flags to kvm_hyp_memcache
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/writable-midr:
: Writable implementation ID registers, courtesy of Sebastian Ott
:
: Introduce a new capability that allows userspace to set the
: ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1,
: and AIDR_EL1. Also plug a hole in KVM's trap configuration where
: SMIDR_EL1 was readable at EL1, despite the fact that KVM does not
: support SME.
KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable
KVM: arm64: Copy guest CTR_EL0 into hyp VM
KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR
KVM: arm64: Allow userspace to change the implementation ID registers
KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value
KVM: arm64: Maintain per-VM copy of implementation ID regs
KVM: arm64: Set HCR_EL2.TID1 unconditionally
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/pmuv3-asahi:
: Support PMUv3 for KVM guests on Apple silicon
:
: Take advantage of some IMPLEMENTATION DEFINED traps available on Apple
: parts to trap-and-emulate the PMUv3 registers on behalf of a KVM guest.
: Constrain the vPMU to a cycle counter and single event counter, as the
: Apple PMU has events that cannot be counted on every counter.
:
: There is a small new interface between the ARM PMU driver and KVM, where
: the PMU driver owns the PMUv3 -> hardware event mappings.
arm64: Enable IMP DEF PMUv3 traps on Apple M*
KVM: arm64: Provide 1 event counter on IMPDEF hardware
drivers/perf: apple_m1: Provide helper for mapping PMUv3 events
KVM: arm64: Remap PMUv3 events onto hardware
KVM: arm64: Advertise PMUv3 if IMPDEF traps are present
KVM: arm64: Compute synthetic sysreg ESR for Apple PMUv3 traps
KVM: arm64: Move PMUVer filtering into KVM code
KVM: arm64: Use guard() to cleanup usage of arm_pmus_lock
KVM: arm64: Drop kvm_arm_pmu_available static key
KVM: arm64: Use a cpucap to determine if system supports FEAT_PMUv3
KVM: arm64: Always support SW_INCR PMU event
KVM: arm64: Compute PMCEID from arm_pmu's event bitmaps
drivers/perf: apple_m1: Support host/guest event filtering
drivers/perf: apple_m1: Refactor event select/filter configuration
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/nv-idregs:
: Changes to exposure of NV features, courtesy of Marc Zyngier
:
: Apply NV-specific feature restrictions at reset rather than at the point
: of KVM_RUN. This makes the true feature set visible to userspace, a
: necessary step towards save/restore support or NV VMs.
:
: Add an additional vCPU feature flag for selecting the E2H0 flavor of NV,
: such that the VHE-ness of the VM can be applied to the feature set.
KVM: arm64: selftests: Test that TGRAN*_2 fields are writable
KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN*_2
KVM: arm64: Advertise FEAT_ECV when possible
KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writable
KVM: arm64: Allow userspace to limit NV support to nVHE
KVM: arm64: Move NV-specific capping to idreg sanitisation
KVM: arm64: Enforce NV limits on a per-idregs basis
KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely available
KVM: arm64: Consolidate idreg callbacks
KVM: arm64: Advertise NV2 in the boot messages
KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0
KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero
KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspace
arm64: cpufeature: Handle NV_frac as a synonym of NV2
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/nv-vgic:
: NV VGICv3 support, courtesy of Marc Zyngier
:
: Support for emulating the GIC hypervisor controls and managing shadow
: VGICv3 state for the L1 hypervisor. As part of it, bring in support for
: taking IRQs to the L1 and UAPI to manage the VGIC maintenance interrupt.
KVM: arm64: nv: Fail KVM init if asking for NV without GICv3
KVM: arm64: nv: Allow userland to set VGIC maintenance IRQ
KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
KVM: arm64: nv: Propagate used_lrs between L1 and L0 contexts
KVM: arm64: nv: Request vPE doorbell upon nested ERET to L2
KVM: arm64: nv: Respect virtual HCR_EL2.TWx setting
KVM: arm64: nv: Add Maintenance Interrupt emulation
KVM: arm64: nv: Handle L2->L1 transition on interrupt injection
KVM: arm64: nv: Nested GICv3 emulation
KVM: arm64: nv: Sanitise ICH_HCR_EL2 accesses
KVM: arm64: nv: Plumb handling of GICv3 EL2 accesses
KVM: arm64: nv: Add ICH_*_EL2 registers to vpcu_sysreg
KVM: arm64: nv: Load timer before the GIC
arm64: sysreg: Add layout for ICH_MISR_EL2
arm64: sysreg: Add layout for ICH_VTR_EL2
arm64: sysreg: Add layout for ICH_HCR_EL2
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/misc:
: Miscellaneous fixes/cleanups for KVM/arm64
:
: - Avoid GICv4 vLPI configuration when confronted with user error
:
: - Only attempt vLPI configuration when the target routing is an MSI
:
: - Document ordering requirements to avoid aforementioned user error
KVM: arm64: Tear down vGIC on failed vCPU creation
KVM: arm64: Document ordering requirements for irqbypass
KVM: arm64: vgic-v4: Fall back to software irqbypass if LPI not found
KVM: arm64: vgic-v4: Only WARN for HW IRQ mismatch when unmapping vLPI
KVM: arm64: vgic-v4: Only attempt vLPI mapping for actual MSIs
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
If kvm_arch_vcpu_create() fails to share the vCPU page with the
hypervisor, we propagate the error back to the ioctl but leave the
vGIC vCPU data initialised. Note only does this leak the corresponding
memory when the vCPU is destroyed but it can also lead to use-after-free
if the redistributor device handling tries to walk into the vCPU.
Add the missing cleanup to kvm_arch_vcpu_create(), ensuring that the
vGIC vCPU structures are destroyed on error.
Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250314133409.9123-1-will@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Many functions in pmu-emul.c checks kvm_vcpu_has_pmu(vcpu). A favorable
interpretation is defensive programming, but it also has downsides:
- It is confusing as it implies these functions are called without PMU
although most of them are called only when a PMU is present.
- It makes semantics of functions fuzzy. For example, calling
kvm_pmu_disable_counter_mask() without PMU may result in no-op as
there are no enabled counters, but it's unclear what
kvm_pmu_get_counter_value() returns when there is no PMU.
- It allows callers without checking kvm_vcpu_has_pmu(vcpu), but it is
often wrong to call these functions without PMU.
- It is error-prone to duplicate kvm_vcpu_has_pmu(vcpu) checks into
multiple functions. Many functions are called for system registers,
and the system register infrastructure already employs less
error-prone, comprehensive checks.
Check kvm_vcpu_has_pmu(vcpu) in callers of these functions instead,
and remove the obsolete checks from pmu-emul.c. The only exceptions are
the functions that implement ioctls as they have definitive semantics
even when the PMU is not present.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250315-pmc-v5-2-ecee87dab216@daynix.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Instead of creating and initializing _all_ hyp vcpus in pKVM when
the first host vcpu runs for the first time, initialize _each_
hyp vcpu in conjunction with its corresponding host vcpu.
Some of the host vcpu state (e.g., system registers and traps
values) is not initialized until the first time the host vcpu is
run. Therefore, initializing a hyp vcpu before its corresponding
host vcpu has run for the first time might not view the complete
host state of these vcpus.
Additionally, this behavior is inline with non-protected modes.
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250314111832.4137161-5-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
With the PMUv3 cpucap, kvm_arm_pmu_available is no longer used in the
hot path of guest entry/exit. On top of that, guest support for PMUv3
may not correlate with host support for the feature, e.g. on IMPDEF
hardware.
Throw out the static key and just inspect the list of PMUs to determine
if PMUv3 is supported for KVM guests.
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250305202641.428114-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Although there is nothing in NV that is fundamentally incompatible
with the lack of GICv3, there is no HW implementation without one,
at least on the virtual side (yes, even fruits have some form of
vGICv3).
We therefore make the decision to require GICv3, which will only
affect models such as QEMU. Booting with a GICv2 or something
even more exotic while asking for NV will result in KVM being
disabled.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-17-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Emulating the vGIC means emulating the dreaded Maintenance Interrupt.
This is a two-pronged problem:
- while running L2, getting an MI translates into an MI injected
in the L1 based on the state of the HW.
- while running L1, we must accurately reflect the state of the
MI line, based on the in-memory state.
The MI INTID is added to the distributor, as expected on any
virtualisation-capable implementation, and further patches
will allow its configuration.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-11-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
An interrupt being delivered to L1 while running L2 must result
in the correct exception being delivered to L1.
This means that if, on entry to L2, we found ourselves with pending
interrupts in the L1 distributor, we need to take immediate action.
This is done by posting a request which will prevent the entry in
L2, and deliver an IRQ exception to L1, forcing the switch.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
In order for vgic_v3_load_nested to be able to observe which timer
interrupts have the HW bit set for the current context, the timers
must have been loaded in the new mode and the right timer mapped
to their corresponding HW IRQs.
At the moment, we load the GIC first, meaning that timer interrupts
injected to an L2 guest will never have the HW bit set (we see the
old configuration).
Swapping the two loads solves this particular problem.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Some 'creative' VMMs out there may assign a VFIO MSI eventfd to an SPI
routing entry.
And yes, I can already hear you shouting about possibly driving a level
interrupt with an edge-sensitive one. You know who you are.
This works for the most part, and interrupt injection winds up taking
the software path. However, when running on GICv4-enabled hardware, KVM
erroneously attempts to setup LPI forwarding, even though the KVM
routing isn't an MSI.
Thanks to misuse of a union, the MSI destination is unlikely to match any
ITS in the VM and kvm_vgic_v4_set_forwarding() bails early. Later on when
the VM is being torn down, this half-configured state triggers the
WARN_ON() in kvm_vgic_v4_unset_forwarding() due to the fact that no HW
IRQ was ever assigned.
Avoid the whole mess by preventing SPI routing entries from getting into
the LPI forwarding helpers.
Reported-by: Sudheer Dantuluri <dantuluris@google.com>
Tested-by: Sudheer Dantuluri <dantuluris@google.com>
Fixes: 196b136498b3 ("KVM: arm/arm64: GICv4: Wire mapping/unmapping of VLPIs in VFIO irq bypass")
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250226183124.82094-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
KVM's treatment of the ID registers that describe the implementation
(MIDR, REVIDR, and AIDR) is interesting, to say the least. On the
userspace-facing end of it, KVM presents the values of the boot CPU on
all vCPUs and treats them as invariant. On the guest side of things KVM
presents the hardware values of the local CPU, which can change during
CPU migration in a big-little system.
While one may call this fragile, there is at least some degree of
predictability around it. For example, if a VMM wanted to present
big-little to a guest, it could affine vCPUs accordingly to the correct
clusters.
All of this makes a giant mess out of adding support for making these
implementation ID registers writable. Avoid breaking the rather subtle
ABI around the old way of doing things by requiring opt-in from
userspace to make the registers writable.
When the cap is enabled, allow userspace to set MIDR, REVIDR, and AIDR
to any non-reserved value and present those values consistently across
all vCPUs.
Signed-off-by: Sebastian Ott <sebott@redhat.com>
[oliver: changelog, capability]
Link: https://lore.kernel.org/r/20250225005401.679536-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Make it a bit easier to understand what people are running by
adding a +NV2 string to the successful KVM initialisation.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Vladimir reports that a race condition to attach a VMID to a stage-2 MMU
sometimes results in a vCPU entering the guest with a VMID of 0:
| CPU1 | CPU2
| |
| | kvm_arch_vcpu_ioctl_run
| | vcpu_load <= load VTTBR_EL2
| | kvm_vmid->id = 0
| |
| kvm_arch_vcpu_ioctl_run |
| vcpu_load <= load VTTBR_EL2 |
| with kvm_vmid->id = 0|
| kvm_arm_vmid_update <= allocates fresh |
| kvm_vmid->id and |
| reload VTTBR_EL2 |
| |
| | kvm_arm_vmid_update <= observes that kvm_vmid->id
| | already allocated,
| | skips reload VTTBR_EL2
Oh yeah, it's as bad as it looks. Remember that VHE loads the stage-2
MMU eagerly but a VMID only gets attached to the MMU later on in the
KVM_RUN loop.
Even in the "best case" where VTTBR_EL2 correctly gets reprogrammed
before entering the EL1&0 regime, there is a period of time where
hardware is configured with VMID 0. That's completely insane. So, rather
than decorating the 'late' binding with another hack, just allocate the
damn thing up front.
Attaching a VMID from vcpu_load() is still rollover safe since
(surprise!) it'll always get called after a vCPU was preempted.
Excuse me while I go find a brown paper bag.
Cc: stable@vger.kernel.org
Fixes: 934bf871f011 ("KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()")
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250219220737.130842-1-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When not running in VHE mode, cpu_prepare_hyp_mode() computes the value
of TCR_EL2 using the host's TCR_EL1 settings as a starting point. For
nVHE, this amounts to masking out everything apart from the TG0, SH0,
ORGN0, IRGN0 and T0SZ fields before setting the RES1 bits, shifting the
IPS field down to the PS field and setting DS if LPA2 is enabled.
Unfortunately, for hVHE, things go slightly wonky: EPD1 is correctly set
to disable walks via TTBR1_EL2 but then the T1SZ and IPS fields are
corrupted when we mistakenly attempt to initialise the PS and DS fields
in their E2H=0 positions. Furthermore, many fields are retained from
TCR_EL1 which should not be propagated to TCR_EL2. Notably, this means
we can end up with A1 set despite not initialising TTBR1_EL2 at all.
This has been shown to cause unexpected translation faults at EL2 with
pKVM due to TLB invalidation not taking effect when running with a
non-zero ASID.
Fix the TCR_EL2 initialisation code to set PS and DS only when E2H=0,
masking out HD, HA and A1 when E2H=1.
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Fixes: ad744e8cb346 ("arm64: Allow arm64_sw.hvhe on command line")
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250214133724.13179-1-will@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.14, take #2
- Large set of fixes for vector handling, specially in the interactions
between host and guest state. This fixes a number of bugs affecting
actual deployments, and greatly simplifies the FP/SIMD/SVE handling.
Thanks to Mark Rutland for dealing with this thankless task.
- Fix an ugly race between vcpu and vgic creation/init, resulting in
unexpected behaviours.
- Fix use of kernel VAs at EL2 when emulating timers with nVHE.
- Small set of pKVM improvements and cleanups.
|
|
Now that the host eagerly saves its own FPSIMD/SVE/SME state,
non-protected KVM never needs to save the host FPSIMD/SVE/SME state,
and the code to do this is never used. Protected KVM still needs to
save/restore the host FPSIMD/SVE state to avoid leaking guest state to
the host (and to avoid revealing to the host whether the guest used
FPSIMD/SVE/SME), and that code needs to be retained.
Remove the unused code and data structures.
To avoid the need for a stub copy of kvm_hyp_save_fpsimd_host() in the
VHE hyp code, the nVHE/hVHE version is moved into the shared switch
header, where it is only invoked when KVM is in protected mode.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250210195226.1215254-3-mark.rutland@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.14, take #1
- Correctly clean the BSS to the PoC before allowing EL2 to access it
on nVHE/hVHE/protected configurations
- Propagate ownership of debug registers in protected mode after
the rework that landed in 6.14-rc1
- Stop pretending that we can run the protected mode without a GICv3
being present on the host
- Fix a use-after-free situation that can occur if a vcpu fails to
initialise the NV shadow S2 MMU contexts
- Always evaluate the need to arm a background timer for fully emulated
guest timers
- Fix the emulation of EL1 timers in the absence of FEAT_ECV
- Correctly handle the EL2 virtual timer, specially when HCR_EL2.E2H==0
|