Age | Commit message (Collapse) | Author |
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #1
- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts.
- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface.
- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on hardware
that previously advertised it unconditionally.
- Map supporting endpoints with cacheable memory attributes on systems
with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
maintenance on the address range.
- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
hypervisor to inject external aborts into an L2 VM and take traps of
masked external aborts to the hypervisor.
- Convert more system register sanitization to the config-driven
implementation.
- Fixes to the visibility of EL2 registers, namely making VGICv3 system
registers accessible through the VGIC device instead of the ONE_REG
vCPU ioctls.
- Various cleanups and minor fixes.
|
|
* kvm-arm64/config-masks:
: More config-driven mask computation, courtesy of Marc Zyngier
:
: Converts more system registers to the config-driven computation of RESx
: masks based on the advertised feature set
KVM: arm64: Tighten the definition of FEAT_PMUv3p9
KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation
KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation
KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation
arm64: sysreg: Add THE/ASID2 controls to TCR2_ELx
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
As for other registers, convert the determination of the RES0 bits
affecting MDCR_EL2 to be driven by a table extracted from the 2025-06
JSON drop
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
As for other registers, convert the determination of the RES0 bits
affecting SCTLR_EL1 to be driven by a table extracted from the 2025-06
JSON drop
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
As for other registers, convert the determination of the RES0 bits
affecting TCR2_EL2 to be driven by a table extracted from the 2025-06
JSON drop.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
External abort injection will soon rely on a sanitised view of
SCTLR2_ELx to determine exception routing. Compute the RESx masks.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-15-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Now that the missing bits for vSError injection/deferral have been added
we can merrily claim support for FEAT_RAS.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-10-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
To date KVM has used HCR_EL2.VSE to track the state of a pending SError
for the guest. With this bit set, hardware respects the EL1 exception
routing / masking rules and injects the vSError when appropriate.
This isn't correct for NV guests as hardware is oblivious to vEL2's
intentions for SErrors. Better yet, with FEAT_NV2 the guest can change
the routing behind our back as HCR_EL2 is redirected to memory. Cope
with this mess by:
- Using a flag (instead of HCR_EL2.VSE) to track the pending SError
state when SErrors are unconditionally masked for the current context
- Resampling the routing / masking of a pending SError on every guest
entry/exit
- Emulating exception entry when SError routing implies a translation
regime change
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Booting an EL2 guest on a system only supporting a subset of the
possible page sizes leads to interesting situations.
For example, on a system that only supports 4kB and 64kB, and is
booted with a 4kB kernel, we end-up advertising 16kB support at
stage-2, which is pretty weird.
That's because we consider that any S2 bigger than our base granule
is fair game, irrespective of what the HW actually supports. While this
is not impossible to support (KVM would happily handle it), it is likely
to be confusing for the guest.
Add new checks that will verify that this granule size is actually
supported before publishing it to the guest.
Fixes: e7ef6ed4583ea ("KVM: arm64: Enforce NV limits on a per-idregs basis")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In a number of cases, we perform a Read-Modify-Write operation on
a system register, meaning that we would apply the RESx masks twice.
Instead, provide a new accessor that performs this RMW operation,
allowing the masks to be applied exactly once per operation.
Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When handling a TLBI VA* instruction that potentially targets a
VNCR page mapping, we fail to mask out the top bits that contain
the ASID and TTL fields, hence potentially failing the VA check
in the TLB code.
An additional wrinkle is that we fail to sign extend the VA,
again leading to failed VA checks.
Fix both in one go by sign-extending the VA from bit 48, making
it comparable to the way we interpret VNCR_EL2.BADDR.
Fixes: 4ffa72ad8f37e ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2")
Link: https://lore.kernel.org/r/20250525175759.780891-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/nv-nv:
: .
: Flick the switch on the NV support by adding the missing piece
: in the form of the VNCR page management. From the cover letter:
:
: "This is probably the most interesting bit of the whole NV adventure.
: So far, everything else has been a walk in the park, but this one is
: where the real fun takes place.
:
: With FEAT_NV2, most of the NV support revolves around tricking a guest
: into accessing memory while it tries to access system registers. The
: hypervisor's job is to handle the context switch of the actual
: registers with the state in memory as needed."
: .
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
KVM: arm64: Document NV caps and vcpu flags
KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
KVM: arm64: nv: Remove dead code from ERET handling
KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
KVM: arm64: nv: Handle VNCR_EL2-triggered faults
KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
KVM: arm64: nv: Move TLBI range decoding to a helper
KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
KVM: arm64: nv: Extract translation helper from the AT code
KVM: arm64: nv: Allocate VNCR page when required
arm64: sysreg: Add layout for VNCR_EL2
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The conversion to kvm_release_faultin_page() missed the requirement
for this to be called within a critical section with mmu_lock held
for write. Move this call up to satisfy this requirement.
Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Calling invalidate_vncr_va() without the mmu_lock held for write
is a bad idea, and lockdep tells you about that.
Fixes: 4ffa72ad8f37e ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When translating a VNCR translation fault, we start by marking the
current SW-managed TLB as invalid, so that we can populate it
in place. This is, however, done without the mmu_lock held.
A consequence of this is that another CPU dealing with TLBI
emulation can observe a translation still flagged as valid, but
with invalid walk results (such as pgshift being 0). Bad things
can result from this, such as a BUG() in pgshift_level_to_ttl().
Fix it by taking the mmu_lock for write to perform this local
invalidation, and use invalidate_vncr() instead of open-coding
the write to the 'valid' flag.
Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250520144116.3667978-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Just like the FEAT_FGT registers, treat the FGT2 variant the same
way. THis is a large update, but a fairly mechanical one.
The config dependencies are extracted from the 2025-03 JSON drop.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.
An additional complexity stems from the status of some bits such
as E2H and RW, which do not had a RESx status, but still take
a fixed value due to implementation choices in KVM.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Another benefit of mapping bits to features is that it becomes trivial
to define which bits should be handled as RES0.
Let's apply this principle to the guest's view of the FGT registers.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
A TLBI by VA for S1 must take effect on our pseudo-TLB for VNCR
and potentially knock the fixmap mapping. Even worse, that TLBI
must be able to work cross-vcpu.
For that, we track on a per-VM basis if any VNCR is mapped, using
an atomic counter. Whenever a TLBI S1E2 occurs and that this counter
is non-zero, we take the long road all the way back to the core code.
There, we iterate over all vcpus and check whether this particular
invalidation has any damaging effect. If it does, we nuke the pseudo
TLB and the corresponding fixmap.
Yes, this is costly.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-14-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
During an invalidation triggered by an MMU notifier, we need to
make sure we can drop the *host* mapping that would have been
translated by the stage-2 mapping being invalidated.
For the moment, the invalidation is pretty brutal, as we nuke
the full IPA range, and therefore any VNCR_EL2 mapping.
At some point, we'll be more light-weight, and the code is able
to deal with something more targetted.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-12-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Now that we can handle faults triggered through VNCR_EL2, we need
to map the corresponding page at EL2. But where, you'll ask?
Since each CPU in the system can run a vcpu, we need a per-CPU
mapping. For that, we carve a NR_CPUS range in the fixmap, giving
us a per-CPU va at which to map the guest's VNCR's page.
The mapping occurs both on vcpu load and on the back of a fault,
both generating a request that will take care of the mapping.
That mapping will also get dropped on vcpu put.
Yes, this is a bit heavy handed, but it is simple. Eventually,
we may want to have a per-VM, per-CPU mapping, which would avoid
all the TLBI overhead.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
As VNCR_EL2.BADDR contains a VA, it is bound to trigger faults.
These faults can have multiple source:
- We haven't mapped anything on the host: we need to compute the
resulting translation, populate a TLB, and eventually map
the corresponding page
- The permissions are out of whack: we need to tell the guest about
this state of affairs
Note that the kernel doesn't support S1POE for itself yet, so
the particular case of a VNCR page mapped with no permissions
or with write-only permissions is not correctly handled yet.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Plug VNCR_EL2 in the vcpu_sysreg enum, define its RES0/RES1 bits,
and make it accessible to userspace when the VM is configured to
support FEAT_NV2.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-9-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
FEAT_NV2 introduces an interesting problem for NV, as VNCR_EL2.BADDR
is a virtual address in the EL2&0 (or EL2, but we thankfully ignore
this) translation regime.
As we need to replicate such mapping in the real EL2, it means that
we need to remember that there is such a translation, and that any
TLBI affecting EL2 can possibly affect this translation.
It also means that any invalidation driven by an MMU notifier must
be able to shoot down any such mapping.
All in all, we need a data structure that represents this mapping,
and that is extremely close to a TLB. Given that we can only use
one of those per vcpu at any given time, we only allocate one.
No effort is made to keep that structure small. If we need to
start caching multiple of them, we may want to revisit that design
point. But for now, it is kept simple so that we can reason about it.
Oh, and add a braindump of how things are supposed to work, because
I will definitely page this out at some point. Yes, pun intended.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-8-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
If running a NV guest on an ARMv8.4-NV capable system, let's
allocate an additional page that will be used by the hypervisor
to fulfill system register accesses.
Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We do not have a computed table for HCRX_EL2, so statically define
the bits we know about. A warning will fire if the architecture
grows bits that are not handled yet.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Now that we have computed RES0 bits, use them to sanitise the
guest view of FGT registers.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Treating HFGRTR_EL2 and HFGWTR_EL2 identically was a mistake.
It makes things hard to reason about, has the potential to
introduce bugs by giving a meaning to bits that are really reserved,
and is in general a bad description of the architecture.
Given that #defines are cheap, let's describe both registers as
intended by the architecture, and repaint all the existing uses.
Yes, this is painful.
The registers themselves are generated from the JSON file in
an automated way.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/nv-idregs:
: Changes to exposure of NV features, courtesy of Marc Zyngier
:
: Apply NV-specific feature restrictions at reset rather than at the point
: of KVM_RUN. This makes the true feature set visible to userspace, a
: necessary step towards save/restore support or NV VMs.
:
: Add an additional vCPU feature flag for selecting the E2H0 flavor of NV,
: such that the VHE-ness of the VM can be applied to the feature set.
KVM: arm64: selftests: Test that TGRAN*_2 fields are writable
KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN*_2
KVM: arm64: Advertise FEAT_ECV when possible
KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writable
KVM: arm64: Allow userspace to limit NV support to nVHE
KVM: arm64: Move NV-specific capping to idreg sanitisation
KVM: arm64: Enforce NV limits on a per-idregs basis
KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely available
KVM: arm64: Consolidate idreg callbacks
KVM: arm64: Advertise NV2 in the boot messages
KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0
KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero
KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspace
arm64: cpufeature: Handle NV_frac as a synonym of NV2
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
An interrupt being delivered to L1 while running L2 must result
in the correct exception being delivered to L1.
This means that if, on entry to L2, we found ourselves with pending
interrupts in the L1 distributor, we need to take immediate action.
This is done by posting a request which will prevent the entry in
L2, and deliver an IRQ exception to L1, forcing the switch.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
As ICH_HCR_EL2 is a VNCR accessor when runnintg NV, add some
sanitising to what gets written. Crucially, mark TDIR as RES0
if the HW doesn't support it (unlikely, but hey...), as well
as anything GICv4 related, since we only expose a GICv3 to the
uest.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-8-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
We can advertise support for FEAT_ECV if supported on the HW as
long as we limit it to the basic trap bits, and not advertise
CNTPOFF_EL2 support, even if the host has it (the short story
being that CNTPOFF_EL2 is not virtualisable).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-13-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
NV is hard. No kidding.
In order to make things simpler, we have established that NV would
support two mutually exclusive configurations:
- VHE-only, and supporting recursive virtualisation
- nVHE-only, and not supporting recursive virtualisation
For that purpose, introduce a new vcpu feature flag that denotes
the second configuration. We use this flag to limit the idregs
further.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-11-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Instead of applying the NV idreg limits at run time, switch to
doing it at the same time as the reset of the VM initialisation.
This will make things much simpler once we introduce vcpu-driven
variants of NV.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
As we are about to change the way the idreg reset values are computed,
move all the NV limits into a function that initialises one register
at a time.
This will be most useful in the upcoming patches. We take this opportunity
to remove the NV_FTR() macro and rely on the generated names instead.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-9-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Enforce HCR_EL2.{NV*,AT} being RES0 when NV2 is disabled, so that
we can actually rely on these bits never being flipped behind our back.
This of course relies on our earlier ID reg sanitising.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Enforce HCR_EL2.E2H being RES0 when VHE is disabled, so that we can
actually rely on that bit never being flipped behind our back.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
For each vcpu that userspace creates, we allocate a number of
s2_mmu structures that will eventually contain our shadow S2
page tables.
Since this is a dynamically allocated array, we reallocate
the array and initialise the newly allocated elements. Once
everything is correctly initialised, we adjust pointer and size
in the kvm structure, and move on.
But should that initialisation fail *and* the reallocation triggered
a copy to another location, we end-up returning early, with the
kvm structure still containing the (now stale) old pointer. Weeee!
Cure it by assigning the pointer early, and use this to perform
the initialisation. If everything succeeds, we adjust the size.
Otherwise, we just leave the size as it was, no harm done, and the
new memory is as good as the ol' one (we hope...).
Fixes: 4f128f8e1aaac ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures")
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/r/20250204145554.774427-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/misc-6.14:
: .
: Misc KVM/arm64 changes for 6.14
:
: - Don't expose AArch32 EL0 capability when NV is enabled
:
: - Update documentation to reflect the full gamut of kvm-arm.mode
: behaviours
:
: - Use the hypervisor VA bit width when dumping stacktraces
:
: - Decouple the hypervisor stack size from PAGE_SIZE, at least
: on the surface...
:
: - Make use of str_enabled_disabled() when advertising GICv4.1 support
:
: - Explicitly handle BRBE traps as UNDEFINED
: .
KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()
arm64: kvm: Introduce nvhe stack size constants
KVM: arm64: Fix nVHE stacktrace VA bits mask
Documentation: Update the behaviour of "kvm-arm.mode"
KVM: arm64: nv: Advertise the lack of AArch32 EL0 support
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/nv-resx-fixes-6.14:
: .
: Fixes for NV sysreg accessors. From the cover letter:
:
: "Joey recently reported that some rather basic tests were failing on
: NV, and managed to track it down to critical register fields (such as
: HCR_EL2.E2H) not having their expect value.
:
: Further investigation has outlined a couple of critical issues:
:
: - Evaluating HCR_EL2.E2H must always be done with a sanitising
: accessor, no ifs, no buts. Given that KVM assumes a fixed value for
: this bit, we cannot leave it to the guest to mess with.
:
: - Resetting the sysreg file must result in the RESx bits taking
: effect. Otherwise, we may end-up making the wrong decision (see
: above), and we definitely expose invalid values to the guest. Note
: that because we compute the RESx masks very late in the VM setup, we
: need to apply these masks at that particular point as well.
: [...]"
: .
KVM: arm64: nv: Apply RESx settings to sysreg reset values
KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors
Signed-off-by: Marc Zyngier <maz@kernel.org>
# Conflicts:
# arch/arm64/kvm/nested.c
|
|
* kvm-arm64/nv-timers:
: .
: Nested Virt support for the EL2 timers. From the initial cover letter:
:
: "Here's another batch of NV-related patches, this time bringing in most
: of the timer support for EL2 as well as nested guests.
:
: The code is pretty convoluted for a bunch of reasons:
:
: - FEAT_NV2 breaks the timer semantics by redirecting HW controls to
: memory, meaning that a guest could setup a timer and never see it
: firing until the next exit
:
: - We go try hard to reflect the timer state in memory, but that's not
: great.
:
: - With FEAT_ECV, we can finally correctly emulate the virtual timer,
: but this emulation is pretty costly
:
: - As a way to make things suck less, we handle timer reads as early as
: possible, and only defer writes to the normal trap handling
:
: - Finally, some implementations are badly broken, and require some
: hand-holding, irrespective of NV support. So we try and reuse the NV
: infrastructure to make them usable. This could be further optimised,
: but I'm running out of patience for this sort of HW.
:
: [...]"
: .
KVM: arm64: nv: Fix doc header layout for timers
KVM: arm64: nv: Document EL2 timer API
KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosity
KVM: arm64: nv: Sanitise CNTHCTL_EL2
KVM: arm64: nv: Propagate CNTHCTL_EL2.EL1NV{P,V}CT bits
KVM: arm64: nv: Add trap routing for CNTHCTL_EL2.EL1{NVPCT,NVVCT,TVT,TVCT}
KVM: arm64: Handle counter access early in non-HYP context
KVM: arm64: nv: Accelerate EL0 counter accesses from hypervisor context
KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV in use
KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state
KVM: arm64: nv: Sync nested timer state with FEAT_NV2
KVM: arm64: nv: Add handling of EL2-specific timer registers
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
While we have sanitisation in place for the guest sysregs, we lack
that sanitisation out of reset. So some of the fields could be
evaluated and not reflect their RESx status, which sounds like
a very bad idea.
Apply the RESx masks to the the sysreg file in two situations:
- when going via a reset of the sysregs
- after having computed the RESx masks
Having this separate reset phase from the actual reset handling is
a bit grotty, but we need to apply this after the ID registers are
final.
Tested-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250112165029.1181056-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Inject some sanity in CNTHCTL_EL2, ensuring that we don't handle
more than we advertise to the guest.
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Although we never supported 32bit anywhere in NV, we fail to
advertise so for EL0, probably owing to the relative lack of
hardware supporting both NV2 and 32bit EL0.
Add some sanitising to ID_AA64PFR0_EL1.EL0, and reaffirm that
"in 64bit-only we trust".
Reported-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Now that we have introduced kvm_vcpu_has_feature(), use it in the
remaining code that checks for features in struct kvm, instead of
using the __vcpu_has_feature() helper.
No functional change intended.
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-18-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When compiling with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, set_sysreg_masks()
fails to compile thanks to:
BUILD_BUG_ON(!__builtin_constant_p(sr));
as the compiler doesn't identify sr as a constant, despite all the
callers passing constants.
Fix the issue by always inlining this function, which allows GCC to
do the right thing.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202411201857.ZNudtGJl-lkp@intel.com/
Fixes: a0162020095e2 ("KVM: arm64: Extend masking facility to arbitrary registers")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20241120111516.304250-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/nv-pmu:
: Support for vEL2 PMU controls
:
: Align the vEL2 PMU support with the current state of non-nested KVM,
: including:
:
: - Trap routing, with the annoying complication of EL2 traps that apply
: in Host EL0
:
: - PMU emulation, using the correct configuration bits depending on
: whether a counter falls in the hypervisor or guest range of PMCs
:
: - Perf event swizzling across nested boundaries, as the event filtering
: needs to be remapped to cope with vEL2
KVM: arm64: nv: Reprogram PMU events affected by nested transition
KVM: arm64: nv: Apply EL2 event filtering when in hyp context
KVM: arm64: nv: Honor MDCR_EL2.HLP
KVM: arm64: nv: Honor MDCR_EL2.HPME
KVM: arm64: Add helpers to determine if PMC counts at a given EL
KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN
KVM: arm64: Rename kvm_pmu_valid_counter_mask()
KVM: arm64: nv: Advertise support for FEAT_HPMN0
KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN
KVM: arm64: nv: Honor MDCR_EL2.{TPM, TPMCR} in Host EL0
KVM: arm64: nv: Reinject traps that take effect in Host EL0
KVM: arm64: nv: Rename BEHAVE_FORWARD_ANY
KVM: arm64: nv: Allow coarse-grained trap combos to use complex traps
KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2
arm64: sysreg: Add new definitions for ID_AA64DFR0_EL1
arm64: sysreg: Migrate MDCR_EL2 definition to table
arm64: sysreg: Describe ID_AA64DFR2_EL1 fields
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Everything is in place now for KVM to actually handle MDCR_EL2.HPMN. Not
only that, the emulation is capable of doing FEAT_HPMN0. Advertise
support for the feature in the VM's ID registers. It is possible to
emulate FEAT_HPMN0 on hardware that doesn't support it since KVM
currently traps all PMU registers. Having said that, let's only
advertise the feature on supporting hardware in case KVM ever provides
'direct' PMU support to VMs w/o involving host perf.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-12-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Add support for sanitising MDCR_EL2 and describe the RES0/RES1 bits
according to the feature set exposed to the VM.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|