2025-05-30KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidationMarc Zyngier
When handling a TLBI VA* instruction that potentially targets a VNCR page mapping, we fail to mask out the top bits that contain the ASID and TTL fields, hence potentially failing the VA check in the TLB code. An additional wrinkle is that we fail to sign extend the VA, again leading to failed VA checks. Fix both in one go by sign-extending the VA from bit 48, making it comparable to the way we interpret VNCR_EL2.BADDR. Fixes: 4ffa72ad8f37e ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2") Link: https://lore.kernel.org/r/20250525175759.780891-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
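The fix boils down to something like this sketch (helper name hypothetical; GENMASK_ULL() and sign_extend64() are the usual <linux/bits.h>/<linux/bitops.h> helpers, and the real masking sits in KVM's TLBI emulation):

  /* TLBI VA* operands carry VA[55:12] in bits [43:0]; bits [63:44]
   * hold ASID and TTL and must not reach the VA comparison. */
  static u64 tlbi_vaddr(u64 op)
  {
      u64 va = (op & GENMASK_ULL(43, 0)) << 12;

      /* sign-extend from bit 48, matching the way VNCR_EL2.BADDR
       * is interpreted */
      return sign_extend64(va, 48);
  }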
2025-05-30arm64: sysreg: Drag linux/kconfig.h to work around vdso build issueMarc Zyngier
Broonie reports that fed55f49fad18 ("arm64: errata: Work around AmpereOne's erratum AC04_CPU_23") breaks one of the vdso selftests (vdso_test_chacha) as it indirectly drags asm/sysreg.h. It is rather unfortunate (and worrying) that userspace gets built with non-UAPI headers. In any case, paper over the issue by dragging linux/kconfig.h in asm/sysreg.h. It is the right thing to do, at least from the kernel perspective. Reported-by: Mark Brown <broonie@kernel.org> Fixes: fed55f49fad18 ("arm64: errata: Work around AmpereOne's erratum AC04_CPU_23") Link: https://lore.kernel.org/r/aDCDGZ-G-nCP3hJI@finisterre.sirena.org.uk Cc: D Scott Phillips <scott@os.amperecomputing.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250523170208.530818-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/misc-6.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-6.16:
: .
: Misc changes and improvements for 6.16:
:
: - Add a new selftest for the SVE host state being corrupted by a guest
:
: - Keep HCR_EL2.xMO set at all times for systems running with the kernel at
:   EL2, ensuring that the window for interrupts is slightly bigger, and
:   avoiding a pretty bad erratum on the AmpereOne HW
:
: - Replace a couple of open-coded on/off strings with str_on_off()
:
: - Get rid of the pKVM memblock sorting, which now appears to be superfluous
:
: - Drop superfluous clearing of ICH_LR_EOI in the LR when nesting
:
: - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a
:   pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily
:   synchronised
:
: - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables
:   in a human-friendly fashion
: .
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables
arm64: errata: Work around AmpereOne's erratum AC04_CPU_23
KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1
KVM: arm64: Drop sort_memblock_regions()
KVM: arm64: selftests: Add test for SVE host corruption
KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode
KVM: arm64: Replace ternary flags with str_on_off() helper

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/nv-nv into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-nv:
: .
: Flick the switch on the NV support by adding the missing piece
: in the form of the VNCR page management. From the cover letter:
:
: "This is probably the most interesting bit of the whole NV adventure.
: So far, everything else has been a walk in the park, but this one is
: where the real fun takes place.
:
: With FEAT_NV2, most of the NV support revolves around tricking a guest
: into accessing memory while it tries to access system registers. The
: hypervisor's job is to handle the context switch of the actual
: registers with the state in memory as needed."
: .
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
KVM: arm64: Document NV caps and vcpu flags
KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
KVM: arm64: nv: Remove dead code from ERET handling
KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
KVM: arm64: nv: Handle VNCR_EL2-triggered faults
KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
KVM: arm64: nv: Move TLBI range decoding to a helper
KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
KVM: arm64: nv: Extract translation helper from the AT code
KVM: arm64: nv: Allocate VNCR page when required
arm64: sysreg: Add layout for VNCR_EL2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/at-fixes-6.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/at-fixes-6.16:
: .
: Set of fixes for Address Translation (AT) instruction emulation,
: which affect the (not yet upstream) NV support.
:
: From the cover letter:
:
: "Here's a small series of fixes for KVM's implementation of address
: translation (aka the AT S1* instructions), addressing a number of
: issues in increasing levels of severity:
:
: - We misreport PAR_EL1.PTW in a number of occasions, including state
:   that is not possible as per the architecture definition
:
: - We don't handle access faults at all, and that doesn't play very
:   well with the rest of the VNCR stuff
:
: - AT S1E{0,1} from EL2 with HCR_EL2.{E2H,TGE}={1,1} will absolutely
:   take the host down, no questions asked"
: .
KVM: arm64: Don't feed uninitialised data to HCR_EL2
KVM: arm64: Teach address translation about access faults
KVM: arm64: Fix PAR_EL1.{PTW,S} reporting on AT S1E*

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/fgt-masks into kvmarm-master/nextMarc Zyngier
* kvm-arm64/fgt-masks: (43 commits)
: .
: Large rework of the way KVM deals with trap bits in conjunction with
: the CPU feature registers. It now draws a direct link between the
: feature set, the system registers that need to UNDEF to match the
: configuration, and the bits that need to behave as RES0 or RES1 in
: the trap registers that are visible to the guest.
:
: Best of all, these definitions are mostly automatically generated
: from the JSON description published by ARM under a permissive
: license.
: .
KVM: arm64: Handle TSB CSYNC traps
KVM: arm64: Add FGT descriptors for FEAT_FGT2
KVM: arm64: Allow sysreg ranges for FGT descriptors
KVM: arm64: Add context-switch for FEAT_FGT2 registers
KVM: arm64: Add trap routing for FEAT_FGT2 registers
KVM: arm64: Add sanitisation for FEAT_FGT2 registers
KVM: arm64: Add FEAT_FGT2 registers to the VNCR page
KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
KVM: arm64: Allow kvm_has_feat() to take variable arguments
KVM: arm64: Use FGT feature maps to drive RES0 bits
KVM: arm64: Validate FGT register descriptions against RES0 masks
KVM: arm64: Switch to table-driven FGU configuration
KVM: arm64: Handle PSB CSYNC traps
KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
KVM: arm64: Remove hand-crafted masks for FGT registers
KVM: arm64: Use computed FGT masks to setup FGT registers
KVM: arm64: Propagate FGT masks to the nVHE hypervisor
KVM: arm64: Unconditionally configure fine-grain traps
KVM: arm64: Use computed masks as sanitisers for FGT registers
...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/mte-frac into kvmarm-master/nextMarc Zyngier
* kvm-arm64/mte-frac:
: .
: Prevent FEAT_MTE_ASYNC from being accidentally exposed to a guest,
: courtesy of Ben Horgan. From the cover letter:
:
: "The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM.
: However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0
: indicates that MTE_ASYNC is supported. On a host with
: ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support a guest with the
: MTE capability enabled will incorrectly see MTE_ASYNC advertised as
: supported. This series fixes that."
: .
KVM: selftests: Confirm exposing MTE_frac does not break migration
KVM: arm64: Make MTE_frac masking conditional on MTE capability
arm64/sysreg: Expose MTE_frac so that it is visible to KVM

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/ubsan-el2:
: .
: Add UBSAN support to the EL2 portion of KVM, reusing most of the
: existing logic provided by CONFIG_UBSAN_TRAP.
:
: Patches courtesy of Mostafa Saleh.
: .
KVM: arm64: Handle UBSAN faults
KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2
ubsan: Remove regs from report_ubsan_failure()
arm64: Introduce esr_is_ubsan_brk()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/pkvm-np-thp-6.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm-np-thp-6.16: (21 commits)
: .
: Large mapping support for non-protected pKVM guests, courtesy of
: Vincent Donnefort. From the cover letter:
:
: "This series adds support for stage-2 huge mappings (PMD_SIZE) to pKVM
: np-guests, that is installing PMD-level mappings in the stage-2,
: whenever the stage-1 is backed by either Hugetlbfs or THPs."
: .
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: Add a range to pkvm_mappings
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: Extend pKVM selftest for np-guests
KVM: arm64: Selftest for pKVM transitions
KVM: arm64: Don't WARN from __pkvm_host_share_guest()
KVM: arm64: Add .hyp.data section
KVM: arm64: Unconditionally cross check hyp state
KVM: arm64: Defer EL2 stage-1 mapping on share
KVM: arm64: Move hyp state to hyp_vmemmap
KVM: arm64: Introduce {get,set}_host_state() helpers
KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE
KVM: arm64: Fix pKVM page-tracking comments
...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-22KVM: arm64: Fix documentation for vgic_its_iter_next()Marc Zyngier
As reported by the build robot, the documentation for vgic_its_iter_next() contains a typo. Fix it. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505221421.KAuWlmSr-lkp@intel.com/ Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: np-guest CMOs with PMD_SIZE fixmapVincent Donnefort
With the introduction of stage-2 huge mappings in the pKVM hypervisor, CMOs on guest pages are needed at PMD_SIZE granularity. The fixmap only supports PAGE_SIZE, and iterating over the huge page is time consuming (mostly due to the TLBI on hyp_fixmap_unmap), which is a problem for EL2 latency. Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap) to improve guest page CMOs when stage-2 huge mappings are installed. On a Pixel6, the iterative solution resulted in a latency of ~700us, while the PMD_SIZE fixmap reduces it to ~100us. Because of the horrendous private range allocation that would be necessary, this is disabled for 64KiB-page systems. Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-11-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
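The intended fast path, sketched with the helper names quoted above (signatures assumed; __clean_dcache_guest_page() is the existing per-range CMO primitive):

  /* one PMD-sized mapping instead of 512 map/CMO/unmap round trips */
  void *va = hyp_fixblock_map(pa);

  __clean_dcache_guest_page(va, PMD_SIZE);
  hyp_fixblock_unmap();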
2025-05-21KVM: arm64: Stage-2 huge mappings for np-guestsVincent Donnefort
Now that np-guest hypercalls with a range are supported, we can let the hypervisor install block mappings whenever the stage-1 allows it, that is when backed by either Hugetlbfs or THPs. The size of those block mappings is limited to PMD_SIZE. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-10-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to pkvm_mappingsQuentin Perret
In preparation for supporting stage-2 huge mappings for np-guests, add a nr_pages member to pkvm_mappings to allow EL1 to track the size of the stage-2 mapping. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-9-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Convert pkvm_mappings to interval treeQuentin Perret
In preparation for supporting stage-2 huge mappings for np-guest, let's convert pgt.pkvm_mappings to an interval tree. No functional change intended. Suggested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-8-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
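A sketch of the shape this gives, assuming the generic interval-tree node from <linux/interval_tree.h> (field layout illustrative):

  #include <linux/interval_tree.h>

  struct pkvm_mapping {
      struct interval_tree_node node;  /* node.start/node.last: gfn range */
      u64 pfn;
  };

  /* a range lookup becomes one O(log n) stabbing query instead of one
   * rb-tree hit per page */
  struct interval_tree_node *node;

  node = interval_tree_iter_first(&pgt->pkvm_mappings, gfn,
                                  gfn + nr_pages - 1);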
2025-05-21KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guests, add a nr_pages argument to the __pkvm_host_test_clear_young_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is, 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-7-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
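The same 1-or-512 constraint recurs in the wrprotect/unshare/share hypercalls below; a sketch of the admissible-size check (helper name illustrative):

  static bool pkvm_valid_nr_pages(u64 nr_pages)
  {
      /* a single page, or exactly one PMD-sized block */
      return nr_pages == 1 || nr_pages == PMD_SIZE / PAGE_SIZE;
  }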
2025-05-21KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guests, add a nr_pages argument to the __pkvm_host_wrprotect_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is, 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-6-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to __pkvm_host_unshare_guest()Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guests, add a nr_pages argument to the __pkvm_host_unshare_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is, 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-5-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to __pkvm_host_share_guest()Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guests, add a nr_pages argument to the __pkvm_host_share_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is, 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-4-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Introduce for_each_hyp_pageVincent Donnefort
Add a helper to iterate over the hypervisor vmemmap. This will be particularly handy with the introduction of huge mapping support for the np-guest stage-2. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-3-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
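A sketch of what such an iterator can look like (macro body is an assumption; hyp_phys_to_page() is the existing hyp vmemmap accessor):

  #define for_each_hyp_page(__p, __phys, __size)                      \
      for (struct hyp_page *__p = hyp_phys_to_page(__phys),           \
                           *__end = __p + ((__size) >> PAGE_SHIFT);   \
           __p < __end; __p++)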
2025-05-21KVM: arm64: Handle huge mappings for np-guest CMOsVincent Donnefort
clean_dcache_guest_page() and invalidate_icache_guest_page() accept a size as an argument. But they also rely on fixmap, which can only map a single PAGE_SIZE page. With the upcoming stage-2 huge mappings for pKVM np-guests, those callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis until the whole range is done. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-2-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
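A sketch of that loop, reusing the single-page fixmap primitives (wrapper name illustrative):

  static void clean_dcache_guest_range(phys_addr_t pa, size_t size)
  {
      while (size) {
          void *va = hyp_fixmap_map(pa);  /* map one page at EL2 */

          __clean_dcache_guest_page(va, PAGE_SIZE);
          hyp_fixmap_unmap();

          pa += PAGE_SIZE;
          size -= PAGE_SIZE;
      }
  }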
2025-05-21Merge branch kvm-arm64/pkvm-selftest-6.16 into kvm-arm64/pkvm-np-thp-6.16Marc Zyngier
* kvm-arm64/pkvm-selftest-6.16:
: .
: pKVM selftests covering the memory ownership transitions by
: Quentin Perret. From the initial cover letter:
:
: "We have recently found a bug [1] in the pKVM memory ownership
: transitions by code inspection, but it could have been caught with a
: test.
:
: Introduce a boot-time selftest exercising all the known pKVM memory
: transitions and importantly checking the rejection of illegal
: transitions.
:
: The new test is hidden behind a new Kconfig option separate from
: CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
: transition checks ([1] doesn't reproduce with EL2 debug enabled).
:
: [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/"
: .
KVM: arm64: Extend pKVM selftest for np-guests
KVM: arm64: Selftest for pKVM transitions
KVM: arm64: Don't WARN from __pkvm_host_share_guest()
KVM: arm64: Add .hyp.data section

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21Merge branch kvm-arm64/pkvm-6.16 into kvm-arm64/pkvm-np-thp-6.16Marc Zyngier
* kvm-arm64/pkvm-6.16:
: .
: pKVM memory management cleanups, courtesy of Quentin Perret.
: From the cover letter:
:
: "This series moves the hypervisor's ownership state to the hyp_vmemmap,
: as discussed in [1]. The two main benefits are:
:
: 1. much cheaper hyp state lookups, since we can avoid the hyp stage-1
:    page-table walk;
:
: 2. de-correlates the hyp state from the presence of a mapping in the
:    linear map range of the hypervisor; which enables a bunch of
:    clean-ups in the existing code and will simplify the introduction of
:    other features in the future (hyp tracing, ...)"
: .
KVM: arm64: Unconditionally cross check hyp state
KVM: arm64: Defer EL2 stage-1 mapping on share
KVM: arm64: Move hyp state to hyp_vmemmap
KVM: arm64: Introduce {get,set}_host_state() helpers
KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE
KVM: arm64: Fix pKVM page-tracking comments
KVM: arm64: Track SVE state in the hypervisor vcpu structure

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical sectionMarc Zyngier
The conversion to kvm_release_faultin_page() missed the requirement for this to be called within a critical section with mmu_lock held for write. Move this call up to satisfy this requirement. Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults") Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock heldMarc Zyngier
Calling invalidate_vncr_va() without the mmu_lock held for write is a bad idea, and lockdep tells you about that. Fixes: 4ffa72ad8f37e ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2") Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translatingMarc Zyngier
When translating a VNCR translation fault, we start by marking the current SW-managed TLB as invalid, so that we can populate it in place. This is, however, done without the mmu_lock held. A consequence of this is that another CPU dealing with TLBI emulation can observe a translation still flagged as valid, but with invalid walk results (such as pgshift being 0). Bad things can result from this, such as a BUG() in pgshift_level_to_ttl(). Fix it by taking the mmu_lock for write to perform this local invalidation, and use invalidate_vncr() instead of open-coding the write to the 'valid' flag. Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults") Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250520144116.3667978-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: vgic-its: Add debugfs interface to expose ITS tablesJing Zhang
This commit introduces a debugfs interface to display the contents of the VGIC Interrupt Translation Service (ITS) tables. The ITS tables map Device/Event IDs to Interrupt IDs and target processors. Exposing this information through debugfs allows for easier inspection and debugging of the interrupt routing configuration. The debugfs interface presents the ITS table data in a tabular format:

Device ID: 0x0, Event ID Range: [0 - 31]
EVENT_ID    INTID   HWINTID   TARGET   COL_ID   HW
-----------------------------------------------
       0     8192         0        0        0    0
       1     8193         0        0        0    0
       2     8194         0        2        2    0

Device ID: 0x18, Event ID Range: [0 - 3]
EVENT_ID    INTID   HWINTID   TARGET   COL_ID   HW
-----------------------------------------------
       0     8225         0        0        0    0
       1     8226         0        1        1    0
       2     8227         0        3        3    0

Device ID: 0x10, Event ID Range: [0 - 7]
EVENT_ID    INTID   HWINTID   TARGET   COL_ID   HW
-----------------------------------------------
       0     8229         0        3        3    1
       1     8230         0        0        0    1
       2     8231         0        1        1    1
       3     8232         0        2        2    1
       4     8233         0        3        3    1

The output is generated using the seq_file interface, allowing for efficient handling of potentially large ITS tables. This interface is read-only and does not allow modification of the ITS tables. It is intended for debugging and informational purposes only. Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20250220224247.2017205-1-jingzhangos@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
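A skeleton of the seq_file plumbing this relies on (the entry fields and iterator callbacks are illustrative, not the actual vgic-its-debug code):

  static int vgic_its_debug_show(struct seq_file *s, void *v)
  {
      struct its_ite *ite = v;  /* one translation-table entry */

      seq_printf(s, "%8d %8d %8d %8d %8d %2d\n",
                 ite->event_id, ite->intid, ite->hwintid,
                 ite->target, ite->col_id, ite->hw);
      return 0;
  }

  static const struct seq_operations vgic_its_debug_sops = {
      .start = vgic_its_debug_start,  /* locate first entry for *pos */
      .next  = vgic_its_debug_next,   /* advance the table walk */
      .stop  = vgic_its_debug_stop,
      .show  = vgic_its_debug_show,
  };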
2025-05-19arm64: errata: Work around AmpereOne's erratum AC04_CPU_23D Scott Phillips
On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous translations for data addresses initiated by load/store instructions. Only instruction initiated translations are vulnerable, not translations from prefetches for example. A DSB before the store to HCR_EL2 is sufficient to prevent older instructions from hitting the window for corruption, and an ISB after is sufficient to prevent younger instructions from hitting the window for corruption. Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com Signed-off-by: Marc Zyngier <maz@kernel.org>
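The barrier pattern, sketched (in practice the workaround sits behind an erratum capability check; the wrapper name is hypothetical):

  static inline void ampereone_write_hcr_el2(u64 val)
  {
      dsb(nsh);                   /* older accesses clear the window */
      write_sysreg(val, hcr_el2);
      isb();                      /* younger insns cannot enter it */
  }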
2025-05-19KVM: arm64: Handle TSB CSYNC trapsMarc Zyngier
The architecture introduces a trap for TSB CSYNC that fits in the same EC as LS64 and PSB CSYNC. Let's deal with it in a similar way. It's not that we expect this to be useful any time soon anyway. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Add FGT descriptors for FEAT_FGT2Marc Zyngier
Bulk addition of all the FGT2 traps reported with EC == 0x18, as described in the 2025-03 JSON drop. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Allow sysreg ranges for FGT descriptorsMarc Zyngier
Just like we allow sysreg ranges for Coarse Grained Trap descriptors, allow them for Fine Grain Traps as well. This comes with a warning that not all ranges are suitable for this particular definition of ranges. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Add context-switch for FEAT_FGT2 registersMarc Zyngier
Just like the rest of the FGT registers, perform a switch of the FGT2 equivalent. This avoids the host configuration leaking into the guest... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Add trap routing for FEAT_FGT2 registersMarc Zyngier
Similarly to the FEAT_FGT registers, pick the correct FEAT_FGT2 register when a sysreg trap indicates they could be responsible for the exception. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Add sanitisation for FEAT_FGT2 registersMarc Zyngier
Just like the FEAT_FGT registers, treat the FGT2 variant the same way. This is a large update, but a fairly mechanical one. The config dependencies are extracted from the 2025-03 JSON drop. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Add FEAT_FGT2 registers to the VNCR pageMarc Zyngier
The FEAT_FGT2 registers are part of the VNCR page. Describe the corresponding offsets and add them to the vcpu sysreg enumeration. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bitsMarc Zyngier
Similarly to other registers, describe which HCR_EL2 bit depends on which feature, and use this to compute the RES0 status of these bits. An additional complexity stems from the status of some bits such as E2H and RW, which do not have a RESx status, but still take a fixed value due to implementation choices in KVM. Signed-off-by: Marc Zyngier <maz@kernel.org>
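An illustrative shape for such a feature-map entry (struct and helper names assumed; HCR_EL2.TLOR gated on FEAT_LOR is just an example pairing):

  struct reg_feat_map {
      u64  bits;                            /* register bits covered */
      bool (*feat_present)(struct kvm *);   /* gate: RES0 otherwise */
  };

  static const struct reg_feat_map hcr_feat_map[] = {
      /* HCR_EL2.TLOR only exists with FEAT_LOR */
      { HCR_TLOR, kvm_has_lor },
  };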
2025-05-19KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bitsMarc Zyngier
Similarly to other registers, describe which HCRX_EL2 bit depends on which feature, and use this to compute the RES0 status of these bits. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Allow kvm_has_feat() to take variable argumentsMarc Zyngier
In order to be able to write more compact (and easier to read) code, let kvm_has_feat() and co take variable arguments. This enables constructs such as:

  #define FEAT_SME  ID_AA64PFR1_EL1, SME, IMP

  if (kvm_has_feat(kvm, FEAT_SME))
      [...]

which is admittedly more readable. Signed-off-by: Marc Zyngier <maz@kernel.org>
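What the variadic forwarding looks like in the preprocessor (the expansion helper name is an assumption):

  /* FEAT_SME expands back into the (reg, field, min-value) triplet
   * that the underlying three-argument helper expects */
  #define kvm_has_feat(kvm, ...)  __kvm_has_feat((kvm), __VA_ARGS__)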
2025-05-19KVM: arm64: Use FGT feature maps to drive RES0 bitsMarc Zyngier
Another benefit of mapping bits to features is that it becomes trivial to define which bits should be handled as RES0. Let's apply this principle to the guest's view of the FGT registers. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Document NV caps and vcpu flagsMarc Zyngier
Describe the two new vcpu flags that control NV, together with the capabilities that advertise them. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250514103501.2225951-18-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*Marc Zyngier
Since we're (almost) feature complete, let's allow userspace to request KVM_ARM_VCPU_EL2* by bumping KVM_VCPU_MAX_FEATURES up. We also now advertise the features to userspace with new capabilities. It's going to be great... Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250514103501.2225951-17-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Remove dead code from ERET handlingMarc Zyngier
Clearly, this code cannot trigger, since we filter this from the caller. Drop it. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-16-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatchMarc Zyngier
Now that we have to handle TLBI S1E2 in the core code, plumb the sysinsn dispatch code into it, so that these instructions don't just UNDEF anymore. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-15-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2Marc Zyngier
A TLBI by VA for S1 must take effect on our pseudo-TLB for VNCR and potentially knock the fixmap mapping. Even worse, that TLBI must be able to work cross-vcpu. For that, we track on a per-VM basis if any VNCR is mapped, using an atomic counter. Whenever a TLBI S1E2 occurs while this counter is non-zero, we take the long road all the way back to the core code. There, we iterate over all vcpus and check whether this particular invalidation has any damaging effect. If it does, we nuke the pseudo TLB and the corresponding fixmap. Yes, this is costly. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-14-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
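A sketch of the fast-path gate (field name assumed):

  /* only take the cross-vcpu walk if at least one VNCR page is mapped */
  if (atomic_read(&kvm->arch.vncr_map_count)) {
      unsigned long i;
      struct kvm_vcpu *vcpu;

      kvm_for_each_vcpu(i, vcpu, kvm) {
          /* does the invalidated VA range overlap this vcpu's
           * pseudo-TLB? If so, invalidate the entry and knock
           * the fixmap mapping. */
      }
  }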
2025-05-19KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap addressMarc Zyngier
Since we now have a way to map the guest's VNCR_EL2 on the host, we can point the host's VNCR_EL2 to it and go full circle! Note that we unconditionally assign the fixmap to VNCR_EL2, irrespective of the guest's version being mapped or not. We want to take a fault on first access, so the fixmap contains something guaranteed to be either invalid or a guest mapping. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-13-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiersMarc Zyngier
During an invalidation triggered by an MMU notifier, we need to make sure we can drop the *host* mapping that would have been translated by the stage-2 mapping being invalidated. For the moment, the invalidation is pretty brutal, as we nuke the full IPA range, and therefore any VNCR_EL2 mapping. At some point, we'll be more light-weight, and the code is able to deal with something more targeted. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-12-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2Marc Zyngier
Now that we can handle faults triggered through VNCR_EL2, we need to map the corresponding page at EL2. But where, you'll ask? Since each CPU in the system can run a vcpu, we need a per-CPU mapping. For that, we carve a NR_CPUS range in the fixmap, giving us a per-CPU va at which to map the guest's VNCR's page. The mapping occurs both on vcpu load and on the back of a fault, both generating a request that will take care of the mapping. That mapping will also get dropped on vcpu put. Yes, this is a bit heavy handed, but it is simple. Eventually, we may want to have a per-VM, per-CPU mapping, which would avoid all the TLBI overhead. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-11-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
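A sketch of the per-CPU slot arithmetic (the fixmap index name is an assumption; the idea is a NR_CPUS-sized carve-out, one slot per CPU):

  /* each CPU owns one fixmap slot, so mapping a guest's VNCR page
   * needs no cross-CPU coordination */
  static void *vncr_fixmap_addr(int cpu)
  {
      return (void *)__fix_to_virt(FIX_VNCR_FIRST + cpu);
  }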
2025-05-19KVM: arm64: nv: Handle VNCR_EL2-triggered faultsMarc Zyngier
As VNCR_EL2.BADDR contains a VA, it is bound to trigger faults. These faults can have multiple sources:

- We haven't mapped anything on the host: we need to compute the
  resulting translation, populate a TLB, and eventually map the
  corresponding page

- The permissions are out of whack: we need to tell the guest about
  this state of affairs

Note that the kernel doesn't support S1POE for itself yet, so the particular case of a VNCR page mapped with no permissions or with write-only permissions is not correctly handled yet. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-10-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2Marc Zyngier
Plug VNCR_EL2 in the vcpu_sysreg enum, define its RES0/RES1 bits, and make it accessible to userspace when the VM is configured to support FEAT_NV2. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2Marc Zyngier
FEAT_NV2 introduces an interesting problem for NV, as VNCR_EL2.BADDR is a virtual address in the EL2&0 (or EL2, but we thankfully ignore this) translation regime. As we need to replicate such mapping in the real EL2, it means that we need to remember that there is such a translation, and that any TLBI affecting EL2 can possibly affect this translation. It also means that any invalidation driven by an MMU notifier must be able to shoot down any such mapping. All in all, we need a data structure that represents this mapping, and that is extremely close to a TLB. Given that we can only use one of those per vcpu at any given time, we only allocate one. No effort is made to keep that structure small. If we need to start caching multiple of them, we may want to revisit that design point. But for now, it is kept simple so that we can reason about it. Oh, and add a braindump of how things are supposed to work, because I will definitely page this out at some point. Yes, pun intended. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
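An illustrative shape for that one-entry pseudo-TLB (field names are assumptions; gva_t/hpa_t are the usual KVM typedefs):

  struct vncr_tlb {
      gva_t gva;    /* VA from VNCR_EL2.BADDR */
      hpa_t hpa;    /* backing host page */
      u16   asid;   /* S1 ASID tagging snapshot from the walk */
      bool  valid;  /* knocked out by TLBIs and MMU notifiers */
  };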
2025-05-19KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nestingMarc Zyngier
We currently check for HCR_EL2.NV being set to decide whether we need to repaint PSTATE.M to say EL2 instead of EL1 on exit. However, this isn't correct when L2 is itself a hypervisor and L1 has set its own HCR_EL2.NV. That's because we "flatten" the state and inherit parts of the guest's own setup. In that case, we shouldn't adjust PSTATE.M, as this is really EL1 for both us and the guest. Instead of trying to work out how we ended up with HCR_EL2.NV being set by introspecting both the host and guest states, use a per-CPU flag to remember the context (HYP or not), and use that information to decide whether PSTATE needs tweaking. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>