linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2024-12-18	KVM: x86: Don't update PV features caches when enabling enforcement capability	Sean Christopherson
	Revert the chunk of commit 01b4f510b9f4 ("kvm: x86: ensure pv_cpuid.features is initialized when enabling cap") that forced a PV features cache refresh during KVM_CAP_ENFORCE_PV_FEATURE_CPUID, as whatever ioctl() ordering issue it alleged to have fixed never existed upstream, and likely never existed in any kernel. At the time of the commit, there was a tangentially related ioctl() ordering issue, as toggling KVM_X86_DISABLE_EXITS_HLT after KVM_SET_CPUID2 would have resulted in KVM potentially leaving KVM_FEATURE_PV_UNHALT set. But (a) that bug affected the entire guest CPUID, not just the cache, (b) commit 01b4f510b9f4 didn't address that bug, it only refreshed the cache (with the bad CPUID), and (c) setting KVM_X86_DISABLE_EXITS_HLT after vCPU creation is completely broken as KVM configures HLT-exiting only during vCPU creation, which is why KVM_CAP_X86_DISABLE_EXITS is now disallowed if vCPUs have been created. Another tangentially related bug was KVM's failure to clear the cache when handling KVM_SET_CPUID2, but again commit 01b4f510b9f4 did nothing to fix that bug. The most plausible explanation for the what commit 01b4f510b9f4 was trying to fix is a bug that existed in Google's internal kernel that was the source of commit 01b4f510b9f4. At the time, Google's internal kernel had not yet picked up commit 0d3b2ba16ba68 ("KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent"), i.e. KVM would not initialize the PV features cache if KVM_SET_CPUID2 was called without a CPUID.0x1 entry. Of course, no sane real world VMM would omit CPUID.0x1, including the KVM selftest added by commit ac4a4d6de22e ("selftests: kvm: test enforcement of paravirtual cpuid features"). And the test didn't actually try to verify multiple orderings, nor did the selftest enter the guest without doing KVM_SET_CPUID2, so who knows what motivated the change. Regardless of why commit 01b4f510b9f4 ("kvm: x86: ensure pv_cpuid.features is initialized when enabling cap") was added, refreshing the cache during KVM_CAP_ENFORCE_PV_FEATURE_CPUID isn't necessary. Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed	Sean Christopherson
	Reject KVM_CAP_X86_DISABLE_EXITS if userspace attempts to disable MWAIT or HLT exits and KVM previously reported (via KVM_CHECK_EXTENSION) that disabling the exit(s) is not allowed. E.g. because MWAIT isn't supported or the CPU doesn't have an always-running APIC timer, or because KVM is configured to mitigate cross-thread vulnerabilities. Cc: Kechen Lu <kechenl@nvidia.com> Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts") Fixes: 6f0f2d5ef895 ("KVM: x86: Mitigate the cross-thread return address predictions bug") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation	Sean Christopherson
	Reject KVM_CAP_X86_DISABLE_EXITS if vCPUs have been created, as disabling PAUSE/MWAIT/HLT exits after vCPUs have been created is broken and useless, e.g. except for PAUSE on SVM, the relevant intercepts aren't updated after vCPU creation. vCPUs may also end up with an inconsistent configuration if exits are disabled between creation of multiple vCPUs. Cc: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/all/9227068821b275ac547eb2ede09ec65d2281fe07.1680179693.git.houwenlong.hwl@antgroup.com Link: https://lore.kernel.org/all/20230121020738.2973-2-kechenl@nvidia.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU creation	Sean Christopherson
	Drop the manual initialization of maxphyaddr and reserved_gpa_bits during vCPU creation now that kvm_arch_vcpu_create() unconditionally invokes kvm_vcpu_after_set_cpuid(), which handles all such CPUID caching. None of the helpers between the existing code in kvm_arch_vcpu_create() and the call to kvm_vcpu_after_set_cpuid() consume maxphyaddr or reserved_gpa_bits (though auditing vmx_vcpu_create() and svm_vcpu_create() isn't exactly easy). Link: https://lore.kernel.org/r/20241128013424.4096668-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Move __kvm_is_valid_cr4() definition to x86.h	Sean Christopherson
	Let vendor code inline __kvm_is_valid_cr4() now x86.c's cr4_reserved_bits no longer exists, as keeping cr4_reserved_bits local to x86.c was the only reason for "hiding" the definition of __kvm_is_valid_cr4(). No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4 on VMX	Sean Christopherson
	Drop x86.c's local pre-computed cr4_reserved bits and instead fold KVM's reserved bits into the guest's reserved bits. This fixes a bug where VMX's set_cr4_guest_host_mask() fails to account for KVM-reserved bits when deciding which bits can be passed through to the guest. In most cases, letting the guest directly write reserved CR4 bits is ok, i.e. attempting to set the bit(s) will still #GP, but not if a feature is available in hardware but explicitly disabled by the host, e.g. if FSGSBASE support is disabled via "nofsgsbase". Note, the extra overhead of computing host reserved bits every time userspace sets guest CPUID is negligible. The feature bits that are queried are packed nicely into a handful of words, and so checking and setting each reserved bit costs in the neighborhood of ~5 cycles, i.e. the total cost will be in the noise even if the number of checked CR4 bits doubles over the next few years. In other words, x86 will run out of CR4 bits long before the overhead becomes problematic. Note #2, __cr4_reserved_bits() starts from CR4_RESERVED_BITS, which is why the existing __kvm_cpu_cap_has() processing doesn't explicitly OR in CR4_RESERVED_BITS (and why the new code doesn't do so either). Fixes: 2ed41aa631fc ("KVM: VMX: Intercept guest reserved CR4 bits to inject #GP fault") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Do all post-set CPUID processing during vCPU creation	Sean Christopherson
	During vCPU creation, process KVM's default, empty CPUID as if userspace set an empty CPUID to ensure consistent and correct behavior with respect to guest CPUID. E.g. if userspace never sets guest CPUID, KVM will never configure cr4_guest_rsvd_bits, and thus create divergent, incorrect, guest- visible behavior due to letting the guest set any KVM-supported CR4 bits despite the features not being allowed per guest CPUID. Note! This changes KVM's ABI, as lack of full CPUID processing allowed userspace to stuff garbage vCPU state, e.g. userspace could set CR4 to a guest-unsupported value via KVM_SET_SREGS. But it's extremely unlikely that this is a breaking change, as KVM already has many flows that require userspace to set guest CPUID before loading vCPU state. E.g. multiple MSR flows consult guest CPUID on host writes, and KVM_SET_SREGS itself already relies on guest CPUID being up-to-date, as KVM's validity check on CR3 consumes CPUID.0x7.1 (for LAM) and CPUID.0x80000008 (for MAXPHYADDR). Furthermore, the plan is to commit to enforcing guest CPUID for userspace writes to MSRs, at which point bypassing sregs CPUID checks is even more nonsensical. Link: https://lore.kernel.org/r/20241128013424.4096668-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-13	KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init	Sean Christopherson
	Snapshot the output of CPUID.0xD.[1..n] during kvm.ko initiliaization to avoid the overead of CPUID during runtime. The offset, size, and metadata for CPUID.0xD.[1..n] sub-leaves does not depend on XCR0 or XSS values, i.e. is constant for a given CPU, and thus can be cached during module load. On Intel's Emerald Rapids, CPUID is wildly expensive, to the point where recomputing XSAVE offsets and sizes results in a 4x increase in latency of nested VM-Enter and VM-Exit (nested transitions can trigger xstate_required_size() multiple times per transition), relative to using cached values. The issue is easily visible by running `perf top` while triggering nested transitions: kvm_update_cpuid_runtime() shows up at a whopping 50%. As measured via RDTSC from L2 (using KVM-Unit-Test's CPUID VM-Exit test and a slightly modified L1 KVM to handle CPUID in the fastpath), a nested roundtrip to emulate CPUID on Skylake (SKX), Icelake (ICX), and Emerald Rapids (EMR) takes: SKX 11650 ICX 22350 EMR 28850 Using cached values, the latency drops to: SKX 6850 ICX 9000 EMR 7900 The underlying issue is that CPUID itself is slow on ICX, and comically slow on EMR. The problem is exacerbated on CPUs which support XSAVES and/or XSAVEC, as KVM invokes xstate_required_size() twice on each runtime CPUID update, and because there are more supported XSAVE features (CPUID for supported XSAVE feature sub-leafs is significantly slower). SKX: CPUID.0xD.2 = 348 cycles CPUID.0xD.3 = 400 cycles CPUID.0xD.4 = 276 cycles CPUID.0xD.5 = 236 cycles <other sub-leaves are similar> EMR: CPUID.0xD.2 = 1138 cycles CPUID.0xD.3 = 1362 cycles CPUID.0xD.4 = 1068 cycles CPUID.0xD.5 = 910 cycles CPUID.0xD.6 = 914 cycles CPUID.0xD.7 = 1350 cycles CPUID.0xD.8 = 734 cycles CPUID.0xD.9 = 766 cycles CPUID.0xD.10 = 732 cycles CPUID.0xD.11 = 718 cycles CPUID.0xD.12 = 734 cycles CPUID.0xD.13 = 1700 cycles CPUID.0xD.14 = 1126 cycles CPUID.0xD.15 = 898 cycles CPUID.0xD.16 = 716 cycles CPUID.0xD.17 = 748 cycles CPUID.0xD.18 = 776 cycles Note, updating runtime CPUID information multiple times per nested transition is itself a flaw, especially since CPUID is a mandotory intercept on both Intel and AMD. E.g. KVM doesn't need to ensure emulated CPUID state is up-to-date while running L2. That flaw will be fixed in a future patch, as deferring runtime CPUID updates is more subtle than it appears at first glance, the benefits aren't super critical to have once the XSAVE issue is resolved, and caching CPUID output is desirable even if KVM's updates are deferred. Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20241211013302.1347853-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-13	KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR	Paolo Bonzini
	For userspace that wants to disable KVM_X86_QUIRK_STUFF_FEATURE_MSRS, it is useful to know what bits can be set to 1 in MSR_PLATFORM_INFO (apart from the TSC ratio). The right way to do that is via /dev/kvm's feature MSR mechanism. In fact, MSR_PLATFORM_INFO is already a feature MSR for the purpose of blocking updates after the vCPU is run, but KVM_GET_MSRS did not return a valid value for it. Just like in a VM that leaves KVM_X86_QUIRK_STUFF_FEATURE_MSRS enabled, the TSC ratio field is left to 0. Only bit 31 is set. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-13	Merge tag 'kvm-x86-misc-6.13' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 misc changes for 6.13 - Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE. - Quirk KVM's misguided behavior of initialized certain feature MSRs to their maximum supported feature set, which can result in KVM creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to a non-zero value results in the vCPU having invalid state if userspace hides PDCM from the guest, which can lead to save/restore failures. - Fix KVM's handling of non-canonical checks for vCPUs that support LA57 to better follow the "architecture", in quotes because the actual behavior is poorly documented. E.g. most MSR writes and descriptor table loads ignore CR4.LA57 and operate purely on whether the CPU supports LA57. - Bypass the register cache when querying CPL from kvm_sched_out(), as filling the cache from IRQ context is generally unsafe, and harden the cache accessors to try to prevent similar issues from occuring in the future. - Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM over-advertises SPEC_CTRL when trying to support cross-vendor VMs. - Minor cleanups
2024-11-04	KVM: x86: Unpack msr_data structure prior to calling kvm_apic_set_base()	Sean Christopherson
	Pass in the new value and "host initiated" as separate parameters to kvm_apic_set_base(), as forcing the KVM_SET_SREGS path to declare and fill an msr_data structure is awkward and kludgy, e.g. __set_sregs_common() doesn't even bother to set the proper MSR index. No functional change intended. Suggested-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20241101183555.1794700-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04	KVM: x86: Rename APIC base setters to better capture their relationship	Sean Christopherson
	Rename kvm_set_apic_base() and kvm_lapic_set_base() to kvm_apic_set_base() and __kvm_apic_set_base() respectively to capture that the underscores version is a "special" variant (it exists purely to avoid recalculating the optimized map multiple times when stuffing the RESET value). Opportunistically add a comment explaining why kvm_lapic_reset() uses the inner helper. Note, KVM deliberately invokes kvm_arch_vcpu_create() while kvm->lock is NOT held so that vCPU setup isn't serialized if userspace is creating multiple/all vCPUs in parallel. I.e. triggering an extra recalculation is not limited to theoretical/rare edge cases, and so is worth avoiding. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-7-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04	KVM: x86: Move kvm_set_apic_base() implementation to lapic.c (from x86.c)	Sean Christopherson
	Move kvm_set_apic_base() to lapic.c so that the bulk of KVM's local APIC code resides in lapic.c, regardless of whether or not KVM is emulating the local APIC in-kernel. This will also allow making various helpers visible only to lapic.c. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-6-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04	KVM: x86: Inline kvm_get_apic_mode() in lapic.h	Sean Christopherson
	Inline kvm_get_apic_mode() in lapic.h to avoid a CALL+RET as well as an export. The underlying kvm_apic_mode() helper is public information, i.e. there is no state/information that needs to be hidden from vendor modules. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-5-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04	KVM: x86: Get vcpu->arch.apic_base directly and drop kvm_get_apic_base()	Sean Christopherson
	Access KVM's emulated APIC base MSR value directly instead of bouncing through a helper, as there is no reason to add a layer of indirection, and there are other MSRs with a "set" but no "get", e.g. EFER. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-4-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04	KVM: x86/mmu: Recover TDP MMU huge page mappings in-place instead of zapping	David Matlack
	Recover TDP MMU huge page mappings in-place instead of zapping them when dirty logging is disabled, and rename functions that recover huge page mappings when dirty logging is disabled to move away from the "zap collapsible spte" terminology. Before KVM flushes TLBs, guest accesses may be translated through either the (stale) small SPTE or the (new) huge SPTE. This is already possible when KVM is doing eager page splitting (where TLB flushes are also batched), and when vCPUs are faulting in huge mappings (where TLBs are flushed after the new huge SPTE is installed). Recovering huge pages reduces the number of page faults when dirty logging is disabled: $ perf stat -e kvm:kvm_page_fault -- ./dirty_log_perf_test -s anonymous_hugetlb_2mb -v 64 -e -b 4g Before: 393,599 kvm:kvm_page_fault After: 262,575 kvm:kvm_page_fault vCPU throughput and the latency of disabling dirty-logging are about equal compared to zapping, but avoiding faults can be beneficial to remove vCPU jitter in extreme scenarios. Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240823235648.3236880-5-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Remove ordering check b/w MSR_PLATFORM_INFO and MISC_FEATURES_ENABLES	Sean Christopherson
	Drop KVM's odd restriction that disallows clearing CPUID_FAULT in MSR_PLATFORM_INFO if CPL>0 CPUID faulting is enabled in MSR_MISC_FEATURES_ENABLES. KVM generally doesn't require specific ordering when userspace sets MSRs, and the completely arbitrary order of MSRs in emulated_msrs_all means that a userspace that uses KVM's list verbatim could run afoul of the check. Dropping the restriction obviously means that userspace could stuff a nonsensical vCPU model, but that's the case all over KVM. KVM typically restricts userspace MSR writes only when it makes things easier for KVM and/or userspace. Link: https://lore.kernel.org/r/20240802185511.305849-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Reject userspace attempts to access ARCH_CAPABILITIES w/o support	Sean Christopherson
	Reject userspace accesses to ARCH_CAPABILITIES if the MSR isn't supposed to exist, according to guest CPUID. However, "reject" accesses with KVM_MSR_RET_UNSUPPORTED, so that reads get '0' and writes of '0' are ignored if KVM advertised support ARCH_CAPABILITIES. KVM's ABI is that userspace must set guest CPUID prior to setting MSRs, and that setting MSRs that aren't supposed exist is disallowed (modulo the '0' exemption). Link: https://lore.kernel.org/r/20240802185511.305849-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Reject userspace attempts to access PERF_CAPABILITIES w/o PDCM	Sean Christopherson
	Reject userspace accesses to PERF_CAPABILITIES if PDCM isn't set in guest CPUID, i.e. if the vCPU doesn't actually have PERF_CAPABILITIES. But! Do so via KVM_MSR_RET_UNSUPPORTED, so that reads get '0' and writes of '0' are ignored if KVM advertised support PERF_CAPABILITIES. KVM's ABI is that userspace must set guest CPUID prior to setting MSRs, and that setting MSRs that aren't supposed exist is disallowed (modulo the '0' exemption). Link: https://lore.kernel.org/r/20240802185511.305849-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Quirk initialization of feature MSRs to KVM's max configuration	Sean Christopherson
	Add a quirk to control KVM's misguided initialization of select feature MSRs to KVM's max configuration, as enabling features by default violates KVM's approach of letting userspace own the vCPU model, and is actively problematic for MSRs that are conditionally supported, as the vCPU will end up with an MSR value that userspace can't restore. E.g. if the vCPU is configured with PDCM=0, userspace will save and attempt to restore a non-zero PERF_CAPABILITIES, thanks to KVM's meddling. Link: https://lore.kernel.org/r/20240802185511.305849-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Disallow changing MSR_PLATFORM_INFO after vCPU has run	Sean Christopherson
	Tag MSR_PLATFORM_INFO as a feature MSR (because it is), i.e. disallow it from being modified after the vCPU has run. To make KVM's selftest compliant, simply delete the userspace MSR write that restores KVM's original value at the end of the test. Verifying that userspace can write back what it originally read is uninteresting in this particular case, because KVM doesn't enforce _any_ bits in the MSR, i.e. userspace should be able to write any arbitrary value. Link: https://lore.kernel.org/r/20240802185511.305849-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Co-locate initialization of feature MSRs in kvm_arch_vcpu_create()	Sean Christopherson
	Bunch all of the feature MSR initialization in kvm_arch_vcpu_create() so that it can be easily quirked in a future patch. No functional change intended. Link: https://lore.kernel.org/r/20240802185511.305849-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: model canonical checks more precisely	Maxim Levitsky
	As a result of a recent investigation, it was determined that x86 CPUs which support 5-level paging, don't always respect CR4.LA57 when doing canonical checks. In particular: 1. MSRs which contain a linear address, allow full 57-bitcanonical address regardless of CR4.LA57 state. For example: MSR_KERNEL_GS_BASE. 2. All hidden segment bases and GDT/IDT bases also behave like MSRs. This means that full 57-bit canonical address can be loaded to them regardless of CR4.LA57, both using MSRS (e.g GS_BASE) and instructions (e.g LGDT). 3. TLB invalidation instructions also allow the user to use full 57-bit address regardless of the CR4.LA57. Finally, it must be noted that the CPU doesn't prevent the user from disabling 5-level paging, even when the full 57-bit canonical address is present in one of the registers mentioned above (e.g GDT base). In fact, this can happen without any userspace help, when the CPU enters SMM mode - some MSRs, for example MSR_KERNEL_GS_BASE are left to contain a non-canonical address in regard to the new mode. Since most of the affected MSRs and all segment bases can be read and written freely by the guest without any KVM intervention, this patch makes the emulator closely follow hardware behavior, which means that the emulator doesn't take in the account the guest CPUID support for 5-level paging, and only takes in the account the host CPU support. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-4-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Add X86EMUL_F_MSR and X86EMUL_F_DT_LOAD to aid canonical checks	Maxim Levitsky
	Add emulation flags for MSR accesses and Descriptor Tables loads, and pass the new flags as appropriate to emul_is_noncanonical_address(). The flags will be used to perform the correct canonical check, as the type of access affects whether or not CR4.LA57 is consulted when determining the canonical bit. No functional change is intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-3-mlevitsk@redhat.com [sean: split to separate patch, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Route non-canonical checks in emulator through emulate_ops	Maxim Levitsky
	Add emulate_ops.is_canonical_addr() to perform (non-)canonical checks in the emulator, which will allow extending is_noncanonical_address() to support different flavors of canonical checks, e.g. for descriptor table bases vs. MSRs, without needing duplicate logic in the emulator. No functional change is intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-3-mlevitsk@redhat.com [sean: separate from additional of flags, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Use '0' for guest RIP if PMI encounters protected guest state	Sean Christopherson
	Explicitly return '0' for guest RIP when handling a PMI VM-Exit for a vCPU with protected guest state, i.e. when KVM can't read the real RIP. While there is no "right" value, and profiling a protect guest is rather futile, returning the last known RIP is worse than returning obviously "bad" data. E.g. for SEV-ES+, the last known RIP will often point somewhere in the guest's boot flow. Opportunistically add WARNs to effectively assert that the in_kernel() and get_ip() callbacks are restricted to the common PMI handler, as the return values for the protected guest state case are largely arbitrary, i.e. only make any sense whatsoever for PMIs, where the returned values have no functional impact and thus don't truly matter. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241009175002.1118178-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Bypass register cache when querying CPL from kvm_sched_out()	Sean Christopherson
	When querying guest CPL to determine if a vCPU was preempted while in kernel mode, bypass the register cache, i.e. always read SS.AR_BYTES from the VMCS on Intel CPUs. If the kernel is running with full preemption enabled, using the register cache in the preemption path can result in stale and/or uninitialized data being cached in the segment cache. In particular the following scenario is currently possible: - vCPU is just created, and the vCPU thread is preempted before SS.AR_BYTES is written in vmx_vcpu_reset(). - When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() => vmx_get_cpl() reads and caches '0' for SS.AR_BYTES. - vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't invoke vmx_segment_cache_clear() to invalidate the cache. As a result, KVM retains a stale value in the cache, which can be read, e.g. via KVM_GET_SREGS. Usually this is not a problem because the VMX segment cache is reset on each VM-Exit, but if the userspace VMM (e.g KVM selftests) reads and writes system registers just after the vCPU was created, _without_ modifying SS.AR_BYTES, userspace will write back the stale '0' value and ultimately will trigger a VM-Entry failure due to incorrect SS segment type. Note, the VM-Enter failure can also be avoided by moving the call to vmx_segment_cache_clear() until after the vmx_vcpu_reset() initializes all segments. However, while that change is correct and desirable (and will come along shortly), it does not address the underlying problem that accessing KVM's register caches from !task context is generally unsafe. In addition to fixing the immediate bug, bypassing the cache for this particular case will allow hardening KVM register caching log to assert that the caches are accessed only when KVM _knows_ it is safe to do so. Fixes: de63ad4cf497 ("KVM: X86: implement the logic for spinlock optimization") Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Closes: https://lore.kernel.org/all/20240716022014.240960-3-mlevitsk@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241009175002.1118178-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Ensure vcpu->mode is loaded from memory in kvm_vcpu_exit_request()	Sean Christopherson
	Wrap kvm_vcpu_exit_request()'s load of vcpu->mode with READ_ONCE() to ensure the variable is re-loaded from memory, as there is no guarantee the caller provides the necessary annotations to ensure KVM sees a fresh value, e.g. the VM-Exit fastpath could theoretically reuse the pre-VM-Enter value. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20240828232013.768446-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Fix a comment inside __kvm_set_or_clear_apicv_inhibit()	Kai Huang
	Change svm_vcpu_run() to vcpu_enter_guest() in the comment of __kvm_set_or_clear_apicv_inhibit() to make it reflect the fact. When one thread updates VM's APICv state due to updating the APICv inhibit reasons, it kicks off all vCPUs and makes them wait until the new reason has been updated and can be seen by all vCPUs. There was one WARN() to make sure VM's APICv state is consistent with vCPU's APICv state in the svm_vcpu_run(). Commit ee49a8932971 ("KVM: x86: Move SVM's APICv sanity check to common x86") moved that WARN() to x86 common code vcpu_enter_guest() due to the logic is not unique to SVM, and added comments to both __kvm_set_or_clear_apicv_inhibit() and vcpu_enter_guest() to explain this. However, although the comment in __kvm_set_or_clear_apicv_inhibit() mentioned the WARN(), it seems forgot to reflect that the WARN() had been moved to x86 common, i.e., it still mentioned the svm_vcpu_run() but not vcpu_enter_guest(). Fix it. Note after the change the first line that contains vcpu_enter_guest() exceeds 80 characters, but leave it as is to make the diff clean. Fixes: ee49a8932971 ("KVM: x86: Move SVM's APICv sanity check to common x86") Signed-off-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/e462e7001b8668649347f879c66597d3327dbac2.1728383775.git.kai.huang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01	KVM: x86: Fix a comment inside kvm_vcpu_update_apicv()	Kai Huang
	The sentence "... so that KVM can the AVIC doorbell to ..." doesn't have a verb. Fix it. After adding the verb 'use', that line exceeds 80 characters. Thus wrap the 'to' to the next line. Signed-off-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/666e991edf81e1fccfba9466f3fe65965fcba897.1728383775.git.kai.huang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-17	Merge tag 'kvm-x86-vmx-6.12' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM VMX changes for 6.12: - Set FINAL/PAGE in the page fault error code for EPT Violations if and only if the GVA is valid. If the GVA is NOT valid, there is no guest-side page table walk and so stuffing paging related metadata is nonsensical. - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of emulating posted interrupt delivery to L2. - Add a lockdep assertion to detect unsafe accesses of vmcs12 structures. - Harden eVMCS loading against an impossible NULL pointer deref (really truly should be impossible). - Minor SGX fix and a cleanup.
2024-09-17	Merge tag 'kvm-x86-pat_vmx_msrs-6.12' of https://github.com/kvm-x86/linux ↵	Paolo Bonzini
	into HEAD KVM VMX and x86 PAT MSR macro cleanup for 6.12: - Add common defines for the x86 architectural memory types, i.e. the types that are shared across PAT, MTRRs, VMCSes, and EPTPs. - Clean up the various VMX MSR macros to make the code self-documenting (inasmuch as possible), and to make it less painful to add new macros.
2024-09-17	Merge tag 'kvm-x86-mmu-6.12' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 MMU changes for 6.12: - Overhaul the "unprotect and retry" logic to more precisely identify cases where retrying is actually helpful, and to harden all retry paths against putting the guest into an infinite retry loop. - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in the shadow MMU. - Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for adding MGLRU support in KVM. - Misc cleanups
2024-09-17	Merge tag 'kvm-x86-misc-6.12' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 misc changes for 6.12 - Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10 functionality that is on the horizon). - Rework common MSR handling code to suppress errors on userspace accesses to unsupported-but-advertised MSRs. This will allow removing (almost?) all of KVM's exemptions for userspace access to MSRs that shouldn't exist based on the vCPU model (the actual cleanup is non-trivial future work). - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the 64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv) stores the entire 64-bit value a the ICR offset. - Fix a bug where KVM would fail to exit to userspace if one was triggered by a fastpath exit handler. - Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when there's already a pending wake event at the time of the exit. - Finally fix the RSM vs. nested VM-Enter WARN by forcing the vCPU out of guest mode prior to signalling SHUTDOWN (architecturally, the SHUTDOWN is supposed to hit L1, not L2).
2024-09-17	Merge branch 'kvm-redo-enable-virt' into HEAD	Paolo Bonzini
	Register KVM's cpuhp and syscore callbacks when enabling virtualization in hardware, as the sole purpose of said callbacks is to disable and re-enable virtualization as needed. The primary motivation for this series is to simplify dealing with enabling virtualization for Intel's TDX, which needs to enable virtualization when kvm-intel.ko is loaded, i.e. long before the first VM is created. That said, this is a nice cleanup on its own. By registering the callbacks on-demand, the callbacks themselves don't need to check kvm_usage_count, because their very existence implies a non-zero count. Patch 1 (re)adds a dedicated lock for kvm_usage_count. This avoids a lock ordering issue between cpus_read_lock() and kvm_lock. The lock ordering issue still exist in very rare cases, and will be fixed for good by switching vm_list to an (S)RCU-protected list. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-09	KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()	Sean Christopherson
	Rename reexecute_instruction() to kvm_unprotect_and_retry_on_failure() to make the intent and purpose of the helper much more obvious. No functional change intended. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20240831001538.336683-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Update retry protection fields when forcing retry on emulation failure	Sean Christopherson
	When retrying the faulting instruction after emulation failure, refresh the infinite loop protection fields even if no shadow pages were zapped, i.e. avoid hitting an infinite loop even when retrying the instruction as a last-ditch effort to avoid terminating the guest. Link: https://lore.kernel.org/r/20240831001538.336683-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Apply retry protection to "unprotect on failure" path	Sean Christopherson
	Use kvm_mmu_unprotect_gfn_and_retry() in reexecute_instruction() to pick up protection against infinite loops, e.g. if KVM somehow manages to encounter an unsupported instruction and unprotecting the gfn doesn't allow the vCPU to make forward progress. Other than that, the retry-on- failure logic is a functionally equivalent, open coded version of kvm_mmu_unprotect_gfn_and_retry(). Note, the emulation failure path still isn't fully protected, as KVM won't update the retry protection fields if no shadow pages are zapped (but this change is still a step forward). That flaw will be addressed in a future patch. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20240831001538.336683-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn	Sean Christopherson
	Don't bother unprotecting the target gfn if EMULTYPE_WRITE_PF_TO_SP is set, as KVM will simply report the emulation failure to userspace. This will allow converting reexecute_instruction() to use kvm_mmu_unprotect_gfn_instead_retry() instead of kvm_mmu_unprotect_page(). Link: https://lore.kernel.org/r/20240831001538.336683-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Remove manual pfn lookup when retrying #PF after failed emulation	Sean Christopherson
	Drop the manual pfn look when retrying an instruction that KVM failed to emulation in response to a #PF due to a write-protected gfn. Now that KVM sets EMULTYPE_ALLOW_RETRY_PF if and only if the page fault hit a write- protected gfn, i.e. if and only if there's a writable memslot, there's no need to redo the lookup to avoid retrying an instruction that failed on emulated MMIO (no slot, or a write to a read-only slot). I.e. KVM will never attempt to retry an instruction that failed on emulated MMIO, whereas that was not the case prior to the introduction of RET_PF_WRITE_PROTECTED. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20240831001538.336683-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Fold retry_instruction() into x86_emulate_instruction()	Sean Christopherson
	Now that retry_instruction() is reasonably tiny, fold it into its sole caller, x86_emulate_instruction(). In addition to getting rid of the absurdly confusing retry_instruction() name, handling the retry in x86_emulate_instruction() pairs it back up with the code that resets last_retry_{eip,address}. No functional change intended. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20240831001538.336683-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Move EMULTYPE_ALLOW_RETRY_PF to x86_emulate_instruction()	Sean Christopherson
	Move the sanity checks for EMULTYPE_ALLOW_RETRY_PF to the top of x86_emulate_instruction(). In addition to deduplicating a small amount of code, this makes the connection between EMULTYPE_ALLOW_RETRY_PF and EMULTYPE_PF even more explicit, and will allow dropping retry_instruction() entirely. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20240831001538.336683-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86/mmu: Apply retry protection to "fast nTDP unprotect" path	Sean Christopherson
	Move the anti-infinite-loop protection provided by last_retry_{eip,addr} into kvm_mmu_write_protect_fault() so that it guards unprotect+retry that never hits the emulator, as well as reexecute_instruction(), which is the last ditch "might as well try it" logic that kicks in when emulation fails on an instruction that faulted on a write-protected gfn. Add a new helper, kvm_mmu_unprotect_gfn_and_retry(), to set the retry fields and deduplicate other code (with more to come). Link: https://lore.kernel.org/r/20240831001538.336683-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Store gpa as gpa_t, not unsigned long, when unprotecting for retry	Sean Christopherson
	Store the gpa used to unprotect the faulting gfn for retry as a gpa_t, not an unsigned long. This fixes a bug where 32-bit KVM would unprotect and retry the wrong gfn if the gpa had bits 63:32!=0. In practice, this bug is functionally benign, as unprotecting the wrong gfn is purely a performance issue (thanks to the anti-infinite-loop logic). And of course, almost no one runs 32-bit KVM these days. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20240831001538.336683-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Get RIP from vCPU state when storing it to last_retry_eip	Sean Christopherson
	Read RIP from vCPU state instead of pulling it from the emulation context when filling last_retry_eip, which is part of the anti-infinite-loop protection used when unprotecting and retrying instructions that hit a write-protected gfn. This will allow reusing the anti-infinite-loop protection in flows that never make it into the emulator. No functional change intended, as ctxt->eip is set to kvm_rip_read() in init_emulate_ctxt(), and EMULTYPE_PF emulation is mutually exclusive with EMULTYPE_NO_DECODE and EMULTYPE_SKIP, i.e. always goes through x86_decode_emulated_instruction() and hasn't advanced ctxt->eip (yet). Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20240831001538.336683-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Retry to-be-emulated insn in "slow" unprotect path iff sp is zapped	Sean Christopherson
	Resume the guest and thus skip emulation of a non-PTE-writing instruction if and only if unprotecting the gfn actually zapped at least one shadow page. If the gfn is write-protected for some reason other than shadow paging, attempting to unprotect the gfn will effectively fail, and thus retrying the instruction is all but guaranteed to be pointless. This bug has existed for a long time, but was effectively fudged around by the retry RIP+address anti-loop detection. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20240831001538.336683-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09	KVM: x86: Forcibly leave nested if RSM to L2 hits shutdown	Sean Christopherson
	Leave nested mode before synthesizing shutdown (a.k.a. TRIPLE_FAULT) if RSM fails when resuming L2 (a.k.a. guest mode). Architecturally, shutdown on RSM occurs _before_ the transition back to guest mode on both Intel and AMD. On Intel, per the SDM pseudocode, SMRAM state is loaded before critical VMX state: restore state normally from SMRAM; ... CR4.VMXE := value stored internally; IF internal storage indicates that the logical processor had been in VMX operation (root or non-root) THEN enter VMX operation (root or non-root); restore VMX-critical state as defined in Section 32.14.1; ... restore current VMCS pointer; FI; AMD's APM is both less clearcut and more explicit. Because AMD CPUs save VMCB and guest state in SMRAM itself, given the lack of anything in the APM to indicate a shutdown in guest mode is possible, a straightforward reading of the clause on invalid state is that _what_ state is invalid is irrelevant, i.e. all roads lead to shutdown. An RSM causes a processor shutdown if an invalid-state condition is found in the SMRAM state-save area. This fixes a bug found by syzkaller where synthesizing shutdown for L2 led to a nested VM-Exit (if L1 is intercepting shutdown), which in turn caused KVM to complain about trying to cancel a nested VM-Enter (see commit 759cbd59674a ("KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM"). Note, Paolo pointed out that KVM shouldn't set nested_run_pending until after loading SMRAM state. But as above, that's only half the story, KVM shouldn't transition to guest mode either. Unfortunately, fixing that mess requires rewriting the nVMX and nSVM RSM flows to not piggyback their nested VM-Enter flows, as executing the nested VM-Enter flows after loading state from SMRAM would clobber much of said state. For now, add a FIXME to call out that transitioning to guest mode before loading state from SMRAM is wrong. Link: https://lore.kernel.org/all/CABgObfYaUHXyRmsmg8UjRomnpQ0Jnaog9-L2gMjsjkqChjDYUQ@mail.gmail.com Reported-by: syzbot+988d9efcdf137bc05f66@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/0000000000007a9acb06151e1670@google.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Closes: https://lore.kernel.org/all/CAMhUBjmXMYsEoVYw_M8hSZjBMHh24i88QYm-RY6HDta5YZ7Wgw@mail.gmail.com Analyzed-by: Michal Wilczynski <michal.wilczynski@intel.com> Cc: Kishen Maloor <kishen.maloor@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906161337.1118412-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-06	KVM: x86: don't fall through case statements without annotations	Linus Torvalds
	clang warns on this because it has an unannotated fall-through between cases: arch/x86/kvm/x86.c:4819:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough] and while we could annotate it as a fallthrough, the proper fix is to just add the break for this case, instead of falling through to the default case and the break there. gcc also has that warning, but it looks like gcc only warns for the cases where they fall through to "real code", rather than to just a break. Odd. Fixes: d30d9ee94cc0 ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM") Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Tom Dohrmann <erbse.13@gmx.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-04	KVM: x86: Register "emergency disable" callbacks when virt is enabled	Sean Christopherson
	Register the "disable virtualization in an emergency" callback just before KVM enables virtualization in hardware, as there is no functional need to keep the callbacks registered while KVM happens to be loaded, but is inactive, i.e. if KVM hasn't enabled virtualization. Note, unregistering the callback every time the last VM is destroyed could have measurable latency due to the synchronize_rcu() needed to ensure all references to the callback are dropped before KVM is unloaded. But the latency should be a small fraction of the total latency of disabling virtualization across all CPUs, and userspace can set enable_virt_at_load to completely eliminate the runtime overhead. Add a pointer in kvm_x86_ops to allow vendor code to provide its callback. There is no reason to force vendor code to do the registration, and either way KVM would need a new kvm_x86_ops hook. Suggested-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Tested-by: Farrah Chen <farrah.chen@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240830043600.127750-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-04	KVM: x86: Rename virtualization {en,dis}abling APIs to match common KVM	Sean Christopherson
	Rename x86's the per-CPU vendor hooks used to enable virtualization in hardware to align with the recently renamed arch hooks. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-ID: <20240830043600.127750-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>