summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/svm/svm.c
AgeCommit message (Collapse)Author
2023-08-03x86/reboot: KVM: Disable SVM during reboot via virt/KVM reboot callbackSean Christopherson
Use the virt callback to disable SVM (and set GIF=1) during an emergency instead of blindly attempting to disable SVM. Like the VMX case, if a hypervisor, i.e. KVM, isn't loaded/active, SVM can't be in use. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230721201859.2307736-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: SVM: Use svm_get_lbr_vmcb() helper to handle writes to DEBUGCTLSean Christopherson
Use the recently introduced svm_get_lbr_vmcb() instead an open coded equivalent to retrieve the target VMCB when emulating writes to MSR_IA32_DEBUGCTLMSR. No functional change intended. Link: https://lore.kernel.org/r/20230607203519.1570167-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: SVM: Clean up handling of LBR virtualization enabledSean Christopherson
Clean up the enable_lbrv computation in svm_update_lbrv() to consolidate the logic for computing enable_lbrv into a single statement, and to remove the coding style violations (lack of curly braces on nested if). No functional change intended. Link: https://lore.kernel.org/r/20230607203519.1570167-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02KVM: SVM: Fix dead KVM_BUG() code in LBR MSR virtualizationSean Christopherson
Refactor KVM's handling of LBR MSRs on SVM to avoid a second layer of case statements, and thus eliminate a dead KVM_BUG() call, which (a) will never be hit in the current code base and (b) if a future commit breaks things, will never fire as KVM passes "false" instead "true" or '1' for the KVM_BUG() condition. Reported-by: Michal Luczaj <mhal@rbox.co> Cc: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230607203519.1570167-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-29KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalidSean Christopherson
Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid, e.g. due to setting bits 63:32, illegal combinations, or to a value that isn't allowed in VMX (non-)root mode. The VMX checks in particular are "fun" as failure to disallow Real Mode for an L2 that is configured with unrestricted guest disabled, when KVM itself has unrestricted guest enabled, will result in KVM forcing VM86 mode to virtual Real Mode for L2, but then fail to unwind the related metadata when synthesizing a nested VM-Exit back to L1 (which has unrestricted guest enabled). Opportunistically fix a benign typo in the prototype for is_valid_cr4(). Cc: stable@vger.kernel.org Reported-by: syzbot+5feef0b9ee9c8e9e5689@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000f316b705fdf6e2b4@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230613203037.1968489-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"Sean Christopherson
Now that handle_fastpath_set_msr_irqoff() acquires kvm->srcu, i.e. allows dereferencing memslots during WRMSR emulation, drop the requirement that "next RIP" is valid. In hindsight, acquiring kvm->srcu would have been a better fix than avoiding the pastpath, but at the time it was thought that accessing SRCU-protected data in the fastpath was a one-off edge case. This reverts commit 5c30e8101e8d5d020b1d7119117889756a6ed713. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230721224337.2335137-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-28KVM: SVM: Don't try to pointlessly single-step SEV-ES guests for NMI windowSean Christopherson
Bail early from svm_enable_nmi_window() for SEV-ES guests without trying to enable single-step of the guest, as single-stepping an SEV-ES guest is impossible and the guest is responsible for *telling* KVM when it is ready for an new NMI to be injected. Functionally, setting TF and RF in svm->vmcb->save.rflags is benign as the field is ignored by hardware, but it's all kinds of confusing. Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Link: https://lore.kernel.org/r/20230615063757.3039121-10-aik@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28KVM: SVM: Don't defer NMI unblocking until next exit for SEV-ES guestsSean Christopherson
Immediately mark NMIs as unmasked in response to #VMGEXIT(NMI complete) instead of setting awaiting_iret_completion and waiting until the *next* VM-Exit to unmask NMIs. The whole point of "NMI complete" is that the guest is responsible for telling the hypervisor when it's safe to inject an NMI, i.e. there's no need to wait. And because there's no IRET to single-step, the next VM-Exit could be a long time coming, i.e. KVM could incorrectly hold an NMI pending for far longer than what is required and expected. Opportunistically fix a stale reference to HF_IRET_MASK. Fixes: 916b54a7688b ("KVM: x86: Move HF_NMI_MASK and HF_IRET_MASK into "struct vcpu_svm"") Fixes: 4444dfe4050b ("KVM: SVM: Add NMI support for an SEV-ES guest") Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20230615063757.3039121-9-aik@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28KVM: SVM/SEV/SEV-ES: Rework interceptsAlexey Kardashevskiy
Currently SVM setup is done sequentially in init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb() and tries keeping SVM/SEV/SEV-ES bits separated. One of the exceptions is DR intercepts which is for SEV-ES before sev_es_init_vmcb() runs. Move the SEV-ES intercept setup to sev_es_init_vmcb(). From now on set_dr_intercepts()/clr_dr_intercepts() handle SVM/SEV only. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Santosh Shukla <santosh.shukla@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20230615063757.3039121-6-aik@amd.com [sean: drop comment about intercepting DR7] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28KVM: SEV-ES: explicitly disable debugAlexey Kardashevskiy
SVM/SEV enable debug registers intercepts to skip swapping DRs on entering/exiting the guest. When the guest is in control of debug registers (vcpu->guest_debug == 0), there is an optimisation to reduce the number of context switches: intercepts are cleared and the KVM_DEBUGREG_WONT_EXIT flag is set to tell KVM to do swapping on guest enter/exit. The same code also executes for SEV-ES, however it has no effect as - it always takes (vcpu->guest_debug == 0) branch; - KVM_DEBUGREG_WONT_EXIT is set but DR7 intercept is not cleared; - vcpu_enter_guest() writes DRs but VMRUN for SEV-ES swaps them with the values from _encrypted_ VMSA. Be explicit about SEV-ES not supporting debug: - return right away from dr_interception() and skip unnecessary processing; - return an error right away from the KVM_SEV_LAUNCH_UPDATE_VMSA handler if debugging was already enabled. KVM_SET_GUEST_DEBUG are failing already after KVM_SEV_LAUNCH_UPDATE_VMSA is finished due to vcpu->arch.guest_state_protected set to true. Add WARN_ON to kvm_x86::sync_dirty_debug_regs() (saves guest DRs on guest exit) to signify that SEV-ES won't hit that path. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Link: https://lore.kernel.org/r/20230615063757.3039121-5-aik@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEVAlexey Kardashevskiy
Currently SVM setup is done sequentially in init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb() and tries keeping SVM/SEV/SEV-ES bits separated. One of the exceptions is #GP intercept which init_vmcb() skips setting for SEV guests and then sev_es_init_vmcb() needlessly clears it. Remove the SEV check from init_vmcb(). Clear the #GP intercept in sev_init_vmcb(). SEV-ES will use the SEV setting. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Santosh Shukla <santosh.shukla@amd.com> Link: https://lore.kernel.org/r/20230615063757.3039121-3-aik@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the headerAlexey Kardashevskiy
Static functions set_dr_intercepts() and clr_dr_intercepts() are only called from SVM so move them to .c. No functional change intended. Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Santosh Shukla <santosh.shukla@amd.com> Link: https://lore.kernel.org/r/20230615063757.3039121-2-aik@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-27x86/srso: Add IBPB on VMEXITBorislav Petkov (AMD)
Add the option to flush IBPB only on VMEXIT in order to protect from malicious guests but one otherwise trusts the software that runs on the hypervisor. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-01Merge tag 'kvm-x86-svm-6.5' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM SVM changes for 6.5: - Drop manual TR/TSS load after VM-Exit now that KVM uses VMLOAD for host state - Fix a not-yet-problematic missing call to trace_kvm_exit() for VM-Exits that are handled in the fastpath - Print more descriptive information about the status of SEV and SEV-ES during module load - Assert that misc_cg_set_capacity() doesn't fail to avoid should-be-impossible memory leaks
2023-07-01Merge tag 'kvm-x86-pmu-6.5' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86/pmu changes for 6.5: - Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes included along the way
2023-07-01Merge tag 'kvm-x86-misc-6.5' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 changes for 6.5: * Move handling of PAT out of MTRR code and dedup SVM+VMX code * Fix output of PIC poll command emulation when there's an interrupt * Add a maintainer's handbook to document KVM x86 processes, preferred coding style, testing expectations, etc. * Misc cleanups
2023-06-06KVM: x86/cpuid: Add AMD CPUID ExtPerfMonAndDbg leaf 0x80000022Like Xu
CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new performance monitoring features for AMD processors. Bit 0 of EAX indicates support for Performance Monitoring Version 2 (PerfMonV2) features. If found to be set during PMU initialization, the EBX bits of the same CPUID function can be used to determine the number of available PMCs for different PMU types. Expose the relevant bits via KVM_GET_SUPPORTED_CPUID so that guests can make use of the PerfMonV2 features. Co-developed-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230603011058.1038821-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-06KVM: x86/pmu: Advertise PERFCTR_CORE iff the min nr of counters is metLike Xu
Enable and advertise PERFCTR_CORE if and only if the minimum number of required counters are available, i.e. if perf says there are less than six general purpose counters. Opportunistically, use kvm_cpu_cap_check_and_set() instead of open coding the check for host support. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <likexu@tencent.com> [sean: massage shortlog and changelog] Link: https://lore.kernel.org/r/20230603011058.1038821-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-06KVM: x86: Clean up: remove redundant bool conversionsMichal Luczaj
As test_bit() returns bool, explicitly converting result to bool is unnecessary. Get rid of '!!'. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230605200158.118109-1-mhal@rbox.co Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-02KVM: SVM: Invoke trace_kvm_exit() for fastpath VM-ExitsSean Christopherson
Move SVM's call to trace_kvm_exit() from the "slow" VM-Exit handler to svm_vcpu_run() so that KVM traces fastpath VM-Exits that re-enter the guest without bouncing through the slow path. This bug is benign in the current code base as KVM doesn't currently support any such exits on SVM. Fixes: a9ab13ff6e84 ("KVM: X86: Improve latency for single target IPI fastpath") Link: https://lore.kernel.org/r/20230602011920.787844-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-02KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASKMaciej S. Szmigiero
While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware I noticed that with vCPU count large enough (> 16) they sometimes froze at boot. With vCPU count of 64 they never booted successfully - suggesting some kind of a race condition. Since adding "vnmi=0" module parameter made these guests boot successfully it was clear that the problem is most likely (v)NMI-related. Running kvm-unit-tests quickly showed failing NMI-related tests cases, like "multiple nmi" and "pending nmi" from apic-split, x2apic and xapic tests and the NMI parts of eventinj test. The issue was that once one NMI was being serviced no other NMI was allowed to be set pending (NMI limit = 0), which was traced to svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather than for the "NMI pending" flag. Fix this by testing for the right flag in svm_is_vnmi_pending(). Once this is done, the NMI-related kvm-unit-tests pass successfully and the Windows guest no longer freezes at boot. Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-01KVM: x86: Move common handling of PAT MSR writes to kvm_set_msr_common()Sean Christopherson
Move the common check-and-set handling of PAT MSR writes out of vendor code and into kvm_set_msr_common(). This aligns writes with reads, which are already handled in common code, i.e. makes the handling of reads and writes symmetrical in common code. Alternatively, the common handling in kvm_get_msr_common() could be moved to vendor code, but duplicating code is generally undesirable (even though the duplicatated code is trivial in this case), and guest writes to PAT should be rare, i.e. the overhead of the extra function call is a non-issue in practice. Suggested-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230511233351.635053-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-01KVM: SVM: Use kvm_pat_valid() directly instead of kvm_mtrr_valid()Ke Guo
Use kvm_pat_valid() directly instead of bouncing through kvm_mtrr_valid(). The PAT is not an MTRR, and kvm_mtrr_valid() just redirects to kvm_pat_valid(), i.e. is exempt from KVM's "zap SPTEs" logic that's needed to honor guest MTRRs when the VM has a passthrough device with non-coherent DMA (KVM does NOT set "ignore guest PAT" in this case, and so enables hardware virtualization of the guest's PAT, i.e. doesn't need to manually emulate the PAT memtype). Signed-off-by: Ke Guo <guoke@uniontech.com> [sean: massage changelog] Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230511233351.635053-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-01KVM: SVM: Remove TSS reloading code after VMEXITMingwei Zhang
Remove the dedicated post-VMEXIT TSS reloading code now that KVM uses VMLOAD to load host segment state, which includes TSS state. Fixes: e79b91bb3c91 ("KVM: SVM: use vmsave/vmload for saving/restoring additional host state") Reported-by: Venkatesh Srinivas <venkateshs@google.com> Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20230523165635.4002711-1-mizhang@google.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "s390: - More phys_to_virt conversions - Improvement of AP management for VSIE (nested virtualization) ARM64: - Numerous fixes for the pathological lock inversion issue that plagued KVM/arm64 since... forever. - New framework allowing SMCCC-compliant hypercalls to be forwarded to userspace, hopefully paving the way for some more features being moved to VMMs rather than be implemented in the kernel. - Large rework of the timer code to allow a VM-wide offset to be applied to both virtual and physical counters as well as a per-timer, per-vcpu offset that complements the global one. This last part allows the NV timer code to be implemented on top. - A small set of fixes to make sure that we don't change anything affecting the EL1&0 translation regime just after having having taken an exception to EL2 until we have executed a DSB. This ensures that speculative walks started in EL1&0 have completed. - The usual selftest fixes and improvements. x86: - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled, and by giving the guest control of CR0.WP when EPT is enabled on VMX (VMX-only because SVM doesn't support per-bit controls) - Add CR0/CR4 helpers to query single bits, and clean up related code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return as a bool - Move AMD_PSFD to cpufeatures.h and purge KVM's definition - Avoid unnecessary writes+flushes when the guest is only adding new PTEs - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations when emulating invalidations - Clean up the range-based flushing APIs - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle changed SPTE" overhead associated with writing the entire entry - Track the number of "tail" entries in a pte_list_desc to avoid having to walk (potentially) all descriptors during insertion and deletion, which gets quite expensive if the guest is spamming fork() - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features - Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest - AMD SVM: - Add support for virtual NMIs - Fixes for edge cases related to virtual interrupts - Intel AMX: - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is not being reported due to userspace not opting in via prctl() - Fix a bug in emulation of ENCLS in compatibility mode - Allow emulation of NOP and PAUSE for L2 - AMX selftests improvements - Misc cleanups MIPS: - Constify MIPS's internal callbacks (a leftover from the hardware enabling rework that landed in 6.3) Generic: - Drop unnecessary casts from "void *" throughout kvm_main.c - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct size by 8 bytes on 64-bit kernels by utilizing a padding hole Documentation: - Fix goof introduced by the conversion to rST" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits) KVM: s390: pci: fix virtual-physical confusion on module unload/load KVM: s390: vsie: clarifications on setting the APCB KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init() KVM: selftests: Test the PMU event "Instructions retired" KVM: selftests: Copy full counter values from guest in PMU event filter test KVM: selftests: Use error codes to signal errors in PMU event filter test KVM: selftests: Print detailed info in PMU event filter asserts KVM: selftests: Add helpers for PMC asserts in PMU event filter test KVM: selftests: Add a common helper for the PMU event filter guest code KVM: selftests: Fix spelling mistake "perrmited" -> "permitted" KVM: arm64: vhe: Drop extra isb() on guest exit KVM: arm64: vhe: Synchronise with page table walker on MMU update KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc() KVM: arm64: nvhe: Synchronise with page table walker on TLBI KVM: arm64: Handle 32bit CNTPCTSS traps KVM: arm64: nvhe: Synchronise with page table walker on vcpu run KVM: arm64: vgic: Don't acquire its_lock before config_lock KVM: selftests: Add test to verify KVM's supported XCR0 ...
2023-04-28Merge tag 'smp-core-2023-04-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP cross-CPU function-call updates from Ingo Molnar: - Remove diagnostics and adjust config for CSD lock diagnostics - Add a generic IPI-sending tracepoint, as currently there's no easy way to instrument IPI origins: it's arch dependent and for some major architectures it's not even consistently available. * tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: trace,smp: Trace all smp_function_call*() invocations trace: Add trace_ipi_send_cpu() sched, smp: Trace smp callback causing an IPI smp: reword smp call IPI comment treewide: Trace IPIs sent via smp_send_reschedule() irq_work: Trace self-IPIs sent via arch_irq_work_raise() smp: Trace IPIs sent via arch_send_call_function_ipi_mask() sched, smp: Trace IPIs sent via send_call_function_single_ipi() trace: Add trace_ipi_send_cpumask() kernel/smp: Make csdlock_debug= resettable locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging locking/csd_lock: Remove added data from CSD lock debugging locking/csd_lock: Add Kconfig option for csd_debug default
2023-04-26Merge tag 'kvm-x86-svm-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM SVM changes for 6.4: - Add support for virtual NMIs - Fixes for edge cases related to virtual interrupts
2023-04-26Merge tag 'kvm-x86-pmu-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 PMU changes for 6.4: - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, and overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest - Misc cleanups and fixes
2023-04-26Merge tag 'kvm-x86-misc-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 changes for 6.4: - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled, and by giving the guest control of CR0.WP when EPT is enabled on VMX (VMX-only because SVM doesn't support per-bit controls) - Add CR0/CR4 helpers to query single bits, and clean up related code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return as a bool - Move AMD_PSFD to cpufeatures.h and purge KVM's definition - Misc cleanups
2023-04-06KVM: x86: Add macros to track first...last VMX feature MSRsSean Christopherson
Add macros to track the range of VMX feature MSRs that are emulated by KVM to reduce the maintenance cost of extending the set of emulated MSRs. Note, KVM doesn't necessarily emulate all known/consumed VMX MSRs, e.g. PROCBASED_CTLS3 is consumed by KVM to enable IPI virtualization, but is not emulated as KVM doesn't emulate/virtualize IPI virtualization for nested guests. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: SVM: Return the local "r" variable from svm_set_msr()Sean Christopherson
Rename "r" to "ret" and actually return it from svm_set_msr() to reduce the probability of repeating the mistake of commit 723d5fb0ffe4 ("kvm: svm: Add IA32_FLUSH_CMD guest support"), which set "r" thinking that it would be propagated to the caller. Alternatively, the declaration of "r" could be moved into the handling of MSR_TSC_AUX, but that risks variable shadowing in the future. A wrapper for kvm_set_user_return_msr() would allow eliding a local variable, but that feels like delaying the inevitable. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230322011440.2195485-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMDSean Christopherson
Virtualize FLUSH_L1D so that the guest can use the performant L1D flush if one of the many mitigations might require a flush in the guest, e.g. Linux provides an option to flush the L1D when switching mms. Passthrough MSR_IA32_FLUSH_CMD for write when it's supported in hardware and exposed to the guest, i.e. always let the guest write it directly if FLUSH_L1D is fully supported. Forward writes to hardware in host context on the off chance that KVM ends up emulating a WRMSR, or in the really unlikely scenario where userspace wants to force a flush. Restrict these forwarded WRMSRs to the known command out of an abundance of caution. Passing through the MSR means the guest can throw any and all values at hardware, but doing so in host context is arguably a bit more dangerous. Link: https://lkml.kernel.org/r/CALMp9eTt3xzAEoQ038bJQ9LN0ZOXrSWsN7xnNUD%2B0SS%3DWwF7Pg%40mail.gmail.com Link: https://lore.kernel.org/all/20230201132905.549148-2-eesposit@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230322011440.2195485-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: x86: Move MSR_IA32_PRED_CMD WRMSR emulation to common codeSean Christopherson
Dedup the handling of MSR_IA32_PRED_CMD across VMX and SVM by moving the logic to kvm_set_msr_common(). Now that the MSR interception toggling is handled as part of setting guest CPUID, the VMX and SVM paths are identical. Opportunistically massage the code to make it a wee bit denser. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20230322011440.2195485-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: SVM: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUIDSean Christopherson
Passthrough MSR_IA32_PRED_CMD based purely on whether or not the MSR is supported and enabled, i.e. don't wait until the first write. There's no benefit to deferred passthrough, and the extra logic only adds complexity. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230322011440.2195485-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: x86: Revert MSR_IA32_FLUSH_CMD.FLUSH_L1D enablingSean Christopherson
Revert the recently added virtualizing of MSR_IA32_FLUSH_CMD, as both the VMX and SVM are fatally buggy to guests that use MSR_IA32_FLUSH_CMD or MSR_IA32_PRED_CMD, and because the entire foundation of the logic is flawed. The most immediate problem is an inverted check on @cmd that results in rejecting legal values. SVM doubles down on bugs and drops the error, i.e. silently breaks all guest mitigations based on the command MSRs. The next issue is that neither VMX nor SVM was updated to mark MSR_IA32_FLUSH_CMD as being a possible passthrough MSR, which isn't hugely problematic, but does break MSR filtering and triggers a WARN on VMX designed to catch this exact bug. The foundational issues stem from the MSR_IA32_FLUSH_CMD code reusing logic from MSR_IA32_PRED_CMD, which in turn was likely copied from KVM's support for MSR_IA32_SPEC_CTRL. The copy+paste from MSR_IA32_SPEC_CTRL was misguided as MSR_IA32_PRED_CMD (and MSR_IA32_FLUSH_CMD) is a write-only MSR, i.e. doesn't need the same "deferred passthrough" shenanigans as MSR_IA32_SPEC_CTRL. Revert all MSR_IA32_FLUSH_CMD enabling in one fell swoop so that there is no point where KVM advertises, but does not support, L1D_FLUSH. This reverts commits 45cf86f26148e549c5ba4a8ab32a390e4bde216e, 723d5fb0ffe4c02bd4edf47ea02c02e454719f28, and a807b78ad04b2eaa348f52f5cc7702385b6de1ee. Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://lkml.kernel.org/r/20230317190432.GA863767%40dev-arch.thelio-3990X Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Message-Id: <20230322011440.2195485-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-27KVM: SVM: Flush Hyper-V TLB when requiredJeremi Piotrowski
The Hyper-V "EnlightenedNptTlb" enlightenment is always enabled when KVM is running on top of Hyper-V and Hyper-V exposes support for it (which is always). On AMD CPUs this enlightenment results in ASID invalidations not flushing TLB entries derived from the NPT. To force the underlying (L0) hypervisor to rebuild its shadow page tables, an explicit hypercall is needed. The original KVM implementation of Hyper-V's "EnlightenedNptTlb" on SVM only added remote TLB flush hooks. This worked out fine for a while, as sufficient remote TLB flushes where being issued in KVM to mask the problem. Since v5.17, changes in the TDP code reduced the number of flushes and the out-of-sync TLB prevents guests from booting successfully. Split svm_flush_tlb_current() into separate callbacks for the 3 cases (guest/all/current), and issue the required Hyper-V hypercall when a Hyper-V TLB flush is needed. The most important case where the TLB flush was missing is when loading a new PGD, which is followed by what is now svm_flush_tlb_current(). Cc: stable@vger.kernel.org # v5.17+ Fixes: 1e0c7d40758b ("KVM: SVM: hyper-v: Remote TLB flush for SVM") Link: https://lore.kernel.org/lkml/43980946-7bbf-dcef-7e40-af904c456250@linux.microsoft.com/ Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20230324145233.4585-1-jpiotrowski@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-24treewide: Trace IPIs sent via smp_send_reschedule()Valentin Schneider
To be able to trace invocations of smp_send_reschedule(), rename the arch-specific definitions of it to arch_smp_send_reschedule() and wrap it into an smp_send_reschedule() that contains a tracepoint. Changes to include the declaration of the tracepoint were driven by the following coccinelle script: @func_use@ @@ smp_send_reschedule(...); @include@ @@ #include <trace/events/ipi.h> @no_include depends on func_use && !include@ @@ #include <...> + + #include <trace/events/ipi.h> [csky bits] [riscv bits] Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20230307143558.294354-6-vschneid@redhat.com
2023-03-22KVM: nSVM: Implement support for nested VNMISantosh Shukla
Allow L1 to use vNMI to accelerate its injection of NMI to L2 by propagating vNMI int_ctl bits from/to vmcb12 to/from vmcb02. To handle both the case where vNMI is enabled for L1 and L2, and where vNMI is enabled for L1 but _not_ L2, move pending L1 vNMIs to nmi_pending on nested VM-Entry and raise KVM_REQ_EVENT, i.e. rely on existing code to route the NMI to the correct domain. On nested VM-Exit, reverse the process and set/clear V_NMI_PENDING for L1 based one whether nmi_pending is zero or non-zero. There is no need to consider vmcb02 in this case, as V_NMI_PENDING can be set in vmcb02 if vNMI is disabled for L2, and if vNMI is enabled for L2, then L1 and L2 have different NMI contexts. Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com> Link: https://lore.kernel.org/r/20230227084016.3368-12-santosh.shukla@amd.com [sean: massage changelog to match the code] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22KVM: x86: Add support for SVM's Virtual NMISantosh Shukla
Add support for SVM's Virtual NMIs implementation, which adds proper tracking of virtual NMI blocking, and an intr_ctrl flag that software can set to mark a virtual NMI as pending. Pending virtual NMIs are serviced by hardware if/when virtual NMIs become unblocked, i.e. act more or less like real NMIs. Introduce two new kvm_x86_ops callbacks so to support SVM's vNMI, as KVM needs to treat a pending vNMI as partially injected. Specifically, if two NMIs (for L1) arrive concurrently in KVM's software model, KVM's ABI is to inject one and pend the other. Without vNMI, KVM manually tracks the pending NMI and uses NMI windows to detect when the NMI should be injected. With vNMI, the pending NMI is simply stuffed into the VMCB and handed off to hardware. This means that KVM needs to be able to set a vNMI pending on-demand, and also query if a vNMI is pending, e.g. to honor the "at most one NMI pending" rule and to preserve all NMIs across save and restore. Warn if KVM attempts to open an NMI window when vNMI is fully enabled, as the above logic should prevent KVM from ever getting to kvm_check_and_inject_events() with two NMIs pending _in software_, and the "at most one NMI pending" logic should prevent having an NMI pending in hardware and an NMI pending in software if NMIs are also blocked, i.e. if KVM can't immediately inject the second NMI. Signed-off-by: Santosh Shukla <Santosh.Shukla@amd.com> Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20230227084016.3368-11-santosh.shukla@amd.com [sean: rewrite shortlog and changelog, massage code comments] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22KVM: SVM: add wrappers to enable/disable IRET interceptionMaxim Levitsky
SEV-ES guests don't use IRET interception for the detection of an end of a NMI. Therefore it makes sense to create a wrapper to avoid repeating the check for the SEV-ES. No functional change is intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> [Renamed iret intercept API of style svm_{clr,set}_iret_intercept()] Signed-off-by: Santosh Shukla <Santosh.Shukla@amd.com> Link: https://lore.kernel.org/r/20230227084016.3368-5-santosh.shukla@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22KVM: nSVM: Disable intercept of VINTR if saved L1 host RFLAGS.IF is 0Santosh Shukla
Disable intercept of virtual interrupts (used to detect interrupt windows) if the saved host (L1) RFLAGS.IF is '0', as the effective RFLAGS.IF for L1 interrupts will never be set while L2 is running (L2's RFLAGS.IF doesn't affect L1 IRQs when virtual interrupts are enabled). Suggested-by: Sean Christopherson <seanjc@google.com> Link: https://lkml.kernel.org/r/Y9hybI65So5X2LFg%40google.com Signed-off-by: Santosh Shukla <Santosh.Shukla@amd.com> Link: https://lore.kernel.org/r/20230227084016.3368-3-santosh.shukla@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22KVM: SVM: Use kvm_is_cr4_bit_set() to query SMAP/SMEP in "can emulate"Binbin Wu
Use kvm_is_cr4_bit_set() to query SMAP and SMEP when determining whether or not AMD's SMAP+SEV errata prevents KVM from emulating an instruction. This eliminates an implicit cast from ulong to bool and makes the code slightly more readable. Note, any overhead from making multiple calls to kvm_read_cr4_bits() is negligible, not to mention the code is question is encountered only in rare situations, i.e. is not a remotely hot path. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20230322045824.22970-4-binbin.wu@linux.intel.com [sean: keep local smap/smep variables, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22KVM: x86: Use boolean return value for is_{pae,pse,paging}()Binbin Wu
Convert is_{pae,pse,paging}() to use kvm_is_cr{0,4}_bit_set() and return bools. Returning an "int" requires not one, but two implicit casts, first from "unsigned long" to "int", and then again to a "bool". Both casts are more than a bit dangerous; the ulong=>int casts would drop a bit on 64-bit kernels _if_ the bits in question weren't in the lower 32 bits, and the int=>bool cast can result in false negatives/positives, e.g. see commit 0c928ff26bd6 ("KVM: SVM: Fix benign "bool vs. int" comparison in svm_set_cr0()"). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20230322045824.22970-3-binbin.wu@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22KVM: SVM: Fix benign "bool vs. int" comparison in svm_set_cr0()Sean Christopherson
Explicitly convert the return from is_paging() to a bool when comparing against old_paging, which is also a boolean. is_paging() sneakily uses kvm_read_cr0_bits() and returns an int, i.e. returns X86_CR0_PG or 0, not 1 or 0. Luckily, the bug is benign as it only results in a false positive, not a false negative, i.e. only causes a spurious refresh of CR4 when paging is enabled in both the old and new. Cc: Maxim Levitsky <mlevitsk@redhat.com> Fixes: c53bbe2145f5 ("KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case") Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16kvm: svm: Add IA32_FLUSH_CMD guest supportEmanuele Giuseppe Esposito
Expose IA32_FLUSH_CMD to the guest if the guest CPUID enumerates support for this MSR. As with IA32_PRED_CMD, permission for unintercepted writes to this MSR will be granted to the guest after the first non-zero write. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230201132905.549148-3-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company) - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer RISC-V: - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE - Correctly place the guest in S-mode after redirecting a trap to the guest - Redirect illegal instruction traps to guest - SBI PMU support for guest s390: - Sort out confusion between virtual and physical addresses, which currently are the same on s390 - A new ioctl that performs cmpxchg on guest memory - A few fixes x86: - Change tdp_mmu to a read-only parameter - Separate TDP and shadow MMU page fault paths - Enable Hyper-V invariant TSC control - Fix a variety of APICv and AVIC bugs, some of them real-world, some of them affecting architecurally legal but unlikely to happen in practice - Mark APIC timer as expired if its in one-shot mode and the count underflows while the vCPU task was being migrated - Advertise support for Intel's new fast REP string features - Fix a double-shootdown issue in the emergency reboot code - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM similar treatment to VMX - Update Xen's TSC info CPUID sub-leaves as appropriate - Add support for Hyper-V's extended hypercalls, where "support" at this point is just forwarding the hypercalls to userspace - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and MSR filters - One-off fixes and cleanups - Fix and cleanup the range-based TLB flushing code, used when KVM is running on Hyper-V - Add support for filtering PMU events using a mask. If userspace wants to restrict heavily what events the guest can use, it can now do so without needing an absurd number of filter entries - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU support is disabled - Add PEBS support for Intel Sapphire Rapids - Fix a mostly benign overflow bug in SEV's send|receive_update_data() - Move several SVM-specific flags into vcpu_svm x86 Intel: - Handle NMI VM-Exits before leaving the noinstr region - A few trivial cleanups in the VM-Enter flows - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support EPTP switching (or any other VM function) for L1 - Fix a crash when using eVMCS's enlighted MSR bitmaps Generic: - Clean up the hardware enable and initialization flow, which was scattered around multiple arch-specific hooks. Instead, just let the arch code call into generic code. Both x86 and ARM should benefit from not having to fight common KVM code's notion of how to do initialization - Account allocations in generic kvm_arch_alloc_vm() - Fix a memory leak if coalesced MMIO unregistration fails selftests: - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit the correct hypercall instruction instead of relying on KVM to patch in VMMCALL - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits) KVM: SVM: hyper-v: placate modpost section mismatch error KVM: x86/mmu: Make tdp_mmu_allowed static KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes KVM: arm64: nv: Filter out unsupported features from ID regs KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 KVM: arm64: nv: Allow a sysreg to be hidden from userspace only KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 KVM: arm64: nv: Handle SMCs taken from virtual EL2 KVM: arm64: nv: Handle trapped ERET from virtual EL2 KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 KVM: arm64: nv: Support virtual EL2 exceptions KVM: arm64: nv: Handle HCR_EL2.NV system register traps KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state KVM: arm64: nv: Add EL2 system registers to vcpu context KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set KVM: arm64: nv: Introduce nested virtualization VCPU feature KVM: arm64: Use the S2 MMU context to iterate over S2 table ...
2023-02-15Merge tag 'kvm-x86-svm-6.3' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM SVM changes for 6.3: - Fix a mostly benign overflow bug in SEV's send|receive_update_data() - Move the SVM-specific "host flags" into vcpu_svm (extracted from the vNMI enabling series) - A handful for fixes and cleanups
2023-01-31KVM: x86: Move HF_NMI_MASK and HF_IRET_MASK into "struct vcpu_svm"Maxim Levitsky
Move HF_NMI_MASK and HF_IRET_MASK (a.k.a. "waiting for IRET") out of the common "hflags" and into dedicated flags in "struct vcpu_svm". The flags are used only for the SVM and thus should not be in hflags. Tracking NMI masking in software isn't SVM specific, e.g. VMX has a similar flag (soft_vnmi_blocked), but that's much more of a hack as VMX can't intercept IRET, is useful only for ancient CPUs, i.e. will hopefully be removed at some point, and again the exact behavior is vendor specific and shouldn't ever be referenced in common code. converting VMX No functional change is intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Santosh Shukla <Santosh.Shukla@amd.com> Link: https://lore.kernel.org/r/20221129193717.513824-5-mlevitsk@redhat.com [sean: split from HF_GIF_MASK patch] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-26KVM: x86/pmu: Gate all "unimplemented MSR" prints on report_ignored_msrsSean Christopherson
Add helpers to print unimplemented MSR accesses and condition all such prints on report_ignored_msrs, i.e. honor userspace's request to not print unimplemented MSRs. Even though vcpu_unimpl() is ratelimited, printing can still be problematic, e.g. if a print gets stalled when host userspace is writing MSRs during live migration, an effective stall can result in very noticeable disruption in the guest. E.g. the profile below was taken while calling KVM_SET_MSRS on the PMU counters while the PMU was disabled in KVM. - 99.75% 0.00% [.] __ioctl - __ioctl - 99.74% entry_SYSCALL_64_after_hwframe do_syscall_64 sys_ioctl - do_vfs_ioctl - 92.48% kvm_vcpu_ioctl - kvm_arch_vcpu_ioctl - 85.12% kvm_set_msr_ignored_check svm_set_msr kvm_set_msr_common printk vprintk_func vprintk_default vprintk_emit console_unlock call_console_drivers univ8250_console_write serial8250_console_write uart_console_write Reported-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230124234905.3774678-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-25KVM: x86: Propagate the AMD Automatic IBRS feature to the guestKim Phillips
Add the AMD Automatic IBRS feature bit to those being propagated to the guest, and enable the guest EFER bit. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230124163319.2277355-9-kim.phillips@amd.com