summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/svm/svm.c
AgeCommit message (Collapse)Author
2020-06-11x86/entry: Convert Machine Check to IDTENTRY_ISTThomas Gleixner
Convert #MC to IDTENTRY_MCE: - Implement the C entry points with DEFINE_IDTENTRY_MCE - Emit the ASM stub with DECLARE_IDTENTRY_MCE - Remove the ASM idtentry in 64bit - Remove the open coded ASM entry code in 32bit - Fixup the XEN/PV code - Remove the old prototypes - Remove the error code from *machine_check_vector() as it is always 0 and not used by any of the functions it can point to. Fixup all the functions as well. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200505135314.334980426@linutronix.de
2020-06-08KVM: SVM: fix calls to is_interceptPaolo Bonzini
is_intercept takes an INTERCEPT_* constant, not SVM_EXIT_*; because of this, the compiler was removing the body of the conditionals, as if is_intercept returned 0. This unveils a latent bug: when clearing the VINTR intercept, int_ctl must also be changed in the L1 VMCB (svm->nested.hsave), just like the intercept itself is also changed in the L1 VMCB. Otherwise V_IRQ remains set and, due to the VINTR intercept being clear, we get a spurious injection of a vector 0 interrupt on the next L2->L1 vmexit. Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: x86: extend struct kvm_vcpu_pv_apf_data with token infoVitaly Kuznetsov
Currently, APF mechanism relies on the #PF abuse where the token is being passed through CR2. If we switch to using interrupts to deliver page-ready notifications we need a different way to pass the data. Extent the existing 'struct kvm_vcpu_pv_apf_data' with token information for page-ready notifications. While on it, rename 'reason' to 'flags'. This doesn't change the semantics as we only have reasons '1' and '2' and these can be treated as bit flags but KVM_PV_REASON_PAGE_READY is going away with interrupt based delivery making 'reason' name misleading. The newly introduced apf_put_user_ready() temporary puts both flags and token information, this will be changed to put token only when we switch to interrupt based notifications. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATEPaolo Bonzini
Similar to VMX, the state that is captured through the currently available IOCTLs is a mix of L1 and L2 state, dependent on whether the L2 guest was running at the moment when the process was interrupted to save its state. In particular, the SVM-specific state for nested virtualization includes the L1 saved state (including the interrupt flag), the cached L2 controls, and the GIF. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: nSVM: leave guest mode when clearing EFER.SVMEPaolo Bonzini
According to the AMD manual, the effect of turning off EFER.SVME while a guest is running is undefined. We make it leave guest mode immediately, similar to the effect of clearing the VMX bit in MSR_IA32_FEAT_CTL. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: nSVM: remove HF_HIF_MASKPaolo Bonzini
The L1 flags can be found in the save area of svm->nested.hsave, fish it from there so that there is one fewer thing to migrate. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: nSVM: remove HF_VINTR_MASKPaolo Bonzini
Now that the int_ctl field is stored in svm->nested.ctl.int_ctl, we can use it instead of vcpu->arch.hflags to check whether L2 is running in V_INTR_MASKING mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: nSVM: synthesize correct EXITINTINFO on vmexitPaolo Bonzini
This bit was added to nested VMX right when nested_run_pending was introduced, but it is not yet there in nSVM. Since we can have pending events that L0 injected directly into L2 on vmentry, we have to transfer them into L1's queue. For this to work, one important change is required: svm_complete_interrupts (which clears the "injected" fields from the previous VMRUN, and updates them from svm->vmcb's EXITINTINFO) must be placed before we inject the vmexit. This is not too scary though; VMX even does it in vmx_vcpu_run. While at it, the nested_vmexit_inject tracepoint is moved towards the end of nested_svm_vmexit. This ensures that the synthesized EXITINTINFO is visible in the trace. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: nSVM: extract svm_set_gifPaolo Bonzini
Extract the code that is needed to implement CLGI and STGI, so that we can run it from VMRUN and vmexit (and in the future, KVM_SET_NESTED_STATE). Skip the request for KVM_REQ_EVENT unless needed, subsuming the evaluate_pending_interrupts optimization that is found in enter_svm_guest_mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: nSVM: remove unnecessary ifPaolo Bonzini
kvm_vcpu_apicv_active must be false when nested virtualization is enabled, so there is no need to check it in clgi_interception. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: nSVM: synchronize VMCB controls updated by the processor on every vmexitPaolo Bonzini
The control state changes on every L2->L0 vmexit, and we will have to serialize it in the nested state. So keep it up to date in svm->nested.ctl and just copy them back to the nested VMCB in nested_svm_vmexit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: nSVM: restore clobbered INT_CTL fields after clearing VINTRPaolo Bonzini
Restore the INT_CTL value from the guest's VMCB once we've stopped using it, so that virtual interrupts can be injected as requested by L1. V_TPR is up-to-date however, and it can change if the guest writes to CR8, so keep it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: nSVM: save all control fields in svm->nestedPaolo Bonzini
In preparation for nested SVM save/restore, store all data that matters from the VMCB control area into svm->nested. It will then become part of the nested SVM state that is saved by KVM_SET_NESTED_STATE and restored by KVM_GET_NESTED_STATE, just like the cached vmcs12 for nVMX. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: nSVM: move map argument out of enter_svm_guest_modePaolo Bonzini
Unmapping the nested VMCB in enter_svm_guest_mode is a bit of a wart, since the map argument is not used elsewhere in the function. There are just two callers, and those are also the place where kvm_vcpu_map is called, so it is cleaner to unmap there. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-28KVM: SVM: always update CR3 in VMCBPaolo Bonzini
svm_load_mmu_pgd is delaying the write of GUEST_CR3 to prepare_vmcs02 as an optimization, but this is only correct before the nested vmentry. If userspace is modifying CR3 with KVM_SET_SREGS after the VM has already been put in guest mode, the value of CR3 will not be updated. Remove the optimization, which almost never triggers anyway. This was was added in commit 689f3bf21628 ("KVM: x86: unify callbacks to load paging root", 2020-03-16) just to keep the two vendor-specific modules closer, but we'll fix VMX too. Fixes: 689f3bf21628 ("KVM: x86: unify callbacks to load paging root") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-28KVM: nSVM: remove exit_requiredPaolo Bonzini
All events now inject vmexits before vmentry rather than after vmexit. Therefore, exit_required is not set anymore and we can remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-28KVM: nSVM: inject exceptions via svm_check_nested_eventsPaolo Bonzini
This allows exceptions injected by the emulator to be properly delivered as vmexits. The code also becomes simpler, because we can just let all L0-intercepted exceptions go through the usual path. In particular, our emulation of the VMX #DB exit qualification is very much simplified, because the vmexit injection path can use kvm_deliver_exception_payload to update DR6. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-28KVM: x86: enable event window in inject_pending_eventPaolo Bonzini
In case an interrupt arrives after nested.check_events but before the call to kvm_cpu_has_injectable_intr, we could end up enabling the interrupt window even if the interrupt is actually going to be a vmexit. This is useless rather than harmful, but it really complicates reasoning about SVM's handling of the VINTR intercept. We'd like to never bother with the VINTR intercept if V_INTR_MASKING=1 && INTERCEPT_INTR=1, because in that case there is no interrupt window and we can just exit the nested guest whenever we want. This patch moves the opening of the interrupt window inside inject_pending_event. This consolidates the check for pending interrupt/NMI/SMI in one place, and makes KVM's usage of immediate exits more consistent, extending it beyond just nested virtualization. There are two functional changes here. They only affect corner cases, but overall they simplify the inject_pending_event. - re-injection of still-pending events will also use req_immediate_exit instead of using interrupt-window intercepts. This should have no impact on performance on Intel since it simply replaces an interrupt-window or NMI-window exit for a preemption-timer exit. On AMD, which has no equivalent of the preemption time, it may incur some overhead but an actual effect on performance should only be visible in pathological cases. - kvm_arch_interrupt_allowed and kvm_vcpu_has_events will return true if an interrupt, NMI or SMI is blocked by nested_run_pending. This makes sense because entering the VM will allow it to make progress and deliver the event. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27KVM: x86: Take an unsigned 32-bit int for has_emulated_msr()'s indexSean Christopherson
Take a u32 for the index in has_emulated_msr() to match hardware, which treats MSR indices as unsigned 32-bit values. Functionally, taking a signed int doesn't cause problems with the current code base, but could theoretically cause problems with 32-bit KVM, e.g. if the index were checked via a less-than statement, which would evaluate incorrectly for MSR indices with bit 31 set. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200218234012.7110-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27Merge branch 'kvm-master' into HEADPaolo Bonzini
Merge AMD fixes before doing more development work.
2020-05-27KVM: x86: simplify is_mmio_sptePaolo Bonzini
We can simply look at bits 52-53 to identify MMIO entries in KVM's page tables. Therefore, there is no need to pass a mask to kvm_mmu_set_mmio_spte_mask. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15KVM: SVM: Remove unnecessary V_IRQ unsettingSuravee Suthikulpanit
This has already been handled in the prior call to svm_clear_vintr(). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <1588771076-73790-5-git-send-email-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15KVM: SVM: Merge svm_enable_vintr into svm_set_vintrSuravee Suthikulpanit
Code clean up and remove unnecessary intercept check for INTERCEPT_VINTR. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <1588771076-73790-4-git-send-email-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15KVM: X86: Introduce more exit_fastpath_completion enum valuesWanpeng Li
Adds a fastpath_t typedef since enum lines are a bit long, and replace EXIT_FASTPATH_SKIP_EMUL_INS with two new exit_fastpath_completion enum values. - EXIT_FASTPATH_EXIT_HANDLED kvm will still go through it's full run loop, but it would skip invoking the exit handler. - EXIT_FASTPATH_REENTER_GUEST complete fastpath, guest can be re-entered without invoking the exit handler or going back to vcpu_run Tested-by: Haiwei Li <lihaiwei@tencent.com> Cc: Haiwei Li <lihaiwei@tencent.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1588055009-12677-4-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15KVM: x86/mmu: Drop KVM's hugepage enums in favor of the kernel's enumsSean Christopherson
Replace KVM's PT_PAGE_TABLE_LEVEL, PT_DIRECTORY_LEVEL and PT_PDPE_LEVEL with the kernel's PG_LEVEL_4K, PG_LEVEL_2M and PG_LEVEL_1G. KVM's enums are borderline impossible to remember and result in code that is visually difficult to audit, e.g. if (!enable_ept) ept_lpage_level = 0; else if (cpu_has_vmx_ept_1g_page()) ept_lpage_level = PT_PDPE_LEVEL; else if (cpu_has_vmx_ept_2m_page()) ept_lpage_level = PT_DIRECTORY_LEVEL; else ept_lpage_level = PT_PAGE_TABLE_LEVEL; versus if (!enable_ept) ept_lpage_level = 0; else if (cpu_has_vmx_ept_1g_page()) ept_lpage_level = PG_LEVEL_1G; else if (cpu_has_vmx_ept_2m_page()) ept_lpage_level = PG_LEVEL_2M; else ept_lpage_level = PG_LEVEL_4K; No functional change intended. Suggested-by: Barret Rhoden <brho@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200428005422.4235-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: VMX: Add proper cache tracking for CR0Sean Christopherson
Move CR0 caching into the standard register caching mechanism in order to take advantage of the availability checks provided by regs_avail. This avoids multiple VMREADs in the (uncommon) case where kvm_read_cr0() is called multiple times in a single VM-Exit, and more importantly eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading CR0, and squashes the confusing naming discrepancy of "cache_reg" vs. "decache_cr0_guest_bits". No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200502043234.12481-8-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: VMX: Add proper cache tracking for CR4Sean Christopherson
Move CR4 caching into the standard register caching mechanism in order to take advantage of the availability checks provided by regs_avail. This avoids multiple VMREADs and retpolines (when configured) during nested VMX transitions as kvm_read_cr4_bits() is invoked multiple times on each transition, e.g. when stuffing CR0 and CR3. As an added bonus, this eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading CR4, and squashes the confusing naming discrepancy of "cache_reg" vs. "decache_cr4_guest_bits". No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200502043234.12481-7-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: x86: Save L1 TSC offset in 'struct kvm_vcpu_arch'Sean Christopherson
Save L1's TSC offset in 'struct kvm_vcpu_arch' and drop the kvm_x86_ops hook read_l1_tsc_offset(). This avoids a retpoline (when configured) when reading L1's effective TSC, which is done at least once on every VM-Exit. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200502043234.12481-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: x86: handle wrap around 32-bit address spacePaolo Bonzini
KVM is not handling the case where EIP wraps around the 32-bit address space (that is, outside long mode). This is needed both in vmx.c and in emulate.c. SVM with NRIPS is okay, but it can still print an error to dmesg due to integer overflow. Reported-by: Nick Peterson <everdox@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: x86: Replace late check_nested_events() hack with more precise fixPaolo Bonzini
Add an argument to interrupt_allowed and nmi_allowed, to checking if interrupt injection is blocked. Use the hook to handle the case where an interrupt arrives between check_nested_events() and the injection logic. Drop the retry of check_nested_events() that hack-a-fixed the same condition. Blocking injection is also a bit of a hack, e.g. KVM should do exiting and non-exiting interrupt processing in a single pass, but it's a more precise hack. The old comment is also misleading, e.g. KVM_REQ_EVENT is purely an optimization, setting it on every run loop (which KVM doesn't do) should not affect functionality, only performance. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200423022550.15113-13-sean.j.christopherson@intel.com> [Extend to SVM, add SMI and NMI. Even though NMI and SMI cannot come asynchronously right now, making the fix generic is easy and removes a special case. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: nSVM: Report interrupts as allowed when in L2 and exit-on-interrupt is setPaolo Bonzini
Report interrupts as allowed when the vCPU is in L2 and L2 is being run with exit-on-interrupts enabled and EFLAGS.IF=1 (either on the host or on the guest according to VINTR). Interrupts are always unblocked from L1's perspective in this case. While moving nested_exit_on_intr to svm.h, use INTERCEPT_INTR properly instead of assuming it's zero (which it is of course). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: SVM: Split out architectural interrupt/NMI/SMI blocking checksPaolo Bonzini
Move the architectural (non-KVM specific) interrupt/NMI/SMI blocking checks to a separate helper so that they can be used in a future patch by svm_check_nested_events(). No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: nSVM: Move SMI vmexit handling to svm_check_nested_events()Paolo Bonzini
Unlike VMX, SVM allows a hypervisor to take a SMI vmexit without having any special SMM-monitor enablement sequence. Therefore, it has to be handled like interrupts and NMIs. Check for an unblocked SMI in svm_check_nested_events() so that pending SMIs are correctly prioritized over IRQs and NMIs when the latter events will trigger VM-Exit. Note that there is no need to test explicitly for SMI vmexits, because guests always runs outside SMM and therefore can never get an SMI while they are blocked. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: nSVM: Report NMIs as allowed when in L2 and Exit-on-NMI is setPaolo Bonzini
Report NMIs as allowed when the vCPU is in L2 and L2 is being run with Exit-on-NMI enabled, as NMIs are always unblocked from L1's perspective in this case. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: x86: replace is_smm checks with kvm_x86_ops.smi_allowedPaolo Bonzini
Do not hardcode is_smm so that all the architectural conditions for blocking SMIs are listed in a single place. Well, in two places because this introduces some code duplication between Intel and AMD. This ensures that nested SVM obeys GIF in kvm_vcpu_has_events. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: x86: Make return for {interrupt_nmi,smi}_allowed() a bool instead of intSean Christopherson
Return an actual bool for kvm_x86_ops' {interrupt_nmi}_allowed() hook to better reflect the return semantics, and to avoid creating an even bigger mess when the related VMX code is refactored in upcoming patches. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200423022550.15113-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: SVM: Implement check_nested_events for NMICathy Avery
Migrate nested guest NMI intercept processing to new check_nested_events. Signed-off-by: Cathy Avery <cavery@redhat.com> Message-Id: <20200414201107.22952-2-cavery@redhat.com> [Reorder clauses as NMIs have higher priority than IRQs; inject immediate vmexit as is now done for IRQ vmexits. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: SVM: introduce nested_run_pendingPaolo Bonzini
We want to inject vmexits immediately from svm_check_nested_events, so that the interrupt/NMI window requests happen in inject_pending_event right after it returns. This however has the same issue as in vmx_check_nested_events, so introduce a nested_run_pending flag with the exact same purpose of delaying vmexit injection after the vmentry. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13Merge branch 'kvm-amd-fixes' into HEADPaolo Bonzini
2020-05-08KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6Paolo Bonzini
There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the different handling of DR6 on intercepted #DB exceptions on Intel and AMD. On Intel, #DB exceptions transmit the DR6 value via the exit qualification field of the VMCS, and the exit qualification only contains the description of the precise event that caused a vmexit. On AMD, instead the DR6 field of the VMCB is filled in as if the #DB exception was to be injected into the guest. This has two effects when guest debugging is in use: * the guest DR6 is clobbered * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather than just the last one that happened (the testcase in the next patch covers this issue). This patch fixes both issues by emulating, so to speak, the Intel behavior on AMD processors. The important observation is that (after the previous patches) the VMCB value of DR6 is only ever observable from the guest is KVM_DEBUGREG_WONT_EXIT is set. Therefore we can actually set vmcb->save.dr6 to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it will be if guest debugging is enabled. Therefore it is possible to enter the guest with an all-zero DR6, reconstruct the #DB payload from the DR6 we get at exit time, and let kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6. Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT is set, but this is harmless. This may not be the most optimized way to deal with this, but it is simple and, being confined within SVM code, it gets rid of the set_dr6 callback and kvm_update_dr6. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6Paolo Bonzini
kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the second argument. Ensure that the VMCB value is synchronized to vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so that the current value of DR6 is always available in vcpu->arch.dr6. The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-04KVM: SVM: fill in kvm_run->debug.arch.dr[67]Paolo Bonzini
The corresponding code was added for VMX in commit 42dbaa5a057 ("KVM: x86: Virtualize debug registers, 2008-12-15) but never for AMD. Fix this. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-23KVM: x86: move nested-related kvm_x86_ops to a separate structPaolo Bonzini
Clean up some of the patching of kvm_x86_ops, by moving kvm_x86_ops related to nested virtualization into a separate struct. As a result, these ops will always be non-NULL on VMX. This is not a problem: * check_nested_events is only called if is_guest_mode(vcpu) returns true * get_nested_state treats VMXOFF state the same as nested being disabled * set_nested_state fails if you attempt to set nested state while nesting is disabled * nested_enable_evmcs could already be called on a CPU without VMX enabled in CPUID. * nested_get_evmcs_version was fixed in the previous patch Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21KVM: SVM: avoid infinite loop on NPF from bad addressPaolo Bonzini
When a nested page fault is taken from an address that does not have a memslot associated to it, kvm_mmu_do_page_fault returns RET_PF_EMULATE (via mmu_set_spte) and kvm_mmu_page_fault then invokes svm_need_emulation_on_page_fault. The default answer there is to return false, but in this case this just causes the page fault to be retried ad libitum. Since this is not a fast path, and the only other case where it is taken is an erratum, just stick a kvm_vcpu_gfn_to_memslot check in there to detect the common case where the erratum is not happening. This fixes an infinite loop in the new set_memory_region_test. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21KVM: X86: Improve latency for single target IPI fastpathWanpeng Li
IPI and Timer cause the main MSRs write vmexits in cloud environment observation, let's optimize virtual IPI latency more aggressively to inject target IPI as soon as possible. Running kvm-unit-tests/vmexit.flat IPI testing on SKX server, disable adaptive advance lapic timer and adaptive halt-polling to avoid the interference, this patch can give another 7% improvement. w/o fastpath -> x86.c fastpath 4238 -> 3543 16.4% x86.c fastpath -> vmx.c fastpath 3543 -> 3293 7% w/o fastpath -> vmx.c fastpath 4238 -> 3293 22.3% Cc: Haiwei Li <lihaiwei@tencent.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410174703.1138-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21KVM: SVM: Use do_machine_check to pass MCE to the hostUros Bizjak
Use do_machine_check instead of INT $12 to pass MCE to the host, the same approach VMX uses. On a related note, there is no reason to limit the use of do_machine_check to 64 bit targets, as is currently done for VMX. MCE handling works for both target families. The patch is only compile tested, for both, 64 and 32 bit targets, someone should test the passing of the exception by injecting some MCEs into the guest. For future non-RFC patch, kvm_machine_check should be moved to some appropriate header file. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200411153627.3474710-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASIDSean Christopherson
Add KVM_REQ_TLB_FLUSH_CURRENT to allow optimized TLB flushing of VMX's EPTP/VPID contexts[*] from the KVM MMU and/or in a deferred manner, e.g. to flush L2's context during nested VM-Enter. Convert KVM_REQ_TLB_FLUSH to KVM_REQ_TLB_FLUSH_CURRENT in flows where the flush is directly associated with vCPU-scoped instruction emulation, i.e. MOV CR3 and INVPCID. Add a comment in vmx_vcpu_load_vmcs() above its KVM_REQ_TLB_FLUSH to make it clear that it deliberately requests a flush of all contexts. Service any pending flush request on nested VM-Exit as it's possible a nested VM-Exit could occur after requesting a flush for L2. Add the same logic for nested VM-Enter even though it's _extremely_ unlikely for flush to be pending on nested VM-Enter, but theoretically possible (in the future) due to RSM (SMM) emulation. [*] Intel also has an Address Space Identifier (ASID) concept, e.g. EPTP+VPID+PCID == ASID, it's just not documented in the SDM because the rules of invalidation are different based on which piece of the ASID is being changed, i.e. whether the EPTP, VPID, or PCID context must be invalidated. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-25-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21KVM: x86: Rename ->tlb_flush() to ->tlb_flush_all()Sean Christopherson
Rename ->tlb_flush() to ->tlb_flush_all() in preparation for adding a new hook to flush only the current ASID/context. Opportunstically replace the comment in vmx_flush_tlb() that explains why it flushes all EPTP/VPID contexts with a comment explaining why it unconditionally uses INVEPT when EPT is enabled. I.e. rely on the "all" part of the name to clarify why it does global INVEPT/INVVPID. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-23-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21KVM: SVM: Document the ASID logic in svm_flush_tlb()Sean Christopherson
Add a comment in svm_flush_tlb() to document why it flushes only the current ASID, even when it is invoked when flushing remote TLBs. Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-22-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21KVM: SVM: Wire up ->tlb_flush_guest() directly to svm_flush_tlb()Sean Christopherson
Use svm_flush_tlb() directly for kvm_x86_ops->tlb_flush_guest() now that the @invalidate_gpa param to ->tlb_flush() is gone, i.e. the wrapper for ->tlb_flush_guest() is no longer necessary. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-18-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>