summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/lapic.c
AgeCommit message (Collapse)Author
2023-01-13KVM: x86: Move APIC access page helper to common x86 codeSean Christopherson
Move the APIC access page allocation helper function to common x86 code, the allocation routine is virtually identical between APICv (VMX) and AVIC (SVM). Keep APICv's gfn_to_page() + put_page() sequence, which verifies that a backing page can be allocated, i.e. that the system isn't under heavy memory pressure. Forcing the backing page to be populated isn't strictly necessary, but skipping the effective prefetch only delays the inevitable. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Handle APICv updates for APIC "mode" changes via requestSean Christopherson
Use KVM_REQ_UPDATE_APICV to react to APIC "mode" changes, i.e. to handle the APIC being hardware enabled/disabled and/or x2APIC being toggled. There is no need to immediately update APICv state, the only requirement is that APICv be updating prior to the next VM-Enter. Making a request will allow piggybacking KVM_REQ_UPDATE_APICV to "inhibit" the APICv memslot when x2APIC is enabled. Doing that directly from kvm_lapic_set_base() isn't feasible as KVM's SRCU must not be held when modifying memslots (to avoid deadlock), and may or may not be held when kvm_lapic_set_base() is called, i.e. KVM can't do the right thing without tracking that is rightly buried behind CONFIG_PROVE_RCU=y. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit IDSean Christopherson
Truncate the vcpu_id, a.k.a. x2APIC ID, to an 8-bit value when comparing it against the xAPIC ID to avoid false positives (sort of) on systems with >255 CPUs, i.e. with IDs that don't fit into a u8. The intent of APIC_ID_MODIFIED is to inhibit APICv/AVIC when the xAPIC is changed from it's original value, The mismatch isn't technically a false positive, as architecturally the xAPIC IDs do end up being aliased in this scenario, and neither APICv nor AVIC correctly handles IPI virtualization when there is aliasing. However, KVM already deliberately does not honor the aliasing behavior that results when an x2APIC ID gets truncated to an xAPIC ID. I.e. the resulting APICv/AVIC behavior is aligned with KVM's existing behavior when KVM's x2APIC hotplug hack is effectively enabled. If/when KVM provides a way to disable the hotplug hack, APICv/AVIC can piggyback whatever logic disables the optimized APIC map (which is what provides the hotplug hack), i.e. so that KVM's optimized map and APIC virtualization yield the same behavior. For now, fix the immediate problem of APIC virtualization being disabled for large VMs, which is a much more pressing issue than ensuring KVM honors architectural behavior for APIC ID aliasing. Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is disabledSean Christopherson
Don't inhibit APICv/AVIC due to an xAPIC ID mismatch if the APIC is hardware disabled. The ID cannot be consumed while the APIC is disabled, and the ID is guaranteed to be set back to the vcpu_id when the APIC is hardware enabled (architectural behavior correctly emulated by KVM). Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Purge "highest ISR" cache when updating APICv stateSean Christopherson
Purge the "highest ISR" cache when updating APICv state on a vCPU. The cache must not be used when APICv is active as hardware may emulate EOIs (and other operations) without exiting to KVM. This fixes a bug where KVM will effectively block IRQs in perpetuity due to the "highest ISR" never getting reset if APICv is activated on a vCPU while an IRQ is in-service. Hardware emulates the EOI and KVM never gets a chance to update its cache. Fixes: b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function") Cc: stable@vger.kernel.org Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Blindly get current x2APIC reg value on "nodecode write" trapsSean Christopherson
When emulating a x2APIC write in response to an APICv/AVIC trap, get the the written value from the vAPIC page without checking that reads are allowed for the target register. AVIC can generate trap-like VM-Exits on writes to EOI, and so KVM needs to get the written value from the backing page without running afoul of EOI's write-only behavior. Alternatively, EOI could be special cased to always write '0', e.g. so that the sanity check could be preserved, but x2APIC on AMD is actually supposed to disallow non-zero writes (not emulated by KVM), and the sanity check was a byproduct of how the KVM code was written, i.e. wasn't added to guard against anything in particular. Fixes: 70c8327c11c6 ("KVM: x86: Bug the VM if an accelerated x2APIC trap occurs on a "bad" reg") Fixes: 1bd9dfec9fd4 ("KVM: x86: Do not block APIC write for non ICR registers") Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29KVM: x86: Unify pr_fmt to use module name for all KVM modulesSean Christopherson
Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks use consistent formatting across common x86, Intel, and AMD code. In addition to providing consistent print formatting, using KBUILD_MODNAME, e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and SGX and ...) as technologies without generating weird messages, and without causing naming conflicts with other kernel code, e.g. "SEV: ", "tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems. Opportunistically move away from printk() for prints that need to be modified anyways, e.g. to drop a manual "kvm: " prefix. Opportunistically convert a few SGX WARNs that are similarly modified to WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good that they would fire repeatedly and spam the kernel log without providing unique information in each print. Note, defining pr_fmt yields undesirable results for code that uses KVM's printk wrappers, e.g. vcpu_unimpl(). But, that's a pre-existing problem as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's wrappers is relatively limited in KVM x86 code. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Paul Durrant <paul@xen.org> Message-Id: <20221130230934.1014142-35-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-02KVM: x86: remove unnecessary exportsPaolo Bonzini
Several symbols are not used by vendor modules but still exported. Removing them ensures that new coupling between kvm.ko and kvm-*.ko is noticed and reviewed. Co-developed-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-02KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itselfYuan ZhaoXiong
When a VM reboots itself, the reset process will result in an ioctl(KVM_SET_LAPIC, ...) to disable x2APIC mode and set the xAPIC id of the vCPU to its default value, which is the vCPU id. That will be handled in KVM as follows: kvm_vcpu_ioctl_set_lapic kvm_apic_set_state kvm_lapic_set_base => disable X2APIC mode kvm_apic_state_fixup kvm_lapic_xapic_id_updated kvm_xapic_id(apic) != apic->vcpu->vcpu_id kvm_set_apicv_inhibit(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s)) => update APIC_ID When kvm_apic_set_state invokes kvm_lapic_set_base to disable x2APIC mode, the old 32-bit x2APIC id is still present rather than the 8-bit xAPIC id. kvm_lapic_xapic_id_updated will set the APICV_INHIBIT_REASON_APIC_ID_MODIFIED bit and disable APICv/x2AVIC. Instead, kvm_lapic_xapic_id_updated must be called after APIC_ID is changed. In fact, this fixes another small issue in the code in that potential changes to a vCPU's xAPIC ID need not be tracked for KVM_GET_LAPIC. Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") Signed-off-by: Yuan ZhaoXiong <yuanzhaoxiong@baidu.com> Message-Id: <1669984574-32692-1-git-send-email-yuanzhaoxiong@baidu.com> Cc: stable@vger.kernel.org Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: x86: start moving SMM-related functions to new filesPaolo Bonzini
Create a new header and source with code related to system management mode emulation. Entry and exit will move there too; for now, opportunistically rename put_smstate to PUT_SMSTATE while moving it to smm.h, and adjust the SMM state saving code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220929172016.319443-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested eventsSean Christopherson
Don't snapshot pending INIT/SIPI events prior to checking nested events, architecturally there's nothing wrong with KVM processing (dropping) a SIPI that is received immediately after synthesizing a VM-Exit. Taking and consuming the snapshot makes the flow way more subtle than it needs to be, e.g. nVMX consumes/clears events that trigger VM-Exit (INIT/SIPI), and so at first glance it appears that KVM is double-dipping on pending INITs and SIPIs. But that's not the case because INIT is blocked unconditionally in VMX root mode the CPU cannot be in wait-for_SIPI after VM-Exit, i.e. the paths that truly consume the snapshot are unreachable if apic->pending_events is modified by kvm_check_nested_events(). nSVM is a similar story as GIF is cleared by the CPU on VM-Exit; INIT is blocked regardless of whether or not it was pending prior to VM-Exit. Drop the snapshot logic so that a future fix doesn't create weirdness when kvm_vcpu_running()'s call to kvm_check_nested_events() is moved to vcpu_block(). In that case, kvm_check_nested_events() will be called immediately before kvm_apic_accept_events(), which raises the obvious question of why that change doesn't break the snapshot logic. Note, there is a subtle functional change. Previously, KVM would clear pending SIPIs if and only SIPI was pending prior to VM-Exit, whereas now KVM clears pending SIPI unconditionally if INIT+SIPI are blocked. The latter is architecturally allowed, as SIPI is ignored if the CPU is not in wait-for-SIPI mode (arguably, KVM should be even more aggressive in dropping SIPIs). It is software's responsibility to ensure the SIPI is delivered, i.e. software shouldn't be firing INIT-SIPI at a CPU until it knows with 100% certaining that the target CPU isn't in VMX root mode. Furthermore, the existing code is extra weird as SIPIs that arrive after VM-Exit _are_ dropped if there also happened to be a pending SIPI before VM-Exit. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowedSean Christopherson
Rename and invert kvm_vcpu_latch_init() to kvm_apic_init_sipi_allowed() so as to match the behavior of {interrupt,nmi,smi}_allowed(), and expose the helper so that it can be used by kvm_vcpu_has_events() to determine whether or not an INIT or SIPI is pending _and_ can be taken immediately. Opportunistically replaced usage of the "latch" terminology with "blocked" and/or "allowed", again to align with KVM's terminology used for all other event types. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-10KVM: x86: Bug the VM if an accelerated x2APIC trap occurs on a "bad" regSean Christopherson
Bug the VM if retrieving the x2APIC MSR/register while processing an accelerated vAPIC trap VM-Exit fails. In theory it's impossible for the lookup to fail as hardware has already validated the register, but bugs happen, and not checking the result of kvm_lapic_msr_read() would result in consuming the uninitialized "val" if a KVM or hardware bug occurs. Fixes: 1bd9dfec9fd4 ("KVM: x86: Do not block APIC write for non ICR registers") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220804235028.1766253-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-28KVM: x86: Do not block APIC write for non ICR registersSuravee Suthikulpanit
The commit 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode") introduces logic to prevent APIC write for offset other than ICR in kvm_apic_write_nodecode() function. This breaks x2AVIC support, which requires KVM to trap and emulate x2APIC MSR writes. Therefore, removes the warning and modify to logic to allow MSR write. Fixes: 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode") Cc: Zeng Guang <guang.zeng@intel.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220725053356.4275-1-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: x86: Check target, not vCPU's x2APIC ID, when applying hotplug hackSean Christopherson
When applying the hotplug hack to match x2APIC IDs for vCPUs in xAPIC mode, check the target APID ID for being unaddressable in xAPIC mode instead of checking the vCPU's x2APIC ID, and in that case proceed as if apic_x2apic_mode(vcpu) were true. Functionally, it does not matter whether you compare kvm_x2apic_id(apic) or mda with 0xff, since the two values are then checked for equality. But in isolation, checking the x2APIC ID takes an unnecessary dependency on the x2APIC ID being read-only (which isn't strictly true on AMD CPUs, and is difficult to document as well); it also requires KVM to fallthrough and check the xAPIC ID as well to deal with a writable xAPIC ID, whereas the xAPIC ID _can't_ match a target ID greater than 0xff. Opportunistically reword the comment to call out the various subtleties, and to fix a typo reported by Zhang Jiaming. No functional change intended. Cc: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-08KVM: x86: Fix handling of APIC LVT updates when userspace changes MCG_CAPSean Christopherson
Add a helper to update KVM's in-kernel local APIC in response to MCG_CAP being changed by userspace to fix multiple bugs. First and foremost, KVM needs to check that there's an in-kernel APIC prior to dereferencing vcpu->arch.apic. Beyond that, any "new" LVT entries need to be masked, and the APIC version register needs to be updated as it reports out the number of LVT entries. Fixes: 4b903561ec49 ("KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.") Reported-by: syzbot+8cdad6430c24f396f158@syzkaller.appspotmail.com Cc: Siddh Raman Pant <code@siddh.me> Cc: Jue Wang <juew@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-08KVM: x86: Initialize number of APIC LVT entries during APIC creationSean Christopherson
Initialize the number of LVT entries during APIC creation, else the field will be incorrectly left '0' if userspace never invokes KVM_X86_SETUP_MCE. Add and use a helper to calculate the number of entries even though MCG_CMCI_P is not set by default in vcpu->arch.mcg_cap. Relying on that to always be true is unnecessarily risky, and subtle/confusing as well. Fixes: 4b903561ec49 ("KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.") Reported-by: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Jue Wang <juew@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-06-24KVM: x86: Deactivate APICv on vCPU with APIC disabledSuravee Suthikulpanit
APICv should be deactivated on vCPU that has APIC disabled. Therefore, call kvm_vcpu_update_apicv() when changing APIC mode, and add additional check for APIC disable mode when determine APICV activation, Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220519102709.24125-9-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: lapic: Rename [GET/SET]_APIC_DEST_FIELD to [GET/SET]_XAPIC_DEST_FIELDSuravee Suthikulpanit
To signify that the macros only support 8-bit xAPIC destination ID. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220519102709.24125-3-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.Jue Wang
This patch calculates the number of lvt entries as part of KVM_X86_MCE_SETUP conditioned on the presence of MCG_CMCI_P bit in MCG_CAP and stores result in kvm_lapic. It translats from APIC_LVTx register to index in lapic_lvt_entry enum. It extends the APIC_LVTx macro as well as other lapic write/reset handling etc to support Corrected Machine Check Interrupt. Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-5-juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: Add APIC_LVTx() macro.Jue Wang
An APIC_LVTx macro is introduced to calcualte the APIC_LVTx register offset based on the index in the lapic_lvt_entry enum. Later patches will extend the APIC_LVTx macro to support the APIC_LVTCMCI register in order to implement Corrected Machine Check Interrupt signaling. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-4-juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: Fill apic_lvt_mask with enums / explicit entries.Jue Wang
This patch defines a lapic_lvt_entry enum used as explicit indices to the apic_lvt_mask array. In later patches a LVT_CMCI will be added to implement the Corrected Machine Check Interrupt signaling. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-3-juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: Make APIC_VERSION capture only the magic 0x14UL.Jue Wang
Refactor APIC_VERSION so that the maximum number of LVT entries is inserted at runtime rather than compile time. This will be used in a subsequent commit to expose the LVT CMCI Register to VMs that support Corrected Machine Check error counting/signaling (IA32_MCG_CAP.MCG_CMCI_P=1). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-2-juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20KVM: x86: Move "apicv_active" into "struct kvm_lapic"Sean Christopherson
Move the per-vCPU apicv_active flag into KVM's local APIC instance. APICv is fully dependent on an in-kernel local APIC, but that's not at all clear when reading the current code due to the flag being stored in the generic kvm_vcpu_arch struct. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614230548.3852141-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20KVM: x86: Drop @vcpu parameter from kvm_x86_ops.hwapic_isr_update()Sean Christopherson
Drop the unused @vcpu parameter from hwapic_isr_update(). AMD/AVIC is unlikely to implement the helper, and VMX/APICv doesn't need the vCPU as it operates on the current VMCS. The result is somewhat odd, but allows for a decent amount of (future) cleanup in the APIC code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614230548.3852141-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09Merge branch 'kvm-5.20-early'Paolo Bonzini
s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to show tests x86: * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Rewrite gfn-pfn cache refresh * Refuse starting the module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit
2022-06-09KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC baseMaxim Levitsky
Neither of these settings should be changed by the guest and it is a burden to support it in the acceleration code, so just inhibit this code instead. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86: Introduce "struct kvm_caps" to track misc caps/settingsSean Christopherson
Add kvm_caps to hold a variety of capabilites and defaults that aren't handled by kvm_cpu_caps because they aren't CPUID bits in order to reduce the amount of boilerplate code required to add a new feature. The vast majority (all?) of the caps interact with vendor code and are written only during initialization, i.e. should be tagged __read_mostly, declared extern in x86.h, and exported. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220524135624.22988-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC modeZeng Guang
Upcoming Intel CPUs will support virtual x2APIC MSR writes to the vICR, i.e. will trap and generate an APIC-write VM-Exit instead of intercepting the WRMSR. Add support for handling "nodecode" x2APIC writes, which were previously impossible. Note, x2APIC MSR writes are 64 bits wide. Signed-off-by: Zeng Guang <guang.zeng@intel.com> Message-Id: <20220419153516.11739-1-guang.zeng@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-25KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timerWanpeng Li
The timer is disarmed when switching between TSC deadline and other modes; however, the pending timer is still in-flight, so let's accurately remove any traces of the previous mode. Fixes: 4427593258 ("KVM: x86: thoroughly disarm LAPIC timer around TSC deadline switch") Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-25KVM: LAPIC: Trace LAPIC timer expiration on every vmentryWanpeng Li
In commit ec0671d5684a ("KVM: LAPIC: Delay trace_kvm_wait_lapic_expire tracepoint to after vmexit", 2019-06-04), trace_kvm_wait_lapic_expire was moved after guest_exit_irqoff() because invoking tracepoints within kvm_guest_enter/kvm_guest_exit caused a lockdep splat. These days this is not necessary, because commit 87fa7f3e98a1 ("x86/kvm: Move context tracking where it belongs", 2020-07-09) restricted the RCU extended quiescent state to be closer to vmentry/vmexit. Moving the tracepoint back to __kvm_wait_lapic_expire is more accurate, because it will be reported even if vcpu_enter_guest causes multiple vmentries via the IPI/Timer fast paths, and it allows the removal of advance_expire_delta. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1650961551-38390-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29KVM: x86: Avoid theoretical NULL pointer dereference in ↵Vitaly Kuznetsov
kvm_irq_delivery_to_apic_fast() When kvm_irq_delivery_to_apic_fast() is called with APIC_DEST_SELF shorthand, 'src' must not be NULL. Crash the VM with KVM_BUG_ON() instead of crashing the host. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220325132140.25650-3-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01KVM: x86: Make kvm_lapic_set_reg() a "private" xAPIC helperSean Christopherson
Hide the lapic's "raw" write helper inside lapic.c to force non-APIC code to go through proper helpers when modification the vAPIC state. Keep the read helper visible to outsiders for now, refactoring KVM to hide it too is possible, it will just take more work to do so. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01KVM: x86: Treat x2APIC's ICR as a 64-bit register, not two 32-bit regsSean Christopherson
Emulate the x2APIC ICR as a single 64-bit register, as opposed to forking it across ICR and ICR2 as two 32-bit registers. This mirrors hardware behavior for Intel's upcoming IPI virtualization support, which does not split the access. Previous versions of Intel's SDM and AMD's APM don't explicitly state exactly how ICR is reflected in the vAPIC page for x2APIC, KVM just happened to speculate incorrectly. Handling the upcoming behavior is necessary in order to maintain backwards compatibility with KVM_{G,S}ET_LAPIC, e.g. failure to shuffle the 64-bit ICR to ICR+ICR2 and vice versa would break live migration if IPI virtualization support isn't symmetrical across the source and dest. Cc: Zeng Guang <guang.zeng@intel.com> Cc: Chao Gao <chao.gao@intel.com> Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01KVM: x86: Add helpers to handle 64-bit APIC MSR read/writesSean Christopherson
Add helpers to handle 64-bit APIC read/writes via MSRs to deduplicate the x2APIC and Hyper-V code needed to service reads/writes to ICR. Future support for IPI virtualization will add yet another path where KVM must handle 64-bit APIC MSR reads/write (to ICR). Opportunistically fix the comment in the write path; ICR2 holds the destination (if there's no shorthand), not the vector. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01KVM: x86: Make kvm_lapic_reg_{read,write}() staticSean Christopherson
Make the low level read/write lapic helpers static, any accesses to the local APIC from vendor code or non-APIC code should be routed through proper helpers. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01KVM: x86: WARN if KVM emulates an IPI without clearing the BUSY flagSean Christopherson
WARN if KVM emulates an IPI without clearing the BUSY flag, failure to do so could hang the guest if it waits for the IPI be sent. Opportunistically use APIC_ICR_BUSY macro instead of open coding the magic number, and add a comment to clarify why kvm_recalculate_apic_map() is unconditionally invoked (it's really, really confusing for IPIs due to the existence of fast paths that don't trigger a potential recalc). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01KVM: SVM: Don't rewrite guest ICR on AVIC IPI virtualization failureSean Christopherson
Don't bother rewriting the ICR value into the vAPIC page on an AVIC IPI virtualization failure, the access is a trap, i.e. the value has already been written to the vAPIC page. The one caveat is if hardware left the BUSY flag set (which appears to happen somewhat arbitrarily), in which case go through the "nodecode" APIC-write path in order to clear the BUSY flag. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01KVM: x86: Use "raw" APIC register read for handling APIC-write VM-ExitSean Christopherson
Use the "raw" helper to read the vAPIC register after an APIC-write trap VM-Exit. Hardware is responsible for vetting the write, and the caller is responsible for sanitizing the offset. This is a functional change, as it means KVM will consume whatever happens to be in the vAPIC page if the write was dropped by hardware. But, unless userspace deliberately wrote garbage into the vAPIC page via KVM_SET_LAPIC, the value should be zero since it's not writable by the guest. This aligns common x86 with SVM's AVIC logic, i.e. paves the way for using the nodecode path to handle APIC-write traps when AVIC is enabled. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01KVM: VMX: Handle APIC-write offset wrangling in VMX codeSean Christopherson
Move the vAPIC offset adjustments done in the APIC-write trap path from common x86 to VMX in anticipation of using the nodecode path for SVM's AVIC. The adjustment reflects hardware behavior, i.e. it's technically a property of VMX, no common x86. SVM's AVIC behavior is identical, so it's a bit of a moot point, the goal is purely to make it easier to understand why the adjustment is ok. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01KVM: x86: Do not change ICR on write to APIC_SELF_IPIPaolo Bonzini
Emulating writes to SELF_IPI with a write to ICR has an unwanted side effect: the value of ICR in vAPIC page gets changed. The lists SELF_IPI as write-only, with no associated MMIO offset, so any write should have no visible side effect in the vAPIC page. Reported-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01KVM: x86: Fix emulation in writing cr8Zhenzhong Duan
In emulation of writing to cr8, one of the lowest four bits in TPR[3:0] is kept. According to Intel SDM 10.8.6.1(baremetal scenario): "APIC.TPR[bits 7:4] = CR8[bits 3:0], APIC.TPR[bits 3:0] = 0"; and SDM 28.3(use TPR shadow): "MOV to CR8. The instruction stores bits 3:0 of its source operand into bits 7:4 of VTPR; the remainder of VTPR (bits 3:0 and bits 31:8) are cleared."; and AMD's APM 16.6.4: "Task Priority Sub-class (TPS)-Bits 3 : 0. The TPS field indicates the current sub-priority to be used when arbitrating lowest-priority messages. This field is written with zero when TPR is written using the architectural CR8 register."; so in KVM emulated scenario, clear TPR[3:0] to make a consistent behavior as in other scenarios. This doesn't impact evaluation and delivery of pending virtual interrupts because processor does not use the processor-priority sub-class to determine which interrupts to delivery and which to inhibit. Sub-class is used by hardware to arbitrate lowest priority interrupts, but KVM just does a round-robin style delivery. Fixes: b93463aa59d6 ("KVM: Accelerated apic support") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220210094506.20181-1-zhenzhong.duan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-18KVM: x86: make several APIC virtualization callbacks optionalPaolo Bonzini
All their invocations are conditional on vcpu->arch.apicv_active, meaning that they need not be implemented by vendor code: even though at the moment both vendors implement APIC virtualization, all of them can be optional. In fact SVM does not need many of them, and their implementation can be deleted now. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10KVM: LAPIC: Enable timer posted-interrupt only when mwait/hlt is advertisedWanpeng Li
As commit 0c5f81dad46 ("KVM: LAPIC: Inject timer interrupt via posted interrupt") mentioned that the host admin should well tune the guest setup, so that vCPUs are placed on isolated pCPUs, and with several pCPUs surplus for *busy* housekeeping. In this setup, it is preferrable to disable mwait/hlt/pause vmexits to keep the vCPUs in non-root mode. However, if only some guests isolated and others not, they would not have any benefit from posted timer interrupts, and at the same time lose VMX preemption timer fast paths because kvm_can_post_timer_interrupt() returns true and therefore forces kvm_can_use_hv_timer() to false. By guaranteeing that posted-interrupt timer is only used if MWAIT or HLT are done without vmexit, KVM can make a better choice and use the VMX preemption timer and the corresponding fast paths. Reported-by: Aili Yao <yaoaili@kingsoft.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: Aili Yao <yaoaili@kingsoft.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1643112538-36743-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-10KVM: x86: Use static_call() for .vcpu_deliver_sipi_vector()Sean Christopherson
Define and use a static_call() for kvm_x86_ops.vcpu_deliver_sipi_vector(), mostly so that the op is defined in kvm-x86-ops.h. This will allow using KVM_X86_OP in vendor code to wire up the implementation. Any performance gains eeked out by using static_call() is a happy bonus and not the primary motiviation. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-08KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when ↵Maxim Levitsky
inhibiting it kvm_apic_update_apicv is called when AVIC is still active, thus IRR bits can be set by the CPU after it is called, and don't cause the irr_pending to be set to true. Also logic in avic_kick_target_vcpu doesn't expect a race with this function so to make it simple, just keep irr_pending set to true and let the next interrupt injection to the guest clear it. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220207155447.840194-9-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-01KVM: x86: Move delivery of non-APICv interrupt into vendor codeSean Christopherson
Handle non-APICv interrupt delivery in vendor code, even though it means VMX and SVM will temporarily have duplicate code. SVM's AVIC has a race condition that requires KVM to fall back to legacy interrupt injection _after_ the interrupt has been logged in the vIRR, i.e. to fix the race, SVM will need to open code the full flow anyways[*]. Refactor the code so that the SVM bug without introducing other issues, e.g. SVM would return "success" and thus invoke trace_kvm_apicv_accept_irq() even when delivery through the AVIC failed, and to opportunistically prepare for using KVM_X86_OP to fill each vendor's kvm_x86_ops struct, which will rely on the vendor function matching the kvm_x86_op pointer name. No functional change intended. [*] https://lore.kernel.org/all/20211213104634.199141-4-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-26KVM: LAPIC: Also cancel preemption timer during SET_LAPICWanpeng Li
The below warning is splatting during guest reboot. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1931 at arch/x86/kvm/x86.c:10322 kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm] CPU: 0 PID: 1931 Comm: qemu-system-x86 Tainted: G I 5.17.0-rc1+ #5 RIP: 0010:kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm] Call Trace: <TASK> kvm_vcpu_ioctl+0x279/0x710 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fd39797350b This can be triggered by not exposing tsc-deadline mode and doing a reboot in the guest. The lapic_shutdown() function which is called in sys_reboot path will not disarm the flying timer, it just masks LVTT. lapic_shutdown() clears APIC state w/ LVT_MASKED and timer-mode bit is 0, this can trigger timer-mode switch between tsc-deadline and oneshot/periodic, which can result in preemption timer be cancelled in apic_update_lvtt(). However, We can't depend on this when not exposing tsc-deadline mode and oneshot/periodic modes emulated by preemption timer. Qemu will synchronise states around reset, let's cancel preemption timer under KVM_SET_LAPIC. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1643102220-35667-1-git-send-email-wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: x86: Unexport LAPIC's switch_to_{hv,sw}_timer() helpersSean Christopherson
Unexport switch_to_{hv,sw}_timer() now that common x86 handles the transitions. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09KVM: x86: add a tracepoint for APICv/AVIC interrupt deliveryMaxim Levitsky
This allows to see how many interrupts were delivered via the APICv/AVIC from the host. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211209115440.394441-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>