linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
39 hours	Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Significant patch series in this pull request: - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets us closer to being able to remove page->mapping - "relayfs: misc changes" (Jason Xing) does some maintenance and minor feature addition work in relayfs - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches us from static preallocation of the kdump crashkernel's working memory over to dynamic allocation. So the difficulty of a-priori estimation of the second kernel's needs is removed and the first kernel obtains extra memory - "generalize panic_print's dump function to be used by other kernel parts" (Feng Tang) implements some consolidation and rationalization of the various ways in which a failing kernel splats information at the operator * tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits) tools/getdelays: add backward compatibility for taskstats version kho: add test for kexec handover delaytop: enhance error logging and add PSI feature description samples: Kconfig: fix spelling mistake "instancess" -> "instances" fat: fix too many log in fat_chain_add() scripts/spelling.txt: add notifer\|\|notifier to spelling.txt xen/xenbus: fix typo "notifer" net: mvneta: fix typo "notifer" drm/xe: fix typo "notifer" cxl: mce: fix typo "notifer" KVM: x86: fix typo "notifer" MAINTAINERS: add maintainers for delaytop ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below() ucount: fix atomic_long_inc_below() argument type kexec: enable CMA based contiguous allocation stackdepot: make max number of pools boot-time configurable lib/xxhash: remove unused functions init/Kconfig: restore CONFIG_BROKEN help text lib/raid6: update recov_rvv.c zero page usage docs: update docs after introducing delaytop ...
3 days	KVM: x86: fix typo "notifer"	WangYuli
	Patch series "treewide: Fix typo "notifer"", v3. There are some spelling mistakes of 'notifer' in comments which should be 'notifier'. Fix them and add it to scripts/spelling.txt. This patch (of 8): There are some spelling mistakes of 'notifer' which should be 'notifier'. Link: https://lkml.kernel.org/r/576F0D85F6853074+20250722072734.19367-1-wangyuli@uniontech.com Link: https://lkml.kernel.org/r/7F05778C3A1A9F8B+20250722073431.21983-1-wangyuli@uniontech.com Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
7 days	Merge tag 'kvm-x86-sev-6.17' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM SEV cache maintenance changes for 6.17 - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM. - Use WBNOINVD instead of WBINVD when possible, for SEV cache maintenance, e.g. to minimize collateral damage when reclaiming memory from an SEV guest. - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data.
7 days	Merge tag 'x86_core_for_kvm' of ↵	Paolo Bonzini
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD Immutable branch for KVM tree to put the KVM patches from https://lore.kernel.org/r/20250522233733.3176144-1-seanjc@google.com ontop. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
7 days	Merge tag 'kvm-x86-svm-6.17' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM SVM changes for 6.17 Drop KVM's rejection of SNP's SMT and single-socket policy restrictions, and instead rely on firmware to verify that the policy can actually be supported. Don't bother checking that requested policy(s) can actually be satisfied, as an incompatible policy doesn't put the kernel at risk in any way, and providing guarantees with respect to the physical topology is outside of KVM's purview.
7 days	Merge tag 'kvm-x86-apic-6.17' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM local APIC changes for 6.17 Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC.
7 days	Merge tag 'kvm-x86-mmu-6.17' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 MMU changes for 6.17 - Exempt nested EPT from the the !USER + CR0.WP logic, as EPT doesn't interact with CR0.WP. - Move the TDX hardware setup code to tdx.c to better co-locate TDX code and eliminate a few global symbols. - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list).
7 days	Merge tag 'kvm-x86-misc-6.17' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 misc changes for 6.17 - Prevert the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the guest. Failure to honor FREEZE_IN_SMM can bleed host state into the guest. - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter (Intel only) to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF. - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model. - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical. - Recalculate all MSR intercepts from the "source" on MSR filter changes, and drop the dedicated "shadow" bitmaps (and their awful "max" size defines). - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR. - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently. - Fix a user-triggerable WARN that syzkaller found by stuffing INIT_RECEIVED, a.k.a. WFS, and then putting the vCPU into VMX Root Mode (post-VMXON). Use the same approach KVM uses for dealing with "impossible" emulation when running a !URG guest, and simply wait until KVM_RUN to detect that the vCPU has architecturally impossible state. - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can "virtualize" APERF/MPERF (with many caveats). - Reject KVM_SET_TSC_KHZ if vCPUs have been created, as changing the "default" frequency is unsupported for VMs with a "secure" TSC, and there's no known use case for changing the default frequency for other VM types.
7 days	Merge tag 'kvm-x86-no_assignment-6.17' of https://github.com/kvm-x86/linux ↵	Paolo Bonzini
	into HEAD KVM VFIO device assignment cleanups for 6.17 Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking now that KVM no longer uses assigned_device_count as a bad heuristic for "VM has an irqbypass producer" or for "VM has access to host MMIO".
7 days	Merge tag 'kvm-x86-mmio-6.17' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM MMIO Stale Data mitigation cleanup for 6.17 Rework KVM's mitigation for the MMIO State Data vulnerability to track whether or not a vCPU has access to (host) MMIO based on the MMU that will be used when running in the guest. The current approach doesn't actually detect whether or not a guest has access to MMIO, and is prone to false negatives (and to a lesser extent, false positives), as KVM_DEV_VFIO_FILE_ADD is optional, and obviously only covers VFIO devices.
7 days	Merge tag 'kvm-x86-irqs-6.17' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM IRQ changes for 6.17 - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI. - Track irqbypass's "token" as "struct eventfd_ctx " instead of a "void ", to avoid making a simple concept unnecessarily difficult to understand. - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time. - Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c. - Fix a variety of flaws and bugs in the AVIC device posted IRQ code. - Inhibited AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation. - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry. - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs. - Dedup x86's device posted IRQ code, as the vast majority of functionality can be shared verbatime between SVM and VMX. - Harden the device posted IRQ code against bugs and runtime errors. - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n). - Generate GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU. - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs. - Clean up and document/comment the irqfd assignment code. - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique.
13 days	KVM: x86/xen: Fix cleanup logic in emulation of Xen schedop poll hypercalls	Manuel Andreas
	kvm_xen_schedop_poll does a kmalloc_array() when a VM polls the host for more than one event channel potr (nr_ports > 1). After the kmalloc_array(), the error paths need to go through the "out" label, but the call to kvm_read_guest_virt() does not. Fixes: 92c58965e965 ("KVM: x86/xen: Use kvm_read_guest_virt() instead of open-coding it badly") Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Manuel Andreas <manuel.andreas@tum.de> [Adjusted commit message. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-17	Merge tag 'kvm-x86-fixes-6.16-rc7' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM TDX fixes for 6.16 - Fix a formatting goof in the TDX documentation. - Reject KVM_SET_TSC_KHZ for guests with a protected TSC (currently only TDX). - Ensure struct kvm_tdx_capabilities fields that are not explicitly set by KVM are zeroed.
2025-07-17	KVM: TDX: Don't report base TDVMCALLs	Xiaoyao Li
	Remove TDVMCALLINFO_GET_QUOTE from user_tdvmcallinfo_1_r11 reported to userspace to align with the direction of the GHCI spec. Recently, concern was raised about a gap in the GHCI spec that left ambiguity in how to expose to the guest that only a subset of GHCI TDVMCalls were supported. During the back and forth on the spec details[0], <GetQuote> was moved from an individually enumerable TDVMCall, to one that is part of the 'base spec', meaning it doesn't have a specific bit in the <GetTDVMCallInfo> return values. Although the spec[1] is still in draft form, the GetQoute part has been agreed by the major TDX VMMs. Unfortunately the commits that were upstreamed still treat <GetQuote> as individually enumerable. They set bit 0 in the user_tdvmcallinfo_1_r11 which is reported to userspace to tell supported optional TDVMCalls, intending to say that <GetQuote> is supported. So stop reporting <GetQute> in user_tdvmcallinfo_1_r11 to align with the direction of the spec, and allow some future TDVMCall to use that bit. [0] https://lore.kernel.org/all/aEmuKII8FGU4eQZz@google.com/ [1] https://cdrdv2.intel.com/v1/dl/getContent/858626 Fixes: 28224ef02b56 ("KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities") Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20250717022010.677645-1-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-15	KVM: VMX: Ensure unused kvm_tdx_capabilities fields are zeroed out	Sean Christopherson
	Zero-allocate the kernel's kvm_tdx_capabilities structure and copy only the number of CPUID entries from the userspace structure. As is, KVM doesn't ensure kernel_tdvmcallinfo_1_{r11,r12} and user_tdvmcallinfo_1_r12 are zeroed, i.e. KVM will reflect whatever happens to be in the userspace structure back at userspace, and thus may report garbage to userspace. Zeroing the entire kernel structure also provides better semantics for the reserved field. E.g. if KVM extends kvm_tdx_capabilities to enumerate new information by repurposing bytes from the reserved field, userspace would be required to zero the new field in order to get useful information back (because older KVMs without support for the repurposed field would report garbage, a la the aforementioned tdvmcallinfo bugs). Fixes: 61bb28279623 ("KVM: TDX: Get system-wide info about TDX module on initialization") Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reported-by: Xiaoyao Li <xiaoyao.li@intel.com> Closes: https://lore.kernel.org/all/3ef581f1-1ff1-4b99-b216-b316f6415318@intel.com Tested-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://lore.kernel.org/r/20250714221928.1788095-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-15	KVM: x86: Reject KVM_SET_TSC_KHZ vCPU ioctl for TSC protected guest	Kai Huang
	Reject KVM_SET_TSC_KHZ vCPU ioctl if guest's TSC is protected and not changeable by KVM, and update the documentation to reflect it. For such TSC protected guests, e.g. TDX guests, typically the TSC is configured once at VM level before any vCPU are created and remains unchanged during VM's lifetime. KVM provides the KVM_SET_TSC_KHZ VM scope ioctl to allow the userspace VMM to configure the TSC of such VM. After that the userspace VMM is not supposed to call the KVM_SET_TSC_KHZ vCPU scope ioctl anymore when creating the vCPU. The de facto userspace VMM Qemu does this for TDX guests. The upcoming SEV-SNP guests with Secure TSC should follow. Note, TDX support hasn't been fully released as of the "buggy" commit, i.e. there is no established ABI to break. Fixes: adafea110600 ("KVM: x86: Add infrastructure for secure TSC") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/71bbdf87fdd423e3ba3a45b57642c119ee2dd98c.1752444335.git.kai.huang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-14	KVM: x86: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created	Kai Huang
	Reject the KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created and update the documentation to reflect it. The VM scope KVM_SET_TSC_KHZ ioctl is used to set up the default TSC frequency that all subsequently created vCPUs can use. It is only intended to be called before any vCPU is created. Allowing it to be called after that only results in confusion but nothing good. Note this is an ABI change. But currently in Qemu (the de facto userspace VMM) only TDX uses this VM ioctl, and it is only called once before creating any vCPU, therefore the risk of breaking userspace is pretty low. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Link: https://lore.kernel.org/r/135a35223ce8d01cea06b6cef30bfe494ec85827.1752444335.git.kai.huang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-14	KVM: SVM: Flush cache only on CPUs running SEV guest	Zheyun Shen
	On AMD CPUs without ensuring cache consistency, each memory page reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all CPUs, thereby affecting the performance of other programs on the host. Typically, an AMD server may have 128 cores or more, while the SEV guest might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity to bind these 8 vCPUs to specific physical CPUs. Therefore, keeping a record of the physical core numbers each time a vCPU runs can help avoid flushing the cache for all CPUs every time. Take care to allocate the cpumask used to track which CPUs have run a vCPU when copying or moving an "encryption context", as nothing guarantees memory in a mirror VM is a strict subset of the ASID owner, and the destination VM for intrahost migration needs to maintain it's own set of CPUs. E.g. for intrahost migration, if a CPU was used for the source VM but not the destination VM, then it can only have cached memory that was accessible to the source VM. And a CPU that was run in the source is also used by the destination is no different than a CPU that was run in the destination only. Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(), thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and kvm_arch_gmem_invalidate() for SEV-SNP. I.e. it's safe to free the cpumask prior to unregistering encrypted regions and freeing the ASID. Opportunistically clean up sev_vm_destroy()'s comment regarding what is (implicitly, what isn't) skipped for mirror VMs. Cc: Srikanth Aithal <sraithal@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn> Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	x86/apic: KVM: Move apic_test)vector() to common code	Neeraj Upadhyay
	Move apic_test_vector() to apic.h in order to reuse it in the Secure AVIC guest APIC driver in later patches to test vector state in the APIC backing page. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-14-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	x86/apic: KVM: Move lapic set/clear_vector() helpers to common code	Neeraj Upadhyay
	Move apic_clear_vector() and apic_set_vector() helper functions to apic.h in order to reuse them in the Secure AVIC guest APIC driver in later patches to atomically set/clear vectors in the APIC backing page. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-13-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	x86/apic: KVM: Move lapic get/set helpers to common code	Neeraj Upadhyay
	Move the apic_get_reg(), apic_set_reg(), apic_get_reg64() and apic_set_reg64() helper functions to apic.h in order to reuse them in the Secure AVIC guest APIC driver in later patches to read/write registers from/to the APIC backing page. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-12-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	x86/apic: KVM: Move apic_find_highest_vector() to a common header	Neeraj Upadhyay
	In preparation for using apic_find_highest_vector() in Secure AVIC guest APIC driver, move it and associated macros to apic.h. No functional change intended. Acked-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/20250709033242.267892-11-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	KVM: x86: Rename lapic set/clear vector helpers	Neeraj Upadhyay
	In preparation for moving kvm-internal kvm_lapic_set_vector(), kvm_lapic_clear_vector() to apic.h for use in Secure AVIC APIC driver, rename them as part of the APIC API. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-10-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	KVM: x86: Rename lapic get/set_reg64() helpers	Neeraj Upadhyay
	In preparation for moving kvm-internal __kvm_lapic_set_reg64(), __kvm_lapic_get_reg64() to apic.h for use in Secure AVIC APIC driver, rename them as part of the APIC API. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-9-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	KVM: x86: Rename lapic get/set_reg() helpers	Neeraj Upadhyay
	In preparation for moving kvm-internal __kvm_lapic_set_reg(), __kvm_lapic_get_reg() to apic.h for use in Secure AVIC APIC driver, rename them as part of the APIC API. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-8-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	KVM: x86: Rename find_highest_vector()	Neeraj Upadhyay
	In preparation for moving kvm-internal find_highest_vector() to apic.h for use in Secure AVIC APIC driver, rename find_highest_vector() to apic_find_highest_vector() as part of the APIC API. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-7-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	KVM: x86: Change lapic regs base address to void pointer	Neeraj Upadhyay
	Change APIC base address from "char " to "void " in KVM lapic's set/get helper functions. Pointer arithmetic for "void " and "char " operate identically. With "void *" there is less of a chance of doing the wrong thing, e.g. neglecting to cast and reading a byte instead of the desired APIC register size. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-6-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	KVM: x86: Rename VEC_POS/REG_POS macro usages	Neeraj Upadhyay
	In preparation for moving most of the KVM's lapic helpers which use VEC_POS/REG_POS macros to common APIC header for use in Secure AVIC APIC driver, rename all VEC_POS/REG_POS macro usages to APIC_VECTOR_TO_BIT_NUMBER/APIC_VECTOR_TO_REG_OFFSET and remove VEC_POS/REG_POS. While at it, clean up line wrap in find_highest_vector(). No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-5-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	x86/apic: KVM: Deduplicate APIC vector => register+bit math	Sean Christopherson
	Consolidate KVM's {REG,VEC}_POS() macros and lapic_vector_set_in_irr()'s open coded equivalent logic in anticipation of the kernel gaining more usage of vector => reg+bit lookups. Use lapic_vector_set_in_irr()'s math as using divides for both the bit number and register offset makes it easier to connect the dots, and for at least one user, fixup_irqs(), "/ 32 * 0x10" generates ever so slightly better code with gcc-14 (shaves a whole 3 bytes from the code stream): ((v) >> 5) << 4: c1 ef 05 shr $0x5,%edi c1 e7 04 shl $0x4,%edi 81 c7 00 02 00 00 add $0x200,%edi (v) / 32 * 0x10: c1 ef 05 shr $0x5,%edi 83 c7 20 add $0x20,%edi c1 e7 04 shl $0x4,%edi Keep KVM's tersely named macros as "wrappers" to avoid unnecessary churn in KVM, and because the shorter names yield more readable code overall in KVM. The new macros type cast the vector parameter to "unsigned int". This is required from better code generation for cases where an "int" is passed to these macros in KVM code. int v; ((v) >> 5) << 4: c1 f8 05 sar $0x5,%eax c1 e0 04 shl $0x4,%eax ((v) / 32 * 0x10): 85 ff test %edi,%edi 8d 47 1f lea 0x1f(%rdi),%eax 0f 49 c7 cmovns %edi,%eax c1 f8 05 sar $0x5,%eax c1 e0 04 shl $0x4,%eax ((unsigned int)(v) / 32 * 0x10): c1 f8 05 sar $0x5,%eax c1 e0 04 shl $0x4,%eax (v) & (32 - 1): 89 f8 mov %edi,%eax 83 e0 1f and $0x1f,%eax (v) % 32 89 fa mov %edi,%edx c1 fa 1f sar $0x1f,%edx c1 ea 1b shr $0x1b,%edx 8d 04 17 lea (%rdi,%rdx,1),%eax 83 e0 1f and $0x1f,%eax 29 d0 sub %edx,%eax (unsigned int)(v) % 32: 89 f8 mov %edi,%eax 83 e0 1f and $0x1f,%eax Overall kvm.ko text size is impacted if "unsigned int" is not used. Bin Orig New (w/o unsigned int) New (w/ unsigned int) lapic.o 28580 28772 28580 kvm.o 670810 671002 670810 kvm.ko 708079 708271 708079 No functional change intended. [Neeraj: Type cast vec macro param to "unsigned int", provide data in commit log on "unsigned int" requirement] Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/20250709033242.267892-4-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	KVM: x86: Remove redundant parentheses around 'bitmap'	Neeraj Upadhyay
	When doing pointer arithmetic in apic_test_vector() and kvm_lapic_{set\|clear}_vector(), remove the unnecessary parentheses surrounding the 'bitmap' parameter. No functional change intended. Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/20250709033242.267892-3-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	KVM: x86: Open code setting/clearing of bits in the ISR	Neeraj Upadhyay
	Remove __apic_test_and_set_vector() and __apic_test_and_clear_vector(), because the _only_ register that's safe to modify with a non-atomic operation is ISR, because KVM isn't running the vCPU, i.e. hardware can't service an IRQ or process an EOI for the relevant (virtual) APIC. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-2-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	KVM: SEV: Prefer WBNOINVD over WBINVD for cache maintenance efficiency	Kevin Loughlin
	AMD CPUs currently execute WBINVD in the host when unregistering SEV guest memory or when deactivating SEV guests. Such cache maintenance is performed to prevent data corruption, wherein the encrypted (C=1) version of a dirty cache line might otherwise only be written back after the memory is written in a different context (ex: C=0), yielding corruption. However, WBINVD is performance-costly, especially because it invalidates processor caches. Strictly-speaking, unless the SEV ASID is being recycled (meaning the SNP firmware requires the use of WBINVD prior to DF_FLUSH), the cache invalidation triggered by WBINVD is unnecessary; only the writeback is needed to prevent data corruption in remaining scenarios. To improve performance in these scenarios, use WBNOINVD when available instead of WBINVD. WBNOINVD still writes back all dirty lines (preventing host data corruption by SEV guests) but does not invalidate processor caches. Note that the implementation of wbnoinvd() ensures fall back to WBINVD if WBNOINVD is unavailable. In anticipation of forthcoming optimizations to limit the WBNOINVD only to physical CPUs that have executed SEV guests, place the call to wbnoinvd_on_all_cpus() in a wrapper function sev_writeback_caches(). Signed-off-by: Kevin Loughlin <kevinloughlin@google.com> Reviewed-by: Mingwei Zhang <mizhang@google.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250201000259.3289143-3-kevinloughlin@google.com [sean: tweak comment regarding CLFUSH] Cc: Francesco Lavra <francescolavra.fl@gmail.com> Link: https://lore.kernel.org/r/20250522233733.3176144-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	KVM: SVM: Remove wbinvd in sev_vm_destroy()	Zheyun Shen
	Before sev_vm_destroy() is called, kvm_arch_guest_memory_reclaimed() has been called for SEV and SEV-ES and kvm_arch_gmem_invalidate() has been called for SEV-SNP. These functions have already handled flushing the memory. Therefore, this wbinvd_on_all_cpus() can simply be dropped. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250522233733.3176144-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	KVM: x86: Use wbinvd_on_cpu() instead of an open-coded equivalent	Sean Christopherson
	Use wbinvd_on_cpu() to target a single CPU instead of open-coding an equivalent, and drop KVM's wbinvd_ipi() now that all users have switched to x86 library versions. No functional change intended. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250522233733.3176144-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull KVM fixes from Paolo Bonzini: "Many patches, pretty much all of them small, that accumulated while I was on vacation. ARM: - Remove the last leftovers of the ill-fated FPSIMD host state mapping at EL2 stage-1 - Fix unexpected advertisement to the guest of unimplemented S2 base granule sizes - Gracefully fail initialising pKVM if the interrupt controller isn't GICv3 - Also gracefully fail initialising pKVM if the carveout allocation fails - Fix the computing of the minimum MMIO range required for the host on stage-2 fault - Fix the generation of the GICv3 Maintenance Interrupt in nested mode x86: - Reject SEV{-ES} intra-host migration if one or more vCPUs are actively being created, so as not to create a non-SEV{-ES} vCPU in an SEV{-ES} VM - Use a pre-allocated, per-vCPU buffer for handling de-sparsification of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too large" issue - Allow out-of-range/invalid Xen event channel ports when configuring IRQ routing, to avoid dictating a specific ioctl() ordering to userspace - Conditionally reschedule when setting memory attributes to avoid soft lockups when userspace converts huge swaths of memory to/from private - Add back MWAIT as a required feature for the MONITOR/MWAIT selftest - Add a missing field in struct sev_data_snp_launch_start that resulted in the guest-visible workarounds field being filled at the wrong offset - Skip non-canonical address when processing Hyper-V PV TLB flushes to avoid VM-Fail on INVVPID - Advertise supported TDX TDVMCALLs to userspace - Pass SetupEventNotifyInterrupt arguments to userspace - Fix TSC frequency underflow" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: avoid underflow when scaling TSC frequency KVM: arm64: Remove kvm_arch_vcpu_run_map_fp() KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes KVM: arm64: Don't free hyp pages with pKVM on GICv2 KVM: arm64: Fix error path in init_hyp_mode() KVM: arm64: Adjust range correctly during host stage-2 faults KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi() KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush KVM: SVM: Add missing member in SNP_LAUNCH_START command structure Documentation: KVM: Fix unexpected unindent warnings KVM: selftests: Add back the missing check of MONITOR/MWAIT availability KVM: Allow CPU to reschedule while setting per-page memory attributes KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table. KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
2025-07-10	x86/lib: Add WBINVD and WBNOINVD helpers to target multiple CPUs	Zheyun Shen
	Extract KVM's open-coded calls to do writeback caches on multiple CPUs to common library helpers for both WBINVD and WBNOINVD (KVM will use both). Put the onus on the caller to check for a non-empty mask to simplify the SMP=n implementation, e.g. so that it doesn't need to check that the one and only CPU in the system is present in the mask. [sean: move to lib, add SMP=n helpers, clarify usage] Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250128015345.7929-2-szy0127@sjtu.edu.cn Link: https://lore.kernel.org/20250522233733.3176144-5-seanjc@google.com
2025-07-09	KVM: x86: avoid underflow when scaling TSC frequency	Paolo Bonzini
	In function kvm_guest_time_update(), __scale_tsc() is used to calculate a TSC frequency rather than a TSC value. With low-enough ratios, a TSC value that is less than 1 would underflow to 0 and to an infinite while loop in kvm_get_time_scale(): kvm_guest_time_update(struct kvm_vcpu v) if (kvm_caps.has_tsc_control) tgt_tsc_khz = kvm_scale_tsc(tgt_tsc_khz, v->arch.l1_tsc_scaling_ratio); __scale_tsc(u64 ratio, u64 tsc) ratio=122380531, tsc=2299998, N=48 ratiotsc >> N = 0.999... -> 0 Later in the function: Call Trace: <TASK> kvm_get_time_scale arch/x86/kvm/x86.c:2458 [inline] kvm_guest_time_update+0x926/0xb00 arch/x86/kvm/x86.c:3268 vcpu_enter_guest.constprop.0+0x1e70/0x3cf0 arch/x86/kvm/x86.c:10678 vcpu_run+0x129/0x8d0 arch/x86/kvm/x86.c:11126 kvm_arch_vcpu_ioctl_run+0x37a/0x13d0 arch/x86/kvm/x86.c:11352 kvm_vcpu_ioctl+0x56b/0xe60 virt/kvm/kvm_main.c:4188 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl+0x12d/0x190 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x59/0x110 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x78/0xe2 This can really happen only when fuzzing, since the TSC frequency would have to be nonsensically low. Fixes: 35181e86df97 ("KVM: x86: Add a common TSC scaling function") Reported-by: Yuntao Liu <liuyuntao12@huawei.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-09	KVM: x86: Provide a capability to disable APERF/MPERF read intercepts	Jim Mattson
	Allow a guest to read the physical IA32_APERF and IA32_MPERF MSRs without interception. The IA32_APERF and IA32_MPERF MSRs are not virtualized. Writes are not handled at all. The MSR values are not zeroed on vCPU creation, saved on suspend, or restored on resume. No accommodation is made for processor migration or for sharing a logical processor with other tasks. No adjustments are made for non-unit TSC multipliers. The MSRs do not account for time the same way as the comparable PMU events, whether the PMU is virtualized by the traditional emulation method or the new mediated pass-through approach. Nonetheless, in a properly constrained environment, this capability can be combined with a guest CPUID table that advertises support for CPUID.6:ECX.APERFMPERF[bit 0] to induce a Linux guest to report the effective physical CPU frequency in /proc/cpuinfo. Moreover, there is no performance cost for this capability. Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250530185239.2335185-3-jmattson@google.com Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250626001225.744268-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09	KVM: x86: Replace growing set of *_in_guest bools with a u64	Jim Mattson
	Store each "disabled exit" boolean in a single bit rather than a byte. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250530185239.2335185-2-jmattson@google.com Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250626001225.744268-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09	KVM: x86: Advertise support for LKGS	Xin Li
	Advertise support for LKGS (load into IA32_KERNEL_GS_BASE) to userspace if the instruction is supported by the underlying CPU. LKGS is introduced with FRED to completely eliminate the need to swapgs explicilty. It behaves like the MOV to GS instruction except that it loads the base address into the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor cache, which is exactly what Linux kernel does to load a user level GS base. Thus there is no need to SWAPGS away from the kernel GS base. LKGS is an independent CPU feature that works correctly in a KVM guest without requiring explicit enablement. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Link: https://lore.kernel.org/r/20250626173521.2301088-1-xin@zytor.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09	KVM: VMX: Add a macro to track which DEBUGCTL bits are host-owned	Sean Christopherson
	Add VMX_HOST_OWNED_DEBUGCTL_BITS to track which bits are host-owned, i.e. need to be preserved when running the guest, to dedup the logic without having to incur a memory load to get at kvm_x86_ops.HOST_OWNED_DEBUGCTL. No functional change intended. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/all/aF1yni8U6XNkyfRf@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-07	Merge tag 'tsa_x86_bugs_for_6.16' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU speculation fixes from Borislav Petkov: "Add the mitigation logic for Transient Scheduler Attacks (TSA) TSA are new aspeculative side channel attacks related to the execution timing of instructions under specific microarchitectural conditions. In some cases, an attacker may be able to use this timing information to infer data from other contexts, resulting in information leakage. Add the usual controls of the mitigation and integrate it into the existing speculation bugs infrastructure in the kernel" * tag 'tsa_x86_bugs_for_6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/process: Move the buffer clearing before MONITOR x86/microcode/AMD: Add TSA microcode SHAs KVM: SVM: Advertise TSA CPUID bits to guests x86/bugs: Add a Transient Scheduler Attacks mitigation x86/bugs: Rename MDS machinery to something more generic
2025-06-29	Merge tag 'x86_urgent_for_v6.16_rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Make sure DR6 and DR7 are initialized to their architectural values and not accidentally cleared, leading to misconfigurations * tag 'x86_urgent_for_v6.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/traps: Initialize DR7 by writing its architectural reset value x86/traps: Initialize DR6 by writing its architectural reset value
2025-06-25	VFIO: KVM: x86: Drop kvm_arch_{start,end}_assignment()	Sean Christopherson
	Drop kvm_arch_{start,end}_assignment() and all associated code now that KVM x86 no longer consumes assigned_device_count. Tracking whether or not a VFIO-assigned device is formally associated with a VM is fundamentally flawed, as such an association is optional for general usage, i.e. is prone to false negatives. E.g. prior to commit 2edd9cb79fb3 ("kvm: detect assigned device via irqbypass manager"), device passthrough via VFIO would fail to enable IRQ bypass if userspace omitted the formal VFIO<=>KVM binding. And device drivers that need the VFIO<=>KVM connection, e.g. KVM-GT, shouldn't be relying on generic x86 tracking infrastructure. Cc: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250523011756.3243624-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-25	Revert "kvm: detect assigned device via irqbypass manager"	Sean Christopherson
	Now that KVM explicitly tracks the number of possible bypass IRQs, and doesn't conflate IRQ bypass with host MMIO access, stop bumping the assigned device count when adding an IRQ bypass producer. This reverts commit 2edd9cb79fb31b0907c6e0cdce2824780cf9b153. Link: https://lore.kernel.org/r/20250523011756.3243624-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-25	Merge branch 'kvm-x86 mmio'	Sean Christopherson
	Merge the MMIO stale data branch with the device posted IRQs branch to provide a common base for removing KVM's tracking of "assigned" devices. Link: https://lore.kernel.org/all/20250523011756.3243624-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-25	KVM: SVM: Simplify MSR interception logic for IA32_XSS MSR	Chao Gao
	Use svm_set_intercept_for_msr() directly to configure IA32_XSS MSR interception, ensuring consistency with other cases where MSRs are intercepted depending on guest caps and CPUIDs. No functional change intended. Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20250612081947.94081-3-chao.gao@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-25	KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush	Manuel Andreas
	In KVM guests with Hyper-V hypercalls enabled, the hypercalls HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST and HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX allow a guest to request invalidation of portions of a virtual TLB. For this, the hypercall parameter includes a list of GVAs that are supposed to be invalidated. However, when non-canonical GVAs are passed, there is currently no filtering in place and they are eventually passed to checked invocations of INVVPID on Intel / INVLPGA on AMD. While AMD's INVLPGA silently ignores non-canonical addresses (effectively a no-op), Intel's INVVPID explicitly signals VM-Fail and ultimately triggers the WARN_ONCE in invvpid_error(): invvpid failed: ext=0x0 vpid=1 gva=0xaaaaaaaaaaaaa000 WARNING: CPU: 6 PID: 326 at arch/x86/kvm/vmx/vmx.c:482 invvpid_error+0x91/0xa0 [kvm_intel] Modules linked in: kvm_intel kvm 9pnet_virtio irqbypass fuse CPU: 6 UID: 0 PID: 326 Comm: kvm-vm Not tainted 6.15.0 #14 PREEMPT(voluntary) RIP: 0010:invvpid_error+0x91/0xa0 [kvm_intel] Call Trace: vmx_flush_tlb_gva+0x320/0x490 [kvm_intel] kvm_hv_vcpu_flush_tlb+0x24f/0x4f0 [kvm] kvm_arch_vcpu_ioctl_run+0x3013/0x5810 [kvm] Hyper-V documents that invalid GVAs (those that are beyond a partition's GVA space) are to be ignored. While not completely clear whether this ruling also applies to non-canonical GVAs, it is likely fine to make that assumption, and manual testing on Azure confirms "real" Hyper-V interprets the specification in the same way. Skip non-canonical GVAs when processing the list of address to avoid tripping the INVVPID failure. Alternatively, KVM could filter out "bad" GVAs before inserting into the FIFO, but practically speaking the only downside of pushing validation to the final processing is that doing so is suboptimal for the guest, and no well-behaved guest will request TLB flushes for non-canonical addresses. Fixes: 260970862c88 ("KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently") Cc: stable@vger.kernel.org Signed-off-by: Manuel Andreas <manuel.andreas@tum.de> Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/c090efb3-ef82-499f-a5e0-360fc8420fb7@tum.de Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-25	KVM: VMX: Apply MMIO Stale Data mitigation if KVM maps MMIO into the guest	Sean Christopherson
	Enforce the MMIO State Data mitigation if KVM has ever mapped host MMIO into the VM, not if the VM has an assigned device. VFIO is but one of many ways to map host MMIO into a KVM guest, and even within VFIO, formally attaching a device to a VM via KVM_DEV_VFIO_FILE_ADD is entirely optional. Track whether or not the guest can access host MMIO on a per-MMU basis, i.e. based on whether or not the vCPU has a mapping to host MMIO. For simplicity, track MMIO mappings in "special" rools (those without a kvm_mmu_page) at the VM level, as only Intel CPUs are vulnerable, and so only legacy 32-bit shadow paging is affected, i.e. lack of precise tracking is a complete non-issue. Make the per-MMU and per-VM flags sticky. Detecting when all MMIO mappings have been removed would be absurdly complex. And in practice, removing MMIO from a guest will be done by deleting the associated memslot, which by default will force KVM to re-allocate all roots. Special roots will forever be mitigated, but as above, the affected scenarios are not expected to be performance sensitive. Use a VMX_RUN flag to communicate the need for a buffers flush to vmx_vcpu_enter_exit() so that kvm_vcpu_can_access_host_mmio() and all its dependencies don't need to be marked __always_inline, e.g. so that KASAN doesn't trigger a noinstr violation. Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Fixes: 8cb861e9e3c9 ("x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data") Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Link: https://lore.kernel.org/r/20250523011756.3243624-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-24	KVM: x86: Deduplicate MSR interception enabling and disabling	Chao Gao
	Extract a common function from MSR interception disabling logic and create disabling and enabling functions based on it. This removes most of the duplicated code for MSR interception disabling/enabling. No functional change intended. Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250612081947.94081-2-chao.gao@intel.com [sean: s/enable/set, inline the wrappers] Signed-off-by: Sean Christopherson <seanjc@google.com>