summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2023-04-09Merge tag 'x86_urgent_for_v6.3_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Add a new Intel Arrow Lake CPU model number - Fix a confusion about how to check the version of the ACPI spec which supports a "online capable" bit in the MADT table which lead to a bunch of boot breakages with Zen1 systems and VMs * tag 'x86_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add model number for Intel Arrow Lake processor x86/acpi/boot: Correct acpi_is_processor_usable() check x86/ACPI/boot: Use FADT version to check support for online capable
2023-04-08x86/kexec: remove unnecessary arch_kexec_kernel_image_load()Bjorn Helgaas
Patch series "kexec: Remove unnecessary arch hook", v2. There are no arch-specific things in arch_kexec_kernel_image_load(), so remove it and just use the generic version. This patch (of 2): The x86 implementation of arch_kexec_kernel_image_load() is functionally identical to the generic arch_kexec_kernel_image_load(): arch_kexec_kernel_image_load # x86 if (!image->fops || !image->fops->load) return ERR_PTR(-ENOEXEC); return image->fops->load(image, image->kernel_buf, ...) arch_kexec_kernel_image_load # generic kexec_image_load_default if (!image->fops || !image->fops->load) return ERR_PTR(-ENOEXEC); return image->fops->load(image, image->kernel_buf, ...) Remove the x86-specific version and use the generic arch_kexec_kernel_image_load(). No functional change intended. Link: https://lkml.kernel.org/r/20230307224416.907040-1-helgaas@kernel.org Link: https://lkml.kernel.org/r/20230307224416.907040-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-08ELF: fix all "Elf" typosAlexey Dobriyan
ELF is acronym and therefore should be spelled in all caps. I left one exception at Documentation/arm/nwfpe/nwfpe.rst which looks like being written in the first person. Link: https://lkml.kernel.org/r/Y/3wGWQviIOkyLJW@p183 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-08purgatory: fix disabling debug infoAlyssa Ross
Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS. Instead, it includes -g, the appropriate -gdwarf-* flag, and also the -Wa versions of both of those if building with Clang and GNU as. As a result, debug info was being generated for the purgatory objects, even though the intention was that it not be. Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files") Signed-off-by: Alyssa Ross <hi@alyssa.is> Cc: stable@vger.kernel.org Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-04-07KVM: x86/pmu: Prevent the PMU from counting disallowed eventsAaron Lewis
When counting "Instructions Retired" (0xc0) in a guest, KVM will occasionally increment the PMU counter regardless of if that event is being filtered. This is because some PMU events are incremented via kvm_pmu_trigger_event(), which doesn't know about the event filter. Add the event filter to kvm_pmu_trigger_event(), so events that are disallowed do not increment their counters. Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230307141400.1486314-2-aaronlewis@google.com [sean: prepend "pmc" to the new function] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-07KVM: x86/pmu: Fix a typo in kvm_pmu_request_counter_reprogam()Like Xu
Fix a "reprogam" => "reprogram" typo in kvm_pmu_request_counter_reprogam(). Fixes: 68fb4757e867 ("KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()") Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230310113349.31799-1-likexu@tencent.com [sean: trim the changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-07x86/apic: Fix atomic update of offset in reserve_eilvt_offset()Uros Bizjak
The detection of atomic update failure in reserve_eilvt_offset() is not correct. The value returned by atomic_cmpxchg() should be compared to the old value from the location to be updated. If these two are the same, then atomic update succeeded and "eilvt_offsets[offset]" location is updated to "new" in an atomic way. Otherwise, the atomic update failed and it should be retried with the value from "eilvt_offsets[offset]" - exactly what atomic_try_cmpxchg() does in a correct and more optimal way. Fixes: a68c439b1966c ("apic, x86: Check if EILVT APIC registers are available (AMD only)") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230227160917.107820-1-ubizjak@gmail.com
2023-04-06KVM: x86/pmu: Rewrite reprogram_counters() to improve performanceLike Xu
A valid pmc is always tested before using pmu->reprogram_pmi. Eliminate this part of the redundancy by setting the counter's bitmask directly, and in addition, trigger KVM_REQ_PMU only once to save more cpu cycles. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230214050757.9623-4-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: VMX: Refactor intel_pmu_{g,}set_msr() to align with other helpersSean Christopherson
Invert the flows in intel_pmu_{g,s}et_msr()'s case statements so that they follow the kernel's preferred style of: if (<not valid>) return <error> <commit change> return <success> which is also the style used by every other {g,s}et_msr() helper (except AMD's PMU variant, which doesn't use a switch statement). Modify the "set" paths with costly side effects, i.e. that reprogram counters, to skip only the side effects, i.e. to perform reserved bits checks even if the value is unchanged. None of the reserved bits checks are expensive, so there's no strong justification for skipping them, and guarding only the side effect makes it slightly more obvious what is being skipped and why. No functional change intended (assuming no reserved bit bugs). Link: https://lkml.kernel.org/r/Y%2B6cfen%2FCpO3%2FdLO%40google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86/pmu: Rename pmc_is_enabled() to pmc_is_globally_enabled()Like Xu
The name of function pmc_is_enabled() is a bit misleading. A PMC can be disabled either by PERF_CLOBAL_CTRL or by its corresponding EVTSEL. Append global semantics to its name. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230214050757.9623-2-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86/pmu: Zero out LBR capabilities during PMU refreshSean Christopherson
Zero out the LBR capabilities during PMU refresh to avoid exposing LBRs to the guest against userspace's wishes. If userspace modifies the guest's CPUID model or invokes KVM_CAP_PMU_CAPABILITY to disable vPMU after an initial KVM_SET_CPUID2, but before the first KVM_RUN, KVM will retain the previous LBR info due to bailing before refreshing the LBR descriptor. Note, this is a very theoretical bug, there is no known use case where a VMM would deliberately enable the vPMU via KVM_SET_CPUID2, and then later disable the vPMU. Link: https://lore.kernel.org/r/20230311004618.920745-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86/pmu: WARN and bug the VM if PMU is refreshed after vCPU has runSean Christopherson
Now that KVM disallows changing feature MSRs, i.e. PERF_CAPABILITIES, after running a vCPU, WARN and bug the VM if the PMU is refreshed after the vCPU has run. Note, KVM has disallowed CPUID updates after running a vCPU since commit feb627e8d6f6 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN"), i.e. PERF_CAPABILITIES was the only remaining way to trigger a PMU refresh after KVM_RUN. Cc: Like Xu <like.xu.linux@gmail.com> Link: https://lore.kernel.org/r/20230311004618.920745-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Disallow writes to immutable feature MSRs after KVM_RUNSean Christopherson
Disallow writes to feature MSRs after KVM_RUN to prevent userspace from changing the vCPU model after running the vCPU. Similar to guest CPUID, KVM uses feature MSRs to configure intercepts, determine what operations are/aren't allowed, etc. Changing the capabilities while the vCPU is active will at best yield unpredictable guest behavior, and at worst could be dangerous to KVM. Allow writing the current value, e.g. so that userspace can blindly set all MSRs when emulating RESET, and unconditionally allow writes to MSR_IA32_UCODE_REV so that userspace can emulate patch loads. Special case the VMX MSRs to keep the generic list small, i.e. so that KVM can do a linear walk of the generic list without incurring meaningful overhead. Cc: Like Xu <like.xu.linux@gmail.com> Cc: Yu Zhang <yu.c.zhang@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Generate set of VMX feature MSRs using first/last definitionsSean Christopherson
Add VMX MSRs to the runtime list of feature MSRs by iterating over the range of emulated MSRs instead of manually defining each MSR in the "all" list. Using the range definition reduces the cost of emulating a new VMX MSR, e.g. prevents forgetting to add an MSR to the list. Extracting the VMX MSRs from the "all" list, which is a compile-time constant, also shrinks the list to the point where the compiler can heavily optimize code that iterates over the list. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Add macros to track first...last VMX feature MSRsSean Christopherson
Add macros to track the range of VMX feature MSRs that are emulated by KVM to reduce the maintenance cost of extending the set of emulated MSRs. Note, KVM doesn't necessarily emulate all known/consumed VMX MSRs, e.g. PROCBASED_CTLS3 is consumed by KVM to enable IPI virtualization, but is not emulated as KVM doesn't emulate/virtualize IPI virtualization for nested guests. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Add a helper to query whether or not a vCPU has ever runSean Christopherson
Add a helper to query if a vCPU has run so that KVM doesn't have to open code the check on last_vmentry_cpu being set to a magic value. No functional change intended. Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Like Xu <like.xu.linux@gmail.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Rename kvm_init_msr_list() to clarify it inits multiple listsSean Christopherson
Rename kvm_init_msr_list() to kvm_init_msr_lists() to clarify that it initializes multiple lists: MSRs to save, emulated MSRs, and feature MSRs. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hotBasavaraj Natikar
The AMD [1022:15b8] USB controller loses some internal functional MSI-X context when transitioning from D0 to D3hot. BIOS normally traps D0->D3hot and D3hot->D0 transitions so it can save and restore that internal context, but some firmware in the field can't do this because it fails to clear the AMD_15B8_RCC_DEV2_EPF0_STRAP2 NO_SOFT_RESET bit. Clear AMD_15B8_RCC_DEV2_EPF0_STRAP2 NO_SOFT_RESET bit before USB controller initialization during boot. Link: https://lore.kernel.org/linux-usb/Y%2Fz9GdHjPyF2rNG3@glanzmann.de/T/#u Link: https://lore.kernel.org/r/20230329172859.699743-1-Basavaraj.Natikar@amd.com Reported-by: Thomas Glanzmann <thomas@glanzmann.de> Tested-by: Thomas Glanzmann <thomas@glanzmann.de> Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Cc: stable@vger.kernel.org
2023-04-06x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outsideKirill A. Shutemov
arch_prctl(ARCH_FORCE_TAGGED_SVA) overrides the default and allows LAM and SVA to co-exist in the process. It is expected by called by the process when it knows what it is doing. arch_prctl() operates on the current process, but the same code is reachable from ptrace where it can be called on arbitrary task. Make it strict and only allow to set MM_CONTEXT_FORCE_TAGGED_SVA for the current process. Fixes: 23e5d9ec2bab ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive") Suggested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/all/20230403111020.3136-3-kirill.shutemov%40linux.intel.com
2023-04-06x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVAKirill A. Shutemov
Normally, LAM and SVA are mutually exclusive. LAM enabling will fail if SVA is already in use. Correct error code for the failure. EINTR is nonsensical there. Fixes: 23e5d9ec2bab ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/all/CACT4Y+YfqSMsZArhh25TESmG-U4jO5Hjphz87wKSnTiaw2Wrfw@mail.gmail.com Link: https://lore.kernel.org/all/20230403111020.3136-2-kirill.shutemov%40linux.intel.com
2023-04-06KVM: SVM: Return the local "r" variable from svm_set_msr()Sean Christopherson
Rename "r" to "ret" and actually return it from svm_set_msr() to reduce the probability of repeating the mistake of commit 723d5fb0ffe4 ("kvm: svm: Add IA32_FLUSH_CMD guest support"), which set "r" thinking that it would be propagated to the caller. Alternatively, the declaration of "r" could be moved into the handling of MSR_TSC_AUX, but that risks variable shadowing in the future. A wrapper for kvm_set_user_return_msr() would allow eliding a local variable, but that feels like delaying the inevitable. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230322011440.2195485-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMDSean Christopherson
Virtualize FLUSH_L1D so that the guest can use the performant L1D flush if one of the many mitigations might require a flush in the guest, e.g. Linux provides an option to flush the L1D when switching mms. Passthrough MSR_IA32_FLUSH_CMD for write when it's supported in hardware and exposed to the guest, i.e. always let the guest write it directly if FLUSH_L1D is fully supported. Forward writes to hardware in host context on the off chance that KVM ends up emulating a WRMSR, or in the really unlikely scenario where userspace wants to force a flush. Restrict these forwarded WRMSRs to the known command out of an abundance of caution. Passing through the MSR means the guest can throw any and all values at hardware, but doing so in host context is arguably a bit more dangerous. Link: https://lkml.kernel.org/r/CALMp9eTt3xzAEoQ038bJQ9LN0ZOXrSWsN7xnNUD%2B0SS%3DWwF7Pg%40mail.gmail.com Link: https://lore.kernel.org/all/20230201132905.549148-2-eesposit@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230322011440.2195485-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: x86: Move MSR_IA32_PRED_CMD WRMSR emulation to common codeSean Christopherson
Dedup the handling of MSR_IA32_PRED_CMD across VMX and SVM by moving the logic to kvm_set_msr_common(). Now that the MSR interception toggling is handled as part of setting guest CPUID, the VMX and SVM paths are identical. Opportunistically massage the code to make it a wee bit denser. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20230322011440.2195485-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: SVM: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUIDSean Christopherson
Passthrough MSR_IA32_PRED_CMD based purely on whether or not the MSR is supported and enabled, i.e. don't wait until the first write. There's no benefit to deferred passthrough, and the extra logic only adds complexity. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230322011440.2195485-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: VMX: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUIDSean Christopherson
Passthrough MSR_IA32_PRED_CMD based purely on whether or not the MSR is supported and enabled, i.e. don't wait until the first write. There's no benefit to deferred passthrough, and the extra logic only adds complexity. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20230322011440.2195485-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: x86: Revert MSR_IA32_FLUSH_CMD.FLUSH_L1D enablingSean Christopherson
Revert the recently added virtualizing of MSR_IA32_FLUSH_CMD, as both the VMX and SVM are fatally buggy to guests that use MSR_IA32_FLUSH_CMD or MSR_IA32_PRED_CMD, and because the entire foundation of the logic is flawed. The most immediate problem is an inverted check on @cmd that results in rejecting legal values. SVM doubles down on bugs and drops the error, i.e. silently breaks all guest mitigations based on the command MSRs. The next issue is that neither VMX nor SVM was updated to mark MSR_IA32_FLUSH_CMD as being a possible passthrough MSR, which isn't hugely problematic, but does break MSR filtering and triggers a WARN on VMX designed to catch this exact bug. The foundational issues stem from the MSR_IA32_FLUSH_CMD code reusing logic from MSR_IA32_PRED_CMD, which in turn was likely copied from KVM's support for MSR_IA32_SPEC_CTRL. The copy+paste from MSR_IA32_SPEC_CTRL was misguided as MSR_IA32_PRED_CMD (and MSR_IA32_FLUSH_CMD) is a write-only MSR, i.e. doesn't need the same "deferred passthrough" shenanigans as MSR_IA32_SPEC_CTRL. Revert all MSR_IA32_FLUSH_CMD enabling in one fell swoop so that there is no point where KVM advertises, but does not support, L1D_FLUSH. This reverts commits 45cf86f26148e549c5ba4a8ab32a390e4bde216e, 723d5fb0ffe4c02bd4edf47ea02c02e454719f28, and a807b78ad04b2eaa348f52f5cc7702385b6de1ee. Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://lkml.kernel.org/r/20230317190432.GA863767%40dev-arch.thelio-3990X Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Message-Id: <20230322011440.2195485-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-05x86/mm: try VMA lock-based page fault handling firstSuren Baghdasaryan
Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails. Link: https://lkml.kernel.org/r/20230227173632.3292573-30-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05KVM: x86/pmu: Disallow legacy LBRs if architectural LBRs are availableSean Christopherson
Disallow enabling LBR support if the CPU supports architectural LBRs. Traditional LBR support is absent on CPU models that have architectural LBRs, and KVM doesn't yet support arch LBRs, i.e. KVM will pass through non-existent MSRs if userspace enables LBRs for the guest. Cc: stable@vger.kernel.org Cc: Yang Weijiang <weijiang.yang@intel.com> Cc: Like Xu <like.xu.linux@gmail.com> Reported-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: be635e34c284 ("KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES") Tested-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230128001427.2548858-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-05KVM: x86: set "mitigate_smt_rsb" storage-class-specifier to staticTom Rix
smatch reports arch/x86/kvm/x86.c:199:20: warning: symbol 'mitigate_smt_rsb' was not declared. Should it be static? This variable is only used in one file so it should be static. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230404010141.1913667-1-trix@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-05KVM: x86/pmu: Zero out pmu->all_valid_pmc_idx each time it's refreshedLike Xu
The kvm_pmu_refresh() may be called repeatedly (e.g. configure guest CPUID repeatedly or update MSR_IA32_PERF_CAPABILITIES) and each call will use the last pmu->all_valid_pmc_idx value, with the residual bits introducing additional overhead later in the vPMU emulation. Fixes: b35e5548b411 ("KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230404071759.75376-1-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-05Kconfig: introduce HAS_IOPORT option and select it as necessaryNiklas Schnelle
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O Port access. In a future patch HAS_IOPORT=n will disable compilation of the I/O accessor functions inb()/outb() and friends on architectures which can not meaningfully support legacy I/O spaces such as s390. The following architectures do not select HAS_IOPORT: * ARC * C-SKY * Hexagon * Nios II * OpenRISC * s390 * User-Mode Linux * Xtensa All other architectures select HAS_IOPORT at least conditionally. The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs for HAS_IOPORT specific sections will be added in subsequent patches on a per subsystem basis. Co-developed-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Arnd Bergmann <arnd@kernel.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> # for ARCH=um Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-04-05KVM: VMX: Use is_64_bit_mode() to check 64-bit mode in SGX handlerBinbin Wu
sgx_get_encls_gva() uses is_long_mode() to check 64-bit mode, however, SGX system leaf instructions are valid in compatibility mode, should use is_64_bit_mode() instead. Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions") Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230404032502.27798-1-binbin.wu@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-05x86/cpu: Add Xeon Emerald Rapids to list of CPUs that support PPINTony Luck
This should be the last addition to this table. Future CPUs will enumerate PPIN support using CPUID. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230404212124.428118-1-tony.luck@intel.com
2023-04-05arch/x86: Remove "select SRCU"Paul E. McKenney
Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <x86@kernel.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-04-05kvm: Remove "select SRCU"Paul E. McKenney
Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements from the various KVM Kconfig files. Acked-by: Sean Christopherson <seanjc@google.com> (x86) Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <kvm@vger.kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> (arm64) Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Anup Patel <anup@brainfault.org> (riscv) Acked-by: Heiko Carstens <hca@linux.ibm.com> (s390) Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-04-05x86/cpu: Add model number for Intel Arrow Lake processorTony Luck
Successor to Lunar Lake. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230404174641.426593-1-tony.luck@intel.com
2023-04-05KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALLOliver Upton
The 'longmode' field is a bit annoying as it blows an entire __u32 to represent a boolean value. Since other architectures are looking to add support for KVM_EXIT_HYPERCALL, now is probably a good time to clean it up. Redefine the field (and the remaining padding) as a set of flags. Preserve the existing ABI by using bit 0 to indicate if the guest was in long mode and requiring that the remaining 31 bits must be zero. Cc: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230404154050.2270077-2-oliver.upton@linux.dev
2023-04-04x86/boot: Centralize __pa()/__va() definitionsKirill A. Shutemov
Replace multiple __pa()/__va() definitions with a single one in misc.h. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20230330114956.20342-2-kirill.shutemov%40linux.intel.com
2023-04-04KVM: x86/mmu: Merge all handle_changed_pte*() functionsVipin Sharma
Merge __handle_changed_pte() and handle_changed_spte_acc_track() into a single function, handle_changed_pte(), as the two are always used together. Remove the existing handle_changed_pte(), as it's just a wrapper that calls __handle_changed_pte() and handle_changed_spte_acc_track(). Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: David Matlack <dmatlack@google.com> [sean: massage changelog] Link: https://lore.kernel.org/r/20230321220021.2119033-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Remove handle_changed_spte_dirty_log()Vipin Sharma
Remove handle_changed_spte_dirty_log() as there is no code flow which sets 4KiB SPTE writable and hit this path. This function marks the page dirty in a memslot only if new SPTE is 4KiB in size and writable. Current users of handle_changed_spte_dirty_log() are: 1. set_spte_gfn() - Create only non writable SPTEs. 2. write_protect_gfn() - Change an SPTE to non writable. 3. zap leaf and roots APIs - Everything is 0. 4. handle_removed_pt() - Sets SPTEs to REMOVED_SPTE 5. tdp_mmu_link_sp() - Makes non leaf SPTEs. There is also no path which creates a writable 4KiB without going through make_spte() and this functions takes care of marking SPTE dirty in the memslot if it is PT_WRITABLE. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> [sean: add blurb to __handle_changed_spte()'s comment] Link: https://lore.kernel.org/r/20230321220021.2119033-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Remove "record_acc_track" in __tdp_mmu_set_spte()Vipin Sharma
Remove bool parameter "record_acc_track" from __tdp_mmu_set_spte() and refactor the code. This variable is always set to true by its caller. Remove single and double underscore prefix from tdp_mmu_set_spte() related APIs: 1. Change __tdp_mmu_set_spte() to tdp_mmu_set_spte() 2. Change _tdp_mmu_set_spte() to tdp_mmu_iter_set_spte() Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20230321220021.2119033-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Bypass __handle_changed_spte() when aging TDP MMU SPTEsVipin Sharma
Drop everything except the "tdp_mmu_spte_changed" tracepoint part of __handle_changed_spte() when aging SPTEs in the TDP MMU, as clearing the accessed status doesn't affect the SPTE's shadow-present status, whether or not the SPTE is a leaf, or change the PFN. I.e. none of the functional updates handled by __handle_changed_spte() are relevant. Losing __handle_changed_spte()'s sanity checks does mean that a bug could theoretical go unnoticed, but that scenario is extremely unlikely, e.g. would effectively require a misconfigured MMU or a locking bug elsewhere. Link: https://lore.kernel.org/all/Y9HcHRBShQgjxsQb@google.com Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> [sean: massage changelog] Link: https://lore.kernel.org/r/20230321220021.2119033-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Drop unnecessary dirty log checks when aging TDP MMU SPTEsVipin Sharma
Drop the unnecessary call to handle dirty log updates when aging TDP MMU SPTEs, as neither clearing the Accessed bit nor marking a SPTE for access tracking can _set_ the Writable bit, i.e. can't trigger marking a gfn dirty in its memslot. The access tracking path can _clear_ the Writable bit, e.g. if the XCHG races with fast_page_fault() and writes the stale value without the Writable bit set, but clearing the Writable bit outside of mmu_lock is not allowed, i.e. access tracking can't spuriously set the Writable bit. Signed-off-by: Vipin Sharma <vipinsh@google.com> [sean: split to separate patch, apply to dirty path, write changelog] Link: https://lore.kernel.org/r/20230321220021.2119033-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Clear only A-bit (if enabled) when aging TDP MMU SPTEsVipin Sharma
Use tdp_mmu_clear_spte_bits() when clearing the Accessed bit in TDP MMU SPTEs so as to use an atomic-AND instead of XCHG to clear the A-bit. Similar to the D-bit story, this will allow KVM to bypass __handle_changed_spte() by ensuring only the A-bit is modified. Link: https://lore.kernel.org/all/Y9HcHRBShQgjxsQb@google.com Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> [sean: massage changelog] Link: https://lore.kernel.org/r/20230321220021.2119033-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Remove "record_dirty_log" in __tdp_mmu_set_spte()Vipin Sharma
Remove bool parameter "record_dirty_log" from __tdp_mmu_set_spte() and refactor the code as this variable is always set to true by its caller. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20230321220021.2119033-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Bypass __handle_changed_spte() when clearing TDP MMU dirty bitsVipin Sharma
Drop everything except marking the PFN dirty and the relevant tracepoint parts of __handle_changed_spte() when clearing the dirty status of gfns in the TDP MMU. Clearing only the Dirty (or Writable) bit doesn't affect the SPTEs shadow-present status, whether or not the SPTE is a leaf, or change the SPTE's PFN. I.e. other than marking the PFN dirty, none of the functional updates handled by __handle_changed_spte() are relevant. Losing __handle_changed_spte()'s sanity checks does mean that a bug could theoretical go unnoticed, but that scenario is extremely unlikely, e.g. would effectively require a misconfigured or a locking bug elsewhere. Opportunistically remove a comment blurb from __handle_changed_spte() about all modifications to TDP MMU SPTEs needing to invoke said function, that "rule" hasn't been true since fast page fault support was added for the TDP MMU (and perhaps even before). Tested on a VM (160 vCPUs, 160 GB memory) and found that performance of clear dirty log stage improved by ~40% in dirty_log_perf_test (with the full optimization applied). Before optimization: -------------------- Iteration 1 clear dirty log time: 3.638543593s Iteration 2 clear dirty log time: 3.145032742s Iteration 3 clear dirty log time: 3.142340358s Clear dirty log over 3 iterations took 9.925916693s. (Avg 3.308638897s/iteration) After optimization: ------------------- Iteration 1 clear dirty log time: 2.318988110s Iteration 2 clear dirty log time: 1.794470164s Iteration 3 clear dirty log time: 1.791668628s Clear dirty log over 3 iterations took 5.905126902s. (Avg 1.968375634s/iteration) Link: https://lore.kernel.org/all/Y9hXmz%2FnDOr1hQal@google.com Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> [sean: split the switch to atomic-AND to a separate patch] Link: https://lore.kernel.org/r/20230321220021.2119033-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Drop access tracking checks when clearing TDP MMU dirty bitsVipin Sharma
Drop the unnecessary call to handle access-tracking changes when clearing the dirty status of TDP MMU SPTEs. Neither the Dirty bit nor the Writable bit has any impact on the accessed state of a page, i.e. clearing only the aforementioned bits doesn't make an accessed SPTE suddently not accessed. Signed-off-by: Vipin Sharma <vipinsh@google.com> [sean: split to separate patch, write changelog] Link: https://lore.kernel.org/r/20230321220021.2119033-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Atomically clear SPTE dirty state in the clear-dirty-log flowVipin Sharma
Optimize the clearing of dirty state in TDP MMU SPTEs by doing an atomic-AND (on SPTEs that have volatile bits) instead of the full XCHG that currently ends up being invoked (see kvm_tdp_mmu_write_spte()). Clearing _only_ the bit in question will allow KVM to skip the many irrelevant checks in __handle_changed_spte() by avoiding any collateral damage due to the XCHG writing all SPTE bits, e.g. the XCHG could race with fast_page_fault() setting the W-bit and the CPU setting the D-bit, and thus incorrectly drop the CPU's D-bit update. Link: https://lore.kernel.org/all/Y9hXmz%2FnDOr1hQal@google.com Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> [sean: split the switch to atomic-AND to a separate patch] Link: https://lore.kernel.org/r/20230321220021.2119033-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Consolidate Dirty vs. Writable clearing logic in TDP MMUVipin Sharma
Deduplicate the guts of the TDP MMU's clearing of dirty status by snapshotting whether to check+clear the Dirty bit vs. the Writable bit, which is the only difference between the two flavors of dirty tracking. Note, kvm_ad_enabled() is just a wrapper for shadow_accessed_mask, i.e. is constant after kvm-{intel,amd}.ko is loaded. Link: https://lore.kernel.org/all/Yz4Qi7cn7TWTWQjj@google.com Signed-off-by: Vipin Sharma <vipinsh@google.com> [sean: split to separate patch, apply to dirty log, write changelog] Link: https://lore.kernel.org/r/20230321220021.2119033-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Use kvm_ad_enabled() to determine if TDP MMU SPTEs need wrprotVipin Sharma
Use the constant-after-module-load kvm_ad_enabled() to check if SPTEs in the TDP MMU need to be write-protected when clearing accessed/dirty status instead of manually checking every SPTE. The per-SPTE A/D enabling is specific to nested EPT MMUs, i.e. when KVM is using EPT A/D bits but L1 is not, and so cannot happen in the TDP MMU (which is non-nested only). Keep the original code as sanity checks buried under MMU_WARN_ON(). MMU_WARN_ON() is more or less useless at the moment, but there are plans to change that. Link: https://lore.kernel.org/all/Yz4Qi7cn7TWTWQjj@google.com Signed-off-by: Vipin Sharma <vipinsh@google.com> [sean: split to separate patch, apply to dirty path, write changelog] Link: https://lore.kernel.org/r/20230321220021.2119033-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>