summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2025-07-11um/x86: Add system call table to header fileThomas Weißschuh
The generic system call tracing infrastructure requires access to the system call table. The symbol is already visible to the linker but is lacking a public declaration. Add a public declaration. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://patch.msgid.link/20250703-uml-have_syscall_tracepoints-v1-1-23c1d3808578@linutronix.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-10x86/apic: Rename 'reg_off' to 'reg'Neeraj Upadhyay
Rename the 'reg_off' parameter of apic_{set|get}_reg() to 'reg' to match other usages in apic.h. No functional change intended. Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/20250709033242.267892-15-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10x86/apic: KVM: Move apic_test)vector() to common codeNeeraj Upadhyay
Move apic_test_vector() to apic.h in order to reuse it in the Secure AVIC guest APIC driver in later patches to test vector state in the APIC backing page. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-14-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10x86/apic: KVM: Move lapic set/clear_vector() helpers to common codeNeeraj Upadhyay
Move apic_clear_vector() and apic_set_vector() helper functions to apic.h in order to reuse them in the Secure AVIC guest APIC driver in later patches to atomically set/clear vectors in the APIC backing page. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-13-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10x86/apic: KVM: Move lapic get/set helpers to common codeNeeraj Upadhyay
Move the apic_get_reg(), apic_set_reg(), apic_get_reg64() and apic_set_reg64() helper functions to apic.h in order to reuse them in the Secure AVIC guest APIC driver in later patches to read/write registers from/to the APIC backing page. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-12-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10x86/apic: KVM: Move apic_find_highest_vector() to a common headerNeeraj Upadhyay
In preparation for using apic_find_highest_vector() in Secure AVIC guest APIC driver, move it and associated macros to apic.h. No functional change intended. Acked-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/20250709033242.267892-11-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Rename lapic set/clear vector helpersNeeraj Upadhyay
In preparation for moving kvm-internal kvm_lapic_set_vector(), kvm_lapic_clear_vector() to apic.h for use in Secure AVIC APIC driver, rename them as part of the APIC API. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-10-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Rename lapic get/set_reg64() helpersNeeraj Upadhyay
In preparation for moving kvm-internal __kvm_lapic_set_reg64(), __kvm_lapic_get_reg64() to apic.h for use in Secure AVIC APIC driver, rename them as part of the APIC API. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-9-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Rename lapic get/set_reg() helpersNeeraj Upadhyay
In preparation for moving kvm-internal __kvm_lapic_set_reg(), __kvm_lapic_get_reg() to apic.h for use in Secure AVIC APIC driver, rename them as part of the APIC API. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-8-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Rename find_highest_vector()Neeraj Upadhyay
In preparation for moving kvm-internal find_highest_vector() to apic.h for use in Secure AVIC APIC driver, rename find_highest_vector() to apic_find_highest_vector() as part of the APIC API. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-7-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Change lapic regs base address to void pointerNeeraj Upadhyay
Change APIC base address from "char *" to "void *" in KVM lapic's set/get helper functions. Pointer arithmetic for "void *" and "char *" operate identically. With "void *" there is less of a chance of doing the wrong thing, e.g. neglecting to cast and reading a byte instead of the desired APIC register size. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-6-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Rename VEC_POS/REG_POS macro usagesNeeraj Upadhyay
In preparation for moving most of the KVM's lapic helpers which use VEC_POS/REG_POS macros to common APIC header for use in Secure AVIC APIC driver, rename all VEC_POS/REG_POS macro usages to APIC_VECTOR_TO_BIT_NUMBER/APIC_VECTOR_TO_REG_OFFSET and remove VEC_POS/REG_POS. While at it, clean up line wrap in find_highest_vector(). No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-5-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10x86/apic: KVM: Deduplicate APIC vector => register+bit mathSean Christopherson
Consolidate KVM's {REG,VEC}_POS() macros and lapic_vector_set_in_irr()'s open coded equivalent logic in anticipation of the kernel gaining more usage of vector => reg+bit lookups. Use lapic_vector_set_in_irr()'s math as using divides for both the bit number and register offset makes it easier to connect the dots, and for at least one user, fixup_irqs(), "/ 32 * 0x10" generates ever so slightly better code with gcc-14 (shaves a whole 3 bytes from the code stream): ((v) >> 5) << 4: c1 ef 05 shr $0x5,%edi c1 e7 04 shl $0x4,%edi 81 c7 00 02 00 00 add $0x200,%edi (v) / 32 * 0x10: c1 ef 05 shr $0x5,%edi 83 c7 20 add $0x20,%edi c1 e7 04 shl $0x4,%edi Keep KVM's tersely named macros as "wrappers" to avoid unnecessary churn in KVM, and because the shorter names yield more readable code overall in KVM. The new macros type cast the vector parameter to "unsigned int". This is required from better code generation for cases where an "int" is passed to these macros in KVM code. int v; ((v) >> 5) << 4: c1 f8 05 sar $0x5,%eax c1 e0 04 shl $0x4,%eax ((v) / 32 * 0x10): 85 ff test %edi,%edi 8d 47 1f lea 0x1f(%rdi),%eax 0f 49 c7 cmovns %edi,%eax c1 f8 05 sar $0x5,%eax c1 e0 04 shl $0x4,%eax ((unsigned int)(v) / 32 * 0x10): c1 f8 05 sar $0x5,%eax c1 e0 04 shl $0x4,%eax (v) & (32 - 1): 89 f8 mov %edi,%eax 83 e0 1f and $0x1f,%eax (v) % 32 89 fa mov %edi,%edx c1 fa 1f sar $0x1f,%edx c1 ea 1b shr $0x1b,%edx 8d 04 17 lea (%rdi,%rdx,1),%eax 83 e0 1f and $0x1f,%eax 29 d0 sub %edx,%eax (unsigned int)(v) % 32: 89 f8 mov %edi,%eax 83 e0 1f and $0x1f,%eax Overall kvm.ko text size is impacted if "unsigned int" is not used. Bin Orig New (w/o unsigned int) New (w/ unsigned int) lapic.o 28580 28772 28580 kvm.o 670810 671002 670810 kvm.ko 708079 708271 708079 No functional change intended. [Neeraj: Type cast vec macro param to "unsigned int", provide data in commit log on "unsigned int" requirement] Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/20250709033242.267892-4-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Remove redundant parentheses around 'bitmap'Neeraj Upadhyay
When doing pointer arithmetic in apic_test_vector() and kvm_lapic_{set|clear}_vector(), remove the unnecessary parentheses surrounding the 'bitmap' parameter. No functional change intended. Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/20250709033242.267892-3-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Open code setting/clearing of bits in the ISRNeeraj Upadhyay
Remove __apic_test_and_set_vector() and __apic_test_and_clear_vector(), because the _only_ register that's safe to modify with a non-atomic operation is ISR, because KVM isn't running the vCPU, i.e. hardware can't service an IRQ or process an EOI for the relevant (virtual) APIC. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-2-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: SEV: Prefer WBNOINVD over WBINVD for cache maintenance efficiencyKevin Loughlin
AMD CPUs currently execute WBINVD in the host when unregistering SEV guest memory or when deactivating SEV guests. Such cache maintenance is performed to prevent data corruption, wherein the encrypted (C=1) version of a dirty cache line might otherwise only be written back after the memory is written in a different context (ex: C=0), yielding corruption. However, WBINVD is performance-costly, especially because it invalidates processor caches. Strictly-speaking, unless the SEV ASID is being recycled (meaning the SNP firmware requires the use of WBINVD prior to DF_FLUSH), the cache invalidation triggered by WBINVD is unnecessary; only the writeback is needed to prevent data corruption in remaining scenarios. To improve performance in these scenarios, use WBNOINVD when available instead of WBINVD. WBNOINVD still writes back all dirty lines (preventing host data corruption by SEV guests) but does *not* invalidate processor caches. Note that the implementation of wbnoinvd() ensures fall back to WBINVD if WBNOINVD is unavailable. In anticipation of forthcoming optimizations to limit the WBNOINVD only to physical CPUs that have executed SEV guests, place the call to wbnoinvd_on_all_cpus() in a wrapper function sev_writeback_caches(). Signed-off-by: Kevin Loughlin <kevinloughlin@google.com> Reviewed-by: Mingwei Zhang <mizhang@google.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250201000259.3289143-3-kevinloughlin@google.com [sean: tweak comment regarding CLFUSH] Cc: Francesco Lavra <francescolavra.fl@gmail.com> Link: https://lore.kernel.org/r/20250522233733.3176144-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: SVM: Remove wbinvd in sev_vm_destroy()Zheyun Shen
Before sev_vm_destroy() is called, kvm_arch_guest_memory_reclaimed() has been called for SEV and SEV-ES and kvm_arch_gmem_invalidate() has been called for SEV-SNP. These functions have already handled flushing the memory. Therefore, this wbinvd_on_all_cpus() can simply be dropped. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250522233733.3176144-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Use wbinvd_on_cpu() instead of an open-coded equivalentSean Christopherson
Use wbinvd_on_cpu() to target a single CPU instead of open-coding an equivalent, and drop KVM's wbinvd_ipi() now that all users have switched to x86 library versions. No functional change intended. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250522233733.3176144-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Many patches, pretty much all of them small, that accumulated while I was on vacation. ARM: - Remove the last leftovers of the ill-fated FPSIMD host state mapping at EL2 stage-1 - Fix unexpected advertisement to the guest of unimplemented S2 base granule sizes - Gracefully fail initialising pKVM if the interrupt controller isn't GICv3 - Also gracefully fail initialising pKVM if the carveout allocation fails - Fix the computing of the minimum MMIO range required for the host on stage-2 fault - Fix the generation of the GICv3 Maintenance Interrupt in nested mode x86: - Reject SEV{-ES} intra-host migration if one or more vCPUs are actively being created, so as not to create a non-SEV{-ES} vCPU in an SEV{-ES} VM - Use a pre-allocated, per-vCPU buffer for handling de-sparsification of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too large" issue - Allow out-of-range/invalid Xen event channel ports when configuring IRQ routing, to avoid dictating a specific ioctl() ordering to userspace - Conditionally reschedule when setting memory attributes to avoid soft lockups when userspace converts huge swaths of memory to/from private - Add back MWAIT as a required feature for the MONITOR/MWAIT selftest - Add a missing field in struct sev_data_snp_launch_start that resulted in the guest-visible workarounds field being filled at the wrong offset - Skip non-canonical address when processing Hyper-V PV TLB flushes to avoid VM-Fail on INVVPID - Advertise supported TDX TDVMCALLs to userspace - Pass SetupEventNotifyInterrupt arguments to userspace - Fix TSC frequency underflow" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: avoid underflow when scaling TSC frequency KVM: arm64: Remove kvm_arch_vcpu_run_map_fp() KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes KVM: arm64: Don't free hyp pages with pKVM on GICv2 KVM: arm64: Fix error path in init_hyp_mode() KVM: arm64: Adjust range correctly during host stage-2 faults KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi() KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush KVM: SVM: Add missing member in SNP_LAUNCH_START command structure Documentation: KVM: Fix unexpected unindent warnings KVM: selftests: Add back the missing check of MONITOR/MWAIT availability KVM: Allow CPU to reschedule while setting per-page memory attributes KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table. KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
2025-07-10x86/lib: Add WBINVD and WBNOINVD helpers to target multiple CPUsZheyun Shen
Extract KVM's open-coded calls to do writeback caches on multiple CPUs to common library helpers for both WBINVD and WBNOINVD (KVM will use both). Put the onus on the caller to check for a non-empty mask to simplify the SMP=n implementation, e.g. so that it doesn't need to check that the one and only CPU in the system is present in the mask. [sean: move to lib, add SMP=n helpers, clarify usage] Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250128015345.7929-2-szy0127@sjtu.edu.cn Link: https://lore.kernel.org/20250522233733.3176144-5-seanjc@google.com
2025-07-10x86/lib: Add WBNOINVD helper functionsKevin Loughlin
In line with WBINVD usage, add WBNOINVD helper functions. Explicitly fall back to WBINVD (via alternative()) if WBNOINVD isn't supported even though the instruction itself is backwards compatible (WBNOINVD is WBINVD with an ignored REP prefix), so that disabling X86_FEATURE_WBNOINVD behaves as one would expect, e.g. in case there's a hardware issue that affects WBNOINVD. Opportunistically, add comments explaining the architectural behavior of WBINVD and WBNOINVD, and provide hints and pointers to uarch-specific behavior. Note, alternative() ensures compatibility with early boot code as needed. [ bp: Massage, fix typos, make export _GPL. ] Signed-off-by: Kevin Loughlin <kevinloughlin@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/20250522233733.3176144-4-seanjc@google.com
2025-07-10x86/lib: Drop the unused return value from wbinvd_on_all_cpus()Sean Christopherson
Drop wbinvd_on_all_cpus()'s return value; both the "real" version and the stub always return '0', and none of the callers check the return. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250522233733.3176144-3-seanjc@google.com
2025-07-09mm: remove callers of pfn_t functionalityAlistair Popple
All PFN_* pfn_t flags have been removed. Therefore there is no longer a need for the pfn_t type and all uses can be replaced with normal pfns. Link: https://lkml.kernel.org/r/bbedfa576c9822f8032494efbe43544628698b1f.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Björn Töpel <bjorn@rivosinc.com> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09mm: remove devmap related functions and page table bitsAlistair Popple
Now that DAX and all other reference counts to ZONE_DEVICE pages are managed normally there is no need for the special devmap PTE/PMD/PUD page table bits. So drop all references to these, freeing up a software defined page table bit on architectures supporting it. Link: https://lkml.kernel.org/r/6389398c32cc9daa3dfcaa9f79c7972525d310ce.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: Will Deacon <will@kernel.org> # arm64 Acked-by: David Hildenbrand <david@redhat.com> Suggested-by: Chunyan Zhang <zhang.lyra@gmail.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09mm: update architecture and driver code to use vm_flags_tLorenzo Stoakes
In future we intend to change the vm_flags_t type, so it isn't correct for architecture and driver code to assume it is unsigned long. Correct this assumption across the board. Overall, this patch does not introduce any functional change. Link: https://lkml.kernel.org/r/b6eb1894abc5555ece80bb08af5c022ef780c8bc.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09mm: change vm_get_page_prot() to accept vm_flags_t argumentLorenzo Stoakes
Patch series "use vm_flags_t consistently". The VMA flags field vma->vm_flags is of type vm_flags_t. Right now this is exactly equivalent to unsigned long, but it should not be assumed to be. Much code that references vma->vm_flags already correctly uses vm_flags_t, but a fairly large chunk of code simply uses unsigned long and assumes that the two are equivalent. This series corrects that and has us use vm_flags_t consistently. This series is motivated by the desire to, in a future series, adjust vm_flags_t to be a u64 regardless of whether the kernel is 32-bit or 64-bit in order to deal with the VMA flag exhaustion issue and avoid all the various problems that arise from it (being unable to use certain features in 32-bit, being unable to add new flags except for 64-bit, etc.) This is therefore a critical first step towards that goal. At any rate, using the correct type is of value regardless. We additionally take the opportunity to refer to VMA flags as vm_flags where possible to make clear what we're referring to. Overall, this series does not introduce any functional change. This patch (of 3): We abstract the type of the VMA flags to vm_flags_t, however in may places it is simply assumed this is unsigned long, which is simply incorrect. At the moment this is simply an incongruity, however in future we plan to change this type and therefore this change is a critical requirement for doing so. Overall, this patch does not introduce any functional change. [lorenzo.stoakes@oracle.com: add missing vm_get_page_prot() instance, remove include] Link: https://lkml.kernel.org/r/552f88e1-2df8-4e95-92b8-812f7c8db829@lucifer.local Link: https://lkml.kernel.org/r/cover.1750274467.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/a12769720a2743f235643b158c4f4f0a9911daf0.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09x86/hyperv: Clean up hv_map/unmap_interrupt() return valuesNuno Das Neves
Fix the return values of these hypercall helpers so they return a negated errno either directly or via hv_result_to_errno(). Update the callers to check for errno instead of using hv_status_success(), and remove redundant error printing. While at it, rearrange some variable declarations to adhere to style guidelines i.e. "reverse fir tree order". Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1751582677-30930-5-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1751582677-30930-5-git-send-email-nunodasneves@linux.microsoft.com>
2025-07-09x86/hyperv: Fix usage of cpu_online_mask to get valid cpuNuno Das Neves
Accessing cpu_online_mask here is problematic because the cpus read lock is not held in this context. However, cpu_online_mask isn't needed here since the effective affinity mask is guaranteed to be valid in this callback. So, just use cpumask_first() to get the cpu instead of ANDing it with cpus_online_mask unnecessarily. Fixes: e39397d1fd68 ("x86/hyperv: implement an MSI domain for root partition") Reported-by: Michael Kelley <mhklinux@outlook.com> Closes: https://lore.kernel.org/linux-hyperv/SN6PR02MB4157639630F8AD2D8FD8F52FD475A@SN6PR02MB4157.namprd02.prod.outlook.com/ Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1751582677-30930-4-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1751582677-30930-4-git-send-email-nunodasneves@linux.microsoft.com>
2025-07-09x86/hyperv: Fix warnings for missing export.h header inclusionNaman Jain
Fix below warning in Hyper-V drivers that comes when kernel is compiled with W=1 option. Include export.h in driver files to fix it. * warning: EXPORT_SYMBOL() is used, but #include <linux/export.h> is missing Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://lore.kernel.org/r/20250611100459.92900-3-namjain@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250611100459.92900-3-namjain@linux.microsoft.com>
2025-07-09KVM: x86: avoid underflow when scaling TSC frequencyPaolo Bonzini
In function kvm_guest_time_update(), __scale_tsc() is used to calculate a TSC *frequency* rather than a TSC value. With low-enough ratios, a TSC value that is less than 1 would underflow to 0 and to an infinite while loop in kvm_get_time_scale(): kvm_guest_time_update(struct kvm_vcpu *v) if (kvm_caps.has_tsc_control) tgt_tsc_khz = kvm_scale_tsc(tgt_tsc_khz, v->arch.l1_tsc_scaling_ratio); __scale_tsc(u64 ratio, u64 tsc) ratio=122380531, tsc=2299998, N=48 ratio*tsc >> N = 0.999... -> 0 Later in the function: Call Trace: <TASK> kvm_get_time_scale arch/x86/kvm/x86.c:2458 [inline] kvm_guest_time_update+0x926/0xb00 arch/x86/kvm/x86.c:3268 vcpu_enter_guest.constprop.0+0x1e70/0x3cf0 arch/x86/kvm/x86.c:10678 vcpu_run+0x129/0x8d0 arch/x86/kvm/x86.c:11126 kvm_arch_vcpu_ioctl_run+0x37a/0x13d0 arch/x86/kvm/x86.c:11352 kvm_vcpu_ioctl+0x56b/0xe60 virt/kvm/kvm_main.c:4188 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl+0x12d/0x190 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x59/0x110 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x78/0xe2 This can really happen only when fuzzing, since the TSC frequency would have to be nonsensically low. Fixes: 35181e86df97 ("KVM: x86: Add a common TSC scaling function") Reported-by: Yuntao Liu <liuyuntao12@huawei.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-07-09KVM: x86: Provide a capability to disable APERF/MPERF read interceptsJim Mattson
Allow a guest to read the physical IA32_APERF and IA32_MPERF MSRs without interception. The IA32_APERF and IA32_MPERF MSRs are not virtualized. Writes are not handled at all. The MSR values are not zeroed on vCPU creation, saved on suspend, or restored on resume. No accommodation is made for processor migration or for sharing a logical processor with other tasks. No adjustments are made for non-unit TSC multipliers. The MSRs do not account for time the same way as the comparable PMU events, whether the PMU is virtualized by the traditional emulation method or the new mediated pass-through approach. Nonetheless, in a properly constrained environment, this capability can be combined with a guest CPUID table that advertises support for CPUID.6:ECX.APERFMPERF[bit 0] to induce a Linux guest to report the effective physical CPU frequency in /proc/cpuinfo. Moreover, there is no performance cost for this capability. Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250530185239.2335185-3-jmattson@google.com Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250626001225.744268-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09KVM: x86: Replace growing set of *_in_guest bools with a u64Jim Mattson
Store each "disabled exit" boolean in a single bit rather than a byte. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250530185239.2335185-2-jmattson@google.com Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250626001225.744268-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09KVM: x86: Advertise support for LKGSXin Li
Advertise support for LKGS (load into IA32_KERNEL_GS_BASE) to userspace if the instruction is supported by the underlying CPU. LKGS is introduced with FRED to completely eliminate the need to swapgs explicilty. It behaves like the MOV to GS instruction except that it loads the base address into the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor cache, which is exactly what Linux kernel does to load a user level GS base. Thus there is no need to SWAPGS away from the kernel GS base. LKGS is an independent CPU feature that works correctly in a KVM guest without requiring explicit enablement. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Link: https://lore.kernel.org/r/20250626173521.2301088-1-xin@zytor.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09KVM: VMX: Add a macro to track which DEBUGCTL bits are host-ownedSean Christopherson
Add VMX_HOST_OWNED_DEBUGCTL_BITS to track which bits are host-owned, i.e. need to be preserved when running the guest, to dedup the logic without having to incur a memory load to get at kvm_x86_ops.HOST_OWNED_DEBUGCTL. No functional change intended. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/all/aF1yni8U6XNkyfRf@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09Merge tag 'tsa_x86_bugs_for_6.16' into tip-x86-bugsBorislav Petkov (AMD)
Pick up TSA changes from mainline so that attack vectors work can continue ontop. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-07-09x86/mm: Disable hugetlb page table sharing on 32-bitJann Horn
Only select ARCH_WANT_HUGE_PMD_SHARE on 64-bit x86. Page table sharing requires at least three levels because it involves shared references to PMD tables; 32-bit x86 has either two-level paging (without PAE) or three-level paging (with PAE), but even with three-level paging, having a dedicated PGD entry for hugetlb is only barely possible (because the PGD only has four entries), and it seems unlikely anyone's actually using PMD sharing on 32-bit. Having ARCH_WANT_HUGE_PMD_SHARE enabled on non-PAE 32-bit X86 (which has 2-level paging) became particularly problematic after commit 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count"), since that changes `struct ptdesc` such that the `pt_mm` (for PGDs) and the `pt_share_count` (for PMDs) share the same union storage - and with 2-level paging, PMDs are PGDs. (For comparison, arm64 also gates ARCH_WANT_HUGE_PMD_SHARE on the configuration of page tables such that it is never enabled with 2-level paging.) Closes: https://lore.kernel.org/r/srhpjxlqfna67blvma5frmy3aa@altlinux.org Fixes: cfe28c5d63d8 ("x86: mm: Remove x86 version of huge_pmd_share.") Reported-by: Vitaly Chikunov <vt@altlinux.org> Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Vitaly Chikunov <vt@altlinux.org> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250702-x86-2level-hugetlb-v2-1-1a98096edf92%40google.com
2025-07-09perf/x86/intel/uncore: Add iMC freerunning for Panther LakeKan Liang
PTL uncore imc freerunning counters are the same as the previous HW. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250707201750.616527-5-kan.liang@linux.intel.com
2025-07-09perf/x86/intel/uncore: Add Panther Lake supportKan Liang
The Panther Lake supports CBOX, MC, sNCU, and HBO uncore PMON. The CBOX is similar to Lunar Lake. The only difference is the number of CBOX. The other three uncore PMON can be retrieved from the discovery table. The global control register resides in the sNCU. The global freeze bit is set by default. It must be cleared before monitoring any uncore counters. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250707201750.616527-4-kan.liang@linux.intel.com
2025-07-09perf/x86/intel/uncore: Support customized MMIO map sizeKan Liang
For a server platform, the MMIO map size is always 0x4000. However, a client platform may have a smaller map size. Make the map size customizable. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250707201750.616527-3-kan.liang@linux.intel.com
2025-07-09perf/x86/intel/uncore: Support MSR portal for discovery tablesKan Liang
Starting from the Panther Lake, the discovery table mechanism is also supported in client platforms. The difference is that the portal of the global discovery table is retrieved from an MSR. The layout of discovery tables are the same as the server platforms. Factor out __parse_discovery_table() to parse discover tables. The uncore PMON is Die scope. Need to parse the discovery tables for each die. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250707201750.616527-2-kan.liang@linux.intel.com
2025-07-09x86/microcode: Move away from using a fake platform deviceGreg Kroah-Hartman
Downloading firmware needs a device to hang off of, and so a platform device seemed like the simplest way to do this. Now that we have a faux device interface, use that instead as this "microcode device" is not anything resembling a platform device at all. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/2025070121-omission-small-9308@gregkh
2025-07-08x86/CPU/AMD: Disable INVLPGB on Zen2Mikhail Paulyshka
AMD Cyan Skillfish (Family 17h, Model 47h, Stepping 0h) has an issue that causes system oopses and panics when performing TLB flush using INVLPGB. However, the problem is that that machine has misconfigured CPUID and should not report the INVLPGB bit in the first place. So zap the kernel's representation of the flag so that nothing gets confused. [ bp: Massage. ] Fixes: 767ae437a32d ("x86/mm: Add INVLPGB feature and Kconfig entry") Signed-off-by: Mikhail Paulyshka <me@mixaill.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/1ebe845b-322b-4929-9093-b41074e9e939@mixaill.net
2025-07-08x86/rdrand: Disable RDSEED on AMD Cyan SkillfishMikhail Paulyshka
AMD Cyan Skillfish (Family 17h, Model 47h, Stepping 0h) has an error that causes RDSEED to always return 0xffffffff, while RDRAND works correctly. Mask the RDSEED cap for this CPU so that both /proc/cpuinfo and direct CPUID read report RDSEED as unavailable. [ bp: Move to amd.c, massage. ] Signed-off-by: Mikhail Paulyshka <me@mixaill.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/20250524145319.209075-1-me@mixaill.net
2025-07-07Merge tag 'tsa_x86_bugs_for_6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU speculation fixes from Borislav Petkov: "Add the mitigation logic for Transient Scheduler Attacks (TSA) TSA are new aspeculative side channel attacks related to the execution timing of instructions under specific microarchitectural conditions. In some cases, an attacker may be able to use this timing information to infer data from other contexts, resulting in information leakage. Add the usual controls of the mitigation and integrate it into the existing speculation bugs infrastructure in the kernel" * tag 'tsa_x86_bugs_for_6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/process: Move the buffer clearing before MONITOR x86/microcode/AMD: Add TSA microcode SHAs KVM: SVM: Advertise TSA CPUID bits to guests x86/bugs: Add a Transient Scheduler Attacks mitigation x86/bugs: Rename MDS machinery to something more generic
2025-07-07x86/itmt: Add debugfs file to show core prioritiesMario Limonciello
Multiple drivers can report priorities to ITMT. To aid in debugging any issues with the values reported by drivers introduce a debugfs file to read out the values. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/20250609200518.3616080-14-superm1@kernel.org
2025-07-07x86/process: Clear hardware feedback history for AMD processorsPerry Yuan
Incorporate a mechanism within the context switching code to reset the hardware history for AMD processors. Specifically, when a task is switched in, the class ID is read and the hardware workload classification history of the CPU firmware is reset. Then, the workload classification for the next running thread is begun. [ bp: Massage commit message. ] Signed-off-by: Perry Yuan <perry.yuan@amd.com> Co-developed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/20250609200518.3616080-10-superm1@kernel.org
2025-07-07x86/msr-index: Add AMD workload classification MSRsPerry Yuan
Introduce new MSR registers for AMD hardware feedback support. They provide workload classification and configuration capabilities. Signed-off-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/20250609200518.3616080-4-superm1@kernel.org
2025-07-06Merge tag 'x86_urgent_for_v6.16_rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: - Make sure AMD SEV guests using secure TSC, include a TSC_FACTOR which prevents their TSCs from going skewed from the hypervisor's * tag 'x86_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Use TSC_FACTOR for Secure TSC frequency calculation
2025-07-06Merge tag 'ras_urgent_for_v6.16_rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Borislav Petkov: - Do not remove the MCE sysfs hierarchy if thresholding sysfs nodes init fails due to new/unknown banks present, which in itself is not fatal anyway; add default names for new banks - Make sure MCE polling settings are honored after CMCI storms - Make sure MCE threshold limit is reset after the thresholding interrupt has been serviced - Clean up properly and disable CMCI banks on shutdown so that a second/kexec-ed kernel can rediscover those banks again * tag 'ras_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Make sure CMCI banks are cleared during shutdown on Intel x86/mce/amd: Fix threshold limit reset x86/mce/amd: Add default names for MCA banks and blocks x86/mce: Ensure user polling settings are honored when restarting timer x86/mce: Don't remove sysfs if thresholding sysfs init fails
2025-07-04lib/crypto: sha256: Make library API use strongly-typed contextsEric Biggers
Currently the SHA-224 and SHA-256 library functions can be mixed arbitrarily, even in ways that are incorrect, for example using sha224_init() and sha256_final(). This is because they operate on the same structure, sha256_state. Introduce stronger typing, as I did for SHA-384 and SHA-512. Also as I did for SHA-384 and SHA-512, use the names *_ctx instead of *_state. The *_ctx names have the following small benefits: - They're shorter. - They avoid an ambiguity with the compression function state. - They're consistent with the well-known OpenSSL API. - Users usually name the variable 'sctx' anyway, which suggests that *_ctx would be the more natural name for the actual struct. Therefore: update the SHA-224 and SHA-256 APIs, implementation, and calling code accordingly. In the new structs, also strongly-type the compression function state. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250630160645.3198-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>