summaryrefslogtreecommitdiff
path: root/arch/x86/include/asm
AgeCommit message (Collapse)Author
2023-04-10KVM: x86: Rename Hyper-V remote TLB hooks to match established schemeSean Christopherson
Rename the Hyper-V hooks for TLB flushing to match the naming scheme used by all the other TLB flushing hooks, e.g. in kvm_x86_ops, vendor code, arch hooks from common code, etc. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230405003133.419177-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-08x86/kexec: remove unnecessary arch_kexec_kernel_image_load()Bjorn Helgaas
Patch series "kexec: Remove unnecessary arch hook", v2. There are no arch-specific things in arch_kexec_kernel_image_load(), so remove it and just use the generic version. This patch (of 2): The x86 implementation of arch_kexec_kernel_image_load() is functionally identical to the generic arch_kexec_kernel_image_load(): arch_kexec_kernel_image_load # x86 if (!image->fops || !image->fops->load) return ERR_PTR(-ENOEXEC); return image->fops->load(image, image->kernel_buf, ...) arch_kexec_kernel_image_load # generic kexec_image_load_default if (!image->fops || !image->fops->load) return ERR_PTR(-ENOEXEC); return image->fops->load(image, image->kernel_buf, ...) Remove the x86-specific version and use the generic arch_kexec_kernel_image_load(). No functional change intended. Link: https://lkml.kernel.org/r/20230307224416.907040-1-helgaas@kernel.org Link: https://lkml.kernel.org/r/20230307224416.907040-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05x86/cpu: Add model number for Intel Arrow Lake processorTony Luck
Successor to Lunar Lake. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230404174641.426593-1-tony.luck@intel.com
2023-04-05KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALLOliver Upton
The 'longmode' field is a bit annoying as it blows an entire __u32 to represent a boolean value. Since other architectures are looking to add support for KVM_EXIT_HYPERCALL, now is probably a good time to clean it up. Redefine the field (and the remaining padding) as a set of flags. Preserve the existing ABI by using bit 0 to indicate if the guest was in long mode and requiring that the remaining 31 bits must be zero. Cc: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230404154050.2270077-2-oliver.upton@linux.dev
2023-04-04KVM: SVM: Remove a duplicate definition of VMCB_AVIC_APIC_BAR_MASKXinghui Li
VMCB_AVIC_APIC_BAR_MASK is defined twice with the same value in svm.h, which is meaningless. Delete the duplicate one. Fixes: 391503528257 ("KVM: x86: SVM: move avic definitions from AMD's spec to svm.h") Signed-off-by: Xinghui Li <korantli@tencent.com> Reviewed-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230403095200.1391782-1-korantwork@gmail.com [sean: tweak shortlog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-30docs: move x86 documentation into Documentation/arch/Jonathan Corbet
Move the x86 documentation under Documentation/arch/ as a way of cleaning up the top-level directory and making the structure of our docs more closely match the structure of the source directories it describes. All in-kernel references to the old paths have been updated. Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-arch@vger.kernel.org Cc: x86@kernel.org Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20230315211523.108836-1-corbet@lwn.net/ Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-30Merge branch 'x86/cc' into x86/apicThomas Gleixner
Pick up the cc_vendor changes.
2023-03-30Merge branch 'x86/cc' into x86/sevThomas Gleixner
Pick up the cc_vendor changes.
2023-03-30x86/coco: Export cc_vendorBorislav Petkov (AMD)
It will be used in different checks in future changes. Export it directly and provide accessor functions and stubs so this can be used in general code when CONFIG_ARCH_HAS_CC_PLATFORM is not set. No functional changes. [ tglx: Add accessor functions ] Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230318115634.9392-2-bp@alien8.de
2023-03-28mm: add PTE pointer parameter to flush_tlb_fix_spurious_fault()Gerald Schaefer
s390 can do more fine-grained handling of spurious TLB protection faults, when there also is the PTE pointer available. Therefore, pass on the PTE pointer to flush_tlb_fix_spurious_fault() as an additional parameter. This will add no functional change to other architectures, but those with private flush_tlb_fix_spurious_fault() implementations need to be made aware of the new parameter. Link: https://lkml.kernel.org/r/20230306161548.661740-1-gerald.schaefer@linux.ibm.com Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28x86: kmsan: use C versions of memset16/memset32/memset64Alexander Potapenko
KMSAN must see as many memory accesses as possible to prevent false positive reports. Fall back to versions of memset16()/memset32()/memset64() implemented in lib/string.c instead of those written in assembly. Link: https://lkml.kernel.org/r/20230303141433.3422671-3-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Reviewed-by: Marco Elver <elver@google.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Helge Deller <deller@gmx.de> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28x86: kmsan: don't rename memintrinsics in uninstrumented filesAlexander Potapenko
clang -fsanitize=kernel-memory already replaces calls to memset/memcpy/memmove and their __builtin_ versions with __msan_memset/__msan_memcpy/__msan_memmove in instrumented files, so there is no need to override them. In non-instrumented versions we are now required to leave memset() and friends intact, so we cannot replace them with __msan_XXX() functions. Link: https://lkml.kernel.org/r/20230303141433.3422671-1-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Suggested-by: Marco Elver <elver@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Helge Deller <deller@gmx.de> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-27x86/include/asm/msr-index.h: Add IFS Array test bitsJithu Joseph
Define MSR bitfields for enumerating support for Array BIST test. Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230322003359.213046-5-jithu.joseph@intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-03-27x86/hyperv: Change vTOM handling to use standard coco mechanismsMichael Kelley
Hyper-V guests on AMD SEV-SNP hardware have the option of using the "virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP architecture. With vTOM, shared vs. private memory accesses are controlled by splitting the guest physical address space into two halves. vTOM is the dividing line where the uppermost bit of the physical address space is set; e.g., with 47 bits of guest physical address space, vTOM is 0x400000000000 (bit 46 is set). Guest physical memory is accessible at two parallel physical addresses -- one below vTOM and one above vTOM. Accesses below vTOM are private (encrypted) while accesses above vTOM are shared (decrypted). In this sense, vTOM is like the GPA.SHARED bit in Intel TDX. Support for Hyper-V guests using vTOM was added to the Linux kernel in two patch sets[1][2]. This support treats the vTOM bit as part of the physical address. For accessing shared (decrypted) memory, these patch sets create a second kernel virtual mapping that maps to physical addresses above vTOM. A better approach is to treat the vTOM bit as a protection flag, not as part of the physical address. This new approach is like the approach for the GPA.SHARED bit in Intel TDX. Rather than creating a second kernel virtual mapping, the existing mapping is updated using recently added coco mechanisms. When memory is changed between private and shared using set_memory_decrypted() and set_memory_encrypted(), the PTEs for the existing kernel mapping are changed to add or remove the vTOM bit in the guest physical address, just as with TDX. The hypercalls to change the memory status on the host side are made using the existing callback mechanism. Everything just works, with a minor tweak to map the IO-APIC to use private accesses. To accomplish the switch in approach, the following must be done: * Update Hyper-V initialization to set the cc_mask based on vTOM and do other coco initialization. * Update physical_mask so the vTOM bit is no longer treated as part of the physical address * Remove CC_VENDOR_HYPERV and merge the associated vTOM functionality under CC_VENDOR_AMD. Update cc_mkenc() and cc_mkdec() to set/clear the vTOM bit as a protection flag. * Code already exists to make hypercalls to inform Hyper-V about pages changing between shared and private. Update this code to run as a callback from __set_memory_enc_pgtable(). * Remove the Hyper-V special case from __set_memory_enc_dec() * Remove the Hyper-V specific call to swiotlb_update_mem_attributes() since mem_encrypt_init() will now do it. * Add a Hyper-V specific implementation of the is_private_mmio() callback that returns true for the IO-APIC and vTPM MMIO addresses [1] https://lore.kernel.org/all/20211025122116.264793-1-ltykernel@gmail.com/ [2] https://lore.kernel.org/all/20211213071407.314309-1-ltykernel@gmail.com/ [ bp: Touchups. ] Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/1679838727-87310-7-git-send-email-mikelley@microsoft.com
2023-03-26x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VMMichael Kelley
Current code always maps MMIO devices as shared (decrypted) in a confidential computing VM. But Hyper-V guest VMs on AMD SEV-SNP with vTOM use a paravisor running in VMPL0 to emulate some devices, such as the IO-APIC and TPM. In such a case, the device must be accessed as private (encrypted) because the paravisor emulates the device at an address below vTOM, where all accesses are encrypted. Add a new hypervisor callback to determine if an MMIO address should be mapped private. The callback allows hypervisor-specific code to handle any quirks, the use of a paravisor, etc. in determining whether a mapping must be private. If the callback is not used by a hypervisor, default to returning "false", which is consistent with normal coco VM behavior. Use this callback as another special case to check for when doing ioremap(). Just checking the starting address is sufficient as an ioremap range must be all private or all shared. Also make the callback in early boot IO-APIC mapping code that uses the fixmap. [ bp: Touchups. ] Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/1678329614-3482-2-git-send-email-mikelley@microsoft.com
2023-03-24treewide: Trace IPIs sent via smp_send_reschedule()Valentin Schneider
To be able to trace invocations of smp_send_reschedule(), rename the arch-specific definitions of it to arch_smp_send_reschedule() and wrap it into an smp_send_reschedule() that contains a tracepoint. Changes to include the declaration of the tracepoint were driven by the following coccinelle script: @func_use@ @@ smp_send_reschedule(...); @include@ @@ #include <trace/events/ipi.h> @no_include depends on func_use && !include@ @@ #include <...> + + #include <trace/events/ipi.h> [csky bits] [riscv bits] Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20230307143558.294354-6-vschneid@redhat.com
2023-03-23KVM: x86: Shrink struct kvm_pmuMathias Krause
Move the 'version' member to the beginning of the structure to reuse an existing hole instead of introducing another one. This allows us to save 8 bytes for 64 bit builds. Signed-off-by: Mathias Krause <minipli@grsecurity.net> Link: https://lore.kernel.org/r/20230217193336.15278-2-minipli@grsecurity.net Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-23x86: KVM: Add common feature flag for AMD's PSFDSean Christopherson
Use a common X86_FEATURE_* flag for AMD's PSFD, and suppress it from /proc/cpuinfo via the standard method of an empty string instead of hacking in a one-off "private" #define in KVM. The request that led to KVM defining its own flag was really just that the feature not show up in /proc/cpuinfo, and additional patches+discussions in the interim have clarified that defining flags in cpufeatures.h purely so that KVM can advertise features to userspace is ok so long as the kernel already uses a word to track the associated CPUID leaf. No functional change intended. Link: https://lore.kernel.org/all/d1b1e0da-29f0-c443-6c86-9549bbe1c79d@redhat.como Link: https://lore.kernel.org/all/YxGZH7aOXQF7Pu5q@nazgul.tnic Link: https://lore.kernel.org/all/Y3O7UYWfOLfJkwM%2F@zn.tnic Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20230124194519.2893234-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-23x86,objtool: Split UNWIND_HINT_EMPTY in twoJosh Poimboeuf
Mark reported that the ORC unwinder incorrectly marks an unwind as reliable when the unwind terminates prematurely in the dark corners of return_to_handler() due to lack of information about the next frame. The problem is UNWIND_HINT_EMPTY is used in two different situations: 1) The end of the kernel stack unwind before hitting user entry, boot code, or fork entry 2) A blind spot in ORC coverage where the unwinder has to bail due to lack of information about the next frame The ORC unwinder has no way to tell the difference between the two. When it encounters an undefined stack state with 'end=1', it blindly marks the stack reliable, which can break the livepatch consistency model. Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and UNWIND_HINT_END_OF_STACK. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org
2023-03-23x86,objtool: Separate unret validation from unwind hintsJosh Poimboeuf
The ENTRY unwind hint type is serving double duty as both an empty unwind hint and an unret validation annotation. Unret validation is unrelated to unwinding. Separate it out into its own annotation. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/ff7448d492ea21b86d8a90264b105fbd0d751077.1677683419.git.jpoimboe@kernel.org
2023-03-23x86,objtool: Introduce ORC_TYPE_*Josh Poimboeuf
Unwind hints and ORC entry types are two distinct things. Separate them out more explicitly. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/cc879d38fff8a43f8f7beb2fd56e35a5a384d7cd.1677683419.git.jpoimboe@kernel.org
2023-03-23objtool: Change UNWIND_HINT() argument orderJosh Poimboeuf
The most important argument is 'type', make that one first. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/d994f8c29376c5618c75698df28fc03b52d3a868.1677683419.git.jpoimboe@kernel.org
2023-03-23objtool: Use relative pointers for annotationsJosh Poimboeuf
They produce the needed relocations while using half the space. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/bed05c64e28200220c9b1754a2f3ce71f73076ea.1677683419.git.jpoimboe@kernel.org
2023-03-22KVM: x86: Add support for SVM's Virtual NMISantosh Shukla
Add support for SVM's Virtual NMIs implementation, which adds proper tracking of virtual NMI blocking, and an intr_ctrl flag that software can set to mark a virtual NMI as pending. Pending virtual NMIs are serviced by hardware if/when virtual NMIs become unblocked, i.e. act more or less like real NMIs. Introduce two new kvm_x86_ops callbacks so to support SVM's vNMI, as KVM needs to treat a pending vNMI as partially injected. Specifically, if two NMIs (for L1) arrive concurrently in KVM's software model, KVM's ABI is to inject one and pend the other. Without vNMI, KVM manually tracks the pending NMI and uses NMI windows to detect when the NMI should be injected. With vNMI, the pending NMI is simply stuffed into the VMCB and handed off to hardware. This means that KVM needs to be able to set a vNMI pending on-demand, and also query if a vNMI is pending, e.g. to honor the "at most one NMI pending" rule and to preserve all NMIs across save and restore. Warn if KVM attempts to open an NMI window when vNMI is fully enabled, as the above logic should prevent KVM from ever getting to kvm_check_and_inject_events() with two NMIs pending _in software_, and the "at most one NMI pending" logic should prevent having an NMI pending in hardware and an NMI pending in software if NMIs are also blocked, i.e. if KVM can't immediately inject the second NMI. Signed-off-by: Santosh Shukla <Santosh.Shukla@amd.com> Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20230227084016.3368-11-santosh.shukla@amd.com [sean: rewrite shortlog and changelog, massage code comments] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22KVM: SVM: Add definitions for new bits in VMCB::int_ctrl related to vNMISantosh Shukla
Add defines for three new bits in VMVC::int_ctrl that are part of SVM's Virtual NMI (vNMI) support: V_NMI_PENDING_MASK(11) - Virtual NMI is pending V_NMI_BLOCKING_MASK(12) - Virtual NMI is masked V_NMI_ENABLE_MASK(26) - Enable NMI virtualization To "inject" an NMI, the hypervisor (KVM) sets V_NMI_PENDING. When the CPU services the pending vNMI, hardware clears V_NMI_PENDING and sets V_NMI_BLOCKING, e.g. to indicate that the vCPU is handling an NMI. Hardware clears V_NMI_BLOCKING upon successful execution of IRET, or if a VM-Exit occurs while delivering the virtual NMI. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com> Link: https://lore.kernel.org/r/20230227084016.3368-10-santosh.shukla@amd.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22x86/cpufeatures: Redefine synthetic virtual NMI bit as AMD's "real" vNMISean Christopherson
The existing X86_FEATURE_VNMI is a synthetic feature flag that exists purely to maintain /proc/cpuinfo's ABI, the "real" Intel vNMI feature flag is tracked as VMX_FEATURE_VIRTUAL_NMIS, as the feature is enumerated through VMX MSRs, not CPUID. AMD is also gaining virtual NMI support, but in true VMX vs. SVM form, enumerates support through CPUID, i.e. wants to add real feature flag for vNMI. Redefine the syntheic X86_FEATURE_VNMI to AMD's real CPUID bit to avoid having both X86_FEATURE_VNMI and e.g. X86_FEATURE_AMD_VNMI. Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22x86/tdx: Drop flags from __tdx_hypercall()Kirill A. Shutemov
After TDX_HCALL_ISSUE_STI got dropped, the only flag left is TDX_HCALL_HAS_OUTPUT. The flag indicates if the caller wants to see tdx_hypercall_args updated based on the hypercall output. Drop the flags and provide __tdx_hypercall_ret() that matches TDX_HCALL_HAS_OUTPUT semantics. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20230321003511.9469-1-kirill.shutemov%40linux.intel.com
2023-03-22x86/platform/intel-mid: Remove unused definitions from intel-mid.hAndy Shevchenko
After a few rounds of removal and refactoring Intel MID related code some artifacts are left untouched. However, they are not used anywhere. Remove them. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20230216193958.2971-1-andriy.shevchenko%40linux.intel.com
2023-03-21x86/sev: Change snp_guest_issue_request()'s fw_err argumentDionna Glaze
The GHCB specification declares that the firmware error value for a guest request will be stored in the lower 32 bits of EXIT_INFO_2. The upper 32 bits are for the VMM's own error code. The fw_err argument to snp_guest_issue_request() is thus a misnomer, and callers will need access to all 64 bits. The type of unsigned long also causes problems, since sw_exit_info2 is u64 (unsigned long long) vs the argument's unsigned long*. Change this type for issuing the guest request. Pass the ioctl command struct's error field directly instead of in a local variable, since an incomplete guest request may not set the error code, and uninitialized stack memory would be written back to user space. The firmware might not even be called, so bookend the call with the no firmware call error and clear the error. Since the "fw_err" field is really exitinfo2 split into the upper bits' vmm error code and lower bits' firmware error code, convert the 64 bit value to a union. [ bp: - Massage commit message - adjust code - Fix a build issue as Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202303070609.vX6wp2Af-lkp@intel.com - print exitinfo2 in hex Tom: - Correct -EIO exit case. ] Signed-off-by: Dionna Glaze <dionnaglaze@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230214164638.1189804-5-dionnaglaze@google.com Link: https://lore.kernel.org/r/20230307192449.24732-12-bp@alien8.de
2023-03-21x86/smpboot: Remove initial_gsBrian Gerst
Given its CPU#, each CPU can find its own per-cpu offset, and directly set GSBASE accordingly. The global variable can be eliminated. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Usama Arif <usama.arif@bytedance.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Usama Arif <usama.arif@bytedance.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20230316222109.1940300-9-usama.arif@bytedance.com
2023-03-21x86/smpboot: Remove initial_stack on 64-bitBrian Gerst
In order to facilitate parallel startup, start to eliminate some of the global variables passing information to CPUs in the startup path. However, start by introducing one more: smpboot_control. For now this merely holds the CPU# of the CPU which is coming up. Each CPU can then find its own per-cpu data, and everything else it needs can be found from there, allowing the other global variables to be removed. First to be removed is initial_stack. Each CPU can load %rsp from its current_task->thread.sp instead. That is already set up with the correct idle thread for APs. Set up the .sp field in INIT_THREAD on x86 so that the BSP also finds a suitable stack pointer in the static per-cpu data when coming up on first boot. On resume from S3, the CPU needs a temporary stack because its idle task is already active. Instead of setting initial_stack, the sleep code can simply set its own current->thread.sp to point to the temporary stack. Nobody else cares about ->thread.sp for a thread which is currently on a CPU, because the true value is actually in the %rsp register. Which is restored with the rest of the CPU context in do_suspend_lowlevel(). Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Usama Arif <usama.arif@bytedance.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Usama Arif <usama.arif@bytedance.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20230316222109.1940300-7-usama.arif@bytedance.com
2023-03-19Merge tag 'x86_urgent_for_v6.3_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "There's a little bit more 'movement' in there for my taste but it needs to happen and should make the code better after it. - Check cmdline_find_option()'s return value before further processing - Clear temporary storage in the resctrl code to prevent access to an unexistent MSR - Add a simple throttling mechanism to protect the hypervisor from potentially malicious SEV guests issuing requests in rapid succession. In order to not jeopardize the sanity of everyone involved in maintaining this code, the request issuing side has received a cleanup, split in more or less trivial, small and digestible pieces. Otherwise, the code was threatening to become an unmaintainable mess. Therefore, that cleanup is marked indirectly also for stable so that there's no differences between the upstream code and the stable variant when it comes down to backporting more there" * tag 'x86_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix use of uninitialized buffer in sme_enable() x86/resctrl: Clear staged_config[] before and after it is used virt/coco/sev-guest: Add throttling awareness virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case virt/coco/sev-guest: Do some code style cleanups virt/coco/sev-guest: Carve out the request issuing logic into a helper virt/coco/sev-guest: Remove the disable_vmpck label in handle_guest_request() virt/coco/sev-guest: Simplify extended guest request handling virt/coco/sev-guest: Check SEV_SNP attribute at probe time
2023-03-17Merge tag 'for-linus-6.3-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - cleanup for xen time handling - enable the VGA console in a Xen PVH dom0 - cleanup in the xenfs driver * tag 'for-linus-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: remove unnecessary (void*) conversions x86/PVH: obtain VGA console info in Dom0 x86/xen/time: cleanup xen_tsc_safe_clocksource xen: update arch/x86/include/asm/xen/cpuid.h
2023-03-17x86/paravirt: Convert simple paravirt functions to asmJuergen Gross
All functions referenced via __PV_IS_CALLEE_SAVE() need to be assembler functions, as those functions calls are hidden from the compiler. In case the kernel is compiled with "-fzero-call-used-regs" the compiler will clobber caller-saved registers at the end of C functions, which will result in unexpectedly zeroed registers at the call site of the related paravirt functions. Replace the C functions with DEFINE_PARAVIRT_ASM() constructs using the same instructions as the related paravirt calls in the PVOP_ALT_[V]CALLEE*() macros. And since they're not C functions visible to the compiler anymore, latter won't do the callee-clobbered zeroing invoked by -fzero-call-used-regs and thus won't corrupt registers. [ bp: Extend commit message. ] Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230317063325.361-1-jgross@suse.com
2023-03-16KVM: x86/mmu: Remove FNAME(invlpg) and use FNAME(sync_spte) to update vTLB ↵Lai Jiangshan
instead. In hardware TLB, invalidating TLB entries means the translations are removed from the TLB. In KVM shadowed vTLB, the translations (combinations of shadow paging and hardware TLB) are generally maintained as long as they remain "clean" when the TLB of an address space (i.e. a PCID or all) is flushed with the help of write-protections, sp->unsync, and kvm_sync_page(), where "clean" in this context means that no updates to KVM's SPTEs are needed. However, FNAME(invlpg) always zaps/removes the vTLB if the shadow page is unsync, and thus triggers a remote flush even if the original vTLB entry is clean, i.e. is usable as-is. Besides this, FNAME(invlpg) is largely is a duplicate implementation of FNAME(sync_spte) to invalidate a vTLB entry. To address both issues, reuse FNAME(sync_spte) to share the code and slightly modify the semantics, i.e. keep the vTLB entry if it's "clean" and avoid remote TLB flush. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20230216235321.735214-3-jiangshanlai@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16kvm: x86/mmu: Use KVM_MMU_ROOT_XXX for kvm_mmu_invalidate_addr()Lai Jiangshan
The @root_hpa for kvm_mmu_invalidate_addr() is called with @mmu->root.hpa or INVALID_PAGE where @mmu->root.hpa is to invalidate gva for the current root (the same meaning as KVM_MMU_ROOT_CURRENT) and INVALID_PAGE is to invalidate gva for all roots (the same meaning as KVM_MMU_ROOTS_ALL). Change the argument type of kvm_mmu_invalidate_addr() and use KVM_MMU_ROOT_XXX instead so that we can reuse the function for kvm_mmu_invpcid_gva() and nested_ept_invalidate_addr() for invalidating gva for different set of roots. No fuctionalities changed. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20230216154115.710033-9-jiangshanlai@gmail.com [sean: massage comment slightly] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16KVM: x86/mmu: Sanity check input to kvm_mmu_free_roots()Sean Christopherson
Tweak KVM_MMU_ROOTS_ALL to precisely cover all current+previous root flags, and add a sanity in kvm_mmu_free_roots() to verify that the set of roots to free doesn't stray outside KVM_MMU_ROOTS_ALL. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20230216154115.710033-8-jiangshanlai@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16KVM: x86/mmu: Move the code out of FNAME(sync_page)'s loop body into mmu.cLai Jiangshan
Rename mmu->sync_page to mmu->sync_spte and move the code out of FNAME(sync_page)'s loop body into mmu.c. No functionalities change intended. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20230216154115.710033-6-jiangshanlai@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16x86/mm/iommu/sva: Make LAM and SVA mutually exclusiveKirill A. Shutemov
IOMMU and SVA-capable devices know nothing about LAM and only expect canonical addresses. An attempt to pass down tagged pointer will lead to address translation failure. By default do not allow to enable both LAM and use SVA in the same process. The new ARCH_FORCE_TAGGED_SVA arch_prctl() overrides the limitation. By using the arch_prctl() userspace takes responsibility to never pass tagged address to the device. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20230312112612.31869-12-kirill.shutemov%40linux.intel.com
2023-03-16mm: Expose untagging mask in /proc/$PID/statusKirill A. Shutemov
Add a line in /proc/$PID/status to report untag_mask. It can be used to find out LAM status of the process from the outside. It is useful for debuggers. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/all/20230312112612.31869-10-kirill.shutemov%40linux.intel.com
2023-03-16x86/mm: Provide arch_prctl() interface for LAMKirill A. Shutemov
Add a few of arch_prctl() handles: - ARCH_ENABLE_TAGGED_ADDR enabled LAM. The argument is required number of tag bits. It is rounded up to the nearest LAM mode that can provide it. For now only LAM_U57 is supported, with 6 tag bits. - ARCH_GET_UNTAG_MASK returns untag mask. It can indicates where tag bits located in the address. - ARCH_GET_MAX_TAG_BITS returns the maximum tag bits user can request. Zero if LAM is not supported. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/all/20230312112612.31869-9-kirill.shutemov%40linux.intel.com
2023-03-16x86/mm: Reduce untagged_addr() overhead for systems without LAMKirill A. Shutemov
Use alternatives to reduce untagged_addr() overhead. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20230312112612.31869-8-kirill.shutemov%40linux.intel.com
2023-03-16x86/uaccess: Provide untagged_addr() and remove tags before address checkKirill A. Shutemov
untagged_addr() is a helper used by the core-mm to strip tag bits and get the address to the canonical shape based on rules of the current thread. It only handles userspace addresses. The untagging mask is stored in per-CPU variable and set on context switching to the task. The tags must not be included into check whether it's okay to access the userspace address. Strip tags in access_ok(). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/all/20230312112612.31869-7-kirill.shutemov%40linux.intel.com
2023-03-16x86/mm: Handle LAM on context switchKirill A. Shutemov
Linear Address Masking mode for userspace pointers encoded in CR3 bits. The mode is selected per-process and stored in mm_context_t. switch_mm_irqs_off() now respects selected LAM mode and constructs CR3 accordingly. The active LAM mode gets recorded in the tlb_state. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/all/20230312112612.31869-5-kirill.shutemov%40linux.intel.com
2023-03-16x86: CPUID and CR3/CR4 flags for Linear Address MaskingKirill A. Shutemov
Enumerate Linear Address Masking and provide defines for CR3 and CR4 flags. The new CONFIG_ADDRESS_MASKING option enables the feature support in kernel. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/all/20230312112612.31869-4-kirill.shutemov%40linux.intel.com
2023-03-16x86: Allow atomic MM_CONTEXT flags settingKirill A. Shutemov
So far there's no need in atomic setting of MM context flags in mm_context_t::flags. The flags set early in exec and never change after that. LAM enabling requires atomic flag setting. The upcoming flag MM_CONTEXT_FORCE_TAGGED_SVA can be set much later in the process lifetime where multiple threads exist. Convert the field to unsigned long and do MM_CONTEXT_* accesses with __set_bit() and test_bit(). No functional changes. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/all/20230312112612.31869-3-kirill.shutemov%40linux.intel.com
2023-03-16KVM: x86/mmu: Use 64-bit address to invalidate to fix a subtle bugLai Jiangshan
FNAME(invlpg)() and kvm_mmu_invalidate_gva() take a gva_t, i.e. unsigned long, as the type of the address to invalidate. On 32-bit kernels, the upper 32 bits of the GPA will get dropped when an L2 GPA address is invalidated in the shadowed nested TDP MMU. Convert it to u64 to fix the problem. Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20230216154115.710033-2-jiangshanlai@gmail.com [sean: tweak changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16x86/uaccess: Remove memcpy_page_flushcache()Ira Weiny
Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray callbacks") removed the calls to memcpy_page_flushcache(). In addition, memcpy_page_flushcache() uses the deprecated kmap_atomic(). Remove the unused x86 memcpy_page_flushcache() implementation and also get rid of one more kmap_atomic() user. [ dhansen: tweak changelog ] Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20221230-kmap-x86-v1-1-15f1ecccab50%40intel.com
2023-03-14KVM: x86/mmu: Use EMULTYPE flag to track write #PFs to shadow pagesSean Christopherson
Use a new EMULTYPE flag, EMULTYPE_WRITE_PF_TO_SP, to track page faults on self-changing writes to shadowed page tables instead of propagating that information to the emulator via a semi-persistent vCPU flag. Using a flag in "struct kvm_vcpu_arch" is confusing, especially as implemented, as it's not at all obvious that clearing the flag only when emulation actually occurs is correct. E.g. if KVM sets the flag and then retries the fault without ever getting to the emulator, the flag will be left set for future calls into the emulator. But because the flag is consumed if and only if both EMULTYPE_PF and EMULTYPE_ALLOW_RETRY_PF are set, and because EMULTYPE_ALLOW_RETRY_PF is deliberately not set for direct MMUs, emulated MMIO, or while L2 is active, KVM avoids false positives on a stale flag since FNAME(page_fault) is guaranteed to be run and refresh the flag before it's ultimately consumed by the tail end of reexecute_instruction(). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230202182817.407394-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: SVM: Fix a benign off-by-one bug in AVIC physical table maskSean Christopherson
Define the "physical table max index mask" as bits 8:0, not 9:0. x2AVIC currently supports a max of 512 entries, i.e. the max index is 511, and the inputs to GENMASK_ULL() are inclusive. The bug is benign as bit 9 is reserved and never set by KVM, i.e. KVM is just clearing bits that are guaranteed to be zero. Note, as of this writing, APM "Rev. 3.39-October 2022" incorrectly states that bits 11:8 are reserved in Table B-1. VMCB Layout, Control Area. I.e. that table wasn't updated when x2AVIC support was added. Opportunistically fix the comment for the max AVIC ID to align with the code, and clean up comment formatting too. Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Cc: stable@vger.kernel.org Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20230207002156.521736-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>