summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu/common.c
AgeCommit message (Collapse)Author
2021-03-28x86/process/64: Move cpu_current_top_of_stack out of TSSLai Jiangshan
cpu_current_top_of_stack is currently stored in TSS.sp1. TSS is exposed through the cpu_entry_area which is visible with user CR3 when PTI is enabled and active. This makes it a coveted fruit for attackers. An attacker can fetch the kernel stack top from it and continue next steps of actions based on the kernel stack. But it is actualy not necessary to be stored in the TSS. It is only accessed after the entry code switched to kernel CR3 and kernel GS_BASE which means it can be in any regular percpu variable. The reason why it is in TSS is historical (pre PTI) because TSS is also used as scratch space in SYSCALL_64 and therefore cache hot. A syscall also needs the per CPU variable current_task and eventually __preempt_count, so placing cpu_current_top_of_stack next to them makes it likely that they end up in the same cache line which should avoid performance regressions. This is not enforced as the compiler is free to place these variables, so these entry relevant variables should move into a data structure to make this enforceable. The seccomp_benchmark doesn't show any performance loss in the "getpid native" test result. Actually, the result changes from 93ns before to 92ns with this change when KPTI is disabled. The test is very stable and although the test doesn't show a higher degree of precision it gives enough confidence that moving cpu_current_top_of_stack does not cause a regression. [ tglx: Removed unneeded export. Massaged changelog ] Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210125173444.22696-2-jiangshanlai@gmail.com
2021-03-18x86: Fix various typos in commentsIngo Molnar
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-03-08x86/stackprotector/32: Make the canary into a regular percpu variableAndy Lutomirski
On 32-bit kernels, the stackprotector canary is quite nasty -- it is stored at %gs:(20), which is nasty because 32-bit kernels use %fs for percpu storage. It's even nastier because it means that whether %gs contains userspace state or kernel state while running kernel code depends on whether stackprotector is enabled (this is CONFIG_X86_32_LAZY_GS), and this setting radically changes the way that segment selectors work. Supporting both variants is a maintenance and testing mess. Merely rearranging so that percpu and the stack canary share the same segment would be messy as the 32-bit percpu address layout isn't currently compatible with putting a variable at a fixed offset. Fortunately, GCC 8.1 added options that allow the stack canary to be accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary percpu variable. This lets us get rid of all of the code to manage the stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess. (That name is special. We could use any symbol we want for the %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any name other than __stack_chk_guard.) Forcibly disable stackprotector on older compilers that don't support the new options and turn the stack canary into a percpu variable. The "lazy GS" approach is now used for all 32-bit configurations. Also makes load_gs_index() work on 32-bit kernels. On 64-bit kernels, it loads the GS selector and updates the user GSBASE accordingly. (This is unchanged.) On 32-bit kernels, it loads the GS selector and updates GSBASE, which is now always the user base. This means that the overall effect is the same on 32-bit and 64-bit, which avoids some ifdeffery. [ bp: Massage commit message. ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org
2021-02-24Merge tag 'x86-entry-2021-02-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 irq entry updates from Thomas Gleixner: "The irq stack switching was moved out of the ASM entry code in course of the entry code consolidation. It ended up being suboptimal in various ways. This reworks the X86 irq stack handling: - Make the stack switching inline so the stackpointer manipulation is not longer at an easy to find place. - Get rid of the unnecessary indirect call. - Avoid the double stack switching in interrupt return and reuse the interrupt stack for softirq handling. - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got confused about the stack pointer manipulation" * tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix stack-swizzle for FRAME_POINTER=y um: Enforce the usage of asm-generic/softirq_stack.h x86/softirq/64: Inline do_softirq_own_stack() softirq: Move do_softirq_own_stack() to generic asm header softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK x86/softirq: Remove indirection in do_softirq_own_stack() x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall x86/entry: Convert device interrupts to inline stack switching x86/entry: Convert system vectors to irq stack macro x86/irq: Provide macro for inlining irq stack switching x86/apic: Split out spurious handling code x86/irq/64: Adjust the per CPU irq stack pointer by 8 x86/irq: Sanitize irq stack tracking x86/entry: Fix instrumentation annotation
2021-02-10x86/irq/64: Adjust the per CPU irq stack pointer by 8Thomas Gleixner
The per CPU hardirq_stack_ptr contains the pointer to the irq stack in the form that it is ready to be assigned to [ER]SP so that the first push ends up on the top entry of the stack. But the stack switching on 64 bit has the following rules: 1) Store the current stack pointer (RSP) in the top most stack entry to allow the unwinder to link back to the previous stack 2) Set RSP to the top most stack entry 3) Invoke functions on the irq stack 4) Pop RSP from the top most stack entry (stored in #1) so it's back to the original stack. That requires all stack switching code to decrement the stored pointer by 8 in order to be able to store the current RSP and then set RSP to that location. That's a pointless exercise. Do the -8 adjustment right when storing the pointer and make the data type a void pointer to avoid confusion vs. the struct irq_stack data type which is on 64bit only used to declare the backing store. Move the definition next to the inuse flag so they likely end up in the same cache line. Sticking them into a struct to enforce it is a seperate change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210210002512.354260928@linutronix.de
2021-02-10x86/irq: Sanitize irq stack trackingThomas Gleixner
The recursion protection for hard interrupt stacks is an unsigned int per CPU variable initialized to -1 named __irq_count. The irq stack switching is only done when the variable is -1, which creates worse code than just checking for 0. When the stack switching happens it uses this_cpu_add/sub(1), but there is no reason to do so. It simply can use straight writes. This is a historical leftover from the low level ASM code which used inc and jz to make a decision. Rename it to hardirq_stack_inuse, make it a bool and use plain stores. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210210002512.228830141@linutronix.de
2021-01-28x86/cpufeatures: Assign dedicated feature word for CPUID_0x8000001F[EAX]Sean Christopherson
Collect the scattered SME/SEV related feature flags into a dedicated word. There are now five recognized features in CPUID.0x8000001F.EAX, with at least one more on the horizon (SEV-SNP). Using a dedicated word allows KVM to use its automagic CPUID adjustment logic when reporting the set of supported features to userspace. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Link: https://lkml.kernel.org/r/20210122204047.2860075-2-seanjc@google.com
2020-10-14Merge tag 'x86_seves_for_v5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV-ES support from Borislav Petkov: "SEV-ES enhances the current guest memory encryption support called SEV by also encrypting the guest register state, making the registers inaccessible to the hypervisor by en-/decrypting them on world switches. Thus, it adds additional protection to Linux guests against exfiltration, control flow and rollback attacks. With SEV-ES, the guest is in full control of what registers the hypervisor can access. This is provided by a guest-host exchange mechanism based on a new exception vector called VMM Communication Exception (#VC), a new instruction called VMGEXIT and a shared Guest-Host Communication Block which is a decrypted page shared between the guest and the hypervisor. Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest so in order for that exception mechanism to work, the early x86 init code needed to be made able to handle exceptions, which, in itself, brings a bunch of very nice cleanups and improvements to the early boot code like an early page fault handler, allowing for on-demand building of the identity mapping. With that, !KASLR configurations do not use the EFI page table anymore but switch to a kernel-controlled one. The main part of this series adds the support for that new exchange mechanism. The goal has been to keep this as much as possibly separate from the core x86 code by concentrating the machinery in two SEV-ES-specific files: arch/x86/kernel/sev-es-shared.c arch/x86/kernel/sev-es.c Other interaction with core x86 code has been kept at minimum and behind static keys to minimize the performance impact on !SEV-ES setups. Work by Joerg Roedel and Thomas Lendacky and others" * tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits) x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer x86/sev-es: Check required CPU features for SEV-ES x86/efi: Add GHCB mappings when SEV-ES is active x86/sev-es: Handle NMI State x86/sev-es: Support CPU offline/online x86/head/64: Don't call verify_cpu() on starting APs x86/smpboot: Load TSS and getcpu GDT entry before loading IDT x86/realmode: Setup AP jump table x86/realmode: Add SEV-ES specific trampoline entry point x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES x86/sev-es: Handle #DB Events x86/sev-es: Handle #AC Events x86/sev-es: Handle VMMCALL Events x86/sev-es: Handle MWAIT/MWAITX Events x86/sev-es: Handle MONITOR/MONITORX Events x86/sev-es: Handle INVD Events x86/sev-es: Handle RDPMC Events x86/sev-es: Handle RDTSC(P) Events ...
2020-10-13Merge tag 'x86_asm_for_v5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Borislav Petkov: "Two asm wrapper fixes: - Use XORL instead of XORQ to avoid a REX prefix and save some bytes in the .fixup section, by Uros Bizjak. - Replace __force_order dummy variable with a memory clobber to fix LLVM requiring a definition for former and to prevent memory accesses from still being cached/reordered, by Arvind Sankar" * tag 'x86_asm_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Replace __force_order with a memory clobber x86/uaccess: Use XORL %0,%0 in __get_user_asm()
2020-10-12Merge tag 'x86-paravirt-2020-10-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt cleanup from Ingo Molnar: "Clean up the paravirt code after the removal of 32-bit Xen PV support" * tag 'x86-paravirt-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/paravirt: Avoid needless paravirt step clearing page table entries x86/paravirt: Remove set_pte_at() pv-op x86/entry/32: Simplify CONFIG_XEN_PV build dependency x86/paravirt: Use CONFIG_PARAVIRT_XXL instead of CONFIG_PARAVIRT x86/paravirt: Clean up paravirt macros x86/paravirt: Remove 32-bit support from CONFIG_PARAVIRT_XXL
2020-10-01x86/asm: Replace __force_order with a memory clobberArvind Sankar
The CRn accessor functions use __force_order as a dummy operand to prevent the compiler from reordering CRn reads/writes with respect to each other. The fact that the asm is volatile should be enough to prevent this: volatile asm statements should be executed in program order. However GCC 4.9.x and 5.x have a bug that might result in reordering. This was fixed in 8.1, 7.3 and 6.5. Versions prior to these, including 5.x and 4.9.x, may reorder volatile asm statements with respect to each other. There are some issues with __force_order as implemented: - It is used only as an input operand for the write functions, and hence doesn't do anything additional to prevent reordering writes. - It allows memory accesses to be cached/reordered across write functions, but CRn writes affect the semantics of memory accesses, so this could be dangerous. - __force_order is not actually defined in the kernel proper, but the LLVM toolchain can in some cases require a definition: LLVM (as well as GCC 4.9) requires it for PIE code, which is why the compressed kernel has a definition, but also the clang integrated assembler may consider the address of __force_order to be significant, resulting in a reference that requires a definition. Fix this by: - Using a memory clobber for the write functions to additionally prevent caching/reordering memory accesses across CRn writes. - Using a dummy input operand with an arbitrary constant address for the read functions, instead of a global variable. This will prevent reads from being reordered across writes, while allowing memory loads to be cached/reordered across CRn reads, which should be safe. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602 Link: https://lore.kernel.org/lkml/20200527135329.1172644-1-arnd@arndb.de/ Link: https://lkml.kernel.org/r/20200902232152.3709896-1-nivedita@alum.mit.edu
2020-09-22x86/fpu: Handle FPU-related and clearcpuid command line arguments earlierMike Hommey
FPU initialization handles them currently. However, in the case of clearcpuid=, some other early initialization code may check for features before the FPU initialization code is called. Handling the argument earlier allows the command line to influence those early initializations. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200921215638.37980-1-mh@glandium.org
2020-09-09x86/smpboot: Load TSS and getcpu GDT entry before loading IDTJoerg Roedel
The IDT on 64-bit contains vectors which use paranoid_entry() and/or IST stacks. To make these vectors work, the TSS and the getcpu GDT entry need to be set up before the IDT is loaded. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-68-joro@8bytes.org
2020-09-09x86/sev-es: Allocate and map an IST stack for #VC handlerJoerg Roedel
Allocate and map an IST stack and an additional fall-back stack for the #VC handler. The memory for the stacks is allocated only when SEV-ES is active. The #VC handler needs to use an IST stack because a #VC exception can be raised from kernel space with unsafe stack, e.g. in the SYSCALL entry path. Since the #VC exception can be nested, the #VC handler switches back to the interrupted stack when entered from kernel space. If switching back is not possible, the fall-back stack is used. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-43-joro@8bytes.org
2020-08-15x86/paravirt: Remove 32-bit support from CONFIG_PARAVIRT_XXLJuergen Gross
The last 32-bit user of stuff under CONFIG_PARAVIRT_XXL is gone. Remove 32-bit specific parts. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200815100641.26362-2-jgross@suse.com
2020-08-10Merge tag 'locking-urgent-2020-08-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Thomas Gleixner: "A set of locking fixes and updates: - Untangle the header spaghetti which causes build failures in various situations caused by the lockdep additions to seqcount to validate that the write side critical sections are non-preemptible. - The seqcount associated lock debug addons which were blocked by the above fallout. seqcount writers contrary to seqlock writers must be externally serialized, which usually happens via locking - except for strict per CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot validate that the lock is held. This new debug mechanism adds the concept of associated locks. sequence count has now lock type variants and corresponding initializers which take a pointer to the associated lock used for writer serialization. If lockdep is enabled the pointer is stored and write_seqcount_begin() has a lockdep assertion to validate that the lock is held. Aside of the type and the initializer no other code changes are required at the seqcount usage sites. The rest of the seqcount API is unchanged and determines the type at compile time with the help of _Generic which is possible now that the minimal GCC version has been moved up. Adding this lockdep coverage unearthed a handful of seqcount bugs which have been addressed already independent of this. While generally useful this comes with a Trojan Horse twist: On RT kernels the write side critical section can become preemtible if the writers are serialized by an associated lock, which leads to the well known reader preempts writer livelock. RT prevents this by storing the associated lock pointer independent of lockdep in the seqcount and changing the reader side to block on the lock when a reader detects that a writer is in the write side critical section. - Conversion of seqcount usage sites to associated types and initializers" * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) locking/seqlock, headers: Untangle the spaghetti monster locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header x86/headers: Remove APIC headers from <asm/smp.h> seqcount: More consistent seqprop names seqcount: Compress SEQCNT_LOCKNAME_ZERO() seqlock: Fold seqcount_LOCKNAME_init() definition seqlock: Fold seqcount_LOCKNAME_t definition seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g hrtimer: Use sequence counter with associated raw spinlock kvm/eventfd: Use sequence counter with associated spinlock userfaultfd: Use sequence counter with associated spinlock NFSv4: Use sequence counter with associated spinlock iocost: Use sequence counter with associated spinlock raid5: Use sequence counter with associated spinlock vfs: Use sequence counter with associated spinlock timekeeping: Use sequence counter with associated raw spinlock xfrm: policy: Use sequence counters with associated lock netfilter: nft_set_rbtree: Use sequence counter with associated rwlock netfilter: conntrack: Use sequence counter with associated spinlock sched: tasks: Use sequence counter with associated spinlock ...
2020-08-06locking/seqlock, headers: Untangle the spaghetti monsterPeter Zijlstra
By using lockdep_assert_*() from seqlock.h, the spaghetti monster attacked. Attack back by reducing seqlock.h dependencies from two key high level headers: - <linux/seqlock.h>: -Remove <linux/ww_mutex.h> - <linux/time.h>: -Remove <linux/seqlock.h> - <linux/sched.h>: +Add <linux/seqlock.h> The price was to add it to sched.h ... Core header fallout, we add direct header dependencies instead of gaining them parasitically from higher level headers: - <linux/dynamic_queue_limits.h>: +Add <asm/bug.h> - <linux/hrtimer.h>: +Add <linux/seqlock.h> - <linux/ktime.h>: +Add <asm/bug.h> - <linux/lockdep.h>: +Add <linux/smp.h> - <linux/sched.h>: +Add <linux/seqlock.h> - <linux/videodev2.h>: +Add <linux/kernel.h> Arch headers fallout: - PARISC: <asm/timex.h>: +Add <asm/special_insns.h> - SH: <asm/io.h>: +Add <asm/page.h> - SPARC: <asm/timer_64.h>: +Add <uapi/asm/asi.h> - SPARC: <asm/vvar.h>: +Add <asm/processor.h>, <asm/barrier.h> -Remove <linux/seqlock.h> - X86: <asm/fixmap.h>: +Add <asm/pgtable_types.h> -Remove <asm/acpi.h> There's also a bunch of parasitic header dependency fallout in .c files, not listed separately. [ mingo: Extended the changelog, split up & fixed the original patch. ] Co-developed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200804133438.GK2674@hirez.programming.kicks-ass.net
2020-08-04Merge tag 'x86-fsgsbase-2020-08-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fsgsbase from Thomas Gleixner: "Support for FSGSBASE. Almost 5 years after the first RFC to support it, this has been brought into a shape which is maintainable and actually works. This final version was done by Sasha Levin who took it up after Intel dropped the ball. Sasha discovered that the SGX (sic!) offerings out there ship rogue kernel modules enabling FSGSBASE behind the kernels back which opens an instantanious unpriviledged root hole. The FSGSBASE instructions provide a considerable speedup of the context switch path and enable user space to write GSBASE without kernel interaction. This enablement requires careful handling of the exception entries which go through the paranoid entry path as they can no longer rely on the assumption that user GSBASE is positive (as enforced via prctl() on non FSGSBASE enabled systemn). All other entries (syscalls, interrupts and exceptions) can still just utilize SWAPGS unconditionally when the entry comes from user space. Converting these entries to use FSGSBASE has no benefit as SWAPGS is only marginally slower than WRGSBASE and locating and retrieving the kernel GSBASE value is not a free operation either. The real benefit of RD/WRGSBASE is the avoidance of the MSR reads and writes. The changes come with appropriate selftests and have held up in field testing against the (sanitized) Graphene-SGX driver" * tag 'x86-fsgsbase-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/fsgsbase: Fix Xen PV support x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase selftests/x86/fsgsbase: Add a missing memory constraint selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test selftests/x86: Add a syscall_arg_fault_64 test for negative GSBASE selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write Documentation/x86/64: Add documentation for GS/FS addressing mode x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit x86/entry/64: Introduce the FIND_PERCPU_BASE macro x86/entry/64: Switch CR3 before SWAPGS in paranoid entry x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation x86/process/64: Use FSGSBASE instructions on thread copy and ptrace x86/process/64: Use FSBSBASE in switch_to() if available x86/process/64: Make save_fsgs_for_kvm() ready for FSGSBASE x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE ...
2020-06-18x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2Andi Kleen
The kernel needs to explicitly enable FSGSBASE. So, the application needs to know if it can safely use these instructions. Just looking at the CPUID bit is not enough because it may be running in a kernel that does not enable the instructions. One way for the application would be to just try and catch the SIGILL. But that is difficult to do in libraries which may not want to overwrite the signal handlers of the main application. Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF aux vector. AT_HWCAP2 is already used by PPC for similar purposes. The application can access it open coded or by using the getauxval() function in newer versions of glibc. [ tglx: Massaged changelog ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1557309753-24073-18-git-send-email-chang.seok.bae@intel.com Link: https://lkml.kernel.org/r/20200528201402.1708239-14-sashal@kernel.org
2020-06-18x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bitAndy Lutomirski
Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable FSGSBASE by default, and add nofsgsbase to disable it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1557309753-24073-17-git-send-email-chang.seok.bae@intel.com Link: https://lkml.kernel.org/r/20200528201402.1708239-13-sashal@kernel.org
2020-06-18x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASEAndy Lutomirski
This is temporary. It will allow the next few patches to be tested incrementally. Setting unsafe_fsgsbase is a root hole. Don't do it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/1557309753-24073-4-git-send-email-chang.seok.bae@intel.com Link: https://lkml.kernel.org/r/20200528201402.1708239-3-sashal@kernel.org
2020-06-18x86/cpu: Use pinning mask for CR4 bits needing to be 0Kees Cook
The X86_CR4_FSGSBASE bit of CR4 should not change after boot[1]. Older kernels should enforce this bit to zero, and newer kernels need to enforce it depending on boot-time configuration (e.g. "nofsgsbase"). To support a pinned bit being either 1 or 0, use an explicit mask in combination with the expected pinned bit values. [1] https://lore.kernel.org/lkml/20200527103147.GI325280@hirez.programming.kicks-ass.net Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/202006082013.71E29A42@keescook
2020-06-11x86/entry: Remove debug IDT frobbingPeter Zijlstra
This is all unused now. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200529213321.245019500@infradead.org
2020-06-11x86/nmi: Protect NMI entry against instrumentationThomas Gleixner
Mark all functions in the fragile code parts noinstr or force inlining so they can't be instrumented. Also make the hardware latency tracer invocation explicit outside of non-instrumentable section. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200505135314.716186134@linutronix.de
2020-06-09Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge even more updates from Andrew Morton: - a kernel-wide sweep of show_stack() - pagetable cleanups - abstract out accesses to mmap_sem - prep for mmap_sem scalability work - hch's user acess work Subsystems affected by this patch series: debug, mm/pagemap, mm/maccess, mm/documentation. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (93 commits) include/linux/cache.h: expand documentation over __read_mostly maccess: return -ERANGE when probe_kernel_read() fails x86: use non-set_fs based maccess routines maccess: allow architectures to provide kernel probing directly maccess: move user access routines together maccess: always use strict semantics for probe_kernel_read maccess: remove strncpy_from_unsafe tracing/kprobes: handle mixed kernel/userspace probes better bpf: rework the compat kernel probe handling bpf:bpf_seq_printf(): handle potentially unsafe format string better bpf: handle the compat string in bpf_trace_copy_string better bpf: factor out a bpf_trace_copy_string helper maccess: unify the probe kernel arch hooks maccess: remove probe_read_common and probe_write_common maccess: rename strnlen_unsafe_user to strnlen_user_nofault maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault maccess: update the top of file comment maccess: clarify kerneldoc comments maccess: remove duplicate kerneldoc comments ...
2020-06-09mm: reorder includes after introduction of linux/pgtable.hMike Rapoport
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include of the latter in the middle of asm includes. Fix this up with the aid of the below script and manual adjustments here and there. import sys import re if len(sys.argv) is not 3: print "USAGE: %s <file> <header>" % (sys.argv[0]) sys.exit(1) hdr_to_move="#include <linux/%s>" % sys.argv[2] moved = False in_hdrs = False with open(sys.argv[1], "r") as f: lines = f.readlines() for _line in lines: line = _line.rstrip(' ') if line == hdr_to_move: continue if line.startswith("#include <linux/"): in_hdrs = True elif not moved and in_hdrs: moved = True print hdr_to_move print line Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mm: introduce include/linux/pgtable.hMike Rapoport
The include/linux/pgtable.h is going to be the home of generic page table manipulation functions. Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and make the latter include asm/pgtable.h. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09Merge branch 'x86/srbds' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 srbds fixes from Thomas Gleixner: "The 9th episode of the dime novel "The performance killer" with the subtitle "Slow Randomizing Boosts Denial of Service". SRBDS is an MDS-like speculative side channel that can leak bits from the random number generator (RNG) across cores and threads. New microcode serializes the processor access during the execution of RDRAND and RDSEED. This ensures that the shared buffer is overwritten before it is released for reuse. This is equivalent to a full bus lock, which means that many threads running the RNG instructions in parallel have the same effect as the same amount of threads issuing a locked instruction targeting an address which requires locking of two cachelines at once. The mitigation support comes with the usual pile of unpleasant ingredients: - command line options - sysfs file - microcode checks - a list of vulnerable CPUs identified by model and stepping this time which requires stepping match support for the cpu match logic. - the inevitable slowdown of affected CPUs" * branch 'x86/srbds' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Add Ivy Bridge to affected list x86/speculation: Add SRBDS vulnerability and mitigation documentation x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation x86/cpu: Add 'table' argument to cpu_matches()
2020-06-05Merge tag 'x86-mm-2020-06-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "Misc changes: - Unexport various PAT primitives - Unexport per-CPU tlbstate and uninline TLB helpers" * tag 'x86-mm-2020-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/tlb/uv: Add a forward declaration for struct flush_tlb_info x86/cpu: Export native_write_cr4() only when CONFIG_LKTDM=m x86/tlb: Restrict access to tlbstate xen/privcmd: Remove unneeded asm/tlb.h include x86/tlb: Move PCID helpers where they are used x86/tlb: Uninline nmi_uaccess_okay() x86/tlb: Move cr4_set_bits_and_update_boot() to the usage site x86/tlb: Move paravirt_tlb_remove_table() to the usage site x86/tlb: Move __flush_tlb_all() out of line x86/tlb: Move flush_tlb_others() out of line x86/tlb: Move __flush_tlb_one_kernel() out of line x86/tlb: Move __flush_tlb_one_user() out of line x86/tlb: Move __flush_tlb_global() out of line x86/tlb: Move __flush_tlb() out of line x86/alternatives: Move temporary_mm helpers into C x86/cr4: Sanitize CR4.PCE update x86/cpu: Uninline CR4 accessors x86/tlb: Uninline __get_current_cr3_fast() x86/mm: Use pgprotval_t in protval_4k_2_large() and protval_large_2_4k() x86/mm: Unexport __cachemode2pte_tbl ...
2020-05-06x86/resctrl: Query LLC monitoring properties once during bootReinette Chatre
Cache and memory bandwidth monitoring are features that are part of x86 CPU resource control that is supported by the resctrl subsystem. The monitoring properties are obtained via CPUID from every CPU and only used within the resctrl subsystem where the properties are only read from boot_cpu_data. Obtain the monitoring properties once, placed in boot_cpu_data, via the ->c_bsp_init() helpers of the vendors that support X86_FEATURE_CQM_LLC. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/6d74a6ac3e69f4b7a8b4115835f9455faf0f468d.1588715690.git.reinette.chatre@intel.com
2020-05-06x86/resctrl: Remove unnecessary RMID checksReinette Chatre
The cache and memory bandwidth monitoring properties are read using CPUID on every CPU. After the information is read from the system a sanity check is run to (1) ensure that the RMID data is initialized for the boot CPU in case the information was not available on the boot CPU and (2) the boot CPU's RMID is set to the minimum of RMID obtained from all CPUs. Every known platform that supports resctrl has the same maximum RMID on all CPUs. Both sanity checks found in x86_init_cache_qos() can thus safely be removed. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/c9a3b60d34091840c8b0bd1c6fab15e5ba92cb17.1588715690.git.reinette.chatre@intel.com
2020-05-06x86/cpu: Move resctrl CPUID code to resctrl/Reinette Chatre
The function determining a platform's support and properties of cache occupancy and memory bandwidth monitoring (properties of X86_FEATURE_CQM_LLC) can be found among the common CPU code. After the feature's properties is populated in the per-CPU data the resctrl subsystem is the only consumer (via boot_cpu_data). Move the function that obtains the CPU information used by resctrl to the resctrl subsystem and rename it from init_cqm() to resctrl_cpu_detect(). The function continues to be called from the common CPU code. This move is done in preparation of the addition of some vendor specific code. No functional change. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/38433b99f9d16c8f4ee796f8cc42b871531fa203.1588715690.git.reinette.chatre@intel.com
2020-04-26x86/cpu: Export native_write_cr4() only when CONFIG_LKTDM=mThomas Gleixner
Modules have no business poking into this but fixing this is for later. [ bp: Carve out from an earlier patch. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200421092558.939985695@linutronix.de
2020-04-24x86/cpu: Uninline CR4 accessorsThomas Gleixner
cpu_tlbstate is exported because various TLB-related functions need access to it, but cpu_tlbstate is sensitive information which should only be accessed by well-contained kernel functions and not be directly exposed to modules. The various CR4 accessors require cpu_tlbstate as the CR4 shadow cache is located there. In preparation for unexporting cpu_tlbstate, create a builtin function for manipulating CR4 and rework the various helpers to use it. No functional change. [ bp: push the export of native_write_cr4() only when CONFIG_LKTDM=m to the last patch in the series. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200421092558.939985695@linutronix.de
2020-04-20x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigationMark Gross
SRBDS is an MDS-like speculative side channel that can leak bits from the random number generator (RNG) across cores and threads. New microcode serializes the processor access during the execution of RDRAND and RDSEED. This ensures that the shared buffer is overwritten before it is released for reuse. While it is present on all affected CPU models, the microcode mitigation is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the cases where TSX is not supported or has been disabled with TSX_CTRL. The mitigation is activated by default on affected processors and it increases latency for RDRAND and RDSEED instructions. Among other effects this will reduce throughput from /dev/urandom. * Enable administrator to configure the mitigation off when desired using either mitigations=off or srbds=off. * Export vulnerability status via sysfs * Rename file-scoped macros to apply for non-whitelist table initializations. [ bp: Massage, - s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g, - do not read arch cap MSR a second time in tsx_fused_off() - just pass it in, - flip check in cpu_set_bug_bits() to save an indentation level, - reflow comments. jpoimboe: s/Mitigated/Mitigation/ in user-visible strings tglx: Dropped the fused off magic for now ] Signed-off-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
2020-04-20x86/cpu: Add 'table' argument to cpu_matches()Mark Gross
To make cpu_matches() reusable for other matching tables, have it take a pointer to a x86_cpu_id table as an argument. [ bp: Flip arguments order. ] Signed-off-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-03-30Merge tag 'x86-splitlock-2020-03-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 splitlock updates from Thomas Gleixner: "Support for 'split lock' detection: Atomic operations (lock prefixed instructions) which span two cache lines have to acquire the global bus lock. This is at least 1k cycles slower than an atomic operation within a cache line and disrupts performance on other cores. Aside of performance disruption this is a unpriviledged form of DoS. Some newer CPUs have the capability to raise an #AC trap when such an operation is attempted. The detection is by default enabled in warning mode which will warn once when a user space application is caught. A command line option allows to disable the detection or to select fatal mode which will terminate offending applications with SIGBUS" * tag 'x86-splitlock-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR x86/split_lock: Rework the initialization flow of split lock detection x86/split_lock: Enable split lock detection by kernel
2020-03-25Merge branch 'x86/cpu' into perf/core, to resolve conflictIngo Molnar
Conflicts: arch/x86/events/intel/uncore.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-24x86/cpu/bugs: Convert to new matching macrosThomas Gleixner
The new macro set has a consistent namespace and uses C99 initializers instead of the grufty C89 ones. The local wrappers have to stay as they are tailored to tame the hardware vulnerability mess. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lkml.kernel.org/r/20200320131508.934926587@linutronix.de
2020-02-27x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changesSean Christopherson
Explicitly set X86_FEATURE_OSPKE via set_cpu_cap() instead of calling get_cpu_cap() to pull the feature bit from CPUID after enabling CR4.PKE. Invoking get_cpu_cap() effectively wipes out any {set,clear}_cpu_cap() changes that were made between this_cpu->c_init() and setup_pku(), as all non-synthetic feature words are reinitialized from the CPU's CPUID values. Blasting away capability updates manifests most visibility when running on a VMX capable CPU, but with VMX disabled by BIOS. To indicate that VMX is disabled, init_ia32_feat_ctl() clears X86_FEATURE_VMX, using clear_cpu_cap() instead of setup_clear_cpu_cap() so that KVM can report which CPU is misconfigured (KVM needs to probe every CPU anyways). Restoring X86_FEATURE_VMX from CPUID causes KVM to think VMX is enabled, ultimately leading to an unexpected #GP when KVM attempts to do VMXON. Arguably, init_ia32_feat_ctl() should use setup_clear_cpu_cap() and let KVM figure out a different way to report the misconfigured CPU, but VMX is not the only feature bit that is affected, i.e. there is precedent that tweaking feature bits via {set,clear}_cpu_cap() after ->c_init() is expected to work. Most notably, x86_init_rdrand()'s clearing of X86_FEATURE_RDRAND when RDRAND malfunctions is also overwritten. Fixes: 0697694564c8 ("x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU") Reported-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Jacob Keller <jacob.e.keller@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200226231615.13664-1-sean.j.christopherson@intel.com
2020-02-20x86/split_lock: Enable split lock detection by kernelPeter Zijlstra (Intel)
A split-lock occurs when an atomic instruction operates on data that spans two cache lines. In order to maintain atomicity the core takes a global bus lock. This is typically >1000 cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores (which must wait for the bus lock to be released before their memory operations can complete). For real-time systems this may mean missing deadlines. For other systems it may just be very annoying. Some CPUs have the capability to raise an #AC trap when a split lock is attempted. Provide a command line option to give the user choices on how to handle this: split_lock_detect= off - not enabled (no traps for split locks) warn - warn once when an application does a split lock, but allow it to continue running. fatal - Send SIGBUS to applications that cause split lock On systems that support split lock detection the default is "warn". Note that if the kernel hits a split lock in any mode other than "off" it will OOPs. One implementation wrinkle is that the MSR to control the split lock detection is per-core, not per thread. This might result in some short lived races on HT systems in "warn" mode if Linux tries to enable on one thread while disabling on the other. Race analysis by Sean Christopherson: - Toggling of split-lock is only done in "warn" mode. Worst case scenario of a race is that a misbehaving task will generate multiple #AC exceptions on the same instruction. And this race will only occur if both siblings are running tasks that generate split-lock #ACs, e.g. a race where sibling threads are writing different values will only occur if CPUx is disabling split-lock after an #AC and CPUy is re-enabling split-lock after *its* previous task generated an #AC. - Transitioning between off/warn/fatal modes at runtime isn't supported and disabling is tracked per task, so hardware will always reach a steady state that matches the configured mode. I.e. split-lock is guaranteed to be enabled in hardware once all _TIF_SLD threads have been scheduled out. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200126200535.GB30377@agluck-desk2.amr.corp.intel.com
2020-01-30Merge tag 'mpx-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx Pull x86 MPX removal from Dave Hansen: "MPX requires recompiling applications, which requires compiler support. Unfortunately, GCC 9.1 is expected to be be released without support for MPX. This means that there was only a relatively small window where folks could have ever used MPX. It failed to gain wide adoption in the industry, and Linux was the only mainstream OS to ever support it widely. Support for the feature may also disappear on future processors. This set completes the process that we started during the 5.4 merge window when the MPX prctl()s were removed. XSAVE support is left in place, which allows MPX-using KVM guests to continue to function" * tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx: x86/mpx: remove MPX from arch/x86 mm: remove arch_bprm_mm_init() hook x86/mpx: remove bounds exception code x86/mpx: remove build infrastructure x86/alternatives: add missing insn.h include
2020-01-28Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu-features updates from Ingo Molnar: "The biggest change in this cycle was a large series from Sean Christopherson to clean up the handling of VMX features. This both fixes bugs/inconsistencies and makes the code more coherent and future-proof. There are also two cleanups and a minor TSX syslog messages enhancement" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/cpu: Remove redundant cpu_detect_cache_sizes() call x86/cpu: Print "VMX disabled" error message iff KVM is enabled KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin CPUs perf/x86: Provide stubs of KVM helpers for non-Intel CPUs KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits KVM: VMX: Check for full VMX support when verifying CPU compatibility KVM: VMX: Use VMX feature flag to query BIOS enabling KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl() x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_* x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs x86/vmx: Introduce VMX_FEATURES_* x86/cpu: Clear VMX feature flag if VMX is not fully enabled x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization x86/centaur: Use common IA32_FEAT_CTL MSR initialization x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked x86/intel: Initialize IA32_FEAT_CTL MSR at boot tools/x86: Sync msr-index.h from kernel sources selftests, kvm: Replace manual MSR defs with common msr-index.h ...
2020-01-28Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups all around the map" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/CPU/AMD: Remove amd_get_topology_early() x86/tsc: Remove redundant assignment x86/crash: Use resource_size() x86/cpu: Add a missing prototype for arch_smt_update() x86/nospec: Remove unused RSB_FILL_LOOPS x86/vdso: Provide missing include file x86/Kconfig: Correct spelling and punctuation Documentation/x86/boot: Fix typo x86/boot: Fix a comment's incorrect file reference x86/process: Remove set but not used variables prev and next x86/Kconfig: Fix Kconfig indentation
2020-01-28Merge branch 'efi-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this cycle were: - Cleanup of the GOP [graphics output] handling code in the EFI stub - Complete refactoring of the mixed mode handling in the x86 EFI stub - Overhaul of the x86 EFI boot/runtime code - Increase robustness for mixed mode code - Add the ability to disable DMA at the root port level in the EFI stub - Get rid of RWX mappings in the EFI memory map and page tables, where possible - Move the support code for the old EFI memory mapping style into its only user, the SGI UV1+ support code. - plus misc fixes, updates, smaller cleanups. ... and due to interactions with the RWX changes, another round of PAT cleanups make a guest appearance via the EFI tree - with no side effects intended" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) efi/x86: Disable instrumentation in the EFI runtime handling code efi/libstub/x86: Fix EFI server boot failure efi/x86: Disallow efi=old_map in mixed mode x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld efi/x86: avoid KASAN false positives when accessing the 1: 1 mapping efi: Fix handling of multiple efi_fake_mem= entries efi: Fix efi_memmap_alloc() leaks efi: Add tracking for dynamically allocated memmaps efi: Add a flags parameter to efi_memory_map efi: Fix comment for efi_mem_type() wrt absent physical addresses efi/arm: Defer probe of PCIe backed efifb on DT systems efi/x86: Limit EFI old memory map to SGI UV machines efi/x86: Avoid RWX mappings for all of DRAM efi/x86: Don't map the entire kernel text RW for mixed mode x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd efi/libstub/x86: Fix unused-variable warning efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode efi/libstub/x86: Use const attribute for efi_is_64bit() efi: Allow disabling PCI busmastering on bridges during boot efi/x86: Allow translating 64-bit arguments for mixed mode calls ...
2020-01-23x86/mpx: remove MPX from arch/x86Dave Hansen
From: Dave Hansen <dave.hansen@linux.intel.com> MPX is being removed from the kernel due to a lack of support in the toolchain going forward (gcc). This removes all the remaining (dead at this point) MPX handling code remaining in the tree. The only remaining code is the XSAVE support for MPX state which is currently needd for KVM to handle VMs which might use MPX. Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: x86@kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2020-01-17x86/speculation/swapgs: Exclude Zhaoxin CPUs from SWAPGS vulnerabilityTony W Wang-oc
New Zhaoxin family 7 CPUs are not affected by the SWAPGS vulnerability. So mark these CPUs in the cpu vulnerability whitelist accordingly. Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1579227872-26972-3-git-send-email-TonyWWang-oc@zhaoxin.com
2020-01-17x86/speculation/spectre_v2: Exclude Zhaoxin CPUs from SPECTRE_V2Tony W Wang-oc
New Zhaoxin family 7 CPUs are not affected by SPECTRE_V2. So define a separate cpu_vuln_whitelist bit NO_SPECTRE_V2 and add these CPUs to the cpu vulnerability whitelist. Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1579227872-26972-2-git-send-email-TonyWWang-oc@zhaoxin.com
2020-01-13x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUsSean Christopherson
Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill the capabilities during IA32_FEAT_CTL MSR initialization. Make the VMX capabilities dependent on IA32_FEAT_CTL and X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't possibly support VMX, or when /proc/cpuinfo is not available. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-11-sean.j.christopherson@intel.com
2020-01-09x86/cpu: Add a missing prototype for arch_smt_update()Benjamin Thiel
.. in order to fix a -Wmissing-prototype warning. No functional change. Signed-off-by: Benjamin Thiel <b.thiel@posteo.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200109121723.8151-1-b.thiel@posteo.de