path: root/arch/x86/kernel/head_64.S
2023-06-26  Merge tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull SMP updates from Thomas Gleixner:
"A large update for SMP management:

 - Parallel CPU bringup

   The reason why people are interested in parallel bringup is to shorten the (kexec) reboot time of cloud servers to reduce the downtime of the VM tenants.

   The current fully serialized bringup does the following per AP:

     1) Prepare callbacks (allocate, initialize, create threads)
     2) Kick the AP alive (e.g. INIT/SIPI on x86)
     3) Wait for the AP to report alive state
     4) Let the AP continue through the atomic bringup
     5) Let the AP run the threaded bringup to full online state

   There are two significant delays:

     #3 The time for an AP to report alive state in start_secondary() on x86 has been measured in the range between 350us and 3.5ms depending on vendor and CPU type, BIOS microcode size etc.

     #4 The atomic bringup does the microcode update. This has been measured to take up to ~8ms on the primary threads depending on the microcode patch size to apply.

   On a two socket SKL server with 56 cores (112 threads) the boot CPU spends on current mainline about 800ms busy waiting for the APs to come up and apply microcode. That's more than 80% of the actual onlining procedure.

   This can be reduced significantly by splitting the bringup mechanism into two parts:

     1) Run the prepare callbacks and kick the AP alive for each AP which needs to be brought up. The APs wake up, do their firmware initialization and run the low level kernel startup code including microcode loading in parallel up to the first synchronization point. (#1 and #2 above)

     2) Run the rest of the bringup code strictly serialized per CPU (#3 - #5 above) as it's done today. Parallelizing that stage of the CPU bringup might be possible in theory, but it's questionable whether the required surgery would be justified for a pretty small gain.

   If the system is large enough, the first AP is already waiting at the first synchronization point when the boot CPU has finished the wake-up of the last AP. That reduces the AP bringup time on that SKL from ~800ms to ~80ms, i.e. by a factor of ~10x.

   The actual gain varies wildly depending on the system, CPU, microcode patch size and other factors. There are some opportunities to reduce the overhead further, but that needs some deep surgery in the x86 CPU bringup code.

   For now this is only enabled on x86, but the core functionality obviously works for all SMP capable architectures.

 - Enhancements for SMP function call tracing so it is possible to locate the scheduling and the actual execution points. That allows measuring IPI delivery time precisely."

* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  trace,smp: Add tracepoints for scheduling remotelly called functions
  trace,smp: Add tracepoints around remotelly called functions
  MAINTAINERS: Add CPU HOTPLUG entry
  x86/smpboot: Fix the parallel bringup decision
  x86/realmode: Make stack lock work in trampoline_compat()
  x86/smp: Initialize cpu_primary_thread_mask late
  cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
  x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
  x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
  x86/smpboot: Support parallel startup of secondary CPUs
  x86/smpboot: Implement a bit spinlock to protect the realmode stack
  x86/apic: Save the APIC virtual base address
  cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
  x86/apic: Provide cpu_primary_thread mask
  x86/smpboot: Enable split CPU startup
  cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
  cpu/hotplug: Reset task stack state in _cpu_up()
  cpu/hotplug: Remove unused state functions
  riscv: Switch to hotplug core state synchronization
  parisc: Switch to hotplug core state synchronization
  ...
2023-06-02  x86/head/64: Switch to KERNEL_CS as soon as new GDT is installed  (Tom Lendacky)
The call to startup_64_setup_env() will install a new GDT but does not actually switch to using the KERNEL_CS entry until returning from the function call. Commit bcce82908333 ("x86/sev: Detect/setup SEV/SME features earlier in boot") moved the call to sme_enable() earlier in the boot process and in between the call to startup_64_setup_env() and the switch to KERNEL_CS. An SEV-ES or an SEV-SNP guest will trigger #VC exceptions during the call to sme_enable() and if the CS pushed on the stack as part of the exception and used by IRETQ is not mapped by the new GDT, then problems occur. Today, the current CS when entering startup_64 is the kernel CS value because it was set up by the decompressor code, so no issue is seen. However, a recent patchset that looked to avoid using the legacy decompressor during an EFI boot exposed this bug. At entry to startup_64, the CS value is that of EFI and is not mapped in the new kernel GDT. So when a #VC exception occurs, the CS value used by IRETQ is not valid and the guest boot crashes. Fix this issue by moving the block that switches to the KERNEL_CS value to be done immediately after returning from startup_64_setup_env(). Fixes: bcce82908333 ("x86/sev: Detect/setup SEV/SME features earlier in boot") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/all/6ff1f28af2829cc9aea357ebee285825f90a431f.1684340801.git.thomas.lendacky%40amd.com
2023-05-15  x86/smpboot: Support parallel startup of secondary CPUs  (David Woodhouse)

In parallel startup mode the APs are kicked alive by the control CPU quickly after each other and run through the early startup code in parallel. The real-mode startup code is already serialized with a bit-spinlock to protect the real-mode stack.

In parallel startup mode the smpboot_control variable obviously cannot contain the Linux CPU number, so the APs have to determine their Linux CPU number on their own. This is required to find the CPU's per-CPU offset in order to find the idle task stack and other per-CPU data.

To achieve this, export the cpuid_to_apicid[] array so that each AP can find its own CPU number by searching therein based on its APIC ID.

Introduce a flag in the top bits of smpboot_control which indicates that the AP should find its CPU number by reading the APIC ID from the APIC. This is required because CPUID-based APIC ID retrieval can only provide the initial APIC ID, which might have been overruled by the firmware. Some AMD APUs come up with APIC ID = initial APIC ID + 0x10, so the APIC ID to CPU number lookup would fail miserably if based on CPUID. Also virtualization can make its own APIC ID assignments. The only requirement is that the APIC IDs are consistent with the ACPI/MADT table.

For the boot CPU, or in case parallel bringup is disabled, the control bits are empty and the CPU number is directly available in bits 0-23 of smpboot_control.

[ tglx: Initial proof of concept patch with bitlock and APIC ID lookup ]
[ dwmw2: Rework and testing, commit message, CPUID 0x1 and CPU0 support ]
[ seanc: Fix stray override of initial_gs in common_cpu_up() ]
[ Oleksandr Natalenko: reported suspend/resume issue fixed in x86_acpi_suspend_lowlevel ]
[ tglx: Make it read the APIC ID from the APIC instead of using CPUID, split the bitlock part out ]

Co-developed-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205257.411554373@linutronix.de
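As an illustration of the lookup described above, a minimal user-space C sketch (not the kernel's implementation; NR_CPUS, the flag bit value, read_apic_id() and the sample APIC ID are made up for the example, only cpuid_to_apicid and the bits 0-23 layout come from the commit text):

    #include <stdint.h>
    #include <stdio.h>

    #define NR_CPUS              64
    #define STARTUP_READ_APICID  (1u << 31)   /* illustrative flag bit in the top bits */
    #define STARTUP_CPU_MASK     0x00ffffffu  /* CPU number lives in bits 0-23 */

    /* Exported by the kernel so each AP can translate its APIC ID to a CPU number. */
    static uint32_t cpuid_to_apicid[NR_CPUS];

    /* Stand-in for reading the (x2)APIC ID register on the AP itself. */
    static uint32_t read_apic_id(void) { return 0x10; }

    static int find_cpu_number(uint32_t smpboot_control)
    {
        if (!(smpboot_control & STARTUP_READ_APICID))
            return smpboot_control & STARTUP_CPU_MASK;  /* serial bringup: number handed in directly */

        uint32_t apicid = read_apic_id();
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            if (cpuid_to_apicid[cpu] == apicid)
                return cpu;
        return -1;  /* no matching entry: this AP cannot come online */
    }

    int main(void)
    {
        cpuid_to_apicid[3] = 0x10;  /* pretend CPU 3 has APIC ID 0x10 */
        printf("parallel: CPU %d\n", find_cpu_number(STARTUP_READ_APICID));
        printf("serial:   CPU %d\n", find_cpu_number(5));
        return 0;
    }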
2023-05-15  x86/smpboot: Implement a bit spinlock to protect the realmode stack  (Thomas Gleixner)

Parallel AP bringup requires that the APs can run fully parallel through the early startup code including the real mode trampoline. To prepare for this, implement a bit-spinlock to serialize access to the real mode stack so that parallel upcoming APs are not going to corrupt each other's stacks while going through the real mode startup code.

Co-developed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205257.355425551@linutronix.de
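For orientation, a hedged C sketch of the bit-spinlock idea using C11 atomics; the real lock is a few instructions of real-mode assembly in the trampoline, and the names here are invented for the example:

    #include <stdatomic.h>
    #include <stdio.h>

    #define TR_LOCK_BIT 0            /* one bit inside a word the APs already share */

    static _Atomic unsigned int tr_lock;

    static void trampoline_lock(void)
    {
        /* Spin until we are the one who flips the bit from 0 to 1. */
        while (atomic_fetch_or_explicit(&tr_lock, 1u << TR_LOCK_BIT,
                                        memory_order_acquire) & (1u << TR_LOCK_BIT))
            ;  /* another AP currently owns the shared real-mode stack */
    }

    static void trampoline_unlock(void)
    {
        atomic_fetch_and_explicit(&tr_lock, ~(1u << TR_LOCK_BIT),
                                  memory_order_release);
    }

    int main(void)
    {
        trampoline_lock();
        /* ... the shared stack is exclusively ours between these two calls ... */
        trampoline_unlock();
        puts("lock/unlock round trip done");
        return 0;
    }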
2023-05-15  x86/smpboot: Restrict soft_restart_cpu() to SEV  (Thomas Gleixner)
Now that the CPU0 hotplug cruft is gone, the only user is AMD SEV. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205255.822234014@linutronix.de
2023-05-15  x86/smpboot: Rename start_cpu0() to soft_restart_cpu()  (Thomas Gleixner)
This is used in the SEV play_dead() implementation to re-online CPUs. But that has nothing to do with CPU0. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205255.662319599@linutronix.de
2023-04-28  Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull objtool updates from Ingo Molnar:

 - Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did this inconsistently follow this new, common convention, and fix all the fallout that objtool can now detect statically

 - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it

 - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code

 - Generate ORC data for __pfx code

 - Add more __noreturn annotations to various kernel startup/shutdown and panic functions

 - Misc improvements & fixes

* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/hyperv: Mark hv_ghcb_terminate() as noreturn
  scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
  x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
  btrfs: Mark btrfs_assertfail() __noreturn
  objtool: Include weak functions in global_noreturns check
  cpu: Mark nmi_panic_self_stop() __noreturn
  cpu: Mark panic_smp_self_stop() __noreturn
  arm64/cpu: Mark cpu_park_loop() and friends __noreturn
  x86/head: Mark *_start_kernel() __noreturn
  init: Mark start_kernel() __noreturn
  init: Mark [arch_call_]rest_init() __noreturn
  objtool: Generate ORC data for __pfx code
  x86/linkage: Fix padding for typed functions
  objtool: Separate prefix code from stack validation code
  objtool: Remove superfluous dead_end_function() check
  objtool: Add symbol iteration helpers
  objtool: Add WARN_INSN()
  scripts/objdump-func: Support multiple functions
  context_tracking: Fix KCSAN noinstr violation
  objtool: Add stackleak instrumentation to uaccess safe list
  ...
2023-03-23  x86,objtool: Split UNWIND_HINT_EMPTY in two  (Josh Poimboeuf)

Mark reported that the ORC unwinder incorrectly marks an unwind as reliable when the unwind terminates prematurely in the dark corners of return_to_handler() due to lack of information about the next frame.

The problem is UNWIND_HINT_EMPTY is used in two different situations:

  1) The end of the kernel stack unwind before hitting user entry, boot code, or fork entry

  2) A blind spot in ORC coverage where the unwinder has to bail due to lack of information about the next frame

The ORC unwinder has no way to tell the difference between the two. When it encounters an undefined stack state with 'end=1', it blindly marks the stack reliable, which can break the livepatch consistency model.

Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and UNWIND_HINT_END_OF_STACK.

Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org
2023-03-23  x86,objtool: Separate unret validation from unwind hints  (Josh Poimboeuf)
The ENTRY unwind hint type is serving double duty as both an empty unwind hint and an unret validation annotation. Unret validation is unrelated to unwinding. Separate it out into its own annotation. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/ff7448d492ea21b86d8a90264b105fbd0d751077.1677683419.git.jpoimboe@kernel.org
2023-03-21  x86/smpboot: Remove initial_gs  (Brian Gerst)
Given its CPU#, each CPU can find its own per-cpu offset, and directly set GSBASE accordingly. The global variable can be eliminated. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Usama Arif <usama.arif@bytedance.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Usama Arif <usama.arif@bytedance.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20230316222109.1940300-9-usama.arif@bytedance.com
2023-03-21  x86/smpboot: Remove early_gdt_descr on 64-bit  (Brian Gerst)
Build the GDT descriptor on the stack instead. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Usama Arif <usama.arif@bytedance.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Usama Arif <usama.arif@bytedance.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20230316222109.1940300-8-usama.arif@bytedance.com
2023-03-21  x86/smpboot: Remove initial_stack on 64-bit  (Brian Gerst)

In order to facilitate parallel startup, start to eliminate some of the global variables passing information to CPUs in the startup path. However, start by introducing one more: smpboot_control. For now this merely holds the CPU# of the CPU which is coming up. Each CPU can then find its own per-cpu data, and everything else it needs can be found from there, allowing the other global variables to be removed.

First to be removed is initial_stack. Each CPU can load %rsp from its current_task->thread.sp instead. That is already set up with the correct idle thread for APs. Set up the .sp field in INIT_THREAD on x86 so that the BSP also finds a suitable stack pointer in the static per-cpu data when coming up on first boot.

On resume from S3, the CPU needs a temporary stack because its idle task is already active. Instead of setting initial_stack, the sleep code can simply set its own current->thread.sp to point to the temporary stack. Nobody else cares about ->thread.sp for a thread which is currently on a CPU, because the true value is actually in the %rsp register, which is restored with the rest of the CPU context in do_suspend_lowlevel().

Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Usama Arif <usama.arif@bytedance.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Usama Arif <usama.arif@bytedance.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20230316222109.1940300-7-usama.arif@bytedance.com
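A simplified sketch of the data path the commit describes, with the kernel structures pared down to the single field involved and the per-CPU current_task modelled as a plain array (everything besides thread.sp is a stand-in, including the made-up stack address):

    #include <stdio.h>

    struct thread_struct { unsigned long sp; };            /* saved stack pointer */
    struct task_struct   { struct thread_struct thread; };

    #define NR_CPUS 8
    static struct task_struct idle_tasks[NR_CPUS];          /* stand-in for per-CPU current_task */

    /* What an AP conceptually does instead of reading the old initial_stack:
     * determine its own CPU number, then take %rsp from current_task->thread.sp. */
    static unsigned long ap_pick_stack(int cpu)
    {
        return idle_tasks[cpu].thread.sp;
    }

    int main(void)
    {
        idle_tasks[2].thread.sp = 0xffffc90000210000UL;      /* made-up idle stack top */
        printf("AP on CPU 2 would load %%rsp = %#lx\n", ap_pick_stack(2));
        return 0;
    }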
2022-10-17  x86/callthunks: Add call patching for call depth tracking  (Thomas Gleixner)

Mitigating the Intel SKL RSB underflow issue in software requires tracking the call depth, i.e. every CALL and every RET need to be intercepted and additional code injected.

The existing retbleed mitigations already include means of redirecting RET to __x86_return_thunk; this can be re-purposed and RET can be redirected to another function doing RET accounting.

CALL accounting will use the function padding introduced in prior patches. For each CALL instruction, the destination symbol's padding is rewritten to do the accounting and the CALL instruction is adjusted to call into the padding. This ensures only affected CPUs pay the overhead of this accounting. Unaffected CPUs will leave the padding unused and have their 'JMP __x86_return_thunk' replaced with an actual 'RET' instruction.

Objtool has been modified to supply a .call_sites section that lists all the 'CALL' instructions. Additionally the paravirt instruction sites are iterated, since they will have been patched from an indirect call to direct calls (or direct instructions, in which case they are ignored).

Module handling and the actual thunk code for SKL will be added in subsequent steps.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111147.470877038@infradead.org
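A toy model, not the kernel's thunk code, of the underlying problem the accounting addresses: the RSB holds a bounded number of return predictions, so once more RETs retire than tracked CALLs it underflows, and that is the point where the real mitigation refills ("stuffs") the RSB; the depth value and counters here are purely illustrative:

    #include <stdio.h>

    #define RSB_DEPTH 16   /* illustrative return-stack-buffer size */

    static int tracked_depth;   /* software call-depth estimate */
    static int rsb_stuffs;      /* how many refills would have been needed */

    static void on_call(void)
    {
        if (tracked_depth < RSB_DEPTH)
            tracked_depth++;
    }

    static void on_ret(void)
    {
        if (tracked_depth == 0) {
            /* The RSB would underflow here: the mitigation stuffs it with
             * harmless entries before letting the RET execute. */
            rsb_stuffs++;
            tracked_depth = RSB_DEPTH;
        }
        tracked_depth--;
    }

    int main(void)
    {
        for (int i = 0; i < 4; i++)  on_call();   /* shallow call chain ... */
        for (int i = 0; i < 20; i++) on_ret();    /* ... followed by deep unwinding */
        printf("RSB refills that would have been needed: %d\n", rsb_stuffs);
        return 0;
    }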
2022-06-27  objtool: Add entry UNRET validation  (Peter Zijlstra)

Since entry asm is tricky, add a validation pass that ensures the retbleed mitigation has been done before the first actual RET instruction.

Entry points are those that either have UNWIND_HINT_ENTRY, which acts as UNWIND_HINT_EMPTY but marks the instruction as an entry point, or those that have UNWIND_HINT_IRET_REGS at +0.

This is basically a variant of validate_branch() that is intra-function and it will simply follow all branches from marked entry points and ensure that all paths lead to ANNOTATE_UNRET_END. If a path hits RET or an indirection, the path fails and will be reported.

There are 3 ANNOTATE_UNRET_END instances:

 - UNTRAIN_RET itself
 - exception from-kernel; this path doesn't need UNTRAIN_RET
 - all early exceptions; these also don't need UNTRAIN_RET

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-05-23  Merge tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull Intel TDX support from Borislav Petkov:
"Intel Trust Domain Extensions (TDX) support.

 This is the Intel version of a confidential computing solution called Trust Domain Extensions (TDX). This series adds support to run the kernel as part of a TDX guest. It provides similar guest protections to AMD's SEV-SNP, like guest memory and register state encryption, memory integrity protection and a lot more.

 Design-wise, it differs from AMD's solution considerably: it uses a software module which runs in a special CPU mode called SEAM (Secure Arbitration Mode). As the name suggests, this module serves as sort of an arbiter which the confidential guest calls for services it needs during its lifetime.

 Just like AMD's SNP set, this series reworks and streamlines certain parts of x86 arch code so that this feature can be properly accommodated"

* tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
  x86/tdx: Fix RETs in TDX asm
  x86/tdx: Annotate a noreturn function
  x86/mm: Fix spacing within memory encryption features message
  x86/kaslr: Fix build warning in KASLR code in boot stub
  Documentation/x86: Document TDX kernel architecture
  ACPICA: Avoid cache flush inside virtual machines
  x86/tdx/ioapic: Add shared bit for IOAPIC base address
  x86/mm: Make DMA memory shared for TD guest
  x86/mm/cpa: Add support for TDX shared memory
  x86/tdx: Make pages shared in ioremap()
  x86/topology: Disable CPU online/offline control for TDX guests
  x86/boot: Avoid #VE during boot for TDX platforms
  x86/boot: Set CR0.NE early and keep it set during the boot
  x86/acpi/x86/boot: Add multiprocessor wake-up support
  x86/boot: Add a trampoline for booting APs via firmware handoff
  x86/tdx: Wire up KVM hypercalls
  x86/tdx: Port I/O: Add early boot support
  x86/tdx: Port I/O: Add runtime hypercalls
  x86/boot: Port I/O: Add decompression-time support for TDX
  x86/boot: Port I/O: Allow to hook up alternative helpers
  ...
2022-04-07  x86/boot: Avoid #VE during boot for TDX platforms  (Sean Christopherson)

There are a few MSRs and control register bits that the kernel normally needs to modify during boot. But, TDX disallows modification of these registers to help provide consistent security guarantees. Fortunately, TDX ensures that these are all in the correct state before the kernel loads, which means the kernel does not need to modify them.

The conditions to avoid are:

 * Any writes to the EFER MSR
 * Clearing CR4.MCE

This theoretically makes the guest boot more fragile. If, for instance, EFER was set up incorrectly and a WRMSR was performed, it will trigger early exception panic or a triple fault, if it's before early exceptions are set up. However, this is likely to trip up the guest BIOS long before control reaches the kernel. In any case, these kinds of problems are unlikely to occur in production environments, and developers have good debug tools to fix them quickly.

Change the common boot code to work on TDX and non-TDX systems. This should have no functional effect on non-TDX systems.

Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-24-kirill.shutemov@linux.intel.com
2022-04-06  x86/head/64: Re-enable stack protection  (Michael Roth)

Due to 103a4908ad4d ("x86/head/64: Disable stack protection for head$(BITS).o") kernel/head{32,64}.c are compiled with -fno-stack-protector to allow a call to set_bringup_idt_handler(), which would otherwise have stack protection enabled with CONFIG_STACKPROTECTOR_STRONG.

While sufficient for that case, there may still be issues with calls to any external functions that were compiled with stack protection enabled that in turn make stack-protected calls, or if the exception handlers set up by set_bringup_idt_handler() make calls to stack-protected functions. Subsequent patches for SEV-SNP CPUID validation support will introduce both such cases.

Attempting to disable stack protection for everything in scope to address that is prohibitive since much of the code, like the SEV-ES #VC handler, is shared code that remains in use after boot and could benefit from having stack protection enabled. Attempting to inline calls is brittle and can quickly balloon out to library/helper code where that's not really an option.

Instead, re-enable stack protection for head32.c/head64.c, and make the appropriate changes to ensure the segment used for the stack canary is initialized in advance of any stack-protected C calls.

For head64.c:

 - The BSP will enter from startup_64() and call into C code (startup_64_setup_env()) shortly after setting up the stack, which may result in calls to stack-protected code. Set up %gs early to allow for this safely.

 - APs will enter from secondary_startup_64*(), and %gs will be set up soon after. There is one call to C code prior to %gs being set up (__startup_secondary_64()), but it is only to fetch the 'sme_me_mask' global, so just load 'sme_me_mask' directly instead, and remove the now-unused __startup_secondary_64() function.

For head32.c:

 - BSPs/APs will set %fs to __BOOT_DS prior to any C calls. In recent kernels, the compiler is configured to access the stack canary at %fs:__stack_chk_guard [1], which overlaps with the initial per-cpu '__stack_chk_guard' variable in the initial/"master" .data..percpu area. This is sufficient to allow access to the canary for use during initial startup, so no changes are needed there.

[1] 3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")

[ bp: Massage commit message. ]

Suggested-by: Joerg Roedel <jroedel@suse.de> # for 64-bit %gs set up
Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-24-brijesh.singh@amd.com
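What the compiler-inserted protection conceptually does, written out by hand: in the kernel the guard value lives in a %gs-/%fs-relative slot, which is exactly what the boot code above has to make reachable before any stack-protected C call; this sketch just uses a global and a deliberately clobbered frame to show the check itself:

    #include <stdio.h>
    #include <string.h>

    /* Stand-in for the slot the compiler actually reads via %gs:/%fs: in the kernel. */
    static unsigned long __stack_chk_guard = 0xdeadbeefcafe0001UL;

    /* Hand-rolled version of what -fstack-protector emits around a function. */
    static int process(const char *src, int smash)
    {
        struct {
            char buf[16];
            unsigned long canary;   /* the compiler places this between locals and the return address */
        } frame;

        frame.canary = __stack_chk_guard;            /* prologue: plant the canary */

        snprintf(frame.buf, sizeof(frame.buf), "%s", src);
        if (smash)                                   /* simulate a buffer overrun clobbering the frame */
            memset(&frame, 'A', sizeof(frame));

        if (frame.canary != __stack_chk_guard) {     /* epilogue: verify before returning */
            fprintf(stderr, "stack smashing detected\n");
            return -1;
        }
        printf("ok: %s\n", frame.buf);
        return 0;
    }

    int main(void)
    {
        process("head_64.S", 0);
        process("head_64.S", 1);
        return 0;
    }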
2022-04-06  x86/sev: Detect/setup SEV/SME features earlier in boot  (Michael Roth)
sme_enable() handles feature detection for both SEV and SME. Future patches will also use it for SEV-SNP feature detection/setup, which will need to be done immediately after the first #VC handler is set up. Move it now in preparation. Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Link: https://lore.kernel.org/r/20220307213356.2797205-9-brijesh.singh@amd.com
2022-03-15  x86/ibt,sev: Annotations  (Peter Zijlstra)
No IBT on AMD so far.. probably correct, who knows. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154318.995109889@infradead.org
2022-03-15  x86/ibt: Annotate text references  (Peter Zijlstra)
Annotate away some of the generic code references. This is things where we take the address of a symbol for exception handling or return addresses (eg. context switch). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154318.877758523@infradead.org
2022-03-15  x86/ibt,entry: Sprinkle ENDBR dust  (Peter Zijlstra)
Kernel entry points should be having ENDBR on for IBT configs. The SYSCALL entry points are found through taking their respective address in order to program them in the MSRs, while the exception entry points are found through UNWIND_HINT_IRET_REGS. The rule is that any UNWIND_HINT_IRET_REGS at sym+0 should have an ENDBR, see the later objtool ibt validation patch. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154317.933157479@infradead.org
2022-03-15  x86/ibt,xen: Sprinkle the ENDBR  (Peter Zijlstra)
Even though Xen currently doesn't advertise IBT, prepare for when it will eventually do so and sprinkle the ENDBR dust accordingly. Even though most of the entry points are IRET like, the CPL0 Hypervisor can set WAIT-FOR-ENDBR and demand ENDBR at these sites. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154317.873919996@infradead.org
2022-03-15  x86/entry,xen: Early rewrite of restore_regs_and_return_to_kernel()  (Peter Zijlstra)

By doing an early rewrite of 'jmp native_iret' in restore_regs_and_return_to_kernel() we can get rid of the last INTERRUPT_RETURN user and paravirt_iret.

Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154317.815039833@infradead.org
2021-12-06  x86/mm/64: Flush global TLB on boot and AP bringup  (Joerg Roedel)

The AP bringup code uses the trampoline_pgd page-table which establishes global mappings in the user range of the address space. Flush the global TLB entries after the identity mappings are removed so no stale entries remain in the TLB.

Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211202153226.22946-3-joro@8bytes.org
2021-05-12  x86/entry: Unify definitions from <asm/calling.h> and <asm/ptrace-abi.h>  (H. Peter Anvin (Intel))
The register offsets in <asm/ptrace-abi.h> are duplicated in entry/calling.h, but are formatted differently and therefore not compatible. Use the version from <asm/ptrace-abi.h> consistently. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210510185316.3307264-2-hpa@zytor.com
2020-12-14  Merge tag 'x86_build_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull x86 build updates from Borislav Petkov:
"Two x86 build fixes:

 - Fix the vmlinux size check on 64-bit along with adding useful clarifications on the topic (Arvind Sankar)

 - Remove -m16 workaround now that the GCC versions that need it are unsupported (Nick Desaulniers)"

* tag 'x86_build_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Remove -m16 workaround for unsupported versions of GCC
  x86/build: Fix vmlinux size check on 64-bit
2020-12-14  Merge tag 'x86_cleanups_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull x86 cleanups from Borislav Petkov:
"Another branch with a nicely negative diffstat, just the way I like 'em:

 - Remove all uses of TIF_IA32 and TIF_X32 and reclaim the two bits in the end (Gabriel Krisman Bertazi)

 - All kinds of minor cleanups all over the tree"

* tag 'x86_cleanups_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/ia32_signal: Propagate __user annotation properly
  x86/alternative: Update text_poke_bp() kernel-doc comment
  x86/PCI: Make a kernel-doc comment a normal one
  x86/asm: Drop unused RDPID macro
  x86/boot/compressed/64: Use TEST %reg,%reg instead of CMP $0,%reg
  x86/head64: Remove duplicate include
  x86/mm: Declare 'start' variable where it is used
  x86/head/64: Remove unused GET_CR2_INTO() macro
  x86/boot: Remove unused finalize_identity_maps()
  x86/uaccess: Document copy_from_user_nmi()
  x86/dumpstack: Make show_trace_log_lvl() static
  x86/mtrr: Fix a kernel-doc markup
  x86/setup: Remove unused MCA variables
  x86, libnvdimm/test: Remove COPY_MC_TEST
  x86: Reclaim TIF_IA32 and TIF_X32
  x86/mm: Convert mmu context ia32_compat into a proper flags field
  x86/elf: Use e_machine to check for x32/ia32 in setup_additional_pages()
  elf: Expose ELF header on arch_setup_additional_pages()
  x86/elf: Use e_machine to select start_thread for x32
  elf: Expose ELF header in compat_start_thread()
  ...
2020-11-18  x86/head/64: Remove unused GET_CR2_INTO() macro  (Arvind Sankar)

Commit 4b47cdbda6f1 ("x86/head/64: Move early exception dispatch to C code") removed the usage of GET_CR2_INTO(). Drop the definition as well, along with the related definitions in paravirt.h and asm-offsets.h.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201005151208.2212886-3-nivedita@alum.mit.edu
2020-10-29  x86/build: Fix vmlinux size check on 64-bit  (Arvind Sankar)

Commit b4e0409a36f4 ("x86: check vmlinux limits, 64-bit") added a check that the size of the 64-bit kernel is less than KERNEL_IMAGE_SIZE.

The check uses (_end - _text), but this is not enough. The initial PMD used in startup_64() (level2_kernel_pgt) can only map up to KERNEL_IMAGE_SIZE from __START_KERNEL_map, not from _text, and the modules area (MODULES_VADDR) starts at KERNEL_IMAGE_SIZE.

The correct check is what is currently done for 32-bit, since LOAD_OFFSET is defined appropriately for the two architectures. Just check (_end - LOAD_OFFSET) against KERNEL_IMAGE_SIZE unconditionally.

Note that on 32-bit, the limit is not strict: KERNEL_IMAGE_SIZE is not really used by the main kernel. The higher the kernel is located, the less the space available for the vmalloc area. However, it is used by KASLR in the compressed stub to limit the maximum address of the kernel to a safe value.

Clean up various comments to clarify that despite the name, KERNEL_IMAGE_SIZE is not a limit on the size of the kernel image, but a limit on the maximum virtual address that the image can occupy.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201029161903.2553528-1-nivedita@alum.mit.edu
2020-10-29  x86/head/64: Check SEV encryption before switching to kernel page-table  (Joerg Roedel)
When SEV is enabled, the kernel requests the C-bit position again from the hypervisor to build its own page-table. Since the hypervisor is an untrusted source, the C-bit position needs to be verified before the kernel page-table is used. Call sev_verify_cbit() before writing the CR3. [ bp: Massage. ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20201028164659.27002-5-joro@8bytes.org
2020-09-09  x86/head/64: Don't call verify_cpu() on starting APs  (Joerg Roedel)
The APs are not ready to handle exceptions when verify_cpu() is called in secondary_startup_64(). Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-69-joro@8bytes.org
2020-09-09  x86/sev-es: Setup GHCB-based boot #VC handler  (Joerg Roedel)
Add the infrastructure to handle #VC exceptions when the kernel runs on virtual addresses and has mapped a GHCB. This handler will be used until the runtime #VC handler takes over. Since the handler runs very early, disable instrumentation for sev-es.c. [ bp: Make vc_ghcb_invalidate() __always_inline so that it can be inlined in noinstr functions like __sev_es_nmi_complete(). ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200908123816.GB3764@8bytes.org
2020-09-09  x86/sev-es: Setup an early #VC handler  (Joerg Roedel)
Setup an early handler for #VC exceptions. There is no GHCB mapped yet, so just re-use the vc_no_ghcb_handler(). It can only handle CPUID exit-codes, but that should be enough to get the kernel through verify_cpu() and __startup_64() until it runs on virtual addresses. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> [ boot failure Error: kernel_ident_mapping_init() failed. ] Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20200908123517.GA3764@8bytes.org
2020-09-07  x86/head/64: Move early exception dispatch to C code  (Joerg Roedel)
Move the assembly coded dispatch between page-faults and all other exceptions to C code to make it easier to maintain and extend. Also change the return-type of early_make_pgtable() to bool and make it static. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-36-joro@8bytes.org
2020-09-07  x86/head/64: Install a CPU bringup IDT  (Joerg Roedel)

Add a separate bringup IDT for the CPU bringup code that will be used until the kernel switches to the idt_table. There are two reasons for a separate IDT:

  1) When the idt_table is set up and the secondary CPUs are booted, it contains entries (e.g. IST entries) which require certain CPU state to be set up. This includes a working TSS (for IST), MSR_GS_BASE (for stack protector) or CR4.FSGSBASE (for the paranoid_entry path). By using a dedicated IDT for early boot this state need not be set up early.

  2) The idt_table is static to idt.c, so any function using/modifying it must be in idt.c too. That means that all compiler driven instrumentation like tracing or KASAN is also active in this code. But during early CPU bringup the environment is not set up for this instrumentation to work correctly.

To avoid all of these hassles and make early exception handling robust, use a dedicated bringup IDT. The IDT is loaded two times, first on the boot CPU while the kernel is still running on direct mapped addresses, and again later after the switch to kernel addresses has happened. The second IDT load happens on the boot and secondary CPUs.

Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-34-joro@8bytes.org
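For orientation, a compact C sketch of the 64-bit gate layout such a bringup IDT is built from, and of the {limit, base} pseudo-descriptor that LIDT consumes; the field names, selector value and handler address below are illustrative rather than the kernel's definitions:

    #include <stdint.h>
    #include <stdio.h>

    /* One 16-byte long-mode interrupt gate. */
    struct idt_gate {
        uint16_t offset_low;     /* handler address bits 0..15 */
        uint16_t segment;        /* code segment selector */
        uint16_t flags;          /* IST, type (0xE = interrupt gate), DPL, P */
        uint16_t offset_mid;     /* handler address bits 16..31 */
        uint32_t offset_high;    /* handler address bits 32..63 */
        uint32_t reserved;
    } __attribute__((packed));

    /* The {limit, base} pair LIDT consumes. */
    struct idt_descriptor {
        uint16_t limit;
        uint64_t base;
    } __attribute__((packed));

    static void set_gate(struct idt_gate *g, uint64_t handler, uint16_t cs)
    {
        g->offset_low  = handler & 0xffff;
        g->segment     = cs;
        g->flags       = 0x8e00;            /* present, DPL 0, interrupt gate, IST 0 */
        g->offset_mid  = (handler >> 16) & 0xffff;
        g->offset_high = handler >> 32;
        g->reserved    = 0;
    }

    int main(void)
    {
        static struct idt_gate bringup_idt[32];                   /* exception vectors only */
        set_gate(&bringup_idt[14], 0xffffffff81000123ULL, 0x10);  /* #PF, made-up handler address */

        struct idt_descriptor desc = {
            .limit = sizeof(bringup_idt) - 1,
            .base  = (uint64_t)(uintptr_t)bringup_idt,
        };
        printf("lidt would be given limit=%#x base=%#llx\n",
               desc.limit, (unsigned long long)desc.base);
        return 0;
    }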
2020-09-07  x86/head/64: Switch to initial stack earlier  (Joerg Roedel)
Make sure there is a stack once the kernel runs from virtual addresses. At this stage any secondary CPU which boots will have lost its stack because the kernel switched to a new page-table which does not map the real-mode stack anymore. This is needed for handling early #VC exceptions caused by instructions like CPUID. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-33-joro@8bytes.org
2020-09-07  x86/head/64: Load segment registers earlier  (Joerg Roedel)
Make sure segments are properly set up before setting up an IDT and doing anything that might cause a #VC exception. This is later needed for early exception handling. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-32-joro@8bytes.org
2020-09-07  x86/head/64: Load GDT after switch to virtual addresses  (Joerg Roedel)
Load the GDT right after switching to virtual addresses to make sure there is a defined GDT for exception handling. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-31-joro@8bytes.org
2020-09-07  x86/head/64: Install startup GDT  (Joerg Roedel)

Handling exceptions during boot requires a working GDT. The kernel GDT can't be used on the direct mapping, so load a startup GDT and set up segments.

Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-30-joro@8bytes.org
2020-06-11  x86/entry: Remove the apic/BUILD interrupt leftovers  (Thomas Gleixner)
Remove all the code which was there to emit the system vector stubs. All users are gone. Move the now unused GET_CR2_INTO macro muck to head_64.S where the last user is. Fixup the eye hurting comment there while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202119.927433002@linutronix.de
2020-06-09  mm: reorder includes after introduction of linux/pgtable.h  (Mike Rapoport)

The replacement of <asm/pgtable.h> with <linux/pgtable.h> made the include of the latter in the middle of asm includes. Fix this up with the aid of the below script and manual adjustments here and there.

    import sys
    import re

    if len(sys.argv) is not 3:
        print "USAGE: %s <file> <header>" % (sys.argv[0])
        sys.exit(1)

    hdr_to_move="#include <linux/%s>" % sys.argv[2]
    moved = False
    in_hdrs = False

    with open(sys.argv[1], "r") as f:
        lines = f.readlines()
        for _line in lines:
            line = _line.rstrip('\n')
            if line == hdr_to_move:
                continue
            if line.startswith("#include <linux/"):
                in_hdrs = True
            elif not moved and in_hdrs:
                moved = True
                print hdr_to_move
            print line

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09  mm: introduce include/linux/pgtable.h  (Mike Rapoport)
The include/linux/pgtable.h is going to be the home of generic page table manipulation functions. Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and make the latter include asm/pgtable.h. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-18  x86/asm/64: Change all ENTRY+END to SYM_CODE_*  (Jiri Slaby)
Change all assembly code which is marked using END (and not ENDPROC). Switch all these to the appropriate new annotation SYM_CODE_START and SYM_CODE_END. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen bits] Cc: Andy Lutomirski <luto@kernel.org> Cc: Cao jin <caoj.fnst@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: linux-arch@vger.kernel.org Cc: Maran Wilson <maran.wilson@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20191011115108.12392-24-jslaby@suse.cz
2019-10-18  x86/asm: Do not annotate functions with GLOBAL  (Jiri Slaby)
GLOBAL is an x86's custom macro and is going to die very soon. It was meant for global symbols, but here, it was used for functions. Instead, use the new macros SYM_FUNC_START* and SYM_CODE_START* (depending on the type of the function) which are dedicated to global functions. And since they both require a closing by SYM_*_END, do that here too. startup_64, which does not use GLOBAL but uses .globl explicitly, is converted too. "No alignments" are preserved. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Allison Randal <allison@lohutok.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Cao jin <caoj.fnst@cn.fujitsu.com> Cc: Enrico Weigelt <info@metux.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-arch@vger.kernel.org Cc: Maran Wilson <maran.wilson@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191011115108.12392-17-jslaby@suse.cz
2019-10-18  x86/asm/head: Annotate data appropriately  (Jiri Slaby)

Use the new SYM_DATA, SYM_DATA_START, and SYM_DATA_END in both 32 and 64 bit head_*.S. In the 64-bit version, define also SYM_DATA_START_PAGE_ALIGNED locally using the new SYM_START. It is used in the code instead of NEXT_PAGE() which was defined in this file and had been using the obsolete macro GLOBAL().

Now, the data in the 64-bit object file look sane:

  Value  Size    Type    Bind    Vis      Ndx  Name
   0000  4096    OBJECT  GLOBAL  DEFAULT   15  init_level4_pgt
   1000  4096    OBJECT  GLOBAL  DEFAULT   15  level3_kernel_pgt
   2000  2048    OBJECT  GLOBAL  DEFAULT   15  level2_kernel_pgt
   3000  4096    OBJECT  GLOBAL  DEFAULT   15  level2_fixmap_pgt
   4000  4096    OBJECT  GLOBAL  DEFAULT   15  level1_fixmap_pgt
   5000  2       OBJECT  GLOBAL  DEFAULT   15  early_gdt_descr
   5002  8       OBJECT  LOCAL   DEFAULT   15  early_gdt_descr_base
   500a  8       OBJECT  GLOBAL  DEFAULT   15  phys_base
   0000  8       OBJECT  GLOBAL  DEFAULT   17  initial_code
   0008  8       OBJECT  GLOBAL  DEFAULT   17  initial_gs
   0010  8       OBJECT  GLOBAL  DEFAULT   17  initial_stack
   0000  4       OBJECT  GLOBAL  DEFAULT   19  early_recursion_flag
   1000  4096    OBJECT  GLOBAL  DEFAULT   19  early_level4_pgt
   2000  0x40000 OBJECT  GLOBAL  DEFAULT   19  early_dynamic_pgts
   0000  4096    OBJECT  GLOBAL  DEFAULT   22  empty_zero_page

All have correct size and type now. Note that this also removes implicit 16B alignment previously inserted by ENTRY:

 * initial_code, setup_once_ref, initial_page_table, initial_stack, boot_gdt are still aligned
 * early_gdt_descr is now properly aligned as was intended before ENTRY was added there a long time ago
 * phys_base's alignment is kept by an explicitly added new alignment

Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Cao jin <caoj.fnst@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: linux-arch@vger.kernel.org Cc: Maran Wilson <maran.wilson@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191011115108.12392-12-jslaby@suse.cz
2019-10-18  x86/asm: Annotate local pseudo-functions  (Jiri Slaby)

Use the newly added SYM_CODE_START_LOCAL* to annotate beginnings of all pseudo-functions (those ending with END until now) which do not have a ".globl" annotation. This is needed to balance END for tools that generate debuginfo. Note that ENDs are switched to SYM_CODE_END too so that everybody can see the pairing.

C-like functions (which handle frame ptr etc.) are not annotated here, hence SYM_CODE_* macros are used here, not SYM_FUNC_*.

Note that the 32-bit version of early_idt_handler_common already had ENDPROC -- switch that to SYM_CODE_END for the same reason as above (and to be the same as 64-bit). While early_idt_handler_common is LOCAL, its name is not prepended with ".L" as it happens to appear in call traces.

bad_get_user*, and bad_put_user are now aligned, as they are separate functions. They do not mind being aligned -- no need to be compact there.

early_idt_handler_common is aligned now too, as it is after early_idt_handler_array, so as well no need to be compact there.

verify_cpu is self-standing and included in other .S files, so align it too. The others have alignment preserved to what it used to be (using the _NOALIGN variant of macros).

Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexios Zavras <alexios.zavras@intel.com> Cc: Allison Randal <allison@lohutok.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Cao jin <caoj.fnst@cn.fujitsu.com> Cc: Enrico Weigelt <info@metux.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: linux-arch@vger.kernel.org Cc: Maran Wilson <maran.wilson@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191011115108.12392-6-jslaby@suse.cz
2019-10-05  x86/asm: Reorder early variables  (Jiri Slaby)

Moving early_recursion_flag (4 bytes) after early_level4_pgt (4k) and early_dynamic_pgts (256k) saves 4k which are used for alignment of early_level4_pgt after early_recursion_flag.

The real improvement is merely on the source code side. Previously it was:

 * __INITDATA + .balign
 * early_recursion_flag variable
 * a ton of CPP MACROS
 * __INITDATA (again)
 * early_top_pgt and early_dynamic_pgts variables
 * .data

Now, it is a bit simpler:

 * a ton of CPP MACROS
 * __INITDATA + .balign
 * early_top_pgt and early_dynamic_pgts variables
 * early_recursion_flag variable
 * .data

On the binary level the change looks like this:

Before:
 (sections)
  12 .init.data    00042000  0000000000000000  0000000000000000  00008000  2**12
 (symbols)
  000000       4 OBJECT  GLOBAL DEFAULT   22 early_recursion_flag
  001000    4096 OBJECT  GLOBAL DEFAULT   22 early_top_pgt
  002000 0x40000 OBJECT  GLOBAL DEFAULT   22 early_dynamic_pgts

After:
 (sections)
  12 .init.data    00041004  0000000000000000  0000000000000000  00008000  2**12
 (symbols)
  000000    4096 OBJECT  GLOBAL DEFAULT   22 early_top_pgt
  001000 0x40000 OBJECT  GLOBAL DEFAULT   22 early_dynamic_pgts
  041000       4 OBJECT  GLOBAL DEFAULT   22 early_recursion_flag

So the resulting vmlinux is smaller by 4k with my toolchain as many other variables can be placed after early_recursion_flag to fill the rest of the page. Note that this is only .init data, so it is freed right after being booted anyway. Savings on-disk are none -- compression of zeros is easy, so the size of bzImage is the same pre and post the change.

Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191003095238.29831-1-jslaby@suse.cz
2019-07-22  x86/irq/64: Update stale comment  (Cao jin)

Commit e6401c130931 ("x86/irq/64: Split the IRQ stack into its own pages") missed updating one piece of comment, as it did for its peer in Xen, which will confuse people who still need to read the comment.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190719081635.26528-1-caoj.fnst@cn.fujitsu.com
2019-07-20  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull x86 fixes from Thomas Gleixner:
"A set of x86 specific fixes and updates:

 - The CR2 corruption fixes which store CR2 early in the entry code and hand the stored address to the fault handlers.

 - Revert a forgotten leftover of the dropped FSGSBASE series.

 - Plug a memory leak in the boot code.

 - Make the Hyper-V assist functionality robust by zeroing the shadow page.

 - Remove a useless check for dead processes with LDT

 - Update paravirt and VMware maintainers entries.

 - A few cleanup patches addressing various compiler warnings"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/64: Prevent clobbering of saved CR2 value
  x86/hyper-v: Zero out the VP ASSIST PAGE on allocation
  x86, boot: Remove multiple copy of static function sanitize_boot_params()
  x86/boot/compressed/64: Remove unused variable
  x86/boot/efi: Remove unused variables
  x86/mm, tracing: Fix CR2 corruption
  x86/entry/64: Update comments and sanity tests for create_gap
  x86/entry/64: Simplify idtentry a little
  x86/entry/32: Simplify common_exception
  x86/paravirt: Make read_cr2() CALLEE_SAVE
  MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE
  x86/process: Delete useless check for dead process with LDT
  x86: math-emu: Hide clang warnings for 16-bit overflow
  x86/e820: Use proper booleans instead of 0/1
  x86/apic: Silence -Wtype-limits compiler warnings
  x86/mm: Free sme_early_buffer after init
  x86/boot: Fix memory leak in default_get_smp_config()
  Revert "x86/ptrace: Prevent ptrace from clearing the FS/GS selector" and fix the test
2019-07-18  x86/head/64: Annotate start_cpu0() as non-callable  (Josh Poimboeuf)
After an objtool improvement, it complains about the fact that start_cpu0() jumps to code which has an LRET instruction. arch/x86/kernel/head_64.o: warning: objtool: .head.text+0xe4: unsupported instruction in callable function Technically, start_cpu0() is callable, but it acts nothing like a callable function. Prevent objtool from treating it like one by removing its ELF function annotation. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/6b1b4505fcb90571a55fa1b52d71fb458ca24454.1563413318.git.jpoimboe@redhat.com