path: root/arch/x86/kernel/cpu/common.c
2024-02-15x86/cpu/topology: Make the APIC mismatch warnings completeThomas Gleixner
Detect all possible combinations of mismatch right in the CPUID evaluation code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20240212154638.867699078@linutronix.de
2024-02-15x86/cpu: Remove x86_coreid_bitsThomas Gleixner
No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20240212153625.455839743@linutronix.de
2024-02-15x86/cpu: Use common topology code for HYGONThomas Gleixner
Switch it over to use the consolidated topology evaluation and remove the temporary safeguards which are no longer needed. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20240212153625.207750409@linutronix.de
2024-02-15x86/cpu: Use common topology code for IntelThomas Gleixner
Intel CPUs use either topology leaf 0xb/0x1f evaluation or the legacy SMP/HT evaluation based on CPUID leaf 0x1/0x4. Move it over to the consolidated topology code and remove the random topology hacks which are sprinkled into the Intel and the common code. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20240212153624.893644349@linutronix.de
2024-02-15x86/cpu: Move __max_die_per_package to common.cThomas Gleixner
In preparation of a complete replacement for the topology leaf 0xb/0x1f evaluation, move __max_die_per_package into the common code. Will be removed once everything is converted over. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20240212153624.768188958@linutronix.de
2024-02-15x86/cpu: Add legacy topology parserThomas Gleixner
The legacy topology detection via CPUID leaf 4, which provides the number of cores in the package, and CPUID leaf 1, which provides the number of logical CPUs when FEATURE_HT is enabled and the CMP_LEGACY feature is not set, is shared by Intel, Centaur and Zhaoxin CPUs. Lift the code from common.c without the early detection hack and provide it as a common fallback mechanism. Will be utilized in later changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20240212153624.644448852@linutronix.de
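A minimal sketch of what that shared fallback looks like (the function name and the final assignment are illustrative, not the exact kernel interfaces):

	static void parse_legacy_topology(struct cpuinfo_x86 *c)
	{
		unsigned int eax, ebx, ecx, edx, logical, cores;

		if (!cpu_has(c, X86_FEATURE_HT) || cpu_has(c, X86_FEATURE_CMP_LEGACY))
			return;

		/* CPUID leaf 1: EBX[23:16] = logical CPUs per package */
		cpuid(1, &eax, &ebx, &ecx, &edx);
		logical = (ebx >> 16) & 0xff;

		/* CPUID leaf 4: EAX[31:26] = cores per package - 1 */
		cpuid_count(4, 0, &eax, &ebx, &ecx, &edx);
		cores = ((eax >> 26) & 0x3f) + 1;

		/* SMT siblings per core follow from the two counts */
		smp_num_siblings = logical / cores;
	}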
2024-02-15x86/cpu: Provide cpu_init/parse_topology()Thomas Gleixner
Topology evaluation is a complete disaster and impenetrable mess. It's scattered all over the place with some vendor implementations doing early evaluation and some not. The most horrific part is the permanent overwriting of smt_max_siblings and __max_die_per_package, instead of establishing them once on the boot CPU and validating the result on the APs.

The goals are:

- One topology evaluation entry point
- Proper sharing of pointlessly duplicated code
- Proper structuring of the evaluation logic and preferences
- Evaluating important system-wide information only once on the boot CPU
- Making the 0xb/0x1f leaf parsing less convoluted and actually fixing the shortcomings of leaf 0x1f evaluation

Start to consolidate the topology evaluation code by providing the entry points for the early boot CPU evaluation and for the final parsing on the boot CPU and the APs.

Move the trivial pieces into that new code:

- The initialization of cpuinfo_x86::topo
- The evaluation of CPUID leaf 1, which presets topo::initial_apicid
- topo_apicid is set to topo::initial_apicid when invoked from early boot. When invoked for the final evaluation on the boot CPU it reads the actual APIC ID, which makes apic_get_initial_apicid() obsolete once everything is converted over.

Provide a temporary helper function topo_converted() which shields off the not yet converted CPU vendors from invoking code which would break them. This shielding covers all vendor CPUs which support SMP, but not the historical pure UP ones, as they only need the topology info init and eventually the initial APIC initialization.

Provide two new members in cpuinfo_x86::topo to store the maximum number of SMT siblings and the number of dies per package and add them to the debugfs readout. These two members will be used to populate this information on the boot CPU and to validate the APs against it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20240212153624.581436579@linutronix.de
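A minimal sketch of the establish-once/validate-on-AP split that those two new members enable (names illustrative, not the exact new kernel API):

	struct topo_limits {
		unsigned int max_smt_siblings;
		unsigned int max_dies_per_pkg;
	};

	static struct topo_limits bsp_limits;	/* established once on the boot CPU */

	static void topo_validate_ap(const struct topo_limits *ap)
	{
		/* APs must agree with what the boot CPU established */
		WARN_ON_ONCE(ap->max_smt_siblings != bsp_limits.max_smt_siblings);
		WARN_ON_ONCE(ap->max_dies_per_pkg != bsp_limits.max_dies_per_pkg);
	}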
2024-02-14Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branchIngo Molnar
Conflicts: arch/x86/include/asm/percpu.h arch/x86/include/asm/text-patching.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-01-31x86/fred: Invoke FRED initialization code to enable FREDH. Peter Anvin (Intel)
Let cpu_init_exception_handling() call cpu_init_fred_exceptions() to initialize FRED. However, if FRED is unavailable or disabled, fall back to setting up the TSS IST and initializing the IDT. Co-developed-by: Xin Li <xin3.li@intel.com> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Shan Kang <shan.kang@intel.com> Link: https://lore.kernel.org/r/20231205105030.8698-36-xin3.li@intel.com
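A minimal sketch of the resulting flow (helper names as referenced in the text; the body is approximate, not the verbatim kernel code):

	void cpu_init_exception_handling(void)
	{
		struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);

		if (cpu_feature_enabled(X86_FEATURE_FRED)) {
			cpu_init_fred_exceptions();	/* FRED event delivery */
		} else {
			tss_setup_ist(tss);		/* legacy IST stacks */
			load_current_idt();		/* legacy IDT */
		}
	}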
2024-01-31x86/syscall: Split IDT syscall setup code into idt_syscall_init()Xin Li
Because FRED uses the ring 3 FRED entrypoint for SYSCALL and SYSENTER, and ERETU is the only legit instruction to return to ring 3, there is no need to set up the SYSCALL and SYSENTER MSRs for FRED, except the IA32_STAR MSR. Split the IDT syscall setup code into idt_syscall_init() to make it easy to skip the syscall setup code when FRED is enabled. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Shan Kang <shan.kang@intel.com> Link: https://lore.kernel.org/r/20231205105030.8698-34-xin3.li@intel.com
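A sketch of the resulting split (the IA32_STAR setup stays unconditional; treat the exact body as approximate):

	void syscall_init(void)
	{
		/* The CS/SS selectors in IA32_STAR are needed either way */
		wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);

		if (!cpu_feature_enabled(X86_FEATURE_FRED))
			idt_syscall_init();	/* LSTAR/CSTAR/SYSENTER MSRs */
	}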
2024-01-31x86/cpu: Add X86_CR4_FRED macroH. Peter Anvin (Intel)
Add X86_CR4_FRED macro for the FRED bit in %cr4. This bit must not be changed after initialization, so add it to the pinned CR4 bits. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Shan Kang <shan.kang@intel.com> Link: https://lore.kernel.org/r/20231205105030.8698-12-xin3.li@intel.com
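A sketch of where the bit lands (cr4_pinned_mask is the existing pinning mask in common.c; the exact set of neighboring bits may differ):

	static const unsigned long cr4_pinned_mask =
		X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
		X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED;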
2024-01-29x86/speculation: Do not enable Automatic IBRS if SEV-SNP is enabledKim Phillips
Without SEV-SNP, Automatic IBRS protects only the kernel. But when SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all host-side code, including userspace. This protection comes at a cost: reduced userspace indirect branch performance. To avoid this performance loss, don't use Automatic IBRS on SEV-SNP hosts and fall back to retpolines instead. [ mdr: squash in changes from review discussion. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dave Hansen <dave.hansen@intel.com> Link: https://lore.kernel.org/r/20240126041126.1927228-3-michael.roth@amd.com
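One way the policy reads in code (a sketch, not necessarily the exact hunk): hide the feature when SEV-SNP is active, so the spectre_v2 mitigation selection naturally falls back to retpolines:

	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
		setup_clear_cpu_cap(X86_FEATURE_AUTOIBRS);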
2024-01-18Merge tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86 TDX updates from Dave Hansen:

"This contains the initial support for host-side TDX so that KVM can run TDX-protected guests. This does not include the actual KVM-side support which will come from the KVM folks. The TDX host interactions with kexec also need to be ironed out before this is ready for prime time, so this code is currently Kconfig'd off when kexec is on.

The majority of the code here is the kernel telling the TDX module which memory to protect and handing some additional memory over to it to use to store TDX module metadata. That sounds pretty simple, but the TDX architecture is rather flexible and it takes quite a bit of back-and-forth to say, "just protect all memory, please."

There is also some code tacked on near the end of the series to handle a hardware erratum. The erratum can make software bugs such as a kernel write to TDX-protected memory cause a machine check and masquerade as a real hardware failure. The erratum handling watches out for these and tries to provide nicer user errors"

* tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/virt/tdx: Make TDX host depend on X86_MCE x86/virt/tdx: Disable TDX host support when kexec is enabled Documentation/x86: Add documentation for TDX host support x86/mce: Differentiate real hardware #MCs from TDX erratum ones x86/cpu: Detect TDX partial write machine check erratum x86/virt/tdx: Handle TDX interaction with sleep and hibernation x86/virt/tdx: Initialize all TDMRs x86/virt/tdx: Configure global KeyID on all packages x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID x86/virt/tdx: Designate reserved areas for all TDMRs x86/virt/tdx: Allocate and set up PAMTs for TDMRs x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions x86/virt/tdx: Get module global metadata for module initialization x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory x86/virt/tdx: Add skeleton to enable TDX on demand x86/virt/tdx: Add SEAMCALL error printing for module initialization x86/virt/tdx: Handle SEAMCALL no entropy error in common code x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC x86/virt/tdx: Define TDX supported page sizes as macros ...
2024-01-08Merge tag 'x86-asm-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86 asm updates from Ingo Molnar:

"Replace magic numbers in GDT descriptor definitions & handling:

- Introduce symbolic names via macros for descriptor types/fields/flags, and then use these symbolic names.
- Clean up definitions a bit, such as GDT_ENTRY_INIT()
- Fix/clean up details that became visibly inconsistent after the symbol-based code was introduced:
  - Unify accessed flag handling
  - Set the D/B size flag consistently & according to the HW specification"

* tag 'x86-asm-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Add DB flag to 32-bit percpu GDT entry x86/asm: Always set A (accessed) flag in GDT descriptors x86/asm: Replace magic numbers in GDT descriptors, script-generated change x86/asm: Replace magic numbers in GDT descriptors, preparations x86/asm: Provide new infrastructure for GDT descriptors
2023-12-20x86/asm: Always set A (accessed) flag in GDT descriptorsVegard Nossum
We have no known use for having the CPU track whether GDT descriptors have been accessed or not. Simplify the code by adding the flag to the common flags and removing it everywhere else. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231219151200.2878271-5-vegard.nossum@oracle.com
2023-12-20x86/asm: Replace magic numbers in GDT descriptors, script-generated changeVegard Nossum
Actually replace the numeric values by the new symbolic values. I used this to find all the existing users of the GDT_ENTRY*() macros: $ git grep -P 'GDT_ENTRY(_INIT)?\(' Some of the lines will exceed 80 characters, but some of them will be shorter again in the next couple of patches. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231219151200.2878271-4-vegard.nossum@oracle.com
2023-12-20x86/asm: Replace magic numbers in GDT descriptors, preparationsVegard Nossum
We'd like to replace all the magic numbers in various GDT descriptors with new, semantically meaningful, symbolic values. In order to be able to verify that the change doesn't cause any actual changes to the compiled binary code, I've split the change into two patches: - Part 1 (this commit): everything _but_ actually replacing the numbers - Part 2 (the following commit): _only_ replacing the numbers The reason we need this split for verification is that including new headers causes some spurious changes to the object files, mostly line number changes in the debug info but occasionally other subtle codegen changes. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231219151200.2878271-3-vegard.nossum@oracle.com
2023-12-08x86/virt/tdx: Detect TDX during kernel bootKai Huang
Intel Trust Domain Extensions (TDX) protects guest VMs from malicious host and certain physical attacks. A CPU-attested software module called 'the TDX module' runs inside a new isolated memory range as a trusted hypervisor to manage and run protected VMs.

Pre-TDX Intel hardware has support for a memory encryption architecture called MKTME. The memory encryption hardware underpinning MKTME is also used for Intel TDX. TDX ends up "stealing" some of the physical address space from the MKTME architecture for crypto-protection to VMs. The BIOS is responsible for partitioning the "KeyID" space between legacy MKTME and TDX. The KeyIDs reserved for TDX are called 'TDX private KeyIDs' or 'TDX KeyIDs' for short.

During machine boot, TDX microcode verifies that the BIOS programmed TDX private KeyIDs consistently and correctly across all CPU packages. The MSRs are locked in this state after verification. This is why MSR_IA32_MKTME_KEYID_PARTITIONING gets used for TDX enumeration: it indicates not just that the hardware supports TDX, but that all the boot-time security checks passed.

The TDX module is expected to be loaded by the BIOS when it enables TDX, but the kernel needs to properly initialize it before it can be used to create and run any TDX guests. The TDX module will be initialized by the KVM subsystem when KVM wants to use TDX.

Detect platform TDX support by detecting TDX private KeyIDs.

The TDX module itself requires one TDX KeyID as the 'TDX global KeyID' to protect its metadata. Each TDX guest also needs a TDX KeyID for its own protection. Just use the first TDX KeyID as the global KeyID and leave the rest for TDX guests. If no TDX KeyID is left for TDX guests, disable TDX as initializing the TDX module alone is useless.

[ dhansen: add X86_FEATURE, replace helper function ] Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/all/20231208170740.53979-1-dave.hansen%40intel.com
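A sketch of the enumeration this describes (the low/high split of the partitioning MSR and the helper name are illustrative):

	static int __init detect_tdx_keyids(void)
	{
		u32 nr_mktme_keyids, nr_tdx_keyids;

		/* The MSR faults on pre-TDX parts: no TDX then */
		if (rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING,
			       &nr_mktme_keyids, &nr_tdx_keyids))
			return -ENODEV;

		/* One KeyID goes to the TDX module; guests need at least one more */
		if (nr_tdx_keyids < 2)
			return -ENODEV;

		return 0;
	}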
2023-11-13x86/barrier: Do not serialize MSR accesses on AMDBorislav Petkov (AMD)
AMD does not have the requirement for a synchronization barrier when accessing a certain group of MSRs. Do not incur that unnecessary penalty there.

There will be a CPUID bit which explicitly states that an MFENCE is not needed. Once that bit is added to the APM, this will be extended with it.

While at it, move to processor.h to avoid include hell. Untangling that file properly is a matter for another day.

Some notes on the performance aspect of why this is relevant, courtesy of Kishon VijayAbraham <Kishon.VijayAbraham@amd.com>:

On an AMD Zen4 system with 96 cores, a modified ipi-bench[1] on a VM shows x2AVIC IPI rate is 3% to 4% lower than AVIC IPI rate. The ipi-bench is modified so that the IPIs are sent between two vCPUs in the same CCX. This also requires pinning the vCPU to a physical core to prevent any latencies. This simulates the use case of pinning vCPUs to the thread of a single CCX to avoid interrupt IPI latency.

In order to avoid run-to-run variance (for both x2AVIC and AVIC), the below configurations are done:

1) Disable Power States in BIOS (to prevent the system from going to lower power state)
2) Run the system at fixed frequency 2500MHz (to prevent the system from increasing the frequency when the load is more)

With the above configuration:

*) Performance measured using ipi-bench for AVIC: Average Latency: 1124.98ns [Time to send IPI from one vCPU to another vCPU] Cumulative throughput: 42.6759M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously]

*) Performance measured using ipi-bench for x2AVIC: Average Latency: 1172.42ns [Time to send IPI from one vCPU to another vCPU] Cumulative throughput: 40.9432M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously]

From the above, x2AVIC latency is ~4% more than AVIC. However, the expectation is for x2AVIC performance to be better than or equivalent to AVIC. Upon analyzing the perf captures, it is observed that significant time is spent in weak_wrmsr_fence() invoked by x2apic_send_IPI().

With the fix to skip weak_wrmsr_fence():

*) Performance measured using ipi-bench for x2AVIC: Average Latency: 1117.44ns [Time to send IPI from one vCPU to another vCPU] Cumulative throughput: 42.9608M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously]

Comparing the performance of x2AVIC with and without the fix, it can be seen that the performance improves by ~4%.

Performance captured using an unmodified ipi-bench using the 'mesh-ipi' option with and without weak_wrmsr_fence() on a Zen4 system also showed significant performance improvement without weak_wrmsr_fence(). The 'mesh-ipi' option ignores CCX or CCD and just picks random vCPUs:

Average throughput (10 iterations) with weak_wrmsr_fence(): Cumulative throughput: 4933374 IPI/s
Average throughput (10 iterations) without weak_wrmsr_fence(): Cumulative throughput: 6355156 IPI/s

[1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230622095212.20940-1-bp@alien8.de
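A sketch of the resulting helper, now living in processor.h (treat the details as approximate): the fence is patched out on CPUs that do not set the synthetic "APIC MSRs need fencing" feature bit:

	static inline void weak_wrmsr_fence(void)
	{
		alternative("mfence; lfence", "", ALT_NOT(X86_FEATURE_APIC_MSRS_FENCE));
	}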
2023-11-04Merge tag 'x86_microcode_for_v6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86 microcode loading updates from Borislav Petkov:

"Major microcode loader restructuring, cleanup and improvements by Thomas Gleixner:

- Restructure the code needed for it and add a temporary initrd mapping on 32-bit so that the loader can access the microcode blobs. This in itself is a preparation for the next major improvement:

- Do not load microcode on 32-bit before paging has been enabled. Handling this has caused an endless stream of headaches, issues, ugly code and unnecessary hacks in the past. And there really wasn't any sensible reason to do that in the first place. So switch the 32-bit loading to happen after paging has been enabled and turn the loader code "real purrty" again

- Drop mixed microcode steppings loading on Intel - there, a single patch loaded on the whole system is sufficient

- Rework late loading to track which CPUs have updated microcode successfully and which haven't, act accordingly

- Move late microcode loading on Intel in NMI context in order to guarantee concurrent loading on all threads

- Make the late loading CPU-hotplug-safe and have the offlined threads be woken up for the purpose of the update

- Add support for a minimum revision which determines whether late microcode loading is safe on a machine and the microcode does not change software visible features which the machine cannot use anyway since feature detection has happened already. Roughly, the minimum revision is the smallest revision number which must be loaded currently on the system so that late updates can be allowed

- Other nice cleanups, fixes, etc all over the place"

* tag 'x86_microcode_for_v6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) x86/microcode/intel: Add a minimum required revision for late loading x86/microcode: Prepare for minimal revision check x86/microcode: Handle "offline" CPUs correctly x86/apic: Provide apic_force_nmi_on_cpu() x86/microcode: Protect against instrumentation x86/microcode: Rendezvous and load in NMI x86/microcode: Replace the all-in-one rendevous handler x86/microcode: Provide new control functions x86/microcode: Add per CPU control field x86/microcode: Add per CPU result state x86/microcode: Sanitize __wait_for_cpus() x86/microcode: Clarify the late load logic x86/microcode: Handle "nosmt" correctly x86/microcode: Clean up mc_cpu_down_prep() x86/microcode: Get rid of the schedule work indirection x86/microcode: Mop up early loading leftovers x86/microcode/amd: Use cached microcode for AP load x86/microcode/amd: Cache builtin/initrd microcode early x86/microcode/amd: Cache builtin microcode too x86/microcode/amd: Use correct per CPU ucode_cpu_info ...
2023-10-30Merge tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86 core updates from Thomas Gleixner:

- Limit the hardcoded topology quirk for Hygon CPUs to those which have a model ID less than 4. The newer models have the topology CPUID leaf 0xB correctly implemented and are not affected.

- Make SMT control more robust against enumeration failures

  SMT control was added to allow controlling SMT at boottime or runtime. The primary purpose was to provide a simple mechanism to disable SMT in the light of speculation attack vectors.

  It turned out that the code is sensitive to enumeration failures and worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration which means the primary thread mask is not set up correctly. By chance a XEN/PV boot ends up with smp_num_siblings == 2, which makes the hotplug control stay at its default value "enabled". So the mask is never evaluated.

  The ongoing rework of the topology evaluation caused XEN/PV to end up with smp_num_siblings == 1, which sets the SMT control to "not supported" and the empty primary thread mask causes the hotplug core to deny the bringup of the APs.

  Make the decision logic more robust and take 'not supported' and 'not implemented' into account for the decision whether a CPU should be booted or not.

- Fake primary thread mask for XEN/PV

  Pretend that all XEN/PV vCPUs are primary threads, which makes the usage of the primary thread mask valid on XEN/PV. That is consistent because all of the topology information on XEN/PV is fake or even non-existent.

- Encapsulate topology information in cpuinfo_x86

  Move the randomly scattered topology data into a separate data structure for readability and as a preparatory step for the topology evaluation overhaul.

- Consolidate APIC ID data type to u32

  It's fixed width hardware data and not randomly u16, int, unsigned long or whatever developers decided to use.

- Cure the abuse of cpuinfo for persisting logical IDs.

  Per CPU cpuinfo is used to persist the logical package and die IDs. That's really not the right place simply because cpuinfo is subject to be reinitialized when a CPU goes through an offline/online cycle. Use separate per CPU data for the persisting to enable the further topology management rework. It will be removed once the new topology management is in place.

- Provide a debug interface for inspecting topology information

  Useful in general and extremely helpful for validating the topology management rework in terms of correctness or "bug" compatibility.

* tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too x86/cpu: Provide debug interface x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids x86/apic: Use u32 for wakeup_secondary_cpu[_64]() x86/apic: Use u32 for [gs]et_apic_id() x86/apic: Use u32 for phys_pkg_id() x86/apic: Use u32 for cpu_present_to_apicid() x86/apic: Use u32 for check_apicid_used() x86/apic: Use u32 for APIC IDs in global data x86/apic: Use BAD_APICID consistently x86/cpu: Move cpu_l[l2]c_id into topology info x86/cpu: Move logical package and die IDs into topology info x86/cpu: Remove pointless evaluation of x86_coreid_bits x86/cpu: Move cu_id into topology info x86/cpu: Move cpu_core_id into topology info hwmon: (fam15h_power) Use topology_core_id() scsi: lpfc: Use topology_core_id() x86/cpu: Move cpu_die_id into topology info x86/cpu: Move phys_proc_id into topology info x86/cpu: Encapsulate topology information in cpuinfo_x86 ...
2023-10-30Merge tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86 mm handling updates from Ingo Molnar:

- Add new NX-stack self-test
- Improve NUMA partial-CFMWS handling
- Fix #VC handler bugs resulting in SEV-SNP boot failures
- Drop the 4MB memory size restriction on minimal NUMA nodes
- Reorganize headers a bit, in preparation for header dependency reduction efforts
- Misc cleanups & fixes

* tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size selftests/x86/lam: Zero out buffer for readlink() x86/sev: Drop unneeded #include x86/sev: Move sev_setup_arch() to mem_encrypt.c x86/tdx: Replace deprecated strncpy() with strtomem_pad() selftests/x86/mm: Add new test that userspace stack is in fact NX x86/sev: Make boot_ghcb_page[] static x86/boot: Move x86_cache_alignment initialization to correct spot x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach x86/sev-es: Allow copy_from_kernel_nofault() in earlier boot x86_64: Show CR4.PSE on auxiliaries like on BSP x86/iommu/docs: Update AMD IOMMU specification document URL x86/sev/docs: Update document URL in amd-memory-encryption.rst x86/mm: Move arch_memory_failure() and arch_is_platform_page() definitions from <asm/processor.h> to <asm/pgtable.h> ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window x86/numa: Introduce numa_fill_memblks()
2023-10-23x86/percpu: Introduce const-qualified const_pcpu_hot to micro-optimize code generationUros Bizjak
Some variables in pcpu_hot, currently current_task and top_of_stack, are actually per-thread variables implemented as per-CPU variables and thus stable for the duration of the respective task. There is already an attempt to eliminate redundant reads from these variables using the this_cpu_read_stable() asm macro, which hides the dependency on the read memory address. However, the compiler has limited ability to eliminate asm common subexpressions, so this approach results in limited success.

The solution is to allow more aggressive elimination by aliasing pcpu_hot into a const-qualified const_pcpu_hot, and to read stable per-CPU variables from this constant copy.

The current per-CPU infrastructure does not support reads from const-qualified variables. However, when the compiler supports segment qualifiers, it is possible to declare the const-aliased variable in the relevant named address space. The compiler considers access to the variable, declared in this way, as a read from a constant location, and will optimize reads from the variable accordingly.

By implementing const-qualified const_pcpu_hot, the compiler can eliminate redundant reads from the constant variables, reducing the number of loads from current_task from 3766 to 3217 on a test build, a -14.6% reduction. The reduction of loads translates to the following code savings:

  text      data     bss     dec       hex      filename
  25477353  4389456  808452  30675261  1d4113d  vmlinux-old.o
  25476074  4389440  808452  30673966  1d40c2e  vmlinux-new.o

representing a code size reduction of -1279 bytes.

[ mingo: Updated the changelog, EXPORT(const_pcpu_hot). ] Co-developed-by: Nadav Amit <namit@vmware.com> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20231020162004.135244-1-ubizjak@gmail.com
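A rough sketch of the aliasing trick (declarations approximate, not the exact kernel interfaces):

	/* Writable per-CPU object, as before */
	DECLARE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot);

	/*
	 * Const-qualified alias of the same storage, declared in the
	 * per-CPU named address space: the compiler may treat reads
	 * through it as loads from constant memory and CSE them.
	 */
	extern const struct pcpu_hot __percpu_seg_override const_pcpu_hot;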
2023-10-18x86/microcode/32: Move early loading after paging enableThomas Gleixner
32-bit loads microcode before paging is enabled. The commit which introduced that has zero justification in the changelog. The cover letter has slightly more content, but it does not give any technical justification either:

  "The problem in current microcode loading method is that we load a microcode way, way too late; ideally we should load it before turning paging on. This may only be practical on 32 bits since we can't get to 64-bit mode without paging on, but we should still do it as early as at all possible."

Handwaving word salad with zero technical content.

Someone claimed in an offlist conversation that this is required for curing the ATOM erratum AAE44/AAF40/AAG38/AAH41. That erratum requires a microcode update in order to make the usage of PSE safe. But during early boot, PSE is completely irrelevant and it is evaluated way later. Neither is it relevant for the AP on single core HT enabled CPUs as the microcode loading on the AP is not doing anything.

On dual core CPUs there is a theoretical problem if a split of an executable large page between enabling paging including PSE and loading the microcode happens. But that's only theoretical, it's practically irrelevant because the affected dual core CPUs are 64bit enabled and therefore have paging and PSE enabled before loading the microcode on the second core. So why would it work on 64-bit but not on 32-bit?

The erratum:

  "AAG38 Code Fetch May Occur to Incorrect Address After a Large Page is Split Into 4-Kbyte Pages

  Problem: If software clears the PS (page size) bit in a present PDE (page directory entry), that will cause linear addresses mapped through this PDE to use 4-KByte pages instead of using a large page after old TLB entries are invalidated. Due to this erratum, if a code fetch uses this PDE before the TLB entry for the large page is invalidated then it may fetch from a different physical address than specified by either the old large page translation or the new 4-KByte page translation. This erratum may also cause speculative code fetches from incorrect addresses."

The practical relevance for this is exactly zero because there is no splitting of large text pages during early boot-time, i.e. between paging enable and microcode loading, and neither during CPU hotplug. IOW, this "load microcode before paging enable" is yet another voodoo programming solution in search of a problem.

What's worse is that it causes at least two serious problems:

1) When stackprotector is enabled, the microcode loader code has the stackprotector mechanics enabled. The read from the per CPU variable __stack_chk_guard is always accessing the virtual address either directly on UP or via %fs on SMP. In physical address mode this results in an access to memory above 3GB. So this works by chance as the hardware returns the same value when there is no RAM at this physical address. When there is RAM populated above 3G then the read is by chance the same as nothing changes that memory during the very early boot stage. That's not necessarily true during runtime CPU hotplug.

2) When function tracing is enabled, the relevant microcode loader functions and the functions invoked from there will call into the tracing code and evaluate global and per CPU variables in physical address mode. What could potentially go wrong?

Cure this and move the microcode loading after the early paging enable, use the new temporary initrd mapping and remove the gunk in the microcode loader which is required to handle physical address mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.348298216@linutronix.de
2023-10-10x86/apic: Use u32 for cpu_present_to_apicid()Thomas Gleixner
APIC IDs are used with random data types u16, u32, int, unsigned int, unsigned long. Make it all consistently use u32 because that reflects the hardware register width and fixup a few related usage sites for consistency sake. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230814085113.054064391@linutronix.de
2023-10-10x86/cpu: Move cpu_l[l2]c_id into topology infoThomas Gleixner
The topology IDs which identify the LLC and L2 domains clearly belong to the per CPU topology information. Move them into cpuinfo_x86::cpuinfo_topo and get rid of the extra per CPU data and the related exports. This also paves the way to do proper topology evaluation during early boot because it removes the only per CPU dependency for that. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230814085112.803864641@linutronix.de
2023-10-10x86/cpu: Move logical package and die IDs into topology infoThomas Gleixner
Yet another topology related data pair. Rename logical_proc_id to logical_pkg_id so it fits the common naming conventions. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230814085112.745139505@linutronix.de
2023-10-10x86/cpu: Move cu_id into topology infoThomas Gleixner
No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230814085112.628405546@linutronix.de
2023-10-10x86/cpu: Move cpu_core_id into topology infoThomas Gleixner
Rename it to core_id and stick it to the other ID fields. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230814085112.566519388@linutronix.de
2023-10-10x86/cpu: Move cpu_die_id into topology infoThomas Gleixner
Move the next member. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230814085112.388185134@linutronix.de
2023-10-10x86/cpu: Move phys_proc_id into topology infoThomas Gleixner
Rename it to pkg_id which is the terminology used in the kernel. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230814085112.329006989@linutronix.de
2023-10-10x86/cpu: Encapsulate topology information in cpuinfo_x86Thomas Gleixner
The topology related information is randomly scattered across cpuinfo_x86. Create a new structure cpuinfo_topo and move in a first step initial_apicid and apicid into it. Aside of being better readable this is in preparation for replacing the horribly fragile CPU topology evaluation code further down the road. Consolidate APIC ID fields to u32 as that represents the hardware type. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230814085112.269787744@linutronix.de
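Roughly the first step, per the description above (the struct grows more members in later patches; treat this as a sketch of the shape, not the final code):

	struct cpuinfo_topo {
		u32 apicid;		/* real APIC ID, read from the APIC */
		u32 initial_apicid;	/* initial APIC ID, from CPUID leaf 1 */
	};

	/* embedded into cpuinfo_x86 as: struct cpuinfo_topo topo; */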
2023-10-05Merge tag 'v6.6-rc4' into x86/entry, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-10-03x86/boot: Move x86_cache_alignment initialization to correct spotDave Hansen
c->x86_cache_alignment is initialized from c->x86_clflush_size. However, commit fbf6449f84bf moved c->x86_clflush_size initialization to later in boot without moving the c->x86_cache_alignment assignment: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") This presumably left c->x86_cache_alignment set to zero for longer than it should be. The result was an oops on 32-bit kernels while accessing a pointer at 0x20. The 0x20 came from accessing a structure member at offset 0x10 (buffer->cpumask) from a ZERO_SIZE_PTR=0x10. kmalloc() can evidently return ZERO_SIZE_PTR when it's given 0 as its alignment requirement. Move the c->x86_cache_alignment initialization to be after c->x86_clflush_size has an actual value. Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20231002220045.1014760-1-dave.hansen@linux.intel.com
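A sketch of the corrected ordering (CPUID leaf 1 EBX[15:8] holds the CLFLUSH line size in 8-byte units; variable names illustrative):

	cpuid(0x00000001, &eax, &ebx, &ecx, &edx);
	c->x86_clflush_size = ((ebx >> 8) & 0xff) * 8;	/* real value first ... */
	c->x86_cache_alignment = c->x86_clflush_size;	/* ... then the alignment */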
2023-09-28x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approachAdam Dunlap
Instead of setting x86_virt_bits to a possibly-correct value and then correcting it later, do all the necessary checks before setting it. At this point, the #VC handler references boot_cpu_data.x86_virt_bits, and in the previous version, it would be triggered by the CPUIDs between the point at which it is set to 48 and when it is set to the correct value. Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Adam Dunlap <acdunlap@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Jacob Xu <jacobhxu@google.com> Link: https://lore.kernel.org/r/20230912002703.3924521-3-acdunlap@google.com
2023-09-28x86/srso: Add SRSO mitigation for Hygon processorsPu Wen
Add mitigation for the speculative return stack overflow vulnerability which exists on Hygon processors too. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/tencent_4A14812842F104E93AA722EC939483CEFF05@qq.com
2023-09-14x86/entry: Make IA32 syscalls' availability depend on ia32_enabled()Nikolay Borisov
Another major aspect of supporting running of 32bit processes is the ability to access 32bit syscalls. Such syscalls can be invoked by using the legacy int 0x80 handler and the sysenter/syscall instructions. If IA32 emulation is disabled, ensure that each of those 3 distinct mechanisms is also disabled. For int 0x80 a #GP exception would be generated since the respective descriptor is not going to be loaded at all. Invoking sysenter will also result in a #GP since IA32_SYSENTER_CS contains an invalid segment. Finally, the SYSCALL instruction cannot really be disabled, so it's configured to execute a minimal handler. Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230623111409.3047467-6-nik.borisov@suse.com
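A sketch of the gating described above (the stub handler name is taken from the following rename commit; treat the details as illustrative):

	if (!ia32_enabled()) {
		/* SYSENTER: an invalid IA32_SYSENTER_CS yields #GP */
		wrmsrl_safe(MSR_IA32_SYSENTER_CS, 0);
		/* SYSCALL: route CSTAR to a minimal stub */
		wrmsrl_cstar((unsigned long)entry_SYSCALL32_ignore);
		/* int 0x80: the gate is simply never installed -> #GP */
	}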
2023-09-14x86/entry: Rename ignore_sysret()Nikolay Borisov
The SYSCALL instruction cannot really be disabled in compatibility mode. The best that can be done is to configure the CSTAR MSR to point to a minimal handler. Currently this handler has a rather misleading name, ignore_sysret(), as it's not really doing anything with SYSRET. Give it a more descriptive name. Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230623111409.3047467-3-nik.borisov@suse.com
2023-09-01Merge tag 'x86-urgent-2023-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86 fixes from Dave Hansen:

"The most important fix here adds a missing CPU model to the recent Gather Data Sampling (GDS) mitigation list to ensure that mitigations are available on that CPU. There are also a pair of warning fixes, and closure of a covert channel that pops up when protection keys are disabled.

Summary:

- Mark all Skylake CPUs as vulnerable to GDS
- Fix PKRU covert channel
- Fix -Wmissing-variable-declarations warning for ia32_xyz_class
- Fix kernel-doc annotation warning"

* tag 'x86-urgent-2023-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu/xstate: Fix PKRU covert channel x86/irq/i8259: Fix kernel-doc annotation warning x86/speculation: Mark all Skylake CPUs as vulnerable to GDS x86/audit: Fix -Wmissing-variable-declarations warning for ia32_xyz_class
2023-08-31Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86 shadow stack support from Dave Hansen:

"This is the long awaited x86 shadow stack support, part of Intel's Control-flow Enforcement Technology (CET). CET consists of two related security features: shadow stacks and indirect branch tracking. This series implements just the shadow stack part of this feature, and just for userspace.

The main use case for shadow stack is providing protection against return oriented programming attacks. It works by maintaining a secondary (shadow) stack using a special memory type that has protections against modification. When executing a CALL instruction, the processor pushes the return address to both the normal stack and to the special permission shadow stack. Upon RET, the processor pops the shadow stack copy and compares it to the normal stack copy.

For more information, refer to the links below for the earlier versions of this patch set"

Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/ Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/

* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) x86/shstk: Change order of __user in type x86/ibt: Convert IBT selftest to asm x86/shstk: Don't retry vm_munmap() on -EINTR x86/kbuild: Fix Documentation/ reference x86/shstk: Move arch detail comment out of core mm x86/shstk: Add ARCH_SHSTK_STATUS x86/shstk: Add ARCH_SHSTK_UNLOCK x86: Add PTRACE interface for shadow stack selftests/x86: Add shadow stack test x86/cpufeatures: Enable CET CR4 bit for shadow stack x86/shstk: Wire in shadow stack interface x86: Expose thread features in /proc/$PID/status x86/shstk: Support WRSS for userspace x86/shstk: Introduce map_shadow_stack syscall x86/shstk: Check that signal frame is shadow stack mem x86/shstk: Check that SSP is aligned on sigreturn x86/shstk: Handle signals for shadow stack x86/shstk: Introduce routines modifying shstk x86/shstk: Handle thread shadow stack x86/shstk: Add user-mode shadow stack support ...
2023-08-31x86/speculation: Mark all Skylake CPUs as vulnerable to GDSDave Hansen
The Gather Data Sampling (GDS) vulnerability is common to all Skylake processors. However, the "client" Skylakes* are now in this list: https://www.intel.com/content/www/us/en/support/articles/000022396/processors.html which means they are no longer included for new vulnerabilities here: https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html or in other GDS documentation. Thus, they were not included in the original GDS mitigation patches. Mark SKYLAKE and SKYLAKE_L as vulnerable to GDS to match all the other Skylake CPUs (which include Kaby Lake). Also group the CPUs so that the ones that share the exact same vulnerabilities are next to each other. Last, move SRBDS to the end of each line. This makes it clear at a glance that SKYLAKE_X is unique. Of the five Skylakes, it is the only "server" CPU and has a different implementation from the clients of the "special register" hardware, making it immune to SRBDS. This makes the diff much harder to read, but the resulting table is worth it. I very much appreciate the report from Michael Zhivich about this issue. Despite what level of support a hardware vendor is providing, the kernel very much needs an accurate and up-to-date list of vulnerable CPUs. More reports like this are very welcome. * Client Skylakes are CPUID 406E3/506E3 which is family 6, models 0x4E and 0x5E, aka INTEL_FAM6_SKYLAKE and INTEL_FAM6_SKYLAKE_L. Reported-by: Michael Zhivich <mzhivich@akamai.com> Fixes: 8974eb588283 ("x86/speculation: Add Gather Data Sampling mitigation") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
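Roughly how the affected rows end up looking in the cpu_vuln_blacklist table in common.c (a sketch based on the description above; the flag sets are approximate):

	VULNBL_INTEL_STEPPINGS(SKYLAKE_X,  X86_STEPPING_ANY, MMIO | RETBLEED | GDS),
	VULNBL_INTEL_STEPPINGS(SKYLAKE_L,  X86_STEPPING_ANY, MMIO | RETBLEED | GDS | SRBDS),
	VULNBL_INTEL_STEPPINGS(SKYLAKE,    X86_STEPPING_ANY, MMIO | RETBLEED | GDS | SRBDS),
	VULNBL_INTEL_STEPPINGS(KABYLAKE_L, X86_STEPPING_ANY, RETBLEED | GDS | SRBDS),
	VULNBL_INTEL_STEPPINGS(KABYLAKE,   X86_STEPPING_ANY, RETBLEED | GDS | SRBDS),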
2023-08-30Merge tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86 apic updates from Dave Hansen:

"This includes a very thorough rework of the 'struct apic' handlers. Quite a variety of them popped up over the years, especially in the 32-bit days when odd apics were much more in vogue.

The end result speaks for itself, which is a removal of a ton of code and static calls to replace indirect calls.

If there's any breakage here, it's likely to be around the 32-bit museum pieces that get light to no testing these days.

Summary:

- Rework apic callbacks, getting rid of unnecessary ones and coalescing lots of silly duplicates.
- Use static_calls() instead of indirect calls for apic->foo()
- Tons of cleanups and crap removal along the way"

* tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) x86/apic: Turn on static calls x86/apic: Provide static call infrastructure for APIC callbacks x86/apic: Wrap IPI calls into helper functions x86/apic: Mark all hotpath APIC callback wrappers __always_inline x86/xen/apic: Mark apic __ro_after_init x86/apic: Convert other overrides to apic_update_callback() x86/apic: Replace acpi_wake_cpu_handler_update() and apic_set_eoi_cb() x86/apic: Provide apic_update_callback() x86/xen/apic: Use standard apic driver mechanism for Xen PV x86/apic: Provide common init infrastructure x86/apic: Wrap apic->native_eoi() into a helper x86/apic: Nuke ack_APIC_irq() x86/apic: Remove pointless arguments from [native_]eoi_write() x86/apic/noop: Tidy up the code x86/apic: Remove pointless NULL initializations x86/apic: Sanitize APIC ID range validation x86/apic: Prepare x2APIC for using apic::max_apic_id x86/apic: Simplify X2APIC ID validation x86/apic: Add max_apic_id member x86/apic: Wrap APIC ID validation into an inline ...
2023-08-28Merge tag 'x86_microcode_for_v6.6_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86 microcode loading updates from Borislav Petkov:

"The first, cleanup part of the microcode loader reorg tglx has been working on. The other part wasn't fully ready in time so it will follow on later.

This part builds the loader core code in unconditionally, as it is practically enabled on pretty much every baremetal machine anyway, so there's no need to have the Kconfig items.

In addition, there are cleanups which prepare for future feature enablement"

* tag 'x86_microcode_for_v6.6_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Remove remaining references to CONFIG_MICROCODE_AMD x86/microcode/intel: Remove pointless mutex x86/microcode/intel: Remove debug code x86/microcode: Move core specific defines to local header x86/microcode/intel: Rename get_datasize() since its used externally x86/microcode: Make reload_early_microcode() static x86/microcode: Include vendor headers into microcode.h x86/microcode/intel: Move microcode functions out of cpu/intel.c x86/microcode: Hide the config knob x86/mm: Remove unused microcode.h include x86/microcode: Remove microcode_mutex x86/microcode/AMD: Rip out static buffers
2023-08-28Merge tag 'smp-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull CPU hotplug updates from Thomas Gleixner:

"Updates for the CPU hotplug core:

- Support partial SMT enablement.

  So far the sysfs SMT control only allows toggling between SMT on and off. That's sufficient for x86, which usually has at most two threads per core, except for the Xeon PHI platform, which has four threads per core.

  PowerPC, though, has up to 16 threads per core and so far it's only possible to control the number of enabled threads per core via a command line option. There is some way to control this at runtime, but that lacks enforcement and the usability is awkward.

  This update expands the sysfs interface and the core infrastructure to accept numerical values so PowerPC can build SMT runtime control for partial SMT enablement on top.

  The core support has also been provided to the PowerPC maintainers, who added the PowerPC related changes on top.

- Minor cleanups and documentation updates"

* tag 'smp-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation: core-api/cpuhotplug: Fix state names cpu/hotplug: Remove unused function declaration cpu_set_state_online() cpu/SMT: Fix cpu_smt_possible() comment cpu/SMT: Allow enabling partial SMT states via sysfs cpu/SMT: Create topology_smt_thread_allowed() cpu/SMT: Remove topology_smt_supported() cpu/SMT: Store the current/max number of threads cpu/SMT: Move smt/control simple exit cases earlier cpu/SMT: Move SMT prototypes into cpu_smt.h cpu/hotplug: Remove dependancy against cpu_primary_thread_mask
2023-08-17x86/ibt: Convert IBT selftest to asmJosh Poimboeuf
The following warning is reported when frame pointers and kernel IBT are enabled:

  vmlinux.o: warning: objtool: ibt_selftest+0x11: sibling call from callable instruction with modified stack frame

The problem is that objtool interprets the indirect branch in ibt_selftest() as a sibling call, and GCC inserts a (partial) frame pointer prologue before it:

  0000 000000000003f550 <ibt_selftest>:
  0000    3f550:  f3 0f 1e fa             endbr64
  0004    3f554:  e8 00 00 00 00          call   3f559 <ibt_selftest+0x9>
                  3f555: R_X86_64_PLT32   __fentry__-0x4
  0009    3f559:  55                      push   %rbp
  000a    3f55a:  48 8d 05 02 00 00 00    lea    0x2(%rip),%rax   # 3f563 <ibt_selftest_ip>
  0011    3f561:  ff e0                   jmp    *%rax

Note the inline asm is missing ASM_CALL_CONSTRAINT, so the 'push %rbp' happens before the indirect branch and the 'mov %rsp, %rbp' happens afterwards.

Simplify the generated code and make it easier to understand for both tools and humans by moving the selftest to proper asm.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/99a7e16b97bda97bf0a04aa141d6241cd8a839a2.1680912949.git.jpoimboe@kernel.org
2023-08-13x86/microcode: Include vendor headers into microcode.hAshok Raj
Currently vendor specific headers are included explicitly when used in common code. Instead, include the vendor specific headers in microcode.h, and include that in all usages. No functional change. Suggested-by: Boris Petkov <bp@alien8.de> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230812195727.776541545@linutronix.de
2023-08-09x86/cpu: Make identify_boot_cpu() staticThomas Gleixner
It's no longer used outside the source file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
2023-08-08x86/microcode: Remove microcode_mutexSebastian Andrzej Siewior
microcode_mutex is only used by reload_store(). It has a comment saying "to synchronize with each other". Other users of this mutex have been removed in commits 181b6f40e9ea8 ("x86/microcode: Rip out the OLD_INTERFACE") and b6f86689d5b74 ("x86/microcode: Rip out the subsys interface gunk"). The sysfs interface does not need additional synchronisation vs itself because it is provided as kernfs_ops::mutex which is acquired in kernfs_fop_write_iter(). Remove the superfluous microcode_mutex. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230804075853.JF_n6GXC@linutronix.de
2023-08-07Merge tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86/gds fixes from Dave Hansen:

"Mitigate Gather Data Sampling issue:

- Add Base GDS mitigation
- Support GDS_NO under KVM
- Fix a documentation typo"

* tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/x86: Fix backwards on/off logic about YMM support KVM: Add GDS_NO support to KVM x86/speculation: Add Kconfig option for GDS x86/speculation: Add force option to GDS mitigation x86/speculation: Add Gather Data Sampling mitigation
2023-08-07Merge tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86/srso fixes from Borislav Petkov:

"Add a mitigation for the speculative RAS (Return Address Stack) overflow vulnerability on AMD processors.

In short, this is yet another issue where userspace poisons a microarchitectural structure which can then be used to leak privileged information through a side channel"

* tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/srso: Tie SBPB bit setting to microcode patch detection x86/srso: Add a forgotten NOENDBR annotation x86/srso: Fix return thunks in generated code x86/srso: Add IBPB on VMEXIT x86/srso: Add IBPB x86/srso: Add SRSO_NO support x86/srso: Add IBPB_BRTYPE support x86/srso: Add a Speculative RAS Overflow mitigation x86/bugs: Increase the x86 bugs vector size to two u32s