linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2025-03-03	x86/smp/32: Remove safe_smp_processor_id()	Brian Gerst
	The safe_smp_processor_id() function was originally implemented in: dc2bc768a009 ("stack overflow safe kdump: safe_smp_processor_id()") to mitigate the CPU number corruption on a stack overflow. At the time, x86-32 stored the CPU number in thread_struct, which was located at the bottom of the task stack and thus vulnerable to an overflow. The CPU number is now located in percpu memory, so this workaround is no longer needed. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20250303170115.2176553-1-brgerst@gmail.com
2025-03-03	x86/asm: Merge KSTK_ESP() implementations	Brian Gerst
	Commit: 263042e4630a ("Save user RSP in pt_regs->sp on SYSCALL64 fastpath") simplified the 64-bit implementation of KSTK_ESP() which is now identical to 32-bit. Merge them into a common definition. No functional change. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250303183111.2245129-1-brgerst@gmail.com
2025-03-03	x86/paravirt: Remove unused paravirt_disable_iospace()	Dr. David Alan Gilbert
	The last use of paravirt_disable_iospace() was removed in 2015 by commit d1c29465b8a5 ("lguest: don't disable iospace.") Remove it. Note the comment above it about 'entry.S' is unrelated to this but stayed when intervening code got deleted. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20250303004441.250451-1-linux@treblig.org
2025-03-03	x86/ibt: Make cfi_bhi a constant for FINEIBT_BHI=n	Peter Zijlstra
	Robot yielded a .config that tripped: vmlinux.o: warning: objtool: do_jit+0x276: relocation to !ENDBR: .noinstr.text+0x6a60 This is the result of using __bhi_args[1] in unreachable code; make sure the compiler is able to determine this is unreachable and trigger DCE. Closes: https://lore.kernel.org/oe-kbuild-all/202503030704.H9KFysNS-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250303094911.GL5880@noisy.programming.kicks-ass.net
2025-02-26	x86/ibt: Optimize the fineibt-bhi arity 1 case	Peter Zijlstra
	Saves a CALL to an out-of-line thunk for the common case of 1 argument. Suggested-by: Scott Constable <scott.d.constable@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.927885784@infradead.org
2025-02-26	x86/ibt: Implement FineIBT-BHI mitigation	Peter Zijlstra
	While WAIT_FOR_ENDBR is specified to be a full speculation stop; it has been shown that some implementations are 'leaky' to such an extend that speculation can escape even the FineIBT preamble. To deal with this, add additional hardening to the FineIBT preamble. Notably, using a new LLVM feature: https://github.com/llvm/llvm-project/commit/e223485c9b38a5579991b8cebb6a200153eee245 which encodes the number of arguments in the kCFI preamble's register. Using this register<->arity mapping, have the FineIBT preamble CALL into a stub clobbering the relevant argument registers in the speculative case. Scott sayeth thusly: Microarchitectural attacks such as Branch History Injection (BHI) and Intra-mode Branch Target Injection (IMBTI) [1] can cause an indirect call to mispredict to an adversary-influenced target within the same hardware domain (e.g., within the kernel). Instructions at the mispredicted target may execute speculatively and potentially expose kernel data (e.g., to a user-mode adversary) through a microarchitectural covert channel such as CPU cache state. CET-IBT [2] is a coarse-grained control-flow integrity (CFI) ISA extension that enforces that each indirect call (or indirect jump) must land on an ENDBR (end branch) instruction, even speculatively. FineIBT is a software technique that refines CET-IBT by associating each function type with a 32-bit hash and enforcing (at the callee) that the hash of the caller's function pointer type matches the hash of the callee's function type. However, recent research [3] has demonstrated that the conditional branch that enforces FineIBT's hash check can be coerced to mispredict, potentially allowing an adversary to speculatively bypass the hash check: __cfi_foo: ENDBR64 SUB R10d, 0x01234567 JZ foo # Even if the hash check fails and ZF=0, this branch could still mispredict as taken UD2 foo: ... The techniques demonstrated in [3] require the attacker to be able to control the contents of at least one live register at the mispredicted target. Therefore, this patch set introduces a sequence of CMOV instructions at each indirect-callable target that poisons every live register with data that the attacker cannot control whenever the FineIBT hash check fails, thus mitigating any potential attack. The security provided by this scheme has been discussed in detail on an earlier thread [4]. [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html [2] Intel Software Developer's Manual, Volume 1, Chapter 18 [3] https://www.vusec.net/projects/native-bhi/ [4] https://lore.kernel.org/lkml/20240927194925.707462984@infradead.org/ There are some caveats for certain processors, see [1] for more info Suggested-by: Scott Constable <scott.d.constable@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.820402212@infradead.org
2025-02-26	x86/bhi: Add BHI stubs	Peter Zijlstra
	Add an array of code thunks, to be called from the FineIBT preamble, clobbering the first 'n' argument registers for speculative execution. Notably the 0th entry will clobber no argument registers and will never be used, it exists so the array can be naturally indexed, while the 7th entry will clobber all the 6 argument registers and also RSP in order to mess up stack based arguments. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.717378681@infradead.org
2025-02-26	x86/ibt: Add paranoid FineIBT mode	Peter Zijlstra
	Due to concerns about circumvention attacks against FineIBT on 'naked' ENDBR, add an additional caller side hash check to FineIBT. This should make it impossible to pivot over such a 'naked' ENDBR instruction at the cost of an additional load. The specific pivot reported was against the SYSCALL entry site and FRED will have all those holes fixed up. https://lore.kernel.org/linux-hardening/Z60NwR4w%2F28Z7XUa@ubun/ This specific fineibt_paranoid_start[] sequence was concocted by Scott. Suggested-by: Scott Constable <scott.d.constable@intel.com> Reported-by: Jennifer Miller <jmill@asu.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.598033084@infradead.org
2025-02-26	x86/traps: Decode LOCK Jcc.d8 as #UD	Peter Zijlstra
	Because overlapping code sequences are all the rage. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.486463917@infradead.org
2025-02-26	x86/ibt: Optimize the FineIBT instruction sequence	Peter Zijlstra
	Scott notes that non-taken branches are faster. Abuse overlapping code that traps instead of explicit UD2 instructions. And LEA does not modify flags and will have less dependencies. Suggested-by: Scott Constable <scott.d.constable@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.371942555@infradead.org
2025-02-26	x86/traps: Allow custom fixups in handle_bug()	Peter Zijlstra
	The normal fixup in handle_bug() is simply continuing at the next instruction. However upcoming patches make this the wrong thing, so allow handlers (specifically handle_cfi_failure()) to over-ride regs->ip. The callchain is such that the fixup needs to be done before it is determined if the exception is fatal, as such, revert any changes in that case. Additionally, have handle_cfi_failure() remember the regs->ip value it starts with for reporting. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.275223080@infradead.org
2025-02-26	x86/traps: Decode 0xEA instructions as #UD	Peter Zijlstra
	FineIBT will start using 0xEA as #UD. Normally '0xEA' is a 'bad', invalid instruction for the CPU. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.166774696@infradead.org
2025-02-26	x86/alternatives: Clean up preprocessor conditional block comments	Ingo Molnar
	When in the middle of a kernel source code file a kernel developer sees a lone #else or #endif: ... #else ... It's not obvious at a glance what those preprocessor blocks are conditional on, if the starting #ifdef is outside visible range. So apply the standard pattern we use in such cases elsewhere in the kernel for large preprocessor blocks: #ifdef CONFIG_XXX ... ... ... #endif /* CONFIG_XXX / ... #ifdef CONFIG_XXX ... ... ... #else / !CONFIG_XXX: / ... ... ... #endif / !CONFIG_XXX / ( Note that in the #else case we use the / !CONFIG_XXX / marker in the final #endif, not / CONFIG_XXX */, which serves as an easy visual marker to differentiate #else or #elif related #endif closures from singular #ifdef/#endif blocks. ) Also clean up __CFI_DEFAULT definition with a bit more vertical alignment applied, and a pointless tab converted to the standard space we use in such definitions. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-02-26	x86/ibt: Add exact_endbr() helper	Peter Zijlstra
	For when we want to exactly match ENDBR, and not everything that we can scribble it with. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124200.059556588@infradead.org
2025-02-26	x86/cfi: Add 'cfi=warn' boot option	Peter Zijlstra
	Rebuilding with CONFIG_CFI_PERMISSIVE=y enabled is such a pain, esp. since clang is so slow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250224124159.924496481@infradead.org
2025-02-25	x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()	Waiman Long
	Depending on the type of panics, it was found that the __register_nmi_handler() function can be called in NMI context from nmi_shootdown_cpus() leading to a lockdep splat: WARNING: inconsistent lock state inconsistent {INITIAL USE} -> {IN-NMI} usage. lock(&nmi_desc[0].lock); <Interrupt> lock(&nmi_desc[0].lock); Call Trace: _raw_spin_lock_irqsave __register_nmi_handler nmi_shootdown_cpus kdump_nmi_shootdown_cpus native_machine_crash_shutdown __crash_kexec In this particular case, the following panic message was printed before: Kernel panic - not syncing: Fatal hardware error! This message seemed to be given out from __ghes_panic() running in NMI context. The __register_nmi_handler() function which takes the nmi_desc lock with irq disabled shouldn't be called from NMI context as this can lead to deadlock. The nmi_shootdown_cpus() function can only be invoked once. After the first invocation, all other CPUs should be stuck in the newly added crash_nmi_callback() and cannot respond to a second NMI. Fix it by adding a new emergency NMI handler to the nmi_desc structure and provide a new set_emergency_nmi_handler() helper to set crash_nmi_callback() in any context. The new emergency handler will preempt other handlers in the linked list. That will eliminate the need to take any lock and serve the panic in NMI use case. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20250206191844.131700-1-longman@redhat.com
2025-02-21	x86/arch_prctl/64: Clean up ARCH_MAP_VDSO_32	Brian Gerst
	process_64.c is not built on native 32-bit, so CONFIG_X86_32 will never be set. No change in functionality intended. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20250202202323.422113-3-brgerst@gmail.com
2025-02-21	x86/arch_prctl: Simplify sys_arch_prctl()	Brian Gerst
	Use in_ia32_syscall() instead of a compat syscall entry. No change in functionality intended. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20250202202323.422113-2-brgerst@gmail.com
2025-02-21	x86/module: Remove unnecessary check in module_finalize()	Dan Carpenter
	The "calls" pointer can no longer be NULL after the following commit: ab9fea59487d ("x86/alternative: Simplify callthunk patching") Delete this unnecessary check. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/fcbb2f57-0714-4139-b441-8817365c16a1@stanley.mountain
2025-02-18	x86: Move sysctls into arch/x86	Joel Granados
	Move the following sysctl tables into arch/x86/kernel/setup.c: panic_on_{unrecoverable_nmi,io_nmi} bootloader_{type,version} io_delay_type unknown_nmi_panic acpi_realmode_flags Variables moved from include/linux/ to arch/x86/include/asm/ because there is no longer need for them outside arch/x86/kernel: acpi_realmode_flags panic_on_{unrecoverable_nmi,io_nmi} Include <asm/nmi.h> in arch/s86/kernel/setup.h in order to bring in panic_on_{io_nmi,unrecovered_nmi}. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kerenel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250218-jag-mv_ctltables-v1-8-cd3698ab8d29@kernel.org
2025-02-18	Merge tag 'v6.14-rc3' into x86/core, to pick up fixes	Ingo Molnar
	Pick up upstream x86 fixes before applying new patches. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-16	Linux 6.14-rc3v6.14-rc3	Linus Torvalds

2025-02-16	Merge tag 'kbuild-fixes-v6.14-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix annoying logs when building tools in parallel - Fix the Debian linux-headers package build again - Fix the target triple detection for userspace programs on Clang * tag 'kbuild-fixes-v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: modpost: Fix a few typos in a comment kbuild: userprogs: fix bitsize and target detection on clang kbuild: fix linux-headers package build when $(CC) cannot link userspace tools: fix annoying "mkdir -p ..." logs when building tools in parallel
2025-02-16	Merge tag 'driver-core-6.14-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core api addition from Greg KH: "Here is a driver core new api for 6.14-rc3 that is being added to allow platform devices from stop being abused. It adds a new 'faux_device' structure and bus and api to allow almost a straight or simpler conversion from platform devices that were not really a platform device. It also comes with a binding for rust, with an example driver in rust showing how it's used. I'm adding this now so that the patches that convert the different drivers and subsystems can all start flowing into linux-next now through their different development trees, in time for 6.15-rc1. We have a number that are already reviewed and tested, but adding those conversions now doesn't seem right. For now, no one is using this, and it passes all build tests from 0-day and linux-next, so all should be good" * tag 'driver-core-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: rust/kernel: Add faux device bindings driver core: add a faux bus for use when a simple device/bus is needed
2025-02-16	Merge tag 'tty-6.14-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull serial driver fixes from Greg KH: "Here are some small serial driver fixes for some reported problems. Nothing major, just: - sc16is7xx irq check fix - 8250 fifo underflow fix - serial_port and 8250 iotype fixes Most of these have been in linux-next already, and all have passed 0-day testing" * tag 'tty-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: 8250: Fix fifo underflow on flush serial: 8250_pnp: Remove unneeded ->iotype assignment serial: 8250_platform: Remove unneeded ->iotype assignment serial: 8250_of: Remove unneeded ->iotype assignment serial: port: Make ->iotype validation global in __uart_read_properties() serial: port: Always update ->iotype in __uart_read_properties() serial: port: Assign ->iotype correctly when ->iobase is set serial: sc16is7xx: Fix IRQ number check behavior
2025-02-16	Merge tag 'usb-6.14-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB driver fixes, and new device ids, for 6.14-rc3. Lots of tiny stuff for reported problems, including: - new device ids and quirks - usb hub crash fix found by syzbot - dwc2 driver fix - dwc3 driver fixes - uvc gadget driver fix - cdc-acm driver fixes for a variety of different issues - other tiny bugfixes Almost all of these have been in linux-next this week, and all have passed 0-day testing" * tag 'usb-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits) usb: typec: tcpm: PSSourceOffTimer timeout in PR_Swap enters ERROR_RECOVERY usb: roles: set switch registered flag early on usb: gadget: uvc: Fix unstarted kthread worker USB: quirks: add USB_QUIRK_NO_LPM quirk for Teclast dist usb: gadget: core: flush gadget workqueue after device removal USB: gadget: f_midi: f_midi_complete to call queue_work usb: core: fix pipe creation for get_bMaxPacketSize0 usb: dwc3: Fix timeout issue during controller enter/exit from halt state USB: Add USB_QUIRK_NO_LPM quirk for sony xperia xz1 smartphone USB: cdc-acm: Fill in Renesas R-Car D3 USB Download mode quirk usb: cdc-acm: Fix handling of oversized fragments usb: cdc-acm: Check control transfer buffer size before access usb: xhci: Restore xhci_pci support for Renesas HCs USB: pci-quirks: Fix HCCPARAMS register error for LS7A EHCI USB: serial: option: drop MeiG Smart defines USB: serial: option: fix Telit Cinterion FN990A name USB: serial: option: add Telit Cinterion FN990B compositions USB: serial: option: add MeiG Smart SLM828 usb: gadget: f_midi: fix MIDI Streaming descriptor lengths usb: dwc2: gadget: remove of_node reference upon udc_stop ...
2025-02-16	Merge tag 'irq_urgent_for_v6.14_rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq Kconfig cleanup from Borislav Petkov: - Remove an unused config item GENERIC_PENDING_IRQ_CHIPFLAGS * tag 'irq_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Remove unused CONFIG_GENERIC_PENDING_IRQ_CHIPFLAGS
2025-02-16	Merge tag 'perf_urgent_for_v6.14_rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Borislav Petkov: - Explicitly clear DEBUGCTL.LBR to prevent LBRs continuing being enabled after handoff to the OS - Check CPUID(0x23) leaf and subleafs presence properly - Remove the PEBS-via-PT feature from being supported on hybrid systems - Fix perf record/top default commands on systems without a raw PMU registered * tag 'perf_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Ensure LBRs are disabled when a CPU is starting perf/x86/intel: Fix ARCH_PERFMON_NUM_COUNTER_LEAF perf/x86/intel: Clean up PEBS-via-PT on hybrid perf/x86/rapl: Fix the error checking order
2025-02-16	Merge tag 'sched_urgent_for_v6.14_rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: - Clarify what happens when a task is woken up from the wake queue and make clear its removal from that queue is atomic * tag 'sched_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Clarify wake_up_q()'s write to task->wake_q.next
2025-02-16	Merge tag 'objtool_urgent_for_v6.14_rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Borislav Petkov: - Move a warning about a lld.ld breakage into the verbose setting as said breakage has been fixed in the meantime - Teach objtool to ignore dangling jump table entries added by Clang * tag 'objtool_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Move dodgy linker warn to verbose objtool: Ignore dangling jump table entries
2025-02-16	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm fixes from Paolo Bonzini: "ARM: - Large set of fixes for vector handling, especially in the interactions between host and guest state. This fixes a number of bugs affecting actual deployments, and greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland for dealing with this thankless task. - Fix an ugly race between vcpu and vgic creation/init, resulting in unexpected behaviours - Fix use of kernel VAs at EL2 when emulating timers with nVHE - Small set of pKVM improvements and cleanups x86: - Fix broken SNP support with KVM module built-in, ensuring the PSP module is initialized before KVM even when the module infrastructure cannot be used to order initcalls - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated by KVM to fix a NULL pointer dereference - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's nested NPT MMU so that the MMU is properly tagged for L2, not L1 - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the guest's value may be stale if a VM-Exit is handled in the fastpath" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) x86/sev: Fix broken SNP support with KVM module built-in KVM: SVM: Ensure PSP module is initialized if KVM module is built-in crypto: ccp: Add external API interface for PSP module initialization KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic() KVM: arm64: timer: Drop warning on failed interrupt signalling KVM: arm64: Fix alignment of kvm_hyp_memcache allocations KVM: arm64: Convert timer offset VA when accessed in HYP code KVM: arm64: Simplify warning in kvm_arch_vcpu_load_fp() KVM: arm64: Eagerly switch ZCR_EL{1,2} KVM: arm64: Mark some header functions as inline KVM: arm64: Refactor exit handlers KVM: arm64: Refactor CPTR trap deactivation KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN KVM: arm64: Remove host FPSIMD saving for non-protected KVM KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop KVM: nSVM: Enter guest mode before initializing nested NPT MMU KVM: selftests: Add CPUID tests for Hyper-V features that need in-kernel APIC KVM: selftests: Manage CPUID array in Hyper-V CPUID test's core helper ...
2025-02-16	Merge tag 'mips-fixes_6.14_1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: "Fix for o32 ptrace/get_syscall_info" * tag 'mips-fixes_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: fix mips_get_syscall_arg() for o32 MIPS: Export syscall stack arguments properly for remote use
2025-02-15	Merge tag 'devicetree-fixes-for-6.14-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Add bindings for QCom QCS8300 clocks, QCom SAR2130P qfprom, and powertip,{st7272\|hx8238a} displays - Fix compatible for TI am62a7 dss - Add a kunit test for __of_address_resource_bounds() * tag 'devicetree-fixes-for-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: display: Add powertip,{st7272\|hx8238a} as DT Schema description dt-bindings: nvmem: qcom,qfprom: Add SAR2130P compatible dt-bindings: display: ti: Fix compatible for am62a7 dss of: address: Add kunit test for __of_address_resource_bounds() dt-bindings: clock: qcom: Add QCS8300 video clock controller dt-bindings: clock: qcom: Add CAMCC clocks for QCS8300 dt-bindings: clock: qcom: Add GPU clocks for QCS8300
2025-02-15	Merge tag 'uml-for-linus-6.14-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML fixes from Richard Weinberger: - Align signal stack correctly - Convert to raw spinlocks where needed (irq and virtio) - FPU related fixes * tag 'uml-for-linus-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: convert irq_lock to raw spinlock um: virtio_uml: use raw spinlock um: virt-pci: don't use kmalloc() um: fix execve stub execution on old host OSs um: properly align signal stack on x86_64 um: avoid copying FP state from init_task um: add back support for FXSAVE registers
2025-02-15	Merge tag 'trace-ring-buffer-v6.14-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull trace ring buffer fixes from Steven Rostedt: - Enable resize on mmap() error When a process mmaps a ring buffer, its size is locked and resizing is disabled. But if the user passes in a wrong parameter, the mmap() can fail after the resize was disabled and the mmap() exits with error without reenabling the ring buffer resize. This prevents the ring buffer from ever being resized after that. Reenable resizing of the ring buffer on mmap() error. - Have resizing return proper error and not always -ENOMEM If the ring buffer is mmapped by one task and another task tries to resize the buffer it will error with -ENOMEM. This is confusing to the user as there may be plenty of memory available. Have it return the error that actually happens (in this case -EBUSY) where the user can understand why the resize failed. - Test the sub-buffer array to validate persistent memory buffer On boot up, the initialization of the persistent memory buffer will do a validation check to see if the content of the data is valid, and if so, it will use the memory as is, otherwise it re-initializes it. There's meta data in this persistent memory that keeps track of which sub-buffer is the reader page and an array that states the order of the sub-buffers. The values in this array are indexes into the sub-buffers. The validator checks to make sure that all the entries in the array are within the sub-buffer list index, but it does not check for duplications. While working on this code, the array got corrupted and had duplicates, where not all the sub-buffers were accounted for. This passed the validator as all entries were valid, but the link list was incorrect and could have caused a crash. The corruption only produced incorrect data, but it could have been more severe. To fix this, create a bitmask that covers all the sub-buffer indexes and set it to all zeros. While iterating the array checking the values of the array content, have it set a bit corresponding to the index in the array. If the bit was already set, then it is a duplicate and mark the buffer as invalid and reset it. - Prevent mmap()ing persistent ring buffer The persistent ring buffer uses vmap() to map the persistent memory. Currently, the mmap() logic only uses virt_to_page() to get the page from the ring buffer memory and use that to map to user space. This works because a normal ring buffer uses alloc_page() to allocate its memory. But because the persistent ring buffer use vmap() it causes a kernel crash. Fixing this to work with vmap() is not hard, but since mmap() on persistent memory buffers never worked, just have the mmap() return -ENODEV (what was returned before mmap() for persistent memory ring buffers, as they never supported mmap. Normal buffers will still allow mmap(). Implementing mmap() for persistent memory ring buffers can wait till the next merge window. - Fix polling on persistent ring buffers There's a "buffer_percent" option (default set to 50), that is used to have reads of the ring buffer binary data block until the buffer fills to that percentage. The field "pages_touched" is incremented every time a new sub-buffer has content added to it. This field is used in the calculations to determine the amount of content is in the buffer and if it exceeds the "buffer_percent" then it will wake the task polling on the buffer. As persistent ring buffers can be created by the content from a previous boot, the "pages_touched" field was not updated. This means that if a task were to poll on the persistent buffer, it would block even if the buffer was completely full. It would block even if the "buffer_percent" was zero, because with "pages_touched" as zero, it would be calculated as the buffer having no content. Update pages_touched when initializing the persistent ring buffer from a previous boot. * tag 'trace-ring-buffer-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Update pages_touched to reflect persistent buffer content tracing: Do not allow mmap() of persistent ring buffer ring-buffer: Validate the persistent meta data subbuf array tracing: Have the error of __tracing_resize_ring_buffer() passed to user ring-buffer: Unlock resize on mmap error
2025-02-15	ring-buffer: Update pages_touched to reflect persistent buffer content	Steven Rostedt
	The pages_touched field represents the number of subbuffers in the ring buffer that have content that can be read. This is used in accounting of "dirty_pages" and "buffer_percent" to allow the user to wait for the buffer to be filled to a certain amount before it reads the buffer in blocking mode. The persistent buffer never updated this value so it was set to zero, and this accounting would take it as it had no content. This would cause user space to wait for content even though there's enough content in the ring buffer that satisfies the buffer_percent. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/20250214123512.0631436e@gandalf.local.home Fixes: 5f3b6e839f3ce ("ring-buffer: Validate boot range memory events") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-15	tracing: Do not allow mmap() of persistent ring buffer	Steven Rostedt
	When trying to mmap a trace instance buffer that is attached to reserve_mem, it would crash: BUG: unable to handle page fault for address: ffffe97bd00025c8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 2862f3067 P4D 2862f3067 PUD 0 Oops: Oops: 0000 [#1] PREEMPT_RT SMP PTI CPU: 4 UID: 0 PID: 981 Comm: mmap-rb Not tainted 6.14.0-rc2-test-00003-g7f1a5e3fbf9e-dirty #233 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:validate_page_before_insert+0x5/0xb0 Code: e2 01 89 d0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <48> 8b 46 08 a8 01 75 67 66 90 48 89 f0 8b 50 34 85 d2 74 76 48 89 RSP: 0018:ffffb148c2f3f968 EFLAGS: 00010246 RAX: ffff9fa5d3322000 RBX: ffff9fa5ccff9c08 RCX: 00000000b879ed29 RDX: ffffe97bd00025c0 RSI: ffffe97bd00025c0 RDI: ffff9fa5ccff9c08 RBP: ffffb148c2f3f9f0 R08: 0000000000000004 R09: 0000000000000004 R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000 R13: 00007f16a18d5000 R14: ffff9fa5c48db6a8 R15: 0000000000000000 FS: 00007f16a1b54740(0000) GS:ffff9fa73df00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffe97bd00025c8 CR3: 00000001048c6006 CR4: 0000000000172ef0 Call Trace: <TASK> ? __die_body.cold+0x19/0x1f ? __die+0x2e/0x40 ? page_fault_oops+0x157/0x2b0 ? search_module_extables+0x53/0x80 ? validate_page_before_insert+0x5/0xb0 ? kernelmode_fixup_or_oops.isra.0+0x5f/0x70 ? __bad_area_nosemaphore+0x16e/0x1b0 ? bad_area_nosemaphore+0x16/0x20 ? do_kern_addr_fault+0x77/0x90 ? exc_page_fault+0x22b/0x230 ? asm_exc_page_fault+0x2b/0x30 ? validate_page_before_insert+0x5/0xb0 ? vm_insert_pages+0x151/0x400 __rb_map_vma+0x21f/0x3f0 ring_buffer_map+0x21b/0x2f0 tracing_buffers_mmap+0x70/0xd0 __mmap_region+0x6f0/0xbd0 mmap_region+0x7f/0x130 do_mmap+0x475/0x610 vm_mmap_pgoff+0xf2/0x1d0 ksys_mmap_pgoff+0x166/0x200 __x64_sys_mmap+0x37/0x50 x64_sys_call+0x1670/0x1d70 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f The reason was that the code that maps the ring buffer pages to user space has: page = virt_to_page((void *)cpu_buffer->subbuf_ids[s]); And uses that in: vm_insert_pages(vma, vma->vm_start, pages, &nr_pages); But virt_to_page() does not work with vmap()'d memory which is what the persistent ring buffer has. It is rather trivial to allow this, but for now just disable mmap() of instances that have their ring buffer from the reserve_mem option. If an mmap() is performed on a persistent buffer it will return -ENODEV just like it would if the .mmap field wasn't defined in the file_operations structure. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/20250214115547.0d7287d3@gandalf.local.home Fixes: 9b7bdf6f6ece6 ("tracing: Have trace_printk not use binary prints if boot buffer") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-15	Merge tag 'i2c-for-6.14-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "MAINTAINERS maintenance. Changed email, added entry, deleted entry falling back to a generic one" * tag 'i2c-for-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: MAINTAINERS: Add maintainer for Qualcomm's I2C GENI driver MAINTAINERS: delete entry for AXXIA I2C MAINTAINERS: Use my kernel.org address for I2C ACPI work
2025-02-15	Merge tag 's390-6.14-4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix isolated VFs handling by verifying that a VF’s parent PF is locally owned before registering it in an existing PCI domain - Disable arch_test_bit() optimization for PROFILE_ALL_BRANCHES to workaround gcc failure in handling __builtin_constant_p() in this case - Fix CHPID "configure" attribute caching in CIO by not updating the cache when SCLP returns no data, ensuring consistent sysfs output - Remove CONFIG_LSM from default configs and rely on defaults, which enables BPF LSM hook * tag 's390-6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pci: Fix handling of isolated VFs s390/pci: Pull search for parent PF out of zpci_iov_setup_virtfn() s390/bitops: Disable arch_test_bit() optimization for PROFILE_ALL_BRANCHES s390/cio: Fix CHPID "configure" attribute caching s390/configs: Remove CONFIG_LSM
2025-02-16	modpost: Fix a few typos in a comment	Uwe Kleine-König
	Namely: s/becasue/because/ and s/wiht/with/ plus an added article. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-02-16	kbuild: userprogs: fix bitsize and target detection on clang	Thomas Weißschuh
	scripts/Makefile.clang was changed in the linked commit to move --target from KBUILD_CFLAGS to KBUILD_CPPFLAGS, as that generally has a broader scope. However that variable is not inspected by the userprogs logic, breaking cross compilation on clang. Use both variables to detect bitsize and target arguments for userprogs. Fixes: feb843a469fb ("kbuild: add $(CLANG_FLAGS) to KBUILD_CPPFLAGS") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-02-15	Merge tag 'rust-fixes-6.14-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull rust fixes from Miguel Ojeda: - Fix objtool warning due to future Rust 1.85.0 (to be released in a few days) - Clean future Rust 1.86.0 (to be released 2025-04-03) Clippy warning * tag 'rust-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: rust: rbtree: fix overindented list item objtool/rust: add one more `noreturn` Rust function
2025-02-15	tegra210-adma: fix 32-bit x86 build	Linus Torvalds
	The Tegra210 Audio DMA controller driver did a plain divide: page_no = (res_page->start - res_base->start) / cdata->ch_base_offset; which causes problems on 32-bit x86 configurations that have 64-bit resource sizes: x86_64-linux-ld: drivers/dma/tegra210-adma.o: in function `tegra_adma_probe': tegra210-adma.c:(.text+0x1322): undefined reference to `__udivdi3' because gcc doesn't generate the trivial code for a 64-by-32 divide, turning it into a function call to do a full 64-by-64 divide. And the kernel intentionally doesn't provide that helper function, because 99% of the time all you want is the narrower version. Of course, tegra210 is a 64-bit architecture and the 32-bit x86 build is purely for build testing, so this really is just about build coverage failure. But build coverage is good. Side note: div_u64() would be suboptimal if you actually have a 32-bit resource_t, so our "helper" for divides are admittedly making it harder than it should be to generate good code for all the possible cases. At some point, I'll consider 32-bit x86 so entirely legacy that I can't find it in myself to care any more, and we'll just add the __udivdi3 library function. But for now, the right thing to do is to use "div_u64()" to show that you know that you are doing the simpler divide with a 32-bit number. And the build error enforces that. While fixing the build issue, also check for division-by-zero, and for overflow. Which hopefully cannot happen on real production hardware, but the value of 'ch_base_offset' can definitely be zero in other places. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-02-15	Merge tag 'gpio-fixes-for-v6.14-rc3-take2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix interrupt handling issues in gpio-bcm-kona - add an ACPI quirk for Acer Nitro ANV14 fixing an issue with spurious wake up events - add missing return value checks to gpio-stmpe - fix a crash in error path in gpiochip_get_ngpios() * tag 'gpio-fixes-for-v6.14-rc3-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: Fix crash on error in gpiochip_get_ngpios() gpio: stmpe: Check return value of stmpe_reg_read in stmpe_gpio_irq_sync_unlock gpiolib: acpi: Add a quirk for Acer Nitro ANV14 gpio: bcm-kona: Add missing newline to dev_err format string gpio: bcm-kona: Make sure GPIO bits are unlocked when requesting IRQ gpio: bcm-kona: Fix GPIO lock/unlock for banks above bank 0
2025-02-15	kbuild: fix linux-headers package build when $(CC) cannot link userspace	Masahiro Yamada
	Since commit 5f73e7d0386d ("kbuild: refactor cross-compiling linux-headers package"), the linux-headers Debian package fails to build when $(CC) cannot build userspace applications, for example, when using toolchains installed by the 0day bot. The host programs in the linux-headers package should be rebuilt using the disto's cross-compiler, ${DEB_HOST_GNU_TYPE}-gcc instead of $(CC). Hence, the variable 'CC' must be expanded in this shell script instead of in the top-level Makefile. Commit f354fc88a72a ("kbuild: install-extmod-build: add missing quotation marks for CC variable") was not a correct fix because CC="ccache gcc" should be unrelated when rebuilding userspace tools. Fixes: 5f73e7d0386d ("kbuild: refactor cross-compiling linux-headers package") Reported-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Closes: https://lore.kernel.org/linux-kbuild/CAK7LNARb3xO3ptBWOMpwKcyf3=zkfhMey5H2KnB1dOmUwM79dA@mail.gmail.com/T/#t Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-02-15	tools: fix annoying "mkdir -p ..." logs when building tools in parallel	Masahiro Yamada
	When CONFIG_OBJTOOL=y or CONFIG_DEBUG_INFO_BTF=y, parallel builds show awkward "mkdir -p ..." logs. $ make -j16 [ snip ] mkdir -p /home/masahiro/ref/linux/tools/objtool && make O=/home/masahiro/ref/linux subdir=tools/objtool --no-print-directory -C objtool mkdir -p /home/masahiro/ref/linux/tools/bpf/resolve_btfids && make O=/home/masahiro/ref/linux subdir=tools/bpf/resolve_btfids --no-print-directory -C bpf/resolve_btfids Defining MAKEFLAGS=<value> on the command line wipes out command line switches from the resultant MAKEFLAGS definition, even though the command line switches are active. [1] MAKEFLAGS puts all single-letter options into the first word, and that word will be empty if no single-letter options were given. [2] However, this breaks if MAKEFLAGS=<value> is given on the command line. The tools/ and tools/% targets set MAKEFLAGS=<value> on the command line, which breaks the following code in tools/scripts/Makefile.include: short-opts := $(firstword -$(MAKEFLAGS)) If MAKEFLAGS really needs modification, it should be done through the environment variable, as follows: MAKEFLAGS=<value> $(MAKE) ... That said, I question whether modifying MAKEFLAGS is necessary here. The only flag we might want to exclude is --no-print-directory, as the tools build system changes the working directory. However, people might find the "Entering/Leaving directory" logs annoying. I simply removed the offending MAKEFLAGS=<value>. [1]: https://savannah.gnu.org/bugs/?62469 [2]: https://www.gnu.org/software/make/manual/make.html#Testing-Flags Fixes: ea01fa9f63ae ("tools: Connect to the kernel build system") Fixes: a50e43332756 ("perf tools: Honor parallel jobs") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Daniel Xu <dxu@dxuuu.xyz>
2025-02-14	Merge tag 'alpha-fixes-v6.14-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha Pull alpha fixes from Matt Turner: "A few changes for alpha, including some important fixes for kernel stack alignment" * tag 'alpha-fixes-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha: alpha: Use str_yes_no() helper in pci_dac_dma_supported() alpha: Replace one-element array with flexible array member alpha: align stack for page fault and user unaligned trap handlers alpha: make stack 16-byte aligned (most cases) alpha: replace hardcoded stack offsets with autogenerated ones
2025-02-14	Merge tag 'pci-v6.14-fixes-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Update a BUILD_BUG_ON() usage that works on current compilers, but breaks compilation on gcc 5.3.1 (Alex Williamson) - Avoid use of FLR for Mediatek MT7922 WiFi; the device previously worked after a long timeout and fallback to SBR, but after a recent RRS change it doesn't work at all after FLR (Bjorn Helgaas) * tag 'pci-v6.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: Avoid FLR for Mediatek MT7922 WiFi PCI: Fix BUILD_BUG_ON usage for old gcc
2025-02-14	Merge tag 'kvm-x86-fixes-6.14-rcN' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM fixes for 6.14 part 1 - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated by KVM to fix a NULL pointer dereference. - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's nested NPT MMU so that the MMU is properly tagged for L2, not L1. - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the guest's value may be stale if a VM-Exit is handled in the fastpath.
2025-02-14	x86/sev: Fix broken SNP support with KVM module built-in	Ashish Kalra
	Fix issues with enabling SNP host support and effectively SNP support which is broken with respect to the KVM module being built-in. SNP host support is enabled in snp_rmptable_init() which is invoked as device_initcall(). SNP check on IOMMU is done during IOMMU PCI init (IOMMU_PCI_INIT stage). And for that reason snp_rmptable_init() is currently invoked via device_initcall() and cannot be invoked via subsys_initcall() as core IOMMU subsystem gets initialized via subsys_initcall(). Now, if kvm_amd module is built-in, it gets initialized before SNP host support is enabled in snp_rmptable_init() : [ 10.131811] kvm_amd: TSC scaling supported [ 10.136384] kvm_amd: Nested Virtualization enabled [ 10.141734] kvm_amd: Nested Paging enabled [ 10.146304] kvm_amd: LBR virtualization supported [ 10.151557] kvm_amd: SEV enabled (ASIDs 100 - 509) [ 10.156905] kvm_amd: SEV-ES enabled (ASIDs 1 - 99) [ 10.162256] kvm_amd: SEV-SNP enabled (ASIDs 1 - 99) [ 10.171508] kvm_amd: Virtual VMLOAD VMSAVE supported [ 10.177052] kvm_amd: Virtual GIF supported ... ... [ 10.201648] kvm_amd: in svm_enable_virtualization_cpu And then svm_x86_ops->enable_virtualization_cpu() (svm_enable_virtualization_cpu) programs MSR_VM_HSAVE_PA as following: wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa); So VM_HSAVE_PA is non-zero before SNP support is enabled on all CPUs. snp_rmptable_init() gets invoked after svm_enable_virtualization_cpu() as following : ... [ 11.256138] kvm_amd: in svm_enable_virtualization_cpu ... [ 11.264918] SEV-SNP: in snp_rmptable_init This triggers a #GP exception in snp_rmptable_init() when snp_enable() is invoked to set SNP_EN in SYSCFG MSR: [ 11.294289] unchecked MSR access error: WRMSR to 0xc0010010 (tried to write 0x0000000003fc0000) at rIP: 0xffffffffaf5d5c28 (native_write_msr+0x8/0x30) ... [ 11.294404] Call Trace: [ 11.294482] <IRQ> [ 11.294513] ? show_stack_regs+0x26/0x30 [ 11.294522] ? ex_handler_msr+0x10f/0x180 [ 11.294529] ? search_extable+0x2b/0x40 [ 11.294538] ? fixup_exception+0x2dd/0x340 [ 11.294542] ? exc_general_protection+0x14f/0x440 [ 11.294550] ? asm_exc_general_protection+0x2b/0x30 [ 11.294557] ? __pfx_snp_enable+0x10/0x10 [ 11.294567] ? native_write_msr+0x8/0x30 [ 11.294570] ? __snp_enable+0x5d/0x70 [ 11.294575] snp_enable+0x19/0x20 [ 11.294578] __flush_smp_call_function_queue+0x9c/0x3a0 [ 11.294586] generic_smp_call_function_single_interrupt+0x17/0x20 [ 11.294589] __sysvec_call_function+0x20/0x90 [ 11.294596] sysvec_call_function+0x80/0xb0 [ 11.294601] </IRQ> [ 11.294603] <TASK> [ 11.294605] asm_sysvec_call_function+0x1f/0x30 ... [ 11.294631] arch_cpu_idle+0xd/0x20 [ 11.294633] default_idle_call+0x34/0xd0 [ 11.294636] do_idle+0x1f1/0x230 [ 11.294643] ? complete+0x71/0x80 [ 11.294649] cpu_startup_entry+0x30/0x40 [ 11.294652] start_secondary+0x12d/0x160 [ 11.294655] common_startup_64+0x13e/0x141 [ 11.294662] </TASK> This #GP exception is getting triggered due to the following errata for AMD family 19h Models 10h-1Fh Processors: Processor may generate spurious #GP(0) Exception on WRMSR instruction: Description: The Processor will generate a spurious #GP(0) Exception on a WRMSR instruction if the following conditions are all met: - the target of the WRMSR is a SYSCFG register. - the write changes the value of SYSCFG.SNPEn from 0 to 1. - One of the threads that share the physical core has a non-zero value in the VM_HSAVE_PA MSR. The document being referred to above: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/57095-PUB_1_01.pdf To summarize, with kvm_amd module being built-in, KVM/SVM initialization happens before host SNP is enabled and this SVM initialization sets VM_HSAVE_PA to non-zero, which then triggers a #GP when SYSCFG.SNPEn is being set and this will subsequently cause SNP_INIT(_EX) to fail with INVALID_CONFIG error as SYSCFG[SnpEn] is not set on all CPUs. Essentially SNP host enabling code should be invoked before KVM initialization, which is currently not the case when KVM is built-in. Add fix to call snp_rmptable_init() early from iommu_snp_enable() directly and not invoked via device_initcall() which enables SNP host support before KVM initialization with kvm_amd module built-in. Add additional handling for `iommu=off` or `amd_iommu=off` options. Note that IOMMUs need to be enabled for SNP initialization, therefore, if host SNP support is enabled but late IOMMU initialization fails then that will cause PSP driver's SNP_INIT to fail as IOMMU SNP sanity checks in SNP firmware will fail with invalid configuration error as below: [ 9.723114] ccp 0000:23:00.1: sev enabled [ 9.727602] ccp 0000:23:00.1: psp enabled [ 9.732527] ccp 0000:a2:00.1: enabling device (0000 -> 0002) [ 9.739098] ccp 0000:a2:00.1: no command queues available [ 9.745167] ccp 0000:a2:00.1: psp enabled [ 9.805337] ccp 0000:23:00.1: SEV-SNP: failed to INIT rc -5, error 0x3 [ 9.866426] ccp 0000:23:00.1: SEV API:1.53 build:5 Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature") Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Acked-by: Joerg Roedel <jroedel@suse.de> Message-ID: <138b520fb83964782303b43ade4369cd181fdd9c.1739226950.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>