summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
36 hoursKVM: x86: Free vCPUs before freeing VM stateSean Christopherson
commit 17bcd714426386fda741a4bccd96a2870179344b upstream. Free vCPUs before freeing any VM state, as both SVM and VMX may access VM state when "freeing" a vCPU that is currently "in" L2, i.e. that needs to be kicked out of nested guest mode. Commit 6fcee03df6a1 ("KVM: x86: avoid loading a vCPU after .vm_destroy was called") partially fixed the issue, but for unknown reasons only moved the MMU unloading before VM destruction. Complete the change, and free all vCPU state prior to destroying VM state, as nVMX accesses even more state than nSVM. In addition to the AVIC, KVM can hit a use-after-free on MSR filters: kvm_msr_allowed+0x4c/0xd0 __kvm_set_msr+0x12d/0x1e0 kvm_set_msr+0x19/0x40 load_vmcs12_host_state+0x2d8/0x6e0 [kvm_intel] nested_vmx_vmexit+0x715/0xbd0 [kvm_intel] nested_vmx_free_vcpu+0x33/0x50 [kvm_intel] vmx_free_vcpu+0x54/0xc0 [kvm_intel] kvm_arch_vcpu_destroy+0x28/0xf0 kvm_vcpu_destroy+0x12/0x50 kvm_arch_destroy_vm+0x12c/0x1c0 kvm_put_kvm+0x263/0x3c0 kvm_vm_release+0x21/0x30 and an upcoming fix to process injectable interrupts on nested VM-Exit will access the PIC: BUG: kernel NULL pointer dereference, address: 0000000000000090 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page CPU: 23 UID: 1000 PID: 2658 Comm: kvm-nx-lpage-re RIP: 0010:kvm_cpu_has_extint+0x2f/0x60 [kvm] Call Trace: <TASK> kvm_cpu_has_injectable_intr+0xe/0x60 [kvm] nested_vmx_vmexit+0x2d7/0xdf0 [kvm_intel] nested_vmx_free_vcpu+0x40/0x50 [kvm_intel] vmx_vcpu_free+0x2d/0x80 [kvm_intel] kvm_arch_vcpu_destroy+0x2d/0x130 [kvm] kvm_destroy_vcpus+0x8a/0x100 [kvm] kvm_arch_destroy_vm+0xa7/0x1d0 [kvm] kvm_destroy_vm+0x172/0x300 [kvm] kvm_vcpu_release+0x31/0x50 [kvm] Inarguably, both nSVM and nVMX need to be fixed, but punt on those cleanups for the moment. Conceptually, vCPUs should be freed before VM state. Assets like the I/O APIC and PIC _must_ be allocated before vCPUs are created, so it stands to reason that they must be freed _after_ vCPUs are destroyed. Reported-by: Aaron Lewis <aaronlewis@google.com> Closes: https://lore.kernel.org/all/20240703175618.2304869-2-aaronlewis@google.com Cc: Jim Mattson <jmattson@google.com> Cc: Yan Zhao <yan.y.zhao@intel.com> Cc: Rick P Edgecombe <rick.p.edgecombe@intel.com> Cc: Kai Huang <kai.huang@intel.com> Cc: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250224235542.2562848-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Cheng <chengkev@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursARM: 9448/1: Use an absolute path to unified.h in KBUILD_AFLAGSNathan Chancellor
[ Upstream commit 87c4e1459e80bf65066f864c762ef4dc932fad4b ] After commit d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper flags and language target"), which updated as-instr to use the 'assembler-with-cpp' language option, the Kbuild version of as-instr always fails internally for arch/arm with <command-line>: fatal error: asm/unified.h: No such file or directory compilation terminated. because '-include' flags are now taken into account by the compiler driver and as-instr does not have '$(LINUXINCLUDE)', so unified.h is not found. This went unnoticed at the time of the Kbuild change because the last use of as-instr in Kbuild that arch/arm could reach was removed in 5.7 by commit 541ad0150ca4 ("arm: Remove 32bit KVM host support") but a stable backport of the Kbuild change to before that point exposed this potential issue if one were to be reintroduced. Follow the general pattern of '-include' paths throughout the tree and make unified.h absolute using '$(srctree)' to ensure KBUILD_AFLAGS can be used independently. Closes: https://lore.kernel.org/CACo-S-1qbCX4WAVFA63dWfHtrRHZBTyyr2js8Lx=Az03XHTTHg@mail.gmail.com/ Cc: stable@vger.kernel.org Fixes: d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper flags and language target") Reported-by: KernelCI bot <bot@kernelci.org> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [ No KBUILD_RUSTFLAGS in 6.12 ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursarm64: dts: qcom: x1-crd: Fix vreg_l2j_1p2 voltageStephan Gerhold
[ Upstream commit 5ce920e6a8db40e4b094c0d863cbd19fdcfbbb7a ] In the ACPI DSDT table, PPP_RESOURCE_ID_LDO2_J is configured with 1256000 uV instead of the 1200000 uV we have currently in the device tree. Use the same for consistency and correctness. Cc: stable@vger.kernel.org Fixes: bd50b1f5b6f3 ("arm64: dts: qcom: x1e80100: Add Compute Reference Device") Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250423-x1e-vreg-l2j-voltage-v1-1-24b6a2043025@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> [ Change x1e80100-crd.dts instead ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursx86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap()Roman Kisel
[ Upstream commit 86c48271e0d60c82665e9fd61277002391efcef7 ] To start an application processor in SNP-isolated guest, a hypercall is used that takes a virtual processor index. The hv_snp_boot_ap() function uses that START_VP hypercall but passes as VP index to it what it receives as a wakeup_secondary_cpu_64 callback: the APIC ID. As those two aren't generally interchangeable, that may lead to hung APs if the VP index and the APIC ID don't match up. Update the parameter names to avoid confusion as to what the parameter is. Use the APIC ID to the VP index conversion to provide the correct input to the hypercall. Cc: stable@vger.kernel.org Fixes: 44676bb9d566 ("x86/hyperv: Add smp support for SEV-SNP guest") Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250507182227.7421-2-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250507182227.7421-2-romank@linux.microsoft.com> [ changed HVCALL_GET_VP_INDEX_FROM_APIC_ID to HVCALL_GET_VP_ID_FROM_APIC_ID ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursKVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flushManuel Andreas
[ Upstream commit fa787ac07b3ceb56dd88a62d1866038498e96230 ] In KVM guests with Hyper-V hypercalls enabled, the hypercalls HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST and HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX allow a guest to request invalidation of portions of a virtual TLB. For this, the hypercall parameter includes a list of GVAs that are supposed to be invalidated. However, when non-canonical GVAs are passed, there is currently no filtering in place and they are eventually passed to checked invocations of INVVPID on Intel / INVLPGA on AMD. While AMD's INVLPGA silently ignores non-canonical addresses (effectively a no-op), Intel's INVVPID explicitly signals VM-Fail and ultimately triggers the WARN_ONCE in invvpid_error(): invvpid failed: ext=0x0 vpid=1 gva=0xaaaaaaaaaaaaa000 WARNING: CPU: 6 PID: 326 at arch/x86/kvm/vmx/vmx.c:482 invvpid_error+0x91/0xa0 [kvm_intel] Modules linked in: kvm_intel kvm 9pnet_virtio irqbypass fuse CPU: 6 UID: 0 PID: 326 Comm: kvm-vm Not tainted 6.15.0 #14 PREEMPT(voluntary) RIP: 0010:invvpid_error+0x91/0xa0 [kvm_intel] Call Trace: vmx_flush_tlb_gva+0x320/0x490 [kvm_intel] kvm_hv_vcpu_flush_tlb+0x24f/0x4f0 [kvm] kvm_arch_vcpu_ioctl_run+0x3013/0x5810 [kvm] Hyper-V documents that invalid GVAs (those that are beyond a partition's GVA space) are to be ignored. While not completely clear whether this ruling also applies to non-canonical GVAs, it is likely fine to make that assumption, and manual testing on Azure confirms "real" Hyper-V interprets the specification in the same way. Skip non-canonical GVAs when processing the list of address to avoid tripping the INVVPID failure. Alternatively, KVM could filter out "bad" GVAs before inserting into the FIFO, but practically speaking the only downside of pushing validation to the final processing is that doing so is suboptimal for the guest, and no well-behaved guest will request TLB flushes for non-canonical addresses. Fixes: 260970862c88 ("KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently") Cc: stable@vger.kernel.org Signed-off-by: Manuel Andreas <manuel.andreas@tum.de> Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/c090efb3-ef82-499f-a5e0-360fc8420fb7@tum.de Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursKVM: x86: model canonical checks more preciselyMaxim Levitsky
[ Upstream commit 9245fd6b8531497d129a7a6e3eef258042862f85 ] As a result of a recent investigation, it was determined that x86 CPUs which support 5-level paging, don't always respect CR4.LA57 when doing canonical checks. In particular: 1. MSRs which contain a linear address, allow full 57-bitcanonical address regardless of CR4.LA57 state. For example: MSR_KERNEL_GS_BASE. 2. All hidden segment bases and GDT/IDT bases also behave like MSRs. This means that full 57-bit canonical address can be loaded to them regardless of CR4.LA57, both using MSRS (e.g GS_BASE) and instructions (e.g LGDT). 3. TLB invalidation instructions also allow the user to use full 57-bit address regardless of the CR4.LA57. Finally, it must be noted that the CPU doesn't prevent the user from disabling 5-level paging, even when the full 57-bit canonical address is present in one of the registers mentioned above (e.g GDT base). In fact, this can happen without any userspace help, when the CPU enters SMM mode - some MSRs, for example MSR_KERNEL_GS_BASE are left to contain a non-canonical address in regard to the new mode. Since most of the affected MSRs and all segment bases can be read and written freely by the guest without any KVM intervention, this patch makes the emulator closely follow hardware behavior, which means that the emulator doesn't take in the account the guest CPUID support for 5-level paging, and only takes in the account the host CPU support. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-4-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: fa787ac07b3c ("KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursKVM: x86: Add X86EMUL_F_MSR and X86EMUL_F_DT_LOAD to aid canonical checksMaxim Levitsky
[ Upstream commit c534b37b7584e2abc5d487b4e017f61a61959ca9 ] Add emulation flags for MSR accesses and Descriptor Tables loads, and pass the new flags as appropriate to emul_is_noncanonical_address(). The flags will be used to perform the correct canonical check, as the type of access affects whether or not CR4.LA57 is consulted when determining the canonical bit. No functional change is intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-3-mlevitsk@redhat.com [sean: split to separate patch, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: fa787ac07b3c ("KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursKVM: x86: Route non-canonical checks in emulator through emulate_opsMaxim Levitsky
[ Upstream commit 16ccadefa295af434ca296e566f078223ecd79ca ] Add emulate_ops.is_canonical_addr() to perform (non-)canonical checks in the emulator, which will allow extending is_noncanonical_address() to support different flavors of canonical checks, e.g. for descriptor table bases vs. MSRs, without needing duplicate logic in the emulator. No functional change is intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-3-mlevitsk@redhat.com [sean: separate from additional of flags, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: fa787ac07b3c ("KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursKVM: x86: drop x86.h include from cpuid.hMaxim Levitsky
[ Upstream commit e52ad1ddd0a3b07777141ec9406d5dc2c9a0de17 ] Drop x86.h include from cpuid.h to allow the x86.h to include the cpuid.h instead. Also fix various places where x86.h was implicitly included via cpuid.h Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-2-mlevitsk@redhat.com [sean: fixup a missed include in mtrr.c] Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: fa787ac07b3c ("KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursarm64: dts: qcom: x1e78100-t14s: mark l12b and l15b always-onJohan Hovold
[ Upstream commit 673fa129e558c5f1196adb27d97ac90ddfe4f19c ] The l12b and l15b supplies are used by components that are not (fully) described (and some never will be) and must never be disabled. Mark the regulators as always-on to prevent them from being disabled, for example, when consumers probe defer or suspend. Fixes: 7d1cbe2f4985 ("arm64: dts: qcom: Add X1E78100 ThinkPad T14s Gen 6") Cc: stable@vger.kernel.org # 6.12 Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20250314145440.11371-3-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> [ applied changes to .dts file instead of .dtsi ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hourscrypto: powerpc/poly1305 - add depends on BROKEN for nowEric Biggers
[ Upstream commit bc8169003b41e89fe7052e408cf9fdbecb4017fe ] As discussed in the thread containing https://lore.kernel.org/linux-crypto/20250510053308.GB505731@sol/, the Power10-optimized Poly1305 code is currently not safe to call in softirq context. Disable it for now. It can be re-enabled once it is fixed. Fixes: ba8f8624fde2 ("crypto: poly1305-p10 - Glue code for optmized Poly1305 implementation for ppc64le") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ applied to arch/powerpc/crypto/Kconfig ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursx86/bugs: Fix use of possibly uninit value in amd_check_tsa_microcode()Michael Zhivich
For kernels compiled with CONFIG_INIT_STACK_NONE=y, the value of __reserved field in zen_patch_rev union on the stack may be garbage. If so, it will prevent correct microcode check when consulting p.ucode_rev, resulting in incorrect mitigation selection. This is a stable-only fix. Cc: <stable@vger.kernel.org> Signed-off-by: Michael Zhivich <mzhivich@akamai.com> Fixes: 7a0395f6607a5 ("x86/bugs: Add a Transient Scheduler Attacks mitigation") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursarm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()Ada Couprie Diaz
commit d42e6c20de6192f8e4ab4cf10be8c694ef27e8cb upstream. `cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to change to different stacks along with the Shadow Call Stack if it is enabled. Those two stack changes cannot be done atomically and both functions can be interrupted by SErrors or Debug Exceptions which, though unlikely, is very much broken : if interrupted, we can end up with mismatched stacks and Shadow Call Stack leading to clobbered stacks. In `cpu_switch_to()`, it can happen when SP_EL0 points to the new task, but x18 stills points to the old task's SCS. When the interrupt handler tries to save the task's SCS pointer, it will save the old task SCS pointer (x18) into the new task struct (pointed to by SP_EL0), clobbering it. In `call_on_irq_stack()`, it can happen when switching from the task stack to the IRQ stack and when switching back. In both cases, we can be interrupted when the SCS pointer points to the IRQ SCS, but SP points to the task stack. The nested interrupt handler pushes its return addresses on the IRQ SCS. It then detects that SP points to the task stack, calls `call_on_irq_stack()` and clobbers the task SCS pointer with the IRQ SCS pointer, which it will also use ! This leads to tasks returning to addresses on the wrong SCS, or even on the IRQ SCS, triggering kernel panics via CONFIG_VMAP_STACK or FPAC if enabled. This is possible on a default config, but unlikely. However, when enabling CONFIG_ARM64_PSEUDO_NMI, DAIF is unmasked and instead the GIC is responsible for filtering what interrupts the CPU should receive based on priority. Given the goal of emulating NMIs, pseudo-NMIs can be received by the CPU even in `cpu_switch_to()` and `call_on_irq_stack()`, possibly *very* frequently depending on the system configuration and workload, leading to unpredictable kernel panics. Completely mask DAIF in `cpu_switch_to()` and restore it when returning. Do the same in `call_on_irq_stack()`, but restore and mask around the branch. Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency of behaviour between all configurations. Introduce and use an assembly macro for saving and masking DAIF, as the existing one saves but only masks IF. Cc: <stable@vger.kernel.org> Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Reported-by: Cristian Prundeanu <cpru@amazon.com> Fixes: 59b37fe52f49 ("arm64: Stash shadow stack pointer in the task struct on interrupt") Tested-by: Cristian Prundeanu <cpru@amazon.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250718142814.133329-1-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursARM: 9450/1: Fix allowing linker DCE with binutils < 2.36Nathan Chancellor
commit 53e7e1fb81cc8ba2da1cb31f8917ef397caafe91 upstream. Commit e7607f7d6d81 ("ARM: 9443/1: Require linker to support KEEP within OVERLAY for DCE") accidentally broke the binutils version restriction that was added in commit 0d437918fb64 ("ARM: 9414/1: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION"), reintroducing the segmentation fault addressed by that workaround. Restore the binutils version dependency by using CONFIG_LD_CAN_USE_KEEP_IN_OVERLAY as an additional condition to ensure that CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION is only enabled with binutils >= 2.36 and ld.lld >= 21.0.0. Closes: https://lore.kernel.org/6739da7d-e555-407a-b5cb-e5681da71056@landley.net/ Closes: https://lore.kernel.org/CAFERDQ0zPoya5ZQfpbeuKVZEo_fKsonLf6tJbp32QnSGAtbi+Q@mail.gmail.com/ Cc: stable@vger.kernel.org Fixes: e7607f7d6d81 ("ARM: 9443/1: Require linker to support KEEP within OVERLAY for DCE") Reported-by: Rob Landley <rob@landley.net> Tested-by: Rob Landley <rob@landley.net> Reported-by: Martin Wetterwald <martin@wetterwald.eu> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursx86/hyperv: Fix usage of cpu_online_mask to get valid cpuNuno Das Neves
[ Upstream commit bb169f80ed5a156ec3405e0e49c6b8e9ae264718 ] Accessing cpu_online_mask here is problematic because the cpus read lock is not held in this context. However, cpu_online_mask isn't needed here since the effective affinity mask is guaranteed to be valid in this callback. So, just use cpumask_first() to get the cpu instead of ANDing it with cpus_online_mask unnecessarily. Fixes: e39397d1fd68 ("x86/hyperv: implement an MSI domain for root partition") Reported-by: Michael Kelley <mhklinux@outlook.com> Closes: https://lore.kernel.org/linux-hyperv/SN6PR02MB4157639630F8AD2D8FD8F52FD475A@SN6PR02MB4157.namprd02.prod.outlook.com/ Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1751582677-30930-4-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1751582677-30930-4-git-send-email-nunodasneves@linux.microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
36 hoursx86/traps: Initialize DR7 by writing its architectural reset valueXin Li (Intel)
commit fa7d0f83c5c4223a01598876352473cb3d3bd4d7 upstream. Initialize DR7 by writing its architectural reset value to always set bit 10, which is reserved to '1', when "clearing" DR7 so as not to trigger unanticipated behavior if said bit is ever unreserved, e.g. as a feature enabling flag with inverted polarity. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Sean Christopherson <seanjc@google.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250620231504.2676902-3-xin%40zytor.com [ context adjusted: no KVM_DEBUGREG_AUTO_SWITCH flag test" ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 daysKVM: x86/xen: Fix cleanup logic in emulation of Xen schedop poll hypercallsManuel Andreas
commit 5a53249d149f48b558368c5338b9921b76a12f8c upstream. kvm_xen_schedop_poll does a kmalloc_array() when a VM polls the host for more than one event channel potr (nr_ports > 1). After the kmalloc_array(), the error paths need to go through the "out" label, but the call to kvm_read_guest_virt() does not. Fixes: 92c58965e965 ("KVM: x86/xen: Use kvm_read_guest_virt() instead of open-coding it badly") Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Manuel Andreas <manuel.andreas@tum.de> [Adjusted commit message. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 daysriscv: traps_misaligned: properly sign extend value in misaligned load handlerAndreas Schwab
[ Upstream commit b3510183ab7d63c71a3f5c89043d31686a76a34c ] Add missing cast to signed long. Signed-off-by: Andreas Schwab <schwab@suse.de> Fixes: 956d705dd279 ("riscv: Unaligned load/store handling for M_MODE") Tested-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/mvmikk0goil.fsf@suse.de Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 daysriscv: Enable interrupt during exception handlingNam Cao
[ Upstream commit 969f028bf2c40573ef18061f702ede3ebfe12b42 ] force_sig_fault() takes a spinlock, which is a sleeping lock with CONFIG_PREEMPT_RT=y. However, exception handling calls force_sig_fault() with interrupt disabled, causing a sleeping in atomic context warning. This can be reproduced using userspace programs such as: int main() { asm ("ebreak"); } or int main() { asm ("unimp"); } There is no reason that interrupt must be disabled while handling exceptions from userspace. Enable interrupt while handling user exceptions. This also has the added benefit of avoiding unnecessary delays in interrupt handling. Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry") Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250625085630.3649485-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 daysarm64: dts: imx95: Correct the DMA interrupter number of pcie0_epRichard Zhu
[ Upstream commit 61f1065272ea3721c20c4c0a6877d346b0e237c3 ] Correct the DMA interrupter number of pcie0_ep from 317 to 311. Fixes: 3b1d5deb29ff ("arm64: dts: imx95: add pcie[0,1] and pcie-ep[0,1] support") Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> Reviewed-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 daysarm64: dts: rockchip: Add cd-gpios for sdcard detect on Cool Pi 4BAndy Yan
[ Upstream commit 98570e8cb8b0c0893810f285b4a3b1a3ab81a556 ] cd-gpios is used for sdcard detects for sdmmc. Fixes: 3f5d336d64d6 ("arm64: dts: rockchip: Add support for rk3588s based board Cool Pi 4B") Signed-off-by: Andy Yan <andyshrk@163.com> Link: https://lore.kernel.org/r/20250524064223.5741-2-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 daysarm64: dts: rockchip: Add cd-gpios for sdcard detect on Cool Pi CM5Andy Yan
[ Upstream commit e625e284172d235be5cd906a98c6c91c365bb9b1 ] cd-gpios is used for sdcard detects for sdmmc. Fixes: 791c154c3982 ("arm64: dts: rockchip: Add support for rk3588 based board Cool Pi CM5 EVB") Signed-off-by: Andy Yan <andyshrk@163.com> Link: https://lore.kernel.org/r/20250524064223.5741-1-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
10 dayss390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL againIlya Leoshkevich
commit 6a5abf8cf182f577c7ae6c62f14debc9754ec986 upstream. Commit 7ded842b356d ("s390/bpf: Fix bpf_plt pointer arithmetic") has accidentally removed the critical piece of commit c730fce7c70c ("s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL"), causing intermittent kernel panics in e.g. perf's on_switch() prog to reappear. Restore the fix and add a comment. Fixes: 7ded842b356d ("s390/bpf: Fix bpf_plt pointer arithmetic") Cc: stable@vger.kernel.org Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250716194524.48109-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 daysarm64: dts: rockchip: use cs-gpios for spi1 on ringneckJakob Unterwurzacher
commit 53b6445ad08f07b6f4a84f1434f543196009ed89 upstream. Hardware CS has a very slow rise time of about 6us, causing transmission errors when CS does not reach high between transaction. It looks like it's not driven actively when transitioning from low to high but switched to input, so only the CPU pull-up pulls it high, slowly. Transitions from high to low are fast. On the oscilloscope, CS looks like an irregular sawtooth pattern like this: _____ ^ / | ^ /| / | /| / | / | / | / | / | ___/ |___/ |_____/ |___ With cs-gpios we have a CS rise time of about 20ns, as it should be, and CS looks rectangular. This fixes the data errors when running a flashcp loop against a m25p40 spi flash. With the Rockchip 6.1 kernel we see the same slow rise time, but for some reason CS is always high for long enough to reach a solid high. The RK3399 and RK3588 SoCs use the same SPI driver, so we also checked our "Puma" (RK3399) and "Tiger" (RK3588) boards. They do not have this problem. Hardware CS rise time is good. Fixes: c484cf93f61b ("arm64: dts: rockchip: add PX30-µQ7 (Ringneck) SoM with Haikou baseboard") Cc: stable@vger.kernel.org Reviewed-by: Quentin Schulz <quentin.schulz@cherry.de> Signed-off-by: Jakob Unterwurzacher <jakob.unterwurzacher@cherry.de> Link: https://lore.kernel.org/r/20250627131715.1074308-1-jakob.unterwurzacher@cherry.de Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 daysarm64: dts: imx8mp-venice-gw73xx: fix TPM SPI frequencyTim Harvey
commit 1fc02c2086003c5fdaa99cde49a987992ff1aae4 upstream. The IMX8MPDS Table 37 [1] shows that the max SPI master read frequency depends on the pins the interface is muxed behind with ECSPI2 muxed behind ECSPI2 supporting up to 25MHz. Adjust the spi-max-frequency based on these findings. [1] https://www.nxp.com/webapp/Download?colCode=IMX8MPIEC Fixes: 2b3ab9d81ab4 ("arm64: dts: imx8mp-venice-gw73xx: add TPM device") Cc: stable@vger.kernel.org Signed-off-by: Tim Harvey <tharvey@gateworks.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 daysarm64: dts: imx8mp-venice-gw72xx: fix TPM SPI frequencyTim Harvey
commit b25344753c53a5524ba80280ce68f2046e559ce0 upstream. The IMX8MPDS Table 37 [1] shows that the max SPI master read frequency depends on the pins the interface is muxed behind with ECSPI2 muxed behind ECSPI2 supporting up to 25MHz. Adjust the spi-max-frequency based on these findings. [1] https://www.nxp.com/webapp/Download?colCode=IMX8MPIEC Fixes: 5016f22028e4 ("arm64: dts: imx8mp-venice-gw72xx: add TPM device") Cc: stable@vger.kernel.org Signed-off-by: Tim Harvey <tharvey@gateworks.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 daysarm64: dts: imx8mp-venice-gw71xx: fix TPM SPI frequencyTim Harvey
commit 528e2d3125ad8d783e922033a0a8e2adb17b400e upstream. The IMX8MPDS Table 37 [1] shows that the max SPI master read frequency depends on the pins the interface is muxed behind with ECSPI2 muxed behind ECSPI2 supporting up to 25MHz. Adjust the spi-max-frequency based on these findings. [1] https://www.nxp.com/webapp/Download?colCode=IMX8MPIEC Fixes: 1a8f6ff6a291 ("arm64: dts: imx8mp-venice-gw71xx: add TPM device") Cc: stable@vger.kernel.org Signed-off-by: Tim Harvey <tharvey@gateworks.com> Link: https://lore.kernel.org/stable/20250523173723.4167474-1-tharvey%40gateworks.com Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 daysarm64: dts: freescale: imx8mm-verdin: Keep LDO5 always onFrancesco Dolcini
commit fbe94be09fa81343d623a86ec64a742759b669b3 upstream. LDO5 regulator is used to power the i.MX8MM NVCC_SD2 I/O supply, that is used for the SD2 card interface and also for some GPIOs. When the SD card interface is not enabled the regulator subsystem could turn off this supply, since it is not used anywhere else, however this will also remove the power to some other GPIOs, for example one I/O that is used to power the ethernet phy, leading to a non working ethernet interface. [ 31.820515] On-module +V3.3_1.8_SD (LDO5): disabling [ 31.821761] PMIC_USDHC_VSELECT: disabling [ 32.764949] fec 30be0000.ethernet end0: Link is Down Fix this keeping the LDO5 supply always on. Cc: stable@vger.kernel.org Fixes: 6a57f224f734 ("arm64: dts: freescale: add initial support for verdin imx8m mini") Fixes: f5aab0438ef1 ("regulator: pca9450: Fix enable register for LDO5") Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 daysarm64: dts: add big-endian property back into watchdog nodeMeng Li
commit 720fd1cbc0a0f3acdb26aedb3092ab10fe05e7ae upstream. Watchdog doesn't work on NXP ls1046ardb board because in commit 7c8ffc5555cb("arm64: dts: layerscape: remove big-endian for mmc nodes"), it intended to remove the big-endian from mmc node, but the big-endian of watchdog node is also removed by accident. So, add watchdog big-endian property back. In addition, add compatible string fsl,ls1046a-wdt, which allow big-endian property. Fixes: 7c8ffc5555cb ("arm64: dts: layerscape: remove big-endian for mmc nodes") Cc: stable@vger.kernel.org Signed-off-by: Meng Li <Meng.Li@windriver.com> Reviewed-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 daysarm64: dts: imx8mp-venice-gw74xx: fix TPM SPI frequencyTim Harvey
commit 0bdaca0922175478ddeadf8e515faa5269f6fae6 upstream. The IMX8MPDS Table 37 [1] shows that the max SPI master read frequency depends on the pins the interface is muxed behind with ECSPI2 muxed behind ECSPI2 supporting up to 25MHz. Adjust the spi-max-frequency based on these findings. [1] https://www.nxp.com/webapp/Download?colCode=IMX8MPIEC Fixes: 531936b218d8 ("arm64: dts: imx8mp-venice-gw74xx: update to revB PCB") Cc: stable@vger.kernel.org Signed-off-by: Tim Harvey <tharvey@gateworks.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17KVM: SVM: Set synthesized TSA CPUID flagsBorislav Petkov (AMD)
VERW_CLEAR is supposed to be set only by the hypervisor to denote TSA mitigation support to a guest. SQ_NO and L1_NO are both synthesizable, and are going to be set by hw CPUID on future machines. So keep the kvm_cpu_cap_init_kvm_defined() invocation *and* set them when synthesized. This fix is stable-only. Co-developed-by: Jinpu Wang <jinpu.wang@ionos.com> Signed-off-by: Jinpu Wang <jinpu.wang@ionos.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17arm64: Filter out SME hwcaps when FEAT_SME isn't implementedMark Brown
commit a75ad2fc76a2ab70817c7eed3163b66ea84ca6ac upstream. We have a number of hwcaps for various SME subfeatures enumerated via ID_AA64SMFR0_EL1. Currently we advertise these without cross checking against the main SME feature, advertised in ID_AA64PFR1_EL1.SME which means that if the two are out of sync userspace can see a confusing situation where SME subfeatures are advertised without the base SME hwcap. This can be readily triggered by using the arm64.nosme override which only masks out ID_AA64PFR1_EL1.SME, and there have also been reports of VMMs which do the same thing. Fix this as we did previously for SVE in 064737920bdb ("arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented") by filtering out the SME subfeature hwcaps when FEAT_SME is not present. Fixes: 5e64b862c482 ("arm64/sme: Basic enumeration support") Reported-by: Yury Khrustalev <yury.khrustalev@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250620-arm64-sme-filter-hwcaps-v1-1-02b9d3c2d8ef@kernel.org Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17riscv: vdso: Exclude .rodata from the PT_DYNAMIC segmentFangrui Song
[ Upstream commit e0eb1b6b0cd29ca7793c501d5960fd36ba11f110 ] .rodata is implicitly included in the PT_DYNAMIC segment due to inheriting the segment of the preceding .dynamic section (in both GNU ld and LLD). When the .rodata section's size is not a multiple of 16 bytes on riscv64, llvm-readelf will report a "PT_DYNAMIC dynamic table is invalid" warning. Note: in the presence of the .dynamic section, GNU readelf and llvm-readelf's -d option decodes the dynamic section using the section. This issue arose after commit 8f8c1ff879fab60f80f3a7aec3000f47e5b03ba9 ("riscv: vdso.lds.S: remove hardcoded 0x800 .text start addr"), which placed .rodata directly after .dynamic by removing .eh_frame. This patch resolves the implicit inclusion into PT_DYNAMIC by explicitly specifying the :text output section phdr. Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://github.com/ClangBuiltLinux/linux/issues/2093 Signed-off-by: Fangrui Song <i@maskray.me> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250602-riscv-vdso-v1-1-0620cf63cff0@maskray.me Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17um: vector: Reduce stack usage in vector_eth_configure()Tiwei Bie
[ Upstream commit 2d65fc13be85c336c56af7077f08ccd3a3a15a4a ] When compiling with clang (19.1.7), initializing *vp using a compound literal may result in excessive stack usage. Fix it by initializing the required fields of *vp individually. Without this patch: $ objdump -d arch/um/drivers/vector_kern.o | ./scripts/checkstack.pl x86_64 0 ... 0x0000000000000540 vector_eth_configure [vector_kern.o]:1472 ... With this patch: $ objdump -d arch/um/drivers/vector_kern.o | ./scripts/checkstack.pl x86_64 0 ... 0x0000000000000540 vector_eth_configure [vector_kern.o]:208 ... Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506221017.WtB7Usua-lkp@intel.com/ Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250623110829.314864-1-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17x86/mm: Disable hugetlb page table sharing on 32-bitJann Horn
commit 76303ee8d54bff6d9a6d55997acd88a6c2ba63cf upstream. Only select ARCH_WANT_HUGE_PMD_SHARE on 64-bit x86. Page table sharing requires at least three levels because it involves shared references to PMD tables; 32-bit x86 has either two-level paging (without PAE) or three-level paging (with PAE), but even with three-level paging, having a dedicated PGD entry for hugetlb is only barely possible (because the PGD only has four entries), and it seems unlikely anyone's actually using PMD sharing on 32-bit. Having ARCH_WANT_HUGE_PMD_SHARE enabled on non-PAE 32-bit X86 (which has 2-level paging) became particularly problematic after commit 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count"), since that changes `struct ptdesc` such that the `pt_mm` (for PGDs) and the `pt_share_count` (for PMDs) share the same union storage - and with 2-level paging, PMDs are PGDs. (For comparison, arm64 also gates ARCH_WANT_HUGE_PMD_SHARE on the configuration of page tables such that it is never enabled with 2-level paging.) Closes: https://lore.kernel.org/r/srhpjxlqfna67blvma5frmy3aa@altlinux.org Fixes: cfe28c5d63d8 ("x86: mm: Remove x86 version of huge_pmd_share.") Reported-by: Vitaly Chikunov <vt@altlinux.org> Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Vitaly Chikunov <vt@altlinux.org> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250702-x86-2level-hugetlb-v2-1-1a98096edf92%40google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17x86/rdrand: Disable RDSEED on AMD Cyan SkillfishMikhail Paulyshka
commit 5b937a1ed64ebeba8876e398110a5790ad77407c upstream. AMD Cyan Skillfish (Family 17h, Model 47h, Stepping 0h) has an error that causes RDSEED to always return 0xffffffff, while RDRAND works correctly. Mask the RDSEED cap for this CPU so that both /proc/cpuinfo and direct CPUID read report RDSEED as unavailable. [ bp: Move to amd.c, massage. ] Signed-off-by: Mikhail Paulyshka <me@mixaill.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/20250524145319.209075-1-me@mixaill.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flightSean Christopherson
commit ecf371f8b02d5e31b9aa1da7f159f1b2107bdb01 upstream. Reject migration of SEV{-ES} state if either the source or destination VM is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the section between incrementing created_vcpus and online_vcpus. The bulk of vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs in parallel, and so sev_info.es_active can get toggled from false=>true in the destination VM after (or during) svm_vcpu_create(), resulting in an SEV{-ES} VM effectively having a non-SEV{-ES} vCPU. The issue manifests most visibly as a crash when trying to free a vCPU's NULL VMSA page in an SEV-ES VM, but any number of things can go wrong. BUG: unable to handle page fault for address: ffffebde00000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP KASAN NOPTI CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G U O 6.15.0-smp-DEV #2 NONE Tainted: [U]=USER, [O]=OOT_MODULE Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024 RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline] RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline] RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline] RIP: 0010:PageHead include/linux/page-flags.h:866 [inline] RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067 Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0 RSP: 0018:ffff8984551978d0 EFLAGS: 00010246 RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000 RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000 R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000 R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000 FS: 0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <TASK> sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169 svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515 kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396 kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline] kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490 kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895 kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310 kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369 __fput+0x3e4/0x9e0 fs/file_table.c:465 task_work_run+0x1a9/0x220 kernel/task_work.c:227 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0x7f0/0x25b0 kernel/exit.c:953 do_group_exit+0x203/0x2d0 kernel/exit.c:1102 get_signal+0x1357/0x1480 kernel/signal.c:3034 arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337 exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218 do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f87a898e969 </TASK> Modules linked in: gq(O) gsmi: Log Shutdown Reason 0x03 CR2: ffffebde00000000 ---[ end trace 0000000000000000 ]--- Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing the host is likely desirable due to the VMSA being consumed by hardware. E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a bogus VMSA page. Accessing PFN 0 is "fine"-ish now that it's sequestered away thanks to L1TF, but panicking in this scenario is preferable to potentially running with corrupted state. Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Fixes: 0b020f5af092 ("KVM: SEV: Add support for SEV-ES intra host migration") Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration") Cc: stable@vger.kernel.org Cc: James Houghton <jthoughton@google.com> Cc: Peter Gonda <pgonda@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Tested-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: James Houghton <jthoughton@google.com> Link: https://lore.kernel.org/r/20250602224459.41505-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table.David Woodhouse
commit a7f4dff21fd744d08fa956c243d2b1795f23cbf7 upstream. To avoid imposing an ordering constraint on userspace, allow 'invalid' event channel targets to be configured in the IRQ routing table. This is the same as accepting interrupts targeted at vCPUs which don't exist yet, which is already the case for both Xen event channels *and* for MSIs (which don't do any filtering of permitted APIC ID targets at all). If userspace actually *triggers* an IRQ with an invalid target, that will fail cleanly, as kvm_xen_set_evtchn_fast() also does the same range check. If KVM enforced that the IRQ target must be valid at the time it is *configured*, that would force userspace to create all vCPUs and do various other parts of setup (in this case, setting the Xen long_mode) before restoring the IRQ table. Cc: stable@vger.kernel.org Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/e489252745ac4b53f1f7f50570b03fb416aa2065.camel@infradead.org [sean: massage comment] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17x86/mce: Make sure CMCI banks are cleared during shutdown on IntelJP Kobryn
commit 30ad231a5029bfa16e46ce868497b1a5cdd3c24d upstream. CMCI banks are not cleared during shutdown on Intel CPUs. As a side effect, when a kexec is performed, CPUs coming back online are unable to rediscover/claim these occupied banks which breaks MCE reporting. Clear the CPU ownership during shutdown via cmci_clear() so the banks can be reclaimed and MCE reporting will become functional once more. [ bp: Massage commit message. ] Reported-by: Aijay Adams <aijay@meta.com> Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/20250627174935.95194-1-inwardvessel@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17x86/mce: Ensure user polling settings are honored when restarting timerYazen Ghannam
commit 00c092de6f28ebd32208aef83b02d61af2229b60 upstream. Users can disable MCA polling by setting the "ignore_ce" parameter or by setting "check_interval=0". This tells the kernel to *not* start the MCE timer on a CPU. If the user did not disable CMCI, then storms can occur. When these happen, the MCE timer will be started with a fixed interval. After the storm subsides, the timer's next interval is set to check_interval. This disregards the user's input through "ignore_ce" and "check_interval". Furthermore, if "check_interval=0", then the new timer will run faster than expected. Create a new helper to check these conditions and use it when a CMCI storm ends. [ bp: Massage. ] Fixes: 7eae17c4add5 ("x86/mce: Add per-bank CMCI storm mitigation") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-2-236dd74f645f@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17x86/mce: Don't remove sysfs if thresholding sysfs init failsYazen Ghannam
commit 4c113a5b28bfd589e2010b5fc8867578b0135ed7 upstream. Currently, the MCE subsystem sysfs interface will be removed if the thresholding sysfs interface fails to be created. A common failure is due to new MCA bank types that are not recognized and don't have a short name set. The MCA thresholding feature is optional and should not break the common MCE sysfs interface. Also, new MCA bank types are occasionally introduced, and updates will be needed to recognize them. But likewise, this should not break the common sysfs interface. Keep the MCE sysfs interface regardless of the status of the thresholding sysfs interface. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-1-236dd74f645f@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17x86/mce/amd: Fix threshold limit resetYazen Ghannam
commit 5f6e3b720694ad771911f637a51930f511427ce1 upstream. The MCA threshold limit must be reset after servicing the interrupt. Currently, the restart function doesn't have an explicit check for this. It makes some assumptions based on the current limit and what's in the registers. These assumptions don't always hold, so the limit won't be reset in some cases. Make the reset condition explicit. Either an interrupt/overflow has occurred or the bank is being initialized. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-4-236dd74f645f@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17x86/mce/amd: Add default names for MCA banks and blocksYazen Ghannam
commit d66e1e90b16055d2f0ee76e5384e3f119c3c2773 upstream. Ensure that sysfs init doesn't fail for new/unrecognized bank types or if a bank has additional blocks available. Most MCA banks have a single thresholding block, so the block takes the same name as the bank. Unified Memory Controllers (UMCs) are a special case where there are two blocks and each has a unique name. However, the microarchitecture allows for five blocks. Any new MCA bank types with more than one block will be missing names for the extra blocks. The MCE sysfs will fail to initialize in this case. Fixes: 87a6d4091bd7 ("x86/mce/AMD: Update sysfs bank names for SMCA systems") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-3-236dd74f645f@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17arm64: poe: Handle spurious Overlay faultsKevin Brodsky
[ Upstream commit 22f3a4f6085951eff28bd1e44d3f388c1d9a5f44 ] We do not currently issue an ISB after updating POR_EL0 when context-switching it, for instance. The rationale is that if the old value of POR_EL0 is more restrictive and causes a fault during uaccess, the access will be retried [1]. In other words, we are trading an ISB on every context-switching for the (unlikely) possibility of a spurious fault. We may also miss faults if the new value of POR_EL0 is more restrictive, but that's considered acceptable. However, as things stand, a spurious Overlay fault results in uaccess failing right away since it causes fault_from_pkey() to return true. If an Overlay fault is reported, we therefore need to double check POR_EL0 against vma_pkey(vma) - this is what arch_vma_access_permitted() already does. As it turns out, we already perform that explicit check if no Overlay fault is reported, and we need to keep that check (see comment added in fault_from_pkey()). Net result: the Overlay ISS2 bit isn't of much help to decide whether a pkey fault occurred. Remove the check for the Overlay bit from fault_from_pkey() and add a comment to try and explain the situation. While at it, also add a comment to permission_overlay_switch() in case anyone gets surprised by the lack of ISB. [1] https://lore.kernel.org/linux-arm-kernel/ZtYNGBrcE-j35fpw@arm.com/ Fixes: 160a8e13de6c ("arm64: context switch POR_EL0 register") Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Link: https://lore.kernel.org/r/20250619160042.2499290-2-kevin.brodsky@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17crypto: s390/sha - Fix uninitialized variable in SHA-1 and SHA-2Eric Biggers
commit 68279380266a5fa70e664de754503338e2ec3f43 upstream. Commit 88c02b3f79a6 ("s390/sha3: Support sha3 performance enhancements") added the field s390_sha_ctx::first_message_part and made it be used by s390_sha_update() (now s390_sha_update_blocks()). At the time, s390_sha_update() was used by all the s390 SHA-1, SHA-2, and SHA-3 algorithms. However, only the initialization functions for SHA-3 were updated, leaving SHA-1 and SHA-2 using first_message_part uninitialized. This could cause e.g. the function code CPACF_KIMD_SHA_512 | CPACF_KIMD_NIP to be used instead of just CPACF_KIMD_SHA_512. This apparently was harmless, as the SHA-1 and SHA-2 function codes ignore CPACF_KIMD_NIP; it is recognized only by the SHA-3 function codes (https://lore.kernel.org/r/73477fe9-a1dc-4e38-98a6-eba9921e8afa@linux.ibm.com/). Therefore, this bug was found only when first_message_part was later converted to a boolean and UBSAN detected its uninitialized use. Regardless, let's fix this by just initializing to zero. Note: in 6.16, we need to patch SHA-1, SHA-384, and SHA-512. In 6.15 and earlier, we'll also need to patch SHA-224 and SHA-256, as they hadn't yet been librarified (which incidentally fixed this bug). Fixes: 88c02b3f79a6 ("s390/sha3: Support sha3 performance enhancements") Cc: stable@vger.kernel.org Reported-by: Ingo Franzki <ifranzki@linux.ibm.com> Closes: https://lore.kernel.org/r/12740696-595c-4604-873e-aefe8b405fbf@linux.ibm.com Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20250703172316.7914-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-14x86/CPU/AMD: Properly check the TSA microcodeBorislav Petkov (AMD)
In order to simplify backports, I resorted to an older version of the microcode revision checking which didn't pull in the whole struct x86_cpu_id matching machinery. My simpler method, however, forgot to add the extended CPU model to the patch revision, which lead to mismatches when determining whether TSA mitigation support is present. So add that forgotten extended model. This is a stable-only fix and the preference is to do it this way because it is a lot simpler. Also, the Fixes: tag below points to the respective stable patch. Fixes: 7a0395f6607a ("x86/bugs: Add a Transient Scheduler Attacks mitigation") Reported-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Thomas Voegtle <tv@lio96.de> Message-ID: <04ea0a8e-edb0-c59e-ce21-5f3d5d167af3@lio96.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-10x86/process: Move the buffer clearing before MONITORBorislav Petkov (AMD)
Commit 8e786a85c0a3c0fffae6244733fb576eeabd9dec upstream. Move the VERW clearing before the MONITOR so that VERW doesn't disarm it and the machine never enters C1. Original idea by Kim Phillips <kim.phillips@amd.com>. Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-10x86/microcode/AMD: Add TSA microcode SHAsBorislav Petkov (AMD)
Commit 2329f250e04d3b8e78b36a68b9880ca7750a07ef upstream. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-10KVM: SVM: Advertise TSA CPUID bits to guestsBorislav Petkov (AMD)
Commit 31272abd5974b38ba312e9cf2ec2f09f9dd7dcba upstream. Synthesize the TSA CPUID feature bits for guests. Set TSA_{SQ,L1}_NO on unaffected machines. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-10x86/bugs: Add a Transient Scheduler Attacks mitigationBorislav Petkov (AMD)
Commit d8010d4ba43e9f790925375a7de100604a5e2dba upstream. Add the required features detection glue to bugs.c et all in order to support the TSA mitigation. Co-developed-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>