summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-06-28KVM: nVMX: Request immediate exit iff pending nested event needs injectionSean Christopherson
When requesting an immediate exit from L2 in order to inject a pending event, do so only if the pending event actually requires manual injection, i.e. if and only if KVM actually needs to regain control in order to deliver the event. Avoiding the "immediate exit" isn't simply an optimization, it's necessary to make forward progress, as the "already expired" VMX preemption timer trick that KVM uses to force a VM-Exit has higher priority than events that aren't directly injected. At present time, this is a glorified nop as all events processed by vmx_has_nested_events() require injection, but that will not hold true in the future, e.g. if there's a pending virtual interrupt in vmcs02.RVI. I.e. if KVM is trying to deliver a virtual interrupt to L2, the expired VMX preemption timer will trigger VM-Exit before the virtual interrupt is delivered, and KVM will effectively hang the vCPU in an endless loop of forced immediate VM-Exits (because the pending virtual interrupt never goes away). Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240607172609.3205077-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-28KVM: nVMX: Add a helper to get highest pending from Posted Interrupt vectorSean Christopherson
Add a helper to retrieve the highest pending vector given a Posted Interrupt descriptor. While the actual operation is straightforward, it's surprisingly easy to mess up, e.g. if one tries to reuse lapic.c's find_highest_vector(), which doesn't work with PID.PIR due to the APIC's IRR and ISR component registers being physically discontiguous (they're 4-byte registers aligned at 16-byte intervals). To make PIR handling more consistent with respect to IRR and ISR handling, return -1 to indicate "no interrupt pending". Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240607172609.3205077-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-28KVM: VMX: Switch __vmx_exit() and kvm_x86_vendor_exit() in vmx_exit()Kai Huang
In the vmx_init() error handling path, the __vmx_exit() is done before kvm_x86_vendor_exit(). They should follow the same order in vmx_exit(). But currently __vmx_exit() is done after kvm_x86_vendor_exit() in vmx_exit(). Switch the order of them to fix. Fixes: e32b120071ea ("KVM: VMX: Do _all_ initialization before exposing /dev/kvm to userspace") Signed-off-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20240627010524.3732488-1-kai.huang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-28KVM: VMX: Remove unnecessary INVEPT[GLOBAL] from hardware enable pathSean Christopherson
Remove the completely pointess global INVEPT, i.e. EPT TLB flush, from KVM's VMX enablement path. KVM always does a targeted TLB flush when using a "new" EPT root, in quotes because "new" simply means a root that isn't currently being used by the vCPU. KVM also _deliberately_ runs with stale TLB entries for defunct roots, i.e. doesn't do a TLB flush when vCPUs stop using roots, precisely because KVM does the flush on first use. As called out by the comment in kvm_mmu_load(), the reason KVM flushes on first use is because KVM can't guarantee the correctness of past hypervisors. Jumping back to the global INVEPT, when the painfully terse commit 1439442c7b25 ("KVM: VMX: Enable EPT feature for KVM") was added, the effective TLB flush being performed was: static void vmx_flush_tlb(struct kvm_vcpu *vcpu) { vpid_sync_vcpu_all(to_vmx(vcpu)); } I.e. KVM was not flushing EPT TLB entries when allocating a "new" root, which very strongly suggests that the global INVEPT during hardware enabling was a misguided hack that addressed the most obvious symptom, but failed to fix the underlying bug. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20240608001003.3296640-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-28KVM: nVMX: Update VMCS12_REVISION comment to state it should never changeSean Christopherson
Rewrite the comment above VMCS12_REVISION to unequivocally state that the ID must never change. KVM_{G,S}ET_NESTED_STATE have been officially supported for some time now, i.e. changing VMCS12_REVISION would break userspace. Opportunistically add a blurb to the CHECK_OFFSET() comment to make it explicitly clear that new fields are allowed, i.e. that the restriction on the layout is all about backwards compatibility. No functional change intended. Cc: Jim Mattson <jmattson@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20240613190103.1054877-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-20Merge branch 'kvm-6.10-fixes' into HEADPaolo Bonzini
2024-06-20KVM: x86/tdp_mmu: Sprinkle __must_checkIsaku Yamahata
The TDP MMU function __tdp_mmu_set_spte_atomic uses a cmpxchg64 to replace the SPTE value and returns -EBUSY on failure. The caller must check the return value and retry. Add __must_check to it, as well as to two more functions that forward the return value of __tdp_mmu_set_spte_atomic to their caller. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Message-Id: <8f7d5a1b241bf5351eaab828d1a1efe5c17699ca.1705965635.git.isaku.yamahata@intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20KVM: interrupt kvm_gmem_populate() on signalsPaolo Bonzini
kvm_gmem_populate() is a potentially lengthy operation that can involve multiple calls to the firmware. Interrupt it if a signal arrives. Fixes: 1f6c06b177513 ("KVM: guest_memfd: Add interface for populating gmem pages with user data") Cc: Isaku Yamahata <isaku.yamahata@intel.com> Cc: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20KVM: Discard zero mask with function kvm_dirty_ring_resetBibo Mao
Function kvm_reset_dirty_gfn may be called with parameters cur_slot / cur_offset / mask are all zero, it does not represent real dirty page. It is not necessary to clear dirty page in this condition. Also return value of macro __fls() is undefined if mask is zero which is called in funciton kvm_reset_dirty_gfn(). Here just return. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Message-ID: <20240613122803.1031511-1-maobibo@loongson.cn> [Move the conditional inside kvm_reset_dirty_gfn; suggested by Sean Christopherson. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20virt: guest_memfd: fix reference leak on hwpoisoned pagePaolo Bonzini
If kvm_gmem_get_pfn() detects an hwpoisoned page, it returns -EHWPOISON but it does not put back the reference that kvm_gmem_get_folio() had grabbed. Add the forgotten folio_put(). Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory") Cc: stable@vger.kernel.org Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20kvm: do not account temporary allocations to kmemAlexey Dobriyan
Some allocations done by KVM are temporary, they are created as result of program actions, but can't exists for arbitrary long times. They should have been GFP_TEMPORARY (rip!). OTOH, kvm-nx-lpage-recovery and kvm-pit kernel threads exist for as long as VM exists but their task_struct memory is not accounted. This is story for another day. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Message-ID: <c0122f66-f428-417e-a360-b25fc0f154a0@p183> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20MAINTAINERS: Drop Wanpeng Li as a Reviewer for KVM Paravirt supportSean Christopherson
Drop Wanpeng as a KVM PARAVIRT reviewer as his @tencent.com email is bouncing, and according to lore[*], the last activity from his @gmail.com address was almost two years ago. [*] https://lore.kernel.org/all/CANRm+Cwj29M9HU3=JRUOaKDR+iDKgr0eNMWQi0iLkR5THON-bg@mail.gmail.com Cc: Wanpeng Li <kernellwp@gmail.com> Cc: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240610163427.3359426-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-20KVM: x86: Always sync PIR to IRR prior to scanning I/O APIC routesSean Christopherson
Sync pending posted interrupts to the IRR prior to re-scanning I/O APIC routes, irrespective of whether the I/O APIC is emulated by userspace or by KVM. If a level-triggered interrupt routed through the I/O APIC is pending or in-service for a vCPU, KVM needs to intercept EOIs on said vCPU even if the vCPU isn't the destination for the new routing, e.g. if servicing an interrupt using the old routing races with I/O APIC reconfiguration. Commit fceb3a36c29a ("KVM: x86: ioapic: Fix level-triggered EOI and userspace I/OAPIC reconfigure race") fixed the common cases, but kvm_apic_pending_eoi() only checks if an interrupt is in the local APIC's IRR or ISR, i.e. misses the uncommon case where an interrupt is pending in the PIR. Failure to intercept EOI can manifest as guest hangs with Windows 11 if the guest uses the RTC as its timekeeping source, e.g. if the VMM doesn't expose a more modern form of time to the guest. Cc: stable@vger.kernel.org Cc: Adamos Ttofari <attofari@amazon.de> Cc: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240611014845.82795-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-05KVM: SNP: Fix LBR Virtualization for SNP guestRavi Bangoria
SEV-ES and thus SNP guest mandates LBR Virtualization to be _always_ ON. Although commit b7e4be0a224f ("KVM: SEV-ES: Delegate LBR virtualization to the processor") did the correct change for SEV-ES guests, it missed the SNP. Fix it. Reported-by: Srikanth Aithal <sraithal@amd.com> Fixes: b7e4be0a224f ("KVM: SEV-ES: Delegate LBR virtualization to the processor") Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Message-ID: <20240605114810.1304-1-ravi.bangoria@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-05KVM: x86/mmu: Don't save mmu_invalidate_seq after checking private attrTao Su
Drop the second snapshot of mmu_invalidate_seq in kvm_faultin_pfn(). Before checking the mismatch of private vs. shared, mmu_invalidate_seq is saved to fault->mmu_seq, which can be used to detect an invalidation related to the gfn occurred, i.e. KVM will not install a mapping in page table if fault->mmu_seq != mmu_invalidate_seq. Currently there is a second snapshot of mmu_invalidate_seq, which may not be same as the first snapshot in kvm_faultin_pfn(), i.e. the gfn attribute may be changed between the two snapshots, but the gfn may be mapped in page table without hindrance. Therefore, drop the second snapshot as it has no obvious benefits. Fixes: f6adeae81f35 ("KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()") Signed-off-by: Tao Su <tao1.su@linux.intel.com> Message-ID: <20240528102234.2162763-1-tao1.su@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-05Merge tag 'kvmarm-fixes-6.10-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.10, take #1 - Large set of FP/SVE fixes for pKVM, addressing the fallout from the per-CPU data rework and making sure that the host is not involved in the FP/SVE switching any more - Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH is copletely supported - Fix for the respective priorities of Failed PAC, Illegal Execution state and Instruction Abort exceptions - Fix the handling of AArch32 instruction traps failing their condition code, which was broken by the introduction of ESR_EL2.ISS2 - Allow vpcus running in AArch32 state to be restored in System mode - Fix AArch32 GPR restore that would lose the 64 bit state under some conditions
2024-06-04KVM: arm64: Ensure that SME controls are disabled in protected modeFuad Tabba
KVM (and pKVM) do not support SME guests. Therefore KVM ensures that the host's SME state is flushed and that SME controls for enabling access to ZA storage and for streaming are disabled. pKVM needs to protect against a buggy/malicious host. Ensure that it wouldn't run a guest when protected mode is enabled should any of the SME controls be enabled. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-10-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx formatFuad Tabba
When setting/clearing CPACR bits for EL0 and EL1, use the ELx format of the bits, which covers both. This makes the code clearer, and reduces the chances of accidentally missing a bit. No functional change intended. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-9-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVMFuad Tabba
Now that we have introduced finalize_init_hyp_mode(), lets consolidate the initializing of the host_data fpsimd_state and sve state. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240603122852.3923848-8-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Eagerly restore host fpsimd/sve state in pKVMFuad Tabba
When running in protected mode we don't want to leak protected guest state to the host, including whether a guest has used fpsimd/sve. Therefore, eagerly restore the host state on guest exit when running in protected mode, which happens only if the guest has used fpsimd/sve. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVMFuad Tabba
Protected mode needs to maintain (save/restore) the host's sve state, rather than relying on the host kernel to do that. This is to avoid leaking information to the host about guests and the type of operations they are performing. As a first step towards that, allocate memory mapped at hyp, per cpu, for the host sve state. The following patch will use this memory to save/restore the host state. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Specialize handling of host fpsimd state on trapFuad Tabba
In subsequent patches, n/vhe will diverge on saving the host fpsimd/sve state when taking a guest fpsimd/sve trap. Add a specialized helper to handle it. No functional change intended. Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helperFuad Tabba
The same traps controlled by CPTR_EL2 or CPACR_EL1 need to be toggled in different parts of the code, but the exact bits and their polarity differ between these two formats and the mode (vhe/nvhe/hvhe). To reduce the amount of duplicated code and the chance of getting the wrong bit/polarity or missing a field, abstract the set/clear of CPTR_EL2 bits behind a helper. Since (h)VHE is the way of the future, use the CPACR_EL1 format, which is a subset of the VHE CPTR_EL2, as a reference. No functional change intended. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_stateFuad Tabba
Since the prototypes for __sve_save_state/__sve_restore_state at hyp were added, the underlying macro has acquired a third parameter for saving/restoring ffr. Fix the prototypes to account for the third parameter, and restore the ffr for the guest since it is saved. Suggested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240603122852.3923848-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Reintroduce __sve_save_stateFuad Tabba
Now that the hypervisor is handling the host sve state in protected mode, it needs to be able to save it. This reverts commit e66425fc9ba3 ("KVM: arm64: Remove unused __sve_save_state"). Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-03Merge branch 'kvm-6.11-sev-snp' into HEADPaolo Bonzini
Pull base x86 KVM support for running SEV-SNP guests from Michael Roth: * add some basic infrastructure and introduces a new KVM_X86_SNP_VM vm_type to handle differences versus the existing KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types. * implement the KVM API to handle the creation of a cryptographic launch context, encrypt/measure the initial image into guest memory, and finalize it before launching it. * implement handling for various guest-generated events such as page state changes, onlining of additional vCPUs, etc. * implement the gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges as well as cleaning them up prior to returning them to the host for use as normal memory. Because those cleanup hooks supplant certain activities like issuing WBINVDs during KVM MMU invalidations, avoid duplicating that work to avoid unecessary overhead. This merge leaves out support support for attestation guest requests and for loading the signing keys to be used for attestation requests.
2024-06-03Merge tag 'kvm-riscv-fixes-6.10-1' of https://github.com/kvm-riscv/linux ↵Paolo Bonzini
into HEAD KVM/riscv fixes for 6.10, take #1 - No need to use mask when hart-index-bits is 0 - Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext()
2024-06-03Merge branch 'kvm-fixes-6.10-1' into HEADPaolo Bonzini
* Fixes and debugging help for the #VE sanity check. Also disable it by default, even for CONFIG_DEBUG_KERNEL, because it was found to trigger spuriously (most likely a processor erratum as the exact symptoms vary by generation). * Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled situation (GIF=0 or interrupt shadow) when the processor supports virtual NMI. While generally KVM will not request an NMI window when virtual NMIs are supported, in this case it *does* have to single-step over the interrupt shadow or enable the STGI intercept, in order to deliver the latched second NMI. * Drop support for hand tuning APIC timer advancement from userspace. Since we have adaptive tuning, and it has proved to work well, drop the module parameter for manual configuration and with it a few stupid bugs that it had.
2024-06-03Merge branch 'kvm-fixes-6.10-1' into HEADPaolo Bonzini
* Fixes and debugging help for the #VE sanity check. Also disable it by default, even for CONFIG_DEBUG_KERNEL, because it was found to trigger spuriously (most likely a processor erratum as the exact symptoms vary by generation). * Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled situation (GIF=0 or interrupt shadow) when the processor supports virtual NMI. While generally KVM will not request an NMI window when virtual NMIs are supported, in this case it *does* have to single-step over the interrupt shadow or enable the STGI intercept, in order to deliver the latched second NMI. * Drop support for hand tuning APIC timer advancement from userspace. Since we have adaptive tuning, and it has proved to work well, drop the module parameter for manual configuration and with it a few stupid bugs that it had.
2024-06-03KVM: x86: Drop support for hand tuning APIC timer advancement from userspaceSean Christopherson
Remove support for specifying a static local APIC timer advancement value, and instead present a read-only boolean parameter to let userspace enable or disable KVM's dynamic APIC timer advancement. Realistically, it's all but impossible for userspace to specify an advancement that is more precise than what KVM's adaptive tuning can provide. E.g. a static value needs to be tuned for the exact hardware and kernel, and if KVM is using hrtimers, likely requires additional tuning for the exact configuration of the entire system. Dropping support for a userspace provided value also fixes several flaws in the interface. E.g. KVM interprets a negative value other than -1 as a large advancement, toggling between a negative and positive value yields unpredictable behavior as vCPUs will switch from dynamic to static advancement, changing the advancement in the middle of VM creation can result in different values for vCPUs within a VM, etc. Those flaws are mostly fixable, but there's almost no justification for taking on yet more complexity (it's minimal complexity, but still non-zero). The only arguments against using KVM's adaptive tuning is if a setup needs a higher maximum, or if the adjustments are too reactive, but those are arguments for letting userspace control the absolute max advancement and the granularity of each adjustment, e.g. similar to how KVM provides knobs for halt polling. Link: https://lore.kernel.org/all/20240520115334.852510-1-zhoushuling@huawei.com Cc: Shuling Zhou <zhoushuling@huawei.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240522010304.1650603-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03KVM: SEV-ES: Delegate LBR virtualization to the processorRavi Bangoria
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES guests. Although KVM currently enforces LBRV for SEV-ES guests, there are multiple issues with it: o MSR_IA32_DEBUGCTLMSR is still intercepted. Since MSR_IA32_DEBUGCTLMSR interception is used to dynamically toggle LBRV for performance reasons, this can be fatal for SEV-ES guests. For ex SEV-ES guest on Zen3: [guest ~]# wrmsr 0x1d9 0x4 KVM: entry failed, hardware error 0xffffffff EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000 Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests. No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR is of swap type A. o KVM will disable LBRV if userspace sets MSR_IA32_DEBUGCTLMSR before the VMSA is encrypted. Fix this by moving LBRV enablement code post VMSA encryption. [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June 2023, Vol 2, 15.35.2 Enabling SEV-ES. https://bugzilla.kernel.org/attachment.cgi?id=304653 Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") Co-developed-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Message-ID: <20240531044644.768-4-ravi.bangoria@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absentRavi Bangoria
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES guests. So, prevent SEV-ES guests when LBRV support is missing. [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June 2023, Vol 2, 15.35.2 Enabling SEV-ES. https://bugzilla.kernel.org/attachment.cgi?id=304653 Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Message-ID: <20240531044644.768-3-ravi.bangoria@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03KVM: SEV-ES: Prevent MSR access post VMSA encryptionNikunj A Dadhania
KVM currently allows userspace to read/write MSRs even after the VMSA is encrypted. This can cause unintentional issues if MSR access has side- effects. For ex, while migrating a guest, userspace could attempt to migrate MSR_IA32_DEBUGCTLMSR and end up unintentionally disabling LBRV on the target. Fix this by preventing access to those MSRs which are context switched via the VMSA, once the VMSA is encrypted. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Message-ID: <20240531044644.768-2-ravi.bangoria@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03KVM: SVM: Remove the need to trigger an UNBLOCK event on AP creationTom Lendacky
All SNP APs are initially started using the APIC INIT/SIPI sequence in the guest. This sequence moves the AP MP state from KVM_MP_STATE_UNINITIALIZED to KVM_MP_STATE_RUNNABLE, so there is no need to attempt the UNBLOCK. As it is, the UNBLOCK support in SVM is only enabled when AVIC is enabled. When AVIC is disabled, AP creation is still successful. Remove the KVM_REQ_UNBLOCK request from the AP creation code and revert the changes to the vcpu_unblocking() kvm_x86_ops path. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03KVM: SEV: Don't WARN() if RMP lookup fails when invalidating gmem pagesPaolo Bonzini
The hook only handles cleanup work specific to SNP, e.g. RMP table entries and flushing caches for encrypted guest memory. When run on a non-SNP-enabled host (currently only possible using KVM_X86_SW_PROTECTED_VM, e.g. via KVM selftests), the callback is a noop and will WARN due to the RMP table not being present. It's actually expected in this case that the RMP table wouldn't be present and that the hook should be a noop, so drop the WARN_ONCE(). Reported-by: Sean Christopherson <seanjc@google.com> Closes: https://lore.kernel.org/kvm/ZkU3_y0UoPk5yAeK@google.com/ Fixes: 8eb01900b018 ("KVM: SEV: Implement gmem hook for invalidating private pages") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03KVM: SEV: Automatically switch reclaimed pages to sharedMichael Roth
Currently there's a consistent pattern of always calling host_rmp_make_shared() immediately after snp_page_reclaim(), so go ahead and handle it automatically as part of snp_page_reclaim(). Also rename it to kvm_rmp_make_shared() to more easily distinguish it as a KVM-specific variant of the more generic rmp_make_shared() helper. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-02Linux 6.10-rc2v6.10-rc2Linus Torvalds
2024-06-02Merge tag 'ata-6.10-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata fixes from Niklas Cassel: - Add a quirk for three different devices that have shown issues with LPM (link power management). These devices appear to not implement LPM properly, since we see command timeouts when enabling LPM. The quirk disables LPM for these problematic devices. (Me) - Do not apply the Intel PCS quirk on Alder Lake. The quirk is not needed and was originally added by mistake when LPM support was enabled for this AHCI controller. Enabling the quirk when not needed causes the the controller to not be able to detect the connected devices on some platforms. * tag 'ata-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: libata-core: Add ATA_HORKAGE_NOLPM for Apacer AS340 ata: libata-core: Add ATA_HORKAGE_NOLPM for AMD Radeon S3 SSD ata: libata-core: Add ATA_HORKAGE_NOLPM for Crucial CT240BX500SSD1 ata: ahci: Do not apply Intel PCS quirk on Intel Alder Lake
2024-06-02Merge tag 'x86-urgent-2024-06-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Miscellaneous topology parsing fixes: - Fix topology parsing regression on older CPUs in the new AMD/Hygon parser - Fix boot crash on odd Intel Quark and similar CPUs that do not fill out cpuinfo_x86::x86_clflush_size and zero out cpuinfo_x86::x86_cache_alignment as a result. Provide 32 bytes as a general fallback value. - Fix topology enumeration on certain rare CPUs where the BIOS locks certain CPUID leaves and the kernel unlocked them late, which broke with the new topology parsing code. Factor out this unlocking logic and move it earlier in the parsing sequence" * tag 'x86-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/topology/intel: Unlock CPUID before evaluating anything x86/cpu: Provide default cache line size if not enumerated x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater
2024-06-02Merge tag 'sched-urgent-2024-06-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Export a symbol to make life easier for instrumentation/debugging" * tag 'sched-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/x86: Export 'percpu arch_freq_scale'
2024-06-02Merge tag 'perf-urgent-2024-06-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events fix from Ingo Molnar: "Add missing MODULE_DESCRIPTION() lines" * tag 'perf-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Add missing MODULE_DESCRIPTION() lines perf/x86/rapl: Add missing MODULE_DESCRIPTION() line
2024-06-02Merge tag 'hardening-v6.10-rc2-take2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - scsi: mpt3sas: Avoid possible run-time warning with long manufacturer strings - mailmap: update entry for Kees Cook - kunit/fortify: Remove __kmalloc_node() test * tag 'hardening-v6.10-rc2-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kunit/fortify: Remove __kmalloc_node() test mailmap: update entry for Kees Cook scsi: mpt3sas: Avoid possible run-time warning with long manufacturer strings
2024-06-01Merge tag 'powerpc-6.10-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Enforce full ordering for ATOMIC operations with BPF_FETCH - Fix uaccess build errors seen with GCC 13/14 - Fix build errors on ppc32 due to ARCH_HAS_KERNEL_FPU_SUPPORT - Drop error message from lparcfg guest name lookup Thanks to Christophe Leroy, Guenter Roeck, Nathan Lynch, Naveen N Rao, Puranjay Mohan, and Samuel Holland. * tag 'powerpc-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Limit ARCH_HAS_KERNEL_FPU_SUPPORT to PPC64 powerpc/uaccess: Use YZ asm constraint for ld powerpc/uaccess: Fix build errors seen with GCC 13/14 powerpc/pseries/lparcfg: drop error message from guest name lookup powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH
2024-06-01Merge tag 'firewire-fixes-6.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire fix from Takashi Sakamoto: "After merging a commit 1fffe7a34c89 ("script: modpost: emit a warning when the description is missing"), MODULE_DESCRIPTOR seems to be mandatory for kernel modules. In FireWire subsystem, the most of practical kernel modules have the field, while KUnit test modules do not. A single patch is applied to fix them" * tag 'firewire-fixes-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: add missing MODULE_DESCRIPTION() to test modules
2024-06-01Merge tag '6.10-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: "Two small smb3 fixes: - Fix socket creation with sfu mount option (spotted by test generic/423) - Minor cleanup: fix missing description in two files" * tag '6.10-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix creating sockets when using sfu mount options fs: smb: common: add missing MODULE_DESCRIPTION() macros
2024-06-01Merge tag 'kbuild-fixes-v6.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix a Kconfig bug regarding comparisons to 'm' or 'n' - Replace missed $(srctree)/$(src) - Fix unneeded kallsyms step 3 - Remove incorrect "compatible" properties from image nodes in image.fit - Improve gen_kheaders.sh - Fix 'make dt_binding_check' - Clean up unnecessary code * tag 'kbuild-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: dt-bindings: kbuild: Fix dt_binding_check on unconfigured build kheaders: use `command -v` to test for existence of `cpio` kheaders: explicitly define file modes for archived headers scripts/make_fit: Drop fdt image entry compatible string kbuild: remove a stale comment about cleaning in link-vmlinux.sh kbuild: fix short log for AS in link-vmlinux.sh kbuild: change scripts/mksysmap into sed script kbuild: avoid unneeded kallsyms step 3 kbuild: scripts/gdb: Replace missed $(srctree)/$(src) w/ $(src) kconfig: remove redundant check in expr_join_or() kconfig: fix comparison to constant symbols, 'm', 'n' kconfig: remove unused expr_is_no()
2024-06-01Merge tag 'xfs-6.10-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Chandan Babu: - Fix a livelock by dropping an xfarray sortinfo folio when an error is encountered - During extended attribute operations, Initialize transaction reservation computation based on attribute operation code - Relax symbolic link's ondisk verification code to allow symbolic links with short remote targets - Prevent soft lockups when unmapping file ranges and also during remapping blocks during a reflink operation - Fix compilation warnings when XFS is built with W=1 option * tag 'xfs-6.10-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: Add cond_resched to block unmap range and reflink remap path xfs: don't open-code u64_to_user_ptr xfs: allow symlinks with short remote targets xfs: fix xfs_init_attr_trans not handling explicit operation codes xfs: drop xfarray sortinfo folio on error xfs: Stop using __maybe_unused in xfs_alloc.c xfs: Clear W=1 warning in xfs_iwalk_run_callbacks()
2024-06-01Merge tag 'tty-6.10-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty fix from Greg KH: "Here is a single revert for a much-reported regression in 6.10-rc1 when it comes to a few older architectures. Turns out that the VT ioctls don't work the same across all cpu types because of some old compatibility requrements for stuff like alpha and powerpc. So revert the change that attempted to have them use the _IO() macros and go back to the known-working values instead. This has NOT been in linux-next but has had many reports that it fixes the issue with 6.10-rc1" * tag 'tty-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: Revert "VT: Use macros to define ioctls"
2024-06-01Merge tag 'landlock-6.10-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock fix from Mickaël Salaün: "This fixes a wrong path walk triggered by syzkaller" * tag 'landlock-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: selftests/landlock: Add layout1.refer_mount_root landlock: Fix d_parent walk
2024-06-01Revert "VT: Use macros to define ioctls"Greg Kroah-Hartman
This reverts commit 8c467f3300591a206fa8dcc6988d768910799872. Turns out this breaks many architectures as the vt ioctls do not all match up everywhere due to historical reasons, so the original commit is invalid for many values. Reported-by: Nick Bowler <nbowler@draconx.ca> Reported-by: Arnd Bergmann <arnd@kernel.org> Reported-by: Jiri Slaby <jirislaby@kernel.org> Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Gladkov <legion@kernel.org> Link: https://lore.kernel.org/r/ad4e561c-1d49-4f25-882c-7a36c6b1b5c0@draconx.ca Link: https://lore.kernel.org/r/0da9785e-ba44-4718-9d08-4e96c1ba7ab2@kernel.org Link: https://lore.kernel.org/all/34d848f4-670b-4493-bf21-130ef862521b@xenosoft.de/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>