Age | Commit message (Collapse) | Author |
|
into HEAD
KVM/riscv fixes for 6.10, take #2
- Fix compilation for KVM selftests
|
|
KVM fixes for 6.10
- Fix a "shift too big" goof in the KVM_SEV_INIT2 selftest.
- Compute the max mappable gfn for KVM selftests on x86 using GuestMaxPhyAddr
from KVM's supported CPUID (if it's available).
- Fix a race in kvm_vcpu_on_spin() by ensuring loads and stores are atomic.
- Fix technically benign bug in __kvm_handle_hva_range() where KVM consumes
the return from a void-returning function as if it were a boolean.
|
|
With commit 27bd5fdc24c0 ("KVM: SEV-ES: Prevent MSR access post VMSA
encryption"), older VMMs like QEMU 9.0 and older will fail when booting
SEV-ES guests with something like the following error:
qemu-system-x86_64: error: failed to get MSR 0x174
qemu-system-x86_64: ../qemu.git/target/i386/kvm/kvm.c:3950: kvm_get_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
This is because older VMMs that might still call
svm_get_msr()/svm_set_msr() for SEV-ES guests after guest boot even if
those interfaces were essentially just noops because of the vCPU state
being encrypted and stored separately in the VMSA. Now those VMMs will
get an -EINVAL and generally crash.
Newer VMMs that are aware of KVM_SEV_INIT2 however are already aware of
the stricter limitations of what vCPU state can be sync'd during
guest run-time, so newer QEMU for instance will work both for legacy
KVM_SEV_ES_INIT interface as well as KVM_SEV_INIT2.
So when using KVM_SEV_INIT2 it's okay to assume userspace can deal with
-EINVAL, whereas for legacy KVM_SEV_ES_INIT the kernel might be dealing
with either an older VMM and so it needs to assume that returning
-EINVAL might break the VMM.
Address this by only returning -EINVAL if the guest was started with
KVM_SEV_INIT2. Otherwise, just silently return.
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nikunj A Dadhania <nikunj@amd.com>
Reported-by: Srikanth Aithal <sraithal@amd.com>
Closes: https://lore.kernel.org/lkml/37usuu4yu4ok7be2hqexhmcyopluuiqj3k266z4gajc2rcj4yo@eujb23qc3zcm/
Fixes: 27bd5fdc24c0 ("KVM: SEV-ES: Prevent MSR access post VMSA encryption")
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240604233510.764949-1-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.10, take #2
- Fix dangling references to a redistributor region if
the vgic was prematurely destroyed.
- Properly mark FFA buffers as released, ensuring that
both parties can make forward progress.
|
|
Function kvm_reset_dirty_gfn may be called with parameters cur_slot /
cur_offset / mask are all zero, it does not represent real dirty page.
It is not necessary to clear dirty page in this condition. Also return
value of macro __fls() is undefined if mask is zero which is called in
funciton kvm_reset_dirty_gfn(). Here just return.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20240613122803.1031511-1-maobibo@loongson.cn>
[Move the conditional inside kvm_reset_dirty_gfn; suggested by
Sean Christopherson. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If kvm_gmem_get_pfn() detects an hwpoisoned page, it returns -EHWPOISON
but it does not put back the reference that kvm_gmem_get_folio() had
grabbed. Add the forgotten folio_put().
Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
Cc: stable@vger.kernel.org
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Some allocations done by KVM are temporary, they are created as result
of program actions, but can't exists for arbitrary long times.
They should have been GFP_TEMPORARY (rip!).
OTOH, kvm-nx-lpage-recovery and kvm-pit kernel threads exist for as long
as VM exists but their task_struct memory is not accounted.
This is story for another day.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Message-ID: <c0122f66-f428-417e-a360-b25fc0f154a0@p183>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop Wanpeng as a KVM PARAVIRT reviewer as his @tencent.com email is
bouncing, and according to lore[*], the last activity from his @gmail.com
address was almost two years ago.
[*] https://lore.kernel.org/all/CANRm+Cwj29M9HU3=JRUOaKDR+iDKgr0eNMWQi0iLkR5THON-bg@mail.gmail.com
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240610163427.3359426-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Sync pending posted interrupts to the IRR prior to re-scanning I/O APIC
routes, irrespective of whether the I/O APIC is emulated by userspace or
by KVM. If a level-triggered interrupt routed through the I/O APIC is
pending or in-service for a vCPU, KVM needs to intercept EOIs on said
vCPU even if the vCPU isn't the destination for the new routing, e.g. if
servicing an interrupt using the old routing races with I/O APIC
reconfiguration.
Commit fceb3a36c29a ("KVM: x86: ioapic: Fix level-triggered EOI and
userspace I/OAPIC reconfigure race") fixed the common cases, but
kvm_apic_pending_eoi() only checks if an interrupt is in the local
APIC's IRR or ISR, i.e. misses the uncommon case where an interrupt is
pending in the PIR.
Failure to intercept EOI can manifest as guest hangs with Windows 11 if
the guest uses the RTC as its timekeeping source, e.g. if the VMM doesn't
expose a more modern form of time to the guest.
Cc: stable@vger.kernel.org
Cc: Adamos Ttofari <attofari@amazon.de>
Cc: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240611014845.82795-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Bail from outer address space loop, not just the inner memslot loop, when
a "null" handler is encountered by __kvm_handle_hva_range(), which is the
intended behavior. On x86, which has multiple address spaces thanks to
SMM emulation, breaking from just the memslot loop results in undefined
behavior due to assigning the non-existent return value from kvm_null_fn()
to a bool.
In practice, the bug is benign as kvm_mmu_notifier_invalidate_range_end()
is the only caller that passes handler=kvm_null_fn, and it doesn't set
flush_on_ret, i.e. assigning garbage to r.ret is ultimately ignored. And
for most configuration the compiler elides the entire sequence, i.e. there
is no undefined behavior at runtime.
------------[ cut here ]------------
UBSAN: invalid-load in arch/x86/kvm/../../../virt/kvm/kvm_main.c:655:10
load of value 160 is not a valid value for type '_Bool'
CPU: 370 PID: 8246 Comm: CPU 0/KVM Not tainted 6.8.2-amdsos-build58-ubuntu-22.04+ #1
Hardware name: AMD Corporation Sh54p/Sh54p, BIOS WPC4429N 04/25/2024
Call Trace:
<TASK>
dump_stack_lvl+0x48/0x60
ubsan_epilogue+0x5/0x30
__ubsan_handle_load_invalid_value+0x79/0x80
kvm_mmu_notifier_invalidate_range_end.cold+0x18/0x4f [kvm]
__mmu_notifier_invalidate_range_end+0x63/0xe0
__split_huge_pmd+0x367/0xfc0
do_huge_pmd_wp_page+0x1cc/0x380
__handle_mm_fault+0x8ee/0xe50
handle_mm_fault+0xe4/0x4a0
__get_user_pages+0x190/0x840
get_user_pages_unlocked+0xe0/0x590
hva_to_pfn+0x114/0x550 [kvm]
kvm_faultin_pfn+0xed/0x5b0 [kvm]
kvm_tdp_page_fault+0x123/0x170 [kvm]
kvm_mmu_page_fault+0x244/0xaa0 [kvm]
vcpu_enter_guest+0x592/0x1070 [kvm]
kvm_arch_vcpu_ioctl_run+0x145/0x8a0 [kvm]
kvm_vcpu_ioctl+0x288/0x6d0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x77/0x120
entry_SYSCALL_64_after_hwframe+0x6e/0x76
</TASK>
---[ end trace ]---
Fixes: 071064f14d87 ("KVM: Don't take mmu_lock for range invalidation unless necessary")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/b8723d39903b64c241c50f5513f804390c7b5eec.1718203311.git.babu.moger@amd.com
[sean: massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
According to the FF-A spec (Buffer states and ownership), after a
producer has written into a buffer, it is "full" and now owned by the
consumer. The producer won't be able to use that buffer, until the
consumer hands it over with an invocation such as RX_RELEASE.
It is clear in the following paragraph (Transfer of buffer ownership),
that MEM_RETRIEVE_RESP is transferring the ownership from producer (in
our case SPM) to consumer (hypervisor). RX_RELEASE is therefore
mandatory here.
It is less clear though what is happening with MEM_FRAG_TX. But this
invocation, as a response to MEM_FRAG_RX writes into the same hypervisor
RX buffer (see paragraph "Transmission of transaction descriptor in
fragments"). Also this is matching the TF-A implementation where the RX
buffer is marked "full" during a MEM_FRAG_RX.
Release the RX hypervisor buffer in those two cases. This will unblock
later invocations using this buffer which would otherwise fail.
(RETRIEVE_REQ, MEM_FRAG_RX and PARTITION_INFO_GET).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20240611175317.1220842-1-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Due to commit 2b7deea3ec7c ("Revert "kvm: selftests: move base
kvm_util.h declarations to kvm_util_base.h"") kvm selftests now
requires explicitly including ucall_common.h when needed. The commit
added the directives everywhere they were needed at the time, but, by
merge time, new places had been merged for RISC-V. Add those now to
fix RISC-V's compilation.
Fixes: dee7ea42a1eb ("Merge tag 'kvm-x86-selftests_utils-6.10' of https://github.com/kvm-x86/linux into HEAD")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240603122045.323064-2-ajones@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
When tearing down a redistributor region, make sure we don't have
any dangling pointer to that region stored in a vcpu.
Fixes: e5a35635464b ("kvm: arm64: vgic-v3: Introduce vgic_v3_free_redist_region()")
Reported-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240605175637.1635653-1-maz@kernel.org
Cc: stable@vger.kernel.org
|
|
Use {READ,WRITE}_ONCE() to access kvm->last_boosted_vcpu to ensure the
loads and stores are atomic. In the extremely unlikely scenario the
compiler tears the stores, it's theoretically possible for KVM to attempt
to get a vCPU using an out-of-bounds index, e.g. if the write is split
into multiple 8-bit stores, and is paired with a 32-bit load on a VM with
257 vCPUs:
CPU0 CPU1
last_boosted_vcpu = 0xff;
(last_boosted_vcpu = 0x100)
last_boosted_vcpu[15:8] = 0x01;
i = (last_boosted_vcpu = 0x1ff)
last_boosted_vcpu[7:0] = 0x00;
vcpu = kvm->vcpu_array[0x1ff];
As detected by KCSAN:
BUG: KCSAN: data-race in kvm_vcpu_on_spin [kvm] / kvm_vcpu_on_spin [kvm]
write to 0xffffc90025a92344 of 4 bytes by task 4340 on cpu 16:
kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4112) kvm
handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel
vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:?
arch/x86/kvm/vmx/vmx.c:6606) kvm_intel
vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm
kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm
kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm
__se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890)
__x64_sys_ioctl (fs/ioctl.c:890)
x64_sys_call (arch/x86/entry/syscall_64.c:33)
do_syscall_64 (arch/x86/entry/common.c:?)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
read to 0xffffc90025a92344 of 4 bytes by task 4342 on cpu 4:
kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4069) kvm
handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel
vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:?
arch/x86/kvm/vmx/vmx.c:6606) kvm_intel
vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm
kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm
kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm
__se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890)
__x64_sys_ioctl (fs/ioctl.c:890)
x64_sys_call (arch/x86/entry/syscall_64.c:33)
do_syscall_64 (arch/x86/entry/common.c:?)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
value changed: 0x00000012 -> 0x00000000
Fixes: 217ece6129f2 ("KVM: use yield_to instead of sleep in kvm_vcpu_on_spin")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240510092353.2261824-1-leitao@debian.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Use the max mappable GPA via GuestPhysBits advertised by KVM to calculate
max_gfn. Currently some selftests (e.g. access_tracking_perf_test,
dirty_log_test...) add RAM regions close to max_gfn, so guest may access
GPA beyond its mappable range and cause infinite loop.
Adjust max_gfn in vm_compute_max_gfn() since x86 selftests already
overrides vm_compute_max_gfn() specifically to deal with goofy edge cases.
Reported-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20240513014003.104593-1-tao1.su@linux.intel.com
[sean: tweak name, add comment and sanity check]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Currrentl a 32 bit 1u value is being shifted more than 32 bits causing
overflow and incorrect checking of bits 32-63. Fix this by using the
BIT_ULL macro for shifting bits.
Detected by cppcheck:
sev_init2_tests.c:108:34: error: Shifting 32-bit value by 63 bits is
undefined behaviour [shiftTooManyBits]
Fixes: dfc083a181ba ("selftests: kvm: add tests for KVM_SEV_INIT2")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20240523154102.2236133-1-colin.i.king@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Drop the second snapshot of mmu_invalidate_seq in kvm_faultin_pfn().
Before checking the mismatch of private vs. shared, mmu_invalidate_seq is
saved to fault->mmu_seq, which can be used to detect an invalidation
related to the gfn occurred, i.e. KVM will not install a mapping in page
table if fault->mmu_seq != mmu_invalidate_seq.
Currently there is a second snapshot of mmu_invalidate_seq, which may not
be same as the first snapshot in kvm_faultin_pfn(), i.e. the gfn attribute
may be changed between the two snapshots, but the gfn may be mapped in
page table without hindrance. Therefore, drop the second snapshot as it
has no obvious benefits.
Fixes: f6adeae81f35 ("KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()")
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Message-ID: <20240528102234.2162763-1-tao1.su@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.10, take #1
- Large set of FP/SVE fixes for pKVM, addressing the fallout
from the per-CPU data rework and making sure that the host
is not involved in the FP/SVE switching any more
- Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH
is copletely supported
- Fix for the respective priorities of Failed PAC, Illegal
Execution state and Instruction Abort exceptions
- Fix the handling of AArch32 instruction traps failing their
condition code, which was broken by the introduction of
ESR_EL2.ISS2
- Allow vpcus running in AArch32 state to be restored in
System mode
- Fix AArch32 GPR restore that would lose the 64 bit state
under some conditions
|
|
KVM (and pKVM) do not support SME guests. Therefore KVM ensures
that the host's SME state is flushed and that SME controls for
enabling access to ZA storage and for streaming are disabled.
pKVM needs to protect against a buggy/malicious host. Ensure that
it wouldn't run a guest when protected mode is enabled should any
of the SME controls be enabled.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-10-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When setting/clearing CPACR bits for EL0 and EL1, use the ELx
format of the bits, which covers both. This makes the code
clearer, and reduces the chances of accidentally missing a bit.
No functional change intended.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-9-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Now that we have introduced finalize_init_hyp_mode(), lets
consolidate the initializing of the host_data fpsimd_state and
sve state.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240603122852.3923848-8-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When running in protected mode we don't want to leak protected
guest state to the host, including whether a guest has used
fpsimd/sve. Therefore, eagerly restore the host state on guest
exit when running in protected mode, which happens only if the
guest has used fpsimd/sve.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-7-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Protected mode needs to maintain (save/restore) the host's sve
state, rather than relying on the host kernel to do that. This is
to avoid leaking information to the host about guests and the
type of operations they are performing.
As a first step towards that, allocate memory mapped at hyp, per
cpu, for the host sve state. The following patch will use this
memory to save/restore the host state.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In subsequent patches, n/vhe will diverge on saving the host
fpsimd/sve state when taking a guest fpsimd/sve trap. Add a
specialized helper to handle it.
No functional change intended.
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The same traps controlled by CPTR_EL2 or CPACR_EL1 need to be
toggled in different parts of the code, but the exact bits and
their polarity differ between these two formats and the mode
(vhe/nvhe/hvhe).
To reduce the amount of duplicated code and the chance of getting
the wrong bit/polarity or missing a field, abstract the set/clear
of CPTR_EL2 bits behind a helper.
Since (h)VHE is the way of the future, use the CPACR_EL1 format,
which is a subset of the VHE CPTR_EL2, as a reference.
No functional change intended.
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Since the prototypes for __sve_save_state/__sve_restore_state at
hyp were added, the underlying macro has acquired a third
parameter for saving/restoring ffr.
Fix the prototypes to account for the third parameter, and
restore the ffr for the guest since it is saved.
Suggested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240603122852.3923848-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Now that the hypervisor is handling the host sve state in
protected mode, it needs to be able to save it.
This reverts commit e66425fc9ba3 ("KVM: arm64: Remove unused
__sve_save_state").
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240603122852.3923848-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
into HEAD
KVM/riscv fixes for 6.10, take #1
- No need to use mask when hart-index-bits is 0
- Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext()
|
|
* Fixes and debugging help for the #VE sanity check. Also disable
it by default, even for CONFIG_DEBUG_KERNEL, because it was found
to trigger spuriously (most likely a processor erratum as the
exact symptoms vary by generation).
* Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled
situation (GIF=0 or interrupt shadow) when the processor supports
virtual NMI. While generally KVM will not request an NMI window
when virtual NMIs are supported, in this case it *does* have to
single-step over the interrupt shadow or enable the STGI intercept,
in order to deliver the latched second NMI.
* Drop support for hand tuning APIC timer advancement from userspace.
Since we have adaptive tuning, and it has proved to work well,
drop the module parameter for manual configuration and with it a
few stupid bugs that it had.
|
|
Remove support for specifying a static local APIC timer advancement value,
and instead present a read-only boolean parameter to let userspace enable
or disable KVM's dynamic APIC timer advancement. Realistically, it's all
but impossible for userspace to specify an advancement that is more
precise than what KVM's adaptive tuning can provide. E.g. a static value
needs to be tuned for the exact hardware and kernel, and if KVM is using
hrtimers, likely requires additional tuning for the exact configuration of
the entire system.
Dropping support for a userspace provided value also fixes several flaws
in the interface. E.g. KVM interprets a negative value other than -1 as a
large advancement, toggling between a negative and positive value yields
unpredictable behavior as vCPUs will switch from dynamic to static
advancement, changing the advancement in the middle of VM creation can
result in different values for vCPUs within a VM, etc. Those flaws are
mostly fixable, but there's almost no justification for taking on yet more
complexity (it's minimal complexity, but still non-zero).
The only arguments against using KVM's adaptive tuning is if a setup needs
a higher maximum, or if the adjustments are too reactive, but those are
arguments for letting userspace control the absolute max advancement and
the granularity of each adjustment, e.g. similar to how KVM provides knobs
for halt polling.
Link: https://lore.kernel.org/all/20240520115334.852510-1-zhoushuling@huawei.com
Cc: Shuling Zhou <zhoushuling@huawei.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240522010304.1650603-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES
guests. Although KVM currently enforces LBRV for SEV-ES guests, there
are multiple issues with it:
o MSR_IA32_DEBUGCTLMSR is still intercepted. Since MSR_IA32_DEBUGCTLMSR
interception is used to dynamically toggle LBRV for performance reasons,
this can be fatal for SEV-ES guests. For ex SEV-ES guest on Zen3:
[guest ~]# wrmsr 0x1d9 0x4
KVM: entry failed, hardware error 0xffffffff
EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000
Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.
No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR
is of swap type A.
o KVM will disable LBRV if userspace sets MSR_IA32_DEBUGCTLMSR before the
VMSA is encrypted. Fix this by moving LBRV enablement code post VMSA
encryption.
[1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
2023, Vol 2, 15.35.2 Enabling SEV-ES.
https://bugzilla.kernel.org/attachment.cgi?id=304653
Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Co-developed-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-4-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES
guests. So, prevent SEV-ES guests when LBRV support is missing.
[1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
2023, Vol 2, 15.35.2 Enabling SEV-ES.
https://bugzilla.kernel.org/attachment.cgi?id=304653
Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-3-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM currently allows userspace to read/write MSRs even after the VMSA is
encrypted. This can cause unintentional issues if MSR access has side-
effects. For ex, while migrating a guest, userspace could attempt to
migrate MSR_IA32_DEBUGCTLMSR and end up unintentionally disabling LBRV on
the target. Fix this by preventing access to those MSRs which are context
switched via the VMSA, once the VMSA is encrypted.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-2-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:
- Add a quirk for three different devices that have shown issues with
LPM (link power management). These devices appear to not implement
LPM properly, since we see command timeouts when enabling LPM. The
quirk disables LPM for these problematic devices. (Me)
- Do not apply the Intel PCS quirk on Alder Lake. The quirk is not
needed and was originally added by mistake when LPM support was
enabled for this AHCI controller. Enabling the quirk when not needed
causes the the controller to not be able to detect the connected
devices on some platforms.
* tag 'ata-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: Add ATA_HORKAGE_NOLPM for Apacer AS340
ata: libata-core: Add ATA_HORKAGE_NOLPM for AMD Radeon S3 SSD
ata: libata-core: Add ATA_HORKAGE_NOLPM for Crucial CT240BX500SSD1
ata: ahci: Do not apply Intel PCS quirk on Intel Alder Lake
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Miscellaneous topology parsing fixes:
- Fix topology parsing regression on older CPUs in the new AMD/Hygon
parser
- Fix boot crash on odd Intel Quark and similar CPUs that do not fill
out cpuinfo_x86::x86_clflush_size and zero out
cpuinfo_x86::x86_cache_alignment as a result.
Provide 32 bytes as a general fallback value.
- Fix topology enumeration on certain rare CPUs where the BIOS locks
certain CPUID leaves and the kernel unlocked them late, which broke
with the new topology parsing code. Factor out this unlocking logic
and move it earlier in the parsing sequence"
* tag 'x86-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology/intel: Unlock CPUID before evaluating anything
x86/cpu: Provide default cache line size if not enumerated
x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Export a symbol to make life easier for instrumentation/debugging"
* tag 'sched-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/x86: Export 'percpu arch_freq_scale'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events fix from Ingo Molnar:
"Add missing MODULE_DESCRIPTION() lines"
* tag 'perf-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Add missing MODULE_DESCRIPTION() lines
perf/x86/rapl: Add missing MODULE_DESCRIPTION() line
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- scsi: mpt3sas: Avoid possible run-time warning with long manufacturer
strings
- mailmap: update entry for Kees Cook
- kunit/fortify: Remove __kmalloc_node() test
* tag 'hardening-v6.10-rc2-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kunit/fortify: Remove __kmalloc_node() test
mailmap: update entry for Kees Cook
scsi: mpt3sas: Avoid possible run-time warning with long manufacturer strings
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Enforce full ordering for ATOMIC operations with BPF_FETCH
- Fix uaccess build errors seen with GCC 13/14
- Fix build errors on ppc32 due to ARCH_HAS_KERNEL_FPU_SUPPORT
- Drop error message from lparcfg guest name lookup
Thanks to Christophe Leroy, Guenter Roeck, Nathan Lynch, Naveen N Rao,
Puranjay Mohan, and Samuel Holland.
* tag 'powerpc-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Limit ARCH_HAS_KERNEL_FPU_SUPPORT to PPC64
powerpc/uaccess: Use YZ asm constraint for ld
powerpc/uaccess: Fix build errors seen with GCC 13/14
powerpc/pseries/lparcfg: drop error message from guest name lookup
powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
"After merging a commit 1fffe7a34c89 ("script: modpost: emit a warning
when the description is missing"), MODULE_DESCRIPTOR seems to be
mandatory for kernel modules. In FireWire subsystem, the most of
practical kernel modules have the field, while KUnit test modules do
not. A single patch is applied to fix them"
* tag 'firewire-fixes-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: add missing MODULE_DESCRIPTION() to test modules
|
|
Pull smb client fixes from Steve French:
"Two small smb3 fixes:
- Fix socket creation with sfu mount option (spotted by test generic/423)
- Minor cleanup: fix missing description in two files"
* tag '6.10-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix creating sockets when using sfu mount options
fs: smb: common: add missing MODULE_DESCRIPTION() macros
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix a Kconfig bug regarding comparisons to 'm' or 'n'
- Replace missed $(srctree)/$(src)
- Fix unneeded kallsyms step 3
- Remove incorrect "compatible" properties from image nodes in
image.fit
- Improve gen_kheaders.sh
- Fix 'make dt_binding_check'
- Clean up unnecessary code
* tag 'kbuild-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
dt-bindings: kbuild: Fix dt_binding_check on unconfigured build
kheaders: use `command -v` to test for existence of `cpio`
kheaders: explicitly define file modes for archived headers
scripts/make_fit: Drop fdt image entry compatible string
kbuild: remove a stale comment about cleaning in link-vmlinux.sh
kbuild: fix short log for AS in link-vmlinux.sh
kbuild: change scripts/mksysmap into sed script
kbuild: avoid unneeded kallsyms step 3
kbuild: scripts/gdb: Replace missed $(srctree)/$(src) w/ $(src)
kconfig: remove redundant check in expr_join_or()
kconfig: fix comparison to constant symbols, 'm', 'n'
kconfig: remove unused expr_is_no()
|
|
Pull xfs fixes from Chandan Babu:
- Fix a livelock by dropping an xfarray sortinfo folio when an error
is encountered
- During extended attribute operations, Initialize transaction
reservation computation based on attribute operation code
- Relax symbolic link's ondisk verification code to allow symbolic
links with short remote targets
- Prevent soft lockups when unmapping file ranges and also during
remapping blocks during a reflink operation
- Fix compilation warnings when XFS is built with W=1 option
* tag 'xfs-6.10-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Add cond_resched to block unmap range and reflink remap path
xfs: don't open-code u64_to_user_ptr
xfs: allow symlinks with short remote targets
xfs: fix xfs_init_attr_trans not handling explicit operation codes
xfs: drop xfarray sortinfo folio on error
xfs: Stop using __maybe_unused in xfs_alloc.c
xfs: Clear W=1 warning in xfs_iwalk_run_callbacks()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fix from Greg KH:
"Here is a single revert for a much-reported regression in 6.10-rc1
when it comes to a few older architectures.
Turns out that the VT ioctls don't work the same across all cpu types
because of some old compatibility requrements for stuff like alpha and
powerpc. So revert the change that attempted to have them use the
_IO() macros and go back to the known-working values instead.
This has NOT been in linux-next but has had many reports that it fixes
the issue with 6.10-rc1"
* tag 'tty-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "VT: Use macros to define ioctls"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fix from Mickaël Salaün:
"This fixes a wrong path walk triggered by syzkaller"
* tag 'landlock-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Add layout1.refer_mount_root
landlock: Fix d_parent walk
|
|
This reverts commit 8c467f3300591a206fa8dcc6988d768910799872.
Turns out this breaks many architectures as the vt ioctls do not all
match up everywhere due to historical reasons, so the original commit is
invalid for many values.
Reported-by: Nick Bowler <nbowler@draconx.ca>
Reported-by: Arnd Bergmann <arnd@kernel.org>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Gladkov <legion@kernel.org>
Link: https://lore.kernel.org/r/ad4e561c-1d49-4f25-882c-7a36c6b1b5c0@draconx.ca
Link: https://lore.kernel.org/r/0da9785e-ba44-4718-9d08-4e96c1ba7ab2@kernel.org
Link: https://lore.kernel.org/all/34d848f4-670b-4493-bf21-130ef862521b@xenosoft.de/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull drm fixes from Dave Airlie:
"This is the weekly fixes. Lots of small fixes across the board, one
BUG_ON fix in shmem seems most important, otherwise amdgpu, i915, xe
mostly with small fixes to all the other drivers.
shmem:
- fix BUG_ON in COW handling
- warn when trying to pin imported objects
buddy:
- fix page size handling
dma-buf:
- sw-sync: Don't interfere with IRQ handling
- fix kthreads-handling error path
i915:
- fix a race in audio component by registering it later
- make DPT object unshrinkable to avoid shrinking when framebuffer
has not shrunk
- fix CCS id calculation to fix a perf regression
- fix selftest caching mode
- fix FIELD_PREP compiler warnings
- fix indefinite wait for GT wakeref release
- revert overeager multi-gt pm reference removal
xe:
- pcode polling timeout change
- fix for deadlocks for faulting VMs
- error-path lock imbalance fix
amdgpu:
- RAS fix
- fix colorspace property for MST connectors
- fix for PCIe DPM
- silence UBSAN warning
- GPUVM robustness fix
- partition fix
- drop deprecated I2C_CLASS_SPD
amdkfd:
- revert unused changes for certain 11.0.3 devices
- simplify APU VRAM handling
lima:
- fix dma_resv-related deadlock in object pin
msm:
- remove build-time dependency on Python 3.9
nouveau:
- nvif: Fix possible integer overflow
panel:
- lg-sw43408: Select DP helpers; Declare backlight ops as static
- sitronix-st7789v: Various fixes for jt240mhqs_hwt_ek_e3 panel
panfrost:
- fix dma_resv-related deadlock in object pin"
* tag 'drm-fixes-2024-06-01' of https://gitlab.freedesktop.org/drm/kernel: (35 commits)
drm/msm: remove python 3.9 dependency for compiling msm
drm/panel: sitronix-st7789v: fix display size for jt240mhqs_hwt_ek_e3 panel
drm/panel: sitronix-st7789v: tweak timing for jt240mhqs_hwt_ek_e3 panel
drm/panel: sitronix-st7789v: fix timing for jt240mhqs_hwt_ek_e3 panel
drm/amd/pm: remove deprecated I2C_CLASS_SPD support from newly added SMU_14_0_2
drm/amdgpu: Make CPX mode auto default in NPS4
drm/amdkfd: simplify APU VRAM handling
Revert "drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices"
drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent
drm/amdgpu: silence UBSAN warning
drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()
drm/i915: Fix audio component initialization
drm/i915/dpt: Make DPT object unshrinkable
drm/i915/gt: Fix CCS id's calculation for CCS mode setting
drm/panel/lg-sw43408: mark sw43408_backlight_ops as static
drm/i915/selftests: Set always_coherent to false when reading from CPU
drm/panel/lg-sw43408: select CONFIG_DRM_DISPLAY_DP_HELPER
drm/i915/guc: avoid FIELD_PREP warning
drm/i915/gt: Disarm breadcrumbs if engines are already idle
Revert "drm/i915: Remove extra multi-gt pm-references"
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- sttcs: Fix property spelling
- intel-m10-bmc-hwmon: Fix multiplier for N6000 board power sensor
- ltc2992: Fix memory leak
- dell-smm: Add Dell G15 5511 to fan control whitelist
* tag 'hwmon-for-v6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (shtc1) Fix property misspelling
hwmon: (intel-m10-bmc-hwmon) Fix multiplier for N6000 board power sensor
hwmon: (ltc2992) Fix memory leak in ltc2992_parse_dt()
hwmon: (dell-smm) Add Dell G15 5511 to fan control whitelist
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox
Pull mailbox fix from Jassi Brar:
- zynqmp-ipi: fix linker error on some configurations
* tag 'mailbox-fixes-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
mailbox: zynqmp-ipi: drop irq_to_desc() call
|