summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-12-23KVM: x86/mmu: Zap invalid roots with mmu_lock holding for write at uninitRick Edgecombe
Prepare for a future TDX patch which asserts that atomic zapping (i.e. zapping with mmu_lock taken for read) don't operate on mirror roots. When tearing down a VM, all roots have to be zapped (including mirror roots once they're in place) so do that with the mmu_lock taken for write. kvm_mmu_uninit_tdp_mmu() is invoked either before or after executing any atomic operations on SPTEs by vCPU threads. Therefore, it will not impact vCPU threads performance if kvm_tdp_mmu_zap_invalidated_roots() acquires mmu_lock for write to zap invalid roots. Co-developed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Message-ID: <20240718211230.1492011-2-rick.p.edgecombe@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-23KVM: guest_memfd: Remove RCU-protected attribute from slot->gmem.fileYan Zhao
Remove the RCU-protected attribute from slot->gmem.file. No need to use RCU primitives rcu_assign_pointer()/synchronize_rcu() to update this pointer. - slot->gmem.file is updated in 3 places: kvm_gmem_bind(), kvm_gmem_unbind(), kvm_gmem_release(). All of them are protected by kvm->slots_lock. - slot->gmem.file is read in 2 paths: (1) kvm_gmem_populate kvm_gmem_get_file __kvm_gmem_get_pfn (2) kvm_gmem_get_pfn kvm_gmem_get_file __kvm_gmem_get_pfn Path (1) kvm_gmem_populate() requires holding kvm->slots_lock, so slot->gmem.file is protected by the kvm->slots_lock in this path. Path (2) kvm_gmem_get_pfn() does not require holding kvm->slots_lock. However, it's also not guarded by rcu_read_lock() and rcu_read_unlock(). So synchronize_rcu() in kvm_gmem_unbind()/kvm_gmem_release() actually will not wait for the readers in kvm_gmem_get_pfn() due to lack of RCU read-side critical section. The path (2) kvm_gmem_get_pfn() is safe without RCU protection because: a) kvm_gmem_bind() is called on a new memslot, before the memslot is visible to kvm_gmem_get_pfn(). b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot occur after the memslot is no longer visible to kvm_gmem_get_pfn(). c) get_file_active() ensures that kvm_gmem_get_pfn() will not access the stale file if kvm_gmem_release() sets it to NULL. This is because if kvm_gmem_release() occurs before kvm_gmem_get_pfn(), get_file_active() will return NULL; if get_file_active() does not return NULL, kvm_gmem_release() should not occur until after kvm_gmem_get_pfn() releases the file reference. Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Message-ID: <20241104084303.29909-1-yan.y.zhao@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-22KVM: x86: Refactor __kvm_emulate_hypercall() into a macroPaolo Bonzini
Rework __kvm_emulate_hypercall() into a macro so that completion of hypercalls that don't exit to userspace use direct function calls to the completion helper, i.e. don't trigger a retpoline when RETPOLINE=y. Opportunistically take the names of the input registers, as opposed to taking the input values, to preemptively dedup more of the calling code (TDX needs to use different registers). Use the direct GPR accessors to read values to avoid the pointless marking of the registers as available (KVM requires GPRs to always be available). Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-ID: <20241128004344.4072099-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-22KVM: x86: Always complete hypercall via function callbackSean Christopherson
Finish "emulation" of KVM hypercalls by function callback, even when the hypercall is handled entirely within KVM, i.e. doesn't require an exit to userspace, and refactor __kvm_emulate_hypercall()'s return value to *only* communicate whether or not KVM should exit to userspace or resume the guest. (Ab)Use vcpu->run->hypercall.ret to propagate the return value to the callback, purely to avoid having to add a trampoline for every completion callback. Using the function return value for KVM's control flow eliminates the multiplexed return value, where '0' for KVM_HC_MAP_GPA_RANGE (and only that hypercall) means "exit to userspace". Note, the unnecessary extra indirect call and thus potential retpoline will be eliminated in the near future by converting the intermediate layer to a macro. Suggested-by: Binbin Wu <binbin.wu@linux.intel.com> Suggested-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-ID: <20241128004344.4072099-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-22KVM: x86: Bump hypercall stat prior to fully completing hypercallSean Christopherson
Increment the "hypercalls" stat for KVM hypercalls as soon as KVM knows it will skip the guest instruction, i.e. once KVM is committed to emulating the hypercall. Waiting until completion adds no known value, and creates a discrepancy where the stat will be bumped if KVM exits to userspace as a result of trying to skip the instruction, but not if the hypercall itself exits. Handling the stat in common code will also avoid the need for another helper to dedup code when TDX comes along (TDX needs a separate completion path due to GPR usage differences). Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20241128004344.4072099-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-22KVM: x86: Move "emulate hypercall" function declarations to x86.hSean Christopherson
Move the declarations for the hypercall emulation APIs to x86.h. While the helpers are exported, they are intended to be consumed only by KVM vendor modules, i.e. don't need to be exposed to the kernel at-large. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20241128004344.4072099-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-22KVM: x86: Add a helper to check for user interception of KVM hypercallsBinbin Wu
Add and use user_exit_on_hypercall() to check if userspace wants to handle a KVM hypercall instead of open-coding the logic everywhere. No functional change intended. Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> [sean: squash into one patch, keep explicit KVM_HC_MAP_GPA_RANGE check] Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Message-ID: <20241128004344.4072099-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-22KVM: x86: clear vcpu->run->hypercall.ret before exiting for KVM_EXIT_HYPERCALLPaolo Bonzini
QEMU up to 9.2.0 is assuming that vcpu->run->hypercall.ret is 0 on exit and it never modifies it when processing KVM_EXIT_HYPERCALL. Make this explicit in the code, to avoid breakage when KVM starts modifying that field. This in principle is not a good idea... It would have been much better if KVM had set the field to -KVM_ENOSYS from the beginning, so that a dumb userspace that does nothing on KVM_EXIT_HYPERCALL would tell the guest it does not support KVM_HC_MAP_GPA_RANGE. However, breaking userspace is a Very Bad Thing, as everybody should know. Reported-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-22Merge tag 'kvm-x86-fixes-6.13-rcN' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 fixes for 6.13: - Disable AVIC on SNP-enabled systems that don't allow writes to the virtual APIC page, as such hosts will hit unexpected RMP #PFs in the host when running VMs of any flavor. - Fix a WARN in the hypercall completion path due to KVM trying to determine if a guest with protected register state is in 64-bit mode (KVM's ABI is to assume such guests only make hypercalls in 64-bit mode). - Allow the guest to write to supported bits in MSR_AMD64_DE_CFG to fix a regression with Windows guests, and because KVM's read-only behavior appears to be entirely made up. - Treat TDP MMU faults as spurious if the faulting access is allowed given the existing SPTE. This fixes a benign WARN (other than the WARN itself) due to unexpectedly replacing a writable SPTE with a read-only SPTE.
2024-12-19KVM: x86/mmu: Treat TDP MMU faults as spurious if access is already allowedSean Christopherson
Treat slow-path TDP MMU faults as spurious if the access is allowed given the existing SPTE to fix a benign warning (other than the WARN itself) due to replacing a writable SPTE with a read-only SPTE, and to avoid the unnecessary LOCK CMPXCHG and subsequent TLB flush. If a read fault races with a write fault, fast GUP fails for any reason when trying to "promote" the read fault to a writable mapping, and KVM resolves the write fault first, then KVM will end up trying to install a read-only SPTE (for a !map_writable fault) overtop a writable SPTE. Note, it's not entirely clear why fast GUP fails, or if that's even how KVM ends up with a !map_writable fault with a writable SPTE. If something else is going awry, e.g. due to a bug in mmu_notifiers, then treating read faults as spurious in this scenario could effectively mask the underlying problem. However, retrying the faulting access instead of overwriting an existing SPTE is functionally correct and desirable irrespective of the WARN, and fast GUP _can_ legitimately fail with a writable VMA, e.g. if the Accessed bit in primary MMU's PTE is toggled and causes a PTE value mismatch. The WARN was also recently added, specifically to track down scenarios where KVM is unnecessarily overwrites SPTEs, i.e. treating the fault as spurious doesn't regress KVM's bug-finding capabilities in any way. In short, letting the WARN linger because there's a tiny chance it's due to a bug elsewhere would be excessively paranoid. Fixes: 1a175082b190 ("KVM: x86/mmu: WARN and flush if resolving a TDP MMU fault clears MMU-writable") Reported-by: Lei Yang <leiyang@redhat.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219588 Tested-by: Lei Yang <leiyang@redhat.com> Link: https://lore.kernel.org/r/20241218213611.3181643-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: SVM: Allow guest writes to set MSR_AMD64_DE_CFG bitsSean Christopherson
Drop KVM's arbitrary behavior of making DE_CFG.LFENCE_SERIALIZE read-only for the guest, as rejecting writes can lead to guest crashes, e.g. Windows in particular doesn't gracefully handle unexpected #GPs on the WRMSR, and nothing in the AMD manuals suggests that LFENCE_SERIALIZE is read-only _if it exists_. KVM only allows LFENCE_SERIALIZE to be set, by the guest or host, if the underlying CPU has X86_FEATURE_LFENCE_RDTSC, i.e. if LFENCE is guaranteed to be serializing. So if the guest sets LFENCE_SERIALIZE, KVM will provide the desired/correct behavior without any additional action (the guest's value is never stuffed into hardware). And having LFENCE be serializing even when it's not _required_ to be is a-ok from a functional perspective. Fixes: 74a0e79df68a ("KVM: SVM: Disallow guest from changing userspace's MSR_AMD64_DE_CFG value") Fixes: d1d93fa90f1a ("KVM: SVM: Add MSR-based feature support for serializing LFENCE") Reported-by: Simon Pilkington <simonp.git@mailbox.org> Closes: https://lore.kernel.org/all/52914da7-a97b-45ad-86a0-affdf8266c61@mailbox.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20241211172952.1477605-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: x86: Play nice with protected guests in complete_hypercall_exit()Sean Christopherson
Use is_64_bit_hypercall() instead of is_64_bit_mode() to detect a 64-bit hypercall when completing said hypercall. For guests with protected state, e.g. SEV-ES and SEV-SNP, KVM must assume the hypercall was made in 64-bit mode as the vCPU state needed to detect 64-bit mode is unavailable. Hacking the sev_smoke_test selftest to generate a KVM_HC_MAP_GPA_RANGE hypercall via VMGEXIT trips the WARN: ------------[ cut here ]------------ WARNING: CPU: 273 PID: 326626 at arch/x86/kvm/x86.h:180 complete_hypercall_exit+0x44/0xe0 [kvm] Modules linked in: kvm_amd kvm ... [last unloaded: kvm] CPU: 273 UID: 0 PID: 326626 Comm: sev_smoke_test Not tainted 6.12.0-smp--392e932fa0f3-feat #470 Hardware name: Google Astoria/astoria, BIOS 0.20240617.0-0 06/17/2024 RIP: 0010:complete_hypercall_exit+0x44/0xe0 [kvm] Call Trace: <TASK> kvm_arch_vcpu_ioctl_run+0x2400/0x2720 [kvm] kvm_vcpu_ioctl+0x54f/0x630 [kvm] __se_sys_ioctl+0x6b/0xc0 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> ---[ end trace 0000000000000000 ]--- Fixes: b5aead0064f3 ("KVM: x86: Assume a 64-bit hypercall for guests with protected state") Cc: stable@vger.kernel.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20241128004344.4072099-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: SVM: Disable AVIC on SNP-enabled system without HvInUseWrAllowed featureSuravee Suthikulpanit
On SNP-enabled system, VMRUN marks AVIC Backing Page as in-use while the guest is running for both secure and non-secure guest. Any hypervisor write to the in-use vCPU's AVIC backing page (e.g. to inject an interrupt) will generate unexpected #PF in the host. Currently, attempt to run AVIC guest would result in the following error: BUG: unable to handle page fault for address: ff3a442e549cc270 #PF: supervisor write access in kernel mode #PF: error_code(0x80000003) - RMP violation PGD b6ee01067 P4D b6ee02067 PUD 10096d063 PMD 11c540063 PTE 80000001149cc163 SEV-SNP: PFN 0x1149cc unassigned, dumping non-zero entries in 2M PFN region: [0x114800 - 0x114a00] ... Newer AMD system is enhanced to allow hypervisor to modify the backing page for non-secure guest on SNP-enabled system. This enhancement is available when the CPUID Fn8000_001F_EAX bit 30 is set (HvInUseWrAllowed). This table describes AVIC support matrix w.r.t. SNP enablement: | Non-SNP system | SNP system ----------------------------------------------------- Non-SNP guest | AVIC Activate | AVIC Activate iff | | HvInuseWrAllowed=1 ----------------------------------------------------- SNP guest | N/A | Secure AVIC Therefore, check and disable AVIC in kvm_amd driver when the feature is not available on SNP-enabled system. See the AMD64 Architecture Programmer’s Manual (APM) Volume 2 for detail. (https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/ programmer-references/40332.pdf) Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20241104075845.7583-1-suravee.suthikulpanit@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19Merge tag 'kvm-selftests-treewide-6.14' of https://github.com/kvm-x86/linux ↵Paolo Bonzini
into HEAD KVM selftests "tree"-wide changes for 6.14: - Rework vcpu_get_reg() to return a value instead of using an out-param, and update all affected arch code accordingly. - Convert the max_guest_memory_test into a more generic mmu_stress_test. The basic gist of the "conversion" is to have the test do mprotect() on guest memory while vCPUs are accessing said memory, e.g. to verify KVM and mmu_notifiers are working as intended. - Play nice with treewrite builds of unsupported architectures, e.g. arm (32-bit), as KVM selftests' Makefile doesn't do anything to ensure the target architecture is actually one KVM selftests supports. - Use the kernel's $(ARCH) definition instead of the target triple for arch specific directories, e.g. arm64 instead of aarch64, mainly so as not to be different from the rest of the kernel.
2024-12-18KVM: selftests: Override ARCH for x86_64 instead of using ARCH_DIRSean Christopherson
Now that KVM selftests uses the kernel's canonical arch paths, directly override ARCH to 'x86' when targeting x86_64 instead of defining ARCH_DIR to redirect to appropriate paths. ARCH_DIR was originally added to deal with KVM selftests using the target triple ARCH for directories, e.g. s390x and aarch64; keeping it around just to deal with the one-off alias from x86_64=>x86 is unnecessary and confusing. Note, even when selftests are built from the top-level Makefile, ARCH is scoped to KVM's makefiles, i.e. overriding ARCH won't trip up some other selftests that (somehow) expects x86_64 and can't work with x86. Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20241128005547.4077116-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Use canonical $(ARCH) paths for KVM selftests directoriesSean Christopherson
Use the kernel's canonical $(ARCH) paths instead of the raw target triple for KVM selftests directories. KVM selftests are quite nearly the only place in the entire kernel that using the target triple for directories, tools/testing/selftests/drivers/s390x being the lone holdout. Using the kernel's preferred nomenclature eliminates the minor, but annoying, friction of having to translate to KVM's selftests directories, e.g. for pattern matching, opening files, running selftests, etc. Opportunsitically delete file comments that reference the full path of the file, as they are obviously prone to becoming stale, and serve no known purpose. Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Provide empty 'all' and 'clean' targets for unsupported ARCHsSean Christopherson
Provide empty targets for KVM selftests if the target architecture is unsupported to make it obvious which architectures are supported, and so that various side effects don't fail and/or do weird things, e.g. as is, "mkdir -p $(sort $(dir $(TEST_GEN_PROGS)))" fails due to a missing operand, and conversely, "$(shell mkdir -p $(sort $(OUTPUT)/$(ARCH_DIR) ..." will create an empty, useless directory for the unsupported architecture. Move the guts of the Makefile to Makefile.kvm so that it's easier to see that the if-statement effectively guards all of KVM selftests. Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)Sean Christopherson
Add two phases to mmu_stress_test to verify that KVM correctly handles guest memory that was writable, and then made read-only in the primary MMU, and then made writable again. Add bonus coverage for x86 and arm64 to verify that all of guest memory was marked read-only. Making forward progress (without making memory writable) requires arch specific code to skip over the faulting instruction, but the test can at least verify each vCPU's starting page was made read-only for other architectures. Link: https://lore.kernel.org/r/20241128005547.4077116-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Add a read-only mprotect() phase to mmu_stress_testSean Christopherson
Add a third phase of mmu_stress_test to verify that mprotect()ing guest memory to make it read-only doesn't cause explosions, e.g. to verify KVM correctly handles the resulting mmu_notifier invalidations. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Precisely limit the number of guest loops in mmu_stress_testSean Christopherson
Run the exact number of guest loops required in mmu_stress_test instead of looping indefinitely in anticipation of adding more stages that run different code (e.g. reads instead of writes). Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Use vcpu_arch_put_guest() in mmu_stress_testSean Christopherson
Use vcpu_arch_put_guest() to write memory from the guest in mmu_stress_test as an easy way to provide a bit of extra coverage. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Enable mmu_stress_test on arm64Sean Christopherson
Enable the mmu_stress_test on arm64. The intent was to enable the test across all architectures when it was first added, but a few goofs made it unrunnable on !x86. Now that those goofs are fixed, at least for arm64, enable the test. Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: sefltests: Explicitly include ucall_common.h in mmu_stress_test.cSean Christopherson
Explicitly include ucall_common.h in the MMU stress test, as unlike arm64 and x86-64, RISC-V doesn't include ucall_common.h in its processor.h, i.e. this will allow enabling the test on RISC-V. Reported-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Compute number of extra pages needed in mmu_stress_testSean Christopherson
Create mmu_stress_tests's VM with the correct number of extra pages needed to map all of memory in the guest. The bug hasn't been noticed before as the test currently runs only on x86, which maps guest memory with 1GiB pages, i.e. doesn't need much memory in the guest for page tables. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Only muck with SREGS on x86 in mmu_stress_testSean Christopherson
Try to get/set SREGS in mmu_stress_test only when running on x86, as the ioctls are supported only by x86 and PPC, and the latter doesn't yet support KVM selftests. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Rename max_guest_memory_test to mmu_stress_testSean Christopherson
Rename max_guest_memory_test to mmu_stress_test so that the name isn't horribly misleading when future changes extend the test to verify things like mprotect() interactions, and because the test is useful even when its configured to populate far less than the maximum amount of guest memory. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Check for a potential unhandled exception iff KVM_RUN succeededSean Christopherson
Don't check for an unhandled exception if KVM_RUN failed, e.g. if it returned errno=EFAULT, as reporting unhandled exceptions is done via a ucall, i.e. requires KVM_RUN to exit cleanly. Theoretically, checking for a ucall on a failed KVM_RUN could get a false positive, e.g. if there were stale data in vcpu->run from a previous exit. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Assert that vcpu_{g,s}et_reg() won't truncateSean Christopherson
Assert that the register being read/written by vcpu_{g,s}et_reg() is no larger than a uint64_t, i.e. that a selftest isn't unintentionally truncating the value being read/written. Ideally, the assert would be done at compile-time, but that would limit the checks to hardcoded accesses and/or require fancier compile-time assertion infrastructure to filter out dynamic usage. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: selftests: Return a value from vcpu_get_reg() instead of using an out-paramSean Christopherson
Return a uint64_t from vcpu_get_reg() instead of having the caller provide a pointer to storage, as none of the vcpu_get_reg() usage in KVM selftests accesses a register larger than 64 bits, and vcpu_set_reg() only accepts a 64-bit value. If a use case comes along that needs to get a register that is larger than 64 bits, then a utility can be added to assert success and take a void pointer, but until then, forcing an out param yields ugly code and prevents feeding the output of vcpu_get_reg() into vcpu_set_reg(). Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20241128005547.4077116-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-17KVM: Move KVM_REG_SIZE() definition to common uAPI headerSean Christopherson
Define KVM_REG_SIZE() in the common kvm.h header, and delete the arm64 and RISC-V versions. As evidenced by the surrounding definitions, all aspects of the register size encoding are generic, i.e. RISC-V should have moved arm64's definition to common code instead of copy+pasting. Acked-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20241128005547.4077116-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-15Linux 6.13-rc3v6.13-rc3Linus Torvalds
2024-12-15Merge tag 'arc-6.13-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: - Sundry build and misc fixes * tag 'arc-6.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: build: Try to guess GCC variant of cross compiler ARC: bpf: Correct conditional check in 'check_jmp_32' ARC: dts: Replace deprecated snps,nr-gpios property for snps,dw-apb-gpio-port devices ARC: build: Use __force to suppress per-CPU cmpxchg warnings ARC: fix reference of dependency for PAE40 config ARC: build: disallow invalid PAE40 + 4K page config arc: rename aux.h to arc_aux.h
2024-12-15Merge tag 'efi-fixes-for-v6.13-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - Limit EFI zboot to GZIP and ZSTD before it comes in wider use - Fix inconsistent error when looking up a non-existent file in efivarfs with a name that does not adhere to the NAME-GUID format - Drop some unused code * tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/esrt: remove esre_attribute::store() efivarfs: Fix error on non-existent file efi/zboot: Limit compression options to GZIP and ZSTD
2024-12-15Merge tag 'i2c-for-6.13-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "i2c host fixes: PNX used the wrong unit for timeouts, Nomadik was missing a sentinel, and RIIC was missing rounding up" * tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: riic: Always round-up when calculating bus period i2c: nomadik: Add missing sentinel to match table i2c: pnx: Fix timeout in wait functions
2024-12-15Merge tag 'edac_urgent_for_v6.13_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fix from Borislav Petkov: - Make sure amd64_edac loads successfully on certain Zen4 memory configurations * tag 'edac_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/amd64: Simplify ECC check on unified memory controllers
2024-12-15Merge tag 'irq_urgent_for_v6.13_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Disable the secure programming interface of the GIC500 chip in the RK3399 SoC to fix interrupt priority assignment and even make a dead machine boot again when the gic-v3 driver enables pseudo NMIs - Correct the declaration of a percpu variable to fix several sparse warnings * tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3: Work around insecure GIC integrations irqchip/gic: Correct declaration of *percpu_base pointer in union gic_base
2024-12-15Merge tag 'sched_urgent_for_v6.13_rc3-p2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Prevent incorrect dequeueing of the deadline dlserver helper task and fix its time accounting - Properly track the CFS runqueue runnable stats - Check the total number of all queued tasks in a sched fair's runqueue hierarchy before deciding to stop the tick - Fix the scheduling of the task that got woken last (NEXT_BUDDY) by preventing those from being delayed * tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/dlserver: Fix dlserver time accounting sched/dlserver: Fix dlserver double enqueue sched/eevdf: More PELT vs DELAYED_DEQUEUE sched/fair: Fix sched_can_stop_tick() for fair tasks sched/fair: Fix NEXT_BUDDY
2024-12-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM64: - Fix confusion with implicitly-shifted MDCR_EL2 masks breaking SPE/TRBE initialization - Align nested page table walker with the intended memory attribute combining rules of the architecture - Prevent userspace from constraining the advertised ASID width, avoiding horrors of guest TLBIs not matching the intended context in hardware - Don't leak references on LPIs when insertion into the translation cache fails RISC-V: - Replace csr_write() with csr_set() for HVIEN PMU overflow bit x86: - Cache CPUID.0xD XSTATE offsets+sizes during module init On Intel's Emerald Rapids CPUID costs hundreds of cycles and there are a lot of leaves under 0xD. Getting rid of the CPUIDs during nested VM-Enter and VM-Exit is planned for the next release, for now just cache them: even on Skylake that is 40% faster" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init RISC-V: KVM: Fix csr_write -> csr_set for HVIEN PMU overflow bit KVM: arm64: vgic-its: Add error handling in vgic_its_cache_translation KVM: arm64: Do not allow ID_AA64MMFR0_EL1.ASIDbits to be overridden KVM: arm64: Fix S1/S2 combination when FWB==1 and S2 has Device memory type arm64: Fix usage of new shifted MDCR_EL2 values
2024-12-14Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "Single one-line fix in the ufs driver" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Update compl_time_stamp_local_clock after completing a cqe
2024-12-14Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Daniel Borkmann: - Fix a bug in the BPF verifier to track changes to packet data property for global functions (Eduard Zingerman) - Fix a theoretical BPF prog_array use-after-free in RCU handling of __uprobe_perf_func (Jann Horn) - Fix BPF tracing to have an explicit list of tracepoints and their arguments which need to be annotated as PTR_MAYBE_NULL (Kumar Kartikeya Dwivedi) - Fix a logic bug in the bpf_remove_insns code where a potential error would have been wrongly propagated (Anton Protopopov) - Avoid deadlock scenarios caused by nested kprobe and fentry BPF programs (Priya Bala Govindasamy) - Fix a bug in BPF verifier which was missing a size check for BTF-based context access (Kumar Kartikeya Dwivedi) - Fix a crash found by syzbot through an invalid BPF prog_array access in perf_event_detach_bpf_prog (Jiri Olsa) - Fix several BPF sockmap bugs including a race causing a refcount imbalance upon element replace (Michal Luczaj) - Fix a use-after-free from mismatching BPF program/attachment RCU flavors (Jann Horn) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits) bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs selftests/bpf: Add tests for raw_tp NULL args bpf: Augment raw_tp arguments with PTR_MAYBE_NULL bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL" selftests/bpf: Add test for narrow ctx load for pointer args bpf: Check size for BTF-based ctx access of pointer members selftests/bpf: extend changes_pkt_data with cases w/o subprograms bpf: fix null dereference when computing changes_pkt_data of prog w/o subprogs bpf: Fix theoretical prog_array UAF in __uprobe_perf_func() bpf: fix potential error return selftests/bpf: validate that tail call invalidates packet pointers bpf: consider that tail calls invalidate packet pointers selftests/bpf: freplace tests for tracking of changes_packet_data bpf: check changes_pkt_data property for extension programs selftests/bpf: test for changing packet data from global functions bpf: track changes_pkt_data property for global functions bpf: refactor bpf_helper_changes_pkt_data to use helper number bpf: add find_containing_subprog() utility function bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors ...
2024-12-14bpf: Avoid deadlock caused by nested kprobe and fentry bpf programsPriya Bala Govindasamy
BPF program types like kprobe and fentry can cause deadlocks in certain situations. If a function takes a lock and one of these bpf programs is hooked to some point in the function's critical section, and if the bpf program tries to call the same function and take the same lock it will lead to deadlock. These situations have been reported in the following bug reports. In percpu_freelist - Link: https://lore.kernel.org/bpf/CAADnVQLAHwsa+2C6j9+UC6ScrDaN9Fjqv1WjB1pP9AzJLhKuLQ@mail.gmail.com/T/ Link: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@mail.gmail.com/T/ In bpf_lru_list - Link: https://lore.kernel.org/bpf/CAPPBnEajj+DMfiR_WRWU5=6A7KKULdB5Rob_NJopFLWF+i9gCA@mail.gmail.com/T/ Link: https://lore.kernel.org/bpf/CAPPBnEZQDVN6VqnQXvVqGoB+ukOtHGZ9b9U0OLJJYvRoSsMY_g@mail.gmail.com/T/ Link: https://lore.kernel.org/bpf/CAPPBnEaCB1rFAYU7Wf8UxqcqOWKmRPU1Nuzk3_oLk6qXR7LBOA@mail.gmail.com/T/ Similar bugs have been reported by syzbot. In queue_stack_maps - Link: https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@google.com/ Link: https://lore.kernel.org/all/20240418230932.2689-1-hdanton@sina.com/T/ In lpm_trie - Link: https://lore.kernel.org/linux-kernel/00000000000035168a061a47fa38@google.com/T/ In ringbuf - Link: https://lore.kernel.org/bpf/20240313121345.2292-1-hdanton@sina.com/T/ Prevent kprobe and fentry bpf programs from attaching to these critical sections by removing CC_FLAGS_FTRACE for percpu_freelist.o, bpf_lru_list.o, queue_stack_maps.o, lpm_trie.o, ringbuf.o files. The bugs reported by syzbot are due to tracepoint bpf programs being called in the critical sections. This patch does not aim to fix deadlocks caused by tracepoint programs. However, it does prevent deadlocks from occurring in similar situations due to kprobe and fentry programs. Signed-off-by: Priya Bala Govindasamy <pgovind2@uci.edu> Link: https://lore.kernel.org/r/CAPPBnEZpjGnsuA26Mf9kYibSaGLm=oF6=12L21X1GEQdqjLnzQ@mail.gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-14Merge tag 'usb-6.13-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB driver fixes from Greg KH: "Here are some small USB driver fixes for some reported issues. Included in here are: - typec driver bugfixes - u_serial gadget driver bugfix for much reported and discussed issue - dwc2 bugfixes - midi gadget driver bugfix - ehci-hcd driver bugfix - other small bugfixes All of these have been in linux-next for over a week with no reported issues" * tag 'usb-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: ucsi: Fix connector status writing past buffer size usb: typec: ucsi: Fix completion notifications usb: dwc2: Fix HCD port connection race usb: dwc2: hcd: Fix GetPortStatus & SetPortFeature usb: dwc2: Fix HCD resume usb: gadget: u_serial: Fix the issue that gs_start_io crashed due to accessing null pointer usb: misc: onboard_usb_dev: skip suspend/resume sequence for USB5744 SMBus support usb: dwc3: xilinx: make sure pipe clock is deselected in usb2 only mode usb: core: hcd: only check primary hcd skip_phy_initialization usb: gadget: midi2: Fix interpretation of is_midi1 bits usb: dwc3: imx8mp: fix software node kernel dump usb: typec: anx7411: fix OF node reference leaks in anx7411_typec_switch_probe() usb: typec: anx7411: fix fwnode_handle reference leak usb: host: max3421-hcd: Correctly abort a USB request. dt-bindings: phy: imx8mq-usb: correct reference to usb-switch.yaml usb: ehci-hcd: fix call balance of clocks handling routines
2024-12-14Merge tag 'tty-6.13-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull serial driver fixes from Greg KH: "Here are two small serial driver fixes for 6.13-rc3. They are: - ioport build fallout fix for the 8250 port driver that should resolve Guenter's runtime problems - sh-sci driver bugfix for a reported problem Both of these have been in linux-next for a while with no reported issues" * tag 'tty-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: serial: Work around warning backtrace in serial8250_set_defaults serial: sh-sci: Check if TX data was written to device in .tx_empty()
2024-12-14Merge tag 'staging-6.13-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are some small staging gpib driver build and bugfixes for issues that have been much-reported (should finally fix Guenter's build issues). There are more of these coming in later -rc releases, but for now this should fix the majority of the reported problems. All of these have been in linux-next for a while with no reported issues" * tag 'staging-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: gpib: Fix i386 build issue staging: gpib: Fix faulty workaround for assignment in if staging: gpib: Workaround for ppc build failure staging: gpib: Make GPIB_NI_PCI_ISA depend on HAS_IOPORT
2024-12-14Merge tag 'v6.13-p2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "Fix a regression in rsassa-pkcs1 as well as a buffer overrun in hisilicon/debugfs" * tag 'v6.13-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: hisilicon/debugfs - fix the struct pointer incorrectly offset problem crypto: rsassa-pkcs1 - Copy source data for SG list
2024-12-14Merge tag 'rust-fixes-6.13' of https://github.com/Rust-for-Linux/linuxLinus Torvalds
Pull rust fixes from Miguel Ojeda: "Toolchain and infrastructure: - Set bindgen's Rust target version to prevent issues when pairing older rustc releases with newer bindgen releases, such as bindgen >= 0.71.0 and rustc < 1.82 due to unsafe_extern_blocks. drm/panic: - Remove spurious empty line detected by a new Clippy warning" * tag 'rust-fixes-6.13' of https://github.com/Rust-for-Linux/linux: rust: kbuild: set `bindgen`'s Rust target version drm/panic: remove spurious empty line to clean warning
2024-12-14Merge tag 'iommu-fixes-v6.13-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - Per-domain device-list locking fixes for the AMD IOMMU driver - Fix incorrect use of smp_processor_id() in the NVidia-specific part of the ARM-SMMU-v3 driver - Intel IOMMU driver fixes: - Remove cache tags before disabling ATS - Avoid draining PRQ in sva mm release path - Fix qi_batch NULL pointer with nested parent domain * tag 'iommu-fixes-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/vt-d: Avoid draining PRQ in sva mm release path iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain iommu/vt-d: Remove cache tags before disabling ATS iommu/amd: Add lockdep asserts for domain->dev_list iommu/amd: Put list_add/del(dev_data) back under the domain->lock iommu/tegra241-cmdqv: do not use smp_processor_id in preemptible context
2024-12-14Merge tag 'ata-6.13-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata fix from Damien Le Moal: - Fix an OF node reference leak in the sata_highbank driver * tag 'ata-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: sata_highbank: fix OF node reference leak in highbank_initialize_phys()
2024-12-14Merge tag 'i2c-host-fixes-6.13-rc3' of ↵Wolfram Sang
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current i2c-host-fixes for v6.13-rc3 - Replaced jiffies with msec for timeout calculations. - Added a sentinel to the 'of_device_id' array in Nomadik. - Rounded up bus period calculation in RIIC.
2024-12-13Merge tag '6.13-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - fix rmmod leak - two minor cleanups - fix for unlink/rename with pending i/o * tag '6.13-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: destroy cfid_put_wq on module exit cifs: Use str_yes_no() helper in cifs_ses_add_channel() cifs: Fix rmdir failure due to ongoing I/O on deleted file smb3: fix compiler warning in reparse code