summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-03-16KVM: x86/mmu: Check mmu->sync_page pointer in kvm_sync_page_check()Lai Jiangshan
Assert that mmu->sync_page is non-NULL as part of the sanity checks performed before attempting to sync a shadow page. Explicitly checking mmu->sync_page is all but guaranteed to be redundant with the existing sanity check that the MMU is indirect, but the cost is negligible, and the explicit check also serves as documentation. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20230216154115.710033-4-jiangshanlai@gmail.com [sean: increase verbosity of changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16KVM: x86/mmu: Move the check in FNAME(sync_page) as kvm_sync_page_check()Lai Jiangshan
Prepare to check mmu->sync_page pointer before calling it. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20230216154115.710033-3-jiangshanlai@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16KVM: x86/mmu: Use 64-bit address to invalidate to fix a subtle bugLai Jiangshan
FNAME(invlpg)() and kvm_mmu_invalidate_gva() take a gva_t, i.e. unsigned long, as the type of the address to invalidate. On 32-bit kernels, the upper 32 bits of the GPA will get dropped when an L2 GPA address is invalidated in the shadowed nested TDP MMU. Convert it to u64 to fix the problem. Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20230216154115.710033-2-jiangshanlai@gmail.com [sean: tweak changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16KVM: Change return type of kvm_arch_vm_ioctl() to "int"Thomas Huth
All kvm_arch_vm_ioctl() implementations now only deal with "int" types as return values, so we can change the return type of these functions to use "int" instead of "long". Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20230208140105.655814-7-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16KVM: Standardize on "int" return types instead of "long" in kvm_main.cThomas Huth
KVM functions use "long" return values for functions that are wired up to "struct file_operations", but otherwise use "int" return values for functions that can return 0/-errno in order to avoid unintentional divergences between 32-bit and 64-bit kernels. Some code still uses "long" in unnecessary spots, though, which can cause a little bit of confusion and unnecessary size casts. Let's change these spots to use "int" types, too. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230208140105.655814-6-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16KVM: arm64: Limit length in kvm_vm_ioctl_mte_copy_tags() to INT_MAXThomas Huth
In case of success, this function returns the amount of handled bytes. However, this does not work for large values: The function is called from kvm_arch_vm_ioctl() (which still returns a long), which in turn is called from kvm_vm_ioctl() in virt/kvm/kvm_main.c. And that function stores the return value in an "int r" variable. So the upper 32-bits of the "long" return value are lost there. KVM ioctl functions should only return "int" values, so let's limit the amount of bytes that can be requested here to INT_MAX to avoid the problem with the truncated return value. We can then also change the return type of the function to "int" to make it clearer that it is not possible to return a "long" here. Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest") Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Steven Price <steven.price@arm.com> Message-Id: <20230208140105.655814-5-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16KVM: x86: Remove the KVM_GET_NR_MMU_PAGES ioctlThomas Huth
The KVM_GET_NR_MMU_PAGES ioctl is quite questionable on 64-bit hosts since it fails to return the full 64 bits of the value that can be set with the corresponding KVM_SET_NR_MMU_PAGES call. Its "long" return value is truncated into an "int" in the kvm_arch_vm_ioctl() function. Since this ioctl also never has been used by userspace applications (QEMU, Google's internal VMM, kvmtool and CrosVM have been checked), it's likely the best if we remove this badly designed ioctl before anybody really tries to use it. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230208140105.655814-4-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16KVM: s390: Use "int" as return type for kvm_s390_get/set_skeys()Thomas Huth
These two functions only return normal integers, so it does not make sense to declare the return type as "long" here. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230208140105.655814-3-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16KVM: PPC: Standardize on "int" return types in the powerpc KVM codeThomas Huth
Most functions that are related to kvm_arch_vm_ioctl() already use "int" as return type to pass error values back to the caller. Some outlier functions use "long" instead for no good reason (they do not really require long values here). Let's standardize on "int" here to avoid casting the values back and forth between the two types. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230208140105.655814-2-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16kvm: x86: Advertise FLUSH_L1D to user spaceEmanuele Giuseppe Esposito
FLUSH_L1D was already added in 11e34e64e4103, but the feature is not visible to userspace yet. The bit definition: CPUID.(EAX=7,ECX=0):EDX[bit 28] If the feature is supported by the host, kvm should support it too so that userspace can choose whether to expose it to the guest or not. One disadvantage of not exposing it is that the guest will report a non existing vulnerability in /sys/devices/system/cpu/vulnerabilities/mmio_stale_data because the mitigation is present only if the guest supports (FLUSH_L1D and MD_CLEAR) or FB_CLEAR. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230201132905.549148-4-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16kvm: svm: Add IA32_FLUSH_CMD guest supportEmanuele Giuseppe Esposito
Expose IA32_FLUSH_CMD to the guest if the guest CPUID enumerates support for this MSR. As with IA32_PRED_CMD, permission for unintercepted writes to this MSR will be granted to the guest after the first non-zero write. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230201132905.549148-3-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16kvm: vmx: Add IA32_FLUSH_CMD guest supportEmanuele Giuseppe Esposito
Expose IA32_FLUSH_CMD to the guest if the guest CPUID enumerates support for this MSR. As with IA32_PRED_CMD, permission for unintercepted writes to this MSR will be granted to the guest after the first non-zero write. Co-developed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230201132905.549148-2-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: VMX: Rename "KVM is using eVMCS" static key to match its wrapperSean Christopherson
Rename enable_evmcs to __kvm_is_using_evmcs to match its wrapper, and to avoid confusion with enabling eVMCS for nested virtualization, i.e. have "enable eVMCS" be reserved for "enable eVMCS support for L1". No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230211003534.564198-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: VMX: Stub out enable_evmcs static key for CONFIG_HYPERV=nSean Christopherson
Wrap enable_evmcs in a helper and stub it out when CONFIG_HYPERV=n in order to eliminate the static branch nop placeholders. clang-14 is clever enough to elide the nop, but gcc-12 is not. Stubbing out the key reduces the size of kvm-intel.ko by ~7.5% (200KiB) when compiled with gcc-12 (there are a _lot_ of VMCS accesses throughout KVM). Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230211003534.564198-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: nVMX: Move EVMCS1_SUPPORT_* macros to hyperv.cSean Christopherson
Move the macros that define the set of VMCS controls that are supported by eVMCS1 from hyperv.h to hyperv.c, i.e. make them "private". The macros should never be consumed directly by KVM at-large since the "final" set of supported controls depends on guest CPUID. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230211003534.564198-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: x86/mmu: Remove FNAME(is_self_change_mapping)Lai Jiangshan
Drop FNAME(is_self_change_mapping) and instead rely on kvm_mmu_hugepage_adjust() to adjust the hugepage accordingly. Prior to commit 4cd071d13c5c ("KVM: x86/mmu: Move calls to thp_adjust() down a level"), the hugepage adjustment was done before allocating new shadow pages, i.e. failed to restrict the hugepage sizes if a new shadow page resulted in account_shadowed() changing the disallowed hugepage tracking. Removing FNAME(is_self_change_mapping) fixes a bug reported by Huang Hang where KVM unnecessarily forces a 4KiB page. FNAME(is_self_change_mapping) has a defect in that it blindly disables _all_ hugepage mappings rather than trying to reduce the size of the hugepage. If the guest is writing to a 1GiB page and the 1GiB is self-referential but a 2MiB page is not, then KVM can and should create a 2MiB mapping. Add a comment above the call to kvm_mmu_hugepage_adjust() to call out the new dependency on adjusting the hugepage size after walking indirect PTEs. Reported-by: Huang Hang <hhuang@linux.alibaba.com> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20221213125538.81209-1-jiangshanlai@gmail.com [sean: rework changelog after separating out the emulator change] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230202182817.407394-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: x86/mmu: Detect write #PF to shadow pages during FNAME(fetch) walkLai Jiangshan
Move the detection of write #PF to shadow pages, i.e. a fault on a write to a page table that is being shadowed by KVM that is used to translate the write itself, from FNAME(is_self_change_mapping) to FNAME(fetch). There is no need to detect the self-referential write before kvm_faultin_pfn() as KVM does not consume EMULTYPE_WRITE_PF_TO_SP for accesses that resolve to "error or no-slot" pfns, i.e. KVM doesn't allow retrying MMIO accesses or writes to read-only memslots. Detecting the EMULTYPE_WRITE_PF_TO_SP scenario in FNAME(fetch) will allow dropping FNAME(is_self_change_mapping) entirely, as the hugepage interaction can be deferred to kvm_mmu_hugepage_adjust(). Cc: Huang Hang <hhuang@linux.alibaba.com> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20221213125538.81209-1-jiangshanlai@gmail.com [sean: split to separate patch, write changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230202182817.407394-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: x86/mmu: Use EMULTYPE flag to track write #PFs to shadow pagesSean Christopherson
Use a new EMULTYPE flag, EMULTYPE_WRITE_PF_TO_SP, to track page faults on self-changing writes to shadowed page tables instead of propagating that information to the emulator via a semi-persistent vCPU flag. Using a flag in "struct kvm_vcpu_arch" is confusing, especially as implemented, as it's not at all obvious that clearing the flag only when emulation actually occurs is correct. E.g. if KVM sets the flag and then retries the fault without ever getting to the emulator, the flag will be left set for future calls into the emulator. But because the flag is consumed if and only if both EMULTYPE_PF and EMULTYPE_ALLOW_RETRY_PF are set, and because EMULTYPE_ALLOW_RETRY_PF is deliberately not set for direct MMUs, emulated MMIO, or while L2 is active, KVM avoids false positives on a stale flag since FNAME(page_fault) is guaranteed to be run and refresh the flag before it's ultimately consumed by the tail end of reexecute_instruction(). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230202182817.407394-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Sync KVM exit reasons in selftestsVipin Sharma
Add missing KVM_EXIT_* reasons in KVM selftests from include/uapi/linux/kvm.h Signed-off-by: Vipin Sharma <vipinsh@google.com> Message-Id: <20230204014547.583711-5-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Add macro to generate KVM exit reason stringsSean Christopherson
Add and use a macro to generate the KVM exit reason strings array instead of relying on developers to correctly copy+paste+edit each string. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230204014547.583711-4-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Print expected and actual exit reason in KVM exit reason assertVipin Sharma
Print what KVM exit reason a test was expecting and what it actually got int TEST_ASSERT_KVM_EXIT_REASON(). Signed-off-by: Vipin Sharma <vipinsh@google.com> Message-Id: <20230204014547.583711-3-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Make vCPU exit reason test assertion commonVipin Sharma
Make TEST_ASSERT_KVM_EXIT_REASON() macro and replace all exit reason test assert statements with it. No functional changes intended. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Message-Id: <20230204014547.583711-2-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_testDavid Woodhouse
When kvm_xen_evtchn_send() takes the slow path because the shinfo GPC needs to be revalidated, it used to violate the SRCU vs. kvm->lock locking rules and potentially cause a deadlock. Now that lockdep is learning to catch such things, make sure that code path is exercised by the selftest. Link: https://lore.kernel.org/all/20230113124606.10221-2-dwmw2@infradead.org Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230204024151.1373296-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Use enum for test numbers in xen_shinfo_testDavid Woodhouse
The xen_shinfo_test started off with very few iterations, and the numbers we used in GUEST_SYNC() were precisely mapped to the RUNSTATE_xxx values anyway to start with. It has since grown quite a few more tests, and it's kind of awful to be handling them all as bare numbers. Especially when I want to add a new test in the middle. Define an enum for the test stages, and use it both in the guest code and the host switch statement. No functional change, if I can count to 24. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230204024151.1373296-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Add helpers to make Xen-style VMCALL/VMMCALL hypercallsSean Christopherson
Add wrappers to do hypercalls using VMCALL/VMMCALL and Xen's register ABI (as opposed to full Xen-style hypercalls through a hypervisor provided page). Using the common helpers dedups a pile of code, and uses the native hypercall instruction when running on AMD. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230204024151.1373296-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Move the guts of kvm_hypercall() to a separate macroSean Christopherson
Extract the guts of kvm_hypercall() to a macro so that Xen hypercalls, which have a different register ABI, can reuse the VMCALL vs. VMMCALL logic. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230204024151.1373296-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: SVM: WARN if GATag generation drops VM or vCPU ID informationSean Christopherson
WARN if generating a GATag given a VM ID and vCPU ID doesn't yield the same IDs when pulling the IDs back out of the tag. Don't bother adding error handling to callers, this is very much a paranoid sanity check as KVM fully controls the VM ID and is supposed to reject too-big vCPU IDs. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20230207002156.521736-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: SVM: Modify AVIC GATag to support max number of 512 vCPUsSuravee Suthikulpanit
Define AVIC_VCPU_ID_MASK based on AVIC_PHYSICAL_MAX_INDEX, i.e. the mask that effectively controls the largest guest physical APIC ID supported by x2AVIC, instead of hardcoding the number of bits to 8 (and the number of VM bits to 24). The AVIC GATag is programmed into the AMD IOMMU IRTE to provide a reference back to KVM in case the IOMMU cannot inject an interrupt into a non-running vCPU. In such a case, the IOMMU notifies software by creating a GALog entry with the corresponded GATag, and KVM then uses the GATag to find the correct VM+vCPU to kick. Dropping bit 8 from the GATag results in kicking the wrong vCPU when targeting vCPUs with x2APIC ID > 255. Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Cc: stable@vger.kernel.org Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20230207002156.521736-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: SVM: Fix a benign off-by-one bug in AVIC physical table maskSean Christopherson
Define the "physical table max index mask" as bits 8:0, not 9:0. x2AVIC currently supports a max of 512 entries, i.e. the max index is 511, and the inputs to GENMASK_ULL() are inclusive. The bug is benign as bit 9 is reserved and never set by KVM, i.e. KVM is just clearing bits that are guaranteed to be zero. Note, as of this writing, APM "Rev. 3.39-October 2022" incorrectly states that bits 11:8 are reserved in Table B-1. VMCB Layout, Control Area. I.e. that table wasn't updated when x2AVIC support was added. Opportunistically fix the comment for the max AVIC ID to align with the code, and clean up comment formatting too. Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Cc: stable@vger.kernel.org Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20230207002156.521736-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14selftests: KVM: skip hugetlb tests if huge pages are not availablePaolo Bonzini
Right now, if KVM memory stress tests are run with hugetlb sources but hugetlb is not available (either in the kernel or because /proc/sys/vm/nr_hugepages is 0) the test will fail with a memory allocation error. This makes it impossible to add tests that default to hugetlb-backed memory, because on a machine with a default configuration they will fail. Therefore, check HugePages_Total as well and, if zero, direct the user to enable hugepages in procfs. Furthermore, return KSFT_SKIP whenever hugetlb is not available. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: VMX: Use tabs instead of spaces for indentationRong Tao
Code indentation should use tabs where possible and miss a '*'. Signed-off-by: Rong Tao <rongtao@cestc.cn> Message-Id: <tencent_A492CB3F9592578451154442830EA1B02C07@qq.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: VMX: Fix indentation coding style issueRong Tao
Code indentation should use tabs where possible. Signed-off-by: Rong Tao <rongtao@cestc.cn> Message-Id: <tencent_31E6ACADCB6915E157CF5113C41803212107@qq.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: nVMX: remove unnecessary #ifdefPaolo Bonzini
nested_vmx_check_controls() has already run by the time KVM checks host state, so the "host address space size" exit control can only be set on x86-64 hosts. Simplify the condition at the cost of adding some dead code to 32-bit kernels. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: nVMX: add missing consistency checks for CR0 and CR4Paolo Bonzini
The effective values of the guest CR0 and CR4 registers may differ from those included in the VMCS12. In particular, disabling EPT forces CR4.PAE=1 and disabling unrestricted guest mode forces CR0.PG=CR0.PE=1. Therefore, checks on these bits cannot be delegated to the processor and must be performed by KVM. Reported-by: Reima ISHII <ishiir@g.ecc.u-tokyo.ac.jp> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14Merge tag 'kvmarm-fixes-6.3-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.3, part #1 A single patch to address a rather annoying bug w.r.t. guest timer offsetting. Effectively the synchronization of timer offsets between vCPUs was broken, leading to inconsistent timer reads within the VM.
2023-03-12Linux 6.3-rc2v6.3-rc2Linus Torvalds
2023-03-12wifi: cfg80211: Partial revert "wifi: cfg80211: Fix use after free for wext"Hector Martin
This reverts part of commit 015b8cc5e7c4 ("wifi: cfg80211: Fix use after free for wext") This commit broke WPA offload by unconditionally clearing the crypto modes for non-WEP connections. Drop that part of the patch. Signed-off-by: Hector Martin <marcan@marcan.st> Reported-by: Ilya <me@0upti.me> Reported-and-tested-by: Janne Grunau <j@jannau.net> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Fixes: 015b8cc5e7c4 ("wifi: cfg80211: Fix use after free for wext") Cc: stable@kernel.org Link: https://lore.kernel.org/linux-wireless/ZAx0TWRBlGfv7pNl@kroah.com/T/#m11e6e0915ab8fa19ce8bc9695ab288c0fe018edf Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-12Merge tag 'tpm-v6.3-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm fixes from Jarkko Sakkinen: "Two additional bug fixes for v6.3" * tag 'tpm-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: disable hwrng for fTPM on some AMD designs tpm/eventlog: Don't abort tpm_read_log on faulty ACPI address
2023-03-12tpm: disable hwrng for fTPM on some AMD designsMario Limonciello
AMD has issued an advisory indicating that having fTPM enabled in BIOS can cause "stuttering" in the OS. This issue has been fixed in newer versions of the fTPM firmware, but it's up to system designers to decide whether to distribute it. This issue has existed for a while, but is more prevalent starting with kernel 6.1 because commit b006c439d58db ("hwrng: core - start hwrng kthread also for untrusted sources") started to use the fTPM for hwrng by default. However, all uses of /dev/hwrng result in unacceptable stuttering. So, simply disable registration of the defective hwrng when detecting these faulty fTPM versions. As this is caused by faulty firmware, it is plausible that such a problem could also be reproduced by other TPM interactions, but this hasn't been shown by any user's testing or reports. It is hypothesized to be triggered more frequently by the use of the RNG because userspace software will fetch random numbers regularly. Intentionally continue to register other TPM functionality so that users that rely upon PCR measurements or any storage of data will still have access to it. If it's found later that another TPM functionality is exacerbating this problem a module parameter it can be turned off entirely and a module parameter can be introduced to allow users who rely upon fTPM functionality to turn it on even though this problem is present. Link: https://www.amd.com/en/support/kb/faq/pa-410 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216989 Link: https://lore.kernel.org/all/20230209153120.261904-1-Jason@zx2c4.com/ Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources") Cc: stable@vger.kernel.org Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Tested-by: reach622@mailcuk.com Tested-by: Bell <1138267643@qq.com> Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-03-12tpm/eventlog: Don't abort tpm_read_log on faulty ACPI addressMorten Linderud
tpm_read_log_acpi() should return -ENODEV when no eventlog from the ACPI table is found. If the firmware vendor includes an invalid log address we are unable to map from the ACPI memory and tpm_read_log() returns -EIO which would abort discovery of the eventlog. Change the return value from -EIO to -ENODEV when acpi_os_map_iomem() fails to map the event log. The following hardware was used to test this issue: Framework Laptop (Pre-production) BIOS: INSYDE Corp, Revision: 3.2 TPM Device: NTC, Firmware Revision: 7.2 Dump of the faulty ACPI TPM2 table: [000h 0000 4] Signature : "TPM2" [Trusted Platform Module hardware interface Table] [004h 0004 4] Table Length : 0000004C [008h 0008 1] Revision : 04 [009h 0009 1] Checksum : 2B [00Ah 0010 6] Oem ID : "INSYDE" [010h 0016 8] Oem Table ID : "TGL-ULT" [018h 0024 4] Oem Revision : 00000002 [01Ch 0028 4] Asl Compiler ID : "ACPI" [020h 0032 4] Asl Compiler Revision : 00040000 [024h 0036 2] Platform Class : 0000 [026h 0038 2] Reserved : 0000 [028h 0040 8] Control Address : 0000000000000000 [030h 0048 4] Start Method : 06 [Memory Mapped I/O] [034h 0052 12] Method Parameters : 00 00 00 00 00 00 00 00 00 00 00 00 [040h 0064 4] Minimum Log Length : 00010000 [044h 0068 8] Log Address : 000000004053D000 Fixes: 0cf577a03f21 ("tpm: Fix handling of missing event log") Tested-by: Erkki Eilonen <erkki@bearmetal.eu> Signed-off-by: Morten Linderud <morten@linderud.pw> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-03-12Merge tag 'xfs-6.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: - Fix a crash if mount time quotacheck fails when there are inodes queued for garbage collection. - Fix an off by one error when discarding folios after writeback failure. * tag 'xfs-6.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix off-by-one-block in xfs_discard_folio() xfs: quotacheck failure can race with background inode inactivation
2023-03-12Merge tag 'staging-6.3-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes and removal from Greg KH: "Here are four small staging driver fixes, and one big staging driver deletion for 6.3-rc2. The fixes are: - rtl8192e driver fixes for where the driver was attempting to execute various programs directly from the disk for unknown reasons - rtl8723bs driver fixes for issues found by Hans in testing The deleted driver is the removal of the r8188eu wireless driver as now in 6.3-rc1 we have a "real" wifi driver for one that includes support for many many more devices than this old driver did. So it's time to remove it as it is no longer needed. The maintainers of this driver all have acked its removal. Many thanks to them over the years for working to clean it up and keep it working while the real driver was being developed. All of these have been in linux-next this week with no reported problems" * tag 'staging-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: r8188eu: delete driver staging: rtl8723bs: Pass correct parameters to cfg80211_get_bss() staging: rtl8723bs: Fix key-store index handling staging: rtl8192e: Remove call_usermodehelper starting RadioPower.sh staging: rtl8192e: Remove function ..dm_check_ac_dc_power calling a script
2023-03-12Merge tag 'x86_urgent_for_v6.3_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: "A single erratum fix for AMD machines: - Disable XSAVES on AMD Zen1 and Zen2 machines due to an erratum. No impact to anything as those machines will fallback to XSAVEC which is equivalent there" * tag 'x86_urgent_for_v6.3_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/CPU/AMD: Disable XSAVES on AMD family 0x17
2023-03-12Merge tag 'kernel.fork.v6.3-rc2' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux Pull clone3 fix from Christian Brauner: "A simple fix for the clone3() system call. The CLONE_NEWTIME allows the creation of time namespaces. The flag reuses a bit from the CSIGNAL bits that are used in the legacy clone() system call to set the signal that gets sent to the parent after the child exits. The clone3() system call doesn't rely on CSIGNAL anymore as it uses a dedicated .exit_signal field in struct clone_args. So we blocked all CSIGNAL bits in clone3_args_valid(). When CLONE_NEWTIME was introduced and reused a CSIGNAL bit we forgot to adapt clone3_args_valid() causing CLONE_NEWTIME with clone3() to be rejected. Fix this" * tag 'kernel.fork.v6.3-rc2' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux: selftests/clone3: test clone3 with CLONE_NEWTIME fork: allow CLONE_NEWTIME in clone3 flags
2023-03-12Merge tag 'vfs.misc.v6.3-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs fixes from Christian Brauner: - When allocating pages for a watch queue failed, we didn't return an error causing userspace to proceed even though all subsequent notifcations would be lost. Make sure to return an error. - Fix a misformed tree entry for the idmapping maintainers entry. - When setting file leases from an idmapped mount via generic_setlease() we need to take the idmapping into account otherwise taking a lease would fail from an idmapped mount. - Remove two redundant assignments, one in splice code and the other in locks code, that static checkers complained about. * tag 'vfs.misc.v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: filelocks: use mount idmapping for setlease permission check fs/locks: Remove redundant assignment to cmd splice: Remove redundant assignment to ret MAINTAINERS: repair a malformed T: entry in IDMAPPED MOUNTS watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths
2023-03-12Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Bug fixes and regressions for ext4, the most serious of which is a potential deadlock during directory renames that was introduced during the merge window discovered by a combination of syzbot and lockdep" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: zero i_disksize when initializing the bootloader inode ext4: make sure fs error flag setted before clear journal error ext4: commit super block if fs record error when journal record without error ext4, jbd2: add an optimized bmap for the journal inode ext4: fix WARNING in ext4_update_inline_data ext4: move where set the MAY_INLINE_DATA flag is set ext4: Fix deadlock during directory rename ext4: Fix comment about the 64BIT feature docs: ext4: modify the group desc size to 64 ext4: fix another off-by-one fsmap error on 1k block filesystems ext4: fix RENAME_WHITEOUT handling for inline directories ext4: make kobj_type structures constant ext4: fix cgroup writeback accounting with fs-layer encryption
2023-03-12cpumask: relax sanity checking constraintsLinus Torvalds
The cpumask_check() was unnecessarily tight, and causes problems for the users of cpumask_next(). We have a number of users that take the previous return value of one of the bit scanning functions and subtract one to keep it in "range". But since the scanning functions end up returning up to 'small_cpumask_bits' instead of the tighter 'nr_cpumask_bits', the range really needs to be using that widened form. [ This "previous-1" behavior is also the reason we have all those comments about /* -1 is a legal arg here. */ and separate checks for that being ok. So we could have just made "small_cpumask_bits-1" be a similar special "don't check this" value. Tetsuo Handa even suggested a patch that only does that for cpumask_next(), since that seems to be the only actual case that triggers, but that all makes it even _more_ magical and special. So just relax the check ] One example of this kind of pattern being the 'c_start()' function in arch/x86/kernel/cpu/proc.c, but also duplicated in various forms on other architectures. Reported-by: syzbot+96cae094d90877641f32@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=96cae094d90877641f32 Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Link: https://lore.kernel.org/lkml/c1f4cc16-feea-b83c-82cf-1a1f007b7eb9@I-love.SAKURA.ne.jp/ Fixes: 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask optimizations") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-11Merge tag 'i2c-for-6.3-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "This marks the end of a transition to let I2C have the same probe semantics as other subsystems. Uwe took care that no drivers in the current tree nor in -next use the deprecated .probe call. So, it is a good time to switch to the new, standard semantics now. There is also a regression fix: - regression fix for the notifier handling of the I2C core - final coversions of drivers away from deprecated .probe - make .probe_new the standard probe and convert I2C core to use it * tag 'i2c-for-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: dev: Fix bus callback return values i2c: Convert drivers to new .probe() callback i2c: mux: Convert all drivers to new .probe() callback i2c: Switch .probe() to not take an id parameter media: i2c: ov2685: convert to i2c's .probe_new() media: i2c: ov5695: convert to i2c's .probe_new() w1: ds2482: Convert to i2c's .probe_new() serial: sc16is7xx: Convert to i2c's .probe_new() mtd: maps: pismo: Convert to i2c's .probe_new() misc: ad525x_dpot-i2c: Convert to i2c's .probe_new()
2023-03-11ubi: block: Fix missing blk_mq_end_requestRichard Weinberger
Switching to BLK_MQ_F_BLOCKING wrongly removed the call to blk_mq_end_request(). Add it back to have our IOs finished Fixes: 91cc8fbcc8c7 ("ubi: block: set BLK_MQ_F_BLOCKING") Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Daniel Palmer <daniel@0x0f.com> Link: https://lore.kernel.org/linux-mtd/CAHk-=wi29bbBNh3RqJKu3PxzpjDN5D5K17gEVtXrb7-6bfrnMQ@mail.gmail.com/ Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Daniel Palmer <daniel@0x0f.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-11KVM: arm64: timers: Convert per-vcpu virtual offset to a global valueMarc Zyngier
Having a per-vcpu virtual offset is a pain. It needs to be synchronized on each update, and expands badly to a setup where different timers can have different offsets, or have composite offsets (as with NV). So let's start by replacing the use of the CNTVOFF_EL2 shadow register (which we want to reclaim for NV anyway), and make the virtual timer carry a pointer to a VM-wide offset. This simplifies the code significantly. It also addresses two terrible bugs: - The use of CNTVOFF_EL2 leads to some nice offset corruption when the sysreg gets reset, as reported by Joey. - The kvm mutex is taken from a vcpu ioctl, which goes against the locking rules... Reported-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230224173915.GA17407@e124191.cambridge.arm.com Tested-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20230224191640.3396734-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>