summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-12-11KVM: mmu: Fix SPTE encoding of MMIO generation upper halfMaciej S. Szmigiero
Commit cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling") cleaned up the computation of MMIO generation SPTE masks, however it introduced a bug how the upper part was encoded: SPTE bits 52-61 were supposed to contain bits 10-19 of the current generation number, however a missing shift encoded bits 1-10 there instead (mostly duplicating the lower part of the encoded generation number that then consisted of bits 1-9). In the meantime, the upper part was shrunk by one bit and moved by subsequent commits to become an upper half of the encoded generation number (bits 9-17 of bits 0-17 encoded in a SPTE). In addition to the above, commit 56871d444bc4 ("KVM: x86: fix overlap between SPTE_MMIO_MASK and generation") has changed the SPTE bit range assigned to encode the generation number and the total number of bits encoded but did not update them in the comment attached to their defines, nor in the KVM MMU doc. Let's do it here, too, since it is too trivial thing to warrant a separate commit. Fixes: cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <156700708db2a5296c5ed7a8b9ac71f1e9765c85.1607129096.git.maciej.szmigiero@oracle.com> Cc: stable@vger.kernel.org [Reorganize macros so that everything is computed from the bit ranges. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10Merge tag 'kvmarm-fixes-5.10-5' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD kvm/arm64 fixes for 5.10, take #5 - Don't leak page tables on PTE update - Correctly invalidate TLBs on table to block transition - Only update permissions if the fault level matches the expected mapping size
2020-12-04kvm: x86/mmu: Use cpuid to determine max gfnRick Edgecombe
In the TDP MMU, use shadow_phys_bits to dermine the maximum possible GFN mapped in the guest for zapping operations. boot_cpu_data.x86_phys_bits may be reduced in the case of HW features that steal HPA bits for other purposes. However, this doesn't necessarily reduce GPA space that can be accessed via TDP. So zap based on a maximum gfn calculated with MAXPHYADDR retrieved from CPUID. This is already stored in shadow_phys_bits, so use it instead of x86_phys_bits. Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Message-Id: <20201203231120.27307-1-rick.p.edgecombe@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-04kvm: svm: de-allocate svm_cpu_data for all cpus in svm_cpu_uninit()Jacob Xu
The cpu arg for svm_cpu_uninit() was previously ignored resulting in the per cpu structure svm_cpu_data not being de-allocated for all cpus. Signed-off-by: Jacob Xu <jacobhxu@google.com> Message-Id: <20201203205939.1783969-1-jacobhxu@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-03selftests: kvm/set_memory_region_test: Fix race in move region testMaciej S. Szmigiero
The current memory region move test correctly handles the situation that the second (realigning) memslot move operation would temporarily trigger MMIO until it completes, however it does not handle the case in which the first (misaligning) move operation does this, too. This results in false test assertions in case it does so. Fix this by handling temporary MMIO from the first memslot move operation in the test guest code, too. Fixes: 8a0639fe9201 ("KVM: sefltests: Add explicit synchronization to move mem region test") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <0fdddb94bb0e31b7da129a809a308d91c10c0b5e.1606941224.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-02KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()Yanan Wang
If we get a FSC_PERM fault, just using (logging_active && writable) to determine calling kvm_pgtable_stage2_map(). There will be two more cases we should consider. (1) After logging_active is configged back to false from true. When we get a FSC_PERM fault with write_fault and adjustment of hugepage is needed, we should merge tables back to a block entry. This case is ignored by still calling kvm_pgtable_stage2_relax_perms(), which will lead to an endless loop and guest panic due to soft lockup. (2) We use (FSC_PERM && logging_active && writable) to determine collapsing a block entry into a table by calling kvm_pgtable_stage2_map(). But sometimes we may only need to relax permissions when trying to write to a page other than a block. In this condition,using kvm_pgtable_stage2_relax_perms() will be fine. The ISS filed bit[1:0] in ESR_EL2 regesiter indicates the stage2 lookup level at which a D-abort or I-abort occurred. By comparing granule of the fault lookup level with vma_pagesize, we can strictly distinguish conditions of calling kvm_pgtable_stage2_relax_perms() or kvm_pgtable_stage2_map(), and the above two cases will be well considered. Suggested-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201201201034.116760-4-wangyanan55@huawei.com
2020-12-02KVM: arm64: Fix handling of merging tables into a block entryYanan Wang
When dirty logging is enabled, we collapse block entries into tables as necessary. If dirty logging gets canceled, we can end-up merging tables back into block entries. When this happens, we must not only free the non-huge page-table pages but also invalidate all the TLB entries that can potentially cover the block. Otherwise, we end-up with multiple possible translations for the same physical page, which can legitimately result in a TLB conflict. To address this, replease the bogus invalidation by IPA with a full VM invalidation. Although this is pretty heavy handed, it happens very infrequently and saves a bunch of invalidations by IPA. Signed-off-by: Yanan Wang <wangyanan55@huawei.com> [maz: fixup commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201201201034.116760-3-wangyanan55@huawei.com
2020-12-02KVM: arm64: Fix memory leak on stage2 update of a valid PTEYanan Wang
When installing a new leaf PTE onto an invalid ptep, we need to get_page(ptep) to account for the new mapping. However, simply updating a valid PTE shouldn't result in any additional refcounting, as there is new mapping. This otherwise results in a page being forever wasted. Address this by fixing-up the refcount in stage2_map_walker_try_leaf() if the PTE was already valid, balancing out the later get_page() in stage2_map_walk_leaf(). Signed-off-by: Yanan Wang <wangyanan55@huawei.com> [maz: update commit message, add comment in the code] Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201201201034.116760-2-wangyanan55@huawei.com
2020-11-27kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PTVitaly Kuznetsov
Commit 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused the following WARNING on an Intel Ice Lake CPU: get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy: ------ spte 0xb80a0 level 5. ------ spte 0xfcd210107 level 4. ------ spte 0x1004c40107 level 3. ------ spte 0x1004c41107 level 2. ------ spte 0x1db00000000b83b6 level 1. WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm] ... Call Trace: ? kvm_io_bus_get_first_dev+0x55/0x110 [kvm] vcpu_enter_guest+0xaa1/0x16a0 [kvm] ? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel] ? skip_emulated_instruction+0xaa/0x150 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm] The guest triggering this crashes. Note, this happens with the traditional MMU and EPT enabled, not with the newly introduced TDP MMU. Turns out, there was a subtle change in the above mentioned commit. Previously, walk_shadow_page_get_mmio_spte() was setting 'root' to 'iterator.level' which is returned by shadow_walk_init() and this equals to 'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to 'int root = vcpu->arch.mmu->root_level'. The difference between 'root_level' and 'shadow_root_level' on CPUs supporting 5-level page tables is that in some case we don't want to use 5-level, in particular when 'cpuid_maxphyaddr(vcpu) <= 48' kvm_mmu_get_tdp_level() returns '4'. In case upper layer is not used, the corresponding SPTE will fail '__is_rsvd_bits_set()' check. Revert to using 'shadow_root_level'. Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-27KVM: x86: Fix split-irqchip vs interrupt injection window requestPaolo Bonzini
kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are a hodge-podge of conditions, hacked together to get something that more or less works. But what is actually needed is much simpler; in both cases the fundamental question is, do we have a place to stash an interrupt if userspace does KVM_INTERRUPT? In userspace irqchip mode, that is !vcpu->arch.interrupt.injected. Currently kvm_event_needs_reinjection(vcpu) covers it, but it is unnecessarily restrictive. In split irqchip mode it's a bit more complicated, we need to check kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK cycle and thus requires ExtINTs not to be masked) as well as !pending_userspace_extint(vcpu). However, there is no need to check kvm_event_needs_reinjection(vcpu), since split irqchip keeps pending ExtINT state separate from event injection state, and checking kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher priority than APIC interrupts. In fact the latter fixes a bug: when userspace requests an IRQ window vmexit, an interrupt in the local APIC can cause kvm_cpu_has_interrupt() to be true and thus kvm_vcpu_ready_for_interrupt_injection() to return false. When this happens, vcpu_run does not exit to userspace but the interrupt window vmexits keep occurring. The VM loops without any hope of making progress. Once we try to fix these with something like return kvm_arch_interrupt_allowed(vcpu) && - !kvm_cpu_has_interrupt(vcpu) && - !kvm_event_needs_reinjection(vcpu) && - kvm_cpu_accept_dm_intr(vcpu); + (!lapic_in_kernel(vcpu) + ? !vcpu->arch.interrupt.injected + : (kvm_apic_accept_pic_intr(vcpu) + && !pending_userspace_extint(v))); we realize two things. First, thanks to the previous patch the complex conditional can reuse !kvm_cpu_has_extint(vcpu). Second, the interrupt window request in vcpu_enter_guest() bool req_int_win = dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); should be kept in sync with kvm_vcpu_ready_for_interrupt_injection(): it is unnecessary to ask the processor for an interrupt window if we would not be able to return to userspace. Therefore, kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu) ANDed with the existing check for masked ExtINT. It all makes sense: - we can accept an interrupt from userspace if there is a place to stash it (and, for irqchip split, ExtINTs are not masked). Interrupts from userspace _can_ be accepted even if right now EFLAGS.IF=0. - in order to tell userspace we will inject its interrupt ("IRQ window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both KVM and the vCPU need to be ready to accept the interrupt. ... and this is what the patch implements. Reported-by: David Woodhouse <dwmw@amazon.co.uk> Analyzed-by: David Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: David Woodhouse <dwmw@amazon.co.uk>
2020-11-27KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extintPaolo Bonzini
Centralize handling of interrupts from the userspace APIC in kvm_cpu_has_extint and kvm_cpu_get_extint, since userspace APIC interrupts are handled more or less the same as ExtINTs are with split irqchip. This removes duplicated code from kvm_cpu_has_injectable_intr and kvm_cpu_has_interrupt, and makes the code more similar between kvm_cpu_has_{extint,interrupt} on one side and kvm_cpu_get_{extint,interrupt} on the other. Cc: stable@vger.kernel.org Reviewed-by: Filippo Sironi <sironi@amazon.de> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-27Merge tag 'kvmarm-fixes-5.10-4' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for v5.10, take #4 - Fix alignment of the new HYP sections - Fix GICR_TYPER access from userspace
2020-11-20MAINTAINERS: Update email address for Sean ChristophersonSean Christopherson
Update my email address to one provided by my new benefactor. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: kvm@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201119183707.291864-1-sean.kvm@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-18Merge tag 'kvm-s390-master-5.10-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master KVM: s390: Fix for destroy page ultravisor call - handle response code from older firmware - add uv.c to KVM: s390/s390 maintainer list
2020-11-18MAINTAINERS: add uv.c also to KVM/s390Christian Borntraeger
Most changes in uv.c are related to KVM. Involve also the KVM team regarding changes to uv.c. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-18s390/uv: handle destroy page legacy interfaceChristian Borntraeger
Older firmware can return rc=0x107 rrc=0xd for destroy page if the page is already non-secure. This should be handled like a success as already done by newer firmware. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: 1a80b54d1ce1 ("s390/uv: add destroy page call") Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
2020-11-17KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspaceZenghui Yu
It was recently reported that if GICR_TYPER is accessed before the RD base address is set, we'll suffer from the unset @rdreg dereferencing. Oops... gpa_t last_rdist_typer = rdreg->base + GICR_TYPER + (rdreg->free_index - 1) * KVM_VGIC_V3_REDIST_SIZE; It's "expected" that users will access registers in the redistributor if the RD has been properly configured (e.g., the RD base address is set). But it hasn't yet been covered by the existing documentation. Per discussion on the list [1], the reporting of the GICR_TYPER.Last bit for userspace never actually worked. And it's difficult for us to emulate it correctly given that userspace has the flexibility to access it any time. Let's just drop the reporting of the Last bit for userspace for now (userspace should have full knowledge about it anyway) and it at least prevents kernel from panic ;-) [1] https://lore.kernel.org/kvmarm/c20865a267e44d1e2c0d52ce4e012263@kernel.org/ Fixes: ba7b3f1275fd ("KVM: arm/arm64: Revisit Redistributor TYPER last bit computation") Reported-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20201117151629.1738-1-yuzenghui@huawei.com Cc: stable@vger.kernel.org
2020-11-17KVM: SVM: fix error return code in svm_create_vcpu()Chen Zhou
Fix to return a negative error code from the error handling case instead of 0 in function svm_create_vcpu(), as done elsewhere in this function. Fixes: f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Message-Id: <20201117025426.167824-1-chenzhou10@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-16KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().Ashish Kalra
Fix offset computation in __sev_dbg_decrypt() to include the source paddr before it is rounded down to be aligned to 16 bytes as required by SEV API. This fixes incorrect guest memory dumps observed when using qemu monitor. Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <20201110224205.29444-1-Ashish.Kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-16Merge tag 'kvm-s390-master-5.10-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master KVM: s390: Fixes for 5.10 - do not reset the global diag318 data for per-cpu reset - do not mark memory as protected too early
2020-11-16KVM: arm64: Correctly align nVHE percpu dataJamie Iles
The nVHE percpu data is partially linked but the nVHE linker script did not align the percpu section. The PERCPU_INPUT macro would then align the data to a page boundary: #define PERCPU_INPUT(cacheline) \ __per_cpu_start = .; \ *(.data..percpu..first) \ . = ALIGN(PAGE_SIZE); \ *(.data..percpu..page_aligned) \ . = ALIGN(cacheline); \ *(.data..percpu..read_mostly) \ . = ALIGN(cacheline); \ *(.data..percpu) \ *(.data..percpu..shared_aligned) \ PERCPU_DECRYPTED_SECTION \ __per_cpu_end = .; but then when the final vmlinux linking happens the hypervisor percpu data is included after page alignment and so the offsets potentially don't match. On my build I saw that the .hyp.data..percpu section was at address 0x20 and then the percpu data would begin at 0x1000 (because of the page alignment in PERCPU_INPUT), but when linked into vmlinux, everything would be shifted down by 0x20 bytes. This manifests as one of the CPUs getting lost when running kvm-unit-tests or starting any VM and subsequent soft lockup on a Cortex A72 device. Fixes: 30c953911c43 ("kvm: arm64: Set up hyp percpu data for nVHE") Signed-off-by: Jamie Iles <jamie@nuviainc.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: David Brazdil <dbrazdil@google.com> Cc: David Brazdil <dbrazdil@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201113150406.14314-1-jamie@nuviainc.com
2020-11-15kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in usePaolo Bonzini
In some cases where shadow paging is in use, the root page will be either mmu->pae_root or vcpu->arch.mmu->lm_root. Then it will not have an associated struct kvm_mmu_page, because it is allocated with alloc_page instead of kvm_mmu_alloc_page. Just return false quickly from is_tdp_mmu_root if the TDP MMU is not in use, which also includes the case where shadow paging is enabled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guestsBabu Moger
For AMD SEV guests, update the cr3_lm_rsvd_bits to mask the memory encryption bit in reserved bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <160521948301.32054.5783800787423231162.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_archBabu Moger
SEV guests fail to boot on a system that supports the PCID feature. While emulating the RSM instruction, KVM reads the guest CR3 and calls kvm_set_cr3(). If the vCPU is in the long mode, kvm_set_cr3() does a sanity check for the CR3 value. In this case, it validates whether the value has any reserved bits set. The reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory encryption is enabled, the memory encryption bit is set in the CR3 value. The memory encryption bit may fall within the KVM reserved bit range, causing the KVM emulation failure. Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will cache the reserved bits in the CR3 value. This will be initialized to rsvd_bits(cpuid_maxphyaddr(vcpu), 63). If the architecture has any special bits(like AMD SEV encryption bit) that needs to be masked from the reserved bits, should be cleared in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler. Fixes: a780a3ea628268b2 ("KVM: X86: Fix reserved bits check for MOV to CR3") Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13KVM: x86: clflushopt should be treated as a no-op by emulationDavid Edmondson
The instruction emulator ignores clflush instructions, yet fails to support clflushopt. Treat both similarly. Fixes: 13e457e0eebf ("KVM: x86: Emulator does not decode clflush well") Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20201103120400.240882-1-david.edmondson@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13Merge tag 'kvmarm-fixes-5.10-3' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for v5.10, take #3 - Allow userspace to downgrade ID_AA64PFR0_EL1.CSV2 - Inject UNDEF on SCXTNUM_ELx access
2020-11-12Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds
Pull fscrypt fix from Eric Biggers: "Fix a regression where new files weren't using inline encryption when they should be" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: fix inline encryption not used on new files
2020-11-12Merge tag 'gfs2-v5.10-rc3-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Fix jdata data corruption and glock reference leak" * tag 'gfs2-v5.10-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix case in which ail writes are done to jdata holes Revert "gfs2: Ignore journal log writes for jdata holes" gfs2: fix possible reference leak in gfs2_check_blk_type
2020-11-12Merge tag 'net-5.10-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Current release - regressions: - arm64: dts: fsl-ls1028a-kontron-sl28: specify in-band mode for ENETC Current release - bugs in new features: - mptcp: provide rmem[0] limit offset to fix oops Previous release - regressions: - IPv6: Set SIT tunnel hard_header_len to zero to fix path MTU calculations - lan743x: correctly handle chips with internal PHY - bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE - mlx5e: Fix VXLAN port table synchronization after function reload Previous release - always broken: - bpf: Zero-fill re-used per-cpu map element - fix out-of-order UDP packets when forwarding with UDP GSO fraglists turned on: - fix UDP header access on Fast/frag0 UDP GRO - fix IP header access and skb lookup on Fast/frag0 UDP GRO - ethtool: netlink: add missing netdev_features_change() call - net: Update window_clamp if SOCK_RCVBUF is set - igc: Fix returning wrong statistics - ch_ktls: fix multiple leaks and corner cases in Chelsio TLS offload - tunnels: Fix off-by-one in lower MTU bounds for ICMP/ICMPv6 replies - r8169: disable hw csum for short packets on all chip versions - vrf: Fix fast path output packet handling with async Netfilter rules" * tag 'net-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits) lan743x: fix use of uninitialized variable net: udp: fix IP header access and skb lookup on Fast/frag0 UDP GRO net: udp: fix UDP header access on Fast/frag0 UDP GRO devlink: Avoid overwriting port attributes of registered port vrf: Fix fast path output packet handling with async Netfilter rules cosa: Add missing kfree in error path of cosa_write net: switch to the kernel.org patchwork instance ch_ktls: stop the txq if reaches threshold ch_ktls: tcb update fails sometimes ch_ktls/cxgb4: handle partial tag alone SKBs ch_ktls: don't free skb before sending FIN ch_ktls: packet handling prior to start marker ch_ktls: Correction in middle record handling ch_ktls: missing handling of header alone ch_ktls: Correction in trimmed_len calculation cxgb4/ch_ktls: creating skbs causes panic ch_ktls: Update cheksum information ch_ktls: Correction in finding correct length cxgb4/ch_ktls: decrypted bit is not enough net/x25: Fix null-ptr-deref in x25_connect ...
2020-11-12Merge tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Anna Schumaker: "Stable fixes: - Fix failure to unregister shrinker Other fixes: - Fix unnecessary locking to clear up some contention - Fix listxattr receive buffer size - Fix default mount options for nfsroot" * tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Remove unnecessary inode lock in nfs_fsync_dir() NFS: Remove unnecessary inode locking in nfs_llseek_dir() NFS: Fix listxattr receive buffer size NFSv4.2: fix failure to unregister shrinker nfsroot: Default mount option should ask for built-in NFS version
2020-11-12KVM: arm64: Handle SCXTNUM_ELx trapsMarc Zyngier
As the kernel never sets HCR_EL2.EnSCXT, accesses to SCXTNUM_ELx will trap to EL2. Let's handle that as gracefully as possible by injecting an UNDEF exception into the guest. This is consistent with the guest's view of ID_AA64PFR0_EL1.CSV2 being at most 1. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201110141308.451654-4-maz@kernel.org
2020-11-12KVM: arm64: Unify trap handlers injecting an UNDEFMarc Zyngier
A large number of system register trap handlers only inject an UNDEF exeption, and yet each class of sysreg seems to provide its own, identical function. Let's unify them all, saving us introducing yet another one later. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201110141308.451654-3-maz@kernel.org
2020-11-12KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspaceMarc Zyngier
We now expose ID_AA64PFR0_EL1.CSV2=1 to guests running on hosts that are immune to Spectre-v2, but that don't have this field set, most likely because they predate the specification. However, this prevents the migration of guests that have started on a host the doesn't fake this CSV2 setting to one that does, as KVM rejects the write to ID_AA64PFR0_EL2 on the grounds that it isn't what is already there. In order to fix this, allow userspace to set this field as long as this doesn't result in a promising more than what is already there (setting CSV2 to 0 is acceptable, but setting it to 1 when it is already set to 0 isn't). Fixes: e1026237f9067 ("KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2") Reported-by: Peng Liang <liangpeng10@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201110141308.451654-2-maz@kernel.org
2020-11-12Merge tag 'v5.10-rc1' into kvmarm-master/nextMarc Zyngier
Linux 5.10-rc1 Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-12Merge tag 'acpi-5.10-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These are mostly docmentation fixes and janitorial changes plus some new device IDs and a new quirk. Specifics: - Fix documentation regarding GPIO properties (Andy Shevchenko) - Fix spelling mistakes in ACPI documentation (Flavio Suligoi) - Fix white space inconsistencies in ACPI code (Maximilian Luz) - Fix string formatting in the ACPI Generic Event Device (GED) driver (Nick Desaulniers) - Add Intel Alder Lake device IDs to the ACPI drivers used by the Dynamic Platform and Thermal Framework (Srinivas Pandruvada) - Add lid-related DMI quirk for Medion Akoya E2228T to the ACPI button driver (Hans de Goede)" * tag 'acpi-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: DPTF: Support Alder Lake Documentation: ACPI: fix spelling mistakes ACPI: button: Add DMI quirk for Medion Akoya E2228T ACPI: GED: fix -Wformat ACPI: Fix whitespace inconsistencies ACPI: scan: Fix acpi_dma_configure_id() kerneldoc name Documentation: firmware-guide: gpio-properties: Clarify initial output state Documentation: firmware-guide: gpio-properties: active_low only for GpioIo() Documentation: firmware-guide: gpio-properties: Fix factual mistakes
2020-11-12Merge tag 'pm-5.10-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Make the intel_pstate driver behave as expected when it operates in the passive mode with HWP enabled and the 'powersave' governor on top of it" * tag 'pm-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Take CPUFREQ_GOV_STRICT_TARGET into account cpufreq: Add strict_target to struct cpufreq_policy cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET cpufreq: Introduce governor flags
2020-11-12lan743x: fix use of uninitialized variableSven Van Asbroeck
When no devicetree is present, the driver will use an uninitialized variable. Fix by initializing this variable. Fixes: 902a66e08cea ("lan743x: correctly handle chips with internal PHY") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com> Link: https://lore.kernel.org/r/20201112152513.1941-1-TheSven73@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12Merge branch 'net-udp-fix-fast-frag0-udp-gro'Jakub Kicinski
Alexander Lobakin says: ==================== net: udp: fix Fast/frag0 UDP GRO While testing UDP GSO fraglists forwarding through driver that uses Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order iperf packets: [ ID] Interval Transfer Bitrate Jitter [SUM] 0.0-40.0 sec 12106 datagrams received out-of-order Simple switch to napi_gro_receive() or any other method without frag0 shortcut completely resolved them. I've found two incorrect header accesses in GRO receive callback(s): - udp_hdr() (instead of udp_gro_udphdr()) that always points to junk in "fast" mode and could probably do this in "regular". This was the actual bug that caused all out-of-order delivers; - udp{4,6}_lib_lookup_skb() -> ip{,v6}_hdr() (instead of skb_gro_network_header()) that potentionally might return odd pointers in both modes. Each patch addresses one of these two issues. This doesn't cover a support for nested tunnels as it's out of the subject and requires more invasive changes. It will be handled separately in net-next series. Credits: Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Since v4 [0]: - split the fix into two logical ones (Willem); - replace ternaries with plain ifs to beautify the code (Jakub); - drop p->data part to reintroduce it later in abovementioned set. Since v3 [1]: - restore the original {,__}udp{4,6}_lib_lookup_skb() and use private versions of them inside GRO code (Willem). Since v2 [2]: - dropped redundant check introduced in v2 as it's performed right before (thanks to Eric); - udp_hdr() switched to data + off for skbs from list (also Eric); - fixed possible malfunction of {,__}udp{4,6}_lib_lookup_skb() with Fast/frag0 due to ip{,v6}_hdr() usage (Willem). Since v1 [3]: - added a NULL pointer check for "uh" as suggested by Willem. [0] https://lore.kernel.org/netdev/Ha2hou5eJPcblo4abjAqxZRzIl1RaLs2Hy0oOAgFs@cp4-web-036.plabs.ch [1] https://lore.kernel.org/netdev/MgZce9htmEtCtHg7pmWxXXfdhmQ6AHrnltXC41zOoo@cp7-web-042.plabs.ch [2] https://lore.kernel.org/netdev/0eaG8xtbtKY1dEKCTKUBubGiC9QawGgB3tVZtNqVdY@cp4-web-030.plabs.ch [3] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch ==================== Link: https://lore.kernel.org/r/hjGOh0iCOYyo1FPiZh6TMXcx3YCgNs1T1eGKLrDz8@cp4-web-037.plabs.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12net: udp: fix IP header access and skb lookup on Fast/frag0 UDP GROAlexander Lobakin
udp{4,6}_lib_lookup_skb() use ip{,v6}_hdr() to get IP header of the packet. While it's probably OK for non-frag0 paths, this helpers will also point to junk on Fast/frag0 GRO when all headers are located in frags. As a result, sk/skb lookup may fail or give wrong results. To support both GRO modes, skb_gro_network_header() might be used. To not modify original functions, add private versions of udp{4,6}_lib_lookup_skb() only to perform correct sk lookups on GRO. Present since the introduction of "application-level" UDP GRO in 4.7-rc1. Misc: replace totally unneeded ternaries with plain ifs. Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket") Suggested-by: Willem de Bruijn <willemb@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12net: udp: fix UDP header access on Fast/frag0 UDP GROAlexander Lobakin
UDP GRO uses udp_hdr(skb) in its .gro_receive() callback. While it's probably OK for non-frag0 paths (when all headers or even the entire frame are already in skb head), this inline points to junk when using Fast GRO (napi_gro_frags() or napi_gro_receive() with only Ethernet header in skb head and all the rest in the frags) and breaks GRO packet compilation and the packet flow itself. To support both modes, skb_gro_header_fast() + skb_gro_header_slow() are typically used. UDP even has an inline helper that makes use of them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr() to get rid of the out-of-order delivers. Present since the introduction of plain UDP GRO in 5.0-rc1. Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12gfs2: Fix case in which ail writes are done to jdata holesBob Peterson
Patch b2a846dbef4e ("gfs2: Ignore journal log writes for jdata holes") tried (unsuccessfully) to fix a case in which writes were done to jdata blocks, the blocks are sent to the ail list, then a punch_hole or truncate operation caused the blocks to be freed. In other words, the ail items are for jdata holes. Before b2a846dbef4e, the jdata hole caused function gfs2_block_map to return -EIO, which was eventually interpreted as an IO error to the journal, and then withdraw. This patch changes function gfs2_get_block_noalloc, which is only used for jdata writes, so it returns -ENODATA rather than -EIO, and when -ENODATA is returned to gfs2_ail1_start_one, the error is ignored. We can safely ignore it because gfs2_ail1_start_one is only called when the jdata pages have already been written and truncated, so the ail1 content no longer applies. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-12Revert "gfs2: Ignore journal log writes for jdata holes"Bob Peterson
This reverts commit b2a846dbef4ef54ef032f0f5ee188c609a0278a7. That commit changed the behavior of function gfs2_block_map to return -ENODATA in cases where a hole (IOMAP_HOLE) is encountered and create is false. While that fixed the intended problem for jdata, it also broke other callers of gfs2_block_map such as some jdata block reads. Before the patch, an encountered hole would be skipped and the buffer seen as unmapped by the caller. The patch changed the behavior to return -ENODATA, which is interpreted as an error by the caller. The -ENODATA return code should be restricted to the specific case where jdata holes are encountered during ail1 writes. That will be done in a later patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-12Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2020-11-10 This series contains updates to i40e and igc drivers and the MAINTAINERS file. Slawomir fixes updating VF MAC addresses to fix various issues related to reporting and setting of these addresses for i40e. Dan Carpenter fixes a possible used before being initialized issue for i40e. Vinicius fixes reporting of netdev stats for igc. Tony updates repositories for Intel Ethernet Drivers. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: MAINTAINERS: Update repositories for Intel Ethernet Drivers igc: Fix returning wrong statistics i40e, xsk: uninitialized variable in i40e_clean_rx_irq_zc() i40e: Fix MAC address setting for a VF via Host/VM ==================== Link: https://lore.kernel.org/r/20201111001955.533210-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12devlink: Avoid overwriting port attributes of registered portParav Pandit
Cited commit in fixes tag overwrites the port attributes for the registered port. Avoid such error by checking registered flag before setting attributes. Fixes: 71ad8d55f8e5 ("devlink: Replace devlink_port_attrs_set parameters with a struct") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20201111034744.35554-1-parav@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12vrf: Fix fast path output packet handling with async Netfilter rulesMartin Willi
VRF devices use an optimized direct path on output if a default qdisc is involved, calling Netfilter hooks directly. This path, however, does not consider Netfilter rules completing asynchronously, such as with NFQUEUE. The Netfilter okfn() is called for asynchronously accepted packets, but the VRF never passes that packet down the stack to send it out over the slave device. Using the slower redirect path for this seems not feasible, as we do not know beforehand if a Netfilter hook has asynchronously completing rules. Fix the use of asynchronously completing Netfilter rules in OUTPUT and POSTROUTING by using a special completion function that additionally calls dst_output() to pass the packet down the stack. Also, slightly adjust the use of nf_reset_ct() so that is called in the asynchronous case, too. Fixes: dcdd43c41e60 ("net: vrf: performance improvements for IPv4") Fixes: a9ec54d1b0cd ("net: vrf: performance improvements for IPv6") Signed-off-by: Martin Willi <martin@strongswan.org> Link: https://lore.kernel.org/r/20201106073030.3974927-1-martin@strongswan.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-12NFS: Remove unnecessary inode lock in nfs_fsync_dir()Trond Myklebust
nfs_inc_stats() is already thread-safe, and there are no other reasons to hold the inode lock here. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-12NFS: Remove unnecessary inode locking in nfs_llseek_dir()Trond Myklebust
Remove the contentious inode lock, and instead provide thread safety using the file->f_lock spinlock. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-12NFS: Fix listxattr receive buffer sizeChuck Lever
Certain NFSv4.2/RDMA tests fail with v5.9-rc1. rpcrdma_convert_kvec() runs off the end of the rl_segments array because rq_rcv_buf.tail[0].iov_len holds a very large positive value. The resultant kernel memory corruption is enough to crash the client system. Callers of rpc_prepare_reply_pages() must reserve an extra XDR_UNIT in the maximum decode size for a possible XDR pad of the contents of the xdr_buf's pages. That guarantees the allocated receive buffer will be large enough to accommodate the usual contents plus that XDR pad word. encode_op_hdr() cannot add that extra word. If it does, xdr_inline_pages() underruns the length of the tail iovec. Fixes: 3e1f02123fba ("NFSv4.2: add client side XDR handling for extended attributes") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-12NFSv4.2: fix failure to unregister shrinkerJ. Bruce Fields
We forgot to unregister the nfs4_xattr_large_entry_shrinker. That leaves the global list of shrinkers corrupted after unload of the nfs module, after which possibly unrelated code that calls register_shrinker() or unregister_shrinker() gets a BUG() with "supervisor write access in kernel mode". And similarly for the nfs4_xattr_large_entry_lru. Reported-by: Kris Karas <bugs-a17@moonlit-rail.com> Tested-By: Kris Karas <bugs-a17@moonlit-rail.com> Fixes: 95ad37f90c33 "NFSv4.2: add client side xattr caching." Signed-off-by: J. Bruce Fields <bfields@redhat.com> CC: stable@vger.kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-11-12Merge branches 'acpi-scan', 'acpi-misc', 'acpi-button' and 'acpi-dptf'Rafael J. Wysocki
* acpi-scan: ACPI: scan: Fix acpi_dma_configure_id() kerneldoc name * acpi-misc: ACPI: GED: fix -Wformat ACPI: Fix whitespace inconsistencies * acpi-button: ACPI: button: Add DMI quirk for Medion Akoya E2228T * acpi-dptf: ACPI: DPTF: Support Alder Lake