summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-08-20Merge branch kvm-arm64/mmu/kmemleak-pkvm into kvmarm-master/nextMarc Zyngier
Prevent kmemleak from peeking into the HYP data, which is fatal in protected mode. * kvm-arm64/mmu/kmemleak-pkvm: KVM: arm64: Unregister HYP sections from kmemleak in protected mode arm64: Move .hyp.rodata outside of the _sdata.._edata range Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20Merge branch kvm-arm64/misc-5.15 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-5.15: : Misc improvements for 5.15: : : - Account the number of VMID-wide TLB invalidations as : remote TLB flushes : - Fix comments in the VGIC code : - Cleanup the PMU IMPDEF identification : - Streamline the TGRAN2 usage : - Avoid advertising a 52bit IPA range for non-64KB configs : - Avoid spurious signalling when a HW-mapped interrupt is in the : A+P state on entry, and in the P state on exit, but that the : physical line is not pending anymore. : - Bunch of minor cleanups KVM: arm64: vgic: Resample HW pending state on deactivation KVM: arm64: vgic: Drop WARN from vgic_get_irq KVM: arm64: Drop unused REQUIRES_VIRT KVM: arm64: Drop check_kvm_target_cpu() based percpu probe KVM: arm64: Drop init_common_resources() KVM: arm64: Use ARM64_MIN_PARANGE_BITS as the minimum supported IPA arm64/mm: Add remaining ID_AA64MMFR0_PARANGE_ macros KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page size arm64/mm: Define ID_AA64MMFR0_TGRAN_2_SHIFT KVM: arm64: perf: Replace '0xf' instances with ID_AA64DFR0_PMUVER_IMP_DEF KVM: arm64: Fix comments related to GICv2 PMR reporting KVM: arm64: Count VMID-wide TLB invalidations arm64/kexec: Test page size support with new TGRAN range values Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20Merge branch kvm-arm64/mmu/mapping-levels into kvmarm-master/nextMarc Zyngier
Revamp the KVM/arm64 THP code by parsing the userspace page tables instead of relying on an infrastructure that is about to disappear (we are the last user). * kvm-arm64/mmu/mapping-levels: KVM: Get rid of kvm_get_pfn() KVM: arm64: Use get_page() instead of kvm_get_pfn() KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap() KVM: arm64: Avoid mapping size adjustment on permission fault KVM: arm64: Walk userspace page tables to compute the THP mapping size KVM: arm64: Introduce helper to retrieve a PTE and its level Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20Merge branch kvm-arm64/pmu/reset-values into kvmarm-master/nextMarc Zyngier
Fix the reset values for our PMU emulation. As a side effect, it allows a nice optimisation by only tracking the in-use counters when flipping them on and off, now that we are guaranteed not to have any spurious bit set. * kvm-arm64/pmu/reset-values: KVM: arm64: Remove PMSWINC_EL0 shadow register KVM: arm64: Disabling disabled PMU counters wastes a lot of time KVM: arm64: Drop unnecessary masking of PMU registers KVM: arm64: Narrow PMU sysreg reset values to architectural requirements Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20Merge branch arm64/for-next/sysreg into kvm-arm64/misc-5.15Marc Zyngier
Merge the arm64/for-next/sysreg branch to avoid merge conflicts in -next and upstream. * arm64/for-next/sysreg: arm64/kexec: Test page size support with new TGRAN range values Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20KVM: arm64: vgic: Resample HW pending state on deactivationMarc Zyngier
When a mapped level interrupt (a timer, for example) is deactivated by the guest, the corresponding host interrupt is equally deactivated. However, the fate of the pending state still needs to be dealt with in SW. This is specially true when the interrupt was in the active+pending state in the virtual distributor at the point where the guest was entered. On exit, the pending state is potentially stale (the guest may have put the interrupt in a non-pending state). If we don't do anything, the interrupt will be spuriously injected in the guest. Although this shouldn't have any ill effect (spurious interrupts are always possible), we can improve the emulation by detecting the deactivation-while-pending case and resample the interrupt. While we're at it, move the logic into a common helper that can be shared between the two GIC implementations. Fixes: e40cc57bac79 ("KVM: arm/arm64: vgic: Support level-triggered mapped interrupts") Reported-by: Raghavendra Rao Ananta <rananta@google.com> Tested-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210819180305.1670525-1-maz@kernel.org
2021-08-19KVM: arm64: vgic: Drop WARN from vgic_get_irqRicardo Koller
vgic_get_irq(intid) is used all over the vgic code in order to get a reference to a struct irq. It warns whenever intid is not a valid number (like when it's a reserved IRQ number). The issue is that this warning can be triggered from userspace (e.g., KVM_IRQ_LINE for intid 1020). Drop the WARN call from vgic_get_irq. Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210818213205.598471-1-ricarkol@google.com
2021-08-18KVM: arm64: Drop unused REQUIRES_VIRTAnshuman Khandual
This seems like a residue from the past. REQUIRES_VIRT is no more available . Hence it can just be dropped along with the related code section. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-6-git-send-email-anshuman.khandual@arm.com
2021-08-18KVM: arm64: Drop check_kvm_target_cpu() based percpu probeAnshuman Khandual
kvm_target_cpu() never returns a negative error code, so check_kvm_target() would never have 'ret' filled with a negative error code. Hence the percpu probe via check_kvm_target_cpu() does not make sense as its never going to find an unsupported CPU, forcing kvm_arch_init() to exit early. Hence lets just drop this percpu probe (and also check_kvm_target_cpu()) altogether. While here, this also changes kvm_target_cpu() return type to a u32, making it explicit that an error code will not be returned from this function. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-5-git-send-email-anshuman.khandual@arm.com
2021-08-18KVM: arm64: Drop init_common_resources()Anshuman Khandual
Could do without this additional indirection via init_common_resources() by just calling kvm_set_ipa_limit() directly instead. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-4-git-send-email-anshuman.khandual@arm.com
2021-08-18KVM: arm64: Use ARM64_MIN_PARANGE_BITS as the minimum supported IPAAnshuman Khandual
Drop hard coded value for the minimum supported IPA range bits (i.e 32). Instead use ARM64_MIN_PARANGE_BITS which improves the code readability. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-3-git-send-email-anshuman.khandual@arm.com
2021-08-18arm64/mm: Add remaining ID_AA64MMFR0_PARANGE_ macrosAnshuman Khandual
Currently there are macros only for 48 and 52 bits parange value extracted from the ID_AA64MMFR0.PARANGE field. This change completes the enumeration and updates the helper id_aa64mmfr0_parange_to_phys_shift(). While here it also defines ARM64_MIN_PARANGE_BITS as the absolute minimum shift value PA range which could be supported on a given platform. Cc: Marc Zyngier <maz@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-2-git-send-email-anshuman.khandual@arm.com
2021-08-11KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page sizeAnshuman Khandual
Even though ID_AA64MMFR0.PARANGE reports 52 bit PA size support, it cannot be enabled as guest IPA size on 4K or 16K page size configurations. Hence kvm_ipa_limit must be restricted to 48 bits. This change achieves required IPA capping. Before the commit c9b69a0cf0b4 ("KVM: arm64: Don't constrain maximum IPA size based on host configuration"), the problem here would have been just latent via PHYS_MASK_SHIFT (which earlier in turn capped kvm_ipa_limit), which remains capped at 48 bits on 4K and 16K configs. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Fixes: c9b69a0cf0b4 ("KVM: arm64: Don't constrain maximum IPA size based on host configuration") Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628680275-16578-1-git-send-email-anshuman.khandual@arm.com
2021-08-11arm64/mm: Define ID_AA64MMFR0_TGRAN_2_SHIFTAnshuman Khandual
Streamline the Stage-2 TGRAN value extraction from ID_AA64MMFR0 register by adding a page size agnostic ID_AA64MMFR0_TGRAN_2_SHIFT. This is similar to the existing Stage-1 TGRAN shift i.e ID_AA64MMFR0_TGRAN_SHIFT. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628569782-30213-1-git-send-email-anshuman.khandual@arm.com
2021-08-11KVM: arm64: perf: Replace '0xf' instances with ID_AA64DFR0_PMUVER_IMP_DEFAnshuman Khandual
ID_AA64DFR0_PMUVER_IMP_DEF which indicate implementation defined PMU, never actually gets used although there are '0xf' instances scattered all around. Just do the macro replacement to improve readability. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Marc Zyngier <maz@kernel.org> Cc: linux-perf-users@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-04KVM: arm64: Unregister HYP sections from kmemleak in protected modeMarc Zyngier
Booting a KVM host in protected mode with kmemleak quickly results in a pretty bad crash, as kmemleak doesn't know that the HYP sections have been taken away. This is specially true for the BSS section, which is part of the kernel BSS section and registered at boot time by kmemleak itself. Unregister the HYP part of the BSS before making that section HYP-private. The rest of the HYP-specific data is obtained via the page allocator or lives in other sections, none of which is subjected to kmemleak. Fixes: 90134ac9cabb ("KVM: arm64: Protect the .hyp sections from the host") Reviewed-by: Quentin Perret <qperret@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org # 5.13 Link: https://lore.kernel.org/r/20210802123830.2195174-3-maz@kernel.org
2021-08-04arm64: Move .hyp.rodata outside of the _sdata.._edata rangeMarc Zyngier
The HYP rodata section is currently lumped together with the BSS, which isn't exactly what is expected (it gets registered with kmemleak, for example). Move it away so that it is actually marked RO. As an added benefit, it isn't registered with kmemleak anymore. Fixes: 380e18ade4a5 ("KVM: arm64: Introduce a BSS section for use at Hyp") Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org #5.13 Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210802123830.2195174-2-maz@kernel.org
2021-08-02KVM: arm64: Fix comments related to GICv2 PMR reportingJason Wang
Remove the repeated word 'the' from two comments. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210728130623.12017-1-wangborong@cdjrlc.com
2021-08-02KVM: arm64: Count VMID-wide TLB invalidationsPaolo Bonzini
KVM/ARM has an architecture-specific implementation of kvm_flush_remote_tlbs; however, unlike the generic one, it does not count the flushes in kvm->stat.remote_tlb_flush, so that it inexorably remained stuck to zero. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210727103251.16561-1-pbonzini@redhat.com
2021-08-02KVM: arm64: Remove PMSWINC_EL0 shadow registerMarc Zyngier
We keep an entry for the PMSWINC_EL0 register in the vcpu structure, while *never* writing anything there outside of reset. Given that the register is defined as write-only, that we always trap when this register is accessed, there is little point in saving anything anyway. Get rid of the entry, and save a mighty 8 bytes per vcpu structure. We still need to keep it exposed to userspace in order to preserve backward compatibility with previously saved VMs. Since userspace cannot expect any effect of writing to PMSWINC_EL0, treat the register as RAZ/WI for the purpose of userspace access. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210719123902.1493805-5-maz@kernel.org
2021-08-02KVM: arm64: Disabling disabled PMU counters wastes a lot of timeAlexandre Chartre
In a KVM guest on arm64, performance counters interrupts have an unnecessary overhead which slows down execution when using the "perf record" command and limits the "perf record" sampling period. The problem is that when a guest VM disables counters by clearing the PMCR_EL0.E bit (bit 0), KVM will disable all counters defined in PMCR_EL0 even if they are not enabled in PMCNTENSET_EL0. KVM disables a counter by calling into the perf framework, in particular by calling perf_event_create_kernel_counter() which is a time consuming operation. So, for example, with a Neoverse N1 CPU core which has 6 event counters and one cycle counter, KVM will always disable all 7 counters even if only one is enabled. This typically happens when using the "perf record" command in a guest VM: perf will disable all event counters with PMCNTENTSET_EL0 and only uses the cycle counter. And when using the "perf record" -F option with a high profiling frequency, the overhead of KVM disabling all counters instead of one on every counter interrupt becomes very noticeable. The problem is fixed by having KVM disable only counters which are enabled in PMCNTENSET_EL0. If a counter is not enabled in PMCNTENSET_EL0 then KVM will not enable it when setting PMCR_EL0.E and it will remain disabled as long as it is not enabled in PMCNTENSET_EL0. So there is effectively no need to disable a counter when clearing PMCR_EL0.E if it is not enabled PMCNTENSET_EL0. Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com> [maz: moved 'mask' close to the actual user, simplifying the patch] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210712170345.660272-1-alexandre.chartre@oracle.com Link: https://lore.kernel.org/r/20210719123902.1493805-4-maz@kernel.org
2021-08-02KVM: arm64: Drop unnecessary masking of PMU registersMarc Zyngier
We always sanitise our PMU sysreg on the write side, so there is no need to do it on the read side as well. Drop the unnecessary masking. Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210719123902.1493805-3-maz@kernel.org
2021-08-02KVM: arm64: Narrow PMU sysreg reset values to architectural requirementsMarc Zyngier
A number of the PMU sysregs expose reset values that are not compliant with the architecture (set bits in the RES0 ranges, for example). This in turn has the effect that we need to pointlessly mask some register fields when using them. Let's start by making sure we don't have illegal values in the shadow registers at reset time. This affects all the registers that dedicate one bit per counter, the counters themselves, PMEVTYPERn_EL0 and PMSELR_EL0. Reported-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20210719123902.1493805-2-maz@kernel.org
2021-08-02KVM: Get rid of kvm_get_pfn()Marc Zyngier
Nobody is using kvm_get_pfn() anymore. Get rid of it. Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210726153552.1535838-7-maz@kernel.org
2021-08-02KVM: arm64: Use get_page() instead of kvm_get_pfn()Marc Zyngier
When mapping a THP, we are guaranteed that the page isn't reserved, and we can safely avoid the kvm_is_reserved_pfn() call. Replace kvm_get_pfn() with get_page(pfn_to_page()). Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210726153552.1535838-6-maz@kernel.org
2021-08-02KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()Marc Zyngier
Now that arm64 has stopped using kvm_is_transparent_hugepage(), we can remove it, as well as PageTransCompoundMap() which was only used by the former. Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210726153552.1535838-5-maz@kernel.org
2021-08-02KVM: arm64: Avoid mapping size adjustment on permission faultMarc Zyngier
Since we only support PMD-sized mappings for THP, getting a permission fault on a level that results in a mapping being larger than PAGE_SIZE is a sure indication that we have already upgraded our mapping to a PMD. In this case, there is no need to try and parse userspace page tables, as the fault information already tells us everything. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20210726153552.1535838-4-maz@kernel.org
2021-08-02KVM: arm64: Walk userspace page tables to compute the THP mapping sizeMarc Zyngier
We currently rely on the kvm_is_transparent_hugepage() helper to discover whether a given page has the potential to be mapped as a block mapping. However, this API doesn't really give un everything we want: - we don't get the size: this is not crucial today as we only support PMD-sized THPs, but we'd like to have larger sizes in the future - we're the only user left of the API, and there is a will to remove it altogether To address the above, implement a simple walker using the existing page table infrastructure, and plumb it into transparent_hugepage_adjust(). No new page sizes are supported in the process. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20210726153552.1535838-3-maz@kernel.org
2021-08-02KVM: arm64: Introduce helper to retrieve a PTE and its levelMarc Zyngier
It is becoming a common need to fetch the PTE for a given address together with its level. Add such a helper. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Quentin Perret <qperret@google.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20210726153552.1535838-2-maz@kernel.org
2021-08-01Linux 5.14-rc4v5.14-rc4Linus Torvalds
2021-08-01Merge tag 'perf-tools-fixes-for-v5.14-2021-08-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Revert "perf map: Fix dso->nsinfo refcounting", this makes 'perf top' abort, uncovering a design flaw on how namespace information is kept. The fix for that is more than we can do right now, leave it for the next merge window. - Split --dump-raw-trace by AUX records for ARM's CoreSight, fixing up the decoding of some records. - Fix PMU alias matching. Thanks to James Clark and John Garry for these fixes. * tag 'perf-tools-fixes-for-v5.14-2021-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: Revert "perf map: Fix dso->nsinfo refcounting" perf pmu: Fix alias matching perf cs-etm: Split --dump-raw-trace by AUX records
2021-08-01Merge tag 'powerpc-5.14-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Don't use r30 in VDSO code, to avoid breaking existing Go lang programs. - Change an export symbol to allow non-GPL modules to use spinlocks again. Thanks to Paul Menzel, and Srikar Dronamraju. * tag 'powerpc-5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/vdso: Don't use r30 to avoid breaking Go lang powerpc/pseries: Fix regression while building external modules
2021-08-01Merge tag 'xfs-5.14-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "This contains a bunch of bug fixes in XFS. Dave and I have been busy the last couple of weeks to find and fix as many log recovery bugs as we can find; here are the results so far. Go fstests -g recoveryloop! ;) - Fix a number of coordination bugs relating to cache flushes for metadata writeback, cache flushes for multi-buffer log writes, and FUA writes for single-buffer log writes - Fix a bug with incorrect replay of attr3 blocks - Fix unnecessary stalls when flushing logs to disk - Fix spoofing problems when recovering realtime bitmap blocks" * tag 'xfs-5.14-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: prevent spoofing of rtbitmap blocks when recovering buffers xfs: limit iclog tail updates xfs: need to see iclog flags in tracing xfs: Enforce attr3 buffer recovery order xfs: logging the on disk inode LSN can make it go backwards xfs: avoid unnecessary waits in xfs_log_force_lsn() xfs: log forces imply data device cache flushes xfs: factor out forced iclog flushes xfs: fix ordering violation between cache flushes and tail updates xfs: fold __xlog_state_release_iclog into xlog_state_release_iclog xfs: external logs need to flush data device xfs: flush data dev on external log write
2021-07-31Merge tag '5.14-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Three cifs/smb3 fixes, including two for stable, and a fix for an fallocate problem noticed by Clang" * tag '5.14-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: add missing parsing of backupuid smb3: rc uninitialized in one fallocate path SMB3: fix readpage for large swap cache
2021-07-30Merge tag 'net-5.14-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi (mac80211) and netfilter trees. Current release - regressions: - mac80211: fix starting aggregation sessions on mesh interfaces Current release - new code bugs: - sctp: send pmtu probe only if packet loss in Search Complete state - bnxt_en: add missing periodic PHC overflow check - devlink: fix phys_port_name of virtual port and merge error - hns3: change the method of obtaining default ptp cycle - can: mcba_usb_start(): add missing urb->transfer_dma initialization Previous releases - regressions: - set true network header for ECN decapsulation - mlx5e: RX, avoid possible data corruption w/ relaxed ordering and LRO - phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811 PHY - sctp: fix return value check in __sctp_rcv_asconf_lookup Previous releases - always broken: - bpf: - more spectre corner case fixes, introduce a BPF nospec instruction for mitigating Spectre v4 - fix OOB read when printing XDP link fdinfo - sockmap: fix cleanup related races - mac80211: fix enabling 4-address mode on a sta vif after assoc - can: - raw: raw_setsockopt(): fix raw_rcv panic for sock UAF - j1939: j1939_session_deactivate(): clarify lifetime of session object, avoid UAF - fix number of identical memory leaks in USB drivers - tipc: - do not blindly write skb_shinfo frags when doing decryption - fix sleeping in tipc accept routine" * tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits) gve: Update MAINTAINERS list can: esd_usb2: fix memory leak can: ems_usb: fix memory leak can: usb_8dev: fix memory leak can: mcba_usb_start(): add missing urb->transfer_dma initialization can: hi311x: fix a signedness bug in hi3110_cmd() MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver bpf: Fix leakage due to insufficient speculative store bypass mitigation bpf: Introduce BPF nospec instruction for mitigating Spectre v4 sis900: Fix missing pci_disable_device() in probe and remove net: let flow have same hash in two directions nfc: nfcsim: fix use after free during module unload tulip: windbond-840: Fix missing pci_disable_device() in probe and remove sctp: fix return value check in __sctp_rcv_asconf_lookup nfc: s3fwrn5: fix undefined parameter values in dev_err() net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32 net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev() net/mlx5: Unload device upon firmware fatal error net/mlx5e: Fix page allocation failure for ptp-RQ over SF net/mlx5e: Fix page allocation failure for trap-RQ over SF ...
2021-07-30Merge tag 'acpi-5.14-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These revert a recent IRQ resources handling modification that turned out to be problematic, fix suspend-to-idle handling on AMD platforms to take upcoming systems into account properly and fix the retrieval of the DPTF attributes of the PCH FIVR. Specifics: - Revert recent change of the ACPI IRQ resources handling that attempted to improve the ACPI IRQ override selection logic, but introduced serious regressions on some systems (Hui Wang). - Fix up quirks for AMD platforms in the suspend-to-idle support code so as to take upcoming systems using uPEP HID AMDI007 into account as appropriate (Mario Limonciello). - Fix the code retrieving DPTF attributes of the PCH FIVR so that it agrees on the return data type with the ACPI control method evaluated for this purpose (Srinivas Pandruvada)" * tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: DPTF: Fix reading of attributes Revert "ACPI: resources: Add checks for ACPI IRQ override" ACPI: PM: Add support for upcoming AMD uPEP HID AMDI007
2021-07-30pipe: make pipe writes always wake up readersLinus Torvalds
Since commit 1b6b26ae7053 ("pipe: fix and clarify pipe write wakeup logic") we have sanitized the pipe write logic, and would only try to wake up readers if they needed it. In particular, if the pipe already had data in it before the write, there was no point in trying to wake up a reader, since any existing readers must have been aware of the pre-existing data already. Doing extraneous wakeups will only cause potential thundering herd problems. However, it turns out that some Android libraries have misused the EPOLL interface, and expected "edge triggered" be to "any new write will trigger it". Even if there was no edge in sight. Quoting Sandeep Patil: "The commit 1b6b26ae7053 ('pipe: fix and clarify pipe write wakeup logic') changed pipe write logic to wakeup readers only if the pipe was empty at the time of write. However, there are libraries that relied upon the older behavior for notification scheme similar to what's described in [1] One such library 'realm-core'[2] is used by numerous Android applications. The library uses a similar notification mechanism as GNU Make but it never drains the pipe until it is full. When Android moved to v5.10 kernel, all applications using this library stopped working. The library has since been fixed[3] but it will be a while before all applications incorporate the updated library" Our regression rule for the kernel is that if applications break from new behavior, it's a regression, even if it was because the application did something patently wrong. Also note the original report [4] by Michal Kerrisk about a test for this epoll behavior - but at that point we didn't know of any actual broken use case. So add the extraneous wakeup, to approximate the old behavior. [ I say "approximate", because the exact old behavior was to do a wakeup not for each write(), but for each pipe buffer chunk that was filled in. The behavior introduced by this change is not that - this is just "every write will cause a wakeup, whether necessary or not", which seems to be sufficient for the broken library use. ] It's worth noting that this adds the extraneous wakeup only for the write side, while the read side still considers the "edge" to be purely about reading enough from the pipe to allow further writes. See commit f467a6a66419 ("pipe: fix and clarify pipe read wakeup logic") for the pipe read case, which remains that "only wake up if the pipe was full, and we read something from it". Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1] Link: https://github.com/realm/realm-core [2] Link: https://github.com/realm/realm-core/issues/4666 [3] Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4] Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/ Reported-by: Sandeep Patil <sspatil@android.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-30Revert "perf map: Fix dso->nsinfo refcounting"Arnaldo Carvalho de Melo
This makes 'perf top' abort in some cases, and the right fix will involve surgery that is too much to do at this stage, so revert for now and fix it in the next merge window. This reverts commit 2d6b74baa7147251c30a46c4996e8cc224aa2dc5. Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-30Merge branches 'acpi-resources' and 'acpi-dptf'Rafael J. Wysocki
* acpi-resources: Revert "ACPI: resources: Add checks for ACPI IRQ override" * acpi-dptf: ACPI: DPTF: Fix reading of attributes
2021-07-30Merge tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - gendisk freeing fix (Christoph) - blk-iocost wake ordering fix (Tejun) - tag allocation error handling fix (John) - loop locking fix. While this isn't the prettiest fix in the world, nobody has any good alternatives for 5.14. Something to likely revisit for 5.15. (Tetsuo) * tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block: block: delay freeing the gendisk blk-iocost: fix operation ordering in iocg_wake_fn() blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling loop: reintroduce global lock for safe loop_validate_file() traversal
2021-07-30Merge tag 'io_uring-5.14-2021-07-30' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - A fix for block backed reissue (me) - Reissue context hardening (me) - Async link locking fix (Pavel) * tag 'io_uring-5.14-2021-07-30' of git://git.kernel.dk/linux-block: io_uring: fix poll requests leaking second poll entries io_uring: don't block level reissue off completion path io_uring: always reissue from task_work context io_uring: fix race in unified task_work running io_uring: fix io_prep_async_link locking
2021-07-30Merge tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull libata fixlets from Jens Axboe: - A fix for PIO highmem (Christoph) - Kill HAVE_IDE as it's now unused (Lukas) * tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block: arch: Kconfig: clean up obsolete use of HAVE_IDE libata: fix ata_pio_sector for CONFIG_HIGHMEM
2021-07-30Merge tag 'for-5.14-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix -Warray-bounds warning, to help external patchset to make it default treewide - fix writeable device accounting (syzbot report) - fix fsync and log replay after a rename and inode eviction - fix potentially lost error code when submitting multiple bios for compressed range * tag 'for-5.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: calculate number of eb pages properly in csum_tree_block btrfs: fix rw device counting in __btrfs_free_extra_devids btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction btrfs: mark compressed range uptodate only if all bio succeed
2021-07-30Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - resume timing fix for intel-ish driver (Ye Xiang) - fix for using incorrect MMIO register in amd_sfh driver (Dylan MacKenzie) - Cintiq 24HDT / 27QHDT regression fix and touch processing fix for Wacom driver (Jason Gerecke) - device removal bugfix for ft260 driver (Michael Zaidman) - other small assorted fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: ft260: fix device removal due to USB disconnect HID: wacom: Skip processing of touches with negative slot values HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT HID: Kconfig: Fix spelling mistake "Uninterruptable" -> "Uninterruptible" HID: apple: Add support for Keychron K1 wireless keyboard HID: fix typo in Kconfig HID: ft260: fix format type warning in ft260_word_show() HID: amd_sfh: Use correct MMIO register for DMA address HID: asus: Remove check for same LED brightness on set HID: intel-ish-hid: use async resume function
2021-07-30Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "7 patches. Subsystems affected by this patch series: lib, ocfs2, and mm (slub, migration, and memcg)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() slub: fix unreclaimable slab stat for bulk free mm/migrate: fix NR_ISOLATED corruption on 64-bit mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code ocfs2: issue zeroout to EOF blocks ocfs2: fix zero out valid data lib/test_string.c: move string selftest in the Runtime Testing menu
2021-07-30Merge tag 'linux-can-fixes-for-5.14-20210730' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2021-07-30 The first patch is by me and adds Yasushi SHOJI as a reviewer for the Microchip CAN BUS Analyzer Tool driver. Dan Carpenter's patch fixes a signedness bug in the hi311x driver. Pavel Skripkin provides 4 patches, the first targets the mcba_usb driver by adding the missing urb->transfer_dma initialization, which was broken in a previous commit. The last 3 patches fix a memory leak in the usb_8dev, ems_usb and esd_usb2 driver. * tag 'linux-can-fixes-for-5.14-20210730' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: esd_usb2: fix memory leak can: ems_usb: fix memory leak can: usb_8dev: fix memory leak can: mcba_usb_start(): add missing urb->transfer_dma initialization can: hi311x: fix a signedness bug in hi3110_cmd() MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver ==================== Link: https://lore.kernel.org/r/20210730070526.1699867-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-30mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()Wang Hai
When I use kfree_rcu() to free a large memory allocated by kmalloc_node(), the following dump occurs. BUG: kernel NULL pointer dereference, address: 0000000000000020 [...] Oops: 0000 [#1] SMP [...] Workqueue: events kfree_rcu_work RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline] RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline] RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363 [...] Call Trace: kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293 kfree_bulk include/linux/slab.h:413 [inline] kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300 process_one_work+0x207/0x530 kernel/workqueue.c:2276 worker_thread+0x320/0x610 kernel/workqueue.c:2422 kthread+0x13d/0x160 kernel/kthread.c:313 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 When kmalloc_node() a large memory, page is allocated, not slab, so when freeing memory via kfree_rcu(), this large memory should not be used by memcg_slab_free_hook(), because memcg_slab_free_hook() is is used for slab. Using page_objcgs_check() instead of page_objcgs() in memcg_slab_free_hook() to fix this bug. Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com Fixes: 270c6a71460e ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-30slub: fix unreclaimable slab stat for bulk freeShakeel Butt
SLUB uses page allocator for higher order allocations and update unreclaimable slab stat for such allocations. At the moment, the bulk free for SLUB does not share code with normal free code path for these type of allocations and have missed the stat update. So, fix the stat update by common code. The user visible impact of the bug is the potential of inconsistent unreclaimable slab stat visible through meminfo and vmstat. Link: https://lkml.kernel.org/r/20210728155354.3440560-1-shakeelb@google.com Fixes: 6a486c0ad4dc ("mm, sl[ou]b: improve memory accounting") Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-30mm/migrate: fix NR_ISOLATED corruption on 64-bitAneesh Kumar K.V
Similar to commit 2da9f6305f30 ("mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit") avoid using unsigned int for nr_pages. With unsigned int type the large unsigned int converts to a large positive signed long. Symptoms include CMA allocations hanging forever due to alloc_contig_range->...->isolate_migratepages_block waiting forever in "while (unlikely(too_many_isolated(pgdat)))". Link: https://lkml.kernel.org/r/20210728042531.359409-1-aneesh.kumar@linux.ibm.com Fixes: c5fc5c3ae0c8 ("mm: migrate: account THP NUMA migration counters correctly") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-30mm: memcontrol: fix blocking rstat function called from atomic cgroup1 ↵Johannes Weiner
thresholding code Dan Carpenter reports: The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr 29, 2021, leads to the following static checker warning: kernel/cgroup/rstat.c:200 cgroup_rstat_flush() warn: sleeping in atomic context mm/memcontrol.c 3572 static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) 3573 { 3574 unsigned long val; 3575 3576 if (mem_cgroup_is_root(memcg)) { 3577 cgroup_rstat_flush(memcg->css.cgroup); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is from static analysis and potentially a false positive. The problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold() which holds an rcu_read_lock(). And the cgroup_rstat_flush() function can sleep. 3578 val = memcg_page_state(memcg, NR_FILE_PAGES) + 3579 memcg_page_state(memcg, NR_ANON_MAPPED); 3580 if (swap) 3581 val += memcg_page_state(memcg, MEMCG_SWAP); 3582 } else { 3583 if (!swap) 3584 val = page_counter_read(&memcg->memory); 3585 else 3586 val = page_counter_read(&memcg->memsw); 3587 } 3588 return val; 3589 } __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the thresholding code is invoked during stat changes, and those contexts have irqs disabled as well. If the lock breaking occurs inside the flush function, it will result in a sleep from an atomic context. Use the irqsafe flushing variant in mem_cgroup_usage() to fix this. Link: https://lkml.kernel.org/r/20210726150019.251820-1-hannes@cmpxchg.org Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Chris Down <chris@chrisdown.name> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>