summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-10-12Merge tag 'edac_updates_for_v5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Add Amazon's Annapurna Labs memory controller EDAC driver (Talel Shenhar) - New AMD CPUs support (Yazen Ghannam) - The usual misc fixes and cleanups all over the subsystem * tag 'edac_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/amd64: Set proper family type for Family 19h Models 20h-2Fh EDAC/mc_sysfs: Add missing newlines when printing {max,dimm}_location EDAC/aspeed: Use module_platform_driver() to simplify EDAC, sb_edac: Simplify switch statement EDAC/ti: Fix handling of platform_get_irq() error EDAC/aspeed: Fix handling of platform_get_irq() error EDAC/i5100: Fix error handling order in i5100_init_one() EDAC/highbank: Handover Calxeda Highbank maintenance to Andre Przywara EDAC/socfpga: Transfer SoCFPGA EDAC maintainership EDAC/thunderx: Make symbol lmc_dfs_ents static EDAC/al-mc-edac: Add Amazon's Annapurna Labs Memory Controller driver dt-bindings: EDAC: Add Amazon's Annapurna Labs Memory Controller binding EDAC/mce_amd: Add new error descriptions for existing types EDAC: Replace HTTP links with HTTPS ones
2020-10-12Merge tag 'm68k-for-v5.10-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - Conversion of the Mac IDE driver to a platform driver - Minor cleanups and fixes * tag 'm68k-for-v5.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: ide/macide: Convert Mac IDE driver to platform driver m68k: Replace HTTP links with HTTPS ones m68k: mm: Remove superfluous memblock_alloc*() casts m68k: mm: Use PAGE_ALIGNED() helper m68k: Sort selects in main Kconfig m68k: amiga: Clean up Amiga hardware configuration m68k: Revive _TIF_* masks m68k: Correct some typos in comments m68k: Use get_kernel_nofault() in show_registers() zorro: Fix address space collision message with RAM expansion boards m68k: amiga: Fix Denise detection on OCS
2020-10-12Merge tag 'microblaze-v5.10' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds
Pull Microblaze build warning fix from Michal Simek. * tag 'microblaze-v5.10' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: fix kbuild redundant file warning
2020-10-12Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "There's quite a lot of code here, but much of it is due to the addition of a new PMU driver as well as some arm64-specific selftests which is an area where we've traditionally been lagging a bit. In terms of exciting features, this includes support for the Memory Tagging Extension which narrowly missed 5.9, hopefully allowing userspace to run with use-after-free detection in production on CPUs that support it. Work is ongoing to integrate the feature with KASAN for 5.11. Another change that I'm excited about (assuming they get the hardware right) is preparing the ASID allocator for sharing the CPU page-table with the SMMU. Those changes will also come in via Joerg with the IOMMU pull. We do stray outside of our usual directories in a few places, mostly due to core changes required by MTE. Although much of this has been Acked, there were a couple of places where we unfortunately didn't get any review feedback. Other than that, we ran into a handful of minor conflicts in -next, but nothing that should post any issues. Summary: - Userspace support for the Memory Tagging Extension introduced by Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11. - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context switching. - Fix and subsequent rewrite of our Spectre mitigations, including the addition of support for PR_SPEC_DISABLE_NOEXEC. - Support for the Armv8.3 Pointer Authentication enhancements. - Support for ASID pinning, which is required when sharing page-tables with the SMMU. - MM updates, including treating flush_tlb_fix_spurious_fault() as a no-op. - Perf/PMU driver updates, including addition of the ARM CMN PMU driver and also support to handle CPU PMU IRQs as NMIs. - Allow prefetchable PCI BARs to be exposed to userspace using normal non-cacheable mappings. - Implementation of ARCH_STACKWALK for unwinding. - Improve reporting of unexpected kernel traps due to BPF JIT failure. - Improve robustness of user-visible HWCAP strings and their corresponding numerical constants. - Removal of TEXT_OFFSET. - Removal of some unused functions, parameters and prototypes. - Removal of MPIDR-based topology detection in favour of firmware description. - Cleanups to handling of SVE and FPSIMD register state in preparation for potential future optimisation of handling across syscalls. - Cleanups to the SDEI driver in preparation for support in KVM. - Miscellaneous cleanups and refactoring work" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits) Revert "arm64: initialize per-cpu offsets earlier" arm64: random: Remove no longer needed prototypes arm64: initialize per-cpu offsets earlier kselftest/arm64: Check mte tagged user address in kernel kselftest/arm64: Verify KSM page merge for MTE pages kselftest/arm64: Verify all different mmap MTE options kselftest/arm64: Check forked child mte memory accessibility kselftest/arm64: Verify mte tag inclusion via prctl kselftest/arm64: Add utilities and a test to validate mte memory perf: arm-cmn: Fix conversion specifiers for node type perf: arm-cmn: Fix unsigned comparison to less than zero arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option arm64: Pull in task_stack_page() to Spectre-v4 mitigation code KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled arm64: Get rid of arm64_ssbd_state KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state() KVM: arm64: Get rid of kvm_arm_have_ssbd() KVM: arm64: Simplify handling of ARCH_WORKAROUND_2 ...
2020-10-12Merge tag 'tpmdd-next-v5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm updates from Jarkko Sakkinen: "Support for a new TPM device and fixes and Git URL change (infraded -> korg)" * tag 'tpmdd-next-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: MAINTAINERS: TPM DEVICE DRIVER: Update GIT tpm_tis: Add a check for invalid status tpm: use %*ph to print small buffer dt-bindings: Add SynQucer TPM MMIO as a trivial device tpm: tis: add support for MMIO TPM on SynQuacer
2020-10-12Merge branch 'efi/urgent' into efi/core, to pick up fixesIngo Molnar
These fixes missed the v5.9 merge window, pick them up for early v5.10 merge. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-10-12perf/core: Fix race in the perf_mmap_close() functionJiri Olsa
There's a possible race in perf_mmap_close() when checking ring buffer's mmap_count refcount value. The problem is that the mmap_count check is not atomic because we call atomic_dec() and atomic_read() separately. perf_mmap_close: ... atomic_dec(&rb->mmap_count); ... if (atomic_read(&rb->mmap_count)) goto out_put; <ring buffer detach> free_uid out_put: ring_buffer_put(rb); /* could be last */ The race can happen when we have two (or more) events sharing same ring buffer and they go through atomic_dec() and then they both see 0 as refcount value later in atomic_read(). Then both will go on and execute code which is meant to be run just once. The code that detaches ring buffer is probably fine to be executed more than once, but the problem is in calling free_uid(), which will later on demonstrate in related crashes and refcount warnings, like: refcount_t: addition on 0; use-after-free. ... RIP: 0010:refcount_warn_saturate+0x6d/0xf ... Call Trace: prepare_creds+0x190/0x1e0 copy_creds+0x35/0x172 copy_process+0x471/0x1a80 _do_fork+0x83/0x3a0 __do_sys_wait4+0x83/0x90 __do_sys_clone+0x85/0xa0 do_syscall_64+0x5b/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Using atomic decrease and check instead of separated calls. Tested-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Wade Mealing <wmealing@redhat.com> Fixes: 9bb5d40cd93c ("perf: Fix mmap() accounting hole"); Link: https://lore.kernel.org/r/20200916115311.GE2301783@krava
2020-10-12Merge branch 'edac-drivers' into edac-updates-for-v5.10Borislav Petkov
Signed-off-by: Borislav Petkov <bp@suse.de>
2020-10-11Linux 5.9v5.9Linus Torvalds
2020-10-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "Five fixes. Subsystems affected by this patch series: MAINTAINERS, mm/pagemap, mm/swap, and mm/hugetlb" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged mm: validate inode in mapping_set_error() mm: mmap: Fix general protection fault in unlink_file_vma() MAINTAINERS: Antoine Tenart's email address MAINTAINERS: change hardening mailing list
2020-10-11Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fix from Al Viro: "Fixes an obvious bug (memory leak introduced in 5.8)" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: pipe: Fix memory leaks in create_pipe_files()
2020-10-11Merge tag 'x86-urgent-2020-10-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two fixes: - Fix a (hopefully final) IRQ state tracking bug vs MCE handling - Fix a documentation link" * tag 'x86-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/x86: Fix incorrect references to zero-page.txt x86/mce: Use idtentry_nmi_enter/exit()
2020-10-11Merge tag 'irqchip-5.10' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: Core changes: - Allow irq retriggering to follow a hierarchy - Allow interrupt hierarchies to be trimmed at allocation time - Allow interrupts to be hidden from /proc/interrupts (IPIs) - Introduce stub for set_handle_irq() when !GENERIC_IRQ_MULTI_HANDLER - New per-cpu IPI handling flow Architecture changes: - Move arm/arm64 IPI handling to the core interrupt code, removing the home brewed accounting Driver updates: - New driver for the MStar (and more recently Mediatek) platforms - New driver for the Actions Owl SIRQ controller - New driver for the TI PRUSS infrastructure - Wake-up support for the Qualcomm PDC controller - Primary interrupt controller support for the Designware APB ICTL - Convert the IPI code for GIC, GICv3, hip04, armada-270-xp and bcm2836 to using standard interrupts - Improve GICv3 pseudo-NMI support to deal with both non-secure and secure priorities on arm64 - Convert the GIC/GICv3 drivers to using HW-based irq retrigger - A sprinkling of dev_err_probe() conversion - A set of NVIDIA Tegra fixes for interrupt hierarchy corruption - A reset fix for the Loongson HTVEC driver - A couple of error handling fixes in the TI SCI drivers
2020-10-11Merge tag 'perf-urgent-2020-10-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Ingo Molnar: "Fix an error handling bug that can cause a lockup if a CPU is offline (doh ...)" * tag 'perf-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix task_function_call() error handling
2020-10-11mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected ↵Vijay Balakrishna
by khugepaged When memory is hotplug added or removed the min_free_kbytes should be recalculated based on what is expected by khugepaged. Currently after hotplug, min_free_kbytes will be set to a lower default and higher default set when THP enabled is lost. This change restores min_free_kbytes as expected for THP consumers. [vijayb@linux.microsoft.com: v5] Link: https://lkml.kernel.org/r/1601398153-5517-1-git-send-email-vijayb@linux.microsoft.com Fixes: f000565adb77 ("thp: set recommended min free kbytes") Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Allen Pais <apais@microsoft.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Song Liu <songliubraving@fb.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/1600305709-2319-2-git-send-email-vijayb@linux.microsoft.com Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-11mm: validate inode in mapping_set_error()Minchan Kim
The swap address_space doesn't have host. Thus, it makes kernel crash once swap write meets error. Fix it. Fixes: 735e4ae5ba28 ("vfs: track per-sb writeback errors and report them to syncfs") Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Andres Freund <andres@anarazel.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201010000650.750063-1-minchan@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-11mm: mmap: Fix general protection fault in unlink_file_vma()Miaohe Lin
The syzbot reported the below general protection fault: general protection fault, probably for non-canonical address 0xe00eeaee0000003b: 0000 [#1] PREEMPT SMP KASAN KASAN: maybe wild-memory-access in range [0x00777770000001d8-0x00777770000001df] CPU: 1 PID: 10488 Comm: syz-executor721 Not tainted 5.9.0-rc3-syzkaller #0 RIP: 0010:unlink_file_vma+0x57/0xb0 mm/mmap.c:164 Call Trace: free_pgtables+0x1b3/0x2f0 mm/memory.c:415 exit_mmap+0x2c0/0x530 mm/mmap.c:3184 __mmput+0x122/0x470 kernel/fork.c:1076 mmput+0x53/0x60 kernel/fork.c:1097 exit_mm kernel/exit.c:483 [inline] do_exit+0xa8b/0x29f0 kernel/exit.c:793 do_group_exit+0x125/0x310 kernel/exit.c:903 get_signal+0x428/0x1f00 kernel/signal.c:2757 arch_do_signal+0x82/0x2520 arch/x86/kernel/signal.c:811 exit_to_user_mode_loop kernel/entry/common.c:136 [inline] exit_to_user_mode_prepare+0x1ae/0x200 kernel/entry/common.c:167 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:242 entry_SYSCALL_64_after_hwframe+0x44/0xa9 It's because the ->mmap() callback can change vma->vm_file and fput the original file. But the commit d70cec898324 ("mm: mmap: merge vma after call_mmap() if possible") failed to catch this case and always fput() the original file, hence add an extra fput(). [ Thanks Hillf for pointing this extra fput() out. ] Fixes: d70cec898324 ("mm: mmap: merge vma after call_mmap() if possible") Reported-by: syzbot+c5d5a51dcbb558ca0cb5@syzkaller.appspotmail.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christian König <ckoenig.leichtzumerken@gmail.com> Cc: Hongxiang Lou <louhongxiang@huawei.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Airlie <airlied@redhat.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: John Hubbard <jhubbard@nvidia.com> Link: https://lkml.kernel.org/r/20200916090733.31427-1-linmiaohe@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-11MAINTAINERS: Antoine Tenart's email addressAntoine Tenart
Use my kernel.org address instead of my bootlin.com one. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20201005164533.16811-1-atenart@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-11MAINTAINERS: change hardening mailing listKees Cook
As more email from git history gets aimed at the OpenWall kernel-hardening@ list, there has been a desire to separate "new topics" from "on-going" work. To handle this, the superset of hardening email topics are now to be directed to linux-hardening@vger.kernel.org. Update the MAINTAINERS file and the .mailmap to accomplish this, so that linux-hardening@ can be treated like any other regular upstream kernel development list. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Emese Revfy <re.emese@gmail.com> Cc: "Tobin C. Harding" <me@tobin.cc> Cc: Tycho Andersen <tycho@tycho.pizza> Cc: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/linux-hardening/202010051443.279CC265D@keescook/ Link: https://lkml.kernel.org/r/20201006000012.2768958-1-keescook@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-10Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Some more driver bugfixes for I2C. Including a revert - the updated series for it will come during the next merge window" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: owl: Clear NACK and BUS error bits Revert "i2c: imx: Fix reset of I2SR_IAL flag" i2c: meson: fixup rate calculation with filter delay i2c: meson: keep peripheral clock enabled i2c: meson: fix clock setting overwrite i2c: imx: Fix reset of I2SR_IAL flag
2020-10-10cifs: Fix incomplete memory allocation on setxattr pathVladimir Zapolskiy
On setxattr() syscall path due to an apprent typo the size of a dynamically allocated memory chunk for storing struct smb2_file_full_ea_info object is computed incorrectly, to be more precise the first addend is the size of a pointer instead of the wanted object size. Coincidentally it makes no difference on 64-bit platforms, however on 32-bit targets the following memcpy() writes 4 bytes of data outside of the dynamically allocated memory. ============================================================================= BUG kmalloc-16 (Not tainted): Redzone overwritten ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201 INFO: Object 0x6f171df3 @offset=352 fp=0x00000000 Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................ Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69 ........snrub.fi Redzone 79e69a6f: 73 68 32 0a sh2. Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ CPU: 0 PID: 8196 Comm: attr Tainted: G B 5.9.0-rc8+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 Call Trace: dump_stack+0x54/0x6e print_trailer+0x12c/0x134 check_bytes_and_report.cold+0x3e/0x69 check_object+0x18c/0x250 free_debug_processing+0xfe/0x230 __slab_free+0x1c0/0x300 kfree+0x1d3/0x220 smb2_set_ea+0x27d/0x540 cifs_xattr_set+0x57f/0x620 __vfs_setxattr+0x4e/0x60 __vfs_setxattr_noperm+0x4e/0x100 __vfs_setxattr_locked+0xae/0xd0 vfs_setxattr+0x4e/0xe0 setxattr+0x12c/0x1a0 path_setxattr+0xa4/0xc0 __ia32_sys_lsetxattr+0x1d/0x20 __do_fast_syscall_32+0x40/0x70 do_fast_syscall_32+0x29/0x60 do_SYSENTER_32+0x15/0x20 entry_SYSENTER_32+0x9f/0xf2 Fixes: 5517554e4313 ("cifs: Add support for writing attributes on SMB2+") Signed-off-by: Vladimir Zapolskiy <vladimir@tuxera.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-10mm/khugepaged: fix filemap page_to_pgoff(page) != offsetHugh Dickins
There have been elusive reports of filemap_fault() hitting its VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built with CONFIG_READ_ONLY_THP_FOR_FS=y. Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged without NUMA reuses the same huge page after collapse_file() failed (whereas NUMA targets its allocation to the respective node each time). And most of us were usually testing with CONFIG_NUMA=y kernels. collapse_file(old start) new_page = khugepaged_alloc_page(hpage) __SetPageLocked(new_page) new_page->index = start // hpage->index=old offset new_page->mapping = mapping xas_store(&xas, new_page) filemap_fault page = find_get_page(mapping, offset) // if offset falls inside hpage then // compound_head(page) == hpage lock_page_maybe_drop_mmap() __lock_page(page) // collapse fails xas_store(&xas, old page) new_page->mapping = NULL unlock_page(new_page) collapse_file(new start) new_page = khugepaged_alloc_page(hpage) __SetPageLocked(new_page) new_page->index = start // hpage->index=new offset new_page->mapping = mapping // mapping becomes valid again // since compound_head(page) == hpage // page_to_pgoff(page) got changed VM_BUG_ON_PAGE(page_to_pgoff(page) != offset) An initial patch replaced __SetPageLocked() by lock_page(), which did fix the race which Suren illustrates above. But testing showed that it's not good enough: if the racing task's __lock_page() gets delayed long after its find_get_page(), then it may follow collapse_file(new start)'s successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE. It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a check and retry (as is done for mapping), with similar relaxations in find_lock_entry() and pagecache_get_page(): but it's not obvious what else might get caught out; and khugepaged non-NUMA appears to be unique in exposing a page to page cache, then revoking, without going through a full cycle of freeing before reuse. Instead, non-NUMA khugepaged_prealloc_page() release the old page if anyone else has a reference to it (1% of cases when I tested). Although never reported on huge tmpfs, I believe its find_lock_entry() has been at similar risk; but huge tmpfs does not rely on khugepaged for its normal working nearly so much as READ_ONLY_THP_FOR_FS does. Reported-by: Denis Lisov <dennis.lissov@gmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569 Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org Reported-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com> Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org # v4.9+ Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-10io_uring: keep a pointer ref_node in file_dataPavel Begunkov
->cur_refs of struct fixed_file_data always points to percpu_ref embedded into struct fixed_file_ref_node. Don't overuse container_of() and offsetting, and point directly to fixed_file_ref_node. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: refactor *files_register()'s error pathsPavel Begunkov
Don't keep repeating cleaning sequences in error paths, write it once in the and use labels. It's less error prone and looks cleaner. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: clean file_data access in files_registerPavel Begunkov
Keep file_data in a local var and replace with it complex references such as ctx->file_data. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: don't delay io_init_req() error checkPavel Begunkov
Don't postpone io_init_req() error checks and do that right after calling it. There is no control-flow statements or dependencies with sqe/submitted accounting, so do those earlier, that makes the code flow a bit more natural. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: clean leftovers after splitting issuePavel Begunkov
Kill extra if in io_issue_sqe() and place send/recv[msg] calls appropriately under switch's cases. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: remove timeout.list after hrtimer cancelPavel Begunkov
Remove timeouts from ctx->timeout_list after hrtimer_try_to_cancel() successfully cancels it. With this we don't need to care whether there was a race and it was removed in io_timeout_fn(), and that will be handy for following patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: use a separate struct for timeout_removePavel Begunkov
Don't use struct io_timeout for both IORING_OP_TIMEOUT and IORING_OP_TIMEOUT_REMOVE, they're quite different. Split them in two, that allows to remove an unused field in struct io_timeout, and btw kill ->flags not used by either. This also easier to follow, especially for timeout remove. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: improve submit_state.ios_left accountingPavel Begunkov
state->ios_left isn't decremented for requests that don't need a file, so it might be larger than number of SQEs left. That in some circumstances makes us to grab more files that is needed so imposing extra put. Deaccount one ios_left for each request. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: simplify io_file_get()Pavel Begunkov
Keep ->needs_file_no_error check out of io_file_get(), and let callers handle it. It makes it more straightforward. Also, as the only error it can hand back -EBADF, make it return a file or NULL. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: kill extra check in fixed io_file_get()Pavel Begunkov
ctx->nr_user_files == 0 IFF ctx->file_data == NULL and there fixed files are not used. Hence, verifying fds only against ctx->nr_user_files is enough. Remove the other check from hot path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: clean up ->files grabbingPavel Begunkov
Move work.files grabbing into io_prep_async_work() to all other work resources initialisation. We don't need to keep it separately now, as ->ring_fd/file are gone. It also allows to not grab it when a request is not going to io-wq. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10io_uring: don't io_prep_async_work() linked reqsPavel Begunkov
There is no real reason left for preparing io-wq work context for linked requests in advance, remove it as this might become a bottleneck in some cases. Reported-by: Roman Gershman <romger@amazon.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-10Merge branch 'irq/mstar' into irq/irqchip-nextMarc Zyngier
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-10-10dt-bindings: interrupt-controller: Add MStar interrupt controllerMark-PK Tsai
Add binding for MStar interrupt controller. Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20200902063344.1852-3-mark-pk.tsai@mediatek.com
2020-10-10irqchip/irq-mst: Add MStar interrupt controller supportMark-PK Tsai
Add MStar interrupt controller support using hierarchy irq domain. Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Daniel Palmer <daniel@thingy.jp> Link: https://lore.kernel.org/r/20200902063344.1852-2-mark-pk.tsai@mediatek.com
2020-10-10Merge branch 'irq/irqchip-fixes' into irq/irqchip-nextMarc Zyngier
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-10-10Merge branch 'irq/tegra-pmc' into irq/irqchip-nextMarc Zyngier
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-10-10i2c: owl: Clear NACK and BUS error bitsCristian Ciocaltea
When the NACK and BUS error bits are set by the hardware, the driver is responsible for clearing them by writing "1" into the corresponding status registers. Hence perform the necessary operations in owl_i2c_interrupt(). Fixes: d211e62af466 ("i2c: Add Actions Semiconductor Owl family S900 I2C driver") Reported-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-10-10soc/tegra: pmc: Don't create fake interrupt hierarchy levelsMarc Zyngier
The Tegra PMC driver does ungodly things with the interrupt hierarchy, repeatedly corrupting it by pulling hwirq numbers out of thin air, overriding existing IRQ mappings and changing the handling flow of unsuspecting users. All of this is done in the name of preserving the interrupt hierarchy even when these levels do not exist in the HW. Together with the use of proper IRQs for IPIs, this leads to an unbootable system as the rescheduling IPI gets repeatedly repurposed for random drivers... Instead, let's simply mark the level from which the hierarchy does not make sense for the HW, and let the core code trim the usused levels from the hierarchy. Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-10-10soc/tegra: pmc: Allow optional irq parent callbacksMarc Zyngier
Make the PMC driver resistent to variable depth interrupt hierarchy, which we are about to introduce. Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-10-10gpio: tegra186: Allow optional irq parent callbacksMarc Zyngier
Make the tegra186 GPIO driver resistent to variable depth interrupt hierarchy, which we are about to introduce. No functionnal change yet. Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-10-10genirq/irqdomain: Allow partial trimming of irq_data hierarchyMarc Zyngier
It appears that some HW is ugly enough that not all the interrupts connected to a particular interrupt controller end up with the same hierarchy depth (some of them are terminated early). This leaves the irqchip hacker with only two choices, both equally bad: - create discrete domain chains, one for each "hierarchy depth", which is very hard to maintain - create fake hierarchy levels for the shallow paths, leading to all kind of problems (what are the safe hwirq values for these fake levels?) Implement the ability to cut short a single interrupt hierarchy from a level marked as being disconnected by using the new irq_domain_disconnect_hierarchy() helper. The irqdomain allocation code will then perform the trimming Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-10-10Revert "i2c: imx: Fix reset of I2SR_IAL flag"Wolfram Sang
This reverts commit fa4d30556883f2eaab425b88ba9904865a4d00f3. An updated version was sent. So, revert this version and give the new version more time for testing. Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-10-09Merge tag 'spi-fix-v5.9-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fix from Mark Brown: "One last minute fix for v5.9 which has been causing crashes in test systems with the fsl-dspi driver when they hit deferred probe (and which I probably let cook in next a bit longer than is ideal). And an update to MAINTAINERS reflecting Serge's extensive and detailed recent work on the DesignWare driver" * tag 'spi-fix-v5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: MAINTAINERS: Add maintainer of DW APB SSI driver spi: fsl-dspi: fix NULL pointer dereference
2020-10-09Merge tag 'riscv-for-linus-5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: "Two fixes this week: - A fix to actually reserve the device tree's memory. Without this the device tree can be overwritten on systems that don't otherwise reserve it. This issue should only manifest on !MMU systems. - A workaround for a BUG() that triggers when the memory that originally contained initdata is freed and later repurposed. This triggers a BUG() on builds that had HARDENED_USERCOPY enabled" * tag 'riscv-for-linus-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fixup bootup failure with HARDENED_USERCOPY RISC-V: Make sure memblock reserves the memory containing DT
2020-10-09ata: ahci: mvebu: Make SATA PHY optional for Armada 3720Pali Rohár
Older ATF does not provide SMC call for SATA phy power on functionality and therefore initialization of ahci_mvebu is failing when older version of ATF is using. In this case phy_power_on() function returns -EOPNOTSUPP. This patch adds a new hflag AHCI_HFLAG_IGN_NOTSUPP_POWER_ON which cause that ahci_platform_enable_phys() would ignore -EOPNOTSUPP errors from phy_power_on() call. It fixes initialization of ahci_mvebu on Espressobin boards where is older Marvell's Arm Trusted Firmware without SMC call for SATA phy power. This is regression introduced in commit 8e18c8e58da64 ("arm64: dts: marvell: armada-3720-espressobin: declare SATA PHY property") where SATA phy was defined and therefore ahci_platform_enable_phys() on Espressobin started failing. Fixes: 8e18c8e58da64 ("arm64: dts: marvell: armada-3720-espressobin: declare SATA PHY property") Signed-off-by: Pali Rohár <pali@kernel.org> Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com> Cc: <stable@vger.kernel.org> # 5.1+: ea17a0f153af: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-09block: fix uapi blkzoned.h commentsDamien Le Moal
Update the kdoc comments for struct blk_zone (capacity field description missing) and for struct blk_zone_report (flags field description missing). Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-09blk-mq: move cancel of hctx->run_work to the front of blk_exit_queueYang Yang
blk_exit_queue will free elevator_data, while blk_mq_run_work_fn will access it. Move cancel of hctx->run_work to the front of blk_exit_queue to avoid use-after-free. Fixes: 1b97871b501f ("blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release") Signed-off-by: Yang Yang <yang.yang@vivo.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>