summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-11-23Merge tag 'powerpc-cve-2020-4788' into fixesMichael Ellerman
From Daniel's cover letter: IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch series flushes the L1 cache on kernel entry (patch 2) and after the kernel performs any user accesses (patch 3). It also adds a self-test and performs some related cleanups.
2020-11-23cpufreq: scmi: Fix build for !CONFIG_COMMON_CLKSudeep Holla
Commit 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a dummy clock provider") registers a dummy clock provider using devm_of_clk_add_hw_provider. These *_hw_provider functions are defined only when CONFIG_COMMON_CLK=y. One possible fix is to add the Kconfig dependency, but since we plan to move away from the clock dependency for scmi cpufreq, it is preferrable to avoid that. Let us just conditionally compile out the offending call to devm_of_clk_add_hw_provider. It also uses the variable 'dev' outside of the #ifdef block to avoid build warning. Fixes: 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a dummy clock provider") Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-23drm/exynos: depend on COMMON_CLK to fix compile testsKrzysztof Kozlowski
The Exynos DRM uses Common Clock Framework thus it cannot be built on platforms without it (e.g. compile test on MIPS with RALINK and SOC_RT305X): /usr/bin/mips-linux-gnu-ld: drivers/gpu/drm/exynos/exynos_mixer.o: in function `mixer_bind': exynos_mixer.c:(.text+0x958): undefined reference to `clk_set_parent' Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2020-11-22Linux 5.10-rc5v5.10-rc5Linus Torvalds
2020-11-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - Various functionality / regression fixes for Logitech devices from Hans de Goede - Fix for (recently added) GPIO support in mcp2221 driver from Lars Povlsen - Power management handling fix/quirk in i2c-hid driver for certain BIOSes that have strange aproach to power-cycle from Hans de Goede - a few device ID additions and device-specific quirks * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: logitech-dj: Fix Dinovo Mini when paired with a MX5x00 receiver HID: logitech-dj: Fix an error in mse_bluetooth_descriptor HID: Add Logitech Dinovo Edge battery quirk HID: logitech-hidpp: Add HIDPP_CONSUMER_VENDOR_KEYS quirk for the Dinovo Edge HID: logitech-dj: Handle quad/bluetooth keyboards with a builtin trackpad HID: add HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE for Gamevice devices HID: mcp2221: Fix GPIO output handling HID: hid-sensor-hub: Fix issue with devices with no report ID HID: i2c-hid: Put ACPI enumerated devices in D3 on shutdown HID: add support for Sega Saturn HID: cypress: Support Varmilo Keyboards' media hotkeys HID: ite: Replace ABS_MISC 120/121 events with touchpad on/off keypresses HID: logitech-hidpp: Add PID for MX Anywhere 2 HID: uclogic: Add ID for Trust Flex Design Tablet
2020-11-22Merge tag 'sched-urgent-2020-11-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "A couple of scheduler fixes: - Make the conditional update of the overutilized state work correctly by caching the relevant flags state before overwriting them and checking them afterwards. - Fix a data race in the wakeup path which caused loadavg on ARM64 platforms to become a random number generator. - Fix the ordering of the iowaiter accounting operations so it can't be decremented before it is incremented. - Fix a bug in the deadline scheduler vs. priority inheritance when a non-deadline task A has inherited the parameters of a deadline task B and then blocks on a non-deadline task C. The second inheritance step used the static deadline parameters of task A, which are usually 0, instead of further propagating task B's parameters. The zero initialized parameters trigger a bug in the deadline scheduler" * tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix priority inheritance with multiple scheduling classes sched: Fix rq->nr_iowait ordering sched: Fix data-race in wakeup sched/fair: Fix overutilized update in enqueue_task_fair()
2020-11-22Merge tag 'perf-urgent-2020-11-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Thomas Gleixner: "A single fix for the x86 perf sysfs interfaces which used kobject attributes instead of device attributes and therefore making clang's control flow integrity checker upset" * tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: fix sysfs type mismatches
2020-11-22Merge tag 'locking-urgent-2020-11-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Thomas Gleixner: "A single fix for lockdep which makes the recursion protection cover graph lock/unlock" * tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Put graph lock/unlock under lock_recursion protection
2020-11-22Merge tag 'efi-urgent-for-v5.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Borislav Petkov: "Forwarded EFI fixes from Ard Biesheuvel: - fix memory leak in efivarfs driver - fix HYP mode issue in 32-bit ARM version of the EFI stub when built in Thumb2 mode - avoid leaking EFI pgd pages on allocation failure" * tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/x86: Free efi_pgd with free_pages() efivarfs: fix memory leak in efivarfs_create() efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP
2020-11-22Merge tag 'x86_urgent_for_v5.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of same because the proper one is going through the IOMMU tree (Thomas Gleixner) - An Intel microcode loader fix to save the correct microcode patch to apply during resume (Chen Yu) - A fix to not access user memory of other processes when dumping opcode bytes (Thomas Gleixner) * tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "iommu/vt-d: Take CONFIG_PCI_ATS into account" x86/dumpstack: Do not try to access user space code of other tasks x86/microcode/intel: Check patch signature before saving microcode for early loading iommu/vt-d: Take CONFIG_PCI_ATS into account
2020-11-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "8 patches. Subsystems affected by this patch series: mm (madvise, pagemap, readahead, memcg, userfaultfd), kbuild, and vfs" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: fix madvise WILLNEED performance problem libfs: fix error cast of negative value in simple_attr_write() mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault() mm: memcg/slab: fix root memcg vmstats mm: fix readahead_page_batch for retry entries mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports compiler-clang: remove version check for BPF Tracing mm/madvise: fix memory leak from process_madvise
2020-11-22Merge tag 'staging-5.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging and IIO fixes from Greg KH: "Here are some small Staging and IIO driver fixes for 5.10-rc5. They include: - IIO fixes for reported regressions and problems - new device ids for IIO drivers - new device id for rtl8723bs driver - staging ralink driver Kconfig dependency fix - staging mt7621-pci bus resource fix All of these have been in linux-next all week with no reported issues" * tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: accel: kxcjk1013: Add support for KIOX010A ACPI DSM for setting tablet-mode iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum docs: ABI: testing: iio: stm32: remove re-introduced unsupported ABI iio: light: fix kconfig dependency bug for VCNL4035 iio/adc: ingenic: Fix AUX/VBAT readings when touchscreen is used iio/adc: ingenic: Fix battery VREF for JZ4770 SoC staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids staging: ralink-gdma: fix kconfig dependency bug for DMA_RALINK staging: mt7621-pci: avoid to request pci bus resources iio: imu: st_lsm6dsx: set 10ms as min shub slave timeout counter/ti-eqep: Fix regmap max_register iio: adc: stm32-adc: fix a regression when using dma and irq iio: adc: mediatek: fix unset field iio: cros_ec: Use default frequencies when EC returns invalid information
2020-11-22Merge tag 'tty-5.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty fixes from Greg KH: "Here are some small tty/serial fixes for 5.10-rc5 that resolve some reported issues: - speakup crash when telling the kernel to use a device that isn't really there - imx serial driver fixes for reported problems - ar933x_uart driver fix for probe error handling path All have been in linux-next for a while with no reported issues" * tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: ar933x_uart: disable clk on error handling path in probe tty: serial: imx: keep console clocks always on speakup: Do not let the line discipline be used several times tty: serial: imx: fix potential deadlock
2020-11-22Merge tag 'ext4_for_linus_fixes2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A final set of miscellaneous bug fixes for ext4" * tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix bogus warning in ext4_update_dx_flag() jbd2: fix kernel-doc markups ext4: drop fast_commit from /proc/mounts
2020-11-22afs: Fix speculative status fetch going out of order wrt to modificationsDavid Howells
When doing a lookup in a directory, the afs filesystem uses a bulk status fetch to speculatively retrieve the statuses of up to 48 other vnodes found in the same directory and it will then either update extant inodes or create new ones - effectively doing 'lookup ahead'. To avoid the possibility of deadlocking itself, however, the filesystem doesn't lock all of those inodes; rather just the directory inode is locked (by the VFS). When the operation completes, afs_inode_init_from_status() or afs_apply_status() is called, depending on whether the inode already exists, to commit the new status. A case exists, however, where the speculative status fetch operation may straddle a modification operation on one of those vnodes. What can then happen is that the speculative bulk status RPC retrieves the old status, and whilst that is happening, the modification happens - which returns an updated status, then the modification status is committed, then we attempt to commit the speculative status. This results in something like the following being seen in dmesg: kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus say that the vnode had data version 8 when we'd already recorded version 9 due to a local modification. This was causing the cache to be invalidated for that vnode when it shouldn't have been. If it happens on a data file, this might lead to local changes being lost. Fix this by ignoring speculative status updates if the data version doesn't match the expected value. Note that it is possible to get a DV regression if a volume gets restored from a backup - but we should get a callback break in such a case that should trigger a recheck anyway. It might be worth checking the volume creation time in the volsync info and, if a change is observed in that (as would happen on a restore), invalidate all caches associated with the volume. Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22mm: fix madvise WILLNEED performance problemMatthew Wilcox (Oracle)
The calculation of the end page index was incorrect, leading to a regression of 70% when running stress-ng. With this fix, we instead see a performance improvement of 3%. Fixes: e6e88712e43b ("mm: optimise madvise WILLNEED") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Feng Tang <feng.tang@intel.com> Cc: "Chen, Rong A" <rong.a.chen@intel.com> Link: https://lkml.kernel.org/r/20201109134851.29692-1-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22libfs: fix error cast of negative value in simple_attr_write()Yicong Yang
The attr->set() receive a value of u64, but simple_strtoll() is used for doing the conversion. It will lead to the error cast if user inputs a negative value. Use kstrtoull() instead of simple_strtoll() to convert a string got from the user to an unsigned value. The former will return '-EINVAL' if it gets a negetive value, but the latter can't handle the situation correctly. Make 'val' unsigned long long as what kstrtoull() takes, this will eliminate the compile warning on no 64-bit architectures. Fixes: f7b88631a897 ("fs/libfs.c: fix simple_attr_write() on 32bit machines") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/1605341356-11872-1-git-send-email-yangyicong@hisilicon.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()Gerald Schaefer
Alexander reported a syzkaller / KASAN finding on s390, see below for complete output. In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be freed in some cases. In the case of userfaultfd_missing(), this will happen after calling handle_userfault(), which might have released the mmap_lock. Therefore, the following pte_free(vma->vm_mm, pgtable) will access an unstable vma->vm_mm, which could have been freed or re-used already. For all architectures other than s390 this will go w/o any negative impact, because pte_free() simply frees the page and ignores the passed-in mm. The implementation for SPARC32 would also access mm->page_table_lock for pte_free(), but there is no THP support in SPARC32, so the buggy code path will not be used there. For s390, the mm->context.pgtable_list is being used to maintain the 2K pagetable fragments, and operating on an already freed or even re-used mm could result in various more or less subtle bugs due to list / pagetable corruption. Fix this by calling pte_free() before handle_userfault(), similar to how it is already done in __do_huge_pmd_anonymous_page() for the WRITE / non-huge_zero_page case. Commit 6b251fc96cf2c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults") actually introduced both, the do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page() changes wrt to calling handle_userfault(), but only in the latter case it put the pte_free() before calling handle_userfault(). BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744 Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334 CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0 Hardware name: IBM 3906 M04 701 (KVM/Linux) Call Trace: do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744 create_huge_pmd mm/memory.c:4256 [inline] __handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480 handle_mm_fault+0x288/0x748 mm/memory.c:4607 do_exception+0x394/0xae0 arch/s390/mm/fault.c:479 do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567 pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706 copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline] raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174 _copy_from_user+0x48/0xa8 lib/usercopy.c:16 copy_from_user include/linux/uaccess.h:192 [inline] __do_sys_sigaltstack kernel/signal.c:4064 [inline] __s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060 system_call+0xe0/0x28c arch/s390/kernel/entry.S:415 Allocated by task 9334: slab_alloc_node mm/slub.c:2891 [inline] slab_alloc mm/slub.c:2899 [inline] kmem_cache_alloc+0x118/0x348 mm/slub.c:2904 vm_area_dup+0x9c/0x2b8 kernel/fork.c:356 __split_vma+0xba/0x560 mm/mmap.c:2742 split_vma+0xca/0x108 mm/mmap.c:2800 mlock_fixup+0x4ae/0x600 mm/mlock.c:550 apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619 do_mlock+0x1aa/0x718 mm/mlock.c:711 __do_sys_mlock2 mm/mlock.c:738 [inline] __s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728 system_call+0xe0/0x28c arch/s390/kernel/entry.S:415 Freed by task 9333: slab_free mm/slub.c:3142 [inline] kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158 __vma_adjust+0x7b2/0x2508 mm/mmap.c:960 vma_merge+0x87e/0xce0 mm/mmap.c:1209 userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868 __fput+0x22c/0x7a8 fs/file_table.c:281 task_work_run+0x200/0x320 kernel/task_work.c:151 tracehook_notify_resume include/linux/tracehook.h:188 [inline] do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538 system_call+0xe6/0x28c arch/s390/kernel/entry.S:416 The buggy address belongs to the object at 00000000962d6948 which belongs to the cache vm_area_struct of size 200 The buggy address is located 64 bytes inside of 200-byte region [00000000962d6948, 00000000962d6a10) The buggy address belongs to the page: page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6 flags: 0x3ffff00000000200(slab) raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00 raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501 page dumped because: kasan: bad access detected page->mem_cgroup:0000000096959501 Memory state around the buggy address: 00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb >00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ 00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Fixes: 6b251fc96cf2c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults") Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: <stable@vger.kernel.org> [4.3+] Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22mm: memcg/slab: fix root memcg vmstatsMuchun Song
If we reparent the slab objects to the root memcg, when we free the slab object, we need to update the per-memcg vmstats to keep it correct for the root memcg. Now this at least affects the vmstat of NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack size is smaller than the PAGE_SIZE. David said: "I assume that without this fix that the root memcg's vmstat would always be inflated if we reparented" Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Christopher Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Roman Gushchin <guro@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Chris Down <chris@chrisdown.name> Cc: <stable@vger.kernel.org> [5.3+] Link: https://lkml.kernel.org/r/20201110031015.15715-1-songmuchun@bytedance.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22mm: fix readahead_page_batch for retry entriesMatthew Wilcox (Oracle)
Both btrfs and fuse have reported faults caused by seeing a retry entry instead of the page they were looking for. This was caused by a missing check in the iterator. As can be seen in the below panic log, the accessing 0x402 causes a panic. In the xarray.h, 0x402 means RETRY_ENTRY. BUG: kernel NULL pointer dereference, address: 0000000000000402 CPU: 14 PID: 306003 Comm: as Not tainted 5.9.0-1-amd64 #1 Debian 5.9.1-1 Hardware name: Lenovo ThinkSystem SR665/7D2VCTO1WW, BIOS D8E106Q-1.01 05/30/2020 RIP: 0010:fuse_readahead+0x152/0x470 [fuse] Code: 41 8b 57 18 4c 8d 54 10 ff 4c 89 d6 48 8d 7c 24 10 e8 d2 e3 28 f9 48 85 c0 0f 84 fe 00 00 00 44 89 f2 49 89 04 d4 44 8d 72 01 <48> 8b 10 41 8b 4f 1c 48 c1 ea 10 83 e2 01 80 fa 01 19 d2 81 e2 01 RSP: 0018:ffffad99ceaebc50 EFLAGS: 00010246 RAX: 0000000000000402 RBX: 0000000000000001 RCX: 0000000000000002 RDX: 0000000000000000 RSI: ffff94c5af90bd98 RDI: ffffad99ceaebc60 RBP: ffff94ddc1749a00 R08: 0000000000000402 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000100 R12: ffff94de6c429ce0 R13: ffff94de6c4d3700 R14: 0000000000000001 R15: ffffad99ceaebd68 FS: 00007f228c5c7040(0000) GS:ffff94de8ed80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000402 CR3: 0000001dbd9b4000 CR4: 0000000000350ee0 Call Trace: read_pages+0x83/0x270 page_cache_readahead_unbounded+0x197/0x230 generic_file_buffered_read+0x57a/0xa20 new_sync_read+0x112/0x1a0 vfs_read+0xf8/0x180 ksys_read+0x5f/0xe0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 042124cc64c3 ("mm: add new readahead_control API") Reported-by: David Sterba <dsterba@suse.com> Reported-by: Wonhyuk Yang <vvghjk1234@gmail.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201103142852.8543-1-willy@infradead.org Link: https://lkml.kernel.org/r/20201103124349.16722-1-vvghjk1234@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exportsDan Williams
The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_to_nid(). That symbol is exported for modules. However, while the export in mm/memory_hotplug.c exported the symbol in the configuration cases of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=y ...and: CONFIG_NUMA_KEEP_MEMINFO=n CONFIG_MEMORY_HOTPLUG=y ...it failed to export the symbol in the case of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=n Not only is that broken, but Christoph points out that the kernel should not be exporting any __weak symbol, which means that memory_add_physaddr_to_nid() example that phys_to_target_node() copied is broken too. Rework the definition of phys_to_target_node() and memory_add_physaddr_to_nid() to not require weak symbols. Move to the common arch override design-pattern of an asm header defining a symbol to replace the default implementation. The only common header that all memory_add_physaddr_to_nid() producing architectures implement is asm/sparsemem.h. In fact, powerpc already defines its memory_add_physaddr_to_nid() helper in sparsemem.h. Double-down on that observation and define phys_to_target_node() where necessary in asm/sparsemem.h. An alternate consideration that was discarded was to put this override in asm/numa.h, but that entangles with the definition of MAX_NUMNODES relative to the inclusion of linux/nodemask.h, and requires powerpc to grow a new header. The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid now that the symbol is properly exported / stubbed in all combinations of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG. [dan.j.williams@intel.com: v4] Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com [dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning] Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22compiler-clang: remove version check for BPF TracingNick Desaulniers
bpftrace parses the kernel headers and uses Clang under the hood. Remove the version check when __BPF_TRACING__ is defined (as bpftrace does) so that this tool can continue to parse kernel headers, even with older clang sources. Fixes: commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1") Reported-by: Chen Yu <yu.chen.surf@gmail.com> Reported-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lkml.kernel.org/r/20201104191052.390657-1-ndesaulniers@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22mm/madvise: fix memory leak from process_madviseEric Dumazet
The early return in process_madvise() will produce a memory leak. Fix it. Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20201116155132.GA3805951@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-22irqchip/gic-v3-its: Unconditionally save/restore the ITS state on suspendXu Qiang
On systems without HW-based collections (i.e. anything except GIC-500), we rely on firmware to perform the ITS save/restore. This doesn't really work, as although FW can properly save everything, it cannot fully restore the state of the command queue (the read-side is reset to the head of the queue). This results in the ITS consuming previously processed commands, potentially corrupting the state. Instead, let's always save the ITS state on suspend, disabling it in the process, and restore the full state on resume. This saves us from broken FW as long as it doesn't enable the ITS by itself (for which we can't do anything). This amounts to simply dropping the ITS_FLAGS_SAVE_SUSPEND_STATE. Signed-off-by: Xu Qiang <xuqiang36@huawei.com> [maz: added warning on resume, rewrote commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201107104226.14282-1-xuqiang36@huawei.com
2020-11-22irqchip/exiu: Fix the index of fwspec for IRQ typeChen Baozi
Since fwspec->param_count of ACPI node is two, the index of IRQ type in fwspec->param[] should be 1 rather than 2. Fixes: 3d090a36c8c8 ("irqchip/exiu: Implement ACPI support") Signed-off-by: Chen Baozi <chenbaozi@phytium.com.cn> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20201117032015.11805-1-cbz@baozis.org Cc: stable@vger.kernel.org
2020-11-21Merge branch 'ibmvnic-fixes-in-reset-path'Jakub Kicinski
Lijun Pan says: ==================== ibmvnic: fixes in reset path Patch 1/3 and 2/3 notify peers in failover and migration reset. Patch 3/3 skips timeout reset if it is already resetting. ==================== Link: https://lore.kernel.org/r/20201120224013.46891-1-ljp@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21ibmvnic: skip tx timeout reset while in resettingLijun Pan
Sometimes it takes longer than 5 seconds (watchdog timeout) to complete failover, migration, and other resets. In stead of scheduling another timeout reset, we wait for the current one to complete. Suggested-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21ibmvnic: notify peers when failover and migration happenLijun Pan
Commit 61d3e1d9bc2a ("ibmvnic: Remove netdev notify for failover resets") excluded the failover case for notify call because it said netdev_notify_peers() can cause network traffic to stall or halt. Current testing does not show network traffic stall or halt because of the notify call for failover event. netdev_notify_peers may be used when a device wants to inform the rest of the network about some sort of a reconfiguration such as failover or migration. It is unnecessary to call that in other events like FATAL, NON_FATAL, CHANGE_PARAM, and TIMEOUT resets since in those scenarios the hardware does not change. If the driver must do a hard reset, it is necessary to notify peers. Fixes: 61d3e1d9bc2a ("ibmvnic: Remove netdev notify for failover resets") Suggested-by: Brian King <brking@linux.vnet.ibm.com> Suggested-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21ibmvnic: fix call_netdevice_notifiers in do_resetLijun Pan
When netdev_notify_peers was substituted in commit 986103e7920c ("net/ibmvnic: Fix RTNL deadlock during device reset"), call_netdevice_notifiers(NETDEV_RESEND_IGMP, dev) was missed. Fix it now. Fixes: 986103e7920c ("net/ibmvnic: Fix RTNL deadlock during device reset") Signed-off-by: Lijun Pan <ljp@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21tun: honor IOCB_NOWAIT flagJens Axboe
tun only checks the file O_NONBLOCK flag, but it should also be checking the iocb IOCB_NOWAIT flag. Any fops using ->read/write_iter() should check both, otherwise it breaks users that correctly expect O_NONBLOCK semantics if IOCB_NOWAIT is set. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/e9451860-96cc-c7c7-47b8-fe42cadd5f4c@kernel.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21net/af_iucv: set correct sk_protocol for child socketsJulian Wiedmann
Child sockets erroneously inherit their parent's sk_type (ie. SOCK_*), instead of the PF_IUCV protocol that the parent was created with in iucv_sock_create(). We're currently not using sk->sk_protocol ourselves, so this shouldn't have much impact (except eg. getting the output in skb_dump() right). Fixes: eac3731bd04c ("[S390]: Add AF_IUCV socket support") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Link: https://lore.kernel.org/r/20201120100657.34407-1-jwi@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21usbnet: ipheth: fix connectivity with iOS 14Yves-Alexis Perez
Starting with iOS 14 released in September 2020, connectivity using the personal hotspot USB tethering function of iOS devices is broken. Communication between the host and the device (for example ICMP traffic or DNS resolution using the DNS service running in the device itself) works fine, but communication to endpoints further away doesn't work. Investigation on the matter shows that no UDP and ICMP traffic from the tethered host is reaching the Internet at all. For TCP traffic there are exchanges between tethered host and server but packets are modified in transit leading to impossible communication. After some trials Matti Vuorela discovered that reducing the URB buffer size by two bytes restored the previous behavior. While a better solution might exist to fix the issue, since the protocol is not publicly documented and considering the small size of the fix, let's do that. Tested-by: Matti Vuorela <matti.vuorela@bitfactor.fi> Signed-off-by: Yves-Alexis Perez <corsac@corsac.net> Link: https://lore.kernel.org/linux-usb/CAAn0qaXmysJ9vx3ZEMkViv_B19ju-_ExN8Yn_uSefxpjS6g4Lw@mail.gmail.com/ Link: https://github.com/libimobiledevice/libimobiledevice/issues/1038 Link: https://lore.kernel.org/r/20201119172439.94988-1-corsac@corsac.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21cxgb4: Fix build failure when CONFIG_TLS=mTom Seewald
After commit 9d2e5e9eeb59 ("cxgb4/ch_ktls: decrypted bit is not enough") whenever CONFIG_TLS=m and CONFIG_CHELSIO_T4=y, the following build failure occurs: ld: drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.o: in function `cxgb_select_queue': cxgb4_main.c:(.text+0x2dac): undefined reference to `tls_validate_xmit_skb' Fix this by ensuring that if TLS is set to be a module, CHELSIO_T4 will also be compiled as a module. As otherwise the cxgb4 driver will not be able to access TLS' symbols. Fixes: 9d2e5e9eeb59 ("cxgb4/ch_ktls: decrypted bit is not enough") Signed-off-by: Tom Seewald <tseewald@gmail.com> Link: https://lore.kernel.org/r/20201120192528.615-1-tseewald@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21bonding: wait for sysfs kobject destruction before freeing struct slaveJamie Iles
syzkaller found that with CONFIG_DEBUG_KOBJECT_RELEASE=y, releasing a struct slave device could result in the following splat: kobject: 'bonding_slave' (00000000cecdd4fe): kobject_release, parent 0000000074ceb2b2 (delayed 1000) bond0 (unregistering): (slave bond_slave_1): Releasing backup interface ------------[ cut here ]------------ ODEBUG: free active (active state 0) object type: timer_list hint: workqueue_select_cpu_near kernel/workqueue.c:1549 [inline] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x98 kernel/workqueue.c:1600 WARNING: CPU: 1 PID: 842 at lib/debugobjects.c:485 debug_print_object+0x180/0x240 lib/debugobjects.c:485 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 842 Comm: kworker/u4:4 Tainted: G S 5.9.0-rc8+ #96 Hardware name: linux,dummy-virt (DT) Workqueue: netns cleanup_net Call trace: dump_backtrace+0x0/0x4d8 include/linux/bitmap.h:239 show_stack+0x34/0x48 arch/arm64/kernel/traps.c:142 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x174/0x1f8 lib/dump_stack.c:118 panic+0x360/0x7a0 kernel/panic.c:231 __warn+0x244/0x2ec kernel/panic.c:600 report_bug+0x240/0x398 lib/bug.c:198 bug_handler+0x50/0xc0 arch/arm64/kernel/traps.c:974 call_break_hook+0x160/0x1d8 arch/arm64/kernel/debug-monitors.c:322 brk_handler+0x30/0xc0 arch/arm64/kernel/debug-monitors.c:329 do_debug_exception+0x184/0x340 arch/arm64/mm/fault.c:864 el1_dbg+0x48/0xb0 arch/arm64/kernel/entry-common.c:65 el1_sync_handler+0x170/0x1c8 arch/arm64/kernel/entry-common.c:93 el1_sync+0x80/0x100 arch/arm64/kernel/entry.S:594 debug_print_object+0x180/0x240 lib/debugobjects.c:485 __debug_check_no_obj_freed lib/debugobjects.c:967 [inline] debug_check_no_obj_freed+0x200/0x430 lib/debugobjects.c:998 slab_free_hook mm/slub.c:1536 [inline] slab_free_freelist_hook+0x190/0x210 mm/slub.c:1577 slab_free mm/slub.c:3138 [inline] kfree+0x13c/0x460 mm/slub.c:4119 bond_free_slave+0x8c/0xf8 drivers/net/bonding/bond_main.c:1492 __bond_release_one+0xe0c/0xec8 drivers/net/bonding/bond_main.c:2190 bond_slave_netdev_event drivers/net/bonding/bond_main.c:3309 [inline] bond_netdev_event+0x8f0/0xa70 drivers/net/bonding/bond_main.c:3420 notifier_call_chain+0xf0/0x200 kernel/notifier.c:83 __raw_notifier_call_chain kernel/notifier.c:361 [inline] raw_notifier_call_chain+0x44/0x58 kernel/notifier.c:368 call_netdevice_notifiers_info+0xbc/0x150 net/core/dev.c:2033 call_netdevice_notifiers_extack net/core/dev.c:2045 [inline] call_netdevice_notifiers net/core/dev.c:2059 [inline] rollback_registered_many+0x6a4/0xec0 net/core/dev.c:9347 unregister_netdevice_many.part.0+0x2c/0x1c0 net/core/dev.c:10509 unregister_netdevice_many net/core/dev.c:10508 [inline] default_device_exit_batch+0x294/0x338 net/core/dev.c:10992 ops_exit_list.isra.0+0xec/0x150 net/core/net_namespace.c:189 cleanup_net+0x44c/0x888 net/core/net_namespace.c:603 process_one_work+0x96c/0x18c0 kernel/workqueue.c:2269 worker_thread+0x3f0/0xc30 kernel/workqueue.c:2415 kthread+0x390/0x498 kernel/kthread.c:292 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:925 This is a potential use-after-free if the sysfs nodes are being accessed whilst removing the struct slave, so wait for the object destruction to complete before freeing the struct slave itself. Fixes: 07699f9a7c8d ("bonding: add sysfs /slave dir for bond slave devices.") Fixes: a068aab42258 ("bonding: Fix reference count leak in bond_sysfs_slave_add.") Cc: Qiushi Wu <wu000273@umn.edu> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20201120142827.879226-1-jamie@nuviainc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21Merge tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "The critical fixes are for a crash that someone reported in the xattr code on 32-bit arm last week; and a revert of the rmap key comparison change from last week as it was totally wrong. I need a vacation. :( Summary: - Fix various deficiencies in online fsck's metadata checking code - Fix an integer casting bug in the xattr code on 32-bit systems - Fix a hang in an inode walk when the inode index is corrupt - Fix error codes being dropped when initializing per-AG structures - Fix nowait directio writes that partially succeed but return EAGAIN - Revert last week's rmap comparison patch because it was wrong" * tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: revert "xfs: fix rmap key and record comparison functions" xfs: don't allow NOWAIT DIO across extent boundaries xfs: return corresponding errcode if xfs_initialize_perag() fail xfs: ensure inobt record walks always make forward progress xfs: fix forkoff miscalculation related to XFS_LITINO(mp) xfs: directory scrub should check the null bestfree entries too xfs: strengthen rmap record flags checking xfs: fix the minrecs logic when dealing with inode root child blocks
2020-11-21Merge tag 'fsnotify_for_v5.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify fix from Jan Kara: "A single fanotify fix from Amir" * tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: fix logic of reporting name info with watched parent
2020-11-21Merge tag 'seccomp-v5.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp fixes from Kees Cook: "This gets the seccomp selftests running again on powerpc and sh, and fixes an audit reporting oversight noticed in both seccomp and ptrace. - Fix typos in seccomp selftests on powerpc and sh (Kees Cook) - Fix PF_SUPERPRIV audit marking in seccomp and ptrace (Mickaël Salaün)" * tag 'seccomp-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests/seccomp: sh: Fix register names selftests/seccomp: powerpc: Fix typo in macro variable name seccomp: Set PF_SUPERPRIV when checking capability ptrace: Set PF_SUPERPRIV when checking capability
2020-11-21drm/mediatek: dsi: Modify horizontal front/back porch byte formulaCK Hu
In the patch to be fixed, horizontal_backporch_byte become too large for some panel, so roll back that patch. For small hfp or hbp panel, using vm->hfront_porch + vm->hback_porch to calculate horizontal_backporch_byte would make it negtive, so use horizontal_backporch_byte itself to make it positive. Fixes: 35bf948f1edb ("drm/mediatek: dsi: Fix scrolling of panel with small hfp or hbp") Signed-off-by: CK Hu <ck.hu@mediatek.com> Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Tested-by: Bilal Wasim <bilal.wasim@imgtec.com>
2020-11-20Merge branch 's390-qeth-fixes-2020-11-20'Jakub Kicinski
Julian Wiedmann says: ==================== s390/qeth: fixes 2020-11-20 This brings several fixes for qeth's af_iucv-specific code paths. Also one fix by Alexandra for the recently added BR_LEARNING_SYNC support. We want to trust the feature indication bit, so that HW can mask it out if there's any issues on their end. ==================== Link: https://lore.kernel.org/r/20201120090939.101406-1-jwi@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20s390/qeth: fix tear down of async TX buffersJulian Wiedmann
When qeth_iqd_tx_complete() detects that a TX buffer requires additional async completion via QAOB, it might fail to replace the queue entry's metadata (and ends up triggering recovery). Assume now that the device gets torn down, overruling the recovery. If the QAOB notification then arrives before the tear down has sufficiently progressed, the buffer state is changed to QETH_QDIO_BUF_HANDLED_DELAYED by qeth_qdio_handle_aob(). The tear down code calls qeth_drain_output_queue(), where qeth_cleanup_handled_pending() will then attempt to replace such a buffer _again_. If it succeeds this time, the buffer ends up dangling in its replacement's ->next_pending list ... where it will never be freed, since there's no further call to qeth_cleanup_handled_pending(). But the second attempt isn't actually needed, we can simply leave the buffer on the queue and re-use it after a potential recovery has completed. The qeth_clear_output_buffer() in qeth_drain_output_queue() will ensure that it's in a clean state again. Fixes: 72861ae792c2 ("qeth: recovery through asynchronous delivery") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20s390/qeth: fix af_iucv notification raceJulian Wiedmann
The two expected notification sequences are 1. TX_NOTIFY_PENDING with a subsequent TX_NOTIFY_DELAYED_*, when our TX completion code first observed the pending TX and the QAOB then completes at a later time; or 2. TX_NOTIFY_OK, when qeth_qdio_handle_aob() picked up the QAOB completion before our TX completion code even noticed that the TX was pending. But as qeth_iqd_tx_complete() and qeth_qdio_handle_aob() can run concurrently, we may end up with a race that results in a sequence of TX_NOTIFY_DELAYED_* followed by TX_NOTIFY_PENDING. Which would confuse the af_iucv code in its tracking of pending transmits. Rework the notification code, so that qeth_qdio_handle_aob() defers its notification if the TX completion code is still active. Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20s390/qeth: make af_iucv TX notification call more robustJulian Wiedmann
Calling into socket code is ugly already, at least check whether we are dealing with the expected sk_family. Only looking at skb->protocol is bound to cause troubles (consider eg. af_packet). Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20s390/qeth: Remove pnso workaroundAlexandra Winter
Remove workaround that supported early hardware implementations of PNSO OC3. Rely on the 'enarf' feature bit instead. Fixes: fa115adff2c1 ("s390/qeth: Detect PNSO OC3 capability") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> [jwi: use logical instead of bit-wise AND] Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20Merge branch 'tcp-address-issues-with-ect0-not-being-set-in-dctcp-packets'Jakub Kicinski
Alexander Duyck says: ==================== tcp: Address issues with ECT0 not being set in DCTCP packets This patch set is meant to address issues seen with SYN/ACK packets not containing the ECT0 bit when DCTCP is configured as the congestion control algorithm for a TCP socket. A simple test using "tcpdump" and "test_progs -t bpf_tcp_ca" makes the issue obvious. Looking at the packets will result in the SYN/ACK packet with an ECT0 bit that does not match the other packets for the flow when the congestion control agorithm is switch from the default. So for example going from non-DCTCP to a DCTCP congestion control algorithm we will see the SYN/ACK IPV6 header will not have ECT0 set while the other packets in the flow will. Likewise if we switch from a default of DCTCP to cubic we will see the ECT0 bit set in the SYN/ACK while the other packets in the flow will not. ==================== Link: https://lore.kernel.org/r/160582070138.66684.11785214534154816097.stgit@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20tcp: Set INET_ECN_xmit configuration in tcp_reinit_congestion_controlAlexander Duyck
When setting congestion control via a BPF program it is seen that the SYN/ACK for packets within a given flow will not include the ECT0 flag. A bit of simple printk debugging shows that when this is configured without BPF we will see the value INET_ECN_xmit value initialized in tcp_assign_congestion_control however when we configure this via BPF the socket is in the closed state and as such it isn't configured, and I do not see it being initialized when we transition the socket into the listen state. The result of this is that the ECT0 bit is configured based on whatever the default state is for the socket. Any easy way to reproduce this is to monitor the following with tcpdump: tools/testing/selftests/bpf/test_progs -t bpf_tcp_ca Without this patch the SYN/ACK will follow whatever the default is. If dctcp all SYN/ACK packets will have the ECT0 bit set, and if it is not then ECT0 will be cleared on all SYN/ACK packets. With this patch applied the SYN/ACK bit matches the value seen on the other packets in the given stream. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 headerAlexander Duyck
An issue was recently found where DCTCP SYN/ACK packets did not have the ECT bit set in the L3 header. A bit of code review found that the recent change referenced below had gone though and added a mask that prevented the ECN bits from being populated in the L3 header. This patch addresses that by rolling back the mask so that it is only applied to the flags coming from the incoming TCP request instead of applying it to the socket tos/tclass field. Doing this the ECT bits were restored in the SYN/ACK packets in my testing. One thing that is not addressed by this patch set is the fact that tcp_reflect_tos appears to be incompatible with ECN based congestion avoidance algorithms. At a minimum the feature should likely be documented which it currently isn't. Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Fixes for two fairly obscure but annoying when triggered races in iSCSI" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: target: iscsi: Fix cmd abort fabric stop race scsi: libiscsi: Fix NOP race condition
2020-11-20dpaa2-eth: select XGMAC_MDIO for MDIO bus supportIoana Ciornei
Explicitly enable the FSL_XGMAC_MDIO Kconfig option in order to have MDIO access to internal and external PHYs. Fixes: 719479230893 ("dpaa2-eth: add MAC/PHY support through phylink") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20201119145106.712761-1-ciorneiioana@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20Merge tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request from Christoph: - Doorbell Buffer freeing fix (Minwoo Im) - CSE log leak fix (Keith Busch) - blk-cgroup hd_struct leak fix (Christoph) - Flush request state fix (Ming) - dasd NULL deref fix (Stefan) * tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-block: s390/dasd: fix null pointer dereference for ERP requests blk-cgroup: fix a hd_struct leak in blkcg_fill_root_iostats nvme: fix memory leak freeing command effects nvme: directly cache command effects log nvme: free sq/cq dbbuf pointers when dbbuf set fails block: mark flush request as IDLE when it is really finished
2020-11-20Merge tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Mostly regression or stable fodder: - Disallow async path resolution of /proc/self - Tighten constraints for segmented async buffered reads - Fix double completion for a retry error case - Fix for fixed file life times (Pavel)" * tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block: io_uring: order refnode recycling io_uring: get an active ref_node from files_data io_uring: don't double complete failed reissue request mm: never attempt async page lock if we've transferred data already io_uring: handle -EOPNOTSUPP on path resolution proc: don't allow async path resolution of /proc/self components