summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-01-24mm: fix numa stats for thp migrationShakeel Butt
Currently the kernel is not correctly updating the numa stats for NR_FILE_PAGES and NR_SHMEM on THP migration. Fix that. For NR_FILE_DIRTY and NR_ZONE_WRITE_PENDING, although at the moment there is no need to handle THP migration as kernel still does not have write support for file THP but to be more future proof, this patch adds the THP support for those stats as well. Link: https://lkml.kernel.org/r/20210108155813.2914586-2-shakeelb@google.com Fixes: e71769ae52609 ("mm: enable thp migration for shmem thp") Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24mm: memcg: fix memcg file_dirty numa statShakeel Butt
The kernel updates the per-node NR_FILE_DIRTY stats on page migration but not the memcg numa stats. That was not an issue until recently the commit 5f9a4f4a7096 ("mm: memcontrol: add the missing numa_stat interface for cgroup v2") exposed numa stats for the memcg. So fix the file_dirty per-memcg numa stat. Link: https://lkml.kernel.org/r/20210108155813.2914586-1-shakeelb@google.com Fixes: 5f9a4f4a7096 ("mm: memcontrol: add the missing numa_stat interface for cgroup v2") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24mm: memcg/slab: optimize objcg stock drainingRoman Gushchin
Imran Khan reported a 16% regression in hackbench results caused by the commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects instead of pages"). The regression is noticeable in the case of a consequent allocation of several relatively large slab objects, e.g. skb's. As soon as the amount of stocked bytes exceeds PAGE_SIZE, drain_obj_stock() and __memcg_kmem_uncharge() are called, and it leads to a number of atomic operations in page_counter_uncharge(). The corresponding call graph is below (provided by Imran Khan): |__alloc_skb | | | |__kmalloc_reserve.isra.61 | | | | | |__kmalloc_node_track_caller | | | | | | | |slab_pre_alloc_hook.constprop.88 | | | obj_cgroup_charge | | | | | | | | | |__memcg_kmem_charge | | | | | | | | | | | |page_counter_try_charge | | | | | | | | | |refill_obj_stock | | | | | | | | | | | |drain_obj_stock.isra.68 | | | | | | | | | | | | | |__memcg_kmem_uncharge | | | | | | | | | | | | | | | |page_counter_uncharge | | | | | | | | | | | | | | | | | |page_counter_cancel | | | | | | | | | | | |__slab_alloc | | | | | | | | | |___slab_alloc | | | | | | | | |slab_post_alloc_hook Instead of directly uncharging the accounted kernel memory, it's possible to refill the generic page-sized per-cpu stock instead. It's a much faster operation, especially on a default hierarchy. As a bonus, __memcg_kmem_uncharge_page() will also get faster, so the freeing of page-sized kernel allocations (e.g. large kmallocs) will become faster. A similar change has been done earlier for the socket memory by the commit 475d0487a2ad ("mm: memcontrol: use per-cpu stocks for socket memory uncharging"). Link: https://lkml.kernel.org/r/20210106042239.2860107-1-guro@fb.com Fixes: f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects instead of pages") Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Imran Khan <imran.f.khan@oracle.com> Tested-by: Imran Khan <imran.f.khan@oracle.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Michal Koutn <mkoutny@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24mm: fix initialization of struct page for holes in memory layoutMike Rapoport
There could be struct pages that are not backed by actual physical memory. This can happen when the actual memory bank is not a multiple of SECTION_SIZE or when an architecture does not register memory holes reserved by the firmware as memblock.memory. Such pages are currently initialized using init_unavailable_mem() function that iterates through PFNs in holes in memblock.memory and if there is a struct page corresponding to a PFN, the fields if this page are set to default values and the page is marked as Reserved. init_unavailable_mem() does not take into account zone and node the page belongs to and sets both zone and node links in struct page to zero. On a system that has firmware reserved holes in a zone above ZONE_DMA, for instance in a configuration below: # grep -A1 E820 /proc/iomem 7a17b000-7a216fff : Unknown E820 type 7a217000-7bffffff : System RAM unset zone link in struct page will trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page); because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link in struct page) in the same pageblock. Update init_unavailable_mem() to use zone constraints defined by an architecture to properly setup the zone link and use node ID of the adjacent range in memblock.memory to set the node link. Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qian Cai <cai@lca.pw> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24x86/setup: don't remove E820_TYPE_RAM for pfn 0Mike Rapoport
Patch series "mm: fix initialization of struct page for holes in memory layout", v3. Commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") exposed several issues with the memory map initialization and these patches fix those issues. Initially there were crashes during compaction that Qian Cai reported back in April [1]. It seemed back then that the problem was fixed, but a few weeks ago Andrea Arcangeli hit the same bug [2] and there was an additional discussion at [3]. [1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw [2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com [3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foundation.org This patch (of 2): The first 4Kb of memory is a BIOS owned area and to avoid its allocation for the kernel it was not listed in e820 tables as memory. As the result, pfn 0 was never recognised by the generic memory management and it is not a part of neither node 0 nor ZONE_DMA. If set_pfnblock_flags_mask() would be ever called for the pageblock corresponding to the first 2Mbytes of memory, having pfn 0 outside of ZONE_DMA would trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page); Along with reserving the first 4Kb in e820 tables, several first pages are reserved with memblock in several places during setup_arch(). These reservations are enough to ensure the kernel does not touch the BIOS area and it is not necessary to remove E820_TYPE_RAM for pfn 0. Remove the update of e820 table that changes the type of pfn 0 and move the comment describing why it was done to trim_low_memory_range() that reserves the beginning of the memory. Link: https://lkml.kernel.org/r/20210111194017.22696-2-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qian Cai <cai@lca.pw> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24io_uring: account io_uring internal files as REQ_F_INFLIGHTJens Axboe
We need to actively cancel anything that introduces a potential circular loop, where io_uring holds a reference to itself. If the file in question is an io_uring file, then add the request to the inflight list. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-24io_uring: fix sleeping under spin in __io_clean_opPavel Begunkov
[ 27.629441] BUG: sleeping function called from invalid context at fs/file.c:402 [ 27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1012, name: io_wqe_worker-0 [ 27.633220] 1 lock held by io_wqe_worker-0/1012: [ 27.634286] #0: ffff888105e26c98 (&ctx->completion_lock) {....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70 [ 27.649249] Call Trace: [ 27.649874] dump_stack+0xac/0xe3 [ 27.650666] ___might_sleep+0x284/0x2c0 [ 27.651566] put_files_struct+0xb8/0x120 [ 27.652481] __io_clean_op+0x10c/0x2a0 [ 27.653362] __io_cqring_fill_event+0x2c1/0x350 [ 27.654399] __io_req_complete.part.102+0x41/0x70 [ 27.655464] io_openat2+0x151/0x300 [ 27.656297] io_issue_sqe+0x6c/0x14e0 [ 27.660991] io_wq_submit_work+0x7f/0x240 [ 27.662890] io_worker_handle_work+0x501/0x8a0 [ 27.664836] io_wqe_worker+0x158/0x520 [ 27.667726] kthread+0x134/0x180 [ 27.669641] ret_from_fork+0x1f/0x30 Instead of cleaning files on overflow, return back overflow cancellation into io_uring_cancel_files(). Previously it was racy to clean REQ_F_OVERFLOW flag, but we got rid of it, and can do it through repetitive attempts targeting all matching requests. Reported-by: Abaci <abaci@linux.alibaba.com> Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-23Merge branch 'mtd/fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd fixes from Miquel Raynal. * 'mtd/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: omap: Use BCH private fields in the specific OOB layout mtd: spinand: Fix MTD_OPS_AUTO_OOB requests mtd: rawnand: intel: check the mtd name only after setting the variable mtd: rawnand: nandsim: Fix the logic when selecting Hamming soft ECC engine mtd: rawnand: gpmi: fix dst bit offset when extracting raw payload
2021-01-23Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Another bunch of driver fixes" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: sprd: depend on COMMON_CLK to fix compile tests Revert "i2c: imx: Remove unused .id_table support" i2c: octeon: check correct size of maximum RECV_LEN packet i2c: tegra: Create i2c_writesl_vi() to use with VI I2C for filling TX FIFO i2c: bpmp-tegra: Ignore unknown I2C_M flags i2c: tegra: Wait for config load atomically while in ISR
2021-01-23Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Twelve minor fixes, all in drivers or doc. Most of the fixes are pretty obvious (although we had two goes to get the UFS sysfs doc right) and the biggest change is in the ufs driver which they've extensively tested" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ibmvfc: Set default timeout to avoid crash during migration scsi: target: tcmu: Fix use-after-free of se_cmd->priv scsi: fnic: Fix memleak in vnic_dev_init_devcmd2 scsi: libfc: Avoid invoking response handler twice if ep is already completed scsi: scsi_transport_srp: Don't block target in failfast state scsi: docs: ABI: sysfs-driver-ufs: Rectify table formatting scsi: ufs: Fix tm request when non-fatal error happens scsi: ufs: Fix livelock of ufshcd_clear_ua_wluns() scsi: ibmvfc: Fix missing cast of ibmvfc_event pointer to u64 handle scsi: ufs: ufshcd-pltfrm depends on HAS_IOMEM scsi: megaraid_sas: Fix MEGASAS_IOC_FIRMWARE regression scsi: docs: ABI: sysfs-driver-ufs: Add DeepSleep power mode
2021-01-23Merge tag 'linux-kselftest-kunit-fixes-5.11-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit fixes from Shuah : "Five fixes to the kunit tool and documentation from Daniel Latypov and David Gow" * tag 'linux-kselftest-kunit-fixes-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: tool: move kunitconfig parsing into __init__, make it optional kunit: tool: fix minor typing issue with None status kunit: tool: surface and address more typing issues Documentation: kunit: include example of a parameterized test kunit: tool: Fix spelling of "diagnostic" in kunit_parser
2021-01-23cifs: do not fail __smb_send_rqst if non-fatal signals are pendingRonnie Sahlberg
RHBZ 1848178 The original intent of returning an error in this function in the patch: "CIFS: Mask off signals when sending SMB packets" was to avoid interrupting packet send in the middle of sending the data (and thus breaking an SMB connection), but we also don't want to fail the request for non-fatal signals even before we have had a chance to try to send it (the reported problem could be reproduced e.g. by exiting a child process when the parent process was in the midst of calling futimens to update a file's timestamps). In addition, since the signal may remain pending when we enter the sending loop, we may end up not sending the whole packet before TCP buffers become full. In this case the code returns -EINTR but what we need here is to return -ERESTARTSYS instead to allow system calls to be restarted. Fixes: b30c74c73c78 ("CIFS: Mask off signals when sending SMB packets") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-01-22Merge tag 'for-5.11/dm-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM integrity crash if "recalculate" used without "internal_hash" - Fix DM integrity "recalculate" support to prevent recalculating checksums if we use internal_hash or journal_hash with a key (e.g. HMAC). Use of crypto as a means to prevent malicious corruption requires further changes and was never a design goal for dm-integrity's primary usecase of detecting accidental corruption. - Fix a benign dm-crypt copy-and-paste bug introduced as part of a fix that was merged for 5.11-rc4. - Fix DM core's dm_get_device() to avoid filesystem lookup to get block device (if possible). * tag 'for-5.11/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: avoid filesystem lookup in dm_get_dev_t() dm crypt: fix copy and paste bug in crypt_alloc_req_aead dm integrity: conditionally disable "recalculate" feature dm integrity: fix a crash if "recalculate" used without "internal_hash"
2021-01-22Merge tag 'perf-tools-fixes-v5.11-2-2021-01-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull more perf tools fixes from Arnaldo Carvalho de Melo: - Fix id index used in Intel PT for heterogeneous systems - Fix overrun issue in 'perf script' for dynamically-allocated PMU type number - Fix 'perf stat' metrics containing the 'duration_time' synthetic event - Fix system PMU 'perf stat' metrics * tag 'perf-tools-fixes-v5.11-2-2021-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf script: Fix overrun issue for dynamically-allocated PMU type number perf metricgroup: Fix system PMU metrics perf metricgroup: Fix for metrics containing duration_time perf evlist: Fix id index for heterogeneous systems
2021-01-22Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Correctly mask out bits 63:60 in a kernel tag check fault address (specified as unknown by the architecture). Previously they were just zeroed but for kernel pointers they need to be all ones. - Fix a panic (unexpected kernel BRK exception) caused by kprobes being reentered due to an interrupt. * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: kprobes: Fix Uexpected kernel BRK exception at EL1 kasan, arm64: fix pointer tags in KASAN reports
2021-01-22Merge tag 'ceph-for-5.11-rc5' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A patch to zero out sensitive cryptographic data and two minor cleanups prompted by the fact that a bunch of code was moved in this cycle" * tag 'ceph-for-5.11-rc5' of git://github.com/ceph/ceph-client: libceph: fix "Boolean result is used in bitwise operation" warning libceph, ceph: disambiguate ceph_connection_operations handlers libceph: zero out session key and connection secret
2021-01-22Merge tag 'fixes-2021-01-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull typo fix from Mike Rapoport: "Fix typo in comment of memblock_phys_alloc_try_nid()" * tag 'fixes-2021-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: mm/memblock: Fix typo in comment of memblock_phys_alloc_try_nid()
2021-01-22Merge tag 'mmc-v5.11-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Fix initialization of block size when ext_csd isn't present MMC host: - sdhci-brcmstb: Fix mmc timeout errors on S5 suspend - sdhci-of-dwcmshc: Fix request accessing RPMB - sdhci-xenon: Fix 1.8v regulator stabilization" * tag 'mmc-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: core: don't initialize block size from ext_csd if not present mmc: sdhci-brcmstb: Fix mmc timeout errors on S5 suspend mmc: sdhci-xenon: fix 1.8v regulator stabilization mmc: sdhci-of-dwcmshc: fix rpmb access
2021-01-22Merge tag 'platform-drivers-x86-v5.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "A small collection of bug-fixes and model-specific quirks" * tag 'platform-drivers-x86-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: thinkpad_acpi: Add P53/73 firmware to fan_quirk_table for dual fan control platform/x86: hp-wmi: Don't log a warning on HPWMI_RET_UNKNOWN_COMMAND errors platform/x86: intel-vbtn: Drop HP Stream x360 Convertible PC 11 from allow-list platform/x86: ideapad-laptop: Disable touchpad_switch for ELAN0634 platform/x86: amd-pmc: Fix CONFIG_DEBUG_FS check platform/x86: thinkpad_acpi: correct palmsensor error checking platform/x86: intel-vbtn: Support for tablet mode on Dell Inspiron 7352 platform/x86: touchscreen_dmi: Add swap-x-y quirk for Goodix touchscreen on Estar Beauty HD tablet platform/x86: i2c-multi-instantiate: Don't create platform device for INT3515 ACPI nodes platform/surface: SURFACE_PLATFORMS should depend on ACPI platform/surface: surface_gpe: Fix non-PM_SLEEP build warnings tools/power/x86/intel-speed-select: Set higher of cpuinfo_max_freq or base_frequency tools/power/x86/intel-speed-select: Set scaling_max_freq to base_frequency
2021-01-22io_uring: fix short read retries for non-reg filesPavel Begunkov
Sockets and other non-regular files may actually expect short reads to happen, don't retry reads for them. Because non-reg files don't set FMODE_BUF_RASYNC and so it won't do second/retry do_read, we can filter out those cases after first do_read() attempt with ret>0. Cc: stable@vger.kernel.org # 5.9+ Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-22io_uring: fix SQPOLL IORING_OP_CLOSE cancelation stateJens Axboe
IORING_OP_CLOSE is special in terms of cancelation, since it has an intermediate state where we've removed the file descriptor but hasn't closed the file yet. For that reason, it's currently marked with IO_WQ_WORK_NO_CANCEL to prevent cancelation. This ensures that the op is always run even if canceled, to prevent leaving us with a live file but an fd that is gone. However, with SQPOLL, since a cancel request doesn't carry any resources on behalf of the request being canceled, if we cancel before any of the close op has been run, we can end up with io-wq not having the ->files assigned. This can result in the following oops reported by Joseph: BUG: kernel NULL pointer dereference, address: 00000000000000d8 PGD 800000010b76f067 P4D 800000010b76f067 PUD 10b462067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 1788 Comm: io_uring-sq Not tainted 5.11.0-rc4 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__lock_acquire+0x19d/0x18c0 Code: 00 00 8b 1d fd 56 dd 08 85 db 0f 85 43 05 00 00 48 c7 c6 98 7b 95 82 48 c7 c7 57 96 93 82 e8 9a bc f5 ff 0f 0b e9 2b 05 00 00 <48> 81 3f c0 ca 67 8a b8 00 00 00 00 41 0f 45 c0 89 04 24 e9 81 fe RSP: 0018:ffffc90001933828 EFLAGS: 00010002 RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000d8 RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff888106e8a140 R15: 00000000000000d8 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000d8 CR3: 0000000106efa004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: lock_acquire+0x31a/0x440 ? close_fd_get_file+0x39/0x160 ? __lock_acquire+0x647/0x18c0 _raw_spin_lock+0x2c/0x40 ? close_fd_get_file+0x39/0x160 close_fd_get_file+0x39/0x160 io_issue_sqe+0x1334/0x14e0 ? lock_acquire+0x31a/0x440 ? __io_free_req+0xcf/0x2e0 ? __io_free_req+0x175/0x2e0 ? find_held_lock+0x28/0xb0 ? io_wq_submit_work+0x7f/0x240 io_wq_submit_work+0x7f/0x240 io_wq_cancel_cb+0x161/0x580 ? io_wqe_wake_worker+0x114/0x360 ? io_uring_get_socket+0x40/0x40 io_async_find_and_cancel+0x3b/0x140 io_issue_sqe+0xbe1/0x14e0 ? __lock_acquire+0x647/0x18c0 ? __io_queue_sqe+0x10b/0x5f0 __io_queue_sqe+0x10b/0x5f0 ? io_req_prep+0xdb/0x1150 ? mark_held_locks+0x6d/0xb0 ? mark_held_locks+0x6d/0xb0 ? io_queue_sqe+0x235/0x4b0 io_queue_sqe+0x235/0x4b0 io_submit_sqes+0xd7e/0x12a0 ? _raw_spin_unlock_irq+0x24/0x30 ? io_sq_thread+0x3ae/0x940 io_sq_thread+0x207/0x940 ? do_wait_intr_irq+0xc0/0xc0 ? __ia32_sys_io_uring_enter+0x650/0x650 kthread+0x134/0x180 ? kthread_create_worker_on_cpu+0x90/0x90 ret_from_fork+0x1f/0x30 Fix this by moving the IO_WQ_WORK_NO_CANCEL until _after_ we've modified the fdtable. Canceling before this point is totally fine, and running it in the io-wq context _after_ that point is also fine. For 5.12, we'll handle this internally and get rid of the no-cancel flag, as IORING_OP_CLOSE is the only user of it. Cc: stable@vger.kernel.org Fixes: b5dba59e0cf7 ("io_uring: add support for IORING_OP_CLOSE") Reported-by: "Abaci <abaci@linux.alibaba.com>" Reviewed-and-tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-22arm64: kprobes: Fix Uexpected kernel BRK exception at EL1Qais Yousef
I was hitting the below panic continuously when attaching kprobes to scheduler functions [ 159.045212] Unexpected kernel BRK exception at EL1 [ 159.053753] Internal error: BRK handler: f2000006 [#1] PREEMPT SMP [ 159.059954] Modules linked in: [ 159.063025] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.11.0-rc4-00008-g1e2a199f6ccd #56 [rt-app] <notice> [1] Exiting.[ 159.071166] Hardware name: ARM Juno development board (r2) (DT) [ 159.079689] pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO BTYPE=--) [ 159.085723] pc : 0xffff80001624501c [ 159.089377] lr : attach_entity_load_avg+0x2ac/0x350 [ 159.094271] sp : ffff80001622b640 [rt-app] <notice> [0] Exiting.[ 159.097591] x29: ffff80001622b640 x28: 0000000000000001 [ 159.105515] x27: 0000000000000049 x26: ffff000800b79980 [ 159.110847] x25: ffff00097ef37840 x24: 0000000000000000 [ 159.116331] x23: 00000024eacec1ec x22: ffff00097ef12b90 [ 159.121663] x21: ffff00097ef37700 x20: ffff800010119170 [rt-app] <notice> [11] Exiting.[ 159.126995] x19: ffff00097ef37840 x18: 000000000000000e [ 159.135003] x17: 0000000000000001 x16: 0000000000000019 [ 159.140335] x15: 0000000000000000 x14: 0000000000000000 [ 159.145666] x13: 0000000000000002 x12: 0000000000000002 [ 159.150996] x11: ffff80001592f9f0 x10: 0000000000000060 [ 159.156327] x9 : ffff8000100f6f9c x8 : be618290de0999a1 [ 159.161659] x7 : ffff80096a4b1000 x6 : 0000000000000000 [ 159.166990] x5 : ffff00097ef37840 x4 : 0000000000000000 [ 159.172321] x3 : ffff000800328948 x2 : 0000000000000000 [ 159.177652] x1 : 0000002507d52fec x0 : ffff00097ef12b90 [ 159.182983] Call trace: [ 159.185433] 0xffff80001624501c [ 159.188581] update_load_avg+0x2d0/0x778 [ 159.192516] enqueue_task_fair+0x134/0xe20 [ 159.196625] enqueue_task+0x4c/0x2c8 [ 159.200211] ttwu_do_activate+0x70/0x138 [ 159.204147] sched_ttwu_pending+0xbc/0x160 [ 159.208253] flush_smp_call_function_queue+0x16c/0x320 [ 159.213408] generic_smp_call_function_single_interrupt+0x1c/0x28 [ 159.219521] ipi_handler+0x1e8/0x3c8 [ 159.223106] handle_percpu_devid_irq+0xd8/0x460 [ 159.227650] generic_handle_irq+0x38/0x50 [ 159.231672] __handle_domain_irq+0x6c/0xc8 [ 159.235781] gic_handle_irq+0xcc/0xf0 [ 159.239452] el1_irq+0xb4/0x180 [ 159.242600] rcu_is_watching+0x28/0x70 [ 159.246359] rcu_read_lock_held_common+0x44/0x88 [ 159.250991] rcu_read_lock_any_held+0x30/0xc0 [ 159.255360] kretprobe_dispatcher+0xc4/0xf0 [ 159.259555] __kretprobe_trampoline_handler+0xc0/0x150 [ 159.264710] trampoline_probe_handler+0x38/0x58 [ 159.269255] kretprobe_trampoline+0x70/0xc4 [ 159.273450] run_rebalance_domains+0x54/0x80 [ 159.277734] __do_softirq+0x164/0x684 [ 159.281406] irq_exit+0x198/0x1b8 [ 159.284731] __handle_domain_irq+0x70/0xc8 [ 159.288840] gic_handle_irq+0xb0/0xf0 [ 159.292510] el1_irq+0xb4/0x180 [ 159.295658] arch_cpu_idle+0x18/0x28 [ 159.299245] default_idle_call+0x9c/0x3e8 [ 159.303265] do_idle+0x25c/0x2a8 [ 159.306502] cpu_startup_entry+0x2c/0x78 [ 159.310436] secondary_start_kernel+0x160/0x198 [ 159.314984] Code: d42000c0 aa1e03e9 d42000c0 aa1e03e9 (d42000c0) After a bit of head scratching and debugging it turned out that it is due to kprobe handler being interrupted by a tick that causes us to go into (I think another) kprobe handler. The culprit was kprobe_breakpoint_ss_handler() returning DBG_HOOK_ERROR which leads to the Unexpected kernel BRK exception. Reverting commit ba090f9cafd5 ("arm64: kprobes: Remove redundant kprobe_step_ctx") seemed to fix the problem for me. Further analysis showed that kcb->kprobe_status is set to KPROBE_REENTER when the error occurs. By teaching kprobe_breakpoint_ss_handler() to handle this status I can no longer reproduce the problem. Fixes: ba090f9cafd5 ("arm64: kprobes: Remove redundant kprobe_step_ctx") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210122110909.3324607-1-qais.yousef@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-01-22sched: Relax the set_cpus_allowed_ptr() semanticsPeter Zijlstra
Now that we have KTHREAD_IS_PER_CPU to denote the critical per-cpu tasks to retain during CPU offline, we can relax the warning in set_cpus_allowed_ptr(). Any spurious kthread that wants to get on at the last minute will get pushed off before it can run. While during CPU online there is no harm, and actual benefit, to allowing kthreads back on early, it simplifies hotplug code and fixes a number of outstanding races. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lai jiangshan <jiangshanlai@gmail.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103507.240724591@infradead.org
2021-01-22sched: Fix CPU hotplug / tighten is_per_cpu_kthread()Peter Zijlstra
Prior to commit 1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug") we'd leave any task on the dying CPU and break affinity and force them off at the very end. This scheme had to change in order to enable migrate_disable(). One cannot wait for migrate_disable() to complete while stuck in stop_machine(). Furthermore, since we need at the very least: idle, hotplug and stop threads at any point before stop_machine, we can't break affinity and/or push those away. Under the assumption that all per-cpu kthreads are sanely handled by CPU hotplug, the new code no long breaks affinity or migrates any of them (which then includes the critical ones above). However, there's an important difference between per-cpu kthreads and kthreads that happen to have a single CPU affinity which is lost. The latter class very much relies on the forced affinity breaking and migration semantics previously provided. Use the new kthread_is_per_cpu() infrastructure to tighten is_per_cpu_kthread() and fix the hot-unplug problems stemming from the change. Fixes: 1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103507.102416009@infradead.org
2021-01-22sched: Prepare to use balance_push in ttwu()Peter Zijlstra
In preparation of using the balance_push state in ttwu() we need it to provide a reliable and consistent state. The immediate problem is that rq->balance_callback gets cleared every schedule() and then re-set in the balance_push_callback() itself. This is not a reliable signal, so add a variable that stays set during the entire time. Also move setting it before the synchronize_rcu() in sched_cpu_deactivate(), such that we get guaranteed visibility to ttwu(), which is a preempt-disable region. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103506.966069627@infradead.org
2021-01-22workqueue: Restrict affinity change to rescuerPeter Zijlstra
create_worker() will already set the right affinity using kthread_bind_mask(), this means only the rescuer will need to change it's affinity. Howveer, while in cpu-hot-unplug a regular task is not allowed to run on online&&!active as it would be pushed away quite agressively. We need KTHREAD_IS_PER_CPU to survive in that environment. Therefore set the affinity after getting that magic flag. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103506.826629830@infradead.org
2021-01-22workqueue: Tag bound workers with KTHREAD_IS_PER_CPUPeter Zijlstra
Mark the per-cpu workqueue workers as KTHREAD_IS_PER_CPU. Workqueues have unfortunate semantics in that per-cpu workers are not default flushed and parked during hotplug, however a subset does manual flush on hotplug and hard relies on them for correctness. Therefore play silly games.. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103506.693465814@infradead.org
2021-01-22kthread: Extract KTHREAD_IS_PER_CPUPeter Zijlstra
There is a need to distinguish geniune per-cpu kthreads from kthreads that happen to have a single CPU affinity. Geniune per-cpu kthreads are kthreads that are CPU affine for correctness, these will obviously have PF_KTHREAD set, but must also have PF_NO_SETAFFINITY set, lest userspace modify their affinity and ruins things. However, these two things are not sufficient, PF_NO_SETAFFINITY is also set on other tasks that have their affinities controlled through other means, like for instance workqueues. Therefore another bit is needed; it turns out kthread_create_per_cpu() already has such a bit: KTHREAD_IS_PER_CPU, which is used to make kthread_park()/kthread_unpark() work correctly. Expose this flag and remove the implicit setting of it from kthread_create_on_cpu(); the io_uring usage of it seems dubious at best. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103506.557620262@infradead.org
2021-01-22sched: Don't run cpu-online with balance_push() enabledPeter Zijlstra
We don't need to push away tasks when we come online, mark the push complete right before the CPU dies. XXX hotplug state machine has trouble with rollback here. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103506.415606087@infradead.org
2021-01-22workqueue: Use cpu_possible_mask instead of cpu_active_mask to break affinityLai Jiangshan
The scheduler won't break affinity for us any more, and we should "emulate" the same behavior when the scheduler breaks affinity for us. The behavior is "changing the cpumask to cpu_possible_mask". And there might be some other CPUs online later while the worker is still running with the pending work items. The worker should be allowed to use the later online CPUs as before and process the work items ASAP. If we use cpu_active_mask here, we can't achieve this goal but using cpu_possible_mask can. Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Tejun Heo <tj@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210111152638.2417-4-jiangshanlai@gmail.com
2021-01-22sched/core: Print out straggler tasks in sched_cpu_dying()Valentin Schneider
Since commit 1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug") tasks are expected to move themselves out of a out-going CPU. For most tasks this will be done automagically via BALANCE_PUSH, but percpu kthreads will have to cooperate and move themselves away one way or another. Currently, some percpu kthreads (workqueues being a notable exemple) do not cooperate nicely and can end up on an out-going CPU at the time sched_cpu_dying() is invoked. Print the dying rq's tasks to shed some light on the stragglers. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210113183141.11974-1-valentin.schneider@arm.com
2021-01-22misc: rtsx: init value of aspm_enabledRicky Wu
make sure ASPM state sync with pcr->aspm_enabled init value pcr->aspm_enabled Cc: stable@vger.kernel.org Signed-off-by: Ricky Wu <ricky_wu@realtek.com> Link: https://lore.kernel.org/r/20210122081906.19100-1-ricky_wu@realtek.com Fixes: d928061c3143 ("misc: rtsx: modify en/disable aspm function") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-22tty: fix up hung_up_tty_write() conversionLinus Torvalds
In commit "tty: implement write_iter", I left the write_iter conversion of the hung up tty case alone, because I incorrectly thought it didn't matter. Jiri showed me the errors of my ways, and pointed out the problems with that incomplete conversion. Fix it all up. Reported-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/r/CAHk-=wh+-rGsa=xruEWdg_fJViFG8rN9bpLrfLz=_yBYh2tBhA@mail.gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-22i2c: sprd: depend on COMMON_CLK to fix compile testsKrzysztof Kozlowski
The I2C_SPRD uses Common Clock Framework thus it cannot be built on platforms without it (e.g. compile test on MIPS with LANTIQ): /usr/bin/mips-linux-gnu-ld: drivers/i2c/busses/i2c-sprd.o: in function `sprd_i2c_probe': i2c-sprd.c:(.text.sprd_i2c_probe+0x254): undefined reference to `clk_set_parent' Fixes: 4a2d5f663dab ("i2c: Enable compile testing for more drivers") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Baolin Wang <baolin.wang7@gmail.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-01-22Revert "i2c: imx: Remove unused .id_table support"Fabio Estevam
Coldfire platforms are non-DT users of this driver, so keep the .id_table support. This reverts commit c610199cd392e6e2d41811ef83d85355c1b862b3. Fixes: c610199cd392 (i2c: imx: Remove unused .id_table support") Reported-by: Sascha Hauer <sha@pengutronix.de> Signed-off-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-01-21Merge tag 'for-linus' of git://github.com/openrisc/linuxLinus Torvalds
Pull OpenRISC fixes from Stafford Horne: - Compiler warning fixup for new Litex SoC driver - Sparse warning fixup for iounmap * tag 'for-linus' of git://github.com/openrisc/linux: openrisc: io: Add missing __iomem annotation to iounmap() soc: litex: Fix compile warning when device tree is not configured
2021-01-21Merge tag 'drm-fixes-2021-01-22' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Regular fixes pull, nothing too major in here, just some core fixes, one vc4, bunch of i915 and a bunch of amdgpu. core: - atomic: Release state on error - syncobj: Fix use-after-free - ttm: Don't use GFP_TRANSHUGE_LIGTH - vram-helper: Fix memory leak in vmap vc4: - Unify driver naming for PCM i915: - HDCP fixes - PMU wakeref fix - Fix HWSP validity race - Fix DP protocol converter accidental 4:4:4->4:2:0 conversion for RGB amdgpu: - Green Sardine fixes - Vangogh fixes - Renoir fixes - Misc display fixes" * tag 'drm-fixes-2021-01-22' of git://anongit.freedesktop.org/drm/drm: (21 commits) drm/amdgpu: update mmhub mgcg&ls for mmhub_v2_3 drm/amdgpu: modify GCR_GENERAL_CNTL for Vangogh drm/amdgpu/pm: no need GPU status set since mmnbif_gpu_BIF_DOORBELL_FENCE_CNTL added in FSDL drm/amd/display: Fixed corruptions on HPDRX link loss restore drm/amd/display: Use hardware sequencer functions for PG control drm/amd/display: Change function decide_dp_link_settings to avoid infinite looping drm/amd/display: Allow PSTATE chnage when no displays are enabled drm/amd/display: Update dram_clock_change_latency for DCN2.1 drm/amdgpu: remove gpu info firmware of green sardine drm/amd/display: DCN2X Find Secondary Pipe properly in MPO + ODM Case drm/syncobj: Fix use-after-free drm/vram-helper: Reuse existing page mappings in vmap drm/atomic: put state on error path drm/i915: Only enable DFP 4:4:4->4:2:0 conversion when outputting YCbCr 4:4:4 drm/i915: Check for rq->hwsp validity after acquiring RCU lock drm/i915/pmu: Don't grab wakeref when enabling events drm/i915/gt: Prevent use of engine->wa_ctx after error drm/vc4: Unify PCM card's driver_name drm/ttm: stop using GFP_TRANSHUGE_LIGHT drm/i915/hdcp: Get conn while content_type changed ...
2021-01-22Merge tag 'amd-drm-fixes-5.11-2021-01-21' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.11-2021-01-21: amdgpu: - Green Sardine fixes - Vangogh fixes - Renoir fixes - Misc display fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210121160129.3981-1-alexander.deucher@amd.com
2021-01-22Merge tag 'drm-intel-fixes-2021-01-21' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.11-rc5: - HDCP fixes - PMU wakeref fix - Fix HWSP validity race - Fix DP protocol converter accidental 4:4:4->4:2:0 conversion for RGB Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87a6t2kzgb.fsf@intel.com
2021-01-22Merge tag 'drm-misc-fixes-2021-01-20' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull (less than what git shortlog provides): * drm/atomic: Release state on error * drm/syncobj: Fix use-after-free * drm/ttm: Don't use GFP_TRANSHUGE_LIGTH * drm/vc4: Unify driver naming for PCM * drm/vram-helper: Fix memory leak in vmap Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YAgdYGNoH7pC29rz@linux-uq9g
2021-01-21x86/cpu: Add another Alder Lake CPU to the Intel familyGayatri Kammela
Add Alder Lake mobile CPU model number to Intel family. Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210121215004.11618-1-tony.luck@intel.com
2021-01-21objtool: Don't fail on missing symbol tableJosh Poimboeuf
Thanks to a recent binutils change which doesn't generate unused symbols, it's now possible for thunk_64.o be completely empty without CONFIG_PREEMPTION: no text, no data, no symbols. We could edit the Makefile to only build that file when CONFIG_PREEMPTION is enabled, but that will likely create confusion if/when the thunks end up getting used by some other code again. Just ignore it and move on. Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1254 Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2021-01-21objtool: Don't fail the kernel build on fatal errorsJosh Poimboeuf
This is basically a revert of commit 644592d32837 ("objtool: Fail the kernel build on fatal errors"). That change turned out to be more trouble than it's worth. Failing the build is an extreme measure which sometimes gets too much attention and blocks CI build testing. These fatal-type warnings aren't yet as rare as we'd hope, due to the ever-increasing matrix of supported toolchains/plugins and their fast-changing nature as of late. Also, there are more people (and bots) looking for objtool warnings than ever before, so even non-fatal warnings aren't likely to be ignored for long. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2021-01-21perf script: Fix overrun issue for dynamically-allocated PMU type numberJin Yao
When unpacking the event which is from dynamic PMU, the array output[OUTPUT_TYPE_MAX] may be overrun. For example, type number of SKL uncore_imc is 10, but OUTPUT_TYPE_MAX is 7 now (OUTPUT_TYPE_MAX = PERF_TYPE_MAX + 1). /* In builtin-script.c */ process_event() { unsigned int type = output_type(attr->type); if (output[type].fields == 0) return; } output[10] is overrun. Create a type OUTPUT_TYPE_OTHER for dynamic PMU events, then output_type(attr->type) will return OUTPUT_TYPE_OTHER here. Note that if PERF_TYPE_MAX ever changed, then there would be a conflict between old perf.data files that had a dynamicaliy allocated PMU number that would then be the same as a fixed PERF_TYPE. Example: # perf record --switch-events -C 0 -e "{cpu-clock,uncore_imc/data_reads/,uncore_imc/data_writes/}:SD" -a -- sleep 1 # perf script Before: swapper 0 [000] 1479253.987551: 277766 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.987797: 246709 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.988127: 329883 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.988273: 146393 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.988523: 249977 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.988877: 354090 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.989023: 145940 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.989383: 359856 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1479253.989523: 140082 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) After: swapper 0 [000] 1397040.402011: 272384 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1397040.402011: 5396 uncore_imc/data_reads/: swapper 0 [000] 1397040.402011: 967 uncore_imc/data_writes/: swapper 0 [000] 1397040.402259: 249153 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1397040.402259: 7231 uncore_imc/data_reads/: swapper 0 [000] 1397040.402259: 1297 uncore_imc/data_writes/: swapper 0 [000] 1397040.402508: 249108 cpu-clock: ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms]) swapper 0 [000] 1397040.402508: 5333 uncore_imc/data_reads/: swapper 0 [000] 1397040.402508: 1008 uncore_imc/data_writes/: Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20201209005828.21302-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-21perf metricgroup: Fix system PMU metricsJohn Garry
Joakim reports that getting "perf stat" for multiple system PMU metrics segfaults: $ perf stat -a -I 1000 -M imx8mm_ddr_write.all,imx8mm_ddr_write.all Segmentation fault $ While the same works without issue for a single metric. The logic in metricgroup__add_metric_sys_event_iter() is broken, in that add_metric() @m argument should be NULL for each new metric. Fix by not passing a holder for that, and rather make local in metricgroup__add_metric_sys_event_iter(). Fixes: be335ec28efa ("perf metricgroup: Support adding metrics for system PMUs") Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linuxarm@openeuler.org Link: https://lore.kernel.org/r/1611050655-44020-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-21perf metricgroup: Fix for metrics containing duration_timeJohn Garry
Metrics containing duration_time cause a segfault: $ perf stat -v -M L1D_Cache_Fill_BW sleep 1 Using CPUID GenuineIntel-6-3D-4 metric expr 64 * l1d.replacement / 1000000000 / duration_time for L1D_Cache_Fill_BW found event duration_time found event l1d.replacement adding {l1d.replacement}:W,duration_time l1d.replacement -> cpu/umask=0x1,(null)=0x1e8483,event=0x51/ Segmentation fault $ In commit c2337d67199a1ea1 ("perf metricgroup: Fix metrics using aliases covering multiple PMUs"), the logic in find_evsel_group() when iter'ing events was changed to not only select events in same group, but also for aliased PMUs. Checking whether events were for aliased PMUs was done by comparing the event PMU name. This was not safe for duration_time event, which has no associated PMU (and no PMU name), so fix by checking if the event PMU name is set also. Committer testing: Reproduced the bug, then, on a: $ grep -m1 ^'model name' /proc/cpuinfo model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz $ We now get: $ perf stat -M L1D_Cache_Fill_BW sleep 1 Performance counter stats for 'sleep 1': 4,141 l1d.replacement:u 1,001,285,107 ns duration_time:u 1.001285107 seconds time elapsed 0.000000000 seconds user 0.001119000 seconds sys $ Detais from -v: Using CPUID GenuineIntel-6-8E-A metric expr 64 * l1d.replacement / 1000000000 / duration_time for L1D_Cache_Fill_BW found event duration_time found event l1d.replacement adding {l1d.replacement}:W,duration_time l1d.replacement -> cpu/(null)=0x1e8483,umask=0x1,event=0x51/ Control descriptor is not initialized Warning: kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor samples Warning: kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor samples l1d.replacement:u: 4592 612201 612201 duration_time:u: 1001478621 1001478621 1001478621 Fixes: c2337d67199a1ea1 ("perf metricgroup: Fix metrics using aliases covering multiple PMUs") Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linuxarm@openeuler.org Link: https://lore.kernel.org/r/1611159518-226883-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-21perf evlist: Fix id index for heterogeneous systemsAdrian Hunter
perf_evlist__set_sid_idx() updates perf_sample_id with the evlist map index, CPU number and TID. It is passed indexes to the evsel's cpu and thread maps, but references the evlist's maps instead. That results in using incorrect CPU numbers on heterogeneous systems. Fix it by using evsel maps. The id index (PERF_RECORD_ID_INDEX) is used by AUX area tracing when in sampling mode. Having an incorrect CPU number causes the trace data to be attributed to the wrong CPU, and can result in decoder errors because the trace data is then associated with the wrong process. Committer notes: Keep the class prefix convention in the function name, switching from perf_evlist__set_sid_idx() to perf_evsel__set_sid_idx(). Fixes: 3c659eedada2fbf9 ("perf tools: Add id index") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20210121125446.11287-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-21dm: avoid filesystem lookup in dm_get_dev_t()Hannes Reinecke
This reverts commit 644bda6f3460 ("dm table: fall back to getting device using name_to_dev_t()") dm_get_dev_t() is just used to convert an arbitrary 'path' string into a dev_t. It doesn't presume that the device is present; that check will be done later, as the only caller is dm_get_device(), which does a dm_get_table_device() later on, which will properly open the device. So if the path string already _is_ in major:minor representation we can convert it directly, avoiding a recursion into the filesystem to lookup the block device. This avoids a hang in multipath_message() when the filesystem is inaccessible. Fixes: 644bda6f3460 ("dm table: fall back to getting device using name_to_dev_t()") Cc: stable@vger.kernel.org Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-01-21dm crypt: fix copy and paste bug in crypt_alloc_req_aeadIgnat Korchagin
In commit d68b29584c25 ("dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq") code was incorrectly copy and pasted from crypt_alloc_req_skcipher()'s crypto request allocation code to crypt_alloc_req_aead(). It is OK from runtime perspective as both simple encryption request pointer and AEAD request pointer are part of a union, but may confuse code reviewers. Fixes: d68b29584c25 ("dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq") Cc: stable@vger.kernel.org # v5.9+ Reported-by: Pavel Machek <pavel@denx.de> Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-01-21dm integrity: conditionally disable "recalculate" featureMikulas Patocka
Otherwise a malicious user could (ab)use the "recalculate" feature that makes dm-integrity calculate the checksums in the background while the device is already usable. When the system restarts before all checksums have been calculated, the calculation continues where it was interrupted even if the recalculate feature is not requested the next time the dm device is set up. Disable recalculating if we use internal_hash or journal_hash with a key (e.g. HMAC) and we don't have the "legacy_recalculate" flag. This may break activation of a volume, created by an older kernel, that is not yet fully recalculated -- if this happens, the user should add the "legacy_recalculate" flag to constructor parameters. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Daniel Glockner <dg@emlix.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>