summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-08-13kasan, kmemleak: reset tags when scanning blockKuan-Ying Lee
Patch series "kasan, slub: reset tag when printing address", v3. With hardware tag-based kasan enabled, we reset the tag when we access metadata to avoid from false alarm. This patch (of 2): Kmemleak needs to scan kernel memory to check memory leak. With hardware tag-based kasan enabled, when it scans on the invalid slab and dereference, the issue will occur as below. Hardware tag-based KASAN doesn't use compiler instrumentation, we can not use kasan_disable_current() to ignore tag check. Based on the below report, there are 11 0xf7 granules, which amounts to 176 bytes, and the object is allocated from the kmalloc-256 cache. So when kmemleak accesses the last 256-176 bytes, it causes faults, as those are marked with KASAN_KMALLOC_REDZONE == KASAN_TAG_INVALID == 0xfe. Thus, we reset tags before accessing metadata to avoid from false positives. BUG: KASAN: out-of-bounds in scan_block+0x58/0x170 Read at addr f7ff0000c0074eb0 by task kmemleak/138 Pointer tag: [f7], memory tag: [fe] CPU: 7 PID: 138 Comm: kmemleak Not tainted 5.14.0-rc2-00001-g8cae8cd89f05-dirty #134 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x1b0 show_stack+0x1c/0x30 dump_stack_lvl+0x68/0x84 print_address_description+0x7c/0x2b4 kasan_report+0x138/0x38c __do_kernel_fault+0x190/0x1c4 do_tag_check_fault+0x78/0x90 do_mem_abort+0x44/0xb4 el1_abort+0x40/0x60 el1h_64_sync_handler+0xb4/0xd0 el1h_64_sync+0x78/0x7c scan_block+0x58/0x170 scan_gray_list+0xdc/0x1a0 kmemleak_scan+0x2ac/0x560 kmemleak_scan_thread+0xb0/0xe0 kthread+0x154/0x160 ret_from_fork+0x10/0x18 Allocated by task 0: kasan_save_stack+0x2c/0x60 __kasan_kmalloc+0xec/0x104 __kmalloc+0x224/0x3c4 __register_sysctl_paths+0x200/0x290 register_sysctl_table+0x2c/0x40 sysctl_init+0x20/0x34 proc_sys_init+0x3c/0x48 proc_root_init+0x80/0x9c start_kernel+0x648/0x6a4 __primary_switched+0xc0/0xc8 Freed by task 0: kasan_save_stack+0x2c/0x60 kasan_set_track+0x2c/0x40 kasan_set_free_info+0x44/0x54 ____kasan_slab_free.constprop.0+0x150/0x1b0 __kasan_slab_free+0x14/0x20 slab_free_freelist_hook+0xa4/0x1fc kfree+0x1e8/0x30c put_fs_context+0x124/0x220 vfs_kern_mount.part.0+0x60/0xd4 kern_mount+0x24/0x4c bdev_cache_init+0x70/0x9c vfs_caches_init+0xdc/0xf4 start_kernel+0x638/0x6a4 __primary_switched+0xc0/0xc8 The buggy address belongs to the object at ffff0000c0074e00 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 176 bytes inside of 256-byte region [ffff0000c0074e00, ffff0000c0074f00) The buggy address belongs to the page: page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100074 head:(____ptrval____) order:2 compound_mapcount:0 compound_pincount:0 flags: 0xbfffc0000010200(slab|head|node=0|zone=2|lastcpupid=0xffff|kasantag=0x0) raw: 0bfffc0000010200 0000000000000000 dead000000000122 f5ff0000c0002300 raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff0000c0074c00: f0 f0 f0 f0 f0 f0 f0 f0 f0 fe fe fe fe fe fe fe ffff0000c0074d00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe >ffff0000c0074e00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 fe fe fe fe fe ^ ffff0000c0074f00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe ffff0000c0075000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint kmemleak: 181 new suspected memory leaks (see /sys/kernel/debug/kmemleak) Link: https://lkml.kernel.org/r/20210804090957.12393-1-Kuan-Ying.Lee@mediatek.com Link: https://lkml.kernel.org/r/20210804090957.12393-2-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Marco Elver <elver@google.com> Cc: Nicholas Tang <nicholas.tang@mediatek.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-13Merge tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few fixes for block that should go into 5.14: - Revert the mq-deadline cgroup addition. More work is needed on this front, let's revert it for now and get it right before having it in a released kernel (Tejun) - blk-iocost lockdep fix (Ming) - nbd double completion fix (Xie) - Fix for non-idling when clearing the shared tag flag (Yu)" * tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block: nbd: Aovid double completion of a request blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED Revert "block/mq-deadline: Add cgroup support" blk-iocost: fix lockdep warning on blkcg->lock
2021-08-13Merge tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "A bit bigger than the previous weeks, but mostly just a few stable bound fixes. In detail: - Followup fixes to patches from last week for io-wq, turns out they weren't complete (Hao) - Two lockdep reported fixes out of the RT camp (me) - Sync the io_uring-cp example with liburing, as a few bug fixes never made it to the kernel carried version (me) - SQPOLL related TIF_NOTIFY_SIGNAL fix (Nadav) - Use WRITE_ONCE() when writing sq flags (Nadav) - io_rsrc_put_work() deadlock fix (Pavel)" * tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block: tools/io_uring/io_uring-cp: sync with liburing example io_uring: fix ctx-exit io_rsrc_put_work() deadlock io_uring: drop ctx->uring_lock before flushing work item io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker() io-wq: fix bug of creating io-wokers unconditionally io_uring: rsrc ref lock needs to be IRQ safe io_uring: Use WRITE_ONCE() when writing to sq_flags io_uring: clear TIF_NOTIFY_SIGNAL when running task work
2021-08-13Merge tag 'pinctrl-v5.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "An assortment of pin control fixes of varying importance, the most important ones affecting Intel and AMD laptops turned up the recent few days so it's time to push this to your tree. - Fix the Kconfig dependency for Qualcomm SM8350 pin controller - Fix pin biasing fallback behaviour on the Mediatek pin controller - Fix the GPIO numbering scheme for Intel Tiger Lake-H to correspond to the products that are now actually out on the market - Fix a pin control function itemization in the Sunxi driver out-of-bounds access bug - Fix disable clocking for the RISC-V K210 pin controller on the errorpath - Fix a system shutdown bug affecting AMD Ryzen-based laptops, the system would not suspend but just bounce back up" * tag 'pinctrl-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: amd: Fix an issue with shutdown when system set to s0ix pinctrl: k210: Fix k210_fpioa_probe() pinctrl: sunxi: Don't underestimate number of functions pinctrl: tigerlake: Fix GPIO mapping for newer version of software pinctrl: mediatek: Fix fallback behavior for bias_set_combo pinctrl: qcom: fix GPIOLIB dependencies
2021-08-13soc: fsl: qe: fix static checker warningMaxim Kochetkov
The patch be7ecbd240b2: "soc: fsl: qe: convert QE interrupt controller to platform_device" from Aug 3, 2021, leads to the following static checker warning: drivers/soc/fsl/qe/qe_ic.c:438 qe_ic_init() warn: unsigned 'qe_ic->virq_low' is never less than zero. In old variant irq_of_parse_and_map() returns zero if failed so unsigned int for virq_high/virq_low was ok. In new variant platform_get_irq() returns negative error codes if failed so we need to use int for virq_high/virq_low. Also simplify high_handler checking and remove the curly braces to make checkpatch happy. Fixes: be7ecbd240b2 ("soc: fsl: qe: convert QE interrupt controller to platform_device") Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2021-08-13Merge branch 'bnxt-tx-napi-disabling-resiliency-improvements'Jakub Kicinski
Jakub Kicinski says: ==================== bnxt: Tx NAPI disabling resiliency improvements A lockdep warning was triggered by netpoll because napi poll was taking the xmit lock. Fix that and a couple more issues noticed while reading the code. ==================== Link: https://lore.kernel.org/r/20210812214242.578039-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13bnxt: count Tx dropsJakub Kicinski
Drivers should count packets they are dropping. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13bnxt: make sure xmit_more + errors does not miss doorbellsJakub Kicinski
skbs are freed on error and not put on the ring. We may, however, be in a situation where we're freeing the last skb of a batch, and there is a doorbell ring pending because of xmit_more() being true earlier. Make sure we ring the door bell in such situations. Since errors are rare don't pay attention to xmit_more() and just always flush the pending frames. The busy case should be safe to be left alone because it can only happen if start_xmit races with completions and they both enable the queue. In that case the kick can't be pending. Noticed while reading the code. Fixes: 4d172f21cefe ("bnxt_en: Implement xmit_more.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13bnxt: disable napi before canceling DIMJakub Kicinski
napi schedules DIM, napi has to be disabled first, then DIM canceled. Noticed while reading the code. Fixes: 0bc0b97fca73 ("bnxt_en: cleanup DIM work on device shutdown") Fixes: 6a8788f25625 ("bnxt_en: add support for software dynamic interrupt moderation") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13bnxt: don't lock the tx queue from napi pollJakub Kicinski
We can't take the tx lock from the napi poll routine, because netpoll can poll napi at any moment, including with the tx lock already held. The tx lock is protecting against two paths - the disable path, and (as Michael points out) the NETDEV_TX_BUSY case which may occur if NAPI completions race with start_xmit and both decide to re-enable the queue. For the disable/ifdown path use synchronize_net() to make sure closing the device does not race we restarting the queues. Annotate accesses to dev_state against data races. For the NAPI cleanup vs start_xmit path - appropriate barriers are already in place in the main spot where Tx queue is stopped but we need to do the same careful dance in the TX_BUSY case. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13nbd: Aovid double completion of a requestXie Yongji
There is a race between iterating over requests in nbd_clear_que() and completing requests in recv_work(), which can lead to double completion of a request. To fix it, flush the recv worker before iterating over the requests and don't abort the completed request while iterating. Fixes: 96d97e17828f ("nbd: clear_sock on netlink disconnect") Reported-by: Jiang Yadong <jiangyadong@bytedance.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13selftests, bpf: Test that dead ldx_w insns are acceptedIlya Leoshkevich
Prevent regressions related to zero-extension metadata handling during dead code sanitization. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210812151811.184086-3-iii@linux.ibm.com
2021-08-13bpf: Clear zext_dst of dead insnsIlya Leoshkevich
"access skb fields ok" verifier test fails on s390 with the "verifier bug. zext_dst is set, but no reg is defined" message. The first insns of the test prog are ... 0: 61 01 00 00 00 00 00 00 ldxw %r0,[%r1+0] 8: 35 00 00 01 00 00 00 00 jge %r0,0,1 10: 61 01 00 08 00 00 00 00 ldxw %r0,[%r1+8] ... and the 3rd one is dead (this does not look intentional to me, but this is a separate topic). sanitize_dead_code() converts dead insns into "ja -1", but keeps zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such an insn, it sees this discrepancy and bails. This problem can be seen only with JITs whose bpf_jit_needs_zext() returns true. Fix by clearning dead insns' zext_dst. The commits that contributed to this problem are: 1. 5aa5bd14c5f8 ("bpf: add initial suite for selftests"), which introduced the test with the dead code. 2. 5327ed3d44b7 ("bpf: verifier: mark verified-insn with sub-register zext flag"), which introduced the zext_dst flag. 3. 83a2881903f3 ("bpf: Account for BPF_FETCH in insn_has_def32()"), which introduced the sanity check. 4. 9183671af6db ("bpf: Fix leakage under speculation on mispredicted branches"), which bisect points to. It's best to fix this on stable branches that contain the second one, since that's the point where the inconsistency was introduced. Fixes: 5327ed3d44b7 ("bpf: verifier: mark verified-insn with sub-register zext flag") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210812151811.184086-2-iii@linux.ibm.com
2021-08-13tools/io_uring/io_uring-cp: sync with liburing exampleJens Axboe
This example is missing a few fixes that are in the liburing version, synchronize with the upstream version. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHAREDYu Kuai
We run a test that delete and recover devcies frequently(two devices on the same host), and we found that 'active_queues' is super big after a period of time. If device a and device b share a tag set, and a is deleted, then blk_mq_exit_queue() will clear BLK_MQ_F_TAG_QUEUE_SHARED because there is only one queue that are using the tag set. However, if b is still active, the active_queues of b might never be cleared even if b is deleted. Thus clear active_queues before BLK_MQ_F_TAG_QUEUE_SHARED is cleared. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210731062130.1533893-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13driver core: Add missing kernel doc for device::msi_lockThomas Gleixner
Fixes: 77e89afc25f3 ("PCI/MSI: Protect msi_desc::masked for multi-MSI") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2021-08-13ipack: tpci200: fix memory leak in the tpci200_registerDongliang Mu
The error handling code in tpci200_register does not free interface_regs allocated by ioremap and the current version of error handling code is problematic. Fix this by refactoring the error handling code and free interface_regs when necessary. Fixes: 43986798fd50 ("ipack: add error handling for ioremap_nocache") Cc: stable@vger.kernel.org Reported-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Link: https://lore.kernel.org/r/20210810100323.3938492-2-mudongliangabcd@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-13ipack: tpci200: fix many double free issues in tpci200_pci_probeDongliang Mu
The function tpci200_register called by tpci200_install and tpci200_unregister called by tpci200_uninstall are in pair. However, tpci200_unregister has some cleanup operations not in the tpci200_register. So the error handling code of tpci200_pci_probe has many different double free issues. Fix this problem by moving those cleanup operations out of tpci200_unregister, into tpci200_pci_remove and reverting the previous commit 9272e5d0028d ("ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe"). Fixes: 9272e5d0028d ("ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe") Cc: stable@vger.kernel.org Reported-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Link: https://lore.kernel.org/r/20210810100323.3938492-1-mudongliangabcd@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-13slimbus: ngd: reset dma setup during runtime pmSrinivas Kandagatla
During suspend/resume NGD remote instance is power cycled along with remotely controlled bam dma engine. So Reset the dma configuration during this suspend resume path so that we are not dealing with any stale dma setup. Without this transactions timeout after first suspend resume path. Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver") Cc: <stable@vger.kernel.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210809082428.11236-5-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-13slimbus: ngd: set correct device for pmSrinivas Kandagatla
For some reason we ended up using wrong device in some places for pm_runtime calls. Fix this so that NGG driver can do runtime pm correctly. Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver") Cc: <stable@vger.kernel.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210809082428.11236-4-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-13slimbus: messaging: check for valid transaction idSrinivas Kandagatla
In some usecases transaction ids are dynamically allocated inside the controller driver after sending the messages which have generic acknowledge responses. So check for this before refcounting pm_runtime. Without this we would end up imbalancing runtime pm count by doing pm_runtime_put() in both slim_do_transfer() and slim_msg_response() for a single pm_runtime_get() in slim_do_transfer() Fixes: d3062a210930 ("slimbus: messaging: add slim_alloc/free_txn_tid()") Cc: <stable@vger.kernel.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210809082428.11236-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-13slimbus: messaging: start transaction ids from 1 instead of zeroSrinivas Kandagatla
As tid is unsigned its hard to figure out if the tid is valid or invalid. So Start the transaction ids from 1 instead of zero so that we could differentiate between a valid tid and invalid tids This is useful in cases where controller would add a tid for controller specific transfers. Fixes: d3062a210930 ("slimbus: messaging: add slim_alloc/free_txn_tid()") Cc: <stable@vger.kernel.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210809082428.11236-2-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-13Merge branch 'kvm-tdpmmu-fixes' into kvm-masterPaolo Bonzini
Merge topic branch with fixes for both 5.14-rc6 and 5.15.
2021-08-13KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlockSean Christopherson
Add yet another spinlock for the TDP MMU and take it when marking indirect shadow pages unsync. When using the TDP MMU and L1 is running L2(s) with nested TDP, KVM may encounter shadow pages for the TDP entries managed by L1 (controlling L2) when handling a TDP MMU page fault. The unsync logic is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and misbehaves when a shadow page is marked unsync via a TDP MMU page fault, which runs with mmu_lock held for read, not write. Lack of a critical section manifests most visibly as an underflow of unsync_children in clear_unsync_child_bit() due to unsync_children being corrupted when multiple CPUs write it without a critical section and without atomic operations. But underflow is the best case scenario. The worst case scenario is that unsync_children prematurely hits '0' and leads to guest memory corruption due to KVM neglecting to properly sync shadow pages. Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock would functionally be ok. Usurping the lock could degrade performance when building upper level page tables on different vCPUs, especially since the unsync flow could hold the lock for a comparatively long time depending on the number of indirect shadow pages and the depth of the paging tree. For simplicity, take the lock for all MMUs, even though KVM could fairly easily know that mmu_lock is held for write. If mmu_lock is held for write, there cannot be contention for the inner spinlock, and marking shadow pages unsync across multiple vCPUs will be slow enough that bouncing the kvm_arch cacheline should be in the noise. Note, even though L2 could theoretically be given access to its own EPT entries, a nested MMU must hold mmu_lock for write and thus cannot race against a TDP MMU page fault. I.e. the additional spinlock only _needs_ to be taken by the TDP MMU, as opposed to being taken by any MMU for a VM that is running with the TDP MMU enabled. Holding mmu_lock for read also prevents the indirect shadow page from being freed. But as above, keep it simple and always take the lock. Alternative #1, the TDP MMU could simply pass "false" for can_unsync and effectively disable unsync behavior for nested TDP. Write protecting leaf shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such VMMs typically don't modify TDP entries, but the same may not hold true for non-standard use cases and/or VMMs that are migrating physical pages (from L1's perspective). Alternative #2, the unsync logic could be made thread safe. In theory, simply converting all relevant kvm_mmu_page fields to atomics and using atomic bitops for the bitmap would suffice. However, (a) an in-depth audit would be required, (b) the code churn would be substantial, and (c) legacy shadow paging would incur additional atomic operations in performance sensitive paths for no benefit (to legacy shadow paging). Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181815.3378104-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEsSean Christopherson
Set the min_level for the TDP iterator at the root level when zapping all SPTEs to optimize the iterator's try_step_down(). Zapping a non-leaf SPTE will recursively zap all its children, thus there is no need for the iterator to attempt to step down. This avoids rereading the top-level SPTEs after they are zapped by causing try_step_down() to short-circuit. In most cases, optimizing try_step_down() will be in the noise as the cost of zapping SPTEs completely dominates the overall time. The optimization is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM is zapping in response to multiple memslot updates when userspace is adding and removing read-only memslots for option ROMs. In that case, the task doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock for read and thus can be a noisy neighbor of sorts. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEsSean Christopherson
Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and really zap all SPTEs in this case. As is, zap_gfn_range() skips non-leaf SPTEs whose range exceeds the range to be zapped. If shadow_phys_bits is not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level paging, the "zap all" flows will skip top-level SPTEs whose range extends beyond shadow_phys_bits and leak their SPs when the VM is destroyed. Use the current upper bound (based on host.MAXPHYADDR) to detect that the caller wants to zap all SPTEs, e.g. instead of using the max theoretical gfn, 1 << (52 - 12). The more precise upper bound allows the TDP iterator to terminate its walk earlier when running on hosts with MAXPHYADDR < 52. Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to help future debuggers should KVM decide to leak SPTEs again. The bug is most easily reproduced by running (and unloading!) KVM in a VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped. ============================================================================= BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1) CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack_lvl+0x45/0x59 slab_err+0x95/0xc9 __kmem_cache_shutdown.cold+0x3c/0x158 kmem_cache_destroy+0x3d/0xf0 kvm_mmu_module_exit+0xa/0x30 [kvm] kvm_arch_exit+0x5d/0x90 [kvm] kvm_exit+0x78/0x90 [kvm] vmx_exit+0x1a/0x50 [kvm_intel] __x64_sys_delete_module+0x13f/0x220 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13Merge tag 'kvmarm-fixes-5.14-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.14, take #2 - Plug race between enabling MTE and creating vcpus - Fix off-by-one bug when checking whether an address range is RAM
2021-08-13KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PFSean Christopherson
Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF in L2 or if the VM-Exit should be forwarded to L1. The current logic fails to account for the case where #PF is intercepted to handle guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into L1. At best, L1 will complain and inject the #PF back into L2. At worst, L1 will eat the unexpected fault and cause L2 to hang on infinite page faults. Note, while the bug was technically introduced by the commit that added support for the MAXPHYADDR madness, the shame is all on commit a0c134347baf ("KVM: VMX: introduce vmx_need_pf_intercept"). Fixes: 1dbf5d68af6f ("KVM: VMX: Add guest physical address check in EPT violation and misconfig") Cc: stable@vger.kernel.org Cc: Peter Shier <pshier@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210812045615.3167686-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13kvm: vmx: Sync all matching EPTPs when injecting nested EPT faultJunaid Shahid
When a nested EPT violation/misconfig is injected into the guest, the shadow EPT PTEs associated with that address need to be synced. This is done by kvm_inject_emulated_page_fault() before it calls nested_ept_inject_page_fault(). However, that will only sync the shadow EPT PTE associated with the current L1 EPTP. Since the ASID is based on EP4TA rather than the full EPTP, so syncing the current EPTP is not enough. The SPTEs associated with any other L1 EPTPs in the prev_roots cache with the same EP4TA also need to be synced. Signed-off-by: Junaid Shahid <junaids@google.com> Message-Id: <20210806222229.1645356-1-junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13Merge branch 'kvm-vmx-secctl' into kvm-masterPaolo Bonzini
Merge common topic branch for 5.14-rc6 and 5.15 merge window.
2021-08-13KVM: x86: remove dead initializationPaolo Bonzini
hv_vcpu is initialized again a dozen lines below, and at this point vcpu->arch.hyperv is not valid. Remove the initializer. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-13KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernelsSean Christopherson
Remove an ancient restriction that disallowed exposing EFER.NX to the guest if EFER.NX=0 on the host, even if NX is fully supported by the CPU. The motivation of the check, added by commit 2cc51560aed0 ("KVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexit"), was to rule out the case of host.EFER.NX=0 and guest.EFER.NX=1 so that KVM could run the guest with the host's EFER.NX and thus avoid context switching EFER if the only divergence was the NX bit. Fast forward to today, and KVM has long since stopped running the guest with the host's EFER.NX. Not only does KVM context switch EFER if host.EFER.NX=1 && guest.EFER.NX=0, KVM also forces host.EFER.NX=0 && guest.EFER.NX=1 when using shadow paging (to emulate SMEP). Furthermore, the entire motivation for the restriction was made obsolete over a decade ago when Intel added dedicated host and guest EFER fields in the VMCS (Nehalem timeframe), which reduced the overhead of context switching EFER from 400+ cycles (2 * WRMSR + 1 * RDMSR) to a mere ~2 cycles. In practice, the removed restriction only affects non-PAE 32-bit kernels, as EFER.NX is set during boot if NX is supported and the kernel will use PAE paging (32-bit or 64-bit), regardless of whether or not the kernel will actually use NX itself (mark PTEs non-executable). Alternatively and/or complementarily, startup_32_smp() in head_32.S could be modified to set EFER.NX=1 regardless of paging mode, thus eliminating the scenario where NX is supported but not enabled. However, that runs the risk of breaking non-KVM non-PAE kernels (though the risk is very, very low as there are no known EFER.NX errata), and also eliminates an easy-to-use mechanism for stressing KVM's handling of guest vs. host EFER across nested virtualization transitions. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210805183804.1221554-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-12Merge tag 'net-5.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from netfilter, bpf, can and ieee802154. The size of this is pretty normal, but we got more fixes for 5.14 changes this week than last week. Nothing major but the trend is the opposite of what we like. We'll see how the next week goes.. Current release - regressions: - r8169: fix ASPM-related link-up regressions - bridge: fix flags interpretation for extern learn fdb entries - phy: micrel: fix link detection on ksz87xx switch - Revert "tipc: Return the correct errno code" - ptp: fix possible memory leak caused by invalid cast Current release - new code bugs: - bpf: add missing bpf_read_[un]lock_trace() for syscall program - bpf: fix potentially incorrect results with bpf_get_local_storage() - page_pool: mask the page->signature before the checking, avoid dma mapping leaks - netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps - bnxt_en: fix firmware interface issues with PTP - mlx5: Bridge, fix ageing time Previous releases - regressions: - linkwatch: fix failure to restore device state across suspend/resume - bareudp: fix invalid read beyond skb's linear data Previous releases - always broken: - bpf: fix integer overflow involving bucket_size - ppp: fix issues when desired interface name is specified via netlink - wwan: mhi_wwan_ctrl: fix possible deadlock - dsa: microchip: ksz8795: fix number of VLAN related bugs - dsa: drivers: fix broken backpressure in .port_fdb_dump - dsa: qca: ar9331: make proper initial port defaults Misc: - bpf: add lockdown check for probe_write_user helper - netfilter: conntrack: remove offload_pickup sysctl before 5.14 is out - netfilter: conntrack: collect all entries in one cycle, heuristically slow down garbage collection scans on idle systems to prevent frequent wake ups" * tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) vsock/virtio: avoid potential deadlock when vsock device remove wwan: core: Avoid returning NULL from wwan_create_dev() net: dsa: sja1105: unregister the MDIO buses during teardown Revert "tipc: Return the correct errno code" net: mscc: Fix non-GPL export of regmap APIs net: igmp: increase size of mr_ifc_count MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets net: pcs: xpcs: fix error handling on failed to allocate memory net: linkwatch: fix failure to restore device state across suspend/resume net: bridge: fix memleak in br_add_if() net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge net: bridge: fix flags interpretation for extern learn fdb entries net: dsa: sja1105: fix broken backpressure in .port_fdb_dump net: dsa: lantiq: fix broken backpressure in .port_fdb_dump net: dsa: lan9303: fix broken backpressure in .port_fdb_dump net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump bpf, core: Fix kernel-doc notation net: igmp: fix data-race in igmp_ifc_timer_expire() net: Fix memory leak in ieee802154_raw_deliver ...
2021-08-12Merge tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A patch to avoid a soft lockup in ceph_check_delayed_caps() from Luis and a reference handling fix from Jeff that should address some memory corruption reports in the snaprealm area. Both marked for stable" * tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-client: ceph: take snap_empty_lock atomically with snaprealm refcount change ceph: reduce contention in ceph_check_delayed_caps()
2021-08-12Merge tag 'drm-fixes-2021-08-13' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Another week, another set of pretty regular fixes, nothing really stands out too much. amdgpu: - Yellow carp update - RAS EEPROM fixes - BACO/BOCO fixes - Fix a memory leak in an error path - Freesync fix - VCN harvesting fix - Display fixes i915: - GVT fix for Windows VM hang. - Display fix of 12 BPC bits for display 12 and newer. - Don't try to access some media register for fused off domains. - Fix kerneldoc build warnings. mediatek: - Fix dpi bridge bug. - Fix cursor plane no update. meson: - Fix colors when booting with HDR" * tag 'drm-fixes-2021-08-13' of git://anongit.freedesktop.org/drm/drm: drm/doc/rfc: drop lmem uapi section drm/i915: Only access SFC_DONE when media domain is not fused off drm/i915/display: Fix the 12 BPC bits for PIPE_MISC reg drm/amd/display: use GFP_ATOMIC in amdgpu_dm_irq_schedule_work drm/amd/display: Remove invalid assert for ODM + MPC case drm/amd/pm: bug fix for the runtime pm BACO drm/amdgpu: handle VCN instances when harvesting (v2) drm/meson: fix colour distortion from HDR set during vendor u-boot drm/i915/gvt: Fix cached atomics setting for Windows VM drm/amdgpu: Add preferred mode in modeset when freesync video mode's enabled. drm/amd/pm: Fix a memory leak in an error handling path in 'vangogh_tables_init()' drm/amdgpu: don't enable baco on boco platforms in runpm drm/amdgpu: set RAS EEPROM address from VBIOS drm/amd/pm: update smu v13.0.1 firmware header drm/mediatek: Fix cursor plane no update drm/mediatek: mtk-dpi: Set out_fmt from config if not the last bridge drm/mediatek: dpi: Fix NULL dereference in mtk_dpi_bridge_atomic_check
2021-08-12ARM: ixp4xx: fix building both pci driversArnd Bergmann
When both the old and the new PCI drivers are enabled in the same kernel, there are a couple of namespace conflicts that cause a build failure: drivers/pci/controller/pci-ixp4xx.c:38: error: "IXP4XX_PCI_CSR" redefined [-Werror] 38 | #define IXP4XX_PCI_CSR 0x1c | In file included from arch/arm/mach-ixp4xx/include/mach/hardware.h:23, from arch/arm/mach-ixp4xx/include/mach/io.h:15, from arch/arm/include/asm/io.h:198, from include/linux/io.h:13, from drivers/pci/controller/pci-ixp4xx.c:20: arch/arm/mach-ixp4xx/include/mach/ixp4xx-regs.h:221: note: this is the location of the previous definition 221 | #define IXP4XX_PCI_CSR(x) ((volatile u32 *)(IXP4XX_PCI_CFG_BASE_VIRT+(x))) | drivers/pci/controller/pci-ixp4xx.c:148:12: error: 'ixp4xx_pci_read' redeclared as different kind of symbol 148 | static int ixp4xx_pci_read(struct ixp4xx_pci *p, u32 addr, u32 cmd, u32 *data) | ^~~~~~~~~~~~~~~ Rename both the ixp4xx_pci_read/ixp4xx_pci_write functions and the IXP4XX_PCI_CSR macro. In each case, I went with the version that has fewer callers to keep the change small. Fixes: f7821b493458 ("PCI: ixp4xx: Add a new driver for IXP4xx") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: soc@kernel.org Link: https://lore.kernel.org/r/20210721151546.2325937-1-arnd@kernel.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-08-12ARM: configs: Update the nhk8815_defconfigLinus Walleij
The platform lost the framebuffer due to a commit solving a circular dependency in v5.14-rc1, so add it back in by explicitly selecting the framebuffer. Also fix up some Kconfig options that got dropped or moved around while we're at it. Fixes: f611b1e7624c ("drm: Avoid circular dependencies for CONFIG_FB") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210807225518.3607126-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-08-13Merge tag 'drm-misc-fixes-2021-08-12' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: * meson: Fix colors when booting with HDR Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YRTb+qUuBYWjJDVg@linux-uq9g.fritz.box
2021-08-13Merge tag 'drm-intel-fixes-2021-08-12' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - GVT fix for Windows VM hang. - Display fix of 12 BPC bits for display 12 and newer. - Don't try to access some media register for fused off domains. - Fix kerneldoc build warnings. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YRU/hnQ1sNr+j37x@intel.com
2021-08-12Merge tag 'ieee802154-for-davem-2021-08-12' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== ieee802154 for net 2021-08-12 Mostly fixes coming from bot reports. Dongliang Mu tackled some syzkaller reports in hwsim again and Takeshi Misawa a memory leak in ieee802154 raw. * tag 'ieee802154-for-davem-2021-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan: net: Fix memory leak in ieee802154_raw_deliver ieee802154: hwsim: fix GPF in hwsim_new_edge_nl ieee802154: hwsim: fix GPF in hwsim_set_edge_lqi ==================== Link: https://lore.kernel.org/r/20210812183912.1663996-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-12x86/resctrl: Fix default monitoring groups reportingBabu Moger
Creating a new sub monitoring group in the root /sys/fs/resctrl leads to getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on the entire filesystem. Steps to reproduce: 1. mount -t resctrl resctrl /sys/fs/resctrl/ 2. cd /sys/fs/resctrl/ 3. cat mon_data/mon_L3_00/mbm_total_bytes 23189832 4. Create sub monitor group: mkdir mon_groups/test1 5. cat mon_data/mon_L3_00/mbm_total_bytes Unavailable When a new monitoring group is created, a new RMID is assigned to the new group. But the RMID is not active yet. When the events are read on the new RMID, it is expected to report the status as "Unavailable". When the user reads the events on the default monitoring group with multiple subgroups, the events on all subgroups are consolidated together. Currently, if any of the RMID reads report as "Unavailable", then everything will be reported as "Unavailable". Fix the issue by discarding the "Unavailable" reads and reporting all the successful RMID reads. This is not a problem on Intel systems as Intel reports 0 on Inactive RMIDs. Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data") Reported-by: Paweł Szulik <pawel.szulik@intel.com> Signed-off-by: Babu Moger <Babu.Moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311 Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoger-ubuntu
2021-08-12vsock/virtio: avoid potential deadlock when vsock device removeLongpeng(Mike)
There's a potential deadlock case when remove the vsock device or process the RESET event: vsock_for_each_connected_socket: spin_lock_bh(&vsock_table_lock) ----------- (1) ... virtio_vsock_reset_sock: lock_sock(sk) --------------------- (2) ... spin_unlock_bh(&vsock_table_lock) lock_sock() may do initiative schedule when the 'sk' is owned by other thread at the same time, we would receivce a warning message that "scheduling while atomic". Even worse, if the next task (selected by the scheduler) try to release a 'sk', it need to request vsock_table_lock and the deadlock occur, cause the system into softlockup state. Call trace: queued_spin_lock_slowpath vsock_remove_bound vsock_remove_sock virtio_transport_release __vsock_release vsock_release __sock_release sock_close __fput ____fput So we should not require sk_lock in this case, just like the behavior in vhost_vsock or vmci. Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko") Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-12tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event nameSteven Rostedt (VMware)
The following commands: # echo 'read_max u64 size;' > synthetic_events # echo 'hist:keys=common_pid:count=count:onmax($count).trace(read_max,count)' > events/syscalls/sys_enter_read/trigger Causes: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 4 PID: 1763 Comm: bash Not tainted 5.14.0-rc2-test+ #155 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:strcmp+0xc/0x20 Code: 75 f7 31 c0 0f b6 0c 06 88 0c 02 48 83 c0 01 84 c9 75 f1 4c 89 c0 c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 89 RSP: 0018:ffffb5fdc0963ca8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffffffb3a4e040 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9714c0d0b640 RDI: 0000000000000000 RBP: 0000000000000000 R08: 00000022986b7cde R09: ffffffffb3a4dff8 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9714c50603c8 R13: 0000000000000000 R14: ffff97143fdf9e48 R15: ffff9714c01a2210 FS: 00007f1fa6785740(0000) GS:ffff9714da400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000002d863004 CR4: 00000000001706e0 Call Trace: __find_event_file+0x4e/0x80 action_create+0x6b7/0xeb0 ? kstrdup+0x44/0x60 event_hist_trigger_func+0x1a07/0x2130 trigger_process_regex+0xbd/0x110 event_trigger_write+0x71/0xd0 vfs_write+0xe9/0x310 ksys_write+0x68/0xe0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f1fa6879e87 The problem was the "trace(read_max,count)" where the "count" should be "$count" as "onmax()" only handles variables (although it really should be able to figure out that "count" is a field of sys_enter_read). But there's a path that does not find the variable and ends up passing a NULL for the event, which ends up getting passed to "strcmp()". Add a check for NULL to return and error on the command with: # cat error_log hist:syscalls:sys_enter_read: error: Couldn't create or find variable Command: hist:keys=common_pid:count=count:onmax($count).trace(read_max,count) ^ Link: https://lkml.kernel.org/r/20210808003011.4037f8d0@oasis.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Fixes: 50450603ec9cb tracing: Add 'onmax' hist trigger action support Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-12init: Suppress wrong warning for bootconfig cmdline parameterMasami Hiramatsu
Since the 'bootconfig' command line parameter is handled before parsing the command line, it doesn't use early_param(). But in this case, kernel shows a wrong warning message about it. [ 0.013714] Kernel command line: ro console=ttyS0 bootconfig console=tty0 [ 0.013741] Unknown command line parameters: bootconfig To suppress this message, add a dummy handler for 'bootconfig'. Link: https://lkml.kernel.org/r/162812945097.77369.1849780946468010448.stgit@devnote2 Fixes: 86d1919a4fb0 ("init: print out unknown kernel parameters") Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-12tracing: define needed config DYNAMIC_FTRACE_WITH_ARGSLukas Bulwahn
Commit 2860cd8a2353 ("livepatch: Use the default ftrace_ops instead of REGS when ARGS is available") intends to enable config LIVEPATCH when ftrace with ARGS is available. However, the chain of configs to enable LIVEPATCH is incomplete, as HAVE_DYNAMIC_FTRACE_WITH_ARGS is available, but the definition of DYNAMIC_FTRACE_WITH_ARGS, combining DYNAMIC_FTRACE and HAVE_DYNAMIC_FTRACE_WITH_ARGS, needed to enable LIVEPATCH, is missing in the commit. Fortunately, ./scripts/checkkconfigsymbols.py detects this and warns: DYNAMIC_FTRACE_WITH_ARGS Referencing files: kernel/livepatch/Kconfig So, define the config DYNAMIC_FTRACE_WITH_ARGS analogously to the already existing similar configs, DYNAMIC_FTRACE_WITH_REGS and DYNAMIC_FTRACE_WITH_DIRECT_CALLS, in ./kernel/trace/Kconfig to connect the chain of configs. Link: https://lore.kernel.org/kernel-janitors/CAKXUXMwT2zS9fgyQHKUUiqo8ynZBdx2UEUu1WnV_q0OCmknqhw@mail.gmail.com/ Link: https://lkml.kernel.org/r/20210806195027.16808-1-lukas.bulwahn@gmail.com Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: stable@vger.kernel.org Fixes: 2860cd8a2353 ("livepatch: Use the default ftrace_ops instead of REGS when ARGS is available") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-12trace/osnoise: Print a stop tracing messageDaniel Bristot de Oliveira
When using osnoise/timerlat with stop tracing, sometimes it is not clear in which CPU the stop condition was hit, mainly when using some extra events. Print a message informing in which CPU the trace stopped, like in the example below: <idle>-0 [006] d.h. 2932.676616: #1672599 context irq timer_latency 34689 ns <idle>-0 [006] dNh. 2932.676618: irq_noise: local_timer:236 start 2932.676615639 duration 2391 ns <idle>-0 [006] dNh. 2932.676620: irq_noise: virtio0-output.0:47 start 2932.676620180 duration 86 ns <idle>-0 [003] d.h. 2932.676621: #1673374 context irq timer_latency 1200 ns <idle>-0 [006] d... 2932.676623: thread_noise: swapper/6:0 start 2932.676615964 duration 4339 ns <idle>-0 [003] dNh. 2932.676623: irq_noise: local_timer:236 start 2932.676620597 duration 1881 ns <idle>-0 [006] d... 2932.676623: sched_switch: prev_comm=swapper/6 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=timerlat/6 next_pid=852 next_prio=4 timerlat/6-852 [006] .... 2932.676623: #1672599 context thread timer_latency 41931 ns <idle>-0 [003] d... 2932.676623: thread_noise: swapper/3:0 start 2932.676620854 duration 880 ns <idle>-0 [003] d... 2932.676624: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=timerlat/3 next_pid=849 next_prio=4 timerlat/6-852 [006] .... 2932.676624: timerlat_main: stop tracing hit on cpu 6 timerlat/3-849 [003] .... 2932.676624: #1673374 context thread timer_latency 4310 ns Link: https://lkml.kernel.org/r/b30a0d7542adba019185f44ee648e60e14923b11.1626598844.git.bristot@kernel.org Cc: Tom Zanussi <zanussi@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-12trace/timerlat: Add a header with PREEMPT_RT additional fieldsDaniel Bristot de Oliveira
Some extra flags are printed to the trace header when using the PREEMPT_RT config. The extra flags are: need-resched-lazy, preempt-lazy-depth, and migrate-disable. Without printing these fields, the timerlat specific fields are shifted by three positions, for example: # tracer: timerlat # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # || / # |||| ACTIVATION # TASK-PID CPU# |||| TIMESTAMP ID CONTEXT LATENCY # | | | |||| | | | | <idle>-0 [000] d..h... 3279.798871: #1 context irq timer_latency 830 ns <...>-807 [000] ....... 3279.798881: #1 context thread timer_latency 11301 ns Add a new header for timerlat with the missing fields, to be used when the PREEMPT_RT is enabled. Link: https://lkml.kernel.org/r/babb83529a3211bd0805be0b8c21608230202c55.1626598844.git.bristot@kernel.org Cc: Tom Zanussi <zanussi@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-12trace/osnoise: Add a header with PREEMPT_RT additional fieldsDaniel Bristot de Oliveira
Some extra flags are printed to the trace header when using the PREEMPT_RT config. The extra flags are: need-resched-lazy, preempt-lazy-depth, and migrate-disable. Without printing these fields, the osnoise specific fields are shifted by three positions, for example: # tracer: osnoise # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth MAX # || / SINGLE Interference counters: # |||| RUNTIME NOISE %% OF CPU NOISE +-----------------------------+ # TASK-PID CPU# |||| TIMESTAMP IN US IN US AVAILABLE IN US HW NMI IRQ SIRQ THREAD # | | | |||| | | | | | | | | | | <...>-741 [000] ....... 1105.690909: 1000000 234 99.97660 36 21 0 1001 22 3 <...>-742 [001] ....... 1105.691923: 1000000 281 99.97190 197 7 0 1012 35 14 <...>-743 [002] ....... 1105.691958: 1000000 1324 99.86760 118 11 0 1016 155 143 <...>-744 [003] ....... 1105.691998: 1000000 109 99.98910 21 4 0 1004 33 7 <...>-745 [004] ....... 1105.692015: 1000000 2023 99.79770 97 37 0 1023 52 18 Add a new header for osnoise with the missing fields, to be used when the PREEMPT_RT is enabled. Link: https://lkml.kernel.org/r/1f03289d2a51fde5a58c2e7def063dc630820ad1.1626598844.git.bristot@kernel.org Cc: Tom Zanussi <zanussi@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-12Merge branch 'for-v5.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucounts fix from Eric Biederman: "This fixes the ucount sysctls on big endian architectures. The counts were expanded to be longs instead of ints, and the sysctl code was overlooked, so only the low 32bit were being processed. On litte endian just processing the low 32bits is fine, but on 64bit big endian processing just the low 32bits results in the high order bits instead of the low order bits being processed and nothing works proper. This change took a little bit to mature as we have the SYSCTL_ZERO, and SYSCTL_INT_MAX macros that are only usable for sysctls operating on ints, but unfortunately are not obviously broken. Which resulted in the versions of this change working on big endian and not on little endian, because the int SYSCTL_ZERO when extended 64bit wound up being 0x100000000. So we only allowed values greater than 0x100000000 and less than 0faff. Which unfortunately broken everything that tried to set the sysctls. (First reported with the windows subsystem for linux). I have tested this on x86_64 64bit after first reproducing the problems with the earlier version of this change, and then verifying the problems do not exist when we use appropriate long min and max values for extra1 and extra2" * 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: add missing data type changes
2021-08-12Merge tag 'sound-5.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This seems to be a usual bump in the middle, containing lots of pending ASoC fixes: - Yet another PCM mmap regression fix - Fix for ASoC DAPM prefix handling - Various cs42l42 codec fixes - PCM buffer reference fixes in a few ASoC drivers - Fixes for ASoC SOF, AMD, tlv320, WM - HD-audio quirks" * tag 'sound-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits) ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 650 G8 Notebook PC ALSA: pcm: Fix mmap breakage without explicit buffer setup ALSA: hda: Add quirk for ASUS Flow x13 ASoC: cs42l42: Fix mono playback ASoC: cs42l42: Constrain sample rate to prevent illegal SCLK ASoC: cs42l42: Fix LRCLK frame start edge ASoC: cs42l42: PLL must be running when changing MCLK_SRC_SEL ASoC: cs42l42: Remove duplicate control for WNF filter frequency ASoC: cs42l42: Fix inversion of ADC Notch Switch control ASoC: SOF: Intel: hda-ipc: fix reply size checking ASoC: SOF: Intel: Kconfig: fix SoundWire dependencies ASoC: amd: Fix reference to PCM buffer address ASoC: nau8824: Fix open coded prefix handling ASoC: kirkwood: Fix reference to PCM buffer address ASoC: uniphier: Fix reference to PCM buffer address ASoC: xilinx: Fix reference to PCM buffer address ASoC: intel: atom: Fix reference to PCM buffer address ASoC: cs42l42: Fix bclk calculation for mono ASoC: cs42l42: Don't allow SND_SOC_DAIFMT_LEFT_J ASoC: cs42l42: Correct definition of ADC Volume control ...