summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-06-24nfs: drop the incorrect assertion in nfs_swap_rw()Christoph Hellwig
Since commit 2282679fb20b ("mm: submit multipage write for SWP_FS_OPS swap-space"), we can plug multiple pages then unplug them all together. That means iov_iter_count(iter) could be way bigger than PAGE_SIZE, it actually equals the size of iov_iter_npages(iter, INT_MAX). Note this issue has nothing to do with large folios as we don't support THP_SWPOUT to non-block devices. [v-songbaohua@oppo.com: figure out the cause and correct the commit message] Link: https://lkml.kernel.org/r/20240618065647.21791-1-21cnbao@gmail.com Fixes: 2282679fb20b ("mm: submit multipage write for SWP_FS_OPS swap-space") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Barry Song <v-songbaohua@oppo.com> Closes: https://lore.kernel.org/linux-mm/20240617053201.GA16852@lst.de/ Reviewed-by: Martin Wege <martin.l.wege@gmail.com> Cc: NeilBrown <neilb@suse.de> Cc: Anna Schumaker <anna@kernel.org> Cc: Steve French <sfrench@samba.org> Cc: Trond Myklebust <trondmy@kernel.org> Cc: Chuanhua Han <hanchuanhua@oppo.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Chris Li <chrisl@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24mm/migrate: make migrate_pages_batch() stats consistentZi Yan
As Ying pointed out in [1], stats->nr_thp_failed needs to be updated to avoid stats inconsistency between MIGRATE_SYNC and MIGRATE_ASYNC when calling migrate_pages_batch(). Because if not, when migrate_pages_batch() is called via migrate_pages(MIGRATE_ASYNC), nr_thp_failed will not be increased and when migrate_pages_batch() is called via migrate_pages(MIGRATE_SYNC*), nr_thp_failed will be increase in migrate_pages_sync() by stats->nr_thp_failed += astats.nr_thp_split. [1] https://lore.kernel.org/linux-mm/87msnq7key.fsf@yhuang6-desk2.ccr.corp.intel.com/ Link: https://lkml.kernel.org/r/20240620012712.19804-1-zi.yan@sent.com Link: https://lkml.kernel.org/r/20240618134151.29214-1-zi.yan@sent.com Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list") Signed-off-by: Zi Yan <ziy@nvidia.com> Suggested-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24MAINTAINERS: TPM DEVICE DRIVER: update the W-tagJarkko Sakkinen
Git hosting for the test suite has been migrated from Gitlab to Codeberg, given the "less hostile environment". Link: https://lkml.kernel.org/r/20240618133556.105604-1-jarkko@kernel.org Link: https://codeberg.org/jarkko/linux-tpmdd-test Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24selftests/mm:fix test_prctl_fork_exec return failureaigourensheng
After calling fork() in test_prctl_fork_exec(), the global variable ksm_full_scans_fd is initialized to 0 in the child process upon entering the main function of ./ksm_functional_tests. In the function call chain test_child_ksm() -> __mmap_and_merge_range -> ksm_merge-> ksm_get_full_scans, start_scans = ksm_get_full_scans() will return an error. Therefore, the value of ksm_full_scans_fd needs to be initialized before calling test_child_ksm in the child process. Link: https://lkml.kernel.org/r/20240617052934.5834-1-shechenglong001@gmail.com Signed-off-by: aigourensheng <shechenglong001@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24mm: convert page type macros to enumStephen Brennan
Changing PG_slab from a page flag to a page type in commit 46df8e73a4a3 ("mm: free up PG_slab") in has the unintended consequence of removing the PG_slab constant from kernel debuginfo. The commit does add the value to the vmcoreinfo note, which allows debuggers to find the value without hardcoding it. However it's most flexible to continue representing the constant with an enum. To that end, convert the page type fields into an enum. Debuggers will now be able to detect that PG_slab's type has changed from enum pageflags to enum pagetype. Link: https://lkml.kernel.org/r/20240607202954.1198180-1-stephen.s.brennan@oracle.com Fixes: 46df8e73a4a3 ("mm: free up PG_slab") Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Hao Ge <gehao@kylinos.cn> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Omar Sandoval <osandov@osandov.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24ocfs2: fix DIO failure due to insufficient transaction creditsJan Kara
The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24kasan: fix bad call to unpoison_slab_objectAndrey Konovalov
Commit 29d7355a9d05 ("kasan: save alloc stack traces for mempool") messed up one of the calls to unpoison_slab_object: the last two arguments are supposed to be GFP flags and whether to init the object memory. Fix the call. Without this fix, __kasan_mempool_unpoison_object provides the object's size as GFP flags to unpoison_slab_object, which can cause LOCKDEP reports (and probably other issues). Link: https://lkml.kernel.org/r/20240614143238.60323-1-andrey.konovalov@linux.dev Fixes: 29d7355a9d05 ("kasan: save alloc stack traces for mempool") Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com> Reported-by: Brad Spengler <spender@grsecurity.net> Acked-by: Marco Elver <elver@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24mm: handle profiling for fake memory allocations during compactionSuren Baghdasaryan
During compaction isolated free pages are marked allocated so that they can be split and/or freed. For that, post_alloc_hook() is used inside split_map_pages() and release_free_list(). split_map_pages() marks free pages allocated, splits the pages and then lets alloc_contig_range_noprof() free those pages. release_free_list() marks free pages and immediately frees them. This usage of post_alloc_hook() affect memory allocation profiling because these functions might not be called from an instrumented allocator, therefore current->alloc_tag is NULL and when debugging is enabled (CONFIG_MEM_ALLOC_PROFILING_DEBUG=y) that causes warnings. To avoid that, wrap such post_alloc_hook() calls into an instrumented function which acts as an allocator which will be charged for these fake allocations. Note that these allocations are very short lived until they are freed, therefore the associated counters should usually read 0. Link: https://lkml.kernel.org/r/20240614230504.3849136-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Sourav Panda <souravpanda@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24mm/slab: fix 'variable obj_exts set but not used' warningSuren Baghdasaryan
slab_post_alloc_hook() uses prepare_slab_obj_exts_hook() to obtain slabobj_ext object. Currently the only user of slabobj_ext object in this path is memory allocation profiling, therefore when it's not enabled this object is not needed. This also generates a warning when compiling with CONFIG_MEM_ALLOC_PROFILING=n. Move the code under this configuration to fix the warning. If more slabobj_ext users appear in the future, the code will have to be changed back to call prepare_slab_obj_exts_hook(). Link: https://lkml.kernel.org/r/20240614225951.3845577-1-surenb@google.com Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406150444.F6neSaiy-lkp@intel.com/ Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Kees Cook <keescook@chromium.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24/proc/pid/smaps: add mseal info for vmaJeff Xu
Add sl in /proc/pid/smaps to indicate vma is sealed Link: https://lkml.kernel.org/r/20240614232014.806352-2-jeffxu@google.com Fixes: 8be7258aad44 ("mseal: add mseal syscall") Signed-off-by: Jeff Xu <jeffxu@chromium.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org> Cc: Jann Horn <jannh@google.com> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: Kees Cook <keescook@chromium.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stephen Röttger <sroettger@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24mm: fix incorrect vbq reference in purge_fragmented_blockZhaoyang Huang
xa_for_each() in _vm_unmap_aliases() loops through all vbs. However, since commit 062eacf57ad9 ("mm: vmalloc: remove a global vmap_blocks xarray") the vb from xarray may not be on the corresponding CPU vmap_block_queue. Consequently, purge_fragmented_block() might use the wrong vbq->lock to protect the free list, leading to vbq->free breakage. Incorrect lock protection can exhaust all vmalloc space as follows: CPU0 CPU1 +--------------------------------------------+ | +--------------------+ +-----+ | +--> | |---->| |------+ | CPU1:vbq free_list | | vb1 | +--- | |<----| |<-----+ | +--------------------+ +-----+ | +--------------------------------------------+ _vm_unmap_aliases() vb_alloc() new_vmap_block() xa_for_each(&vbq->vmap_blocks, idx, vb) --> vb in CPU1:vbq->freelist purge_fragmented_block(vb) spin_lock(&vbq->lock) spin_lock(&vbq->lock) --> use CPU0:vbq->lock --> use CPU1:vbq->lock list_del_rcu(&vb->free_list) list_add_tail_rcu(&vb->free_list, &vbq->free) __list_del(vb->prev, vb->next) next->prev = prev +--------------------+ | | | CPU1:vbq free_list | +---| |<--+ | +--------------------+ | +----------------------------+ __list_add(new, head->prev, head) +--------------------------------------------+ | +--------------------+ +-----+ | +--> | |---->| |------+ | CPU1:vbq free_list | | vb2 | +--- | |<----| |<-----+ | +--------------------+ +-----+ | +--------------------------------------------+ prev->next = next +--------------------------------------------+ |----------------------------+ | | +--------------------+ | +-----+ | +--> | |--+ | |------+ | CPU1:vbq free_list | | vb2 | +--- | |<----| |<-----+ | +--------------------+ +-----+ | +--------------------------------------------+ Here’s a list breakdown. All vbs, which were to be added to ‘prev’, cannot be used by list_for_each_entry_rcu(vb, &vbq->free, free_list) in vb_alloc(). Thus, vmalloc space is exhausted. This issue affects both erofs and f2fs, the stacktrace is as follows: erofs: [<ffffffd4ffb93ad4>] __switch_to+0x174 [<ffffffd4ffb942f0>] __schedule+0x624 [<ffffffd4ffb946f4>] schedule+0x7c [<ffffffd4ffb947cc>] schedule_preempt_disabled+0x24 [<ffffffd4ffb962ec>] __mutex_lock+0x374 [<ffffffd4ffb95998>] __mutex_lock_slowpath+0x14 [<ffffffd4ffb95954>] mutex_lock+0x24 [<ffffffd4fef2900c>] reclaim_and_purge_vmap_areas+0x44 [<ffffffd4fef25908>] alloc_vmap_area+0x2e0 [<ffffffd4fef24ea0>] vm_map_ram+0x1b0 [<ffffffd4ff1b46f4>] z_erofs_lz4_decompress+0x278 [<ffffffd4ff1b8ac4>] z_erofs_decompress_queue+0x650 [<ffffffd4ff1b8328>] z_erofs_runqueue+0x7f4 [<ffffffd4ff1b66a8>] z_erofs_read_folio+0x104 [<ffffffd4feeb6fec>] filemap_read_folio+0x6c [<ffffffd4feeb68c4>] filemap_fault+0x300 [<ffffffd4fef0ecac>] __do_fault+0xc8 [<ffffffd4fef0c908>] handle_mm_fault+0xb38 [<ffffffd4ffb9f008>] do_page_fault+0x288 [<ffffffd4ffb9ed64>] do_translation_fault[jt]+0x40 [<ffffffd4fec39c78>] do_mem_abort+0x58 [<ffffffd4ffb8c3e4>] el0_ia+0x70 [<ffffffd4ffb8c260>] el0t_64_sync_handler[jt]+0xb0 [<ffffffd4fec11588>] ret_to_user[jt]+0x0 f2fs: [<ffffffd4ffb93ad4>] __switch_to+0x174 [<ffffffd4ffb942f0>] __schedule+0x624 [<ffffffd4ffb946f4>] schedule+0x7c [<ffffffd4ffb947cc>] schedule_preempt_disabled+0x24 [<ffffffd4ffb962ec>] __mutex_lock+0x374 [<ffffffd4ffb95998>] __mutex_lock_slowpath+0x14 [<ffffffd4ffb95954>] mutex_lock+0x24 [<ffffffd4fef2900c>] reclaim_and_purge_vmap_areas+0x44 [<ffffffd4fef25908>] alloc_vmap_area+0x2e0 [<ffffffd4fef24ea0>] vm_map_ram+0x1b0 [<ffffffd4ff1a3b60>] f2fs_prepare_decomp_mem+0x144 [<ffffffd4ff1a6c24>] f2fs_alloc_dic+0x264 [<ffffffd4ff175468>] f2fs_read_multi_pages+0x428 [<ffffffd4ff17b46c>] f2fs_mpage_readpages+0x314 [<ffffffd4ff1785c4>] f2fs_readahead+0x50 [<ffffffd4feec3384>] read_pages+0x80 [<ffffffd4feec32c0>] page_cache_ra_unbounded+0x1a0 [<ffffffd4feec39e8>] page_cache_ra_order+0x274 [<ffffffd4feeb6cec>] do_sync_mmap_readahead+0x11c [<ffffffd4feeb6764>] filemap_fault+0x1a0 [<ffffffd4ff1423bc>] f2fs_filemap_fault+0x28 [<ffffffd4fef0ecac>] __do_fault+0xc8 [<ffffffd4fef0c908>] handle_mm_fault+0xb38 [<ffffffd4ffb9f008>] do_page_fault+0x288 [<ffffffd4ffb9ed64>] do_translation_fault[jt]+0x40 [<ffffffd4fec39c78>] do_mem_abort+0x58 [<ffffffd4ffb8c3e4>] el0_ia+0x70 [<ffffffd4ffb8c260>] el0t_64_sync_handler[jt]+0xb0 [<ffffffd4fec11588>] ret_to_user[jt]+0x0 To fix this, introducee cpu within vmap_block to record which this vb belongs to. Link: https://lkml.kernel.org/r/20240614021352.1822225-1-zhaoyang.huang@unisoc.com Link: https://lkml.kernel.org/r/20240607023116.1720640-1-zhaoyang.huang@unisoc.com Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks") Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Suggested-by: Hailong.Liu <hailong.liu@oppo.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-24Merge tag 'for-netdev' of ↵Jakub Kicinski
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-06-24 We've added 12 non-merge commits during the last 10 day(s) which contain a total of 10 files changed, 412 insertions(+), 16 deletions(-). The main changes are: 1) Fix a BPF verifier issue validating may_goto with a negative offset, from Alexei Starovoitov. 2) Fix a BPF verifier validation bug with may_goto combined with jump to the first instruction, also from Alexei Starovoitov. 3) Fix a bug with overrunning reservations in BPF ring buffer, from Daniel Borkmann. 4) Fix a bug in BPF verifier due to missing proper var_off setting related to movsx instruction, from Yonghong Song. 5) Silence unnecessary syzkaller-triggered warning in __xdp_reg_mem_model(), from Daniil Dulov. * tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: xdp: Remove WARN() from __xdp_reg_mem_model() selftests/bpf: Add tests for may_goto with negative offset. bpf: Fix may_goto with negative offset. selftests/bpf: Add more ring buffer test coverage bpf: Fix overrunning reservations in ringbuf selftests/bpf: Tests with may_goto and jumps to the 1st insn bpf: Fix the corner case with may_goto and jump to the 1st insn. bpf: Update BPF LSM maintainer list bpf: Fix remap of arena. selftests/bpf: Add a few tests to cover bpf: Add missed var_off setting in coerce_subreg_to_size_sx() bpf: Add missed var_off setting in set_sext32_default_val() ==================== Link: https://patch.msgid.link/20240624124330.8401-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24Merge branch 'locking-introduce-nested-bh-locking'Jakub Kicinski
Sebastian Andrzej Siewior says: ==================== locking: Introduce nested-BH locking. Disabling bottoms halves acts as per-CPU BKL. On PREEMPT_RT code within local_bh_disable() section remains preemtible. As a result high prior tasks (or threaded interrupts) will be blocked by lower-prio task (or threaded interrupts) which are long running which includes softirq sections. The proposed way out is to introduce explicit per-CPU locks for resources which are protected by local_bh_disable() and use those only on PREEMPT_RT so there is no additional overhead for !PREEMPT_RT builds. The series introduces the infrastructure and converts large parts of networking which is largest stake holder here. Once this done the per-CPU lock from local_bh_disable() on PREEMPT_RT can be lifted. Performance testing. Baseline is net-next as of commit 93bda33046e7a ("Merge branch'net-constify-ctl_table-arguments-of-utility-functions'") plus v6.10-rc1. A 10GiG link is used between two hosts. The command xdp-bench redirect-cpu --cpu 3 --remote-action drop eth1 -e was invoked on the receiving side with a ixgbe. The sending side uses pktgen_sample03_burst_single_flow.sh on i40e. Baseline: | eth1->? 9,018,604 rx/s 0 err,drop/s | receive total 9,018,604 pkt/s 0 drop/s 0 error/s | cpu:7 9,018,604 pkt/s 0 drop/s 0 error/s | enqueue to cpu 3 9,018,602 pkt/s 0 drop/s 7.00 bulk-avg | cpu:7->3 9,018,602 pkt/s 0 drop/s 7.00 bulk-avg | kthread total 9,018,606 pkt/s 0 drop/s 214,698 sched | cpu:3 9,018,606 pkt/s 0 drop/s 214,698 sched | xdp_stats 0 pass/s 9,018,606 drop/s 0 redir/s | cpu:3 0 pass/s 9,018,606 drop/s 0 redir/s | redirect_err 0 error/s | xdp_exception 0 hit/s perf top --sort cpu,symbol --no-children: | 18.14% 007 [k] bpf_prog_4f0ffbb35139c187_cpumap_l4_hash | 13.29% 007 [k] ixgbe_poll | 12.66% 003 [k] cpu_map_kthread_run | 7.23% 003 [k] page_frag_free | 6.76% 007 [k] xdp_do_redirect | 3.76% 007 [k] cpu_map_redirect | 3.13% 007 [k] bq_flush_to_queue | 2.51% 003 [k] xdp_return_frame | 1.93% 007 [k] try_to_wake_up | 1.78% 007 [k] _raw_spin_lock | 1.74% 007 [k] cpu_map_enqueue | 1.56% 003 [k] bpf_prog_57cd311f2e27366b_cpumap_drop With this series applied: | eth1->? 10,329,340 rx/s 0 err,drop/s | receive total 10,329,340 pkt/s 0 drop/s 0 error/s | cpu:6 10,329,340 pkt/s 0 drop/s 0 error/s | enqueue to cpu 3 10,329,338 pkt/s 0 drop/s 8.00 bulk-avg | cpu:6->3 10,329,338 pkt/s 0 drop/s 8.00 bulk-avg | kthread total 10,329,321 pkt/s 0 drop/s 96,297 sched | cpu:3 10,329,321 pkt/s 0 drop/s 96,297 sched | xdp_stats 0 pass/s 10,329,321 drop/s 0 redir/s | cpu:3 0 pass/s 10,329,321 drop/s 0 redir/s | redirect_err 0 error/s | xdp_exception 0 hit/s perf top --sort cpu,symbol --no-children: | 20.90% 006 [k] bpf_prog_4f0ffbb35139c187_cpumap_l4_hash | 12.62% 006 [k] ixgbe_poll | 9.82% 003 [k] page_frag_free | 8.73% 003 [k] cpu_map_bpf_prog_run_xdp | 6.63% 006 [k] xdp_do_redirect | 4.94% 003 [k] cpu_map_kthread_run | 4.28% 006 [k] cpu_map_redirect | 4.03% 006 [k] bq_flush_to_queue | 3.01% 003 [k] xdp_return_frame | 1.95% 006 [k] _raw_spin_lock | 1.94% 003 [k] bpf_prog_57cd311f2e27366b_cpumap_drop This diff appears to be noise. v8: https://lore.kernel.org/all/20240619072253.504963-1-bigeasy@linutronix.de v7: https://lore.kernel.org/all/20240618072526.379909-1-bigeasy@linutronix.de v6: https://lore.kernel.org/all/20240612170303.3896084-1-bigeasy@linutronix.de v5: https://lore.kernel.org/all/20240607070427.1379327-1-bigeasy@linutronix.de v4: https://lore.kernel.org/all/20240604154425.878636-1-bigeasy@linutronix.de v3: https://lore.kernel.org/all/20240529162927.403425-1-bigeasy@linutronix.de v2: https://lore.kernel.org/all/20240503182957.1042122-1-bigeasy@linutronix.de v1: https://lore.kernel.org/all/20231215171020.687342-1-bigeasy@linutronix.de ==================== Link: https://patch.msgid.link/20240620132727.660738-1-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24net: Move per-CPU flush-lists to bpf_net_context on PREEMPT_RT.Sebastian Andrzej Siewior
The per-CPU flush lists, which are accessed from within the NAPI callback (xdp_do_flush() for instance), are per-CPU. There are subject to the same problem as struct bpf_redirect_info. Add the per-CPU lists cpu_map_flush_list, dev_map_flush_list and xskmap_map_flush_list to struct bpf_net_context. Add wrappers for the access. The lists initialized on first usage (similar to bpf_net_ctx_get_ri()). Cc: "Björn Töpel" <bjorn@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-16-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.Sebastian Andrzej Siewior
The XDP redirect process is two staged: - bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the packet and makes decisions. While doing that, the per-CPU variable bpf_redirect_info is used. - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info and it may also access other per-CPU variables like xskmap_flush_list. At the very end of the NAPI callback, xdp_do_flush() is invoked which does not access bpf_redirect_info but will touch the individual per-CPU lists. The per-CPU variables are only used in the NAPI callback hence disabling bottom halves is the only protection mechanism. Users from preemptible context (like cpu_map_kthread_run()) explicitly disable bottom halves for protections reasons. Without locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. PREEMPT_RT has forced-threaded interrupts enabled and every NAPI-callback runs in a thread. If each thread has its own data structure then locking can be avoided. Create a struct bpf_net_context which contains struct bpf_redirect_info. Define the variable on stack, use bpf_net_ctx_set() to save a pointer to it, bpf_net_ctx_clear() removes it again. The bpf_net_ctx_set() may nest. For instance a function can be used from within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and NET_TX_SOFTIRQ which does not. Therefore only the first invocations updates the pointer. Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct bpf_redirect_info. The returned data structure is zero initialized to ensure nothing is leaked from stack. This is done on first usage of the struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to note that initialisation is required. First invocation of bpf_net_ctx_get_ri() will memset() the data structure and update bpf_redirect_info::kern_flags. bpf_redirect_info::nh is excluded from memset because it is only used once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is moved past nh to exclude it from memset. The pointer to bpf_net_context is saved task's task_struct. Using always the bpf_net_context approach has the advantage that there is almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds. Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-15-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24net: Use nested-BH locking for bpf_scratchpad.Sebastian Andrzej Siewior
bpf_scratchpad is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-14-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24seg6: Use nested-BH locking for seg6_bpf_srh_states.Sebastian Andrzej Siewior
The access to seg6_bpf_srh_states is protected by disabling preemption. Based on the code, the entry point is input_action_end_bpf() and every other function (the bpf helper functions bpf_lwt_seg6_*()), that is accessing seg6_bpf_srh_states, should be called from within input_action_end_bpf(). input_action_end_bpf() accesses seg6_bpf_srh_states first at the top of the function and then disables preemption. This looks wrong because if preemption needs to be disabled as part of the locking mechanism then the variable shouldn't be accessed beforehand. Looking at how it is used via test_lwt_seg6local.sh then input_action_end_bpf() is always invoked from softirq context. If this is always the case then the preempt_disable() statement is superfluous. If this is not always invoked from softirq then disabling only preemption is not sufficient. Replace the preempt_disable() statement with nested-BH locking. This is not an equivalent replacement as it assumes that the invocation of input_action_end_bpf() always occurs in softirq context and thus the preempt_disable() is superfluous. Add a local_lock_t the data structure and use local_lock_nested_bh() for locking. Add lockdep_assert_held() to ensure the lock is held while the per-CPU variable is referenced in the helper functions. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: David Ahern <dsahern@kernel.org> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-13-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24lwt: Don't disable migration prio invoking BPF.Sebastian Andrzej Siewior
There is no need to explicitly disable migration if bottom halves are also disabled. Disabling BH implies disabling migration. Remove migrate_disable() and rely solely on disabling BH to remain on the same CPU. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-12-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24dev: Use nested-BH locking for softnet_data.process_queue.Sebastian Andrzej Siewior
softnet_data::process_queue is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. softnet_data::input_queue_head can be updated lockless. This is fine because this value is only update CPU local by the local backlog_napi thread. Add a local_lock_t to softnet_data and use local_lock_nested_bh() for locking of process_queue. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-11-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24dev: Remove PREEMPT_RT ifdefs from backlog_lock.*().Sebastian Andrzej Siewior
The backlog_napi locking (previously RPS) relies on explicit locking if either RPS or backlog NAPI is enabled. If both are disabled then locking was achieved by disabling interrupts except on PREEMPT_RT. PREEMPT_RT was excluded because the needed synchronisation was already provided local_bh_disable(). Since the introduction of backlog NAPI and making it mandatory for PREEMPT_RT the ifdef within backlog_lock.*() is obsolete and can be removed. Remove the ifdefs in backlog_lock.*(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-10-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24net: softnet_data: Make xmit per task.Sebastian Andrzej Siewior
Softirq is preemptible on PREEMPT_RT. Without a per-CPU lock in local_bh_disable() there is no guarantee that only one device is transmitting at a time. With preemption and multiple senders it is possible that the per-CPU `recursion' counter gets incremented by different threads and exceeds XMIT_RECURSION_LIMIT leading to a false positive recursion alert. The `more' member is subject to similar problems if set by one thread for one driver and wrongly used by another driver within another thread. Instead of adding a lock to protect the per-CPU variable it is simpler to make xmit per-task. Sending and receiving skbs happens always in thread context anyway. Having a lock to protected the per-CPU counter would block/ serialize two sending threads needlessly. It would also require a recursive lock to ensure that the owner can increment the counter further. Make the softnet_data.xmit a task_struct member on PREEMPT_RT. Add needed wrapper. Cc: Ben Segall <bsegall@google.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-9-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24netfilter: br_netfilter: Use nested-BH locking for brnf_frag_data_storage.Sebastian Andrzej Siewior
brnf_frag_data_storage is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Florian Westphal <fw@strlen.de> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: bridge@lists.linux.dev Cc: coreteam@netfilter.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-8-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24net/ipv4: Use nested-BH locking for ipv4_tcp_sk.Sebastian Andrzej Siewior
ipv4_tcp_sk is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Make a struct with a sock member (original ipv4_tcp_sk) and a local_lock_t and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: David Ahern <dsahern@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-7-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24net/tcp_sigpool: Use nested-BH locking for sigpool_scratch.Sebastian Andrzej Siewior
sigpool_scratch is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Make a struct with a pad member (original sigpool_scratch) and a local_lock_t and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: David Ahern <dsahern@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-6-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24net: Use nested-BH locking for napi_alloc_cache.Sebastian Andrzej Siewior
napi_alloc_cache is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-5-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24net: Use __napi_alloc_frag_align() instead of open coding it.Sebastian Andrzej Siewior
The else condition within __netdev_alloc_frag_align() is an open coded __napi_alloc_frag_align(). Use __napi_alloc_frag_align() instead of open coding it. Move fragsz assignment before page_frag_alloc_align() invocation because __napi_alloc_frag_align() also contains this statement. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-4-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24locking/local_lock: Add local nested BH locking infrastructure.Sebastian Andrzej Siewior
Add local_lock_nested_bh() locking. It is based on local_lock_t and the naming follows the preempt_disable_nested() example. For !PREEMPT_RT + !LOCKDEP it is a per-CPU annotation for locking assumptions based on local_bh_disable(). The macro is optimized away during compilation. For !PREEMPT_RT + LOCKDEP the local_lock_nested_bh() is reduced to the usual lock-acquire plus lockdep_assert_in_softirq() - ensuring that BH is disabled. For PREEMPT_RT local_lock_nested_bh() acquires the specified per-CPU lock. It does not disable CPU migration because it relies on local_bh_disable() disabling CPU migration. With LOCKDEP it performans the usual lockdep checks as with !PREEMPT_RT. Due to include hell the softirq check has been moved spinlock.c. The intention is to use this locking in places where locking of a per-CPU variable relies on BH being disabled. Instead of treating disabled bottom halves as a big per-CPU lock, PREEMPT_RT can use this to reduce the locking scope to what actually needs protecting. A side effect is that it also documents the protection scope of the per-CPU variables. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-3-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24locking/local_lock: Introduce guard definition for local_lock.Sebastian Andrzej Siewior
Introduce lock guard definition for local_lock_t. There are no users yet. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-2-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24Merge tag 'input-for-v6.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - fixes for ili210x and elantech drivers - new products IDs added to xpad controller driver - a tweak to i8042 driver to always keep keyboard in Ayaneo Kun handheld in raw mode - populated "id_table" in ads7846 touchscreen driver to make sure non-OF instantiated devices can properly determine the model data. * tag 'input-for-v6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: ads7846 - use spi_device_id table Input: xpad - add support for ASUS ROG RAIKIRI PRO Input: ili210x - fix ili251x_read_touch_data() return value Input: i8042 - add Ayaneo Kun to i8042 quirk table Input: elantech - fix touchpad state on resume for Lenovo N24
2024-06-24wifi: ath12k: handle keepalive during WoWLAN suspend and resumeBaochen Qiang
With WoWLAN enabled and after sleeping for a rather long time, we are seeing that with some APs, it is not able to wake up the STA though the correct wake up pattern has been configured. This is because the host doesn't send keepalive command to firmware, thus firmware will not send any packet to the AP and after a specific time the AP kicks out the STA. So enable keepalive before going to suspend and disable it after resume back. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://patch.msgid.link/20240604055407.12506-9-quic_bqiang@quicinc.com
2024-06-24wifi: ath12k: support GTK rekey offloadBaochen Qiang
Host sets GTK related info to firmware before WoW is enabled, and gets rekey replay_count and then disables GTK rekey when WoW quits. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://patch.msgid.link/20240604055407.12506-8-quic_bqiang@quicinc.com
2024-06-24wifi: ath12k: support ARP and NS offloadBaochen Qiang
Support ARP and NS offload in WoW state. Tested this way: put machine A with QCA6390 to WoW state, ping/ping6 machine A from another machine B, check sniffer to see any ARP response and Neighbor Advertisement from machine A. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://patch.msgid.link/20240604055407.12506-7-quic_bqiang@quicinc.com
2024-06-24wifi: ath12k: implement hardware data filterBaochen Qiang
Host needs to set hardware data filter before entering WoW to let firmware drop needless broadcast/multicast frames to avoid frequent wakeup. Host clears hardware data filter when leaving WoW. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://patch.msgid.link/20240604055407.12506-6-quic_bqiang@quicinc.com
2024-06-24wifi: ath12k: add WoW net-detect functionalityBaochen Qiang
Implement net-detect feature by setting flag WIPHY_WOWLAN_NET_DETECT if firmware supports this feature. Driver sets the related PNO configuration to firmware before entering WoW and firmware then scans periodically and wakes up host if a specific SSID is found. Note that firmware crashes if we enable it for both P2P vdev and station vdev simultaneously because firmware can only support one vdev at a time. Since there is rare scenario for a P2P vdev to do net-detect, skip it for P2P vdevs. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://patch.msgid.link/20240604055407.12506-5-quic_bqiang@quicinc.com
2024-06-24wifi: ath12k: add basic WoW functionalitiesBaochen Qiang
Implement basic WoW functionalities such as magic-packet, disconnect and pattern. The logic is very similar to ath11k. When WoW is configured, ath12k_core_suspend and ath12k_core_resume are skipped (by checking ar->state) as we are not allowed to power cycle firmware. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://patch.msgid.link/20240604055407.12506-4-quic_bqiang@quicinc.com
2024-06-24wifi: ath12k: implement WoW enable and wakeup commandsBaochen Qiang
Implement WoW enable and WoW wakeup commands which are needed for suspend/resume. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://patch.msgid.link/20240604055407.12506-3-quic_bqiang@quicinc.com
2024-06-24wifi: ath12k: add ATH12K_DBG_WOW log levelBaochen Qiang
Currently there is no dedicated log level for WoW, so create it for future use. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://patch.msgid.link/20240604055407.12506-2-quic_bqiang@quicinc.com
2024-06-24Merge tag 'pinctrl-v6.10-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Use flag saving spinlocks in the Renesas rzg2l driver. This fixes up PREEMPT_RT problems. - Remove broken Qualcomm PM8008 that clearly was never working. A new version will arrive in the next merge window. - Add a quirk for LP8764 regmap that was missed and made the TI J7200 board unusable. - Fix persistance on the BCM2835 GPIO outputs kernel parameter so this remains consisten across a booted kernel. - Fix a potential deadlock in create_pinctrl() - Fix some erroneous bitfields and pinmux reset in the Rockchip RK3328 driver. * tag 'pinctrl-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: rockchip: fix pinmux reset in rockchip_pmx_set pinctrl: rockchip: use dedicated pinctrl type for RK3328 pinctrl: rockchip: fix pinmux bits for RK3328 GPIO3-B pins pinctrl: rockchip: fix pinmux bits for RK3328 GPIO2-B pins pinctrl: fix deadlock in create_pinctrl() when handling -EPROBE_DEFER pinctrl: bcm2835: Fix permissions of persist_gpio_outputs pinctrl: tps6594: add missing support for LP8764 PMIC dt-bindings: pinctrl: qcom,pmic-gpio: drop pm8008 pinctrl: qcom: spmi-gpio: drop broken pm8008 support pinctrl: renesas: rzg2l: Use spin_{lock,unlock}_irq{save,restore}
2024-06-24netfilter: fix undefined reference to 'netfilter_lwtunnel_*' when ↵Jianguo Wu
CONFIG_SYSCTL=n if CONFIG_SYSFS is not enabled in config, we get the below compile error, All errors (new ones prefixed by >>): csky-linux-ld: net/netfilter/core.o: in function `netfilter_init': core.c:(.init.text+0x42): undefined reference to `netfilter_lwtunnel_init' >> csky-linux-ld: core.c:(.init.text+0x56): undefined reference to `netfilter_lwtunnel_fini' >> csky-linux-ld: core.c:(.init.text+0x70): undefined reference to `netfilter_lwtunnel_init' csky-linux-ld: core.c:(.init.text+0x78): undefined reference to `netfilter_lwtunnel_fini' Fixes: a2225e0250c5 ("netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core") Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406210511.8vbByYj3-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202406210520.6HmrUaA2-lkp@intel.com/ Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-24ASoC: mediatek: mt8195: Add platform entry for ETDM1_OUT_BE dai linkChen-Yu Tsai
Commit e70b8dd26711 ("ASoC: mediatek: mt8195: Remove afe-dai component and rework codec link") removed the codec entry for the ETDM1_OUT_BE dai link entirely instead of replacing it with COMP_EMPTY(). This worked by accident as the remaining COMP_EMPTY() platform entry became the codec entry, and the platform entry became completely empty, effectively the same as COMP_DUMMY() since snd_soc_fill_dummy_dai() doesn't do anything for platform entries. This causes a KASAN out-of-bounds warning in mtk_soundcard_common_probe() in sound/soc/mediatek/common/mtk-soundcard-driver.c: for_each_card_prelinks(card, i, dai_link) { if (adsp_node && !strncmp(dai_link->name, "AFE_SOF", strlen("AFE_SOF"))) dai_link->platforms->of_node = adsp_node; else if (!dai_link->platforms->name && !dai_link->platforms->of_node) dai_link->platforms->of_node = platform_node; } where the code expects the platforms array to have space for at least one entry. Add an COMP_EMPTY() entry so that dai_link->platforms has space. Fixes: e70b8dd26711 ("ASoC: mediatek: mt8195: Remove afe-dai component and rework codec link") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patch.msgid.link/20240624061257.3115467-1-wenst@chromium.org Signed-off-by: Mark Brown <broonie@kernel.org>
2024-06-24xdp: Remove WARN() from __xdp_reg_mem_model()Daniil Dulov
syzkaller reports a warning in __xdp_reg_mem_model(). The warning occurs only if __mem_id_init_hash_table() returns an error. It returns the error in two cases: 1. memory allocation fails; 2. rhashtable_init() fails when some fields of rhashtable_params struct are not initialized properly. The second case cannot happen since there is a static const rhashtable_params struct with valid fields. So, warning is only triggered when there is a problem with memory allocation. Thus, there is no sense in using WARN() to handle this error and it can be safely removed. WARNING: CPU: 0 PID: 5065 at net/core/xdp.c:299 __xdp_reg_mem_model+0x2d9/0x650 net/core/xdp.c:299 CPU: 0 PID: 5065 Comm: syz-executor883 Not tainted 6.8.0-syzkaller-05271-gf99c5f563c17 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:__xdp_reg_mem_model+0x2d9/0x650 net/core/xdp.c:299 Call Trace: xdp_reg_mem_model+0x22/0x40 net/core/xdp.c:344 xdp_test_run_setup net/bpf/test_run.c:188 [inline] bpf_test_run_xdp_live+0x365/0x1e90 net/bpf/test_run.c:377 bpf_prog_test_run_xdp+0x813/0x11b0 net/bpf/test_run.c:1267 bpf_prog_test_run+0x33a/0x3b0 kernel/bpf/syscall.c:4240 __sys_bpf+0x48d/0x810 kernel/bpf/syscall.c:5649 __do_sys_bpf kernel/bpf/syscall.c:5738 [inline] __se_sys_bpf kernel/bpf/syscall.c:5736 [inline] __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Found by Linux Verification Center (linuxtesting.org) with syzkaller. Fixes: 8d5d88527587 ("xdp: rhashtable with allocator ID to pointer mapping") Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/all/20240617162708.492159-1-d.dulov@aladdin.ru Link: https://lore.kernel.org/bpf/20240624080747.36858-1-d.dulov@aladdin.ru
2024-06-24selftests/bpf: Add tests for may_goto with negative offset.Alexei Starovoitov
Add few tests with may_goto and negative offset. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240619235355.85031-2-alexei.starovoitov@gmail.com
2024-06-24bpf: Fix may_goto with negative offset.Alexei Starovoitov
Zac's syzbot crafted a bpf prog that exposed two bugs in may_goto. The 1st bug is the way may_goto is patched. When offset is negative it should be patched differently. The 2nd bug is in the verifier: when current state may_goto_depth is equal to visited state may_goto_depth it means there is an actual infinite loop. It's not correct to prune exploration of the program at this point. Note, that this check doesn't limit the program to only one may_goto insn, since 2nd and any further may_goto will increment may_goto_depth only in the queued state pushed for future exploration. The current state will have may_goto_depth == 0 regardless of number of may_goto insns and the verifier has to explore the program until bpf_exit. Fixes: 011832b97b31 ("bpf: Introduce may_goto instruction") Reported-by: Zac Ecob <zacecob@protonmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Closes: https://lore.kernel.org/bpf/CAADnVQL-15aNp04-cyHRn47Yv61NXfYyhopyZtUyxNojUZUXpA@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20240619235355.85031-1-alexei.starovoitov@gmail.com
2024-06-24selftests/bpf: Add more ring buffer test coverageDaniel Borkmann
Add test coverage for reservations beyond the ring buffer size in order to validate that bpf_ringbuf_reserve() rejects the request with NULL, all other ring buffer tests keep passing as well: # ./vmtest.sh -- ./test_progs -t ringbuf [...] ./test_progs -t ringbuf [ 1.165434] bpf_testmod: loading out-of-tree module taints kernel. [ 1.165825] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel [ 1.284001] tsc: Refined TSC clocksource calibration: 3407.982 MHz [ 1.286871] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc34e357, max_idle_ns: 440795379773 ns [ 1.289555] clocksource: Switched to clocksource tsc #274/1 ringbuf/ringbuf:OK #274/2 ringbuf/ringbuf_n:OK #274/3 ringbuf/ringbuf_map_key:OK #274/4 ringbuf/ringbuf_write:OK #274 ringbuf:OK #275 ringbuf_multi:OK [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> [ Test fixups for getting BPF CI back to work ] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240621140828.18238-2-daniel@iogearbox.net
2024-06-24MAINTAINERS: adjust file entry in FREESCALE QORIQ DPAA FMAN DRIVERLukas Bulwahn
Commit 243996d172a6 ("dt-bindings: net: Convert fsl-fman to yaml") splits the previous dt text file into four yaml files. It adjusts a corresponding file entry in MAINTAINERS from txt to yaml, but this adjustment misses that the file was split and renamed. Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a broken reference. Adjust the file entry to match the four yaml files resulting from this commit above. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-24net: usb: ax88179_178a: improve link status logsJose Ignacio Tornos Martinez
Avoid spurious link status logs that may ultimately be wrong; for example, if the link is set to down with the cable plugged, then the cable is unplugged and after this the link is set to up, the last new log that is appearing is incorrectly telling that the link is up. In order to avoid errors, show link status logs after link_reset processing, and in order to avoid spurious as much as possible, only show the link loss when some link status change is detected. cc: stable@vger.kernel.org Fixes: e2ca90c276e1 ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver") Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-23Linux 6.10-rc5v6.10-rc5Linus Torvalds
2024-06-23octeontx2-pf: Fix coverity and klockwork issues in octeon PF driverRatheesh Kannoth
Fix unintended sign extension and klockwork issues. These are not real issue but for sanity checks. Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-23Merge tag 'i2c-for-6.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "The core gains placeholders for recently added functions when CONFIG_I2C is not defined as well documentation fixes to start using inclusive terminology. The drivers get paths in DT bindings fixed as well as proper interrupt handling for the ocores driver" * tag 'i2c-for-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: docs: i2c: summary: be clearer with 'controller/target' and 'adapter/client' pairs docs: i2c: summary: document 'local' and 'remote' targets docs: i2c: summary: document use of inclusive language docs: i2c: summary: update speed mode description docs: i2c: summary: update I2C specification link docs: i2c: summary: start sentences consistently. i2c: Add nop fwnode operations i2c: ocores: set IACK bit after core is enabled dt-bindings: i2c: google,cros-ec-i2c-tunnel: correct path to i2c-controller schema dt-bindings: i2c: atmel,at91sam: correct path to i2c-controller schema
2024-06-23Merge tag '6.10-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: "Five smb3 client fixes - three nets/fiolios cifs fixes - fix typo in module parameters description - fix incorrect swap warning" * tag '6.10-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Move the 'pid' from the subreq to the req cifs: Only pick a channel once per read request cifs: Defer read completion cifs: fix typo in module parameter enable_gcm_256 cifs: drop the incorrect assertion in cifs_swap_rw()