summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-11-11ocfs2: fix UBSAN warning in ocfs2_verify_volume()Dmitry Antipov
Syzbot has reported the following splat triggered by UBSAN: UBSAN: shift-out-of-bounds in fs/ocfs2/super.c:2336:10 shift exponent 32768 is too large for 32-bit type 'int' CPU: 2 UID: 0 PID: 5255 Comm: repro Not tainted 6.12.0-rc4-syzkaller-00047-gc2ee9f594da8 #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x241/0x360 ? __pfx_dump_stack_lvl+0x10/0x10 ? __pfx__printk+0x10/0x10 ? __asan_memset+0x23/0x50 ? lockdep_init_map_type+0xa1/0x910 __ubsan_handle_shift_out_of_bounds+0x3c8/0x420 ocfs2_fill_super+0xf9c/0x5750 ? __pfx_ocfs2_fill_super+0x10/0x10 ? __pfx_validate_chain+0x10/0x10 ? __pfx_validate_chain+0x10/0x10 ? validate_chain+0x11e/0x5920 ? __lock_acquire+0x1384/0x2050 ? __pfx_validate_chain+0x10/0x10 ? string+0x26a/0x2b0 ? widen_string+0x3a/0x310 ? string+0x26a/0x2b0 ? bdev_name+0x2b1/0x3c0 ? pointer+0x703/0x1210 ? __pfx_pointer+0x10/0x10 ? __pfx_format_decode+0x10/0x10 ? __lock_acquire+0x1384/0x2050 ? vsnprintf+0x1ccd/0x1da0 ? snprintf+0xda/0x120 ? __pfx_lock_release+0x10/0x10 ? do_raw_spin_lock+0x14f/0x370 ? __pfx_snprintf+0x10/0x10 ? set_blocksize+0x1f9/0x360 ? sb_set_blocksize+0x98/0xf0 ? setup_bdev_super+0x4e6/0x5d0 mount_bdev+0x20c/0x2d0 ? __pfx_ocfs2_fill_super+0x10/0x10 ? __pfx_mount_bdev+0x10/0x10 ? vfs_parse_fs_string+0x190/0x230 ? __pfx_vfs_parse_fs_string+0x10/0x10 legacy_get_tree+0xf0/0x190 ? __pfx_ocfs2_mount+0x10/0x10 vfs_get_tree+0x92/0x2b0 do_new_mount+0x2be/0xb40 ? __pfx_do_new_mount+0x10/0x10 __se_sys_mount+0x2d6/0x3c0 ? __pfx___se_sys_mount+0x10/0x10 ? do_syscall_64+0x100/0x230 ? __x64_sys_mount+0x20/0xc0 do_syscall_64+0xf3/0x230 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f37cae96fda Code: 48 8b 0d 51 ce 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1e ce 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007fff6c1aa228 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007fff6c1aa240 RCX: 00007f37cae96fda RDX: 00000000200002c0 RSI: 0000000020000040 RDI: 00007fff6c1aa240 RBP: 0000000000000004 R08: 00007fff6c1aa280 R09: 0000000000000000 R10: 00000000000008c0 R11: 0000000000000206 R12: 00000000000008c0 R13: 00007fff6c1aa280 R14: 0000000000000003 R15: 0000000001000000 </TASK> For a really damaged superblock, the value of 'i_super.s_blocksize_bits' may exceed the maximum possible shift for an underlying 'int'. So add an extra check whether the aforementioned field represents the valid block size, which is 512 bytes, 1K, 2K, or 4K. Link: https://lkml.kernel.org/r/20241106092100.2661330-1-dmantipov@yandex.ru Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reported-by: syzbot+56f7cd1abe4b8e475180@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=56f7cd1abe4b8e475180 Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11nilfs2: fix null-ptr-deref in block_dirty_buffer tracepointRyusuke Konishi
When using the "block:block_dirty_buffer" tracepoint, mark_buffer_dirty() may cause a NULL pointer dereference, or a general protection fault when KASAN is enabled. This happens because, since the tracepoint was added in mark_buffer_dirty(), it references the dev_t member bh->b_bdev->bd_dev regardless of whether the buffer head has a pointer to a block_device structure. In the current implementation, nilfs_grab_buffer(), which grabs a buffer to read (or create) a block of metadata, including b-tree node blocks, does not set the block device, but instead does so only if the buffer is not in the "uptodate" state for each of its caller block reading functions. However, if the uptodate flag is set on a folio/page, and the buffer heads are detached from it by try_to_free_buffers(), and new buffer heads are then attached by create_empty_buffers(), the uptodate flag may be restored to each buffer without the block device being set to bh->b_bdev, and mark_buffer_dirty() may be called later in that state, resulting in the bug mentioned above. Fix this issue by making nilfs_grab_buffer() always set the block device of the super block structure to the buffer head, regardless of the state of the buffer's uptodate flag. Link: https://lkml.kernel.org/r/20241106160811.3316-3-konishi.ryusuke@gmail.com Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ubisectech Sirius <bugreport@valiantsec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11nilfs2: fix null-ptr-deref in block_touch_buffer tracepointRyusuke Konishi
Patch series "nilfs2: fix null-ptr-deref bugs on block tracepoints". This series fixes null pointer dereference bugs that occur when using nilfs2 and two block-related tracepoints. This patch (of 2): It has been reported that when using "block:block_touch_buffer" tracepoint, touch_buffer() called from __nilfs_get_folio_block() causes a NULL pointer dereference, or a general protection fault when KASAN is enabled. This happens because since the tracepoint was added in touch_buffer(), it references the dev_t member bh->b_bdev->bd_dev regardless of whether the buffer head has a pointer to a block_device structure. In the current implementation, the block_device structure is set after the function returns to the caller. Here, touch_buffer() is used to mark the folio/page that owns the buffer head as accessed, but the common search helper for folio/page used by the caller function was optimized to mark the folio/page as accessed when it was reimplemented a long time ago, eliminating the need to call touch_buffer() here in the first place. So this solves the issue by eliminating the touch_buffer() call itself. Link: https://lkml.kernel.org/r/20241106160811.3316-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20241106160811.3316-2-konishi.ryusuke@gmail.com Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: Ubisectech Sirius <bugreport@valiantsec.com> Closes: https://lkml.kernel.org/r/86bd3013-887e-4e38-960f-ca45c657f032.bugreport@valiantsec.com Reported-by: syzbot+9982fb8d18eba905abe2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9982fb8d18eba905abe2 Tested-by: syzbot+9982fb8d18eba905abe2@syzkaller.appspotmail.com Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11mm: page_alloc: move mlocked flag clearance into free_pages_prepare()Roman Gushchin
Syzbot reported a bad page state problem caused by a page being freed using free_page() still having a mlocked flag at free_pages_prepare() stage: BUG: Bad page state in process syz.5.504 pfn:61f45 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x61f45 flags: 0xfff00000080204(referenced|workingset|mlocked|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000080204 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 8443, tgid 8442 (syz.5.504), ts 201884660643, free_ts 201499827394 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537 prep_new_page mm/page_alloc.c:1545 [inline] get_page_from_freelist+0x303f/0x3190 mm/page_alloc.c:3457 __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4733 alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265 kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99 kvm_create_vm virt/kvm/kvm_main.c:1235 [inline] kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5488 [inline] kvm_dev_ioctl+0x12dc/0x2240 virt/kvm/kvm_main.c:5530 __do_compat_sys_ioctl fs/ioctl.c:1007 [inline] __se_compat_sys_ioctl+0x510/0xc90 fs/ioctl.c:950 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386 do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e page last free pid 8399 tgid 8399 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1108 [inline] free_unref_folios+0xf12/0x18d0 mm/page_alloc.c:2686 folios_put_refs+0x76c/0x860 mm/swap.c:1007 free_pages_and_swap_cache+0x5c8/0x690 mm/swap_state.c:335 __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline] tlb_batch_pages_flush mm/mmu_gather.c:149 [inline] tlb_flush_mmu_free mm/mmu_gather.c:366 [inline] tlb_flush_mmu+0x3a3/0x680 mm/mmu_gather.c:373 tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:465 exit_mmap+0x496/0xc40 mm/mmap.c:1926 __mmput+0x115/0x390 kernel/fork.c:1348 exit_mm+0x220/0x310 kernel/exit.c:571 do_exit+0x9b2/0x28e0 kernel/exit.c:926 do_group_exit+0x207/0x2c0 kernel/exit.c:1088 __do_sys_exit_group kernel/exit.c:1099 [inline] __se_sys_exit_group kernel/exit.c:1097 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1097 x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Modules linked in: CPU: 0 UID: 0 PID: 8442 Comm: syz.5.504 Not tainted 6.12.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 bad_page+0x176/0x1d0 mm/page_alloc.c:501 free_page_is_bad mm/page_alloc.c:918 [inline] free_pages_prepare mm/page_alloc.c:1100 [inline] free_unref_page+0xed0/0xf20 mm/page_alloc.c:2638 kvm_destroy_vm virt/kvm/kvm_main.c:1327 [inline] kvm_put_kvm+0xc75/0x1350 virt/kvm/kvm_main.c:1386 kvm_vcpu_release+0x54/0x60 virt/kvm/kvm_main.c:4143 __fput+0x23f/0x880 fs/file_table.c:431 task_work_run+0x24f/0x310 kernel/task_work.c:239 exit_task_work include/linux/task_work.h:43 [inline] do_exit+0xa2f/0x28e0 kernel/exit.c:939 do_group_exit+0x207/0x2c0 kernel/exit.c:1088 __do_sys_exit_group kernel/exit.c:1099 [inline] __se_sys_exit_group kernel/exit.c:1097 [inline] __ia32_sys_exit_group+0x3f/0x40 kernel/exit.c:1097 ia32_sys_call+0x2624/0x2630 arch/x86/include/generated/asm/syscalls_32.h:253 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386 do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e RIP: 0023:0xf745d579 Code: Unable to access opcode bytes at 0xf745d54f. RSP: 002b:00000000f75afd6c EFLAGS: 00000206 ORIG_RAX: 00000000000000fc RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 00000000ffffff9c RDI: 00000000f744cff4 RBP: 00000000f717ae61 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> The problem was originally introduced by commit b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance"): it was focused on handling pagecache and anonymous memory and wasn't suitable for lower level get_page()/free_page() API's used for example by KVM, as with this reproducer. Fix it by moving the mlocked flag clearance down to free_page_prepare(). The bug itself if fairly old and harmless (aside from generating these warnings), aside from a small memory leak - "bad" pages are stopped from being allocated again. Link: https://lkml.kernel.org/r/20241106195354.270757-1-roman.gushchin@linux.dev Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance") Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Reported-by: syzbot+e985d3026c4fd041578e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6729f475.050a0220.701a.0019.GAE@google.com Acked-by: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11r8169: use helper r8169_mod_reg8_cond to simplify rtl_jumbo_configHeiner Kallweit
Use recently added helper r8169_mod_reg8_cond() to simplify jumbo mode configuration. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/3df1d484-a02e-46e7-8f75-db5b428e422e@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11Merge branch 'selftests-ncdevmem-add-ncdevmem-to-ksft'Jakub Kicinski
Stanislav Fomichev says: ==================== selftests: ncdevmem: Add ncdevmem to ksft The goal of the series is to simplify and make it possible to use ncdevmem in an automated way from the ksft python wrapper. ncdevmem is slowly mutated into a state where it uses stdout to print the payload and the python wrapper is added to make sure the arrived payload matches the expected one. ==================== Link: https://patch.msgid.link/20241107181211.3934153-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: ncdevmem: Add automated testStanislav Fomichev
Only RX side for now and small message to test the setup. In the future, we can extend it to TX side and to testing both sides with a couple of megs of data. make \ -C tools/testing/selftests \ TARGETS="drivers/hw/net" \ install INSTALL_PATH=~/tmp/ksft scp ~/tmp/ksft ${HOST}: scp ~/tmp/ksft ${PEER}: cfg+="NETIF=${DEV}\n" cfg+="LOCAL_V6=${HOST_IP}\n" cfg+="REMOTE_V6=${PEER_IP}\n" cfg+="REMOTE_TYPE=ssh\n" cfg+="REMOTE_ARGS=root@${PEER}\n" echo -e "$cfg" | ssh root@${HOST} "cat > ksft/drivers/net/net.config" ssh root@${HOST} "cd ksft && ./run_kselftest.sh -t drivers/net:devmem.py" Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241107181211.3934153-13-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: ncdevmem: Move ncdevmem under drivers/net/hwStanislav Fomichev
This is where all the tests that depend on the HW functionality live in and this is where the automated test is gonna be added in the next patch. Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241107181211.3934153-12-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: ncdevmem: Run selftest when none of the -s or -c has been providedStanislav Fomichev
This will be used as a 'probe' mode in the selftest to check whether the device supports the devmem or not. Use hard-coded queue layout (two last queues) and prevent user from passing custom -q and/or -t. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241107181211.3934153-11-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: ncdevmem: Remove hard-coded queue numbersStanislav Fomichev
Use single last queue of the device and probe it dynamically. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241107181211.3934153-10-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: ncdevmem: Use YNL to enable TCP header splitStanislav Fomichev
In the next patch the hard-coded queue numbers are gonna be removed. So introduce some initial support for ethtool YNL and use it to enable header split. Also, tcp-data-split requires latest ethtool which is unlikely to be present in the distros right now. (ideally, we should not shell out to ethtool at all). Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241107181211.3934153-9-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: ncdevmem: Properly reset flow steeringStanislav Fomichev
ntuple off/on might be not enough to do it on all NICs. Add a bunch of shell crap to explicitly remove the rules. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241107181211.3934153-8-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: ncdevmem: Switch to AF_INET6Stanislav Fomichev
Use dualstack socket to support both v4 and v6. v4-mapped-v6 address can be used to do v4. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241107181211.3934153-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: ncdevmem: Remove default argumentsStanislav Fomichev
To make it clear what's required and what's not. Also, some of the values don't seem like a good defaults; for example eth1. Move the invocation comment to the top, add missing -s to the client and cleanup the client invocation a bit to make more readable. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241107181211.3934153-6-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: ncdevmem: Make client_ip optionalStanislav Fomichev
Support 3-tuple filtering by making client_ip optional. When -c is not passed, don't specify src-ip/src-port in the filter. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241107181211.3934153-5-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: ncdevmem: Unify error handlingStanislav Fomichev
There is a bunch of places where error() calls look out of place. Use the same error(1, errno, ...) pattern everywhere. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241107181211.3934153-4-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: ncdevmem: Separate out dmabuf providerStanislav Fomichev
So we can plug the other ones in the future if needed. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241107181211.3934153-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: ncdevmem: Redirect all non-payload output to stderrStanislav Fomichev
That should make it possible to do expected payload validation on the caller side. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241107181211.3934153-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11Merge branch 'net-stmmac-dwmac4-fixes-issues-in-dwmac4'Jakub Kicinski
Ley Foon Tan says: ==================== net: stmmac: dwmac4: Fixes issues in dwmac4 This patch series fixes issues in the dwmac4 driver. These three patches don't cause any user-visible issues, so they are targeted for net-next. Patch #1: Corrects the masking logic in the MTL Operation Mode RTC mask and shift macros. The current code lacks the use of the ~ operator, which is necessary to clear the bits properly. Patch #2: Addresses inaccuracies in the MTL_OP_MODE_*_MASK macros. The RTC fields are located in bits [1:0], and this patch ensures the mask and shift macros use the appropriate values to reflect this. Patch #3: Moves the handling of the Receive Watchdog Timeout (RWT) out of the Abnormal Interrupt Summary (AIS) condition. According to the databook, the RWT interrupt is not included in the AIS. v1: https://lore.kernel.org/20241023112005.GN402847@kernel.org v2: https://lore.kernel.org/20241101082336.1552084-3-leyfoon.tan@starfivetech.com ==================== Link: https://patch.msgid.link/20241107063637.2122726-1-leyfoon.tan@starfivetech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: stmmac: dwmac4: Receive Watchdog Timeout is not in abnormal interrupt ↵Ley Foon Tan
summary The Receive Watchdog Timeout (RWT, bit[9]) is not part of Abnormal Interrupt Summary (AIS). Move the RWT handling out of the AIS condition statement. From databook, the AIS is the logical OR of the following interrupt bits: - Bit 1: Transmit Process Stopped - Bit 7: Receive Buffer Unavailable - Bit 8: Receive Process Stopped - Bit 10: Early Transmit Interrupt - Bit 12: Fatal Bus Error - Bit 13: Context Descriptor Error Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241107063637.2122726-4-leyfoon.tan@starfivetech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: stmmac: dwmac4: Fix the MTL_OP_MODE_*_MASK operationLey Foon Tan
In order to mask off the bits, we need to use the '~' operator to invert all the bits of _MASK and clear them. Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241107063637.2122726-3-leyfoon.tan@starfivetech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: stmmac: dwmac4: Fix MTL_OP_MODE_RTC mask and shift macrosLey Foon Tan
RTC fields are located in bits [1:0]. Correct the _MASK and _SHIFT macros to use the appropriate mask and shift. Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241107063637.2122726-2-leyfoon.tan@starfivetech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: phy: aquantia: Add mdix config and reportingPaul Davey
Add support for configuring MDI-X state of PHY. Add reporting of resolved MDI-X state in status information. Tested on AQR113C. Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz> Link: https://patch.msgid.link/20241106222057.3965379-1-paul.davey@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11Merge branch 'introduce-vlan-support-in-hsr'Jakub Kicinski
MD Danish Anwar says: ==================== Introduce VLAN support in HSR This series adds VLAN support to HSR framework. This series also adds VLAN support to HSR mode of ICSSG Ethernet driver. [1] https://gist.githubusercontent.com/danish-ti/d309f92c640134ccc4f2c0c442de5be1/raw/9cfb5f8bd12b374ae591f4bd9ba3e91ae509ed4f/hsr_vlan_logs v1 https://lore.kernel.org/all/20241004074715.791191-1-danishanwar@ti.com/ v2 https://lore.kernel.org/all/20241024103056.3201071-1-danishanwar@ti.com/ ==================== Link: https://patch.msgid.link/20241106091710.3308519-1-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: hsr: Add test for VLANMD Danish Anwar
Add test for VLAN ping for HSR. The test adds vlan interfaces to the hsr interface and then verifies if ping to them works. Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20241106091710.3308519-5-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: ti: icssg-prueth: Add VLAN support for HSR modeRavi Gunasekaran
Add support for VLAN addition/deletion in HSR mode. In HSR mode, even if the host port is not a member of the VLAN domain, the slave ports should simply forward the frames. So allow forwarding of all VLAN frames in HSR mode. Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20241106091710.3308519-4-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: hsr: Add VLAN CTAG filter supportMurali Karicheri
This patch adds support for VLAN ctag based filtering at slave devices. The slave ethernet device may be capable of filtering ethernet packets based on VLAN ID. This requires that when the VLAN interface is created over an HSR/PRP interface, it passes the VID information to the associated slave ethernet devices so that it updates the hardware filters to filter ethernet frames based on VID. This patch adds the required functions to propagate the vid information to the slave devices. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20241106091710.3308519-3-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: hsr: Add VLAN supportWingMan Kwok
Add support for creating VLAN interfaces over HSR/PRP interface. Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20241106091710.3308519-2-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11Merge branch 'side-mdio-support-for-lan937x-switches'Jakub Kicinski
Oleksij Rempel says: ==================== Side MDIO Support for LAN937x Switches This patch set introduces support for an internal MDIO bus in LAN937x switches, enabling the use of a side MDIO channel for PHY management while keeping SPI as the main interface for switch configuration. other changelogs are added to separate patches. ==================== Link: https://patch.msgid.link/20241106075942.1636998-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: dsa: microchip: parse PHY config from device treeOleksij Rempel
Introduce ksz_parse_dt_phy_config() to validate and parse PHY configuration from the device tree for KSZ switches. This function ensures proper setup of internal PHYs by checking `phy-handle` properties, verifying expected PHY IDs, and handling parent node mismatches. Sets the PHY mask on the MII bus if validation is successful. Returns -EINVAL on configuration errors. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20241106075942.1636998-7-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: dsa: microchip: add support for side MDIO interface in LAN937xOleksij Rempel
Implement side MDIO channel support for LAN937x switches, providing an alternative to SPI for PHY management alongside existing SPI-based switch configuration. This is needed to reduce SPI load, as SPI can be relatively expensive for small packets compared to MDIO support. Also, implemented static mappings for PHY addresses for various LAN937x models to support different internal PHY configurations. Since the PHY address mappings are not equal to the port indexes, this patch also provides PHY address calculation based on hardware strapping configuration. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241106075942.1636998-6-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: dsa: microchip: cleanup error handling in ksz_mdio_registerOleksij Rempel
Replace repeated cleanup code with a single error path using a label. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241106075942.1636998-5-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: dsa: microchip: Refactor MDIO handling for side MDIO accessOleksij Rempel
Add support for accessing PHYs via a side MDIO interface in LAN937x switches. The existing code already supports accessing PHYs via main management interfaces, which can be SPI, I2C, or MDIO, depending on the chip variant. This patch enables using a side MDIO bus, where SPI is used for the main switch configuration and MDIO for managing the integrated PHYs. On LAN937x, this is optional, allowing them to operate in both configurations: SPI only, or SPI + MDIO. Typically, the SPI interface is used for switch configuration, while MDIO handles PHY management. Additionally, update interrupt controller code to support non-linear port to PHY address mapping, enabling correct interrupt handling for configurations where PHY addresses do not directly correspond to port indexes. This change ensures that the interrupt mechanism properly aligns with the new, flexible PHY address mappings introduced by side MDIO support. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241106075942.1636998-4-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11dt-bindings: net: dsa: microchip: add mdio-parent-bus property for internal MDIOOleksij Rempel
Introduce `mdio-parent-bus` property in the ksz DSA bindings to reference the parent MDIO bus when the internal MDIO bus is attached to it, bypassing the main management interface. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241106075942.1636998-3-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11dt-bindings: net: dsa: microchip: add internal MDIO bus descriptionOleksij Rempel
Add description for the internal MDIO bus, including integrated PHY nodes, to ksz DSA bindings. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241106075942.1636998-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: atlantic: use irq_update_affinity_hint()Mohammad Heib
irq_set_affinity_hint() is deprecated, Use irq_update_affinity_hint() instead. This removes the side-effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs. The core code already takes care of that. when the driver applies the affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. atlantic device reopening will resets the affinity in aq_ndev_open(). 3. atlantic has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241107120739.415743-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11nfp: use irq_update_affinity_hint()Mohammad Heib
irq_set_affinity_hint() is deprecated, Use irq_update_affinity_hint() instead. This removes the side-effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs. The core code already takes care of that. when the driver applies the affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. nfp device reopening will resets the affinity in nfp_net_netdev_open(). 3. nfp has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Louis Peens <louis.peens@corigine.com> Link: https://patch.msgid.link/20241107115002.413358-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11bnxt_en: use irq_update_affinity_hint()Mohammad Heib
irq_set_affinity_hint() is deprecated, Use irq_update_affinity_hint() instead. This removes the side-effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs. The core code already takes care of that. when the driver applies the affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. bnxt_en device reopening will resets the affinity in bnxt_open(). 3. bnxt_en has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20241106180811.385175-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: fix data-races around sk->sk_forward_allocWang Liang
Syzkaller reported this warning: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 16 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x1c5/0x1e0 Modules linked in: CPU: 0 UID: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.12.0-rc5 #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:inet_sock_destruct+0x1c5/0x1e0 Code: 24 12 4c 89 e2 5b 48 c7 c7 98 ec bb 82 41 5c e9 d1 18 17 ff 4c 89 e6 5b 48 c7 c7 d0 ec bb 82 41 5c e9 bf 18 17 ff 0f 0b eb 83 <0f> 0b eb 97 0f 0b eb 87 0f 0b e9 68 ff ff ff 66 66 2e 0f 1f 84 00 RSP: 0018:ffffc9000008bd90 EFLAGS: 00010206 RAX: 0000000000000300 RBX: ffff88810b172a90 RCX: 0000000000000007 RDX: 0000000000000002 RSI: 0000000000000300 RDI: ffff88810b172a00 RBP: ffff88810b172a00 R08: ffff888104273c00 R09: 0000000000100007 R10: 0000000000020000 R11: 0000000000000006 R12: ffff88810b172a00 R13: 0000000000000004 R14: 0000000000000000 R15: ffff888237c31f78 FS: 0000000000000000(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffc63fecac8 CR3: 000000000342e000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x88/0x130 ? inet_sock_destruct+0x1c5/0x1e0 ? report_bug+0x18e/0x1a0 ? handle_bug+0x53/0x90 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? inet_sock_destruct+0x1c5/0x1e0 __sk_destruct+0x2a/0x200 rcu_do_batch+0x1aa/0x530 ? rcu_do_batch+0x13b/0x530 rcu_core+0x159/0x2f0 handle_softirqs+0xd3/0x2b0 ? __pfx_smpboot_thread_fn+0x10/0x10 run_ksoftirqd+0x25/0x30 smpboot_thread_fn+0xdd/0x1d0 kthread+0xd3/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ---[ end trace 0000000000000000 ]--- Its possible that two threads call tcp_v6_do_rcv()/sk_forward_alloc_add() concurrently when sk->sk_state == TCP_LISTEN with sk->sk_lock unlocked, which triggers a data-race around sk->sk_forward_alloc: tcp_v6_rcv tcp_v6_do_rcv skb_clone_and_charge_r sk_rmem_schedule __sk_mem_schedule sk_forward_alloc_add() skb_set_owner_r sk_mem_charge sk_forward_alloc_add() __kfree_skb skb_release_all skb_release_head_state sock_rfree sk_mem_uncharge sk_forward_alloc_add() sk_mem_reclaim // set local var reclaimable __sk_mem_reclaim sk_forward_alloc_add() In this syzkaller testcase, two threads call tcp_v6_do_rcv() with skb->truesize=768, the sk_forward_alloc changes like this: (cpu 1) | (cpu 2) | sk_forward_alloc ... | ... | 0 __sk_mem_schedule() | | +4096 = 4096 | __sk_mem_schedule() | +4096 = 8192 sk_mem_charge() | | -768 = 7424 | sk_mem_charge() | -768 = 6656 ... | ... | sk_mem_uncharge() | | +768 = 7424 reclaimable=7424 | | | sk_mem_uncharge() | +768 = 8192 | reclaimable=8192 | __sk_mem_reclaim() | | -4096 = 4096 | __sk_mem_reclaim() | -8192 = -4096 != 0 The skb_clone_and_charge_r() should not be called in tcp_v6_do_rcv() when sk->sk_state is TCP_LISTEN, it happens later in tcp_v6_syn_recv_sock(). Fix the same issue in dccp_v6_do_rcv(). Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Signed-off-by: Wang Liang <wangliang74@huawei.com> Link: https://patch.msgid.link/20241107023405.889239-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rxrpc: Add a tracepoint for aborts being proposedDavid Howells
Add a tracepoint to rxrpc to trace the proposal of an abort. The abort is performed asynchronously by the I/O thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/726356.1730898045@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11ipv6: Fix soft lockups in fib6_select_path under high next hop churnOmid Ehtemam-Haghighi
Soft lockups have been observed on a cluster of Linux-based edge routers located in a highly dynamic environment. Using the `bird` service, these routers continuously update BGP-advertised routes due to frequently changing nexthop destinations, while also managing significant IPv6 traffic. The lockups occur during the traversal of the multipath circular linked-list in the `fib6_select_path` function, particularly while iterating through the siblings in the list. The issue typically arises when the nodes of the linked list are unexpectedly deleted concurrently on a different core—indicated by their 'next' and 'previous' elements pointing back to the node itself and their reference count dropping to zero. This results in an infinite loop, leading to a soft lockup that triggers a system panic via the watchdog timer. Apply RCU primitives in the problematic code sections to resolve the issue. Where necessary, update the references to fib6_siblings to annotate or use the RCU APIs. Include a test script that reproduces the issue. The script periodically updates the routing table while generating a heavy load of outgoing IPv6 traffic through multiple iperf3 clients. It consistently induces infinite soft lockups within a couple of minutes. Kernel log: 0 [ffffbd13003e8d30] machine_kexec at ffffffff8ceaf3eb 1 [ffffbd13003e8d90] __crash_kexec at ffffffff8d0120e3 2 [ffffbd13003e8e58] panic at ffffffff8cef65d4 3 [ffffbd13003e8ed8] watchdog_timer_fn at ffffffff8d05cb03 4 [ffffbd13003e8f08] __hrtimer_run_queues at ffffffff8cfec62f 5 [ffffbd13003e8f70] hrtimer_interrupt at ffffffff8cfed756 6 [ffffbd13003e8fd0] __sysvec_apic_timer_interrupt at ffffffff8cea01af 7 [ffffbd13003e8ff0] sysvec_apic_timer_interrupt at ffffffff8df1b83d -- <IRQ stack> -- 8 [ffffbd13003d3708] asm_sysvec_apic_timer_interrupt at ffffffff8e000ecb [exception RIP: fib6_select_path+299] RIP: ffffffff8ddafe7b RSP: ffffbd13003d37b8 RFLAGS: 00000287 RAX: ffff975850b43600 RBX: ffff975850b40200 RCX: 0000000000000000 RDX: 000000003fffffff RSI: 0000000051d383e4 RDI: ffff975850b43618 RBP: ffffbd13003d3800 R8: 0000000000000000 R9: ffff975850b40200 R10: 0000000000000000 R11: 0000000000000000 R12: ffffbd13003d3830 R13: ffff975850b436a8 R14: ffff975850b43600 R15: 0000000000000007 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 9 [ffffbd13003d3808] ip6_pol_route at ffffffff8ddb030c 10 [ffffbd13003d3888] ip6_pol_route_input at ffffffff8ddb068c 11 [ffffbd13003d3898] fib6_rule_lookup at ffffffff8ddf02b5 12 [ffffbd13003d3928] ip6_route_input at ffffffff8ddb0f47 13 [ffffbd13003d3a18] ip6_rcv_finish_core.constprop.0 at ffffffff8dd950d0 14 [ffffbd13003d3a30] ip6_list_rcv_finish.constprop.0 at ffffffff8dd96274 15 [ffffbd13003d3a98] ip6_sublist_rcv at ffffffff8dd96474 16 [ffffbd13003d3af8] ipv6_list_rcv at ffffffff8dd96615 17 [ffffbd13003d3b60] __netif_receive_skb_list_core at ffffffff8dc16fec 18 [ffffbd13003d3be0] netif_receive_skb_list_internal at ffffffff8dc176b3 19 [ffffbd13003d3c50] napi_gro_receive at ffffffff8dc565b9 20 [ffffbd13003d3c80] ice_receive_skb at ffffffffc087e4f5 [ice] 21 [ffffbd13003d3c90] ice_clean_rx_irq at ffffffffc0881b80 [ice] 22 [ffffbd13003d3d20] ice_napi_poll at ffffffffc088232f [ice] 23 [ffffbd13003d3d80] __napi_poll at ffffffff8dc18000 24 [ffffbd13003d3db8] net_rx_action at ffffffff8dc18581 25 [ffffbd13003d3e40] __do_softirq at ffffffff8df352e9 26 [ffffbd13003d3eb0] run_ksoftirqd at ffffffff8ceffe47 27 [ffffbd13003d3ec0] smpboot_thread_fn at ffffffff8cf36a30 28 [ffffbd13003d3ee8] kthread at ffffffff8cf2b39f 29 [ffffbd13003d3f28] ret_from_fork at ffffffff8ce5fa64 30 [ffffbd13003d3f50] ret_from_fork_asm at ffffffff8ce03cbb Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Reported-by: Adrian Oliver <kernel@aoliver.ca> Signed-off-by: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@menlosecurity.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Ido Schimmel <idosch@idosch.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241106010236.1239299-1-omid.ehtemamhaghighi@menlosecurity.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11Merge branch 'knobs-for-npc-default-rule-counters'Jakub Kicinski
Linu Cherian says: ==================== Knobs for NPC default rule counters Patch 1 introduce _rvu_mcam_remove/add_counter_from/to_rule by refactoring existing code Patch 2 adds a devlink param to enable/disable counters for default rules. Once enabled, counters can Patch 3 adds documentation for devlink params v4: https://lore.kernel.org/20241029035739.1981839-1-lcherian@marvell.com ==================== Link: https://patch.msgid.link/20241105125620.2114301-1-lcherian@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11devlink: Add documentation for OcteonTx2 AFLinu Cherian
Add documentation for the following devlink params - npc_mcam_high_zone_percent - npc_def_rule_cntr - nix_maxlf Signed-off-by: Linu Cherian <lcherian@marvell.com> Link: https://patch.msgid.link/20241105125620.2114301-4-lcherian@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11octeontx2-af: Knobs for NPC default rule countersLinu Cherian
Add devlink knobs to enable/disable counters on NPC default rule entries. Sample command to enable default rule counters: devlink dev param set <dev> name npc_def_rule_cntr value true cmode runtime Sample command to read the counter: cat /sys/kernel/debug/cn10k/npc/mcam_rules Signed-off-by: Linu Cherian <lcherian@marvell.com> Link: https://patch.msgid.link/20241105125620.2114301-3-lcherian@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11octeontx2-af: Refactor few NPC mcam APIsLinu Cherian
Introduce lowlevel variant of rvu_mcam_remove/add_counter_from/to_rule for better code reuse, which assumes necessary locks are taken at higher level. These low level functions would be used for implementing default rule counter APIs in the subsequent patch. Signed-off-by: Linu Cherian <lcherian@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241105125620.2114301-2-lcherian@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11mlx5/core: deduplicate {mlx5_,}eq_update_ci()Caleb Sander Mateos
The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci(). The only additional work done by mlx5_eq_update_ci() is to increment eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to avoid the duplication. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183054.2443218-2-csander@purestorage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11mlx5/core: relax memory barrier in eq_update_ci()Caleb Sander Mateos
The memory barrier in eq_update_ci() after the doorbell write is a significant hot spot in mlx5_eq_comp_int(). Under heavy TCP load, we see 3% of CPU time spent on the mfence instruction. 98df6d5b877c ("net/mlx5: A write memory barrier is sufficient in EQ ci update") already relaxed the full memory barrier to just a write barrier in mlx5_eq_update_ci(), which duplicates eq_update_ci(). So replace mb() with wmb() in eq_update_ci() too. On strongly ordered architectures, no barrier is actually needed because the MMIO writes to the doorbell register are guaranteed to appear to the device in the order they were made. However, the kernel's ordered MMIO primitive writel() lacks a convenient big-endian interface. Therefore, we opt to stick with __raw_writel() + a barrier. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183054.2443218-1-csander@purestorage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11Merge branch ↵Jakub Kicinski
'macsec-inherit-lower-device-s-features-and-tso-limits-when-offloading' Sabrina Dubroca says: ==================== macsec: inherit lower device's features and TSO limits when offloading When macsec is offloaded to a NIC, we can take advantage of some of its features, mainly TSO and checksumming. This increases performance significantly. Some features cannot be inherited, because they require additional ops that aren't provided by the macsec netdevice. We also need to inherit TSO limits from the lower device, like VLAN/macvlan devices do. This series also moves the existing macsec offload selftest to the netdevsim selftests before adding tests for the new features. To allow this new selftest to work, netdevsim's hw_features are expanded. ==================== Link: https://patch.msgid.link/cover.1730929545.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: netdevsim: add ethtool features to macsec offload testsSabrina Dubroca
The test verifies that available features aren't changed by toggling offload on the device. Creating a device with offload off and then enabling it later should result in the same features as creating the device with offload enabled directly. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/ba801bd0a75b02de2dddbfc77f9efceb8b3d8a2e.1730929545.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11selftests: netdevsim: add test toggling macsec offloadSabrina Dubroca
The test verifies that toggling offload works (both via rtnetlink and macsec's genetlink APIs). This is only possible when no SA is configured. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/bf8e27ee0d921caa4eb35f1e830eca6d4080ddb2.1730929545.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>