Age | Commit message (Collapse) | Author |
|
[ Upstream commit 95fc45d1dea8e1253f8ec58abc5befb71553d666 ]
syzbot found a lockdep issue [1].
We should remove ax25 RTNL dependency in ax25_setsockopt()
This should also fix a variety of possible UAF in ax25.
[1]
WARNING: possible circular locking dependency detected
6.13.0-rc3-syzkaller-00762-g9268abe611b0 #0 Not tainted
------------------------------------------------------
syz.5.1818/12806 is trying to acquire lock:
ffffffff8fcb3988 (rtnl_mutex){+.+.}-{4:4}, at: ax25_setsockopt+0xa55/0xe90 net/ax25/af_ax25.c:680
but task is already holding lock:
ffff8880617ac258 (sk_lock-AF_AX25){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1618 [inline]
ffff8880617ac258 (sk_lock-AF_AX25){+.+.}-{0:0}, at: ax25_setsockopt+0x209/0xe90 net/ax25/af_ax25.c:574
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (sk_lock-AF_AX25){+.+.}-{0:0}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
lock_sock_nested+0x48/0x100 net/core/sock.c:3642
lock_sock include/net/sock.h:1618 [inline]
ax25_kill_by_device net/ax25/af_ax25.c:101 [inline]
ax25_device_event+0x24d/0x580 net/ax25/af_ax25.c:146
notifier_call_chain+0x1a5/0x3f0 kernel/notifier.c:85
__dev_notify_flags+0x207/0x400
dev_change_flags+0xf0/0x1a0 net/core/dev.c:9026
dev_ifsioc+0x7c8/0xe70 net/core/dev_ioctl.c:563
dev_ioctl+0x719/0x1340 net/core/dev_ioctl.c:820
sock_do_ioctl+0x240/0x460 net/socket.c:1234
sock_ioctl+0x626/0x8e0 net/socket.c:1339
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (rtnl_mutex){+.+.}-{4:4}:
check_prev_add kernel/locking/lockdep.c:3161 [inline]
check_prevs_add kernel/locking/lockdep.c:3280 [inline]
validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
__lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
__mutex_lock_common kernel/locking/mutex.c:585 [inline]
__mutex_lock+0x1ac/0xee0 kernel/locking/mutex.c:735
ax25_setsockopt+0xa55/0xe90 net/ax25/af_ax25.c:680
do_sock_setsockopt+0x3af/0x720 net/socket.c:2324
__sys_setsockopt net/socket.c:2349 [inline]
__do_sys_setsockopt net/socket.c:2355 [inline]
__se_sys_setsockopt net/socket.c:2352 [inline]
__x64_sys_setsockopt+0x1ee/0x280 net/socket.c:2352
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sk_lock-AF_AX25);
lock(rtnl_mutex);
lock(sk_lock-AF_AX25);
lock(rtnl_mutex);
*** DEADLOCK ***
1 lock held by syz.5.1818/12806:
#0: ffff8880617ac258 (sk_lock-AF_AX25){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1618 [inline]
#0: ffff8880617ac258 (sk_lock-AF_AX25){+.+.}-{0:0}, at: ax25_setsockopt+0x209/0xe90 net/ax25/af_ax25.c:574
stack backtrace:
CPU: 1 UID: 0 PID: 12806 Comm: syz.5.1818 Not tainted 6.13.0-rc3-syzkaller-00762-g9268abe611b0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2074
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2206
check_prev_add kernel/locking/lockdep.c:3161 [inline]
check_prevs_add kernel/locking/lockdep.c:3280 [inline]
validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
__lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
__mutex_lock_common kernel/locking/mutex.c:585 [inline]
__mutex_lock+0x1ac/0xee0 kernel/locking/mutex.c:735
ax25_setsockopt+0xa55/0xe90 net/ax25/af_ax25.c:680
do_sock_setsockopt+0x3af/0x720 net/socket.c:2324
__sys_setsockopt net/socket.c:2349 [inline]
__do_sys_setsockopt net/socket.c:2355 [inline]
__se_sys_setsockopt net/socket.c:2352 [inline]
__x64_sys_setsockopt+0x1ee/0x280 net/socket.c:2352
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7b62385d29
Fixes: c433570458e4 ("ax25: fix a use-after-free in ax25_fillin_cb()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250103210514.87290-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7a596a50c4a4eab946aec149171c72321b4934aa ]
All callers of inet_getpeer() want to create an inetpeer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241215175629.1248773-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: a853c609504e ("inetpeer: do not get a refcount in inet_getpeer()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 661cd8fc8e9039819ca0c22e0add52b632240a9e ]
All callers of inet_getpeer_v4() and inet_getpeer_v6()
want to create an inetpeer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241215175629.1248773-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: a853c609504e ("inetpeer: do not get a refcount in inet_getpeer()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cbc16bceea784210d585a42ac9f8f10ce62b300e ]
page_pool_ref_netmem() should work with either netmem representation, but
currently it casts to a page with netmem_to_page(), which will fail with
net iovs. Use netmem_get_pp_ref_count_ref() instead.
Fixes: 8ab79ed50cf1 ("page_pool: devmem support")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/20250108220644.3528845-2-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3479c7549fb1dfa7a1db4efb7347c7b8ef50de4b ]
If the backlog of listen() is set to zero, sk_acceptq_is_full() allows
one connection to be made, but inet_csk_reqsk_queue_is_full() does not.
When the net.ipv4.tcp_syncookies is zero, inet_csk_reqsk_queue_is_full()
will cause an immediate drop before the sk_acceptq_is_full() check in
tcp_conn_request(), resulting in no connection can be made.
This patch tries to keep consistent with 64a146513f8f ("[NET]: Revert
incorrect accept queue backlog changes.").
Link: https://lore.kernel.org/netdev/20250102080258.53858-1-kuniyu@amazon.com/
Fixes: ef547f2ac16b ("tcp: remove max_qlen_log")
Signed-off-by: Zhongqiu Duan <dzq.aishenghu0@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250102171426.915276-1-dzq.aishenghu0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4d94f05558271654670d18c26c912da0c1c15549 ]
This reworks hci_cb_list to not use mutex hci_cb_list_lock to avoid bugs
like the bellow:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 5070, name: kworker/u9:2
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
4 locks held by kworker/u9:2/5070:
#0: ffff888015be3948 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline]
#0: ffff888015be3948 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x1770 kernel/workqueue.c:3335
#1: ffffc90003b6fd00 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline]
#1: ffffc90003b6fd00 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x1770 kernel/workqueue.c:3335
#2: ffff8880665d0078 (&hdev->lock){+.+.}-{3:3}, at: hci_le_create_big_complete_evt+0xcf/0xae0 net/bluetooth/hci_event.c:6914
#3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: hci_le_create_big_complete_evt+0xdb/0xae0 net/bluetooth/hci_event.c:6915
CPU: 0 PID: 5070 Comm: kworker/u9:2 Not tainted 6.8.0-syzkaller-08073-g480e035fc4c7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Workqueue: hci0 hci_rx_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
__might_resched+0x5d4/0x780 kernel/sched/core.c:10187
__mutex_lock_common kernel/locking/mutex.c:585 [inline]
__mutex_lock+0xc1/0xd70 kernel/locking/mutex.c:752
hci_connect_cfm include/net/bluetooth/hci_core.h:2004 [inline]
hci_le_create_big_complete_evt+0x3d9/0xae0 net/bluetooth/hci_event.c:6939
hci_event_func net/bluetooth/hci_event.c:7514 [inline]
hci_event_packet+0xa53/0x1540 net/bluetooth/hci_event.c:7569
hci_rx_work+0x3e8/0xca0 net/bluetooth/hci_core.c:4171
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>
Reported-by: syzbot+2fb0835e0c9cefc34614@syzkaller.appspotmail.com
Tested-by: syzbot+2fb0835e0c9cefc34614@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2fb0835e0c9cefc34614
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 542ed8145e6f9392e3d0a86a0e9027d2ffd183e4 ]
Access to genmask field in struct nft_set_ext results in unaligned
atomic read:
[ 72.130109] Unable to handle kernel paging request at virtual address ffff0000c2bb708c
[ 72.131036] Mem abort info:
[ 72.131213] ESR = 0x0000000096000021
[ 72.131446] EC = 0x25: DABT (current EL), IL = 32 bits
[ 72.132209] SET = 0, FnV = 0
[ 72.133216] EA = 0, S1PTW = 0
[ 72.134080] FSC = 0x21: alignment fault
[ 72.135593] Data abort info:
[ 72.137194] ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000
[ 72.142351] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 72.145989] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 72.150115] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000237d27000
[ 72.154893] [ffff0000c2bb708c] pgd=0000000000000000, p4d=180000023ffff403, pud=180000023f84b403, pmd=180000023f835403,
+pte=0068000102bb7707
[ 72.163021] Internal error: Oops: 0000000096000021 [#1] SMP
[...]
[ 72.170041] CPU: 7 UID: 0 PID: 54 Comm: kworker/7:0 Tainted: G E 6.13.0-rc3+ #2
[ 72.170509] Tainted: [E]=UNSIGNED_MODULE
[ 72.170720] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202302-for-qemu 03/01/2023
[ 72.171192] Workqueue: events_power_efficient nft_rhash_gc [nf_tables]
[ 72.171552] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 72.171915] pc : nft_rhash_gc+0x200/0x2d8 [nf_tables]
[ 72.172166] lr : nft_rhash_gc+0x128/0x2d8 [nf_tables]
[ 72.172546] sp : ffff800081f2bce0
[ 72.172724] x29: ffff800081f2bd40 x28: ffff0000c2bb708c x27: 0000000000000038
[ 72.173078] x26: ffff0000c6780ef0 x25: ffff0000c643df00 x24: ffff0000c6778f78
[ 72.173431] x23: 000000000000001a x22: ffff0000c4b1f000 x21: ffff0000c6780f78
[ 72.173782] x20: ffff0000c2bb70dc x19: ffff0000c2bb7080 x18: 0000000000000000
[ 72.174135] x17: ffff0000c0a4e1c0 x16: 0000000000003000 x15: 0000ac26d173b978
[ 72.174485] x14: ffffffffffffffff x13: 0000000000000030 x12: ffff0000c6780ef0
[ 72.174841] x11: 0000000000000000 x10: ffff800081f2bcf8 x9 : ffff0000c3000000
[ 72.175193] x8 : 00000000000004be x7 : 0000000000000000 x6 : 0000000000000000
[ 72.175544] x5 : 0000000000000040 x4 : ffff0000c3000010 x3 : 0000000000000000
[ 72.175871] x2 : 0000000000003a98 x1 : ffff0000c2bb708c x0 : 0000000000000004
[ 72.176207] Call trace:
[ 72.176316] nft_rhash_gc+0x200/0x2d8 [nf_tables] (P)
[ 72.176653] process_one_work+0x178/0x3d0
[ 72.176831] worker_thread+0x200/0x3f0
[ 72.176995] kthread+0xe8/0xf8
[ 72.177130] ret_from_fork+0x10/0x20
[ 72.177289] Code: 54fff984 d503201f d2800080 91003261 (f820303f)
[ 72.177557] ---[ end trace 0000000000000000 ]---
Align struct nft_set_ext to word size to address this and
documentation it.
pahole reports that this increases the size of elements for rhash and
pipapo in 8 bytes on x86_64.
Fixes: 7ffc7481153b ("netfilter: nft_set_hash: skip duplicated elements pending gc run")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 54f89b3178d5448dd4457afbb98fc1ab99090a65 ]
When bpf_tcp_ingress() is called, the skmsg is being redirected to the
ingress of the destination socket. Therefore, we should charge its
receive socket buffer, instead of sending socket buffer.
Because sk_rmem_schedule() tests pfmemalloc of skb, we need to
introduce a wrapper and call it for skmsg.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241210012039.1669389-2-zijianzhang@bytedance.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 29a651451e6c264f58cd9d9a26088e579d17b242 ]
The voice setting is used by sco_connect() or sco_conn_defer_accept()
after being set by sco_sock_setsockopt().
The PCM part of the voice setting is used for offload mode through PCM
chipset port.
This commits add support for mSBC 16 bits offloading, i.e. audio data
not transported over HCI.
The BCM4349B1 supports 16 bits transparent data on its I2S port.
If BT_VOICE_TRANSPARENT is used when accepting a SCO connection, this
gives only garbage audio while using BT_VOICE_TRANSPARENT_16BIT gives
correct audio.
This has been tested with connection to iPhone 14 and Samsung S24.
Fixes: ad10b1a48754 ("Bluetooth: Add Bluetooth socket voice option")
Signed-off-by: Frédéric Danis <frederic.danis@collabora.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b04df3da1b5c6f6dc7cdccc37941740c078c4043 ]
nf_tables_chain_destroy can sleep, it can't be used from call_rcu
callbacks.
Moreover, nf_tables_rule_release() is only safe for error unwinding,
while transaction mutex is held and the to-be-desroyed rule was not
exposed to either dataplane or dumps, as it deactives+frees without
the required synchronize_rcu() in-between.
nft_rule_expr_deactivate() callbacks will change ->use counters
of other chains/sets, see e.g. nft_lookup .deactivate callback, these
must be serialized via transaction mutex.
Also add a few lockdep asserts to make this more explicit.
Calling synchronize_rcu() isn't ideal, but fixing this without is hard
and way more intrusive. As-is, we can get:
WARNING: .. net/netfilter/nf_tables_api.c:5515 nft_set_destroy+0x..
Workqueue: events nf_tables_trans_destroy_work
RIP: 0010:nft_set_destroy+0x3fe/0x5c0
Call Trace:
<TASK>
nf_tables_trans_destroy_work+0x6b7/0xad0
process_one_work+0x64a/0xce0
worker_thread+0x613/0x10d0
In case the synchronize_rcu becomes an issue, we can explore alternatives.
One way would be to allocate nft_trans_rule objects + one nft_trans_chain
object, deactivate the rules + the chain and then defer the freeing to the
nft destroy workqueue. We'd still need to keep the synchronize_rcu path as
a fallback to handle -ENOMEM corner cases though.
Reported-by: syzbot+b26935466701e56cfdc2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67478d92.050a0220.253251.0062.GAE@google.com/T/
Fixes: c03d278fdf35 ("netfilter: nf_tables: wait for rcu grace period on net_device removal")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3e643e4efa1e87432204b62f9cfdea3b2508c830 ]
The bt_copy_from_sockptr() return value is being misinterpreted by most
users: a non-zero result is mistakenly assumed to represent an error code,
but actually indicates the number of bytes that could not be copied.
Remove bt_copy_from_sockptr() and adapt callers to use
copy_safe_from_sockptr().
For sco_sock_setsockopt() (case BT_CODEC) use copy_struct_from_sockptr() to
scrub parts of uninitialized buffer.
Opportunistically, rename `len` to `optlen` in hci_sock_setsockopt_old()
and hci_sock_setsockopt().
Fixes: 51eda36d33e4 ("Bluetooth: SCO: Fix not validating setsockopt user input")
Fixes: a97de7bff13b ("Bluetooth: RFCOMM: Fix not validating setsockopt user input")
Fixes: 4f3951242ace ("Bluetooth: L2CAP: Fix not validating setsockopt user input")
Fixes: 9e8742cdfc4b ("Bluetooth: ISO: Fix not validating setsockopt user input")
Fixes: b2186061d604 ("Bluetooth: hci_sock: Fix not validating setsockopt user input")
Reviewed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: David Wei <dw@davidwei.uk>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0f6ede9fbc747e2553612271bce108f7517e7a45 ]
Ilya reported a slab-use-after-free in dst_destroy [1]
Issue is in xfrm6_net_init() and xfrm4_net_init() :
They copy xfrm[46]_dst_ops_template into net->xfrm.xfrm[46]_dst_ops.
But net structure might be freed before all the dst callbacks are
called. So when dst_destroy() calls later :
if (dst->ops->destroy)
dst->ops->destroy(dst);
dst->ops points to the old net->xfrm.xfrm[46]_dst_ops, which has been freed.
See a relevant issue fixed in :
ac888d58869b ("net: do not delay dst_entries_add() in dst_release()")
A fix is to queue the 'struct net' to be freed after one
another cleanup_net() round (and existing rcu_barrier())
[1]
BUG: KASAN: slab-use-after-free in dst_destroy (net/core/dst.c:112)
Read of size 8 at addr ffff8882137ccab0 by task swapper/37/0
Dec 03 05:46:18 kernel:
CPU: 37 UID: 0 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 6.12.0 #67
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.1-1.el9 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:124)
print_address_description.constprop.0 (mm/kasan/report.c:378)
? dst_destroy (net/core/dst.c:112)
print_report (mm/kasan/report.c:489)
? dst_destroy (net/core/dst.c:112)
? kasan_addr_to_slab (mm/kasan/common.c:37)
kasan_report (mm/kasan/report.c:603)
? dst_destroy (net/core/dst.c:112)
? rcu_do_batch (kernel/rcu/tree.c:2567)
dst_destroy (net/core/dst.c:112)
rcu_do_batch (kernel/rcu/tree.c:2567)
? __pfx_rcu_do_batch (kernel/rcu/tree.c:2491)
? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4339 kernel/locking/lockdep.c:4406)
rcu_core (kernel/rcu/tree.c:2825)
handle_softirqs (kernel/softirq.c:554)
__irq_exit_rcu (kernel/softirq.c:589 kernel/softirq.c:428 kernel/softirq.c:637)
irq_exit_rcu (kernel/softirq.c:651)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049)
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702)
RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:743)
Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 0f 00 2d c7 c9 27 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90
RSP: 0018:ffff888100d2fe00 EFLAGS: 00000246
RAX: 00000000001870ed RBX: 1ffff110201a5fc2 RCX: ffffffffb61a3e46
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb3d4d123
RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed11c7e1835d
R10: ffff888e3f0c1aeb R11: 0000000000000000 R12: 0000000000000000
R13: ffff888100d20000 R14: dffffc0000000000 R15: 0000000000000000
? ct_kernel_exit.constprop.0 (kernel/context_tracking.c:148)
? cpuidle_idle_call (kernel/sched/idle.c:186)
default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118)
cpuidle_idle_call (kernel/sched/idle.c:186)
? __pfx_cpuidle_idle_call (kernel/sched/idle.c:168)
? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5848)
? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4347 kernel/locking/lockdep.c:4406)
? tsc_verify_tsc_adjust (arch/x86/kernel/tsc_sync.c:59)
do_idle (kernel/sched/idle.c:326)
cpu_startup_entry (kernel/sched/idle.c:423 (discriminator 1))
start_secondary (arch/x86/kernel/smpboot.c:202 arch/x86/kernel/smpboot.c:282)
? __pfx_start_secondary (arch/x86/kernel/smpboot.c:232)
? soft_restart_cpu (arch/x86/kernel/head_64.S:452)
common_startup_64 (arch/x86/kernel/head_64.S:414)
</TASK>
Dec 03 05:46:18 kernel:
Allocated by task 12184:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
__kasan_slab_alloc (mm/kasan/common.c:319 mm/kasan/common.c:345)
kmem_cache_alloc_noprof (mm/slub.c:4085 mm/slub.c:4134 mm/slub.c:4141)
copy_net_ns (net/core/net_namespace.c:421 net/core/net_namespace.c:480)
create_new_namespaces (kernel/nsproxy.c:110)
unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4))
ksys_unshare (kernel/fork.c:3313)
__x64_sys_unshare (kernel/fork.c:3382)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Dec 03 05:46:18 kernel:
Freed by task 11:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
kasan_save_free_info (mm/kasan/generic.c:582)
__kasan_slab_free (mm/kasan/common.c:271)
kmem_cache_free (mm/slub.c:4579 mm/slub.c:4681)
cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:446 net/core/net_namespace.c:647)
process_one_work (kernel/workqueue.c:3229)
worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
kthread (kernel/kthread.c:389)
ret_from_fork (arch/x86/kernel/process.c:147)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
Dec 03 05:46:18 kernel:
Last potentially related work creation:
kasan_save_stack (mm/kasan/common.c:48)
__kasan_record_aux_stack (mm/kasan/generic.c:541)
insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186)
__queue_work (kernel/workqueue.c:2340)
queue_work_on (kernel/workqueue.c:2391)
xfrm_policy_insert (net/xfrm/xfrm_policy.c:1610)
xfrm_add_policy (net/xfrm/xfrm_user.c:2116)
xfrm_user_rcv_msg (net/xfrm/xfrm_user.c:3321)
netlink_rcv_skb (net/netlink/af_netlink.c:2536)
xfrm_netlink_rcv (net/xfrm/xfrm_user.c:3344)
netlink_unicast (net/netlink/af_netlink.c:1316 net/netlink/af_netlink.c:1342)
netlink_sendmsg (net/netlink/af_netlink.c:1886)
sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165)
vfs_write (fs/read_write.c:590 fs/read_write.c:683)
ksys_write (fs/read_write.c:736)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Dec 03 05:46:18 kernel:
Second to last potentially related work creation:
kasan_save_stack (mm/kasan/common.c:48)
__kasan_record_aux_stack (mm/kasan/generic.c:541)
insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186)
__queue_work (kernel/workqueue.c:2340)
queue_work_on (kernel/workqueue.c:2391)
__xfrm_state_insert (./include/linux/workqueue.h:723 net/xfrm/xfrm_state.c:1150 net/xfrm/xfrm_state.c:1145 net/xfrm/xfrm_state.c:1513)
xfrm_state_update (./include/linux/spinlock.h:396 net/xfrm/xfrm_state.c:1940)
xfrm_add_sa (net/xfrm/xfrm_user.c:912)
xfrm_user_rcv_msg (net/xfrm/xfrm_user.c:3321)
netlink_rcv_skb (net/netlink/af_netlink.c:2536)
xfrm_netlink_rcv (net/xfrm/xfrm_user.c:3344)
netlink_unicast (net/netlink/af_netlink.c:1316 net/netlink/af_netlink.c:1342)
netlink_sendmsg (net/netlink/af_netlink.c:1886)
sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165)
vfs_write (fs/read_write.c:590 fs/read_write.c:683)
ksys_write (fs/read_write.c:736)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Fixes: a8a572a6b5f2 ("xfrm: dst_entries_init() per-net dst_ops")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Closes: https://lore.kernel.org/netdev/CANn89iKKYDVpB=MtmfH7nyv2p=rJWSLedO5k7wSZgtY_tO8WQg@mail.gmail.com/T/#m02c98c3009fe66382b73cfb4db9cf1df6fab3fbf
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241204125455.3871859-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a6d75ecee2bf828ac6a1b52724aba0a977e4eaf4 ]
It is unclear if net/lapb code is supposed to be ready for 8021q.
We can at least avoid crashes like the following :
skbuff: skb_under_panic: text:ffffffff8aabe1f6 len:24 put:20 head:ffff88802824a400 data:ffff88802824a3fe tail:0x16 end:0x140 dev:nr0.2
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:206 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 5508 Comm: dhcpcd Not tainted 6.12.0-rc7-syzkaller-00144-g66418447d27b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:skb_panic net/core/skbuff.c:206 [inline]
RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216
Code: 0d 8d 48 c7 c6 2e 9e 29 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 1a 6f 37 02 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
RSP: 0018:ffffc90002ddf638 EFLAGS: 00010282
RAX: 0000000000000086 RBX: dffffc0000000000 RCX: 7a24750e538ff600
RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000
RBP: ffff888034a86650 R08: ffffffff8174b13c R09: 1ffff920005bbe60
R10: dffffc0000000000 R11: fffff520005bbe61 R12: 0000000000000140
R13: ffff88802824a400 R14: ffff88802824a3fe R15: 0000000000000016
FS: 00007f2a5990d740(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000110c2631fd CR3: 0000000029504000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
skb_push+0xe5/0x100 net/core/skbuff.c:2636
nr_header+0x36/0x320 net/netrom/nr_dev.c:69
dev_hard_header include/linux/netdevice.h:3148 [inline]
vlan_dev_hard_header+0x359/0x480 net/8021q/vlan_dev.c:83
dev_hard_header include/linux/netdevice.h:3148 [inline]
lapbeth_data_transmit+0x1f6/0x2a0 drivers/net/wan/lapbether.c:257
lapb_data_transmit+0x91/0xb0 net/lapb/lapb_iface.c:447
lapb_transmit_buffer+0x168/0x1f0 net/lapb/lapb_out.c:149
lapb_establish_data_link+0x84/0xd0
lapb_device_event+0x4e0/0x670
notifier_call_chain+0x19f/0x3e0 kernel/notifier.c:93
__dev_notify_flags+0x207/0x400
dev_change_flags+0xf0/0x1a0 net/core/dev.c:8922
devinet_ioctl+0xa4e/0x1aa0 net/ipv4/devinet.c:1188
inet_ioctl+0x3d7/0x4f0 net/ipv4/af_inet.c:1003
sock_do_ioctl+0x158/0x460 net/socket.c:1227
sock_ioctl+0x626/0x8e0 net/socket.c:1346
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+fb99d1b0c0f81d94a5e2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/67506220.050a0220.17bd51.006c.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241204141031.4030267-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 11ac0d7c3b5ba58232fb7dacb54371cbe75ec183 ]
If we got an unprotected action frame with CSA and then we heard the
beacon with the CSA IE, we'll block the queues with the CSA reason
twice. Since this reason is refcounted, we won't wake up the queues
since we wake them up only once and the ref count will never reach 0.
This led to blocked queues that prevented any activity (even
disconnection wouldn't reset the queue state and the only way to recover
would be to reload the kernel module.
Fix this by not refcounting the CSA reason.
It becomes now pointless to maintain the csa_blocked_queues state.
Remove it.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Fixes: 414e090bc41d ("wifi: mac80211: restrict public action ECSA frame handling")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219447
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241119173108.5ea90828c2cc.I4f89e58572fb71ae48e47a81e74595cac410fbac@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 94464a7b71634037b13d54021e0dfd0fb0d8c1f0 ]
This adds quirks for broken extended create connection,
and write auth payload timeout.
Signed-off-by: Danil Pylaev <danstiv404@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6b2d11e2d8fc130df4708be0b6b53fd3e6b54cf6 ]
Under CONFIG_PROVE_RCU_LIST + CONFIG_RCU_EXPERT
hlist_for_each_entry_rcu() provides very helpful splats, which help
to find possible issues. I missed CONFIG_RCU_EXPERT=y in my testing
config the same as described in
a3e4bf7f9675 ("configs/debug: make sure PROVE_RCU_LIST=y takes effect").
The fix itself is trivial: add the very same lockdep annotations
as were used to dereference ao_info from the socket.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20241028152645.35a8be66@kernel.org/
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20241030-tcp-ao-hlist-lockdep-annotate-v1-1-bf641a64d7c6@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7b1d83da254be3bf054965c8f3b1ad976f460ae5 ]
Softirq can interrupt ongoing packet from process context that is
walking over the percpu area that contains inner header offsets.
Disable bh and perform three checks before restoring the percpu inner
header offsets to validate that the percpu area is valid for this
skbuff:
1) If the NFT_PKTINFO_INNER_FULL flag is set on, then this skbuff
has already been parsed before for inner header fetching to
register.
2) Validate that the percpu area refers to this skbuff using the
skbuff pointer as a cookie. If there is a cookie mismatch, then
this skbuff needs to be parsed again.
3) Finally, validate if the percpu area refers to this tunnel type.
Only after these three checks the percpu area is restored to a on-stack
copy and bh is enabled again.
After inner header fetching, the on-stack copy is stored back to the
percpu area.
Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
Reported-by: syzbot+84d0441b9860f0d63285@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dbefa1f31a91670c9e7dac9b559625336206466f ]
Commit b1fca27d384e ("kernel debug: support resetting WARN*_ONCE")
added support for clearing the state of once warnings. However,
it is not functional when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION or
CONFIG_LTO_CLANG is enabled, because .data.once matches the
.data.[0-9a-zA-Z_]* pattern in the DATA_MAIN macro.
Commit cb87481ee89d ("kbuild: linker script do not match C names unless
LD_DEAD_CODE_DATA_ELIMINATION is configured") was introduced to suppress
the issue for the default CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n case,
providing a minimal fix for stable backporting. We were aware this did
not address the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y. The
plan was to apply correct fixes and then revert cb87481ee89d. [1]
Seven years have passed since then, yet the #ifdef workaround remains in
place. Meanwhile, commit b1fca27d384e introduced the .data.once section,
and commit dc5723b02e52 ("kbuild: add support for Clang LTO") extended
the #ifdef.
Using a ".." separator in the section name fixes the issue for
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION and CONFIG_LTO_CLANG.
[1]: https://lore.kernel.org/linux-kbuild/CAK7LNASck6BfdLnESxXUeECYL26yUDm0cwRZuM4gmaWUkxjL5g@mail.gmail.com/
Fixes: b1fca27d384e ("kernel debug: support resetting WARN*_ONCE")
Fixes: dc5723b02e52 ("kbuild: add support for Clang LTO")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 42ecf1947135110ea08abeaca39741636f9a2285 ]
The Bluetooth Core spec does not allow a LE BIG Create sync command to be
sent to Controller if another one is pending (Vol 4, Part E, page 2586).
In order to avoid this issue, the HCI_CONN_CREATE_BIG_SYNC was added
to mark that the LE BIG Create Sync command has been sent for a hcon.
Once the BIG Sync Established event is received, the hcon flag is
erased and the next pending hcon is handled.
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 07a9342b94a9 ("Bluetooth: ISO: Send BIG Create Sync via hci_sync")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4a5e0ba68676b3a77298cf646cd2b39c94fbd2f5 ]
The Bluetooth Core spec does not allow a LE PA Create sync command to be
sent to Controller if another one is pending (Vol 4, Part E, page 2493).
In order to avoid this issue, the HCI_CONN_CREATE_PA_SYNC was added
to mark that the LE PA Create Sync command has been sent for a hcon.
Once the PA Sync Established event is received, the hcon flag is
erased and the next pending hcon is handled.
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 07a9342b94a9 ("Bluetooth: ISO: Send BIG Create Sync via hci_sync")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.
Quite calm week. No new regression under investigation.
Current release - regressions:
- eth: revert "igb: Disable threaded IRQ for igb_msix_other"
Current release - new code bugs:
- bluetooth: btintel: direct exception event to bluetooth stack
Previous releases - regressions:
- core: fix data-races around sk->sk_forward_alloc
- netlink: terminate outstanding dump on socket close
- mptcp: error out earlier on disconnect
- vsock: fix accept_queue memory leak
- phylink: ensure PHY momentary link-fails are handled
- eth: mlx5:
- fix null-ptr-deref in add rule err flow
- lock FTE when checking if active
- eth: dwmac-mediatek: fix inverted handling of mediatek,mac-wol
Previous releases - always broken:
- sched: fix u32's systematic failure to free IDR entries for hnodes.
- sctp: fix possible UAF in sctp_v6_available()
- eth: bonding: add ns target multicast address to slave device
- eth: mlx5: fix msix vectors to respect platform limit
- eth: icssg-prueth: fix 1 PPS sync"
* tag 'net-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
net: sched: u32: Add test case for systematic hnode IDR leaks
selftests: bonding: add ns multicast group testing
bonding: add ns target multicast address to slave device
net: ti: icssg-prueth: Fix 1 PPS sync
stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines
net: Make copy_safe_from_sockptr() match documentation
net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol
ipmr: Fix access to mfc_cache_list without lock held
samples: pktgen: correct dev to DEV
net: phylink: ensure PHY momentary link-fails are handled
mptcp: pm: use _rcu variant under rcu_read_lock
mptcp: hold pm lock when deleting entry
mptcp: update local address flags when setting it
net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.
MAINTAINERS: Re-add cancelled Renesas driver sections
Revert "igb: Disable threaded IRQ for igb_msix_other"
Bluetooth: btintel: Direct exception event to bluetooth stack
Bluetooth: hci_core: Fix calling mgmt_device_connected
virtio/vsock: Improve MSG_ZEROCOPY error handling
vsock: Fix sk_error_queue memory leak
...
|
|
Commit 4598380f9c54 ("bonding: fix ns validation on backup slaves")
tried to resolve the issue where backup slaves couldn't be brought up when
receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only
worked for drivers that receive all multicast messages, such as the veth
interface.
For standard drivers, the NS multicast message is silently dropped because
the slave device is not a member of the NS target multicast group.
To address this, we need to make the slave device join the NS target
multicast group, ensuring it can receive these IPv6 NS messages to validate
the slave’s status properly.
There are three policies before joining the multicast group:
1. All settings must be under active-backup mode (alb and tlb do not support
arp_validate), with backup slaves and slaves supporting multicast.
2. We can add or remove multicast groups when arp_validate changes.
3. Other operations, such as enslaving, releasing, or setting NS targets,
need to be guarded by arp_validate.
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Pull bpf fixes from Daniel Borkmann:
- Fix a mismatching RCU unlock flavor in bpf_out_neigh_v6 (Jiawei Ye)
- Fix BPF sockmap with kTLS to reject vsock and unix sockets upon kTLS
context retrieval (Zijian Zhang)
- Fix BPF bits iterator selftest for s390x (Hou Tao)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6
bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rx
selftests/bpf: Use -4095 as the bad address for bits iterator
|
|
8c873e219970 ("netfilter: core: free hooks with call_rcu") removed
synchronize_net() call when unregistering basechain hook, however,
net_device removal event handler for the NFPROTO_NETDEV was not updated
to wait for RCU grace period.
Note that 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks
on net_device removal") does not remove basechain rules on device
removal, I was hinted to remove rules on net_device removal later, see
5ebe0b0eec9d ("netfilter: nf_tables: destroy basechain and rules on
netdevice removal").
Although NETDEV_UNREGISTER event is guaranteed to be handled after
synchronize_net() call, this path needs to wait for rcu grace period via
rcu callback to release basechain hooks if netns is alive because an
ongoing netlink dump could be in progress (sockets hold a reference on
the netns).
Note that nf_tables_pre_exit_net() unregisters and releases basechain
hooks but it is possible to see NETDEV_UNREGISTER at a later stage in
the netns exit path, eg. veth peer device in another netns:
cleanup_net()
default_device_exit_batch()
unregister_netdevice_many_notify()
notifier_call_chain()
nf_tables_netdev_event()
__nft_release_basechain()
In this particular case, same rule of thumb applies: if netns is alive,
then wait for rcu grace period because netlink dump in the other netns
could be in progress. Otherwise, if the other netns is going away then
no netlink dump can be in progress and basechain hooks can be released
inmediately.
While at it, turn WARN_ON() into WARN_ON_ONCE() for the basechain
validation, which should not ever happen.
Fixes: 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
As the introduction of the support for vsock and unix sockets in sockmap,
tls_sw_has_ctx_tx/rx cannot presume the socket passed in must be IS_ICSK.
vsock and af_unix sockets have vsock_sock and unix_sock instead of
inet_connection_sock. For these sockets, tls_get_ctx may return an invalid
pointer and cause page fault in function tls_sw_ctx_rx.
BUG: unable to handle page fault for address: 0000000000040030
Workqueue: vsock-loopback vsock_loopback_work
RIP: 0010:sk_psock_strp_data_ready+0x23/0x60
Call Trace:
? __die+0x81/0xc3
? no_context+0x194/0x350
? do_page_fault+0x30/0x110
? async_page_fault+0x3e/0x50
? sk_psock_strp_data_ready+0x23/0x60
virtio_transport_recv_pkt+0x750/0x800
? update_load_avg+0x7e/0x620
vsock_loopback_work+0xd0/0x100
process_one_work+0x1a7/0x360
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x112/0x130
? __kthread_cancel_work+0x40/0x40
ret_from_fork+0x1f/0x40
v2:
- Add IS_ICSK check
v3:
- Update the commits in Fixes
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20241106003742.399240-1-zijianzhang@bytedance.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
There are code paths from which the function is called without holding
the RCU read lock, resulting in a suspicious RCU usage warning [1].
Fix by using l3mdev_master_upper_ifindex_by_index() which will acquire
the RCU read lock before calling
l3mdev_master_upper_ifindex_by_index_rcu().
[1]
WARNING: suspicious RCU usage
6.12.0-rc3-custom-gac8f72681cf2 #141 Not tainted
-----------------------------
net/core/dev.c:876 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ip/361:
#0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60
stack backtrace:
CPU: 3 UID: 0 PID: 361 Comm: ip Not tainted 6.12.0-rc3-custom-gac8f72681cf2 #141
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0xba/0x110
lockdep_rcu_suspicious.cold+0x4f/0xd6
dev_get_by_index_rcu+0x1d3/0x210
l3mdev_master_upper_ifindex_by_index_rcu+0x2b/0xf0
ip_tunnel_bind_dev+0x72f/0xa00
ip_tunnel_newlink+0x368/0x7a0
ipgre_newlink+0x14c/0x170
__rtnl_newlink+0x1173/0x19c0
rtnl_newlink+0x6c/0xa0
rtnetlink_rcv_msg+0x3cc/0xf60
netlink_rcv_skb+0x171/0x450
netlink_unicast+0x539/0x7f0
netlink_sendmsg+0x8c1/0xd80
____sys_sendmsg+0x8f9/0xc20
___sys_sendmsg+0x197/0x1e0
__sys_sendmsg+0x122/0x1f0
do_syscall_64+0xbb/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: db53cd3d88dc ("net: Handle l3mdev in ip_tunnel_init_flow")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241022063822.462057-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
wireless fixes for v6.12-rc5
The first set of wireless fixes for v6.12. We have been busy and have
not been able to send this earlier, so there are more fixes than
usual. The fixes are all over, both in stack and in drivers, but
nothing special really standing out.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfiler, xfrm and bluetooth.
Oddly this includes a fix for a posix clock regression; in our
previous PR we included a change there as a pre-requisite for
networking one. That fix proved to be buggy and requires the follow-up
included here. Thomas suggested we should send it, given we sent the
buggy patch.
Current release - regressions:
- posix-clock: Fix unbalanced locking in pc_clock_settime()
- netfilter: fix typo causing some targets not to load on IPv6
Current release - new code bugs:
- xfrm: policy: remove last remnants of pernet inexact list
Previous releases - regressions:
- core: fix races in netdev_tx_sent_queue()/dev_watchdog()
- bluetooth: fix UAF on sco_sock_timeout
- eth: hv_netvsc: fix VF namespace also in synthetic NIC
NETDEV_REGISTER event
- eth: usbnet: fix name regression
- eth: be2net: fix potential memory leak in be_xmit()
- eth: plip: fix transmit path breakage
Previous releases - always broken:
- sched: deny mismatched skip_sw/skip_hw flags for actions created by
classifiers
- netfilter: bpf: must hold reference on net namespace
- eth: virtio_net: fix integer overflow in stats
- eth: bnxt_en: replace ptp_lock with irqsave variant
- eth: octeon_ep: add SKB allocation failures handling in
__octep_oq_process_rx()
Misc:
- MAINTAINERS: add Simon as an official reviewer"
* tag 'net-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
net: dsa: mv88e6xxx: support 4000ps cycle counter period
net: dsa: mv88e6xxx: read cycle counter period from hardware
net: dsa: mv88e6xxx: group cycle counter coefficients
net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition
hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event
net: dsa: microchip: disable EEE for KSZ879x/KSZ877x/KSZ876x
Bluetooth: ISO: Fix UAF on iso_sock_timeout
Bluetooth: SCO: Fix UAF on sco_sock_timeout
Bluetooth: hci_core: Disable works on hci_unregister_dev
posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()
r8169: avoid unsolicited interrupts
net: sched: use RCU read-side critical section in taprio_dump()
net: sched: fix use-after-free in taprio_change()
net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers
net: usb: usbnet: fix name regression
mlxsw: spectrum_router: fix xa_store() error checking
virtio_net: fix integer overflow in stats
net: fix races in netdev_tx_sent_queue()/dev_watchdog()
net: wwan: fix global oob in wwan_rtnl_policy
netfilter: xtables: fix typo causing some targets not to load on IPv6
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_core: Disable works on hci_unregister_dev
- SCO: Fix UAF on sco_sock_timeout
- ISO: Fix UAF on iso_sock_timeout
* tag 'for-net-2024-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: ISO: Fix UAF on iso_sock_timeout
Bluetooth: SCO: Fix UAF on sco_sock_timeout
Bluetooth: hci_core: Disable works on hci_unregister_dev
====================
Link: https://patch.msgid.link/20241023143005.2297694-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2024-10-22
1) Fix routing behavior that relies on L4 information
for xfrm encapsulated packets.
From Eyal Birger.
2) Remove leftovers of pernet policy_inexact lists.
From Florian Westphal.
3) Validate new SA's prefixlen when the selector family is
not set from userspace.
From Sabrina Dubroca.
4) Fix a kernel-infoleak when dumping an auth algorithm.
From Petr Vaganov.
Please pull or let me know if there are problems.
ipsec-2024-10-22
* tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: fix one more kernel-infoleak in algo dumping
xfrm: validate new SA's prefixlen using SA family when sel.family is unset
xfrm: policy: remove last remnants of pernet inexact list
xfrm: respect ip protocols rules criteria when performing dst lookups
xfrm: extract dst lookup parameters into a struct
====================
Link: https://patch.msgid.link/20241022092226.654370-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
conn->sk maybe have been unlinked/freed while waiting for sco_conn_lock
so this checks if the conn->sk is still valid by checking if it part of
sco_sk_list.
Reported-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com
Tested-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4c0d0c4cde787116d465
Fixes: ba316be1b6a0 ("Bluetooth: schedule SCO timeouts with delayed_work")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Pull bpf fixes from Daniel Borkmann:
- Fix BPF verifier to not affect subreg_def marks in its range
propagation (Eduard Zingerman)
- Fix a truncation bug in the BPF verifier's handling of
coerce_reg_to_size_sx (Dimitar Kanaliev)
- Fix the BPF verifier's delta propagation between linked registers
under 32-bit addition (Daniel Borkmann)
- Fix a NULL pointer dereference in BPF devmap due to missing rxq
information (Florian Kauer)
- Fix a memory leak in bpf_core_apply (Jiri Olsa)
- Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for
arrays of nested structs (Hou Tao)
- Fix build ID fetching where memory areas backing the file were
created with memfd_secret (Andrii Nakryiko)
- Fix BPF task iterator tid filtering which was incorrectly using pid
instead of tid (Jordan Rome)
- Several fixes for BPF sockmap and BPF sockhash redirection in
combination with vsocks (Michal Luczaj)
- Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri)
- Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility
of an infinite BPF tailcall (Pu Lehui)
- Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot
be resolved (Thomas Weißschuh)
- Fix a bug in kfunc BTF caching for modules where the wrong BTF object
was returned (Toke Høiland-Jørgensen)
- Fix a BPF selftest compilation error in cgroup-related tests with
musl libc (Tony Ambardar)
- Several fixes to BPF link info dumps to fill missing fields (Tyrone
Wu)
- Add BPF selftests for kfuncs from multiple modules, checking that the
correct kfuncs are called (Simon Sundberg)
- Ensure that internal and user-facing bpf_redirect flags don't overlap
(Toke Høiland-Jørgensen)
- Switch to use kvzmalloc to allocate BPF verifier environment (Rik van
Riel)
- Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat
under RT (Wander Lairson Costa)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits)
lib/buildid: Handle memfd_secret() files in build_id_parse()
selftests/bpf: Add test case for delta propagation
bpf: Fix print_reg_state's constant scalar dump
bpf: Fix incorrect delta propagation between linked registers
bpf: Properly test iter/task tid filtering
bpf: Fix iter/task tid filtering
riscv, bpf: Make BPF_CMPXCHG fully ordered
bpf, vsock: Drop static vsock_bpf_prot initialization
vsock: Update msg_count on read_skb()
vsock: Update rx_bytes on read_skb()
bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
selftests/bpf: Add asserts for netfilter link info
bpf: Fix link info netfilter flags to populate defrag flag
selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx()
selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx()
bpf: Fix truncation bug in coerce_reg_to_size_sx()
selftests/bpf: Assert link info uprobe_multi count & path_size if unset
bpf: Fix unpopulated path_size when uprobe_multi fields unset
selftests/bpf: Fix cross-compiling urandom_read
selftests/bpf: Add test for kfunc module order
...
|
|
Don't mislead the callers of bpf_{sk,msg}_redirect_{map,hash}(): make sure
to immediately and visibly fail the forwarding of unsupported af_vsock
packets.
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-1-d6577bbfe742@rbox.co
|
|
While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw
one lockdep splat [1].
genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU.
Instead of letting all callers guard genlmsg_multicast_allns()
with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast().
This also means the @flags parameter is useless, we need to always use
GFP_ATOMIC.
[1]
[10882.424136] =============================
[10882.424166] WARNING: suspicious RCU usage
[10882.424309] 6.12.0-rc2-virtme #1156 Not tainted
[10882.424400] -----------------------------
[10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!!
[10882.424469]
other info that might help us debug this:
[10882.424500]
rcu_scheduler_active = 2, debug_locks = 1
[10882.424744] 2 locks held by ip/15677:
[10882.424791] #0: ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219)
[10882.426334] #1: ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209)
[10882.426465]
stack backtrace:
[10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156
[10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[10882.427046] Call Trace:
[10882.427131] <TASK>
[10882.427244] dump_stack_lvl (lib/dump_stack.c:123)
[10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
[10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7))
[10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink
[10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink
[10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115)
[10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210)
[10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink
[10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201)
[10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551)
[10882.428069] genl_rcv (net/netlink/genetlink.c:1220)
[10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357)
[10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901)
[10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1))
Fixes: 33f72e6f0c67 ("l2tp : multicast notification to the registered listeners")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Cc: Tom Parkin <tparkin@katalix.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20241011171217.3166614-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since introduced, mctp has been ignoring the returned value of
rtnl_register_module(), which could fail silently.
Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality. This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.
Let's handle the errors by rtnl_register_many().
Fixes: 583be982d934 ("mctp: Add device handling and netlink interface")
Fixes: 831119f88781 ("mctp: Add neighbour netlink interface")
Fixes: 06d2f4c583a7 ("mctp: Add netlink route management")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Before commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message
handlers"), once rtnl_msg_handlers[protocol] was allocated, the following
rtnl_register_module() for the same protocol never failed.
However, after the commit, rtnl_msg_handler[protocol][msgtype] needs to
be allocated in each rtnl_register_module(), so each call could fail.
Many callers of rtnl_register_module() do not handle the returned error,
and we need to add many error handlings.
To handle that easily, let's add wrapper functions for bulk registration
of rtnetlink message handlers.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Most qdiscs maintain their backlog using qdisc_pkt_len(skb)
on the assumption it is invariant between the enqueue()
and dequeue() handlers.
Unfortunately syzbot can crash a host rather easily using
a TBF + SFQ combination, with an STAB on SFQ [1]
We can't support TCA_STAB on arbitrary level, this would
require to maintain per-qdisc storage.
[1]
[ 88.796496] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 88.798611] #PF: supervisor read access in kernel mode
[ 88.799014] #PF: error_code(0x0000) - not-present page
[ 88.799506] PGD 0 P4D 0
[ 88.799829] Oops: Oops: 0000 [#1] SMP NOPTI
[ 88.800569] CPU: 14 UID: 0 PID: 2053 Comm: b371744477 Not tainted 6.12.0-rc1-virtme #1117
[ 88.801107] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 88.801779] RIP: 0010:sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq
[ 88.802544] Code: 0f b7 50 12 48 8d 04 d5 00 00 00 00 48 89 d6 48 29 d0 48 8b 91 c0 01 00 00 48 c1 e0 03 48 01 c2 66 83 7a 1a 00 7e c0 48 8b 3a <4c> 8b 07 4c 89 02 49 89 50 08 48 c7 47 08 00 00 00 00 48 c7 07 00
All code
========
0: 0f b7 50 12 movzwl 0x12(%rax),%edx
4: 48 8d 04 d5 00 00 00 lea 0x0(,%rdx,8),%rax
b: 00
c: 48 89 d6 mov %rdx,%rsi
f: 48 29 d0 sub %rdx,%rax
12: 48 8b 91 c0 01 00 00 mov 0x1c0(%rcx),%rdx
19: 48 c1 e0 03 shl $0x3,%rax
1d: 48 01 c2 add %rax,%rdx
20: 66 83 7a 1a 00 cmpw $0x0,0x1a(%rdx)
25: 7e c0 jle 0xffffffffffffffe7
27: 48 8b 3a mov (%rdx),%rdi
2a:* 4c 8b 07 mov (%rdi),%r8 <-- trapping instruction
2d: 4c 89 02 mov %r8,(%rdx)
30: 49 89 50 08 mov %rdx,0x8(%r8)
34: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi)
3b: 00
3c: 48 rex.W
3d: c7 .byte 0xc7
3e: 07 (bad)
...
Code starting with the faulting instruction
===========================================
0: 4c 8b 07 mov (%rdi),%r8
3: 4c 89 02 mov %r8,(%rdx)
6: 49 89 50 08 mov %rdx,0x8(%r8)
a: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi)
11: 00
12: 48 rex.W
13: c7 .byte 0xc7
14: 07 (bad)
...
[ 88.803721] RSP: 0018:ffff9a1f892b7d58 EFLAGS: 00000206
[ 88.804032] RAX: 0000000000000000 RBX: ffff9a1f8420c800 RCX: ffff9a1f8420c800
[ 88.804560] RDX: ffff9a1f81bc1440 RSI: 0000000000000000 RDI: 0000000000000000
[ 88.805056] RBP: ffffffffc04bb0e0 R08: 0000000000000001 R09: 00000000ff7f9a1f
[ 88.805473] R10: 000000000001001b R11: 0000000000009a1f R12: 0000000000000140
[ 88.806194] R13: 0000000000000001 R14: ffff9a1f886df400 R15: ffff9a1f886df4ac
[ 88.806734] FS: 00007f445601a740(0000) GS:ffff9a2e7fd80000(0000) knlGS:0000000000000000
[ 88.807225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 88.807672] CR2: 0000000000000000 CR3: 000000050cc46000 CR4: 00000000000006f0
[ 88.808165] Call Trace:
[ 88.808459] <TASK>
[ 88.808710] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
[ 88.809261] ? page_fault_oops (arch/x86/mm/fault.c:715)
[ 88.809561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
[ 88.809806] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
[ 88.810074] ? sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq
[ 88.810411] sfq_reset (net/sched/sch_sfq.c:525) sch_sfq
[ 88.810671] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036)
[ 88.810950] tbf_reset (./include/linux/timekeeping.h:169 net/sched/sch_tbf.c:334) sch_tbf
[ 88.811208] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036)
[ 88.811484] netif_set_real_num_tx_queues (./include/linux/spinlock.h:396 ./include/net/sch_generic.h:768 net/core/dev.c:2958)
[ 88.811870] __tun_detach (drivers/net/tun.c:590 drivers/net/tun.c:673)
[ 88.812271] tun_chr_close (drivers/net/tun.c:702 drivers/net/tun.c:3517)
[ 88.812505] __fput (fs/file_table.c:432 (discriminator 1))
[ 88.812735] task_work_run (kernel/task_work.c:230)
[ 88.813016] do_exit (kernel/exit.c:940)
[ 88.813372] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4))
[ 88.813639] ? handle_mm_fault (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/memcontrol.h:1022 ./include/linux/memcontrol.h:1045 ./include/linux/memcontrol.h:1052 mm/memory.c:5928 mm/memory.c:6088)
[ 88.813867] do_group_exit (kernel/exit.c:1070)
[ 88.814138] __x64_sys_exit_group (kernel/exit.c:1099)
[ 88.814490] x64_sys_call (??:?)
[ 88.814791] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
[ 88.815012] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 88.815495] RIP: 0033:0x7f44560f1975
Fixes: 175f9c1bba9b ("net_sched: Add size table for qdiscs")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20241007184130.3960565-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
So, in order to avoid ending up with a flexible-array member in the
middle of multiple other structs, we use the `__struct_group()`
helper to create a new tagged `struct ieee80211_radiotap_header_fixed`.
This structure groups together all the members of the flexible
`struct ieee80211_radiotap_header` except the flexible array.
As a result, the array is effectively separated from the rest of the
members without modifying the memory layout of the flexible structure.
We then change the type of the middle struct members currently causing
trouble from `struct ieee80211_radiotap_header` to `struct
ieee80211_radiotap_header_fixed`.
We also want to ensure that in case new members need to be added to the
flexible structure, they are always included within the newly created
tagged struct. For this, we use `static_assert()`. This ensures that the
memory layout for both the flexible structure and the new tagged struct
is the same after any changes.
This approach avoids having to implement `struct ieee80211_radiotap_header_fixed`
as a completely separate structure, thus preventing having to maintain
two independent but basically identical structures, closing the door
to potential bugs in the future.
So, with these changes, fix the following warnings:
drivers/net/wireless/ath/wil6210/txrx.c:309:50: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/ipw2x00/ipw2100.c:2521:50: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/ipw2x00/ipw2200.h:1146:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/intel/ipw2x00/libipw.h:595:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/marvell/libertas/radiotap.h:34:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/marvell/libertas/radiotap.h:5:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/microchip/wilc1000/mon.c:10:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/microchip/wilc1000/mon.c:15:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/virtual/mac80211_hwsim.c:758:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/wireless/virtual/mac80211_hwsim.c:767:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/ZwBMtBZKcrzwU7l4@kspp
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add wiphy_delayed_work_pending() to check if any delayed work timer is
pending, that can be used to be sure that wiphy_delayed_work_queue()
won't postpone an already pending delayed work.
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Link: https://patch.msgid.link/20240924192805.13859-2-repk@triplefau.lt
[fix return value kernel-doc]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The kernel may crash when deleting a genetlink family if there are still
listeners for that family:
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0
LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0
Call Trace:
__netlink_clear_multicast_users+0x74/0xc0
genl_unregister_family+0xd4/0x2d0
Change the unsafe loop on the list to a safe one, because inside the
loop there is an element removal from this list.
Fixes: b8273570f802 ("genetlink: fix netns vs. netlink table locking (2)")
Cc: stable@vger.kernel.org
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
|
|
xfrm_net still contained the no-longer-used inexact policy list heads,
remove them.
Fixes: a54ad727f745 ("xfrm: policy: remove remaining use of inexact list")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
We have some machines running stock Ubuntu 20.04.6 which is their 5.4.0-174-generic
kernel that are running ceph and recently hit a null ptr dereference in
tcp_rearm_rto(). Initially hitting it from the TLP path, but then later we also
saw it getting hit from the RACK case as well. Here are examples of the oops
messages we saw in each of those cases:
Jul 26 15:05:02 rx [11061395.780353] BUG: kernel NULL pointer dereference, address: 0000000000000020
Jul 26 15:05:02 rx [11061395.787572] #PF: supervisor read access in kernel mode
Jul 26 15:05:02 rx [11061395.792971] #PF: error_code(0x0000) - not-present page
Jul 26 15:05:02 rx [11061395.798362] PGD 0 P4D 0
Jul 26 15:05:02 rx [11061395.801164] Oops: 0000 [#1] SMP NOPTI
Jul 26 15:05:02 rx [11061395.805091] CPU: 0 PID: 9180 Comm: msgr-worker-1 Tainted: G W 5.4.0-174-generic #193-Ubuntu
Jul 26 15:05:02 rx [11061395.814996] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023
Jul 26 15:05:02 rx [11061395.825952] RIP: 0010:tcp_rearm_rto+0xe4/0x160
Jul 26 15:05:02 rx [11061395.830656] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3
Jul 26 15:05:02 rx [11061395.849665] RSP: 0018:ffffb75d40003e08 EFLAGS: 00010246
Jul 26 15:05:02 rx [11061395.855149] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000
Jul 26 15:05:02 rx [11061395.862542] RDX: 0000000062177c30 RSI: 000000000000231c RDI: ffff9874ad283a60
Jul 26 15:05:02 rx [11061395.869933] RBP: ffffb75d40003e20 R08: 0000000000000000 R09: ffff987605e20aa8
Jul 26 15:05:02 rx [11061395.877318] R10: ffffb75d40003f00 R11: ffffb75d4460f740 R12: ffff9874ad283900
Jul 26 15:05:02 rx [11061395.884710] R13: ffff9874ad283a60 R14: ffff9874ad283980 R15: ffff9874ad283d30
Jul 26 15:05:02 rx [11061395.892095] FS: 00007f1ef4a2e700(0000) GS:ffff987605e00000(0000) knlGS:0000000000000000
Jul 26 15:05:02 rx [11061395.900438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 26 15:05:02 rx [11061395.906435] CR2: 0000000000000020 CR3: 0000003e450ba003 CR4: 0000000000760ef0
Jul 26 15:05:02 rx [11061395.913822] PKRU: 55555554
Jul 26 15:05:02 rx [11061395.916786] Call Trace:
Jul 26 15:05:02 rx [11061395.919488]
Jul 26 15:05:02 rx [11061395.921765] ? show_regs.cold+0x1a/0x1f
Jul 26 15:05:02 rx [11061395.925859] ? __die+0x90/0xd9
Jul 26 15:05:02 rx [11061395.929169] ? no_context+0x196/0x380
Jul 26 15:05:02 rx [11061395.933088] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0
Jul 26 15:05:02 rx [11061395.938216] ? ip6_sublist_rcv_finish+0x3d/0x50
Jul 26 15:05:02 rx [11061395.943000] ? __bad_area_nosemaphore+0x50/0x1a0
Jul 26 15:05:02 rx [11061395.947873] ? bad_area_nosemaphore+0x16/0x20
Jul 26 15:05:02 rx [11061395.952486] ? do_user_addr_fault+0x267/0x450
Jul 26 15:05:02 rx [11061395.957104] ? ipv6_list_rcv+0x112/0x140
Jul 26 15:05:02 rx [11061395.961279] ? __do_page_fault+0x58/0x90
Jul 26 15:05:02 rx [11061395.965458] ? do_page_fault+0x2c/0xe0
Jul 26 15:05:02 rx [11061395.969465] ? page_fault+0x34/0x40
Jul 26 15:05:02 rx [11061395.973217] ? tcp_rearm_rto+0xe4/0x160
Jul 26 15:05:02 rx [11061395.977313] ? tcp_rearm_rto+0xe4/0x160
Jul 26 15:05:02 rx [11061395.981408] tcp_send_loss_probe+0x10b/0x220
Jul 26 15:05:02 rx [11061395.985937] tcp_write_timer_handler+0x1b4/0x240
Jul 26 15:05:02 rx [11061395.990809] tcp_write_timer+0x9e/0xe0
Jul 26 15:05:02 rx [11061395.994814] ? tcp_write_timer_handler+0x240/0x240
Jul 26 15:05:02 rx [11061395.999866] call_timer_fn+0x32/0x130
Jul 26 15:05:02 rx [11061396.003782] __run_timers.part.0+0x180/0x280
Jul 26 15:05:02 rx [11061396.008309] ? recalibrate_cpu_khz+0x10/0x10
Jul 26 15:05:02 rx [11061396.012841] ? native_x2apic_icr_write+0x30/0x30
Jul 26 15:05:02 rx [11061396.017718] ? lapic_next_event+0x21/0x30
Jul 26 15:05:02 rx [11061396.021984] ? clockevents_program_event+0x8f/0xe0
Jul 26 15:05:02 rx [11061396.027035] run_timer_softirq+0x2a/0x50
Jul 26 15:05:02 rx [11061396.031212] __do_softirq+0xd1/0x2c1
Jul 26 15:05:02 rx [11061396.035044] do_softirq_own_stack+0x2a/0x40
Jul 26 15:05:02 rx [11061396.039480]
Jul 26 15:05:02 rx [11061396.041840] do_softirq.part.0+0x46/0x50
Jul 26 15:05:02 rx [11061396.046022] __local_bh_enable_ip+0x50/0x60
Jul 26 15:05:02 rx [11061396.050460] _raw_spin_unlock_bh+0x1e/0x20
Jul 26 15:05:02 rx [11061396.054817] nf_conntrack_tcp_packet+0x29e/0xbe0 [nf_conntrack]
Jul 26 15:05:02 rx [11061396.060994] ? get_l4proto+0xe7/0x190 [nf_conntrack]
Jul 26 15:05:02 rx [11061396.066220] nf_conntrack_in+0xe9/0x670 [nf_conntrack]
Jul 26 15:05:02 rx [11061396.071618] ipv6_conntrack_local+0x14/0x20 [nf_conntrack]
Jul 26 15:05:02 rx [11061396.077356] nf_hook_slow+0x45/0xb0
Jul 26 15:05:02 rx [11061396.081098] ip6_xmit+0x3f0/0x5d0
Jul 26 15:05:02 rx [11061396.084670] ? ipv6_anycast_cleanup+0x50/0x50
Jul 26 15:05:02 rx [11061396.089282] ? __sk_dst_check+0x38/0x70
Jul 26 15:05:02 rx [11061396.093381] ? inet6_csk_route_socket+0x13b/0x200
Jul 26 15:05:02 rx [11061396.098346] inet6_csk_xmit+0xa7/0xf0
Jul 26 15:05:02 rx [11061396.102263] __tcp_transmit_skb+0x550/0xb30
Jul 26 15:05:02 rx [11061396.106701] tcp_write_xmit+0x3c6/0xc20
Jul 26 15:05:02 rx [11061396.110792] ? __alloc_skb+0x98/0x1d0
Jul 26 15:05:02 rx [11061396.114708] __tcp_push_pending_frames+0x37/0x100
Jul 26 15:05:02 rx [11061396.119667] tcp_push+0xfd/0x100
Jul 26 15:05:02 rx [11061396.123150] tcp_sendmsg_locked+0xc70/0xdd0
Jul 26 15:05:02 rx [11061396.127588] tcp_sendmsg+0x2d/0x50
Jul 26 15:05:02 rx [11061396.131245] inet6_sendmsg+0x43/0x70
Jul 26 15:05:02 rx [11061396.135075] __sock_sendmsg+0x48/0x70
Jul 26 15:05:02 rx [11061396.138994] ____sys_sendmsg+0x212/0x280
Jul 26 15:05:02 rx [11061396.143172] ___sys_sendmsg+0x88/0xd0
Jul 26 15:05:02 rx [11061396.147098] ? __seccomp_filter+0x7e/0x6b0
Jul 26 15:05:02 rx [11061396.151446] ? __switch_to+0x39c/0x460
Jul 26 15:05:02 rx [11061396.155453] ? __switch_to_asm+0x42/0x80
Jul 26 15:05:02 rx [11061396.159636] ? __switch_to_asm+0x5a/0x80
Jul 26 15:05:02 rx [11061396.163816] __sys_sendmsg+0x5c/0xa0
Jul 26 15:05:02 rx [11061396.167647] __x64_sys_sendmsg+0x1f/0x30
Jul 26 15:05:02 rx [11061396.171832] do_syscall_64+0x57/0x190
Jul 26 15:05:02 rx [11061396.175748] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Jul 26 15:05:02 rx [11061396.181055] RIP: 0033:0x7f1ef692618d
Jul 26 15:05:02 rx [11061396.184893] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ca ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 48 89 44 24 08 e8 fe ee ff ff 48
Jul 26 15:05:02 rx [11061396.203889] RSP: 002b:00007f1ef4a26aa0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
Jul 26 15:05:02 rx [11061396.211708] RAX: ffffffffffffffda RBX: 000000000000084b RCX: 00007f1ef692618d
Jul 26 15:05:02 rx [11061396.219091] RDX: 0000000000004000 RSI: 00007f1ef4a26b10 RDI: 0000000000000275
Jul 26 15:05:02 rx [11061396.226475] RBP: 0000000000004000 R08: 0000000000000000 R09: 0000000000000020
Jul 26 15:05:02 rx [11061396.233859] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000084b
Jul 26 15:05:02 rx [11061396.241243] R13: 00007f1ef4a26b10 R14: 0000000000000275 R15: 000055592030f1e8
Jul 26 15:05:02 rx [11061396.248628] Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet mii ast drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler nft_ct sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 mlx5_core hid_generic pci_hyperv_intf crc32_pclmul tls usbhid ahci mlxfw bnxt_en libahci hid nvme i2c_piix4 nvme_core wmi
Jul 26 15:05:02 rx [11061396.324334] CR2: 0000000000000020
Jul 26 15:05:02 rx [11061396.327944] ---[ end trace 68a2b679d1cfb4f1 ]---
Jul 26 15:05:02 rx [11061396.433435] RIP: 0010:tcp_rearm_rto+0xe4/0x160
Jul 26 15:05:02 rx [11061396.438137] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3
Jul 26 15:05:02 rx [11061396.457144] RSP: 0018:ffffb75d40003e08 EFLAGS: 00010246
Jul 26 15:05:02 rx [11061396.462629] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000
Jul 26 15:05:02 rx [11061396.470012] RDX: 0000000062177c30 RSI: 000000000000231c RDI: ffff9874ad283a60
Jul 26 15:05:02 rx [11061396.477396] RBP: ffffb75d40003e20 R08: 0000000000000000 R09: ffff987605e20aa8
Jul 26 15:05:02 rx [11061396.484779] R10: ffffb75d40003f00 R11: ffffb75d4460f740 R12: ffff9874ad283900
Jul 26 15:05:02 rx [11061396.492164] R13: ffff9874ad283a60 R14: ffff9874ad283980 R15: ffff9874ad283d30
Jul 26 15:05:02 rx [11061396.499547] FS: 00007f1ef4a2e700(0000) GS:ffff987605e00000(0000) knlGS:0000000000000000
Jul 26 15:05:02 rx [11061396.507886] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 26 15:05:02 rx [11061396.513884] CR2: 0000000000000020 CR3: 0000003e450ba003 CR4: 0000000000760ef0
Jul 26 15:05:02 rx [11061396.521267] PKRU: 55555554
Jul 26 15:05:02 rx [11061396.524230] Kernel panic - not syncing: Fatal exception in interrupt
Jul 26 15:05:02 rx [11061396.530885] Kernel Offset: 0x1b200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Jul 26 15:05:03 rx [11061396.660181] ---[ end Kernel panic - not syncing: Fatal
exception in interrupt ]---
After we hit this we disabled TLP by setting tcp_early_retrans to 0 and then hit the crash in the RACK case:
Aug 7 07:26:16 rx [1006006.265582] BUG: kernel NULL pointer dereference, address: 0000000000000020
Aug 7 07:26:16 rx [1006006.272719] #PF: supervisor read access in kernel mode
Aug 7 07:26:16 rx [1006006.278030] #PF: error_code(0x0000) - not-present page
Aug 7 07:26:16 rx [1006006.283343] PGD 0 P4D 0
Aug 7 07:26:16 rx [1006006.286057] Oops: 0000 [#1] SMP NOPTI
Aug 7 07:26:16 rx [1006006.289896] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.4.0-174-generic #193-Ubuntu
Aug 7 07:26:16 rx [1006006.299107] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023
Aug 7 07:26:16 rx [1006006.309970] RIP: 0010:tcp_rearm_rto+0xe4/0x160
Aug 7 07:26:16 rx [1006006.314584] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3
Aug 7 07:26:16 rx [1006006.333499] RSP: 0018:ffffb42600a50960 EFLAGS: 00010246
Aug 7 07:26:16 rx [1006006.338895] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000
Aug 7 07:26:16 rx [1006006.346193] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff92d687ed8160
Aug 7 07:26:16 rx [1006006.353489] RBP: ffffb42600a50978 R08: 0000000000000000 R09: 00000000cd896dcc
Aug 7 07:26:16 rx [1006006.360786] R10: ffff92dc3404f400 R11: 0000000000000001 R12: ffff92d687ed8000
Aug 7 07:26:16 rx [1006006.368084] R13: ffff92d687ed8160 R14: 00000000cd896dcc R15: 00000000cd8fca81
Aug 7 07:26:16 rx [1006006.375381] FS: 0000000000000000(0000) GS:ffff93158ad40000(0000) knlGS:0000000000000000
Aug 7 07:26:16 rx [1006006.383632] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 7 07:26:16 rx [1006006.389544] CR2: 0000000000000020 CR3: 0000003e775ce006 CR4: 0000000000760ee0
Aug 7 07:26:16 rx [1006006.396839] PKRU: 55555554
Aug 7 07:26:16 rx [1006006.399717] Call Trace:
Aug 7 07:26:16 rx [1006006.402335]
Aug 7 07:26:16 rx [1006006.404525] ? show_regs.cold+0x1a/0x1f
Aug 7 07:26:16 rx [1006006.408532] ? __die+0x90/0xd9
Aug 7 07:26:16 rx [1006006.411760] ? no_context+0x196/0x380
Aug 7 07:26:16 rx [1006006.415599] ? __bad_area_nosemaphore+0x50/0x1a0
Aug 7 07:26:16 rx [1006006.420392] ? _raw_spin_lock+0x1e/0x30
Aug 7 07:26:16 rx [1006006.424401] ? bad_area_nosemaphore+0x16/0x20
Aug 7 07:26:16 rx [1006006.428927] ? do_user_addr_fault+0x267/0x450
Aug 7 07:26:16 rx [1006006.433450] ? __do_page_fault+0x58/0x90
Aug 7 07:26:16 rx [1006006.437542] ? do_page_fault+0x2c/0xe0
Aug 7 07:26:16 rx [1006006.441470] ? page_fault+0x34/0x40
Aug 7 07:26:16 rx [1006006.445134] ? tcp_rearm_rto+0xe4/0x160
Aug 7 07:26:16 rx [1006006.449145] tcp_ack+0xa32/0xb30
Aug 7 07:26:16 rx [1006006.452542] tcp_rcv_established+0x13c/0x670
Aug 7 07:26:16 rx [1006006.456981] ? sk_filter_trim_cap+0x48/0x220
Aug 7 07:26:16 rx [1006006.461419] tcp_v6_do_rcv+0xdb/0x450
Aug 7 07:26:16 rx [1006006.465257] tcp_v6_rcv+0xc2b/0xd10
Aug 7 07:26:16 rx [1006006.468918] ip6_protocol_deliver_rcu+0xd3/0x4e0
Aug 7 07:26:16 rx [1006006.473706] ip6_input_finish+0x15/0x20
Aug 7 07:26:16 rx [1006006.477710] ip6_input+0xa2/0xb0
Aug 7 07:26:16 rx [1006006.481109] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0
Aug 7 07:26:16 rx [1006006.486151] ip6_sublist_rcv_finish+0x3d/0x50
Aug 7 07:26:16 rx [1006006.490679] ip6_sublist_rcv+0x1aa/0x250
Aug 7 07:26:16 rx [1006006.494779] ? ip6_rcv_finish_core.isra.0+0xa0/0xa0
Aug 7 07:26:16 rx [1006006.499828] ipv6_list_rcv+0x112/0x140
Aug 7 07:26:16 rx [1006006.503748] __netif_receive_skb_list_core+0x1a4/0x250
Aug 7 07:26:16 rx [1006006.509057] netif_receive_skb_list_internal+0x1a1/0x2b0
Aug 7 07:26:16 rx [1006006.514538] gro_normal_list.part.0+0x1e/0x40
Aug 7 07:26:16 rx [1006006.519068] napi_complete_done+0x91/0x130
Aug 7 07:26:16 rx [1006006.523352] mlx5e_napi_poll+0x18e/0x610 [mlx5_core]
Aug 7 07:26:16 rx [1006006.528481] net_rx_action+0x142/0x390
Aug 7 07:26:16 rx [1006006.532398] __do_softirq+0xd1/0x2c1
Aug 7 07:26:16 rx [1006006.536142] irq_exit+0xae/0xb0
Aug 7 07:26:16 rx [1006006.539452] do_IRQ+0x5a/0xf0
Aug 7 07:26:16 rx [1006006.542590] common_interrupt+0xf/0xf
Aug 7 07:26:16 rx [1006006.546421]
Aug 7 07:26:16 rx [1006006.548695] RIP: 0010:native_safe_halt+0xe/0x10
Aug 7 07:26:16 rx [1006006.553399] Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 36 2c 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 2c 50 00 fb f4 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 dd 5e 61 ff 65
Aug 7 07:26:16 rx [1006006.572309] RSP: 0018:ffffb42600177e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffc2
Aug 7 07:26:16 rx [1006006.580040] RAX: ffffffff8ed08b20 RBX: 0000000000000005 RCX: 0000000000000001
Aug 7 07:26:16 rx [1006006.587337] RDX: 00000000f48eeca2 RSI: 0000000000000082 RDI: 0000000000000082
Aug 7 07:26:16 rx [1006006.594635] RBP: ffffb42600177e90 R08: 0000000000000000 R09: 000000000000020f
Aug 7 07:26:16 rx [1006006.601931] R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000005
Aug 7 07:26:16 rx [1006006.609229] R13: ffff93157deb5f00 R14: 0000000000000000 R15: 0000000000000000
Aug 7 07:26:16 rx [1006006.616530] ? __cpuidle_text_start+0x8/0x8
Aug 7 07:26:16 rx [1006006.620886] ? default_idle+0x20/0x140
Aug 7 07:26:16 rx [1006006.624804] arch_cpu_idle+0x15/0x20
Aug 7 07:26:16 rx [1006006.628545] default_idle_call+0x23/0x30
Aug 7 07:26:16 rx [1006006.632640] do_idle+0x1fb/0x270
Aug 7 07:26:16 rx [1006006.636035] cpu_startup_entry+0x20/0x30
Aug 7 07:26:16 rx [1006006.640126] start_secondary+0x178/0x1d0
Aug 7 07:26:16 rx [1006006.644218] secondary_startup_64+0xa4/0xb0
Aug 7 07:26:17 rx [1006006.648568] Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 nft_ct amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet ast mii drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 hid_generic mlx5_core pci_hyperv_intf crc32_pclmul usbhid ahci tls mlxfw bnxt_en hid libahci nvme i2c_piix4 nvme_core wmi [last unloaded: cpuid]
Aug 7 07:26:17 rx [1006006.726180] CR2: 0000000000000020
Aug 7 07:26:17 rx [1006006.729718] ---[ end trace e0e2e37e4e612984 ]---
Prior to seeing the first crash and on other machines we also see the warning in
tcp_send_loss_probe() where packets_out is non-zero, but both transmit and retrans
queues are empty so we know the box is seeing some accounting issue in this area:
Jul 26 09:15:27 kernel: ------------[ cut here ]------------
Jul 26 09:15:27 kernel: invalid inflight: 2 state 1 cwnd 68 mss 8988
Jul 26 09:15:27 kernel: WARNING: CPU: 16 PID: 0 at net/ipv4/tcp_output.c:2605 tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 nft_ct amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif joydev input_leds rndis_host cdc_ether usbnet mii ast drm_vram_helper ttm drm_kms_he>
Jul 26 09:15:27 kernel: CPU: 16 PID: 0 Comm: swapper/16 Not tainted 5.4.0-174-generic #193-Ubuntu
Jul 26 09:15:27 kernel: Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023
Jul 26 09:15:27 kernel: RIP: 0010:tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: Code: 08 26 01 00 75 e2 41 0f b6 54 24 12 41 8b 8c 24 c0 06 00 00 45 89 f0 48 c7 c7 e0 b4 20 a7 c6 05 8d 08 26 01 01 e8 4a c0 0f 00 <0f> 0b eb ba 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
Jul 26 09:15:27 kernel: RSP: 0018:ffffb7838088ce00 EFLAGS: 00010286
Jul 26 09:15:27 kernel: RAX: 0000000000000000 RBX: ffff9b84b5630430 RCX: 0000000000000006
Jul 26 09:15:27 kernel: RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9b8e4621c8c0
Jul 26 09:15:27 kernel: RBP: ffffb7838088ce18 R08: 0000000000000927 R09: 0000000000000004
Jul 26 09:15:27 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b84b5630000
Jul 26 09:15:27 kernel: R13: 0000000000000000 R14: 000000000000231c R15: ffff9b84b5630430
Jul 26 09:15:27 kernel: FS: 0000000000000000(0000) GS:ffff9b8e46200000(0000) knlGS:0000000000000000
Jul 26 09:15:27 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 26 09:15:27 kernel: CR2: 000056238cec2380 CR3: 0000003e49ede005 CR4: 0000000000760ee0
Jul 26 09:15:27 kernel: PKRU: 55555554
Jul 26 09:15:27 kernel: Call Trace:
Jul 26 09:15:27 kernel: <IRQ>
Jul 26 09:15:27 kernel: ? show_regs.cold+0x1a/0x1f
Jul 26 09:15:27 kernel: ? __warn+0x98/0xe0
Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: ? report_bug+0xd1/0x100
Jul 26 09:15:27 kernel: ? do_error_trap+0x9b/0xc0
Jul 26 09:15:27 kernel: ? do_invalid_op+0x3c/0x50
Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: ? invalid_op+0x1e/0x30
Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: tcp_write_timer_handler+0x1b4/0x240
Jul 26 09:15:27 kernel: tcp_write_timer+0x9e/0xe0
Jul 26 09:15:27 kernel: ? tcp_write_timer_handler+0x240/0x240
Jul 26 09:15:27 kernel: call_timer_fn+0x32/0x130
Jul 26 09:15:27 kernel: __run_timers.part.0+0x180/0x280
Jul 26 09:15:27 kernel: ? timerqueue_add+0x9b/0xb0
Jul 26 09:15:27 kernel: ? enqueue_hrtimer+0x3d/0x90
Jul 26 09:15:27 kernel: ? do_error_trap+0x9b/0xc0
Jul 26 09:15:27 kernel: ? do_invalid_op+0x3c/0x50
Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: ? invalid_op+0x1e/0x30
Jul 26 09:15:27 kernel: ? tcp_send_loss_probe+0x214/0x220
Jul 26 09:15:27 kernel: tcp_write_timer_handler+0x1b4/0x240
Jul 26 09:15:27 kernel: tcp_write_timer+0x9e/0xe0
Jul 26 09:15:27 kernel: ? tcp_write_timer_handler+0x240/0x240
Jul 26 09:15:27 kernel: call_timer_fn+0x32/0x130
Jul 26 09:15:27 kernel: __run_timers.part.0+0x180/0x280
Jul 26 09:15:27 kernel: ? timerqueue_add+0x9b/0xb0
Jul 26 09:15:27 kernel: ? enqueue_hrtimer+0x3d/0x90
Jul 26 09:15:27 kernel: ? recalibrate_cpu_khz+0x10/0x10
Jul 26 09:15:27 kernel: ? ktime_get+0x3e/0xa0
Jul 26 09:15:27 kernel: ? native_x2apic_icr_write+0x30/0x30
Jul 26 09:15:27 kernel: run_timer_softirq+0x2a/0x50
Jul 26 09:15:27 kernel: __do_softirq+0xd1/0x2c1
Jul 26 09:15:27 kernel: irq_exit+0xae/0xb0
Jul 26 09:15:27 kernel: smp_apic_timer_interrupt+0x7b/0x140
Jul 26 09:15:27 kernel: apic_timer_interrupt+0xf/0x20
Jul 26 09:15:27 kernel: </IRQ>
Jul 26 09:15:27 kernel: RIP: 0010:native_safe_halt+0xe/0x10
Jul 26 09:15:27 kernel: Code: 7b ff ff ff eb bd 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 36 2c 50 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 2c 50 00 fb f4 <c3> 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 e8 dd 5e 61 ff 65
Jul 26 09:15:27 kernel: RSP: 0018:ffffb783801cfe70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Jul 26 09:15:27 kernel: RAX: ffffffffa6908b20 RBX: 0000000000000010 RCX: 0000000000000001
Jul 26 09:15:27 kernel: RDX: 000000006fc0c97e RSI: 0000000000000082 RDI: 0000000000000082
Jul 26 09:15:27 kernel: RBP: ffffb783801cfe90 R08: 0000000000000000 R09: 0000000000000225
Jul 26 09:15:27 kernel: R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000010
Jul 26 09:15:27 kernel: R13: ffff9b8e390b0000 R14: 0000000000000000 R15: 0000000000000000
Jul 26 09:15:27 kernel: ? __cpuidle_text_start+0x8/0x8
Jul 26 09:15:27 kernel: ? default_idle+0x20/0x140
Jul 26 09:15:27 kernel: arch_cpu_idle+0x15/0x20
Jul 26 09:15:27 kernel: default_idle_call+0x23/0x30
Jul 26 09:15:27 kernel: do_idle+0x1fb/0x270
Jul 26 09:15:27 kernel: cpu_startup_entry+0x20/0x30
Jul 26 09:15:27 kernel: start_secondary+0x178/0x1d0
Jul 26 09:15:27 kernel: secondary_startup_64+0xa4/0xb0
Jul 26 09:15:27 kernel: ---[ end trace e7ac822987e33be1 ]---
The NULL ptr deref is coming from tcp_rto_delta_us() attempting to pull an skb
off the head of the retransmit queue and then dereferencing that skb to get the
skb_mstamp_ns value via tcp_skb_timestamp_us(skb).
The crash is the same one that was reported a # of years ago here:
https://lore.kernel.org/netdev/86c0f836-9a7c-438b-d81a-839be45f1f58@gmail.com/T/#t
and the kernel we're running has the fix which was added to resolve this issue.
Unfortunately we've been unsuccessful so far in reproducing this problem in the
lab and do not have the luxury of pushing out a new kernel to try and test if
newer kernels resolve this issue at the moment. I realize this is a report
against both an Ubuntu kernel and also an older 5.4 kernel. I have reported this
issue to Ubuntu here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2077657
however I feel like since this issue has possibly cropped up again it makes
sense to build in some protection in this path (even on the latest kernel
versions) since the code in question just blindly assumes there's a valid skb
without testing if it's NULL b/f it looks at the timestamp.
Given we have seen crashes in this path before and now this case it seems like
we should protect ourselves for when packets_out accounting is incorrect.
While we should fix that root cause we should also just make sure the skb
is not NULL before dereferencing it. Also add a warn once here to capture
some information if/when the problem case is hit again.
Fixes: e1a10ef7fa87 ("tcp: introduce tcp_rto_delta_us() helper for xmit timer fix")
Signed-off-by: Josh Hunt <johunt@akamai.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The series in the "fixes" tag added the ability to consider L4 attributes
in routing rules.
The dst lookup on the outer packet of encapsulated traffic in the xfrm
code was not adapted to this change, thus routing behavior that relies
on L4 information is not respected.
Pass the ip protocol information when performing dst lookups.
Fixes: a25724b05af0 ("Merge branch 'fib_rules-support-sport-dport-and-proto-match'")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Tested-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Preparation for adding more fields to dst lookup functions without
changing their signatures.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- btusb: Add MediaTek MT7925-B22M support ID 0x13d3:0x3604
- btusb: Add Realtek RTL8852C support ID 0x0489:0xe122
- btrtl: Add the support for RTL8922A
- btusb: Add 2 USB HW IDs for MT7925 (0xe118/e)
- btnxpuart: Add support for ISO packets
- btusb: Add Mediatek MT7925 support ID 0x13d3:0x3608
- btsdio: Do not bind to non-removable CYW4373
- hci_uart: Add support for Amlogic HCI UART
* tag 'for-net-next-2024-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (27 commits)
Bluetooth: btintel_pcie: Allocate memory for driver private data
Bluetooth: btusb: Fix not handling ZPL/short-transfer
Bluetooth: btusb: Add 2 USB HW IDs for MT7925 (0xe118/e)
Bluetooth: btsdio: Do not bind to non-removable CYW4373
Bluetooth: hci_sync: Ignore errors from HCI_OP_REMOTE_NAME_REQ_CANCEL
Bluetooth: CMTP: Mark BT_CMTP as DEPRECATED
Bluetooth: replace deprecated strncpy with strscpy_pad
Bluetooth: hci_core: Fix sending MGMT_EV_CONNECT_FAILED
Bluetooth: btrtl: Set msft ext address filter quirk for RTL8852B
Bluetooth: Use led_set_brightness() in LED trigger activate() callback
Bluetooth: btrtl: Use kvmemdup to simplify the code
Bluetooth: btusb: Add Mediatek MT7925 support ID 0x13d3:0x3608
Bluetooth: btrtl: Add the support for RTL8922A
Bluetooth: hci_ldisc: Use speed set by btattach as oper_speed
Bluetooth: hci_conn: Remove redundant memset after kzalloc
Bluetooth: L2CAP: Remove unused declarations
dt-bindings: bluetooth: bring the HW description closer to reality for wcn6855
Bluetooth: btnxpuart: Add support for ISO packets
Bluetooth: hci_h4: Add support for ISO packets in h4_recv.h
Bluetooth: btusb: Add Realtek RTL8852C support ID 0x0489:0xe122
...
====================
Link: https://patch.msgid.link/20240912214317.3054060-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When CONFIG_SYSFS is not set, the kernel fails to compile:
net/core/page_pool_user.c:368:45: error: implicit declaration of function 'get_netdev_rx_queue_index' [-Werror=implicit-function-declaration]
368 | if (pool->slow.queue_idx == get_netdev_rx_queue_index(rxq)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~
When CONFIG_SYSFS is not set, get_netdev_rx_queue_index() is not defined
as well.
Fix by removing the ifdef around get_netdev_rx_queue_index(). It is not
needed anymore after commit e817f85652c1 ("xdp: generic XDP handling of
xdp_rxq_info") removed most of the CONFIG_SYSFS ifdefs.
Fixes: 0f9214046893 ("memory-provider: dmabuf devmem memory provider")
Cc: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20240913032824.2117095-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In tcp_recvmsg_locked(), detect if the skb being received by the user
is a devmem skb. In this case - if the user provided the MSG_SOCK_DEVMEM
flag - pass it to tcp_recvmsg_devmem() for custom handling.
tcp_recvmsg_devmem() copies any data in the skb header to the linear
buffer, and returns a cmsg to the user indicating the number of bytes
returned in the linear buffer.
tcp_recvmsg_devmem() then loops over the unaccessible devmem skb frags,
and returns to the user a cmsg_devmem indicating the location of the
data in the dmabuf device memory. cmsg_devmem contains this information:
1. the offset into the dmabuf where the payload starts. 'frag_offset'.
2. the size of the frag. 'frag_size'.
3. an opaque token 'frag_token' to return to the kernel when the buffer
is to be released.
The pages awaiting freeing are stored in the newly added
sk->sk_user_frags, and each page passed to userspace is get_page()'d.
This reference is dropped once the userspace indicates that it is
done reading this page. All pages are released when the socket is
destroyed.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910171458.219195-10-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For device memory TCP, we expect the skb headers to be available in host
memory for access, and we expect the skb frags to be in device memory
and unaccessible to the host. We expect there to be no mixing and
matching of device memory frags (unaccessible) with host memory frags
(accessible) in the same skb.
Add a skb->devmem flag which indicates whether the frags in this skb
are device memory frags or not.
__skb_fill_netmem_desc() now checks frags added to skbs for net_iov,
and marks the skb as skb->devmem accordingly.
Add checks through the network stack to avoid accessing the frags of
devmem skbs and avoid coalescing devmem skbs with non devmem skbs.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-9-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement a memory provider that allocates dmabuf devmem in the form of
net_iov.
The provider receives a reference to the struct netdev_dmabuf_binding
via the pool->mp_priv pointer. The driver needs to set this pointer for
the provider in the net_iov.
The provider obtains a reference on the netdev_dmabuf_binding which
guarantees the binding and the underlying mapping remains alive until
the provider is destroyed.
Usage of PP_FLAG_DMA_MAP is required for this memory provide such that
the page_pool can provide the driver with the dma-addrs of the devmem.
Support for PP_FLAG_DMA_SYNC_DEV is omitted for simplicity & p.order !=
0.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-7-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|