Age | Commit message (Collapse) | Author |
|
Add netdev mailing list and some more udp.h headers to the UDP section.
This is now more consistent with the TCP section.
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-2-f2c86e7309c8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We already exclude wireless drivers from the netdev@ traffic, to
delegate it to linux-wireless@, and avoid overwhelming netdev@.
Many of the following wireless-related sections MAINTAINERS
are already not included in the NETWORKING [GENERAL] section.
For consistency, exclude those that are.
* 802.11 (including CFG80211/NL80211)
* MAC80211
* RFKILL
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-1-f2c86e7309c8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot found that slhc_remember() was missing checks against
malicious packets [1].
slhc_remember() only checked the size of the packet was at least 20,
which is not good enough.
We need to make sure the packet includes the IPv4 and TCP header
that are supposed to be carried.
Add iph and th pointers to make the code more readable.
[1]
BUG: KMSAN: uninit-value in slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
ppp_receive_nonmp_frame+0xe45/0x35e0 drivers/net/ppp/ppp_generic.c:2455
ppp_receive_frame drivers/net/ppp/ppp_generic.c:2372 [inline]
ppp_do_recv+0x65f/0x40d0 drivers/net/ppp/ppp_generic.c:2212
ppp_input+0x7dc/0xe60 drivers/net/ppp/ppp_generic.c:2327
pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
__release_sock+0x1da/0x330 net/core/sock.c:3072
release_sock+0x6b/0x250 net/core/sock.c:3626
pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:744
____sys_sendmsg+0x903/0xb60 net/socket.c:2602
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
__sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
__do_sys_sendmmsg net/socket.c:2771 [inline]
__se_sys_sendmmsg net/socket.c:2768 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
slab_post_alloc_hook mm/slub.c:4091 [inline]
slab_alloc_node mm/slub.c:4134 [inline]
kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
__alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
alloc_skb include/linux/skbuff.h:1322 [inline]
sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:744
____sys_sendmsg+0x903/0xb60 net/socket.c:2602
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
__sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
__do_sys_sendmmsg net/socket.c:2771 [inline]
__se_sys_sendmmsg net/socket.c:2768 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
CPU: 0 UID: 0 PID: 5460 Comm: syz.2.33 Not tainted 6.12.0-rc2-syzkaller-00006-g87d6aab2389e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Fixes: b5451d783ade ("slip: Move the SLIP drivers")
Reported-by: syzbot+2ada1bc857496353be5a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/670646db.050a0220.3f80e.0027.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241009091132.2136321-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric report a panic on IPPROTO_SMC, and give the facts
that when INET_PROTOSW_ICSK was set, icsk->icsk_sync_mss must be set too.
Bug: Unable to handle kernel NULL pointer dereference at virtual address
0000000000000000
Mem abort info:
ESR = 0x0000000086000005
EC = 0x21: IABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
user pgtable: 4k pages, 48-bit VAs, pgdp=00000001195d1000
[0000000000000000] pgd=0800000109c46003, p4d=0800000109c46003,
pud=0000000000000000
Internal error: Oops: 0000000086000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 UID: 0 PID: 8037 Comm: syz.3.265 Not tainted
6.11.0-rc7-syzkaller-g5f5673607153 #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 08/06/2024
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : 0x0
lr : cipso_v4_sock_setattr+0x2a8/0x3c0 net/ipv4/cipso_ipv4.c:1910
sp : ffff80009b887a90
x29: ffff80009b887aa0 x28: ffff80008db94050 x27: 0000000000000000
x26: 1fffe0001aa6f5b3 x25: dfff800000000000 x24: ffff0000db75da00
x23: 0000000000000000 x22: ffff0000d8b78518 x21: 0000000000000000
x20: ffff0000d537ad80 x19: ffff0000d8b78000 x18: 1fffe000366d79ee
x17: ffff8000800614a8 x16: ffff800080569b84 x15: 0000000000000001
x14: 000000008b336894 x13: 00000000cd96feaa x12: 0000000000000003
x11: 0000000000040000 x10: 00000000000020a3 x9 : 1fffe0001b16f0f1
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000040 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000002 x1 : 0000000000000000 x0 : ffff0000d8b78000
Call trace:
0x0
netlbl_sock_setattr+0x2e4/0x338 net/netlabel/netlabel_kapi.c:1000
smack_netlbl_add+0xa4/0x154 security/smack/smack_lsm.c:2593
smack_socket_post_create+0xa8/0x14c security/smack/smack_lsm.c:2973
security_socket_post_create+0x94/0xd4 security/security.c:4425
__sock_create+0x4c8/0x884 net/socket.c:1587
sock_create net/socket.c:1622 [inline]
__sys_socket_create net/socket.c:1659 [inline]
__sys_socket+0x134/0x340 net/socket.c:1706
__do_sys_socket net/socket.c:1720 [inline]
__se_sys_socket net/socket.c:1718 [inline]
__arm64_sys_socket+0x7c/0x94 net/socket.c:1718
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Code: ???????? ???????? ???????? ???????? (????????)
---[ end trace 0000000000000000 ]---
This patch add a toy implementation that performs a simple return to
prevent such panic. This is because MSS can be set in sock_create_kern
or smc_setsockopt, similar to how it's done in AF_SMC. However, for
AF_SMC, there is currently no way to synchronize MSS within
__sys_connect_file. This toy implementation lays the groundwork for us
to support such feature for IPPROTO_SMC in the future.
Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://patch.msgid.link/1728456916-67035-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported an issue in ppp_async_encode() [1]
In this case, pppoe_sendmsg() is called with a zero size.
Then ppp_async_encode() is called with an empty skb.
BUG: KMSAN: uninit-value in ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
BUG: KMSAN: uninit-value in ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
ppp_async_send+0x130/0x1b0 drivers/net/ppp/ppp_async.c:634
ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2280 [inline]
ppp_input+0x1f1/0xe60 drivers/net/ppp/ppp_generic.c:2304
pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
__release_sock+0x1da/0x330 net/core/sock.c:3072
release_sock+0x6b/0x250 net/core/sock.c:3626
pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:744
____sys_sendmsg+0x903/0xb60 net/socket.c:2602
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
__sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
__do_sys_sendmmsg net/socket.c:2771 [inline]
__se_sys_sendmmsg net/socket.c:2768 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
slab_post_alloc_hook mm/slub.c:4092 [inline]
slab_alloc_node mm/slub.c:4135 [inline]
kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4187
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
__alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
alloc_skb include/linux/skbuff.h:1322 [inline]
sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:744
____sys_sendmsg+0x903/0xb60 net/socket.c:2602
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
__sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
__do_sys_sendmmsg net/socket.c:2771 [inline]
__se_sys_sendmmsg net/socket.c:2768 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
CPU: 1 UID: 0 PID: 5411 Comm: syz.1.14 Not tainted 6.12.0-rc1-syzkaller-00165-g360c1f1f24c6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+1d121645899e7692f92a@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009185802.3763282-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The purpose of this section is to document what is the current practice
regarding clean-up patches which address checkpatch warnings and similar
problems. I feel there is a value in having this documented so others
can easily refer to it.
Clearly this topic is subjective. And to some extent the current
practice discourages a wider range of patches than is described here.
But I feel it is best to start somewhere, with the most well established
part of the current practice.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-doc-mc-clean-v2-1-e637b665fa81@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kuniyuki Iwashima says:
====================
rtnetlink: Handle error of rtnl_register_module().
While converting phonet to per-netns RTNL, I found a weird comment
/* Further rtnl_register_module() cannot fail */
that was true but no longer true after commit addf9b90de22 ("net:
rtnetlink: use rcu to free rtnl message handlers").
Many callers of rtnl_register_module() just ignore the returned
value but should handle them properly.
This series introduces two helpers, rtnl_register_many() and
rtnl_unregister_many(), to do that easily and fix such callers.
All rtnl_register() and rtnl_register_module() will be converted
to _many() variant and some rtnl_lock() will be saved in _many()
later in net-next.
Changes:
v4:
* Add more context in changelog of each patch
v3: https://lore.kernel.org/all/20241007124459.5727-1-kuniyu@amazon.com/
* Move module *owner to struct rtnl_msg_handler
* Make struct rtnl_msg_handler args/vars const
* Update mctp goto labels
v2: https://lore.kernel.org/netdev/20241004222358.79129-1-kuniyu@amazon.com/
* Remove __exit from mctp_neigh_exit().
v1: https://lore.kernel.org/netdev/20241003205725.5612-1-kuniyu@amazon.com/
====================
Link: https://patch.msgid.link/20241008184737.9619-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Before commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl
message handlers"), once the first rtnl_register_module() allocated
rtnl_msg_handlers[PF_PHONET], the following calls never failed.
However, after the commit, rtnl_register_module() could fail silently
to allocate rtnl_msg_handlers[PF_PHONET][msgtype] and requires error
handling for each call.
Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality. This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.
Let's use rtnl_register_many() to handle the errors easily.
Fixes: addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message handlers")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Rémi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since introduced, mpls_init() has been ignoring the returned
value of rtnl_register_module(), which could fail silently.
Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality. This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.
Let's handle the errors by rtnl_register_many().
Fixes: 03c0566542f4 ("mpls: Netlink commands to add, remove, and dump routes")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since introduced, mctp has been ignoring the returned value of
rtnl_register_module(), which could fail silently.
Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality. This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.
Let's handle the errors by rtnl_register_many().
Fixes: 583be982d934 ("mctp: Add device handling and netlink interface")
Fixes: 831119f88781 ("mctp: Add neighbour netlink interface")
Fixes: 06d2f4c583a7 ("mctp: Add netlink route management")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since introduced, br_vlan_rtnl_init() has been ignoring the returned
value of rtnl_register_module(), which could fail silently.
Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality. This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.
Let's handle the errors by rtnl_register_many().
Fixes: 8dcea187088b ("net: bridge: vlan: add rtm definitions and dump support")
Fixes: f26b296585dc ("net: bridge: vlan: add new rtm message support")
Fixes: adb3ce9bcb0f ("net: bridge: vlan: add del rtm message support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since introduced, vxlan_vnifilter_init() has been ignoring the
returned value of rtnl_register_module(), which could fail silently.
Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality. This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.
Let's handle the errors by rtnl_register_many().
Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Before commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message
handlers"), once rtnl_msg_handlers[protocol] was allocated, the following
rtnl_register_module() for the same protocol never failed.
However, after the commit, rtnl_msg_handler[protocol][msgtype] needs to
be allocated in each rtnl_register_module(), so each call could fail.
Many callers of rtnl_register_module() do not handle the returned error,
and we need to add many error handlings.
To handle that easily, let's add wrapper functions for bulk registration
of rtnetlink message handlers.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Restrict xtables extensions to families that are safe, syzbot found
a way to combine ebtables with extensions that are never used by
userspace tools. From Florian Westphal.
2) Set l3mdev inconditionally whenever possible in nft_fib to fix lookup
mismatch, also from Florian.
netfilter pull request 24-10-09
* tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: conntrack_vrf.sh: add fib test case
netfilter: fib: check correct rtable in vrf setups
netfilter: xtables: avoid NFPROTO_UNSPEC where needed
====================
Link: https://patch.msgid.link/20241009213858.3565808-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
dst_entries_add() uses per-cpu data that might be freed at netns
dismantle from ip6_route_net_exit() calling dst_entries_destroy()
Before ip6_route_net_exit() can be called, we release all
the dsts associated with this netns, via calls to dst_release(),
which waits an rcu grace period before calling dst_destroy()
dst_entries_add() use in dst_destroy() is racy, because
dst_entries_destroy() could have been called already.
Decrementing the number of dsts must happen sooner.
Notes:
1) in CONFIG_XFRM case, dst_destroy() can call
dst_release_immediate(child), this might also cause UAF
if the child does not have DST_NOCOUNT set.
IPSEC maintainers might take a look and see how to address this.
2) There is also discussion about removing this count of dst,
which might happen in future kernels.
Fixes: f88649721268 ("ipv4: fix dst race in sk_dst_get()")
Closes: https://lore.kernel.org/lkml/CANn89iLCCGsP7SFn9HKpvnKu96Td4KD08xf7aGtiYgZnkjaL=w@mail.gmail.com/T/
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20241008143110.1064899-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-10-08 (ice, i40e, igb, e1000e)
This series contains updates to ice, i40e, igb, and e1000e drivers.
For ice:
Marcin allows driver to load, into safe mode, when DDP package is
missing or corrupted and adjusts the netif_is_ice() check to
account for when the device is in safe mode. He also fixes an
out-of-bounds issue when MSI-X are increased for VFs.
Wojciech clears FDB entries on reset to match the hardware state.
For i40e:
Aleksandr adds locking around MACVLAN filters to prevent memory leaks
due to concurrency issues.
For igb:
Mohamed Khalfella adds a check to not attempt to bring up an already
running interface on non-fatal PCIe errors.
For e1000e:
Vitaly changes board type for I219 to more closely match the hardware
and stop PHY issues.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000e: change I219 (19) devices to ADP
igb: Do not bring the device up after non-fatal error
i40e: Fix macvlan leak by synchronizing access to mac_filter_hash
ice: Fix increasing MSI-X on VF
ice: Flush FDB entries before reset
ice: Fix netif_is_ice() in Safe Mode
ice: Fix entering Safe Mode
====================
Link: https://patch.msgid.link/20241008230050.928245-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Matthieu Baerts says:
====================
mptcp: misc. fixes involving fallback to TCP
- Patch 1: better handle DSS corruptions from a bugged peer: reducing
warnings, doing a fallback or a reset depending on the subflow state.
For >= v5.7.
- Patch 2: fix DSS corruption due to large pmtu xmit, where MPTCP was
not taken into account. For >= v5.6.
- Patch 3: fallback when MPTCP opts are dropped after the first data
packet, instead of resetting the connection. For >= v5.6.
- Patch 4: restrict the removal of a subflow to other closing states, a
better fix, for a recent one. For >= v5.10.
====================
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-0-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In a previous fix, the in-kernel path-manager has been modified not to
retrigger the removal of a subflow if it was already closed, e.g. when
the initial subflow is removed, but kept in the subflows list.
To be complete, this fix should also skip the subflows that are in any
closing state: mptcp_close_ssk() will initiate the closure, but the
switch to the TCP_CLOSE state depends on the other peer.
Fixes: 58e1b66b4e4b ("mptcp: pm: do not remove already closed subflows")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-4-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As reported by Christoph [1], before this patch, an MPTCP connection was
wrongly reset when a host received a first data packet with MPTCP
options after the 3wHS, but got the next ones without.
According to the MPTCP v1 specs [2], a fallback should happen in this
case, because the host didn't receive a DATA_ACK from the other peer,
nor receive data for more than the initial window which implies a
DATA_ACK being received by the other peer.
The patch here re-uses the same logic as the one used in other places:
by looking at allow_infinite_fallback, which is disabled at the creation
of an additional subflow. It's not looking at the first DATA_ACK (or
implying one received from the other side) as suggested by the RFC, but
it is in continuation with what was already done, which is safer, and it
fixes the reported issue. The next step, looking at this first DATA_ACK,
is tracked in [4].
This patch has been validated using the following Packetdrill script:
0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
// 3WHS is OK
+0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
+0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
+0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
+0 accept(3, ..., ...) = 4
// Data from the client with valid MPTCP options (no DATA_ACK: normal)
+0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
// From here, the MPTCP options will be dropped by a middlebox
+0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs>
+0.1 read(4, ..., 500) = 500
+0 write(4, ..., 100) = 100
// The server replies with data, still thinking MPTCP is being used
+0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
// But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
+0.1 < . 501:501(0) ack 101 win 2048
+0.0 < P. 501:601(100) ack 101 win 2048
// The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window
+0.0 > . 101:101(0) ack 601
Note that this script requires Packetdrill with MPTCP support, see [3].
Fixes: dea2b1ea9c70 ("mptcp: do not reset MP_CAPABLE subflow on mapping errors")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/518 [1]
Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2]
Link: https://github.com/multipath-tcp/packetdrill [3]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/519 [4]
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-3-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Syzkaller was able to trigger a DSS corruption:
TCP: request_sock_subflow_v4: Possible SYN flooding on port [::]:20002. Sending cookies.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5227 at net/mptcp/protocol.c:695 __mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695
Modules linked in:
CPU: 0 UID: 0 PID: 5227 Comm: syz-executor350 Not tainted 6.11.0-syzkaller-08829-gaf9c191ac2a0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
RIP: 0010:__mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695
Code: 0f b6 dc 31 ff 89 de e8 b5 dd ea f5 89 d8 48 81 c4 50 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 98 da ea f5 90 <0f> 0b 90 e9 47 ff ff ff e8 8a da ea f5 90 0f 0b 90 e9 99 e0 ff ff
RSP: 0018:ffffc90000006db8 EFLAGS: 00010246
RAX: ffffffff8ba9df18 RBX: 00000000000055f0 RCX: ffff888030023c00
RDX: 0000000000000100 RSI: 00000000000081e5 RDI: 00000000000055f0
RBP: 1ffff110062bf1ae R08: ffffffff8ba9cf12 R09: 1ffff110062bf1b8
R10: dffffc0000000000 R11: ffffed10062bf1b9 R12: 0000000000000000
R13: dffffc0000000000 R14: 00000000700cec61 R15: 00000000000081e5
FS: 000055556679c380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020287000 CR3: 0000000077892000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
move_skbs_to_msk net/mptcp/protocol.c:811 [inline]
mptcp_data_ready+0x29c/0xa90 net/mptcp/protocol.c:854
subflow_data_ready+0x34a/0x920 net/mptcp/subflow.c:1490
tcp_data_queue+0x20fd/0x76c0 net/ipv4/tcp_input.c:5283
tcp_rcv_established+0xfba/0x2020 net/ipv4/tcp_input.c:6237
tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915
tcp_v4_rcv+0x2dc0/0x37f0 net/ipv4/tcp_ipv4.c:2350
ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233
NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
__netif_receive_skb_one_core net/core/dev.c:5662 [inline]
__netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775
process_backlog+0x662/0x15b0 net/core/dev.c:6107
__napi_poll+0xcb/0x490 net/core/dev.c:6771
napi_poll net/core/dev.c:6840 [inline]
net_rx_action+0x89b/0x1240 net/core/dev.c:6962
handle_softirqs+0x2c5/0x980 kernel/softirq.c:554
do_softirq+0x11b/0x1e0 kernel/softirq.c:455
</IRQ>
<TASK>
__local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
__dev_queue_xmit+0x1764/0x3e80 net/core/dev.c:4451
dev_queue_xmit include/linux/netdevice.h:3094 [inline]
neigh_hh_output include/net/neighbour.h:526 [inline]
neigh_output include/net/neighbour.h:540 [inline]
ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:236
ip_local_out net/ipv4/ip_output.c:130 [inline]
__ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:536
__tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline]
tcp_mtu_probe net/ipv4/tcp_output.c:2547 [inline]
tcp_write_xmit+0x641d/0x6bf0 net/ipv4/tcp_output.c:2752
__tcp_push_pending_frames+0x9b/0x360 net/ipv4/tcp_output.c:3015
tcp_push_pending_frames include/net/tcp.h:2107 [inline]
tcp_data_snd_check net/ipv4/tcp_input.c:5714 [inline]
tcp_rcv_established+0x1026/0x2020 net/ipv4/tcp_input.c:6239
tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915
sk_backlog_rcv include/net/sock.h:1113 [inline]
__release_sock+0x214/0x350 net/core/sock.c:3072
release_sock+0x61/0x1f0 net/core/sock.c:3626
mptcp_push_release net/mptcp/protocol.c:1486 [inline]
__mptcp_push_pending+0x6b5/0x9f0 net/mptcp/protocol.c:1625
mptcp_sendmsg+0x10bb/0x1b10 net/mptcp/protocol.c:1903
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x1a6/0x270 net/socket.c:745
____sys_sendmsg+0x52a/0x7e0 net/socket.c:2603
___sys_sendmsg net/socket.c:2657 [inline]
__sys_sendmsg+0x2aa/0x390 net/socket.c:2686
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb06e9317f9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe2cfd4f98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fb06e97f468 RCX: 00007fb06e9317f9
RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000005
RBP: 00007fb06e97f446 R08: 0000555500000000 R09: 0000555500000000
R10: 0000555500000000 R11: 0000000000000246 R12: 00007fb06e97f406
R13: 0000000000000001 R14: 00007ffe2cfd4fe0 R15: 0000000000000003
</TASK>
Additionally syzkaller provided a nice reproducer. The repro enables
pmtu on the loopback device, leading to tcp_mtu_probe() generating
very large probe packets.
tcp_can_coalesce_send_queue_head() currently does not check for
mptcp-level invariants, and allowed the creation of cross-DSS probes,
leading to the mentioned corruption.
Address the issue teaching tcp_can_coalesce_send_queue_head() about
mptcp using the tcp_skb_can_collapse(), also reducing the code
duplication.
Fixes: 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions")
Cc: stable@vger.kernel.org
Reported-by: syzbot+d1bff73460e33101f0e7@syzkaller.appspotmail.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/513
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-2-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Bugged peer implementation can send corrupted DSS options, consistently
hitting a few warning in the data path. Use DEBUG_NET assertions, to
avoid the splat on some builds and handle consistently the error, dumping
related MIBs and performing fallback and/or reset according to the
subflow type.
Fixes: 6771bfd9ee24 ("mptcp: update mptcp ack sequence from work queue")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-1-c6fb8e93e551@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A warning is triggered when there is insufficient space in the buffer
for userdata. However, this is not an issue since userdata will be sent
in the next iteration.
Current warning message:
------------[ cut here ]------------
WARNING: CPU: 13 PID: 3013042 at drivers/net/netconsole.c:1122 write_ext_msg+0x3b6/0x3d0
? write_ext_msg+0x3b6/0x3d0
console_flush_all+0x1e9/0x330
The code incorrectly issues a warning when this_chunk is zero, which is
a valid scenario. The warning should only be triggered when this_chunk
is negative.
Fixes: 1ec9daf95093 ("net: netconsole: append userdata to fragmented netconsole messages")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241008094325.896208-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In case of a tc mirred action from one switch to another, the behavior
is not correct. We simply tell the source switch driver to program a
mirroring entry towards mirror->to_local_port = to_dp->index, but it is
not even guaranteed that the to_dp belongs to the same switch as dp.
For proper cross-chip support, we would need to go through the
cross-chip notifier layer in switch.c, program the entry on cascade
ports, and introduce new, explicit API for cross-chip mirroring, given
that intermediary switches should have introspection into the DSA tags
passed through the cascade port (and not just program a port mirror on
the entire cascade port). None of that exists today.
Reject what is not implemented so that user space is not misled into
thinking it works.
Fixes: f50f212749e8 ("net: dsa: Add plumbing for port mirroring")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241008094320.3340980-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some platforms (such as i.MX25 and i.MX27) do not support PTP, so on
these platforms fec_ptp_init() is not called and the related members
in fep are not initialized. However, fec_ptp_save_state() is called
unconditionally, which causes the kernel to panic. Therefore, add a
condition so that fec_ptp_save_state() is not called if PTP is not
supported.
Fixes: a1477dc87dc4 ("net: fec: Restart PPS after link state change")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/lkml/353e41fe-6bb4-4ee9-9980-2da2a9c1c508@roeck-us.net/
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Csókás, Bence <csokas.bence@prolan.hu>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20241008061153.1977930-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It's done in probe so it should be undone here.
Fixes: 1d3bb996481e ("Device tree aware EMAC driver")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20241008233050.9422-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add error handling from calling fixed_phy_register.
It may return some error, therefore, need to check the status.
And fixed_phy_register needs to bind a device node for mdio.
Add the mac device node for fixed_phy_register function.
This is a reference to this function, of_phy_register_fixed_link().
Fixes: e24a6c874601 ("net: ftgmac100: Get link speed and duplex for NC-SI")
Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com>
Link: https://patch.msgid.link/20241007032435.787892-1-jacky_chou@aspeedtech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
meta iifname veth0 ip daddr ... fib daddr oif
... is expected to return "dummy0" interface which is part of same vrf
as veth0.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We need to init l3mdev unconditionally, else main routing table is searched
and incorrect result is returned unless strict (iif keyword) matching is
requested.
Next patch adds a selftest for this.
Fixes: 2a8a7c0eaa87 ("netfilter: nft_fib: Fix for rpath check with VRF devices")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1761
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
syzbot managed to call xt_cluster match via ebtables:
WARNING: CPU: 0 PID: 11 at net/netfilter/xt_cluster.c:72 xt_cluster_mt+0x196/0x780
[..]
ebt_do_table+0x174b/0x2a40
Module registers to NFPROTO_UNSPEC, but it assumes ipv4/ipv6 packet
processing. As this is only useful to restrict locally terminating
TCP/UDP traffic, register this for ipv4 and ipv6 family only.
Pablo points out that this is a general issue, direct users of the
set/getsockopt interface can call into targets/matches that were only
intended for use with ip(6)tables.
Check all UNSPEC matches and targets for similar issues:
- matches and targets are fine except if they assume skb_network_header()
is valid -- this is only true when called from inet layer: ip(6) stack
pulls the ip/ipv6 header into linear data area.
- targets that return XT_CONTINUE or other xtables verdicts must be
restricted too, they are incompatbile with the ebtables traverser, e.g.
EBT_CONTINUE is a completely different value than XT_CONTINUE.
Most matches/targets are changed to register for NFPROTO_IPV4/IPV6, as
they are provided for use by ip(6)tables.
The MARK target is also used by arptables, so register for NFPROTO_ARP too.
While at it, bail out if connbytes fails to enable the corresponding
conntrack family.
This change passes the selftests in iptables.git.
Reported-by: syzbot+256c348558aa5cf611a9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netfilter-devel/66fec2e2.050a0220.9ec68.0047.GAE@google.com/
Fixes: 0269ea493734 ("netfilter: xtables: add cluster match")
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Yisen Zhuang has left the company in September.
Jian Shen will be responsible for maintaining the
hns3/hns driver's code in the future,
so add Jian Shen to the hns3/hns driver's matainer list.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If hashing fails in sctp_listen_start(), the socket remains in the
LISTENING state, even though it was not added to the hash table.
This can lead to a scenario where a socket appears to be listening
without actually being accessible.
This patch ensures that if the hashing operation fails, the sk_state
is set back to CLOSED before returning an error.
Note that there is no need to undo the autobind operation if hashing
fails, as the bind port can still be used for next listen() call on
the same socket.
Fixes: 76c6d988aeb3 ("sctp: add sock_reuseport for the sock in __sctp_hash_endpoint")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently this driver prints this line with what looks like
a rogue format specifier when the device is probed:
[ 2.840000] eth%d: MVME147 at 0xfffe1800, irq 12, Hardware Address xx:xx:xx:xx:xx:xx
Change the printk() for netdev_info() and move it after the
registration has completed so it prints out the name of the
interface properly.
Signed-off-by: Daniel Palmer <daniel@0x0f.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All MMD reads return 0 for the RTL8126A-integrated PHY. Therefore phylib
assumes it doesn't support EEE, what results in higher power consumption,
and a significantly higher chip temperature in my case.
To fix this split out the PHY driver for the RTL8126A-integrated PHY
and set the read_mmd/write_mmd callbacks to read from vendor-specific
registers.
Fixes: 5befa3728b85 ("net: phy: realtek: add support for RTL8126A-integrated 5Gbps PHY")
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The VLAN table is a shared memory between the two ports/slices
in a ICSSG cluster and this may lead to race condition when the
common code paths for both ports are executed in different CPUs.
Fix the race condition access by locking the shared memory access
Fixes: 487f7323f39a ("net: ti: icssg-prueth: Add helper functions to configure FDB")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dcr_map is called in the previous if and therefore needs to be unmapped.
Fixes: 1ff0fcfcb1a6 ("ibm_newemac: Fix new MAL feature handling")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241007235711.5714-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Most qdiscs maintain their backlog using qdisc_pkt_len(skb)
on the assumption it is invariant between the enqueue()
and dequeue() handlers.
Unfortunately syzbot can crash a host rather easily using
a TBF + SFQ combination, with an STAB on SFQ [1]
We can't support TCA_STAB on arbitrary level, this would
require to maintain per-qdisc storage.
[1]
[ 88.796496] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 88.798611] #PF: supervisor read access in kernel mode
[ 88.799014] #PF: error_code(0x0000) - not-present page
[ 88.799506] PGD 0 P4D 0
[ 88.799829] Oops: Oops: 0000 [#1] SMP NOPTI
[ 88.800569] CPU: 14 UID: 0 PID: 2053 Comm: b371744477 Not tainted 6.12.0-rc1-virtme #1117
[ 88.801107] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 88.801779] RIP: 0010:sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq
[ 88.802544] Code: 0f b7 50 12 48 8d 04 d5 00 00 00 00 48 89 d6 48 29 d0 48 8b 91 c0 01 00 00 48 c1 e0 03 48 01 c2 66 83 7a 1a 00 7e c0 48 8b 3a <4c> 8b 07 4c 89 02 49 89 50 08 48 c7 47 08 00 00 00 00 48 c7 07 00
All code
========
0: 0f b7 50 12 movzwl 0x12(%rax),%edx
4: 48 8d 04 d5 00 00 00 lea 0x0(,%rdx,8),%rax
b: 00
c: 48 89 d6 mov %rdx,%rsi
f: 48 29 d0 sub %rdx,%rax
12: 48 8b 91 c0 01 00 00 mov 0x1c0(%rcx),%rdx
19: 48 c1 e0 03 shl $0x3,%rax
1d: 48 01 c2 add %rax,%rdx
20: 66 83 7a 1a 00 cmpw $0x0,0x1a(%rdx)
25: 7e c0 jle 0xffffffffffffffe7
27: 48 8b 3a mov (%rdx),%rdi
2a:* 4c 8b 07 mov (%rdi),%r8 <-- trapping instruction
2d: 4c 89 02 mov %r8,(%rdx)
30: 49 89 50 08 mov %rdx,0x8(%r8)
34: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi)
3b: 00
3c: 48 rex.W
3d: c7 .byte 0xc7
3e: 07 (bad)
...
Code starting with the faulting instruction
===========================================
0: 4c 8b 07 mov (%rdi),%r8
3: 4c 89 02 mov %r8,(%rdx)
6: 49 89 50 08 mov %rdx,0x8(%r8)
a: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi)
11: 00
12: 48 rex.W
13: c7 .byte 0xc7
14: 07 (bad)
...
[ 88.803721] RSP: 0018:ffff9a1f892b7d58 EFLAGS: 00000206
[ 88.804032] RAX: 0000000000000000 RBX: ffff9a1f8420c800 RCX: ffff9a1f8420c800
[ 88.804560] RDX: ffff9a1f81bc1440 RSI: 0000000000000000 RDI: 0000000000000000
[ 88.805056] RBP: ffffffffc04bb0e0 R08: 0000000000000001 R09: 00000000ff7f9a1f
[ 88.805473] R10: 000000000001001b R11: 0000000000009a1f R12: 0000000000000140
[ 88.806194] R13: 0000000000000001 R14: ffff9a1f886df400 R15: ffff9a1f886df4ac
[ 88.806734] FS: 00007f445601a740(0000) GS:ffff9a2e7fd80000(0000) knlGS:0000000000000000
[ 88.807225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 88.807672] CR2: 0000000000000000 CR3: 000000050cc46000 CR4: 00000000000006f0
[ 88.808165] Call Trace:
[ 88.808459] <TASK>
[ 88.808710] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
[ 88.809261] ? page_fault_oops (arch/x86/mm/fault.c:715)
[ 88.809561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
[ 88.809806] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
[ 88.810074] ? sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq
[ 88.810411] sfq_reset (net/sched/sch_sfq.c:525) sch_sfq
[ 88.810671] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036)
[ 88.810950] tbf_reset (./include/linux/timekeeping.h:169 net/sched/sch_tbf.c:334) sch_tbf
[ 88.811208] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036)
[ 88.811484] netif_set_real_num_tx_queues (./include/linux/spinlock.h:396 ./include/net/sch_generic.h:768 net/core/dev.c:2958)
[ 88.811870] __tun_detach (drivers/net/tun.c:590 drivers/net/tun.c:673)
[ 88.812271] tun_chr_close (drivers/net/tun.c:702 drivers/net/tun.c:3517)
[ 88.812505] __fput (fs/file_table.c:432 (discriminator 1))
[ 88.812735] task_work_run (kernel/task_work.c:230)
[ 88.813016] do_exit (kernel/exit.c:940)
[ 88.813372] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4))
[ 88.813639] ? handle_mm_fault (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/memcontrol.h:1022 ./include/linux/memcontrol.h:1045 ./include/linux/memcontrol.h:1052 mm/memory.c:5928 mm/memory.c:6088)
[ 88.813867] do_group_exit (kernel/exit.c:1070)
[ 88.814138] __x64_sys_exit_group (kernel/exit.c:1099)
[ 88.814490] x64_sys_call (??:?)
[ 88.814791] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
[ 88.815012] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 88.815495] RIP: 0033:0x7f44560f1975
Fixes: 175f9c1bba9b ("net_sched: Add size table for qdiscs")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20241007184130.3960565-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Sporadic issues, such as PHY access loss, have been observed on I219 (19)
devices. It was found that these devices have hardware more closely
related to ADP than MTP and the issues were caused by taking MTP-specific
flows.
Change the MAC and board types of these devices from MTP to ADP to
correctly reflect the LAN hardware, and flows, of these devices.
Fixes: db2d737d63c5 ("e1000e: Separate MTP board type from ADP")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Commit 004d25060c78 ("igb: Fix igb_down hung on surprise removal")
changed igb_io_error_detected() to ignore non-fatal pcie errors in order
to avoid hung task that can happen when igb_down() is called multiple
times. This caused an issue when processing transient non-fatal errors.
igb_io_resume(), which is called after igb_io_error_detected(), assumes
that device is brought down by igb_io_error_detected() if the interface
is up. This resulted in panic with stacktrace below.
[ T3256] igb 0000:09:00.0 haeth0: igb: haeth0 NIC Link is Down
[ T292] pcieport 0000:00:1c.5: AER: Uncorrected (Non-Fatal) error received: 0000:09:00.0
[ T292] igb 0000:09:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ T292] igb 0000:09:00.0: device [8086:1537] error status/mask=00004000/00000000
[ T292] igb 0000:09:00.0: [14] CmpltTO [ 200.105524,009][ T292] igb 0000:09:00.0: AER: TLP Header: 00000000 00000000 00000000 00000000
[ T292] pcieport 0000:00:1c.5: AER: broadcast error_detected message
[ T292] igb 0000:09:00.0: Non-correctable non-fatal error reported.
[ T292] pcieport 0000:00:1c.5: AER: broadcast mmio_enabled message
[ T292] pcieport 0000:00:1c.5: AER: broadcast resume message
[ T292] ------------[ cut here ]------------
[ T292] kernel BUG at net/core/dev.c:6539!
[ T292] invalid opcode: 0000 [#1] PREEMPT SMP
[ T292] RIP: 0010:napi_enable+0x37/0x40
[ T292] Call Trace:
[ T292] <TASK>
[ T292] ? die+0x33/0x90
[ T292] ? do_trap+0xdc/0x110
[ T292] ? napi_enable+0x37/0x40
[ T292] ? do_error_trap+0x70/0xb0
[ T292] ? napi_enable+0x37/0x40
[ T292] ? napi_enable+0x37/0x40
[ T292] ? exc_invalid_op+0x4e/0x70
[ T292] ? napi_enable+0x37/0x40
[ T292] ? asm_exc_invalid_op+0x16/0x20
[ T292] ? napi_enable+0x37/0x40
[ T292] igb_up+0x41/0x150
[ T292] igb_io_resume+0x25/0x70
[ T292] report_resume+0x54/0x70
[ T292] ? report_frozen_detected+0x20/0x20
[ T292] pci_walk_bus+0x6c/0x90
[ T292] ? aer_print_port_info+0xa0/0xa0
[ T292] pcie_do_recovery+0x22f/0x380
[ T292] aer_process_err_devices+0x110/0x160
[ T292] aer_isr+0x1c1/0x1e0
[ T292] ? disable_irq_nosync+0x10/0x10
[ T292] irq_thread_fn+0x1a/0x60
[ T292] irq_thread+0xe3/0x1a0
[ T292] ? irq_set_affinity_notifier+0x120/0x120
[ T292] ? irq_affinity_notify+0x100/0x100
[ T292] kthread+0xe2/0x110
[ T292] ? kthread_complete_and_exit+0x20/0x20
[ T292] ret_from_fork+0x2d/0x50
[ T292] ? kthread_complete_and_exit+0x20/0x20
[ T292] ret_from_fork_asm+0x11/0x20
[ T292] </TASK>
To fix this issue igb_io_resume() checks if the interface is running and
the device is not down this means igb_io_error_detected() did not bring
the device down and there is no need to bring it up.
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com>
Fixes: 004d25060c78 ("igb: Fix igb_down hung on surprise removal")
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This patch addresses a macvlan leak issue in the i40e driver caused by
concurrent access to vsi->mac_filter_hash. The leak occurs when multiple
threads attempt to modify the mac_filter_hash simultaneously, leading to
inconsistent state and potential memory leaks.
To fix this, we now wrap the calls to i40e_del_mac_filter() and zeroing
vf->default_lan_addr.addr with spin_lock/unlock_bh(&vsi->mac_filter_hash_lock),
ensuring atomic operations and preventing concurrent access.
Additionally, we add lockdep_assert_held(&vsi->mac_filter_hash_lock) in
i40e_add_mac_filter() to help catch similar issues in the future.
Reproduction steps:
1. Spawn VFs and configure port vlan on them.
2. Trigger concurrent macvlan operations (e.g., adding and deleting
portvlan and/or mac filters).
3. Observe the potential memory leak and inconsistent state in the
mac_filter_hash.
This synchronization ensures the integrity of the mac_filter_hash and prevents
the described leak.
Fixes: fed0d9f13266 ("i40e: Fix VF's MAC Address change on VM")
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Increasing MSI-X value on a VF leads to invalid memory operations. This
is caused by not reallocating some arrays.
Reproducer:
modprobe ice
echo 0 > /sys/bus/pci/devices/$PF_PCI/sriov_drivers_autoprobe
echo 1 > /sys/bus/pci/devices/$PF_PCI/sriov_numvfs
echo 17 > /sys/bus/pci/devices/$VF0_PCI/sriov_vf_msix_count
Default MSI-X is 16, so 17 and above triggers this issue.
KASAN reports:
BUG: KASAN: slab-out-of-bounds in ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
Read of size 8 at addr ffff8888b937d180 by task bash/28433
(...)
Call Trace:
(...)
? ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
kasan_report+0xed/0x120
? ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
ice_vsi_cfg_def+0x3360/0x4770 [ice]
? mutex_unlock+0x83/0xd0
? __pfx_ice_vsi_cfg_def+0x10/0x10 [ice]
? __pfx_ice_remove_vsi_lkup_fltr+0x10/0x10 [ice]
ice_vsi_cfg+0x7f/0x3b0 [ice]
ice_vf_reconfig_vsi+0x114/0x210 [ice]
ice_sriov_set_msix_vec_count+0x3d0/0x960 [ice]
sriov_vf_msix_count_store+0x21c/0x300
(...)
Allocated by task 28201:
(...)
ice_vsi_cfg_def+0x1c8e/0x4770 [ice]
ice_vsi_cfg+0x7f/0x3b0 [ice]
ice_vsi_setup+0x179/0xa30 [ice]
ice_sriov_configure+0xcaa/0x1520 [ice]
sriov_numvfs_store+0x212/0x390
(...)
To fix it, use ice_vsi_rebuild() instead of ice_vf_reconfig_vsi(). This
causes the required arrays to be reallocated taking the new queue count
into account (ice_vsi_realloc_stat_arrays()). Set req_txq and req_rxq
before ice_vsi_rebuild(), so that realloc uses the newly set queue
count.
Additionally, ice_vsi_rebuild() does not remove VSI filters
(ice_fltr_remove_all()), so ice_vf_init_host_cfg() is no longer
necessary.
Reported-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 2a2cb4c6c181 ("ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Triggering the reset while in switchdev mode causes
errors[1]. Rules are already removed by this time
because switch content is flushed in case of the reset.
This means that rules were deleted from HW but SW
still thinks they exist so when we get
SWITCHDEV_FDB_DEL_TO_DEVICE notification we try to
delete not existing rule.
We can avoid these errors by clearing the rules
early in the reset flow before they are removed from HW.
Switchdev API will get notified that the rule was removed
so we won't get SWITCHDEV_FDB_DEL_TO_DEVICE notification.
Remove unnecessary ice_clear_sw_switch_recipes.
[1]
ice 0000:01:00.0: Failed to delete FDB forward rule, err: -2
ice 0000:01:00.0: Failed to delete FDB guard rule, err: -2
Fixes: 7c945a1a8e5f ("ice: Switchdev FDB events support")
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
netif_is_ice() works by checking the pointer to netdev ops. However, it
only checks for the default ice_netdev_ops, not ice_netdev_safe_mode_ops,
so in Safe Mode it always returns false, which is unintuitive. While it
doesn't look like netif_is_ice() is currently being called anywhere in Safe
Mode, this could change and potentially lead to unexpected behaviour.
Fixes: df006dd4b1dc ("ice: Add initial support framework for LAG")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
If DDP package is missing or corrupted, the driver should enter Safe Mode.
Instead, an error is returned and probe fails.
To fix this, don't exit init if ice_init_ddp_config() returns an error.
Repro:
* Remove or rename DDP package (/lib/firmware/intel/ice/ddp/ice.pkg)
* Load ice
Fixes: cc5776fe1832 ("ice: Enable switching default Tx scheduler topology")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Starting with 6.12 commit 85585b4bc8d8 ("selftests: add ncdevmem, netcat
for devmem TCP") kselftest-all creates additional outputs that
kselftest-clean does not cleanup:
$ make defconfig
$ make kselftest-all
$ make kselftest-clean
$ git clean -ndxf | grep tools/net
Would remove tools/net/ynl/lib/__pycache__/
Would remove tools/net/ynl/lib/ynl.a
Would remove tools/net/ynl/lib/ynl.d
Would remove tools/net/ynl/lib/ynl.o
Make kselftest-clean remove the newly added net/ynl outputs.
Fixes: 85585b4bc8d8 ("selftests: add ncdevmem, netcat for devmem TCP")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://patch.msgid.link/20241005215600.852260-1-gthelen@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Javier Carrasco says:
====================
selftests: net: add missing gitignore and EXTRA_CLEAN entries.
This series is a cherry-pick on top of v6.12-rc1 from the one I sent
for selftests with other patches that were not net-related:
https://lore.kernel.org/all/20240925-selftests-gitignore-v3-0-9db896474170@gmail.com/
The patches have not been modified, and the Reviewed-by tags have
been kept.
v1: https://lore.kernel.org/20240930-net-selftests-gitignore-v1-0-65225a855946@gmail.com
====================
Link: https://patch.msgid.link/20241005-net-selftests-gitignore-v2-0-3a0b2876394a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The generated include.sh should be ignored by git. Create a new
gitignore and add the file to the list.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Link: https://patch.msgid.link/20241005-net-selftests-gitignore-v2-3-3a0b2876394a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The include.sh file is generated when building the net/rds selftests,
but there is no rule to delete it with the clean target. Add the file to
EXTRA_CLEAN in order to remove it when required.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Link: https://patch.msgid.link/20241005-net-selftests-gitignore-v2-2-3a0b2876394a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This executable is missing from the corresponding gitignore file.
Add msg_oob to the net gitignore list.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Link: https://patch.msgid.link/20241005-net-selftests-gitignore-v2-1-3a0b2876394a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jonas Gorski says:
====================
net: dsa: b53: assorted jumbo frame fixes
While investigating the capabilities of BCM63XX's integrated switch and
its DMA engine, I noticed a few issues in b53's jumbo frame code.
Mostly a confusion of MTU vs frame length, but also a few missing cases
for 100M switches.
Tested on BCM63XX and BCM53115 with intel 1G and realtek 1G NICs,
which support MTUs of 9000 or slightly above, but significantly less
than the 9716/9720 supported by BCM53115, so I couldn't verify the
actual maximum frame length.
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
---
====================
Link: https://patch.msgid.link/20241004-b53_jumbo_fixes-v1-0-ce1e54aa7b3c@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
All modern chips support and need the 10_100 bit set for supporting jumbo
frames on 10/100 ports, so instead of enabling it only for 583XX enable
it for everything except bcm63xx, where the bit is writeable, but does
nothing.
Tested on BCM53115, where jumbo frames were dropped at 10/100 speeds
without the bit set.
Fixes: 6ae5834b983a ("net: dsa: b53: add MTU configuration support")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|