Age | Commit message (Collapse) | Author |
|
[ Upstream commit 719817cd293e4fa389e1f69c396f3f816ed5aa41 ]
inet_select_addr() must use RCU protection to make
sure the net structure it reads does not disappear.
Fixes: c4544c724322 ("[NETNS]: Process inet_select_addr inside a namespace.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250205155120.1676781-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dd205fcc33d92d54eee4d7f21bb073af9bd5ce2b ]
rt_is_expired() must use RCU protection to make
sure the net structure it reads does not disappear.
Fixes: e84f84f27647 ("netns: place rt_genid into struct net")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250205155120.1676781-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 71b8471c93fa0bcab911fcb65da1eb6c4f5f735f ]
ipv4_default_advmss() must use RCU protection to make
sure the net structure it reads does not disappear.
Fixes: 2e9589ff809e ("ipv4: Namespaceify min_adv_mss sysctl knob")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250205155120.1676781-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
length zero
commit 44de577e61ed239db09f0da9d436866bef9b77dd upstream.
The J1939 standard requires the transmission of messages of length 0.
For example proprietary messages are specified with a data length of 0
to 1785. The transmission of such messages is not possible. Sending
results in no error being returned but no corresponding can frame
being generated.
Enable the transmission of zero length J1939 messages. In order to
facilitate this two changes are necessary:
1) If the transmission of a new message is requested from user space
the message is segmented in j1939_sk_send_loop(). Let the segmentation
take into account zero length messages, do not terminate immediately,
queue the corresponding skb.
2) j1939_session_skb_get_by_offset() selects the next skb to transmit
for a session. Take into account that there might be zero length skbs
in the queue.
Signed-off-by: Alexander Hölzl <alexander.hoelzl@gmx.net>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250205174651.103238-1-alexander.hoelzl@gmx.net
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Cc: stable@vger.kernel.org
[mkl: commit message rephrased]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8c8ecc98f5c65947b0070a24bac11e12e47cc65d upstream.
The ELP worker needs to calculate new metric values for all neighbors
"reachable" over an interface. Some of the used metric sources require
locks which might need to sleep. This sleep is incompatible with the RCU
list iterator used for the recorded neighbors. The initial approach to work
around of this problem was to queue another work item per neighbor and then
run this in a new context.
Even when this solved the RCU vs might_sleep() conflict, it has a major
problems: Nothing was stopping the work item in case it is not needed
anymore - for example because one of the related interfaces was removed or
the batman-adv module was unloaded - resulting in potential invalid memory
accesses.
Directly canceling the metric worker also has various problems:
* cancel_work_sync for a to-be-deactivated interface is called with
rtnl_lock held. But the code in the ELP metric worker also tries to use
rtnl_lock() - which will never return in this case. This also means that
cancel_work_sync would never return because it is waiting for the worker
to finish.
* iterating over the neighbor list for the to-be-deactivated interface is
currently done using the RCU specific methods. Which means that it is
possible to miss items when iterating over it without the associated
spinlock - a behaviour which is acceptable for a periodic metric check
but not for a cleanup routine (which must "stop" all still running
workers)
The better approch is to get rid of the per interface neighbor metric
worker and handle everything in the interface worker. The original problems
are solved by:
* creating a list of neighbors which require new metric information inside
the RCU protected context, gathering the metric according to the new list
outside the RCU protected context
* only use rcu_trylock inside metric gathering code to avoid a deadlock
when the cancel_delayed_work_sync is called in the interface removal code
(which is called with the rtnl_lock held)
Cc: stable@vger.kernel.org
Fixes: c833484e5f38 ("batman-adv: ELP - compute the metric based on the estimated throughput")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e7e34ffc976aaae4f465b7898303241b81ceefc3 upstream.
If a temporary error happened in the evaluation of the neighbor throughput
information, then the invalid throughput result should not be stored in the
throughtput EWMA.
Cc: stable@vger.kernel.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ccb7276a6d26d6f8416e315b43b45e15ee7f29e2 upstream.
Reference counting is used to ensure that
batadv_hardif_neigh_node and batadv_hard_iface
are not freed before/during
batadv_v_elp_throughput_metric_update work is
finished.
But there isn't a guarantee that the hard if will
remain associated with a soft interface up until
the work is finished.
This fixes a crash triggered by reboot that looks
like this:
Call trace:
batadv_v_mesh_free+0xd0/0x4dc [batman_adv]
batadv_v_elp_throughput_metric_update+0x1c/0xa4
process_one_work+0x178/0x398
worker_thread+0x2e8/0x4d0
kthread+0xd8/0xdc
ret_from_fork+0x10/0x20
(the batadv_v_mesh_free call is misleading,
and does not actually happen)
I was able to make the issue happen more reliably
by changing hardif_neigh->bat_v.metric_work work
to be delayed work. This allowed me to track down
and confirm the fix.
Cc: stable@vger.kernel.org
Fixes: c833484e5f38 ("batman-adv: ELP - compute the metric based on the estimated throughput")
Signed-off-by: Andy Strohman <andrew@andrewstrohman.com>
[sven@narfation.org: prevent entering batadv_v_elp_get_throughput without
soft_iface]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 48145a57d4bbe3496e8e4880b23ea6b511e6e519 ]
ndisc_send_redirect() is called under RCU protection, not RTNL.
It must use dev_get_by_index_rcu() instead of __dev_get_by_index()
Fixes: 2f17becfbea5 ("vrf: check the original netdevice for generating redirect")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250207135841.1948589-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bca0902e61731a75fc4860c8720168d9f1bae3b6 ]
If an AX25 device is bound to a socket by setting the SO_BINDTODEVICE
socket option, a refcount leak will occur in ax25_release().
Commit 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
added decrement of device refcounts in ax25_release(). In order for that
to work correctly the refcounts must already be incremented when the
device is bound to the socket. An AX25 device can be bound to a socket
by either calling ax25_bind() or setting SO_BINDTODEVICE socket option.
In both cases the refcounts should be incremented, but in fact it is done
only in ax25_bind().
This bug leads to the following issue reported by Syzkaller:
================================================================
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 1 PID: 5932 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
Modules linked in:
CPU: 1 UID: 0 PID: 5932 Comm: syz-executor424 Not tainted 6.13.0-rc4-syzkaller-00110-g4099a71718b0 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
Call Trace:
<TASK>
__refcount_dec include/linux/refcount.h:336 [inline]
refcount_dec include/linux/refcount.h:351 [inline]
ref_tracker_free+0x710/0x820 lib/ref_tracker.c:236
netdev_tracker_free include/linux/netdevice.h:4156 [inline]
netdev_put include/linux/netdevice.h:4173 [inline]
netdev_put include/linux/netdevice.h:4169 [inline]
ax25_release+0x33f/0xa10 net/ax25/af_ax25.c:1069
__sock_release+0xb0/0x270 net/socket.c:640
sock_close+0x1c/0x30 net/socket.c:1408
...
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
</TASK>
================================================================
Fix the implementation of ax25_setsockopt() by adding increment of
refcounts for the new device bound, and decrement of refcounts for
the old unbound device.
Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
Reported-by: syzbot+33841dc6aa3e1d86b78a@syzkaller.appspotmail.com
Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru>
Link: https://patch.msgid.link/20250203091203.1744-1-m.masimov@mt-integration.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 56b824eb49d6258aa0bad09a406ceac3f643cdae upstream.
Currently the skb size after coalescing is only limited by the skb
layout (the skb must not carry frag_list). A single coalesced skb
covering several MSS can potentially fill completely the receive
buffer. In such a case, the snd win will zero until the receive buffer
will be empty again, affecting tput badly.
Fixes: 8268ed4c9d19 ("mptcp: introduce and use mptcp_try_coalesce()")
Cc: stable@vger.kernel.org # please delay 2 weeks after 6.13-final release
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241230-net-mptcp-rbuf-fixes-v1-3-8608af434ceb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1bb0d1348546ad059f55c93def34e67cb2a034a6 upstream.
With the in-kernel path-manager, it is possible to change the 'fullmesh'
flag. The code in mptcp_pm_nl_fullmesh() expects to change it only on
'subflow' endpoints, to recreate more or less subflows using the linked
address.
Unfortunately, the set_flags() hook was a bit more permissive, and
allowed 'implicit' endpoints to get the 'fullmesh' flag while it is not
allowed before.
That's what syzbot found, triggering the following warning:
WARNING: CPU: 0 PID: 6499 at net/mptcp/pm_netlink.c:1496 __mark_subflow_endp_available net/mptcp/pm_netlink.c:1496 [inline]
WARNING: CPU: 0 PID: 6499 at net/mptcp/pm_netlink.c:1496 mptcp_pm_nl_fullmesh net/mptcp/pm_netlink.c:1980 [inline]
WARNING: CPU: 0 PID: 6499 at net/mptcp/pm_netlink.c:1496 mptcp_nl_set_flags net/mptcp/pm_netlink.c:2003 [inline]
WARNING: CPU: 0 PID: 6499 at net/mptcp/pm_netlink.c:1496 mptcp_pm_nl_set_flags+0x974/0xdc0 net/mptcp/pm_netlink.c:2064
Modules linked in:
CPU: 0 UID: 0 PID: 6499 Comm: syz.1.413 Not tainted 6.13.0-rc5-syzkaller-00172-gd1bf27c4e176 #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:__mark_subflow_endp_available net/mptcp/pm_netlink.c:1496 [inline]
RIP: 0010:mptcp_pm_nl_fullmesh net/mptcp/pm_netlink.c:1980 [inline]
RIP: 0010:mptcp_nl_set_flags net/mptcp/pm_netlink.c:2003 [inline]
RIP: 0010:mptcp_pm_nl_set_flags+0x974/0xdc0 net/mptcp/pm_netlink.c:2064
Code: 01 00 00 49 89 c5 e8 fb 45 e8 f5 e9 b8 fc ff ff e8 f1 45 e8 f5 4c 89 f7 be 03 00 00 00 e8 44 1d 0b f9 eb a0 e8 dd 45 e8 f5 90 <0f> 0b 90 e9 17 ff ff ff 89 d9 80 e1 07 38 c1 0f 8c c9 fc ff ff 48
RSP: 0018:ffffc9000d307240 EFLAGS: 00010293
RAX: ffffffff8bb72e03 RBX: 0000000000000000 RCX: ffff88807da88000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc9000d307430 R08: ffffffff8bb72cf0 R09: 1ffff1100b842a5e
R10: dffffc0000000000 R11: ffffed100b842a5f R12: ffff88801e2e5ac0
R13: ffff88805c214800 R14: ffff88805c2152e8 R15: 1ffff1100b842a5d
FS: 00005555619f6500(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020002840 CR3: 00000000247e6000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2542
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1347
netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1891
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:726
____sys_sendmsg+0x52a/0x7e0 net/socket.c:2583
___sys_sendmsg net/socket.c:2637 [inline]
__sys_sendmsg+0x269/0x350 net/socket.c:2669
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5fe8785d29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff571f5558 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f5fe8975fa0 RCX: 00007f5fe8785d29
RDX: 0000000000000000 RSI: 0000000020000480 RDI: 0000000000000007
RBP: 00007f5fe8801b08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5fe8975fa0 R14: 00007f5fe8975fa0 R15: 00000000000011f4
</TASK>
Here, syzbot managed to set the 'fullmesh' flag on an 'implicit' and
used -- according to 'id_avail_bitmap' -- endpoint, causing the PM to
try decrement the local_addr_used counter which is only incremented for
the 'subflow' endpoint.
Note that 'no type' endpoints -- not 'subflow', 'signal', 'implicit' --
are fine, because their ID will not be marked as used in the 'id_avail'
bitmap, and setting 'fullmesh' can help forcing the creation of subflow
when receiving an ADD_ADDR.
Fixes: 73c762c1f07d ("mptcp: set fullmesh flag in pm_netlink")
Cc: stable@vger.kernel.org
Reported-by: syzbot+cd16e79c1e45f3fe0377@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/6786ac51.050a0220.216c54.00a6.GAE@google.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/540
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250123-net-mptcp-syzbot-issues-v1-2-af73258a726f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Conflicts in pm_netlink.c, because the code has been moved around in
commit 6a42477fe449 ("mptcp: update set_flags interfaces"), but the
same fix can still be applied at the original place. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 05d91cdb1f9108426b14975ef4eeddf15875ca05 upstream.
Copy of the rationale from 790071347a0a1a89e618eedcd51c687ea783aeb3:
Change ndo_set_mac_address to dev_set_mac_address because
dev_set_mac_address provides a way to notify network layer about MAC
change. In other case, services may not aware about MAC change and keep
using old one which set from network adapter driver.
As example, DHCP client from systemd do not update MAC address without
notification from net subsystem which leads to the problem with acquiring
the right address from DHCP server.
Since dev_set_mac_address requires RTNL lock the operation can not be
performed directly in the response handler, see
9e2bbab94b88295dcc57c7580393c9ee08d7314d.
The way of selecting the first suitable MAC address from the list is
changed, instead of having the driver check it this patch just assumes
any valid MAC should be good.
Fixes: b8291cf3d118 ("net/ncsi: Add NC-SI 1.2 Get MC MAC Address command")
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
channel
commit 6bb194d036c6e1b329dcdff459338cdd9a54802a upstream.
The NCSI state machine as it's currently implemented assumes that
transition to the next logical state is performed either explicitly by
calling `schedule_work(&ndp->work)` to re-queue itself or implicitly
after processing the predefined (ndp->pending_req_num) number of
replies. Thus to avoid the configuration FSM from advancing prematurely
and getting out of sync with the process it's essential to not skip
waiting for a reply.
This patch makes the code wait for reception of the Deselect Package
response for the last package probed before proceeding to channel
configuration.
Thanks go to Potin Lai and Cosmo Chou for the initial investigation and
testing.
Fixes: 8e13f70be05e ("net/ncsi: Probe single packages to avoid conflict")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Link: https://patch.msgid.link/20250116152900.8656-1-fercerpav@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 110b43ef05342d5a11284cc8b21582b698b4ef1c upstream.
The "pipe" variable is a u8 which comes from the network. If it's more
than 127, then it results in memory corruption in the caller,
nci_hci_connect_gate().
Cc: stable@vger.kernel.org
Fixes: a1b0b9415817 ("NFC: nci: Create pipe on specific gate in nci_hci_connect_gate")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/bcf5453b-7204-4297-9c20-4d8c7dacf586@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5c61419e02033eaf01733d66e2fcd4044808f482 upstream.
One of the possible ways to enable the input MTU auto-selection for L2CAP
connections is supposed to be through passing a special "0" value for it
as a socket option. Commit [1] added one of those into avdtp. However, it
simply wouldn't work because the kernel still treats the specified value
as invalid and denies the setting attempt. Recorded BlueZ logs include the
following:
bluetoothd[496]: profiles/audio/avdtp.c:l2cap_connect() setsockopt(L2CAP_OPTIONS): Invalid argument (22)
[1]: https://github.com/bluez/bluez/commit/ae5be371a9f53fed33d2b34748a95a5498fd4b77
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4b6e228e297b ("Bluetooth: Auto tune if input MTU is set to 0")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5f397409f8ee5bc82901eeaf799e1cbc4f8edcf1 upstream.
A NULL sock pointer is passed into l2cap_sock_alloc() when it is called
from l2cap_sock_new_connection_cb() and the error handling paths should
also be aware of it.
Seemingly a more elegant solution would be to swap bt_sock_alloc() and
l2cap_chan_create() calls since they are not interdependent to that moment
but then l2cap_chan_create() adds the soon to be deallocated and still
dummy-initialized channel to the global list accessible by many L2CAP
paths. The channel would be removed from the list in short period of time
but be a bit more straight-forward here and just check for NULL instead of
changing the order of function calls.
Found by Linux Verification Center (linuxtesting.org) with SVACE static
analysis tool.
Fixes: 7c4f78cdb8e7 ("Bluetooth: L2CAP: do not leave dangling sk pointer on error in l2cap_sock_create()")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 41b996ce83bf944de5569d6263c8dbd5513e7ed0 ]
The RXRPC_CALL_SERVER_SECURING state doesn't really belong with the other
states in the call's state set as the other states govern the call's Rx/Tx
phase transition and govern when packets can and can't be received or
transmitted. The "Securing" state doesn't actually govern the reception of
packets and would need to be split depending on whether or not we've
received the last packet yet (to mirror RECV_REQUEST/ACK_REQUEST).
The "Securing" state is more about whether or not we can start forwarding
packets to the application as recvmsg will need to decode them and the
decoding can't take place until the challenge/response exchange has
completed.
Fix this by removing the RXRPC_CALL_SERVER_SECURING state from the state
set and, instead, using a flag, RXRPC_CALL_CONN_CHALLENGING, to track
whether or not we can queue the call for reception by recvmsg() or notify
the kernel app that data is ready. In the event that we've already
received all the packets, the connection event handler will poke the app
layer in the appropriate manner.
Also there's a race whereby the app layer sees the last packet before rxrpc
has managed to end the rx phase and change the state to one amenable to
allowing a reply. Fix this by queuing the packet after calling
rxrpc_end_rx_phase().
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20250204230558.712536-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 638ba5089324796c2ee49af10427459c2de35f71 ]
qdisc_tree_reduce_backlog() notifies parent qdisc only if child
qdisc becomes empty, therefore we need to reduce the backlog of the
child qdisc before calling it. Otherwise it would miss the opportunity
to call cops->qlen_notify(), in the case of DRR, it resulted in UAF
since DRR uses ->qlen_notify() to maintain its active list.
Fixes: f8d4bc455047 ("net/sched: netem: account for backlog updates from child qdisc")
Cc: Martin Ottens <martin.ottens@fau.de>
Reported-by: Mingi Cho <mincho@theori.io>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Link: https://patch.msgid.link/20250204005841.223511-4-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a1300691aed9ee852b0a9192e29e2bdc2411a7e6 ]
syzbot reported a soft lockup in rose_loopback_timer(),
with a repro calling bind() from multiple threads.
rose_bind() must lock the socket to avoid this issue.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+7ff41b5215f0c534534e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/67a0f78d.050a0220.d7c5a.00a0.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20250203170838.3521361-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4241a702e0d0c2ca9364cfac08dbf134264962de ]
The rxrpc_connection attend queue is never used because conn::attend_link
is never initialised and so is always NULL'd out and thus always appears to
be busy. This requires the following fix:
(1) Fix this the attend queue problem by initialising conn::attend_link.
And, consequently, two further fixes for things masked by the above bug:
(2) Fix rxrpc_input_conn_event() to handle being invoked with a NULL
sk_buff pointer - something that can now happen with the above change.
(3) Fix the RXRPC_SKB_MARK_SERVICE_CONN_SECURED message to carry a pointer
to the connection and a ref on it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Fixes: f2cce89a074e ("rxrpc: Implement a mechanism to send an event notification to a connection")
Link: https://patch.msgid.link/20250203110307.7265-3-dhowells@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 235174b2bed88501fda689c113c55737f99332d8 ]
Commit 4094871db1d6 ("udp: only do GSO if # of segs > 1") avoided GSO
for small packets. But the kernel currently dismisses GSO requests only
after checking MTU/PMTU on gso_size. This means any packets, regardless
of their payload sizes, could be dropped when PMTU becomes smaller than
requested gso_size. We encountered this issue in production and it
caused a reliability problem that new QUIC connection cannot be
established before PMTU cache expired, while non GSO sockets still
worked fine at the same time.
Ideally, do not check any GSO related constraints when payload size is
smaller than requested gso_size, and return EMSGSIZE instead of EINVAL
on MTU/PMTU check failure to be more specific on the error cause.
Fixes: 4094871db1d6 ("udp: only do GSO if # of segs > 1")
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9e2bbab94b88295dcc57c7580393c9ee08d7314d ]
Obtaining RTNL lock in a response handler is not allowed since it runs
in an atomic softirq context. Postpone setting the MAC address by adding
a dedicated step to the configuration FSM.
Fixes: 790071347a0a ("net/ncsi: change from ndo_set_mac_address to dev_set_mac_address")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20241129-potin-revert-ncsi-set-mac-addr-v1-1-94ea2cb596af@gmail.com
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Tested-by: Potin Lai <potin.lai.pt@gmail.com>
Link: https://patch.msgid.link/20250109145054.30925-1-fercerpav@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b8291cf3d1180b5b61299922f17c9441616a805a ]
This change adds support for the NC-SI 1.2 Get MC MAC Address command,
specified here:
https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.2.0.pdf
It serves the exact same function as the existing OEM Get MAC Address
commands, so if a channel reports that it supports NC-SI 1.2, we prefer
to use the standard command rather than the OEM command.
Verified with an invalid MAC address and 2 valid ones:
[ 55.137072] ftgmac100 1e690000.ftgmac eth0: NCSI: Received 3 provisioned MAC addresses
[ 55.137614] ftgmac100 1e690000.ftgmac eth0: NCSI: MAC address 0: 00:00:00:00:00:00
[ 55.138026] ftgmac100 1e690000.ftgmac eth0: NCSI: MAC address 1: fa:ce:b0:0c:20:22
[ 55.138528] ftgmac100 1e690000.ftgmac eth0: NCSI: MAC address 2: fa:ce:b0:0c:20:23
[ 55.139241] ftgmac100 1e690000.ftgmac eth0: NCSI: Unable to assign 00:00:00:00:00:00 to device
[ 55.140098] ftgmac100 1e690000.ftgmac eth0: NCSI: Set MAC address to fa:ce:b0:0c:20:22
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Signed-off-by: Patrick Williams <patrick@stwcx.xyz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 9e2bbab94b88 ("net/ncsi: fix locking in Get MAC Address handling")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5fe71fda89745fc3cd95f70d06e9162b595c3702 ]
On a 32bit system the "keylen + sizeof(struct tipc_aead_key)" math could
have an integer wrapping issue. It doesn't matter because the "keylen"
is checked on the next line, but just to make life easier for static
analysis tools, let's re-order these conditions and avoid the integer
overflow.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 26fbd3494a7dd26269cb0817c289267dbcfdec06 ]
This fixes the following crash:
==================================================================
BUG: KASAN: slab-use-after-free in mgmt_remove_adv_monitor_sync+0x3a/0xd0 net/bluetooth/mgmt.c:5543
Read of size 8 at addr ffff88814128f898 by task kworker/u9:4/5961
CPU: 1 UID: 0 PID: 5961 Comm: kworker/u9:4 Not tainted 6.12.0-syzkaller-10684-gf1cd565ce577 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0x169/0x550 mm/kasan/report.c:489
kasan_report+0x143/0x180 mm/kasan/report.c:602
mgmt_remove_adv_monitor_sync+0x3a/0xd0 net/bluetooth/mgmt.c:5543
hci_cmd_sync_work+0x22b/0x400 net/bluetooth/hci_sync.c:332
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
worker_thread+0x870/0xd30 kernel/workqueue.c:3391
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Allocated by task 16026:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4314
kmalloc_noprof include/linux/slab.h:901 [inline]
kzalloc_noprof include/linux/slab.h:1037 [inline]
mgmt_pending_new+0x65/0x250 net/bluetooth/mgmt_util.c:269
mgmt_pending_add+0x36/0x120 net/bluetooth/mgmt_util.c:296
remove_adv_monitor+0x102/0x1b0 net/bluetooth/mgmt.c:5568
hci_mgmt_cmd+0xc47/0x11d0 net/bluetooth/hci_sock.c:1712
hci_sock_sendmsg+0x7b8/0x11c0 net/bluetooth/hci_sock.c:1832
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:726
sock_write_iter+0x2d7/0x3f0 net/socket.c:1147
new_sync_write fs/read_write.c:586 [inline]
vfs_write+0xaeb/0xd30 fs/read_write.c:679
ksys_write+0x18f/0x2b0 fs/read_write.c:731
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 16022:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2338 [inline]
slab_free mm/slub.c:4598 [inline]
kfree+0x196/0x420 mm/slub.c:4746
mgmt_pending_foreach+0xd1/0x130 net/bluetooth/mgmt_util.c:259
__mgmt_power_off+0x183/0x430 net/bluetooth/mgmt.c:9550
hci_dev_close_sync+0x6c4/0x11c0 net/bluetooth/hci_sync.c:5208
hci_dev_do_close net/bluetooth/hci_core.c:483 [inline]
hci_dev_close+0x112/0x210 net/bluetooth/hci_core.c:508
sock_do_ioctl+0x158/0x460 net/socket.c:1209
sock_ioctl+0x626/0x8e0 net/socket.c:1328
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Reported-by: syzbot+479aff51bb361ef5aa18@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=479aff51bb361ef5aa18
Tested-by: syzbot+479aff51bb361ef5aa18@syzkaller.appspotmail.com
Signed-off-by: Mazin Al Haddad <mazin@getstate.dev>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 619af16b3b57a3a4ee50b9a30add9ff155541e71 upstream.
Syzbot was able to trigger a data stream corruption:
WARNING: CPU: 0 PID: 9846 at net/mptcp/protocol.c:1024 __mptcp_clean_una+0xddb/0xff0 net/mptcp/protocol.c:1024
Modules linked in:
CPU: 0 UID: 0 PID: 9846 Comm: syz-executor351 Not tainted 6.13.0-rc2-syzkaller-00059-g00a5acdbf398 #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 11/25/2024
RIP: 0010:__mptcp_clean_una+0xddb/0xff0 net/mptcp/protocol.c:1024
Code: fa ff ff 48 8b 4c 24 18 80 e1 07 fe c1 38 c1 0f 8c 8e fa ff ff 48 8b 7c 24 18 e8 e0 db 54 f6 e9 7f fa ff ff e8 e6 80 ee f5 90 <0f> 0b 90 4c 8b 6c 24 40 4d 89 f4 e9 04 f5 ff ff 44 89 f1 80 e1 07
RSP: 0018:ffffc9000c0cf400 EFLAGS: 00010293
RAX: ffffffff8bb0dd5a RBX: ffff888033f5d230 RCX: ffff888059ce8000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc9000c0cf518 R08: ffffffff8bb0d1dd R09: 1ffff110170c8928
R10: dffffc0000000000 R11: ffffed10170c8929 R12: 0000000000000000
R13: ffff888033f5d220 R14: dffffc0000000000 R15: ffff8880592b8000
FS: 00007f6e866496c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6e86f491a0 CR3: 00000000310e6000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__mptcp_clean_una_wakeup+0x7f/0x2d0 net/mptcp/protocol.c:1074
mptcp_release_cb+0x7cb/0xb30 net/mptcp/protocol.c:3493
release_sock+0x1aa/0x1f0 net/core/sock.c:3640
inet_wait_for_connect net/ipv4/af_inet.c:609 [inline]
__inet_stream_connect+0x8bd/0xf30 net/ipv4/af_inet.c:703
mptcp_sendmsg_fastopen+0x2a2/0x530 net/mptcp/protocol.c:1755
mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1830
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg+0x1a6/0x270 net/socket.c:726
____sys_sendmsg+0x52a/0x7e0 net/socket.c:2583
___sys_sendmsg net/socket.c:2637 [inline]
__sys_sendmsg+0x269/0x350 net/socket.c:2669
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6e86ebfe69
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 1f 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6e86649168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f6e86f491b8 RCX: 00007f6e86ebfe69
RDX: 0000000030004001 RSI: 0000000020000080 RDI: 0000000000000003
RBP: 00007f6e86f491b0 R08: 00007f6e866496c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6e86f491bc
R13: 000000000000006e R14: 00007ffe445d9420 R15: 00007ffe445d9508
</TASK>
The root cause is the bad handling of disconnect() generated internally
by the MPTCP protocol in case of connect FASTOPEN errors.
Address the issue increasing the socket disconnect counter even on such
a case, to allow other threads waiting on the same socket lock to
properly error out.
Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures")
Cc: stable@vger.kernel.org
Reported-by: syzbot+ebc0b8ae5d3590b2c074@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/67605870.050a0220.37aaf.0137.GAE@google.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/537
Tested-by: syzbot+ebc0b8ae5d3590b2c074@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250123-net-mptcp-syzbot-issues-v1-3-af73258a726f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c86b000782daba926c627d2fa00c3f60a75e7472 upstream.
MPTCP maintains the received sub-options status is the bitmask carrying
the received suboptions and in several bitfields carrying per suboption
additional info.
Zeroing the bitmask before parsing is not enough to ensure a consistent
status, and the MPTCP code has to additionally clear some bitfiled
depending on the actually parsed suboption.
The above schema is fragile, and syzbot managed to trigger a path where
a relevant bitfield is not cleared/initialized:
BUG: KMSAN: uninit-value in __mptcp_expand_seq net/mptcp/options.c:1030 [inline]
BUG: KMSAN: uninit-value in mptcp_expand_seq net/mptcp/protocol.h:864 [inline]
BUG: KMSAN: uninit-value in ack_update_msk net/mptcp/options.c:1060 [inline]
BUG: KMSAN: uninit-value in mptcp_incoming_options+0x2036/0x3d30 net/mptcp/options.c:1209
__mptcp_expand_seq net/mptcp/options.c:1030 [inline]
mptcp_expand_seq net/mptcp/protocol.h:864 [inline]
ack_update_msk net/mptcp/options.c:1060 [inline]
mptcp_incoming_options+0x2036/0x3d30 net/mptcp/options.c:1209
tcp_data_queue+0xb4/0x7be0 net/ipv4/tcp_input.c:5233
tcp_rcv_established+0x1061/0x2510 net/ipv4/tcp_input.c:6264
tcp_v4_do_rcv+0x7f3/0x11a0 net/ipv4/tcp_ipv4.c:1916
tcp_v4_rcv+0x51df/0x5750 net/ipv4/tcp_ipv4.c:2351
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:447
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:567
__netif_receive_skb_one_core net/core/dev.c:5704 [inline]
__netif_receive_skb+0x319/0xa00 net/core/dev.c:5817
process_backlog+0x4ad/0xa50 net/core/dev.c:6149
__napi_poll+0xe7/0x980 net/core/dev.c:6902
napi_poll net/core/dev.c:6971 [inline]
net_rx_action+0xa5a/0x19b0 net/core/dev.c:7093
handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:561
__do_softirq+0x14/0x1a kernel/softirq.c:595
do_softirq+0x9a/0x100 kernel/softirq.c:462
__local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:389
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
__dev_queue_xmit+0x2758/0x57d0 net/core/dev.c:4493
dev_queue_xmit include/linux/netdevice.h:3168 [inline]
neigh_hh_output include/net/neighbour.h:523 [inline]
neigh_output include/net/neighbour.h:537 [inline]
ip_finish_output2+0x187c/0x1b70 net/ipv4/ip_output.c:236
__ip_finish_output+0x287/0x810
ip_finish_output+0x4b/0x600 net/ipv4/ip_output.c:324
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:434
dst_output include/net/dst.h:450 [inline]
ip_local_out net/ipv4/ip_output.c:130 [inline]
__ip_queue_xmit+0x1f2a/0x20d0 net/ipv4/ip_output.c:536
ip_queue_xmit+0x60/0x80 net/ipv4/ip_output.c:550
__tcp_transmit_skb+0x3cea/0x4900 net/ipv4/tcp_output.c:1468
tcp_transmit_skb net/ipv4/tcp_output.c:1486 [inline]
tcp_write_xmit+0x3b90/0x9070 net/ipv4/tcp_output.c:2829
__tcp_push_pending_frames+0xc4/0x380 net/ipv4/tcp_output.c:3012
tcp_send_fin+0x9f6/0xf50 net/ipv4/tcp_output.c:3618
__tcp_close+0x140c/0x1550 net/ipv4/tcp.c:3130
__mptcp_close_ssk+0x74e/0x16f0 net/mptcp/protocol.c:2496
mptcp_close_ssk+0x26b/0x2c0 net/mptcp/protocol.c:2550
mptcp_pm_nl_rm_addr_or_subflow+0x635/0xd10 net/mptcp/pm_netlink.c:889
mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:924 [inline]
mptcp_pm_flush_addrs_and_subflows net/mptcp/pm_netlink.c:1688 [inline]
mptcp_nl_flush_addrs_list net/mptcp/pm_netlink.c:1709 [inline]
mptcp_pm_nl_flush_addrs_doit+0xe10/0x1630 net/mptcp/pm_netlink.c:1750
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x1214/0x12c0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2542
genl_rcv+0x40/0x60 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1347
netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1891
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:726
____sys_sendmsg+0x877/0xb60 net/socket.c:2583
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2637
__sys_sendmsg net/socket.c:2669 [inline]
__do_sys_sendmsg net/socket.c:2674 [inline]
__se_sys_sendmsg net/socket.c:2672 [inline]
__x64_sys_sendmsg+0x212/0x3c0 net/socket.c:2672
x64_sys_call+0x2ed6/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was stored to memory at:
mptcp_get_options+0x2c0f/0x2f20 net/mptcp/options.c:397
mptcp_incoming_options+0x19a/0x3d30 net/mptcp/options.c:1150
tcp_data_queue+0xb4/0x7be0 net/ipv4/tcp_input.c:5233
tcp_rcv_established+0x1061/0x2510 net/ipv4/tcp_input.c:6264
tcp_v4_do_rcv+0x7f3/0x11a0 net/ipv4/tcp_ipv4.c:1916
tcp_v4_rcv+0x51df/0x5750 net/ipv4/tcp_ipv4.c:2351
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:447
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:567
__netif_receive_skb_one_core net/core/dev.c:5704 [inline]
__netif_receive_skb+0x319/0xa00 net/core/dev.c:5817
process_backlog+0x4ad/0xa50 net/core/dev.c:6149
__napi_poll+0xe7/0x980 net/core/dev.c:6902
napi_poll net/core/dev.c:6971 [inline]
net_rx_action+0xa5a/0x19b0 net/core/dev.c:7093
handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:561
__do_softirq+0x14/0x1a kernel/softirq.c:595
Uninit was stored to memory at:
put_unaligned_be32 include/linux/unaligned.h:68 [inline]
mptcp_write_options+0x17f9/0x3100 net/mptcp/options.c:1417
mptcp_options_write net/ipv4/tcp_output.c:465 [inline]
tcp_options_write+0x6d9/0xe90 net/ipv4/tcp_output.c:759
__tcp_transmit_skb+0x294b/0x4900 net/ipv4/tcp_output.c:1414
tcp_transmit_skb net/ipv4/tcp_output.c:1486 [inline]
tcp_write_xmit+0x3b90/0x9070 net/ipv4/tcp_output.c:2829
__tcp_push_pending_frames+0xc4/0x380 net/ipv4/tcp_output.c:3012
tcp_send_fin+0x9f6/0xf50 net/ipv4/tcp_output.c:3618
__tcp_close+0x140c/0x1550 net/ipv4/tcp.c:3130
__mptcp_close_ssk+0x74e/0x16f0 net/mptcp/protocol.c:2496
mptcp_close_ssk+0x26b/0x2c0 net/mptcp/protocol.c:2550
mptcp_pm_nl_rm_addr_or_subflow+0x635/0xd10 net/mptcp/pm_netlink.c:889
mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:924 [inline]
mptcp_pm_flush_addrs_and_subflows net/mptcp/pm_netlink.c:1688 [inline]
mptcp_nl_flush_addrs_list net/mptcp/pm_netlink.c:1709 [inline]
mptcp_pm_nl_flush_addrs_doit+0xe10/0x1630 net/mptcp/pm_netlink.c:1750
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x1214/0x12c0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2542
genl_rcv+0x40/0x60 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1347
netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1891
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:726
____sys_sendmsg+0x877/0xb60 net/socket.c:2583
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2637
__sys_sendmsg net/socket.c:2669 [inline]
__do_sys_sendmsg net/socket.c:2674 [inline]
__se_sys_sendmsg net/socket.c:2672 [inline]
__x64_sys_sendmsg+0x212/0x3c0 net/socket.c:2672
x64_sys_call+0x2ed6/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was stored to memory at:
mptcp_pm_add_addr_signal+0x3d7/0x4c0
mptcp_established_options_add_addr net/mptcp/options.c:666 [inline]
mptcp_established_options+0x1b9b/0x3a00 net/mptcp/options.c:884
tcp_established_options+0x2c4/0x7d0 net/ipv4/tcp_output.c:1012
__tcp_transmit_skb+0x5b7/0x4900 net/ipv4/tcp_output.c:1333
tcp_transmit_skb net/ipv4/tcp_output.c:1486 [inline]
tcp_write_xmit+0x3b90/0x9070 net/ipv4/tcp_output.c:2829
__tcp_push_pending_frames+0xc4/0x380 net/ipv4/tcp_output.c:3012
tcp_send_fin+0x9f6/0xf50 net/ipv4/tcp_output.c:3618
__tcp_close+0x140c/0x1550 net/ipv4/tcp.c:3130
__mptcp_close_ssk+0x74e/0x16f0 net/mptcp/protocol.c:2496
mptcp_close_ssk+0x26b/0x2c0 net/mptcp/protocol.c:2550
mptcp_pm_nl_rm_addr_or_subflow+0x635/0xd10 net/mptcp/pm_netlink.c:889
mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:924 [inline]
mptcp_pm_flush_addrs_and_subflows net/mptcp/pm_netlink.c:1688 [inline]
mptcp_nl_flush_addrs_list net/mptcp/pm_netlink.c:1709 [inline]
mptcp_pm_nl_flush_addrs_doit+0xe10/0x1630 net/mptcp/pm_netlink.c:1750
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x1214/0x12c0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2542
genl_rcv+0x40/0x60 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1347
netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1891
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:726
____sys_sendmsg+0x877/0xb60 net/socket.c:2583
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2637
__sys_sendmsg net/socket.c:2669 [inline]
__do_sys_sendmsg net/socket.c:2674 [inline]
__se_sys_sendmsg net/socket.c:2672 [inline]
__x64_sys_sendmsg+0x212/0x3c0 net/socket.c:2672
x64_sys_call+0x2ed6/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was stored to memory at:
mptcp_pm_add_addr_received+0x95f/0xdd0 net/mptcp/pm.c:235
mptcp_incoming_options+0x2983/0x3d30 net/mptcp/options.c:1169
tcp_data_queue+0xb4/0x7be0 net/ipv4/tcp_input.c:5233
tcp_rcv_state_process+0x2a38/0x49d0 net/ipv4/tcp_input.c:6972
tcp_v4_do_rcv+0xbf9/0x11a0 net/ipv4/tcp_ipv4.c:1939
tcp_v4_rcv+0x51df/0x5750 net/ipv4/tcp_ipv4.c:2351
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:447
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:567
__netif_receive_skb_one_core net/core/dev.c:5704 [inline]
__netif_receive_skb+0x319/0xa00 net/core/dev.c:5817
process_backlog+0x4ad/0xa50 net/core/dev.c:6149
__napi_poll+0xe7/0x980 net/core/dev.c:6902
napi_poll net/core/dev.c:6971 [inline]
net_rx_action+0xa5a/0x19b0 net/core/dev.c:7093
handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:561
__do_softirq+0x14/0x1a kernel/softirq.c:595
Local variable mp_opt created at:
mptcp_incoming_options+0x119/0x3d30 net/mptcp/options.c:1127
tcp_data_queue+0xb4/0x7be0 net/ipv4/tcp_input.c:5233
The current schema is too fragile; address the issue grouping all the
state-related data together and clearing the whole group instead of
just the bitmask. This also cleans-up the code a bit, as there is no
need to individually clear "random" bitfield in a couple of places
any more.
Fixes: 84dfe3677a6f ("mptcp: send out dedicated ADD_ADDR packet")
Cc: stable@vger.kernel.org
Reported-by: syzbot+23728c2df58b3bd175ad@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/6786ac51.050a0220.216c54.00a7.GAE@google.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/541
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250123-net-mptcp-syzbot-issues-v1-1-af73258a726f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1b9335a8000fb70742f7db10af314104b6ace220 upstream.
The field length description provides the length of each separated key
field in the concatenation, each field gets rounded up to 32-bits to
calculate the pipapo rule width from pipapo_init(). The set key length
provides the total size of the key aligned to 32-bits.
Register-based arithmetics still allows for combining mismatching set
key length and field length description, eg. set key length 10 and field
description [ 5, 4 ] leading to pipapo width of 12.
Cc: stable@vger.kernel.org
Fixes: 3ce67e3793f4 ("netfilter: nf_tables: do not allow mismatch field size and set key length")
Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 966a675da844f1a764bb44557c21561cc3d09840 upstream.
I noticed that a handful of NFSv3 fstests were taking an
unexpectedly long time to run. Troubleshooting showed that the
server's TCP window closed and never re-opened, which caused the
client to trigger an RPC retransmit timeout after 180 seconds.
The client's recovery action was to establish a fresh connection
and retransmit the timed-out requests. This worked, but it adds a
long delay.
I tracked the problem to the commit that attempted to reduce the
rate at which the network layer delivers TCP socket data_ready
callbacks. Under most circumstances this change worked as expected,
but for NFSv3, which has no session or other type of throttling, it
can overwhelm the receiver on occasion.
I'm sure I could tweak the lowat settings, but the small benefit
doesn't seem worth the bother. Just revert it.
Fixes: 2b877fc53e97 ("SUNRPC: Reduce thread wake-up rate when receiving large RPC messages")
Cc: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0f5697f1a3f99bc2b674b8aa3c5da822c5673c11 ]
Stephan Wurm reported that my recent patch broke VLAN support.
Apparently skb->mac_len is not correct for VLAN traffic as
shown by debug traces [1].
Use instead pskb_may_pull() to make sure the expected header
is present in skb->head.
Many thanks to Stephan for his help.
[1]
kernel: skb len=170 headroom=2 headlen=170 tailroom=20
mac=(2,14) mac_len=14 net=(16,-1) trans=-1
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0x0 start=0 offset=0 ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0
priority=0x0 mark=0x0 alloc_cpu=0 vlan_all=0x0
encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
kernel: dev name=prp0 feat=0x0000000000007000
kernel: sk family=17 type=3 proto=0
kernel: skb headroom: 00000000: 74 00
kernel: skb linear: 00000000: 01 0c cd 01 00 01 00 d0 93 53 9c cb 81 00 80 00
kernel: skb linear: 00000010: 88 b8 00 01 00 98 00 00 00 00 61 81 8d 80 16 52
kernel: skb linear: 00000020: 45 47 44 4e 43 54 52 4c 2f 4c 4c 4e 30 24 47 4f
kernel: skb linear: 00000030: 24 47 6f 43 62 81 01 14 82 16 52 45 47 44 4e 43
kernel: skb linear: 00000040: 54 52 4c 2f 4c 4c 4e 30 24 44 73 47 6f 6f 73 65
kernel: skb linear: 00000050: 83 07 47 6f 49 64 65 6e 74 84 08 67 8d f5 93 7e
kernel: skb linear: 00000060: 76 c8 00 85 01 01 86 01 00 87 01 00 88 01 01 89
kernel: skb linear: 00000070: 01 00 8a 01 02 ab 33 a2 15 83 01 00 84 03 03 00
kernel: skb linear: 00000080: 00 91 08 67 8d f5 92 77 4b c6 1f 83 01 00 a2 1a
kernel: skb linear: 00000090: a2 06 85 01 00 83 01 00 84 03 03 00 00 91 08 67
kernel: skb linear: 000000a0: 8d f5 92 77 4b c6 1f 83 01 00
kernel: skb tailroom: 00000000: 80 18 02 00 fe 4e 00 00 01 01 08 0a 4f fd 5e d1
kernel: skb tailroom: 00000010: 4f fd 5e cd
Fixes: b9653d19e556 ("net: hsr: avoid potential out-of-bound access in fill_frame_info()")
Reported-by: Stephan Wurm <stephan.wurm@a-eberle.de>
Tested-by: Stephan Wurm <stephan.wurm@a-eberle.de>
Closes: https://lore.kernel.org/netdev/Z4o_UC0HweBHJ_cw@PC-LX-SteWu/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250129130007.644084-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3595599fa8360bb3c7afa7ee50c810b4a64106ea ]
Device-bound programs are used to support RX metadata kfuncs. These
kfuncs are driver-specific and rely on the driver context to read the
metadata. This means they can't work in generic XDP mode. However, there
is no check to disallow such programs from being attached in generic
mode, in which case the metadata kfuncs will be called in an invalid
context, leading to crashes.
Fix this by adding a check to disallow attaching device-bound programs
in generic mode.
Fixes: 2b3486bc2d23 ("bpf: Introduce device-bound XDP programs")
Reported-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Closes: https://lore.kernel.org/r/dae862ec-43b5-41a0-8edf-46c59071cdda@hetzner-cloud.de
Tested-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250127131344.238147-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8c670bdfa58e48abad1d5b6ca1ee843ca91f7303 ]
Testing with iperf3 using the "pasta" protocol splicer has revealed
a problem in the way tcp handles window advertising in extreme memory
squeeze situations.
Under memory pressure, a socket endpoint may temporarily advertise
a zero-sized window, but this is not stored as part of the socket data.
The reasoning behind this is that it is considered a temporary setting
which shouldn't influence any further calculations.
However, if we happen to stall at an unfortunate value of the current
window size, the algorithm selecting a new value will consistently fail
to advertise a non-zero window once we have freed up enough memory.
This means that this side's notion of the current window size is
different from the one last advertised to the peer, causing the latter
to not send any data to resolve the sitution.
The problem occurs on the iperf3 server side, and the socket in question
is a completely regular socket with the default settings for the
fedora40 kernel. We do not use SO_PEEK or SO_RCVBUF on the socket.
The following excerpt of a logging session, with own comments added,
shows more in detail what is happening:
// tcp_v4_rcv(->)
// tcp_rcv_established(->)
[5201<->39222]: ==== Activating log @ net/ipv4/tcp_input.c/tcp_data_queue()/5257 ====
[5201<->39222]: tcp_data_queue(->)
[5201<->39222]: DROPPING skb [265600160..265665640], reason: SKB_DROP_REASON_PROTO_MEM
[rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184]
[copied_seq 259909392->260034360 (124968), unread 5565800, qlen 85, ofoq 0]
[OFO queue: gap: 65480, len: 0]
[5201<->39222]: tcp_data_queue(<-)
[5201<->39222]: __tcp_transmit_skb(->)
[tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160]
[5201<->39222]: tcp_select_window(->)
[5201<->39222]: (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM) ? --> TRUE
[tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160]
returning 0
[5201<->39222]: tcp_select_window(<-)
[5201<->39222]: ADVERTISING WIN 0, ACK_SEQ: 265600160
[5201<->39222]: [__tcp_transmit_skb(<-)
[5201<->39222]: tcp_rcv_established(<-)
[5201<->39222]: tcp_v4_rcv(<-)
// Receive queue is at 85 buffers and we are out of memory.
// We drop the incoming buffer, although it is in sequence, and decide
// to send an advertisement with a window of zero.
// We don't update tp->rcv_wnd and tp->rcv_wup accordingly, which means
// we unconditionally shrink the window.
[5201<->39222]: tcp_recvmsg_locked(->)
[5201<->39222]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160
[5201<->39222]: [new_win = 0, win_now = 131184, 2 * win_now = 262368]
[5201<->39222]: [new_win >= (2 * win_now) ? --> time_to_ack = 0]
[5201<->39222]: NOT calling tcp_send_ack()
[tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160]
[5201<->39222]: __tcp_cleanup_rbuf(<-)
[rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184]
[copied_seq 260040464->260040464 (0), unread 5559696, qlen 85, ofoq 0]
returning 6104 bytes
[5201<->39222]: tcp_recvmsg_locked(<-)
// After each read, the algorithm for calculating the new receive
// window in __tcp_cleanup_rbuf() finds it is too small to advertise
// or to update tp->rcv_wnd.
// Meanwhile, the peer thinks the window is zero, and will not send
// any more data to trigger an update from the interrupt mode side.
[5201<->39222]: tcp_recvmsg_locked(->)
[5201<->39222]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160
[5201<->39222]: [new_win = 262144, win_now = 131184, 2 * win_now = 262368]
[5201<->39222]: [new_win >= (2 * win_now) ? --> time_to_ack = 0]
[5201<->39222]: NOT calling tcp_send_ack()
[tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160]
[5201<->39222]: __tcp_cleanup_rbuf(<-)
[rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184]
[copied_seq 260099840->260171536 (71696), unread 5428624, qlen 83, ofoq 0]
returning 131072 bytes
[5201<->39222]: tcp_recvmsg_locked(<-)
// The above pattern repeats again and again, since nothing changes
// between the reads.
[...]
[5201<->39222]: tcp_recvmsg_locked(->)
[5201<->39222]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160
[5201<->39222]: [new_win = 262144, win_now = 131184, 2 * win_now = 262368]
[5201<->39222]: [new_win >= (2 * win_now) ? --> time_to_ack = 0]
[5201<->39222]: NOT calling tcp_send_ack()
[tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160]
[5201<->39222]: __tcp_cleanup_rbuf(<-)
[rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184]
[copied_seq 265600160->265600160 (0), unread 0, qlen 0, ofoq 0]
returning 54672 bytes
[5201<->39222]: tcp_recvmsg_locked(<-)
// The receive queue is empty, but no new advertisement has been sent.
// The peer still thinks the receive window is zero, and sends nothing.
// We have ended up in a deadlock situation.
Note that well behaved endpoints will send win0 probes, so the problem
will not occur.
Furthermore, we have observed that in these situations this side may
send out an updated 'th->ack_seq´ which is not stored in tp->rcv_wup
as it should be. Backing ack_seq seems to be harmless, but is of
course still wrong from a protocol viewpoint.
We fix this by updating the socket state correctly when a packet has
been dropped because of memory exhaustion and we have to advertize
a zero window.
Further testing shows that the connection recovers neatly from the
squeeze situation, and traffic can continue indefinitely.
Fixes: e2142825c120 ("net: tcp: send zero-window ACK when no memory")
Cc: Menglong Dong <menglong8.dong@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250127231304.1465565-1-jmaloy@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit aa388c72113b7458127b709bdd7d3628af26e9b4 ]
sk_err is set when a (connectible) connect() fails. Effectively, this makes
an otherwise still healthy SS_UNCONNECTED socket impossible to use for any
subsequent connection attempts.
Clear sk_err upon trying to establish a connection.
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-2-1cf57065b770@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5de7665e0a0746b5ad7943554b34db8f8614a196 ]
Rose timers only acquire the socket spinlock, without
checking if the socket is owned by one user thread.
Add a check and rearm the timers if needed.
BUG: KASAN: slab-use-after-free in rose_timer_expiry+0x31d/0x360 net/rose/rose_timer.c:174
Read of size 2 at addr ffff88802f09b82a by task swapper/0/0
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.13.0-rc5-syzkaller-00172-gd1bf27c4e176 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0x169/0x550 mm/kasan/report.c:489
kasan_report+0x143/0x180 mm/kasan/report.c:602
rose_timer_expiry+0x31d/0x360 net/rose/rose_timer.c:174
call_timer_fn+0x187/0x650 kernel/time/timer.c:1793
expire_timers kernel/time/timer.c:1844 [inline]
__run_timers kernel/time/timer.c:2418 [inline]
__run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2430
run_timer_base kernel/time/timer.c:2439 [inline]
run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2449
handle_softirqs+0x2d4/0x9b0 kernel/softirq.c:561
__do_softirq kernel/softirq.c:595 [inline]
invoke_softirq kernel/softirq.c:435 [inline]
__irq_exit_rcu+0xf7/0x220 kernel/softirq.c:662
irq_exit_rcu+0x9/0x30 kernel/softirq.c:678
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1049
</IRQ>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250122180244.1861468-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 15a901361ec3fb1c393f91880e1cbf24ec0a88bd ]
syzbot found that calling mr_mfc_uses_dev() for unres entries
would crash [1], because c->mfc_un.res.minvif / c->mfc_un.res.maxvif
alias to "struct sk_buff_head unresolved", which contain two pointers.
This code never worked, lets remove it.
[1]
Unable to handle kernel paging request at virtual address ffff5fff2d536613
KASAN: maybe wild-memory-access in range [0xfffefff96a9b3098-0xfffefff96a9b309f]
Modules linked in:
CPU: 1 UID: 0 PID: 7321 Comm: syz.0.16 Not tainted 6.13.0-rc7-syzkaller-g1950a0af2d55 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mr_mfc_uses_dev net/ipv4/ipmr_base.c:290 [inline]
pc : mr_table_dump+0x5a4/0x8b0 net/ipv4/ipmr_base.c:334
lr : mr_mfc_uses_dev net/ipv4/ipmr_base.c:289 [inline]
lr : mr_table_dump+0x694/0x8b0 net/ipv4/ipmr_base.c:334
Call trace:
mr_mfc_uses_dev net/ipv4/ipmr_base.c:290 [inline] (P)
mr_table_dump+0x5a4/0x8b0 net/ipv4/ipmr_base.c:334 (P)
mr_rtm_dumproute+0x254/0x454 net/ipv4/ipmr_base.c:382
ipmr_rtm_dumproute+0x248/0x4b4 net/ipv4/ipmr.c:2648
rtnl_dump_all+0x2e4/0x4e8 net/core/rtnetlink.c:4327
rtnl_dumpit+0x98/0x1d0 net/core/rtnetlink.c:6791
netlink_dump+0x4f0/0xbc0 net/netlink/af_netlink.c:2317
netlink_recvmsg+0x56c/0xe64 net/netlink/af_netlink.c:1973
sock_recvmsg_nosec net/socket.c:1033 [inline]
sock_recvmsg net/socket.c:1055 [inline]
sock_read_iter+0x2d8/0x40c net/socket.c:1125
new_sync_read fs/read_write.c:484 [inline]
vfs_read+0x740/0x970 fs/read_write.c:565
ksys_read+0x15c/0x26c fs/read_write.c:708
Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
Reported-by: syzbot+5cfae50c0e5f2c500013@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/678fe2d1.050a0220.15cac.00b3.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250121181241.841212-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c05c5e5aa163f4682ca97a2f0536575fc7dbdecb ]
When skb needs GSO and wrap around happens, if xo->seq.low (seqno of
the first skb segment) is before the last seq number but oseq (seqno
of the last segment) is after it, xo->seq.low is still bigger than
replay_esn->oseq while oseq is smaller than it, so the update of
replay_esn->oseq_hi is missed for this case wrap around because of
the change in the cited commit.
For example, if sending a packet with gso_segs=3 while old
replay_esn->oseq=0xfffffffe, we calculate:
xo->seq.low = 0xfffffffe + 1 = 0x0xffffffff
oseq = 0xfffffffe + 3 = 0x1
(oseq < replay_esn->oseq) is true, but (xo->seq.low <
replay_esn->oseq) is false, so replay_esn->oseq_hi is not incremented.
To fix this issue, change the outer checking back for the update of
replay_esn->oseq_hi. And add new checking inside for the update of
packet's oseq_hi.
Fixes: 4b549ccce941 ("xfrm: replay: Fix ESN wrap around for GSO")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8ac412a3361173e3000b16167af3d1f6f90af613 ]
MEM_WRITE attribute is defined as: "Non-presence of MEM_WRITE means that
MEM is only being read". bpf_load_hdr_opt() both reads and writes from
its arg2 - void *search_res.
This matters a lot for the next commit where we more precisely track
stack accesses. Without this annotation, the verifier will make false
assumptions about the contents of memory written to by helpers and
possibly prune valid branches.
Fixes: 6fad274f06f0 ("bpf: Add MEM_WRITE attribute")
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/730e45f8c39be2a5f3d8c4406cceca9d574cbf14.1736886479.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d640627663bfe7d8963c7615316d7d4ef60f3b0b ]
In case of possible unpredictably large arguments passed to
rose_setsockopt() and multiplied by extra values on top of that,
integer overflows may occur.
Do the safest minimum and fix these issues by checking the
contents of 'opt' and returning -EINVAL if they are too large. Also,
switch to unsigned int and remove useless check for negative 'opt'
in ROSE_IDLE case.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Link: https://patch.msgid.link/20250115164220.19954-1-n.zhandarovich@fintech.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 25c1a9ca53db5780757e7f53e688b8f916821baa ]
I noticed that HyStart incorrectly marks the start of rounds,
leading to inaccurate measurements of ACK train lengths and
resetting the `ca->sample_cnt` variable. This inaccuracy can impact
HyStart's functionality in terminating exponential cwnd growth during
Slow-Start, potentially degrading TCP performance.
The issue arises because the changes introduced in commit 4e1fddc98d25
("tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows")
moved the caller of the `bictcp_hystart_reset` function inside the `hystart_update` function.
This modification added an additional condition for triggering the caller,
requiring that (tcp_snd_cwnd(tp) >= hystart_low_window) must also
be satisfied before invoking `bictcp_hystart_reset`.
This fix ensures that `bictcp_hystart_reset` is correctly called
at the start of a new round, regardless of the congestion window size.
This is achieved by moving the condition
(tcp_snd_cwnd(tp) >= hystart_low_window)
from before calling `bictcp_hystart_reset` to after it.
I tested with a client and a server connected through two Linux software routers.
In this setup, the minimum RTT was 150 ms, the bottleneck bandwidth was 50 Mbps,
and the bottleneck buffer size was 1 BDP, calculated as (50M / 1514 / 8) * 0.150 = 619 packets.
I conducted the test twice, transferring data from the server to the client for 1.5 seconds.
Before the patch was applied, HYSTART-DELAY stopped the exponential growth of cwnd when
cwnd = 516, and the bottleneck link was not yet saturated (516 < 619).
After the patch was applied, HYSTART-ACK-TRAIN stopped the exponential growth of cwnd when
cwnd = 632, and the bottleneck link was saturated (632 > 619).
In this test, applying the patch resulted in 300 KB more data delivered.
Fixes: 4e1fddc98d25 ("tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows")
Signed-off-by: Mahdi Arghavani <ma.arghavani@yahoo.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Haibo Zhang <haibo.zhang@otago.ac.nz>
Cc: David Eyers <david.eyers@otago.ac.nz>
Cc: Abbas Arghavani <abbas.arghavani@mdu.se>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7a4b61406395291ffb7220a10e8951a9a8684819 ]
The conntrack entry is already public, there is a small chance that another
CPU is handling a packet in reply direction and racing with the tcp state
update.
Move this under ct spinlock.
This is done once, when ct is about to be offloaded, so this should
not result in a noticeable performance hit.
Fixes: 8437a6209f76 ("netfilter: nft_flow_offload: set liberal tracking mode for tcp")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8d738c1869f611955d91d8d0fd0012d9ef207201 ]
The existing rbtree implementation uses singleton elements to represent
ranges, however, userspace provides a set size according to the number
of ranges in the set.
Adjust provided userspace set size to the number of singleton elements
in the kernel by multiplying the range by two.
Check if the no-match all-zero element is already in the set, in such
case release one slot in the set size.
Fixes: 0ed6389c483d ("netfilter: nf_tables: rename set implementations")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7d259f021aaa78904b6c836d975e8e00d83a182a ]
There is no need for asynchronous garbage collection, rbtree inserts
can only happen from the netlink control plane.
We already perform on-demand gc on insertion, in the area of the
tree where the insertion takes place, but we don't do a full tree
walk there for performance reasons.
Do a full gc walk at the end of the transaction instead and
remove the async worker.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: 8d738c1869f6 ("netfilter: nf_tables: fix set size with rbtree backend")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8079fc30f79799e59d9602e7e080d434936a482d ]
Next patch adds a cllaer that doesn't hold the priv->write lock and
will need a similar function.
Rename the existing function to make it clear that it can only
be used for opportunistic gc during insertion.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: 8d738c1869f6 ("netfilter: nf_tables: fix set size with rbtree backend")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 256001672153af5786c6ca148114693d7d76d836 ]
The set backend using this already has to work around this via ugly
cast, don't spread this pattern.
Signed-off-by: Florian Westphal <fw@strlen.de>
Stable-dep-of: 8d738c1869f6 ("netfilter: nf_tables: fix set size with rbtree backend")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bc50835e83f60f56e9bec2b392fb5544f250fb6f ]
Lion Ackermann was able to create a UAF which can be abused for privilege
escalation with the following script
Step 1. create root qdisc
tc qdisc add dev lo root handle 1:0 drr
step2. a class for packet aggregation do demonstrate uaf
tc class add dev lo classid 1:1 drr
step3. a class for nesting
tc class add dev lo classid 1:2 drr
step4. a class to graft qdisc to
tc class add dev lo classid 1:3 drr
step5.
tc qdisc add dev lo parent 1:1 handle 2:0 plug limit 1024
step6.
tc qdisc add dev lo parent 1:2 handle 3:0 drr
step7.
tc class add dev lo classid 3:1 drr
step 8.
tc qdisc add dev lo parent 3:1 handle 4:0 pfifo
step 9. Display the class/qdisc layout
tc class ls dev lo
class drr 1:1 root leaf 2: quantum 64Kb
class drr 1:2 root leaf 3: quantum 64Kb
class drr 3:1 root leaf 4: quantum 64Kb
tc qdisc ls
qdisc drr 1: dev lo root refcnt 2
qdisc plug 2: dev lo parent 1:1
qdisc pfifo 4: dev lo parent 3:1 limit 1000p
qdisc drr 3: dev lo parent 1:2
step10. trigger the bug <=== prevented by this patch
tc qdisc replace dev lo parent 1:3 handle 4:0
step 11. Redisplay again the qdiscs/classes
tc class ls dev lo
class drr 1:1 root leaf 2: quantum 64Kb
class drr 1:2 root leaf 3: quantum 64Kb
class drr 1:3 root leaf 4: quantum 64Kb
class drr 3:1 root leaf 4: quantum 64Kb
tc qdisc ls
qdisc drr 1: dev lo root refcnt 2
qdisc plug 2: dev lo parent 1:1
qdisc pfifo 4: dev lo parent 3:1 refcnt 2 limit 1000p
qdisc drr 3: dev lo parent 1:2
Observe that a) parent for 4:0 does not change despite the replace request.
There can only be one parent. b) refcount has gone up by two for 4:0 and
c) both class 1:3 and 3:1 are pointing to it.
Step 12. send one packet to plug
echo "" | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$((0x10001))
step13. send one packet to the grafted fifo
echo "" | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888,priority=$((0x10003))
step14. lets trigger the uaf
tc class delete dev lo classid 1:3
tc class delete dev lo classid 1:1
The semantics of "replace" is for a del/add _on the same node_ and not
a delete from one node(3:1) and add to another node (1:3) as in step10.
While we could "fix" with a more complex approach there could be
consequences to expectations so the patch takes the preventive approach of
"disallow such config".
Joint work with Lion Ackermann <nnamrec@gmail.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250116013713.900000-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 12e070eb6964b341b41677fd260af5a305316a1f ]
The following trace can be seen if a device is being unregistered while
its number of channels are being modified.
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 3 PID: 3754 at kernel/locking/mutex.c:564 __mutex_lock+0xc8a/0x1120
CPU: 3 UID: 0 PID: 3754 Comm: ethtool Not tainted 6.13.0-rc6+ #771
RIP: 0010:__mutex_lock+0xc8a/0x1120
Call Trace:
<TASK>
ethtool_check_max_channel+0x1ea/0x880
ethnl_set_channels+0x3c3/0xb10
ethnl_default_set_doit+0x306/0x650
genl_family_rcv_msg_doit+0x1e3/0x2c0
genl_rcv_msg+0x432/0x6f0
netlink_rcv_skb+0x13d/0x3b0
genl_rcv+0x28/0x40
netlink_unicast+0x42e/0x720
netlink_sendmsg+0x765/0xc20
__sys_sendto+0x3ac/0x420
__x64_sys_sendto+0xe0/0x1c0
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
This is because unregister_netdevice_many_notify might run before the
rtnl lock section of ethnl operations, eg. set_channels in the above
example. In this example the rss lock would be destroyed by the device
unregistration path before being used again, but in general running
ethnl operations while dismantle has started is not a good idea.
Fix this by denying any operation on devices being unregistered. A check
was already there in ethnl_ops_begin, but not wide enough.
Note that the same issue cannot be seen on the ioctl version
(__dev_ethtool) because the device reference is retrieved from within
the rtnl lock section there. Once dismantle started, the net device is
unlisted and no reference will be found.
Fixes: dde91ccfa25f ("ethtool: do not perform operations on net devices being unregistered")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250116092159.50890-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d1f9f79fa2af8e3b45cffdeef66e05833480148a ]
The following problem was encountered during stability test:
(NULL net_device): NAPI poll function process_backlog+0x0/0x530 \
returned 1, exceeding its budget of 0.
------------[ cut here ]------------
list_add double add: new=ffff88905f746f48, prev=ffff88905f746f48, \
next=ffff88905f746e40.
WARNING: CPU: 18 PID: 5462 at lib/list_debug.c:35 \
__list_add_valid_or_report+0xf3/0x130
CPU: 18 UID: 0 PID: 5462 Comm: ping Kdump: loaded Not tainted 6.13.0-rc7+
RIP: 0010:__list_add_valid_or_report+0xf3/0x130
Call Trace:
? __warn+0xcd/0x250
? __list_add_valid_or_report+0xf3/0x130
enqueue_to_backlog+0x923/0x1070
netif_rx_internal+0x92/0x2b0
__netif_rx+0x15/0x170
loopback_xmit+0x2ef/0x450
dev_hard_start_xmit+0x103/0x490
__dev_queue_xmit+0xeac/0x1950
ip_finish_output2+0x6cc/0x1620
ip_output+0x161/0x270
ip_push_pending_frames+0x155/0x1a0
raw_sendmsg+0xe13/0x1550
__sys_sendto+0x3bf/0x4e0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x5b/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The reproduction command is as follows:
sysctl -w net.core.dev_weight=0
ping 127.0.0.1
This is because when the napi's weight is set to 0, process_backlog() may
return 0 and clear the NAPI_STATE_SCHED bit of napi->state, causing this
napi to be re-polled in net_rx_action() until __do_softirq() times out.
Since the NAPI_STATE_SCHED bit has been cleared, napi_schedule_rps() can
be retriggered in enqueue_to_backlog(), causing this issue.
Making the napi's weight always non-zero solves this problem.
Triggering this issue requires system-wide admin (setting is
not namespaced).
Fixes: e38766054509 ("[NET]: Fix sysctl net.core.dev_weight")
Fixes: 3d48b53fb2ae ("net: dev_weight: TX/RX orthogonality")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Link: https://patch.msgid.link/20250116143053.4146855-1-liujian56@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3440fa34ad99d471f1085bc2f4dedeaebc310261 ]
Following fields of 'struct mr_mfc' can be updated
concurrently (no lock protection) from ip_mr_forward()
and ip6_mr_forward()
- bytes
- pkt
- wrong_if
- lastuse
They also can be read from other functions.
Convert bytes, pkt and wrong_if to atomic_long_t,
and use READ_ONCE()/WRITE_ONCE() for lastuse.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250114221049.1190631-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1a0d24775cdee2b8dc14bfa4f4418c930ab1ac57 ]
In 'cfg80211_scan_6ghz()', an instances of 'struct cfg80211_colocated_ap'
are allocated as if they would have 'ssid' as trailing VLA member. Since
this is not so, extra IEEE80211_MAX_SSID_LEN bytes are not needed.
Briefly tested with KUnit.
Fixes: c8cb5b854b40 ("nl80211/cfg80211: support 6 GHz scanning")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://patch.msgid.link/20250113155417.552587-1-dmantipov@yandex.ru
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0fca7784b7a14d4ede64f479662afb98876ec7f8 ]
When the scan parameters for a 6GHz scan specify a unicast
BSSID address, and the corresponding AP is found in the scan
list, add a corresponding entry in the collocated AP list,
so this AP would be directly probed even if it was not
advertised as a collocated AP.
This is needed for handling a scan request that is intended
for a ML probe flow, where user space can requests a scan
to retrieve information for other links in the AP MLD.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.54b954bc02ad.I1c072793d3d77a4c8fbbc64b4db5cce1bbb00382@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Stable-dep-of: 1a0d24775cde ("wifi: cfg80211: adjust allocation of colocated AP data")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|