summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2022-06-22net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsgDuoming Zhou
[ Upstream commit 219b51a6f040fa5367adadd7d58c4dda0896a01d ] The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock and block until it receives a packet from the remote. If the client doesn`t connect to server and calls read() directly, it will not receive any packets forever. As a result, the deadlock will happen. The fail log caused by deadlock is shown below: [ 369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds. [ 369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 369.613058] Call Trace: [ 369.613315] <TASK> [ 369.614072] __schedule+0x2f9/0xb20 [ 369.615029] schedule+0x49/0xb0 [ 369.615734] __lock_sock+0x92/0x100 [ 369.616763] ? destroy_sched_domains_rcu+0x20/0x20 [ 369.617941] lock_sock_nested+0x6e/0x70 [ 369.618809] ax25_bind+0xaa/0x210 [ 369.619736] __sys_bind+0xca/0xf0 [ 369.620039] ? do_futex+0xae/0x1b0 [ 369.620387] ? __x64_sys_futex+0x7c/0x1c0 [ 369.620601] ? fpregs_assert_state_consistent+0x19/0x40 [ 369.620613] __x64_sys_bind+0x11/0x20 [ 369.621791] do_syscall_64+0x3b/0x90 [ 369.622423] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 369.623319] RIP: 0033:0x7f43c8aa8af7 [ 369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 [ 369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7 [ 369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005 [ 369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700 [ 369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe [ 369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700 This patch replaces skb_recv_datagram() with an open-coded variant of it releasing the socket lock before the __skb_wait_for_more_packets() call and re-acquiring it after such call in order that other functions that need socket lock could be executed. what's more, the socket lock will be released only when recvmsg() will block and that should produce nicer overall behavior. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Thomas Osterried <thomas@osterried.de> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reported-by: Thomas Habets <thomas@@habets.se> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-22sunrpc: set cl_max_connect when cloning an rpc_clntScott Mayhew
[ Upstream commit 304791255a2dc1c9be7e7c8a6cbdb31b6847b0e5 ] If the initial attempt at trunking detection using the krb5i auth flavor fails with -EACCES, -NFS4ERR_CLID_INUSE, or -NFS4ERR_WRONGSEC, then the NFS client tries again using auth_sys, cloning the rpc_clnt in the process. If this second attempt at trunking detection succeeds, then the resulting nfs_client->cl_rpcclient winds up having cl_max_connect=0 and subsequent attempts to add additional transport connections to the rpc_clnt will fail with a message similar to the following being logged: [502044.312640] SUNRPC: reached max allowed number (0) did not add transport to server: 192.168.122.3 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Fixes: dc48e0abee24 ("SUNRPC enforce creation of no more than max_connect xprts") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-22ipv6: Fix signed integer overflow in l2tp_ip6_sendmsgWang Yufen
[ Upstream commit f638a84afef3dfe10554c51820c16e39a278c915 ] When len >= INT_MAX - transhdrlen, ulen = len + transhdrlen will be overflow. To fix, we can follow what udpv6 does and subtract the transhdrlen from the max. Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/20220607120028.845916-2-wangyufen@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14xsk: Fix possible crash when multiple sockets are createdMaciej Fijalkowski
commit ba3beec2ec1d3b4fd8672ca6e781dac4b3267f6e upstream. Fix a crash that happens if an Rx only socket is created first, then a second socket is created that is Tx only and bound to the same umem as the first socket and also the same netdev and queue_id together with the XDP_SHARED_UMEM flag. In this specific case, the tx_descs array page pool was not created by the first socket as it was an Rx only socket. When the second socket is bound it needs this tx_descs array of this shared page pool as it has a Tx component, but unfortunately it was never allocated, leading to a crash. Note that this array is only used for zero-copy drivers using the batched Tx APIs, currently only ice and i40e. [ 5511.150360] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 5511.158419] #PF: supervisor write access in kernel mode [ 5511.164472] #PF: error_code(0x0002) - not-present page [ 5511.170416] PGD 0 P4D 0 [ 5511.173347] Oops: 0002 [#1] PREEMPT SMP PTI [ 5511.178186] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G E 5.18.0-rc1+ #97 [ 5511.187245] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016 [ 5511.198418] RIP: 0010:xsk_tx_peek_release_desc_batch+0x198/0x310 [ 5511.205375] Code: c0 83 c6 01 84 c2 74 6d 8d 46 ff 23 07 44 89 e1 48 83 c0 14 48 c1 e1 04 48 c1 e0 04 48 03 47 10 4c 01 c1 48 8b 50 08 48 8b 00 <48> 89 51 08 48 89 01 41 80 bd d7 00 00 00 00 75 82 48 8b 19 49 8b [ 5511.227091] RSP: 0018:ffffc90000003dd0 EFLAGS: 00010246 [ 5511.233135] RAX: 0000000000000000 RBX: ffff88810c8da600 RCX: 0000000000000000 [ 5511.241384] RDX: 000000000000003c RSI: 0000000000000001 RDI: ffff888115f555c0 [ 5511.249634] RBP: ffffc90000003e08 R08: 0000000000000000 R09: ffff889092296b48 [ 5511.257886] R10: 0000ffffffffffff R11: ffff889092296800 R12: 0000000000000000 [ 5511.266138] R13: ffff88810c8db500 R14: 0000000000000040 R15: 0000000000000100 [ 5511.274387] FS: 0000000000000000(0000) GS:ffff88903f800000(0000) knlGS:0000000000000000 [ 5511.283746] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5511.290389] CR2: 0000000000000008 CR3: 00000001046e2001 CR4: 00000000003706f0 [ 5511.298640] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 5511.306892] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 5511.315142] Call Trace: [ 5511.317972] <IRQ> [ 5511.320301] ice_xmit_zc+0x68/0x2f0 [ice] [ 5511.324977] ? ktime_get+0x38/0xa0 [ 5511.328913] ice_napi_poll+0x7a/0x6a0 [ice] [ 5511.333784] __napi_poll+0x2c/0x160 [ 5511.337821] net_rx_action+0xdd/0x200 [ 5511.342058] __do_softirq+0xe6/0x2dd [ 5511.346198] irq_exit_rcu+0xb5/0x100 [ 5511.350339] common_interrupt+0xa4/0xc0 [ 5511.354777] </IRQ> [ 5511.357201] <TASK> [ 5511.359625] asm_common_interrupt+0x1e/0x40 [ 5511.364466] RIP: 0010:cpuidle_enter_state+0xd2/0x360 [ 5511.370211] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 e9 00 7b ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 72 02 00 00 31 ff e8 02 0c 80 ff fb 45 85 f6 <0f> 88 11 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49 [ 5511.391921] RSP: 0018:ffffffff82a03e60 EFLAGS: 00000202 [ 5511.397962] RAX: ffff88903f800000 RBX: 0000000000000001 RCX: 000000000000001f [ 5511.406214] RDX: 0000000000000000 RSI: ffffffff823400b9 RDI: ffffffff8234c046 [ 5511.424646] RBP: ffff88810a384800 R08: 000005032a28c046 R09: 0000000000000008 [ 5511.443233] R10: 000000000000000b R11: 0000000000000006 R12: ffffffff82bcf700 [ 5511.461922] R13: 000005032a28c046 R14: 0000000000000001 R15: 0000000000000000 [ 5511.480300] cpuidle_enter+0x29/0x40 [ 5511.494329] do_idle+0x1c7/0x250 [ 5511.507610] cpu_startup_entry+0x19/0x20 [ 5511.521394] start_kernel+0x649/0x66e [ 5511.534626] secondary_startup_64_no_verify+0xc3/0xcb [ 5511.549230] </TASK> Detect such case during bind() and allocate this memory region via newly introduced xp_alloc_tx_descs(). Also, use kvcalloc instead of kcalloc as for other buffer pool allocations, so that it matches the kvfree() from xp_destroy(). Fixes: d1bc532e99be ("i40e: xsk: Move tmp desc array from driver to pool") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220425153745.481322-1-maciej.fijalkowski@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14tcp: fix tcp_mtup_probe_success vs wrong snd_cwndEric Dumazet
commit 11825765291a93d8e7f44230da67b9f607c777bf upstream. syzbot got a new report [1] finally pointing to a very old bug, added in initial support for MTU probing. tcp_mtu_probe() has checks about starting an MTU probe if tcp_snd_cwnd(tp) >= 11. But nothing prevents tcp_snd_cwnd(tp) to be reduced later and before the MTU probe succeeds. This bug would lead to potential zero-divides. Debugging added in commit 40570375356c ("tcp: add accessors to read/set tp->snd_cwnd") has paid off :) While we are at it, address potential overflows in this code. [1] WARNING: CPU: 1 PID: 14132 at include/net/tcp.h:1219 tcp_mtup_probe_success+0x366/0x570 net/ipv4/tcp_input.c:2712 Modules linked in: CPU: 1 PID: 14132 Comm: syz-executor.2 Not tainted 5.18.0-syzkaller-07857-gbabf0bb978e3 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:tcp_snd_cwnd_set include/net/tcp.h:1219 [inline] RIP: 0010:tcp_mtup_probe_success+0x366/0x570 net/ipv4/tcp_input.c:2712 Code: 74 08 48 89 ef e8 da 80 17 f9 48 8b 45 00 65 48 ff 80 80 03 00 00 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 aa b0 c5 f8 <0f> 0b e9 16 fe ff ff 48 8b 4c 24 08 80 e1 07 38 c1 0f 8c c7 fc ff RSP: 0018:ffffc900079e70f8 EFLAGS: 00010287 RAX: ffffffff88c0f7f6 RBX: ffff8880756e7a80 RCX: 0000000000040000 RDX: ffffc9000c6c4000 RSI: 0000000000031f9e RDI: 0000000000031f9f RBP: 0000000000000000 R08: ffffffff88c0f606 R09: ffffc900079e7520 R10: ffffed101011226d R11: 1ffff1101011226c R12: 1ffff1100eadcf50 R13: ffff8880756e72c0 R14: 1ffff1100eadcf89 R15: dffffc0000000000 FS: 00007f643236e700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ab3f1e2a0 CR3: 0000000064fe7000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcp_clean_rtx_queue+0x223a/0x2da0 net/ipv4/tcp_input.c:3356 tcp_ack+0x1962/0x3c90 net/ipv4/tcp_input.c:3861 tcp_rcv_established+0x7c8/0x1ac0 net/ipv4/tcp_input.c:5973 tcp_v6_do_rcv+0x57b/0x1210 net/ipv6/tcp_ipv6.c:1476 sk_backlog_rcv include/net/sock.h:1061 [inline] __release_sock+0x1d8/0x4c0 net/core/sock.c:2849 release_sock+0x5d/0x1c0 net/core/sock.c:3404 sk_stream_wait_memory+0x700/0xdc0 net/core/stream.c:145 tcp_sendmsg_locked+0x111d/0x3fc0 net/ipv4/tcp.c:1410 tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1448 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] __sys_sendto+0x439/0x5c0 net/socket.c:2119 __do_sys_sendto net/socket.c:2131 [inline] __se_sys_sendto net/socket.c:2127 [inline] __x64_sys_sendto+0xda/0xf0 net/socket.c:2127 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f6431289109 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f643236e168 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f643139c100 RCX: 00007f6431289109 RDX: 00000000d0d0c2ac RSI: 0000000020000080 RDI: 000000000000000a RBP: 00007f64312e308d R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fff372533af R14: 00007f643236e300 R15: 0000000000022000 Fixes: 5d424d5a674f ("[TCP]: MTU probing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14net/sched: act_police: more accurate MTU policingDavide Caratti
commit 4ddc844eb81da59bfb816d8d52089aba4e59e269 upstream. in current Linux, MTU policing does not take into account that packets at the TC ingress have the L2 header pulled. Thus, the same TC police action (with the same value of tcfp_mtu) behaves differently for ingress/egress. In addition, the full GSO size is compared to tcfp_mtu: as a consequence, the policer drops GSO packets even when individual segments have the L2 + L3 + L4 + payload length below the configured valued of tcfp_mtu. Improve the accuracy of MTU policing as follows: - account for mac_len for non-GSO packets at TC ingress. - compare MTU threshold with the segmented size for GSO packets. Also, add a kselftest that verifies the correct behavior. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14net: openvswitch: fix misuse of the cached connection on tuple changesIlya Maximets
commit 2061ecfdf2350994e5b61c43e50e98a7a70e95ee upstream. If packet headers changed, the cached nfct is no longer relevant for the packet and attempt to re-use it leads to the incorrect packet classification. This issue is causing broken connectivity in OpenStack deployments with OVS/OVN due to hairpin traffic being unexpectedly dropped. The setup has datapath flows with several conntrack actions and tuple changes between them: actions:ct(commit,zone=8,mark=0/0x1,nat(src)), set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)), set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)), ct(zone=8),recirc(0x4) After the first ct() action the packet headers are almost fully re-written. The next ct() tries to re-use the existing nfct entry and marks the packet as invalid, so it gets dropped later in the pipeline. Clearing the cached conntrack entry whenever packet tuple is changed to avoid the issue. The flow key should not be cleared though, because we should still be able to match on the ct_state if the recirculation happens after the tuple change but before the next ct() action. Cc: stable@vger.kernel.org Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Frode Nordahl <frode.nordahl@canonical.com> Link: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-May/051829.html Link: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856 Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Link: https://lore.kernel.org/r/20220606221140.488984-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14Revert "net: af_key: add check for pfkey_broadcast in function pfkey_process"Michal Kubecek
[ Upstream commit 9c90c9b3e50e16d03c7f87d63e9db373974781e0 ] This reverts commit 4dc2a5a8f6754492180741facf2a8787f2c415d7. A non-zero return value from pfkey_broadcast() does not necessarily mean an error occurred as this function returns -ESRCH when no registered listener received the message. In particular, a call with BROADCAST_PROMISC_ONLY flag and null one_sk argument can never return zero so that this commit in fact prevents processing any PF_KEY message. One visible effect is that racoon daemon fails to find encryption algorithms like aes and refuses to start. Excluding -ESRCH return value would fix this but it's not obvious that we really want to bail out here and most other callers of pfkey_broadcast() also ignore the return value. Also, as pointed out by Steffen Klassert, PF_KEY is kind of deprecated and newer userspace code should use netlink instead so that we should only disturb the code for really important fixes. v2: add a comment explaining why is the return value ignored Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14tcp: use alloc_large_system_hash() to allocate table_perturbMuchun Song
[ Upstream commit e67b72b90b7e19a4be4d9c29f3feea6f58ab43f8 ] In our server, there may be no high order (>= 6) memory since we reserve lots of HugeTLB pages when booting. Then the system panic. So use alloc_large_system_hash() to allocate table_perturb. Fixes: e9261476184b ("tcp: dynamically allocate the perturb table used by source ports") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220607070214.94443-1-songmuchun@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14ip_gre: test csum_start instead of transport headerWillem de Bruijn
[ Upstream commit 8d21e9963bec1aad2280cdd034c8993033ef2948 ] GRE with TUNNEL_CSUM will apply local checksum offload on CHECKSUM_PARTIAL packets. ipgre_xmit must validate csum_start after an optional skb_pull, else lco_csum may trigger an overflow. The original check was if (csum && skb_checksum_start(skb) < skb->data) return -EINVAL; This had false positives when skb_checksum_start is undefined: when ip_summed is not CHECKSUM_PARTIAL. A discussed refinement was straightforward if (csum && skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_start(skb) < skb->data) return -EINVAL; But was eventually revised more thoroughly: - restrict the check to the only branch where needed, in an uncommon GRE path that uses header_ops and calls skb_pull. - test skb_transport_header, which is set along with csum_start in skb_partial_csum_set in the normal header_ops datapath. Turns out skbs can arrive in this branch without the transport header set, e.g., through BPF redirection. Revise the check back to check csum_start directly, and only if CHECKSUM_PARTIAL. Do leave the check in the updated location. Check field regardless of whether TUNNEL_CSUM is configured. Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ Link: https://lore.kernel.org/all/20210902193447.94039-2-willemdebruijn.kernel@gmail.com/T/#u Fixes: 8a0ed250f911 ("ip_gre: validate csum_start only on pull") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20220606132107.3582565-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14net: ipv6: unexport __init-annotated seg6_hmac_init()Masahiro Yamada
[ Upstream commit 5801f064e35181c71857a80ff18af4dbec3c5f5c ] EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it has been broken for a decade. Recently, I fixed modpost so it started to warn it again, then this showed up in linux-next builds. There are two ways to fix it: - Remove __init - Remove EXPORT_SYMBOL I chose the latter for this case because the caller (net/ipv6/seg6.c) and the callee (net/ipv6/seg6_hmac.c) belong to the same module. It seems an internal function call in ipv6.ko. Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14net: xfrm: unexport __init-annotated xfrm4_protocol_init()Masahiro Yamada
[ Upstream commit 4a388f08d8784af48f352193d2b72aaf167a57a1 ] EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it has been broken for a decade. Recently, I fixed modpost so it started to warn it again, then this showed up in linux-next builds. There are two ways to fix it: - Remove __init - Remove EXPORT_SYMBOL I chose the latter for this case because the only in-tree call-site, net/ipv4/xfrm4_policy.c is never compiled as modular. (CONFIG_XFRM is boolean) Fixes: 2f32b51b609f ("xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()Chuck Lever
[ Upstream commit 6c254bf3b637dd4ef4f78eb78c7447419c0161d7 ] I found that NFSD's new NFSv3 READDIRPLUS XDR encoder was screwing up right at the end of the page array. xdr_get_next_encode_buffer() does not compute the value of xdr->end correctly: * The check to see if we're on the final available page in xdr->buf needs to account for the space consumed by @nbytes. * The new xdr->end value needs to account for the portion of @nbytes that is to be encoded into the previous buffer. Fixes: 2825a7f90753 ("nfsd4: allow encoding across page boundaries") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14xsk: Fix handling of invalid descriptors in XSK TX batching APIMaciej Fijalkowski
[ Upstream commit d678cbd2f867a564a3c5b276c454e873f43f02f8 ] xdpxceiver run on a AF_XDP ZC enabled driver revealed a problem with XSK Tx batching API. There is a test that checks how invalid Tx descriptors are handled by AF_XDP. Each valid descriptor is followed by invalid one on Tx side whereas the Rx side expects only to receive a set of valid descriptors. In current xsk_tx_peek_release_desc_batch() function, the amount of available descriptors is hidden inside xskq_cons_peek_desc_batch(). This can be problematic in cases where invalid descriptors are present due to the fact that xskq_cons_peek_desc_batch() returns only a count of valid descriptors. This means that it is impossible to properly update XSK ring state when calling xskq_cons_release_n(). To address this issue, pull out the contents of xskq_cons_peek_desc_batch() so that callers (currently only xsk_tx_peek_release_desc_batch()) will always be able to update the state of ring properly, as total count of entries is now available and use this value as an argument in xskq_cons_release_n(). By doing so, xskq_cons_peek_desc_batch() can be dropped altogether. Fixes: 9349eb3a9d2a ("xsk: Introduce batched Tx descriptor interfaces") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220607142200.576735-1-maciej.fijalkowski@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14i40e: xsk: Move tmp desc array from driver to poolMagnus Karlsson
[ Upstream commit d1bc532e99becf104635ed4da6fefa306f452321 ] Move desc_array from the driver to the pool. The reason behind this is that we can then reuse this array as a temporary storage for descriptors in all zero-copy drivers that use the batched interface. This will make it easier to add batching to more drivers. i40e is the only driver that has a batched Tx zero-copy implementation, so no need to touch any other driver. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20220125160446.78976-6-maciej.fijalkowski@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14af_unix: Fix a data-race in unix_dgram_peer_wake_me().Kuniyuki Iwashima
[ Upstream commit 662a80946ce13633ae90a55379f1346c10f0c432 ] unix_dgram_poll() calls unix_dgram_peer_wake_me() without `other`'s lock held and check if its receive queue is full. Here we need to use unix_recvq_full_lockless() instead of unix_recvq_full(), otherwise KCSAN will report a data-race. Fixes: 7d267278a9ec ("unix: avoid use-after-free in ep_remove_wait_queue") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220605232325.11804-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14netfilter: nf_tables: bail out early if hardware offload is not supportedPablo Neira Ayuso
[ Upstream commit 3a41c64d9c1185a2f3a184015e2a9b78bfc99c71 ] If user requests for NFT_CHAIN_HW_OFFLOAD, then check if either device provides the .ndo_setup_tc interface or there is an indirect flow block that has been registered. Otherwise, bail out early from the preparation phase. Moreover, validate that family == NFPROTO_NETDEV and hook is NF_NETDEV_INGRESS. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14netfilter: nf_tables: memleak flow rule from commit pathPablo Neira Ayuso
[ Upstream commit 9dd732e0bdf538b1b76dc7c157e2b5e560ff30d3 ] Abort path release flow rule object, however, commit path does not. Update code to destroy these objects before releasing the transaction. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14netfilter: nf_tables: release new hooks on unsupported flowtable flagsPablo Neira Ayuso
[ Upstream commit c271cc9febaaa1bcbc0842d1ee30466aa6148ea8 ] Release the list of new hooks that are pending to be registered in case that unsupported flowtable flags are provided. Fixes: 78d9f48f7f44 ("netfilter: nf_tables: add devices to existing flowtable") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14netfilter: nf_tables: always initialize flowtable hook list in transactionPablo Neira Ayuso
[ Upstream commit 2c9e4559773c261900c674a86b8e455911675d71 ] The hook list is used if nft_trans_flowtable_update(trans) == true. However, initialize this list for other cases for safety reasons. Fixes: 78d9f48f7f44 ("netfilter: nf_tables: add devices to existing flowtable") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14SUNRPC: Trap RDMA segment overflowsChuck Lever
[ Upstream commit f012e95b377c73c0283f009823c633104dedb337 ] Prevent svc_rdma_build_writes() from walking off the end of a Write chunk's segment array. Caught with KASAN. The test that this fix replaces is invalid, and might have been left over from an earlier prototype of the PCL work. Fixes: 7a1cbfa18059 ("svcrdma: Use parsed chunk lists to construct RDMA Writes") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14netfilter: nf_tables: delete flowtable hooks via transaction listPablo Neira Ayuso
[ Upstream commit b6d9014a3335194590abdd2a2471ef5147a67645 ] Remove inactive bool field in nft_hook object that was introduced in abadb2f865d7 ("netfilter: nf_tables: delete devices from flowtable"). Move stale flowtable hooks to transaction list instead. Deleting twice the same device does not result in ENOENT. Fixes: abadb2f865d7 ("netfilter: nf_tables: delete devices from flowtable") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14netfilter: nf_tables: use kfree_rcu(ptr, rcu) to release hooks in clean_net pathPablo Neira Ayuso
[ Upstream commit ab5e5c062f67c5ae8cd07f0632ffa62dc0e7d169 ] Use kfree_rcu(ptr, rcu) variant instead as described by ae089831ff28 ("netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant"). Fixes: f9a43007d3f7 ("netfilter: nf_tables: double hook unregistration in netns path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14netfilter: nat: really support inet nat without l3 addressFlorian Westphal
[ Upstream commit 282e5f8fe907dc3f2fbf9f2103b0e62ffc3a68a5 ] When no l3 address is given, priv->family is set to NFPROTO_INET and the evaluation function isn't called. Call it too so l4-only rewrite can work. Also add a test case for this. Fixes: a33f387ecd5aa ("netfilter: nft_nat: allow to specify layer 4 protocol NAT only") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14xprtrdma: treat all calls not a bcall when bc_serv is NULLKinglong Mee
[ Upstream commit 11270e7ca268e8d61b5d9e5c3a54bd1550642c9c ] When a rdma server returns a fault format reply, nfs v3 client may treats it as a bcall when bc service is not exist. The debug message at rpcrdma_bc_receive_call are, [56579.837169] RPC: rpcrdma_bc_receive_call: callback XID 00000001, length=20 [56579.837174] RPC: rpcrdma_bc_receive_call: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 After that, rpcrdma_bc_receive_call will meets NULL pointer as, [ 226.057890] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8 ... [ 226.058704] RIP: 0010:_raw_spin_lock+0xc/0x20 ... [ 226.059732] Call Trace: [ 226.059878] rpcrdma_bc_receive_call+0x138/0x327 [rpcrdma] [ 226.060011] __ib_process_cq+0x89/0x170 [ib_core] [ 226.060092] ib_cq_poll_work+0x26/0x80 [ib_core] [ 226.060257] process_one_work+0x1a7/0x360 [ 226.060367] ? create_worker+0x1a0/0x1a0 [ 226.060440] worker_thread+0x30/0x390 [ 226.060500] ? create_worker+0x1a0/0x1a0 [ 226.060574] kthread+0x116/0x130 [ 226.060661] ? kthread_flush_work_fn+0x10/0x10 [ 226.060724] ret_from_fork+0x35/0x40 ... Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14tipc: check attribute length for bearer nameHoang Le
[ Upstream commit 7f36f798f89bf32c0164049cb0e3fd1af613d0bb ] syzbot reported uninit-value: ===================================================== BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:644 [inline] BUG: KMSAN: uninit-value in string+0x4f9/0x6f0 lib/vsprintf.c:725 string_nocheck lib/vsprintf.c:644 [inline] string+0x4f9/0x6f0 lib/vsprintf.c:725 vsnprintf+0x2222/0x3650 lib/vsprintf.c:2806 vprintk_store+0x537/0x2150 kernel/printk/printk.c:2158 vprintk_emit+0x28b/0xab0 kernel/printk/printk.c:2256 vprintk_default+0x86/0xa0 kernel/printk/printk.c:2283 vprintk+0x15f/0x180 kernel/printk/printk_safe.c:50 _printk+0x18d/0x1cf kernel/printk/printk.c:2293 tipc_enable_bearer net/tipc/bearer.c:371 [inline] __tipc_nl_bearer_enable+0x2022/0x22a0 net/tipc/bearer.c:1033 tipc_nl_bearer_enable+0x6c/0xb0 net/tipc/bearer.c:1042 genl_family_rcv_msg_doit net/netlink/genetlink.c:731 [inline] - Do sanity check the attribute length for TIPC_NLA_BEARER_NAME. - Do not use 'illegal name' in printing message. Reported-by: syzbot+e820fdc8ce362f2dea51@syzkaller.appspotmail.com Fixes: cb30a63384bc ("tipc: refactor function tipc_enable_bearer()") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Link: https://lore.kernel.org/r/20220602063053.5892-1-hoang.h.le@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14tcp: tcp_rtx_synack() can be called from process contextEric Dumazet
[ Upstream commit 0a375c822497ed6ad6b5da0792a12a6f1af10c0b ] Laurent reported the enclosed report [1] This bug triggers with following coditions: 0) Kernel built with CONFIG_DEBUG_PREEMPT=y 1) A new passive FastOpen TCP socket is created. This FO socket waits for an ACK coming from client to be a complete ESTABLISHED one. 2) A socket operation on this socket goes through lock_sock() release_sock() dance. 3) While the socket is owned by the user in step 2), a retransmit of the SYN is received and stored in socket backlog. 4) At release_sock() time, the socket backlog is processed while in process context. 5) A SYNACK packet is cooked in response of the SYN retransmit. 6) -> tcp_rtx_synack() is called in process context. Before blamed commit, tcp_rtx_synack() was always called from BH handler, from a timer handler. Fix this by using TCP_INC_STATS() & NET_INC_STATS() which do not assume caller is in non preemptible context. [1] BUG: using __this_cpu_add() in preemptible [00000000] code: epollpep/2180 caller is tcp_rtx_synack.part.0+0x36/0xc0 CPU: 10 PID: 2180 Comm: epollpep Tainted: G OE 5.16.0-0.bpo.4-amd64 #1 Debian 5.16.12-1~bpo11+1 Hardware name: Supermicro SYS-5039MC-H8TRF/X11SCD-F, BIOS 1.7 11/23/2021 Call Trace: <TASK> dump_stack_lvl+0x48/0x5e check_preemption_disabled+0xde/0xe0 tcp_rtx_synack.part.0+0x36/0xc0 tcp_rtx_synack+0x8d/0xa0 ? kmem_cache_alloc+0x2e0/0x3e0 ? apparmor_file_alloc_security+0x3b/0x1f0 inet_rtx_syn_ack+0x16/0x30 tcp_check_req+0x367/0x610 tcp_rcv_state_process+0x91/0xf60 ? get_nohz_timer_target+0x18/0x1a0 ? lock_timer_base+0x61/0x80 ? preempt_count_add+0x68/0xa0 tcp_v4_do_rcv+0xbd/0x270 __release_sock+0x6d/0xb0 release_sock+0x2b/0x90 sock_setsockopt+0x138/0x1140 ? __sys_getsockname+0x7e/0xc0 ? aa_sk_perm+0x3e/0x1a0 __sys_setsockopt+0x198/0x1e0 __x64_sys_setsockopt+0x21/0x30 do_syscall_64+0x38/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Laurent Fasnacht <laurent.fasnacht@proton.ch> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20220530213713.601888-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14tcp: add accessors to read/set tp->snd_cwndEric Dumazet
[ Upstream commit 40570375356c874b1578e05c1dcc3ff7c1322dbe ] We had various bugs over the years with code breaking the assumption that tp->snd_cwnd is greater than zero. Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added in commit 8b8a321ff72c ("tcp: fix zero cwnd in tcp_cwnd_reduction") can trigger, and without a repro we would have to spend considerable time finding the bug. Instead of complaining too late, we want to catch where and when tp->snd_cwnd is set to an illegal value. Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14net/smc: fixes for converting from "struct smc_cdc_tx_pend **" to "struct ↵Guangguan Wang
smc_wr_tx_pend_priv *" [ Upstream commit e225c9a5a74b12e9ef8516f30a3db2c7eb866ee1 ] "struct smc_cdc_tx_pend **" can not directly convert to "struct smc_wr_tx_pend_priv *". Fixes: 2bced6aefa3d ("net/smc: put slot when connection is killed") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09mac80211: upgrade passive scan to active scan on DFS channels after beacon rxFelix Fietkau
commit b041b7b9de6e1d4362de855ab90f9d03ef323edd upstream. In client mode, we can't connect to hidden SSID APs or SSIDs not advertised in beacons on DFS channels, since we're forced to passive scan. Fix this by sending out a probe request immediately after the first beacon, if active scan was requested by the user. Cc: stable@vger.kernel.org Reported-by: Catrinel Catrinescu <cc@80211.de> Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20220420104907.36275-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09cfg80211: declare MODULE_FIRMWARE for regulatory.dbDimitri John Ledkov
commit 7bc7981eeebe1b8e603ad2ffc5e84f4df76920dd upstream. Add MODULE_FIRMWARE declarations for regulatory.db and regulatory.db.p7s such that userspace tooling can discover and include these files. Cc: stable@vger.kernel.org Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Link: https://lore.kernel.org/r/20220414125004.267819-1-dimitri.ledkov@canonical.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09wifi: mac80211: fix use-after-free in chanctx codeJohannes Berg
commit 2965c4cdf7ad9ce0796fac5e57debb9519ea721e upstream. In ieee80211_vif_use_reserved_context(), when we have an old context and the new context's replace_state is set to IEEE80211_CHANCTX_REPLACE_NONE, we free the old context in ieee80211_vif_use_reserved_reassign(). Therefore, we cannot check the old_ctx anymore, so we should set it to NULL after this point. However, since the new_ctx replace state is clearly not IEEE80211_CHANCTX_REPLACES_OTHER, we're not going to do anything else in this function and can just return to avoid accessing the freed old_ctx. Cc: stable@vger.kernel.org Fixes: 5bcae31d9cb1 ("mac80211: implement multi-vif in-place reservations") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220601091926.df419d91b165.I17a9b3894ff0b8323ce2afdb153b101124c821e5@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09net/smc: postpone sk_refcnt increment in connect()liuyacan
[ Upstream commit 75c1edf23b95a9c66923d9269d8e86e4dbde151f ] Same trigger condition as commit 86434744. When setsockopt runs in parallel to a connect(), and switch the socket into fallback mode. Then the sk_refcnt is incremented in smc_connect(), but its state stay in SMC_INIT (NOT SMC_ACTIVE). This cause the corresponding sk_refcnt decrement in __smc_release() will not be performed. Fixes: 86434744fedf ("net/smc: add fallback check to connect()") Signed-off-by: liuyacan <liuyacan@corp.netease.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09rxrpc: Fix decision on when to generate an IDLE ACKDavid Howells
[ Upstream commit 9a3dedcf18096e8f7f22b8777d78c4acfdea1651 ] Fix the decision on when to generate an IDLE ACK by keeping a count of the number of packets we've received, but not yet soft-ACK'd, and the number of packets we've processed, but not yet hard-ACK'd, rather than trying to keep track of which DATA sequence numbers correspond to those points. We then generate an ACK when either counter exceeds 2. The counters are both cleared when we transcribe the information into any sort of ACK packet for transmission. IDLE and DELAY ACKs are skipped if both counters are 0 (ie. no change). Fixes: 805b21b929e2 ("rxrpc: Send an ACK after every few DATA packets we receive") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09rxrpc: Don't let ack.previousPacket regressDavid Howells
[ Upstream commit 81524b6312535897707f2942695da1d359a5e56b ] The previousPacket field in the rx ACK packet should never go backwards - it's now the highest DATA sequence number received, not the last on received (it used to be used for out of sequence detection). Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09rxrpc: Fix overlapping ACK accountingDavid Howells
[ Upstream commit 8940ba3cfe4841928777fd45eaa92051522c7f0c ] Fix accidental overlapping of Rx-phase ACK accounting with Tx-phase ACK accounting through variables shared between the two. call->acks_* members refer to ACKs received in the Tx phase and call->ackr_* members to ACKs sent/to be sent during the Rx phase. Fixes: 1a2391c30c0b ("rxrpc: Fix detection of out of order acks") Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09rxrpc: Don't try to resend the request if we're receiving the replyDavid Howells
[ Upstream commit 114af61f88fbe34d641b13922d098ffec4c1be1b ] rxrpc has a timer to trigger resending of unacked data packets in a call. This is not cancelled when a client call switches to the receive phase on the basis that most calls don't last long enough for it to ever expire. However, if it *does* expire after we've started to receive the reply, we shouldn't then go into trying to retransmit or pinging the server to find out if an ack got lost. Fix this by skipping the resend code if we're into receiving the reply to a client call. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09rxrpc: Fix listen() setting the bar too high for the prealloc ringsDavid Howells
[ Upstream commit 88e22159750b0d55793302eeed8ee603f5c1a95c ] AF_RXRPC's listen() handler lets you set the backlog up to 32 (if you bump up the sysctl), but whilst the preallocation circular buffers have 32 slots in them, one of them has to be a dead slot because we're using CIRC_CNT(). This means that listen(rxrpc_sock, 32) will cause an oops when the socket is closed because rxrpc_service_prealloc_one() allocated one too many calls and rxrpc_discard_prealloc() won't then be able to get rid of them because it'll think the ring is empty. rxrpc_release_calls_on_socket() then tries to abort them, but oopses because call->peer isn't yet set. Fix this by setting the maximum backlog to RXRPC_BACKLOG_MAX - 1 to match the ring capacity. BUG: kernel NULL pointer dereference, address: 0000000000000086 ... RIP: 0010:rxrpc_send_abort_packet+0x73/0x240 [rxrpc] Call Trace: <TASK> ? __wake_up_common_lock+0x7a/0x90 ? rxrpc_notify_socket+0x8e/0x140 [rxrpc] ? rxrpc_abort_call+0x4c/0x60 [rxrpc] rxrpc_release_calls_on_socket+0x107/0x1a0 [rxrpc] rxrpc_release+0xc9/0x1c0 [rxrpc] __sock_release+0x37/0xa0 sock_close+0x11/0x20 __fput+0x89/0x240 task_work_run+0x59/0x90 do_exit+0x319/0xaa0 Fixes: 00e907127e6f ("rxrpc: Preallocate peers, conns and calls for incoming service requests") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: https://lists.infradead.org/pipermail/linux-afs/2022-March/005079.html Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09sctp: read sk->sk_bound_dev_if once in sctp_rcv()Eric Dumazet
[ Upstream commit a20ea298071f46effa3aaf965bf9bb34c901db3f ] sctp_rcv() reads sk->sk_bound_dev_if twice while the socket is not locked. Another cpu could change this field under us. Fixes: 0fd9a65a76e8 ("[SCTP] Support SO_BINDTODEVICE socket option on incoming packets.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09Bluetooth: use hdev lock for accept_list and reject_list in conn reqNiels Dossche
[ Upstream commit fb048cae51bacdfbbda2954af3c213fdb1d484f4 ] All accesses (both reads and modifications) to hdev->{accept,reject}_list are protected by hdev lock, except the ones in hci_conn_request_evt. This can cause a race condition in the form of a list corruption. The solution is to protect these lists in hci_conn_request_evt as well. I was unable to find the exact commit that introduced the issue for the reject list, I was only able to find it for the accept list. Fixes: a55bd29d5227 ("Bluetooth: Add white list lookup for incoming connection requests") Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09Bluetooth: use hdev lock in activate_scan for hci_is_adv_monitoringNiels Dossche
[ Upstream commit 50a3633ae5e98cf1b80ef5b73c9e341aee9ad896 ] hci_is_adv_monitoring's function documentation states that it must be called under the hdev lock. Paths that leads to an unlocked call are: discov_update => start_discovery => interleaved_discov => active_scan and: discov_update => start_discovery => active_scan The solution is to take the lock in active_scan during the duration of the call to hci_is_adv_monitoring. Fixes: c32d624640fd ("Bluetooth: disable filter dup when scan for adv monitor") Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09Bluetooth: fix dangling sco_conn and use-after-free in sco_sock_timeoutYing Hsu
[ Upstream commit 7aa1e7d15f8a5b65f67bacb100d8fc033b21efa2 ] Connecting the same socket twice consecutively in sco_sock_connect() could lead to a race condition where two sco_conn objects are created but only one is associated with the socket. If the socket is closed before the SCO connection is established, the timer associated with the dangling sco_conn object won't be canceled. As the sock object is being freed, the use-after-free problem happens when the timer callback function sco_sock_timeout() accesses the socket. Here's the call trace: dump_stack+0x107/0x163 ? refcount_inc+0x1c/ print_address_description.constprop.0+0x1c/0x47e ? refcount_inc+0x1c/0x7b kasan_report+0x13a/0x173 ? refcount_inc+0x1c/0x7b check_memory_region+0x132/0x139 refcount_inc+0x1c/0x7b sco_sock_timeout+0xb2/0x1ba process_one_work+0x739/0xbd1 ? cancel_delayed_work+0x13f/0x13f ? __raw_spin_lock_init+0xf0/0xf0 ? to_kthread+0x59/0x85 worker_thread+0x593/0x70e kthread+0x346/0x35a ? drain_workqueue+0x31a/0x31a ? kthread_bind+0x4b/0x4b ret_from_fork+0x1f/0x30 Link: https://syzkaller.appspot.com/bug?extid=2bef95d3ab4daa10155b Reported-by: syzbot+2bef95d3ab4daa10155b@syzkaller.appspotmail.com Fixes: e1dee2c1de2b ("Bluetooth: fix repeated calls to sco_sock_kill") Signed-off-by: Ying Hsu <yinghsu@chromium.org> Reviewed-by: Joseph Hwang <josephsih@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09nl80211: don't hold RTNL in color change requestJohannes Berg
[ Upstream commit 1b550a0bebfc0b69d6ec08fe6eb58953a8aec48a ] It's not necessary to hold the RTNL across color change requests, since all the inner locking needs only the wiphy mutex which we already hold as well. Fixes: 0d2ab3aea50b ("nl80211: add support for BSS coloring") Link: https://lore.kernel.org/r/20220414140402.32e03e8c261b.I5e7dc6bc563a129b938c43298da6bb4e812400a5@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09NFC: NULL out the dev->rfkill to prevent UAFLin Ma
[ Upstream commit 1b0e81416a24d6e9b8c2341e22e8bf48f8b8bfc9 ] Commit 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device") assumes the device_is_registered() in function nfc_dev_up() will help to check when the rfkill is unregistered. However, this check only take effect when device_del(&dev->dev) is done in nfc_unregister_device(). Hence, the rfkill object is still possible be dereferenced. The crash trace in latest kernel (5.18-rc2): [ 68.760105] ================================================================== [ 68.760330] BUG: KASAN: use-after-free in __lock_acquire+0x3ec1/0x6750 [ 68.760756] Read of size 8 at addr ffff888009c93018 by task fuzz/313 [ 68.760756] [ 68.760756] CPU: 0 PID: 313 Comm: fuzz Not tainted 5.18.0-rc2 #4 [ 68.760756] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 68.760756] Call Trace: [ 68.760756] <TASK> [ 68.760756] dump_stack_lvl+0x57/0x7d [ 68.760756] print_report.cold+0x5e/0x5db [ 68.760756] ? __lock_acquire+0x3ec1/0x6750 [ 68.760756] kasan_report+0xbe/0x1c0 [ 68.760756] ? __lock_acquire+0x3ec1/0x6750 [ 68.760756] __lock_acquire+0x3ec1/0x6750 [ 68.760756] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 68.760756] ? register_lock_class+0x18d0/0x18d0 [ 68.760756] lock_acquire+0x1ac/0x4f0 [ 68.760756] ? rfkill_blocked+0xe/0x60 [ 68.760756] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 68.760756] ? mutex_lock_io_nested+0x12c0/0x12c0 [ 68.760756] ? nla_get_range_signed+0x540/0x540 [ 68.760756] ? _raw_spin_lock_irqsave+0x4e/0x50 [ 68.760756] _raw_spin_lock_irqsave+0x39/0x50 [ 68.760756] ? rfkill_blocked+0xe/0x60 [ 68.760756] rfkill_blocked+0xe/0x60 [ 68.760756] nfc_dev_up+0x84/0x260 [ 68.760756] nfc_genl_dev_up+0x90/0xe0 [ 68.760756] genl_family_rcv_msg_doit+0x1f4/0x2f0 [ 68.760756] ? genl_family_rcv_msg_attrs_parse.constprop.0+0x230/0x230 [ 68.760756] ? security_capable+0x51/0x90 [ 68.760756] genl_rcv_msg+0x280/0x500 [ 68.760756] ? genl_get_cmd+0x3c0/0x3c0 [ 68.760756] ? lock_acquire+0x1ac/0x4f0 [ 68.760756] ? nfc_genl_dev_down+0xe0/0xe0 [ 68.760756] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 68.760756] netlink_rcv_skb+0x11b/0x340 [ 68.760756] ? genl_get_cmd+0x3c0/0x3c0 [ 68.760756] ? netlink_ack+0x9c0/0x9c0 [ 68.760756] ? netlink_deliver_tap+0x136/0xb00 [ 68.760756] genl_rcv+0x1f/0x30 [ 68.760756] netlink_unicast+0x430/0x710 [ 68.760756] ? memset+0x20/0x40 [ 68.760756] ? netlink_attachskb+0x740/0x740 [ 68.760756] ? __build_skb_around+0x1f4/0x2a0 [ 68.760756] netlink_sendmsg+0x75d/0xc00 [ 68.760756] ? netlink_unicast+0x710/0x710 [ 68.760756] ? netlink_unicast+0x710/0x710 [ 68.760756] sock_sendmsg+0xdf/0x110 [ 68.760756] __sys_sendto+0x19e/0x270 [ 68.760756] ? __ia32_sys_getpeername+0xa0/0xa0 [ 68.760756] ? fd_install+0x178/0x4c0 [ 68.760756] ? fd_install+0x195/0x4c0 [ 68.760756] ? kernel_fpu_begin_mask+0x1c0/0x1c0 [ 68.760756] __x64_sys_sendto+0xd8/0x1b0 [ 68.760756] ? lockdep_hardirqs_on+0xbf/0x130 [ 68.760756] ? syscall_enter_from_user_mode+0x1d/0x50 [ 68.760756] do_syscall_64+0x3b/0x90 [ 68.760756] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 68.760756] RIP: 0033:0x7f67fb50e6b3 ... [ 68.760756] RSP: 002b:00007f67fa91fe90 EFLAGS: 00000293 ORIG_RAX: 000000000000002c [ 68.760756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f67fb50e6b3 [ 68.760756] RDX: 000000000000001c RSI: 0000559354603090 RDI: 0000000000000003 [ 68.760756] RBP: 00007f67fa91ff00 R08: 00007f67fa91fedc R09: 000000000000000c [ 68.760756] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe824d496e [ 68.760756] R13: 00007ffe824d496f R14: 00007f67fa120000 R15: 0000000000000003 [ 68.760756] </TASK> [ 68.760756] [ 68.760756] Allocated by task 279: [ 68.760756] kasan_save_stack+0x1e/0x40 [ 68.760756] __kasan_kmalloc+0x81/0xa0 [ 68.760756] rfkill_alloc+0x7f/0x280 [ 68.760756] nfc_register_device+0xa3/0x1a0 [ 68.760756] nci_register_device+0x77a/0xad0 [ 68.760756] nfcmrvl_nci_register_dev+0x20b/0x2c0 [ 68.760756] nfcmrvl_nci_uart_open+0xf2/0x1dd [ 68.760756] nci_uart_tty_ioctl+0x2c3/0x4a0 [ 68.760756] tty_ioctl+0x764/0x1310 [ 68.760756] __x64_sys_ioctl+0x122/0x190 [ 68.760756] do_syscall_64+0x3b/0x90 [ 68.760756] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 68.760756] [ 68.760756] Freed by task 314: [ 68.760756] kasan_save_stack+0x1e/0x40 [ 68.760756] kasan_set_track+0x21/0x30 [ 68.760756] kasan_set_free_info+0x20/0x30 [ 68.760756] __kasan_slab_free+0x108/0x170 [ 68.760756] kfree+0xb0/0x330 [ 68.760756] device_release+0x96/0x200 [ 68.760756] kobject_put+0xf9/0x1d0 [ 68.760756] nfc_unregister_device+0x77/0x190 [ 68.760756] nfcmrvl_nci_unregister_dev+0x88/0xd0 [ 68.760756] nci_uart_tty_close+0xdf/0x180 [ 68.760756] tty_ldisc_kill+0x73/0x110 [ 68.760756] tty_ldisc_hangup+0x281/0x5b0 [ 68.760756] __tty_hangup.part.0+0x431/0x890 [ 68.760756] tty_release+0x3a8/0xc80 [ 68.760756] __fput+0x1f0/0x8c0 [ 68.760756] task_work_run+0xc9/0x170 [ 68.760756] exit_to_user_mode_prepare+0x194/0x1a0 [ 68.760756] syscall_exit_to_user_mode+0x19/0x50 [ 68.760756] do_syscall_64+0x48/0x90 [ 68.760756] entry_SYSCALL_64_after_hwframe+0x44/0xae This patch just add the null out of dev->rfkill to make sure such dereference cannot happen. This is safe since the device_lock() already protect the check/write from data race. Fixes: 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09nl80211: show SSID for P2P_GO interfacesJohannes Berg
[ Upstream commit a75971bc2b8453630e9f85e0beaa4da8db8277a3 ] There's no real reason not to send the SSID to userspace when it requests information about P2P_GO, it is, in that respect, exactly the same as AP interfaces. Fix that. Fixes: 44905265bc15 ("nl80211: don't expose wdev->ssid for most interfaces") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20220318134656.14354ae223f0.Ia25e85a512281b92e1645d4160766a4b1a471597@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09mptcp: reset the packet scheduler on PRIO changePaolo Abeni
[ Upstream commit 0e203c324752e13d22624ab7ffafe934fa06ab50 ] Similar to the previous patch, for priority changes requested by the local PM. Reported-and-suggested-by: Davide Caratti <dcaratti@redhat.com> Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09afs: Adjust ACK interpretation to try and cope with NATDavid Howells
[ Upstream commit adc9613ff66c26ebaff9814973181ac178beb90b ] If a client's address changes, say if it is NAT'd, this can disrupt an in progress operation. For most operations, this is not much of a problem, but StoreData can be different as some servers modify the target file as the data comes in, so if a store request is disrupted, the file can get corrupted on the server. The problem is that the server doesn't recognise packets that come after the change of address as belonging to the original client and will bounce them, either by sending an OUT_OF_SEQUENCE ACK to the apparent new call if the packet number falls within the initial sequence number window of a call or by sending an EXCEEDS_WINDOW ACK if it falls outside and then aborting it. In both cases, firstPacket will be 1 and previousPacket will be 0 in the ACK information. Fix this by the following means: (1) If a client call receives an EXCEEDS_WINDOW ACK with firstPacket as 1 and previousPacket as 0, assume this indicates that the server saw the incoming packets from a different peer and thus as a different call. Fail the call with error -ENETRESET. (2) Also fail the call if a similar OUT_OF_SEQUENCE ACK occurs if the first packet has been hard-ACK'd. If it hasn't been hard-ACK'd, the ACK packet will cause it to get retransmitted, so the call will just be repeated. (3) Make afs_select_fileserver() treat -ENETRESET as a straight fail of the operation. (4) Prioritise the error code over things like -ECONNRESET as the server did actually respond. (5) Make writeback treat -ENETRESET as a retryable error and make it redirty all the pages involved in a write so that the VM will retry. Note that there is still a circumstance that I can't easily deal with: if the operation is fully received and processed by the server, but the reply is lost due to address change. There's no way to know if the op happened. We can examine the server, but a conflicting change could have been made by a third party - and we can't tell the difference. In such a case, a message like: kAFS: vnode modified {100058:146266} b7->b8 YFS.StoreData64 (op=2646a) will be logged to dmesg on the next op to touch the file and the client will reset the inode state, including invalidating clean parts of the pagecache. Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004811.html # v1 Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09rxrpc, afs: Fix selection of abort codesDavid Howells
[ Upstream commit de696c4784f0706884458893c5a6c39b3a3ff65c ] The RX_USER_ABORT code should really only be used to indicate that the user of the rxrpc service (ie. userspace) implicitly caused a call to be aborted - for instance if the AF_RXRPC socket is closed whilst the call was in progress. (The user may also explicitly abort a call and specify the abort code to use). Change some of the points of generation to use other abort codes instead: (1) Abort the call with RXGEN_SS_UNMARSHAL or RXGEN_CC_UNMARSHAL if we see ENOMEM and EFAULT during received data delivery and abort with RX_CALL_DEAD in the default case. (2) Abort with RXGEN_SS_MARSHAL if we get ENOMEM whilst trying to send a reply. (3) Abort with RX_CALL_DEAD if we stop hearing from the peer if we had heard from the peer and abort with RX_CALL_TIMEOUT if we hadn't. (4) Abort with RX_CALL_DEAD if we try to disconnect a call that's not completed successfully or been aborted. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09rxrpc: Return an error to sendmsg if call failedDavid Howells
[ Upstream commit 4ba68c5192554876bd8c3afd904e3064d2915341 ] If at the end of rxrpc sendmsg() or rxrpc_kernel_send_data() the call that was being given data was aborted remotely or otherwise failed, return an error rather than returning the amount of data buffered for transmission. The call (presumably) did not complete, so there's not much point continuing with it. AF_RXRPC considers it "complete" and so will be unwilling to do anything else with it - and won't send a notification for it, deeming the return from sendmsg sufficient. Not returning an error causes afs to incorrectly handle a StoreData operation that gets interrupted by a change of address due to NAT reconfiguration. This doesn't normally affect most operations since their request parameters tend to fit into a single UDP packet and afs_make_call() returns before the server responds; StoreData is different as it involves transmission of a lot of data. This can be triggered on a client by doing something like: dd if=/dev/zero of=/afs/example.com/foo bs=1M count=512 at one prompt, and then changing the network address at another prompt, e.g.: ifconfig enp6s0 inet 192.168.6.2 && route add 192.168.6.1 dev enp6s0 Tracing packets on an Auristor fileserver looks something like: 192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538) 192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538) 192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 <ARP exchange for 192.168.6.2> 192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0) 192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0) 192.168.6.1 -> 192.168.6.2 RX 107 ACK Exceeds Window Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 29321 Call: 4 Source Port: 7000 Destination Port: 7001 The Auristor fileserver logs code -453 (RXGEN_SS_UNMARSHAL), but the abort code received by kafs is -5 (RX_PROTOCOL_ERROR) as the rx layer sees the condition and generates an abort first and the unmarshal error is a consequence of that at the application layer. Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004810.html # v1 Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09net: remove two BUG() from skb_checksum_help()Eric Dumazet
[ Upstream commit d7ea0d9df2a6265b2b180d17ebc64b38105968fc ] I have a syzbot report that managed to get a crash in skb_checksum_help() If syzbot can trigger these BUG(), it makes sense to replace them with more friendly WARN_ON_ONCE() since skb_checksum_help() can instead return an error code. Note that syzbot will still crash there, until real bug is fixed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>