Age | Commit message (Collapse) | Author |
|
[ Upstream commit ad32205470919c8e04cdd33e0613bdba50c2376d ]
The correct name is GSWIP (Gigabit Switch IP). Typo was introduced in
875138f81d71a ("dsa: Move tagger name into its ops structure") while
moving tagger names to their structures.
Fixes: 875138f81d71a ("dsa: Move tagger name into its ops structure")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e176b1ba476cf36f723cfcc7a9e57f3cb47dec70 ]
When the packet pointed to by retransmit_skb_hint is unlinked by ACK,
retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue().
If packet loss is detected at this time, retransmit_skb_hint will be set
to point to the current packet loss in tcp_verify_retransmit_hint(),
then the packets that were previously marked lost but not retransmitted
due to the restriction of cwnd will be skipped and cannot be
retransmitted.
To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can
be reset only after all marked lost packets are retransmitted
(retrans_out >= lost_out), otherwise we need to traverse from
tcp_rtx_queue_head in tcp_xmit_retransmit_queue().
Packetdrill to demonstrate:
// Disable RACK and set max_reordering to keep things simple
0 `sysctl -q net.ipv4.tcp_recovery=0`
+0 `sysctl -q net.ipv4.tcp_max_reordering=3`
// Establish a connection
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.01 < . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
// Send 8 data segments
+0 write(4, ..., 8000) = 8000
+0 > P. 1:8001(8000) ack 1
// Enter recovery and 1:3001 is marked lost
+.01 < . 1:1(0) ack 1 win 257 <sack 3001:4001,nop,nop>
+0 < . 1:1(0) ack 1 win 257 <sack 5001:6001 3001:4001,nop,nop>
+0 < . 1:1(0) ack 1 win 257 <sack 5001:7001 3001:4001,nop,nop>
// Retransmit 1:1001, now retransmit_skb_hint points to 1001:2001
+0 > . 1:1001(1000) ack 1
// 1001:2001 was ACKed causing retransmit_skb_hint to be set to NULL
+.01 < . 1:1(0) ack 2001 win 257 <sack 5001:8001 3001:4001,nop,nop>
// Now retransmit_skb_hint points to 4001:5001 which is now marked lost
// BUG: 2001:3001 was not retransmitted
+0 > . 2001:3001(1000) ack 1
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 44c23d71599f81a1c7fe8389e0319822dd50c37c ]
It seems better to init ife->metalist earlier in tcf_ife_init()
to avoid the following crash :
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
__tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
__tcf_idr_release net/sched/act_api.c:165 [inline]
__tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
tcf_idr_release include/net/act_api.h:171 [inline]
tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:639 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:659
____sys_sendmsg+0x753/0x880 net/socket.c:2330
___sys_sendmsg+0x100/0x170 net/socket.c:2384
__sys_sendmsg+0x105/0x1d0 net/socket.c:2417
__do_sys_sendmsg net/socket.c:2426 [inline]
__se_sys_sendmsg net/socket.c:2424 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bd5874da57edd001b35cf28ae737779498c16a56 ]
DSA subsystem takes care of netdev statistics since commit 4ed70ce9f01c
("net: dsa: Refactor transmit path to eliminate duplication"), so
any accounting inside tagger callbacks is redundant and can lead to
messing up the stats.
This bug is present in Qualcomm tagger since day 0.
Fixes: cafdc45c949b ("net-next: dsa: add Qualcomm tag RX/TX handler")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 53d374979ef147ab51f5d632dfe20b14aebeccd0 ]
syzbot reported some bogus lockdep warnings, for example bad unlock
balance in sch_direct_xmit(). They are due to a race condition between
slow path and fast path, that is qdisc_xmit_lock_key gets re-registered
in netdev_update_lockdep_key() on slow path, while we could still
acquire the queue->_xmit_lock on fast path in this small window:
CPU A CPU B
__netif_tx_lock();
lockdep_unregister_key(qdisc_xmit_lock_key);
__netif_tx_unlock();
lockdep_register_key(qdisc_xmit_lock_key);
In fact, unlike the addr_list_lock which has to be reordered when
the master/slave device relationship changes, queue->_xmit_lock is
only acquired on fast path and only when NETIF_F_LLTX is not set,
so there is likely no nested locking for it.
Therefore, we can just get rid of re-registration of
qdisc_xmit_lock_key.
Reported-by: syzbot+4ec99438ed7450da6272@syzkaller.appspotmail.com
Fixes: ab92d68fc22f ("net: core: add generic lockdep keys")
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4cc4a1708903f404d2ca0dfde30e71e052c6cbc9 upstream.
The distributed arp table is using a DHT to store and retrieve MAC address
information for an IP address. This is done using unicast messages to
selected peers. The potential peers are looked up using the IP address and
the VID.
While the IP address is always stored in big endian byte order, this is not
the case of the VID. It can (depending on the host system) either be big
endian or little endian. The host must therefore always convert it to big
endian to ensure that all devices calculate the same peers for the same
lookup data.
Fixes: be1db4f6615b ("batman-adv: make the Distributed ARP Table vlan aware")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2e012c74823629d9db27963c79caa3f5b2010746 upstream.
It's possible to leak time wait and request sockets via the following
BPF pseudo code:
sk = bpf_skc_lookup_tcp(...)
if (sk)
bpf_sk_release(sk)
If sk->sk_state is TCP_NEW_SYN_RECV or TCP_TIME_WAIT the refcount taken
by bpf_skc_lookup_tcp is not undone by bpf_sk_release. This is because
sk_flags is re-used for other data in both kinds of sockets. The check
!sock_flag(sk, SOCK_RCU_FREE)
therefore returns a bogus result. Check that sk_flags is valid by calling
sk_fullsock. Skip checking SOCK_RCU_FREE if we already know that sk is
not a full socket.
Fixes: edbf8c01de5a ("bpf: add skc_lookup_tcp helper")
Fixes: f7355a6c0497 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200110132336.26099-1-lmb@cloudflare.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 335178d5429c4cee61b58f4ac80688f556630818 upstream.
syzbot reported following crash:
list_del corruption, ffff88808c9bb000->prev is LIST_POISON2 (dead000000000122)
[..]
Call Trace:
__list_del_entry include/linux/list.h:131 [inline]
list_del_rcu include/linux/rculist.h:148 [inline]
nf_tables_commit+0x1068/0x3b30 net/netfilter/nf_tables_api.c:7183
[..]
The commit transaction list has:
NFT_MSG_NEWTABLE
NFT_MSG_NEWFLOWTABLE
NFT_MSG_DELFLOWTABLE
NFT_MSG_DELTABLE
A missing generation check during DELTABLE processing causes it to queue
the DELFLOWTABLE operation a second time, so we corrupt the list here:
case NFT_MSG_DELFLOWTABLE:
list_del_rcu(&nft_trans_flowtable(trans)->list);
nf_tables_flowtable_notify(&trans->ctx,
because we have two different DELFLOWTABLE transactions for the same
flowtable. We then call list_del_rcu() twice for the same flowtable->list.
The object handling seems to suffer from the same bug so add a generation
check too and only queue delete transactions for flowtables/objects that
are still active in the next generation.
Reported-by: syzbot+37a6804945a3a13b1572@syzkaller.appspotmail.com
Fixes: 3b49e2e94e6eb ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ec7470b834fe7b5d7eff11b6677f5d7fdf5e9a91 upstream.
This patch fixes a WARN_ON in nft_set_destroy() due to missing
set reference count drop from the preparation phase. This is triggered
by the module autoload path. Do not exercise the abort path from
nft_request_module() while preparation phase cleaning up is still
pending.
WARNING: CPU: 3 PID: 3456 at net/netfilter/nf_tables_api.c:3740 nft_set_destroy+0x45/0x50 [nf_tables]
[...]
CPU: 3 PID: 3456 Comm: nft Not tainted 5.4.6-arch3-1 #1
RIP: 0010:nft_set_destroy+0x45/0x50 [nf_tables]
Code: e8 30 eb 83 c6 48 8b 85 80 00 00 00 48 8b b8 90 00 00 00 e8 dd 6b d7 c5 48 8b 7d 30 e8 24 dd eb c5 48 89 ef 5d e9 6b c6 e5 c5 <0f> 0b c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 7f 10 e9 52
RSP: 0018:ffffac4f43e53700 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff99d63a154d80 RCX: 0000000001f88e03
RDX: 0000000001f88c03 RSI: ffff99d6560ef0c0 RDI: ffff99d63a101200
RBP: ffff99d617721de0 R08: 0000000000000000 R09: 0000000000000318
R10: 00000000f0000000 R11: 0000000000000001 R12: ffffffff880fabf0
R13: dead000000000122 R14: dead000000000100 R15: ffff99d63a154d80
FS: 00007ff3dbd5b740(0000) GS:ffff99d6560c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00001cb5de6a9000 CR3: 000000016eb6a004 CR4: 00000000001606e0
Call Trace:
__nf_tables_abort+0x3e3/0x6d0 [nf_tables]
nft_request_module+0x6f/0x110 [nf_tables]
nft_expr_type_request_module+0x28/0x50 [nf_tables]
nf_tables_expr_parse+0x198/0x1f0 [nf_tables]
nft_expr_init+0x3b/0xf0 [nf_tables]
nft_dynset_init+0x1e2/0x410 [nf_tables]
nf_tables_newrule+0x30a/0x930 [nf_tables]
nfnetlink_rcv_batch+0x2a0/0x640 [nfnetlink]
nfnetlink_rcv+0x125/0x171 [nfnetlink]
netlink_unicast+0x179/0x210
netlink_sendmsg+0x208/0x3d0
sock_sendmsg+0x5e/0x60
____sys_sendmsg+0x21b/0x290
Update comment on the code to describe the new behaviour.
Reported-by: Marco Oliverio <marco.oliverio@tanaza.com>
Fixes: 452238e8d5ff ("netfilter: nf_tables: add and use helper for module autoload")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9332d27d7918182add34e8043f6a754530fdd022 upstream.
This WARN can trigger because some of the names fed to the module
autoload function can be of arbitrary length.
Remove the WARN and add limits for all NLA_STRING attributes.
Reported-by: syzbot+0e63ae76d117ae1c3a01@syzkaller.appspotmail.com
Fixes: 452238e8d5ffd8 ("netfilter: nf_tables: add and use helper for module autoload")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9ec22d7c6c69146180577f3ad5fdf504beeaee62 upstream.
Fixes: af308b94a2a4a5 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1c702bf902bd37349f6d91cd7f4b372b1e46d0ed upstream.
else we get null deref when one of the attributes is missing, both
must be non-null.
Reported-by: syzbot+76d0b80493ac881ff77b@syzkaller.appspotmail.com
Fixes: aaecfdb5c5dd8ba ("netfilter: nf_tables: match on tunnel metadata")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 61177e911dad660df86a4553eb01c95ece2f6a82 upstream.
Commit 8303b7e8f018 ("netfilter: nat: fix spurious connection timeouts")
made nf_nat_icmp_reply_translation() use icmp_manip_pkt() as the l4
manipulation function for the outer packet on ICMP errors.
However, icmp_manip_pkt() assumes the packet has an 'id' field which
is not correct for all types of ICMP messages.
This is not correct for ICMP error packets, and leads to bogus bytes
being written the ICMP header, which can be wrongfully regarded as
'length' bytes by RFC 4884 compliant receivers.
Fix by assigning the 'id' field only for ICMP messages that have this
semantic.
Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Fixes: 8303b7e8f018 ("netfilter: nat: fix spurious connection timeouts")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 212e7f56605ef9688d0846db60c6c6ec06544095 upstream.
An earlier commit (1b789577f655060d98d20e,
"netfilter: arp_tables: init netns pointer in xt_tgchk_param struct")
fixed missing net initialization for arptables, but turns out it was
incomplete. We can get a very similar struct net NULL deref during
error unwinding:
general protection fault: 0000 [#1] PREEMPT SMP KASAN
RIP: 0010:xt_rateest_put+0xa1/0x440 net/netfilter/xt_RATEEST.c:77
xt_rateest_tg_destroy+0x72/0xa0 net/netfilter/xt_RATEEST.c:175
cleanup_entry net/ipv4/netfilter/arp_tables.c:509 [inline]
translate_table+0x11f4/0x1d80 net/ipv4/netfilter/arp_tables.c:587
do_replace net/ipv4/netfilter/arp_tables.c:981 [inline]
do_arpt_set_ctl+0x317/0x650 net/ipv4/netfilter/arp_tables.c:1461
Also init the netns pointer in xt_tgdtor_param struct.
Fixes: add67461240c1d ("netfilter: add struct net * to target parameters")
Reported-by: syzbot+91bdd8eece0f6629ec8b@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c120959387efa51479056fd01dc90adfba7a590c upstream.
map->members is freed by ip_set_free() right before using it in
mtype_ext_cleanup() again. So we just have to move it down.
Reported-by: syzbot+4c3cc6dbe7259dbf9054@syzkaller.appspotmail.com
Fixes: 40cd63bf33b2 ("netfilter: ipset: Support extensions which need a per data destroy function")
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e7a5f1f1cd0008e5ad379270a8657e121eedb669 upstream.
Right now in tcp_bpf_recvmsg, sock read data first from sk_receive_queue
if not empty than psock->ingress_msg otherwise. If a FIN packet arrives
and there's also some data in psock->ingress_msg, the data in
psock->ingress_msg will be purged. It is always happen when request to a
HTTP1.0 server like python SimpleHTTPServer since the server send FIN
packet after data is sent out.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: Arika Chen <eaglesora@gmail.com>
Suggested-by: Arika Chen <eaglesora@gmail.com>
Signed-off-by: Lingpeng Chen <forrest0579@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200109014833.18951-1-forrest0579@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 81c044fc3bdc5b7be967cd3682528ea94b58c06a upstream.
The fragments attached to a skb can be part of a compound page. In that case,
page_ref_inc will increment the refcount for the wrong page. Fix this by
using get_page instead, which calls page_ref_inc on the compound head and
also checks for overflow.
Fixes: 2b67f944f88c ("cfg80211: reuse existing page fragments in A-MSDU rx")
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200113182107.20461-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit df16737d438f534d0cc9948c7c5158f1986c5c87 upstream.
The per-tid statistics need to be released after the call to rdev_get_station
Cc: stable@vger.kernel.org
Fixes: 8689c051a201 ("cfg80211: dynamically allocate per-tid stats for station info")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200108170630.33680-2-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2a279b34169e9bbf7c240691466420aba75b4175 upstream.
The per-tid statistics need to be released after the call to rdev_get_station
Cc: stable@vger.kernel.org
Fixes: 5ab92e7fe49a ("cfg80211: add support to probe unexercised mesh link")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20200108170630.33680-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5a128a088a2ab0b5190eeb232b5aa0b1017a0317 upstream.
Use methods which do not try to acquire the wdev lock themselves.
Cc: stable@vger.kernel.org
Fixes: 37b1c004685a3 ("cfg80211: Support all iftypes in autodisconnect_wk")
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200108115536.2262-1-markus.theil@tu-ilmenau.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7361d44896ff20d48bdd502d1a0cd66308055d45 upstream.
When user returns SK_DROP we need to reset the number of copied bytes
to indicate to the user the bytes were dropped and not sent. If we
don't reset the copied arg sendmsg will return as if those bytes were
copied giving the user a positive return value.
This works as expected today except in the case where the user also
pops bytes. In the pop case the sg.size is reduced but we don't correctly
account for this when copied bytes is reset. The popped bytes are not
accounted for and we return a small positive value potentially confusing
the user.
The reason this happens is due to a typo where we do the wrong comparison
when accounting for pop bytes. In this fix notice the if/else is not
needed and that we have a similar problem if we push data except its not
visible to the user because if delta is larger the sg.size we return a
negative value so it appears as an error regardless.
Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-9-john.fastabend@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9aaaa56845a06aeabdd597cbe19492dc01f281ec upstream.
Its possible through a set of push, pop, apply helper calls to construct
a skmsg, which is just a ring of scatterlist elements, with the start
value larger than the end value. For example,
end start
|_0_|_1_| ... |_n_|_n+1_|
Where end points at 1 and start points and n so that valid elements is
the set {n, n+1, 0, 1}.
Currently, because we don't build the correct chain only {n, n+1} will
be sent. This adds a check and sg_chain call to correctly submit the
above to the crypto and tls send path.
Fixes: d3b18ad31f93d ("tls: add bpf support to sk_msg handling")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-8-john.fastabend@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d468e4775c1c351616947ba0cccc43273963b9b5 upstream.
It is possible to build a plaintext buffer using push helper that is larger
than the allocated encrypt buffer. When this record is pushed to crypto
layers this can result in a NULL pointer dereference because the crypto
API expects the encrypt buffer is large enough to fit the plaintext
buffer. Kernel splat below.
To resolve catch the cases this can happen and split the buffer into two
records to send individually. Unfortunately, there is still one case to
handle where the split creates a zero sized buffer. In this case we merge
the buffers and unmark the split. This happens when apply is zero and user
pushed data beyond encrypt buffer. This fixes the original case as well
because the split allocated an encrypt buffer larger than the plaintext
buffer and the merge simply moves the pointers around so we now have
a reference to the new (larger) encrypt buffer.
Perhaps its not ideal but it seems the best solution for a fixes branch
and avoids handling these two cases, (a) apply that needs split and (b)
non apply case. The are edge cases anyways so optimizing them seems not
necessary unless someone wants later in next branches.
[ 306.719107] BUG: kernel NULL pointer dereference, address: 0000000000000008
[...]
[ 306.747260] RIP: 0010:scatterwalk_copychunks+0x12f/0x1b0
[...]
[ 306.770350] Call Trace:
[ 306.770956] scatterwalk_map_and_copy+0x6c/0x80
[ 306.772026] gcm_enc_copy_hash+0x4b/0x50
[ 306.772925] gcm_hash_crypt_remain_continue+0xef/0x110
[ 306.774138] gcm_hash_crypt_continue+0xa1/0xb0
[ 306.775103] ? gcm_hash_crypt_continue+0xa1/0xb0
[ 306.776103] gcm_hash_assoc_remain_continue+0x94/0xa0
[ 306.777170] gcm_hash_assoc_continue+0x9d/0xb0
[ 306.778239] gcm_hash_init_continue+0x8f/0xa0
[ 306.779121] gcm_hash+0x73/0x80
[ 306.779762] gcm_encrypt_continue+0x6d/0x80
[ 306.780582] crypto_gcm_encrypt+0xcb/0xe0
[ 306.781474] crypto_aead_encrypt+0x1f/0x30
[ 306.782353] tls_push_record+0x3b9/0xb20 [tls]
[ 306.783314] ? sk_psock_msg_verdict+0x199/0x300
[ 306.784287] bpf_exec_tx_verdict+0x3f2/0x680 [tls]
[ 306.785357] tls_sw_sendmsg+0x4a3/0x6a0 [tls]
test_sockmap test signature to trigger bug,
[TEST]: (1, 1, 1, sendmsg, pass,redir,start 1,end 2,pop (1,2),ktls,):
Fixes: d3b18ad31f93d ("tls: add bpf support to sk_msg handling")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-7-john.fastabend@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cf21e9ba1eb86c9333ca5b05b2f1cc94021bcaef upstream.
Leaving an incorrect end mark in place when passing to crypto
layer will cause crypto layer to stop processing data before
all data is encrypted. To fix clear the end mark on push
data instead of expecting users of the helper to clear the
mark value after the fact.
This happens when we push data into the middle of a skmsg and
have room for it so we don't do a set of copies that already
clear the end flag.
Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-6-john.fastabend@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6562e29cf6f0ddd368657d97a8d484ffc30df5ef upstream.
In the push, pull, and pop helpers operating on skmsg objects to make
data writable or insert/remove data we use this bounds check to ensure
specified data is valid,
/* Bounds checks: start and pop must be inside message */
if (start >= offset + l || last >= msg->sg.size)
return -EINVAL;
The problem here is offset has already included the length of the
current element the 'l' above. So start could be past the end of
the scatterlist element in the case where start also points into an
offset on the last skmsg element.
To fix do the accounting slightly different by adding the length of
the previous entry to offset at the start of the iteration. And
ensure its initialized to zero so that the first iteration does
nothing.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data")
Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-5-john.fastabend@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 33bfe20dd7117dd81fd896a53f743a233e1ad64f upstream.
When sockmap sock with TLS enabled is removed we cleanup bpf/psock state
and call tcp_update_ulp() to push updates to TLS ULP on top. However, we
don't push the write_space callback up and instead simply overwrite the
op with the psock stored previous op. This may or may not be correct so
to ensure we don't overwrite the TLS write space hook pass this field to
the ULP and have it fixup the ctx.
This completes a previous fix that pushed the ops through to the ULP
but at the time missed doing this for write_space, presumably because
write_space TLS hook was added around the same time.
Fixes: 95fa145479fbc ("bpf: sockmap/tls, close can race with map free")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7e81a35302066c5a00b4c72d83e3ea4cad6eeb5b upstream.
The sock_map_free() and sock_hash_free() paths used to delete sockmap
and sockhash maps walk the maps and destroy psock and bpf state associated
with the socks in the map. When done the socks no longer have BPF programs
attached and will function normally. This can happen while the socks in
the map are still "live" meaning data may be sent/received during the walk.
Currently, though we don't take the sock_lock when the psock and bpf state
is removed through this path. Specifically, this means we can be writing
into the ops structure pointers such as sendmsg, sendpage, recvmsg, etc.
while they are also being called from the networking side. This is not
safe, we never used proper READ_ONCE/WRITE_ONCE semantics here if we
believed it was safe. Further its not clear to me its even a good idea
to try and do this on "live" sockets while networking side might also
be using the socket. Instead of trying to reason about using the socks
from both sides lets realize that every use case I'm aware of rarely
deletes maps, in fact kubernetes/Cilium case builds map at init and
never tears it down except on errors. So lets do the simple fix and
grab sock lock.
This patch wraps sock deletes from maps in sock lock and adds some
annotations so we catch any other cases easier.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-3-john.fastabend@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit abc9b4e0549b93fdaff56e9532bc49a2d7b04955 upstream.
When a user message is sent, TIPC will check if the socket has faced a
congestion at link layer. If that happens, it will make a sleep to wait
for the congestion to disappear. This leaves a gap for other users to
take over the socket (e.g. multi threads) since the socket is released
as well. Also, in case of connectionless (e.g. SOCK_RDM), user is free
to send messages to various destinations (e.g. via 'sendto()'), then
the socket's preformatted header has to be updated correspondingly
prior to the actual payload message building.
Unfortunately, the latter action is done before the first action which
causes a condition issue that the destination of a certain message can
be modified incorrectly in the middle, leading to wrong destination
when that message is built. Consequently, when the message is sent to
the link layer, it gets stuck there forever because the peer node will
simply reject it. After a number of retransmission attempts, the link
is eventually taken down and the retransmission failure is reported.
This commit fixes the problem by rearranging the order of actions to
prevent the race condition from occurring, so the message building is
'atomic' and its header will not be modified by anyone.
Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dca4a17d24ee9d878836ce5eb8dc25be1ffa5729 upstream.
In commit c55c8edafa91 ("tipc: smooth change between replicast and
broadcast"), we allow instant switching between replicast and broadcast
by sending a dummy 'SYN' packet on the last used link to synchronize
packets on the links. The 'SYN' message is an object of link congestion
also, so if that happens, a 'SOCK_WAKEUP' will be scheduled to be sent
back to the socket...
However, in that commit, we simply use the same socket 'cong_link_cnt'
counter for both the 'SYN' & normal payload message sending. Therefore,
if both the replicast & broadcast links are congested, the counter will
be not updated correctly but overwritten by the latter congestion.
Later on, when the 'SOCK_WAKEUP' messages are processed, the counter is
reduced one by one and eventually overflowed. Consequently, further
activities on the socket will only wait for the false congestion signal
to disappear but never been met.
Because sending the 'SYN' message is vital for the mechanism, it should
be done anyway. This commit fixes the issue by marking the message with
an error code e.g. 'TIPC_ERR_NO_PORT', so its sending should not face a
link congestion, there is no need to touch the socket 'cong_link_cnt'
either. In addition, in the event of any error (e.g. -ENOBUFS), we will
purge the entire payload message queue and make a return immediately.
Fixes: c55c8edafa91 ("tipc: smooth change between replicast and broadcast")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 063c60d39180cec7c9317f5acfc3071f8fecd705 ]
Fix rxrpc_new_incoming_call() to check that we have a suitable service key
available for the combination of service ID and security class of a new
incoming call - and to reject calls for which we don't.
This causes an assertion like the following to appear:
rxrpc: Assertion failed - 6(0x6) == 12(0xc) is false
kernel BUG at net/rxrpc/call_object.c:456!
Where call->state is RXRPC_CALL_SERVER_SECURING (6) rather than
RXRPC_CALL_COMPLETE (12).
Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 13b7955a0252e15265386b229b814152f109b234 ]
Standard kernel mutexes cannot be used in any way from interrupt or softirq
context, so the user_mutex which manages access to a call cannot be a mutex
since on a new call the mutex must start off locked and be unlocked within
the softirq handler to prevent userspace interfering with a call we're
setting up.
Commit a0855d24fc22d49cdc25664fb224caee16998683 ("locking/mutex: Complain
upon mutex API misuse in IRQ contexts") causes big warnings to be splashed
in dmesg for each a new call that comes in from the server. Whilst it
*seems* like it should be okay, since the accept path uses trylock, there
are issues with PI boosting and marking the wrong task as the owner.
Fix this by not taking the mutex in the softirq path at all. It's not
obvious that there should be any need for it as the state is set before the
first notification is generated for the new call.
There's also no particular reason why the link-assessing ping should be
triggered inside the mutex. It's not actually transmitted there anyway,
but rather it has to be deferred to a workqueue.
Further, I don't think that there's any particular reason that the socket
notification needs to be done from within rx->incoming_lock, so the amount
of time that lock is held can be shortened too and the ping prepared before
the new call notification is sent.
Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Peter Zijlstra (Intel) <peterz@infradead.org>
cc: Ingo Molnar <mingo@redhat.com>
cc: Will Deacon <will@kernel.org>
cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f33121cbe91973a08e68e4bde8c3f7e6e4e351c1 ]
Move the unlock and the ping transmission for a new incoming call into
rxrpc_new_incoming_call() rather than doing it in the caller. This makes
it clearer to see what's going on.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
cc: Ingo Molnar <mingo@redhat.com>
cc: Will Deacon <will@kernel.org>
cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit b3c424eb6a1a3c485de64619418a471dee6ce849 upstream.
This field has never been checked since introduction in mainline kernel
Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr>
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Fixes: 2db6dc2662ba "sch_cake: Make gso-splitting configurable from userspace"
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9d7bf41fafa5b5ddd4c13eb39446b0045f0a8167 upstream.
Unlike the normal SIOCOUTQ, SIOCOUTQNSD was never handled in compat
mode. Add it to the common socket compat handler along with similar
ones.
Fixes: 2f4e1b397097 ("tcp: ioctl type SIOCOUTQNSD returns amount of data not sent")
Cc: Eric Dumazet <edumazet@google.com>
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5f6beb9e0f633f3cc845cdd67973c506372931b4 upstream.
The af_unix protocol family has a custom ioctl command (inexplicibly
based on SIOCPROTOPRIVATE), but never had a compat_ioctl handler for
32-bit applications.
Since all commands are compatible here, add a trivial wrapper that
performs the compat_ptr() conversion for SIOCOUTQ/SIOCINQ. SIOCUNIXFILE
does not use the argument, but it doesn't hurt to also use compat_ptr()
here.
Fixes: ba94f3088b79 ("unix: add ioctl to open a unix socket file with O_PATH")
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 671c450b6fe0680ea1cb1cf1526d764fdd5a3d3f upstream.
Since v5.4, a device removal occasionally triggered this oops:
Dec 2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
Dec 2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
Dec 2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
Dec 2 17:13:53 manet kernel: PGD 0 P4D 0
Dec 2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
Dec 2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G W 5.4.0-00050-g53717e43af61 #883
Dec 2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
Dec 2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
Dec 2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
Dec 2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 <48> 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
Dec 2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
Dec 2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
Dec 2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
Dec 2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
Dec 2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
Dec 2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
Dec 2 17:13:53 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
Dec 2 17:13:53 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
Dec 2 17:13:53 manet kernel: Call Trace:
Dec 2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
Dec 2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
Dec 2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
Dec 2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
Dec 2 17:13:53 manet kernel: kthread+0xf4/0xf9
Dec 2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
Dec 2 17:13:53 manet kernel: ret_from_fork+0x24/0x30
The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that
is still pointing to the old ib_device, which has been freed. The
only way that is possible is if this rpcrdma_rep was not destroyed
by rpcrdma_ia_remove.
Debugging showed that was indeed the case: this rpcrdma_rep was
still in use by a completing RPC at the time of the device removal,
and thus wasn't on the rep free list. So, it was not found by
rpcrdma_reps_destroy().
The fix is to introduce a list of all rpcrdma_reps so that they all
can be found when a device is removed. That list is used to perform
only regbuf DMA unmapping, replacing that call to
rpcrdma_reps_destroy().
Meanwhile, to prevent corruption of this list, I've moved the
destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
protecting the rb_all_reps list.
Fixes: b0b227f071a0 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 13cb886c591f341a8759f175292ddf978ef903a1 upstream.
I've found that on occasion, "rmmod <dev>" will hang while if an NFS
is under load.
Ensure that ri_remove_done is initialized only just before the
transport is woken up to force a close. This avoids the completion
possibly getting initialized again while the CM event handler is
waiting for a wake-up.
Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from under an NFS mount")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b32b9ed493f938e191f790a0991d20b18b38c35b upstream.
On device re-insertion, the RDMA device driver crashes trying to set
up a new QP:
Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0
Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G W 5.4.0 #852
Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 <f0> 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046
Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000
Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0
Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000
Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8
Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000
Nov 27 16:32:06 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000
Nov 27 16:32:06 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0
Nov 27 16:32:06 manet kernel: Call Trace:
Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]
The fix is to copy the qp_init_attr struct that was just created by
rpcrdma_ep_create() instead of using the one from the previous
connection instance.
Fixes: 98ef77d1aaa7 ("xprtrdma: Send Queue size grows after a reconnect")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8163999db445021f2651a8a47b5632483e8722ea upstream.
Report from Dan Carpenter,
net/core/skmsg.c:792 sk_psock_write_space()
error: we previously assumed 'psock' could be null (see line 790)
net/core/skmsg.c
789 psock = sk_psock(sk);
790 if (likely(psock && sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)))
Check for NULL
791 schedule_work(&psock->work);
792 write_space = psock->saved_write_space;
^^^^^^^^^^^^^^^^^^^^^^^^
793 rcu_read_unlock();
794 write_space(sk);
Ensure psock dereference on line 792 only occurs if psock is not null.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2ae50ad68cd79224198b525f7bd645c9da98b6ff upstream.
A recent clean up attempted to separate Receive handling and RPC
Reply processing, in the name of clean layering.
Unfortunately, we can't do this because the Receive Queue has to be
refilled _after_ the most recent credit update from the responder
is parsed from the transport header, but _before_ we wake up the
next RPC sender. That is right in the middle of
rpcrdma_reply_handler().
Usually this isn't a problem because current responder
implementations don't vary their credit grant. The one exception is
when a connection is established: the grant goes from one to a much
larger number on the first Receive. The requester MUST post enough
Receives right then so that any outstanding requests can be sent
without risking RNR and connection loss.
Fixes: 6ceea36890a0 ("xprtrdma: Refactor Receive accounting")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c3700780a096fc66467c81076ddf7f3f11d639b5 upstream.
Close some holes introduced by commit 6dc6ec9e04c4 ("xprtrdma: Cache
free MRs in each rpcrdma_req") that could result in list corruption.
In addition, the result that is tabulated in @count is no longer
used, so @count is removed.
Fixes: 6dc6ec9e04c4 ("xprtrdma: Cache free MRs in each rpcrdma_req")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a31b2f939219dd9bffdf01a45bd91f209f8cc369 upstream.
This is because xprt_request_get_cong() is allowing more than one
RPC Call to be transmitted before the first Receive on the new
connection. The first Receive fills the Receive Queue based on the
server's credit grant. Before that Receive, there is only a single
Receive WR posted because the client doesn't know the server's
credit grant.
Solution is to clear rq_cong on all outstanding rpc_rqsts when the
the cwnd is reset. This is because an RPC/RDMA credit is good for
one connection instance only.
Fixes: 75891f502f5f ("SUNRPC: Support for congestion control ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4b93dab36f28e673725e5e6123ebfccf7697f96a upstream.
When adding frwr_unmap_async way back when, I re-used the existing
trace_xprtrdma_post_send() trace point to record the return code
of ib_post_send.
Unfortunately there are some cases where re-using that trace point
causes a crash. Instead, construct a trace point specific to posting
Local Invalidate WRs that will always be safe to use in that context,
and will act as a trace log eye-catcher for Local Invalidation.
Fixes: 847568942f93 ("xprtrdma: Remove fr_state")
Fixes: d8099feda483 ("xprtrdma: Reduce context switching due ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Bill Baker <bill.baker@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6408c40c39d8eee5caaf97f5219b7dd4e041cc59 upstream.
On 32-bit architectures, get_seconds() returns an unsigned 32-bit
time value, which also matches the type used in the nft_meta
code. This will not overflow in year 2038 as a time_t would, but
it still suffers from the overflow problem later on in year 2106.
Change this instance to use the time64_t type consistently
and avoid the deprecated get_seconds().
The nft_meta_weekday() calculation potentially gets a little slower
on 32-bit architectures, but now it has the same behavior as on
64-bit architectures and does not overflow.
Fixes: 63d10e12b00d ("netfilter: nft_meta: support for time matching")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 23403cd8898dbc9808d3eb2f63bc1db8a340b751 upstream.
If hardware offload commit path fails, release all flow_rule objects.
Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 04b69426d846cd04ca9acefff1ea39e1c64d2714 upstream.
hsr slave interfaces don't have debugfs directory.
So, hsr_debugfs_rename() shouldn't be called when hsr slave interface name
is changed.
Test commands:
ip link add dummy0 type dummy
ip link add dummy1 type dummy
ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
ip link set dummy0 name ap
Splat looks like:
[21071.899367][T22666] ap: renamed from dummy0
[21071.914005][T22666] ==================================================================
[21071.919008][T22666] BUG: KASAN: slab-out-of-bounds in hsr_debugfs_rename+0xaa/0xb0 [hsr]
[21071.923640][T22666] Read of size 8 at addr ffff88805febcd98 by task ip/22666
[21071.926941][T22666]
[21071.927750][T22666] CPU: 0 PID: 22666 Comm: ip Not tainted 5.5.0-rc2+ #240
[21071.929919][T22666] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[21071.935094][T22666] Call Trace:
[21071.935867][T22666] dump_stack+0x96/0xdb
[21071.936687][T22666] ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
[21071.937774][T22666] print_address_description.constprop.5+0x1be/0x360
[21071.939019][T22666] ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
[21071.940081][T22666] ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
[21071.940949][T22666] __kasan_report+0x12a/0x16f
[21071.941758][T22666] ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
[21071.942674][T22666] kasan_report+0xe/0x20
[21071.943325][T22666] hsr_debugfs_rename+0xaa/0xb0 [hsr]
[21071.944187][T22666] hsr_netdev_notify+0x1fe/0x9b0 [hsr]
[21071.945052][T22666] ? __module_text_address+0x13/0x140
[21071.945897][T22666] notifier_call_chain+0x90/0x160
[21071.946743][T22666] dev_change_name+0x419/0x840
[21071.947496][T22666] ? __read_once_size_nocheck.constprop.6+0x10/0x10
[21071.948600][T22666] ? netdev_adjacent_rename_links+0x280/0x280
[21071.949577][T22666] ? __read_once_size_nocheck.constprop.6+0x10/0x10
[21071.950672][T22666] ? lock_downgrade+0x6e0/0x6e0
[21071.951345][T22666] ? do_setlink+0x811/0x2ef0
[21071.951991][T22666] do_setlink+0x811/0x2ef0
[21071.952613][T22666] ? is_bpf_text_address+0x81/0xe0
[ ... ]
Reported-by: syzbot+9328206518f08318a5fd@syzkaller.appspotmail.com
Fixes: 4c2d5e33dcd3 ("hsr: rename debugfs file when interface name is changed")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3ed0a1d563903bdb4b4c36c58c4d9c1bcb23a6e6 upstream.
The supervision frame is L2 frame.
When supervision frame is created, hsr module doesn't set network header.
If tap routine is enabled, dev_queue_xmit_nit() is called and it checks
network_header. If network_header pointer wasn't set(or invalid),
it resets network_header and warns.
In order to avoid unnecessary warning message, resetting network_header
is needed.
Test commands:
ip netns add nst
ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
ip link set veth1 netns nst
ip link set veth3 netns nst
ip link set veth0 up
ip link set veth2 up
ip link add hsr0 type hsr slave1 veth0 slave2 veth2
ip a a 192.168.100.1/24 dev hsr0
ip link set hsr0 up
ip netns exec nst ip link set veth1 up
ip netns exec nst ip link set veth3 up
ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
ip netns exec nst ip link set hsr1 up
tcpdump -nei veth0
Splat looks like:
[ 175.852292][ C3] protocol 88fb is buggy, dev veth0
Fixes: f421436a591d ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4c2d5e33dcd3a6333a7895be3b542ff3d373177c upstream.
hsr interface has own debugfs file, which name is same with interface name.
So, interface name is changed, debugfs file name should be changed too.
Fixes: fc4ecaeebd26 ("net: hsr: add debugfs support for display node list")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c6c4ccd7f96993e106dfea7ef18127f972f2db5e upstream.
In current hsr code, when hsr interface is created, it creates debugfs
directory /sys/kernel/debug/<interface name>.
If there is same directory or file name in there, it fails.
In order to reduce possibility of failure of creation of debugfs,
this patch adds root directory.
Test commands:
ip link add dummy0 type dummy
ip link add dummy1 type dummy
ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
Before this patch:
/sys/kernel/debug/hsr0/node_table
After this patch:
/sys/kernel/debug/hsr/hsr0/node_table
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8ca79606cdfde2e37ee4f0707b9d1874a6f0eb38 upstream.
The .deactivate and .activate interfaces already deal with the reference
counter. Otherwise, this results in spurious "Device is busy" errors.
Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|