summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2022-02-11drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hitEric Dumazet
trace_napi_poll_hit() is reading stat->dev while another thread can write on it from dropmon_net_event() Use READ_ONCE()/WRITE_ONCE() here, RCU rules are properly enforced already, we only have to take care of load/store tearing. BUG: KCSAN: data-race in dropmon_net_event / trace_napi_poll_hit write to 0xffff88816f3ab9c0 of 8 bytes by task 20260 on cpu 1: dropmon_net_event+0xb8/0x2b0 net/core/drop_monitor.c:1579 notifier_call_chain kernel/notifier.c:84 [inline] raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:392 call_netdevice_notifiers_info net/core/dev.c:1919 [inline] call_netdevice_notifiers_extack net/core/dev.c:1931 [inline] call_netdevice_notifiers net/core/dev.c:1945 [inline] unregister_netdevice_many+0x867/0xfb0 net/core/dev.c:10415 ip_tunnel_delete_nets+0x24a/0x280 net/ipv4/ip_tunnel.c:1123 vti_exit_batch_net+0x2a/0x30 net/ipv4/ip_vti.c:515 ops_exit_list net/core/net_namespace.c:173 [inline] cleanup_net+0x4dc/0x8d0 net/core/net_namespace.c:597 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307 worker_thread+0x616/0xa70 kernel/workqueue.c:2454 kthread+0x1bf/0x1e0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 read to 0xffff88816f3ab9c0 of 8 bytes by interrupt on cpu 0: trace_napi_poll_hit+0x89/0x1c0 net/core/drop_monitor.c:292 trace_napi_poll include/trace/events/napi.h:14 [inline] __napi_poll+0x36b/0x3f0 net/core/dev.c:6366 napi_poll net/core/dev.c:6432 [inline] net_rx_action+0x29e/0x650 net/core/dev.c:6519 __do_softirq+0x158/0x2de kernel/softirq.c:558 do_softirq+0xb1/0xf0 kernel/softirq.c:459 __local_bh_enable_ip+0x68/0x70 kernel/softirq.c:383 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline] _raw_spin_unlock_bh+0x33/0x40 kernel/locking/spinlock.c:210 spin_unlock_bh include/linux/spinlock.h:394 [inline] ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline] wg_packet_decrypt_worker+0x73c/0x780 drivers/net/wireguard/receive.c:506 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307 worker_thread+0x616/0xa70 kernel/workqueue.c:2454 kthread+0x1bf/0x1e0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 value changed: 0xffff88815883e000 -> 0x0000000000000000 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 26435 Comm: kworker/0:1 Not tainted 5.17.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker Fixes: 4ea7e38696c7 ("dropmon: add ability to detect when hardware dropsrxpackets") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11ipv6: Reject routes configurations that specify dsfield (tos)Guillaume Nault
The ->rtm_tos option is normally used to route packets based on both the destination address and the DS field. However it's ignored for IPv6 routes. Setting ->rtm_tos for IPv6 is thus invalid as the route is going to work only on the destination address anyway, so it won't behave as specified. Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11net: dsa: remove lockdep class for DSA slave address listVladimir Oltean
Since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA slave interfaces to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11net: dsa: remove lockdep class for DSA master address listVladimir Oltean
Since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA masters to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11net: dsa: remove ndo_get_phys_port_name and ndo_get_port_parent_idVladimir Oltean
There are no legacy ports, DSA registers a devlink instance with ports unconditionally for all switch drivers. Therefore, delete the old-style ndo operations used for determining bridge forwarding domains. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11net/smc: Add global configure for handshake limitation by netlinkD. Wythe
Although we can control SMC handshake limitation through socket options, which means that applications who need it must modify their code. It's quite troublesome for many existing applications. This patch modifies the global default value of SMC handshake limitation through netlink, providing a way to put constraint on handshake without modifies any code for applications. Suggested-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11net/smc: Dynamic control handshake limitation by socket optionsD. Wythe
This patch aims to add dynamic control for SMC handshake limitation for every smc sockets, in production environment, it is possible for the same applications to handle different service types, and may have different opinion on SMC handshake limitation. This patch try socket options to complete it, since we don't have socket option level for SMC yet, which requires us to implement it at the same time. This patch does the following: - add new socket option level: SOL_SMC. - add new SMC socket option: SMC_LIMIT_HS. - provide getter/setter for SMC socket options. Link: https://lore.kernel.org/all/20f504f961e1a803f85d64229ad84260434203bd.1644323503.git.alibuda@linux.alibaba.com/ Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11net/smc: Limit SMC visits when handshake workqueue congestedD. Wythe
This patch intends to provide a mechanism to put constraint on SMC connections visit according to the pressure of SMC handshake process. At present, frequent visits will cause the incoming connections to be backlogged in SMC handshake queue, raise the connections established time. Which is quite unacceptable for those applications who base on short lived connections. There are two ways to implement this mechanism: 1. Put limitation after TCP established. 2. Put limitation before TCP established. In the first way, we need to wait and receive CLC messages that the client will potentially send, and then actively reply with a decline message, in a sense, which is also a sort of SMC handshake, affect the connections established time on its way. In the second way, the only problem is that we need to inject SMC logic into TCP when it is about to reply the incoming SYN, since we already do that, it's seems not a problem anymore. And advantage is obvious, few additional processes are required to complete the constraint. This patch use the second way. After this patch, connections who beyond constraint will not informed any SMC indication, and SMC will not be involved in any of its subsequent processes. Link: https://lore.kernel.org/all/1641301961-59331-1-git-send-email-alibuda@linux.alibaba.com/ Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11net/smc: Limit backlog connectionsD. Wythe
Current implementation does not handling backlog semantics, one potential risk is that server will be flooded by infinite amount connections, even if client was SMC-incapable. This patch works to put a limit on backlog connections, referring to the TCP implementation, we divides SMC connections into two categories: 1. Half SMC connection, which includes all TCP established while SMC not connections. 2. Full SMC connection, which includes all SMC established connections. For half SMC connection, since all half SMC connections starts with TCP established, we can achieve our goal by put a limit before TCP established. Refer to the implementation of TCP, this limits will based on not only the half SMC connections but also the full connections, which is also a constraint on full SMC connections. For full SMC connections, although we know exactly where it starts, it's quite hard to put a limit before it. The easiest way is to block wait before receive SMC confirm CLC message, while it's under protection by smc_server_lgr_pending, a global lock, which leads this limit to the entire host instead of a single listen socket. Another way is to drop the full connections, but considering the cast of SMC connections, we prefer to keep full SMC connections. Even so, the limits of full SMC connections still exists, see commits about half SMC connection below. After this patch, the limits of backend connection shows like: For SMC: 1. Client with SMC-capability can makes 2 * backlog full SMC connections or 1 * backlog half SMC connections and 1 * backlog full SMC connections at most. 2. Client without SMC-capability can only makes 1 * backlog half TCP connections and 1 * backlog full TCP connections. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11net/smc: Make smc_tcp_listen_work() independentD. Wythe
In multithread and 10K connections benchmark, the backend TCP connection established very slowly, and lots of TCP connections stay in SYN_SENT state. Client: smc_run wrk -c 10000 -t 4 http://server the netstate of server host shows like: 145042 times the listen queue of a socket overflowed 145042 SYNs to LISTEN sockets dropped One reason of this issue is that, since the smc_tcp_listen_work() shared the same workqueue (smc_hs_wq) with smc_listen_work(), while the smc_listen_work() do blocking wait for smc connection established. Once the workqueue became congested, it's will block the accept() from TCP listen. This patch creates a independent workqueue(smc_tcp_ls_wq) for smc_tcp_listen_work(), separate it from smc_listen_work(), which is quite acceptable considering that smc_tcp_listen_work() runs very fast. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11net/smc: Avoid overwriting the copies of clcsock callback functionsWen Gu
The callback functions of clcsock will be saved and replaced during the fallback. But if the fallback happens more than once, then the copies of these callback functions will be overwritten incorrectly, resulting in a loop call issue: clcsk->sk_error_report |- smc_fback_error_report() <------------------------------| |- smc_fback_forward_wakeup() | (loop) |- clcsock_callback() (incorrectly overwritten) | |- smc->clcsk_error_report() ------------------| So this patch fixes the issue by saving these function pointers only once in the fallback and avoiding overwriting. Reported-by: syzbot+4de3c0e8a263e1e499bc@syzkaller.appspotmail.com Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback") Link: https://lore.kernel.org/r/0000000000006d045e05d78776f6@google.com Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-10Merge tag 'net-5.17-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter and can. Current release - new code bugs: - sparx5: fix get_stat64 out-of-bound access and crash - smc: fix netdev ref tracker misuse Previous releases - regressions: - eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid overflows - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP over IP - bonding: fix rare link activation misses in 802.3ad mode Previous releases - always broken: - tcp: fix tcp sock mem accounting in zero-copy corner cases - remove the cached dst when uncloning an skb dst and its metadata, since we only have one ref it'd lead to an UaF - netfilter: - conntrack: don't refresh sctp entries in closed state - conntrack: re-init state for retransmitted syn-ack, avoid connection establishment getting stuck with strange stacks - ctnetlink: disable helper autoassign, avoid it getting lost - nft_payload: don't allow transport header access for fragments - dsa: fix use of devres for mdio throughout drivers - eth: amd-xgbe: disable interrupts during pci removal - eth: dpaa2-eth: unregister netdev before disconnecting the PHY - eth: ice: fix IPIP and SIT TSO offload" * tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister net: mscc: ocelot: fix mutex lock error during ethtool stats read ice: Avoid RTNL lock when re-creating auxiliary device ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler ice: fix IPIP and SIT TSO offload ice: fix an error code in ice_cfg_phy_fec() net: mpls: Fix GCC 12 warning dpaa2-eth: unregister the netdev before disconnecting from the PHY skbuff: cleanup double word in comment net: macb: Align the dma and coherent dma masks mptcp: netlink: process IPv6 addrs in creating listening sockets selftests: mptcp: add missing join check net: usb: qmi_wwan: Add support for Dell DW5829e vlan: move dev_put into vlan_dev_uninit vlan: introduce vlan_dev_free_egress_priority ax25: fix UAF bugs of net_device caused by rebinding operation net: dsa: fix panic when DSA master device unbinds on shutdown net: amd-xgbe: disable interrupts during pci removal tipc: rate limit warning for received illegal binding update net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE ...
2022-02-10net/switchdev: use struct_size over open coded arithmeticMinghao Chi (CGEL ZTE)
Replace zero-length array with flexible-array member and make use of the struct_size() helper in kmalloc(). For example: struct switchdev_deferred_item { ... unsigned long data[]; }; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10netfilter: nft_synproxy: unregister hooks on init error pathPablo Neira Ayuso
Disable the IPv4 hooks if the IPv6 hooks fail to be registered. Fixes: ad49d86e07a4 ("netfilter: nf_tables: Add synproxy support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-10ipv4: Reject again rules with high DSCP valuesGuillaume Nault
Commit 563f8e97e054 ("ipv4: Stop taking ECN bits into account in fib4-rules") replaced the validation test on frh->tos. While the new test is stricter for ECN bits, it doesn't detect the use of high order DSCP bits. This would be fine if IPv4 could properly handle them. But currently, most IPv4 lookups are done with the three high DSCP bits masked. Therefore, using these bits doesn't lead to the expected result. Let's reject such configurations again, so that nobody starts to use and make any assumption about how the stack handles the three high order DSCP bits in fib4 rules. Fixes: 563f8e97e054 ("ipv4: Stop taking ECN bits into account in fib4-rules") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10net: make net->dev_unreg_count atomicEric Dumazet
Having to acquire rtnl from netdev_run_todo() for every dismantled device is not desirable when/if rtnl is under stress. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10net: mpls: Fix GCC 12 warningVictor Erminpour
When building with automatic stack variable initialization, GCC 12 complains about variables defined outside of switch case statements. Move the variable outside the switch, which silences the warning: ./net/mpls/af_mpls.c:1624:21: error: statement will never be executed [-Werror=switch-unreachable] 1624 | int err; | ^~~ Signed-off-by: Victor Erminpour <victor.erminpour@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10skbuff: cleanup double word in commentTom Rix
Remove the second 'to'. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10net: ping6: support setting socket options via cmsgJakub Kicinski
Minor reordering of the code and a call to sock_cmsg_send() gives us support for setting the common socket options via cmsg (the usual ones - SO_MARK, SO_TIMESTAMPING_OLD, SCM_TXTIME). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10net: ping6: support packet timestampingJakub Kicinski
Nothing prevents the user from requesting timestamping on ping6 sockets, yet timestamps are not going to be reported. Plumb the flags through. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10net: ping6: remove a pr_debug() statementJakub Kicinski
We have ftrace and BPF today, there's no need for printing arguments at the start of a function. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10Merge tag 'ieee802154-for-davem-2022-02-10' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2022-02-10 An update from ieee802154 for your *net-next* tree. There is more ongoing in ieee802154 than usual. This will be the first pull request for this cycle, but I expect one more. Depending on review and rework times. Pavel Skripkin ported the atusb driver over to the new USB api to avoid unint problems as well as making use of the modern api without kmalloc() needs in he driver. Miquel Raynal landed some changes to ensure proper frame checksum checking with hwsim, documenting our use of wake and stop_queue and eliding a magic value by using the proper define. David Girault documented the address struct used in ieee802154. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10tipc: improve size validations for received domain recordsJon Maloy
The function tipc_mon_rcv() allows a node to receive and process domain_record structs from peer nodes to track their views of the network topology. This patch verifies that the number of members in a received domain record does not exceed the limit defined by MAX_MON_DOMAIN, something that may otherwise lead to a stack overflow. tipc_mon_rcv() is called from the function tipc_link_proto_rcv(), where we are reading a 32 bit message data length field into a uint16. To avert any risk of bit overflow, we add an extra sanity check for this in that function. We cannot see that happen with the current code, but future designers being unaware of this risk, may introduce it by allowing delivery of very large (> 64k) sk buffers from the bearer layer. This potential problem was identified by Eric Dumazet. This fixes CVE-2022-0435 Reported-by: Samuel Page <samuel.page@appgate.com> Reported-by: Eric Dumazet <edumazet@google.com> Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Samuel Page <samuel.page@appgate.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-09mptcp: netlink: process IPv6 addrs in creating listening socketsKishen Maloor
This change updates mptcp_pm_nl_create_listen_socket() to create listening sockets bound to IPv6 addresses (where IPv6 is supported). Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Acked-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Conntrack sets on CHECKSUM_UNNECESSARY for UDP packet with no checksum, from Kevin Mitchell. 2) skb->priority support for nfqueue, from Nicolas Dichtel. 3) Remove conntrack extension register API, from Florian Westphal. 4) Move nat destroy hook to nf_nat_hook instead, to remove nf_ct_ext_destroy(), also from Florian. 5) Wrap pptp conntrack NAT hooks into single structure, from Florian Westphal. 6) Support for tcp option set to noop for nf_tables, also from Florian. 7) Do not run x_tables comment match from packet path in nf_tables, from Florian Westphal. 8) Replace spinlock by cmpxchg() loop to update missed ct event, from Florian Westphal. 9) Wrap cttimeout hooks into single structure, from Florian. 10) Add fast nft_cmp expression for up to 16-bytes. 11) Use cb->ctx to store context in ctnetlink dump, instead of using cb->args[], from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: ctnetlink: use dump structure instead of raw args nfqueue: enable to set skb->priority netfilter: nft_cmp: optimize comparison for 16-bytes netfilter: cttimeout: use option structure netfilter: ecache: don't use nf_conn spinlock netfilter: nft_compat: suppress comment match netfilter: exthdr: add support for tcp option removal netfilter: conntrack: pptp: use single option structure netfilter: conntrack: remove extension register api netfilter: conntrack: handle ->destroy hook via nat_ops instead netfilter: conntrack: move extension sizes into core netfilter: conntrack: make all extensions 8-byte alignned netfilter: nfqueue: enable to get skb->priority netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY ==================== Link: https://lore.kernel.org/r/20220209133616.165104-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.Sebastian Andrzej Siewior
Commit 9652dc2eb9e40 ("tcp: relax listening_hash operations") removed the need to disable bottom half while acquiring listening_hash.lock. There are still two callers left which disable bottom half before the lock is acquired. On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts as a lock to ensure that resources, that are protected by disabling bottom halves, remain protected. This leads to a circular locking dependency if the lock acquired with disabled bottom halves is also acquired with enabled bottom halves followed by disabling bottom halves. This is the reverse locking order. It has been observed with inet_listen_hashbucket::lock: local_bh_disable() + spin_lock(&ilb->lock): inet_listen() inet_csk_listen_start() sk->sk_prot->hash() := inet_hash() local_bh_disable() __inet_hash() spin_lock(&ilb->lock); acquire(&ilb->lock); Reverse order: spin_lock(&ilb2->lock) + local_bh_disable(): tcp_seq_next() listening_get_next() spin_lock(&ilb2->lock); acquire(&ilb2->lock); tcp4_seq_show() get_tcp4_sock() sock_i_ino() read_lock_bh(&sk->sk_callback_lock); acquire(softirq_ctrl) // <---- whoops acquire(&sk->sk_callback_lock) Drop local_bh_disable() around __inet_hash() which acquires listening_hash->lock. Split inet_unhash() and acquire the listen_hashbucket lock without disabling bottom halves; the inet_ehash lock with disabled bottom halves. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net Link: https://lore.kernel.org/r/YgQOebeZ10eNx1W6@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-02-09 We've added 126 non-merge commits during the last 16 day(s) which contain a total of 201 files changed, 4049 insertions(+), 2215 deletions(-). The main changes are: 1) Add custom BPF allocator for JITs that pack multiple programs into a huge page to reduce iTLB pressure, from Song Liu. 2) Add __user tagging support in vmlinux BTF and utilize it from BPF verifier when generating loads, from Yonghong Song. 3) Add per-socket fast path check guarding from cgroup/BPF overhead when used by only some sockets, from Pavel Begunkov. 4) Continued libbpf deprecation work of APIs/features and removal of their usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko and various others. 5) Improve BPF instruction set documentation by adding byte swap instructions and cleaning up load/store section, from Christoph Hellwig. 6) Switch BPF preload infra to light skeleton and remove libbpf dependency from it, from Alexei Starovoitov. 7) Fix architecture-agnostic macros in libbpf for accessing syscall arguments from BPF progs for non-x86 architectures, from Ilya Leoshkevich. 8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be of 16-bit field with anonymous zero padding, from Jakub Sitnicki. 9) Add new bpf_copy_from_user_task() helper to read memory from a different task than current. Add ability to create sleepable BPF iterator progs, from Kenny Yu. 10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski. 11) Generate temporary netns names for BPF selftests to avoid naming collisions, from Hangbin Liu. 12) Implement bpf_core_types_are_compat() with limited recursion for in-kernel usage, from Matteo Croce. 13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5 to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor. 14) Misc minor fixes to libbpf and selftests from various folks. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits) selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide libbpf: Fix compilation warning due to mismatched printf format selftests/bpf: Test BPF_KPROBE_SYSCALL macro libbpf: Add BPF_KPROBE_SYSCALL macro libbpf: Fix accessing the first syscall argument on s390 libbpf: Fix accessing the first syscall argument on arm64 libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390 libbpf: Fix accessing syscall arguments on riscv libbpf: Fix riscv register names libbpf: Fix accessing syscall arguments on powerpc selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro libbpf: Add PT_REGS_SYSCALL_REGS macro selftests/bpf: Fix an endianness issue in bpf_syscall_macro test bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE bpf: Fix leftover header->pages in sparc and powerpc code. libbpf: Fix signedness bug in btf_dump_array_data() selftests/bpf: Do not export subtest as standalone test bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures ... ==================== Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09net: drop_monitor: support drop reasonMenglong Dong
In the commit c504e5c2f964 ("net: skb: introduce kfree_skb_reason()") drop reason is introduced to the tracepoint of kfree_skb. Therefore, drop_monitor is able to report the drop reason to users by netlink. The drop reasons are reported as string to users, which is exactly the same as what we do when reporting it to ftrace. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220209060838.55513-1-imagedong@tencent.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wideJakub Sitnicki
remote_port is another case of a BPF context field documented as a 32-bit value in network byte order for which the BPF context access converter generates a load of a zero-padded 16-bit integer in network byte order. First such case was dst_port in bpf_sock which got addressed in commit 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit wide"). Loading 4-bytes from the remote_port offset and converting the value with bpf_ntohl() leads to surprising results, as the expected value is shifted by 16 bits. Reduce the confusion by splitting the field in two - a 16-bit field holding a big-endian integer, and a 16-bit zero-padding anonymous field that follows it. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220209184333.654927-2-jakub@cloudflare.com
2022-02-09vlan: move dev_put into vlan_dev_uninitXin Long
Shuang Li reported an QinQ issue by simply doing: # ip link add dummy0 type dummy # ip link add link dummy0 name dummy0.1 type vlan id 1 # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2 # rmmod 8021q unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1 When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links(). When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However, due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and release dummy0.1, as it delays dev_put until calling dev->priv_destructor, vlan_dev_free(). This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before netdev_wait_allrefs(). Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09vlan: introduce vlan_dev_free_egress_priorityXin Long
This patch is to introduce vlan_dev_free_egress_priority() to free egress priority for vlan dev, and keep vlan_dev_uninit() static as .ndo_uninit. It makes the code more clear and safer when adding new code in vlan_dev_uninit() in the future. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09ax25: fix UAF bugs of net_device caused by rebinding operationDuoming Zhou
The ax25_kill_by_device() will set s->ax25_dev = NULL and call ax25_disconnect() to change states of ax25_cb and sock, if we call ax25_bind() before ax25_kill_by_device(). However, if we call ax25_bind() again between the window of ax25_kill_by_device() and ax25_dev_device_down(), the values and states changed by ax25_kill_by_device() will be reassigned. Finally, ax25_dev_device_down() will deallocate net_device. If we dereference net_device in syscall functions such as ax25_release(), ax25_sendmsg(), ax25_getsockopt(), ax25_getname() and ax25_info_show(), a UAF bug will occur. One of the possible race conditions is shown below: (USE) | (FREE) ax25_bind() | | ax25_kill_by_device() ax25_bind() | ax25_connect() | ... | ax25_dev_device_down() | ... | dev_put_track(dev, ...) //FREE ax25_release() | ... ax25_send_control() | alloc_skb() //USE | the corresponding fail log is shown below: =============================================================== BUG: KASAN: use-after-free in ax25_send_control+0x43/0x210 ... Call Trace: ... ax25_send_control+0x43/0x210 ax25_release+0x2db/0x3b0 __sock_release+0x6d/0x120 sock_close+0xf/0x20 __fput+0x11f/0x420 ... Allocated by task 1283: ... __kasan_kmalloc+0x81/0xa0 alloc_netdev_mqs+0x5a/0x680 mkiss_open+0x6c/0x380 tty_ldisc_open+0x55/0x90 ... Freed by task 1969: ... kfree+0xa3/0x2c0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 tty_ldisc_kill+0x3e/0x80 ... In order to fix these UAF bugs caused by rebinding operation, this patch adds dev_hold_track() into ax25_bind() and corresponding dev_put_track() into ax25_kill_by_device(). Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09net: dsa: fix panic when DSA master device unbinds on shutdownVladimir Oltean
Rafael reports that on a system with LX2160A and Marvell DSA switches, if a reboot occurs while the DSA master (dpaa2-eth) is up, the following panic can be seen: systemd-shutdown[1]: Rebooting. Unable to handle kernel paging request at virtual address 00a0000800000041 [00a0000800000041] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] PREEMPT SMP CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00042-g8f5585009b24 #32 pc : dsa_slave_netdevice_event+0x130/0x3e4 lr : raw_notifier_call_chain+0x50/0x6c Call trace: dsa_slave_netdevice_event+0x130/0x3e4 raw_notifier_call_chain+0x50/0x6c call_netdevice_notifiers_info+0x54/0xa0 __dev_close_many+0x50/0x130 dev_close_many+0x84/0x120 unregister_netdevice_many+0x130/0x710 unregister_netdevice_queue+0x8c/0xd0 unregister_netdev+0x20/0x30 dpaa2_eth_remove+0x68/0x190 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver_internal+0xac/0xb0 device_links_unbind_consumers+0xd4/0x100 __device_release_driver+0x94/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_device_remove+0x24/0x40 __fsl_mc_device_remove+0xc/0x20 device_for_each_child+0x58/0xa0 dprc_remove+0x90/0xb0 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_bus_remove+0x80/0x100 fsl_mc_bus_shutdown+0xc/0x1c platform_shutdown+0x20/0x30 device_shutdown+0x154/0x330 __do_sys_reboot+0x1cc/0x250 __arm64_sys_reboot+0x20/0x30 invoke_syscall.constprop.0+0x4c/0xe0 do_el0_svc+0x4c/0x150 el0_svc+0x24/0xb0 el0t_64_sync_handler+0xa8/0xb0 el0t_64_sync+0x178/0x17c It can be seen from the stack trace that the problem is that the deregistration of the master causes a dev_close(), which gets notified as NETDEV_GOING_DOWN to dsa_slave_netdevice_event(). But dsa_switch_shutdown() has already run, and this has unregistered the DSA slave interfaces, and yet, the NETDEV_GOING_DOWN handler attempts to call dev_close_many() on those slave interfaces, leading to the problem. The previous attempt to avoid the NETDEV_GOING_DOWN on the master after dsa_switch_shutdown() was called seems improper. Unregistering the slave interfaces is unnecessary and unhelpful. Instead, after the slaves have stopped being uppers of the DSA master, we can now reset to NULL the master->dsa_ptr pointer, which will make DSA start ignoring all future notifier events on the master. Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") Reported-by: Rafael Richter <rafael.richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09tipc: rate limit warning for received illegal binding updateJon Maloy
It would be easy to craft a message containing an illegal binding table update operation. This is handled correctly by the code, but the corresponding warning printout is not rate limited as is should be. We fix this now. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09Merge tag 'linux-can-fixes-for-5.17-20220209' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2022-02-09 this is a pull request of 2 patches for net/master. Oliver Hartkopp contributes 2 fixes for the CAN ISOTP protocol. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag controlMatt Johnston
This change adds a couple of new ioctls for mctp sockets: SIOCMCTPALLOCTAG and SIOCMCTPDROPTAG. These ioctls provide facilities for explicit allocation / release of tags, overriding the automatic allocate-on-send/release-on-reply and timeout behaviours. This allows userspace more control over messages that may not fit a simple request/response model. In order to indicate a pre-allocated tag to the sendmsg() syscall, we introduce a new flag to the struct sockaddr_mctp.smctp_tag value: MCTP_TAG_PREALLOC. Additional changes from Jeremy Kerr <jk@codeconstruct.com.au>. Contains a fix that was: Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09mctp: Allow keys matching any local addressJeremy Kerr
Currently, we require an exact match on an incoming packet's dest address, and the key's local_addr field. In a future change, we may want to set up a key before packets are routed, meaning we have no local address to match on. This change allows key lookups to match on local_addr = MCTP_ADDR_ANY. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09mctp: Add helper for address match checkingJeremy Kerr
Currently, we have a couple of paths that check that an EID matches, or the match value is MCTP_ADDR_ANY. Rather than open coding this, add a little helper. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09mctp: tests: Add key state testsJeremy Kerr
This change adds a few more tests to check the key/tag lookups on route input. We add a specific entry to the keys lists, route a packet with specific header values, and check for key match/mismatch. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09mctp: tests: Rename FL_T macro to FL_TOJeremy Kerr
This is a definition for the tag-owner flag, which has TO as a standard abbreviation. We'll want to add a helper for the actual tag value in a future change. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09ip6_tunnel: fix possible NULL deref in ip6_tnl_xmitEric Dumazet
Make sure to test that skb has a dst attached to it. general protection fault, probably for non-canonical address 0xdffffc0000000011: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f] CPU: 0 PID: 32650 Comm: syz-executor.4 Not tainted 5.17.0-rc2-next-20220204-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:ip6_tnl_xmit+0x2140/0x35f0 net/ipv6/ip6_tunnel.c:1127 Code: 4d 85 f6 0f 85 c5 04 00 00 e8 9c b0 66 f9 48 83 e3 fe 48 b8 00 00 00 00 00 fc ff df 48 8d bb 88 00 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 07 7f 05 e8 11 25 b2 f9 44 0f b6 b3 88 00 00 RSP: 0018:ffffc900141b7310 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000c77a000 RDX: 0000000000000011 RSI: ffffffff8811f854 RDI: 0000000000000088 RBP: ffffc900141b7480 R08: 0000000000000000 R09: 0000000000000008 R10: ffffffff8811f846 R11: 0000000000000008 R12: ffffc900141b7548 R13: ffff8880297c6000 R14: 0000000000000000 R15: ffff8880351c8dc0 FS: 00007f9827ba2700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b31322000 CR3: 0000000033a70000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ipxip6_tnl_xmit net/ipv6/ip6_tunnel.c:1386 [inline] ip6_tnl_start_xmit+0x71e/0x1830 net/ipv6/ip6_tunnel.c:1435 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one net/core/dev.c:3473 [inline] dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489 __dev_queue_xmit+0x2a24/0x3760 net/core/dev.c:4116 packet_snd net/packet/af_packet.c:3057 [inline] packet_sendmsg+0x2265/0x5460 net/packet/af_packet.c:3084 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 sock_write_iter+0x289/0x3c0 net/socket.c:1061 call_write_iter include/linux/fs.h:2075 [inline] do_iter_readv_writev+0x47a/0x750 fs/read_write.c:726 do_iter_write+0x188/0x710 fs/read_write.c:852 vfs_writev+0x1aa/0x630 fs/read_write.c:925 do_writev+0x27f/0x300 fs/read_write.c:968 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f9828c2d059 Fixes: c1f55c5e0482 ("ip6_tunnel: allow routing IPv4 traffic in NBMA mode") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Qing Deng <i@moy.cat> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09ax25: fix NPD bug in ax25_disconnectDuoming Zhou
The ax25_disconnect() in ax25_kill_by_device() is not protected by any locks, thus there is a race condition between ax25_disconnect() and ax25_destroy_socket(). when ax25->sk is assigned as NULL by ax25_destroy_socket(), a NULL pointer dereference bug will occur if site (1) or (2) dereferences ax25->sk. ax25_kill_by_device() | ax25_release() ax25_disconnect() | ax25_destroy_socket() ... | if(ax25->sk != NULL) | ... ... | ax25->sk = NULL; bh_lock_sock(ax25->sk); //(1) | ... ... | bh_unlock_sock(ax25->sk); //(2)| This patch moves ax25_disconnect() into lock_sock(), which can synchronize with ax25_destroy_socket() in ax25_release(). Fail log: =============================================================== BUG: kernel NULL pointer dereference, address: 0000000000000088 ... RIP: 0010:_raw_spin_lock+0x7e/0xd0 ... Call Trace: ax25_disconnect+0xf6/0x220 ax25_device_event+0x187/0x250 raw_notifier_call_chain+0x5e/0x70 dev_close_many+0x17d/0x230 rollback_registered_many+0x1f1/0x950 unregister_netdevice_queue+0x133/0x200 unregister_netdev+0x13/0x20 ... Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09netfilter: ctnetlink: use dump structure instead of raw argsFlorian Westphal
netlink_dump structure has a union of 'long args[6]' and a context buffer as scratch space. Convert ctnetlink to use a structure, its easier to read than the raw 'args' usage which comes with no type checks and no readable names. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-09nfqueue: enable to set skb->priorityNicolas Dichtel
This is a follow up of the previous patch that enables to get skb->priority. It's now posssible to set it also. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Florian Westphal <fw@strlen.de> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-09netfilter: nft_cmp: optimize comparison for 16-bytesPablo Neira Ayuso
Allow up to 16-byte comparisons with a new cmp fast version. Use two 64-bit words and calculate the mask representing the bits to be compared. Make sure the comparison is 64-bit aligned and avoid out-of-bound memory access on registers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-09netfilter: cttimeout: use option structureFlorian Westphal
Instead of two exported functions, export a single option structure. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-09netfilter: ecache: don't use nf_conn spinlockFlorian Westphal
For updating eache missed value we can use cmpxchg. This also avoids need to disable BH. kernel robot reported build failure on v1 because not all arches support cmpxchg for u16, so extend this to u32. This doesn't increase struct size, existing padding is used. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-09netfilter: xt_socket: fix a typo in socket_mt_destroy()Eric Dumazet
Calling nf_defrag_ipv4_disable() instead of nf_defrag_ipv6_disable() was probably not the intent. I found this by code inspection, while chasing a possible issue in TPROXY. Fixes: de8c12110a13 ("netfilter: disable defrag once its no longer needed") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-09xfrm: enforce validity of offload input flagsLeon Romanovsky
struct xfrm_user_offload has flags variable that received user input, but kernel didn't check if valid bits were provided. It caused a situation where not sanitized input was forwarded directly to the drivers. For example, XFRM_OFFLOAD_IPV6 define that was exposed, was used by strongswan, but not implemented in the kernel at all. As a solution, check and sanitize input flags to forward XFRM_OFFLOAD_INBOUND to the drivers. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>