summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-12-20mac80211: Notify cfg80211 about association comebackIlan Peer
Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211129152938.d76eac9e51ee.I986cffab95d51adfee6d84964711644392005113@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20cfg80211: Add support for notifying association comebackIlan Peer
Thought the underline driver MLME can handle association temporal rejection with comeback, it is still useful to notify this to user space, as user space might want to handle the temporal rejection differently. For example, in case the comeback time is too long, user space can deauthenticate immediately and try to associate with a different AP. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211129152938.2467809e8cb3.I45574185b582666bc78eef0c29a4c36b478e5382@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20mac80211: introduce channel switch disconnect functionNathan Errera
Introduce a disconnect function that can be used when a channel switch error occurs. The channel switch can request to block the tx, and so, we need to make sure we do not send a deauth frame in this case. Signed-off-by: Nathan Errera <nathan.errera@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211129152938.cd2a615a0702.I9edb14785586344af17644b610ab5be109dcef00@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20cfg80211: simplify cfg80211_chandef_valid()Johannes Berg
There are a lot of duplicate checks in this function to check the delta between the control channel and CF1. With the addition of 320 MHz, this will become even more. Simplify the code so that the common checks are done only once for multiple bandwidths. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211129152938.2d0240b07f11.I759e8e990f5386ba2b56ffb2488a8d4e16e22c1b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20mac80211: Remove a couple of obsolete TODOIlan Peer
The HE capability IE is an extension IE so remove an irrelevant comments. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211129152938.550b95b5fca7.Ia31395e880172aefcc0a8c70ed060f84b94bdb83@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20mac80211: fix FEC flag in radio tap headerP Praneesh
In mac80211, while building radiotap header IEEE80211_RADIOTAP_MCS_HAVE_FEC flag is missing when LDPC enabled from driver, hence LDPC is not updated properly in radiotap header. Fix that by adding HAVE_FEC flag while building radiotap header. Signed-off-by: P Praneesh <quic_ppranees@quicinc.com> Link: https://lore.kernel.org/r/1638294648-844-2-git-send-email-quic_ppranees@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20mac80211: use coarse boottime for airtime fairness codeFelix Fietkau
The time values used by the airtime fairness code only need to be accurate enough to cover station activity detection. Using ktime_get_coarse_boottime_ns instead of ktime_get_boottime_ns will drop the accuracy down to jiffies intervals, but at the same time saves a lot of CPU cycles in a hot path Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20211217114258.14619-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20Merge 5.16-rc6 into tty-nextGreg Kroah-Hartman
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-19flow_offload: validate flags of filter and actionsBaowen Zheng
Add process to validate flags of filter and actions when adding a tc filter. We need to prevent adding filter with flags conflicts with its actions. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add reoffload process to update hw_countBaowen Zheng
Add reoffload process to update hw_count when driver is inserted or removed. We will delete the action if it is with skip_sw flag and not offloaded to any hardware in reoffload process. When reoffloading actions, we still offload the actions that are added independent of filters. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19net: sched: save full flags for tc actionBaowen Zheng
Save full action flags and return user flags when return flags to user space. Save full action flags to distinguish if the action is created independent from classifier. We made this change mainly for further patch to reoffload tc actions. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add process to update action stats from hardwareBaowen Zheng
When collecting stats for actions update them using both hardware and software counters. Stats update process should not run in context of preempt_disable. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: rename exts stats update functions with hwBaowen Zheng
Rename exts stats update functions with hw for readability. We make this change also to update stats from hw for an action when it is offloaded to hw as a single action. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add skip_hw and skip_sw to control if offload the actionBaowen Zheng
We add skip_hw and skip_sw for user to control if offload the action to hardware. We also add in_hw_count for user to indicate if the action is offloaded to any hardware. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: allow user to offload tc action to net deviceBaowen Zheng
Use flow_indr_dev_register/flow_indr_dev_setup_offload to offload tc action. We need to call tc_cleanup_flow_action to clean up tc action entry since in tc_setup_action, some actions may hold dev refcnt, especially the mirror action. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add ops to tc_action_ops for flow action setupBaowen Zheng
Add a new ops to tc_action_ops for flow action setup. Refactor function tc_setup_flow_action to use this new ops. We make this change to facilitate to add standalone action module. We will also use this ops to offload action independent of filter in following patch. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: rename offload functions with offload instead of flowBaowen Zheng
To improves readability, we rename offload functions with offload instead of flow. The term flow is related to exact matches, so we rename these functions with offload. We make this change to facilitate single action offload functions naming. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add index to flow_action_entry structureBaowen Zheng
Add index to flow_action_entry structure and delete index from police and gate child structure. We make this change to offload tc action for driver to identify a tc action. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: fill flags to action structureBaowen Zheng
Fill flags to action structure to allow user control if the action should be offloaded to hardware or not. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-18bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.Hao Luo
Some helper functions may modify its arguments, for example, bpf_d_path, bpf_get_stack etc. Previously, their argument types were marked as ARG_PTR_TO_MEM, which is compatible with read-only mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate, but technically incorrect, to modify a read-only memory by passing it into one of such helper functions. This patch tags the bpf_args compatible with immutable memory with MEM_RDONLY flag. The arguments that don't have this flag will be only compatible with mutable memory types, preventing the helper from modifying a read-only memory. The bpf_args that have MEM_RDONLY are compatible with both mutable memory and immutable memory. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com
2021-12-18bpf: Introduce MEM_RDONLY flagHao Luo
This patch introduce a flag MEM_RDONLY to tag a reg value pointing to read-only memory. It makes the following changes: 1. PTR_TO_RDWR_BUF -> PTR_TO_BUF 2. PTR_TO_RDONLY_BUF -> PTR_TO_BUF | MEM_RDONLY Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-6-haoluo@google.com
2021-12-18bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULLHao Luo
We have introduced a new type to make bpf_reg composable, by allocating bits in the type to represent flags. One of the flags is PTR_MAYBE_NULL which indicates a pointer may be NULL. This patch switches the qualified reg_types to use this flag. The reg_types changed in this patch include: 1. PTR_TO_MAP_VALUE_OR_NULL 2. PTR_TO_SOCKET_OR_NULL 3. PTR_TO_SOCK_COMMON_OR_NULL 4. PTR_TO_TCP_SOCK_OR_NULL 5. PTR_TO_BTF_ID_OR_NULL 6. PTR_TO_MEM_OR_NULL 7. PTR_TO_RDONLY_BUF_OR_NULL 8. PTR_TO_RDWR_BUF_OR_NULL Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211217003152.48334-5-haoluo@google.com
2021-12-18xdp: move the if dev statements to the firstYajun Deng
The xdp_rxq_info_unreg() called by xdp_rxq_info_reg() is meaningless when dev is NULL, so move the if dev statements to the first. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-18ax25: NPD bug when detaching AX25 deviceLin Ma
The existing cleanup routine implementation is not well synchronized with the syscall routine. When a device is detaching, below race could occur. static int ax25_sendmsg(...) { ... lock_sock() ax25 = sk_to_ax25(sk); if (ax25->ax25_dev == NULL) // CHECK ... ax25_queue_xmit(skb, ax25->ax25_dev->dev); // USE ... } static void ax25_kill_by_device(...) { ... if (s->ax25_dev == ax25_dev) { s->ax25_dev = NULL; ... } Other syscall functions like ax25_getsockopt, ax25_getname, ax25_info_show also suffer from similar races. To fix them, this patch introduce lock_sock() into ax25_kill_by_device in order to guarantee that the nullify action in cleanup routine cannot proceed when another socket request is pending. Signed-off-by: Hanjie Wu <nagi@zju.edu.cn> Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-189p/trans_virtio: Fix typo in the comment for p9_virtio_create()zhuxinran
couldlook ==> could look Link: https://lkml.kernel.org/r/20211216061439.4186-1-zhuran@mail.ustc.edu.cn Signed-off-by: zhuxinran <zhuran@mail.ustc.edu.cn> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2021-12-17mptcp: clean up harmless false expressionsJean Sacren
entry->addr.id is u8 with a range from 0 to 255 and MAX_ADDR_ID is 255. We should drop both false expressions of (entry->addr.id > MAX_ADDR_ID). We should also remove the obsolete parentheses in the first if branch. Use U8_MAX for MAX_ADDR_ID and add a comment to show the link to mptcp_addr_info.id as suggested by Mr. Matthieu Baerts. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jean Sacren <sakiwit@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17mptcp: enforce HoL-blocking estimationPaolo Abeni
The MPTCP packet scheduler has sub-optimal behavior with asymmetric subflows: if the faster subflow-level cwin is closed, the packet scheduler can enqueue "too much" data on a slower subflow. When all the data on the faster subflow is acked, if the mptcp-level cwin is closed, and link utilization becomes suboptimal. The solution is implementing blest-like[1] HoL-blocking estimation, transmitting only on the subflow with the shorter estimated time to flush the queued memory. If such subflows cwin is closed, we wait even if other subflows are available. This is quite simpler than the original blest implementation, as we leverage the pacing rate provided by the TCP socket. To get a more accurate estimation for the subflow linger-time, we maintain a per-subflow weighted average of such info. Additionally drop magic numbers usage in favor of newly defined macros and use more meaningful names for status variable. [1] http://dl.ifip.org/db/conf/networking/networking2016/1570234725.pdf Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/137 Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17Revert "tipc: use consistent GFP flags"Hoang Le
This reverts commit 86c3a3e964d910a62eeb277d60b2a60ebefa9feb. The tipc_aead_init() function can be calling from an interrupt routine. This allocation might sleep with GFP_KERNEL flag, hence the following BUG is reported. [ 17.657509] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:230 [ 17.660916] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/3 [ 17.664093] preempt_count: 302, expected: 0 [ 17.665619] RCU nest depth: 2, expected: 0 [ 17.667163] Preemption disabled at: [ 17.667165] [<0000000000000000>] 0x0 [ 17.669753] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G W 5.16.0-rc4+ #1 [ 17.673006] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 17.675540] Call Trace: [ 17.676285] <IRQ> [ 17.676913] dump_stack_lvl+0x34/0x44 [ 17.678033] __might_resched.cold+0xd6/0x10f [ 17.679311] kmem_cache_alloc_trace+0x14d/0x220 [ 17.680663] tipc_crypto_start+0x4a/0x2b0 [tipc] [ 17.682146] ? kmem_cache_alloc_trace+0xd3/0x220 [ 17.683545] tipc_node_create+0x2f0/0x790 [tipc] [ 17.684956] tipc_node_check_dest+0x72/0x680 [tipc] [ 17.686706] ? ___cache_free+0x31/0x350 [ 17.688008] ? skb_release_data+0x128/0x140 [ 17.689431] tipc_disc_rcv+0x479/0x510 [tipc] [ 17.690904] tipc_rcv+0x71c/0x730 [tipc] [ 17.692219] ? __netif_receive_skb_core+0xb7/0xf60 [ 17.693856] tipc_l2_rcv_msg+0x5e/0x90 [tipc] [ 17.695333] __netif_receive_skb_list_core+0x20b/0x260 [ 17.697072] netif_receive_skb_list_internal+0x1bf/0x2e0 [ 17.698870] ? dev_gro_receive+0x4c2/0x680 [ 17.700255] napi_complete_done+0x6f/0x180 [ 17.701657] virtnet_poll+0x29c/0x42e [virtio_net] [ 17.703262] __napi_poll+0x2c/0x170 [ 17.704429] net_rx_action+0x22f/0x280 [ 17.705706] __do_softirq+0xfd/0x30a [ 17.706921] common_interrupt+0xa4/0xc0 [ 17.708206] </IRQ> [ 17.708922] <TASK> [ 17.709651] asm_common_interrupt+0x1e/0x40 [ 17.711078] RIP: 0010:default_idle+0x18/0x20 Fixes: 86c3a3e964d9 ("tipc: use consistent GFP flags") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Link: https://lore.kernel.org/r/20211217030059.5947-1-hoang.h.le@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17net: openvswitch: Fix matching zone id for invalid conns arriving from tcPaul Blakey
Zone id is not restored if we passed ct and ct rejected the connection, as there is no ct info on the skb. Save the zone from tc skb cb to tc skb extension and pass it on to ovs, use that info to restore the zone id for invalid connections. Fixes: d29334c15d33 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17net/sched: flow_dissector: Fix matching on zone id for invalid connsPaul Blakey
If ct rejects a flow, it removes the conntrack info from the skb. act_ct sets the post_ct variable so the dissector will see this case as an +tracked +invalid state, but the zone id is lost with the conntrack info. To restore the zone id on such cases, set the last executed zone, via the tc control block, when passing ct, and read it back in the dissector if there is no ct info on the skb (invalid connection). Fixes: 7baf2429a1a9 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17net/sched: Extend qdisc control block with tc control blockPaul Blakey
BPF layer extends the qdisc control block via struct bpf_skb_data_end and because of that there is no more room to add variables to the qdisc layer control block without going over the skb->cb size. Extend the qdisc control block with a tc control block, and move all tc related variables to there as a pre-step for extending the tc control block with additional members. Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17Revert "xsk: Do not sleep in poll() when need_wakeup set"Magnus Karlsson
This reverts commit bd0687c18e635b63233dc87f38058cd728802ab4. This patch causes a Tx only workload to go to sleep even when it does not have to, leading to misserable performance in skb mode. It fixed one rare problem but created a much worse one, so this need to be reverted while I try to craft a proper solution to the original problem. Fixes: bd0687c18e63 ("xsk: Do not sleep in poll() when need_wakeup set") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211217145646.26449-1-magnus.karlsson@gmail.com
2021-12-17bus: mhi: core: Add an API for auto queueing buffers for DL channelManivannan Sadhasivam
Add a new API "mhi_prepare_for_transfer_autoqueue" for using with client drivers like QRTR to request MHI core to autoqueue buffers for the DL channel along with starting both UL and DL channels. So far, the "auto_queue" flag specified by the controller drivers in channel definition served this purpose but this will be removed at some point in future. Cc: netdev@vger.kernel.org Cc: Jakub Kicinski <kuba@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Co-developed-by: Loic Poulain <loic.poulain@linaro.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20211216081227.237749-9-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix UAF in set catch-all element, from Eric Dumazet. 2) Fix MAC mangling for multicast/loopback traffic in nfnetlink_queue and nfnetlink_log, from Ignacy Gawędzki. 3) Remove expired entries from ctnetlink dump path regardless the tuple direction, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-17xfrm: state and policy should fail if XFRMA_IF_ID 0Antony Antony
xfrm ineterface does not allow xfrm if_id = 0 fail to create or update xfrm state and policy. With this commit: ip xfrm policy add src 192.0.2.1 dst 192.0.2.2 dir out if_id 0 RTNETLINK answers: Invalid argument ip xfrm state add src 192.0.2.1 dst 192.0.2.2 proto esp spi 1 \ reqid 1 mode tunnel aead 'rfc4106(gcm(aes))' \ 0x1111111111111111111111111111111111111111 96 if_id 0 RTNETLINK answers: Invalid argument v1->v2 change: - add Fixes: tag Fixes: 9f8550e4bd9d ("xfrm: fix disable_xfrm sysctl when used on xfrm interfaces") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-12-17xfrm: interface with if_id 0 should return errorAntony Antony
xfrm interface if_id = 0 would cause xfrm policy lookup errors since Commit 9f8550e4bd9d. Now explicitly fail to create an xfrm interface when if_id = 0 With this commit: ip link add ipsec0 type xfrm dev lo if_id 0 Error: if_id must be non zero. v1->v2 change: - add Fixes: tag Fixes: 9f8550e4bd9d ("xfrm: fix disable_xfrm sysctl when used on xfrm interfaces") Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16Merge tag 'net-5.16-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from mac80211, wifi, bpf. Relatively large batches of fixes from BPF and the WiFi stack, calm in general networking. Current release - regressions: - dpaa2-eth: fix buffer overrun when reporting ethtool statistics Current release - new code bugs: - bpf: fix incorrect state pruning for <8B spill/fill - iavf: - add missing unlocks in iavf_watchdog_task() - do not override the adapter state in the watchdog task (again) - mlxsw: spectrum_router: consolidate MAC profiles when possible Previous releases - regressions: - mac80211 fixes: - rate control, avoid driver crash for retransmitted frames - regression in SSN handling of addba tx - a memory leak where sta_info is not freed - marking TX-during-stop for TX in in_reconfig, prevent stall - cfg80211: acquire wiphy mutex on regulatory work - wifi drivers: fix build regressions and LED config dependency - virtio_net: fix rx_drops stat for small pkts - dsa: mv88e6xxx: unforce speed & duplex in mac_link_down() Previous releases - always broken: - bpf fixes: - kernel address leakage in atomic fetch - kernel address leakage in atomic cmpxchg's r0 aux reg - signed bounds propagation after mov32 - extable fixup offset - extable address check - mac80211: - fix the size used for building probe request - send ADDBA requests using the tid/queue of the aggregation session - agg-tx: don't schedule_and_wake_txq() under sta->lock, avoid deadlocks - validate extended element ID is present - mptcp: - never allow the PM to close a listener subflow (null-defer) - clear 'kern' flag from fallback sockets, prevent crash - fix deadlock in __mptcp_push_pending() - inet_diag: fix kernel-infoleak for UDP sockets - xsk: do not sleep in poll() when need_wakeup set - smc: avoid very long waits in smc_release() - sch_ets: don't remove idle classes from the round-robin list - netdevsim: - zero-initialize memory for bpf map's value, prevent info leak - don't let user space overwrite read only (max) ethtool parms - ixgbe: set X550 MDIO speed before talking to PHY - stmmac: - fix null-deref in flower deletion w/ VLAN prio Rx steering - dwmac-rk: fix oob read in rk_gmac_setup - ice: time stamping fixes - systemport: add global locking for descriptor life cycle" * tag 'net-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (89 commits) bpf, selftests: Fix racing issue in btf_skc_cls_ingress test selftest/bpf: Add a test that reads various addresses. bpf: Fix extable address check. bpf: Fix extable fixup offset. bpf, selftests: Add test case trying to taint map value pointer bpf: Make 32->64 bounds propagation slightly more robust bpf: Fix signed bounds propagation after mov32 sit: do not call ipip6_dev_free() from sit_init_net() net: systemport: Add global locking for descriptor lifecycle net/smc: Prevent smc_release() from long blocking net: Fix double 0x prefix print in SKB dump virtio_net: fix rx_drops stat for small pkts dsa: mv88e6xxx: fix debug print for SPEED_UNFORCED sfc_ef100: potential dereference of null pointer net: stmmac: dwmac-rk: fix oob read in rk_gmac_setup net: usb: lan78xx: add Allied Telesis AT29M2-AF net/packet: rx_owner_map depends on pg_vec netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc dpaa2-eth: fix ethtool statistics ixgbe: set X550 MDIO speed before talking to PHY ...
2021-12-16add missing bpf-cgroup.h includesJakub Kicinski
We're about to break the cgroup-defs.h -> bpf-cgroup.h dependency, make sure those who actually need more than the definition of struct cgroup_bpf include bpf-cgroup.h explicitly. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/bpf/20211216025538.1649516-3-kuba@kernel.org
2021-12-16Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2021-12-16 We've added 15 non-merge commits during the last 7 day(s) which contain a total of 12 files changed, 434 insertions(+), 30 deletions(-). The main changes are: 1) Fix incorrect verifier state pruning behavior for <8B register spill/fill, from Paul Chaignon. 2) Fix x86-64 JIT's extable handling for fentry/fexit when return pointer is an ERR_PTR(), from Alexei Starovoitov. 3) Fix 3 different possibilities that BPF verifier missed where unprivileged could leak kernel addresses, from Daniel Borkmann. 4) Fix xsk's poll behavior under need_wakeup flag, from Magnus Karlsson. 5) Fix an oob-write in test_verifier due to a missed MAX_NR_MAPS bump, from Kumar Kartikeya Dwivedi. 6) Fix a race in test_btf_skc_cls_ingress selftest, from Martin KaFai Lau. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, selftests: Fix racing issue in btf_skc_cls_ingress test selftest/bpf: Add a test that reads various addresses. bpf: Fix extable address check. bpf: Fix extable fixup offset. bpf, selftests: Add test case trying to taint map value pointer bpf: Make 32->64 bounds propagation slightly more robust bpf: Fix signed bounds propagation after mov32 bpf, selftests: Update test case for atomic cmpxchg on r0 with pointer bpf: Fix kernel address leakage in atomic cmpxchg's r0 aux reg bpf, selftests: Add test case for atomic fetch on spilled pointer bpf: Fix kernel address leakage in atomic fetch selftests/bpf: Fix OOB write in test_verifier xsk: Do not sleep in poll() when need_wakeup set selftests/bpf: Tests for state pruning with u32 spill/fill bpf: Fix incorrect state pruning for <8B spill/fill ==================== Link: https://lore.kernel.org/r/20211216210005.13815-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16sit: do not call ipip6_dev_free() from sit_init_net()Eric Dumazet
ipip6_dev_free is sit dev->priv_destructor, already called by register_netdevice() if something goes wrong. Alternative would be to make ipip6_dev_free() robust against multiple invocations, but other drivers do not implement this strategy. syzbot reported: dst_release underflow WARNING: CPU: 0 PID: 5059 at net/core/dst.c:173 dst_release+0xd8/0xe0 net/core/dst.c:173 Modules linked in: CPU: 1 PID: 5059 Comm: syz-executor.4 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:dst_release+0xd8/0xe0 net/core/dst.c:173 Code: 4c 89 f2 89 d9 31 c0 5b 41 5e 5d e9 da d5 44 f9 e8 1d 90 5f f9 c6 05 87 48 c6 05 01 48 c7 c7 80 44 99 8b 31 c0 e8 e8 67 29 f9 <0f> 0b eb 85 0f 1f 40 00 53 48 89 fb e8 f7 8f 5f f9 48 83 c3 a8 48 RSP: 0018:ffffc9000aa5faa0 EFLAGS: 00010246 RAX: d6894a925dd15a00 RBX: 00000000ffffffff RCX: 0000000000040000 RDX: ffffc90005e19000 RSI: 000000000003ffff RDI: 0000000000040000 RBP: 0000000000000000 R08: ffffffff816a1f42 R09: ffffed1017344f2c R10: ffffed1017344f2c R11: 0000000000000000 R12: 0000607f462b1358 R13: 1ffffffff1bfd305 R14: ffffe8ffffcb1358 R15: dffffc0000000000 FS: 00007f66c71a2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f88aaed5058 CR3: 0000000023e0f000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> dst_cache_destroy+0x107/0x1e0 net/core/dst_cache.c:160 ipip6_dev_free net/ipv6/sit.c:1414 [inline] sit_init_net+0x229/0x550 net/ipv6/sit.c:1936 ops_init+0x313/0x430 net/core/net_namespace.c:140 setup_net+0x35b/0x9d0 net/core/net_namespace.c:326 copy_net_ns+0x359/0x5c0 net/core/net_namespace.c:470 create_new_namespaces+0x4ce/0xa00 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0x11e/0x180 kernel/nsproxy.c:226 ksys_unshare+0x57d/0xb50 kernel/fork.c:3075 __do_sys_unshare kernel/fork.c:3146 [inline] __se_sys_unshare kernel/fork.c:3144 [inline] __x64_sys_unshare+0x34/0x40 kernel/fork.c:3144 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f66c882ce99 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f66c71a2168 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00007f66c893ff60 RCX: 00007f66c882ce99 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000048040200 RBP: 00007f66c8886ff1 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fff6634832f R14: 00007f66c71a2300 R15: 0000000000022000 </TASK> Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20211216111741.1387540-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16net/smc: Prevent smc_release() from long blockingD. Wythe
In nginx/wrk benchmark, there's a hung problem with high probability on case likes that: (client will last several minutes to exit) server: smc_run nginx client: smc_run wrk -c 10000 -t 1 http://server Client hangs with the following backtrace: 0 [ffffa7ce8Of3bbf8] __schedule at ffffffff9f9eOd5f 1 [ffffa7ce8Of3bc88] schedule at ffffffff9f9eløe6 2 [ffffa7ce8Of3bcaO] schedule_timeout at ffffffff9f9e3f3c 3 [ffffa7ce8Of3bd2O] wait_for_common at ffffffff9f9el9de 4 [ffffa7ce8Of3bd8O] __flush_work at ffffffff9fOfeOl3 5 [ffffa7ce8øf3bdfO] smc_release at ffffffffcO697d24 [smc] 6 [ffffa7ce8Of3be2O] __sock_release at ffffffff9f8O2e2d 7 [ffffa7ce8Of3be4ø] sock_close at ffffffff9f8ø2ebl 8 [ffffa7ce8øf3be48] __fput at ffffffff9f334f93 9 [ffffa7ce8Of3be78] task_work_run at ffffffff9flOlff5 10 [ffffa7ce8Of3beaO] do_exit at ffffffff9fOe5Ol2 11 [ffffa7ce8Of3bflO] do_group_exit at ffffffff9fOe592a 12 [ffffa7ce8Of3bf38] __x64_sys_exit_group at ffffffff9fOe5994 13 [ffffa7ce8Of3bf4O] do_syscall_64 at ffffffff9f9d4373 14 [ffffa7ce8Of3bfsO] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c This issue dues to flush_work(), which is used to wait for smc_connect_work() to finish in smc_release(). Once lots of smc_connect_work() was pending or all executing work dangling, smc_release() has to block until one worker comes to free, which is equivalent to wait another smc_connnect_work() to finish. In order to fix this, There are two changes: 1. For those idle smc_connect_work(), cancel it from the workqueue; for executing smc_connect_work(), waiting for it to finish. For that purpose, replace flush_work() with cancel_work_sync(). 2. Since smc_connect() hold a reference for passive closing, if smc_connect_work() has been cancelled, release the reference. Fixes: 24ac3a08e658 ("net/smc: rebuild nonblocking connect") Reported-by: Tony Lu <tonylu@linux.alibaba.com> Tested-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16fib: expand fib_rule_policyFlorian Westphal
Now that there is only one fib nla_policy there is no need to keep the macro around. Place it where its used. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16fib: rules: remove duplicated nla policiesFlorian Westphal
The attributes are identical in all implementations so move the ipv4 one into the core and remove the per-family nla policies. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16netfilter: ctnetlink: remove expired entries firstFlorian Westphal
When dumping conntrack table to userspace via ctnetlink, check if the ct has already expired before doing any of the 'skip' checks. This expires dead entries faster. /proc handler also removes outdated entries first. Reported-by: Vitaly Zuevsky <vzuevsky@ns1.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-16netfilter: nf_nat_masquerade: add netns refcount tracker to masq_dev_workEric Dumazet
If compiled with CONFIG_NET_NS_REFCNT_TRACKER=y, using put_net_track() in iterate_cleanup_work() and netns_tracker_alloc() in nf_nat_masq_schedule() might help us finding netns refcount imbalances. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-16netfilter: nfnetlink: add netns refcount tracker to struct nfulnl_instanceEric Dumazet
If compiled with CONFIG_NET_NS_REFCNT_TRACKER=y, using put_net_track() in nfulnl_instance_free_rcu() and get_net_track() in instance_create() might help us finding netns refcount imbalances. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-16net: Fix double 0x prefix print in SKB dumpGal Pressman
When printing netdev features %pNF already takes care of the 0x prefix, remove the explicit one. Fixes: 6413139dfc64 ("skbuff: increase verbosity when dumping skb data") Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15net/packet: rx_owner_map depends on pg_vecWillem de Bruijn
Packet sockets may switch ring versions. Avoid misinterpreting state between versions, whose fields share a union. rx_owner_map is only allocated with a packet ring (pg_vec) and both are swapped together. If pg_vec is NULL, meaning no packet ring was allocated, then neither was rx_owner_map. And the field may be old state from a tpacket_v3. Fixes: 61fad6816fc1 ("net/packet: tpacket_rcv: avoid a producer race condition") Reported-by: Syzbot <syzbot+1ac0994a0a0c55151121@syzkaller.appspotmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20211215143937.106178-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, mostly rather small housekeeping patches: 1) Remove unused variable in IPVS, from GuoYong Zheng. 2) Use memset_after in conntrack, from Kees Cook. 3) Remove leftover function in nfnetlink_queue, from Florian Westphal. 4) Remove redundant test on bool in conntrack, from Bernard Zhao. 5) egress support for nft_fwd, from Lukas Wunner. 6) Make pppoe work for br_netfilter, from Florian Westphal. 7) Remove unused variable in conntrack resize routine, from luo penghao. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: netfilter: conntrack: Remove useless assignment statements netfilter: bridge: add support for pppoe filtering netfilter: nft_fwd_netdev: Support egress hook netfilter: ctnetlink: remove useless type conversion to bool netfilter: nf_queue: remove leftover synchronize_rcu netfilter: conntrack: Use memset_startat() to zero struct nf_conn ipvs: remove unused variable for ip_vs_new_dest ==================== Link: https://lore.kernel.org/r/20211215234911.170741-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>