summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2022-01-04net/sched: act_ct: Fill offloading tuple iifidxPaul Blakey
Driver offloading ct tuples can use the information of which devices received the packets that created the offloaded connections, to more efficiently offload them only to the relevant device. Add new act_ct nf conntrack extension, which is used to store the skb devices before offloading the connection, and then fill in the tuple iifindex so drivers can get the device via metadata dissector match. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-02sctp: hold endpoint before calling cb in sctp_transport_lookup_processXin Long
The same fix in commit 5ec7d18d1813 ("sctp: use call_rcu to free endpoint") is also needed for dumping one asoc and sock after the lookup. Fixes: 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the dump") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your *net-next* tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg|ret|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c commit 077cdda764c7 ("net/mlx5e: TC, Fix memory leak with rules with internal port") commit 31108d142f36 ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'") commit 4390c6edc0fb ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'") https://lore.kernel.org/all/20211229065352.30178-1-saeed@kernel.org/ net/smc/smc_wr.c commit 49dc9013e34b ("net/smc: Use the bitmap API when applicable") commit 349d43127dac ("net/smc: fix kernel panic caused by race of smc_sock") bitmap_zero()/memset() is removed by the fix Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-29bpf: Invert the dependency between bpf-netns.h and netns/bpf.hJakub Kicinski
netns/bpf.h gets included by netdevice.h (thru net_namespace.h) which in turn gets included in a lot of places. We should keep netns/bpf.h as light-weight as possible. bpf-netns.h seems to contain more implementation details than deserves to be included in a netns header. It needs to pull in uapi/bpf.h to get various enum types. Move enum netns_bpf_attach_type to netns/bpf.h and invert the dependency. This makes netns/bpf.h fit the mold of a struct definition header more clearly, and drops the number of objects rebuilt when uapi/bpf.h is touched from 7.7k to 1.1k. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211230012742.770642-3-kuba@kernel.org
2021-12-29net: Add includes masked by netdevice.h including uapi/bpf.hJakub Kicinski
Add missing includes unmasked by the subsequent change. Mostly network drivers missing an include for XDP_PACKET_HEADROOM. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211230012742.770642-2-kuba@kernel.org
2021-12-29Merge tag 'for-net-next-2021-12-29' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for Foxconn MT7922A - Add support for Realtek RTL8852AE - Rework HCI event handling to use skb_pull_data * tag 'for-net-next-2021-12-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (62 commits) Bluetooth: MGMT: Fix spelling mistake "simultanous" -> "simultaneous" Bluetooth: vhci: Set HCI_QUIRK_VALID_LE_STATES Bluetooth: MGMT: Fix LE simultaneous roles UUID if not supported Bluetooth: hci_sync: Add check simultaneous roles support Bluetooth: hci_sync: Wait for proper events when connecting LE Bluetooth: hci_sync: Add support for waiting specific LE subevents Bluetooth: hci_sync: Add hci_le_create_conn_sync Bluetooth: hci_event: Use skb_pull_data when processing inquiry results Bluetooth: hci_sync: Push sync command cancellation to workqueue Bluetooth: hci_qca: Stop IBS timer during BT OFF Bluetooth: btusb: Add support for Foxconn MT7922A Bluetooth: btintel: Add missing quirks and msft ext for legacy bootloader Bluetooth: btusb: Add two more Bluetooth parts for WCN6855 Bluetooth: L2CAP: Fix using wrong mode Bluetooth: hci_sync: Fix not always pausing advertising when necessary Bluetooth: mgmt: Make use of mgmt_send_event_skb in MGMT_EV_DEVICE_CONNECTED Bluetooth: mgmt: Make use of mgmt_send_event_skb in MGMT_EV_DEVICE_FOUND Bluetooth: mgmt: Introduce mgmt_alloc_skb and mgmt_send_event_skb Bluetooth: btusb: Return error code when getting patch status failed Bluetooth: btusb: Handle download_firmware failure cases ... ==================== Link: https://lore.kernel.org/r/20211229211258.2290966-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-29net: Don't include filter.h from net/sock.hJakub Kicinski
sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-12-25sctp: use call_rcu to free endpointXin Long
This patch is to delay the endpoint free by calling call_rcu() to fix another use-after-free issue in sctp_sock_dump(): BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20 Call Trace: __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218 lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:334 [inline] __lock_sock+0x203/0x350 net/core/sock.c:2253 lock_sock_nested+0xfe/0x120 net/core/sock.c:2774 lock_sock include/net/sock.h:1492 [inline] sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324 sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091 sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527 __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049 inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065 netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244 __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352 netlink_dump_start include/linux/netlink.h:216 [inline] inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170 __sock_diag_cmd net/core/sock_diag.c:232 [inline] sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274 This issue occurs when asoc is peeled off and the old sk is freed after getting it by asoc->base.sk and before calling lock_sock(sk). To prevent the sk free, as a holder of the sk, ep should be alive when calling lock_sock(). This patch uses call_rcu() and moves sock_put and ep free into sctp_endpoint_destroy_rcu(), so that it's safe to try to hold the ep under rcu_read_lock in sctp_transport_traverse_process(). If sctp_endpoint_hold() returns true, it means this ep is still alive and we have held it and can continue to dump it; If it returns false, it means this ep is dead and can be freed after rcu_read_unlock, and we should skip it. In sctp_sock_dump(), after locking the sk, if this ep is different from tsp->asoc->ep, it means during this dumping, this asoc was peeled off before calling lock_sock(), and the sk should be skipped; If this ep is the same with tsp->asoc->ep, it means no peeloff happens on this asoc, and due to lock_sock, no peeloff will happen either until release_sock. Note that delaying endpoint free won't delay the port release, as the port release happens in sctp_endpoint_destroy() before calling call_rcu(). Also, freeing endpoint by call_rcu() makes it safe to access the sk by asoc->base.sk in sctp_assocs_seq_show() and sctp_rcv(). Thanks Jones to bring this issue up. v1->v2: - improve the changelog. - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed. Reported-by: syzbot+9276d76e83e3bcde6c99@syzkaller.appspotmail.com Reported-by: Lee Jones <lee.jones@linaro.org> Fixes: d25adbeb0cdb ("sctp: fix an use-after-free issue in sctp_sock_dump") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/net/sock.h commit 8f905c0e7354 ("inet: fully convert sk->sk_rx_dst to RCU rules") commit 43f51df41729 ("net: move early demux fields close to sk_refcnt") https://lore.kernel.org/all/20211222141641.0caa0ab3@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-23sctp: move hlist_node and hashent out of sctp_ep_commonXin Long
Struct sctp_ep_common is included in both asoc and ep, but hlist_node and hashent are only needed by ep after asoc_hashtable was dropped by Commit b5eff7128366 ("sctp: drop the old assoc hashtable of sctp"). So it is better to move hlist_node and hashent from sctp_ep_common to sctp_endpoint, and it saves some space for each asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-23xfrm: rate limit SA mapping change message to user spaceAntony Antony
Kernel generates mapping change message, XFRM_MSG_MAPPING, when a source port chage is detected on a input state with UDP encapsulation set. Kernel generates a message for each IPsec packet with new source port. For a high speed flow per packet mapping change message can be excessive, and can overload the user space listener. Introduce rate limiting for XFRM_MSG_MAPPING message to the user space. The rate limiting is configurable via netlink, when adding a new SA or updating it. Use the new attribute XFRMA_MTIMER_THRESH in seconds. v1->v2 change: update xfrm_sa_len() v2->v3 changes: use u32 insted unsigned long to reduce size of struct xfrm_state fix xfrm_ompat size Reported-by: kernel test robot <lkp@intel.com> accept XFRM_MSG_MAPPING only when XFRMA_ENCAP is present Co-developed-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-12-22Merge tag 'mlx5-updates-2021-12-21' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-12-21 1) From Shay Drory: Devlink user knobs to control device's EQ size This series provides knobs which will enable users to minimize memory consumption of mlx5 Functions (PF/VF/SF). mlx5 exposes two new generic devlink params for EQ size configuration and uses devlink generic param max_macs. LINK: https://lore.kernel.org/netdev/20211208141722.13646-1-shayd@nvidia.com/ 2) From Tariq and Lama, allocate software channel objects and statistics of a mlx5 netdevice private data dynamically upon first demand to save on memory. * tag 'mlx5-updates-2021-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Take packet_merge params directly from the RX res struct net/mlx5e: Allocate per-channel stats dynamically at first usage net/mlx5e: Use dynamic per-channel allocations in stats net/mlx5e: Allow profile-specific limitation on max num of channels net/mlx5e: Save memory by using dynamic allocation in netdev priv net/mlx5e: Add profile indications for PTP and QOS HTB features net/mlx5e: Use bitmap field for profile features net/mlx5: Remove the repeated declaration net/mlx5: Let user configure max_macs generic param devlink: Clarifies max_macs generic devlink param net/mlx5: Let user configure event_eq_size param devlink: Add new "event_eq_size" generic device param net/mlx5: Let user configure io_eq_size param devlink: Add new "io_eq_size" generic device param ==================== Link: https://lore.kernel.org/r/20211222031604.14540-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-23netfilter: conntrack: tag conntracks picked up in local out hookFlorian Westphal
This allows to identify flows that originate from local machine in a followup patch. It would be possible to make this a ->status bit instead. For now I did not do that yet because I don't have a use-case for exposing this info to userspace. If one comes up the toggle can be replaced with a status bit. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23netfilter: nf_tables: make counter support built-inPablo Neira Ayuso
Make counter support built-in to allow for direct call in case of CONFIG_RETPOLINE. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-22codel: remove unnecessary pkt_sched.h includeJakub Kicinski
Commit d068ca2ae2e6 ("codel: split into multiple files") moved all Qdisc-related code to codel_qdisc.h, move the include of pkt_sched.h as well. This is similar to the previous commit, although we don't care as much about incremental builds after pkt_sched.h was touched itself it is included by net/sch_generic.h which is modified ~20 times a year. This decreases the incremental build size after touching pkt_sched.h from 1592 to 617 objects. Fix unmasked missing includes in WiFi drivers. Acked-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20211221193941.3805147-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22codel: remove unnecessary sock.h includeJakub Kicinski
Since sock.h is modified relatively often (60 times in the last 12 months) it seems worthwhile to decrease the incremental build work. CoDel's header includes net/inet_ecn.h which in turn includes net/sock.h. codel.h is itself included by mac80211 which is included by much of the WiFi stack and drivers. Removing the net/inet_ecn.h include from CoDel breaks the dependecy between WiFi and sock.h. Commit d068ca2ae2e6 ("codel: split into multiple files") moved all the code which actually needs ECN helpers out to net/codel_impl.h, the include can be moved there as well. This decreases the incremental build size after touching sock.h from 4999 objects to 4051 objects. Fix unmasked missing includes in WiFi drivers. Acked-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20211221193941.3805147-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22Bluetooth: MGMT: Fix LE simultaneous roles UUID if not supportedLuiz Augusto von Dentz
If controller/driver don't support LE simultaneous roles its UUID shall be omitted when responding to MGMT_OP_READ_EXP_FEATURES_INFO. This also rework the support introducing HCI_LE_SIMULTANEOUS_ROLES flag so it can be detected when userspace wants to use or not. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-12-22Bluetooth: hci_sync: Add check simultaneous roles supportLuiz Augusto von Dentz
This attempts to check if the controller can act as both central and peripheral simultaneously and in case it does skip suspending advertising or in case of directed advertising don't fail if scanning. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-12-22Bluetooth: hci_sync: Add support for waiting specific LE subeventsLuiz Augusto von Dentz
This adds support for waiting for specific LE subevents instead of command status which may only indicate that the commands is in progress and a different event is used to complete the operation. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-12-22Bluetooth: hci_sync: Add hci_le_create_conn_syncLuiz Augusto von Dentz
This adds hci_le_create_conn_sync and make hci_le_connect use it instead of queueing multiple commands which may conflict with the likes of hci_update_passive_scan which uses hci_cmd_sync_queue. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-12-22Bluetooth: hci_sync: Push sync command cancellation to workqueueBenjamin Berg
syzbot reported that hci_cmd_sync_cancel may sleep from the wrong context. To avoid this, create a new work item that pushes the relevant parts into a different context. Note that we keep the old implementation with the name __hci_cmd_sync_cancel as the sleeping behaviour is desired in some cases. Reported-and-tested-by: syzbot+485cc00ea7cf41dfdbf1@syzkaller.appspotmail.com Fixes: c97a747efc93 ("Bluetooth: btusb: Cancel sync commands for certain URB errors") Signed-off-by: Benjamin Berg <bberg@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-12-21devlink: Add new "event_eq_size" generic device paramShay Drory
Add new device generic parameter to determine the size of the asynchronous control events EQ. For example, to reduce event EQ size to 64, execute: $ devlink dev param set pci/0000:06:00.0 \ name event_eq_size value 64 cmode driverinit $ devlink dev reload pci/0000:06:00.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21devlink: Add new "io_eq_size" generic device paramShay Drory
Add new device generic parameter to determine the size of the I/O completion EQs. For example, to reduce I/O EQ size to 64, execute: $ devlink dev param set pci/0000:06:00.0 \ name io_eq_size value 64 cmode driverinit $ devlink dev reload pci/0000:06:00.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-21Merge tag 'mac80211-next-for-net-next-2021-12-21' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== This time we have: * ndo_fill_forward_path support in mac80211, to let drivers use it * association comeback notification for userspace, to be able to react more sensibly to long delays * support for background radar detection hardware in some chipsets * SA Query Procedures offload on the AP side * more logging if we find problems with HT/VHT/HE * various cleanups and minor fixes Conflicts: net/wireless/reg.c: e08ebd6d7b90 ("cfg80211: Acquire wiphy mutex on regulatory work") 701fdfe348f7 ("cfg80211: Enable regulatory enforcement checks for drivers supporting mesh iface") https://lore.kernel.org/r/20211221111950.57ecc6a7@canb.auug.org.au drivers/net/wireless/ath/ath10k/wmi.c: 7f599aeccbd2 ("cfg80211: Use the HE operation IE to determine a 6GHz BSS channel") 3bf2537ec2e3 ("ath10k: drop beacon and probe response which leak from other channel") https://lore.kernel.org/r/20211221115004.1cd6b262@canb.auug.org.au * tag 'mac80211-next-for-net-next-2021-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (32 commits) cfg80211: Enable regulatory enforcement checks for drivers supporting mesh iface rfkill: allow to get the software rfkill state cfg80211: refactor cfg80211_get_ies_channel_number() nl82011: clarify interface combinations wrt. channels nl80211: Add support to offload SA Query procedures for AP SME device nl80211: Add support to set AP settings flags with single attribute mac80211: add more HT/VHT/HE state logging cfg80211: Use the HE operation IE to determine a 6GHz BSS channel cfg80211: rename offchannel_chain structs to background_chain to avoid confusion with ETSI standard mac80211: Notify cfg80211 about association comeback cfg80211: Add support for notifying association comeback mac80211: introduce channel switch disconnect function cfg80211: Fix order of enum nl80211_band_iftype_attr documentation cfg80211: simplify cfg80211_chandef_valid() mac80211: Remove a couple of obsolete TODO mac80211: fix FEC flag in radio tap header mac80211: use coarse boottime for airtime fairness code ieee80211: change HE nominal packet padding value defines cfg80211: use ieee80211_bss_get_elem() instead of _get_ie() mac80211: Use memset_after() to clear tx status ... ==================== Link: https://lore.kernel.org/r/20211221112532.28708-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-20inet: fully convert sk->sk_rx_dst to RCU rulesEric Dumazet
syzbot reported various issues around early demux, one being included in this changelog [1] sk->sk_rx_dst is using RCU protection without clearly documenting it. And following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv() are not following standard RCU rules. [a] dst_release(dst); [b] sk->sk_rx_dst = NULL; They look wrong because a delete operation of RCU protected pointer is supposed to clear the pointer before the call_rcu()/synchronize_rcu() guarding actual memory freeing. In some cases indeed, dst could be freed before [b] is done. We could cheat by clearing sk_rx_dst before calling dst_release(), but this seems the right time to stick to standard RCU annotations and debugging facilities. [1] BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline] BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792 Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204 CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247 __kasan_report mm/kasan/report.c:433 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:450 dst_check include/net/dst.h:470 [inline] tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792 ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340 ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583 ip_sublist_rcv net/ipv4/ip_input.c:609 [inline] ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644 __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline] __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556 __netif_receive_skb_list net/core/dev.c:5608 [inline] netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699 gro_normal_list net/core/dev.c:5853 [inline] gro_normal_list net/core/dev.c:5849 [inline] napi_complete_done+0x1f1/0x880 net/core/dev.c:6590 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline] virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557 __napi_poll+0xaf/0x440 net/core/dev.c:7023 napi_poll net/core/dev.c:7090 [inline] net_rx_action+0x801/0xb40 net/core/dev.c:7177 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649 common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240 asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629 RIP: 0033:0x7f5e972bfd57 Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73 RSP: 002b:00007fff8a413210 EFLAGS: 00000283 RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45 RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45 RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9 R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0 R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019 </TASK> Allocated by task 13: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:434 [inline] __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467 kasan_slab_alloc include/linux/kasan.h:259 [inline] slab_post_alloc_hook mm/slab.h:519 [inline] slab_alloc_node mm/slub.c:3234 [inline] slab_alloc mm/slub.c:3242 [inline] kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247 dst_alloc+0x146/0x1f0 net/core/dst.c:92 rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613 ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340 ip_route_input_rcu net/ipv4/route.c:2470 [inline] ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415 ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354 ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583 ip_sublist_rcv net/ipv4/ip_input.c:609 [inline] ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644 __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline] __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556 __netif_receive_skb_list net/core/dev.c:5608 [inline] netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699 gro_normal_list net/core/dev.c:5853 [inline] gro_normal_list net/core/dev.c:5849 [inline] napi_complete_done+0x1f1/0x880 net/core/dev.c:6590 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline] virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557 __napi_poll+0xaf/0x440 net/core/dev.c:7023 napi_poll net/core/dev.c:7090 [inline] net_rx_action+0x801/0xb40 net/core/dev.c:7177 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 Freed by task 13: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free mm/kasan/common.c:328 [inline] __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374 kasan_slab_free include/linux/kasan.h:235 [inline] slab_free_hook mm/slub.c:1723 [inline] slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749 slab_free mm/slub.c:3513 [inline] kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530 dst_destroy+0x2d6/0x3f0 net/core/dst.c:127 rcu_do_batch kernel/rcu/tree.c:2506 [inline] rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 Last potentially related work creation: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348 __call_rcu kernel/rcu/tree.c:2985 [inline] call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065 dst_release net/core/dst.c:177 [inline] dst_release+0x79/0xe0 net/core/dst.c:167 tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712 sk_backlog_rcv include/net/sock.h:1030 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2768 release_sock+0x54/0x1b0 net/core/sock.c:3300 tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441 inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 sock_write_iter+0x289/0x3c0 net/socket.c:1057 call_write_iter include/linux/fs.h:2162 [inline] new_sync_write+0x429/0x660 fs/read_write.c:503 vfs_write+0x7cd/0xae0 fs/read_write.c:590 ksys_write+0x1ee/0x250 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88807f1cb700 which belongs to the cache ip_dst_cache of size 176 The buggy address is located 58 bytes inside of 176-byte region [ffff88807f1cb700, ffff88807f1cb7b0) The buggy address belongs to the page: page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062 prep_new_page mm/page_alloc.c:2418 [inline] get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369 alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191 alloc_slab_page mm/slub.c:1793 [inline] allocate_slab mm/slub.c:1930 [inline] new_slab+0x32d/0x4a0 mm/slub.c:1993 ___slab_alloc+0x918/0xfe0 mm/slub.c:3022 __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109 slab_alloc_node mm/slub.c:3200 [inline] slab_alloc mm/slub.c:3242 [inline] kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247 dst_alloc+0x146/0x1f0 net/core/dst.c:92 rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613 __mkroute_output net/ipv4/route.c:2564 [inline] ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791 ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619 __ip_route_output_key include/net/route.h:126 [inline] ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850 ip_route_output_key include/net/route.h:142 [inline] geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809 geneve_xmit_skb drivers/net/geneve.c:899 [inline] geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082 __netdev_start_xmit include/linux/netdevice.h:4994 [inline] netdev_start_xmit include/linux/netdevice.h:5008 [inline] xmit_one net/core/dev.c:3590 [inline] dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606 __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1338 [inline] free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389 free_unref_page_prepare mm/page_alloc.c:3309 [inline] free_unref_page+0x19/0x690 mm/page_alloc.c:3388 qlink_free mm/kasan/quarantine.c:146 [inline] qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165 kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272 __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444 kasan_slab_alloc include/linux/kasan.h:259 [inline] slab_post_alloc_hook mm/slab.h:519 [inline] slab_alloc_node mm/slub.c:3234 [inline] kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270 __alloc_skb+0x215/0x340 net/core/skbuff.c:414 alloc_skb include/linux/skbuff.h:1126 [inline] alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078 sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575 mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754 add_grhead+0x265/0x330 net/ipv6/mcast.c:1857 add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995 mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242 mld_send_initial_cr net/ipv6/mcast.c:1232 [inline] mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445 Memory state around the buggy address: ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc >ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20211220143330.680945-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-20nl80211: Add support to set AP settings flags with single attributeVeerendranath Jakkam
In previous method each AP settings flag is represented by a top-level flag attribute and conversion to enum cfg80211_ap_settings_flags had to be done before sending them to driver. This commit is to make it easier to define new AP settings flags and sending them to driver. This commit also deprecate sending of %NL80211_ATTR_EXTERNAL_AUTH_SUPPORT in %NL80211_CMD_START_AP. But to maintain backwards compatibility checks for %NL80211_ATTR_EXTERNAL_AUTH_SUPPORT in %NL80211_CMD_START_AP when %NL80211_ATTR_AP_SETTINGS_FLAGS not present in %NL80211_CMD_START_AP. Signed-off-by: Veerendranath Jakkam <vjakkam@codeaurora.org> Link: https://lore.kernel.org/r/1637911519-21306-1-git-send-email-vjakkam@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20cfg80211: Use the HE operation IE to determine a 6GHz BSS channelAyala Beker
A non-collocated AP whose primary channel is not a PSC channel may transmit a duplicated beacon on the corresponding PSC channel in which it would indicate its true primary channel. Use this inforamtion contained in the HE operation IE to determine the primary channel of the AP. In case of invalid infomration ignore it and use the channel the frame was received on. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211202143322.71eb2176e54e.I130f678e4aa390973ab39d838bbfe7b2d54bff8e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20cfg80211: rename offchannel_chain structs to background_chain to avoid ↵Lorenzo Bianconi
confusion with ETSI standard ETSI standard defines "Offchannel CAC" as: "Off-Channel CAC is performed by a number of non-continuous checks spread over a period in time. This period, which is required to determine the presence of radar signals, is defined as the Off-Channel CAC Time.. Minimum Off-Channel CAC Time 6 minutes and Maximum Off-Channel CAC Time 4 hours..". mac80211 implementation refers to a dedicated hw chain used for continuous radar monitoring. Rename offchannel_* references to background_* in order to avoid confusion with ETSI standard. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/4204cc1d648d76b44557981713231e030a3bd991.1638190762.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20cfg80211: Add support for notifying association comebackIlan Peer
Thought the underline driver MLME can handle association temporal rejection with comeback, it is still useful to notify this to user space, as user space might want to handle the temporal rejection differently. For example, in case the comeback time is too long, user space can deauthenticate immediately and try to associate with a different AP. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211129152938.2467809e8cb3.I45574185b582666bc78eef0c29a4c36b478e5382@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-20mac80211: introduce channel switch disconnect functionNathan Errera
Introduce a disconnect function that can be used when a channel switch error occurs. The channel switch can request to block the tx, and so, we need to make sure we do not send a deauth frame in this case. Signed-off-by: Nathan Errera <nathan.errera@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211129152938.cd2a615a0702.I9edb14785586344af17644b610ab5be109dcef00@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-12-19flow_offload: validate flags of filter and actionsBaowen Zheng
Add process to validate flags of filter and actions when adding a tc filter. We need to prevent adding filter with flags conflicts with its actions. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add reoffload process to update hw_countBaowen Zheng
Add reoffload process to update hw_count when driver is inserted or removed. We will delete the action if it is with skip_sw flag and not offloaded to any hardware in reoffload process. When reoffloading actions, we still offload the actions that are added independent of filters. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add process to update action stats from hardwareBaowen Zheng
When collecting stats for actions update them using both hardware and software counters. Stats update process should not run in context of preempt_disable. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: rename exts stats update functions with hwBaowen Zheng
Rename exts stats update functions with hw for readability. We make this change also to update stats from hw for an action when it is offloaded to hw as a single action. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add skip_hw and skip_sw to control if offload the actionBaowen Zheng
We add skip_hw and skip_sw for user to control if offload the action to hardware. We also add in_hw_count for user to indicate if the action is offloaded to any hardware. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: allow user to offload tc action to net deviceBaowen Zheng
Use flow_indr_dev_register/flow_indr_dev_setup_offload to offload tc action. We need to call tc_cleanup_flow_action to clean up tc action entry since in tc_setup_action, some actions may hold dev refcnt, especially the mirror action. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add ops to tc_action_ops for flow action setupBaowen Zheng
Add a new ops to tc_action_ops for flow action setup. Refactor function tc_setup_flow_action to use this new ops. We make this change to facilitate to add standalone action module. We will also use this ops to offload action independent of filter in following patch. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: rename offload functions with offload instead of flowBaowen Zheng
To improves readability, we rename offload functions with offload instead of flow. The term flow is related to exact matches, so we rename these functions with offload. We make this change to facilitate single action offload functions naming. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-19flow_offload: add index to flow_action_entry structureBaowen Zheng
Add index to flow_action_entry structure and delete index from police and gate child structure. We make this change to offload tc action for driver to identify a tc action. Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-17net/sched: flow_dissector: Fix matching on zone id for invalid connsPaul Blakey
If ct rejects a flow, it removes the conntrack info from the skb. act_ct sets the post_ct variable so the dissector will see this case as an +tracked +invalid state, but the zone id is lost with the conntrack info. To restore the zone id on such cases, set the last executed zone, via the tc control block, when passing ct, and read it back in the dissector if there is no ct info on the skb (invalid connection). Fixes: 7baf2429a1a9 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17net/sched: Extend qdisc control block with tc control blockPaul Blakey
BPF layer extends the qdisc control block via struct bpf_skb_data_end and because of that there is no more room to add variables to the qdisc layer control block without going over the skb->cb size. Extend the qdisc control block with a tc control block, and move all tc related variables to there as a pre-step for extending the tc control block with additional members. Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16fib: expand fib_rule_policyFlorian Westphal
Now that there is only one fib nla_policy there is no need to keep the macro around. Place it where its used. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16fib: rules: remove duplicated nla policiesFlorian Westphal
The attributes are identical in all implementations so move the ipv4 one into the core and remove the per-family nla policies. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14net: dsa: make tagging protocols connect to individual switches from a treeVladimir Oltean
On the NXP Bluebox 3 board which uses a multi-switch setup with sja1105, the mechanism through which the tagger connects to the switch tree is broken, due to improper DSA code design. At the time when tag_ops->connect() is called in dsa_port_parse_cpu(), DSA hasn't finished "touching" all the ports, so it doesn't know how large the tree is and how many ports it has. It has just seen the first CPU port by this time. As a result, this function will call the tagger's ->connect method too early, and the tagger will connect only to the first switch from the tree. This could be perhaps addressed a bit more simply by just moving the tag_ops->connect(dst) call a bit later (for example in dsa_tree_setup), but there is already a design inconsistency at present: on the switch side, the notification is on a per-switch basis, but on the tagger side, it is on a per-tree basis. Furthermore, the persistent storage itself is per switch (ds->tagger_data). And the tagger connect and disconnect procedures (at least the ones that exist currently) could see a fair bit of simplification if they didn't have to iterate through the switches of a tree. To fix the issue, this change transforms tag_ops->connect(dst) into tag_ops->connect(ds) and moves it somewhere where we already iterate over all switches of a tree. That is in dsa_switch_setup_tag_protocol(), which is a good placement because we already have there the connection call to the switch side of things. As for the dsa_tree_bind_tag_proto() method (called from the code path that changes the tag protocol), things are a bit more complicated because we receive the tree as argument, yet when we unwind on errors, it would be nice to not call tag_ops->disconnect(ds) where we didn't previously call tag_ops->connect(ds). We didn't have this problem before because the tag_ops connection operations passed the entire dst before, and this is more fine grained now. To solve the error rewind case using the new API, we have to create yet one more cross-chip notifier for disconnection, and stay connected with the old tag protocol to all the switches in the tree until we've succeeded to connect with the new one as well. So if something fails half way, the whole tree is still connected to the old tagger. But there may still be leaks if the tagger fails to connect to the 2nd out of 3 switches in a tree: somebody needs to tell the tagger to disconnect from the first switch. Nothing comes for free, and this was previously handled privately by the tagging protocol driver before, but now we need to emit a disconnect cross-chip notifier for that, because DSA has to take care of the unwind path. We assume that the tagging protocol has connected to a switch if it has set ds->tagger_data to something, otherwise we avoid calling its disconnection method in the error rewind path. The rest of the changes are in the tagging protocol drivers, and have to do with the replacement of dst with ds. The iteration is removed and the error unwind path is simplified, as mentioned above. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14xsk: Wipe out dead zero_copy_allocator declarationsMaciej Fijalkowski
zero_copy_allocator has been removed back when Bjorn Topel introduced xsk_buff_pool. Remove references to it that were dangling in the tree. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20211210171511.11574-1-maciej.fijalkowski@intel.com
2021-12-13bareudp: Move definition of struct bareudp_conf to bareudp.cGuillaume Nault
This structure is used only in bareudp.c. While there, adjust include files: we need netdevice.h, not skbuff.h. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-13bareudp: Remove bareudp_dev_create()Guillaume Nault
There's no user for this function. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12net: dsa: remove dp->privVladimir Oltean
All current in-tree uses of dp->priv have been replaced with ds->tagger_data, which provides for a safer API especially when the connection isn't the regular 1:1 link between one switch driver and one tagging protocol driver, but could be either one switch to many taggers, or many switches to one tagger. Therefore, we can remove this unused pointer. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12net: dsa: introduce tagger-owned storage for private and shared dataVladimir Oltean
Ansuel is working on register access over Ethernet for the qca8k switch family. This requires the qca8k tagging protocol driver to receive frames which aren't intended for the network stack, but instead for the qca8k switch driver itself. The dp->priv is currently the prevailing method for passing data back and forth between the tagging protocol driver and the switch driver. However, this method is riddled with caveats. The DSA design allows in principle for any switch driver to return any protocol it desires in ->get_tag_protocol(). The dsa_loop driver can be modified to do just that. But in the current design, the memory behind dp->priv has to be allocated by the switch driver, so if the tagging protocol is paired to an unexpected switch driver, we may end up in NULL pointer dereferences inside the kernel, or worse (a switch driver may allocate dp->priv according to the expectations of a different tagger). The latter possibility is even more plausible considering that DSA switches can dynamically change tagging protocols in certain cases (dsa <-> edsa, ocelot <-> ocelot-8021q), and the current design lends itself to mistakes that are all too easy to make. This patch proposes that the tagging protocol driver should manage its own memory, instead of relying on the switch driver to do so. After analyzing the different in-tree needs, it can be observed that the required tagger storage is per switch, therefore a ds->tagger_data pointer is introduced. In principle, per-port storage could also be introduced, although there is no need for it at the moment. Future changes will replace the current usage of dp->priv with ds->tagger_data. We define a "binding" event between the DSA switch tree and the tagging protocol. During this binding event, the tagging protocol's ->connect() method is called first, and this may allocate some memory for each switch of the tree. Then a cross-chip notifier is emitted for the switches within that tree, and they are given the opportunity to fix up the tagger's memory (for example, they might set up some function pointers that represent virtual methods for consuming packets). Because the memory is owned by the tagger, there exists a ->disconnect() method for the tagger (which is the place to free the resources), but there doesn't exist a ->disconnect() method for the switch driver. This is part of the design. The switch driver should make minimal use of the public part of the tagger data, and only after type-checking it using the supplied "proto" argument. In the code there are in fact two binding events, one is the initial event in dsa_switch_setup_tag_protocol(). At this stage, the cross chip notifier chains aren't initialized, so we call each switch's connect() method by hand. Then there is dsa_tree_bind_tag_proto() during dsa_tree_change_tag_proto(), and here we have an old protocol and a new one. We first connect to the new one before disconnecting from the old one, to simplify error handling a bit and to ensure we remain in a valid state at all times. Co-developed-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>