summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-04-24net: marvell: prestera: allocate dummy net_device dynamicallyBreno Leitao
Embedding net_device into structures prohibits the usage of flexible arrays in the net_device structure. For more details, see the discussion at [1]. Un-embed the net_device from the private struct by converting it into a pointer. Then use the leverage the new alloc_netdev_dummy() helper to allocate and initialize dummy devices. [1] https://lore.kernel.org/all/20240229225910.79e224cf@kernel.org/ Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Elad Nachman <enachman@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: create a dummy net_device allocatorBreno Leitao
It is impossible to use init_dummy_netdev together with alloc_netdev() as the 'setup' argument. This is because alloc_netdev() initializes some fields in the net_device structure, and later init_dummy_netdev() memzero them all. This causes some problems as reported here: https://lore.kernel.org/all/20240322082336.49f110cc@kernel.org/ Split the init_dummy_netdev() function in two. Create a new function called init_dummy_netdev_core() that does not memzero the net_device structure. Then have init_dummy_netdev() memzero-ing and calling init_dummy_netdev_core(), keeping the old behaviour. init_dummy_netdev_core() is the new function that could be called as an argument for alloc_netdev(). Also, create a helper to allocate and initialize dummy net devices, leveraging init_dummy_netdev_core() as the setup argument. This function basically simplify the allocation of dummy devices, by allocating and initializing it. Freeing the device continue to be done through free_netdev() Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: free_netdev: exit earlier if dummyBreno Leitao
For dummy devices, exit earlier at free_netdev() instead of executing the whole function. This is necessary, because dummy devices are special, and shouldn't have the second part of the function executed. Otherwise reg_state, which is NETREG_DUMMY, will be overwritten and there will be no way to identify that this is a dummy device. Also, this device do not need the final put_device(), since dummy devices are not registered (through register_netdevice()), where the device reference is increased (at netdev_register_kobject()/device_add()). Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: core: Fix documentationBreno Leitao
Fix bad grammar in description of init_dummy_netdev() function. This topic showed up in the review of the "allocate dummy device dynamically" patch set. Suggested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24Merge branch 'dsa-mt7530-improvements'David S. Miller
Arınç ÜNAL says: ==================== MT7530 DSA Subdriver Improvements Act IV This is the forth patch series with the goal of simplifying the MT7530 DSA subdriver and improving support for MT7530, MT7531, and the switch on the MT7988 SoC. I have done a simple ping test to confirm basic communication on all switch ports on MCM and standalone MT7530, and MT7531 switch with this patch series applied. MT7621 Unielec, MCM MT7530: rgmii-only-gmac0-mt7621-unielec-u7621-06-16m.dtb gmac0-and-gmac1-mt7621-unielec-u7621-06-16m.dtb tftpboot 0x80008000 mips-uzImage.bin; tftpboot 0x83000000 mips-rootfs.cpio.uboot; tftpboot 0x83f00000 $dtb; bootm 0x80008000 0x83000000 0x83f00000 MT7622 Bananapi, MT7531: gmac0-and-gmac1-mt7622-bananapi-bpi-r64.dtb tftpboot 0x40000000 arm64-Image; tftpboot 0x45000000 arm64-rootfs.cpio.uboot; tftpboot 0x4a000000 $dtb; booti 0x40000000 0x45000000 0x4a000000 MT7623 Bananapi, standalone MT7530: rgmii-only-gmac0-mt7623n-bananapi-bpi-r2.dtb gmac0-and-gmac1-mt7623n-bananapi-bpi-r2.dtb tftpboot 0x80008000 arm-zImage; tftpboot 0x83000000 arm-rootfs.cpio.uboot; tftpboot 0x83f00000 $dtb; bootz 0x80008000 0x83000000 0x83f00000 This patch series finalises the patch series linked below. https://lore.kernel.org/r/20230522121532.86610-1-arinc.unal@arinc9.com --- Changes in v2: - Add two new patches to the end. - Patch 13 - Add the missing patch log. - Link to v1: https://lore.kernel.org/r/20240419-for-netnext-mt7530-improvements-4-v1-0-6d852ca79b1d@arinc9.com ==================== Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: explain exposing MDIO bus of MT7531AE betterArınç ÜNAL
Unlike MT7531BE, the GPIO 6-12 pins are not used for RGMII on MT7531AE. Therefore, the GPIO 11-12 pins are set to function as MDC and MDIO to expose the MDIO bus of the switch. Replace the comment with a better explanation. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: do not pass port variable to mt7531_rgmii_setup()Arınç ÜNAL
The mt7531_rgmii_setup() function does not use the port variable, do not pass the variable to it. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: use priv->ds->num_ports instead of MT7530_NUM_PORTSArınç ÜNAL
Use priv->ds->num_ports on all for loops which configure the switch registers. In the future, the value of MT7530_NUM_PORTS will depend on priv->id. Therefore, this change prepares the subdriver for a simpler implementation. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: get rid of mac_port_validate member of mt753x_infoArınç ÜNAL
The mac_port_validate member of the mt753x_info structure is not being used, remove it. Improve the member description section in the process. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: refactor MT7530_PMEEECR_P()Arınç ÜNAL
The MT7530_PMEEECR_P() register is on MT7530, MT7531, and the switch on the MT7988 SoC. Rename the definition for them to MT753X_PMEEECR_P(). Use the FIELD_PREP and FIELD_GET macros. Rename GET_LPI_THRESH() and SET_LPI_THRESH() to LPI_THRESH_GET() and LPI_THRESH_SET(). Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: get rid of function sanity checkArınç ÜNAL
Get rid of checking whether functions are filled properly. priv->info which is an mt753x_info structure is filled and checked for before this check. It's unnecessary checking whether it's filled properly. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: define MAC speed capabilities per switch modelArınç ÜNAL
With the support of the MT7988 SoC switch, the MAC speed capabilities defined on mt753x_phylink_get_caps() won't apply to all switch models anymore. Move them to more appropriate locations instead of overwriting config->mac_capabilities. Remove the comment on mt753x_phylink_get_caps() as it's become invalid with the support of MT7531 and MT7988 SoC switch. Add break to case 6 of mt7988_mac_port_get_caps() to be explicit. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: return mt7530_setup_mdio & mt7531_setup_common on errorArınç ÜNAL
The mt7530_setup_mdio() and mt7531_setup_common() functions should be checked for errors. Return if the functions return a non-zero value. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: move MT753X_MTRAP operations for MT7530Arınç ÜNAL
On MT7530, the media-independent interfaces of port 5 and 6 are controlled by the MT7530_P5_DIS and MT7530_P6_DIS bits of the hardware trap. Deal with these bits only when the relevant port is being enabled or disabled. This ensures that these ports will be disabled when they are not in use. Do not set MT7530_CHG_TRAP on mt7530_setup_port5() as that's already being done on mt7530_setup(). Instead of globally setting MT7530_P5_MAC_SEL, clear it, then set it only on the appropriate case. If PHY muxing is detected, clear MT7530_P5_DIS before calling mt7530_setup_port5(). Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: refactor MT7530_HWTRAP and MT7530_MHWTRAPArınç ÜNAL
The MT7530_HWTRAP and MT7530_MHWTRAP registers are on MT7530 and MT7531. It's called hardware trap on MT7530, software trap on MT7531. That's because some bits of the trap on MT7530 cannot be modified by software whilst all bits of the trap on MT7531 can. Rename the definitions for them to MT753X_TRAP and MT753X_MTRAP. Add MT7530 and MT7531 prefixes to the definitions specific to the switch model. Remove the extra parentheses from MT7530_XTAL_40MHZ and MT7530_XTAL_20MHZ. Rename MHWTRAP_PHY0_SEL, MHWTRAP_MANUAL, and MHWTRAP_PHY_ACCESS to be on par with the "MT7621 Giga Switch Programming Guide v0.3" document. Make an enumaration for the XTAL frequency. Set the data type of the xtal variable on mt7531_pll_setup() to it. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: refactor MT7530_MFC and MT7531_CFC, add MT7531_QRY_FFPArınç ÜNAL
The MT7530_MFC register is on MT7530, MT7531, and the switch on the MT7988 SoC. Rename it to MT753X_MFC. Bit 7 to 0 differs between MT7530 and MT7531/MT7988. Add MT7530 prefix to these definitions, and define the IGMP/MLD Query Frame Flooding Ports mask for MT7531. Rename the cases of MIRROR_MASK to MIRROR_PORT_MASK. Move mt753x_mirror_port_get() and mt753x_port_mirror_set() to mt7530.h as macros. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: rename mt753x_bpdu_port_fw enum to mt753x_to_cpu_fwArınç ÜNAL
The mt753x_bpdu_port_fw enum is globally used for manipulating the process of deciding the forwardable ports, specifically concerning the CPU port(s). Therefore, rename it and the values in it to mt753x_to_cpu_fw. Change FOLLOW_MFC to SYSTEM_DEFAULT to be on par with the switch documents. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: rename p5_intf_sel and use only for MT7530 switchArınç ÜNAL
The p5_intf_sel pointer is used to store the information of whether PHY muxing is used or not. PHY muxing is a feature specific to port 5 of the MT7530 switch. Do not use it for other switch models. Rename the pointer to p5_mode to store the mode the port is being used in. Rename the p5_interface_select enum to mt7530_p5_mode, the string representation to mt7530_p5_mode_str, and the enum elements. If PHY muxing is not detected, the default mode, GMAC5, will be used. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: refactor MT7530_PMCR_P()Arınç ÜNAL
The MT7530_PMCR_P() registers are on MT7530, MT7531, and the switch on the MT7988 SoC. Rename the definition for them to MT753X_PMCR_P(). Bit 15 is for MT7530 only. Add MT7530 prefix to the definition for bit 15. Use GENMASK and FIELD_PREP for PMCR_IFG_XMIT(). Rename PMCR_TX_EN and PMCR_RX_EN to PMCR_MAC_TX_EN and PMCR_MAC_TX_EN to follow the naming on the "MT7621 Giga Switch Programming Guide v0.3", "MT7531 Reference Manual for Development Board v1.0", and "MT7988A Wi-Fi 7 Generation Router Platform: Datasheet (Open Version) v0.1" documents. These documents show that PMCR_RX_FC_EN is at bit 5. Correct this along with renaming it to PMCR_FORCE_RX_FC_EN, and the same for PMCR_TX_FC_EN. Remove PMCR_SPEED_MASK which doesn't have a use. Rename the force mode definitions for MT7531 to FORCE_MODE. Add MASK at the end for the mask that includes all force mode definitions. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: dsa: mt7530: disable EEE abilities on failure on MT7531 and MT7988Arınç ÜNAL
The MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 bits let the PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits determine the 1G/100 EEE abilities of the MAC. If MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 are unset, the abilities are left to be determined by PHY auto polling. The commit 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features") made it so that the PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits are set on mt753x_phylink_mac_link_up(). But it did not set the MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 bits. Because of this, the EEE abilities will be determined by PHY auto polling, regardless of the result of phy_init_eee(). Define these bits and add them to the MT7531_FORCE_MODE mask which is set in mt7531_setup_common(). With this, there won't be any EEE abilities set when phy_init_eee() returns a negative value. Thanks to Russell for explaining when phy_init_eee() could return a negative value below. Looking at phy_init_eee(), it could return a negative value when: 1. phydev->drv is NULL 2. if genphy_c45_eee_is_active() returns negative 3. if genphy_c45_eee_is_active() returns zero, it returns -EPROTONOSUPPORT 4. if phy_set_bits_mmd() fails (e.g. communication error with the PHY) If we then look at genphy_c45_eee_is_active(), then: genphy_c45_read_eee_adv() and genphy_c45_read_eee_lpa() propagate their non-zero return values, otherwise this function returns zero or positive integer. If we then look at genphy_c45_read_eee_adv(), then a failure of phy_read_mmd() would cause a negative value to be returned. Looking at genphy_c45_read_eee_lpa(), the same is true. So, it can be summarised as: - phydev->drv is NULL - there is a communication error accessing the PHY - EEE is not active otherwise, it returns zero on success. If one wishes to determine whether an error occurred vs EEE not being supported through negotiation for the negotiated speed, if it returns -EPROTONOSUPPORT in the latter case. Other error codes mean either the driver has been unloaded or communication error. In conclusion, determining the EEE abilities by PHY auto polling shouldn't result in having any EEE abilities enabled, when one of the last two situations in the summary happens. And it seems that if phydev->drv is NULL, there would be bigger problems with the device than a broken link. So this is not a bugfix. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: phy: mediatek-ge-soc: follow netdev LED trigger semanticsDaniel Golle
Only blink if the link is up on a LED which is programmed to also indicate link-status. Otherwise, if both LEDs are in use to indicate different speeds, the resulting blinking being inverted on LEDs which aren't switched on at a specific speed is quite counter-intuitive. Also make sure that state left behind by reset or the bootloader is recognized correctly including the half-duplex and full-duplex bits as well as the (unsupported by Linux netdev trigger semantics) link-down bit. Fixes: c66937b0f8db ("net: phy: mediatek-ge-soc: support PHY LEDs") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-24net: gtp: Fix Use-After-Free in gtp_dellinkHyunwoo Kim
Since call_rcu, which is called in the hlist_for_each_entry_rcu traversal of gtp_dellink, is not part of the RCU read critical section, it is possible that the RCU grace period will pass during the traversal and the key will be free. To prevent this, it should be changed to hlist_for_each_entry_safe. Fixes: 94dc550a5062 ("gtp: fix an use-after-free in ipv4_pdp_find()") Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-23Merge branch 'introduce-bpf_wq'Alexei Starovoitov
Benjamin Tissoires says: ==================== Introduce bpf_wq This is a followup of sleepable bpf_timer[0]. When discussing sleepable bpf_timer, it was thought that we should give a try to bpf_wq, as the 2 APIs are similar but distinct enough to justify a new one. So here it is. I tried to keep as much as possible common code in kernel/bpf/helpers.c but I couldn't get away with code duplication in kernel/bpf/verifier.c. This series introduces a basic bpf_wq support: - creation is supported - assignment is supported - running a simple bpf_wq is also supported. We will probably need to extend the API further with: - a full delayed_work API (can be piggy backed on top with a correct flag) - bpf_wq_cancel() <- apparently not, this is shooting ourself in the foot - bpf_wq_cancel_sync() (for sleepable programs) - documentation --- For reference, the use cases I have in mind: --- Basically, I need to be able to defer a HID-BPF program for the following reasons (from the aforementioned patch): 1. defer an event: Sometimes we receive an out of proximity event, but the device can not be trusted enough, and we need to ensure that we won't receive another one in the following n milliseconds. So we need to wait those n milliseconds, and eventually re-inject that event in the stack. 2. inject new events in reaction to one given event: We might want to transform one given event into several. This is the case for macro keys where a single key press is supposed to send a sequence of key presses. But this could also be used to patch a faulty behavior, if a device forgets to send a release event. 3. communicate with the device in reaction to one event: We might want to communicate back to the device after a given event. For example a device might send us an event saying that it came back from sleeping state and needs to be re-initialized. Currently we can achieve that by keeping a userspace program around, raise a bpf event, and let that userspace program inject the events and commands. However, we are just keeping that program alive as a daemon for just scheduling commands. There is no logic in it, so it doesn't really justify an actual userspace wakeup. So a kernel workqueue seems simpler to handle. bpf_timers are currently running in a soft IRQ context, this patch series implements a sleppable context for them. Cheers, Benjamin [0] https://lore.kernel.org/all/20240408-hid-bpf-sleepable-v6-0-0499ddd91b94@kernel.org/ Changes in v2: - took previous review into account - mainly dropped BPF_F_WQ_SLEEPABLE - Link to v1: https://lore.kernel.org/r/20240416-bpf_wq-v1-0-c9e66092f842@kernel.org ==================== Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-0-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23selftests/bpf: wq: add bpf_wq_start() checksBenjamin Tissoires
Allows to test if allocation/free works Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-16-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23bpf: add bpf_wq_startBenjamin Tissoires
again, copy/paste from bpf_timer_start(). Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-15-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23selftests/bpf: add checks for bpf_wq_set_callback()Benjamin Tissoires
We assign the callback and set everything up. The actual tests of these callbacks will be done when bpf_wq_start() is available. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-14-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23bpf: wq: add bpf_wq_set_callback_implBenjamin Tissoires
To support sleepable async callbacks, we need to tell push_async_cb() whether the cb is sleepable or not. The verifier now detects that we are in bpf_wq_set_callback_impl and can allow a sleepable callback to happen. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-13-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23selftests/bpf: wq: add bpf_wq_init() checksBenjamin Tissoires
Allows to test if allocation/free works Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-12-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23bpf: wq: add bpf_wq_initBenjamin Tissoires
We need to teach the verifier about the second argument which is declared as void * but which is of type KF_ARG_PTR_TO_MAP. We could have dropped this extra case if we declared the second argument as struct bpf_map *, but that means users will have to do extra casting to have their program compile. We also need to duplicate the timer code for the checking if the map argument is matching the provided workqueue. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-11-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23selftests/bpf: add bpf_wq testsBenjamin Tissoires
We simply try in all supported map types if we can store/load a bpf_wq. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-10-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23tcp: Fix Use-After-Free in tcp_ao_connect_initHyunwoo Kim
Since call_rcu, which is called in the hlist_for_each_entry_rcu traversal of tcp_ao_connect_init, is not part of the RCU read critical section, it is possible that the RCU grace period will pass during the traversal and the key will be free. To prevent this, it should be changed to hlist_for_each_entry_safe. Fixes: 7c2ffaf21bd6 ("net/tcp: Calculate TCP-AO traffic keys") Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://lore.kernel.org/r/ZiYu9NJ/ClR8uSkH@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23neighbour: fix neigh_master_filtered()Eric Dumazet
If we no longer hold RTNL, we must use netdev_master_upper_dev_get_rcu() instead of netdev_master_upper_dev_get(). Fixes: ba0f78069423 ("neighbour: no longer hold RTNL in neigh_dump_info()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240421185753.1808077-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23net: usb: ax88179_178a: stop lying about skb->truesizeEric Dumazet
Some usb drivers try to set small skb->truesize and break core networking stacks. In this patch, I removed one of the skb->truesize overide. I also replaced one skb_clone() by an allocation of a fresh and small skb, to get minimally sized skbs, like we did in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize in rx path") Fixes: f8ebb3ac881b ("net: usb: ax88179_178a: Fix packet receiving") Reported-by: shironeko <shironeko@tesaguri.club> Closes: https://lore.kernel.org/netdev/c110f41a0d2776b525930f213ca9715c@tesaguri.club/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jose Alonso <joalonsof@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240421193828.1966195-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23ipv4: check for NULL idev in ip_route_use_hint()Eric Dumazet
syzbot was able to trigger a NULL deref in fib_validate_source() in an old tree [1]. It appears the bug exists in latest trees. All calls to __in_dev_get_rcu() must be checked for a NULL result. [1] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 2 PID: 3257 Comm: syz-executor.3 Not tainted 5.10.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:fib_validate_source+0xbf/0x15a0 net/ipv4/fib_frontend.c:425 Code: 18 f2 f2 f2 f2 42 c7 44 20 23 f3 f3 f3 f3 48 89 44 24 78 42 c6 44 20 27 f3 e8 5d 88 48 fc 4c 89 e8 48 c1 e8 03 48 89 44 24 18 <42> 80 3c 20 00 74 08 4c 89 ef e8 d2 15 98 fc 48 89 5c 24 10 41 bf RSP: 0018:ffffc900015fee40 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88800f7a4000 RCX: ffff88800f4f90c0 RDX: 0000000000000000 RSI: 0000000004001eac RDI: ffff8880160c64c0 RBP: ffffc900015ff060 R08: 0000000000000000 R09: ffff88800f7a4000 R10: 0000000000000002 R11: ffff88800f4f90c0 R12: dffffc0000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88800f7a4000 FS: 00007f938acfe6c0(0000) GS:ffff888058c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f938acddd58 CR3: 000000001248e000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip_route_use_hint+0x410/0x9b0 net/ipv4/route.c:2231 ip_rcv_finish_core+0x2c4/0x1a30 net/ipv4/ip_input.c:327 ip_list_rcv_finish net/ipv4/ip_input.c:612 [inline] ip_sublist_rcv+0x3ed/0xe50 net/ipv4/ip_input.c:638 ip_list_rcv+0x422/0x470 net/ipv4/ip_input.c:673 __netif_receive_skb_list_ptype net/core/dev.c:5572 [inline] __netif_receive_skb_list_core+0x6b1/0x890 net/core/dev.c:5620 __netif_receive_skb_list net/core/dev.c:5672 [inline] netif_receive_skb_list_internal+0x9f9/0xdc0 net/core/dev.c:5764 netif_receive_skb_list+0x55/0x3e0 net/core/dev.c:5816 xdp_recv_frames net/bpf/test_run.c:257 [inline] xdp_test_run_batch net/bpf/test_run.c:335 [inline] bpf_test_run_xdp_live+0x1818/0x1d00 net/bpf/test_run.c:363 bpf_prog_test_run_xdp+0x81f/0x1170 net/bpf/test_run.c:1376 bpf_prog_test_run+0x349/0x3c0 kernel/bpf/syscall.c:3736 __sys_bpf+0x45c/0x710 kernel/bpf/syscall.c:5115 __do_sys_bpf kernel/bpf/syscall.c:5201 [inline] __se_sys_bpf kernel/bpf/syscall.c:5199 [inline] __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5199 Fixes: 02b24941619f ("ipv4: use dst hint for ipv4 list receive") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240421184326.1704930-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23net: fix sk_memory_allocated_{add|sub} vs softirqsEric Dumazet
Jonathan Heathcote reported a regression caused by blamed commit on aarch64 architecture. x86 happens to have irq-safe __this_cpu_add_return() and __this_cpu_sub(), but this is not generic. I think my confusion came from "struct sock" argument, because these helpers are called with a locked socket. But the memory accounting is per-proto (and per-cpu after the blamed commit). We might cleanup these helpers later to directly accept a "struct proto *proto" argument. Switch to this_cpu_add_return() and this_cpu_xchg() operations, and get rid of preempt_disable()/preempt_enable() pairs. Fast path becomes a bit faster as a result :) Many thanks to Jonathan Heathcote for his awesome report and investigations. Fixes: 3cd3399dd7a8 ("net: implement per-cpu reserves for memory_allocated") Reported-by: Jonathan Heathcote <jonathan.heathcote@bbc.co.uk> Closes: https://lore.kernel.org/netdev/VI1PR01MB42407D7947B2EA448F1E04EFD10D2@VI1PR01MB4240.eurprd01.prod.exchangelabs.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://lore.kernel.org/r/20240421175248.1692552-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23bpf: allow struct bpf_wq to be embedded in arraymaps and hashmapsBenjamin Tissoires
Currently bpf_wq_cancel_and_free() is just a placeholder as there is no memory allocation for bpf_wq just yet. Again, duplication of the bpf_timer approach Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-9-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23bpf: add support for KF_ARG_PTR_TO_WORKQUEUEBenjamin Tissoires
Introduce support for KF_ARG_PTR_TO_WORKQUEUE. The kfuncs will use bpf_wq as argument and that will be recognized as workqueue argument by verifier. bpf_wq_kern casting can happen inside kfunc, but using bpf_wq in argument makes life easier for users who work with non-kern type in BPF progs. Duplicate process_timer_func into process_wq_func. meta argument is only needed to ensure bpf_wq_init's workqueue and map arguments are coming from the same map (map_uid logic is necessary for correct inner-map handling), so also amend check_kfunc_args() to match what helpers functions check is doing. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-8-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23bpf: verifier: bail out if the argument is not a mapBenjamin Tissoires
When a kfunc is declared with a KF_ARG_PTR_TO_MAP, we should have reg->map_ptr set to a non NULL value, otherwise, that means that the underlying type is not a map. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-7-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23tools: sync include/uapi/linux/bpf.hBenjamin Tissoires
cp include/uapi/linux/bpf.h tools/include/uapi/linux/bpf.h Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-6-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23bpf: add support for bpf_wq user typeBenjamin Tissoires
Mostly a copy/paste from the bpf_timer API, without the initialization and free, as they will be done in a separate patch. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-5-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23bpf: replace bpf_timer_cancel_and_free with a generic helperBenjamin Tissoires
Same reason than most bpf_timer* functions, we need almost the same for workqueues. So extract the generic part out of it so bpf_wq_cancel_and_free can reuse it. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-4-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23bpf: replace bpf_timer_set_callback with a generic helperBenjamin Tissoires
In the same way we have a generic __bpf_async_init(), we also need to share code between timer and workqueue for the set_callback call. We just add an unused flags parameter, as it will be used for workqueues. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-3-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23bpf: replace bpf_timer_init with a generic helperBenjamin Tissoires
No code change except for the new flags argument being stored in the local data struct. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-2-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23bpf: make timer data struct more genericBenjamin Tissoires
To be able to add workqueues and reuse most of the timer code, we need to make bpf_hrtimer more generic. There is no code change except that the new struct gets a new u64 flags attribute. We are still below 2 cache lines, so this shouldn't impact the current running codes. The ordering is also changed. Everything related to async callback is now on top of bpf_hrtimer. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240420-bpf_wq-v2-1-6c986a5a741f@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-23Revert "NFSD: Convert the callback workqueue to use delayed_work"Chuck Lever
This commit was a pre-requisite for commit c1ccfcf1a9bf ("NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"), which has already been reverted. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-23Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"Chuck Lever
The reverted commit attempted to enable NFSD to retransmit pending callback operations if an NFS client disconnects, but unintentionally introduces a hazardous behavior regression if the client becomes permanently unreachable while callback operations are still pending. A disconnect can occur due to network partition or if the NFS server needs to force the NFS client to retransmit (for example, if a GSS window under-run occurs). Reverting the commit will make NFSD behave the same as it did in v6.8 and before. Pending callback operations are permanently lost if the client connection is terminated before the client receives them. For some callback operations, this loss is not harmful. However, for CB_RECALL, the loss means a delegation might be revoked unnecessarily. For CB_OFFLOAD, pending COPY operations will never complete unless the NFS client subsequently sends an OFFLOAD_STATUS operation, which the Linux NFS client does not currently implement. These issues still need to be addressed somehow. Reported-by: Dai Ngo <dai.ngo@oracle.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735 Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-23Merge branch 'selftests-drv-net-support-testing-with-a-remote-system'Jakub Kicinski
Jakub Kicinski says: ==================== selftests: drv-net: support testing with a remote system Implement support for tests which require access to a remote system / endpoint which can generate traffic. This series concludes the "groundwork" for upstream driver tests. I wanted to support the three models which came up in discussions: - SW testing with netdevsim - "local" testing with two ports on the same system in a loopback - "remote" testing via SSH so there is a tiny bit of an abstraction which wraps up how "remote" commands are executed. Otherwise hopefully there's nothing surprising. I'm only adding a ping test. I had a bigger one written but I was worried we'll get into discussing the details of the test itself and how I chose to hack up netdevsim, instead of the test infra... So that test will be a follow up :) v4: https://lore.kernel.org/all/20240418233844.2762396-1-kuba@kernel.org v3: https://lore.kernel.org/all/20240417231146.2435572-1-kuba@kernel.org v2: https://lore.kernel.org/all/20240416004556.1618804-1-kuba@kernel.org v1: https://lore.kernel.org/all/20240412233705.1066444-1-kuba@kernel.org ==================== Link: https://lore.kernel.org/r/20240420025237.3309296-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23selftests: drv-net: add require_XYZ() helpers for validating envJakub Kicinski
Wrap typical checks like whether given command used by the test is available in helpers. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240420025237.3309296-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23selftests: drv-net: add a TCP ping test case (and useful helpers)Jakub Kicinski
More complex tests often have to spawn a background process, like a server which will respond to requests or tcpdump. Add support for creating such processes using the with keyword: with bkg("my-daemon", ..): # my-daemon is alive in this block My initial thought was to add this support to cmd() directly but it runs the command in the constructor, so by the time we __enter__ it's too late to make sure we used "background=True". Second useful helper transplanted from net_helper.sh is wait_port_listen(). The test itself uses socat, which insists on v6 addresses being wrapped in [], it's not the only command which requires this format, so add the wrapped address to env. The hope is to save test code from checking if address is v6. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240420025237.3309296-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23selftests: net: support matching cases by name prefixJakub Kicinski
While writing tests with a lot more cases I got tired of having to jump back and forth to add the name of the test to the ksft_run() list. Most unittest frameworks do some name matching, e.g. assume that functions with names starting with test_ are test cases. Support similar flow in ksft_run(). Let the author list the desired prefixes. globals() need to be passed explicitly, IDK how to work around that. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240420025237.3309296-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>