summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)Author
2024-06-28ice: do not init struct ice_adapter more times than neededPrzemek Kitszel
Allocate and initialize struct ice_adapter object only once per physical card instead of once per port. This is not a big deal by now, but we want to extend this struct more and more in the near future. Our plans include PTP stuff and a devlink instance representing whole-device/physical card. Transactions requiring to be sleep-able (like those doing user (here ice) memory allocation) must be performed with an additional (on top of xarray) mutex. Adding it here removes need to xa_lock() manually. Since this commit is a reimplementation of ice_adapter_get(), a rather new scoped_guard() wrapper for locking is used to simplify the logic. It's worth to mention that xa_insert() use gives us both slot reservation and checks if it is already filled, what simplifies code a tiny bit. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28ice: Distinguish driver reset and removal for AQ shutdownPiotr Gardocki
Admin queue command for shutdown AQ contains a flag to indicate driver unload. However, the flag is always set in the driver, even for resets. It can cause the firmware to consider driver as unloaded once the PF reset is triggered on all ports of device, which could lead to unexpected results. Add an additional function parameter to functions that shutdown AQ, indicating whether the driver is actually unloading. Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28ice: Allow different FW API versions based on MAC typePaul Greenwalt
Allow the driver to be compatible with different FW API versions based on the device's MAC type. Currently, E810 is only compatible with one FW API version. Now the driver can be compatible with different FW API versions for both E810 and E830. For example, E810 FW API version is 1.5.0 and E830 is 1.7.0. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28ice: Check all ice_vsi_rebuild() errors in functionEric Joyner
Check the return value from ice_vsi_rebuild() and prevent the usage of incorrectly configured VSI. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Eric Joyner <eric.joyner@intel.com> Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28ice: Add get/set hw address for VFs using devlink commandsKarthik Sundaravel
Changing the MAC address of the VFs is currently unsupported via devlink. Add the function handlers to set and get the HW address for the VFs. Signed-off-by: Karthik Sundaravel <ksundara@redhat.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28net/mlx5e: Approximate IPsec per-SA payload data bytes countLeon Romanovsky
ConnectX devices lack ability to count payload data byte size which is needed for SA to return to libreswan for rekeying. As a solution let's approximate that by decreasing headers size from total size counted by flow steering. The calculation doesn't take into account any other headers which can be in the packet (e.g. IP extensions). Fixes: 5a6cddb89b51 ("net/mlx5e: Update IPsec per SA packets/bytes count") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28net/mlx5e: Present succeeded IPsec SA bytes and packetLeon Romanovsky
IPsec SA statistics presents successfully decrypted and encrypted packet and bytes, and not total handled by this SA. So update the calculation logic to take into account failures. Fixes: 6fb7f9408779 ("net/mlx5e: Connect mlx5 IPsec statistics with XFRM core") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28net/mlx5e: Add mqprio_rl cleanup and free in mlx5e_priv_cleanup()Jianbo Liu
In the cited commit, mqprio_rl cleanup and free are mistakenly removed in mlx5e_priv_cleanup(), and it causes the leakage of host memory and firmware SCHEDULING_ELEMENT objects while changing eswitch mode. So, add them back. Fixes: 0bb7228f7096 ("net/mlx5e: Fix mqprio_rl handling on devlink reload") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28net/mlx5: E-switch, Create ingress ACL when neededChris Mi
Currently, ingress acl is used for three features. It is created only when vport metadata match and prio tag are enabled. But active-backup lag mode also uses it. It is independent of vport metadata match and prio tag. And vport metadata match can be disabled using the following devlink command: # devlink dev param set pci/0000:08:00.0 name esw_port_metadata \ value false cmode runtime If ingress acl is not created, will hit panic when creating drop rule for active-backup lag mode. If always create it, there will be about 5% performance degradation. Fix it by creating ingress acl when needed. If esw_port_metadata is true, ingress acl exists, then create drop rule using existing ingress acl. If esw_port_metadata is false, create ingress acl and then create drop rule. Fixes: 1749c4c51c16 ("net/mlx5: E-switch, add drop rule support to ingress ACL") Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28net/mlx5: Use max_num_eqs_24b when setting max_io_eqsDaniel Jurgens
Due a bug in the device max_num_eqs doesn't always reflect a written value. As a result, setting max_io_eqs may not work but appear successful. Instead write max_num_eqs_24b, which reflects correct value. Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28net/mlx5: Use max_num_eqs_24b capability if setDaniel Jurgens
A new capability with more bits is added. If it's set use that value as the maximum number of EQs available. This cap is also writable by the vhca_resource_manager to allow limiting the number of EQs available to SFs and VFs. Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28mlxsw: Implement ethtool operation to write to a transceiver module EEPROMIdo Schimmel
Implement the ethtool_ops::set_module_eeprom_by_page operation to allow ethtool to write to a transceiver module EEPROM, in a similar fashion to the ethtool_ops::get_module_eeprom_by_page operation. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-27net: thunderx: Unembed netdev structureBreno Leitao
Embedding net_device into structures prohibits the usage of flexible arrays in the net_device structure. For more details, see the discussion at [1]. Un-embed the net_devices from struct lmac by converting them into pointers, and allocating them dynamically. Use the leverage alloc_netdev() to allocate the net_device object at bgx_lmac_enable(). The free of the device occurs at bgx_lmac_disable(). Do not free_netdevice() if bgx_lmac_enable() fails after lmac->netdev is allocated, since bgx_lmac_disable() is called if bgx_lmac_enable() fails, and lmac->netdev will be freed there (similarly to lmac->dmacs). Link: https://lore.kernel.org/all/20240229225910.79e224cf@kernel.org/ [1] Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20240626173503.87636-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: e3f02f32a050 ("ionic: fix kernel panic due to multi-buffer handling") d9c04209990b ("ionic: Mark error paths in the data path as unlikely") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-27net: mana: Fix possible double free in error handling pathMa Ke
When auxiliary_device_add() returns error and then calls auxiliary_device_uninit(), callback function adev_release calls kfree(madev). We shouldn't call kfree(madev) again in the error handling path. Set 'madev' to NULL. Fixes: a69839d4327d ("net: mana: Add support for auxiliary device") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Link: https://patch.msgid.link/20240625130314.2661257-1-make24@iscas.ac.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-27net: stmmac: dwmac-stm32: stm32: add management of stm32mp25 for stm32Christophe Roullier
Add Ethernet support for STM32MP25. STM32MP25 is STM32 SOC with 2 GMACs instances. GMAC IP version is SNPS 5.3x. GMAC IP configure with 2 RX and 4 TX queue. DMA HW capability register supported RX Checksum Offload Engine supported TX Checksum insertion supported Wake-Up On Lan supported TSO supported Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-26mlxsw: pci: Use fragmented buffersAmit Cohen
WQE (Work Queue Element) includes 3 scatter/gather entries for buffers. The buffer can be split into 3 parts, software should set address and byte count of each part. A previous patch-set used page pool to allocate buffers, to simplify the change, we first used one continuous buffer, which was allocated with order > 0. This patch improves page pool usage to allocate the exact number of pages which are required for packet. As part of init, fill WQE.address[x] and WQE.byte_count* with pages which are allocated from the pool. Fill x entries according to number of scatter/gather entries which are required for maximum packet size. When a packet is received, check the actual size and replace only the used pages. Save bytes for software overhead only as part of the first entry. This change also requires using fragmented SKB, till now all the buffer was in the linear part. Note that 'skb->truesize' is decreased for small packets. For now the maximum buffer size is 3 * PAGE_SIZE which is enough, in case that the driver will support larger MTU, we can use 'order' to allocate more than one page per scatter/gather entry. This change significantly reduces memory consumption of mlxsw driver. The footprint is reduced by 26%. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/ee38898c692e7f644a7f3ea4d33aeddb4dd917d2.1719321422.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-26mlxsw: pci: Store number of scatter/gather entries for maximum packet sizeAmit Cohen
A previous patch-set used page pool for Rx buffers allocations. To simplify the change, we first used page pool for one allocation per packet - one continuous buffer is allocated for each packet. This can be improved by using fragmented buffers, then memory consumption will be significantly reduced. WQE (Work Queue Element) includes up to 3 scatter/gather entries for data. As preparation for fragmented buffer usage, calculate number of scatter/gather entries which are required for packet according to maximum MTU and store it for future use. For now use PAGE_SIZE for each entry, which means that maximum buffer size is 3 * PAGE_SIZE. This is enough for the maximum MTU which is supported in the driver now (10K). Warn in an unlikely case of maximum MTU which requires more than 3 pages, for now this warn should not happen with standard page size (>=4K) and maximum MTU (10K). Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/98c3e3adb7e727e571ac538faf67cef262cec4fc.1719321422.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-26net: Drop explicit initialization of struct i2c_device_id::driver_data to 0Uwe Kleine-König
These drivers don't use the driver_data member of struct i2c_device_id, so don't explicitly initialize this member. This prepares putting driver_data in an anonymous union which requires either no initialization or named designators. But it's also a nice cleanup on its own. While add it, also remove commas after the sentinel entries. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Reviewed-by: Petr Machata <petrm@nvidia.com> # For mlxsw Reviewed-by: Kory Maincent <Kory.maincent@bootlin.com> Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au> # for mctp-i2c Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20240625083853.2205977-2-u.kleine-koenig@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25net: tn40xx: add phylink supportFUJITA Tomonori
This patch adds supports for multiple PHY hardware with phylink. The adapters with TN40xx chips use multiple PHY hardware; AMCC QT2025, TI TLK10232, Aqrate AQR105, and Marvell 88X3120, 88X3310, and MV88E2010. For now, the PCI ID table of this driver enables adapters using only QT2025 PHY. I've tested this driver and the QT2025 PHY driver (SFP+ 10G SR) with Edimax EN-9320 10G adapter. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Hans-Frieder Vogt <hfdevel@gmx.net> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20240623235507.108147-8-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25net: tn40xx: add mdio bus supportFUJITA Tomonori
This patch adds supports for mdio bus. A later path adds PHYLIB support on the top of this. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Link: https://patch.msgid.link/20240623235507.108147-7-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25net: tn40xx: add basic Rx handlingFUJITA Tomonori
This patch adds basic Rx handling. The Rx logic uses three major data structures; two ring buffers with NIC and one database. One ring buffer is used to send information to NIC about memory to be stored packets to be received. The other is used to get information from NIC about received packets. The database is used to keep the information about DMA mapping. After a packet arrived, the db is used to pass the packet to the network stack. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Hans-Frieder Vogt <hfdevel@gmx.net> Link: https://patch.msgid.link/20240623235507.108147-6-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25net: tn40xx: add basic Tx handlingFUJITA Tomonori
This patch adds device specific structures to initialize the hardware with basic Tx handling. The original driver loads the embedded firmware in the header file. This driver is implemented to use the firmware APIs. The Tx logic uses three major data structures; two ring buffers with NIC and one database. One ring buffer is used to send information about packets to be sent for NIC. The other is used to get information from NIC about packet that are sent. The database is used to keep the information about DMA mapping. After a packet is sent, the db is used to free the resource used for the packet. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Link: https://patch.msgid.link/20240623235507.108147-5-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25net: tn40xx: add register definesFUJITA Tomonori
This adds several defines to handle registers in Tehuti Networks TN40xx chips for later patches. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Hans-Frieder Vogt <hfdevel@gmx.net> Link: https://patch.msgid.link/20240623235507.108147-4-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25net: tn40xx: add pci driver for Tehuti Networks TN40xx chipsFUJITA Tomonori
This just adds the scaffolding for an ethernet driver for Tehuti Networks TN40xx chips. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240623235507.108147-3-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25gve: Add flow steering ethtool supportJeroen de Borst
Implement the ethtool commands that can be used to configure and query flow-steering rules. A large part of this change consists of translating the ethtool representation of 'ntuples' to our internal gve_flow_rule and vice-versa in the new created gve_flow_rule.c Considering the possible large amount of flow rules, the driver doesn't store all the rules locally. When the user runs 'ethtool -n <nic>' to check the registered rules, the driver will send adminq command to query a limited amount of rules/rule ids(that filled in a 4096 bytes dma memory) at a time as a cache for the ethtool queries. The adminq query commands will be repeated for several times until the ethtool has queried all the needed rules. Signed-off-by: Jeroen de Borst <jeroendb@google.com> Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-6-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25gve: Add flow steering adminq commandsJeroen de Borst
Add new adminq commands for the driver to configure and query flow rules that are stored in the device. Flow steering rules are assigned with a location that determines the relative order of the rules. Flow rules can run up to an order of millions. In such cases, storing a full copy of the rules in the driver to prepare for the ethtool query is infeasible while querying them from the device is better. That needs to be optimized too so that we don't send a lot of adminq commands. The solution here is to store a limited number of rules/rule ids in the driver in a cache. Use dma_pool to allocate 4k bytes which lets device write at most 46 flow rules(4096/88) or 1024 rule ids(4096/4) at a time. For configuring flow rules, there are 3 sub-commands: - ADD which adds a rule at the location supplied - DEL which deletes the rule at the location supplied - RESET which clears all currently active rules in the device For querying flow rules, there are also 3 sub-commands: - QUERY_RULES corresponds to ETHTOOL_GRXCLSRULE. It fills the rules in the allocated cache after querying the device - QUERY_RULES_IDS corresponds to ETHTOOL_GRXCLSRLALL. It fills the rule_ids in the allocated cache after querying the device - QUERY_RULES_STATS corresponds to ETHTOOL_GRXCLSRLCNT. It queries the device's current flow rule number and the supported max flow rule limit Signed-off-by: Jeroen de Borst <jeroendb@google.com> Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-5-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25gve: Add flow steering device optionJeroen de Borst
Add a new device option to signal to the driver that the device supports flow steering. This device option also carries the maximum number of flow steering rules that the device can store. Signed-off-by: Jeroen de Borst <jeroendb@google.com> Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-4-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25gve: Add adminq extended commandJeroen de Borst
The adminq command is limited to 64 bytes per entry and it's 56 bytes for the command itself at maximum. To support larger commands, we need to dma_alloc a separate memory to put the command in that memory and send the dma memory address instead of the actual command. Introduce an extended adminq command to wrap the real command with the inner opcode and the allocated dma memory address specified. Once the device receives it, it can get the real command from the given dma memory address. As designed with the device, all the extended commands will use inner opcode larger than 0xFF. Signed-off-by: Jeroen de Borst <jeroendb@google.com> Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-3-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25gve: Add adminq mutex lockZiwei Xiao
We were depending on the rtnl_lock to make sure there is only one adminq command running at a time. But some commands may take too long to hold the rtnl_lock, such as the upcoming flow steering operations. For such situations, it can temporarily drop the rtnl_lock, and replace it for these operations with a new adminq lock, which can ensure the adminq command execution to be thread-safe. Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-2-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25ravb: Add MII support for R-Car V4MGeert Uytterhoeven
All EtherAVB instances on R-Car Gen3/Gen4 SoCs support the RGMII interface. In addition, the first two EtherAVB instances on R-Car V4M also support the MII interface, but this is not yet supported by the driver. Add support for MII on R-Car Gen4 by adding an R-Car Gen4-specific EMAC initialization function that selects the MII clock instead of the RGMII clock when the PHY interface is MII. Note that all implementations of EtherAVB on R-Car Gen4 SoCs have the APSR register, but only MII-capable instances are documented to have the MIISELECT bit, which has a documented value of zero when reserved. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://patch.msgid.link/3a21d1d6680864aa85afff9260234c2b8054020a.1719234830.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25ravb: Improve ravb_hw_info instance orderGeert Uytterhoeven
Move ravb_gen2_hw_info before ravb_gen3_hw_info to match ravb_match_table[] order. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://patch.msgid.link/a76febe3737e26365a784e9193da9363f22aa550.1719234830.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25ionic: use dev_consume_skb_any outside of napiShannon Nelson
If we're not in a NAPI softirq context, we need to be careful about how we call napi_consume_skb(), specifically we need to call it with budget==0 to signal to it that we're not in a safe context. This was found while running some configuration stress testing of traffic and a change queue config loop running, and this curious note popped out: [ 4371.402645] BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/20545 [ 4371.402897] caller is napi_skb_cache_put+0x16/0x80 [ 4371.403120] CPU: 25 PID: 20545 Comm: ethtool Kdump: loaded Tainted: G OE 6.10.0-rc3-netnext+ #8 [ 4371.403302] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 01/23/2021 [ 4371.403460] Call Trace: [ 4371.403613] <TASK> [ 4371.403758] dump_stack_lvl+0x4f/0x70 [ 4371.403904] check_preemption_disabled+0xc1/0xe0 [ 4371.404051] napi_skb_cache_put+0x16/0x80 [ 4371.404199] ionic_tx_clean+0x18a/0x240 [ionic] [ 4371.404354] ionic_tx_cq_service+0xc4/0x200 [ionic] [ 4371.404505] ionic_tx_flush+0x15/0x70 [ionic] [ 4371.404653] ? ionic_lif_qcq_deinit.isra.23+0x5b/0x70 [ionic] [ 4371.404805] ionic_txrx_deinit+0x71/0x190 [ionic] [ 4371.404956] ionic_reconfigure_queues+0x5f5/0xff0 [ionic] [ 4371.405111] ionic_set_ringparam+0x2e8/0x3e0 [ionic] [ 4371.405265] ethnl_set_rings+0x1f1/0x300 [ 4371.405418] ethnl_default_set_doit+0xbb/0x160 [ 4371.405571] genl_family_rcv_msg_doit+0xff/0x130 [...] I found that ionic_tx_clean() calls napi_consume_skb() which calls napi_skb_cache_put(), but before that last call is the note /* Zero budget indicate non-NAPI context called us, like netpoll */ and DEBUG_NET_WARN_ON_ONCE(!in_softirq()); Those are pretty big hints that we're doing it wrong. We can pass a context hint down through the calls to let ionic_tx_clean() know what we're doing so it can call napi_consume_skb() correctly. Fixes: 386e69865311 ("ionic: Make use napi_consume_skb") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20240624175015.4520-1-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25net: ethernet: mtk_eth_soc: ppe: prevent ppe update for non-mtk devicesElad Yifee
Introduce an additional validation to ensure that the PPE index is modified exclusively for mtk_eth ingress devices. This primarily addresses the issue related to WED operation with multiple PPEs. Fixes: dee4dd10c79a ("net: ethernet: mtk_eth_soc: ppe: add support for multiple PPEs") Signed-off-by: Elad Yifee <eladwf@gmail.com> Link: https://lore.kernel.org/r/20240623175113.24437-1-eladwf@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-25net: macb: Add ARP support to WOLVineeth Karumanchi
Extend wake-on LAN support with an ARP packet. Currently, if PHY supports WOL, ethtool ignores the modes supported by MACB. This change extends the WOL modes with MACB supported modes. Advertise wake-on LAN supported modes by default without relying on dt node. By default, wake-on LAN will be in disabled state. Using ethtool, users can enable/disable or choose packet types. For wake-on LAN via ARP, ensure the IP address is assigned and report an error otherwise. Co-developed-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Tested-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> # on SAMA7G5 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-25net: macb: Enable queue disableVineeth Karumanchi
Enable queue disable for Versal devices. Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-25net: macb: queue tie-off or disable during WOL suspendVineeth Karumanchi
When GEM is used as a wake device, it is not mandatory for the RX DMA to be active. The RX engine in IP only needs to receive and identify a wake packet through an interrupt. The wake packet is of no further significance; hence, it is not required to be copied into memory. By disabling RX DMA during suspend, we can avoid unnecessary DMA processing of any incoming traffic. During suspend, perform either of the below operations: - tie-off/dummy descriptor: Disable unused queues by connecting them to a looped descriptor chain without free slots. - queue disable: The newer IP version allows disabling individual queues. Co-developed-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Tested-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> # on SAMA7G5 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-25ibmvnic: Add tx check to prevent skb leakNick Child
Below is a summary of how the driver stores a reference to an skb during transmit: tx_buff[free_map[consumer_index]]->skb = new_skb; free_map[consumer_index] = IBMVNIC_INVALID_MAP; consumer_index ++; Where variable data looks like this: free_map == [4, IBMVNIC_INVALID_MAP, IBMVNIC_INVALID_MAP, 0, 3] consumer_index^ tx_buff == [skb=null, skb=<ptr>, skb=<ptr>, skb=null, skb=null] The driver has checks to ensure that free_map[consumer_index] pointed to a valid index but there was no check to ensure that this index pointed to an unused/null skb address. So, if, by some chance, our free_map and tx_buff lists become out of sync then we were previously risking an skb memory leak. This could then cause tcp congestion control to stop sending packets, eventually leading to ETIMEDOUT. Therefore, add a conditional to ensure that the skb address is null. If not then warn the user (because this is still a bug that should be patched) and free the old pointer to prevent memleak/tcp problems. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-23octeontx2-pf: Fix coverity and klockwork issues in octeon PF driverRatheesh Kannoth
Fix unintended sign extension and klockwork issues. These are not real issue but for sanity checks. Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-23ice: Rebuild TC queues on VSI queue reconfigurationJan Sokolowski
TC queues needs to be correctly updated when the number of queues on a VSI is reconfigured, so netdev's queue and TC settings will be dynamically adjusted and could accurately represent the underlying hardware state after changes to the VSI queue counts. Fixes: 0754d65bd4be ("ice: Add infrastructure for mqprio support via ndo_setup_tc") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-23Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: prepare representor for SF support Michal Swiatkowski says: This is a series to prepare port representor for supporting also subfunctions. We need correct devlink locking and the possibility to update parent VSI after port representor is created. Refactor how devlink lock is taken to suite the subfunction use case. VSI configuration needs to be done after port representor is created. Port representor needs only allocated VSI. It doesn't need to be configured before. VSI needs to be reconfigured when update function is called. The code for this patchset was split from (too big) patchset [1]. [1] https://lore.kernel.org/netdev/20240213072724.77275-1-michal.swiatkowski@linux.intel.com/ --- Originally from https://lore.kernel.org/netdev/20240605-next-2024-06-03-intel-next-batch-v2-0-39c23963fa78@intel.com/ Changes: - delete ice_repr_get_by_vsi() from header - rephrase commit message in moving devlink locking ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-22net: xilinx: axienet: Enable multicast by defaultSean Anderson
We support multicast addresses, so enable it by default. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-22ibmvnic: Free any outstanding tx skbs during scrq resetNick Child
There are 2 types of outstanding tx skb's: Type 1: Packets that are sitting in the drivers ind_buff that are waiting to be batch sent to the NIC. During a device reset, these are freed with a call to ibmvnic_tx_scrq_clean_buffer() Type 2: Packets that have been sent to the NIC and are awaiting a TX completion IRQ. These are free'd during a reset with a call to clean_tx_pools() During any reset which requires us to free the tx irq, ensure that the Type 2 skb references are freed. Since the irq is released, it is impossible for the NIC to inform of any completions. Furthermore, later in the reset process is a call to init_tx_pools() which marks every entry in the tx pool as free (ie not outstanding). So if the driver is to make a call to init_tx_pools(), it must first be sure that the tx pool is empty of skb references. This issue was discovered by observing the following in the logs during EEH testing: TX free map points to untracked skb (tso_pool 0 idx=4) TX free map points to untracked skb (tso_pool 0 idx=5) TX free map points to untracked skb (tso_pool 1 idx=36) Fixes: 65d6470d139a ("ibmvnic: clean pending indirect buffs during reset") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21ice: update representor when VSI is readyMichal Swiatkowski
In case of reset of VF VSI can be reallocated. To handle this case it should be properly updated. Reload representor as vsi->vsi_num can be different than the one stored when representor was created. Instead of only changing antispoof do whole VSI configuration for eswitch. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-21ice: move VSI configuration outside repr setupMichal Swiatkowski
It is needed because subfunction port representor shouldn't configure the source VSI during representor creation. Move the code to separate function and call it only in case the VF port representor is being created. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-21ice: move devlink locking outside the port creationMichal Swiatkowski
In case of subfunction lock will be taken for whole port creation and removing. Do the same in VF case. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-21ice: store representor ID in bridge portMichal Swiatkowski
It is used to get representor structure during cleaning. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-21mlxsw: spectrum_buffers: Fix memory corruptions on Spectrum-4 systemsIdo Schimmel
The following two shared buffer operations make use of the Shared Buffer Status Register (SBSR): # devlink sb occupancy snapshot pci/0000:01:00.0 # devlink sb occupancy clearmax pci/0000:01:00.0 The register has two masks of 256 bits to denote on which ingress / egress ports the register should operate on. Spectrum-4 has more than 256 ports, so the register was extended by cited commit with a new 'port_page' field. However, when filling the register's payload, the driver specifies the ports as absolute numbers and not relative to the first port of the port page, resulting in memory corruptions [1]. Fix by specifying the ports relative to the first port of the port page. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_sb_occ_snapshot+0xb6d/0xbc0 Read of size 1 at addr ffff8881068cb00f by task devlink/1566 [...] Call Trace: <TASK> dump_stack_lvl+0xc6/0x120 print_report+0xce/0x670 kasan_report+0xd7/0x110 mlxsw_sp_sb_occ_snapshot+0xb6d/0xbc0 mlxsw_devlink_sb_occ_snapshot+0x75/0xb0 devlink_nl_sb_occ_snapshot_doit+0x1f9/0x2a0 genl_family_rcv_msg_doit+0x20c/0x300 genl_rcv_msg+0x567/0x800 netlink_rcv_skb+0x170/0x450 genl_rcv+0x2d/0x40 netlink_unicast+0x547/0x830 netlink_sendmsg+0x8d4/0xdb0 __sys_sendto+0x49b/0x510 __x64_sys_sendto+0xe5/0x1c0 do_syscall_64+0xc1/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f [...] Allocated by task 1: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 copy_verifier_state+0xbc2/0xfb0 do_check_common+0x2c51/0xc7e0 bpf_check+0x5107/0x9960 bpf_prog_load+0xf0e/0x2690 __sys_bpf+0x1a61/0x49d0 __x64_sys_bpf+0x7d/0xc0 do_syscall_64+0xc1/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 1: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 poison_slab_object+0x109/0x170 __kasan_slab_free+0x14/0x30 kfree+0xca/0x2b0 free_verifier_state+0xce/0x270 do_check_common+0x4828/0xc7e0 bpf_check+0x5107/0x9960 bpf_prog_load+0xf0e/0x2690 __sys_bpf+0x1a61/0x49d0 __x64_sys_bpf+0x7d/0xc0 do_syscall_64+0xc1/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: f8538aec88b4 ("mlxsw: Add support for more than 256 ports in SBSR register") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21mlxsw: pci: Fix driver initialization with Spectrum-4Ido Schimmel
Cited commit added support for a new reset flow ("all reset") which is deeper than the existing reset flow ("software reset") and allows the device's PCI firmware to be upgraded. In the new flow the driver first tells the firmware that "all reset" is required by issuing a new reset command (i.e., MRSR.command=6) and then triggers the reset by having the PCI core issue a secondary bus reset (SBR). However, due to a race condition in the device's firmware the device is not always able to recover from this reset, resulting in initialization failures [1]. New firmware versions include a fix for the bug and advertise it using a new capability bit in the Management Capabilities Mask (MCAM) register. Avoid initialization failures by reading the new capability bit and triggering the new reset flow only if the bit is set. If the bit is not set, trigger a normal PCI hot reset by skipping the call to the Management Reset and Shutdown Register (MRSR). Normal PCI hot reset is weaker than "all reset", but it results in a fully operational driver and allows users to flash a new firmware, if they want to. [1] mlxsw_spectrum4 0000:01:00.0: not ready 1023ms after bus reset; waiting mlxsw_spectrum4 0000:01:00.0: not ready 2047ms after bus reset; waiting mlxsw_spectrum4 0000:01:00.0: not ready 4095ms after bus reset; waiting mlxsw_spectrum4 0000:01:00.0: not ready 8191ms after bus reset; waiting mlxsw_spectrum4 0000:01:00.0: not ready 16383ms after bus reset; waiting mlxsw_spectrum4 0000:01:00.0: not ready 32767ms after bus reset; waiting mlxsw_spectrum4 0000:01:00.0: not ready 65535ms after bus reset; giving up mlxsw_spectrum4 0000:01:00.0: PCI function reset failed with -25 mlxsw_spectrum4 0000:01:00.0: cannot register bus device mlxsw_spectrum4: probe of 0000:01:00.0 failed with error -25 Fixes: f257c73e5356 ("mlxsw: pci: Add support for new reset flow") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Maksym Yaremchuk <maksymy@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21net: ethernet: rtsn: Add support for Renesas Ethernet-TSNNiklas Söderlund
Add initial support for Renesas Ethernet-TSN End-station device of R-Car V4H. The Ethernet End-station can connect to an Ethernet network using a 10 Mbps, 100 Mbps, or 1 Gbps full-duplex link via MII/GMII/RMII/RGMII. Depending on the connected PHY. The driver supports Rx checksum and offload and hardware timestamps. While full power management and suspend/resume is not yet supported the driver enables runtime PM in order to enable the module clock. While explicit clock management using clk_enable() would suffice for the supported SoC, the module could be reused on SoCs where the module is part of a power domain. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>