summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)Author
2023-06-09net/mlx5: Update SRIOV enable/disable to handle EC/VFsDaniel Jurgens
Previously on the embedded CPU platform SRIOV was never enabled/disabled via mlx5_core_sriov_configure. Host VF updates are provided by an event handler. Now in the disable flow it must be known if this is a disable due to driver unload or SRIOV detach, or if the user updated the number of VFs. If due to change in the number of VFs only wait for the pages of ECVFs. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09net/mlx5: Query correct caps for min msix vectorsDaniel Jurgens
The VFs on the host and the embedded CPU platform share function numbers. Set the ec_vf_function field to query the caps for the correct function. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09net/mlx5: Use correct vport when restoring GUIDsDaniel Jurgens
Prior to enabling EC VF functionality the vport number and function ID were always the same. That's not the case now. Use the correct vport number to modify the HCA vport context. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09net/mlx5: Add new page type for EC VF pagesDaniel Jurgens
When the embedded cpu supports SRIOV it can be enabled and disabled independently from the host SRIOV. Track the pages separately so we can properly wait for returned VF pages. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09net/mlx5: Add/remove peer miss rules for EC VFsDaniel Jurgens
Add and remove the peer miss rules for EC VFs. It's possible that there are different amounts of total VFs per function so only create rules for the minimum number of max VFs. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09net/mlx5: Add management of EC VF vportsDaniel Jurgens
Add init, load, unload, and cleanup of the EC VF vports. This includes changes in how eswitch SRIOV is managed. Previous on an embedded CPU platform the number of VFs provided when enabling the eswitch was always 0, host VFs vports are handled in the eswitch functions change event handler. Now track the number of EC VFs as well, so they can be handled properly in the enable/disable flows. There are only 3 marks available for use in xarrays, all 3 were already in use for this use case. EC VF vports are in a known range so we can access them by index instead of marks. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09net/mlx5: Update vport caps query/set for EC VFsDaniel Jurgens
These functions are for query/set by vport, there was an underlying assumption that vport was equal to function ID. That's not the case for EC VF functions. Set the ec_vf_function bit accordingly. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09net/mlx5: Enable devlink port for embedded cpu VF vportsDaniel Jurgens
Enable creation of a devlink port for EC VF vports. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09net/mlx5: Simplify unload all rep codeDaniel Jurgens
Instead of using type specific iterators which are only used in one place just traverse the xarray. It will provide suitable ordering based on the vport numbers. This will also eliminate the need for changes here when new types are added. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-08Merge tag 'mlx5-updates-2023-06-06' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-06-06 1) Support 4 ports VF LAG, part 2/2 2) Few extra trivial cleanup patches Shay Drory Says: ================ Support 4 ports VF LAG, part 2/2 This series continues the series[1] "Support 4 ports VF LAG, part1/2". This series adds support for 4 ports VF LAG (single FDB E-Switch). This series of patches refactoring LAG code that make assumptions about VF LAG supporting only two ports and then enable 4 ports VF LAG. Patch 1: - Fix for ib rep code Patches 2-5: - Refactors LAG layer. Patches 6-7: - Block LAG types which doesn't support 4 ports. Patch 8: - Enable 4 ports VF LAG. This series specifically allows HCAs with 4 ports to create a VF LAG with only 4 ports. It is not possible to create a VF LAG with 2 or 3 ports using HCAs that have 4 ports. Currently, the Merged E-Switch feature only supports HCAs with 2 ports. However, upcoming patches will introduce support for HCAs with 4 ports. In order to activate VF LAG a user can execute: devlink dev eswitch set pci/0000:08:00.0 mode switchdev devlink dev eswitch set pci/0000:08:00.1 mode switchdev devlink dev eswitch set pci/0000:08:00.2 mode switchdev devlink dev eswitch set pci/0000:08:00.3 mode switchdev ip link add name bond0 type bond ip link set dev bond0 type bond mode 802.3ad ip link set dev eth2 master bond0 ip link set dev eth3 master bond0 ip link set dev eth4 master bond0 ip link set dev eth5 master bond0 Where eth2, eth3, eth4 and eth5 are net-interfaces of pci/0000:08:00.0 pci/0000:08:00.1 pci/0000:08:00.2 pci/0000:08:00.3 respectively. User can verify LAG state and type via debugfs: /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type [1] https://lore.kernel.org/netdev/20230601060118.154015-1-saeed@kernel.org/T/#mf1d2083780970ba277bfe721554d4925f03f36d1 ================ * tag 'mlx5-updates-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: simplify condition after napi budget handling change mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager net/mlx5: Skip inline mode check after mlx5_eswitch_enable_locked() failure net/mlx5e: TC, refactor access to hash key net/mlx5e: Remove RX page cache leftovers net/mlx5e: Expose catastrophic steering error counters net/mlx5: Enable 4 ports VF LAG net/mlx5: LAG, block multiport eswitch LAG in case ldev have more than 2 ports net/mlx5: LAG, block multipath LAG in case ldev have more than 2 ports net/mlx5: LAG, change mlx5_shared_fdb_supported() to static net/mlx5: LAG, generalize handling of shared FDB net/mlx5: LAG, check if all eswitches are paired for shared FDB {net/RDMA}/mlx5: introduce lag_for_each_peer RDMA/mlx5: Free second uplink ib port ==================== Link: https://lore.kernel.org/r/20230607210410.88209-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08mlxsw: spectrum_nve_vxlan: Fix unsupported flag regressionIdo Schimmel
The recently added 'VXLAN_F_LOCALBYPASS' flag is set by default on VXLAN devices and denotes a behavior that is irrelevant for the hardware data path. Add it to the lists of IPv4 and IPv6 supported flags to avoid rejecting offload of VXLAN devices which have this flag set. Fixes: 69474a8a5837 ("net: vxlan: Add nolocalbypass option to vxlan.") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/5533e63643bf719bbe286fef60f749c9cad35005.1686139716.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07net/mlx5e: simplify condition after napi budget handling changeJakub Kicinski
Since recent commit budget can't be 0 here. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch managerBodong Wang
Eswitch vport is needed for eswitch manager when creating LAG, to create egress rules. However, this was not handled when ECPF is an eswitch manager. Signed-off-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07net/mlx5: Skip inline mode check after mlx5_eswitch_enable_locked() failureJiri Pirko
Commit bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode") added inline mode checking to esw_offloads_start() with a warning printed out in case there is a problem. Tne inline mode checking was done even after mlx5_eswitch_enable_locked() call failed, which is pointless. Later on, commit 8c98ee77d911 ("net/mlx5e: E-Switch, Add extack messages to devlink callbacks") converted the error/warning prints to extack setting, which caused that the inline mode check error to overwrite possible previous extack message when mlx5_eswitch_enable_locked() failed. User then gets confusing error message. Fix this by skipping check of inline mode after mlx5_eswitch_enable_locked() call failed. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07net/mlx5e: TC, refactor access to hash keyOz Shlomo
Currently, a temp object is filled and used as a key for rhashtable_lookup. Lookups will only works while key remains the first attribute in the relevant rhashtable node object. Fix this by passing a key, instead of a object containing the key. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07net/mlx5e: Remove RX page cache leftoversTariq Toukan
Remove unused definitions left after the removal of the RX page cache feature. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07net/mlx5e: Expose catastrophic steering error countersLama Kayal
Add generated_pkt_steering_fail and handled_pkt_steering_fail to devlink heatlth reporter. generated_pkt_steering_fail indicates the number of packets dropped due to illegal steering operation within the vport steering domain. handled_pkt_steering_fail indicates the number of packets dropped due to illegal steering operation, originated by the vport. Also, update devlink reporter functionality documentation with the newly exposed counters. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07net/mlx5: Enable 4 ports VF LAGShay Drory
Now, after all preparation are done, enable 4 ports VF LAG Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07net/mlx5: LAG, block multiport eswitch LAG in case ldev have more than 2 portsShay Drory
multiport eswitch LAG is not supported over more than two ports. Add a check in order to block multiport eswitch LAG over such devices. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07net/mlx5: LAG, block multipath LAG in case ldev have more than 2 portsShay Drory
multipath LAG is not supported over more than two ports. Add a check in order to block multipath LAG over such configurations. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07net/mlx5: LAG, change mlx5_shared_fdb_supported() to staticShay Drory
mlx5_shared_fdb_supported() is used only in a single file. Change the function to be static. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07net/mlx5: LAG, generalize handling of shared FDBShay Drory
Shared FDB handling is using the assumption that shared FDB can only be created from two devices. In order to support shared FDB of more than two devices, iterate over all LAG ports instead of hard coding only the first two LAG ports whenever handling shared FDB. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07net/mlx5: LAG, check if all eswitches are paired for shared FDBShay Drory
Shared FDB LAG can only work if all eswitches are paired. Also, whenever two eswitches are paired, devcom is marked as ready. Therefore, in case of device with two eswitches, checking devcom was sufficient. However, this is not correct for device with more than two eswitches, which will be introduced in downstream patch. Hence, check all eswitches are paired explicitly. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07{net/RDMA}/mlx5: introduce lag_for_each_peerShay Drory
Introduce a generic APIs to iterate over all the devices which are part of the LAG. This API replace mlx5_lag_get_peer_mdev() which retrieve only a single peer device from the lag. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-05Merge tag 'mlx5-updates-2023-05-31' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-05-31 net/mlx5: Support 4 ports VF LAG, part 1/2 This series continues the series[1] "Support 4 ports HCAs LAG mode" by Mark Bloch. This series adds support for 4 ports VF LAG (single FDB E-Switch). This series of patches focuses on refactoring different sections of the code that make assumptions about VF LAG supporting only two ports. For instance, it assumes that each device can only have one peer. Patches 1-5: - Refactor ETH handling of TC rules of eswitches with peers. Patch 6: - Refactors peer miss group table. Patches 7-9: - Refactor single FDB E-Switch creation. Patch 10: - Refactor the DR layer. Patches 11-14: - Refactors devcom layer. Next series will refactor LAG layer and enable 4 ports VF LAG. This series specifically allows HCAs with 4 ports to create a VF LAG with only 4 ports. It is not possible to create a VF LAG with 2 or 3 ports using HCAs that have 4 ports. Currently, the Merged E-Switch feature only supports HCAs with 2 ports. However, upcoming patches will introduce support for HCAs with 4 ports. In order to activate VF LAG a user can execute: devlink dev eswitch set pci/0000:08:00.0 mode switchdev devlink dev eswitch set pci/0000:08:00.1 mode switchdev devlink dev eswitch set pci/0000:08:00.2 mode switchdev devlink dev eswitch set pci/0000:08:00.3 mode switchdev ip link add name bond0 type bond ip link set dev bond0 type bond mode 802.3ad ip link set dev eth2 master bond0 ip link set dev eth3 master bond0 ip link set dev eth4 master bond0 ip link set dev eth5 master bond0 Where eth2, eth3, eth4 and eth5 are net-interfaces of pci/0000:08:00.0 pci/0000:08:00.1 pci/0000:08:00.2 pci/0000:08:00.3 respectively. User can verify LAG state and type via debugfs: /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type [1] https://lore.kernel.org/netdev/20220510055743.118828-1-saeedm@nvidia.com/ * tag 'mlx5-updates-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Devcom, extend mlx5_devcom_send_event to work with more than two devices net/mlx5: Devcom, introduce devcom_for_each_peer_entry net/mlx5: E-switch, mark devcom as not ready when all eswitches are unpaired net/mlx5: Devcom, Rename paired to ready net/mlx5: DR, handle more than one peer domain net/mlx5: E-switch, generalize shared FDB creation net/mlx5: E-switch, Handle multiple master egress rules net/mlx5: E-switch, refactor FDB miss rule add/remove net/mlx5: E-switch, enlarge peer miss group table net/mlx5e: Handle offloads flows per peer net/mlx5e: en_tc, re-factor query route port net/mlx5e: rep, store send to vport rules per peer net/mlx5e: tc, Refactor peer add/del flow net/mlx5e: en_tc, Extend peer flows to a list ==================== Link: https://lore.kernel.org/r/20230602191301.47004-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05mlxsw: spectrum_router: Do not query MAX_VRS on each iterationPetr Machata
MLXSW_CORE_RES_GET involves a call to spectrum_core, a separate module. Instead of making the call on every iteration, cache it up front, and use the value. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mlxsw: spectrum_router: Do not query MAX_RIFS on each iterationPetr Machata
MLXSW_CORE_RES_GET involves a call to spectrum_core, a separate module. Instead of making the call on every iteration, cache it up front, and use the value. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mlxsw: spectrum_router: Use extack in mlxsw_sp~_rif_ipip_lb_configure()Petr Machata
In commit 26029225d992 ("mlxsw: spectrum_router: Propagate extack further"), the mlxsw_sp_rif_ops.configure callback got a new argument, extack. However the callbacks that deal with tunnel configuration, mlxsw_sp1_rif_ipip_lb_configure() and mlxsw_sp2_rif_ipip_lb_configure(), were never updated to pass the parameter further. Do that now. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05mlxsw: spectrum_router: Clarify a commentPetr Machata
"Reserved for X" usually means that only X is supposed to use a given object. Here, it is used in the sense that X should consider the object "reserved", as in "restricted". Replace the comment simply by "X", with the implication that that's where the field is used. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-02net/mlx5: Devcom, extend mlx5_devcom_send_event to work with more than two ↵Shay Drory
devices mlx5_devcom_send_event is used to send event from one eswitch to the other. In other words, only one event is sent, which means, no error mechanism is needed. However, In case devcom have more than two eswitches, a proper error mechanism is needed. Hence, in case of error, devcom will perform the error unwind, since devcom knows how many events were successful. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: Devcom, introduce devcom_for_each_peer_entrySaeed Mahameed
Introduce generic APIs which will retrieve all peers. This API replace mlx5_devcom_get/release_peer_data which retrieve only a single peer. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: E-switch, mark devcom as not ready when all eswitches are unpairedShay Drory
Whenever an eswitch is unpaired with another, the driver mark devcom as not ready. While this is correct in case we are pairing only two eswitches, in order to support pairing of more than two eswitches, driver need to mark devcom as not ready only when all eswitches are unpaired. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: Devcom, Rename paired to readyShay Drory
In downstream patch devcom will provide support for more than two devices. The term 'paired' will be renamed as 'ready' to convey a more accurate meaning. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: DR, handle more than one peer domainShay Drory
Currently, DR domain is using the assumption that each domain can only have a single peer. In order to support VF LAG of more then two ports, expand peer domain to use an array of peers, and align the code accordingly. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: E-switch, generalize shared FDB creationShay Drory
Shared FDB creation is hard coded for only two eswitches. Generalize shared FDB creation so that any number of eswitches could create shared FDB. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: E-switch, Handle multiple master egress rulesShay Drory
Currently, whenever a shared FDB is created, the slave eswitch is creating master egress rule to the master eswitch. In order to support more than two ports, which means there will be more than one slave eswitch, enlarge bounce_rule, which is used to create master egress rule, to an xarray. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: E-switch, refactor FDB miss rule add/removeShay Drory
Currently, E-switch FDB have a single peer miss rule. In order to support more than one peer, refactor E-switch FDB to have peer miss rule per peer, and change the code to add/remove a rule from specific peer. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5: E-switch, enlarge peer miss group tableShay Drory
There is an implicit assumption that peer miss group table require to handle only a single peer. Also, there is an assumption that total_vports of the master is greater or equal to the total_vports of each peer. Change the code to support peer miss group for more than one peer. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5e: Handle offloads flows per peerShay Drory
Currently, E-switch offloads table have a list of all flows that create a peer_flow over the peer eswitch. In order to support more than one peer, extend E-switch offloads table peer_flow to hold an array of lists, where each peer have dedicate index via mlx5_get_dev_index(). Thereafter, extend original flow to hold an array of peers as well. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5e: en_tc, re-factor query route portMark Bloch
query for peer esw outside of if scope. This is preparation for query route port over multiple peers. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5e: rep, store send to vport rules per peerMark Bloch
Each representor, for each send queue, is holding a send_to_vport rule for the peer eswitch. In order to support more than one peer, and to map between the peer rules and peer eswitches, refactor representor to hold both the peer rules and pointer to the peer eswitches. This enables mlx5 to store send_to_vport rules per peer, where each peer have dedicate index via mlx5_get_dev_index(). Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5e: tc, Refactor peer add/del flowShay Drory
Move peer_eswitch outside mlx5e_tc_add_fdb_peer_flow() so downstream patch can call mlx5e_tc_add_fdb_peer_flow() with multiple peers. Move peer_eswitch in the remove flow as well in order to keep symmetry. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-02net/mlx5e: en_tc, Extend peer flows to a listMark Bloch
Currently, mlx5e_flow is holding a pointer to a peer_flow, in case one was created. e.g. There is an assumption that mlx5e_flow can have only one peer. In order to support more than one peer, refactor mlx5e_flow to hold a list of peer flows. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-01devlink: bring port new reply backJiri Pirko
In the offending fixes commit I mistakenly removed the reply message of the port new command. I was under impression it is a new port notification, partly due to the "notify" in the name of the helper function. Bring the code sending reply with new port message back, this time putting it directly to devlink_nl_cmd_port_new_doit() Fixes: c496daeb8630 ("devlink: remove duplicate port notification") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230531142025.2605001-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/sfc/tc.c 622ab656344a ("sfc: fix error unwinds in TC offload") b6583d5e9e94 ("sfc: support TC decap rules matching on enc_src_port") net/mptcp/protocol.c 5b825727d087 ("mptcp: add annotations around msk->subflow accesses") e76c8ef5cc5b ("mptcp: refactor mptcp_stream_accept()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-31net/mlx5: Read embedded cpu after init bit clearedMoshe Shemesh
During driver load it reads embedded_cpu bit from initialization segment, but the initialization segment is readable only after initialization bit is cleared. Move the call to mlx5_read_embedded_cpu() right after initialization bit cleared. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Fixes: 591905ba9679 ("net/mlx5: Introduce Mellanox SmartNIC and modify page management logic") Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-31net/mlx5e: Fix error handling in mlx5e_refresh_tirsSaeed Mahameed
Allocation failure is outside the critical lock section and should return immediately rather than jumping to the unlock section. Also unlock as soon as required and remove the now redundant jump label. Fixes: 80a2a9026b24 ("net/mlx5e: Add a lock on tir list") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-31net/mlx5: Ensure af_desc.mask is properly initializedChuck Lever
[ 9.837087] mlx5_core 0000:02:00.0: firmware version: 16.35.2000 [ 9.843126] mlx5_core 0000:02:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link) [ 10.311515] mlx5_core 0000:02:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps [ 10.321948] mlx5_core 0000:02:00.0: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048) [ 10.344324] mlx5_core 0000:02:00.0: mlx5_pcie_event:301:(pid 88): PCIe slot advertised sufficient power (27W). [ 10.354339] BUG: unable to handle page fault for address: ffffffff8ff0ade0 [ 10.361206] #PF: supervisor read access in kernel mode [ 10.366335] #PF: error_code(0x0000) - not-present page [ 10.371467] PGD 81ec39067 P4D 81ec39067 PUD 81ec3a063 PMD 114b07063 PTE 800ffff7e10f5062 [ 10.379544] Oops: 0000 [#1] PREEMPT SMP PTI [ 10.383721] CPU: 0 PID: 117 Comm: kworker/0:6 Not tainted 6.3.0-13028-g7222f123c983 #1 [ 10.391625] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0b 06/12/2017 [ 10.398750] Workqueue: events work_for_cpu_fn [ 10.403108] RIP: 0010:__bitmap_or+0x10/0x26 [ 10.407286] Code: 85 c0 0f 95 c0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 89 c9 31 c0 48 83 c1 3f 48 c1 e9 06 39 c> [ 10.426024] RSP: 0000:ffffb45a0078f7b0 EFLAGS: 00010097 [ 10.431240] RAX: 0000000000000000 RBX: ffffffff8ff0adc0 RCX: 0000000000000004 [ 10.438365] RDX: ffff9156801967d0 RSI: ffffffff8ff0ade0 RDI: ffff9156801967b0 [ 10.445489] RBP: ffffb45a0078f7e8 R08: 0000000000000030 R09: 0000000000000000 [ 10.452613] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000ec [ 10.459737] R13: ffffffff8ff0ade0 R14: 0000000000000001 R15: 0000000000000020 [ 10.466862] FS: 0000000000000000(0000) GS:ffff9165bfc00000(0000) knlGS:0000000000000000 [ 10.474936] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 10.480674] CR2: ffffffff8ff0ade0 CR3: 00000001011ae003 CR4: 00000000003706f0 [ 10.487800] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 10.494922] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 10.502046] Call Trace: [ 10.504493] <TASK> [ 10.506589] ? matrix_alloc_area.constprop.0+0x43/0x9a [ 10.511729] ? prepare_namespace+0x84/0x174 [ 10.515914] irq_matrix_reserve_managed+0x56/0x10c [ 10.520699] x86_vector_alloc_irqs+0x1d2/0x31e [ 10.525146] irq_domain_alloc_irqs_hierarchy+0x39/0x3f [ 10.530284] irq_domain_alloc_irqs_parent+0x1a/0x2a [ 10.535155] intel_irq_remapping_alloc+0x59/0x5e9 [ 10.539859] ? kmem_cache_debug_flags+0x11/0x26 [ 10.544383] ? __radix_tree_lookup+0x39/0xb9 [ 10.548649] irq_domain_alloc_irqs_hierarchy+0x39/0x3f [ 10.553779] irq_domain_alloc_irqs_parent+0x1a/0x2a [ 10.558650] msi_domain_alloc+0x8c/0x120 [ 10.567697] irq_domain_alloc_irqs_locked+0x11d/0x286 [ 10.572741] __irq_domain_alloc_irqs+0x72/0x93 [ 10.577179] __msi_domain_alloc_irqs+0x193/0x3f1 [ 10.581789] ? __xa_alloc+0xcf/0xe2 [ 10.585273] msi_domain_alloc_irq_at+0xa8/0xfe [ 10.589711] pci_msix_alloc_irq_at+0x47/0x5c The crash is due to matrix_alloc_area() attempting to access per-CPU memory for CPUs that are not present on the system. The CPU mask passed into reserve_managed_vector() via it's @irqd parameter is corrupted because it contains uninitialized stack data. Fixes: bbac70c74183 ("net/mlx5: Use newer affinity descriptor") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-31net/mlx5: Fix setting of irq->map.index for static IRQ caseNiklas Schnelle
When dynamic IRQ allocation is not supported all IRQs are allocated up front in mlx5_irq_table_create() instead of dynamically as part of mlx5_irq_alloc(). In the latter dynamic case irq->map.index is set via the mapping returned by pci_msix_alloc_irq_at(). In the static case and prior to commit 1da438c0ae02 ("net/mlx5: Fix indexing of mlx5_irq") irq->map.index was set in mlx5_irq_alloc() twice once initially to 0 and then to the requested index before storing in the xarray. After this commit it is only set to 0 which breaks all other IRQ mappings. Fix this by setting irq->map.index to the requested index together with irq->map.virq and improve the related comment to make it clearer which cases it deals with. Cc: Chuck Lever III <chuck.lever@oracle.com> Tested-by: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Fixes: 1da438c0ae02 ("net/mlx5: Fix indexing of mlx5_irq") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-31net/mlx5: Remove rmap also in case dynamic MSIX not supportedShay Drory
mlx5 add IRQs to rmap upon MSIX request, and mlx5 remove rmap from MSIX only if msi_map.index is populated. However, msi_map.index is populated only when dynamic MSIX is supported. This results in freeing IRQs without removing them from rmap, which triggers the bellow WARN_ON[1]. rmap is a feature which have no relation to dynamic MSIX. Hence, remove the check of msi_map.index when removing IRQ from rmap. [1] [ 200.307160 ] WARNING: CPU: 20 PID: 1702 at kernel/irq/manage.c:2034 free_irq+0x2ac/0x358 [ 200.316990 ] CPU: 20 PID: 1702 Comm: modprobe Not tainted 6.4.0-rc3_for_upstream_min_debug_2023_05_24_14_02 #1 [ 200.318939 ] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 200.321659 ] pc : free_irq+0x2ac/0x358 [ 200.322400 ] lr : free_irq+0x20/0x358 [ 200.337865 ] Call trace: [ 200.338360 ] free_irq+0x2ac/0x358 [ 200.339029 ] irq_release+0x58/0xd0 [mlx5_core] [ 200.340093 ] mlx5_irqs_release_vectors+0x80/0xb0 [mlx5_core] [ 200.341344 ] destroy_comp_eqs+0x120/0x170 [mlx5_core] [ 200.342469 ] mlx5_eq_table_destroy+0x1c/0x38 [mlx5_core] [ 200.343645 ] mlx5_unload+0x8c/0xc8 [mlx5_core] [ 200.344652 ] mlx5_uninit_one+0x78/0x118 [mlx5_core] [ 200.345745 ] remove_one+0x80/0x108 [mlx5_core] [ 200.346752 ] pci_device_remove+0x40/0xd8 [ 200.347554 ] device_remove+0x50/0x88 [ 200.348272 ] device_release_driver_internal+0x1c4/0x228 [ 200.349312 ] driver_detach+0x54/0xa0 [ 200.350030 ] bus_remove_driver+0x74/0x100 [ 200.350833 ] driver_unregister+0x34/0x68 [ 200.351619 ] pci_unregister_driver+0x28/0xa0 [ 200.352476 ] mlx5_cleanup+0x14/0x2210 [mlx5_core] [ 200.353536 ] __arm64_sys_delete_module+0x190/0x2e8 [ 200.354495 ] el0_svc_common.constprop.0+0x6c/0x1d0 [ 200.355455 ] do_el0_svc+0x38/0x98 [ 200.356122 ] el0_svc+0x1c/0x80 [ 200.356739 ] el0t_64_sync_handler+0xb4/0x130 [ 200.357604 ] el0t_64_sync+0x174/0x178 [ 200.358345 ] ---[ end trace 0000000000000000 ]--- Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>