|
The cited commit removes the special handling of the CT action, but it
removes too much: the pre ct/ct_nat tables and some other resources are
no longer destroyed.
Fix it by adding the removed cleanup back.
Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited commits add hardware miss support to tc actions. But if the
rules can't be offloaded, the relevant pointers are NULL and the system
panics when they are accessed.
Fix it by checking for NULL pointers before dereferencing them.
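A minimal sketch of the guard being added; 'flow_attr' and 'ct_attr' are
hypothetical names standing in for the real structures:

  /* Rules that were never offloaded leave these pointers unset,
   * so bail out before dereferencing them.
   */
  if (!flow_attr || !flow_attr->ct_attr)
          return;
  /* flow_attr->ct_attr is safe to use from here on */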
Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action")
Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When a PCI device has just one msix vector available, we want to share
this vector between async and completion events. Current code fails to
do that, assuming it will always have at least one dedicated vector for
completion events. Fix this by detecting when the pool contains just a
single vector.
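Conceptually the fix reduces to the following; 'pool', 'async_irq' and
'request_comp_irq()' are made-up names used only to illustrate the idea,
not mlx5 symbols:

  /* With a single vector in the pool there is nothing left to dedicate
   * to completions, so reuse the async vector for both event types.
   */
  if (pool->available == 1)
          comp_irq = async_irq;
  else
          comp_irq = request_comp_irq(pool);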
Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited commit missed setting napi_id on XSK RQs; it only covered
regular RQs. Add the missing part to support socket busy polling on XSK
RQs.
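For context, the core registration helper already takes the napi_id as
its last argument; a sketch of the kind of call that was missing on the
XSK path (all variable names here are placeholders):

  #include <net/xdp.h>

  /* Registering the XSK RQ's xdp_rxq_info with the channel's napi_id
   * lets the socket busy-polling machinery find this queue.
   */
  err = xdp_rxq_info_reg(&xsk_rq_xdp_rxq, netdev, queue_index,
                         napi->napi_id);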
Fixes: a2740f529da2 ("net/mlx5e: xsk: Set napi_id to support busy polling")
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited commits missed passing frag_size to __xdp_rxq_info_reg, which
is required by bpf_xdp_adjust_tail to support growing the tail pointer
in fragmented packets. Pass the missing parameter when the current RQ
mode allows XDP multi buffer.
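A sketch of the extended registration call with the fragment size passed
through (placeholder variables; a non-zero frag_size is only meaningful
when the RQ runs in XDP multi buffer mode):

  #include <net/xdp.h>

  /* frag_size tells bpf_xdp_adjust_tail() how much room each fragment
   * has, so the tail can be grown on fragmented packets.
   */
  err = __xdp_rxq_info_reg(&rq_xdp_rxq, netdev, queue_index,
                           napi->napi_id, frag_size);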
Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
Fixes: 9cb9482ef10e ("net/mlx5e: Use fragments of the same size in non-linear legacy RQ with XDP")
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
"ecpu" field in struct mlx5_sf_table is not used anywhere. Remove it.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Separate the event API defined in the generic mlx5.h header into
a dedicated header, and remove the TODO comment left by commit
69c1280b1f3b ("net/mlx5: Device events, Use async events chain").
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
This change is needed to use EC VFs with metadata based steering.
The code assumed that the vport number was equal to the function ID,
which is not the case for EC VF functions. Use the function ID instead
and set the ec_vf_function bit accordingly.
Fixes: 9ac0b128248e ("net/mlx5: Update vport caps query/set for EC VFs")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The last value is not set correctly. This results in representors not
being created for all EC VFs when the base value is higher than 0.
Fixes: a7719b29a821 ("net/mlx5: Add management of EC VF vports")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add counters for the number of unicast, multicast and broadcast
packets/octets that were looped back.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The msglvl support was implemented using the mlx5e_dbg() macro, which is
rarely used in the driver and is not very useful when you can just use
dynamic debug instead.
Remove mlx5e_dbg() and convert its usages to netdev_dbg().
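The conversion is mechanical; an illustrative before/after (the message
text is made up):

  /* before: driver-private macro gated on the ethtool msglvl bits */
  mlx5e_dbg(HW, priv, "queue %d created\n", ix);

  /* after: standard dynamic debug, controllable at runtime */
  netdev_dbg(priv->netdev, "queue %d created\n", ix);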
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
These else statement blocks are redundant since the if block already
jumps to the function abort label.
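A generic illustration of the pattern being cleaned up (not the actual
code):

  /* before: the else block is redundant, the if branch jumps away */
  if (err) {
          goto abort;
  } else {
          do_next_step();
  }

  /* after */
  if (err)
          goto abort;
  do_next_step();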
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
|
|
For debugging purposes expose the offloaded FDB state (flags, counters,
etc.) via debugfs inside the 'esw' root directory. Example debugfs file
output:
$ cat mlx5/0000\:08\:00.0/esw/bridge/bridge1/fdb
DEV         MAC                VLAN  PACKETS  BYTES  LASTUSE     FLAGS
enp8s0f0_1  e4:0a:05:08:00:06  2     2        204    4295567112  0x0
enp8s0f0_0  e4:0a:05:08:00:03  2     3        278    4295567112  0x0
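A rough sketch of how such a read-only debugfs file is typically wired
up in the kernel; the names (bridge_fdb_show, bridge_dir, bridge) are
illustrative, not the driver's actual symbols:

  #include <linux/debugfs.h>
  #include <linux/seq_file.h>

  static int bridge_fdb_show(struct seq_file *seq, void *data)
  {
          /* print a header, then one line per offloaded FDB entry */
          seq_printf(seq, "%-16s %-17s %4s\n", "DEV", "MAC", "VLAN");
          return 0;
  }
  DEFINE_SHOW_ATTRIBUTE(bridge_fdb);

  /* at bridge init time, under the per-bridge debugfs directory */
  debugfs_create_file("fdb", 0444, bridge_dir, bridge, &bridge_fdb_fops);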
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
A following patch requires access to additional data in the bridge net_device.
Pass the whole structure down the stack instead of adding necessary fields
as function arguments one-by-one.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
A following patch in the series uses the new directory for the bridge FDB debugfs file.
The new directory is intended for all future eswitch-specific debugfs
files.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add a new event handler to the firmware sync reset flow, used to support
sync reset on smart NICs. Adding this new stage to the flow enables the
firmware to ensure that host PFs unload before ECPFs unload, to avoid a
race during PF recovery.
If the firmware sends the sync_reset_unload event, the driver should
unload and close all HW resources of the function. Once the driver
finishes the unload part it can no longer receive events from the
firmware, as the event queues are closed, so it polls the reset state
field (sketched below) to know when to continue to the next stage of the
sync reset flow.
Also add a capability bit for supporting the sync_reset_unload event.
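In rough terms, the post-unload polling looks like this (entirely
illustrative; the helper and state names are not the driver's real
symbols):

  /* Event queues are gone after unload, so poll the reset state
   * instead of waiting for a firmware event.
   */
  unsigned long end = jiffies + msecs_to_jiffies(poll_timeout_ms);

  do {
          if (read_reset_state(dev) != RESET_STATE_UNLOAD_IN_PROGRESS)
                  break;          /* firmware moved to the next stage */
          msleep(100);
  } while (time_before(jiffies, end));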
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The Default Timeout Register (DTOR) provides the driver with timeout
values for flows that are device dependent. A zero value in a DTOR entry
is not valid and should not be used. When reading a zero value from the
DTOR, the driver should use the hard-coded SW default value instead.
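The fallback rule amounts to a one-liner of this shape (names are
placeholders):

  /* A DTOR entry of zero is invalid; use the hard-coded SW default. */
  to_ms = dtor_entry_ms ? dtor_entry_ms : SW_DEFAULT_TO_MS;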
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Expose a new timeout in the Default Timeouts Register to be used in the
sync reset flow running on smart NICs. In this flow the driver needs to
know how long to wait, from getting the unload request, until the
firmware asks the PF to continue to the next stage of the flow.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Verify at the reset_request stage that the PF is capable of performing
reset_now. If the PF is not capable, notify the firmware that the sync
reset cannot happen, so that the firmware aborts the sync reset at an
early stage and does not send the reset_now event to any PF.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
include/linux/mlx5/driver.h
617f5db1a626 ("RDMA/mlx5: Fix affinity assignment")
dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports")
https://lore.kernel.org/all/20230613125939.595e50b8@canb.auug.org.au/
tools/testing/selftests/net/mptcp/mptcp_join.sh
47867f0a7e83 ("selftests: mptcp: join: skip check if MIB counter not supported")
425ba803124b ("selftests: mptcp: join: support RM_ADDR for used endpoints or not")
45b1a1227a7a ("mptcp: introduces more address related mibs")
0639fa230a21 ("selftests: mptcp: add explicit check for new mibs")
https://lore.kernel.org/netdev/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull rdma fixes from Jason Gunthorpe:
"This is an unusually large bunch of bug fixes for the later rc cycle,
rxe and mlx5 both dumped a lot of things at once. rxe continues to fix
itself, and mlx5 is fixing a bunch of "queue counters" related bugs.
There is one highly notable bug fix regarding the qkey. This small
security check was missed in the original 2005 implementation and it
allows some significant issues.
Summary:
- Two rtrs bug fixes for error unwind bugs
- Several rxe bug fixes:
* Incorrect Rx packet validation
* Using memory without a refcount
* Syzkaller found use before initialization
* Regression fix for missing locking with the tasklet conversion
from this merge window
- Have bnxt report the correct link properties to userspace, this was
a regression in v6.3
- Several mlx5 bug fixes:
* Kernel crash triggerable by userspace for the RAW ethernet
profile
* Defend against steering refcounting issues created by userspace
* Incorrect change of QP port affinity parameters in some LAG
configurations
- Fix mlx5 Q counters:
* Do not over allocate Q counters to allow userspace to use the
full port capacity
* Kernel crash triggered by eswitch due to mis-use of Q counters
* Incorrect mlx5_device for Q counters in some LAG configurations
- Properly implement the IBA spec restricting privileged qkeys to
root
- Always return an error when reading from a disassociated device's
event queue
- isert bug fixes:
* Avoid a deadlock with the CM handler and CM ID destruction
* Correct list corruption due to incorrect locking
* Fix a use after free around connection tear down"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/rxe: Fix rxe_cq_post
IB/isert: Fix incorrect release of isert connection
IB/isert: Fix possible list corruption in CMA handler
IB/isert: Fix dead lock in ib_isert
RDMA/mlx5: Fix affinity assignment
IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
RDMA/uverbs: Restrict usage of privileged QKEYs
RDMA/cma: Always set static rate to 0 for RoCE
RDMA/mlx5: Fix Q-counters query in LAG mode
RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters
RDMA/mlx5: Fix Q-counters per vport allocation
RDMA/mlx5: Create an indirect flow table for steering anchor
RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions
RDMA/rxe: Fix the use-before-initialization error of resp_pkts
RDMA/bnxt_re: Fix reporting active_{speed,width} attributes
RDMA/rxe: Fix ref count error in check_rkey()
RDMA/rxe: Fix packet length checks
RDMA/rtrs: Fix rxe_dealloc_pd warning
RDMA/rtrs: Fix the last iu->buf leak in err path
|
|
All callers of tls_is_sk_tx_device_offloaded() currently do
an equivalent of:
if (skb->sk && tls_is_sk_tx_device_offloaded(skb->sk))
Have the helper accept the skb and do the skb->sk check locally.
Two drivers have local static inlines with similar wrappers
already.
While at it change the ifdef condition to TLS_DEVICE.
Only TLS_DEVICE selects SOCK_VALIDATE_XMIT, so the two are
equivalent. This makes removing the duplicated IS_ENABLED()
check in funeth more obviously correct.
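So a caller goes from the open-coded check to the new helper, roughly
(handle_tls_offload() is a stand-in for the driver-specific action):

  /* before */
  if (skb->sk && tls_is_sk_tx_device_offloaded(skb->sk))
          handle_tls_offload(skb);

  /* after: the helper takes the skb and does the skb->sk check itself */
  if (tls_is_skb_tx_device_offloaded(skb))
          handle_tls_offload(skb);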
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Dimitris Michailidis <dmichail@fungible.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use dev->link_active_reporting to determine whether Data Link Layer Link
Active Reporting is available rather than re-retrieving the capability.
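The change boils down to replacing a config-space re-read with the
cached flag, roughly (wait_for_link_active() is a placeholder for
whatever the caller does next):

  /* before: re-read Link Capabilities to check for DLL Link Active
   * Reporting Capable
   */
  u32 lnkcap;

  pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &lnkcap);
  if (lnkcap & PCI_EXP_LNKCAP_DLLLARC)
          wait_for_link_active(pdev);

  /* after: use the bit cached in struct pci_dev at enumeration time */
  if (pdev->link_active_reporting)
          wait_for_link_active(pdev);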
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310125370.59226@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
mlxsw will need to keep track of certain devices that are not related to
any of its front panel ports. This includes IPIP netdevices. To be able to
query the list of supported IPIP types, router->ipip_ops_arr needs to be
initialized.
To that end, move the IPIP initialization up (and finalization
correspondingly down).
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
RIF configuration contains a number of parameters that cannot be changed
after the RIF is created. For the IPIP loopbacks, this is currently worked
around by creating a new RIF with the desired configuration changes
applied, updating next hops to point at the new RIF, and then destroying
the old one. This operation will be useful as a reusable atom, so extract a helper
to that effect.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This function will be useful later as the driver will need to retroactively
create RIFs for new uppers with addresses.
Add another helper that assumes the RCU read lock is held, and
restructure the code so that the IPv6 branch is skipped by directly
returning the result value rather than by conditioning on the
addr_list_empty variable. This makes the skip more obvious than it
previously was.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Right now freeing the object that mlxsw uses to keep track of a RIF is as
simple as calling kfree(). But later on, as the CRIF abstraction is brought
in, it will also involve severing the link between the CRIF and its RIF. Better
to have the logic encapsulated in a helper.
Since a helper is being introduced, make it a full-fledged destructor and
have it validate that the objects tracked at the RIF have been released.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
To abstract away deduction of RIF from the corresponding next hop group
info (NHGI), mlxsw currently uses a macro. In its current form, that macro
is impossible to extend to more general computation. Therefore introduce a
helper, mlxsw_sp_nhgi_rif(), and use it throughout. This will make it
possible to change the deduction path easily later on.
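The shape of the change, with only indicative type and field names:

  /* before: a macro, hard to extend to more general computation */
  #define mlxsw_sp_nhgi_rif(nhgi) ((nhgi)->rif)

  /* after: a helper whose body can later grow a longer deduction path
   * without touching any caller
   */
  static struct mlxsw_sp_rif *
  mlxsw_sp_nhgi_rif(const struct mlxsw_sp_nexthop_group_info *nhgi)
  {
          return nhgi->rif;
  }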
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In order to abstract away deduction of netdevice from the corresponding
next hop, introduce a helper, mlxsw_sp_nexthop_dev(), and use it
throughout. This will make it possible to change the deduction path easily
later on.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The previous patch added a helper to access a netdevice given a RIF. Using
this helper in mlxsw_sp_rif_create() is unreasonable: the netdevice was
given in the RIF creation parameters. Just take it from there.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In order to abstract away deduction of netdevice from the corresponding
RIF, introduce a helper, mlxsw_sp_rif_dev(), and use it throughout. This
will make it possible to change the deduction path easily later on.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, joining a LAG very simply means that the LAG RIF should be
joined by the subport representing untagged traffic. If the RIF does not
exist, it does not have to be created: if the user wants there to be a RIF
for the LAG device, they are supposed to add an IP address, and they are
supposed to do it after the LAG becomes an mlxsw upper.
We can also assume that the LAG has no uppers, otherwise the enslavement is
not allowed.
In the future, these ordering dependencies should be removed. That means
that joining LAG will be more complex operation, possibly involving a lazy
RIF creation, and possibly joining / lazily creating RIFs for VLAN uppers
of the LAG. It will be handy to have a dedicated function that handles all
of this. The new function, mlxsw_sp_router_port_join_lag(), serves that purpose.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Split out of mlxsw_sp_port_vlan_router_join() the part that checks for a RIF
and dispatches to __mlxsw_sp_port_vlan_router_join(), leaving it as a wrapper
that just manages the router lock.
The new function, mlxsw_sp_port_vlan_router_join_existing(), will be useful
as an atom in later patches.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
After commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new()
call-back type"), all drivers were converted to .probe_new(). Commit
03c835f498b5 ("i2c: Switch .probe() to not take an id parameter") then
allows converting back to (the new) .probe(), so that .probe_new() can
eventually be dropped from struct i2c_driver.
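For an individual driver the two steps look roughly like this ('foo' is
a placeholder driver name):

  /* old world: .probe took an id parameter */
  static int foo_probe(struct i2c_client *client,
                       const struct i2c_device_id *id);

  /* transitional step (b8a1a4cd5a98): id-less prototype, registered
   * through the temporary .probe_new member
   */
  static int foo_probe(struct i2c_client *client);

  /* final step (03c835f498b5): keep the id-less prototype but register
   * it through .probe again, so .probe_new can go away
   */
  static struct i2c_driver foo_driver = {
          .driver = { .name = "foo" },
          .probe  = foo_probe,
  };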
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
mlx5-updates-2023-06-09
1) Embedded CPU Virtual Functions
2) Lightweight local SFs
Daniel Jurgens says:
====================
Embedded CPU Virtual Functions
This series enables the creation of virtual functions on Bluefield (the
embedded CPU platform), referred to as embedded CPU virtual functions
(EC VFs). EC VF creation, deletion and management interfaces are the same
as those for virtual functions in a server with a Connect-X NIC.
When using EC VFs on the ARM, the creation of virtual functions on the
host system is still supported. Host VF eswitch vports occupy the range
1..max_vfs; the EC VF vport range is max_vfs+1..max_ec_vfs.
Every function (PF, ECPF, VF, EC VF, and subfunction) has a function ID
associated with it. Prior to this series the function ID and the eswitch
vport were the same. That is no longer the case: the EC VF function ID
range is 1..max_ec_vfs. When querying or setting the capabilities of an
EC VF function, a new bit must be set in the query/set HCA cap
structure.
This is a high level overview of the changes made:
- Allocate vports for EC VFs if they are enabled.
- Create representors and devlink ports for the EC VF vports.
- When querying/setting HCA caps by vport, break the assumption
that the function ID is the same as the vport number and adjust
accordingly.
- Create a new type of page, so that when SRIOV on the ARM is
disabled, but remains enabled on the host, the driver can
wait for the correct pages.
- Update SRIOV code to support EC VF creation/deletion.
====================
Lightweight local SFs:
The last 3 patches are from Shay Drory:
SFs are heavyweight and by default they come with the full package of
ConnectX features. Usually users want specialized SFs for one specific
purpose, and using devlink they will almost always override the set of
advertised features of an SF and reload it.
Shay Drory says:
================
In order to avoid the wasted time and resources on the reload, local SFs
will probe without any auxiliary sub-device, so that the SFs can be
configured prior to their full probe.
The defaults of the enable_* devlink params of these SFs are set to
false.
Usage example:
Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
hw_addr 00:00:00:00:00:11 state active
Enable ETH auxiliary device:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
name enable_eth value true cmode driverinit
Now, in order to fully probe the SF, use devlink reload:
$ devlink dev reload auxiliary/mlx5_core.sf.1
At this point the user has an SF devlink instance with an auxiliary device
for the Ethernet functionality only.
================
|
|
Now that the external users of mlxsw_sp_rif_dev() have been converted in
the preceding patches, make the function static.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In a number of places, a netdevice underlying a RIF is obtained only to
compare it to another pointer. In order to clean up the interface between
the router and the other modules, add a new helper to specifically answer
this question, and convert the relevant uses to this new interface.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In a number of places, a netdevice underlying a RIF is obtained only to
check if it is a NULL pointer. In order to clean up the interface between the
router and the other modules, add a new helper to specifically answer this
question, and convert the relevant uses to this new interface.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After the struct mlxsw_sp_netevent_work.n field initialization is moved
here, the body of code that handles NETEVENT_NEIGH_UPDATE is almost
identical to the one in the helper function. Therefore defer to the helper
instead of inlining the equivalent.
Note that previously, the code took and put a reference of the netdevice.
The new code defers to mlxsw_sp_dev_lower_is_port() to obviate the need for
taking the reference.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This code handles NETEVENT_DELAY_PROBE_TIME_UPDATE, which is invoked every
time the delay_probe_time changes. mlxsw router currently only maintains
one timer, so the last delay_probe_time set wins.
Currently, mlxsw uses mlxsw_sp_port_lower_dev_hold() to find a reference to
the router. This is no longer necessary. But as a side effect, this makes
sure that only updates to "interesting netdevices" (ones that have a
physical netdevice lower) are projected.
Retain that side effect by calling mlxsw_sp_port_dev_lower_find_rcu() and
punting if there is none. Then just proceed using the router pointer that's
already at hand in the helper.
Note that previously, the code took and put a reference of the netdevice.
Because the mlxsw_sp pointer is now obtained from the notifier block, the
port pointer (non-) NULL-ness is all that's relevant, and the reference
does not need to be taken anymore.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of passing a notifier block and deducing the router pointer from
that in the helper, do that in the caller, and pass the result. In the
following patches, the pointer will also be made useful in the caller.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The validation logic is already in the router code. Move there the notifier
blocks themselves as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make mlxsw_sp_router_fini() more similar to the _init() function (and more
concise) by extracting the `router' handle to a named variable and using
that throughout. The availability of a dedicated `router' variable will
come in handy in following patches.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The cited commit aimed to ensure that Virtual Functions (VFs) assign a
queue affinity to a Queue Pair (QP) to distribute traffic when
the LAG master creates a hardware LAG. If the affinity was set while
the hardware was not in LAG, the firmware would ignore the affinity value.
However, this commit unintentionally assigned an affinity to QPs on the LAG
master's VPORT even if the RDMA device was not marked as LAG-enabled.
In most cases, this was not an issue because when the hardware entered
hardware LAG configuration, the RDMA device of the LAG master would be
destroyed and a new one would be created, marked as LAG-enabled.
The problem arises when a user configures Equal-Cost Multipath (ECMP).
In ECMP mode, traffic can be directed to different physical ports based on
the queue affinity, which is intended for use by VPORTS other than the
E-Switch manager. ECMP mode is supported only if both E-Switch managers are
in switchdev mode and the appropriate route is configured via IP. In this
configuration, the RDMA device is not destroyed, and we retain the RDMA
device that is not marked as LAG-enabled.
To ensure correct behavior, Send Queues (SQs) opened by the E-Switch
manager through verbs should be assigned strict affinity. This means they
will only be able to communicate through the native physical port
associated with the E-Switch manager. This will prevent the firmware from
assigning affinity and will not allow the SQs to be remapped in case of
failover.
Fixes: 802dcc7fc5ec ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode")
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Nullifying qp->dbg is a preparation for the next patches in the series,
in which mlx5_core_destroy_qp() could actually fail and then be called
again; that currently causes a kernel crash, since qp->dbg was not
nullified in the previous call.
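This is the usual 'clear the pointer so a retry is harmless' pattern,
sketched here with a hypothetical cleanup helper:

  destroy_qp_debugfs(qp->dbg);    /* hypothetical cleanup call */
  qp->dbg = NULL;                 /* a repeated destroy now sees NULL
                                   * instead of a dangling pointer
                                   */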
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/1677e52bb642fd8d6062d73a5aa69083c0283dc9.1685953497.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
'handle' is known to be NULL here. There is no need to kfree() it.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
If a user wants to configure an SF, for example to use only the vdpa
functionality, they currently need to fully probe the SF, configure what
they want, and afterwards reload the SF.
In order to save the time of the reload, local SFs will probe without
any auxiliary sub-device, so that the SFs can be configured prior to
their full probe.
The defaults of the enable_* devlink params of these SFs are set to
false.
Usage example:
Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
hw_addr 00:00:00:00:00:11 state active
Enable ETH auxiliary device:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
name enable_eth value true cmode driverinit
Now, in order to fully probe the SF, use devlink reload:
$ devlink dev reload auxiliary/mlx5_core.sf.1
At this point the user has an SF devlink instance with an auxiliary device
for the Ethernet functionality only.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Move the param registration and handling code into the eswitch
code, as they are related to each other. There is no point in having the
devlink param registration done in a separate file.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5_cmd_init_hca() takes ~0.2 seconds. A user who wants to disable some
of the SF aux devices, at a large scale of e.g. 1K SFs, will waste more
than 3 minutes on mlx5_cmd_init_hca() calls that are not needed at this
stage.
A downstream patch will change SFs that are probed over the E-switch
(local SFs) to be probed without any aux dev. In order to support this,
split function_setup() to avoid executing mlx5_cmd_init_hca().
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Set the maximum number of embedded CPU VF functions available.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|