Age  Commit message  Author
2021-03-11  nexthop: Notify userspace about bucket migrations  Petr Machata
Nexthop replacements et al. are notified through netlink, but if a delayed work migrates buckets in the background, userspace will stay oblivious. Notify these as RTM_NEWNEXTHOPBUCKET events. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  nexthop: Add netlink handlers for bucket get  Petr Machata
Allow getting (but not setting) individual buckets to inspect the next hop mapped therein, idle time, and flags. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  nexthop: Add netlink handlers for bucket dump  Petr Machata
Add a dump handler for resilient next hop buckets. When next-hop group ID is given, it walks buckets of that group, otherwise it walks buckets of all groups. It then dumps the buckets whose next hops match the given filtering criteria. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  nexthop: Add netlink handlers for resilient nexthop groups  Petr Machata
Implement the netlink messages that allow creation and dumping of resilient nexthop groups. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  nexthop: Allow reporting activity of nexthop buckets  Ido Schimmel
The kernel periodically checks the idle time of nexthop buckets to determine if they are idle and can be re-populated with a new nexthop. When the resilient nexthop group is offloaded to hardware, the kernel will not see activity on nexthop buckets unless it is reported from hardware. Add a function that can be periodically called by device drivers to report activity on nexthop buckets after querying it from the underlying device. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  nexthop: Allow setting "offload" and "trap" indication of nexthop buckets  Ido Schimmel
Add a function that can be called by device drivers to set "offload" or "trap" indication on nexthop buckets following nexthop notifications and other changes such as a neighbour becoming invalid. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  nexthop: Implement notifiers for resilient nexthop groups  Petr Machata
Implement the following notifications towards drivers:
- NEXTHOP_EVENT_REPLACE, when a resilient nexthop group is created.
- NEXTHOP_EVENT_BUCKET_REPLACE any time there is a change in assignment of next hops to hash table buckets. That includes replacements, deletions, and delayed upkeep cycles. Some bucket notifications can be vetoed by the driver, to make it possible to propagate bucket busy-ness flags from the HW back to the algorithm. Some are however forced, e.g. if a next hop is deleted, all buckets that use this next hop simply must be migrated, whether the HW wishes so or not.
- NEXTHOP_EVENT_RES_TABLE_PRE_REPLACE, before a resilient nexthop group is replaced. Usually the driver will get the bucket notifications as well, and could veto those. But in some cases, a bucket may not be migrated immediately, but during delayed upkeep, and that is too late to roll the transaction back. This notification allows the driver to take a look and veto the new proposed group up front, before anything is committed.
Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  nexthop: Add data structures for resilient group notifications  Ido Schimmel
Add data structures that will be used for in-kernel notifications about addition / deletion of a resilient nexthop group and about changes to a hash bucket within a resilient group. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  nexthop: Add implementation of resilient next-hop groups  Petr Machata
At this moment, there is only one type of next-hop group: an mpath group, which implements the hash-threshold algorithm. To select a next hop, hash-threshold algorithm first assigns a range of hashes to each next hop in the group, and then selects the next hop by comparing the SKB hash with the individual ranges. When a next hop is removed from the group, the ranges are recomputed, which leads to reassignment of parts of hash space from one next hop to another. While there will usually be some overlap between the previous and the new distribution, some traffic flows change the next hop that they resolve to. That causes problems e.g. as established TCP connections are reset, because the traffic is forwarded to a server that is not familiar with the connection. Resilient hashing is a technique to address the above problem. Resilient next-hop group has another layer of indirection between the group itself and its constituent next hops: a hash table. The selection algorithm uses a straightforward modulo operation to choose a hash bucket, and then reads the next hop that this bucket contains, and forwards traffic there. This indirection brings an important feature. In the hash-threshold algorithm, the range of hashes associated with a next hop must be continuous. With a hash table, mapping between the hash table buckets and the individual next hops is arbitrary. Therefore when a next hop is deleted the buckets that held it are simply reassigned to other next hops. When weights of next hops in a group are altered, it may be possible to choose a subset of buckets that are currently not used for forwarding traffic, and use those to satisfy the new next-hop distribution demands, keeping the "busy" buckets intact. This way, established flows are ideally kept being forwarded to the same endpoints through the same paths as before the next-hop group change. In a nutshell, the algorithm works as follows. 
Each next hop has a number of buckets that it wants to have, according to its weight and the number of buckets in the hash table. In case of an event that might cause bucket allocation change, the numbers for individual next hops are updated, similarly to how ranges are updated for mpath group next hops. Following that, a new "upkeep" algorithm runs, and for idle buckets that belong to a next hop that is currently occupying more buckets than it wants (it is "overweight"), it migrates the buckets to one of the next hops that has fewer buckets than it wants (it is "underweight"). If, after this, there are still underweight next hops, another upkeep run is scheduled to a future time. Chances are there are not enough "idle" buckets to satisfy the new demands. The algorithm has knobs to select both what it means for a bucket to be idle, and for whether and when to forcefully migrate buckets if there keeps being an insufficient number of idle buckets. There are three users of the resilient data structures. - The forwarding code accesses them under RCU, and does not modify them except for updating the time a selected bucket was last used. - Netlink code, running under RTNL, which may modify the data. - The delayed upkeep code, which may modify the data. This runs unlocked, and mutual exclusion between the RTNL code and the delayed upkeep is maintained by canceling the delayed work synchronously before the RTNL code touches anything. Later it restarts the delayed work if necessary. The RTNL code has to implement next-hop group replacement, next hop removal, etc. For removal, the mpath code uses a neat trick of having a backup next hop group structure, doing the necessary changes offline, and then RCU-swapping them in. However, the hash tables for resilient hashing are about an order of magnitude larger than the groups themselves (the size might be e.g. 4K entries), and it was felt that keeping two of them is an overkill. 
Both the primary next-hop group and the spare therefore use the same resilient table, and writers are careful to keep all references valid for the forwarding code. The hash table references next-hop group entries from the next-hop group that is currently in the primary role (i.e. not spare). During the transition from primary to spare, the table references a mix of both the primary group and the spare. When a next hop is deleted, the corresponding buckets are not set to NULL, but instead marked as empty, so that the pointer is valid and can be used by the forwarding code. The buckets are then migrated to a new next-hop group entry during upkeep. The only times that the hash table is invalid are the very beginning and very end of its lifetime. Between those points, it is always kept valid. This patch introduces the core support code itself. It does not handle notifications towards drivers, which are kept as if the group were an mpath one. It does not handle netlink either. The only bit currently exposed to user space is the new next-hop group type, and that is currently bounced. There is therefore no way to actually access this code. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
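The selection and upkeep steps described in this commit message can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not the kernel code: `struct bucket`, `res_select()`, `res_upkeep()`, `NUM_BUCKETS` and `MAX_NH` are hypothetical names, idleness is reduced to a flag instead of a timestamp compared against a timer, and forced migration of busy buckets is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BUCKETS 8	/* hypothetical size; the message mentions e.g. 4K entries */
#define MAX_NH 4	/* hypothetical upper bound on next hops in the group */

/* Hypothetical, simplified hash table bucket. */
struct bucket {
	int nh;		/* index of the next hop stored in this bucket */
	int idle;	/* non-zero if the bucket saw no traffic recently */
};

/* Selection: a modulo picks the bucket, the bucket names the next hop. */
static int res_select(const struct bucket *tbl, uint32_t hash)
{
	return tbl[hash % NUM_BUCKETS].nh;
}

/*
 * One upkeep pass. wants[i] is the number of buckets next hop i should
 * hold. Idle buckets owned by an overweight next hop are handed to an
 * underweight one; busy buckets are left alone. Returns the number of
 * next hops that are still underweight, i.e. whether another pass
 * would have to be scheduled later.
 */
static int res_upkeep(struct bucket *tbl, const int *wants, int num_nh)
{
	int have[MAX_NH] = {0};
	int underweight = 0;
	size_t i;
	int nh;

	for (i = 0; i < NUM_BUCKETS; i++)
		have[tbl[i].nh]++;

	for (i = 0; i < NUM_BUCKETS; i++) {
		int cur = tbl[i].nh;

		if (!tbl[i].idle || have[cur] <= wants[cur])
			continue;	/* busy bucket, or owner not overweight */

		for (nh = 0; nh < num_nh; nh++) {
			if (have[nh] < wants[nh]) {
				have[cur]--;
				have[nh]++;
				tbl[i].nh = nh;
				break;
			}
		}
	}

	for (nh = 0; nh < num_nh; nh++)
		if (have[nh] < wants[nh])
			underweight++;

	return underweight;
}
```

Because only idle buckets move, a flow whose hash lands in a busy bucket keeps resolving to the same next hop across the group change, which is the property the message is after.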
2021-03-11  nexthop: Add netlink defines and enumerators for resilient NH groups  Ido Schimmel
- RTM_NEWNEXTHOP et al. that handle resilient groups will have a new nested attribute, NHA_RES_GROUP, whose elements are attributes NHA_RES_GROUP_*.
- RTM_NEWNEXTHOPBUCKET et al. is a suite of new messages that will currently serve only for dumping of individual buckets of resilient next hop groups. For nexthop group buckets, these messages will carry a nested attribute NHA_RES_BUCKET, whose elements are attributes NHA_RES_BUCKET_*. There are several reasons why a new suite of messages is created for nexthop buckets instead of overloading the information on the existing RTM_{NEW,DEL,GET}NEXTHOP messages. First, a nexthop group can contain a large number of nexthop buckets (4k is not unheard of). This imposes limits on the amount of information that can be encoded for each nexthop bucket given a netlink message is limited to 64k bytes. Second, while RTM_NEWNEXTHOPBUCKET is only used for notifications at this point, in the future it can be extended to provide user space with control over nexthop buckets configuration.
- The new group type is NEXTHOP_GRP_TYPE_RES. Note that nexthop code is adjusted to bounce groups with that type for now.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  nexthop: Add a dedicated flag for multipath next-hop groups  Petr Machata
With the introduction of resilient nexthop groups, there will be two types of multipath groups: the current hash-threshold "mpath" ones, and resilient groups. Both are multipath, but to determine the fact, the system needs to consider two flags. This might prove costly in the datapath. Therefore, introduce a new flag, that should be set for next-hop groups that have more than one nexthop, and should be considered multipath. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  nexthop: __nh_notifier_single_info_init(): Make nh_info an argument  Petr Machata
The cited function currently uses rtnl_dereference() to get nh_info from a handed-in nexthop. However, under the resilient hashing scheme, this function will not always be called under RTNL, sometimes the mutual exclusion will be achieved differently. Therefore move the nh_info extraction from the function to its callers to make it possible to use a different synchronization guarantee. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  nexthop: Pass nh_config to replace_nexthop()  Petr Machata
Currently, replace assumes that the new group that is given is a fully-formed object. But mpath groups really only have one attribute, and that is the constituent next hop configuration. This may not be universally true. From the usability perspective, it is desirable to allow the replace operation to adjust just the constituent next hop configuration and leave the group attributes as such intact. But the object that keeps track of whether an attribute was or was not given is the nh_config object, not the next hop or next-hop group. To allow (selective) attribute updates during NH group replacement, propagate `cfg' to replace_nexthop() and further to replace_nexthop_grp(). Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  Merge branch 'seg6-next'  David S. Miller
Julien Massonneau says: ==================== SRv6: SRH processing improvements Add support for IPv4 decapsulation in ipv6_srh_rcv() and ignore routing header with segments left equal to 0 for seg6local actions that don't perform decapsulation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  seg6: ignore routing header with segments left equal to 0  Julien Massonneau
When a packet carries two segment routing headers, for example after an End.B6 action, the second SRH is never handled by an action: the packet is dropped when the first SRH has segments left equal to 0. For actions that don't perform decapsulation (currently: End, End.X, End.T, End.B6, End.B6.Encaps), this patch adds the IP6_FH_F_SKIP_RH flag in arguments for ipv6_find_hdr(). Signed-off-by: Julien Massonneau <julien.massonneau@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  seg6: add support for IPv4 decapsulation in ipv6_srh_rcv()  Julien Massonneau
As specified in IETF RFC 8754, section 4.3.1.2, if the upper layer header is IPv4 or IPv6, perform IPv6 decapsulation and resubmit the decapsulated packet to the IPv4 or IPv6 module. Only IPv6 decapsulation was implemented. This patch adds support for IPv4 decapsulation. Link: https://tools.ietf.org/html/rfc8754#section-4.3.1.2 Signed-off-by: Julien Massonneau <julien.massonneau@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  Merge branch 'hns3-next'  David S. Miller
Huazhong Tan says: ==================== net: hns3: two updates for -next This series includes two updates for the HNS3 ethernet driver. ==================== Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  net: hns3: use pause capability queried from firmware  Yufeng Mo
For maintainability and compatibility, add support to use pause capability queried from firmware, and add debugfs support to dump this capability. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  net: hns3: use FEC capability queried from firmware  Yufeng Mo
For maintainability and compatibility, add support to use FEC capability queried from firmware. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  net/mlx5e: Alloc flow spec using kvzalloc instead of kzalloc  Roi Dayan
The flow spec is not small, and most places in the driver already allocate it using kvzalloc. Fix the remaining places to use kvzalloc as well, to avoid allocation failure when memory is too fragmented. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11  net/mlx5: Avoid unnecessary operation  Eli Cohen
fs_get_obj retrieves the container of fs_parent_node just to pass the same value as &fs_ns->node. Just pass fs_parent_node to init_root_tree_recursive() to get exactly the same effect. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11  net/mlx5e: rep: Improve reg_cX conditions  Saeed Mahameed
There is no point in calculating reg_c1 or overriding reg_c0 if we are going to abort the function. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com>
2021-03-11  net/mlx5: SF, Fix return type  Roi Dayan
Fix the following coccicheck warnings: drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h:50:8-9: WARNING: return of 0/1 in function 'mlx5_sf_dev_allocated' with return type bool Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11  net/mlx5e: mlx5_tc_ct_init does not fail  Saeed Mahameed
mlx5_tc_ct_init() returns either a valid pointer or NULL. Either way the caller can continue, so remove the IS_ERR check from callers, as it has no effect. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
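Why an IS_ERR check has no effect on a function that returns only a valid pointer or NULL can be seen from a small user-space re-creation of the kernel's error-pointer convention. The helpers below are stand-ins written for this sketch (they mirror the usual definition, with the top MAX_ERRNO addresses encoding errno values), and `ct_init()` with `struct ct_priv` is a hypothetical substitute for the driver function, not its real code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Only the last MAX_ERRNO addresses count as error pointers. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical init function: returns a valid object or NULL, never ERR_PTR(). */
struct ct_priv {
	int unused;
};

static struct ct_priv *ct_init(bool supported)
{
	static struct ct_priv priv;

	return supported ? &priv : NULL;
}
```

NULL is address 0, far below the error range, so `IS_ERR(ct_init(...))` is false for both possible results and the branch guarded by it can never run.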
2021-03-11  net/mlx5: Fix indir stable stubs  Vlad Buslov
Some of the stubs for CONFIG_MLX5_CLS_ACT==disabled are missing "static inline" in their definition which causes the following compilation warnings: In file included from drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:41: >> drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:34:1: warning: no previous prototype for function 'mlx5_esw_indir_table_init' [-Wmissing-prototypes] mlx5_esw_indir_table_init(void) ^ drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:33:1: note: declare 'static' if the function is not intended to be used outside of this translation unit struct mlx5_esw_indir_table * ^ static >> drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:40:1: warning: no previous prototype for function 'mlx5_esw_indir_table_destroy' [-Wmissing-prototypes] mlx5_esw_indir_table_destroy(struct mlx5_esw_indir_table *indir) ^ drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:39:1: note: declare 'static' if the function is not intended to be used outside of this translation unit void ^ static >> drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:61:1: warning: no previous prototype for function 'mlx5_esw_indir_table_needed' [-Wmissing-prototypes] mlx5_esw_indir_table_needed(struct mlx5_eswitch *esw, ^ drivers/net/ethernet/mellanox/mlx5/core/esw/indir_table.h:60:1: note: declare 'static' if the function is not intended to be used outside of this translation unit bool ^ static 3 warnings generated. Add "static inline" prefix to signatures of stubs that were missing it. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11  net/mlx5e: Add missing include  Vlad Buslov
When CONFIG_IPV6 is disabled the header nexthop.h is not included by fib_notifier.h which causes tc_tun_encap.c to fail to compile: In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:5: In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.h:7: In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_priv.h:7: In file included from drivers/net/ethernet/mellanox/mlx5/core/en_tc.h:40: drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h:78:5: warning: no previous prototype for function 'mlx5e_tc_tun_update_header_ipv6' [-Wmissing-prototypes] int mlx5e_tc_tun_update_header_ipv6(struct mlx5e_priv *priv, ^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h:78:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int mlx5e_tc_tun_update_header_ipv6(struct mlx5e_priv *priv, ^ static >> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1510:12: error: implicit declaration of function 'fib_info_nh' [-Werror,-Wimplicit-function-declaration] fib_dev = fib_info_nh(fen_info->fi, 0)->fib_nh_dev; ^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1510:12: note: did you mean 'fib_info_put'? 
include/net/ip_fib.h:528:20: note: 'fib_info_put' declared here static inline void fib_info_put(struct fib_info *fi) ^ >> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1510:42: error: member reference type 'int' is not a pointer fib_dev = fib_info_nh(fen_info->fi, 0)->fib_nh_dev; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ include/net/ip_fib.h:113:21: note: expanded from macro 'fib_nh_dev' #define fib_nh_dev nh_common.nhc_dev ^ >> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1552:13: error: incomplete definition of type 'struct fib6_entry_notifier_info' fen_info = container_of(info, struct fib6_entry_notifier_info, info); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/kernel.h:694:51: note: expanded from macro 'container_of' BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~ include/linux/compiler_types.h:256:74: note: expanded from macro '__same_type' #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) ^ include/linux/build_bug.h:39:58: note: expanded from macro 'BUILD_BUG_ON_MSG' #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~ include/linux/compiler_types.h:320:22: note: expanded from macro 'compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/compiler_types.h:308:23: note: expanded from macro '_compiletime_assert' __compiletime_assert(condition, msg, prefix, suffix) ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/compiler_types.h:300:9: note: expanded from macro '__compiletime_assert' if (!(condition)) \ ^~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info' struct fib6_entry_notifier_info *fen_info; ^ >> 
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1552:13: error: offsetof of incomplete type 'struct fib6_entry_notifier_info' fen_info = container_of(info, struct fib6_entry_notifier_info, info); ^ ~~~~~~ include/linux/kernel.h:697:21: note: expanded from macro 'container_of' ((type *)(__mptr - offsetof(type, member))); }) ^ ~~~~ include/linux/stddef.h:17:32: note: expanded from macro 'offsetof' #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER) ^ ~~~~ include/linux/compiler_types.h:140:35: note: expanded from macro '__compiler_offsetof' #define __compiler_offsetof(a, b) __builtin_offsetof(a, b) ^ ~ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info' struct fib6_entry_notifier_info *fen_info; ^ >> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1552:11: error: assigning to 'struct fib6_entry_notifier_info *' from incompatible type 'void' fen_info = container_of(info, struct fib6_entry_notifier_info, info); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1553:12: error: implicit declaration of function 'fib6_info_nh_dev' [-Werror,-Wimplicit-function-declaration] fib_dev = fib6_info_nh_dev(fen_info->rt); ^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1553:37: error: incomplete definition of type 'struct fib6_entry_notifier_info' fib_dev = fib6_info_nh_dev(fen_info->rt); ~~~~~~~~^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info' struct fib6_entry_notifier_info *fen_info; ^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1555:14: error: incomplete definition of type 'struct fib6_entry_notifier_info' fen_info->rt->fib6_dst.plen != 128) ~~~~~~~~^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info' struct 
fib6_entry_notifier_info *fen_info; ^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1562:39: error: incomplete definition of type 'struct fib6_entry_notifier_info' memcpy(&key.endpoint_ip.v6, &fen_info->rt->fib6_dst.addr, ~~~~~~~~^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info' struct fib6_entry_notifier_info *fen_info; ^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1563:24: error: incomplete definition of type 'struct fib6_entry_notifier_info' sizeof(fen_info->rt->fib6_dst.addr)); ~~~~~~~~^ drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c:1546:9: note: forward declaration of 'struct fib6_entry_notifier_info' struct fib6_entry_notifier_info *fen_info; ^ 1 warning and 10 errors generated. Manually include net/nexthop.h in tc_tun_encap.c. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11  net/mlx5e: fix mlx5e_tc_tun_update_header_ipv6 dummy definition  Arnd Bergmann
The alternative implementation of this function in a header file is declared as a global symbol, and gets added to every .c file that includes it, which leads to a link error: arm-linux-gnueabi-ld: drivers/net/ethernet/mellanox/mlx5/core/en_rx.o: in function `mlx5e_tc_tun_update_header_ipv6': en_rx.c:(.text+0x0): multiple definition of `mlx5e_tc_tun_update_header_ipv6'; drivers/net/ethernet/mellanox/mlx5/core/en_main.o:en_main.c:(.text+0x0): first defined here Mark it 'static inline' like the other functions here. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
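The stub pattern this fix relies on looks roughly like the header fragment below. It is a generic sketch, not the driver's header: `CONFIG_FEATURE`, `feature_update_header()` and the `-EOPNOTSUPP` return value are assumptions chosen for illustration.

```c
#include <errno.h>

/* Contents of a hypothetical header, feature.h. */
#ifdef CONFIG_FEATURE
/* Feature enabled: the real implementation lives in one .c file. */
int feature_update_header(int id);
#else
/*
 * Feature disabled: 'static inline' gives every including .c file its
 * own private copy, so no global symbol is emitted. Without 'static',
 * each includer would define the same external function and the link
 * would fail with "multiple definition".
 */
static inline int feature_update_header(int id)
{
	(void)id;
	return -EOPNOTSUPP;
}
#endif
```

The same qualifier also avoids the -Wmissing-prototypes warnings quoted in the indir table stub fix above, since a static function needs no previous prototype.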
2021-03-11  net/mlx5e: CT, Avoid false lock dependency warning  Roi Dayan
To avoid a false lock dependency warning, set the ct_entries_ht lock class to be different from the lock class of the ht used elsewhere. When deleting the last flow from a group and then deleting the group, we get into del_sw_flow_group(), which calls rhashtable_destroy() on fg->ftes_hash. That takes ht->mutex, but it is a different ht->mutex than the one here. ====================================================== WARNING: possible circular locking dependency detected 5.10.0-rc2+ #8 Tainted: G O ------------------------------------------------------ revalidator23/24009 is trying to acquire lock: ffff888128d83828 (&node->lock){++++}-{3:3}, at: mlx5_del_flow_rules+0x83/0x7a0 [mlx5_core] but task is already holding lock: ffff8881081ef518 (&ht->mutex){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x37/0x720 which lock already depends on the new lock. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11  net/mlx5: Check returned value from health recover sequence  Leon Romanovsky
MLX5_INTERFACE_STATE_UP is far from being a reliable check for a successful recovery, because it can be changed at any time and the health logic doesn't have any locks to protect against that. The locks are not needed here because health recovery is good to have but is not required to succeed, so rely on the value returned from mlx5_recover_device() as the marker for success/failure. Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11  net/mlx5: Don't rely on interface state bit  Leon Romanovsky
The check of MLX5_INTERFACE_STATE_UP is completely useless, because the FW tracer cleanup is called on every change of the interface and it ensures that notifier is disabled together with canceling all the pending works. Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11  net/mlx5: Remove second FW tracer check  Leon Romanovsky
The FW tracer check is called twice, so delete one of them. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11  net/mlx5: Separate probe vs. reload flows  Leon Romanovsky
The mix between probe/unprobe and reload flows requires an extra mutex, intf_state_mutex, which generates a LOCKDEP warning between it and devlink_mutex. As a preparation for its future removal, separate those flows. Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11  net/mlx5: Remove impossible checks of interface state  Leon Romanovsky
The interface state is constant at this stage and is checked before the calls to register/unregister reserved GIDs. There is no need to double check it. Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11  net/mlx5: Don't skip vport check  Saeed Mahameed
Users of mlx5_eswitch_get_vport() are required to check the return value prior to passing mlx5_vport further. Fix all the places that skip that check. Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-11  netdevsim: fib: Remove redundant code  Jiapeng Chong
Fix the following coccicheck warnings: ./drivers/net/netdevsim/fib.c:874:5-8: Unneeded variable: "err". Return "0" on line 889. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  net: phy: Expose phydev::dev_flags through sysfs  Florian Fainelli
phydev::dev_flags contains a bitmask of configuration bits requested by the consumer of a PHY device (Ethernet MAC or switch) towards the PHY driver. Since these flags are often used for requesting LED or other type of configuration being able to quickly audit them without instrumenting the kernel is useful. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11  net: dsa: b53: Add debug prints in b53_vlan_enable()  Florian Fainelli
Having dynamic debug prints in b53_vlan_enable() has been helpful to uncover a recent bug. Update the function to indicate the port being configured (or -1 for initial setup) and include the global VLAN enabled and VLAN filtering enable status. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10  net: fddi: skfp: Mundane typo fixes throughout the file smt.h  Bhaskar Chowdhury
A few spelling fixes throughout the file. Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10  net: ipv4: route.c: fix space before tab  Shubhankar Kuranagatti
The extra space before tab space has been removed. Signed-off-by: Shubhankar Kuranagatti <shubhankarvk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10Merge branch 'ionic-next'David S. Miller
Shannon Nelson says: ==================== ionic Rx updates The ionic driver's Rx path is due for an overhaul in order to better use memory buffers and to clean up the data structures. The first two patches convert the driver to using page sharing between buffers so as to lessen the page alloc and free overhead. The remaining patches clean up the structs and fastpath code for better efficiency. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10ionic: simplify use of completion typesShannon Nelson
Make better use of our struct types and type checking by passing the actual Rx or Tx completion type rather than a generic void pointer type. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10ionic: rebuild debugfs on qcq swapShannon Nelson
Each queue reconfiguration requires a rebuild of the matching debugfs information. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10ionic: simplify rx skb allocShannon Nelson
Remove an unnecessary layer over rx skb allocation. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10ionic: optimize fastpath struct usageShannon Nelson
Clean up a couple of struct uses to make for better fast path access. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10ionic: implement Rx page reuseShannon Nelson
Rework the Rx buffer allocations to use pages twice when using normal MTU in order to cut down on buffer allocation and mapping overhead. Instead of tracking individual pages, in which we may have wasted half the space when using standard 1500 MTU, we track buffers which use half pages, so we can use the second half of the page rather than allocate and map a new page once the first buffer has been used. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
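The half-page scheme described above can be illustrated with a small user-space sketch. This is not the ionic driver code: the 4096-byte page size, the `sketch_` names, and the use of `malloc()` in place of kernel page allocation and DMA mapping are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u                  /* assumed page size */
#define SKETCH_BUF_SIZE  (SKETCH_PAGE_SIZE / 2) /* one Rx buffer = half a page */

/* Hypothetical buffer descriptor: a page plus the offset of the half in use. */
struct sketch_rx_buf {
	unsigned char *page; /* backing page, NULL when none is attached */
	size_t offset;       /* 0 or SKETCH_BUF_SIZE */
};

/* Attach a fresh page to the descriptor; returns 0 on success, -1 on failure. */
static int sketch_rx_buf_alloc(struct sketch_rx_buf *buf)
{
	buf->page = malloc(SKETCH_PAGE_SIZE);
	if (!buf->page)
		return -1;
	buf->offset = 0;
	return 0;
}

/*
 * Called once the current half has been consumed. Returns 1 if the second
 * half of the same page is now ready for use, 0 if the page is exhausted
 * and has been released (a real driver would drop a page reference here).
 */
static int sketch_rx_buf_recycle(struct sketch_rx_buf *buf)
{
	assert(buf->page != NULL);
	if (buf->offset + SKETCH_BUF_SIZE < SKETCH_PAGE_SIZE) {
		buf->offset += SKETCH_BUF_SIZE;
		return 1;
	}
	free(buf->page);
	buf->page = NULL;
	return 0;
}

/* Simulate receiving n frames; returns how many pages had to be allocated. */
static unsigned int sketch_receive_frames(unsigned int n)
{
	struct sketch_rx_buf buf = { NULL, 0 };
	unsigned int allocs = 0;
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (!buf.page) {
			if (sketch_rx_buf_alloc(&buf))
				break;
			allocs++;
		}
		/* The frame would land at buf.page + buf.offset here. */
		sketch_rx_buf_recycle(&buf);
	}
	free(buf.page); /* free(NULL) is a no-op */
	return allocs;
}
```

In this sketch, two consecutive frames share one page, so the number of allocations is roughly half the number of frames.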
2021-03-10ionic: move rx_page_alloc and freeShannon Nelson
Move ionic_rx_page_alloc() and ionic_rx_page_free() to earlier in the file to make the next patch easier to review. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10Merge branch 'dpaa2-switch-next'David S. Miller
Ioana Ciornei says: ==================== dpaa2-switch: CPU terminated traffic and move out of staging This patch set adds support for Rx/Tx capabilities on DPAA2 switch port interfaces as well as fixing up some major blunders in how we take care of the switching domains. The last patch actually moves the driver out of staging now that the minimum requirements are met. I am sending this directly towards the net-next tree so that I can use the rest of the development cycle adding new features on top of the current driver without worrying about merge conflicts between the staging and net-next tree. The control interface is comprised of 3 queues in total: Rx, Rx error and Tx confirmation. In this patch set we only enable Rx and Tx conf. All switch ports share the same queues when frames are redirected to the CPU. Information regarding the ingress switch port is passed through frame metadata - the flow context field of the descriptor. NAPI instances are also shared between switch net_devices and are enabled when .dev_open() has been called on at least one of the switch ports and disabled when no switch port is still up. Since the last version of this feature was submitted to the list, I reworked how the switching and flooding domains are taken care of by the driver, thus the switch is now able to also add the control port (the queues that the CPU can dequeue from) into the flooding domains of a port (broadcast, unknown unicast etc). With this, we are able to receive and send traffic from the switch interfaces. Also, the capability to properly partition the DPSW object into multiple switching domains was added so that when not under a bridge, the ports are not actually able to switch between them. This is possible by adding a private FDB table per switch interface. When multiple switch interfaces are under the same bridge, they will all use the same FDB table. Another thing that is fixed in this patch set is how the driver handles VLAN awareness. 
The DPAA2 switch is not capable of running as VLAN unaware but this was not reflected in how the driver responded to requests to change the VLAN awareness. In the last patch, this is fixed by describing the switch interfaces as Rx VLAN filtering on [fixed] and declining any request to join a VLAN unaware bridge. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10staging: dpaa2-switch: move the driver out of stagingIoana Ciornei
Now that the dpaa2-switch driver has basic I/O capabilities on the switch port net_devices and multiple bridging domains are supported, move the driver out of staging. The dpaa2-switch driver is placed right next to the dpaa2-eth driver since, in the near future, they will be sharing most of the data path. I didn't implement code reuse in this patch series because I wanted to keep it as small as possible. Also, the README is removed from staging with the intention to add proper rst documentation afterwards to actually match what is supported by the driver. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10staging: dpaa2-switch: prevent joining a bridge while VLAN uppers are presentIoana Ciornei
Each time a switch port joins a bridge, it will start to use a FDB table common with all the other switch ports that are under the same bridge. This means that any VLAN added prior to a bridge join will retain its previous FDB table destination. With this patch, I choose to restrict when a switch port can change its upper device (either join or leave) so that the driver does not have to delete all the previously installed VLANs from the previous FDB and add them into the new one. Thus, in the PRECHANGEUPPER notification we check if there are any VLAN type upper devices and if that's true, deny the CHANGEUPPER. This way, the user is not restricted in the topology but rather in the order in which the setup is done: it must first create the bridging domain layout and after that add any necessary VLAN devices. The teardown is similar: the VLAN devices will need to be destroyed prior to a change in the bridging layout. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
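The PRECHANGEUPPER check described above can be sketched in plain user-space C. The types and names below (`sketch_port`, `sketch_upper_kind`, `sketch_prechangeupper`) are hypothetical stand-ins for the kernel's netdevice upper-device list and notifier handler, and the choice of `-EOPNOTSUPP` as the rejection code is an assumption, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical classification of a port's upper devices. */
enum sketch_upper_kind {
	SKETCH_UPPER_BRIDGE,
	SKETCH_UPPER_VLAN,
};

/* Hypothetical switch port: just the list of its current upper devices. */
struct sketch_port {
	const enum sketch_upper_kind *uppers;
	size_t n_uppers;
};

/*
 * Pre-change check: refuse a bridge join or leave while any VLAN upper
 * exists, so VLANs never have to be moved from one FDB table to another.
 * Returns 0 to allow the change, a negative errno value to veto it.
 */
static int sketch_prechangeupper(const struct sketch_port *port)
{
	size_t i;

	assert(port != NULL);
	for (i = 0; i < port->n_uppers; i++) {
		if (port->uppers[i] == SKETCH_UPPER_VLAN)
			return -EOPNOTSUPP; /* assumed error code */
	}
	return 0;
}
```

Under this model, the user creates the bridge layout first and adds VLAN devices afterwards; teardown removes the VLAN devices before the bridge layout changes.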
2021-03-10staging: dpaa2-switch: add fast-ageing on bridge leaveIoana Ciornei
Upon leaving a bridge, any MAC addresses learnt on the switch port prior to this point have to be removed so that we preserve the bridging domain configuration. Restructure the dpaa2_switch_port_fdb_dump() function in order to have a common dpaa2_switch_fdb_iterate() function between the FDB dump callback and the fast age procedure. To accomplish this, add a new callback - dpaa2_switch_fdb_cb_t - which will be called on each MAC addr and, depending on the situation, will either dump the FDB entry into a netlink message or will delete the address from the FDB table, in case of the fast-age. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
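The shared-iterator design described above can be sketched as follows. Only the general shape (one iterate function, one callback type, two callbacks) follows the commit message; the table layout, the `sketch_` names, and the signatures are assumptions, and the dump callback merely counts entries instead of filling a netlink message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical FDB entry: a MAC address learnt on a given port. */
struct sketch_fdb_entry {
	unsigned char mac[6];
	int port;
	int in_use;
};

/* Callback invoked for each matching entry; non-zero aborts the walk. */
typedef int (*sketch_fdb_cb_t)(struct sketch_fdb_entry *entry, void *data);

/* Walk all in-use entries belonging to 'port' and hand each one to 'cb'. */
static int sketch_fdb_iterate(struct sketch_fdb_entry *table, size_t n,
			      int port, sketch_fdb_cb_t cb, void *data)
{
	size_t i;
	int err;

	assert(cb != NULL);
	for (i = 0; i < n; i++) {
		if (!table[i].in_use || table[i].port != port)
			continue;
		err = cb(&table[i], data);
		if (err)
			return err;
	}
	return 0;
}

/* Dump callback: counts entries (stand-in for writing a netlink message). */
static int sketch_fdb_dump_cb(struct sketch_fdb_entry *entry, void *data)
{
	(void)entry;
	(*(size_t *)data)++;
	return 0;
}

/* Fast-age callback: removes the entry from the table. */
static int sketch_fdb_fast_age_cb(struct sketch_fdb_entry *entry, void *data)
{
	(void)data;
	entry->in_use = 0;
	return 0;
}
```

Keeping the walk in one place means the dump and fast-age paths differ only in the callback they pass.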