summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-10-10net-shapers: implement cap validation in the corePaolo Abeni
Use the device capabilities to reject invalid attribute values before pushing them to the H/W. Note that validating the metric explicitly avoids NL_SET_BAD_ATTR() usage, to provide unambiguous error messages to the user. Validating the nesting requires the knowledge of the new parent for the given shaper; as such is a chicken-egg problem: to validate the leaf nesting we need to know the node scope, to validate the node nesting we need to know the leafs parent scope. To break the circular dependency, place the leafs nesting validation after the parsing. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/54667601813e4c0348f39bf8ad2446ffc9fcd383.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net: shaper: implement introspection supportPaolo Abeni
The netlink op is a simple wrapper around the device callback. Extend the existing fetch_dev() helper adding an attribute argument for the requested device. Reuse such helper in the newly implemented operation. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/66eb62f22b3a5ba06ca91d01ae77515e5f447e15.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10netlink: spec: add shaper introspection supportPaolo Abeni
Allow the user-space to fine-grain query the shaping features supported by the NIC on each domain. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/3ddd10e450e3fe7d4b944c0d0b886d4483529ee6.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net-shapers: implement shaper cleanup on queue deletionPaolo Abeni
hook into netif_set_real_num_tx_queues() to cleanup any shaper configured on top of the to-be-destroyed TX queues. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/6da4ee03cae2b2a757d7b59e88baf09cc94c5ef1.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net-shapers: implement delete support for NODE scope shaperPaolo Abeni
Leverage the previously introduced group operation to implement the removal of NODE scope shaper, re-linking its leaves under the the parent node before actually deleting the specified NODE scope shaper. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/763d484b5b69e365acccfd8031b183c647a367a4.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net-shapers: implement NL group operationPaolo Abeni
Allow grouping multiple leaves shaper under the given root. The node and the leaves shapers are created, if needed, otherwise the existing shapers are re-linked as requested. Try hard to pre-allocated the needed resources, to avoid non trivial H/W configuration rollbacks in case of any failure. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/8a721274fde18b872d1e3a61aaa916bb7b7996d3.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net-shapers: implement NL set and delete operationsPaolo Abeni
Both NL operations directly map on the homonymous device shaper callbacks, update accordingly the shapers cache and are serialized via a per device lock. Implement the cache modification helpers to additionally deal with NODE scope shaper. That will be needed by the group() operation implemented in the next patch. The delete implementation is partial: does not handle NODE scope shaper yet. Such support will require infrastructure from the next patch and will be implemented later in the series. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/1e6a34a4095b35d773d2b9c476164671bbcf8397.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net-shapers: implement NL get operationPaolo Abeni
Introduce the basic infrastructure to implement the net-shaper core functionality. Each network devices carries a net-shaper cache, the NL get() operation fetches the data from such cache. The cache is initially empty, will be fill by the set()/group() operation implemented later and is destroyed at device cleanup time. The net_shaper_fill_handle(), net_shaper_ctx_init(), and net_shaper_generic_pre() implementations handle generic index type attributes, despite the current caller always pass a constant value to avoid more noise in later patches using them with different attributes. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/ddd10fd645a9367803ad02fca4a5664ea5ace170.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10netlink: spec: add shaper YAML specPaolo Abeni
Define the user-space visible interface to query, configure and delete network shapers via yaml definition. Add dummy implementations for the relevant NL callbacks. set() and delete() operations touch a single shaper creating/updating or deleting it. The group() operation creates a shaper's group, nesting multiple input shapers under the specified output shaper. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/7a33a1ff370bdbcd0cd3f909575c912cd56f41da.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10genetlink: extend info user-storage to match NL cb ctxPaolo Abeni
This allows a more uniform implementation of non-dump and dump operations, and will be used later in the series to avoid some per-operation allocation. Additionally rename the NL_ASSERT_DUMP_CTX_FITS macro, to fit a more extended usage. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/1130cc2896626b84587a2a5f96a5c6829638f4da.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10Merge branch 'rtnetlink-handle-error-of-rtnl_register_module'Paolo Abeni
Kuniyuki Iwashima says: ==================== rtnetlink: Handle error of rtnl_register_module(). While converting phonet to per-netns RTNL, I found a weird comment /* Further rtnl_register_module() cannot fail */ that was true but no longer true after commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message handlers"). Many callers of rtnl_register_module() just ignore the returned value but should handle them properly. This series introduces two helpers, rtnl_register_many() and rtnl_unregister_many(), to do that easily and fix such callers. All rtnl_register() and rtnl_register_module() will be converted to _many() variant and some rtnl_lock() will be saved in _many() later in net-next. Changes: v4: * Add more context in changelog of each patch v3: https://lore.kernel.org/all/20241007124459.5727-1-kuniyu@amazon.com/ * Move module *owner to struct rtnl_msg_handler * Make struct rtnl_msg_handler args/vars const * Update mctp goto labels v2: https://lore.kernel.org/netdev/20241004222358.79129-1-kuniyu@amazon.com/ * Remove __exit from mctp_neigh_exit(). v1: https://lore.kernel.org/netdev/20241003205725.5612-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20241008184737.9619-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10phonet: Handle error of rtnl_register_module().Kuniyuki Iwashima
Before commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message handlers"), once the first rtnl_register_module() allocated rtnl_msg_handlers[PF_PHONET], the following calls never failed. However, after the commit, rtnl_register_module() could fail silently to allocate rtnl_msg_handlers[PF_PHONET][msgtype] and requires error handling for each call. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's use rtnl_register_many() to handle the errors easily. Fixes: addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message handlers") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Rémi Denis-Courmont <courmisch@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10mpls: Handle error of rtnl_register_module().Kuniyuki Iwashima
Since introduced, mpls_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: 03c0566542f4 ("mpls: Netlink commands to add, remove, and dump routes") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10mctp: Handle error of rtnl_register_module().Kuniyuki Iwashima
Since introduced, mctp has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: 583be982d934 ("mctp: Add device handling and netlink interface") Fixes: 831119f88781 ("mctp: Add neighbour netlink interface") Fixes: 06d2f4c583a7 ("mctp: Add netlink route management") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10bridge: Handle error of rtnl_register_module().Kuniyuki Iwashima
Since introduced, br_vlan_rtnl_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: 8dcea187088b ("net: bridge: vlan: add rtm definitions and dump support") Fixes: f26b296585dc ("net: bridge: vlan: add new rtm message support") Fixes: adb3ce9bcb0f ("net: bridge: vlan: add del rtm message support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10vxlan: Handle error of rtnl_register_module().Kuniyuki Iwashima
Since introduced, vxlan_vnifilter_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10rtnetlink: Add bulk registration helpers for rtnetlink message handlers.Kuniyuki Iwashima
Before commit addf9b90de22 ("net: rtnetlink: use rcu to free rtnl message handlers"), once rtnl_msg_handlers[protocol] was allocated, the following rtnl_register_module() for the same protocol never failed. However, after the commit, rtnl_msg_handler[protocol][msgtype] needs to be allocated in each rtnl_register_module(), so each call could fail. Many callers of rtnl_register_module() do not handle the returned error, and we need to add many error handlings. To handle that easily, let's add wrapper functions for bulk registration of rtnetlink message handlers. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net: phy: Validate PHY LED OPs presence before registeringChristian Marangi
Validate PHY LED OPs presence before registering and parsing them. Defining LED nodes for a PHY driver that actually doesn't supports them is redundant and useless. It's also the case with Generic PHY driver used and a DT having LEDs node for the specific PHY. Skip it and report the error with debug print enabled. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241008194718.9682-1-ansuelsmth@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10Merge tag 'nf-24-10-09' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Restrict xtables extensions to families that are safe, syzbot found a way to combine ebtables with extensions that are never used by userspace tools. From Florian Westphal. 2) Set l3mdev inconditionally whenever possible in nft_fib to fix lookup mismatch, also from Florian. netfilter pull request 24-10-09 * tag 'nf-24-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: conntrack_vrf.sh: add fib test case netfilter: fib: check correct rtable in vrf setups netfilter: xtables: avoid NFPROTO_UNSPEC where needed ==================== Link: https://patch.msgid.link/20241009213858.3565808-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10Merge branch 'net-mlx5-qos-refactor-esw-qos-to-support-new-features'Paolo Abeni
Tariq Toukan says: ==================== net/mlx5: qos: Refactor esw qos to support new features This patch series by Cosmin and Carolina prepares the mlx5 qos infra for the upcoming feature of cross E-Switch scheduling. Noop cleanups: net/mlx5: qos: Flesh out element_attributes in mlx5_ifc.h net/mlx5: qos: Rename vport 'tsar' into 'sched_elem'. net/mlx5: qos: Consistently name vport vars as 'vport' net/mlx5: qos: Refactor and document bw_share calculation net/mlx5: qos: Rename rate group 'list' as 'parent_entry' Refactor the code with the goal of moving groups out of E-Switches: net/mlx5: qos: Maintain rate group vport members in a list net/mlx5: qos: Always create group0 net/mlx5: qos: Drop 'esw' param from vport qos functions net/mlx5: qos: Store the eswitch in a mlx5_esw_rate_group Move groups from an E-Switch into an mlx5_qos_domain: net/mlx5: qos: Store rate groups in a qos domain Refactor locking to use a new mutex in the qos domain: net/mlx5: qos: Refactor locking to a qos domain mutex In follow-up patchsets, we'll allow qos domains to be shared between E-Switches of the same NIC. The two top patches are simple enhancements. ==================== Link: https://patch.msgid.link/20241008183222.137702-1-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: Add support check for TSAR types in QoS schedulingCarolina Jubran
Introduce a new function, mlx5_qos_tsar_type_supported(), to handle the validation of TSAR types within QoS scheduling contexts. Refactor the existing code to use this new function, replacing direct checks for TSAR type support in the NIC scheduling hierarchy. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: Unify QoS element type checks across NIC and E-SwitchCarolina Jubran
Refactor the QoS element type support check by introducing a new function, mlx5_qos_element_type_supported(), which handles element type validation for both NIC and E-Switch schedulers. This change removes the redundant esw_qos_element_type_supported() function and unifies the element type checks into a single implementation. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Refactor locking to a qos domain mutexCosmin Ratiu
E-Switch qos changes used the esw state_lock to serialize qos changes. With the introduction of cross-esw scheduling, multiple E-Switches might be involved in a qos operation, so prepare for that by switching locking to use a qos domain mutex. Add three helper functions: - esw_qos_lock - esw_qos_unlock - esw_assert_qos_lock_held Convert existing direct lock/unlock/lockdep calls to them. Also call esw_assert_qos_lock_held in a couple more places. mlx5_esw_qos_set_vport_rate expected to be called with the esw state_lock already held. Change it to instead acquire the qos lock directly. mlx5_eswitch_get_vport_config also accessed qos properties with the esw state lock. Introduce a new function mlx5_esw_qos_get_vport_rate to access those with the correct lock and change get_vport_config to use it. Finally, mlx5_vport_disable is called from the cleanup path with the esw state_lock held, so have it additionally acquire the qos lock to make sure there are no races. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Store rate groups in a qos domainCosmin Ratiu
Groups are currently maintained as a list in their corresponding eswitch, protected by the esw state_lock. The upcoming cross-eswitch scheduling feature cannot work with this approach, as it would require acquiring multiple eswitch locks (in the correct order) in order to maintain group membership. This commit moves the rate groups into a new 'qos domain' struct and adds explicit qos init/cleanup steps to the eswitch init/cleanup. Upcoming patches will expand the qos domain struct and allow it to be shared between eswitches. For now, qos domains are private to each esw so there's only an extra indirection. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Rename rate group 'list' as 'parent_entry'Cosmin Ratiu
'list' is not very descriptive, I prefer list membership to clearly specify which list the entry belongs to. This commit renames the list entry into the esw groups list as 'parent_entry' to make the code more readable. This is a no-op change. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Add an explicit 'dev' to vport trace callsCosmin Ratiu
vport qos trace calls used vport->dev implicitly as the device to which the command was sent (and thus the device logged in traces). But that will no longer be the case for cross-esw scheduling, where the commands have to be sent to the group esw device instead. This commit corrects that. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Store the eswitch in a mlx5_esw_rate_groupCosmin Ratiu
The rate groups are about to be moved out of eswitches, so store a reference to the eswitch they belong to so things can still work later. This allows dropping the esw parameter from a couple of functions and simplifying some of the code. Use this opportunity to make sure that vport scheduling element commands are always sent to the group eswitch, because that will be relevant for cross-esw scheduling. For now though, the eswitches are not different. There is no functionality change here. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Drop 'esw' param from vport qos functionsCosmin Ratiu
The vport has a pointer to its own eswitch in vport->dev->priv.eswitch, so passing the same eswitch as a parameter to the various functions manipulating vport qos is superfluous at best and prone to errors at worst. More importantly, with the upcoming cross-esw scheduling changes, the eswitch that should receive the various scheduling element commands is NOT the same as the vport's eswitch, so the current code's assumptions will break. To avoid confusion and bugs, this commit drops the 'esw' parameter from all vport qos functions and uses the vport's own eswitch pointer instead. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Always create group0Cosmin Ratiu
All vports not explicitly members of a group with QoS enabled are part of the internal esw group0, except when the hw reports that groups aren't supported (log_esw_max_sched_depth == 0). This creates corner cases in the code, which has to make sure that this case is supported. Additionally, the groups are about to be moved out of eswitches, and group0 being NULL creates additional complications there. This patch makes sure to always create group0, even if max sched depth is 0. In that case, a software-only group0 is created referencing the root TSAR. Vports can point to this group when their QoS is enabled and they'll be attached to the root TSAR directly. This eliminates corner cases in the code by offering the guarantee that if qos is enabled, vport->qos.group is non-NULL. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Maintain rate group vport members in a listCosmin Ratiu
Previously, finding group members was done by iterating over all vports of an eswitch and comparing their group with the required one, but that approach will break down when a group can contain vports from multiple eswitches. Solve that by maintaining a list of vport members. Instead of iterating over esw vports, loop over the members list. Use this opportunity to provide two new functions to allocate and free a group, so that the number of state transitions is smaller. This will also be used in a future patch. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Refactor and document bw_share calculationCosmin Ratiu
The previous function (esw_qos_calculate_group_min_rate_divider) had two completely different modes of execution, depending on the 'group_level' parameter. Split it into two separate functions: - esw_qos_calculate_min_rate_divider - computes min across groups. - esw_qos_calculate_group_min_rate_divider - computes min in a group. Fold the divider calculation into the corresponding normalize functions to avoid having the caller compute the corresponding divider. Also rename the normalize functions to better indicate what level they're operating on. Finally, document everything so that this topic can more easily be understood by future maintainers. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Consistently name vport vars as 'vport'Cosmin Ratiu
The current mixture of 'vport' and 'evport' can be improved. There is no functional change. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Rename vport 'tsar' into 'sched_elem'.Cosmin Ratiu
Vports do not use TSARs (Transmit Scheduling ARbiters), which are used for grouping multiple entities together. Use the correct name in variables and functions for clarity. Also move the scheduling context to a local variable in the esw_qos_sched_elem_config function instead of an empty parameter that needs to be provided by all callers. There is no functional change here. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net/mlx5: qos: Flesh out element_attributes in mlx5_ifc.hCosmin Ratiu
This is used for multiple purposes, depending on the scheduling element created. There are a few helper struct defined a long time ago, but they are not easy to find in the file and they are about to get new members. This commit cleans up this area a bit by: - moving the helper structs closer to where they are relevant. - defining a helper union to include all of them to help discoverability. - making use of it everywhere element_attributes is used. - using a consistent 'attr' name. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10Merge branch 'eth-fbnic-add-timestamping-support'Paolo Abeni
Vadim Fedorenko says: ==================== eth: fbnic: add timestamping support The series is to add timestamping support for Meta's NIC driver. Changelog: v3 -> v4: - use adjust_by_scaled_ppm() instead of open coding it - adjust cached value of high bits of timestamp to be sure it is older then incoming timestamps v2 -> v3: - rebase on top of net-next - add doc to describe retur value of fbnic_ts40_to_ns() v1 -> v2: - adjust comment about using u64 stats locking primitive - fix typo in the first patch - Cc Richard ==================== Link: https://patch.msgid.link/20241008181436.4120604-1-vadfed@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10eth: fbnic: add ethtool timestamping statisticsVadim Fedorenko
Add counters of packets with HW timestamps requests and lost timestamps with no associated skbs. Use ethtool interface to report these counters. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10eth: fbnic: add TX packets timestamping supportVadim Fedorenko
Add TX configuration to ethtool interface. Add processing of TX timestamp completions as well as configuration to request HW to create TX timestamp completion. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10eth: fbnic: add RX packets timestamping supportVadim Fedorenko
Add callbacks to support timestamping configuration via ethtool. Add processing of RX timestamps. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10eth: fbnic: add initial PHC supportVadim Fedorenko
Create PHC device and provide callbacks needed for ptp_clock device. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10eth: fbnic: add software TX timestamping supportVadim Fedorenko
Add software TX timestamping support. RX software timestamping is implemented in the core and there is no need to provide special flag in the driver anymore. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net: Remove likely from l3mdev_master_ifindex_by_indexBreno Leitao
The likely() annotation in l3mdev_master_ifindex_by_index() has been found to be incorrect 100% of the time in real-world workloads (e.g., web servers). Annotated branches shows the following in these servers: correct incorrect % Function File Line 0 169053813 100 l3mdev_master_ifindex_by_index l3mdev.h 81 This is happening because l3mdev_master_ifindex_by_index() is called from __inet_check_established(), which calls l3mdev_master_ifindex_by_index() passing the socked bounded interface. l3mdev_master_ifindex_by_index(net, sk->sk_bound_dev_if); Since most sockets are not going to be bound to a network device, the likely() is giving the wrong assumption. Remove the likely() annotation to ensure more accurate branch prediction. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241008163205.3939629-1-leitao@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10net: do not delay dst_entries_add() in dst_release()Eric Dumazet
dst_entries_add() uses per-cpu data that might be freed at netns dismantle from ip6_route_net_exit() calling dst_entries_destroy() Before ip6_route_net_exit() can be called, we release all the dsts associated with this netns, via calls to dst_release(), which waits an rcu grace period before calling dst_destroy() dst_entries_add() use in dst_destroy() is racy, because dst_entries_destroy() could have been called already. Decrementing the number of dsts must happen sooner. Notes: 1) in CONFIG_XFRM case, dst_destroy() can call dst_release_immediate(child), this might also cause UAF if the child does not have DST_NOCOUNT set. IPSEC maintainers might take a look and see how to address this. 2) There is also discussion about removing this count of dst, which might happen in future kernels. Fixes: f88649721268 ("ipv4: fix dst race in sk_dst_get()") Closes: https://lore.kernel.org/lkml/CANn89iLCCGsP7SFn9HKpvnKu96Td4KD08xf7aGtiYgZnkjaL=w@mail.gmail.com/T/ Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20241008143110.1064899-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-09Merge branch 'ipv4-namespacify-ipv4-address-hash-table'Jakub Kicinski
Kuniyuki Iwashima says: ==================== ipv4: Namespacify IPv4 address hash table. This is a prep of per-net RTNL conversion for RTM_(NEW|DEL|SET)ADDR. Currently, each IPv4 address is linked to the global hash table, and this needs to be protected by another global lock or namespacified to support per-net RTNL. Adding a global lock will cause deadlock in the rtnetlink path and GC, rtnetlink check_lifetime |- rtnl_net_lock(net) |- acquire the global lock |- acquire the global lock |- check ifa's netns `- put ifa into hash table `- rtnl_net_lock(net) so we need to namespacify the hash table. The IPv6 one is already namespacified, let's follow that. v2: https://lore.kernel.org/netdev/20241004195958.64396-1-kuniyu@amazon.com/ v1: https://lore.kernel.org/netdev/20241001024837.96425-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20241008172906.1326-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09ipv4: Retire global IPv4 hash table inet_addr_lst.Kuniyuki Iwashima
No one uses inet_addr_lst anymore, so let's remove it. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241008172906.1326-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09ipv4: Namespacify IPv4 address GC.Kuniyuki Iwashima
Each IPv4 address could have a lifetime, which is useful for DHCP, and GC is periodically executed as check_lifetime_work. check_lifetime() does the actual GC under RTNL. 1. Acquire RTNL 2. Iterate inet_addr_lst 3. Remove IPv4 address if expired 4. Release RTNL Namespacifying the GC is required for per-netns RTNL, but using the per-netns hash table will shorten the time on the hash bucket iteration under RTNL. Let's add per-netns GC work and use the per-netns hash table. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241008172906.1326-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09ipv4: Use per-netns hash table in inet_lookup_ifaddr_rcu().Kuniyuki Iwashima
Now, all IPv4 addresses are put in the per-netns hash table. Let's use it in inet_lookup_ifaddr_rcu(). Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241008172906.1326-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09ipv4: Link IPv4 address to per-netns hash table.Kuniyuki Iwashima
As a prep for per-netns RTNL conversion, we want to namespacify the IPv4 address hash table and the GC work. Let's allocate the per-netns IPv4 address hash table to net->ipv4.inet_addr_lst and link IPv4 addresses into it. The actual users will be converted later. Note that the IPv6 address hash table is already namespacified. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241008172906.1326-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-10-08 (ice, iavf, igb, e1000e, e1000) This series contains updates to ice, iavf, igb, e1000e, and e1000 drivers. For ice: Wojciech adds support for ethtool reset. Paul adds support for hardware based VF mailbox limits for E830 devices. Jake adjusts to a common iterator in ice_vc_cfg_qs_msg() and moves storing of max_frame and rx_buf_len from VSI struct to the ring structure. Hongbo Li uses assign_bit() to replace an open-coded instance. Markus Elfring adjusts a couple of PTP error paths to use a common, shared exit point. Yue Haibing removes unused declarations. For iavf: Yue Haibing removes unused declarations. For igb: Yue Haibing removes unused declarations. For e1000e: Takamitsu Iwai removes unneccessary writel() calls. Joe Damato adds support for netdev-genl support to query IRQ, NAPI, and queue information. For e1000: Joe Damato adds support for netdev-genl support to query IRQ, NAPI, and queue information. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: e1000: Link NAPI instances to queues and IRQs e1000e: Link NAPI instances to queues and IRQs e1000e: Remove duplicated writel() in e1000_configure_tx/rx() igb: Cleanup unused declarations iavf: Remove unused declarations ice: Cleanup unused declarations ice: Use common error handling code in two functions ice: Make use of assign_bit() API ice: store max_frame and rx_buf_len only in ice_rx_ring ice: consistently use q_idx in ice_vc_cfg_qs_msg() ice: add E830 HW VF mailbox message limit support ice: Implement ethtool reset support ==================== Link: https://patch.msgid.link/20241008233441.928802-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-10-08 (ice, i40e, igb, e1000e) This series contains updates to ice, i40e, igb, and e1000e drivers. For ice: Marcin allows driver to load, into safe mode, when DDP package is missing or corrupted and adjusts the netif_is_ice() check to account for when the device is in safe mode. He also fixes an out-of-bounds issue when MSI-X are increased for VFs. Wojciech clears FDB entries on reset to match the hardware state. For i40e: Aleksandr adds locking around MACVLAN filters to prevent memory leaks due to concurrency issues. For igb: Mohamed Khalfella adds a check to not attempt to bring up an already running interface on non-fatal PCIe errors. For e1000e: Vitaly changes board type for I219 to more closely match the hardware and stop PHY issues. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: e1000e: change I219 (19) devices to ADP igb: Do not bring the device up after non-fatal error i40e: Fix macvlan leak by synchronizing access to mac_filter_hash ice: Fix increasing MSI-X on VF ice: Flush FDB entries before reset ice: Fix netif_is_ice() in Safe Mode ice: Fix entering Safe Mode ==================== Link: https://patch.msgid.link/20241008230050.928245-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09Fix misspelling of "accept*" in netAlexander Zubkov
Several files have "accept*" misspelled as "accpet*" in the comments. Fix all such occurrences. Signed-off-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241008162756.22618-2-green@qrator.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>