summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-02-08net: ethernet: mtk_eth_soc: fix DSA TX tag hwaccel for switch port 0Vladimir Oltean
Arınç reports that on his MT7621AT Unielec U7621-06 board and MT7623NI Bananapi BPI-R2, packets received by the CPU over mt7530 switch port 0 (of which this driver acts as the DSA master) are not processed correctly by software. More precisely, they arrive without a DSA tag (in packet or in the hwaccel area - skb_metadata_dst()), so DSA cannot demux them towards the switch's interface for port 0. Traffic from other ports receives a skb_metadata_dst() with the correct port and is demuxed properly. Looking at mtk_poll_rx(), it becomes apparent that this driver uses the skb vlan hwaccel area: union { u32 vlan_all; struct { __be16 vlan_proto; __u16 vlan_tci; }; }; as a temporary storage for the VLAN hwaccel tag, or the DSA hwaccel tag. If this is a DSA master it's a DSA hwaccel tag, and finally clears up the skb VLAN hwaccel header. I'm guessing that the problem is the (mis)use of API. skb_vlan_tag_present() looks like this: #define skb_vlan_tag_present(__skb) (!!(__skb)->vlan_all) So if both vlan_proto and vlan_tci are zeroes, skb_vlan_tag_present() returns precisely false. I don't know for sure what is the format of the DSA hwaccel tag, but I surely know that lowermost 3 bits of vlan_proto are 0 when receiving from port 0: unsigned int port = vlan_proto & GENMASK(2, 0); If the RX descriptor has no other bits set to non-zero values in RX_DMA_VTAG, then the call to __vlan_hwaccel_put_tag() will not, in fact, make the subsequent skb_vlan_tag_present() return true, because it's implemented like this: static inline void __vlan_hwaccel_put_tag(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci) { skb->vlan_proto = vlan_proto; skb->vlan_tci = vlan_tci; } What we need to do to fix this problem (assuming this is the problem) is to stop using skb->vlan_all as temporary storage for driver affairs, and just create some local variables that serve the same purpose, but hopefully better. Instead of calling skb_vlan_tag_present(), let's look at a boolean has_hwaccel_tag which we set to true when the RX DMA descriptors have something. Disambiguate based on netdev_uses_dsa() whether this is a VLAN or DSA hwaccel tag, and only call __vlan_hwaccel_put_tag() if we're certain it's a VLAN tag. Arınç confirms that the treatment works, so this validates the assumption. Link: https://lore.kernel.org/netdev/704f3a72-fc9e-714a-db54-272e17612637@arinc9.com/ Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging") Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08nfp: ethtool: fix the bug of setting unsupported port speedYu Xiao
Unsupported port speed can be set and cause error. Now fixing it and return an error if setting unsupported speed. This fix depends on the following, which was included in v6.2-rc1: commit a61474c41e8c ("nfp: ethtool: support reporting link modes"). Fixes: 7c698737270f ("nfp: add support for .set_link_ksettings()") Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08txhash: fix sk->sk_txrehash defaultKevin Yang
This code fix a bug that sk->sk_txrehash gets its default enable value from sysctl_txrehash only when the socket is a TCP listener. We should have sysctl_txrehash to set the default sk->sk_txrehash, no matter TCP, nor listerner/connector. Tested by following packetdrill: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 socket(..., SOCK_DGRAM, IPPROTO_UDP) = 4 // SO_TXREHASH == 74, default to sysctl_txrehash == 1 +0 getsockopt(3, SOL_SOCKET, 74, [1], [4]) = 0 +0 getsockopt(4, SOL_SOCKET, 74, [1], [4]) = 0 Fixes: 26859240e4ee ("txhash: Add socket option to control TX hash rethink behavior") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net: ethernet: mtk_eth_soc: fix wrong parameters order in __xdp_rxq_info_reg()Tariq Toukan
Parameters 'queue_index' and 'napi_id' are passed in a swapped order. Fix it here. Fixes: 23233e577ef9 ("net: ethernet: mtk_eth_soc: rely on page_pool for single page buffers") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net: ethernet: mtk_eth_soc: enable special tag when any MAC uses DSAArınç ÜNAL
The special tag is only enabled when the first MAC uses DSA. However, it must be enabled when any MAC uses DSA. Change the check accordingly. This fixes hardware DSA untagging not working on the second MAC of the MT7621 and MT7623 SoCs, and likely other SoCs too. Therefore, remove the check that disables hardware DSA untagging for the second MAC of the MT7621 and MT7623 SoCs. Fixes: a1f47752fd62 ("net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC") Co-developed-by: Richard van Schagen <richard@routerhints.com> Signed-off-by: Richard van Schagen <richard@routerhints.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-07net: sched: sch: Fix off by one in htb_activate_prios()Dan Carpenter
The > needs be >= to prevent an out of bounds access. Fixes: de5ca4c3852f ("net: sched: sch: Bounds check priority") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/Y+D+KN18FQI2DKLq@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-02-06 (ice) This series contains updates to ice driver only. Ani removes WQ_MEM_RECLAIM flag from workqueue to resolve check_flush_dependency warning. Michal fixes KASAN out-of-bounds warning. Brett corrects behaviour for port VLAN Rx filters to prevent receiving of unintended traffic. Dan Carpenter fixes possible off by one issue. Zhang Changzhong adjusts error path for switch recipe to prevent memory leak. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: switch: fix potential memleak in ice_add_adv_recipe() ice: Fix off by one in ice_tc_forward_to_queue() ice: Fix disabling Rx VLAN filtering with port VLAN enabled ice: fix out-of-bounds KASAN warning in virtchnl ice: Do not use WQ_MEM_RECLAIM flag for workqueue ==================== Link: https://lore.kernel.org/r/20230206232934.634298-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07igc: Add ndo_tx_timeout supportSasha Neftin
On some platforms, 100/1000/2500 speeds seem to have sometimes problems reporting false positive tx unit hang during stressful UDP traffic. Likely other Intel drivers introduce responses to a tx hang. Update the 'tx hang' comparator with the comparison of the head and tail of ring pointers and restore the tx_timeout_factor to the previous value (one). This can be test by using netperf or iperf3 applications. Example: iperf3 -s -p 5001 iperf3 -c 192.168.0.2 --udp -p 5001 --time 600 -b 0 netserver -p 16604 netperf -H 192.168.0.2 -l 600 -p 16604 -t UDP_STREAM -- -m 64000 Fixes: b27b8dc77b5e ("igc: Increase timeout value for Speed 100/1000/2500") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230206235818.662384-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: various virtualization cleanups Jacob Keller says: This series contains a variety of refactors and cleanups in the VF code for the ice driver. Its primary focus is cleanup and simplification of the VF operations and addition of a few new operations that will be required by Scalable IOV, as well as some other refactors needed for the handling of VF subfunctions. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: remove unnecessary virtchnl_ether_addr struct use ice: introduce .irq_close VF operation ice: introduce clear_reset_state operation ice: convert vf_ops .vsi_rebuild to .create_vsi ice: introduce ice_vf_init_host_cfg function ice: add a function to initialize vf entry ice: Pull common tasks into ice_vf_post_vsi_rebuild ice: move ice_vf_vsi_release into ice_vf_lib.c ice: move vsi_type assignment from ice_vsi_alloc to ice_vsi_cfg ice: refactor VSI setup to use parameter structure ice: drop unnecessary VF parameter from several VSI functions ice: fix function comment referring to ice_vsi_alloc ice: Add more usage of existing function ice_get_vf_vsi(vf) ==================== Link: https://lore.kernel.org/r/20230206214813.20107-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07net: mana: Fix accessing freed irq affinity_hintHaiyang Zhang
After calling irq_set_affinity_and_hint(), the cpumask pointer is saved in desc->affinity_hint, and will be used later when reading /proc/irq/<num>/affinity_hint. So the cpumask variable needs to be persistent. Otherwise, we are accessing freed memory when reading the affinity_hint file. Also, need to clear affinity_hint before free_irq(), otherwise there is a one-time warning and stack trace during module unloading: [ 243.948687] WARNING: CPU: 10 PID: 1589 at kernel/irq/manage.c:1913 free_irq+0x318/0x360 ... [ 243.948753] Call Trace: [ 243.948754] <TASK> [ 243.948760] mana_gd_remove_irqs+0x78/0xc0 [mana] [ 243.948767] mana_gd_remove+0x3e/0x80 [mana] [ 243.948773] pci_device_remove+0x3d/0xb0 [ 243.948778] device_remove+0x46/0x70 [ 243.948782] device_release_driver_internal+0x1fe/0x280 [ 243.948785] driver_detach+0x4e/0xa0 [ 243.948787] bus_remove_driver+0x70/0xf0 [ 243.948789] driver_unregister+0x35/0x60 [ 243.948792] pci_unregister_driver+0x44/0x90 [ 243.948794] mana_driver_exit+0x14/0x3fe [mana] [ 243.948800] __do_sys_delete_module.constprop.0+0x185/0x2f0 To fix the bug, use the persistent mask, cpumask_of(cpu#), and set affinity_hint to NULL before freeing the IRQ, as required by free_irq(). Cc: stable@vger.kernel.org Fixes: 71fa6887eeca ("net: mana: Assign interrupts to CPUs based on NUMA nodes") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/1675718929-19565-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07Merge tag 'linux-can-fixes-for-6.2-20230207' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== can 2023-02-07 The patch is from Devid Antonio Filoni and fixes an address claiming problem in the J1939 CAN protocol. * tag 'linux-can-fixes-for-6.2-20230207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: j1939: do not wait 250 ms if the same addr was already claimed ==================== Link: https://lore.kernel.org/r/20230207140514.2885065-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07hv_netvsc: Allocate memory in netvsc_dma_map() with GFP_ATOMICMichael Kelley
Memory allocations in the network transmit path must use GFP_ATOMIC so they won't sleep. Reported-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/lkml/8a4d08f94d3e6fe8b6da68440eaa89a088ad84f9.camel@redhat.com/ Fixes: 846da38de0e8 ("net: netvsc: Add Isolation VM support for netvsc driver") Cc: stable@vger.kernel.org Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1675714317-48577-1-git-send-email-mikelley@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07devlink: Fix memleak in health diagnose callbackMoshe Shemesh
The callback devlink_nl_cmd_health_reporter_diagnose_doit() miss devlink_fmsg_free(), which leads to memory leak. Fix it by adding devlink_fmsg_free(). Fixes: e994a75fb7f9 ("devlink: remove reporter reference counting") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/1675698976-45993-1-git-send-email-moshe@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07nfp: flower: add check for flower VF netdevs for get/set_eepromJames Hershaw
Move the nfp_net_get_port_mac_by_hwinfo() check to ahead in the get/set_eeprom() functions to in order to check for a VF netdev, which this function does not support. It is debatable if this is a fix or an enhancement, and we have chosen to go for the latter. It does address a problem introduced by commit 74b4f1739d4e ("nfp: flower: change get/set_eeprom logic and enable for flower reps"). However, the ethtool->len == 0 check avoids the problem manifesting as a run-time bug (NULL pointer dereference of app). Signed-off-by: James Hershaw <james.hershaw@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230206154836.2803995-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07Merge branch 'mlxsw-misc-devlink-changes'Jakub Kicinski
Petr Machata says: ==================== mlxsw: Misc devlink changes This patchset adjusts mlxsw to recent devlink changes in net-next. Patch #1 removes a devl_param_driverinit_value_set() call that was unnecessary, but now additionally triggers a WARN_ON. Patches #2-#4 are non-functional preparations for the following patches. Patch #5 fixes a use-after-free that is triggered while changing network namespaces. Patch #6 makes mlxsw consistent with netdevsim by having mlxsw register its devlink instance before its sub-objects. It helps us avoid a warning described in the commit message. ==================== Link: https://lore.kernel.org/r/cover.1675692666.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07mlxsw: core: Register devlink instance before sub-objectsIdo Schimmel
Recent changes made it possible to register the devlink instance before its sub-objects and under the instance lock. Among other things, it allows us to avoid warnings such as this one [1]. The warning is generated because a buggy firmware is generating a health event during driver initialization, before the devlink instance is registered. Move the registration of the devlink instance to the beginning of the initialization flow to avoid such problems. A similar change was implemented in netdevsim in commit 82a3aef2e6af ("netdevsim: move devlink registration under the instance lock"). [1] WARNING: CPU: 3 PID: 49 at net/devlink/leftover.c:7509 devlink_recover_notify.constprop.0+0xaf/0xc0 [...] Call Trace: <TASK> devlink_health_report+0x45/0x1d0 mlxsw_core_health_event_work+0x24/0x30 [mlxsw_core] process_one_work+0x1db/0x390 worker_thread+0x49/0x3b0 kthread+0xe5/0x110 ret_from_fork+0x1f/0x30 </TASK> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07mlxsw: spectrum_acl_tcam: Move devlink param to TCAM codeIdo Schimmel
Cited commit added 'DEVLINK_CMD_PARAM_DEL' notifications whenever the network namespace of the devlink instance is changed. Specifically, the notifications are generated after calling reload_down(), but before calling reload_up(). At this stage, the data structures accessed while reading the value of the "acl_region_rehash_interval" devlink parameter are uninitialized, resulting in a use-after-free [1]. Fix by moving the registration and unregistration of the devlink parameter to the TCAM code where it is actually used. This means that the parameter is unregistered during reload_down() and then re-registered during reload_up(), avoiding the use-after-free between these two operations. Reproducer: # ip netns add test123 # devlink dev reload pci/0000:06:00.0 netns test123 [1] BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xb2/0xd0 Read of size 4 at addr ffff888162fd37d8 by task devlink/1323 [...] Call Trace: <TASK> dump_stack_lvl+0x95/0xbd print_report+0x181/0x4a1 kasan_report+0xdb/0x200 mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xb2/0xd0 mlxsw_sp_params_acl_region_rehash_intrvl_get+0x32/0x80 devlink_nl_param_fill.constprop.0+0x29a/0x11e0 devlink_param_notify.constprop.0+0xb9/0x250 devlink_notify_unregister+0xbc/0x470 devlink_reload+0x1aa/0x440 devlink_nl_cmd_reload+0x559/0x11b0 genl_family_rcv_msg_doit.isra.0+0x1f8/0x2e0 genl_rcv_msg+0x558/0x7f0 netlink_rcv_skb+0x170/0x440 genl_rcv+0x2d/0x40 netlink_unicast+0x53f/0x810 netlink_sendmsg+0x961/0xe80 __sys_sendto+0x2a4/0x420 __x64_sys_sendto+0xe5/0x1c0 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 7d7e9169a3ec ("devlink: move devlink reload notifications back in between _down() and _up() calls") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07mlxsw: spectrum_acl_tcam: Reorder functions to avoid forward declarationsIdo Schimmel
Move the initialization and de-initialization code further below in order to avoid forward declarations in the next patch. No functional changes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07mlxsw: spectrum_acl_tcam: Make fini symmetric to initIdo Schimmel
Move mutex_destroy() to the end to make the function symmetric with mlxsw_sp_acl_tcam_init(). No functional changes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07mlxsw: spectrum_acl_tcam: Add missing mutex_destroy()Ido Schimmel
Pair mutex_init() with a mutex_destroy() in the error path. Found during code review. No functional changes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07mlxsw: spectrum: Remove pointless call to devlink_param_driverinit_value_set()Danielle Ratson
The "acl_region_rehash_interval" devlink parameter is a "runtime" parameter, making the call to devl_param_driverinit_value_set() pointless. Before cited commit the function simply returned an error (that was not checked), but now it emits a WARNING [1]. Fix by removing the function call. [1] WARNING: CPU: 0 PID: 7 at net/devlink/leftover.c:10974 devl_param_driverinit_value_set+0x8c/0x90 [...] Call Trace: <TASK> mlxsw_sp2_params_register+0x83/0xb0 [mlxsw_spectrum] __mlxsw_core_bus_device_register+0x5e5/0x990 [mlxsw_core] mlxsw_core_bus_device_register+0x42/0x60 [mlxsw_core] mlxsw_pci_probe+0x1f0/0x230 [mlxsw_pci] local_pci_probe+0x1a/0x40 work_for_cpu_fn+0xf/0x20 process_one_work+0x1db/0x390 worker_thread+0x1d5/0x3b0 kthread+0xe5/0x110 ret_from_fork+0x1f/0x30 </TASK> Fixes: 85fe0b324c83 ("devlink: make devlink_param_driverinit_value_set() return void") Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07net: enetc: add support for MAC Merge statistics countersVladimir Oltean
Add PF driver support for the following: - Viewing the standardized MAC Merge layer counters. - Viewing the standardized Ethernet MAC and RMON counters associated with the pMAC. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230206094531.444988-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07net: enetc: add support for MAC Merge layerVladimir Oltean
Add PF driver support for viewing and changing the MAC Merge sublayer parameters, and seeing the verification state machine's current state. The verification handshake with the link partner is driven by hardware. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230206094531.444988-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07net/mlx5: Serialize module cleanup with reload and removeShay Drory
Currently, remove and reload flows can run in parallel to module cleanup. This design is error prone. For example: aux_drivers callbacks are called from both cleanup and remove flows with different lockings, which can cause a deadlock[1]. Hence, serialize module cleanup with reload and remove. [1] cleanup remove ------- ------ auxiliary_driver_unregister(); devl_lock() auxiliary_device_delete(mlx5e_aux) device_lock(mlx5e_aux) devl_lock() device_lock(mlx5e_aux) Fixes: 912cebf420c2 ("net/mlx5e: Connect ethernet part to auxiliary bus") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5: fw_tracer, Zero consumer index when reloading the tracerShay Drory
When tracer is reloaded, the device will log the traces at the beginning of the log buffer. Also, driver is reading the log buffer in chunks in accordance to the consumer index. Hence, zero consumer index when reloading the tracer. Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffersShay Drory
Whenever the driver is reading the string DBs into buffers, the driver is setting the load bit, but the driver never clears this bit. As a result, in case load bit is on and the driver query the device for new string DBs, the driver won't read again the string DBs. Fix it by clearing the load bit when query the device for new string DBs. Fixes: 2d69356752ff ("net/mlx5: Add support for fw live patch event") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5: Expose SF firmware pages counterMaher Sanalla
Currently, each core device has VF pages counter which stores number of fw pages used by its VFs and SFs. The current design led to a hang when performing firmware reset on DPU, where the DPU PFs stalled in sriov unload flow due to waiting on release of SFs pages instead of waiting on only VFs pages. Thus, Add a separate counter for SF firmware pages, which will prevent the stall scenario described above. Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5: Store page counters in a single arrayMaher Sanalla
Currently, an independent page counter is used for tracking memory usage for each function type such as VF, PF and host PF (DPU). For better code-readibilty, use a single array that stores the number of allocated memory pages for each function type. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5e: IPoIB, Show unknown speed instead of errorDragos Tatulea
ethtool is returning an error for unknown speeds for the IPoIB interface: $ ethtool ib0 netlink error: failed to retrieve link settings netlink error: Invalid argument netlink error: failed to retrieve link settings netlink error: Invalid argument Settings for ib0: Link detected: no After this change, ethtool will return success and show "unknown speed": $ ethtool ib0 Settings for ib0: Supported ports: [ ] Supported link modes: Not reported Supported pause frame use: No Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: Not reported Speed: Unknown! Duplex: Full Auto-negotiation: off Port: Other PHYAD: 0 Transceiver: internal Link detected: no Fixes: eb234ee9d541 ("net/mlx5e: IPoIB, Add support for get_link_ksettings in ethtool") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev modeAmir Tzin
Moving to switchdev mode with rx-vlan-filter on and then setting it off causes the kernel to crash since fs->vlan is freed during nic profile cleanup flow. RX VLAN filtering is not supported in switchdev mode so unset it when changing to switchdev and restore its value when switching back to legacy. trace: [] RIP: 0010:mlx5e_disable_cvlan_filter+0x43/0x70 [] set_feature_cvlan_filter+0x37/0x40 [mlx5_core] [] mlx5e_handle_feature+0x3a/0x60 [mlx5_core] [] mlx5e_set_features+0x6d/0x160 [mlx5_core] [] __netdev_update_features+0x288/0xa70 [] ethnl_set_features+0x309/0x380 [] ? __nla_parse+0x21/0x30 [] genl_family_rcv_msg_doit.isra.17+0x110/0x150 [] genl_rcv_msg+0x112/0x260 [] ? features_reply_size+0xe0/0xe0 [] ? genl_family_rcv_msg_doit.isra.17+0x150/0x150 [] netlink_rcv_skb+0x4e/0x100 [] genl_rcv+0x24/0x40 [] netlink_unicast+0x1ab/0x290 [] netlink_sendmsg+0x257/0x4f0 [] sock_sendmsg+0x5c/0x70 Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors") Signed-off-by: Amir Tzin <amirtz@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5: Bridge, fix ageing of peer FDB entriesVlad Buslov
SWITCHDEV_FDB_ADD_TO_BRIDGE event handler that updates FDB entry 'lastuse' field is only executed for eswitch that owns the entry. However, if peer entry processed packets at least once it will have hardware counter 'used' value greater than entry 'lastuse' from that point on, which will cause FDB entry not being aged out. Process the event on all eswitch instances. Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5: DR, Fix potential race in dr_rule_create_rule_nicYevgeny Kliteynik
Selecting builder should be protected by the lock to prevent the case where a new rule sets a builder in the nic_matcher while the previous rule is still using the nic_matcher. Fixing this issue and cleaning the error flow. Fixes: b9b81e1e9382 ("net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag changeAdham Faris
rq->hw_mtu is used in function en_rx.c/mlx5e_skb_from_cqe_mpwrq_linear() to catch oversized packets. If FCS is concatenated to the end of the packet then the check should be updated accordingly. Rx rings initialization (mlx5e_init_rxq_rq()) invoked for every new set of channels, as part of mlx5e_safe_switch_params(), unknowingly if it runs with default configuration or not. Current rq->hw_mtu initialization assumes default configuration and ignores params->scatter_fcs_en flag state. Fix this, by accounting for params->scatter_fcs_en flag state during rq->hw_mtu initialization. In addition, updating rq->hw_mtu value during ingress traffic might lead to packets drop and oversize_pkts_sw_drop counter increase with no good reason. Hence we remove this optimization and switch the set of channels with a new one, to make sure we don't get false positives on the oversize_pkts_sw_drop counter. Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag") Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07bpf, docs: Use consistent names for the same fieldDave Thaler
Use consistent names for the same field, e.g., 'dst' vs 'dst_reg'. Previously a mix of terms were used for the same thing in various cases. Signed-off-by: Dave Thaler <dthaler@microsoft.com> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230127224555.916-1-dthaler1968@googlemail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-07Merge branch 'sched-cpumask-improve-on-cpumask_local_spread-locality'Jakub Kicinski
Yury Norov says: ==================== sched: cpumask: improve on cpumask_local_spread() locality cpumask_local_spread() currently checks local node for presence of i'th CPU, and then if it finds nothing makes a flat search among all non-local CPUs. We can do it better by checking CPUs per NUMA hops. This has significant performance implications on NUMA machines, for example when using NUMA-aware allocated memory together with NUMA-aware IRQ affinity hints. Performance tests from patch 8 of this series for mellanox network driver show: TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on). Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121 +-------------------------+-----------+------------------+------------------+ | | BW (Gbps) | TX side CPU util | RX side CPU util | +-------------------------+-----------+------------------+------------------+ | Baseline | 52.3 | 6.4 % | 17.9 % | +-------------------------+-----------+------------------+------------------+ | Applied on TX side only | 52.6 | 5.2 % | 18.5 % | +-------------------------+-----------+------------------+------------------+ | Applied on RX side only | 94.9 | 11.9 % | 27.2 % | +-------------------------+-----------+------------------+------------------+ | Applied on both sides | 95.1 | 8.4 % | 27.3 % | +-------------------------+-----------+------------------+------------------+ Bottleneck in RX side is released, reached linerate (~1.8x speedup). ~30% less cpu util on TX. ==================== Link: https://lore.kernel.org/r/20230121042436.2661843-1-yury.norov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07lib/cpumask: update comment for cpumask_local_spread()Yury Norov
Now that we have an iterator-based alternative for a very common case of using cpumask_local_spread for all cpus in a row, it's worth to mention that in comment to cpumask_local_spread(). Signed-off-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hintsTariq Toukan
In the IRQ affinity hints, replace the binary NUMA preference (local / remote) with the improved for_each_numa_hop_cpu() API that minds the actual distances, so that remote NUMAs with short distance are preferred over farther ones. This has significant performance implications when using NUMA-aware allocated memory (follow [1] and derivatives for example). [1] drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel() int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix)); Performance tests: TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on). Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121 +-------------------------+-----------+------------------+------------------+ | | BW (Gbps) | TX side CPU util | RX side CPU util | +-------------------------+-----------+------------------+------------------+ | Baseline | 52.3 | 6.4 % | 17.9 % | +-------------------------+-----------+------------------+------------------+ | Applied on TX side only | 52.6 | 5.2 % | 18.5 % | +-------------------------+-----------+------------------+------------------+ | Applied on RX side only | 94.9 | 11.9 % | 27.2 % | +-------------------------+-----------+------------------+------------------+ | Applied on both sides | 95.1 | 8.4 % | 27.3 % | +-------------------------+-----------+------------------+------------------+ Bottleneck in RX side is released, reached linerate (~1.8x speedup). ~30% less cpu util on TX. * CPU util on active cores only. Setups details (similar for both sides): NIC: ConnectX6-DX dual port, 100 Gbps each. Single port used in the tests. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 256 On-line CPU(s) list: 0-255 Thread(s) per core: 2 Core(s) per socket: 64 Socket(s): 2 NUMA node(s): 16 Vendor ID: AuthenticAMD CPU family: 25 Model: 1 Model name: AMD EPYC 7763 64-Core Processor Stepping: 1 CPU MHz: 2594.804 BogoMIPS: 4890.73 Virtualization: AMD-V L1d cache: 32K L1i cache: 32K L2 cache: 512K L3 cache: 32768K NUMA node0 CPU(s): 0-7,128-135 NUMA node1 CPU(s): 8-15,136-143 NUMA node2 CPU(s): 16-23,144-151 NUMA node3 CPU(s): 24-31,152-159 NUMA node4 CPU(s): 32-39,160-167 NUMA node5 CPU(s): 40-47,168-175 NUMA node6 CPU(s): 48-55,176-183 NUMA node7 CPU(s): 56-63,184-191 NUMA node8 CPU(s): 64-71,192-199 NUMA node9 CPU(s): 72-79,200-207 NUMA node10 CPU(s): 80-87,208-215 NUMA node11 CPU(s): 88-95,216-223 NUMA node12 CPU(s): 96-103,224-231 NUMA node13 CPU(s): 104-111,232-239 NUMA node14 CPU(s): 112-119,240-247 NUMA node15 CPU(s): 120-127,248-255 .. $ numactl -H .. node distances: node 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0: 10 11 11 11 12 12 12 12 32 32 32 32 32 32 32 32 1: 11 10 11 11 12 12 12 12 32 32 32 32 32 32 32 32 2: 11 11 10 11 12 12 12 12 32 32 32 32 32 32 32 32 3: 11 11 11 10 12 12 12 12 32 32 32 32 32 32 32 32 4: 12 12 12 12 10 11 11 11 32 32 32 32 32 32 32 32 5: 12 12 12 12 11 10 11 11 32 32 32 32 32 32 32 32 6: 12 12 12 12 11 11 10 11 32 32 32 32 32 32 32 32 7: 12 12 12 12 11 11 11 10 32 32 32 32 32 32 32 32 8: 32 32 32 32 32 32 32 32 10 11 11 11 12 12 12 12 9: 32 32 32 32 32 32 32 32 11 10 11 11 12 12 12 12 10: 32 32 32 32 32 32 32 32 11 11 10 11 12 12 12 12 11: 32 32 32 32 32 32 32 32 11 11 11 10 12 12 12 12 12: 32 32 32 32 32 32 32 32 12 12 12 12 10 11 11 11 13: 32 32 32 32 32 32 32 32 12 12 12 12 11 10 11 11 14: 32 32 32 32 32 32 32 32 12 12 12 12 11 11 10 11 15: 32 32 32 32 32 32 32 32 12 12 12 12 11 11 11 10 $ cat /sys/class/net/ens5f0/device/numa_node 14 Affinity hints (127 IRQs): Before: 331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000 332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000 333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000 334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000 335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000 336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000 337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000 338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000 339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 347: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001 348: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002 349: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004 350: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000008 351: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000010 352: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000020 353: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000040 354: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080 355: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100 356: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000200 357: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000400 358: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000800 359: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00001000 360: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00002000 361: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00004000 362: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00008000 363: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00010000 364: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00020000 365: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00040000 366: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00080000 367: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00100000 368: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00200000 369: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00400000 370: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00800000 371: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,01000000 372: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,02000000 373: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,04000000 374: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,08000000 375: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,10000000 376: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,20000000 377: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,40000000 378: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,80000000 379: 00000000,00000000,00000000,00000000,00000000,00000000,00000001,00000000 380: 00000000,00000000,00000000,00000000,00000000,00000000,00000002,00000000 381: 00000000,00000000,00000000,00000000,00000000,00000000,00000004,00000000 382: 00000000,00000000,00000000,00000000,00000000,00000000,00000008,00000000 383: 00000000,00000000,00000000,00000000,00000000,00000000,00000010,00000000 384: 00000000,00000000,00000000,00000000,00000000,00000000,00000020,00000000 385: 00000000,00000000,00000000,00000000,00000000,00000000,00000040,00000000 386: 00000000,00000000,00000000,00000000,00000000,00000000,00000080,00000000 387: 00000000,00000000,00000000,00000000,00000000,00000000,00000100,00000000 388: 00000000,00000000,00000000,00000000,00000000,00000000,00000200,00000000 389: 00000000,00000000,00000000,00000000,00000000,00000000,00000400,00000000 390: 00000000,00000000,00000000,00000000,00000000,00000000,00000800,00000000 391: 00000000,00000000,00000000,00000000,00000000,00000000,00001000,00000000 392: 00000000,00000000,00000000,00000000,00000000,00000000,00002000,00000000 393: 00000000,00000000,00000000,00000000,00000000,00000000,00004000,00000000 394: 00000000,00000000,00000000,00000000,00000000,00000000,00008000,00000000 395: 00000000,00000000,00000000,00000000,00000000,00000000,00010000,00000000 396: 00000000,00000000,00000000,00000000,00000000,00000000,00020000,00000000 397: 00000000,00000000,00000000,00000000,00000000,00000000,00040000,00000000 398: 00000000,00000000,00000000,00000000,00000000,00000000,00080000,00000000 399: 00000000,00000000,00000000,00000000,00000000,00000000,00100000,00000000 400: 00000000,00000000,00000000,00000000,00000000,00000000,00200000,00000000 401: 00000000,00000000,00000000,00000000,00000000,00000000,00400000,00000000 402: 00000000,00000000,00000000,00000000,00000000,00000000,00800000,00000000 403: 00000000,00000000,00000000,00000000,00000000,00000000,01000000,00000000 404: 00000000,00000000,00000000,00000000,00000000,00000000,02000000,00000000 405: 00000000,00000000,00000000,00000000,00000000,00000000,04000000,00000000 406: 00000000,00000000,00000000,00000000,00000000,00000000,08000000,00000000 407: 00000000,00000000,00000000,00000000,00000000,00000000,10000000,00000000 408: 00000000,00000000,00000000,00000000,00000000,00000000,20000000,00000000 409: 00000000,00000000,00000000,00000000,00000000,00000000,40000000,00000000 410: 00000000,00000000,00000000,00000000,00000000,00000000,80000000,00000000 411: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000 412: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000 413: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000 414: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000 415: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000 416: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000 417: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000 418: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000 419: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000 420: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000 421: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000 422: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000 423: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000 424: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000 425: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000 426: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000 427: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000 428: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000 429: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000 430: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000 431: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000 432: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000 433: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000 434: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000 435: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000 436: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000 437: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000 438: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000 439: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000 440: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000 441: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000 442: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000 443: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000 444: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000 445: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000 446: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000 447: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000 448: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000 449: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000 450: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000 451: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000 452: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000 453: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000 454: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000 455: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000 456: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000 457: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000 After: 331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000 332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000 333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000 334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000 335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000 336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000 337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000 338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000 339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 347: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000 348: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000 349: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000 350: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000 351: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000 352: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000 353: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000 354: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000 355: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000 356: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000 357: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000 358: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000 359: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000 360: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000 361: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000 362: 00000000,00000000,00000000,00000000,00008000,00000000,00000000,00000000 363: 00000000,00000000,00000000,00000000,01000000,00000000,00000000,00000000 364: 00000000,00000000,00000000,00000000,02000000,00000000,00000000,00000000 365: 00000000,00000000,00000000,00000000,04000000,00000000,00000000,00000000 366: 00000000,00000000,00000000,00000000,08000000,00000000,00000000,00000000 367: 00000000,00000000,00000000,00000000,10000000,00000000,00000000,00000000 368: 00000000,00000000,00000000,00000000,20000000,00000000,00000000,00000000 369: 00000000,00000000,00000000,00000000,40000000,00000000,00000000,00000000 370: 00000000,00000000,00000000,00000000,80000000,00000000,00000000,00000000 371: 00000001,00000000,00000000,00000000,00000000,00000000,00000000,00000000 372: 00000002,00000000,00000000,00000000,00000000,00000000,00000000,00000000 373: 00000004,00000000,00000000,00000000,00000000,00000000,00000000,00000000 374: 00000008,00000000,00000000,00000000,00000000,00000000,00000000,00000000 375: 00000010,00000000,00000000,00000000,00000000,00000000,00000000,00000000 376: 00000020,00000000,00000000,00000000,00000000,00000000,00000000,00000000 377: 00000040,00000000,00000000,00000000,00000000,00000000,00000000,00000000 378: 00000080,00000000,00000000,00000000,00000000,00000000,00000000,00000000 379: 00000100,00000000,00000000,00000000,00000000,00000000,00000000,00000000 380: 00000200,00000000,00000000,00000000,00000000,00000000,00000000,00000000 381: 00000400,00000000,00000000,00000000,00000000,00000000,00000000,00000000 382: 00000800,00000000,00000000,00000000,00000000,00000000,00000000,00000000 383: 00001000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 384: 00002000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 385: 00004000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 386: 00008000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 387: 01000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 388: 02000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 389: 04000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 390: 08000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 391: 10000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 392: 20000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 393: 40000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 394: 80000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000 395: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000 396: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000 397: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000 398: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000 399: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000 400: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000 401: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000 402: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000 403: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000 404: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000 405: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000 406: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000 407: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000 408: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000 409: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000 410: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000 411: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000 412: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000 413: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000 414: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000 415: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000 416: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000 417: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000 418: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000 419: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000 420: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000 421: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000 422: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000 423: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000 424: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000 425: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000 426: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000 427: 00000000,00000001,00000000,00000000,00000000,00000000,00000000,00000000 428: 00000000,00000002,00000000,00000000,00000000,00000000,00000000,00000000 429: 00000000,00000004,00000000,00000000,00000000,00000000,00000000,00000000 430: 00000000,00000008,00000000,00000000,00000000,00000000,00000000,00000000 431: 00000000,00000010,00000000,00000000,00000000,00000000,00000000,00000000 432: 00000000,00000020,00000000,00000000,00000000,00000000,00000000,00000000 433: 00000000,00000040,00000000,00000000,00000000,00000000,00000000,00000000 434: 00000000,00000080,00000000,00000000,00000000,00000000,00000000,00000000 435: 00000000,00000100,00000000,00000000,00000000,00000000,00000000,00000000 436: 00000000,00000200,00000000,00000000,00000000,00000000,00000000,00000000 437: 00000000,00000400,00000000,00000000,00000000,00000000,00000000,00000000 438: 00000000,00000800,00000000,00000000,00000000,00000000,00000000,00000000 439: 00000000,00001000,00000000,00000000,00000000,00000000,00000000,00000000 440: 00000000,00002000,00000000,00000000,00000000,00000000,00000000,00000000 441: 00000000,00004000,00000000,00000000,00000000,00000000,00000000,00000000 442: 00000000,00008000,00000000,00000000,00000000,00000000,00000000,00000000 443: 00000000,00010000,00000000,00000000,00000000,00000000,00000000,00000000 444: 00000000,00020000,00000000,00000000,00000000,00000000,00000000,00000000 445: 00000000,00040000,00000000,00000000,00000000,00000000,00000000,00000000 446: 00000000,00080000,00000000,00000000,00000000,00000000,00000000,00000000 447: 00000000,00100000,00000000,00000000,00000000,00000000,00000000,00000000 448: 00000000,00200000,00000000,00000000,00000000,00000000,00000000,00000000 449: 00000000,00400000,00000000,00000000,00000000,00000000,00000000,00000000 450: 00000000,00800000,00000000,00000000,00000000,00000000,00000000,00000000 451: 00000000,01000000,00000000,00000000,00000000,00000000,00000000,00000000 452: 00000000,02000000,00000000,00000000,00000000,00000000,00000000,00000000 453: 00000000,04000000,00000000,00000000,00000000,00000000,00000000,00000000 454: 00000000,08000000,00000000,00000000,00000000,00000000,00000000,00000000 455: 00000000,10000000,00000000,00000000,00000000,00000000,00000000,00000000 456: 00000000,20000000,00000000,00000000,00000000,00000000,00000000,00000000 457: 00000000,40000000,00000000,00000000,00000000,00000000,00000000,00000000 Signed-off-by: Tariq Toukan <tariqt@nvidia.com> [Tweaked API use] Suggested-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07sched/topology: Introduce for_each_numa_hop_mask()Valentin Schneider
The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs reachable within a given distance budget, wrap the logic for iterating over all (distance, mask) values inside an iterator macro. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07sched/topology: Introduce sched_numa_hop_mask()Valentin Schneider
Tariq has pointed out that drivers allocating IRQ vectors would benefit from having smarter NUMA-awareness - cpumask_local_spread() only knows about the local node and everything outside is in the same bucket. sched_domains_numa_masks is pretty much what we want to hand out (a cpumask of CPUs reachable within a given distance budget), introduce sched_numa_hop_mask() to export those cpumasks. Link: http://lore.kernel.org/r/20220728191203.4055-1-tariqt@nvidia.com Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07lib/cpumask: reorganize cpumask_local_spread() logicYury Norov
Now after moving all NUMA logic into sched_numa_find_nth_cpu(), else-branch of cpumask_local_spread() is just a function call, and we can simplify logic by using ternary operator. While here, replace BUG() with WARN_ON(). Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07cpumask: improve on cpumask_local_spread() localityYury Norov
Switch cpumask_local_spread() to use newly added sched_numa_find_nth_cpu(), which takes into account distances to each node in the system. For the following NUMA configuration: root@debian:~# numactl -H available: 4 nodes (0-3) node 0 cpus: 0 1 2 3 node 0 size: 3869 MB node 0 free: 3740 MB node 1 cpus: 4 5 node 1 size: 1969 MB node 1 free: 1937 MB node 2 cpus: 6 7 node 2 size: 1967 MB node 2 free: 1873 MB node 3 cpus: 8 9 10 11 12 13 14 15 node 3 size: 7842 MB node 3 free: 7723 MB node distances: node 0 1 2 3 0: 10 50 30 70 1: 50 10 70 30 2: 30 70 10 50 3: 70 30 50 10 The new cpumask_local_spread() traverses cpus for each node like this: node 0: 0 1 2 3 6 7 4 5 8 9 10 11 12 13 14 15 node 1: 4 5 8 9 10 11 12 13 14 15 0 1 2 3 6 7 node 2: 6 7 0 1 2 3 8 9 10 11 12 13 14 15 4 5 node 3: 8 9 10 11 12 13 14 15 4 5 6 7 0 1 2 3 Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07sched: add sched_numa_find_nth_cpu()Yury Norov
The function finds Nth set CPU in a given cpumask starting from a given node. Leveraging the fact that each hop in sched_domains_numa_masks includes the same or greater number of CPUs than the previous one, we can use binary search on hops instead of linear walk, which makes the overall complexity of O(log n) in terms of number of cpumask_weight() calls. Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07cpumask: introduce cpumask_nth_and_andnotYury Norov
Introduce cpumask_nth_and_andnot() based on find_nth_and_andnot_bit(). It's used in the following patch to traverse cpumasks without storing intermediate result in temporary cpumask. Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07lib/find: introduce find_nth_and_andnot_bitYury Norov
In the following patches the function is used to implement in-place bitmaps traversing without storing intermediate result in temporary bitmaps. Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07net/mlx5: fw_tracer, Add support for unrecognized stringShay Drory
In case FW is publishing a string which isn't found in the driver's string DBs, keep the string as raw data. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5: fw_tracer, Add support for strings DB update eventShay Drory
In case a new string DB is added to the FW, the FW publishes an event notifying the strings DB have updated. Add support in driver for handling this event. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5: fw_tracer, allow 0 size string DBsShay Drory
Device can expose string DB of size 0 which means this string DB is currently not in use. Therefore, allow for 0 size string DBs. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5: fw_tracer: Fix debug printShay Drory
The debug message specify tdsn, but takes as an argument the tmsn. The correct argument is tmsn, hence, fix the print. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5: fs, Remove redundant assignment of sizeRoi Dayan
size is being reassigned in the line after. remove the redundant assignment. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net/mlx5: fs_core, Remove redundant variable errMaor Dickman
Local variable "err" is not used so it is safe to remove. Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>