Age | Commit message | Author
2023-10-14 | net/mlx5e: Refactor mlx5e_rss_set_rxfh() and mlx5e_rss_get_rxfh() | Adham Faris
Initialize the indirect table array with memcpy() rather than a for loop. This change was made for two reasons: 1) To be consistent with the indirect table array init in mlx5e_rss_set_rxfh(). 2) In general, memcpy() is preferred over a for loop for array initialization. Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
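The difference between the two initialization styles can be sketched in plain C. This is an illustration only, not the driver code: `INDIR_TABLE_SIZE`, `indir_copy_loop()` and `indir_copy_memcpy()` are hypothetical names, and the table size is an arbitrary assumption.

```c
#include <stdint.h>
#include <string.h>

#define INDIR_TABLE_SIZE 8 /* hypothetical size, for illustration only */

/* Element-by-element copy: correct, but more code to read and maintain. */
static void indir_copy_loop(uint32_t *dst, const uint32_t *src)
{
	for (size_t i = 0; i < INDIR_TABLE_SIZE; i++)
		dst[i] = src[i];
}

/* Same effect with one memcpy(); the byte count follows the element type. */
static void indir_copy_memcpy(uint32_t *dst, const uint32_t *src)
{
	memcpy(dst, src, INDIR_TABLE_SIZE * sizeof(*dst));
}
```

Both helpers leave `dst` with identical contents; the memcpy() form simply states the intent (copy the whole table) in a single call.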
2023-10-14 | net/mlx5e: Refactor rx_res_init() and rx_res_free() APIs | Adham Faris
Refactor mlx5e_rx_res_init() and mlx5e_rx_res_free() by wrapping the mlx5e_rx_res_alloc() and mlx5e_rx_res_destroy() APIs, respectively. Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 | net/mlx5e: Use PTR_ERR_OR_ZERO() to simplify code | Yu Liao
Use the standard error pointer macro to shorten the code and simplify. Signed-off-by: Yu Liao <liaoyu15@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 | net/mlx5: Use PTR_ERR_OR_ZERO() to simplify code | Jinjie Ruan
Return PTR_ERR_OR_ZERO() instead of return 0 or PTR_ERR() to simplify code. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
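The simplification behind the two PTR_ERR_OR_ZERO() commits can be sketched in userspace C. The helpers below are simplified stand-ins for the kernel's `<linux/err.h>` macros, and `create_object()`, `setup_verbose()` and `setup_short()` are hypothetical names invented for this illustration.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's <linux/err.h> helpers (simplified). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* Hypothetical constructor: returns an object or an error pointer. */
static int dummy_object;
static void *create_object(int fail)
{
	return fail ? ERR_PTR(-12 /* -ENOMEM */) : &dummy_object;
}

/* Before: open-coded check. */
static int setup_verbose(int fail)
{
	void *obj = create_object(fail);

	if (IS_ERR(obj))
		return (int)PTR_ERR(obj);
	return 0;
}

/* After: the same logic in one line. */
static int setup_short(int fail)
{
	return PTR_ERR_OR_ZERO(create_object(fail));
}
```

Both variants return 0 on success and the negative errno on failure, so the replacement is behavior-preserving when the pointer itself is not needed afterwards.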
2023-10-14 | net/mlx5: fix config name in Kconfig parameter documentation | Lukas Bulwahn
Commit a12ba19269d7 ("net/mlx5: Update Kconfig parameter documentation") adds documentation on Kconfig options for the mlx5 driver. It refers to the config MLX5_EN_MACSEC for MACSec offloading, but the config is actually called MLX5_MACSEC. Fix the reference to the right config name in the documentation. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 | net/mlx5: Remove unused declaration | Yue Haibing
Commit 2ac9cfe78223 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path") declared mlx5e_ipsec_inverse_table_init() but never implemented it. Commit f52f2faee581 ("net/mlx5e: Introduce flow steering API") declared mlx5e_fs_set_tc() but never implemented it. Commit f2f3df550139 ("net/mlx5: EQ, Privatize eq_table and friends") declared mlx5_eq_comp_cpumask() but never implemented it. Commit cac1eb2cf2e3 ("net/mlx5: Lag, properly lock eswitch if needed") removed mlx5_lag_update() but not its declaration. Commit 35ba005d820b ("net/mlx5: DR, Set flex parser for TNL_MPLS dynamically") removed mlx5dr_ste_build_tnl_mpls() but not its declaration. Commit e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") declared but never implemented mlx5_alloc_cmd_mailbox_chain() and mlx5_free_cmd_mailbox_chain(). Commit 0cf53c124756 ("net/mlx5: FWPage, Use async events chain") removed mlx5_core_req_pages_handler() but not its declaration. Commit 938fe83c8dcb ("net/mlx5_core: New device capabilities handling") removed mlx5_query_odp_caps() but not its declaration. Commit f6a8a19bb11b ("RDMA/netdev: Hoist alloc_netdev_mqs out of the driver") removed mlx5_rdma_netdev_alloc() but not its declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 | net/mlx5: Replace global mlx5_intf_lock with HCA devcom component lock | Shay Drory
mlx5_intf_lock is used to sync between LAG changes and changes to the aux devices of its slave mlx5 core devs. This means that every time an mlx5 core dev adds or removes aux devices, mlx5 takes this global lock, even if LAG functionality isn't supported over the core dev. This causes a bottleneck when probing VFs/SFs in parallel. Hence, replace mlx5_intf_lock with the HCA devcom component lock, or with no lock if LAG functionality isn't supported. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 | net/mlx5: Refactor LAG peer device lookout bus logic to mlx5 devcom | Shay Drory
LAG peer device lookout bus logic required the usage of global lock, mlx5_intf_mutex. As part of the effort to remove this global lock, refactor LAG peer device lookout to use mlx5 devcom layer. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 | net/mlx5: Avoid false positive lockdep warning by adding lock_class_key | Shay Drory
A downstream patch will add a devcom component which will be locked in many places. This can lead to a false positive "possible circular locking dependency" warning by lockdep in flows which lock more than one mlx5 devcom component, such as probing the ETH aux device. Hence, add a lock_class_key per mlx5 device. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 | net/mlx5: Redesign SF active work to remove table_lock | Wei Zhang
active_work is a work item that iterates over all possible SF devices whose SF port representors are located on a different function and, if an SF is in the active state, probes it. Currently, the active_work in active_wq is synced with mlx5_vhca_events_work via table_lock, and this lock causes a performance bottleneck. To remove table_lock, redesign the active_wq logic so that it now pushes one active_work per SF to mlx5_vhca_events_workqueues. Since the latter workqueues are ordered, an active_work and an mlx5_vhca_events_work with the same index will be pushed into the same workqueue, which completely eliminates the need for a lock. Signed-off-by: Wei Zhang <weizhang@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
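The idea that ordered queues can replace a lock can be sketched with a toy model in plain C. This is not the driver's code: the queue count, the modulo mapping from SF index to queue, and every name below are assumptions made for illustration. The point is only that two work items with the same index always land in the same FIFO, so they are serialized by the queue itself.

```c
#include <stddef.h>

#define NUM_QUEUES 4   /* hypothetical number of ordered queues */
#define QUEUE_CAP  16  /* hypothetical per-queue capacity */

enum work_kind { VHCA_EVENT_WORK, ACTIVE_WORK };

struct work {
	unsigned int sf_index;
	enum work_kind kind;
};

/* A FIFO: items run one at a time, in the order they were queued. */
struct ordered_queue {
	struct work items[QUEUE_CAP];
	size_t len;
};

/* Assumed mapping: the same SF index always selects the same queue. */
static struct ordered_queue *pick_queue(struct ordered_queue *queues,
					unsigned int sf_index)
{
	return &queues[sf_index % NUM_QUEUES];
}

/* Returns 0 on success, -1 if the chosen queue is full. */
static int push_work(struct ordered_queue *queues, struct work w)
{
	struct ordered_queue *q = pick_queue(queues, w.sf_index);

	if (q->len == QUEUE_CAP)
		return -1;
	q->items[q->len++] = w;
	return 0;
}
```

In this model, work for different indices may proceed in parallel on different queues, while work for one index keeps its submission order without any shared lock.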
2023-10-14 | net/mlx5: Parallelize vhca event handling | Wei Zhang
At present, the mlx5 driver has a general-purpose event handler which handles not only vhca events but also many other events. This creates a huge bottleneck because the event handler is implemented by a single-threaded workqueue, so all events are forced to be handled serially even when an application tries to create multiple SFs simultaneously. Introduce a dedicated vhca event handler which manages parallel creation of SFs. Signed-off-by: Wei Zhang <weizhang@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-14 | wifi: rtw89: mac: do bf_monitor only if WiFi 6 chips | Zong-Zhe Yang
Beamforming monitor is used to adjust registers to fine tune performance and power save, and currently only existing WiFi 6 chips need it. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231012021455.19816-7-pkshih@realtek.com
2023-10-14 | wifi: rtw89: mac: set bf_assoc capabilities according to chip gen | Zong-Zhe Yang
When associated peer has beamformer capability, we should enable beamformee, set CSI parameter, and configure rate to send CSI packets. Since registers of WiFi 7 chips are very different from existing chips, separate configuration functions. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231012021455.19816-6-pkshih@realtek.com
2023-10-14 | wifi: rtw89: mac: set bfee_ctrl() according to chip gen | Zong-Zhe Yang
When the associated peer has beamformer capability, enable the hardware beamformee function so that the hardware can run the sounding protocol itself. Conversely, disable this function when disassociated. Define different registers for the WiFi 6 and WiFi 7 generations. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231012021455.19816-5-pkshih@realtek.com
2023-10-14 | wifi: rtw89: mac: add registers of MU-EDCA parameters for WiFi 7 chips | Ping-Ke Shih
According to chip generation, set MU-EDCA parameters from mac80211 when connected. Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231012021455.19816-4-pkshih@realtek.com
2023-10-14 | wifi: rtw89: mac: generalize register of MU-EDCA switch according to chip gen | Zong-Zhe Yang
When connected with 802.11ax AP, MU-EDCA parameters are given, so enable this hardware function by registers according to chip generation. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231012021455.19816-3-pkshih@realtek.com
2023-10-14 | wifi: rtw89: mac: update RTS threshold according to chip gen | Zong-Zhe Yang
When the TX size or time of a packet exceeds the RTS threshold set by this register, the hardware uses RTS protection automatically. Since WiFi 6 and WiFi 7 chips use different register addresses for this, select the address according to chip gen. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231012021455.19816-2-pkshih@realtek.com
2023-10-14 | wifi: rtlwifi: simplify TX command fill callbacks | Dmitry Antipov
Since 'rtlpriv->cfg->ops->fill_tx_cmddesc()' is always called with 'firstseg' and 'lastseg' set to 1 (and the latter is never actually used), all of the relevant chip-specific routines may be simplified. Compile tested only. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231011154442.52457-2-dmantipov@yandex.ru
2023-10-14 | wifi: hostap: remove unused ioctl function | Arnd Bergmann
The ioctl handler has no actual callers in the kernel and is useless. All the functionality should be reachable through the regular interfaces. Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231011140225.253106-9-arnd@kernel.org
2023-10-14 | wifi: atmel: remove unused ioctl function | Arnd Bergmann
This function has no callers, and for the past 20 years, the request_firmware interface has been in place instead of the custom firmware loader. Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231011140225.253106-8-arnd@kernel.org
2023-10-13 | appletalk: remove special handling code for ipddp | Lukas Bulwahn
After commit 1dab47139e61 ("appletalk: remove ipddp driver") removes the config IPDDP, there is some minor code clean-up possible in the appletalk network layer. Remove some code in appletalk layer after the ipddp driver is gone. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231012063443.22368-1-lukas.bulwahn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | qed: replace uses of strncpy | Justin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. This patch eliminates three uses of strncpy(): Firstly, `dest` is expected to be NUL-terminated which is evident by the manual setting of a NUL-byte at size - 1. For this use specifically, strscpy() is a viable replacement due to the fact that it guarantees NUL-termination on the destination buffer. The next two cases should simply be memcpy() as the size of the src string is always 3 and the destination string just wants the first 3 bytes changed. To be clear, there are no buffer overread bugs in the current code as the sizes and offsets are carefully managed such that buffers are NUL-terminated. However, with these changes, the code is now more robust and less ambiguous (and hopefully easier to read). Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231012-strncpy-drivers-net-ethernet-qlogic-qed-qed_debug-c-v2-1-16d2c0162b80@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
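The two replacement patterns described above (a NUL-terminating bounded copy, and a fixed three-byte overwrite) can be sketched in userspace C. `strscpy_sketch()` is a simplified stand-in for the kernel's strscpy(), returning -1 on truncation where the kernel returns -E2BIG; `patch_three_bytes()` is a hypothetical helper, not the qed code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for the kernel's strscpy(): copies at most size - 1
 * bytes, always NUL-terminates, and returns the copied length, or -1 if
 * the source had to be truncated.
 */
static long strscpy_sketch(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -1;
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	return src[i] == '\0' ? (long)i : -1;
}

/* Overwrite exactly three bytes inside an existing, terminated string. */
static void patch_three_bytes(char *buf, size_t offset, const char *tag)
{
	memcpy(buf + offset, tag, 3); /* no terminator written, none wanted */
}
```

Unlike strncpy(), the bounded copy never leaves the destination unterminated and does not pad the remainder with NUL bytes; the memcpy() case makes it explicit that only the three bytes are meant to change.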
2023-10-13 | r8169: fix rare issue with broken rx after link-down on RTL8125 | Heiner Kallweit
In very rare cases (I've seen two reports so far about different RTL8125 chip versions) it seems the MAC locks up when link goes down and requires a software reset to get revived. Realtek doesn't publish hw errata information, therefore the root cause is unknown. Realtek vendor drivers do a full hw re-initialization on each link-up event, the slimmed-down variant here was reported to fix the issue for the reporting user. It's not fully clear which parts of the NIC are reset as part of the software reset, therefore I can't rule out side effects. Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125") Reported-by: Martin Kjær Jørgensen <me@lagy.org> Link: https://lore.kernel.org/netdev/97ec2232-3257-316c-c3e7-a08192ce16a6@gmail.com/T/ Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/9edde757-9c3b-4730-be3b-0ef3a374ff71@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | Merge branch 'net-netconsole-configfs-entries-for-boot-target' | Jakub Kicinski
Breno Leitao says:

====================
net: netconsole: configfs entries for boot target

There is a limitation in netconsole: it is impossible to disable or modify a target created from the command line parameter (netconsole=...).

The "netconsole" cmdline parameter sets the remote IP, and if the remote IP changes, the machine needs to be rebooted (with the new remote IP set in the command line parameter).

This series allows the user to modify a target without the need to restart the machine.

This functionality sits on top of the dynamic target reconfiguration that is already implemented in netconsole.

The way to modify a boot time target is to create specially named configfs directories, which will be associated with the targets coming from `netconsole=...`.

Example:

Let's suppose you have two netconsole targets defined at boot time::

  netconsole=4444@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc;4444@10.0.0.1/eth1,9353@10.0.0.3/12:34:56:78:9a:bc

You can modify these targets at runtime by creating the following targets::

  $ mkdir cmdline1
  $ cat cmdline1/remote_ip
  10.0.0.3
  $ echo 0 > cmdline1/enabled
  $ echo 10.0.0.4 > cmdline1/remote_ip
  $ echo 1 > cmdline1/enabled
====================

Link: https://lore.kernel.org/r/20231012111401.333798-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | Documentation: netconsole: add support for cmdline targets | Breno Leitao
With the previous patches, there is no longer any limitation on modifying the targets created at boot time (or module load time). Document how to create the configfs directories needed to modify these netconsole targets. The design discussion about this topic can be found at: https://lore.kernel.org/all/ZRWRal5bW93px4km@gmail.com/ Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20231012111401.333798-5-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | netconsole: Attach cmdline target to dynamic target | Breno Leitao
Enable the attachment of a dynamic target to the target created during boot time. The boot-time targets are named as "cmdline\d", where "\d" is a number starting at 0. If the user creates a dynamic target named "cmdline0", it will attach to the first target created at boot time (as defined in the `netconsole=...` command line argument). `cmdline1` will attach to the second target and so forth. If there is no netconsole target created at boot time, then, the target name could be reused. Relevant design discussion: https://lore.kernel.org/all/ZRWRal5bW93px4km@gmail.com/ Suggested-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20231012111401.333798-4-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
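The "cmdline\d" naming scheme can be illustrated with a small userspace parser. This is only a sketch of the naming convention, not how netconsole resolves targets internally; `cmdline_target_index()` is a hypothetical helper, and the upper bound on the index is an arbitrary assumption.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: maps a configfs directory name such as "cmdline1"
 * to the zero-based position of the boot-time target it refers to.
 * Returns -1 for names that do not follow the "cmdline<N>" pattern.
 */
static int cmdline_target_index(const char *name)
{
	static const char prefix[] = "cmdline";
	size_t plen = sizeof(prefix) - 1;
	int idx = 0;

	if (strncmp(name, prefix, plen) != 0)
		return -1;
	name += plen;
	if (*name == '\0')
		return -1; /* a number is required after the prefix */
	for (; *name != '\0'; name++) {
		if (*name < '0' || *name > '9')
			return -1;
		idx = idx * 10 + (*name - '0');
		if (idx > 9999) /* arbitrary sanity limit for this sketch */
			return -1;
	}
	return idx;
}
```

With this mapping, "cmdline0" refers to the first target given in `netconsole=...` and "cmdline1" to the second, while any other directory name denotes an ordinary dynamic target.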
2023-10-13 | netconsole: Initialize configfs_item for default targets | Breno Leitao
For netconsole targets allocated during the boot time (passing netconsole=... argument), netconsole_target->item is not initialized. That is not a problem because it is not used inside configfs. An upcoming patch will be using it, thus, initialize the targets with the name 'cmdline' plus a counter starting from 0. This name will match entries in the configfs later. Suggested-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20231012111401.333798-3-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | netconsole: move init/cleanup functions lower | Breno Leitao
Move alloc_param_target() and its counterpart (free_param_target()) to the bottom of the file. These functions are called mostly at initialization/cleanup of the module, and they should be just above the callers, at the bottom of the file. From a practical perspective, having alloc_param_target() at the bottom of the file will avoid forward declaration later (in the following patch). Nothing changed other than the functions location. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20231012111401.333798-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | sfc: replace deprecated strncpy with strscpy | Justin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. `desc` is expected to be NUL-terminated as evident by the manual NUL-byte assignment. Moreover, NUL-padding does not seem to be necessary. The only caller of efx_mcdi_nvram_metadata() is efx_devlink_info_nvram_partition() which provides a NULL for `desc`: | rc = efx_mcdi_nvram_metadata(efx, partition_type, NULL, version, NULL, 0); Due to this, I am not sure this code is even reached but we should still favor something other than strncpy. Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231012-strncpy-drivers-net-ethernet-sfc-mcdi-c-v1-1-478c8de1039d@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | net: phy: tja11xx: replace deprecated strncpy with ethtool_sprintf | Justin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. ethtool_sprintf() is designed specifically for get_strings() usage. Let's replace strncpy in favor of this dedicated helper function. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231012-strncpy-drivers-net-phy-nxp-tja11xx-c-v1-1-5ad6c9dff5c4@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | ionic: replace deprecated strncpy with strscpy | Justin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. NUL-padding is not needed due to `ident` being memset'd to 0 just before the copy. Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231011-strncpy-drivers-net-ethernet-pensando-ionic-ionic_main-c-v1-1-23c62a16ff58@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | net: sparx5: replace deprecated strncpy with ethtool_sprintf | Justin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. ethtool_sprintf() is designed specifically for get_strings() usage. Let's replace strncpy() in favor of this more robust and easier to understand interface. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231011-strncpy-drivers-net-ethernet-microchip-sparx5-sparx5_ethtool-c-v1-1-410953d07f42@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | net/mlx4_core: replace deprecated strncpy with strscpy | Justin Stitt
`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect `dst` to be NUL-terminated based on its use with format strings: | mlx4_dbg(dev, "Reporting Driver Version to FW: %s\n", dst); Moreover, NUL-padding is not required. Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231011-strncpy-drivers-net-ethernet-mellanox-mlx4-fw-c-v1-1-4d7b5d34c933@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | nfp: replace deprecated strncpy with strscpy | Justin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect res->name to be NUL-terminated based on its usage with format strings: | dev_err(cpp->dev.parent, "Dangling area: %d:%d:%d:0x%0llx-0x%0llx%s%s\n", | NFP_CPP_ID_TARGET_of(res->cpp_id), | NFP_CPP_ID_ACTION_of(res->cpp_id), | NFP_CPP_ID_TOKEN_of(res->cpp_id), | res->start, res->end, | res->name ? " " : "", | res->name ? res->name : ""); ... and with strcmp() | if (!strcmp(res->name, NFP_RESOURCE_TBL_NAME)) { Moreover, NUL-padding is not required as `res` is already zero-allocated: | res = kzalloc(sizeof(*res), GFP_KERNEL); Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Let's also opt to use the more idiomatic strscpy() usage of (dest, src, sizeof(dest)) rather than (dest, src, SOME_LEN). Typically the pattern of 1) allocate memory for string, 2) copy string into freshly-allocated memory is a candidate for kmemdup_nul() but in this case we are allocating the entirety of the `res` struct and that should stay as is. As mentioned above, simple 1:1 replacement of strncpy -> strscpy :) Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Louis Peens <louis.peens@corigine.com> Link: https://lore.kernel.org/r/20231011-strncpy-drivers-net-ethernet-netronome-nfp-nfpcore-nfp_resource-c-v1-1-7d1c984f0eba@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | mlxsw: pci: Allocate skbs using GFP_KERNEL during initialization | Ido Schimmel
The driver allocates skbs during initialization and during Rx processing. Take advantage of the fact that the former happens in process context and allocate the skbs using GFP_KERNEL to decrease the probability of allocation failure. Tested with CONFIG_DEBUG_ATOMIC_SLEEP=y. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/dfa6ed0926e045fe7c14f0894cc0c37fee81bf9d.1697034729.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | octeontx2-af: Enable hardware timestamping for VFs | Subbaraya Sundeep
Currently, for VFs, the mailbox returns an ENODEV error when enabling hardware timestamping is requested. This patch fixes that issue. The patch was also modified to return an EPERM error for PFs/VFs which are not attached to CGX/RPM. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231011121551.1205211-1-saikrishnag@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | Merge branch 'wangxun-ethtool-stats' | Jakub Kicinski
Jiawen Wu says: ==================== Wangxun ethtool stats Add support for showing ethtool stats for txgbe/ngbe. ==================== Link: https://lore.kernel.org/r/20231011091906.70486-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | net: ngbe: add ethtool stats support | Jiawen Wu
Add support for showing ethtool statistics. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://lore.kernel.org/r/20231011091906.70486-4-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | net: txgbe: add ethtool stats support | Jiawen Wu
Add support for showing ethtool statistics. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://lore.kernel.org/r/20231011091906.70486-3-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | net: libwx: support hardware statistics | Jiawen Wu
Implement update and clear Rx/Tx statistics. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://lore.kernel.org/r/20231011091906.70486-2-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13 | net: dsa: vsc73xx: replace deprecated strncpy with ethtool_sprintf | Justin Stitt
`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. ethtool_sprintf() is designed specifically for get_strings() usage. Let's replace strncpy in favor of this more robust and easier to understand interface. This change could result in misaligned strings when if(cnt) fails. To combat this, use ternary to place empty string in buffer and properly increment pointer to next string slot. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231010-strncpy-drivers-net-dsa-vitesse-vsc73xx-core-c-v2-1-ba4416a9ff23@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
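The alignment concern mentioned above can be sketched in userspace C. `ethtool_sprintf_sketch()` is a simplified stand-in for the kernel's ethtool_sprintf(), and `fill_strings()` with its NULL-means-no-label convention is a hypothetical example rather than the vsc73xx code. Writing an empty string still advances the pointer, so later names stay in their own fixed-size slots.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

#define ETH_GSTRING_LEN 32 /* same value as the kernel's uapi definition */

/* Simplified userspace stand-in for the kernel's ethtool_sprintf(). */
static void ethtool_sprintf_sketch(uint8_t **data, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	vsnprintf((char *)*data, ETH_GSTRING_LEN, fmt, args);
	va_end(args);
	*data += ETH_GSTRING_LEN; /* always advance to the next slot */
}

/* Hypothetical table walk; a NULL name marks a counter without a label. */
static void fill_strings(uint8_t *buf, const char *const *names, int n)
{
	for (int i = 0; i < n; i++)
		ethtool_sprintf_sketch(&buf, "%s", names[i] ? names[i] : "");
}
```

Skipping the call entirely for an unlabeled counter would shift every following name one slot earlier, which is the misalignment the ternary avoids.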
2023-10-13 | Merge branch 'Open-coded task_vma iter' | Andrii Nakryiko
Dave Marchevsky says:

====================
At Meta we have a profiling daemon which periodically collects information on many hosts. This collection usually involves grabbing stacks (user and kernel) using perf_event BPF progs and later symbolicating them. For user stacks we try to use BPF_F_USER_BUILD_ID and rely on remote symbolication, but BPF_F_USER_BUILD_ID doesn't always succeed. In those cases we must fall back to digging around in /proc/PID/maps to map virtual address to (binary, offset). The /proc/PID/maps digging does not occur synchronously with stack collection, so the process might already be gone, in which case it won't have /proc/PID/maps and we will fail to symbolicate.

This 'exited process problem' doesn't occur very often as most of the prod services we care to profile are long-lived daemons, but there are enough usecases to warrant a workaround: a BPF program which can be optionally loaded at data collection time and essentially walks /proc/PID/maps. Currently this is done by walking the vma list:

  struct vm_area_struct* mmap = BPF_CORE_READ(mm, mmap);
  mmap_next = BPF_CORE_READ(mmap, vm_next); /* in a loop */

Since commit 763ecb035029 ("mm: remove the vma linked list") there's no longer a vma linked list to walk. Walking the vma maple tree is not as simple as hopping struct vm_area_struct->vm_next. Luckily, commit f39af05949a4 ("mm: add VMA iterator"), another commit in that series, added struct vma_iterator and for_each_vma macro for easy vma iteration. If similar functionality was exposed to BPF programs, it would be perfect for our usecase.

This series adds such functionality, specifically a BPF equivalent of for_each_vma using the open-coded iterator style.

Notes:
  * This approach was chosen after discussion on a previous series [0] which attempted to solve the same problem by adding a BPF_F_VMA_NEXT flag to bpf_find_vma.
  * Unlike the task_vma bpf_iter, the open-coded iterator kfuncs here do not drop the vma read lock between iterations. See Alexei's response in [0].
  * The [vsyscall] page isn't really part of task->mm's vmas, but /proc/PID/maps returns information about it anyways. The vma iter added here does not do the same. See comment on selftest in patch 3.
  * bpf_iter_task_vma allocates a _data struct which contains - among other things - struct vma_iterator, using BPF allocator and keeps a pointer to the bpf_iter_task_vma_data. This is done in order to prevent changes to struct ma_state - which is wrapped by struct vma_iterator - from necessitating changes to uapi struct bpf_iter_task_vma.

Changelog:

v6 -> v7: https://lore.kernel.org/bpf/20231010185944.3888849-1-davemarchevsky@fb.com/

Patch numbers correspond to their position in v6

Patch 2 ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c")
  * Add Andrii ack
Patch 3 ("bpf: Introduce task_vma open-coded iterator kfuncs")
  * Add Andrii ack
  * Add missing __diag_ignore_all for -Wmissing-prototypes (Song)
Patch 4 ("selftests/bpf: Add tests for open-coded task_vma iter")
  * Remove two unnecessary header includes (Andrii)
  * Remove extraneous !vmas_seen check (Andrii)
New Patch ("bpf: Add BPF_KFUNC_{START,END}_defs macros")
  * After talking to Andrii, this is an attempt to clean up __diag_ignore_all spam everywhere kfuncs are defined. If nontrivial changes are needed, let's apply the other 4 and I'll respin as a standalone patch.

v5 -> v6: https://lore.kernel.org/bpf/20231010175637.3405682-1-davemarchevsky@fb.com/

Patch 4 ("selftests/bpf: Add tests for open-coded task_vma iter")
  * Remove extraneous blank line. I did this manually to the .patch file for v5, which caused BPF CI to complain about failing to apply the series

v4 -> v5: https://lore.kernel.org/bpf/20231002195341.2940874-1-davemarchevsky@fb.com/

Patch numbers correspond to their position in v4

New Patch ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c")
  * Patch 2's renaming of this selftest, and associated changes in the userspace runner, are split out into this separate commit (Andrii)
Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
  * Remove bpf_iter_task_vma kfuncs from libbpf's bpf_helpers.h, they'll be added to selftests' bpf_experimental.h in selftests patch below (Andrii)
  * Split bpf_iter_task_vma.c renaming into separate commit (Andrii)
Patch 3 ("selftests/bpf: Add tests for open-coded task_vma iter")
  * Add bpf_iter_task_vma kfuncs to bpf_experimental.h (Andrii)
  * Remove '?' from prog SEC, open_and_load the skel in one operation (Andrii)
  * Ensure that fclose() always happens in test runner (Andrii)
  * Use global var w/ 1000 (vm_start, vm_end) structs instead of two MAP_TYPE_ARRAY's w/ 1k u64s each (Andrii)

v3 -> v4: https://lore.kernel.org/bpf/20230822050558.2937659-1-davemarchevsky@fb.com/

Patch 1 ("bpf: Don't explicitly emit BTF for struct btf_iter_num")
  * Add Andrii ack
Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
  * Mark bpf_iter_task_vma_new args KF_RCU and remove now-unnecessary !task check (Yonghong)
    * Although KF_RCU is a function-level flag, in reality it only applies to the task_struct *task parameter, as the other two params are a scalar int and a specially-handled KF_ARG_PTR_TO_ITER
  * Remove struct bpf_iter_task_vma definition from uapi headers, define in kernel/bpf/task_iter.c instead (Andrii)
Patch 3 ("selftests/bpf: Add tests for open-coded task_vma iter")
  * Use a local var when looping over vmas to track map idx. Update vmas_seen global after done iterating. Don't start iterating or update vmas_seen if vmas_seen global is nonzero. (Andrii)
  * Move getpgid() call to correct spot - above skel detach. (Andrii)

v2 -> v3: https://lore.kernel.org/bpf/20230821173415.1970776-1-davemarchevsky@fb.com/

Patch 1 ("bpf: Don't explicitly emit BTF for struct btf_iter_num")
  * Add Yonghong ack
Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
  * UAPI bpf header and tools/ version should match
  * Add bpf_iter_task_vma_kern_data which bpf_iter_task_vma_kern points to, bpf_mem_alloc/free it instead of just vma_iterator. (Alexei)
  * Inner data ptr == NULL implies initialization failed

v1 -> v2: https://lore.kernel.org/bpf/20230810183513.684836-1-davemarchevsky@fb.com/
  * Patch 1
    * Now removes the unnecessary BTF_TYPE_EMIT instead of changing the type (Yonghong)
  * Patch 2
    * Don't do unnecessary BTF_TYPE_EMIT (Yonghong)
    * Bump task refcount to prevent ->mm reuse (Yonghong)
    * Keep a pointer to vma_iterator in bpf_iter_task_vma, alloc/free via BPF mem allocator (Yonghong, Stanislav)
  * Patch 3

[0]: https://lore.kernel.org/bpf/20230801145414.418145-1-davemarchevsky@fb.com/
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-10-13selftests/bpf: Add tests for open-coded task_vma iterDave Marchevsky
The open-coded task_vma iter added earlier in this series allows for natural iteration over a task's vmas using existing open-coded iter infrastructure, specifically bpf_for_each. This patch adds a test demonstrating this pattern and validating correctness. The vma->vm_start and vma->vm_end addresses of the first 1000 vmas are recorded and compared to /proc/PID/maps output. As expected, both see the same vmas and addresses - with the exception of the [vsyscall] vma - which is explained in a comment in the prog_tests program. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-5-davemarchevsky@fb.com
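The userspace half of such a comparison can be sketched as follows. This is a minimal illustration, not the selftest's code: it assumes each /proc/PID/maps line starts with a hexadecimal "start-end" pair, and the names struct vma_range, parse_maps_line() and vma_range_matches() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record type; the selftest's own structures may differ. */
struct vma_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Parse the leading "start-end" address pair of one /proc/PID/maps line.
 * Returns 0 on success, -1 if the line does not begin with such a pair.
 */
static int parse_maps_line(const char *line, struct vma_range *out)
{
	assert(line && out);

	if (sscanf(line, "%lx-%lx", &out->start, &out->end) != 2)
		return -1;
	return 0;
}

/* Compare a range recorded on the BPF side with one maps line. */
static int vma_range_matches(const struct vma_range *seen, const char *line)
{
	struct vma_range parsed;

	if (parse_maps_line(line, &parsed))
		return 0;
	return seen->start == parsed.start && seen->end == parsed.end;
}
```

A runner built on this idea would read the maps file line by line and call vma_range_matches() against the next recorded (vm_start, vm_end) pair, skipping the [vsyscall] line.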
2023-10-13bpf: Introduce task_vma open-coded iterator kfuncsDave Marchevsky
This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow creation and manipulation of struct bpf_iter_task_vma in open-coded iterator style. BPF programs can use these kfuncs directly or through bpf_for_each macro for natural-looking iteration of all task vmas. The implementation borrows heavily from bpf_find_vma helper's locking - differing only in that it holds the mmap_read lock for all iterations while the helper only executes its provided callback on a maximum of 1 vma. Aside from locking, struct vma_iterator and vma_next do all the heavy lifting. A pointer to an inner data struct, struct bpf_iter_task_vma_data, is the only field in struct bpf_iter_task_vma. This is because the inner data struct contains a struct vma_iterator (not ptr), whose size is likely to change under us. If bpf_iter_task_vma_kern contained vma_iterator directly such a change would require change in opaque bpf_iter_task_vma struct's size. So better to allocate vma_iterator using BPF allocator, and since that alloc must already succeed, might as well allocate all iter fields, thereby freezing struct bpf_iter_task_vma size. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-4-davemarchevsky@fb.com
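The new/next/destroy shape and the pointer-only handle can be illustrated with a userspace analogy. This is a sketch under stated assumptions, not the kernel implementation: malloc() stands in for the BPF allocator, an array stands in for the vma tree, there is no locking, and every mock_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct vm_area_struct; only the fields used here. */
struct mock_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Inner state: free to grow without changing the handle's size. */
struct mock_iter_data {
	const struct mock_vma *vmas;
	size_t nr;
	size_t pos;
};

/* Opaque handle: a single pointer, so its size stays fixed. */
struct mock_iter {
	struct mock_iter_data *data;
};

static int mock_iter_new(struct mock_iter *it, const struct mock_vma *vmas,
			 size_t nr)
{
	assert(it);

	it->data = malloc(sizeof(*it->data));
	if (!it->data)
		return -1;	/* data == NULL marks failed initialization */
	it->data->vmas = vmas;
	it->data->nr = nr;
	it->data->pos = 0;
	return 0;
}

/* Return the next element, or NULL when exhausted or uninitialized. */
static const struct mock_vma *mock_iter_next(struct mock_iter *it)
{
	if (!it->data || it->data->pos >= it->data->nr)
		return NULL;
	return &it->data->vmas[it->data->pos++];
}

static void mock_iter_destroy(struct mock_iter *it)
{
	free(it->data);
	it->data = NULL;
}
```

Because callers only ever see struct mock_iter, adding fields to struct mock_iter_data changes the allocation size but not the handle, which is the property the commit message is after.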
2023-10-13selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.cDave Marchevsky
Further patches in this series will add a struct bpf_iter_task_vma, which will result in a name collision with the selftest prog renamed in this patch. Rename the selftest to avoid the collision. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-3-davemarchevsky@fb.com
2023-10-13bpf: Don't explicitly emit BTF for struct btf_iter_numDave Marchevsky
Commit 6018e1f407cc ("bpf: implement numbers iterator") added the BTF_TYPE_EMIT line that this patch is modifying. The struct btf_iter_num doesn't exist, so only a forward declaration is emitted in BTF: FWD 'btf_iter_num' fwd_kind=struct That commit was probably hoping to ensure that struct bpf_iter_num is emitted in vmlinux BTF. A previous version of this patch changed the line to emit the correct type, but Yonghong confirmed that it would definitely be emitted regardless in [0], so this patch simply removes the line. This isn't marked "Fixes" because the extraneous btf_iter_num FWD wasn't causing any issues that I noticed, aside from mild confusion when I looked through the code. [0]: https://lore.kernel.org/bpf/25d08207-43e6-36a8-5e0f-47a913d4cda5@linux.dev/ Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-2-davemarchevsky@fb.com
2023-10-13bpf: Change syscall_nr type to int in struct syscall_tp_tArtem Savkov
linux-rt-devel tree contains a patch (b1773eac3f29c ("sched: Add support for lazy preemption")) that adds an extra member to struct trace_entry. This causes the offset of the args field in struct trace_event_raw_sys_enter to be different from the one in struct syscall_trace_enter:

  struct trace_event_raw_sys_enter {
          struct trace_entry         ent;                  /*     0    12 */

          /* XXX last struct has 3 bytes of padding */
          /* XXX 4 bytes hole, try to pack */

          long int                   id;                   /*    16     8 */
          long unsigned int          args[6];              /*    24    48 */
          /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
          char                       __data[];             /*    72     0 */

          /* size: 72, cachelines: 2, members: 4 */
          /* sum members: 68, holes: 1, sum holes: 4 */
          /* paddings: 1, sum paddings: 3 */
          /* last cacheline: 8 bytes */
  };

  struct syscall_trace_enter {
          struct trace_entry         ent;                  /*     0    12 */

          /* XXX last struct has 3 bytes of padding */

          int                        nr;                   /*    12     4 */
          long unsigned int          args[];               /*    16     0 */

          /* size: 16, cachelines: 1, members: 3 */
          /* paddings: 1, sum paddings: 3 */
          /* last cacheline: 16 bytes */
  };

This, in turn, causes perf_event_set_bpf_prog() to fail while running the bpf test_profiler testcase, because max_ctx_offset is calculated based on the former struct, while off is based on the latter:

  if (is_tracepoint || is_syscall_tp) {
          int off = trace_event_get_offsets(event->tp_event);

          if (prog->aux->max_ctx_offset > off)
                  return -EACCES;
  }

What the bpf program actually gets is a pointer to struct syscall_tp_t, defined in kernel/trace/trace_syscalls.c. This patch fixes the problem by aligning struct syscall_tp_t with struct syscall_trace_(enter|exit) and changing the tests to use these structs to dereference context.

Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/bpf/20231013054219.172920-1-asavkov@redhat.com
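The offset difference can be reproduced with mock structs in plain C. This is a sketch that assumes an LP64 target and a 12-byte entry header; the mock_* types and the field name extra are hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mock of a trace_entry that has grown to 12 bytes (9 bytes of members
 * plus 3 bytes of trailing padding). Field names are illustrative only.
 */
struct mock_trace_entry {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
	unsigned char extra;	/* hypothetical name for the added member */
};

static_assert(sizeof(struct mock_trace_entry) == 12,
	      "mock entry is expected to be 12 bytes");

/* Syscall number stored as long: 8-byte alignment opens a hole after ent. */
struct mock_tp_long {
	struct mock_trace_entry ent;
	long nr;
	unsigned long args[6];
};

/* Syscall number stored as int: it fits directly after the 12-byte ent. */
struct mock_tp_int {
	struct mock_trace_entry ent;
	int nr;
	unsigned long args[6];
};

static size_t args_offset_long_nr(void)
{
	return offsetof(struct mock_tp_long, args);
}

static size_t args_offset_int_nr(void)
{
	return offsetof(struct mock_tp_int, args);
}
```

On LP64 the two helpers differ by 8 bytes, mirroring the 24 versus 16 offsets in the pahole output above; with an 8-byte entry header the two layouts would agree, which is why the mismatch only shows up once trace_entry grows.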
2023-10-13net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not setMartin KaFai Lau
It was reported that there is a compiler warning on the unused variable "sin_addr_len" in af_inet.c when CONFIG_CGROUP_BPF is not set. This patch addresses it the same way as the ipv6 counterpart in inet6_getname(): it uses "return sin_addr_len;" instead of "return sizeof(*sin);". Fixes: fefba7d1ae19 ("bpf: Propagate modified uaddrlen from cgroup sockaddr programs") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/bpf/20231013185702.3993710-1-martin.lau@linux.dev Closes: https://lore.kernel.org/bpf/20231013114007.2fb09691@canb.auug.org.au/
2023-10-13bpf: Avoid unnecessary audit log for CPU security mitigationsYafang Shao
Check cpu_mitigations_off() first to avoid calling capable() when mitigations are off. This avoids an unnecessary audit log entry. Fixes: bc5bc309db45 ("bpf: Inherit system settings for CPU security mitigations") Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/CAEf4Bza6UVUWqcWQ-66weZ-nMDr+TFU3Mtq=dumZFD-pSqU7Ow@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20231013083916.4199-1-laoar.shao@gmail.com
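The effect of the reordering comes from short-circuit evaluation of ||. The following userspace sketch assumes that a failed capability check leaves one audit record; mock_capable(), bypass_old(), bypass_new() and audit_records_after() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int audit_records;	/* counts simulated audit log entries */

/* Stand-in for capable(): a failed check leaves an audit record. */
static bool mock_capable(bool has_cap)
{
	if (!has_cap)
		audit_records++;
	return has_cap;
}

/* Old order: the capability check always runs. */
static bool bypass_old(bool mitigations_off, bool has_cap)
{
	return mock_capable(has_cap) || mitigations_off;
}

/* New order: the capability check is skipped when mitigations are off. */
static bool bypass_new(bool mitigations_off, bool has_cap)
{
	return mitigations_off || mock_capable(has_cap);
}

/* Run one variant with mitigations off and an unprivileged caller. */
static int audit_records_after(bool (*fn)(bool, bool))
{
	assert(fn != NULL);

	audit_records = 0;
	fn(true, false);
	return audit_records;
}
```

Both variants return the same boolean for every input; only the side effect of the capability check differs.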
2023-10-13net: fix IPSTATS_MIB_OUTFORWDATAGRAMS increment after fragment checkHeng Guo
Reproduce environment:
A network of 3 Linux VMs is connected as below:
VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3
VM1: eth0 ip: 192.168.122.207 MTU 1800
VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500
VM3: eth0 ip: 192.168.123.240 MTU 1800

Reproduce:
VM1 sends 1600 bytes of UDP data to VM3 using the scapy tool with flags='DF'.
scapy command:
send(IP(dst="192.168.123.240",flags='DF')/UDP()/str('0'*1600),count=1, inter=1.000000)

Result:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 7 0 2 2 0 0 2 5 0 0 0 0 0 0 0 1 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
ForwDatagrams stays at 2 without incrementing.

Issue description and patch:
ip_exceeds_mtu() in ip_forward() drops this IP datagram because skb len (1600 sent by scapy) is over the MTU (1500 in VM2) when "DF" is set.
According to RFC 4293 "3.2.3. IP Statistics Tables",
[Case diagram excerpt showing InForwDatagrams (6), InNoRoutes, OutForwDatagrams (6), then OutFragReqds (packets) and OutFragFails (packets)]
the IPSTATS_MIB_OUTFORWDATAGRAMS should be counted before fragment check.
The existing implementation, instead, would increase the counter after the fragment check: ip_exceeds_mtu() in ipv4 and ip6_pkt_too_big() in ipv6.
So this patch moves the IPSTATS_MIB_OUTFORWDATAGRAMS counter to ip_forward() for ipv4 and ip6_forward() for ipv6.

Test result with patch:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss
Ip: 1 64 7 0 2 3 0 0 2 5 0 0 0 0 0 0 0 1 0
......
root@qemux86-64:~#
----------------------------------------------------------------------
ForwDatagrams is updated from 2 to 3.

Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
Signed-off-by: Heng Guo <heng.guo@windriver.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20231011015137.27262-1-heng.guo@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
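The ordering described in this commit message can be modeled in a few lines of userspace C. This is a simplified sketch, not the kernel forwarding path: the mock_* types and mock_forward() are hypothetical, and the frag_fails counter is an assumption used only to show that the drop is still accounted for separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified counters; the real MIB has many more fields. */
struct mock_ipstats {
	unsigned long out_forw_datagrams;
	unsigned long frag_fails;
};

struct mock_pkt {
	size_t len;
	bool df;	/* Don't Fragment flag */
};

/* Returns true if the packet is forwarded, false if it is dropped. */
static bool mock_forward(struct mock_ipstats *st, const struct mock_pkt *p,
			 size_t mtu)
{
	assert(st && p);

	/* Count the forwarding attempt before the fragment check. */
	st->out_forw_datagrams++;

	if (p->len > mtu && p->df) {
		st->frag_fails++;
		return false;
	}
	return true;
}
```

With this ordering, a 1600-byte DF packet hitting a 1500-byte MTU bumps the forwarding counter even though it is dropped, which matches the 2 to 3 change in the test result above.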