linux/linux-stable.git - Linux kernel stable tree

Age	Commit message	Author
2023-10-18	i40e: use scnprintf over strncpy+strncat	Justin Stitt
	`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. Moreover, `strncat` shouldn't really be used either as per fortify-string.h: * Do not use this function. While FORTIFY_SOURCE tries to avoid * read and write overflows, this is only possible when the sizes * of @p and @q are known to the compiler. Prefer building the * string with formatting, via scnprintf() or similar. Instead, use `scnprintf` with "%s%s" format string. This code is now more readable and robust. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231017190411.2199743-7-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
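The replacement this commit describes can be sketched in userspace C. The kernel's scnprintf() is not available outside the kernel, so the hypothetical helper `sketch_scnprintf()` below approximates it with vsnprintf(): it returns the number of characters actually stored, not counting the NUL. The names `join_old()` and `join_new()` are illustrative and do not come from the i40e driver.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's scnprintf(): returns the
 * number of characters actually written to buf, not counting the NUL. */
static int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (n < 0)
		return 0;
	/* vsnprintf() reports the untruncated length; clamp it. */
	return (size_t)n >= size ? (int)(size - 1) : n;
}

/* Old pattern: two calls plus manual length bookkeeping (size must be >= 1). */
static void join_old(char *dst, size_t size, const char *a, const char *b)
{
	strncpy(dst, a, size - 1);
	dst[size - 1] = '\0';
	strncat(dst, b, size - strlen(dst) - 1);
}

/* New pattern: a single bounded formatting call with a literal format. */
static int join_new(char *dst, size_t size, const char *a, const char *b)
{
	return sketch_scnprintf(dst, size, "%s%s", a, b);
}
```

Both helpers produce the same string when the destination is large enough. The single-call form also reports how many characters were stored, which makes truncation easy to detect.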
2023-10-18	fm10k: replace deprecated strncpy with strscpy	Justin Stitt
	`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. A suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Other implementations of .get_drvinfo also use strscpy so this patch brings fm10k_get_drvinfo in line as well: igb/igb_ethtool.c +851 static void igb_get_drvinfo(struct net_device *netdev, igbvf/ethtool.c 167:static void igbvf_get_drvinfo(struct net_device *netdev, i40e/i40e_ethtool.c 1999:static void i40e_get_drvinfo(struct net_device *netdev, e1000/e1000_ethtool.c 529:static void e1000_get_drvinfo(struct net_device *netdev, ixgbevf/ethtool.c 211:static void ixgbevf_get_drvinfo(struct net_device *netdev, Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231017190411.2199743-6-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18	e1000: replace deprecated strncpy with strscpy	Justin Stitt
	`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We can see that netdev->name is expected to be NUL-terminated based on its usage with format strings: \| pr_info("%s NIC Link is Down\n", \| netdev->name); A suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. This is in line with other uses of strscpy on netdev->name: $ rg "strscpy\(netdev\->name.pci." drivers/net/ethernet/intel/e1000e/netdev.c 7455: strscpy(netdev->name, pci_name(pdev), sizeof(netdev->name)); drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 10839: strscpy(netdev->name, pci_name(pdev), sizeof(netdev->name)); Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231017190411.2199743-5-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18	e100: replace deprecated strncpy with strscpy	Justin Stitt
	`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. The "...-1" pattern makes it evident that netdev->name is expected to be NUL-terminated. Meanwhile, it seems NUL-padding is not required due to alloc_etherdev zero-allocating the buffer. Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. This is in line with other uses of strscpy on netdev->name: $ rg "strscpy\(netdev\->name.pci." drivers/net/ethernet/intel/e1000e/netdev.c 7455: strscpy(netdev->name, pci_name(pdev), sizeof(netdev->name)); drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 10839: strscpy(netdev->name, pci_name(pdev), sizeof(netdev->name)); Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231017190411.2199743-4-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
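The property the strscpy conversions above rely on (NUL-termination is guaranteed, with no NUL-padding) can be illustrated with a small userspace imitation. `sketch_strscpy()` is a hypothetical stand-in written for this example, not the kernel implementation. It assumes the kernel function's documented contract: it returns the number of characters copied, or -E2BIG when the source had to be truncated.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace imitation of the kernel's strscpy() contract. */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	/* Copy at most size - 1 characters, leaving room for the NUL. */
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';	/* always terminated; the rest of dst is untouched */

	/* If the source still has characters left, it was truncated. */
	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

Unlike strncpy(), the sketch never leaves the destination unterminated and never writes past the terminating NUL. That is why padding must be requested explicitly when it is wanted.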
2023-10-18	intel: fix format warnings	Jesse Brandeburg
	Get ahead of the game and fix all the -Wformat=2 noted warnings in the intel drivers directory. There is one set of i40e and iavf warnings I couldn't figure out how to fix because the driver is already using vsnprintf without an explicit "const char *" format string. Tested with both gcc-12 and clang-15. I found gcc-12 runs clean after this series but clang-15 is a little worried about the vsnprintf lines. Summary of warnings: drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c:148:34: warning: format string is not a string literal [-Wformat-nonliteral] drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1416:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1416:24: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1421:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1421:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/igc/igc_ethtool.c:776:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/igc/igc_ethtool.c:776:24: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/igc/igc_ethtool.c:779:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/igc/igc_ethtool.c:779:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/iavf/iavf_ethtool.c:199:34: warning: format string is not a string literal [-Wformat-nonliteral] drivers/net/ethernet/intel/igb/igb_ethtool.c:2360:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/igb/igb_ethtool.c:2360:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/igb/igb_ethtool.c:2363:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/igb/igb_ethtool.c:2363:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/i40e/i40e_ethtool.c:208:34: warning: format string is not a string literal [-Wformat-nonliteral] drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2515:23: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2515:23: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2519:23: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2519:23: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/ice/ice_ethtool.c:1064:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ice/ice_ethtool.c:1064:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/ice/ice_ethtool.c:1084:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ice/ice_ethtool.c:1084:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/ice/ice_ethtool.c:1100:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ice/ice_ethtool.c:1100:24: note: treat the string as an argument to avoid this Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231017190411.2199743-3-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
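The compiler note "treat the string as an argument to avoid this" refers to the pattern sketched below. `format_stat_name()` is a hypothetical helper written for illustration and does not come from any of the drivers listed. It shows a non-literal string passed as data to a literal "%s" format rather than used as the format itself.

```c
#include <stdio.h>

/* Copies a non-literal string into buf without interpreting it as a format.
 * The risky form that triggers -Wformat-security would be:
 *     snprintf(buf, size, name);
 * because any '%' sequence inside name would be parsed as a directive. */
static int format_stat_name(char *buf, size_t size, const char *name)
{
	return snprintf(buf, size, "%s", name);
}
```

With the literal format, a name such as "rx_%d_bytes" is copied verbatim, and the compiler can check the argument list against the format.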
2023-10-18	intel: fix string truncation warnings	Jesse Brandeburg
	Fix -Wformat-truncation warnings to complete the intel directories' W=1 clean efforts. W=1 was recently enhanced with a few new flags, which brought up some new warnings. Switch to using kasprintf() when possible so we always allocate the right length strings. Summary of warnings: drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1425:60: warning: ‘%s’ directive output may be truncated writing 4 bytes into a region of size between 1 and 11 [-Wformat-truncation=] drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1425:17: note: ‘snprintf’ output between 7 and 17 bytes into a destination of size 13 drivers/net/ethernet/intel/ice/ice_ptp.c:43:27: warning: ‘%s’ directive output may be truncated writing up to 479 bytes into a region of size 64 [-Wformat-truncation=] drivers/net/ethernet/intel/ice/ice_ptp.c:42:17: note: ‘snprintf’ output between 1 and 480 bytes into a destination of size 64 drivers/net/ethernet/intel/igb/igb_main.c:3092:53: warning: ‘%d’ directive output may be truncated writing between 1 and 5 bytes into a region of size between 1 and 13 [-Wformat-truncation=] drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note: directive argument in the range [0, 65535] drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note: directive argument in the range [0, 65535] drivers/net/ethernet/intel/igb/igb_main.c:3090:25: note: ‘snprintf’ output between 23 and 43 bytes into a destination of size 32 Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231017190411.2199743-2-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
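The idea behind moving from a fixed-size snprintf() buffer to an allocating formatter can be sketched in userspace. `sketch_kasprintf()` is a hypothetical analogue built on malloc() and vsnprintf(). It is not the kernel's kasprintf(), which also takes a gfp_t allocation flag, and the format string in the usage note is illustrative only.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of kasprintf(): measure first, then
 * allocate exactly the needed size, so the output can never be truncated.
 * The caller owns the returned buffer and must free() it. */
static char *sketch_kasprintf(const char *fmt, ...)
{
	va_list args, copy;
	char *buf;
	int len;

	va_start(args, fmt);
	va_copy(copy, args);
	len = vsnprintf(NULL, 0, fmt, copy);	/* first pass: measure only */
	va_end(copy);
	if (len < 0) {
		va_end(args);
		return NULL;
	}

	buf = malloc((size_t)len + 1);
	if (buf)
		vsnprintf(buf, (size_t)len + 1, fmt, args);	/* second pass */
	va_end(args);
	return buf;
}
```

For example, `sketch_kasprintf("%s-TxRx-%d", "eth0", 65535)` returns a freshly allocated string sized for the whole result. The trade-off is an allocation that can fail, so callers need a NULL check and a matching free().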
2023-10-18	Merge branch 'selftests-tc-testing-fixes-for-kselftest'	Jakub Kicinski
	Pedro Tammela says: ==================== selftests: tc-testing: fixes for kselftest While playing around with TuxSuite, we noticed a couple of things were broken for strict CI/automated builds. We had a script that didn't make it into the kselftest tarball and a couple of missing Kconfig knobs in our minimal config. ==================== Link: https://lore.kernel.org/r/20231017152309.3196320-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18	selftests: tc-testing: move auxiliary scripts to a dedicated folder	Pedro Tammela
	Some taprio tests need auxiliary scripts to wait for workqueue events to process. Move them to a dedicated folder in order to package them for the kselftests tarball. Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20231017152309.3196320-3-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18	selftests: tc-testing: add missing Kconfig options to 'config'	Pedro Tammela
	Make sure CI builds using just tc-testing/config can run all tdc tests. Some tests were broken because of missing knobs. Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20231017152309.3196320-2-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18	net: wangxun: remove redundant kernel log	Jiawen Wu
	Since PBA info can be read from lspci, delete txgbe_read_pba_string() and the prints. In addition, delete the redundant MAC address printing. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231017100635.154967-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18	Merge branch 'net-fec-fix-device_get_match_data-usage'	Jakub Kicinski
	Alexander Stein says: ==================== net: fec: Fix device_get_match_data usage This is v2 addressing the regression introduced by commit b0377116decd ("net: ethernet: Use device_get_match_data()"). You could also remove the (!dev_info) case for Coldfire as this platform has no quirks. But IMHO this should be kept as long as Coldfire platform data is supported. ==================== Link: https://lore.kernel.org/r/20231017063419.925266-1-alexander.stein@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18	net: fec: Remove non-Coldfire platform IDs	Alexander Stein
	All i.MX platforms (non-Coldfire) use DT nowadays, so their platform ID entries can be removed. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20231017063419.925266-3-alexander.stein@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18	net: fec: Fix device_get_match_data usage	Alexander Stein
	device_get_match_data() expects that of_device_id->data points to actual fec_devinfo data, not a platform_device_id entry. Fix this by adjusting OF device data pointers to their corresponding structs. enum imx_fec_type is now unused and can be removed. Fixes: b0377116decd ("net: ethernet: Use device_get_match_data()") Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20231017063419.925266-2-alexander.stein@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18	drivers: net: wwan: iosm: Fixed multiple typos in multiple files	Muhammad Muzammil
	iosm_ipc_chnl_cfg.h: Fixed typo iosm_ipc_imem_ops.h: Fixed typo iosm_ipc_mux.h: Fixed typo iosm_ipc_pm.h: Fixed typo iosm_ipc_port.h: Fixed typo iosm_ipc_trace.h: Fixed typo Signed-off-by: Muhammad Muzammil <m.muzzammilashraf@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231014121407.10012-1-m.muzzammilashraf@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18	net: skb_find_text: Ignore patterns extending past 'to'	Phil Sutter
	Assume that caller's 'to' offset really represents an upper boundary for the pattern search, so patterns extending past this offset are to be rejected. The old behaviour also was kind of inconsistent when it comes to fragmentation (or otherwise non-linear skbs): If the pattern started in between 'to' and 'from' offsets but extended to the next fragment, it was not found if 'to' offset was still within the current fragment. Test the new behaviour in a kselftest using iptables' string match. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Fixes: f72b948dcbb8 ("[NET]: skb_find_text ignores to argument") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
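The boundary rule described in this commit can be sketched on a plain linear buffer. `sketch_find_text()` is a hypothetical helper. The kernel function operates on possibly fragmented skbs through the textsearch infrastructure, and treating `to` as an exclusive end offset is an assumption of this sketch.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Returns the offset of pat relative to 'from', or UINT_MAX when no match
 * lies entirely inside [from, to). A match that starts before 'to' but
 * extends past it is rejected. */
static unsigned int sketch_find_text(const char *buf, size_t len,
				     unsigned int from, unsigned int to,
				     const char *pat)
{
	size_t patlen = strlen(pat);
	size_t i;

	if (to > len)
		to = (unsigned int)len;
	if (from > to || patlen > (size_t)(to - from))
		return UINT_MAX;

	/* Only consider start positions whose match ends at or before 'to'. */
	for (i = from; i + patlen <= to; i++)
		if (memcmp(buf + i, pat, patlen) == 0)
			return (unsigned int)(i - from);

	return UINT_MAX;
}
```

Because the loop bound depends only on the offsets and the pattern length, the result in this sketch does not depend on how the data happens to be split into fragments.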
2023-10-18	Merge tag 'nf-next-23-10-18' of ↵	David S. Miller
	https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter next pull request 2023-10-18 This series contains initial netfilter skb drop_reason support, from myself. First few patches fix up a few spots to make sure we won't trip when followup patches embed error numbers in the upper bits (we already do this in some places). Then, nftables and bridge netfilter get converted to call kfree_skb_reason directly to let tooling pinpoint exact location of packet drops, rather than the existing NF_DROP catchall in nf_hook_slow(). I would like to eventually convert all netfilter modules, but as some callers cannot deal with NF_STOLEN (notably act_ct), more preparation work is needed for this. Last patch gets rid of an ugly 'de-const' cast in nftables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18	Merge branch 'ethtool-forced-speed'	David S. Miller
	Paul Greenwalt says: ==================== ethtool: Add link mode maps for forced speeds The following patch set was initially a part of [1]. As the purpose of the original series was to add the support of the new hardware to the intel ice driver, the refactoring of advertised link modes mapping was extracted to a new set. The patch set adds a common mechanism for mapping Ethtool forced speeds with Ethtool supported link modes, which can be used in drivers code. [1] https://lore.kernel.org/netdev/20230823180633.2450617-1-pawel.chmielewski@intel.com Changelog: v4->v5: Separated ethtool and qede changes into two patches, fixed indentation, and moved ethtool_forced_speed_maps_init() from ioctl.c to ethtool.h v3->v4: Moved the macro for setting fields into the common header file v2->v3: Fixed whitespaces, added missing line at end of file v1->v2: Fixed formatting, typo, moved declaration of iterator to loop line. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18	ice: Refactor finding advertised link speed	Pawel Chmielewski
	Refactor ice_get_link_ksettings to use the forced speed to link modes mapping. Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18	qede: Refactor qede_forced_speed_maps_init()	Paul Greenwalt
	Refactor qede_forced_speed_maps_init() to use the common implementation ethtool_forced_speed_maps_init(). The qede driver was compile tested only. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18	ethtool: Add forced speed to supported link modes maps	Paul Greenwalt
	The need to map Ethtool forced speeds to Ethtool supported link modes is common among drivers. To support this, add a common structure for forced speed maps and a function to init them. This solution was originally introduced in commit 1d4e4ecccb11 ("qede: populate supported link modes maps on module init") for the qede driver. ethtool_forced_speed_maps_init() should be called during driver init with an array of struct ethtool_forced_speed_map to populate the mapping. Definitions for maps themselves are left in the driver code, as the sets of supported link modes may vary between the devices. Suggested-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
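The mapping idea (a per-speed table whose link-mode bitmap is populated once at init time) can be sketched in userspace. Everything below is hypothetical: `struct sketch_speed_map`, its fields, and both helpers are invented for this example and are not the kernel's struct ethtool_forced_speed_map or ethtool_forced_speed_maps_init(). A single 64-bit word stands in for the kernel's link mode bitmap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-speed map entry; bit numbers must be below 64 here. */
struct sketch_speed_map {
	uint32_t speed;			/* forced speed in Mb/s */
	uint64_t caps;			/* link mode bitmap, filled at init */
	const uint32_t *cap_arr;	/* link mode bit numbers for this speed */
	size_t arr_size;		/* number of entries in cap_arr */
};

/* Populate each entry's bitmap from its array of bit numbers. */
static void sketch_speed_maps_init(struct sketch_speed_map *maps, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		maps[i].caps = 0;
		for (size_t j = 0; j < maps[i].arr_size; j++)
			maps[i].caps |= UINT64_C(1) << maps[i].cap_arr[j];
	}
}

/* Look up the link modes for a forced speed; 0 means "no entry". */
static uint64_t sketch_caps_for_speed(const struct sketch_speed_map *maps,
				      size_t n, uint32_t speed)
{
	for (size_t i = 0; i < n; i++)
		if (maps[i].speed == speed)
			return maps[i].caps;
	return 0;
}
```

Keeping the bit-number arrays in driver code and only the population loop in common code mirrors the split the commit message describes, since each device supports a different set of link modes.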
2023-10-18	netfilter: nf_tables: de-constify set commit ops function argument	Florian Westphal
	The set backend using this already has to work around this via ugly cast, don't spread this pattern. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18	netfilter: bridge: convert br_netfilter to NF_DROP_REASON	Florian Westphal
	errno is 0 because these hooks are called from prerouting and forward. There is no socket that the errno would ever be propagated to. Other netfilter modules (e.g. nf_nat, conntrack, ...) can be converted in a similar way. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18	netfilter: make nftables drops visible in net dropmonitor	Florian Westphal
	net_dropmonitor blames core.c:nf_hook_slow. Add NF_DROP_REASON() helper and use it in nft_do_chain(). The helper releases the skb, so exact drop location becomes available. Calling code will observe the NF_STOLEN verdict instead. Adjust nf_hook_slow so we can embed an errno value with NF_STOLEN verdicts, just like we do for NF_DROP. After this, drop in nftables can be pinpointed to a drop due to a rule or the chain policy. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18	netfilter: nf_nat: mask out non-verdict bits when checking return value	Florian Westphal
	Same as previous change: we need to mask out the non-verdict bits, as upcoming patches may embed an errno value in NF_STOLEN verdicts too. NF_DROP could already do this, but not all called functions do this. Checks that only test ret vs NF_ACCEPT are fine, the 'errno parts' are always 0 for those. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18	netfilter: conntrack: convert nf_conntrack_update to netfilter verdicts	Florian Westphal
	This function calls helpers that can return nf-verdicts, but then those get converted to -1/0 as that's what the caller expects. Theoretically NF_DROP could have an errno number set in the upper 24 bits of the return value. Or any of those helpers could return NF_STOLEN, which would result in use-after-free. This is fine as-is, the called functions don't do this yet. But it's better to avoid possible future problems if the upcoming patchset to add NF_DROP_REASON() support gains further users, so remove the 0/-1 translation from the picture and pass the verdicts down to the caller. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18	netfilter: nf_tables: mask out non-verdict bits when checking return value	Florian Westphal
	nftables trace infra must mask out the non-verdict bit parts of the return value, else followup changes that 'return errno << 8 \| NF_STOLEN' will cause breakage. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18	netfilter: xt_mangle: only check verdict part of return value	Florian Westphal
	These checks assume that the caller only returns NF_DROP without any errno embedded in the upper bits. This is fine right now, but followup patches will start to propagate such errors to allow kfree_skb_drop_reason() in the called functions, those would then indicate 'errno << 8 \| NF_STOLEN'. To not break things we have to mask those parts out. Signed-off-by: Florian Westphal <fw@strlen.de>
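The masking these netfilter commits describe can be illustrated with a few lines of C. The constants below are local to the sketch and use an SK_ prefix to make that clear. The layout follows the 'errno << 8 \| NF_STOLEN' form quoted in the commit messages. It is an assumption of this example, not a copy of the kernel headers.

```c
/* Illustrative, sketch-local constants (assumed values, not kernel headers). */
#define SK_NF_DROP		0u
#define SK_NF_ACCEPT		1u
#define SK_NF_STOLEN		2u
#define SK_NF_VERDICT_MASK	0xffu

/* Pack an errno-style value into the bits above the verdict. */
static unsigned int sketch_encode(unsigned int verdict, unsigned int err)
{
	return (err << 8) | verdict;
}

/* Extract only the verdict part, ignoring any embedded error bits. */
static unsigned int sketch_verdict(unsigned int ret)
{
	return ret & SK_NF_VERDICT_MASK;
}
```

A direct comparison such as `ret == SK_NF_STOLEN` stops matching as soon as an error value is embedded, while `sketch_verdict(ret) == SK_NF_STOLEN` keeps working. That is the breakage the commits guard against.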
2023-10-18	Merge branch 'devlink-deadlock'	David S. Miller
	Jiri Pirko says: ==================== devlink: fix a deadlock when taking devlink instance lock while holding RTNL lock devlink_port_fill() may be called sometimes with RTNL lock held. When putting the nested port function devlink instance attrs, current code takes nested devlink instance lock. In that case lock ordering is wrong. Patch #1 is a dependency of patch #2. Patch #2 converts the peernet2id_alloc() call to rely on RCU so it can be called without devlink instance lock. Patch #3 takes device reference for devlink instance making sure that device does not disappear before devlink_release() is called. Patch #4 benefits from the preparations done in patches #2 and #3 and removes the problematic nested devlink lock acquisition. Patches #5-#7 improve documentation to reflect this issue so it is avoided in the future. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18	devlink: document devlink_rel_nested_in_notify() function	Jiri Pirko
	Add documentation for devlink_rel_nested_in_notify() describing the devlink instance locking consequences. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18	Documentation: devlink: add a note about RTNL lock into locking section	Jiri Pirko
	Add a note describing the locking order of taking RTNL lock with devlink instance lock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18	Documentation: devlink: add nested instance section	Jiri Pirko
	Add a part talking about nested devlink instances describing the helpers and locking ordering. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18	devlink: don't take instance lock for nested handle put	Jiri Pirko
	Lockdep reports following issue: WARNING: possible circular locking dependency detected ------------------------------------------------------ devlink/8191 is trying to acquire lock: ffff88813f32c250 (&devlink->lock_key#14){+.+.}-{3:3}, at: devlink_rel_devlink_handle_put+0x11e/0x2d0 but task is already holding lock: ffffffff8511eca8 (rtnl_mutex){+.+.}-{3:3}, at: unregister_netdev+0xe/0x20 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (rtnl_mutex){+.+.}-{3:3}: lock_acquire+0x1c3/0x500 __mutex_lock+0x14c/0x1b20 register_netdevice_notifier_net+0x13/0x30 mlx5_lag_add_mdev+0x51c/0xa00 [mlx5_core] mlx5_load+0x222/0xc70 [mlx5_core] mlx5_init_one_devl_locked+0x4a0/0x1310 [mlx5_core] mlx5_init_one+0x3b/0x60 [mlx5_core] probe_one+0x786/0xd00 [mlx5_core] local_pci_probe+0xd7/0x180 pci_device_probe+0x231/0x720 really_probe+0x1e4/0xb60 __driver_probe_device+0x261/0x470 driver_probe_device+0x49/0x130 __driver_attach+0x215/0x4c0 bus_for_each_dev+0xf0/0x170 bus_add_driver+0x21d/0x590 driver_register+0x133/0x460 vdpa_match_remove+0x89/0xc0 [vdpa] do_one_initcall+0xc4/0x360 do_init_module+0x22d/0x760 load_module+0x51d7/0x6750 init_module_from_file+0xd2/0x130 idempotent_init_module+0x326/0x5a0 __x64_sys_finit_module+0xc1/0x130 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 -> #2 (mlx5_intf_mutex){+.+.}-{3:3}: lock_acquire+0x1c3/0x500 __mutex_lock+0x14c/0x1b20 mlx5_register_device+0x3e/0xd0 [mlx5_core] mlx5_init_one_devl_locked+0x8fa/0x1310 [mlx5_core] mlx5_devlink_reload_up+0x147/0x170 [mlx5_core] devlink_reload+0x203/0x380 devlink_nl_cmd_reload+0xb84/0x10e0 genl_family_rcv_msg_doit+0x1cc/0x2a0 genl_rcv_msg+0x3c9/0x670 netlink_rcv_skb+0x12c/0x360 genl_rcv+0x24/0x40 netlink_unicast+0x435/0x6f0 netlink_sendmsg+0x7a0/0xc70 sock_sendmsg+0xc5/0x190 __sys_sendto+0x1c8/0x290 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 -> #1 (&dev->lock_key#8){+.+.}-{3:3}: 
lock_acquire+0x1c3/0x500 __mutex_lock+0x14c/0x1b20 mlx5_init_one_devl_locked+0x45/0x1310 [mlx5_core] mlx5_devlink_reload_up+0x147/0x170 [mlx5_core] devlink_reload+0x203/0x380 devlink_nl_cmd_reload+0xb84/0x10e0 genl_family_rcv_msg_doit+0x1cc/0x2a0 genl_rcv_msg+0x3c9/0x670 netlink_rcv_skb+0x12c/0x360 genl_rcv+0x24/0x40 netlink_unicast+0x435/0x6f0 netlink_sendmsg+0x7a0/0xc70 sock_sendmsg+0xc5/0x190 __sys_sendto+0x1c8/0x290 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 -> #0 (&devlink->lock_key#14){+.+.}-{3:3}: check_prev_add+0x1af/0x2300 __lock_acquire+0x31d7/0x4eb0 lock_acquire+0x1c3/0x500 __mutex_lock+0x14c/0x1b20 devlink_rel_devlink_handle_put+0x11e/0x2d0 devlink_nl_port_fill+0xddf/0x1b00 devlink_port_notify+0xb5/0x220 __devlink_port_type_set+0x151/0x510 devlink_port_netdevice_event+0x17c/0x220 notifier_call_chain+0x97/0x240 unregister_netdevice_many_notify+0x876/0x1790 unregister_netdevice_queue+0x274/0x350 unregister_netdev+0x18/0x20 mlx5e_vport_rep_unload+0xc5/0x1c0 [mlx5_core] __esw_offloads_unload_rep+0xd8/0x130 [mlx5_core] mlx5_esw_offloads_rep_unload+0x52/0x70 [mlx5_core] mlx5_esw_offloads_unload_rep+0x85/0xc0 [mlx5_core] mlx5_eswitch_unload_sf_vport+0x41/0x90 [mlx5_core] mlx5_devlink_sf_port_del+0x120/0x280 [mlx5_core] genl_family_rcv_msg_doit+0x1cc/0x2a0 genl_rcv_msg+0x3c9/0x670 netlink_rcv_skb+0x12c/0x360 genl_rcv+0x24/0x40 netlink_unicast+0x435/0x6f0 netlink_sendmsg+0x7a0/0xc70 sock_sendmsg+0xc5/0x190 __sys_sendto+0x1c8/0x290 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 other info that might help us debug this: Chain exists of: &devlink->lock_key#14 --> mlx5_intf_mutex --> rtnl_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(mlx5_intf_mutex); lock(rtnl_mutex); lock(&devlink->lock_key#14); Problem is taking the devlink instance lock of nested instance when RTNL is already held. 
To fix this, don't take the devlink instance lock when putting nested handle. Instead, rely on the preparations done by previous two patches to be able to access device pointer and obtain netns id without devlink instance lock held. Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18	devlink: take device reference for devlink object	Jiri Pirko
	In preparation for allowing access to the device pointer without the devlink instance lock held, make sure the device pointer is usable until devlink_release() is called. Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18	devlink: call peernet2id_alloc() with net pointer under RCU read lock	Jiri Pirko
	peernet2id_alloc() can be called locklessly with a peer net pointer obtained in an RCU critical section, and it makes sure to return the ns ID if the net namespace is not being removed concurrently. Take advantage of the newly added read_pnet_rcu() helper: use it to obtain the net pointer under the RCU read lock and pass it to peernet2id_alloc() to get the ns ID. Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18	net: treat possible_net_t net pointer as an RCU one and add read_pnet_rcu()	Jiri Pirko
	Make the net pointer stored in possible_net_t structure annotated as an RCU pointer. Change the access helpers to treat it as such. Introduce read_pnet_rcu() helper to allow caller to dereference the net pointer under RCU read lock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-17	Merge tag 'mlx5-updates-2023-10-10' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-10-10 1) Adham Faris, Increase max supported channels number to 256 2) Leon Romanovsky, Allow IPsec soft/hard limits in bytes 3) Shay Drory, Replace global mlx5_intf_lock with HCA devcom component lock 4) Wei Zhang, Optimize SF creation flow During SF creation, HCA state gets changed from INVALID to IN_USE step by step. Accordingly, FW sends vhca event to driver to inform about this state change asynchronously. Each vhca event is critical because all related SW/FW operations are triggered by it. Currently there is only a single mlx5 general event handler which not only handles vhca event but many other events. This incurs huge bottleneck because all events are forced to be handled in serial manner. Moreover, all SFs share same table_lock which inevitably impacts each other when they are created in parallel. This series will solve this issue by: 1. A dedicated vhca event handler is introduced to eliminate the mutual impact with other mlx5 events. 2. Max FW threads work queues are employed in the vhca event handler to fully utilize FW capability. 3. Redesign SF active work logic to completely remove table_lock. With above optimization, SF creation time is reduced by 25%, i.e. from 80s to 60s when creating 100 SFs. Patches summary: Patch 1 - implement dedicated vhca event handler with max FW cmd threads of work queues. Patch 2 - remove table_lock by redesigning SF active work logic. 
* tag 'mlx5-updates-2023-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Allow IPsec soft/hard limits in bytes net/mlx5e: Increase max supported channels number to 256 net/mlx5e: Preparations for supporting larger number of channels net/mlx5e: Refactor mlx5e_rss_init() and mlx5e_rss_free() API's net/mlx5e: Refactor mlx5e_rss_set_rxfh() and mlx5e_rss_get_rxfh() net/mlx5e: Refactor rx_res_init() and rx_res_free() APIs net/mlx5e: Use PTR_ERR_OR_ZERO() to simplify code net/mlx5: Use PTR_ERR_OR_ZERO() to simplify code net/mlx5: fix config name in Kconfig parameter documentation net/mlx5: Remove unused declaration net/mlx5: Replace global mlx5_intf_lock with HCA devcom component lock net/mlx5: Refactor LAG peer device lookout bus logic to mlx5 devcom net/mlx5: Avoid false positive lockdep warning by adding lock_class_key net/mlx5: Redesign SF active work to remove table_lock net/mlx5: Parallelize vhca event handling ==================== Link: https://lore.kernel.org/r/20231014171908.290428-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17	hamradio: replace deprecated strncpy with strscpy_pad	Justin Stitt
	strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect both hi.data.modename and hi.data.drivername to be NUL-terminated based on its usage with sprintf: | sprintf(hi.data.modename, "%sclk,%smodem,fclk=%d,bps=%d%s", | bc->cfg.intclk ? "int" : "ext", | bc->cfg.extmodem ? "ext" : "int", bc->cfg.fclk, bc->cfg.bps, | bc->cfg.loopback ? ",loopback" : ""); Note that this data is copied out to userspace with: | if (copy_to_user(data, &hi, sizeof(hi))) ... however, the data was also copied FROM the user here: | if (copy_from_user(&hi, data, sizeof(hi))) Considering the above, a suitable replacement is strscpy_pad() as it guarantees NUL-termination on the destination buffer while also NUL-padding (which is good+wanted behavior when copying data to userspace). Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231016-strncpy-drivers-net-hamradio-baycom_epp-c-v2-1-39f72a72de30@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
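The behavior the hamradio commit relies on (guaranteed NUL-termination plus NUL-padding of the rest of the buffer) can be approximated in userspace. The following is only a sketch, not the kernel implementation: the name sketch_strscpy_pad is hypothetical, and it returns -1 on truncation where the kernel helper returns -E2BIG.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace approximation of strscpy_pad() semantics (illustrative only).
 * Returns the number of bytes copied, excluding the terminator, or -1 if
 * the source did not fit (the kernel helper returns -E2BIG in that case).
 */
static long sketch_strscpy_pad(char *dst, const char *src, size_t size)
{
	size_t len = 0;
	int truncated;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -1;

	/* Copy at most size - 1 bytes so that a terminator always fits. */
	while (len < size - 1 && src[len] != '\0') {
		dst[len] = src[len];
		len++;
	}
	truncated = (src[len] != '\0');

	/* Terminate and zero the tail, so no stale bytes remain in dst. */
	memset(dst + len, 0, size - len);

	return truncated ? -1 : (long)len;
}
```

Zeroing the tail is what matters when the whole structure is later handed to copy_to_user(): no leftover bytes after the string end up in the copied buffer.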
2023-10-17	docs: netlink: clean up after deprecating version	Jakub Kicinski
	Jiri moved version to legacy specs in commit 0f07415ebb78 ("netlink: specs: don't allow version to be specified for genetlink"). Update the documentation. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231016214540.1822392-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17	tools: ynl: fix converting flags to names after recent cleanup	Jakub Kicinski
	I recently cleaned up specs to not specify enum-as-flags when target enum is already defined as flags. YNL Python library did not convert flags, unfortunately, so this caused breakage for Stan and Willem. Note that the nlspec.py abstraction already hides the differences between flags and enums (value vs user_value), so the changes are pretty trivial. Fixes: 0629f22ec130 ("ynl: netdev: drop unnecessary enum-as-flags") Reported-and-tested-by: Willem de Bruijn <willemb@google.com> Reported-and-tested-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/all/ZS10NtQgd_BJZ3RU@google.com/ Link: https://lore.kernel.org/r/20231016213937.1820386-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17	Merge branch 'net-remove-last-of-the-phylink-validate-methods-and-clean-up'	Jakub Kicinski
	Russell King says: ==================== net: remove last of the phylink validate methods and clean up This four patch series removes the last of the phylink MAC .validate methods which can be found in the Freescale fman driver. fman has a requirement that half duplex may not be supported in RGMII mode, which is currently handled in its .validate method. In order to keep this functionality when removing the .validate method, we need to replace that with equivalent functionality, for which I propose the optional .mac_get_caps method in the first patch. The advantage of this approach over the .validate callback is that MAC drivers only have to deal with the MAC_* capabilities, and don't need to call back into phylink functions to do the masking of the ethtool linkmodes etc - which then becomes internal to phylink. This can be seen in the fourth patch where we make a load of these methods static. ==================== Link: https://lore.kernel.org/r/ZS1Z5DDfHyjMryYu@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17	net: phylink: remove a bunch of unused validation methods	Russell King (Oracle)
	Remove exports for phylink_caps_to_linkmodes(), phylink_get_capabilities(), phylink_validate_mask_caps() and phylink_generic_validate(). Also, as phylink_generic_validate() is no longer called, we can remove its implementation as well. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qsPkK-009wip-W9@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17	net: phylink: remove .validate() method	Russell King (Oracle)
	The MAC .validate() method is no longer used, so remove it from the phylink_mac_ops structure, and remove the callsite in phylink_validate_mac_and_pcs(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qsPkF-009wij-QM@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17	net: fman: convert to .mac_get_caps()	Russell King (Oracle)
	Convert fman to use the .mac_get_caps() method rather than the .validate() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Sean Anderson <sean.anderson@seco.com> Link: https://lore.kernel.org/r/E1qsPkA-009wid-Kv@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17	net: phylink: provide mac_get_caps() method	Russell King (Oracle)
	Provide a new method, mac_get_caps() to get the MAC capabilities for the specified interface mode. This is for MACs which have special requirements, such as not supporting half-duplex in certain interface modes, and will replace the validate() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qsPk5-009wiX-G5@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
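As a rough model of the idea behind mac_get_caps(), a MAC driver reports its capabilities for a given interface mode and masks out what that mode cannot do, leaving the linkmode handling to phylink. The sketch below is a userspace illustration under stated assumptions: every type, constant and function name is hypothetical and does not match the kernel's MAC_* flags, phy_interface_t values or the real callback signature. The half-duplex restriction in RGMII mode is the fman requirement described in the series cover letter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface modes; the kernel uses phy_interface_t. */
enum sketch_iface {
	SKETCH_IFACE_SGMII,
	SKETCH_IFACE_RGMII,
};

/* Hypothetical capability bits; the kernel's MAC_* flags differ. */
#define SKETCH_MAC_10HD    (1UL << 0)
#define SKETCH_MAC_10FD    (1UL << 1)
#define SKETCH_MAC_100HD   (1UL << 2)
#define SKETCH_MAC_100FD   (1UL << 3)
#define SKETCH_MAC_1000FD  (1UL << 4)
#define SKETCH_MAC_HALF_DUPLEX (SKETCH_MAC_10HD | SKETCH_MAC_100HD)

struct sketch_mac {
	unsigned long caps; /* everything the MAC supports in any mode */
};

/*
 * Return the capabilities valid for one interface mode. The caller
 * (phylink, in the real design) converts these into link modes.
 */
static unsigned long sketch_mac_get_caps(const struct sketch_mac *mac,
					 enum sketch_iface iface)
{
	unsigned long caps;

	assert(mac != NULL);
	caps = mac->caps;

	/* Modeled on fman: no half duplex when running in RGMII mode. */
	if (iface == SKETCH_IFACE_RGMII)
		caps &= ~SKETCH_MAC_HALF_DUPLEX;

	return caps;
}
```

The point of the design is that the driver only edits a capability mask and never calls back into the linkmode helpers, which is why those helpers can become internal to phylink.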
2023-10-17	eth: bnxt: fix backward compatibility with older devices	Jakub Kicinski
	Recent FW interface update bumped the size of struct hwrm_func_cfg_input above 128B which is the max some devices support. Probe on Stratus (BCM957452) with FW 20.8.3.11 fails with: bnxt_en ...: Unable to reserve tx rings bnxt_en ...: 2nd rings reservation failed. bnxt_en ...: Not enough rings available. Once probe is fixed other errors pop up: bnxt_en ...: Failed to set async event completion ring. This is because __hwrm_send() rejects requests larger than bp->hwrm_max_ext_req_len with -E2BIG. Since the driver doesn't actually access any of the new fields, yet, trim the length. It should be safe. Similar workaround exists for backing_store_cfg_input. Although that one mins() to a constant of 256, not 128 we'll effectively use here. Michael explains: "the backing store cfg command is supported by relatively newer firmware that will accept 256 bytes at least." To make debugging easier in the future add a warning for oversized requests. Fixes: 754fbf604ff6 ("bnxt_en: Update firmware interface to 1.10.2.171") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231016171640.1481493-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
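The failure mode and the workaround in the bnxt commit can be modeled in a few lines: the send path rejects any request longer than the device's maximum, so the caller clamps the length when the extra trailing fields are unused. This is a sketch only. All names are hypothetical, the 144-byte struct size in the usage below is an assumed figure (the commit only says the struct grew above 128B), and the real driver code is structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_E2BIG 7 /* stand-in for the kernel's E2BIG errno value */

/*
 * Model of a send path that refuses oversized requests, with the kind
 * of warning the commit adds to make such failures easier to debug.
 */
static int sketch_hwrm_send(size_t req_len, size_t max_ext_req_len)
{
	if (req_len > max_ext_req_len) {
		fprintf(stderr, "oversized request: %zu > %zu bytes\n",
			req_len, max_ext_req_len);
		return -SKETCH_E2BIG;
	}
	return 0;
}

/*
 * Clamp the request length to what the device accepts. This is only
 * safe while the sender never touches fields beyond the clamped length.
 */
static size_t sketch_trim_req_len(size_t struct_len, size_t max_ext_req_len)
{
	assert(max_ext_req_len > 0);
	return struct_len < max_ext_req_len ? struct_len : max_ext_req_len;
}
```

With an assumed 144-byte request and a 128-byte device limit, sketch_hwrm_send(144, 128) fails with -SKETCH_E2BIG, while sending sketch_trim_req_len(144, 128) bytes succeeds.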
2023-10-17	Merge branch 'bridge-add-a-limit-on-learned-fdb-entries'	Jakub Kicinski
	Johannes Nixdorf says: ==================== bridge: Add a limit on learned FDB entries Introduce a limit on the amount of learned FDB entries on a bridge, configured by netlink with a build time default on bridge creation in the kernel config. For backwards compatibility the kernel config default is disabling the limit (0). Without any limit a malicious actor may OOM a kernel by spamming packets with changing MAC addresses on their bridge port, so allow the bridge creator to limit the number of entries. Currently the manual entries are identified by the bridge flags BR_FDB_LOCAL or BR_FDB_ADDED_BY_USER, atomically bundled under the new flag BR_FDB_DYNAMIC_LEARNED. This means the limit also applies to entries created with BR_FDB_ADDED_BY_EXT_LEARN but none of BR_FDB_LOCAL or BR_FDB_ADDED_BY_USER, e.g. ones added by SWITCHDEV_FDB_ADD_TO_BRIDGE. Link to the corresponding iproute2 changes: https://lore.kernel.org/r/20230919-fdb_limit-v4-1-b4d2dc4df30f@avm.de v4: https://lore.kernel.org/r/20230919-fdb_limit-v4-0-39f0293807b8@avm.de/ v3: https://lore.kernel.org/r/20230905-fdb_limit-v3-0-7597cd500a82@avm.de/ v2: https://lore.kernel.org/netdev/20230619071444.14625-1-jnixdorf-oss@avm.de/ v1: https://lore.kernel.org/netdev/20230515085046.4457-1-jnixdorf-oss@avm.de/ ==================== Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-0-32cddff87758@avm.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17	selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest	Johannes Nixdorf
	Add a suite covering the fdb_n_learned and fdb_max_learned bridge features, touching all special cases in accounting at least once. Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de> Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-5-32cddff87758@avm.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17	net: bridge: Set strict_start_type for br_policy	Johannes Nixdorf
	Set any new attributes added to br_policy to be parsed strictly, to prevent userspace from passing garbage. Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-4-32cddff87758@avm.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17	net: bridge: Add netlink knobs for number / max learned FDB entries	Johannes Nixdorf
	The previous patch added accounting and a limit for the number of dynamically learned FDB entries per bridge. However it did not provide means to actually configure those bounds or read back the count. This patch does that. Two new netlink attributes are added for the accounting and limit of dynamically learned FDB entries: - IFLA_BR_FDB_N_LEARNED (RO) for the number of entries accounted for a single bridge. - IFLA_BR_FDB_MAX_LEARNED (RW) for the configured limit of entries for the bridge. The new attributes are used like this: # ip link add name br up type bridge fdb_max_learned 256 # ip link add name v1 up master br type veth peer v2 # ip link set up dev v2 # mausezahn -a rand -c 1024 v2 0.01 seconds (90877 packets per second) # bridge fdb | grep -v permanent | wc -l 256 # ip -d link show dev br 13: br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 [...] [...] fdb_n_learned 256 fdb_max_learned 256 Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-3-32cddff87758@avm.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17	net: bridge: Track and limit dynamically learned FDB entries	Johannes Nixdorf
	A malicious actor behind one bridge port may spam the kernel with packets with a random source MAC address, each of which will create an FDB entry, each of which is a dynamic allocation in the kernel. There are roughly 2^48 different MAC addresses, further limited by the rhashtable they are stored in to 2^31. Each entry is of the type struct net_bridge_fdb_entry, which is currently 128 bytes big. This means the maximum amount of memory allocated for FDB entries is 2^31 * 128B = 256GiB, which is too much for most computers. Mitigate this by maintaining a per bridge count of those automatically generated entries in fdb_n_learned, and a limit in fdb_max_learned. If the limit is hit new entries are not learned anymore. For backwards compatibility the default setting of 0 disables the limit. User-added entries by netlink or from bridge or bridge port addresses are never blocked and do not count towards that limit. Introduce a new fdb entry flag BR_FDB_DYNAMIC_LEARNED to keep track of whether an FDB entry is included in the count. The flag is enabled for dynamically learned entries, and disabled for all other entries. This should be equivalent to BR_FDB_ADDED_BY_USER and BR_FDB_LOCAL being unset, but contrary to the two flags it can be toggled atomically. Atomicity is required here, as there are multiple callers that modify the flags, but are not under a common lock (br_fdb_update is the exception for br->hash_lock, br_fdb_external_learn_add for RTNL). Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de> Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-2-32cddff87758@avm.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
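The accounting rule and the worst-case arithmetic from this commit can be sketched as follows. This is a single-threaded userspace model with hypothetical names; the kernel code updates its counter and the BR_FDB_DYNAMIC_LEARNED flag atomically, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-bridge accounting state. */
struct sketch_bridge {
	uint32_t fdb_n_learned;   /* dynamically learned entries so far */
	uint32_t fdb_max_learned; /* limit; 0 disables it (the default) */
};

/*
 * Try to account one dynamically learned entry. Returns false when the
 * limit is hit, in which case the entry is not learned.
 */
static bool sketch_fdb_learn(struct sketch_bridge *br)
{
	assert(br != NULL);
	if (br->fdb_max_learned != 0 &&
	    br->fdb_n_learned >= br->fdb_max_learned)
		return false;
	br->fdb_n_learned++;
	return true;
}

/* User-added entries are never blocked and do not touch the counter. */
static bool sketch_fdb_add_by_user(struct sketch_bridge *br)
{
	assert(br != NULL);
	return true;
}

/* Worst case without a limit: 2^31 entries of 128 bytes each. */
static uint64_t sketch_worst_case_bytes(void)
{
	return (UINT64_C(1) << 31) * 128; /* 2^38 bytes, i.e. 256 GiB */
}
```

Feeding 1024 learn attempts into a bridge with fdb_max_learned set to 256 leaves fdb_n_learned at 256, which mirrors the mausezahn example in the netlink patch of the same series.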