summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-18tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skbEric Dumazet
In commit 75eefc6c59fd ("tcp: tsq: add a shortcut in tcp_small_queue_check()") we allowed to send an skb regardless of TSQ limits being hit if rtx queue was empty or had a single skb, in order to better fill the pipe when/if TX completions were slow. Then later, commit 75c119afe14f ("tcp: implement rb-tree based retransmit queue") accidentally removed the special case for one skb in rtx queue. Stefan Wahren reported a regression in single TCP flow throughput using a 100Mbit fec link, starting from commit 65466904b015 ("tcp: adjust TSO packet sizes based on min_rtt"). This last commit only made the regression more visible, because it locked the TCP flow on a particular behavior where TSQ prevented two skbs being pushed downstream, adding silences on the wire between each TSO packet. Many thanks to Stefan for his invaluable help ! Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue") Link: https://lore.kernel.org/netdev/7f31ddc8-9971-495e-a1f6-819df542e0af@gmx.net/ Reported-by: Stefan Wahren <wahrenst@gmx.net> Tested-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20231017124526.4060202-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18octeon_ep: update BQL sent bytes before ringing doorbellShinas Rasheed
Sometimes Tx is completed immediately after doorbell is updated, which causes Tx completion routing to update completion bytes before the same packet bytes are updated in sent bytes in transmit function, hence hitting BUG_ON() in dql_completed(). To avoid this, update BQL sent bytes before ringing doorbell. Fixes: 37d79d059606 ("octeon_ep: add Tx/Rx processing and interrupt support") Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Link: https://lore.kernel.org/r/20231017105030.2310966-1-srasheed@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18net: wangxun: remove redundant kernel logJiawen Wu
Since PBA info can be read from lspci, delete txgbe_read_pba_string() and the prints. In addition, delete the redundant MAC address printing. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231017100635.154967-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18perf/benchmark: fix seccomp_unotify benchmark for 32-bitJiri Slaby (SUSE)
Commit 7d5cb68af638 (perf/benchmark: add a new benchmark for seccom_unotify) added a reference to __NR_seccomp into perf. This is fine as it added also a definition of __NR_seccomp for 64-bit. But it failed to do so for 32-bit as instead of ifndef, ifdef was used. Fix this typo (so fix the build of perf on 32-bit). Fixes: 7d5cb68af638 (perf/benchmark: add a new benchmark for seccom_unotify) Cc: Andrei Vagin <avagin@google.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20231017083019.31733-1-jirislaby@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2023-10-18Merge branch 'net-fec-fix-device_get_match_data-usage'Jakub Kicinski
Alexander Stein says: ==================== net: fec: Fix device_get_match_data usage this is v2 adressing the regression introduced by commit b0377116decd ("net: ethernet: Use device_get_match_data()"). You could also remove the (!dev_info) case for Coldfire as this platform has no quirks. But IMHO this should be kept as long as Coldfire platform data is supported. ==================== Link: https://lore.kernel.org/r/20231017063419.925266-1-alexander.stein@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18net: fec: Remove non-Coldfire platform IDsAlexander Stein
All i.MX platforms (non-Coldfire) use DT nowadays, so their platform ID entries can be removed. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20231017063419.925266-3-alexander.stein@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18net: fec: Fix device_get_match_data usageAlexander Stein
device_get_match_data() expects that of_device_id->data points to actual fec_devinfo data, not a platform_device_id entry. Fix this by adjusting OF device data pointers to their corresponding structs. enum imx_fec_type is now unused and can be removed. Fixes: b0377116decd ("net: ethernet: Use device_get_match_data()") Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20231017063419.925266-2-alexander.stein@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18drivers: net: wwan: iosm: Fixed multiple typos in multiple filesMuhammad Muzammil
iosm_ipc_chnl_cfg.h: Fixed typo iosm_ipc_imem_ops.h: Fixed typo iosm_ipc_mux.h: Fixed typo iosm_ipc_pm.h: Fixed typo iosm_ipc_port.h: Fixed typo iosm_ipc_trace.h: Fixed typo Signed-off-by: Muhammad Muzammil <m.muzzammilashraf@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231014121407.10012-1-m.muzzammilashraf@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18Merge tag 'spi-fix-v6-6-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fix from Mark Brown: "A fix for the npcm-fiu driver in cases where there are no dummy bytes during reads" * tag 'spi-fix-v6-6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: npcm-fiu: Fix UMA reads when dummy.nbytes == 0
2023-10-18Merge tag 'regmap-fix-v6.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fix from Mark Brown: "A straightforward fix from Johan for a long standing bug in cases where we both have regmaps without devices and something is using dev_get_regmap()" * tag 'regmap-fix-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: fix NULL deref on lookup
2023-10-18netfilter: nf_tables: revert do not remove elements if set backend ↵Pablo Neira Ayuso
implements .abort nf_tables_abort_release() path calls nft_set_elem_destroy() for NFT_MSG_NEWSETELEM which releases the element, however, a reference to the element still remains in the working copy. Fixes: ebd032fa8818 ("netfilter: nf_tables: do not remove elements if set backend implements .abort") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: nft_set_rbtree: .deactivate fails if element has expiredPablo Neira Ayuso
This allows to remove an expired element which is not possible in other existing set backends, this is more noticeable if gc-interval is high so expired elements remain in the tree. On-demand gc also does not help in this case, because this is delete element path. Return NULL if element has expired. Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18selftests: netfilter: Run nft_audit.sh in its own netnsPhil Sutter
Don't mess with the host's firewall ruleset. Since audit logging is not per-netns, add an initial delay of a second so other selftests' netns cleanups have a chance to finish. Fixes: e8dbde59ca3f ("selftests: netfilter: Test nf_tables audit logging") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: nf_tables: audit log object reset once per tablePhil Sutter
When resetting multiple objects at once (via dump request), emit a log message per table (or filled skb) and resurrect the 'entries' parameter to contain the number of objects being logged for. To test the skb exhaustion path, perform some bulk counter and quota adds in the kselftest. Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> (Audit) Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18neighbor: tracing: Move pin6 inside CONFIG_IPV6=y sectionGeert Uytterhoeven
When CONFIG_IPV6=n, and building with W=1: In file included from include/trace/define_trace.h:102, from include/trace/events/neigh.h:255, from net/core/net-traces.c:51: include/trace/events/neigh.h: In function ‘trace_event_raw_event_neigh_create’: include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not used [-Werror=unused-but-set-variable] 42 | struct in6_addr *pin6; | ^~~~ include/trace/trace_events.h:402:11: note: in definition of macro ‘DECLARE_EVENT_CLASS’ 402 | { assign; } \ | ^~~~~~ include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’ 44 | PARAMS(assign), \ | ^~~~~~ include/trace/events/neigh.h:23:1: note: in expansion of macro ‘TRACE_EVENT’ 23 | TRACE_EVENT(neigh_create, | ^~~~~~~~~~~ include/trace/events/neigh.h:41:9: note: in expansion of macro ‘TP_fast_assign’ 41 | TP_fast_assign( | ^~~~~~~~~~~~~~ In file included from include/trace/define_trace.h:103, from include/trace/events/neigh.h:255, from net/core/net-traces.c:51: include/trace/events/neigh.h: In function ‘perf_trace_neigh_create’: include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not used [-Werror=unused-but-set-variable] 42 | struct in6_addr *pin6; | ^~~~ include/trace/perf.h:51:11: note: in definition of macro ‘DECLARE_EVENT_CLASS’ 51 | { assign; } \ | ^~~~~~ include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’ 44 | PARAMS(assign), \ | ^~~~~~ include/trace/events/neigh.h:23:1: note: in expansion of macro ‘TRACE_EVENT’ 23 | TRACE_EVENT(neigh_create, | ^~~~~~~~~~~ include/trace/events/neigh.h:41:9: note: in expansion of macro ‘TP_fast_assign’ 41 | TP_fast_assign( | ^~~~~~~~~~~~~~ Indeed, the variable pin6 is declared and initialized unconditionally, while it is only used and needlessly re-initialized when support for IPv6 is enabled. Fix this by dropping the unused variable initialization, and moving the variable declaration inside the existing section protected by a check for CONFIG_IPV6. Fixes: fc651001d2c5ca4f ("neighbor: Add tracepoint to __neigh_create") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18net: skb_find_text: Ignore patterns extending past 'to'Phil Sutter
Assume that caller's 'to' offset really represents an upper boundary for the pattern search, so patterns extending past this offset are to be rejected. The old behaviour also was kind of inconsistent when it comes to fragmentation (or otherwise non-linear skbs): If the pattern started in between 'to' and 'from' offsets but extended to the next fragment, it was not found if 'to' offset was still within the current fragment. Test the new behaviour in a kselftest using iptables' string match. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Fixes: f72b948dcbb8 ("[NET]: skb_find_text ignores to argument") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18Revert "net: wwan: iosm: enable runtime pm support for 7560"Bagas Sanjaya
Runtime power management support breaks Intel LTE modem where dmesg dump showes timeout errors: ``` [ 72.027442] iosm 0000:01:00.0: msg timeout [ 72.531638] iosm 0000:01:00.0: msg timeout [ 73.035414] iosm 0000:01:00.0: msg timeout [ 73.540359] iosm 0000:01:00.0: msg timeout ``` Furthermore, when shutting down with `poweroff` and modem attached, the system rebooted instead of powering down as expected. The modem works again only after power cycling. Revert runtime power management support for IOSM driver as introduced by commit e4f5073d53be6c ("net: wwan: iosm: enable runtime pm support for 7560"). Fixes: e4f5073d53be ("net: wwan: iosm: enable runtime pm support for 7560") Reported-by: Martin <mwolf@adiumentum.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217996 Link: https://lore.kernel.org/r/267abf02-4b60-4a2e-92cd-709e3da6f7d3@gmail.com/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18Merge tag 'nf-next-23-10-18' of ↵David S. Miller
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter next pull request 2023-10-18 This series contains initial netfilter skb drop_reason support, from myself. First few patches fix up a few spots to make sure we won't trip when followup patches embed error numbers in the upper bits (we already do this in some places). Then, nftables and bridge netfilter get converted to call kfree_skb_reason directly to let tooling pinpoint exact location of packet drops, rather than the existing NF_DROP catchall in nf_hook_slow(). I would like to eventually convert all netfilter modules, but as some callers cannot deal with NF_STOLEN (notably act_ct), more preparation work is needed for this. Last patch gets rid of an ugly 'de-const' cast in nftables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18net: pktgen: Fix interface flags printingGavrilov Ilia
Device flags are displayed incorrectly: 1) The comparison (i == F_FLOW_SEQ) is always false, because F_FLOW_SEQ is equal to (1 << FLOW_SEQ_SHIFT) == 2048, and the maximum value of the 'i' variable is (NR_PKT_FLAG - 1) == 17. It should be compared with FLOW_SEQ_SHIFT. 2) Similarly to the F_IPSEC flag. 3) Also add spaces to the print end of the string literal "spi:%u" to prevent the output from merging with the flag that follows. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 99c6d3d20d62 ("pktgen: Remove brute-force printing of flags") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18Merge branch 'ethtool-forced-speed'David S. Miller
Paul Greenwalt says: ==================== ethtool: Add link mode maps for forced speeds The following patch set was initially a part of [1]. As the purpose of the original series was to add the support of the new hardware to the intel ice driver, the refactoring of advertised link modes mapping was extracted to a new set. The patch set adds a common mechanism for mapping Ethtool forced speeds with Ethtool supported link modes, which can be used in drivers code. [1] https://lore.kernel.org/netdev/20230823180633.2450617-1-pawel.chmielewski@intel.com Changelog: v4->v5: Separated ethtool and qede changes into two patches, fixed indentation, and moved ethtool_forced_speed_maps_init() from ioctl.c to ethtool.h v3->v4: Moved the macro for setting fields into the common header file v2->v3: Fixed whitespaces, added missing line at end of file v1->v2: Fixed formatting, typo, moved declaration of iterator to loop line. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18ice: Refactor finding advertised link speedPawel Chmielewski
Refactor ice_get_link_ksettings to using forced speed to link modes mapping. Suggested-by : Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18qede: Refactor qede_forced_speed_maps_init()Paul Greenwalt
Refactor qede_forced_speed_maps_init() to use commen implementation ethtool_forced_speed_maps_init(). The qede driver was compile tested only. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18ethtool: Add forced speed to supported link modes mapsPaul Greenwalt
The need to map Ethtool forced speeds to Ethtool supported link modes is common among drivers. To support this, add a common structure for forced speed maps and a function to init them. This is solution was originally introduced in commit 1d4e4ecccb11 ("qede: populate supported link modes maps on module init") for qede driver. ethtool_forced_speed_maps_init() should be called during driver init with an array of struct ethtool_forced_speed_map to populate the mapping. Definitions for maps themselves are left in the driver code, as the sets of supported link modes may vary between the devices. Suggested-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18netfilter: nf_tables: de-constify set commit ops function argumentFlorian Westphal
The set backend using this already has to work around this via ugly cast, don't spread this pattern. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: bridge: convert br_netfilter to NF_DROP_REASONFlorian Westphal
errno is 0 because these hooks are called from prerouting and forward. There is no socket that the errno would ever be propagated to. Other netfilter modules (e.g. nf_nat, conntrack, ...) can be converted in a similar way. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: make nftables drops visible in net dropmonitorFlorian Westphal
net_dropmonitor blames core.c:nf_hook_slow. Add NF_DROP_REASON() helper and use it in nft_do_chain(). The helper releases the skb, so exact drop location becomes available. Calling code will observe the NF_STOLEN verdict instead. Adjust nf_hook_slow so we can embed an erro value wih NF_STOLEN verdicts, just like we do for NF_DROP. After this, drop in nftables can be pinpointed to a drop due to a rule or the chain policy. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: nf_nat: mask out non-verdict bits when checking return valueFlorian Westphal
Same as previous change: we need to mask out the non-verdict bits, as upcoming patches may embed an errno value in NF_STOLEN verdicts too. NF_DROP could already do this, but not all called functions do this. Checks that only test ret vs NF_ACCEPT are fine, the 'errno parts' are always 0 for those. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: conntrack: convert nf_conntrack_update to netfilter verdictsFlorian Westphal
This function calls helpers that can return nf-verdicts, but then those get converted to -1/0 as thats what the caller expects. Theoretically NF_DROP could have an errno number set in the upper 24 bits of the return value. Or any of those helpers could return NF_STOLEN, which would result in use-after-free. This is fine as-is, the called functions don't do this yet. But its better to avoid possible future problems if the upcoming patchset to add NF_DROP_REASON() support gains further users, so remove the 0/-1 translation from the picture and pass the verdicts down to the caller. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: nf_tables: mask out non-verdict bits when checking return valueFlorian Westphal
nftables trace infra must mask out the non-verdict bit parts of the return value, else followup changes that 'return errno << 8 | NF_STOLEN' will cause breakage. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: xt_mangle: only check verdict part of return valueFlorian Westphal
These checks assume that the caller only returns NF_DROP without any errno embedded in the upper bits. This is fine right now, but followup patches will start to propagate such errors to allow kfree_skb_drop_reason() in the called functions, those would then indicate 'errno << 8 | NF_STOLEN'. To not break things we have to mask those parts out. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18Merge branch 'devlink-deadlock'David S. Miller
Jiri Pirko says: ==================== devlink: fix a deadlock when taking devlink instance lock while holding RTNL lock devlink_port_fill() may be called sometimes with RTNL lock held. When putting the nested port function devlink instance attrs, current code takes nested devlink instance lock. In that case lock ordering is wrong. Patch #1 is a dependency of patch #2. Patch #2 converts the peernet2id_alloc() call to rely in RCU so it could called without devlink instance lock. Patch #3 takes device reference for devlink instance making sure that device does not disappear before devlink_release() is called. Patch #4 benefits from the preparations done in patches #2 and #3 and removes the problematic nested devlink lock aquisition. Patched #5-#7 improve documentation to reflect this issue so it is avoided in the future. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18devlink: document devlink_rel_nested_in_notify() functionJiri Pirko
Add a documentation for devlink_rel_nested_in_notify() describing the devlink instance locking consequences. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18Documentation: devlink: add a note about RTNL lock into locking sectionJiri Pirko
Add a note describing the locking order of taking RTNL lock with devlink instance lock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18Documentation: devlink: add nested instance sectionJiri Pirko
Add a part talking about nested devlink instances describing the helpers and locking ordering. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18devlink: don't take instance lock for nested handle putJiri Pirko
Lockdep reports following issue: WARNING: possible circular locking dependency detected ------------------------------------------------------ devlink/8191 is trying to acquire lock: ffff88813f32c250 (&devlink->lock_key#14){+.+.}-{3:3}, at: devlink_rel_devlink_handle_put+0x11e/0x2d0 but task is already holding lock: ffffffff8511eca8 (rtnl_mutex){+.+.}-{3:3}, at: unregister_netdev+0xe/0x20 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (rtnl_mutex){+.+.}-{3:3}: lock_acquire+0x1c3/0x500 __mutex_lock+0x14c/0x1b20 register_netdevice_notifier_net+0x13/0x30 mlx5_lag_add_mdev+0x51c/0xa00 [mlx5_core] mlx5_load+0x222/0xc70 [mlx5_core] mlx5_init_one_devl_locked+0x4a0/0x1310 [mlx5_core] mlx5_init_one+0x3b/0x60 [mlx5_core] probe_one+0x786/0xd00 [mlx5_core] local_pci_probe+0xd7/0x180 pci_device_probe+0x231/0x720 really_probe+0x1e4/0xb60 __driver_probe_device+0x261/0x470 driver_probe_device+0x49/0x130 __driver_attach+0x215/0x4c0 bus_for_each_dev+0xf0/0x170 bus_add_driver+0x21d/0x590 driver_register+0x133/0x460 vdpa_match_remove+0x89/0xc0 [vdpa] do_one_initcall+0xc4/0x360 do_init_module+0x22d/0x760 load_module+0x51d7/0x6750 init_module_from_file+0xd2/0x130 idempotent_init_module+0x326/0x5a0 __x64_sys_finit_module+0xc1/0x130 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 -> #2 (mlx5_intf_mutex){+.+.}-{3:3}: lock_acquire+0x1c3/0x500 __mutex_lock+0x14c/0x1b20 mlx5_register_device+0x3e/0xd0 [mlx5_core] mlx5_init_one_devl_locked+0x8fa/0x1310 [mlx5_core] mlx5_devlink_reload_up+0x147/0x170 [mlx5_core] devlink_reload+0x203/0x380 devlink_nl_cmd_reload+0xb84/0x10e0 genl_family_rcv_msg_doit+0x1cc/0x2a0 genl_rcv_msg+0x3c9/0x670 netlink_rcv_skb+0x12c/0x360 genl_rcv+0x24/0x40 netlink_unicast+0x435/0x6f0 netlink_sendmsg+0x7a0/0xc70 sock_sendmsg+0xc5/0x190 __sys_sendto+0x1c8/0x290 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 -> #1 (&dev->lock_key#8){+.+.}-{3:3}: lock_acquire+0x1c3/0x500 __mutex_lock+0x14c/0x1b20 mlx5_init_one_devl_locked+0x45/0x1310 [mlx5_core] mlx5_devlink_reload_up+0x147/0x170 [mlx5_core] devlink_reload+0x203/0x380 devlink_nl_cmd_reload+0xb84/0x10e0 genl_family_rcv_msg_doit+0x1cc/0x2a0 genl_rcv_msg+0x3c9/0x670 netlink_rcv_skb+0x12c/0x360 genl_rcv+0x24/0x40 netlink_unicast+0x435/0x6f0 netlink_sendmsg+0x7a0/0xc70 sock_sendmsg+0xc5/0x190 __sys_sendto+0x1c8/0x290 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 -> #0 (&devlink->lock_key#14){+.+.}-{3:3}: check_prev_add+0x1af/0x2300 __lock_acquire+0x31d7/0x4eb0 lock_acquire+0x1c3/0x500 __mutex_lock+0x14c/0x1b20 devlink_rel_devlink_handle_put+0x11e/0x2d0 devlink_nl_port_fill+0xddf/0x1b00 devlink_port_notify+0xb5/0x220 __devlink_port_type_set+0x151/0x510 devlink_port_netdevice_event+0x17c/0x220 notifier_call_chain+0x97/0x240 unregister_netdevice_many_notify+0x876/0x1790 unregister_netdevice_queue+0x274/0x350 unregister_netdev+0x18/0x20 mlx5e_vport_rep_unload+0xc5/0x1c0 [mlx5_core] __esw_offloads_unload_rep+0xd8/0x130 [mlx5_core] mlx5_esw_offloads_rep_unload+0x52/0x70 [mlx5_core] mlx5_esw_offloads_unload_rep+0x85/0xc0 [mlx5_core] mlx5_eswitch_unload_sf_vport+0x41/0x90 [mlx5_core] mlx5_devlink_sf_port_del+0x120/0x280 [mlx5_core] genl_family_rcv_msg_doit+0x1cc/0x2a0 genl_rcv_msg+0x3c9/0x670 netlink_rcv_skb+0x12c/0x360 genl_rcv+0x24/0x40 netlink_unicast+0x435/0x6f0 netlink_sendmsg+0x7a0/0xc70 sock_sendmsg+0xc5/0x190 __sys_sendto+0x1c8/0x290 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 other info that might help us debug this: Chain exists of: &devlink->lock_key#14 --> mlx5_intf_mutex --> rtnl_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(mlx5_intf_mutex); lock(rtnl_mutex); lock(&devlink->lock_key#14); Problem is taking the devlink instance lock of nested instance when RTNL is already held. To fix this, don't take the devlink instance lock when putting nested handle. Instead, rely on the preparations done by previous two patches to be able to access device pointer and obtain netns id without devlink instance lock held. Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18devlink: take device reference for devlink objectJiri Pirko
In preparation to allow to access device pointer without devlink instance lock held, make sure the device pointer is usable until devlink_release() is called. Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18devlink: call peernet2id_alloc() with net pointer under RCU read lockJiri Pirko
peernet2id_alloc() allows to be called lockless with peer net pointer obtained in RCU critical section and makes sure to return ns ID if net namespaces is not being removed concurrently. Benefit from read_pnet_rcu() helper addition, use it to obtain net pointer under RCU read lock and pass it to peernet2id_alloc() to get ns ID. Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18net: treat possible_net_t net pointer as an RCU one and add read_pnet_rcu()Jiri Pirko
Make the net pointer stored in possible_net_t structure annotated as an RCU pointer. Change the access helpers to treat it as such. Introduce read_pnet_rcu() helper to allow caller to dereference the net pointer under RCU read lock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-17Merge tag 'mlx5-updates-2023-10-10' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-10-10 1) Adham Faris, Increase max supported channels number to 256 2) Leon Romanovsky, Allow IPsec soft/hard limits in bytes 3) Shay Drory, Replace global mlx5_intf_lock with HCA devcom component lock 4) Wei Zhang, Optimize SF creation flow During SF creation, HCA state gets changed from INVALID to IN_USE step by step. Accordingly, FW sends vhca event to driver to inform about this state change asynchronously. Each vhca event is critical because all related SW/FW operations are triggered by it. Currently there is only a single mlx5 general event handler which not only handles vhca event but many other events. This incurs huge bottleneck because all events are forced to be handled in serial manner. Moreover, all SFs share same table_lock which inevitably impacts each other when they are created in parallel. This series will solve this issue by: 1. A dedicated vhca event handler is introduced to eliminate the mutual impact with other mlx5 events. 2. Max FW threads work queues are employed in the vhca event handler to fully utilize FW capability. 3. Redesign SF active work logic to completely remove table_lock. With above optimization, SF creation time is reduced by 25%, i.e. from 80s to 60s when creating 100 SFs. Patches summary: Patch 1 - implement dedicated vhca event handler with max FW cmd threads of work queues. Patch 2 - remove table_lock by redesigning SF active work logic. * tag 'mlx5-updates-2023-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Allow IPsec soft/hard limits in bytes net/mlx5e: Increase max supported channels number to 256 net/mlx5e: Preparations for supporting larger number of channels net/mlx5e: Refactor mlx5e_rss_init() and mlx5e_rss_free() API's net/mlx5e: Refactor mlx5e_rss_set_rxfh() and mlx5e_rss_get_rxfh() net/mlx5e: Refactor rx_res_init() and rx_res_free() APIs net/mlx5e: Use PTR_ERR_OR_ZERO() to simplify code net/mlx5: Use PTR_ERR_OR_ZERO() to simplify code net/mlx5: fix config name in Kconfig parameter documentation net/mlx5: Remove unused declaration net/mlx5: Replace global mlx5_intf_lock with HCA devcom component lock net/mlx5: Refactor LAG peer device lookout bus logic to mlx5 devcom net/mlx5: Avoid false positive lockdep warning by adding lock_class_key net/mlx5: Redesign SF active work to remove table_lock net/mlx5: Parallelize vhca event handling ==================== Link: https://lore.kernel.org/r/20231014171908.290428-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17Merge tag 'ipsec-2023-10-17' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2023-10-17 1) Fix a slab-use-after-free in xfrm_policy_inexact_list_reinsert. From Dong Chenchen. 2) Fix data-races in the xfrm interfaces dev->stats fields. From Eric Dumazet. 3) Fix a data-race in xfrm_gen_index. From Eric Dumazet. 4) Fix an inet6_dev refcount underflow. From Zhang Changzhong. 5) Check the return value of pskb_trim in esp_remove_trailer for esp4 and esp6. From Ma Ke. 6) Fix a data-race in xfrm_lookup_with_ifid. From Eric Dumazet. * tag 'ipsec-2023-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix a data-race in xfrm_lookup_with_ifid() net: ipv4: fix return value check in esp_remove_trailer net: ipv6: fix return value check in esp_remove_trailer xfrm6: fix inet6_dev refcount underflow problem xfrm: fix a data-race in xfrm_gen_index() xfrm: interface: use DEV_STATS_INC() net: xfrm: skip policies marked as dead while reinserting policies ==================== Link: https://lore.kernel.org/r/20231017083723.1364940-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17hamradio: replace deprecated strncpy with strscpy_padJustin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect both hi.data.modename and hi.data.drivername to be NUL-terminated based on its usage with sprintf: | sprintf(hi.data.modename, "%sclk,%smodem,fclk=%d,bps=%d%s", | bc->cfg.intclk ? "int" : "ext", | bc->cfg.extmodem ? "ext" : "int", bc->cfg.fclk, bc->cfg.bps, | bc->cfg.loopback ? ",loopback" : ""); Note that this data is copied out to userspace with: | if (copy_to_user(data, &hi, sizeof(hi))) ... however, the data was also copied FROM the user here: | if (copy_from_user(&hi, data, sizeof(hi))) Considering the above, a suitable replacement is strscpy_pad() as it guarantees NUL-termination on the destination buffer while also NUL-padding (which is good+wanted behavior when copying data to userspace). Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231016-strncpy-drivers-net-hamradio-baycom_epp-c-v2-1-39f72a72de30@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17docs: netlink: clean up after deprecating versionJakub Kicinski
Jiri moved version to legacy specs in commit 0f07415ebb78 ("netlink: specs: don't allow version to be specified for genetlink"). Update the documentation. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231016214540.1822392-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17tools: ynl: fix converting flags to names after recent cleanupJakub Kicinski
I recently cleaned up specs to not specify enum-as-flags when target enum is already defined as flags. YNL Python library did not convert flags, unfortunately, so this caused breakage for Stan and Willem. Note that the nlspec.py abstraction already hides the differences between flags and enums (value vs user_value), so the changes are pretty trivial. Fixes: 0629f22ec130 ("ynl: netdev: drop unnecessary enum-as-flags") Reported-and-tested-by: Willem de Bruijn <willemb@google.com> Reported-and-tested-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/all/ZS10NtQgd_BJZ3RU@google.com/ Link: https://lore.kernel.org/r/20231016213937.1820386-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17net: usb: smsc95xx: Fix an error code in smsc95xx_reset()Dan Carpenter
Return a negative error code instead of success. Fixes: 2f7ca802bdae ("net: Add SMSC LAN9500 USB2.0 10/100 ethernet adapter driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/147927f0-9ada-45cc-81ff-75a19dd30b76@moroto.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17Merge branch 'net-remove-last-of-the-phylink-validate-methods-and-clean-up'Jakub Kicinski
Russell King says: ==================== net: remove last of the phylink validate methods and clean up This four patch series removes the last of the phylink MAC .validate methods which can be found in the Freescale fman driver. fman has a requirement that half duplex may not be supported in RGMII mode, which is currently handled in its .validate method. In order to keep this functionality when removing the .validate method, we need to replace that with equivalent functionality, for which I propose the optional .mac_get_caps method in the first patch. The advantage of this approach over the .validate callback is that MAC drivers only have to deal with the MAC_* capabilities, and don't need to call back into phylink functions to do the masking of the ethtool linkmodes etc - which then becomes internal to phylink. This can be seen in the fourth patch where we make a load of these methods static. ==================== Link: https://lore.kernel.org/r/ZS1Z5DDfHyjMryYu@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17net: phylink: remove a bunch of unused validation methodsRussell King (Oracle)
Remove exports for phylink_caps_to_linkmodes(), phylink_get_capabilities(), phylink_validate_mask_caps() and phylink_generic_validate(). Also, as phylink_generic_validate() is no longer called, we can remove its implementation as well. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qsPkK-009wip-W9@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17net: phylink: remove .validate() methodRussell King (Oracle)
The MAC .validate() method is no longer used, so remove it from the phylink_mac_ops structure, and remove the callsite in phylink_validate_mac_and_pcs(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qsPkF-009wij-QM@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17net: fman: convert to .mac_get_caps()Russell King (Oracle)
Convert fman to use the .mac_get_caps() method rather than the .validate() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Sean Anderson <sean.anderson@seco.com> Link: https://lore.kernel.org/r/E1qsPkA-009wid-Kv@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17net: phylink: provide mac_get_caps() methodRussell King (Oracle)
Provide a new method, mac_get_caps() to get the MAC capabilities for the specified interface mode. This is for MACs which have special requirements, such as not supporting half-duplex in certain interface modes, and will replace the validate() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qsPk5-009wiX-G5@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17eth: bnxt: fix backward compatibility with older devicesJakub Kicinski
Recent FW interface update bumped the size of struct hwrm_func_cfg_input above 128B which is the max some devices support. Probe on Stratus (BCM957452) with FW 20.8.3.11 fails with: bnxt_en ...: Unable to reserve tx rings bnxt_en ...: 2nd rings reservation failed. bnxt_en ...: Not enough rings available. Once probe is fixed other errors pop up: bnxt_en ...: Failed to set async event completion ring. This is because __hwrm_send() rejects requests larger than bp->hwrm_max_ext_req_len with -E2BIG. Since the driver doesn't actually access any of the new fields, yet, trim the length. It should be safe. Similar workaround exists for backing_store_cfg_input. Although that one mins() to a constant of 256, not 128 we'll effectively use here. Michael explains: "the backing store cfg command is supported by relatively newer firmware that will accept 256 bytes at least." To make debugging easier in the future add a warning for oversized requests. Fixes: 754fbf604ff6 ("bnxt_en: Update firmware interface to 1.10.2.171") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231016171640.1481493-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>