summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-11Merge tag 'batadv-net-pullrequest-20250207' of ↵Paolo Abeni
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - Fix panic during interface removal in BATMAN V, by Andy Strohman - Cleanup BATMAN V/ELP metric handling, by Sven Eckelmann (2 patches) - Fix incorrect offset in batadv_tt_tvlv_ogm_handler_v1(), by Remi Pommarel * tag 'batadv-net-pullrequest-20250207' of git://git.open-mesh.org/linux-merge: batman-adv: Fix incorrect offset in batadv_tt_tvlv_ogm_handler_v1() batman-adv: Drop unmanaged ELP metric worker batman-adv: Ignore neighbor throughput metrics in error case batman-adv: fix panic during interface removal ==================== Link: https://patch.msgid.link/20250207095823.26043-1-sw@simonwunderlich.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11Merge branch 'ptp-vmclock-bugfixes-and-cleanups-for-error-handling'Paolo Abeni
says: ==================== ptp: vmclock: bugfixes and cleanups for error handling Some error handling issues I noticed while looking at the code. Only compile-tested. v1: https://lore.kernel.org/r/20250206-vmclock-probe-v1-0-17a3ea07be34@linutronix.de Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> ==================== Link: https://patch.msgid.link/20250207-vmclock-probe-v2-0-bc2fce0bdf07@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11ptp: vmclock: Remove goto-based cleanup logicThomas Weißschuh
vmclock_probe() uses an "out:" label to return from the function on error. This indicates that some cleanup operation is necessary. However the label does not do anything as all resources are managed through devres, making the code slightly harder to read. Remove the label and just return directly. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11ptp: vmclock: Clean up miscdev and ptp clock through devresThomas Weißschuh
Most resources owned by the vmclock device are managed through devres. Only the miscdev and ptp clock are managed manually. This makes the code slightly harder to understand than necessary. Switch them over to devres and remove the now unnecessary drvdata. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11ptp: vmclock: Don't unregister misc device if it was not registeredThomas Weißschuh
vmclock_remove() tries to detect the successful registration of the misc device based on the value of its minor value. However that check is incorrect if the misc device registration was not attempted in the first place. Always initialize the minor number, so the check works properly. Fixes: 205032724226 ("ptp: Add support for the AMZNC10C 'vmclock' device") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11ptp: vmclock: Set driver data before its usageThomas Weißschuh
If vmclock_ptp_register() fails during probing, vmclock_remove() is called to clean up the ptp clock and misc device. It uses dev_get_drvdata() to access the vmclock state. However the driver data is not yet set at this point. Assign the driver data earlier. Fixes: 205032724226 ("ptp: Add support for the AMZNC10C 'vmclock' device") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11ptp: vmclock: Add .owner to vmclock_miscdev_fopsDavid Woodhouse
Without the .owner field, the module can be unloaded while /dev/vmclock0 is open, leading to an oops. Fixes: 205032724226 ("ptp: Add support for the AMZNC10C 'vmclock' device") Cc: stable@vger.kernel.org Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-10Merge tag 'linux-can-fixes-for-6.14-20250208' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2025-02-08 The first patch is by Reyders Morales and fixes a code example in the CAN ISO15765-2 documentation. The next patch is contributed by Alexander Hölzl and fixes sending of J1939 messages with zero data length. Fedor Pchelkin's patch for the ctucanfd driver adds a missing handling for an skb allocation error. Krzysztof Kozlowski contributes a patch for the c_can driver to fix unbalanced runtime PM disable in error path. The next patch is by Vincent Mailhol and fixes a NULL pointer dereference on udev->serial in the etas_es58x driver. The patch is by Robin van der Gracht and fixes the handling for an skb allocation error. * tag 'linux-can-fixes-for-6.14-20250208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: rockchip: rkcanfd_handle_rx_fifo_overflow_int(): bail out if skb cannot be allocated can: etas_es58x: fix potential NULL pointer dereference on udev->serial can: c_can: fix unbalanced runtime PM disable in error path can: ctucanfd: handle skb allocation failure can: j1939: j1939_sk_send_loop(): fix unable to send messages with data length zero Documentation/networking: fix basic node example document ISO 15765-2 ==================== Link: https://patch.msgid.link/20250208115120.237274-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10mlxsw: Enable Tx checksum offloadIdo Schimmel
The device is able to checksum plain TCP / UDP packets over IPv4 / IPv6 when the 'ipcs' bit in the send descriptor is set. Advertise support for the 'NETIF_F_IP{,6}_CSUM' features in net devices registered by the driver and VLAN uppers and set the 'ipcs' bit when the stack requests Tx checksum offload. Note that the device also calculates the IPv4 checksum, but it first zeroes the current checksum so there should not be any difference compared to the checksum calculated by the kernel. On SN5600 (Spectrum-4) there is about 10% improvement in Tx packet rate with 1400 byte packets when using pktgen. Tested on Spectrum-{1,2,3,4} with all the combinations of IPv4 / IPv6, TCP / UDP, with and without VLAN. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/8dc86c95474ce10572a0fa83b8adb0259558e982.1738950446.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10selftests: drv-net: add helper for path resolutionJakub Kicinski
Refering to C binaries from Python code is going to be a common need. Add a helper to convert from path in relation to the test. Meaning, if the test is in the same directory as the binary, the call would be simply: cfg.rpath("binary"). The helper name "rpath" is not great. I can't think of a better name that would be accurate yet concise. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250207184140.1730466-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10selftests: drv-net: factor out a DrvEnv base classJakub Kicinski
We have separate Env classes for local tests and tests with a remote endpoint. Make it easier to share the code by creating a base class. Make env loading a method of this class. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250207184140.1730466-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10selftests: drv-net: remove an unnecessary libmnl includeJakub Kicinski
ncdevmem doesn't need libmnl, remove the unnecessary include. Since YNL doesn't depend on libmnl either, any more, it's actually possible to build selftests without having libmnl installed. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250207183119.1721424-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10Merge branch 'fib-rules-convert-rtm_newrule-and-rtm_delrule-to-per-netns-rtnl'Jakub Kicinski
Kuniyuki Iwashima says: ==================== fib: rules: Convert RTM_NEWRULE and RTM_DELRULE to per-netns RTNL. Patch 1 ~ 2 are small cleanup, and patch 3 ~ 8 make fib_nl_newrule() and fib_nl_delrule() hold per-netns RTNL. v1: https://lore.kernel.org/20250206084629.16602-1-kuniyu@amazon.com ==================== Link: https://patch.msgid.link/20250207072502.87775-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10net: fib_rules: Convert RTM_DELRULE to per-netns RTNL.Kuniyuki Iwashima
fib_nl_delrule() is the doit() handler for RTM_DELRULE but also called from vrf_newlink() in case something fails in vrf_add_fib_rules(). In the latter case, RTNL is already held and the 4th arg is true. Let's hold per-netns RTNL in fib_delrule() if rtnl_held is false. Now we can place ASSERT_RTNL_NET() in call_fib_rule_notifiers(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10net: fib_rules: Add error_free label in fib_delrule().Kuniyuki Iwashima
We will hold RTNL just before calling fib_nl2rule_rtnl() in fib_delrule() and release it before kfree(nlrule). Let's add a new rule to make the following change cleaner. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10net: fib_rules: Convert RTM_NEWRULE to per-netns RTNL.Kuniyuki Iwashima
fib_nl_newrule() is the doit() handler for RTM_NEWRULE but also called from vrf_newlink(). In the latter case, RTNL is already held and the 4th arg is true. Let's hold per-netns RTNL in fib_newrule() if rtnl_held is false. Note that we call fib_rule_get() before releasing per-netns RTNL to call notify_rule_change() without RTNL and prevent freeing the new rule. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10net: fib_rules: Factorise fib_newrule() and fib_delrule().Kuniyuki Iwashima
fib_nl_newrule() / fib_nl_delrule() is the doit() handler for RTM_NEWRULE / RTM_DELRULE but also called from vrf_newlink(). Currently, we hold RTNL on both paths but will not on the former. Also, we set dev_net(dev)->rtnl to skb->sk in vrf_fib_rule() because fib_nl_newrule() / fib_nl_delrule() fetch net as sock_net(skb->sk). Let's Factorise the two functions and pass net and rtnl_held flag. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure().Kuniyuki Iwashima
The following patch will not set skb->sk from VRF path. Let's fetch net from fib_rule->fr_net instead of sock_net(skb->sk) in fib[46]_rule_configure(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10net: fib_rules: Split fib_nl2rule().Kuniyuki Iwashima
We will move RTNL down to fib_nl_newrule() and fib_nl_delrule(). Some operations in fib_nl2rule() require RTNL: fib_default_rule_pref() and __dev_get_by_name(). Let's split the RTNL parts as fib_nl2rule_rtnl(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10net: fib_rules: Pass net to fib_nl2rule() instead of skb.Kuniyuki Iwashima
skb is not used in fib_nl2rule() other than sock_net(skb->sk), which is already available in callers, fib_nl_newrule() and fib_nl_delrule(). Let's pass net directly to fib_nl2rule(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10net: fib_rules: Don't check net in rule_exists() and rule_find().Kuniyuki Iwashima
fib_nl_newrule() / fib_nl_delrule() looks up struct fib_rules_ops in sock_net(skb->sk) and calls rule_exists() / rule_find() respectively. fib_nl_newrule() creates a new rule and links it to the found ops, so struct fib_rule never belongs to a different netns's ops->rules_list. Let's remove redundant netns check in rule_exists() and rule_find(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207072502.87775-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10Merge branch 'tun-unify-vnet-implementation'Jakub Kicinski
Akihiko Odaki says: ==================== tun: Unify vnet implementation When I implemented virtio's hash-related features to tun/tap [1], I found tun/tap does not fill the entire region reserved for the virtio header, leaving some uninitialized hole in the middle of the buffer after read()/recvmesg(). This series fills the uninitialized hole. More concretely, the num_buffers field will be initialized with 1, and the other fields will be inialized with 0. Setting the num_buffers field to 1 is mandated by virtio 1.0 [2]. The change to virtio header is preceded by another change that refactors tun and tap to unify their virtio-related code. [1]: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com [2]: https://lore.kernel.org/r/20241227084256-mutt-send-email-mst@kernel.org/ ==================== Link: https://patch.msgid.link/20250207-tun-v6-0-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10tap: Use tun's vnet-related codeAkihiko Odaki
tun and tap implements the same vnet-related features so reuse the code. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-7-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10tap: Keep hdr_len in tap_get_user()Akihiko Odaki
hdr_len is repeatedly used so keep it in a local variable. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-6-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10tun: Extract the vnet handling codeAkihiko Odaki
The vnet handling code will be reused by tap. Functions are renamed to ensure that their names contain "vnet" to clarify that they are part of the decoupled vnet handling code. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-5-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10tun: Decouple vnet handlingAkihiko Odaki
Decouple the vnet handling code so that we can reuse it for tap. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-4-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10tun: Decouple vnet from tun_structAkihiko Odaki
Decouple vnet-related functions from tun_struct so that we can reuse them for tap in the future. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-3-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10tun: Keep hdr_len in tun_get_user()Akihiko Odaki
hdr_len is repeatedly used so keep it in a local variable. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-2-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10tun: Refactor CONFIG_TUN_VNET_CROSS_LEAkihiko Odaki
Check IS_ENABLED(CONFIG_TUN_VNET_CROSS_LE) to save some lines and make future changes easier. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250207-tun-v6-1-fb49cf8b103e@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10Merge branch 'net-xilinx-axienet-enable-adaptive-irq-coalescing-with-dim'Jakub Kicinski
Sean Anderson says: ==================== net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM To improve performance without sacrificing latency under low load, enable DIM. While I appreciate not having to write the library myself, I do think there are many unusual aspects to DIM, as detailed in the last patch. ==================== Link: https://patch.msgid.link/20250206201036.1516800-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10net: xilinx: axienet: Enable adaptive IRQ coalescing with DIMSean Anderson
The default RX IRQ coalescing settings of one IRQ per packet can represent a significant CPU load. However, increasing the coalescing unilaterally can result in undesirable latency under low load. Adaptive IRQ coalescing with DIM offers a way to adjust the coalescing settings based on load. This device only supports "CQE" mode [1], where each packet resets the timer. Therefore, an interrupt is fired either when we receive coalesce_count_rx packets or when the interface is idle for coalesce_usec_rx. With this in mind, consider the following scenarios: Link saturated Here we want to set coalesce_count_rx to a large value, in order to coalesce more packets and reduce CPU load. coalesce_usec_rx should be set to at least the time for one packet. Otherwise the link will be "idle" and we will get an interrupt for each packet anyway. Bursts of packets Each burst should be coalesced into a single interrupt, although it may be prudent to reduce coalesce_count_rx for better latency. coalesce_usec_rx should be set to at least the time for one packet so bursts are coalesced. However, additional time beyond the packet time will just increase latency at the end of a burst. Sporadic packets Due to low load, we can set coalesce_count_rx to 1 in order to reduce latency to the minimum. coalesce_usec_rx does not matter in this case. Based on this analysis, I expected the CQE profiles to look something like usec = 0, pkts = 1 // Low load usec = 16, pkts = 4 usec = 16, pkts = 16 usec = 16, pkts = 64 usec = 16, pkts = 256 // High load Where usec is set to 16 to be a few us greater than the 12.3 us packet time of a 1500 MTU packet at 1 GBit/s. However, the CQE profile is instead usec = 2, pkts = 256 // Low load usec = 8, pkts = 128 usec = 16, pkts = 64 usec = 32, pkts = 64 usec = 64, pkts = 64 // High load I found this very surprising. The number of coalesced packets *decreases* as load increases. But as load increases we have more opportunities to coalesce packets without affecting latency as much. Additionally, the profile *increases* the usec as the load increases. But as load increases, the gaps between packets will tend to become smaller, making it possible to *decrease* usec for better latency at the end of a "burst". I consider the default CQE profile unsuitable for this NIC. Therefore, we use the first profile outlined in this commit instead. coalesce_usec_rx is set to 16 by default, but the user can customize it. This may be necessary if they are using jumbo frames. I think adjusting the profile times based on the link speed/mtu would be good improvement for generic DIM. In addition to the above profile problems, I noticed the following additional issues with DIM while testing: - DIM tends to "wander" when at low load, since the performance gradient is pretty flat. If you only have 10p/ms anyway then adjusting the coalescing settings will not affect throughput very much. - DIM takes a long time to adjust back to low indices when load is decreased following a period of high load. This is because it only re-evaluates its settings once every 64 interrupts. However, at low load 64 interrupts can be several seconds. Finally: performance. This patch increases receive throughput with iperf3 from 840 Mbits/sec to 938 Mbits/sec, decreases interrupts from 69920/sec to 316/sec, and decreases CPU utilization (4x Cortex-A53) from 43% to 9%. [1] Who names this stuff? Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250206201036.1516800-5-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10net: xilinx: axienet: Get coalesce parameters from driver stateSean Anderson
The cr variables now contain the same values as the control registers themselves. Extract/calculate the values from the variables instead of saving the user-specified values. This allows us to remove some bookeeping, and also lets the user know what the actual coalesce settings are. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250206201036.1516800-4-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10net: xilinx: axienet: Support adjusting coalesce settings while runningSean Anderson
In preparation for adaptive IRQ coalescing, we first need to support adjusting the settings at runtime. The existing code doesn't require any locking because - dma_start is the only function that modifies rx/tx_dma_cr. It is always called with IRQs and NAPI disabled, so nothing else is touching the hardware. - The IRQs don't race with poll, since the latter is a softirq. - The IRQs don't race with dma_stop since they both just clear the control registers. - dma_stop doesn't race with poll since the former is called with NAPI disabled. However, once we introduce another function that modifies rx/tx_dma_cr, we need to have some locking to prevent races. Introduce two locks to protect these variables and their registers. The control register values are now generated where the coalescing settings are set. Converting coalescing settings to control register values may require sleeping because of clk_get_rate. However, the read/modify/write of the control registers themselves can't sleep because it needs to happen in IRQ context. By pre-calculating the control register values, we avoid introducing an additional mutex. Since axienet_dma_start writes the control settings when it runs, we don't bother updating the CR registers when rx/tx_dma_started is false. This prevents any issues from writing to the control registers in the middle of a reset sequence. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250206201036.1516800-3-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10net: xilinx: axienet: Combine CR calculationSean Anderson
Combine the common parts of the CR calculations for better code reuse. While we're at it, simplify the code a bit. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20250206201036.1516800-2-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10Merge tag 'wireless-2025-02-07' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.14-rc3 We have only one fix for ath12k and one fix for brcmfmac. Also this will be my last pull request as I'm stepping down as wireless driver maintainer. * tag 'wireless-2025-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: MAINTAINERS: wifi: remove Kalle MAINTAINERS: wifi: ath: remove Kalle wifi: brcmfmac: use random seed flag for BCM4355 and BCM4364 firmware wifi: ath12k: fix handling of 6 GHz rules ==================== Link: https://patch.msgid.link/20250207182957.23315C4CED1@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10Merge branch 'net-second-round-to-use-dev_net_rcu'Jakub Kicinski
Eric Dumazet says: ==================== net: second round to use dev_net_rcu() dev_net(dev) should either be protected by RTNL or RCU. There is no LOCKDEP support yet for this helper. Adding it would trigger too many splats. This second series fixes some of them. ==================== Link: https://patch.msgid.link/20250207135841.1948589-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10ipv6: mcast: extend RCU protection in igmp6_send()Eric Dumazet
igmp6_send() can be called without RTNL or RCU being held. Extend RCU protection so that we can safely fetch the net pointer and avoid a potential UAF. Note that we no longer can use sock_alloc_send_skb() because ipv6.igmp_sk uses GFP_KERNEL allocations which can sleep. Instead use alloc_skb() and charge the net->ipv6.igmp_sk socket under RCU protection. Fixes: b8ad0cbc58f7 ("[NETNS][IPV6] mcast - handle several network namespace") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10ndisc: extend RCU protection in ndisc_send_skb()Eric Dumazet
ndisc_send_skb() can be called without RTNL or RCU held. Acquire rcu_read_lock() earlier, so that we can use dev_net_rcu() and avoid a potential UAF. Fixes: 1762f7e88eb3 ("[NETNS][IPV6] ndisc - make socket control per namespace") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10vrf: use RCU protection in l3mdev_l3_out()Eric Dumazet
l3mdev_l3_out() can be called without RCU being held: raw_sendmsg() ip_push_pending_frames() ip_send_skb() ip_local_out() __ip_local_out() l3mdev_ip_out() Add rcu_read_lock() / rcu_read_unlock() pair to avoid a potential UAF. Fixes: a8e3e1a9f020 ("net: l3mdev: Add hook to output path") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10openvswitch: use RCU protection in ovs_vport_cmd_fill_info()Eric Dumazet
ovs_vport_cmd_fill_info() can be called without RTNL or RCU. Use RCU protection and dev_net_rcu() to avoid potential UAF. Fixes: 9354d4520342 ("openvswitch: reliable interface indentification in port dumps") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10arp: use RCU protection in arp_xmit()Eric Dumazet
arp_xmit() can be called without RTNL or RCU protection. Use RCU protection to avoid potential UAF. Fixes: 29a26a568038 ("netfilter: Pass struct net into the netfilter hooks") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10neighbour: use RCU protection in __neigh_notify()Eric Dumazet
__neigh_notify() can be called without RTNL or RCU protection. Use RCU protection to avoid potential UAF. Fixes: 426b5303eb43 ("[NETNS]: Modify the neighbour table code so it handles multiple network namespaces") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10ndisc: use RCU protection in ndisc_alloc_skb()Eric Dumazet
ndisc_alloc_skb() can be called without RTNL or RCU being held. Add RCU protection to avoid possible UAF. Fixes: de09334b9326 ("ndisc: Introduce ndisc_alloc_skb() helper.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10ndisc: ndisc_send_redirect() must use dev_get_by_index_rcu()Eric Dumazet
ndisc_send_redirect() is called under RCU protection, not RTNL. It must use dev_get_by_index_rcu() instead of __dev_get_by_index() Fixes: 2f17becfbea5 ("vrf: check the original netdevice for generating redirect") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250207135841.1948589-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10net: stmmac: Apply new page pool parameters when SPH is enabledFurong Xu
Commit df542f669307 ("net: stmmac: Switch to zero-copy in non-XDP RX path") makes DMA write received frame into buffer at offset of NET_SKB_PAD and sets page pool parameters to sync from offset of NET_SKB_PAD. But when Header Payload Split is enabled, the header is written at offset of NET_SKB_PAD, while the payload is written at offset of zero. Uncorrect offset parameter for the payload breaks dma coherence [1] since both CPU and DMA touch the page buffer from offset of zero which is not handled by the page pool sync parameter. And in case the DMA cannot split the received frame, for example, a large L2 frame, pp_params.max_len should grow to match the tail of entire frame. [1] https://lore.kernel.org/netdev/d465f277-bac7-439f-be1d-9a47dfe2d951@nvidia.com/ Reported-by: Jon Hunter <jonathanh@nvidia.com> Reported-by: Brad Griffis <bgriffis@nvidia.com> Suggested-by: Ido Schimmel <idosch@idosch.org> Fixes: df542f669307 ("net: stmmac: Switch to zero-copy in non-XDP RX path") Signed-off-by: Furong Xu <0x1207@gmail.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250207085639.13580-1-0x1207@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10r8152: add vendor/device ID pair for Dell Alienware AW1022zAleksander Jan Bajkowski
The Dell AW1022z is an RTL8156B based 2.5G Ethernet controller. Add the vendor and product ID values to the driver. This makes Ethernet work with the adapter. Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Link: https://patch.msgid.link/20250206224033.980115-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10Merge branch 'xsk-the-lost-bits-from-chapter-iii'Jakub Kicinski
Alexander Lobakin says: ==================== xsk: the lost bits from Chapter III Before introducing libeth_xdp, we need to add a couple more generic helpers. Notably: * 01: add generic loop unrolling hint helpers; * 04: add helper to get both xdp_desc's DMA address and metadata pointer in one go, saving several cycles and hotpath object code size in drivers (especially when unrolling). Bonus: * 02, 03: convert two drivers which were using custom macros to generic unrolled_count() (trivial, no object code changes). ==================== Link: https://patch.msgid.link/20250206182630.3914318-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10xsk: add helper to get &xdp_desc's DMA and meta pointer in one goAlexander Lobakin
Currently, when your driver supports XSk Tx metadata and you want to send an XSk frame, you need to do the following: * call external xsk_buff_raw_get_dma(); * call inline xsk_buff_get_metadata(), which calls external xsk_buff_raw_get_data() and then do some inline checks. This effectively means that the following piece: addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr; is done twice per frame, plus you have 2 external calls per frame, plus this: meta = pool->addrs + addr - pool->tx_metadata_len; if (unlikely(!xsk_buff_valid_tx_metadata(meta))) is always inlined, even if there's no meta or it's invalid. Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that in one go. It returns a small structure with 2 fields: DMA address, filled unconditionally, and metadata pointer, non-NULL only if it's present and valid. The address correction is performed only once and you also have only 1 external call per XSk frame, which does all the calculations and checks outside of your hotpath. You only need to check `if (ctx.meta)` for the metadata presence. To not copy any existing code, derive address correction and getting virtual and DMA address into small helpers. bloat-o-meter reports no object code changes for the existing functionality. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250206182630.3914318-5-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10ice: use generic unrolled_count() macroAlexander Lobakin
ice, same as i40e, has a custom loop unrolling macros for unrolling Tx descriptors filling on XSk xmit. Replace ice defs with generic unrolled_count(), which is also more convenient as it allows passing defines as its argument, not hardcoded values, while the loop declaration will still be usual for-loop. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20250206182630.3914318-4-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10i40e: use generic unrolled_count() macroAlexander Lobakin
i40e, as well as ice, has a custom loop unrolling macro for unrolling Tx descriptors filling on XSk xmit. Replace i40e defs with generic unrolled_count(), which is also more convenient as it allows passing defines as its argument, not hardcoded values, while the loop declaration will still be a usual for-loop. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20250206182630.3914318-3-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>