summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2025-02-26net: Handle napi_schedule() calls from non-interruptFrederic Weisbecker
napi_schedule() is expected to be called either: * From an interrupt, where raised softirqs are handled on IRQ exit * From a softirq disabled section, where raised softirqs are handled on the next call to local_bh_enable(). * From a softirq handler, where raised softirqs are handled on the next round in do_softirq(), or further deferred to a dedicated kthread. Other bare tasks context may end up ignoring the raised NET_RX vector until the next random softirq handling opportunity, which may not happen before a while if the CPU goes idle afterwards with the tick stopped. Such "misuses" have been detected on several places thanks to messages of the kind: "NOHZ tick-stop error: local softirq work is pending, handler #08!!!" For example: __raise_softirq_irqoff __napi_schedule rtl8152_runtime_resume.isra.0 rtl8152_resume usb_resume_interface.isra.0 usb_resume_both __rpm_callback rpm_callback rpm_resume __pm_runtime_resume usb_autoresume_device usb_remote_wakeup hub_event process_one_work worker_thread kthread ret_from_fork ret_from_fork_asm And also: * drivers/net/usb/r8152.c::rtl_work_func_t * drivers/net/netdevsim/netdev.c::nsim_start_xmit There is a long history of issues of this kind: 019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()") 330068589389 ("idpf: disable local BH when scheduling napi for marker packets") e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error") e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule") c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule") 970be1dff26d ("mt76: disable BH around napi_schedule() calls") 019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()") 30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new function to be called from threaded interrupt") e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()") 83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule") bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule") 8cf699ec849f ("mlx4: do not call napi_schedule() without care") ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule") This shows that relying on the caller to arrange a proper context for the softirqs to be handled while calling napi_schedule() is very fragile and error prone. Also fixing them can also prove challenging if the caller may be called from different kinds of contexts. Therefore fix this from napi_schedule() itself with waking up ksoftirqd when softirqs are raised from task contexts. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Reported-by: Jakub Kicinski <kuba@kernel.org> Reported-by: Francois Romieu <romieu@fr.zoreil.com> Closes: https://lore.kernel.org/lkml/354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de/ Cc: Breno Leitao <leitao@debian.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250223221708.27130-1-frederic@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26net: Clear old fragment checksum value in napi_reuse_skbMohammad Heib
In certain cases, napi_get_frags() returns an skb that points to an old received fragment, This skb may have its skb->ip_summed, csum, and other fields set from previous fragment handling. Some network drivers set skb->ip_summed to either CHECKSUM_COMPLETE or CHECKSUM_UNNECESSARY when getting skb from napi_get_frags(), while others only set skb->ip_summed when RX checksum offload is enabled on the device, and do not set any value for skb->ip_summed when hardware checksum offload is disabled, assuming that the skb->ip_summed initiated to zero by napi_reuse_skb, ionic driver for example will ignore/unset any value for the ip_summed filed if HW checksum offload is disabled, and if we have a situation where the user disables the checksum offload during a traffic that could lead to the following errors shown in the kernel logs: <IRQ> dump_stack_lvl+0x34/0x48 __skb_gro_checksum_complete+0x7e/0x90 tcp6_gro_receive+0xc6/0x190 ipv6_gro_receive+0x1ec/0x430 dev_gro_receive+0x188/0x360 ? ionic_rx_clean+0x25a/0x460 [ionic] napi_gro_frags+0x13c/0x300 ? __pfx_ionic_rx_service+0x10/0x10 [ionic] ionic_rx_service+0x67/0x80 [ionic] ionic_cq_service+0x58/0x90 [ionic] ionic_txrx_napi+0x64/0x1b0 [ionic] __napi_poll+0x27/0x170 net_rx_action+0x29c/0x370 handle_softirqs+0xce/0x270 __irq_exit_rcu+0xa3/0xc0 common_interrupt+0x80/0xa0 </IRQ> This inconsistency sometimes leads to checksum validation issues in the upper layers of the network stack. To resolve this, this patch clears the skb->ip_summed value for each reused skb in by napi_reuse_skb(), ensuring that the caller is responsible for setting the correct checksum status. This eliminates potential checksum validation issues caused by improper handling of skb->ip_summed. Fixes: 76620aafd66f ("gro: New frags interface to avoid copying shinfo") Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250225112852.2507709-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26tcp: be less liberal in TSEcr received while in SYN_RECV stateEric Dumazet
Yong-Hao Zou mentioned that linux was not strict as other OS in 3WHS, for flows using TCP TS option (RFC 7323) As hinted by an old comment in tcp_check_req(), we can check the TSEcr value in the incoming packet corresponds to one of the SYNACK TSval values we have sent. In this patch, I record the oldest and most recent values that SYNACK packets have used. Send a challenge ACK if we receive a TSEcr outside of this range, and increase a new SNMP counter. nstat -az | grep TSEcrRejected TcpExtTSEcrRejected 0 0.0 Due to TCP fastopen implementation, do not apply yet these checks for fastopen flows. v2: No longer use req->num_timeout, but treq->snt_tsval_first to detect when first SYNACK is prepared. This means we make sure to not send an initial zero TSval. Make sure MPTCP and TCP selftests are passing. Change MIB name to TcpExtTSEcrRejected v1: https://lore.kernel.org/netdev/CADVnQykD8i4ArpSZaPKaoNxLJ2if2ts9m4As+=Jvdkrgx1qMHw@mail.gmail.com/T/ Reported-by: Yong-Hao Zou <yonghaoz1994@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250225171048.3105061-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26net: Use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net().Kuniyuki Iwashima
Breno Leitao reported the splat below. [0] Commit 65161fb544aa ("net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().") added the DEBUG_NET_WARN_ON_ONCE(), assuming that the netdev is not registered before register_netdevice_notifier_dev_net(). But the assumption was simply wrong. Let's use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net(). [0]: WARNING: CPU: 25 PID: 849 at net/core/dev.c:2150 register_netdevice_notifier_dev_net (net/core/dev.c:2150) <TASK> ? __warn (kernel/panic.c:242 kernel/panic.c:748) ? register_netdevice_notifier_dev_net (net/core/dev.c:2150) ? register_netdevice_notifier_dev_net (net/core/dev.c:2150) ? report_bug (lib/bug.c:? lib/bug.c:219) ? handle_bug (arch/x86/kernel/traps.c:285) ? exc_invalid_op (arch/x86/kernel/traps.c:309) ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) ? register_netdevice_notifier_dev_net (net/core/dev.c:2150) ? register_netdevice_notifier_dev_net (./include/net/net_namespace.h:406 ./include/linux/netdevice.h:2663 net/core/dev.c:2144) mlx5e_mdev_notifier_event+0x9f/0xf0 mlx5_ib notifier_call_chain.llvm.12241336988804114627 (kernel/notifier.c:85) blocking_notifier_call_chain (kernel/notifier.c:380) mlx5_core_uplink_netdev_event_replay (drivers/net/ethernet/mellanox/mlx5/core/main.c:352) mlx5_ib_roce_init.llvm.12447516292400117075+0x1c6/0x550 mlx5_ib mlx5r_probe+0x375/0x6a0 mlx5_ib ? kernfs_put (./include/linux/instrumented.h:96 ./include/linux/atomic/atomic-arch-fallback.h:2278 ./include/linux/atomic/atomic-instrumented.h:1384 fs/kernfs/dir.c:557) ? auxiliary_match_id (drivers/base/auxiliary.c:174) ? mlx5r_mp_remove+0x160/0x160 mlx5_ib really_probe (drivers/base/dd.c:? drivers/base/dd.c:658) driver_probe_device (drivers/base/dd.c:830) __driver_attach (drivers/base/dd.c:1217) bus_for_each_dev (drivers/base/bus.c:369) ? driver_attach (drivers/base/dd.c:1157) bus_add_driver (drivers/base/bus.c:679) driver_register (drivers/base/driver.c:249) Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().") Reported-by: Breno Leitao <leitao@debian.org> Closes: https://lore.kernel.org/netdev/20250224-noisy-cordial-roadrunner-fad40c@leitao/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Tested-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250225211023.96448-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26Merge tag 'nfs-for-6.14-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "Stable Fixes: - O_DIRECT writes should adjust file length Other Bugfixes: - Adjust delegated timestamps for O_DIRECT reads and writes - Prevent looping due to rpc_signal_task() races - Fix a deadlock when recovering state on a sillyrenamed file - Properly handle -ETIMEDOUT errors from tlshd - Suppress build warnings for unused procfs functions - Fix memory leak of lsm_contexts" * tag 'nfs-for-6.14-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: lsm,nfs: fix memory leak of lsm_context sunrpc: suppress warnings for unused procfs functions SUNRPC: Handle -ETIMEDOUT return from tlshd NFSv4: Fix a deadlock when recovering state on a sillyrenamed file SUNRPC: Prevent looping due to rpc_signal_task() races NFS: Adjust delegated timestamps for O_DIRECT reads and writes NFS: O_DIRECT writes must check and adjust the file length
2025-02-26bpf: add get_netns_cookie helper to cgroup_skb programsMahe Tardy
This is needed in the context of Cilium and Tetragon to retrieve netns cookie from hostns when traffic leaves Pod, so that we can correlate skb->sk's netns cookie. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com> Link: https://lore.kernel.org/r/20250225125031.258740-1-mahe.tardy@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-26wifi: cfg80211: expose update timestamp to driversBenjamin Berg
This information is exposed to userspace but not drivers. Make this field public so that drivers are also able to access it. The information is for example useful for link selection to determine whether the BSS corresponding to an MLO link has been seen in a recent scan. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250212082137.b682ee7aebc8.I0f7cca9effa2b1cee79f4f2eb8b549c99b4e0571@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-26wifi: mac80211: add ieee80211_iter_chan_contexts_mtxMiri Korenblit
Add a chanctx iterator that can be called from a wiphy-locked context. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250212082137.d85eef3024de.Icda0616416c5fd4b2cbf892bdab2476f26e644ec@changeid [fix kernel-doc] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-26wifi: mac80211: fix integer overflow in hwmp_route_info_get()Gavrilov Ilia
Since the new_metric and last_hop_metric variables can reach the MAX_METRIC(0xffffffff) value, an integer overflow may occur when multiplying them by 10/9. It can lead to incorrect behavior. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: a8d418d9ac25 ("mac80211: mesh: only switch path when new metric is at least 10% better") Cc: stable@vger.kernel.org Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru> Link: https://patch.msgid.link/20250212082124.4078236-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-26wifi: mac80211: Fix possible integer promotion issueIlan Peer
Fix a possible integer promotion issue in mac80211 in ieee80211_ml_epcs() Fixes: de86c5f60839 ("wifi: mac80211: Add support for EPCS configuration") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Ilan Peer <ilan.peer@intel.com> Link: https://patch.msgid.link/20250214074721.1613549-1-ilan.peer@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-26wifi: cfg80211: convert timeouts to secs_to_jiffies()Easwar Hariharan
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication. This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @depends on patch@ expression E; @@ -msecs_to_jiffies(E * 1000) +secs_to_jiffies(E) -msecs_to_jiffies(E * MSEC_PER_SEC) +secs_to_jiffies(E) Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Link: https://patch.msgid.link/20250219203240.141272-1-eahariha@linux.microsoft.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-26wifi: mac80211: Add counter for all monitor interfacesAlexander Wetzel
Count open monitor interfaces regardless of the monitor interface type. The new counter virt_monitors takes over counting interfaces depending on the virtual monitor interface while monitors is used for all active monitor interfaces. This fixes monitor packet mirroring when using MONITOR_FLAG_ACTIVE or NO_VIRTUAL_MONITOR interfaces. Fixes: 286e69677065 ("wifi: mac80211: Drop cooked monitor support") Reported-by: Karthikeyan Periyasamy <quic_periyasa@quicinc.com> Closes: https://lore.kernel.org/r/cc715114-4e3b-619a-49dc-a4878075e1dc@quicinc.com Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de> Tested-by: Karthikeyan Periyasamy <quic_periyasa@quicinc.com> Link: https://patch.msgid.link/20250220094139.61459-1-Alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-26wifi: mac80211: Fix sparse warning for monitor_sdataAlexander Wetzel
Use rcu_access_pointer() to avoid sparse warning in drv_remove_interface(). Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502130534.bVrZZBK0-lkp@intel.com/ Fixes: 646262c71aca ("wifi: mac80211: remove debugfs dir for virtual monitor") Link: https://patch.msgid.link/20250213214330.6113-1-Alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-26wifi: mac80211: fix vendor-specific inheritanceJohannes Berg
If there's any vendor-specific element in the subelements then the outer element parsing must not parse any vendor element at all. This isn't implemented correctly now due to parsing into the pointers and then overriding them, so explicitly skip vendor elements if any exist in the sub- elements (non-transmitted profile or per-STA profile). Fixes: 671042a4fb77 ("mac80211: support non-inheritance element") Reviewed-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250221112451.fd71e5268840.I9db3e6a3367e6ff38d052d07dc07005f0dd3bd5c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-26wifi: mac80211: fix MLE non-inheritance parsingJohannes Berg
The code is erroneously applying the non-inheritance element to the inner elements rather than the outer, which is clearly completely wrong. Fix it by finding the MLE basic element at the beginning, and then applying the non-inheritance for the outer parsing. While at it, do some general cleanups such as not allowing callers to try looking for a specific non-transmitted BSS and link at the same time. Fixes: 45ebac4f059b ("wifi: mac80211: Parse station profile from association response") Reviewed-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250221112451.b46d42f45b66.If5b95dc3c80208e0c62d8895fb6152aa54b6620b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-26tcp: Defer ts_recent changes until req is ownedWang Hai
Recently a bug was discovered where the server had entered TCP_ESTABLISHED state, but the upper layers were not notified. The same 5-tuple packet may be processed by different CPUSs, so two CPUs may receive different ack packets at the same time when the state is TCP_NEW_SYN_RECV. In that case, req->ts_recent in tcp_check_req may be changed concurrently, which will probably cause the newsk's ts_recent to be incorrectly large. So that tcp_validate_incoming will fail. At this point, newsk will not be able to enter the TCP_ESTABLISHED. cpu1 cpu2 tcp_check_req tcp_check_req req->ts_recent = rcv_tsval = t1 req->ts_recent = rcv_tsval = t2 syn_recv_sock tcp_sk(child)->rx_opt.ts_recent = req->ts_recent = t2 // t1 < t2 tcp_child_process tcp_rcv_state_process tcp_validate_incoming tcp_paws_check if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win) // t2 - t1 > paws_win, failed tcp_v4_do_rcv tcp_rcv_state_process // TCP_ESTABLISHED The cpu2's skb or a newly received skb will call tcp_v4_do_rcv to get the newsk into the TCP_ESTABLISHED state, but at this point it is no longer possible to notify the upper layer application. A notification mechanism could be added here, but the fix is more complex, so the current fix is used. In tcp_check_req, req->ts_recent is used to assign a value to tcp_sk(child)->rx_opt.ts_recent, so removing the change in req->ts_recent and changing tcp_sk(child)->rx_opt.ts_recent directly after owning the req fixes this bug. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-25mptcp: safety check before fallbackMatthieu Baerts (NGI0)
Recently, some fallback have been initiated, while the connection was not supposed to fallback. Add a safety check with a warning to detect when an wrong attempt to fallback is being done. This should help detecting any future issues quicker. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250224-net-mptcp-misc-fixes-v1-3-f550f636b435@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25mptcp: reset when MPTCP opts are dropped after joinMatthieu Baerts (NGI0)
Before this patch, if the checksum was not used, the subflow was only reset if map_data_len was != 0. If there were no MPTCP options or an invalid mapping, map_data_len was not set to the data len, and then the subflow was not reset as it should have been, leaving the MPTCP connection in a wrong fallback mode. This map_data_len condition has been introduced to handle the reception of the infinite mapping. Instead, a new dedicated mapping error could have been returned and treated as a special case. However, the commit 31bf11de146c ("mptcp: introduce MAPPING_BAD_CSUM") has been introduced by Paolo Abeni soon after, and backported later on to stable. It better handle the csum case, and it means the exception for valid_csum_seen in subflow_can_fallback(), plus this one for the infinite mapping in subflow_check_data_avail(), are no longer needed. In other words, the code can be simplified there: a fallback should only be done if msk->allow_infinite_fallback is set. This boolean is set to false once MPTCP-specific operations acting on the whole MPTCP connection vs the initial path have been done, e.g. a second path has been created, or an MPTCP re-injection -- yes, possible even with a single subflow. The subflow_can_fallback() helper can then be dropped, and replaced by this single condition. This also makes the code clearer: a fallback should only be done if it is possible to do so. While at it, no need to set map_data_len to 0 in get_mapping_status() for the infinite mapping case: it will be set to skb->len just after, at the end of subflow_check_data_avail(), and not read in between. Fixes: f8d4bcacff3b ("mptcp: infinite mapping receiving") Cc: stable@vger.kernel.org Reported-by: Chester A. Unal <chester.a.unal@xpedite-tech.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/544 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Tested-by: Chester A. Unal <chester.a.unal@xpedite-tech.com> Link: https://patch.msgid.link/20250224-net-mptcp-misc-fixes-v1-2-f550f636b435@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25mptcp: always handle address removal under msk socket lockPaolo Abeni
Syzkaller reported a lockdep splat in the PM control path: WARNING: CPU: 0 PID: 6693 at ./include/net/sock.h:1711 sock_owned_by_me include/net/sock.h:1711 [inline] WARNING: CPU: 0 PID: 6693 at ./include/net/sock.h:1711 msk_owned_by_me net/mptcp/protocol.h:363 [inline] WARNING: CPU: 0 PID: 6693 at ./include/net/sock.h:1711 mptcp_pm_nl_addr_send_ack+0x57c/0x610 net/mptcp/pm_netlink.c:788 Modules linked in: CPU: 0 UID: 0 PID: 6693 Comm: syz.0.205 Not tainted 6.14.0-rc2-syzkaller-00303-gad1b832bf1cf #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024 RIP: 0010:sock_owned_by_me include/net/sock.h:1711 [inline] RIP: 0010:msk_owned_by_me net/mptcp/protocol.h:363 [inline] RIP: 0010:mptcp_pm_nl_addr_send_ack+0x57c/0x610 net/mptcp/pm_netlink.c:788 Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ca 7b d3 f5 eb b9 e8 c3 7b d3 f5 90 0f 0b 90 e9 dd fb ff ff e8 b5 7b d3 f5 90 <0f> 0b 90 e9 3e fb ff ff 44 89 f1 80 e1 07 38 c1 0f 8c eb fb ff ff RSP: 0000:ffffc900034f6f60 EFLAGS: 00010283 RAX: ffffffff8bee3c2b RBX: 0000000000000001 RCX: 0000000000080000 RDX: ffffc90004d42000 RSI: 000000000000a407 RDI: 000000000000a408 RBP: ffffc900034f7030 R08: ffffffff8bee37f6 R09: 0100000000000000 R10: dffffc0000000000 R11: ffffed100bcc62e4 R12: ffff88805e6316e0 R13: ffff88805e630c00 R14: dffffc0000000000 R15: ffff88805e630c00 FS: 00007f7e9a7e96c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2fd18ff8 CR3: 0000000032c24000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> mptcp_pm_remove_addr+0x103/0x1d0 net/mptcp/pm.c:59 mptcp_pm_remove_anno_addr+0x1f4/0x2f0 net/mptcp/pm_netlink.c:1486 mptcp_nl_remove_subflow_and_signal_addr net/mptcp/pm_netlink.c:1518 [inline] mptcp_pm_nl_del_addr_doit+0x118d/0x1af0 net/mptcp/pm_netlink.c:1629 genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline] genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0xb1f/0xec0 net/netlink/genetlink.c:1210 netlink_rcv_skb+0x206/0x480 net/netlink/af_netlink.c:2543 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219 netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline] netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1348 netlink_sendmsg+0x8de/0xcb0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:718 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:733 ____sys_sendmsg+0x53a/0x860 net/socket.c:2573 ___sys_sendmsg net/socket.c:2627 [inline] __sys_sendmsg+0x269/0x350 net/socket.c:2659 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f7e9998cde9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f7e9a7e9038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f7e99ba5fa0 RCX: 00007f7e9998cde9 RDX: 000000002000c094 RSI: 0000400000000000 RDI: 0000000000000007 RBP: 00007f7e99a0e2a0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f7e99ba5fa0 R15: 00007fff49231088 Indeed the PM can try to send a RM_ADDR over a msk without acquiring first the msk socket lock. The bugged code-path comes from an early optimization: when there are no subflows, the PM should (usually) not send RM_ADDR notifications. The above statement is incorrect, as without locks another process could concurrent create a new subflow and cause the RM_ADDR generation. Additionally the supposed optimization is not very effective even performance-wise, as most mptcp sockets should have at least one subflow: the MPC one. Address the issue removing the buggy code path, the existing "slow-path" will handle correctly even the edge case. Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink") Cc: stable@vger.kernel.org Reported-by: syzbot+cd3ce3d03a3393ae9700@syzkaller.appspotmail.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/546 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250224-net-mptcp-misc-fixes-v1-1-f550f636b435@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25ethtool: Symmetric OR-XOR RSS hashGal Pressman
Add an additional type of symmetric RSS hash type: OR-XOR. The "Symmetric-OR-XOR" algorithm transforms the input as follows: (SRC_IP | DST_IP, SRC_IP ^ DST_IP, SRC_PORT | DST_PORT, SRC_PORT ^ DST_PORT) Change 'cap_rss_sym_xor_supported' to 'supported_input_xfrm', a bitmap of supported RXH_XFRM_* types. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20250224174416.499070-2-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25tcp: devmem: don't write truncated dmabuf CMSGs to userspaceStanislav Fomichev
Currently, we report -ETOOSMALL (err) only on the first iteration (!sent). When we get put_cmsg error after a bunch of successful put_cmsg calls, we don't signal the error at all. This might be confusing on the userspace side which will see truncated CMSGs but no MSG_CTRUNC signal. Consider the following case: - sizeof(struct cmsghdr) = 16 - sizeof(struct dmabuf_cmsg) = 24 - total cmsg size (CMSG_LEN) = 40 (16+24) When calling recvmsg with msg_controllen=60, the userspace will receive two(!) dmabuf_cmsg(s), the first one will be a valid one and the second one will be silently truncated. There is no easy way to discover the truncation besides doing something like "cm->cmsg_len != CMSG_LEN(sizeof(dmabuf_cmsg))". Introduce new put_devmem_cmsg wrapper that reports an error instead of doing the truncation. Mina suggests that it's the intended way this API should work. Note that we might now report MSG_CTRUNC when the users (incorrectly) call us with msg_control == NULL. Fixes: 8f0b3cc9a4c1 ("tcp: RX path for devmem TCP") Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250224174401.3582695-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25sunrpc: suppress warnings for unused procfs functionsArnd Bergmann
There is a warning about unused variables when building with W=1 and no procfs: net/sunrpc/cache.c:1660:30: error: 'cache_flush_proc_ops' defined but not used [-Werror=unused-const-variable=] 1660 | static const struct proc_ops cache_flush_proc_ops = { | ^~~~~~~~~~~~~~~~~~~~ net/sunrpc/cache.c:1622:30: error: 'content_proc_ops' defined but not used [-Werror=unused-const-variable=] 1622 | static const struct proc_ops content_proc_ops = { | ^~~~~~~~~~~~~~~~ net/sunrpc/cache.c:1598:30: error: 'cache_channel_proc_ops' defined but not used [-Werror=unused-const-variable=] 1598 | static const struct proc_ops cache_channel_proc_ops = { | ^~~~~~~~~~~~~~~~~~~~~~ These are used inside of an #ifdef, so replacing that with an IS_ENABLED() check lets the compiler see how they are used while still dropping them during dead code elimination. Fixes: dbf847ecb631 ("knfsd: allow cache_register to return error on failure") Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-02-25ipvs: Always clear ipvs_property flag in skb_scrub_packet()Philo Lu
We found an issue when using bpf_redirect with ipvs NAT mode after commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within the same name space"). Particularly, we use bpf_redirect to return the skb directly back to the netif it comes from, i.e., xnet is false in skb_scrub_packet(), and then ipvs_property is preserved and SNAT is skipped in the rx path. ipvs_property has been already cleared when netns is changed in commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed"). This patch just clears it in spite of netns. Fixes: 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed") Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Julian Anastasov <ja@ssi.bg> Link: https://patch.msgid.link/20250222033518.126087-1-lulie@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-24mptcp: blackhole: avoid checking the state twiceMatthieu Baerts (NGI0)
A small cleanup, reordering the conditions to avoid checking things twice. The code here is called in case of timeout on a TCP connection, before triggering a retransmission. But it only acts on SYN + MPC packets. So the conditions can be re-order to exit early in case of non-MPTCP SYN + MPC. This also reduce the indentation levels. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-10-2b70ab1cee79@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24mptcp: sched: reduce size for unused dataMatthieu Baerts (NGI0)
Thanks for the previous commit ("mptcp: sched: split get_subflow interface into two"), the mptcp_sched_data structure is now currently unused. This structure has been added to allow future extensions that are not ready yet. At the end, this structure will not even be used at all when mptcp_subflow bpf_iter will be supported [1]. Here is a first step to save 64 bytes on the stack for each scheduling operation. The structure is not removed yet not to break the WIP work on these extensions, but will be done when [1] will be ready and applied. Link: https://lore.kernel.org/6645ad6e-8874-44c5-8730-854c30673218@linux.dev [1] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-9-2b70ab1cee79@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24mptcp: sched: split get_subflow interface into twoGeliang Tang
get_retrans() interface of the burst packet scheduler invokes a sleeping function mptcp_pm_subflow_chk_stale(), which calls __lock_sock_fast(). So get_retrans() interface should be set with BPF_F_SLEEPABLE flag in BPF. But get_send() interface of this scheduler can't be set with BPF_F_SLEEPABLE flag since it's invoked in ack_update_msk() under mptcp data lock. So this patch has to split get_subflow() interface of packet scheduer into two interfaces: get_send() and get_retrans(). Then we can set get_retrans() interface alone with BPF_F_SLEEPABLE flag. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-8-2b70ab1cee79@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24mptcp: pm: use ipv6_addr_equal in addresses_equalGeliang Tang
Use ipv6_addr_equal() to check whether two IPv6 addresses are equal in mptcp_addresses_equal(). This is more appropriate than using !ipv6_addr_cmp(). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-7-2b70ab1cee79@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24mptcp: pm: drop inet6_sk after inet_skGeliang Tang
In mptcp_event_add_subflow(), mptcp_event_pm_listener() and mptcp_nl_find_ssk(), 'issk' has already been got through inet_sk(). No need to use inet6_sk() to get 'ipv6_pinfo' again, just use issk->pinet6 instead. This patch also drops these 'ipv6_pinfo' variables. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-6-2b70ab1cee79@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24mptcp: pm: drop match in userspace_pm_append_new_local_addrGeliang Tang
The variable 'match' in mptcp_userspace_pm_append_new_local_addr() is a redundant one, and this patch drops it. No need to define 'match' as 'struct mptcp_pm_addr_entry *' type. In this function, it's only used to check whether it's NULL. It can be defined as a Boolean one. Also other variables 'addr_match' and 'id_match' make 'match' a redundant one, which can be replaced by directly checking 'addr_match && id_match'. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-5-2b70ab1cee79@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24mptcp: pm: add mptcp_pm_genl_fill_addr helperGeliang Tang
To save some redundant code in dump_addr() interfaces of both the netlink PM and userspace PM, the code that calls netlink message helpers (genlmsg_put/cancel/end) and mptcp_nl_fill_addr() is wrapped into a new helper mptcp_pm_genl_fill_addr(). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-4-2b70ab1cee79@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24mptcp: pm: add a build check for userspace_pm_dump_addrGeliang Tang
This patch adds a build check for mptcp_userspace_pm_dump_addr() to make sure there is enough space in 'cb->ctx' to store an address id bitmap. Just in case info stored in 'cb->ctx' are increased later. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-3-2b70ab1cee79@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24mptcp: pm: change to fullmesh only for 'subflow'Matthieu Baerts (NGI0)
If an endpoint doesn't have the 'subflow' flag -- in fact, has no type, so not 'subflow', 'signal', nor 'implicit' -- there are then no subflows created from this local endpoint to at least the initial destination address. In this case, no need to call mptcp_pm_nl_fullmesh() which is there to recreate the subflows to reflect the new value of the fullmesh attribute. Similarly, there is then no need to iterate over all connections to do nothing, if only the 'fullmesh' flag has been changed, and the endpoint doesn't have the 'subflow' one. So stop early when dealing with this specific case. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-2-2b70ab1cee79@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24mptcp: pm: remove unused ret value to set flagsMatthieu Baerts (NGI0)
The returned value is not used, it can then be dropped. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-1-2b70ab1cee79@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24net: Remove shadow variable in netdev_run_todo()Breno Leitao
Fix a shadow variable warning in net/core/dev.c when compiled with CONFIG_LOCKDEP enabled. The warning occurs because 'dev' is redeclared inside the while loop, shadowing the outer scope declaration. net/core/dev.c:11211:22: warning: declaration shadows a local variable [-Wshadow] struct net_device *dev = list_first_entry(&unlink_list, net/core/dev.c:11202:21: note: previous declaration is here struct net_device *dev, *tmp; Remove the redundant declaration since the variable is already defined in the outer scope and will be overwritten in the subsequent list_for_each_entry_safe() loop anyway. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250221-netcons_fix_shadow-v1-1-dee20c8658dd@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24net: remove '__' from __skb_flow_get_ports()Nicolas Dichtel
Only one version of skb_flow_get_ports() exists after the previous commit, so let's remove the useless '__'. Suggested-by: Simon Horman <horms@kernel.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20250221110941.2041629-3-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24net-sysfs: restore behavior for not running devicesEric Dumazet
modprobe dummy dumdummies=1 Old behavior : $ cat /sys/class/net/dummy0/carrier cat: /sys/class/net/dummy0/carrier: Invalid argument After blamed commit, an empty string is reported. $ cat /sys/class/net/dummy0/carrier $ In this commit, I restore the old behavior for carrier, speed and duplex attributes. Fixes: 79c61899b5ee ("net-sysfs: remove rtnl_trylock from device attributes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Marco Leogrande <leogrande@google.com> Reviewed-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250221051223.576726-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24net: ethtool: fix ioctl confusing drivers about desired HDS user configJakub Kicinski
The legacy ioctl path does not have support for extended attributes. So we issue a GET to fetch the current settings from the driver, in an attempt to keep them unchanged. HDS is a bit "special" as the GET only returns on/off while the SET takes a "ternary" argument (on/off/default). If the driver was in the "default" setting - executing the ioctl path binds it to on or off, even tho the user did not intend to change HDS config. Factor the relevant logic out of the netlink code and reuse it. Fixes: 87c8f8496a05 ("bnxt_en: add support for tcp-data-split ethtool command") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Daniel Xu <dxu@dxuuu.xyz> Tested-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250221025141.1132944-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-23batman-adv: add missing newlines for log macrosSven Eckelmann
Missing trailing newlines can lead to incomplete log lines that do not appear properly in dmesg or in console output. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-22batman-adv: Limit aggregation size to outgoing MTUSven Eckelmann
If a B.A.T.M.A.N. IV aggregated OGM was prepared, it was always assumed that 512 bytes (BATADV_MAX_AGGREGATION_BYTES) can be transmitted. But the outgoing MTU might be too small for these 512 bytes and the aggregation size must be adjusted in this case. Otherwise, the aggregates will cause unnecessary packet loss. For now, the non-aggregated packet length is not touched. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-22batman-adv: Use actual packet count for aggregated packetsSven Eckelmann
The batadv_forw_packet->num_packets didn't store the number of packets but the the number of packets - 1. This didn't had any effects on the actual handling of aggregates but can easily be a source of confusion when reading the code. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-22batman-adv: Switch to bitmap helper for aggregation handlingSven Eckelmann
The aggregation code duplicates code which already exists in the the bitops and bitmap helper. By switching to the bitmap helpers, operating on larger aggregations becomes possible without touching the different portions of the code which read/modify direct_link_flags. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-22batman-adv: Limit number of aggregated packets directlySven Eckelmann
The currently selected size in BATADV_MAX_AGGREGATION_BYTES (512) is chosen such that the number of possible aggregated packets is lower than 32. This number must be limited so that the type of batadv_forw_packet->direct_link_flags has enough bits to represent each packet (with the size of at least 24 bytes). This requirement is better implemented in code instead of having it inside a comment. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-22batman-adv: Use consistent name for mesh interfaceSven Eckelmann
The way how the virtual interface is called inside the batman-adv source code is not consistent. The genl headers call it meshif and the rest of the code calls is (mostly) softif. The genl definitions cannot be touched because they are part of the UAPI. But the rest of the batman-adv code can be touched to have a consistent name again. The bulk of the renaming was done using sed -i -e 's/soft\(-\|\_\| \|\)i\([nf]\)/mesh\1i\2/g' \ -e 's/SOFT\(-\|\_\| \|\)I\([NF]\)/MESH\1I\2/g' and then it was adjusted slightly when proofreading the changes. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-22batman-adv: Add support for jumbo framesSven Eckelmann
Since batman-adv is not actually depending on hardware capabilities, it has no limit on the MTU. Only the lower hard interfaces can limit it. In case these have an high enough MTU or fragmentation is enabled, a higher MTU than 1500 can be enabled. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-22batman-adv: adopt netdev_hold() / netdev_put()Eric Dumazet
Add a device tracker to struct batadv_hard_iface to help debugging of network device refcount imbalances. Signed-off-by: Eric Dumazet <edumazet@google.com> [sven@narfation.org: fix kernel-doc, adopt for softif reference] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-22batman-adv: Drop batadv_priv_debug_log structSven Eckelmann
The support for the batman-adv ring buffer for debug logs was dropped with the removal of the debugfs filesystem. The structure storing this ring buffer is therefore no longer needed since commit aff6f5a68b92 ("batman-adv: Drop deprecated debugfs support") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-22batman-adv: Start new development cycleSimon Wunderlich
This version will contain all the (major or even only minor) changes for Linux 6.15. The version number isn't a semantic version number with major and minor information. It is just encoding the year of the expected publishing as Linux -rc1 and the number of published versions this year (starting at 0). Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-21net: set the minimum for net_hotdata.netdev_budget_usecsJiri Slaby (SUSE)
Commit 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning") added a possibility to set net_hotdata.netdev_budget_usecs, but added no lower bound checking. Commit a4837980fd9f ("net: revert default NAPI poll timeout to 2 jiffies") made the *initial* value HZ-dependent, so the initial value is at least 2 jiffies even for lower HZ values (2 ms for 1000 Hz, 8ms for 250 Hz, 20 ms for 100 Hz). But a user still can set improper values by a sysctl. Set .extra1 (the lower bound) for net_hotdata.netdev_budget_usecs to the same value as in the latter commit. That is to 2 jiffies. Fixes: a4837980fd9f ("net: revert default NAPI poll timeout to 2 jiffies") Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning") Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Cc: Dmitry Yakunin <zeil@yandex-team.ru> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Link: https://patch.msgid.link/20250220110752.137639-1-jirislaby@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21net: fib_rules: Enable DSCP mask usageIdo Schimmel
Allow user space to configure FIB rules that match on DSCP with a mask, now that support has been added to the IPv4 and IPv6 address families. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20250220080525.831924-5-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21ipv6: fib_rules: Add DSCP mask matchingIdo Schimmel
Extend IPv6 FIB rules to match on DSCP using a mask. Unlike IPv4, also initialize the DSCP mask when a non-zero 'tos' is specified as there is no difference in matching between 'tos' and 'dscp'. As a side effect, this makes it possible to match on 'dscp 0', like in IPv4. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20250220080525.831924-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>