summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2025-03-04net: pktgen: remove some superfluous variable initializingPeter Seiderer
Remove some superfluous variable initializing before hex32_arg call (as the same init is done here already). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: remove extra tmp variable (re-use len instead)Peter Seiderer
Remove extra tmp variable in pktgen_if_write (re-use len instead). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: fix mix of int/longPeter Seiderer
Fix mix of int/long (and multiple conversion from/to) by using consequently size_t for i and max and ssize_t for len and adjust function signatures of hex32_arg(), count_trail_chars(), num_arg() and strn_len() accordingly. Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-03mptcp: Remove unused declaration mptcp_set_owner_r()Yue Haibing
Commit 6639498ed85f ("mptcp: cleanup mem accounting") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250228095148.4003065-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03mptcp: use sock_kmemdup for address entryGeliang Tang
Instead of using sock_kmalloc() to allocate an address entry "e" and then immediately duplicate the input "entry" to it, the newly added sock_kmemdup() helper can be used in mptcp_userspace_pm_append_new_local_addr() to simplify the code. More importantly, the code "*e = *entry;" that assigns "entry" to "e" is not easy to implemented in BPF if we use the same code to implement an append_new_local_addr() helper of a BFP path manager. This patch avoids this type of memory assignment operation. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/3e5a307aed213038a87e44ff93b5793229b16279.1740735165.git.tanggeliang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03net: use sock_kmemdup for ip_optionsGeliang Tang
Instead of using sock_kmalloc() to allocate an ip_options and then immediately duplicate another ip_options to the newly allocated one in ipv6_dup_options(), mptcp_copy_ip_options() and sctp_v4_copy_ip_options(), the newly added sock_kmemdup() helper can be used to simplify the code. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/91ae749d66600ec6fb679e0e518fda6acb5c3e6f.1740735165.git.tanggeliang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03sock: add sock_kmemdup helperGeliang Tang
This patch adds the sock version of kmemdup() helper, named sock_kmemdup(), to duplicate the input "src" memory block using the socket's option memory buffer. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/f828077394c7d1f3560123497348b438c875b510.1740735165.git.tanggeliang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: tcp_set_window_clamp() cleanupEric Dumazet
Remove one indentation level. Use max_t() and clamp() macros. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: remove READ_ONCE(req->ts_recent)Eric Dumazet
After commit 8d52da23b6c6 ("tcp: Defer ts_recent changes until req is owned"), req->ts_recent is not changed anymore. It is set once in tcp_openreq_init(), bpf_sk_assign_tcp_reqsk() or cookie_tcp_reqsk_alloc() before the req can be seen by other cpus/threads. This completes the revert of eba20811f326 ("tcp: annotate data-races around tcp_rsk(req)->ts_recent"). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wang Hai <wanghai38@huawei.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03net: gro: convert four dev_net() callsEric Dumazet
tcp4_check_fraglist_gro(), tcp6_check_fraglist_gro(), udp4_gro_lookup_skb() and udp6_gro_lookup_skb() assume RCU is held so that the net structure does not disappear. Use dev_net_rcu() instead of dev_net() to get LOCKDEP support. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: convert to dev_net_rcu()Eric Dumazet
TCP uses of dev_net() are under RCU protection, change them to dev_net_rcu() to get LOCKDEP support. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: add four drop reasons to tcp_check_req()Eric Dumazet
Use two existing drop reasons in tcp_check_req(): - TCP_RFC7323_PAWS - TCP_OVERWINDOW Add two new ones: - TCP_RFC7323_TSECR (corresponds to LINUX_MIB_TSECRREJECTED) - TCP_LISTEN_OVERFLOW (when a listener accept queue is full) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: add a drop_reason pointer to tcp_check_req()Eric Dumazet
We want to add new drop reasons for packets dropped in 3WHS in the following patches. tcp_rcv_state_process() has to set reason to TCP_FASTOPEN, because tcp_check_req() will conditionally overwrite the drop_reason. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL.Kuniyuki Iwashima
We converted fib_info hash tables to per-netns one and now ready to convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL. Let's hold rtnl_net_lock() in inet_rtm_newroute() and inet_rtm_delroute(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-13-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config().Kuniyuki Iwashima
fib_valid_key_len() is called in the beginning of fib_table_insert() or fib_table_delete() to check if the prefix length is valid. fib_table_insert() and fib_table_delete() are called from 3 paths - ip_rt_ioctl() - inet_rtm_newroute() / inet_rtm_delroute() - fib_magic() In the first ioctl() path, rtentry_to_fib_config() checks the prefix length with bad_mask(). Also, fib_magic() always passes the correct prefix: 32 or ifa->ifa_prefixlen, which is already validated. Let's move fib_valid_key_len() to the rtnetlink path, rtm_to_fib_config(). While at it, 2 direct returns in rtm_to_fib_config() are changed to goto to match other places in the same function Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-12-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Hold rtnl_net_lock() in ip_rt_ioctl().Kuniyuki Iwashima
ioctl(SIOCADDRT/SIOCDELRT) calls ip_rt_ioctl() to add/remove a route in the netns of the specified socket. Let's hold rtnl_net_lock() there. Note that rtentry_to_fib_config() can be called without rtnl_net_lock() if we convert rtentry.dev handling to RCU later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-11-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Hold rtnl_net_lock() for ip_fib_net_exit().Kuniyuki Iwashima
ip_fib_net_exit() requires RTNL and is called from fib_net_init() and fib_net_exit_batch(). Let's hold rtnl_net_lock() before ip_fib_net_exit(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-10-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Namespacify fib_info hash tables.Kuniyuki Iwashima
We will convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL. Then, we need to have per-netns hash tables for struct fib_info. Let's allocate the hash tables per netns. fib_info_hash, fib_info_hash_bits, and fib_info_cnt are now moved to struct netns_ipv4 and accessed with net->ipv4.fib_XXX. Also, the netns checks are removed from fib_find_info_nh() and fib_find_info(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Add fib_info_hash_grow().Kuniyuki Iwashima
When the number of struct fib_info exceeds the hash table size in fib_create_info(), we try to allocate a new hash table with the doubled size. The allocation is done in fib_create_info(), and if successful, each struct fib_info is moved to the new hash table by fib_info_hash_move(). Let's integrate the allocation and fib_info_hash_move() as fib_info_hash_grow() to make the following change cleaner. While at it, fib_info_hash_grow() is placed near other hash-table-specific functions. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Remove fib_info_hash_size.Kuniyuki Iwashima
We will allocate the fib_info hash tables per netns. There are 5 global variables for fib_info hash tables: fib_info_hash, fib_info_laddrhash, fib_info_hash_size, fib_info_hash_bits, fib_info_cnt. However, fib_info_laddrhash and fib_info_hash_size can be easily calculated from fib_info_hash and fib_info_hash_bits. Let's remove fib_info_hash_size and use (1 << fib_info_hash_bits) instead. Now we need not pass the new hash table size to fib_info_hash_move(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Remove fib_info_laddrhash pointer.Kuniyuki Iwashima
We will allocate the fib_info hash tables per netns. There are 5 global variables for fib_info hash tables: fib_info_hash, fib_info_laddrhash, fib_info_hash_size, fib_info_hash_bits, fib_info_cnt. However, fib_info_laddrhash and fib_info_hash_size can be easily calculated from fib_info_hash and fib_info_hash_bits. Let's remove the fib_info_laddrhash pointer and instead use fib_info_hash + (1 << fib_info_hash_bits). While at it, fib_info_laddrhash_bucket() is moved near other hash-table-specific functions. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Make fib_info_hashfn() return struct hlist_head.Kuniyuki Iwashima
Every time fib_info_hashfn() returns a hash value, we fetch &fib_info_hash[hash]. Let's return the hlist_head pointer from fib_info_hashfn() and rename it to fib_info_hash_bucket() to match a similar function, fib_info_laddrhash_bucket(). Note that we need to move the fib_info_hash assignment earlier in fib_info_hash_move() to use fib_info_hash_bucket() in the for loop. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Allocate fib_info_hash[] during netns initialisation.Kuniyuki Iwashima
We will allocate fib_info_hash[] and fib_info_laddrhash[] for each netns. Currently, fib_info_hash[] is allocated when the first route is added. Let's move the first allocation to a new __net_init function. Note that we must call fib4_semantics_exit() in fib_net_exit_batch() because ->exit() is called earlier than ->exit_batch(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Allocate fib_info_hash[] and fib_info_laddrhash[] by kvcalloc().Kuniyuki Iwashima
Both fib_info_hash[] and fib_info_laddrhash[] are hash tables for struct fib_info and are allocated by kvzmalloc() separately. Let's replace the two kvzmalloc() calls with kvcalloc() to remove the fib_info_laddrhash pointer later. Note that fib_info_hash_alloc() allocates a new hash table based on fib_info_hash_bits because we will remove fib_info_hash_size later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Use cached net in fib_inetaddr_event().Kuniyuki Iwashima
net is available in fib_inetaddr_event(), let's use it. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03llc: do not use skb_get() before dev_queue_xmit()Eric Dumazet
syzbot is able to crash hosts [1], using llc and devices not supporting IFF_TX_SKB_SHARING. In this case, e1000 driver calls eth_skb_pad(), while the skb is shared. Simply replace skb_get() by skb_clone() in net/llc/llc_s_ac.c Note that e1000 driver might have an issue with pktgen, because it does not clear IFF_TX_SKB_SHARING, this is an orthogonal change. We need to audit other skb_get() uses in net/llc. [1] kernel BUG at net/core/skbuff.c:2178 ! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 0 UID: 0 PID: 16371 Comm: syz.2.2764 Not tainted 6.14.0-rc4-syzkaller-00052-gac9c34d1e45a #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:pskb_expand_head+0x6ce/0x1240 net/core/skbuff.c:2178 Call Trace: <TASK> __skb_pad+0x18a/0x610 net/core/skbuff.c:2466 __skb_put_padto include/linux/skbuff.h:3843 [inline] skb_put_padto include/linux/skbuff.h:3862 [inline] eth_skb_pad include/linux/etherdevice.h:656 [inline] e1000_xmit_frame+0x2d99/0x5800 drivers/net/ethernet/intel/e1000/e1000_main.c:3128 __netdev_start_xmit include/linux/netdevice.h:5151 [inline] netdev_start_xmit include/linux/netdevice.h:5160 [inline] xmit_one net/core/dev.c:3806 [inline] dev_hard_start_xmit+0x9a/0x7b0 net/core/dev.c:3822 sch_direct_xmit+0x1ae/0xc30 net/sched/sch_generic.c:343 __dev_xmit_skb net/core/dev.c:4045 [inline] __dev_queue_xmit+0x13d4/0x43e0 net/core/dev.c:4621 dev_queue_xmit include/linux/netdevice.h:3313 [inline] llc_sap_action_send_test_c+0x268/0x320 net/llc/llc_s_ac.c:144 llc_exec_sap_trans_actions net/llc/llc_sap.c:153 [inline] llc_sap_next_state net/llc/llc_sap.c:182 [inline] llc_sap_state_process+0x239/0x510 net/llc/llc_sap.c:209 llc_ui_sendmsg+0xd0d/0x14e0 net/llc/af_llc.c:993 sock_sendmsg_nosec net/socket.c:718 [inline] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+da65c993ae113742a25f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/67c020c0.050a0220.222324.0011.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-03netfilter: nft_ct: Use __refcount_inc() for per-CPU nft_ct_pcpu_template.Sebastian Andrzej Siewior
nft_ct_pcpu_template is a per-CPU variable and relies on disabled BH for its locking. The refcounter is read and if its value is set to one then the refcounter is incremented and variable is used - otherwise it is already in use and left untouched. Without per-CPU locking in local_bh_disable() on PREEMPT_RT the read-then-increment operation is not atomic and therefore racy. This can be avoided by using unconditionally __refcount_inc() which will increment counter and return the old value as an atomic operation. In case the returned counter is not one, the variable is in use and we need to decrement counter. Otherwise we can use it. Use __refcount_inc() instead of read and a conditional increment. Fixes: edee4f1e9245 ("netfilter: nft_ct: add zone id set support") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-03wifi: cfg80211: regulatory: improve invalid hints checkingNikita Zhandarovich
Syzbot keeps reporting an issue [1] that occurs when erroneous symbols sent from userspace get through into user_alpha2[] via regulatory_hint_user() call. Such invalid regulatory hints should be rejected. While a sanity check from commit 47caf685a685 ("cfg80211: regulatory: reject invalid hints") looks to be enough to deter these very cases, there is a way to get around it due to 2 reasons. 1) The way isalpha() works, symbols other than latin lower and upper letters may be used to determine a country/domain. For instance, greek letters will also be considered upper/lower letters and for such characters isalpha() will return true as well. However, ISO-3166-1 alpha2 codes should only hold latin characters. 2) While processing a user regulatory request, between reg_process_hint_user() and regulatory_hint_user() there happens to be a call to queue_regulatory_request() which modifies letters in request->alpha2[] with toupper(). This works fine for latin symbols, less so for weird letter characters from the second part of _ctype[]. Syzbot triggers a warning in is_user_regdom_saved() by first sending over an unexpected non-latin letter that gets malformed by toupper() into a character that ends up failing isalpha() check. Prevent this by enhancing is_an_alpha2() to ensure that incoming symbols are latin letters and nothing else. [1] Syzbot report: ------------[ cut here ]------------ Unexpected user alpha2: A� WARNING: CPU: 1 PID: 964 at net/wireless/reg.c:442 is_user_regdom_saved net/wireless/reg.c:440 [inline] WARNING: CPU: 1 PID: 964 at net/wireless/reg.c:442 restore_alpha2 net/wireless/reg.c:3424 [inline] WARNING: CPU: 1 PID: 964 at net/wireless/reg.c:442 restore_regulatory_settings+0x3c0/0x1e50 net/wireless/reg.c:3516 Modules linked in: CPU: 1 UID: 0 PID: 964 Comm: kworker/1:2 Not tainted 6.12.0-rc5-syzkaller-00044-gc1e939a21eb1 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: events_power_efficient crda_timeout_work RIP: 0010:is_user_regdom_saved net/wireless/reg.c:440 [inline] RIP: 0010:restore_alpha2 net/wireless/reg.c:3424 [inline] RIP: 0010:restore_regulatory_settings+0x3c0/0x1e50 net/wireless/reg.c:3516 ... Call Trace: <TASK> crda_timeout_work+0x27/0x50 net/wireless/reg.c:542 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa65/0x1850 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f2/0x390 kernel/kthread.c:389 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Reported-by: syzbot+e10709ac3c44f3d4e800@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e10709ac3c44f3d4e800 Fixes: 09d989d179d0 ("cfg80211: add regulatory hint disconnect support") Cc: stable@kernel.org Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://patch.msgid.link/20250228134659.1577656-1-n.zhandarovich@fintech.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-02net/tls: use the new scatterwalk functionsEric Biggers
Replace calls to the deprecated function scatterwalk_copychunks() with memcpy_from_scatterwalk(), memcpy_to_scatterwalk(), or scatterwalk_skip() as appropriate. The new functions generally behave more as expected and eliminate the need to call scatterwalk_done() or scatterwalk_pagedone(). However, the new functions intentionally do not advance to the next sg entry right away, which would have broken chain_to_walk() which is accessing the fields of struct scatter_walk directly. To avoid this, replace chain_to_walk() with scatterwalk_get_sglist() which supports the needed functionality. Cc: Boris Pismenny <borisp@nvidia.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-02-28net: gso: fix ownership in __udp_gso_segmentAntoine Tenart
In __udp_gso_segment the skb destructor is removed before segmenting the skb but the socket reference is kept as-is. This is an issue if the original skb is later orphaned as we can hit the following bug: kernel BUG at ./include/linux/skbuff.h:3312! (skb_orphan) RIP: 0010:ip_rcv_core+0x8b2/0xca0 Call Trace: ip_rcv+0xab/0x6e0 __netif_receive_skb_one_core+0x168/0x1b0 process_backlog+0x384/0x1100 __napi_poll.constprop.0+0xa1/0x370 net_rx_action+0x925/0xe50 The above can happen following a sequence of events when using OpenVSwitch, when an OVS_ACTION_ATTR_USERSPACE action precedes an OVS_ACTION_ATTR_OUTPUT action: 1. OVS_ACTION_ATTR_USERSPACE is handled (in do_execute_actions): the skb goes through queue_gso_packets and then __udp_gso_segment, where its destructor is removed. 2. The segments' data are copied and sent to userspace. 3. OVS_ACTION_ATTR_OUTPUT is handled (in do_execute_actions) and the same original skb is sent to its path. 4. If it later hits skb_orphan, we hit the bug. Fix this by also removing the reference to the socket in __udp_gso_segment. Fixes: ad405857b174 ("udp: better wmem accounting on gso") Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250226171352.258045-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28inet: ping: avoid skb_clone() dance in ping_rcv()Eric Dumazet
ping_rcv() callers currently call skb_free() or consume_skb(), forcing ping_rcv() to clone the skb. After this patch ping_rcv() is now 'consuming' the original skb, either moving to a socket receive queue, or dropping it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250226183437.1457318-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28ipv4: icmp: do not process ICMP_EXT_ECHOREPLY for broadcast/multicast addressesEric Dumazet
There is no point processing ICMP_EXT_ECHOREPLY for routes which would drop ICMP_ECHOREPLY (RFC 1122 3.2.2.6, 3.2.2.8) This seems an oversight of the initial implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250226183437.1457318-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28wifi: mac80211: refactor populating mesh related fields in sinfoSarika Sharma
Introduce the sta_set_mesh_sinfo() to populate mesh related fields in sinfo structure for station statistics. This will allow for the simplified population of other fields in the sinfo structure for link level in a subsequent patch to add support for MLO station statistics. No functionality changes added. Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250213171632.1646538-3-quic_sarishar@quicinc.com [reword since it's just an internal thing] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-28wifi: cfg80211: reorg sinfo structure elements for meshSarika Sharma
Currently, as multi-link operation(MLO) is not supported for mesh, reorganize the sinfo structure for mesh-specific fields and embed mesh related NL attributes together in organized view. This will allow for the simplified reorganization of sinfo structure for link level in a subsequent patch to add support for MLO station statistics. No functionality changes added. Pahole summary before the reorg of sinfo structure: - size: 256, cachelines: 4, members: 50 - sum members: 239, holes: 4, sum holes: 17 - paddings: 2, sum paddings: 2 - forced alignments: 1, forced holes: 1, sum forced holes: 1 Pahole summary after the reorg of sinfo structure: - size: 248, cachelines: 4, members: 50 - sum members: 239, holes: 4, sum holes: 9 - paddings: 2, sum paddings: 2 - forced alignments: 1, last cacheline: 56 bytes Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250213171632.1646538-2-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-27net: sched: Remove newline at the end of a netlink error messageGal Pressman
Netlink error messages should not have a newline at the end of the string. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250226093904.6632-5-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27net-sysfs: remove unused initial ret valuesAntoine Tenart
In some net-sysfs functions the ret value is initialized but never used as it is always overridden. Remove those. Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Link: https://patch.msgid.link/20250226174644.311136-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27Bluetooth: Add check for mgmt_alloc_skb() in mgmt_device_connected()Haoxiang Li
Add check for the return value of mgmt_alloc_skb() in mgmt_device_connected() to prevent null pointer dereference. Fixes: e96741437ef0 ("Bluetooth: mgmt: Make use of mgmt_send_event_skb in MGMT_EV_DEVICE_CONNECTED") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-02-27Bluetooth: Add check for mgmt_alloc_skb() in mgmt_remote_name()Haoxiang Li
Add check for the return value of mgmt_alloc_skb() in mgmt_remote_name() to prevent null pointer dereference. Fixes: ba17bb62ce41 ("Bluetooth: Fix skb allocation in mgmt_remote_name() & mgmt_device_connected()") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-02-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc5). Conflicts: drivers/net/ethernet/cadence/macb_main.c fa52f15c745c ("net: cadence: macb: Synchronize stats calculations") 75696dd0fd72 ("net: cadence: macb: Convert to get_stats64") https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au Adjacent changes: drivers/net/ethernet/intel/ice/ice_sriov.c 79990cf5e7ad ("ice: Fix deinitializing VF in error path") a203163274a4 ("ice: simplify VF MSI-X managing") net/ipv4/tcp.c 18912c520674 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace") 297d389e9e5b ("net: prefix devmem specific helpers") net/mptcp/subflow.c 8668860b0ad3 ("mptcp: reset when MPTCP opts are dropped after join") c3349a22c200 ("mptcp: consolidate subflow cleanup") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-27Merge tag 'net-6.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth. We didn't get netfilter or wireless PRs this week, so next week's PR is probably going to be bigger. A healthy dose of fixes for bugs introduced in the current release nonetheless. Current release - regressions: - Bluetooth: always allow SCO packets for user channel - af_unix: fix memory leak in unix_dgram_sendmsg() - rxrpc: - remove redundant peer->mtu_lock causing lockdep splats - fix spinlock flavor issues with the peer record hash - eth: iavf: fix circular lock dependency with netdev_lock - net: use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net() RDMA driver register notifier after the device Current release - new code bugs: - ethtool: fix ioctl confusing drivers about desired HDS user config - eth: ixgbe: fix media cage present detection for E610 device Previous releases - regressions: - loopback: avoid sending IP packets without an Ethernet header - mptcp: reset connection when MPTCP opts are dropped after join Previous releases - always broken: - net: better track kernel sockets lifetime - ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels - phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT - eth: enetc: number of error handling fixes - dsa: rtl8366rb: reshuffle the code to fix config / build issue with LED support" * tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) net: ti: icss-iep: Reject perout generation request idpf: fix checksums set in idpf_rx_rsc() selftests: drv-net: Check if combined-count exists net: ipv6: fix dst ref loop on input in rpl lwt net: ipv6: fix dst ref loop on input in seg6 lwt usbnet: gl620a: fix endpoint checking in genelink_bind() net/mlx5: IRQ, Fix null string in debug print net/mlx5: Restore missing trace event when enabling vport QoS net/mlx5: Fix vport QoS cleanup on error net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination. af_unix: Fix memory leak in unix_dgram_sendmsg() net: Handle napi_schedule() calls from non-interrupt net: Clear old fragment checksum value in napi_reuse_skb gve: unlink old napi when stopping a queue using queue API net: Use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net(). tcp: Defer ts_recent changes until req is owned net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs() net: enetc: remove the mm_lock from the ENETC v4 driver net: enetc: add missing enetc4_link_deinit() net: enetc: update UDP checksum when updating originTimestamp field ...
2025-02-27net: ipv6: fix dst ref loop on input in rpl lwtJustin Iurman
Prevent a dst ref loop on input in rpl_iptunnel. Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel") Cc: Alexander Aring <alex.aring@gmail.com> Cc: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27net: ipv6: fix dst ref loop on input in seg6 lwtJustin Iurman
Prevent a dst ref loop on input in seg6_iptunnel. Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input") Cc: David Lebrun <dlebrun@google.com> Cc: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27xdp: remove xdp_alloc_skb_bulk()Alexander Lobakin
The only user was veth, which now uses napi_skb_cache_get_bulk(). It's now preferred over a direct allocation and is exported as well, so remove this one. Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27net: skbuff: introduce napi_skb_cache_get_bulk()Alexander Lobakin
Add a function to get an array of skbs from the NAPI percpu cache. It's supposed to be a drop-in replacement for kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC) and xdp_alloc_skb_bulk(GFP_ATOMIC). The difference (apart from the requirement to call it only from the BH) is that it tries to use as many NAPI cache entries for skbs as possible, and allocate new ones only if needed. The logic is as follows: * there is enough skbs in the cache: decache them and return to the caller; * not enough: try refilling the cache first. If there is now enough skbs, return; * still not enough: try allocating skbs directly to the output array with %GFP_ZERO, maybe we'll be able to get some. If there's now enough, return; * still not enough: return as many as we were able to obtain. Most of times, if called from the NAPI polling loop, the first one will be true, sometimes (rarely) the second one. The third and the fourth -- only under heavy memory pressure. It can save significant amounts of CPU cycles if there are GRO cycles and/or Tx completion cycles (anything that descends to napi_skb_cache_put()) happening on this CPU. Tested-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27net: gro: expose GRO init/cleanup to use outside of NAPIAlexander Lobakin
Make GRO init and cleanup functions global to be able to use GRO without a NAPI instance. Taking into account already global gro_flush(), it's now fully usable standalone. New functions are not exported, since they're not supposed to be used outside of the kernel core code. Tested-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27net: gro: decouple GRO from the NAPI layerAlexander Lobakin
In fact, these two are not tied closely to each other. The only requirements to GRO are to use it in the BH context and have some sane limits on the packet batches, e.g. NAPI has a limit of its budget (64/8/etc.). Move purely GRO fields into a new structure, &gro_node. Embed it into &napi_struct and adjust all the references. gro_node::cached_napi_id is effectively the same as napi_struct::napi_id, but to be used on GRO hotpath to mark skbs. napi_struct::napi_id is now a fully control path field. Three Ethernet drivers use napi_gro_flush() not really meant to be exported, so move it to <net/gro.h> and add that include there. napi_gro_receive() is used in more than 100 drivers, keep it in <linux/netdevice.h>. This does not make GRO ready to use outside of the NAPI context yet. Tested-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27pktgen: avoid unused-const-variable warningArnd Bergmann
When extra warnings are enable, there are configurations that build pktgen without CONFIG_XFRM, which leaves a static const variable unused: net/core/pktgen.c:213:1: error: unused variable 'F_IPSEC' [-Werror,-Wunused-const-variable] 213 | PKT_FLAGS | ^~~~~~~~~ net/core/pktgen.c:197:2: note: expanded from macro 'PKT_FLAGS' 197 | pf(IPSEC) /* ipsec on for flows */ \ | ^~~~~~~~~ This could be marked as __maybe_unused, or by making the one use visible to the compiler by slightly rearranging the #ifdef blocks. The second variant looks slightly nicer here, so use that. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Peter Seiderer <ps.report@gmx.net> Link: https://patch.msgid.link/20250225085722.469868-1-arnd@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-26net: move aRFS rmap management and CPU affinity to coreAhmed Zaki
A common task for most drivers is to remember the user-set CPU affinity to its IRQs. On each netdev reset, the driver should re-assign the user's settings to the IRQs. Unify this task across all drivers by moving the CPU affinity to napi->config. However, to move the CPU affinity to core, we also need to move aRFS rmap management since aRFS uses its own IRQ notifiers. For the aRFS, add a new netdev flag "rx_cpu_rmap_auto". Drivers supporting aRFS should set the flag via netif_enable_cpu_rmap() and core will allocate and manage the aRFS rmaps. Freeing the rmap is also done by core when the netdev is freed. For better IRQ affinity management, move the IRQ rmap notifier inside the napi_struct and add new notify.notify and notify.release functions: netif_irq_cpu_rmap_notify() and netif_napi_affinity_release(). Now we have the aRFS rmap management in core, add CPU affinity mask to napi_config. To delegate the CPU affinity management to the core, drivers must: 1 - set the new netdev flag "irq_affinity_auto": netif_enable_irq_affinity(netdev) 2 - create the napi with persistent config: netif_napi_add_config() 3 - bind an IRQ to the napi instance: netif_napi_set_irq() the core will then make sure to use re-assign affinity to the napi's IRQ. The default IRQ mask is set to one cpu starting from the closest NUMA. Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://patch.msgid.link/20250224232228.990783-2-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26af_unix: Fix memory leak in unix_dgram_sendmsg()Adrian Huang
After running the 'sendmsg02' program of Linux Test Project (LTP), kmemleak reports the following memory leak: # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff888243866800 (size 2048): comm "sendmsg02", pid 67, jiffies 4294903166 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 5e 00 00 00 00 00 00 00 ........^....... 01 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............ backtrace (crc 7e96a3f2): kmemleak_alloc+0x56/0x90 kmem_cache_alloc_noprof+0x209/0x450 sk_prot_alloc.constprop.0+0x60/0x160 sk_alloc+0x32/0xc0 unix_create1+0x67/0x2b0 unix_create+0x47/0xa0 __sock_create+0x12e/0x200 __sys_socket+0x6d/0x100 __x64_sys_socket+0x1b/0x30 x64_sys_call+0x7e1/0x2140 do_syscall_64+0x54/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e Commit 689c398885cc ("af_unix: Defer sock_put() to clean up path in unix_dgram_sendmsg().") defers sock_put() in the error handling path. However, it fails to account for the condition 'msg->msg_namelen != 0', resulting in a memory leak when the code jumps to the 'lookup' label. Fix issue by calling sock_put() if 'msg->msg_namelen != 0' is met. Fixes: 689c398885cc ("af_unix: Defer sock_put() to clean up path in unix_dgram_sendmsg().") Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Acked-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250225021457.1824-1-ahuang12@lenovo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26net: skb: free up one bit in tx_flagsWillem de Bruijn
The linked series wants to add skb tx completion timestamps. That needs a bit in skb_shared_info.tx_flags, but all are in use. A per-skb bit is only needed for features that are configured on a per packet basis. Per socket features can be read from sk->sk_tsflags. Per packet tsflags can be set in sendmsg using cmsg, but only those in SOF_TIMESTAMPING_TX_RECORD_MASK. Per packet tsflags can also be set without cmsg by sandwiching a send inbetween two setsockopts: val |= SOF_TIMESTAMPING_$FEATURE; setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val)); write(fd, buf, sz); val &= ~SOF_TIMESTAMPING_$FEATURE; setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val)); Changing a datapath test from skb_shinfo(skb)->tx_flags to skb->sk->sk_tsflags can change behavior in that case, as the tx_flags is written before the second setsockopt updates sk_tsflags. Therefore, only bits can be reclaimed that cannot be set by cmsg and are also highly unlikely to be used to target individual packets otherwise. Free up the bit currently used for SKBTX_HW_TSTAMP_USE_CYCLES. This selects between clock and free running counter source for HW TX timestamps. It is probable that all packets of the same socket will always use the same source. Link: https://lore.kernel.org/netdev/cover.1739988644.git.pav@iki.fi/ Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://patch.msgid.link/20250225023416.2088705-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>