summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2022-10-25soreuseport: Fix socket selection for SO_INCOMING_CPU.Kuniyuki Iwashima
Kazuho Oku reported that setsockopt(SO_INCOMING_CPU) does not work with setsockopt(SO_REUSEPORT) since v4.6. With the combination of SO_REUSEPORT and SO_INCOMING_CPU, we could build a highly efficient server application. setsockopt(SO_INCOMING_CPU) associates a CPU with a TCP listener or UDP socket, and then incoming packets processed on the CPU will likely be distributed to the socket. Technically, a socket could even receive packets handled on another CPU if no sockets in the reuseport group have the same CPU receiving the flow. The logic exists in compute_score() so that a socket will get a higher score if it has the same CPU with the flow. However, the score gets ignored after the blamed two commits, which introduced a faster socket selection algorithm for SO_REUSEPORT. This patch introduces a counter of sockets with SO_INCOMING_CPU in a reuseport group to check if we should iterate all sockets to find a proper one. We increment the counter when * calling listen() if the socket has SO_INCOMING_CPU and SO_REUSEPORT * enabling SO_INCOMING_CPU if the socket is in a reuseport group Also, we decrement it when * detaching a socket out of the group to apply SO_INCOMING_CPU to migrated TCP requests * disabling SO_INCOMING_CPU if the socket is in a reuseport group When the counter reaches 0, we can get back to the O(1) selection algorithm. The overall changes are negligible for the non-SO_INCOMING_CPU case, and the only notable thing is that we have to update sk_incomnig_cpu under reuseport_lock. Otherwise, the race prevents transitioning to the O(n) algorithm and results in the wrong socket selection. cpu1 (setsockopt) cpu2 (listen) +-----------------+ +-------------+ lock_sock(sk1) lock_sock(sk2) reuseport_update_incoming_cpu(sk1, val) . | /* set CPU as 0 */ |- WRITE_ONCE(sk1->incoming_cpu, val) | | spin_lock_bh(&reuseport_lock) | reuseport_grow(sk2, reuse) | . | |- more_socks_size = reuse->max_socks * 2U; | |- if (more_socks_size > U16_MAX && | | reuse->num_closed_socks) | | . | | |- RCU_INIT_POINTER(sk1->sk_reuseport_cb, NULL); | | `- __reuseport_detach_closed_sock(sk1, reuse) | | . | | `- reuseport_put_incoming_cpu(sk1, reuse) | | . | | | /* Read shutdown()ed sk1's sk_incoming_cpu | | | * without lock_sock(). | | | */ | | `- if (sk1->sk_incoming_cpu >= 0) | | . | | | /* decrement not-yet-incremented | | | * count, which is never incremented. | | | */ | | `- __reuseport_put_incoming_cpu(reuse); | | | `- spin_lock_bh(&reuseport_lock) | |- spin_lock_bh(&reuseport_lock) | |- reuse = rcu_dereference_protected(sk1->sk_reuseport_cb, ...) |- if (!reuse) | . | | /* Cannot increment reuse->incoming_cpu. */ | `- goto out; | `- spin_unlock_bh(&reuseport_lock) Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Fixes: c125e80b8868 ("soreuseport: fast reuseport TCP socket selection") Reported-by: Kazuho Oku <kazuhooku@gmail.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25act_skbedit: skbedit queue mapping for receive queueAmritha Nambiar
Add support for skbedit queue mapping action on receive side. This is supported only in hardware, so the skip_sw flag is enforced. This enables offloading filters for receive queue selection in the hardware using the skbedit action. Traffic arrives on the Rx queue requested in the skbedit action parameter. A new tc action flag TCA_ACT_FLAGS_AT_INGRESS is introduced to identify the traffic direction the action queue_mapping is requested on during filter addition. This is used to disallow offloading the skbedit queue mapping action on transmit side. Example: $tc filter add dev $IFACE ingress protocol ip flower dst_ip $DST_IP\ action skbedit queue_mapping $rxq_id skip_sw Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-24genetlink: piggy back on resv_op to default to a reject policyJakub Kicinski
To keep backward compatibility we used to leave attribute parsing to the family if no policy is specified. This becomes tedious as we move to more strict validation. Families must define reject all policies if they don't want any attributes accepted. Piggy back on the resv_start_op field as the switchover point. AFAICT only ethtool has added new commands since the resv_start_op was defined, and it has per-op policies so this should be a no-op. Nonetheless the patch should still go into v6.1 for consistency. Link: https://lore.kernel.org/all/20221019125745.3f2e7659@kernel.org/ Link: https://lore.kernel.org/r/20221021193532.1511293-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/linux/net.h a5ef058dc4d9 ("net: introduce and use custom sockopt socket flag") e993ffe3da4b ("net: flag sockets supporting msghdr originated zerocopy") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-24net-memcg: avoid stalls when under memory pressureJakub Kicinski
As Shakeel explains the commit under Fixes had the unintended side-effect of no longer pre-loading the cached memory allowance. Even tho we previously dropped the first packet received when over memory limit - the consecutive ones would get thru by using the cache. The charging was happening in batches of 128kB, so we'd let in 128kB (truesize) worth of packets per one drop. After the change we no longer force charge, there will be no cache filling side effects. This causes significant drops and connection stalls for workloads which use a lot of page cache, since we can't reclaim page cache under GFP_NOWAIT. Some of the latency can be recovered by improving SACK reneg handling but nowhere near enough to get back to the pre-5.15 performance (the application I'm experimenting with still sees 5-10x worst latency). Apply the suggested workaround of using GFP_ATOMIC. We will now be more permissive than previously as we'll drop _no_ packets in softirq when under pressure. But I can't think of any good and simple way to address that within networking. Link: https://lore.kernel.org/all/20221012163300.795e7b86@kernel.org/ Suggested-by: Shakeel Butt <shakeelb@google.com> Fixes: 4b1327be9fe5 ("net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()") Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20221021160304.1362511-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-24net: remove useless parameter of __sock_cmsg_sendxu xin
The parameter 'msg' has never been used by __sock_cmsg_send, so we can remove it safely. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: Zhang Yunkai <zhang.yunkai@zte.com.cn> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24net: add a refcount tracker for kernel socketsEric Dumazet
Commit ffa84b5ffb37 ("net: add netns refcount tracker to struct sock") added a tracker to sockets, but did not track kernel sockets. We still have syzbot reports hinting about netns being destroyed while some kernel TCP sockets had not been dismantled. This patch tracks kernel sockets, and adds a ref_tracker_dir_print() call to net_free() right before the netns is freed. Normally, each layer is responsible for properly releasing its kernel sockets before last call to net_free(). This debugging facility is enabled with CONFIG_NET_NS_REFCNT_TRACKER=y Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24udp: track the forward memory release threshold in an hot cachelinePaolo Abeni
When the receiver process and the BH runs on different cores, udp_rmem_release() experience a cache miss while accessing sk_rcvbuf, as the latter shares the same cacheline with sk_forward_alloc, written by the BH. With this patch, UDP tracks the rcvbuf value and its update via custom SOL_SOCKET socket options, and copies the forward memory threshold value used by udp_rmem_release() in a different cacheline, already accessed by the above function and uncontended. Since the UDP socket init operation grown a bit, factor out the common code between v4 and v6 in a shared helper. Overall the above give a 10% peek throughput increase under UDP flood. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24inet6: Remove inet6_destroy_sock().Kuniyuki Iwashima
The last user of inet6_destroy_sock() is its wrapper inet6_cleanup_sock(). Let's rename inet6_destroy_sock() to inet6_cleanup_sock(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-20sctp: remove unnecessary NULL check in sctp_association_init()Alexey Kodanev
'&asoc->ulpq' passed to sctp_ulpq_init() as the first argument, then sctp_qlpq_init() initializes it and eventually returns the address of the struct member back. Therefore, in this case, the return pointer cannot be NULL. Moreover, it seems sctp_ulpq_init() has always been used only in sctp_association_init(), so there's really no need to return ulpq anymore. Detected using the static analysis tool - Svace. Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20221019180735.161388-1-aleksei.kodanev@bell-sw.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-20Merge tag 'net-6.1-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter. Current release - regressions: - revert "net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}" - revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()" - dsa: uninitialized variable in dsa_slave_netdevice_event() - eth: sunhme: uninitialized variable in happy_meal_init() Current release - new code bugs: - eth: octeontx2: fix resource not freed after malloc Previous releases - regressions: - sched: fix return value of qdisc ingress handling on success - sched: fix race condition in qdisc_graft() - udp: update reuse->has_conns under reuseport_lock. - tls: strp: make sure the TCP skbs do not have overlapping data - hsr: avoid possible NULL deref in skb_clone() - tipc: fix an information leak in tipc_topsrv_kern_subscr - phylink: add mac_managed_pm in phylink_config structure - eth: i40e: fix DMA mappings leak - eth: hyperv: fix a RX-path warning - eth: mtk: fix memory leaks Previous releases - always broken: - sched: cake: fix null pointer access issue when cake_init() fails" * tag 'net-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits) net: phy: dp83822: disable MDI crossover status change interrupt net: sched: fix race condition in qdisc_graft() net: hns: fix possible memory leak in hnae_ae_register() wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new() sfc: include vport_id in filter spec hash and equal() genetlink: fix kdoc warnings selftests: add selftest for chaining of tc ingress handling to egress net: Fix return value of qdisc ingress handling on success net: sched: sfb: fix null pointer access issue when sfb_init() fails Revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()" net: sched: cake: fix null pointer access issue when cake_init() fails ethernet: marvell: octeontx2 Fix resource not freed after malloc netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces. ionic: catch NULL pointer issue on reconfig net: hsr: avoid possible NULL deref in skb_clone() bnxt_en: fix memory leak in bnxt_nvm_test() ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failed udp: Update reuse->has_conns under reuseport_lock. net: ethernet: mediatek: ppe: Remove the unused function mtk_foe_entry_usable() ...
2022-10-19genetlink: fix kdoc warningsJakub Kicinski
Address a bunch of kdoc warnings: include/net/genetlink.h:81: warning: Function parameter or member 'module' not described in 'genl_family' include/net/genetlink.h:243: warning: expecting prototype for struct genl_info. Prototype was for struct genl_dumpit_info instead include/net/genetlink.h:419: warning: Function parameter or member 'net' not described in 'genlmsg_unicast' include/net/genetlink.h:438: warning: expecting prototype for gennlmsg_data(). Prototype was for genlmsg_data() instead include/net/genetlink.h:244: warning: Function parameter or member 'op' not described in 'genl_dumpit_info' Link: https://lore.kernel.org/r/20221018231310.1040482-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-18udp: Update reuse->has_conns under reuseport_lock.Kuniyuki Iwashima
When we call connect() for a UDP socket in a reuseport group, we have to update sk->sk_reuseport_cb->has_conns to 1. Otherwise, the kernel could select a unconnected socket wrongly for packets sent to the connected socket. However, the current way to set has_conns is illegal and possible to trigger that problem. reuseport_has_conns() changes has_conns under rcu_read_lock(), which upgrades the RCU reader to the updater. Then, it must do the update under the updater's lock, reuseport_lock, but it doesn't for now. For this reason, there is a race below where we fail to set has_conns resulting in the wrong socket selection. To avoid the race, let's split the reader and updater with proper locking. cpu1 cpu2 +----+ +----+ __ip[46]_datagram_connect() reuseport_grow() . . |- reuseport_has_conns(sk, true) |- more_reuse = __reuseport_alloc(more_socks_size) | . | | |- rcu_read_lock() | |- reuse = rcu_dereference(sk->sk_reuseport_cb) | | | | | /* reuse->has_conns == 0 here */ | | |- more_reuse->has_conns = reuse->has_conns | |- reuse->has_conns = 1 | /* more_reuse->has_conns SHOULD BE 1 HERE */ | | | | | |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb, | | | more_reuse) | `- rcu_read_unlock() `- kfree_rcu(reuse, rcu) | |- sk->sk_state = TCP_ESTABLISHED Note the likely(reuse) in reuseport_has_conns_set() is always true, but we put the test there for ease of review. [0] For the record, usually, sk_reuseport_cb is changed under lock_sock(). The only exception is reuseport_grow() & TCP reqsk migration case. 1) shutdown() TCP listener, which is moved into the latter part of reuse->socks[] to migrate reqsk. 2) New listen() overflows reuse->socks[] and call reuseport_grow(). 3) reuse->max_socks overflows u16 with the new listener. 4) reuseport_grow() pops the old shutdown()ed listener from the array and update its sk->sk_reuseport_cb as NULL without lock_sock(). shutdown()ed TCP sk->sk_reuseport_cb can be changed without lock_sock(), but, reuseport_has_conns_set() is called only for UDP under lock_sock(), so likely(reuse) never be false in reuseport_has_conns_set(). [0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/ Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-16Merge tag 'random-6.1-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: "This time with some large scale treewide cleanups. The intent of this pull is to clean up the way callers fetch random integers. The current rules for doing this right are: - If you want a secure or an insecure random u64, use get_random_u64() - If you want a secure or an insecure random u32, use get_random_u32() The old function prandom_u32() has been deprecated for a while now and is just a wrapper around get_random_u32(). Same for get_random_int(). - If you want a secure or an insecure random u16, use get_random_u16() - If you want a secure or an insecure random u8, use get_random_u8() - If you want secure or insecure random bytes, use get_random_bytes(). The old function prandom_bytes() has been deprecated for a while now and has long been a wrapper around get_random_bytes() - If you want a non-uniform random u32, u16, or u8 bounded by a certain open interval maximum, use prandom_u32_max() I say "non-uniform", because it doesn't do any rejection sampling or divisions. Hence, it stays within the prandom_*() namespace, not the get_random_*() namespace. I'm currently investigating a "uniform" function for 6.2. We'll see what comes of that. By applying these rules uniformly, we get several benefits: - By using prandom_u32_max() with an upper-bound that the compiler can prove at compile-time is ≤65536 or ≤256, internally get_random_u16() or get_random_u8() is used, which wastes fewer batched random bytes, and hence has higher throughput. - By using prandom_u32_max() instead of %, when the upper-bound is not a constant, division is still avoided, because prandom_u32_max() uses a faster multiplication-based trick instead. - By using get_random_u16() or get_random_u8() in cases where the return value is intended to indeed be a u16 or a u8, we waste fewer batched random bytes, and hence have higher throughput. This series was originally done by hand while I was on an airplane without Internet. Later, Kees and I worked on retroactively figuring out what could be done with Coccinelle and what had to be done manually, and then we split things up based on that. So while this touches a lot of files, the actual amount of code that's hand fiddled is comfortably small" * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove unused functions treewide: use get_random_bytes() when possible treewide: use get_random_u32() when possible treewide: use get_random_{u8,u16}() when possible, part 2 treewide: use get_random_{u8,u16}() when possible, part 1 treewide: use prandom_u32_max() when possible, part 2 treewide: use prandom_u32_max() when possible, part 1
2022-10-13Merge tag 'net-6.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, and wifi. Current release - regressions: - Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs", it may cause crashes when the qdisc is reconfigured - inet: ping: fix splat due to packet allocation refactoring in inet - tcp: clean up kernel listener's reqsk in inet_twsk_purge(), fix UAF due to races when per-netns hash table is used Current release - new code bugs: - eth: adin1110: check in netdev_event that netdev belongs to driver - fixes for PTR_ERR() vs NULL bugs in driver code, from Dan and co. Previous releases - regressions: - ipv4: handle attempt to delete multipath route when fib_info contains an nh reference, avoid oob access - wifi: fix handful of bugs in the new Multi-BSSID code - wifi: mt76: fix rate reporting / throughput regression on mt7915 and newer, fix checksum offload - wifi: iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (other cases) - wifi: mac80211: do not drop packets smaller than the LLC-SNAP header on fast-rx Previous releases - always broken: - ieee802154: don't warn zero-sized raw_sendmsg() - ipv6: ping: fix wrong checksum for large frames - mctp: prevent double key removal and unref - tcp/udp: fix memory leaks and races around IPV6_ADDRFORM - hv_netvsc: fix race between VF offering and VF association message Misc: - remove -Warray-bounds silencing in the drivers, compilers fixed" * tag 'net-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits) sunhme: fix an IS_ERR() vs NULL check in probe net: marvell: prestera: fix a couple NULL vs IS_ERR() checks kcm: avoid potential race in kcm_tx_work tcp: Clean up kernel listener's reqsk in inet_twsk_purge() net: phy: micrel: Fixes FIELD_GET assertion openvswitch: add nf_ct_is_confirmed check before assigning the helper tcp: Fix data races around icsk->icsk_af_ops. ipv6: Fix data races around sk->sk_prot. tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM). tcp/udp: Fix memory leak in ipv6_renew_options(). mctp: prevent double key removal and unref selftests: netfilter: Fix nft_fib.sh for all.rp_filter=1 netfilter: rpfilter/fib: Populate flowic_l3mdev field selftests: netfilter: Test reverse path filtering net/mlx5: Make ASO poll CQ usable in atomic context tcp: cdg: allow tcp_cdg_release() to be called multiple times inet: ping: fix recent breakage ipv6: ping: fix wrong checksum for large frames net: ethernet: ti: am65-cpsw: set correct devlink flavour for unused ports ...
2022-10-12tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().Kuniyuki Iwashima
Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were able to clean them up by calling inet6_destroy_sock() during the IPv6 -> IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adcde ("udpv6: Add lockless sendmsg() support") added a lockless memory allocation path, which could cause a memory leak: setsockopt(IPV6_ADDRFORM) sendmsg() +-----------------------+ +-------+ - do_ipv6_setsockopt(sk, ...) - udpv6_sendmsg(sk, ...) - sockopt_lock_sock(sk) ^._ called via udpv6_prot - lock_sock(sk) before WRITE_ONCE() - WRITE_ONCE(sk->sk_prot, &tcp_prot) - inet6_destroy_sock() - if (!corkreq) - sockopt_release_sock(sk) - ip6_make_skb(sk, ...) - release_sock(sk) ^._ lockless fast path for the non-corking case - __ip6_append_data(sk, ...) - ipv6_local_rxpmtu(sk, ...) - xchg(&np->rxpmtu, skb) ^._ rxpmtu is never freed. - goto out_no_dst; - lock_sock(sk) For now, rxpmtu is only the case, but not to miss the future change and a similar bug fixed in commit e27326009a3d ("net: ping6: Fix memleak in ipv6_renew_options()."), let's set a new function to IPv6 sk->sk_destruct() and call inet6_cleanup_sock() there. Since the conversion does not change sk->sk_destruct(), we can guarantee that we can clean up IPv6 resources finally. We can now remove all inet6_destroy_sock() calls from IPv6 protocol specific ->destroy() functions, but such changes are invasive to backport. So they can be posted as a follow-up later for net-next. Fixes: 03485f2adcde ("udpv6: Add lockless sendmsg() support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).Kuniyuki Iwashima
Commit 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support") forgot to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6 socket into IPv4 with IPV6_ADDRFORM. After conversion, sk_prot is changed to udp_prot and ->destroy() never cleans it up, resulting in a memory leak. This is due to the discrepancy between inet6_destroy_sock() and IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM to remove the difference. However, this is not enough for now because rxpmtu can be changed without lock_sock() after commit 03485f2adcde ("udpv6: Add lockless sendmsg() support"). We will fix this case in the following patch. Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy() in the future. Fixes: 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12mac802154: Drop IEEE802154_HW_RX_DROP_BAD_CKSUMMiquel Raynal
This IEEE802154_HW_RX_DROP_BAD_CKSUM flag was only used by hwsim to reflect the fact that it would not validate the checksum (FCS). So this was only useful while the only filtering level hwsim was capable of was "NONE". Now that the driver has been improved we no longer need this flag. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20221007085310.503366-7-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-10-12ieee802154: hwsim: Implement address filteringMiquel Raynal
We have access to the address filters being theoretically applied, we also have access to the actual filtering level applied, so let's add a proper frame validation sequence in hwsim. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20221007085310.503366-6-miquel.raynal@bootlin.com [stefan@datenfreihafen.org: fixup some checkpatch warnings] Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-10-12mac802154: set filter at drv_start()Alexander Aring
The current filtering level is set on the first interface up on a wpan phy. If we support scan functionality we need to change the filtering level on the fly on an operational phy and switching back again. This patch will move the receive mode parameter e.g. address filter and promiscuous mode to the drv_start() functionality to allow changing the receive mode on an operational phy not on first ifup only. In future this should be handled on driver layer because each hardware has it's own way to enter a specific filtering level. However this should offer to switch to mode IEEE802154_FILTERING_NONE and back to IEEE802154_FILTERING_4_FRAME_FIELDS. Only IEEE802154_FILTERING_4_FRAME_FIELDS and IEEE802154_FILTERING_NONE are somewhat supported by current hardware. All other filtering levels can be supported in future but will end in IEEE802154_FILTERING_NONE as the receive part can kind of "emulate" those receive paths by doing additional filtering routines. There are in total three filtering levels in the code: - the per-interface default level (should not be changed) - the required per-interface level (mac commands may play with it) - the actual per-PHY (hw) level that is currently in use Signed-off-by: Alexander Aring <aahringo@redhat.com> [<miquel.raynal@bootlin.com: Add the third filtering variable] Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20221007085310.503366-4-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-10-11treewide: use get_random_u32() when possibleJason A. Donenfeld
The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-10Merge tag '9p-for-6.1' of https://github.com/martinetd/linuxLinus Torvalds
Pull 9p updates from Dominique Martinet: "Smaller buffers for small messages and fixes. The highlight of this is Christian's patch to allocate smaller buffers for most metadata requests: 9p with a big msize would try to allocate large buffers when just 4 or 8k would be more than enough; this brings in nice performance improvements. There's also a few fixes for problems reported by syzkaller (thanks to Schspa Shi, Tetsuo Handa for tests and feedback/patches) as well as some minor cleanup" * tag '9p-for-6.1' of https://github.com/martinetd/linux: net/9p: clarify trans_fd parse_opt failure handling net/9p: add __init/__exit annotations to module init/exit funcs net/9p: use a dedicated spinlock for trans_fd 9p/trans_fd: always use O_NONBLOCK read/write net/9p: allocate appropriate reduced message buffers net/9p: add 'pooled_rbuffers' flag to struct p9_trans_module net/9p: add p9_msg_buf_size() 9p: add P9_ERRMAX for 9p2000 and 9p2000.u net/9p: split message size argument into 't_size' and 'r_size' pair 9p: trans_fd/p9_conn_cancel: drop client lock earlier
2022-10-10Merge remote-tracking branch 'wireless/main' into wireless-nextJohannes Berg
Pull in wireless/main content since some new code would otherwise conflict with it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-10wifi: mac80211: add internal handler for wake_tx_queueAlexander Wetzel
Start to align the TX handling to only use internal TX queues (iTXQs): Provide a handler for drivers not having a custom wake_tx_queue callback and update the documentation. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07cfg80211: Update Transition Disable policy during port authorizationVinayak Yadawad
In case of 4way handshake offload, transition disable policy updated by the AP during EAPOL 3/4 is not updated to the upper layer. This results in mismatch between transition disable policy between the upper layer and the driver. This patch addresses this issue by updating transition disable policy as part of port authorization indication. Signed-off-by: Vinayak Yadawad <vinayak.yadawad@broadcom.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: add RCU _check() link access variantsJohannes Berg
We might sometimes need to use RCU and locking in the same code path, so add the two variants link_conf_dereference_check() and link_sta_dereference_check(). Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: nl80211: use link ID in NL80211_CMD_SET_BSSJohannes Berg
We clearly need the link ID here, to know the right BSS to configure. Use/require it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: cfg80211: support reporting failed linksJohannes Berg
For assoc and connect result APIs, support reporting failed links; they should still come with the BSS pointer in the case of assoc, so they're released correctly. In the case of connect result, this is optional. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: add API to show the link STAs in debugfsBenjamin Berg
Create debugfs data per-link. For drivers, there is a new operation link_sta_add_debugfs which will always be called. For non-MLO, the station directory will be used directly rather than creating a corresponding subdirectory. As such, non-MLO drivers can simply continue to create the data from sta_debugfs_add. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> [add missing inlines if !CONFIG_MAC80211_DEBUGFS] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: add pointer from link STA to STABenjamin Berg
While often not needed, this considerably simplifies going from a link to the STA. This helps in cases such as debugfs where a single pointer should allow accessing a specific link and the STA. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07net: ieee802154: return -EINVAL for unknown addr typeAlexander Aring
This patch adds handling to return -EINVAL for an unknown addr type. The current behaviour is to return 0 as successful but the size of an unknown addr type is not defined and should return an error like -EINVAL. Fixes: 94160108a70c ("net/ieee802154: fix uninit value bug in dgram_sendmsg") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-05net/9p: add 'pooled_rbuffers' flag to struct p9_trans_moduleChristian Schoenebeck
This is a preparatory change for the subsequent patch: the RDMA transport pulls the buffers for its 9p response messages from a shared pool. [1] So this case has to be considered when choosing an appropriate response message size in the subsequent patch. Link: https://lore.kernel.org/all/Ys3jjg52EIyITPua@codewreck.org/ [1] Link: https://lkml.kernel.org/r/79d24310226bc4eb037892b5c097ec4ad4819a03.1657920926.git.linux_oss@crudebyte.com Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-10-059p: add P9_ERRMAX for 9p2000 and 9p2000.uChristian Schoenebeck
Add P9_ERRMAX macro to 9P protocol header which reflects the maximum error string length of Rerror replies for 9p2000 and 9p2000.u protocol versions. Unfortunately a maximum error string length is not defined by the 9p2000 spec, picking 128 as value for now, as this seems to be a common max. size for POSIX error strings in practice. 9p2000.L protocol version uses Rlerror replies instead which does not contain an error string. Link: https://lkml.kernel.org/r/3f23191d21032e7c14852b1e1a4ae26417a36739.1657920926.git.linux_oss@crudebyte.com Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-10-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in the left-over fixes before the net-next pull-request. Conflicts: drivers/net/ethernet/mediatek/mtk_ppe.c ae3ed15da588 ("net: ethernet: mtk_eth_soc: fix state in __mtk_foe_entry_clear") 9d8cb4c096ab ("net: ethernet: mtk_eth_soc: add foe_entry_size to mtk_eth_soc") https://lore.kernel.org/all/6cb6893b-4921-a068-4c30-1109795110bb@tessares.net/ kernel/bpf/helpers.c 8addbfc7b308 ("bpf: Gate dynptr API behind CAP_BPF") 5679ff2f138f ("bpf: Move bpf_loop and bpf_for_each_map_elem under CAP_BPF") 8a67f2de9b1d ("bpf: expose bpf_strtol and bpf_strtoul to all program types") https://lore.kernel.org/all/20221003201957.13149-1-daniel@iogearbox.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2022-10-03 We've added 10 non-merge commits during the last 23 day(s) which contain a total of 14 files changed, 130 insertions(+), 69 deletions(-). The main changes are: 1) Fix dynptr helper API to gate behind CAP_BPF given it was not intended for unprivileged BPF programs, from Kumar Kartikeya Dwivedi. 2) Fix need_wakeup flag inheritance from umem buffer pool for shared xsk sockets, from Jalal Mostafa. 3) Fix truncated last_member_type_id in btf_struct_resolve() which had a wrong storage type, from Lorenz Bauer. 4) Fix xsk back-pressure mechanism on tx when amount of produced descriptors to CQ is lower than what was grabbed from xsk tx ring, from Maciej Fijalkowski. 5) Fix wrong cgroup attach flags being displayed to effective progs, from Pu Lehui. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: xsk: Inherit need_wakeup flag for shared sockets bpf: Gate dynptr API behind CAP_BPF selftests/bpf: Adapt cgroup effective query uapi change bpftool: Fix wrong cgroup attach flags being assigned to effective progs bpf, cgroup: Reject prog_attach_flags array when effective query bpf: Ensure correct locking around vulnerable function find_vpid() bpf: btf: fix truncated last_member_type_id in btf_struct_resolve selftests/xsk: Add missing close() on netns fd xsk: Fix backpressure mechanism on Tx MAINTAINERS: Add include/linux/tnum.h to BPF CORE ==================== Link: https://lore.kernel.org/r/20221003201957.13149-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-10-03 We've added 143 non-merge commits during the last 27 day(s) which contain a total of 151 files changed, 8321 insertions(+), 1402 deletions(-). The main changes are: 1) Add kfuncs for PKCS#7 signature verification from BPF programs, from Roberto Sassu. 2) Add support for struct-based arguments for trampoline based BPF programs, from Yonghong Song. 3) Fix entry IP for kprobe-multi and trampoline probes under IBT enabled, from Jiri Olsa. 4) Batch of improvements to veristat selftest tool in particular to add CSV output, a comparison mode for CSV outputs and filtering, from Andrii Nakryiko. 5) Add preparatory changes needed for the BPF core for upcoming BPF HID support, from Benjamin Tissoires. 6) Support for direct writes to nf_conn's mark field from tc and XDP BPF program types, from Daniel Xu. 7) Initial batch of documentation improvements for BPF insn set spec, from Dave Thaler. 8) Add a new BPF_MAP_TYPE_USER_RINGBUF map which provides single-user-space-producer / single-kernel-consumer semantics for BPF ring buffer, from David Vernet. 9) Follow-up fixes to BPF allocator under RT to always use raw spinlock for the BPF hashtab's bucket lock, from Hou Tao. 10) Allow creating an iterator that loops through only the resources of one task/thread instead of all, from Kui-Feng Lee. 11) Add support for kptrs in the per-CPU arraymap, from Kumar Kartikeya Dwivedi. 12) Add a new kfunc helper for nf to set src/dst NAT IP/port in a newly allocated CT entry which is not yet inserted, from Lorenzo Bianconi. 13) Remove invalid recursion check for struct_ops for TCP congestion control BPF programs, from Martin KaFai Lau. 14) Fix W^X issue with BPF trampoline and BPF dispatcher, from Song Liu. 15) Fix percpu_counter leakage in BPF hashtab allocation error path, from Tetsuo Handa. 16) Various cleanups in BPF selftests to use preferred ASSERT_* macros, from Wang Yufen. 17) Add invocation for cgroup/connect{4,6} BPF programs for ICMP pings, from YiFei Zhu. 18) Lift blinding decision under bpf_jit_harden = 1 to bpf_capable(), from Yauheni Kaliuta. 19) Various libbpf fixes and cleanups including a libbpf NULL pointer deref, from Xin Liu. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (143 commits) net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c Documentation: bpf: Add implementation notes documentations to table of contents bpf, docs: Delete misformatted table. selftests/xsk: Fix double free bpftool: Fix error message of strerror libbpf: Fix overrun in netlink attribute iteration selftests/bpf: Fix spelling mistake "unpriviledged" -> "unprivileged" samples/bpf: Fix typo in xdp_router_ipv4 sample bpftool: Remove unused struct event_ring_info bpftool: Remove unused struct btf_attach_point bpf, docs: Add TOC and fix formatting. bpf, docs: Add Clang note about BPF_ALU bpf, docs: Move Clang notes to a separate file bpf, docs: Linux byteswap note bpf, docs: Move legacy packet instructions to a separate file selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline ... ==================== Link: https://lore.kernel.org/r/20221003194915.11847-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.cLorenzo Bianconi
Remove circular dependency between nf_nat module and nf_conntrack one moving bpf_ct_set_nat_info kfunc in nf_nat_bpf.c Fixes: 0fabd2aa199f ("net: netfilter: add bpf_ct_set_nat_info kfunc helper") Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Yauheni Kaliuta <ykaliuta@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/51a65513d2cda3eeb0754842e8025ab3966068d8.1664490511.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-03net: Remove DECnet leftovers from flow.h.Guillaume Nault
DECnet was removed by commit 1202cdd66531 ("Remove DECnet support from kernel"). Let's also revome its flow structure. Compile-tested only (allmodconfig). Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03net: Add helper function to parse netlink msg of ip_tunnel_parmLiu Jian
Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm. Reduces duplicate code, no actual functional changes. Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03net: Add helper function to parse netlink msg of ip_tunnel_encapLiu Jian
Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap. Reduces duplicate code, no actual functional changes. Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-03Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Refactor selftests to use an array of structs in xfrm_fill_key(). From Gautam Menghani. 2) Drop an unused argument from xfrm_policy_match. From Hongbin Wang. 3) Support collect metadata mode for xfrm interfaces. From Eyal Birger. 4) Add netlink extack support to xfrm. From Sabrina Dubroca. Please note, there is a merge conflict in: include/net/dst_metadata.h between commit: 0a28bfd4971f ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support") from the net-next tree and commit: 5182a5d48c3d ("net: allow storing xfrm interface metadata in metadata_dst") from the ipsec-next tree. Can be solved as done in linux-next. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-02net: sched: cls_api: introduce tc_cls_bind_class() helperZhengchao Shao
All the bind_class callback duplicate the same logic, this patch introduces tc_cls_bind_class() helper for common usage. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-30net: dsa: remove bool devlink_port_setupVladimir Oltean
Since dsa_port_devlink_setup() and dsa_port_devlink_teardown() are already called from code paths which only execute once per port (due to the existing bool dp->setup), keeping another dp->devlink_port_setup is redundant, because we can already manage to balance the calls properly (and not call teardown when setup was never called, or call setup twice, or things like that). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30net: devlink: add port_init/fini() helpers to allow ↵Jiri Pirko
pre-register/post-unregister functions Lifetime of some of the devlink objects, like regions, is currently forced to be different for devlink instance and devlink port instance (per-port regions). The reason is that for devlink ports, the internal structures initialization happens only after devlink_port_register() is called. To resolve this inconsistency, introduce new set of helpers to allow driver to initialize devlink pointer and region list before devlink_register() is called. That allows port regions to be created before devlink port registration and destroyed after devlink port unregistration. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30net: devlink: introduce a flag to indicate devlink port being registeredJiri Pirko
Instead of relying on devlink pointer not being initialized, introduce an extra flag to indicate if devlink port is registered. This is needed as later on devlink pointer is going to be initialized even in case devlink port is not registered yet. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30Merge tag 'for-net-next-2022-09-30' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next - Add RTL8761BUV device (Edimax BT-8500) - Add a new PID/VID 13d3/3583 for MT7921 - Add Realtek RTL8852C support ID 0x13D3:0x3592 - Add VID/PID 0489/e0e0 for MediaTek MT7921 - Add a new VID/PID 0e8d/0608 for MT7921 - Add a new PID/VID 13d3/3578 for MT7921 - Add BT device 0cb8:c549 from RTW8852AE - Add support for Intel Magnetor * tag 'for-net-next-2022-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (49 commits) Bluetooth: hci_sync: Fix not indicating power state Bluetooth: L2CAP: Fix user-after-free Bluetooth: Call shutdown for HCI_USER_CHANNEL Bluetooth: Prevent double register of suspend Bluetooth: hci_core: Fix not handling link timeouts propertly Bluetooth: hci_event: Make sure ISO events don't affect non-ISO connections Bluetooth: hci_debugfs: Fix not checking conn->debugfs Bluetooth: hci_sysfs: Fix attempting to call device_add multiple times Bluetooth: MGMT: fix zalloc-simple.cocci warnings Bluetooth: hci_{ldisc,serdev}: check percpu_init_rwsem() failure Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works Bluetooth: L2CAP: initialize delayed works at l2cap_chan_create() Bluetooth: RFCOMM: Fix possible deadlock on socket shutdown/release Bluetooth: hci_sync: allow advertise when scan without RPA Bluetooth: btusb: Add a new VID/PID 0e8d/0608 for MT7921 Bluetooth: btusb: Add a new PID/VID 13d3/3583 for MT7921 Bluetooth: avoid hci_dev_test_and_set_flag() in mgmt_init_hdev() Bluetooth: btintel: Mark Intel controller to support LE_STATES quirk Bluetooth: btintel: Add support for Magnetor Bluetooth: btusb: Add a new PID/VID 13d3/3578 for MT7921 ... ==================== Link: https://lore.kernel.org/r/20221001004602.297366-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30Merge tag 'wireless-next-2022-09-30' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.1 Few stack changes and lots of driver changes in this round. brcmfmac has more activity as usual and it gets new hardware support. ath11k improves WCN6750 support and also other smaller features. And of course changes all over. Note: in early September wireless tree was merged to wireless-next to avoid some conflicts with mac80211 patches, this shouldn't cause any problems but wanted to mention anyway. Major changes: mac80211 - refactoring and preparation for Wi-Fi 7 Multi-Link Operation (MLO) feature continues brcmfmac - support CYW43439 SDIO chipset - support BCM4378 on Apple platforms - support CYW89459 PCIe chipset rtw89 - more work to get rtw8852c supported - P2P support - support for enabling and disabling MSDU aggregation via nl80211 mt76 - tx status reporting improvements ath11k - cold boot calibration support on WCN6750 - Target Wake Time (TWT) debugfs support for STA interface - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - implement SRAM dump debugfs interface - enable threaded NAPI on all hardware - WoW support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz wcn36xx - add SNR from a received frame as a source of system entropy * tag 'wireless-next-2022-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (231 commits) wifi: rtl8xxxu: Improve rtl8xxxu_queue_select wifi: rtl8xxxu: Fix AIFS written to REG_EDCA_*_PARAM wifi: rtl8xxxu: gen2: Enable 40 MHz channel width wifi: rtw89: 8852b: configure DLE mem wifi: rtw89: check DLE FIFO size with reserved size wifi: rtw89: mac: correct register of report IMR wifi: rtw89: pci: set power cut closed for 8852be wifi: rtw89: pci: add to do PCI auto calibration wifi: rtw89: 8852b: implement chip_ops::{enable,disable}_bb_rf wifi: rtw89: add DMA busy checking bits to chip info wifi: rtw89: mac: define DMA channel mask to avoid unsupported channels wifi: rtw89: pci: mask out unsupported TX channels iwlegacy: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper ipw2x00: Replace zero-length array with DECLARE_FLEX_ARRAY() helper wifi: iwlwifi: Track scan_cmd allocation size explicitly brcmfmac: Remove the call to "dtim_assoc" IOVAR brcmfmac: increase dcmd maximum buffer size brcmfmac: Support 89459 pcie brcmfmac: increase default max WOWL patterns to 16 cw1200: fix incorrect check to determine if no element is found in list ... ==================== Link: https://lore.kernel.org/r/20220930150413.A7984C433D6@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30xsk: Remove unused xsk_buff_discardMaxim Mikityanskiy
The previous commit removed the last usage of xsk_buff_discard in mlx5e, so the function that is no longer used can be removed. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> CC: "Björn Töpel" <bjorn@kernel.org> CC: Magnus Karlsson <magnus.karlsson@intel.com> CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30xsk: Expose min chunk size to driversMaxim Mikityanskiy
Drivers should be aware of the range of valid UMEM chunk sizes to be able to allocate their internal structures of an appropriate size. It will be used by mlx5e in the following patches. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> CC: "Björn Töpel" <bjorn@kernel.org> CC: Magnus Karlsson <magnus.karlsson@intel.com> CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-30tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limitedNeal Cardwell
This commit fixes a bug in the tracking of max_packets_out and is_cwnd_limited. This bug can cause the connection to fail to remember that is_cwnd_limited is true, causing the connection to fail to grow cwnd when it should, causing throughput to be lower than it should be. The following event sequence is an example that triggers the bug: (a) The connection is cwnd_limited, but packets_out is not at its peak due to TSO deferral deciding not to send another skb yet. In such cases the connection can advance max_packets_seq and set tp->is_cwnd_limited to true and max_packets_out to a small number. (b) Then later in the round trip the connection is pacing-limited (not cwnd-limited), and packets_out is larger. In such cases the connection would raise max_packets_out to a bigger number but (unexpectedly) flip tp->is_cwnd_limited from true to false. This commit fixes that bug. One straightforward fix would be to separately track (a) the next window after max_packets_out reaches a maximum, and (b) the next window after tp->is_cwnd_limited is set to true. But this would require consuming an extra u32 sequence number. Instead, to save space we track only the most important information. Specifically, we track the strongest available signal of the degree to which the cwnd is fully utilized: (1) If the connection is cwnd-limited then we remember that fact for the current window. (2) If the connection not cwnd-limited then we track the maximum number of outstanding packets in the current window. In particular, note that the new logic cannot trigger the buggy (a)/(b) sequence above because with the new logic a condition where tp->packets_out > tp->max_packets_out can only trigger an update of tp->is_cwnd_limited if tp->is_cwnd_limited is false. This first showed up in a testing of a BBRv2 dev branch, but this buggy behavior highlighted a general issue with the tcp_cwnd_validate() logic that can cause cwnd to fail to increase at the proper rate for any TCP congestion control, including Reno or CUBIC. Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Kevin(Yudong) Yang <yyd@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>