summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-06net: omit ndo_hwtstamp_get() call when possible in dev_set_hwtstamp_phylib()Vladimir Oltean
Setting dev->priv_flags & IFF_SEE_ALL_HWTSTAMP_REQUESTS is only legal for drivers which were converted to ndo_hwtstamp_get() and ndo_hwtstamp_set(), and it is only there that we call ndo_hwtstamp_set() for a request that otherwise goes to phylib (for stuff like packet traps, which need to be undone if phylib failed, hence the old_cfg logic). The problem is that we end up calling ndo_hwtstamp_get() when we don't need to (even if the SIOCSHWTSTAMP wasn't intended for phylib, or if it was, but the driver didn't set IFF_SEE_ALL_HWTSTAMP_REQUESTS). For those unnecessary conditions, we share a code path with virtual drivers (vlan, macvlan, bonding) where ndo_hwtstamp_get() is implemented as generic_hwtstamp_get_lower(), and may be resolved through generic_hwtstamp_ioctl_lower() if the lower device is unconverted. I.e. this situation: $ ip link add link eno0 name eno0.100 type vlan id 100 $ hwstamp_ctl -i eno0.100 -t 1 We are unprepared to deal with this, because if ndo_hwtstamp_get() is resolved through a legacy ndo_eth_ioctl(SIOCGHWTSTAMP) lower_dev implementation, that needs a non-NULL old_cfg.ifr pointer, and we don't have it. But we don't even need to deal with it either. In the general case, drivers may not even implement SIOCGHWTSTAMP handling, only SIOCSHWTSTAMP, so it makes sense to completely avoid a SIOCGHWTSTAMP call if we can. The solution is to split the single "if" condition into 3 smaller ones, thus separating the decision to call ndo_hwtstamp_get() from the decision to call ndo_hwtstamp_set(). The third "if" condition is identical to the first one, and both are subsets of the second one. Thus, the "cfg" argument of kernel_hwtstamp_config_changed() is always valid. Reported-by: Eric Dumazet <edumazet@google.com> Closes: https://lore.kernel.org/netdev/CANn89iLOspJsvjPj+y8jikg7erXDomWe8sqHMdfL_2LQSFrPAg@mail.gmail.com/ Fixes: fd770e856e22 ("net: remove phy_has_hwtstamp() -> phy_mii_ioctl() decision from converted drivers") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06net: ethernet: adi: adin1110: use eth_broadcast_addr() to assign broadcast ↵Yang Yingliang
address Use eth_broadcast_addr() to assign broadcast address instead of memset(). Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06ibmvnic: remove unused rc variableYu Liao
gcc with W=1 reports drivers/net/ethernet/ibm/ibmvnic.c:194:13: warning: variable 'rc' set but not used [-Wunused-but-set-variable] ^ This variable is not used so remove it. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308040609.zQsSXWXI-lkp@intel.com/ Signed-off-by: Yu Liao <liaoyu15@huawei.com> Reviewed-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06macsec: use DEV_STATS_INC()Eric Dumazet
syzbot/KCSAN reported data-races in macsec whenever dev->stats fields are updated. It appears all of these updates can happen from multiple cpus. Adopt SMP safe DEV_STATS_INC() to update dev->stats fields. Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06net: mana: Add page pool for RX buffersHaiyang Zhang
Add page pool for RX buffers for faster buffer cycle and reduce CPU usage. The standard page pool API is used. With iperf and 128 threads test, this patch improved the throughput by 12-15%, and decreased the IRQ associated CPU's usage from 99-100% to 10-50%. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06Merge branch 'gve-desc'David S. Miller
Rushil Gupta says: ==================== gve: Add QPL mode for DQO descriptor format GVE supports QPL ("queue-page-list") mode where all data is communicated through a set of pre-registered pages. Adding this mode to DQO. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06gve: update gve.rstRushil Gupta
Add a note about QPL and RDA mode Signed-off-by: Rushil Gupta <rushilg@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06gve: RX path for DQO-QPLRushil Gupta
The RX path allocates the QPL page pool at queue creation, and tries to reuse these pages through page recycling. This patch ensures that on refill no non-QPL pages are posted to the device. When the driver is running low on free buffers, an ondemand allocation step kicks in that allocates a non-qpl page for SKB business to free up the QPL page in use. gve_try_recycle_buf was moved to gve_rx_append_frags so that driver does not attempt to mark buffer as used if a non-qpl page was allocated ondemand. Signed-off-by: Rushil Gupta <rushilg@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06gve: Tx path for DQO-QPLRushil Gupta
Each QPL page is divided into GVE_TX_BUFS_PER_PAGE_DQO buffers. When a packet needs to be transmitted, we break the packet into max GVE_TX_BUF_SIZE_DQO sized chunks and transmit each chunk using a TX descriptor. We allocate the TX buffers from the free list in dqo_tx. We store these TX buffer indices in an array in the pending_packet structure. The TX buffers are returned to the free list in dqo_compl after receiving packet completion or when removing packets from miss completions list. Signed-off-by: Rushil Gupta <rushilg@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06gve: Control path for DQO-QPLRushil Gupta
GVE supports QPL ("queue-page-list") mode where all data is communicated through a set of pre-registered pages. Adding this mode to DQO descriptor format. Add checks, abi-changes and device options to support QPL mode for DQO in addition to GQI. Also, use pages-per-qpl supplied by device-option to control the size of the "queue-page-list". Signed-off-by: Rushil Gupta <rushilg@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06net: tls: avoid discarding data on record closeJakub Kicinski
TLS records end with a 16B tag. For TLS device offload we only need to make space for this tag in the stream, the device will generate and replace it with the actual calculated tag. Long time ago the code would just re-reference the head frag which mostly worked but was suboptimal because it prevented TCP from combining the record into a single skb frag. I'm not sure if it was correct as the first frag may be shorter than the tag. The commit under fixes tried to replace that with using the page frag and if the allocation failed rolling back the data, if record was long enough. It achieves better fragment coalescing but is also buggy. We don't roll back the iterator, so unless we're at the end of send we'll skip the data we designated as tag and start the next record as if the rollback never happened. There's also the possibility that the record was constructed with MSG_MORE and the data came from a different syscall and we already told the user space that we "got it". Allocate a single dummy page and use it as fallback. Found by code inspection, and proven by forcing allocation failures. Fixes: e7b159a48ba6 ("net/tls: remove the record tail optimization") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06Merge branch 'tcp-options-lockless'David S. Miller
Eric Dumazet says: ==================== tcp: set few options locklessly This series is avoiding the socket lock for six TCP options. They are not heavily used, but this exercise can give ideas for other parts of TCP/IP stack :) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06tcp: set TCP_DEFER_ACCEPT locklesslyEric Dumazet
rskq_defer_accept field can be read/written without the need of holding the socket lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06tcp: set TCP_LINGER2 locklesslyEric Dumazet
tp->linger2 can be set locklessly as long as readers use READ_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06tcp: set TCP_KEEPCNT locklesslyEric Dumazet
tp->keepalive_probes can be set locklessly, readers are already taking care of this field being potentially set by other threads. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06tcp: set TCP_KEEPINTVL locklesslyEric Dumazet
tp->keepalive_intvl can be set locklessly, readers are already taking care of this field being potentially set by other threads. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06tcp: set TCP_USER_TIMEOUT locklesslyEric Dumazet
icsk->icsk_user_timeout can be set locklessly, if all read sides use READ_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06tcp: set TCP_SYNCNT locklesslyEric Dumazet
icsk->icsk_syn_retries can safely be set without locking the socket. We have to add READ_ONCE() annotations in tcp_fastopen_synack_timer() and tcp_write_timeout(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-05Merge tag 'rust-fixes-6.5-rc5' of https://github.com/Rust-for-Linux/linuxLinus Torvalds
Pull rust fixes from Miguel Ojeda: - Allocator: prevent mis-aligned allocation - Types: delete 'ForeignOwnable::borrow_mut'. A sound replacement is planned for the merge window - Build: fix bindgen error with UBSAN_BOUNDS_STRICT * tag 'rust-fixes-6.5-rc5' of https://github.com/Rust-for-Linux/linux: rust: fix bindgen build error with UBSAN_BOUNDS_STRICT rust: delete `ForeignOwnable::borrow_mut` rust: allocator: Prevent mis-aligned allocation
2023-08-05ksmbd: fix wrong next length validation of ea buffer in smb2_set_ea()Namjae Jeon
There are multiple smb2_ea_info buffers in FILE_FULL_EA_INFORMATION request from client. ksmbd find next smb2_ea_info using ->NextEntryOffset of current smb2_ea_info. ksmbd need to validate buffer length Before accessing the next ea. ksmbd should check buffer length using buf_len, not next variable. next is the start offset of current ea that got from previous ea. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21598 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-05ksmbd: validate command request sizeLong Li
In commit 2b9b8f3b68ed ("ksmbd: validate command payload size"), except for SMB2_OPLOCK_BREAK_HE command, the request size of other commands is not checked, it's not expected. Fix it by add check for request size of other commands. Cc: stable@vger.kernel.org Fixes: 2b9b8f3b68ed ("ksmbd: validate command payload size") Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Long Li <leo.lilong@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-05Merge tag 'ata-6.5-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ata fix from Damien Le Moal: - Prevent the scsi disk driver from issuing a START STOP UNIT command for ATA devices during system resume as this causes various issues reported by multiple users. * tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata,scsi: do not issue START STOP UNIT on resume
2023-08-05Merge tag '6.5-rc4-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fix from Steve French: - Fix DFS interlink problem (different namespace) * tag '6.5-rc4-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix dfs link mount against w2k8
2023-08-05Merge tag 'powerpc-6.5-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix vmemmap altmap boundary check which could cause memory hotunplug failure - Create a dummy stackframe to fix ftrace stack unwind - Fix secondary thread bringup for Book3E ELFv2 kernels - Use early_ioremap/unmap() in via_calibrate_decr() Thanks to Aneesh Kumar K.V, Benjamin Gray, Christophe Leroy, David Hildenbrand, and Naveen N Rao. * tag 'powerpc-6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powermac: Use early_* IO variants in via_calibrate_decr() powerpc/64e: Fix secondary thread bringup for ELFv2 kernels powerpc/ftrace: Create a dummy stackframe to fix stack unwind powerpc/mm/altmap: Fix altmap boundary check
2023-08-05Merge tag 'parisc-for-6.5-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture fixes from Helge Deller: - early fixmap preallocation to fix boot failures on kernel >= 6.4 - remove DMA leftover code in parport_gsc - drop old comments and code style fixes * tag 'parisc-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: unaligned: Add required spaces after ',' parport: gsc: remove DMA leftover code parisc: pci-dma: remove unused and dead EISA code and comment parisc/mm: preallocate fixmap page tables at init
2023-08-04Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A few clk driver fixes for some SoC clk drivers: - Change a usleep() to udelay() to avoid scheduling while atomic in the Amlogic PLL code - Revert a patch to the Mediatek MT8183 driver that caused an out-of-bounds write - Return the right error value when devm_of_iomap() fails in imx93_clocks_probe() - Constrain the Kconfig for the fixed mmio clk so that it depends on HAS_IOMEM and can't be compiled on architectures such as s390" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: fixed-mmio: make COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM clk: imx93: Propagate correct error in imx93_clocks_probe() clk: mediatek: mt8183: Add back SSPM related clocks clk: meson: change usleep_range() to udelay() for atomic context
2023-08-04Merge tag 'wireless-next-2023-08-04' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.6 The first pull request for v6.6 and only driver patches this time. Nothing special really standing out, it has been quiet most likely due to vacations. Major changes: rtl8xxxu - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU mwifiex - allow moving to a different namespace mt76 - preparation for mt7925 support - mt7981 support ath12k - Extremely High Throughput (EHT) PHY support for Wi-Fi 7 * tag 'wireless-next-2023-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (172 commits) wifi: rtw89: return failure if needed firmware elements are not recognized wifi: rtw89: add to parse firmware elements of BB and RF tables wifi: rtw89: introduce infrastructure of firmware elements wifi: rtw89: add firmware suit for BB MCU 0/1 wifi: rtw89: add firmware parser for v1 format wifi: rtw89: introduce v1 format of firmware header wifi: rtw89: support firmware log with formatted text wifi: rtw89: recognize log format from firmware file wifi: ath12k: avoid deadlock by change ieee80211_queue_work for regd_update_work wifi: ath12k: add handler for scan event WMI_SCAN_EVENT_DEQUEUED wifi: ath12k: relax list iteration in ath12k_mac_vif_unref() wifi: ath12k: configure puncturing bitmap wifi: ath12k: parse WMI service ready ext2 event wifi: ath12k: add MLO header in peer association wifi: ath12k: peer assoc for 320 MHz wifi: ath12k: add WMI support for EHT peer wifi: ath12k: prepare EHT peer assoc parameters wifi: ath12k: add EHT PHY modes wifi: ath12k: propagate EHT capabilities to userspace wifi: ath12k: WMI support to process EHT capabilities ... ==================== Link: https://lore.kernel.org/r/87msz7j942.fsf@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04Merge branch 'tcp-disable-header-prediction-for-md5'Jakub Kicinski
Kuniyuki Iwashima says: ==================== tcp: Disable header prediction for MD5. The 1st patch disable header prediction for MD5 flow and the 2nd patch updates the stale comment in tcp_parse_options(). ==================== Link: https://lore.kernel.org/r/20230803224552.69398-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04tcp: Update stale comment for MD5 in tcp_parse_options().Kuniyuki Iwashima
Since commit 9ea88a153001 ("tcp: md5: check md5 signature without socket lock"), the MD5 option is checked in tcp_v[46]_rcv(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230803224552.69398-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04tcp: Disable header prediction for MD5 flow.Kuniyuki Iwashima
TCP socket saves the minimum required header length in tcp_header_len of struct tcp_sock, and later the value is used in __tcp_fast_path_on() to generate a part of TCP header in tcp_sock(sk)->pred_flags. In tcp_rcv_established(), if the incoming packet has the same pattern with pred_flags, we enter the fast path and skip full option parsing. The MD5 option is parsed in tcp_v[46]_rcv(), so we need not parse it again later in tcp_rcv_established() unless other options exist. We add TCPOLEN_MD5SIG_ALIGNED to tcp_header_len in two paths to avoid the slow path. For passive open connections with MD5, we add TCPOLEN_MD5SIG_ALIGNED to tcp_header_len in tcp_create_openreq_child() after 3WHS. On the other hand, we do it in tcp_connect_init() for active open connections. However, the value is overwritten while processing SYN+ACK or crossed SYN in tcp_rcv_synsent_state_process(). These two cases will have the wrong value in pred_flags and never go into the fast path. We could update tcp_header_len in tcp_rcv_synsent_state_process(), but a test with slightly modified netperf which uses MD5 for each flow shows that the slow path is actually a bit faster than the fast path. On c5.4xlarge EC2 instance (16 vCPU, 32 GiB mem) $ for i in {1..10}; do ./super_netperf $(nproc) -H localhost -l 10 -- -m 256 -M 256; done Avg of 10 * 36e68eadd303 : 10.376 Gbps * all fast path : 10.374 Gbps (patch v2, See Link) * all slow path : 10.394 Gbps The header prediction is not worth adding complexity for MD5, so let's disable it for MD5. Link: https://lore.kernel.org/netdev/20230803042214.38309-1-kuniyu@amazon.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230803224552.69398-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04dccp: fix data-race around dp->dccps_mss_cacheEric Dumazet
dccp_sendmsg() reads dp->dccps_mss_cache before locking the socket. Same thing in do_dccp_getsockopt(). Add READ_ONCE()/WRITE_ONCE() annotations, and change dccp_sendmsg() to check again dccps_mss_cache after socket is locked. Fixes: 7c657876b63c ("[DCCP]: Initial implementation") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230803163021.2958262-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04Merge branch 'mptcp-more-fixes-for-v6-5'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: more fixes for v6.5 Here is a new batch of fixes related to MPTCP for v6.5 and older. Patches 1 and 2 fix issues with MPTCP Join selftest when manually launched with '-i' parameter to use 'ip mptcp' tool instead of the dedicated one (pm_nl_ctl). The issues have been there since v5.18. Thank you Andrea for your first contributions to MPTCP code in the upstream kernel! Patch 3 avoids corrupting the data stream when trying to reset connections that have fallen back to TCP. This can happen from v6.1. Patch 4 fixes a race when doing a disconnect() and an accept() in parallel on a listener socket. The issue only happens in rare cases if the user is really unlucky since a fix that landed in v6.3 but backported up to v6.1. ==================== Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-0-6671b1ab11cc@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04mptcp: fix disconnect vs accept racePaolo Abeni
Despite commit 0ad529d9fd2b ("mptcp: fix possible divide by zero in recvmsg()"), the mptcp protocol is still prone to a race between disconnect() (or shutdown) and accept. The root cause is that the mentioned commit checks the msk-level flag, but mptcp_stream_accept() does acquire the msk-level lock, as it can rely directly on the first subflow lock. As reported by Christoph than can lead to a race where an msk socket is accepted after that mptcp_subflow_queue_clean() releases the listener socket lock and just before it takes destructive actions leading to the following splat: BUG: kernel NULL pointer dereference, address: 0000000000000012 PGD 5a4ca067 P4D 5a4ca067 PUD 37d4c067 PMD 0 Oops: 0000 [#1] PREEMPT SMP CPU: 2 PID: 10955 Comm: syz-executor.5 Not tainted 6.5.0-rc1-gdc7b257ee5dd #37 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 RIP: 0010:mptcp_stream_accept+0x1ee/0x2f0 include/net/inet_sock.h:330 Code: 0a 09 00 48 8b 1b 4c 39 e3 74 07 e8 bc 7c 7f fe eb a1 e8 b5 7c 7f fe 4c 8b 6c 24 08 eb 05 e8 a9 7c 7f fe 49 8b 85 d8 09 00 00 <0f> b6 40 12 88 44 24 07 0f b6 6c 24 07 bf 07 00 00 00 89 ee e8 89 RSP: 0018:ffffc90000d07dc0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888037e8d020 RCX: ffff88803b093300 RDX: 0000000000000000 RSI: ffffffff833822c5 RDI: ffffffff8333896a RBP: 0000607f82031520 R08: ffff88803b093300 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000003e83 R12: ffff888037e8d020 R13: ffff888037e8c680 R14: ffff888009af7900 R15: ffff888009af6880 FS: 00007fc26d708640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000012 CR3: 0000000066bc5001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> do_accept+0x1ae/0x260 net/socket.c:1872 __sys_accept4+0x9b/0x110 net/socket.c:1913 __do_sys_accept4 net/socket.c:1954 [inline] __se_sys_accept4 net/socket.c:1951 [inline] __x64_sys_accept4+0x20/0x30 net/socket.c:1951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Address the issue by temporary removing the pending request socket from the accept queue, so that racing accept() can't touch them. After depleting the msk - the ssk still exists, as plain TCP sockets, re-insert them into the accept queue, so that later inet_csk_listen_stop() will complete the tcp socket disposal. Fixes: 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/423 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-4-6671b1ab11cc@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04mptcp: avoid bogus reset on fallback closePaolo Abeni
Since the blamed commit, the MPTCP protocol unconditionally sends TCP resets on all the subflows on disconnect(). That fits full-blown MPTCP sockets - to implement the fastclose mechanism - but causes unexpected corruption of the data stream, caught as sporadic self-tests failures. Fixes: d21f83485518 ("mptcp: use fastclose on more edge scenarios") Cc: stable@vger.kernel.org Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/419 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-3-6671b1ab11cc@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04selftests: mptcp: join: fix 'implicit EP' testAndrea Claudi
mptcp_join 'implicit EP' test currently fails when using ip mptcp: $ ./mptcp_join.sh -iI <snip> 001 implicit EP creation[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 ' Error: too many addresses or duplicate one: -22. ID change is prevented[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 ' modif is allowed[fail] expected '10.0.2.2 10.0.2.2 id 1 signal' found '10.0.2.2 id 1 signal ' This happens because of two reasons: - iproute v6.3.0 does not support the implicit flag, fixed with iproute2-next commit 3a2535a41854 ("mptcp: add support for implicit flag") - pm_nl_check_endpoint wrongly expects the ip address to be repeated two times in iproute output, and does not account for a final whitespace in it. This fixes the issue trimming the whitespace in the output string and removing the double address in the expected string. Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case") Cc: stable@vger.kernel.org Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-2-6671b1ab11cc@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04selftests: mptcp: join: fix 'delete and re-add' testAndrea Claudi
mptcp_join 'delete and re-add' test fails when using ip mptcp: $ ./mptcp_join.sh -iI <snip> 002 delete and re-add before delete[ ok ] mptcp_info subflows=1 [ ok ] Error: argument "ADDRESS" is wrong: invalid for non-zero id address after delete[fail] got 2:2 subflows expected 1 This happens because endpoint delete includes an ip address while id is not 0, contrary to what is indicated in the ip mptcp man page: "When used with the delete id operation, an IFADDR is only included when the ID is 0." This fixes the issue using the $addr variable in pm_nl_del_endpoint() only when id is 0. Fixes: 34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers") Cc: stable@vger.kernel.org Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-1-6671b1ab11cc@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04net: phy: move marking PHY on SFP module into SFP codeRussell King (Oracle)
Move marking the PHY as being on a SFP module into the SFP code between getting the PHY device (and thus initialising the phy_device structure) and registering the discovered device. This means that PHY drivers can use phy_on_sfp() in their match and get_features methods. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qRaga-001vKt-8X@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04Merge branch 'tunnels-fix-ipv4-pmtu-icmp-checksum'Jakub Kicinski
Florian Westphal says: ==================== tunnels: fix ipv4 pmtu icmp checksum The checksum of the generated ipv4 icmp pmtud message is only correct if the skb that causes the icmp error generation is linear. Fix this and add a selftest for this. ==================== Link: https://lore.kernel.org/r/20230803152653.29535-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04selftests: net: test vxlan pmtu exceptions with tcpFlorian Westphal
TCP might get stuck if a nonlinear skb exceeds the path MTU, icmp error contains an incorrect icmp checksum in that case. Extend the existing test for vxlan to also send at least 1MB worth of data via TCP in addition to the existing 'large icmp packet adds route exception'. On my test VM this fails due to 0-size output file without "tunnels: fix kasan splat when generating ipv4 pmtu error". Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230803152653.29535-3-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04tunnels: fix kasan splat when generating ipv4 pmtu errorFlorian Westphal
If we try to emit an icmp error in response to a nonliner skb, we get BUG: KASAN: slab-out-of-bounds in ip_compute_csum+0x134/0x220 Read of size 4 at addr ffff88811c50db00 by task iperf3/1691 CPU: 2 PID: 1691 Comm: iperf3 Not tainted 6.5.0-rc3+ #309 [..] kasan_report+0x105/0x140 ip_compute_csum+0x134/0x220 iptunnel_pmtud_build_icmp+0x554/0x1020 skb_tunnel_check_pmtu+0x513/0xb80 vxlan_xmit_one+0x139e/0x2ef0 vxlan_xmit+0x1867/0x2760 dev_hard_start_xmit+0x1ee/0x4f0 br_dev_queue_push_xmit+0x4d1/0x660 [..] ip_compute_csum() cannot deal with nonlinear skbs, so avoid it. After this change, splat is gone and iperf3 is no longer stuck. Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230803152653.29535-2-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04net/packet: annotate data-races around tp->statusEric Dumazet
Another syzbot report [1] is about tp->status lockless reads from __packet_get_status() [1] BUG: KCSAN: data-race in __packet_rcv_has_room / __packet_set_status write to 0xffff888117d7c080 of 8 bytes by interrupt on cpu 0: __packet_set_status+0x78/0xa0 net/packet/af_packet.c:407 tpacket_rcv+0x18bb/0x1a60 net/packet/af_packet.c:2483 deliver_skb net/core/dev.c:2173 [inline] __netif_receive_skb_core+0x408/0x1e80 net/core/dev.c:5337 __netif_receive_skb_one_core net/core/dev.c:5491 [inline] __netif_receive_skb+0x57/0x1b0 net/core/dev.c:5607 process_backlog+0x21f/0x380 net/core/dev.c:5935 __napi_poll+0x60/0x3b0 net/core/dev.c:6498 napi_poll net/core/dev.c:6565 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6698 __do_softirq+0xc1/0x265 kernel/softirq.c:571 invoke_softirq kernel/softirq.c:445 [inline] __irq_exit_rcu+0x57/0xa0 kernel/softirq.c:650 sysvec_apic_timer_interrupt+0x6d/0x80 arch/x86/kernel/apic/apic.c:1106 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 smpboot_thread_fn+0x33c/0x4a0 kernel/smpboot.c:112 kthread+0x1d7/0x210 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 read to 0xffff888117d7c080 of 8 bytes by interrupt on cpu 1: __packet_get_status net/packet/af_packet.c:436 [inline] packet_lookup_frame net/packet/af_packet.c:524 [inline] __tpacket_has_room net/packet/af_packet.c:1255 [inline] __packet_rcv_has_room+0x3f9/0x450 net/packet/af_packet.c:1298 tpacket_rcv+0x275/0x1a60 net/packet/af_packet.c:2285 deliver_skb net/core/dev.c:2173 [inline] dev_queue_xmit_nit+0x38a/0x5e0 net/core/dev.c:2243 xmit_one net/core/dev.c:3574 [inline] dev_hard_start_xmit+0xcf/0x3f0 net/core/dev.c:3594 __dev_queue_xmit+0xefb/0x1d10 net/core/dev.c:4244 dev_queue_xmit include/linux/netdevice.h:3088 [inline] can_send+0x4eb/0x5d0 net/can/af_can.c:276 bcm_can_tx+0x314/0x410 net/can/bcm.c:302 bcm_tx_timeout_handler+0xdb/0x260 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x217/0x700 kernel/time/hrtimer.c:1749 hrtimer_run_softirq+0xd6/0x120 kernel/time/hrtimer.c:1766 __do_softirq+0xc1/0x265 kernel/softirq.c:571 run_ksoftirqd+0x17/0x20 kernel/softirq.c:939 smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164 kthread+0x1d7/0x210 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 value changed: 0x0000000000000000 -> 0x0000000020000081 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 6.4.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230803145600.2937518-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04mlxsw: spectrum: Remove unused function declarationsYue Haibing
Commit c3d2ed93b14d ("mlxsw: Remove old parsing depth infrastructure") left behind mlxsw_sp_nve_inc_parsing_depth_get()/mlxsw_sp_nve_inc_parsing_depth_put(). And commit 532b49e41e64 ("mlxsw: spectrum_span: Derive SBIB from maximum port speed & MTU") remove mlxsw_sp_span_port_mtu_update()/mlxsw_sp_span_speed_update_work() but leave the declarations. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230803142047.42660-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04ixgbevf: Remove unused function declarationsYue Haibing
ixgbe_napi_add_all()/ixgbe_napi_del_all() are declared but never implemented in commit 92915f71201b ("ixgbevf: Driver main and ethool interface module and main header") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230803141904.15316-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04Merge tag 'hyperv-fixes-signed-20230804' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix a bug in a python script for Hyper-V (Ani Sinha) - Workaround a bug in Hyper-V when IBT is enabled (Michael Kelley) - Fix an issue parsing MP table when Linux runs in VTL2 (Saurabh Sengar) - Several cleanup patches (Nischala Yelchuri, Kameron Carr, YueHaibing, ZhiHu) * tag 'hyperv-fixes-signed-20230804' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: vmbus: Remove unused extern declaration vmbus_ontimer() x86/hyperv: add noop functions to x86_init mpparse functions vmbus_testing: fix wrong python syntax for integer value comparison x86/hyperv: fix a warning in mshyperv.h x86/hyperv: Disable IBT when hypercall page lacks ENDBR instruction x86/hyperv: Improve code for referencing hyperv_pcpu_input_arg Drivers: hv: Change hv_free_hyperv_page() to take void * argument
2023-08-04bpf: change bpf_alu_sign_string and bpf_movsx_string to staticYang Yingliang
The bpf_alu_sign_string and bpf_movsx_string introduced in commit f835bb622299 ("bpf: Add kernel/bpftool asm support for new instructions") are only used in disasm.c now, change them to static. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308050615.wxAn1v2J-lkp@intel.com/ Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230803023128.3753323-1-yangyingliang@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-04Merge tag 'riscv-for-linus-6.5-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A pair of fixes for build-related failures in the selftests - A fix for a sparse warning in acpi_os_ioremap() - A fix to restore the kernel PA offset in vmcoreinfo, to fix crash handling * tag 'riscv-for-linus-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: Documentation: kdump: Add va_kernel_pa_offset for RISCV64 riscv: Export va_kernel_pa_offset in vmcoreinfo RISC-V: ACPI: Fix acpi_os_ioremap to return iomem address selftests: riscv: Fix compilation error with vstate_exec_nolibc.c selftests/riscv: fix potential build failure during the "emit_tests" step
2023-08-04Merge tag 'pm-6.5-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix a sparse warning triggered by the TPMI interface recently added to the Intel RAPL power capping driver (Zhang Rui)" * tag 'pm-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: powercap: intel_rapl: Fix a sparse warning in TPMI interface
2023-08-04net: dsa: ocelot: call dsa_tag_8021q_unregister() under rtnl_lock() on ↵Vladimir Oltean
driver remove When the tagging protocol in current use is "ocelot-8021q" and we unbind the driver, we see this splat: $ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind mscc_felix 0000:00:00.5 swp0: left promiscuous mode sja1105 spi2.0: Link is Down DSA: tree 1 torn down mscc_felix 0000:00:00.5 swp2: left promiscuous mode sja1105 spi2.2: Link is Down DSA: tree 3 torn down fsl_enetc 0000:00:00.2 eno2: left promiscuous mode mscc_felix 0000:00:00.5: Link is Down ------------[ cut here ]------------ RTNL: assertion failed at net/dsa/tag_8021q.c (409) WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0 Modules linked in: CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771 pc : dsa_tag_8021q_unregister+0x12c/0x1a0 lr : dsa_tag_8021q_unregister+0x12c/0x1a0 Call trace: dsa_tag_8021q_unregister+0x12c/0x1a0 felix_tag_8021q_teardown+0x130/0x150 felix_teardown+0x3c/0xd8 dsa_tree_teardown_switches+0xbc/0xe0 dsa_unregister_switch+0x168/0x260 felix_pci_remove+0x30/0x60 pci_device_remove+0x4c/0x100 device_release_driver_internal+0x188/0x288 device_links_unbind_consumers+0xfc/0x138 device_release_driver_internal+0xe0/0x288 device_driver_detach+0x24/0x38 unbind_store+0xd8/0x108 drv_attr_store+0x30/0x50 ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ RTNL: assertion failed at net/8021q/vlan_core.c (376) WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0 CPU: 1 PID: 329 Comm: bash Tainted: G W 6.5.0-rc3+ #771 pc : vlan_vid_del+0x1b8/0x1f0 lr : vlan_vid_del+0x1b8/0x1f0 dsa_tag_8021q_unregister+0x8c/0x1a0 felix_tag_8021q_teardown+0x130/0x150 felix_teardown+0x3c/0xd8 dsa_tree_teardown_switches+0xbc/0xe0 dsa_unregister_switch+0x168/0x260 felix_pci_remove+0x30/0x60 pci_device_remove+0x4c/0x100 device_release_driver_internal+0x188/0x288 device_links_unbind_consumers+0xfc/0x138 device_release_driver_internal+0xe0/0x288 device_driver_detach+0x24/0x38 unbind_store+0xd8/0x108 drv_attr_store+0x30/0x50 DSA: tree 0 torn down This was somewhat not so easy to spot, because "ocelot-8021q" is not the default tagging protocol, and thus, not everyone who tests the unbinding path may have switched to it beforehand. The default felix_tag_npi_teardown() does not require rtnl_lock() to be held. Fixes: 7c83a7c539ab ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230803134253.2711124-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04af_vsock: Remove unused declaration vsock_release_pending()/vsock_init_tap()Yue Haibing
Commit d021c344051a ("VSOCK: Introduce VM Sockets") declared but never implemented vsock_release_pending(). Also vsock_init_tap() never implemented since introduction in commit 531b374834c8 ("VSOCK: Add vsockmon tap functions"). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20230803134507.22660-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-04net: 802: Remove unused function declarationsYue Haibing
Commit d8d9ba8dc9c7 ("net: 802: remove dead leftover after ipx driver removal") remove these implementations but leave the declarations. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230803135424.41664-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>