Age | Commit message (Collapse) | Author |
|
Setting dev->priv_flags & IFF_SEE_ALL_HWTSTAMP_REQUESTS is only legal
for drivers which were converted to ndo_hwtstamp_get() and
ndo_hwtstamp_set(), and it is only there that we call ndo_hwtstamp_set()
for a request that otherwise goes to phylib (for stuff like packet traps,
which need to be undone if phylib failed, hence the old_cfg logic).
The problem is that we end up calling ndo_hwtstamp_get() when we don't
need to (even if the SIOCSHWTSTAMP wasn't intended for phylib, or if it
was, but the driver didn't set IFF_SEE_ALL_HWTSTAMP_REQUESTS). For those
unnecessary conditions, we share a code path with virtual drivers (vlan,
macvlan, bonding) where ndo_hwtstamp_get() is implemented as
generic_hwtstamp_get_lower(), and may be resolved through
generic_hwtstamp_ioctl_lower() if the lower device is unconverted.
I.e. this situation:
$ ip link add link eno0 name eno0.100 type vlan id 100
$ hwstamp_ctl -i eno0.100 -t 1
We are unprepared to deal with this, because if ndo_hwtstamp_get() is
resolved through a legacy ndo_eth_ioctl(SIOCGHWTSTAMP) lower_dev
implementation, that needs a non-NULL old_cfg.ifr pointer, and we don't
have it.
But we don't even need to deal with it either. In the general case,
drivers may not even implement SIOCGHWTSTAMP handling, only SIOCSHWTSTAMP,
so it makes sense to completely avoid a SIOCGHWTSTAMP call if we can.
The solution is to split the single "if" condition into 3 smaller ones,
thus separating the decision to call ndo_hwtstamp_get() from the
decision to call ndo_hwtstamp_set(). The third "if" condition is
identical to the first one, and both are subsets of the second one.
Thus, the "cfg" argument of kernel_hwtstamp_config_changed() is always
valid.
Reported-by: Eric Dumazet <edumazet@google.com>
Closes: https://lore.kernel.org/netdev/CANn89iLOspJsvjPj+y8jikg7erXDomWe8sqHMdfL_2LQSFrPAg@mail.gmail.com/
Fixes: fd770e856e22 ("net: remove phy_has_hwtstamp() -> phy_mii_ioctl() decision from converted drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
address
Use eth_broadcast_addr() to assign broadcast address instead
of memset().
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
gcc with W=1 reports
drivers/net/ethernet/ibm/ibmvnic.c:194:13: warning: variable 'rc' set but not used [-Wunused-but-set-variable]
^
This variable is not used so remove it.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308040609.zQsSXWXI-lkp@intel.com/
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot/KCSAN reported data-races in macsec whenever dev->stats fields
are updated.
It appears all of these updates can happen from multiple cpus.
Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add page pool for RX buffers for faster buffer cycle and reduce CPU
usage.
The standard page pool API is used.
With iperf and 128 threads test, this patch improved the throughput
by 12-15%, and decreased the IRQ associated CPU's usage from 99-100% to
10-50%.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rushil Gupta says:
====================
gve: Add QPL mode for DQO descriptor format
GVE supports QPL ("queue-page-list") mode where
all data is communicated through a set of pre-registered
pages. Adding this mode to DQO.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a note about QPL and RDA mode
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The RX path allocates the QPL page pool at queue creation, and
tries to reuse these pages through page recycling. This patch
ensures that on refill no non-QPL pages are posted to the device.
When the driver is running low on free buffers, an ondemand
allocation step kicks in that allocates a non-qpl page for
SKB business to free up the QPL page in use.
gve_try_recycle_buf was moved to gve_rx_append_frags so that driver does
not attempt to mark buffer as used if a non-qpl page was allocated
ondemand.
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Each QPL page is divided into GVE_TX_BUFS_PER_PAGE_DQO buffers.
When a packet needs to be transmitted, we break the packet into max
GVE_TX_BUF_SIZE_DQO sized chunks and transmit each chunk using a TX
descriptor.
We allocate the TX buffers from the free list in dqo_tx.
We store these TX buffer indices in an array in the pending_packet
structure.
The TX buffers are returned to the free list in dqo_compl after
receiving packet completion or when removing packets from miss
completions list.
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
GVE supports QPL ("queue-page-list") mode where
all data is communicated through a set of pre-registered
pages. Adding this mode to DQO descriptor format.
Add checks, abi-changes and device options to support
QPL mode for DQO in addition to GQI. Also, use
pages-per-qpl supplied by device-option to control the
size of the "queue-page-list".
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TLS records end with a 16B tag. For TLS device offload we only
need to make space for this tag in the stream, the device will
generate and replace it with the actual calculated tag.
Long time ago the code would just re-reference the head frag
which mostly worked but was suboptimal because it prevented TCP
from combining the record into a single skb frag. I'm not sure
if it was correct as the first frag may be shorter than the tag.
The commit under fixes tried to replace that with using the page
frag and if the allocation failed rolling back the data, if record
was long enough. It achieves better fragment coalescing but is
also buggy.
We don't roll back the iterator, so unless we're at the end of
send we'll skip the data we designated as tag and start the
next record as if the rollback never happened.
There's also the possibility that the record was constructed
with MSG_MORE and the data came from a different syscall and
we already told the user space that we "got it".
Allocate a single dummy page and use it as fallback.
Found by code inspection, and proven by forcing allocation
failures.
Fixes: e7b159a48ba6 ("net/tls: remove the record tail optimization")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Eric Dumazet says:
====================
tcp: set few options locklessly
This series is avoiding the socket lock for six TCP options.
They are not heavily used, but this exercise can give
ideas for other parts of TCP/IP stack :)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rskq_defer_accept field can be read/written without
the need of holding the socket lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tp->linger2 can be set locklessly as long as readers
use READ_ONCE().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tp->keepalive_probes can be set locklessly, readers
are already taking care of this field being potentially
set by other threads.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tp->keepalive_intvl can be set locklessly, readers
are already taking care of this field being potentially
set by other threads.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
icsk->icsk_user_timeout can be set locklessly,
if all read sides use READ_ONCE().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
icsk->icsk_syn_retries can safely be set without locking the socket.
We have to add READ_ONCE() annotations in tcp_fastopen_synack_timer()
and tcp_write_timeout().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull rust fixes from Miguel Ojeda:
- Allocator: prevent mis-aligned allocation
- Types: delete 'ForeignOwnable::borrow_mut'. A sound replacement is
planned for the merge window
- Build: fix bindgen error with UBSAN_BOUNDS_STRICT
* tag 'rust-fixes-6.5-rc5' of https://github.com/Rust-for-Linux/linux:
rust: fix bindgen build error with UBSAN_BOUNDS_STRICT
rust: delete `ForeignOwnable::borrow_mut`
rust: allocator: Prevent mis-aligned allocation
|
|
There are multiple smb2_ea_info buffers in FILE_FULL_EA_INFORMATION request
from client. ksmbd find next smb2_ea_info using ->NextEntryOffset of
current smb2_ea_info. ksmbd need to validate buffer length Before
accessing the next ea. ksmbd should check buffer length using buf_len,
not next variable. next is the start offset of current ea that got from
previous ea.
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21598
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
In commit 2b9b8f3b68ed ("ksmbd: validate command payload size"), except
for SMB2_OPLOCK_BREAK_HE command, the request size of other commands
is not checked, it's not expected. Fix it by add check for request
size of other commands.
Cc: stable@vger.kernel.org
Fixes: 2b9b8f3b68ed ("ksmbd: validate command payload size")
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Long Li <leo.lilong@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata fix from Damien Le Moal:
- Prevent the scsi disk driver from issuing a START STOP UNIT command
for ATA devices during system resume as this causes various issues
reported by multiple users.
* tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata,scsi: do not issue START STOP UNIT on resume
|
|
Pull smb client fix from Steve French:
- Fix DFS interlink problem (different namespace)
* tag '6.5-rc4-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix dfs link mount against w2k8
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix vmemmap altmap boundary check which could cause memory hotunplug
failure
- Create a dummy stackframe to fix ftrace stack unwind
- Fix secondary thread bringup for Book3E ELFv2 kernels
- Use early_ioremap/unmap() in via_calibrate_decr()
Thanks to Aneesh Kumar K.V, Benjamin Gray, Christophe Leroy, David
Hildenbrand, and Naveen N Rao.
* tag 'powerpc-6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powermac: Use early_* IO variants in via_calibrate_decr()
powerpc/64e: Fix secondary thread bringup for ELFv2 kernels
powerpc/ftrace: Create a dummy stackframe to fix stack unwind
powerpc/mm/altmap: Fix altmap boundary check
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
- early fixmap preallocation to fix boot failures on kernel >= 6.4
- remove DMA leftover code in parport_gsc
- drop old comments and code style fixes
* tag 'parisc-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: unaligned: Add required spaces after ','
parport: gsc: remove DMA leftover code
parisc: pci-dma: remove unused and dead EISA code and comment
parisc/mm: preallocate fixmap page tables at init
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A few clk driver fixes for some SoC clk drivers:
- Change a usleep() to udelay() to avoid scheduling while atomic in
the Amlogic PLL code
- Revert a patch to the Mediatek MT8183 driver that caused an
out-of-bounds write
- Return the right error value when devm_of_iomap() fails in
imx93_clocks_probe()
- Constrain the Kconfig for the fixed mmio clk so that it depends on
HAS_IOMEM and can't be compiled on architectures such as s390"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: fixed-mmio: make COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM
clk: imx93: Propagate correct error in imx93_clocks_probe()
clk: mediatek: mt8183: Add back SSPM related clocks
clk: meson: change usleep_range() to udelay() for atomic context
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.6
The first pull request for v6.6 and only driver patches this time.
Nothing special really standing out, it has been quiet most likely due
to vacations.
Major changes:
rtl8xxxu
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
mwifiex
- allow moving to a different namespace
mt76
- preparation for mt7925 support
- mt7981 support
ath12k
- Extremely High Throughput (EHT) PHY support for Wi-Fi 7
* tag 'wireless-next-2023-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (172 commits)
wifi: rtw89: return failure if needed firmware elements are not recognized
wifi: rtw89: add to parse firmware elements of BB and RF tables
wifi: rtw89: introduce infrastructure of firmware elements
wifi: rtw89: add firmware suit for BB MCU 0/1
wifi: rtw89: add firmware parser for v1 format
wifi: rtw89: introduce v1 format of firmware header
wifi: rtw89: support firmware log with formatted text
wifi: rtw89: recognize log format from firmware file
wifi: ath12k: avoid deadlock by change ieee80211_queue_work for regd_update_work
wifi: ath12k: add handler for scan event WMI_SCAN_EVENT_DEQUEUED
wifi: ath12k: relax list iteration in ath12k_mac_vif_unref()
wifi: ath12k: configure puncturing bitmap
wifi: ath12k: parse WMI service ready ext2 event
wifi: ath12k: add MLO header in peer association
wifi: ath12k: peer assoc for 320 MHz
wifi: ath12k: add WMI support for EHT peer
wifi: ath12k: prepare EHT peer assoc parameters
wifi: ath12k: add EHT PHY modes
wifi: ath12k: propagate EHT capabilities to userspace
wifi: ath12k: WMI support to process EHT capabilities
...
====================
Link: https://lore.kernel.org/r/87msz7j942.fsf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kuniyuki Iwashima says:
====================
tcp: Disable header prediction for MD5.
The 1st patch disable header prediction for MD5 flow and the 2nd
patch updates the stale comment in tcp_parse_options().
====================
Link: https://lore.kernel.org/r/20230803224552.69398-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since commit 9ea88a153001 ("tcp: md5: check md5 signature without socket
lock"), the MD5 option is checked in tcp_v[46]_rcv().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230803224552.69398-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TCP socket saves the minimum required header length in tcp_header_len
of struct tcp_sock, and later the value is used in __tcp_fast_path_on()
to generate a part of TCP header in tcp_sock(sk)->pred_flags.
In tcp_rcv_established(), if the incoming packet has the same pattern
with pred_flags, we enter the fast path and skip full option parsing.
The MD5 option is parsed in tcp_v[46]_rcv(), so we need not parse it
again later in tcp_rcv_established() unless other options exist. We
add TCPOLEN_MD5SIG_ALIGNED to tcp_header_len in two paths to avoid the
slow path.
For passive open connections with MD5, we add TCPOLEN_MD5SIG_ALIGNED
to tcp_header_len in tcp_create_openreq_child() after 3WHS.
On the other hand, we do it in tcp_connect_init() for active open
connections. However, the value is overwritten while processing
SYN+ACK or crossed SYN in tcp_rcv_synsent_state_process().
These two cases will have the wrong value in pred_flags and never go
into the fast path.
We could update tcp_header_len in tcp_rcv_synsent_state_process(), but
a test with slightly modified netperf which uses MD5 for each flow shows
that the slow path is actually a bit faster than the fast path.
On c5.4xlarge EC2 instance (16 vCPU, 32 GiB mem)
$ for i in {1..10}; do
./super_netperf $(nproc) -H localhost -l 10 -- -m 256 -M 256;
done
Avg of 10
* 36e68eadd303 : 10.376 Gbps
* all fast path : 10.374 Gbps (patch v2, See Link)
* all slow path : 10.394 Gbps
The header prediction is not worth adding complexity for MD5, so let's
disable it for MD5.
Link: https://lore.kernel.org/netdev/20230803042214.38309-1-kuniyu@amazon.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230803224552.69398-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
dccp_sendmsg() reads dp->dccps_mss_cache before locking the socket.
Same thing in do_dccp_getsockopt().
Add READ_ONCE()/WRITE_ONCE() annotations,
and change dccp_sendmsg() to check again dccps_mss_cache
after socket is locked.
Fixes: 7c657876b63c ("[DCCP]: Initial implementation")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230803163021.2958262-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Matthieu Baerts says:
====================
mptcp: more fixes for v6.5
Here is a new batch of fixes related to MPTCP for v6.5 and older.
Patches 1 and 2 fix issues with MPTCP Join selftest when manually
launched with '-i' parameter to use 'ip mptcp' tool instead of the
dedicated one (pm_nl_ctl). The issues have been there since v5.18.
Thank you Andrea for your first contributions to MPTCP code in the
upstream kernel!
Patch 3 avoids corrupting the data stream when trying to reset
connections that have fallen back to TCP. This can happen from v6.1.
Patch 4 fixes a race when doing a disconnect() and an accept() in
parallel on a listener socket. The issue only happens in rare cases if
the user is really unlucky since a fix that landed in v6.3 but
backported up to v6.1.
====================
Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-0-6671b1ab11cc@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Despite commit 0ad529d9fd2b ("mptcp: fix possible divide by zero in
recvmsg()"), the mptcp protocol is still prone to a race between
disconnect() (or shutdown) and accept.
The root cause is that the mentioned commit checks the msk-level
flag, but mptcp_stream_accept() does acquire the msk-level lock,
as it can rely directly on the first subflow lock.
As reported by Christoph than can lead to a race where an msk
socket is accepted after that mptcp_subflow_queue_clean() releases
the listener socket lock and just before it takes destructive
actions leading to the following splat:
BUG: kernel NULL pointer dereference, address: 0000000000000012
PGD 5a4ca067 P4D 5a4ca067 PUD 37d4c067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 10955 Comm: syz-executor.5 Not tainted 6.5.0-rc1-gdc7b257ee5dd #37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:mptcp_stream_accept+0x1ee/0x2f0 include/net/inet_sock.h:330
Code: 0a 09 00 48 8b 1b 4c 39 e3 74 07 e8 bc 7c 7f fe eb a1 e8 b5 7c 7f fe 4c 8b 6c 24 08 eb 05 e8 a9 7c 7f fe 49 8b 85 d8 09 00 00 <0f> b6 40 12 88 44 24 07 0f b6 6c 24 07 bf 07 00 00 00 89 ee e8 89
RSP: 0018:ffffc90000d07dc0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888037e8d020 RCX: ffff88803b093300
RDX: 0000000000000000 RSI: ffffffff833822c5 RDI: ffffffff8333896a
RBP: 0000607f82031520 R08: ffff88803b093300 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000003e83 R12: ffff888037e8d020
R13: ffff888037e8c680 R14: ffff888009af7900 R15: ffff888009af6880
FS: 00007fc26d708640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000012 CR3: 0000000066bc5001 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
do_accept+0x1ae/0x260 net/socket.c:1872
__sys_accept4+0x9b/0x110 net/socket.c:1913
__do_sys_accept4 net/socket.c:1954 [inline]
__se_sys_accept4 net/socket.c:1951 [inline]
__x64_sys_accept4+0x20/0x30 net/socket.c:1951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Address the issue by temporary removing the pending request socket
from the accept queue, so that racing accept() can't touch them.
After depleting the msk - the ssk still exists, as plain TCP sockets,
re-insert them into the accept queue, so that later inet_csk_listen_stop()
will complete the tcp socket disposal.
Fixes: 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/423
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-4-6671b1ab11cc@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since the blamed commit, the MPTCP protocol unconditionally sends
TCP resets on all the subflows on disconnect().
That fits full-blown MPTCP sockets - to implement the fastclose
mechanism - but causes unexpected corruption of the data stream,
caught as sporadic self-tests failures.
Fixes: d21f83485518 ("mptcp: use fastclose on more edge scenarios")
Cc: stable@vger.kernel.org
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/419
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-3-6671b1ab11cc@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mptcp_join 'implicit EP' test currently fails when using ip mptcp:
$ ./mptcp_join.sh -iI
<snip>
001 implicit EP creation[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 '
Error: too many addresses or duplicate one: -22.
ID change is prevented[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 '
modif is allowed[fail] expected '10.0.2.2 10.0.2.2 id 1 signal' found '10.0.2.2 id 1 signal '
This happens because of two reasons:
- iproute v6.3.0 does not support the implicit flag, fixed with
iproute2-next commit 3a2535a41854 ("mptcp: add support for implicit
flag")
- pm_nl_check_endpoint wrongly expects the ip address to be repeated two
times in iproute output, and does not account for a final whitespace
in it.
This fixes the issue trimming the whitespace in the output string and
removing the double address in the expected string.
Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case")
Cc: stable@vger.kernel.org
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-2-6671b1ab11cc@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mptcp_join 'delete and re-add' test fails when using ip mptcp:
$ ./mptcp_join.sh -iI
<snip>
002 delete and re-add before delete[ ok ]
mptcp_info subflows=1 [ ok ]
Error: argument "ADDRESS" is wrong: invalid for non-zero id address
after delete[fail] got 2:2 subflows expected 1
This happens because endpoint delete includes an ip address while id is
not 0, contrary to what is indicated in the ip mptcp man page:
"When used with the delete id operation, an IFADDR is only included when
the ID is 0."
This fixes the issue using the $addr variable in pm_nl_del_endpoint()
only when id is 0.
Fixes: 34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers")
Cc: stable@vger.kernel.org
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-1-6671b1ab11cc@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move marking the PHY as being on a SFP module into the SFP code between
getting the PHY device (and thus initialising the phy_device structure)
and registering the discovered device.
This means that PHY drivers can use phy_on_sfp() in their match and
get_features methods.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qRaga-001vKt-8X@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Florian Westphal says:
====================
tunnels: fix ipv4 pmtu icmp checksum
The checksum of the generated ipv4 icmp pmtud message is
only correct if the skb that causes the icmp error generation
is linear.
Fix this and add a selftest for this.
====================
Link: https://lore.kernel.org/r/20230803152653.29535-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TCP might get stuck if a nonlinear skb exceeds the path MTU,
icmp error contains an incorrect icmp checksum in that case.
Extend the existing test for vxlan to also send at least 1MB worth of
data via TCP in addition to the existing 'large icmp packet adds
route exception'.
On my test VM this fails due to 0-size output file without
"tunnels: fix kasan splat when generating ipv4 pmtu error".
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230803152653.29535-3-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If we try to emit an icmp error in response to a nonliner skb, we get
BUG: KASAN: slab-out-of-bounds in ip_compute_csum+0x134/0x220
Read of size 4 at addr ffff88811c50db00 by task iperf3/1691
CPU: 2 PID: 1691 Comm: iperf3 Not tainted 6.5.0-rc3+ #309
[..]
kasan_report+0x105/0x140
ip_compute_csum+0x134/0x220
iptunnel_pmtud_build_icmp+0x554/0x1020
skb_tunnel_check_pmtu+0x513/0xb80
vxlan_xmit_one+0x139e/0x2ef0
vxlan_xmit+0x1867/0x2760
dev_hard_start_xmit+0x1ee/0x4f0
br_dev_queue_push_xmit+0x4d1/0x660
[..]
ip_compute_csum() cannot deal with nonlinear skbs, so avoid it.
After this change, splat is gone and iperf3 is no longer stuck.
Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230803152653.29535-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Another syzbot report [1] is about tp->status lockless reads
from __packet_get_status()
[1]
BUG: KCSAN: data-race in __packet_rcv_has_room / __packet_set_status
write to 0xffff888117d7c080 of 8 bytes by interrupt on cpu 0:
__packet_set_status+0x78/0xa0 net/packet/af_packet.c:407
tpacket_rcv+0x18bb/0x1a60 net/packet/af_packet.c:2483
deliver_skb net/core/dev.c:2173 [inline]
__netif_receive_skb_core+0x408/0x1e80 net/core/dev.c:5337
__netif_receive_skb_one_core net/core/dev.c:5491 [inline]
__netif_receive_skb+0x57/0x1b0 net/core/dev.c:5607
process_backlog+0x21f/0x380 net/core/dev.c:5935
__napi_poll+0x60/0x3b0 net/core/dev.c:6498
napi_poll net/core/dev.c:6565 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6698
__do_softirq+0xc1/0x265 kernel/softirq.c:571
invoke_softirq kernel/softirq.c:445 [inline]
__irq_exit_rcu+0x57/0xa0 kernel/softirq.c:650
sysvec_apic_timer_interrupt+0x6d/0x80 arch/x86/kernel/apic/apic.c:1106
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645
smpboot_thread_fn+0x33c/0x4a0 kernel/smpboot.c:112
kthread+0x1d7/0x210 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
read to 0xffff888117d7c080 of 8 bytes by interrupt on cpu 1:
__packet_get_status net/packet/af_packet.c:436 [inline]
packet_lookup_frame net/packet/af_packet.c:524 [inline]
__tpacket_has_room net/packet/af_packet.c:1255 [inline]
__packet_rcv_has_room+0x3f9/0x450 net/packet/af_packet.c:1298
tpacket_rcv+0x275/0x1a60 net/packet/af_packet.c:2285
deliver_skb net/core/dev.c:2173 [inline]
dev_queue_xmit_nit+0x38a/0x5e0 net/core/dev.c:2243
xmit_one net/core/dev.c:3574 [inline]
dev_hard_start_xmit+0xcf/0x3f0 net/core/dev.c:3594
__dev_queue_xmit+0xefb/0x1d10 net/core/dev.c:4244
dev_queue_xmit include/linux/netdevice.h:3088 [inline]
can_send+0x4eb/0x5d0 net/can/af_can.c:276
bcm_can_tx+0x314/0x410 net/can/bcm.c:302
bcm_tx_timeout_handler+0xdb/0x260
__run_hrtimer kernel/time/hrtimer.c:1685 [inline]
__hrtimer_run_queues+0x217/0x700 kernel/time/hrtimer.c:1749
hrtimer_run_softirq+0xd6/0x120 kernel/time/hrtimer.c:1766
__do_softirq+0xc1/0x265 kernel/softirq.c:571
run_ksoftirqd+0x17/0x20 kernel/softirq.c:939
smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
kthread+0x1d7/0x210 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
value changed: 0x0000000000000000 -> 0x0000000020000081
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 6.4.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230803145600.2937518-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit c3d2ed93b14d ("mlxsw: Remove old parsing depth infrastructure")
left behind mlxsw_sp_nve_inc_parsing_depth_get()/mlxsw_sp_nve_inc_parsing_depth_put().
And commit 532b49e41e64 ("mlxsw: spectrum_span: Derive SBIB from maximum port speed & MTU")
remove mlxsw_sp_span_port_mtu_update()/mlxsw_sp_span_speed_update_work() but leave the
declarations.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230803142047.42660-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ixgbe_napi_add_all()/ixgbe_napi_del_all() are declared but never implemented in
commit 92915f71201b ("ixgbevf: Driver main and ethool interface module and main header")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230803141904.15316-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Fix a bug in a python script for Hyper-V (Ani Sinha)
- Workaround a bug in Hyper-V when IBT is enabled (Michael Kelley)
- Fix an issue parsing MP table when Linux runs in VTL2 (Saurabh
Sengar)
- Several cleanup patches (Nischala Yelchuri, Kameron Carr, YueHaibing,
ZhiHu)
* tag 'hyperv-fixes-signed-20230804' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Remove unused extern declaration vmbus_ontimer()
x86/hyperv: add noop functions to x86_init mpparse functions
vmbus_testing: fix wrong python syntax for integer value comparison
x86/hyperv: fix a warning in mshyperv.h
x86/hyperv: Disable IBT when hypercall page lacks ENDBR instruction
x86/hyperv: Improve code for referencing hyperv_pcpu_input_arg
Drivers: hv: Change hv_free_hyperv_page() to take void * argument
|
|
The bpf_alu_sign_string and bpf_movsx_string introduced in commit
f835bb622299 ("bpf: Add kernel/bpftool asm support for new instructions")
are only used in disasm.c now, change them to static.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308050615.wxAn1v2J-lkp@intel.com/
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230803023128.3753323-1-yangyingliang@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A pair of fixes for build-related failures in the selftests
- A fix for a sparse warning in acpi_os_ioremap()
- A fix to restore the kernel PA offset in vmcoreinfo, to fix crash
handling
* tag 'riscv-for-linus-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
Documentation: kdump: Add va_kernel_pa_offset for RISCV64
riscv: Export va_kernel_pa_offset in vmcoreinfo
RISC-V: ACPI: Fix acpi_os_ioremap to return iomem address
selftests: riscv: Fix compilation error with vstate_exec_nolibc.c
selftests/riscv: fix potential build failure during the "emit_tests" step
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a sparse warning triggered by the TPMI interface recently added to
the Intel RAPL power capping driver (Zhang Rui)"
* tag 'pm-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
powercap: intel_rapl: Fix a sparse warning in TPMI interface
|
|
driver remove
When the tagging protocol in current use is "ocelot-8021q" and we unbind
the driver, we see this splat:
$ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind
mscc_felix 0000:00:00.5 swp0: left promiscuous mode
sja1105 spi2.0: Link is Down
DSA: tree 1 torn down
mscc_felix 0000:00:00.5 swp2: left promiscuous mode
sja1105 spi2.2: Link is Down
DSA: tree 3 torn down
fsl_enetc 0000:00:00.2 eno2: left promiscuous mode
mscc_felix 0000:00:00.5: Link is Down
------------[ cut here ]------------
RTNL: assertion failed at net/dsa/tag_8021q.c (409)
WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0
Modules linked in:
CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771
pc : dsa_tag_8021q_unregister+0x12c/0x1a0
lr : dsa_tag_8021q_unregister+0x12c/0x1a0
Call trace:
dsa_tag_8021q_unregister+0x12c/0x1a0
felix_tag_8021q_teardown+0x130/0x150
felix_teardown+0x3c/0xd8
dsa_tree_teardown_switches+0xbc/0xe0
dsa_unregister_switch+0x168/0x260
felix_pci_remove+0x30/0x60
pci_device_remove+0x4c/0x100
device_release_driver_internal+0x188/0x288
device_links_unbind_consumers+0xfc/0x138
device_release_driver_internal+0xe0/0x288
device_driver_detach+0x24/0x38
unbind_store+0xd8/0x108
drv_attr_store+0x30/0x50
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
RTNL: assertion failed at net/8021q/vlan_core.c (376)
WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0
CPU: 1 PID: 329 Comm: bash Tainted: G W 6.5.0-rc3+ #771
pc : vlan_vid_del+0x1b8/0x1f0
lr : vlan_vid_del+0x1b8/0x1f0
dsa_tag_8021q_unregister+0x8c/0x1a0
felix_tag_8021q_teardown+0x130/0x150
felix_teardown+0x3c/0xd8
dsa_tree_teardown_switches+0xbc/0xe0
dsa_unregister_switch+0x168/0x260
felix_pci_remove+0x30/0x60
pci_device_remove+0x4c/0x100
device_release_driver_internal+0x188/0x288
device_links_unbind_consumers+0xfc/0x138
device_release_driver_internal+0xe0/0x288
device_driver_detach+0x24/0x38
unbind_store+0xd8/0x108
drv_attr_store+0x30/0x50
DSA: tree 0 torn down
This was somewhat not so easy to spot, because "ocelot-8021q" is not the
default tagging protocol, and thus, not everyone who tests the unbinding
path may have switched to it beforehand. The default
felix_tag_npi_teardown() does not require rtnl_lock() to be held.
Fixes: 7c83a7c539ab ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230803134253.2711124-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit d021c344051a ("VSOCK: Introduce VM Sockets") declared but never implemented
vsock_release_pending(). Also vsock_init_tap() never implemented since introduction
in commit 531b374834c8 ("VSOCK: Add vsockmon tap functions").
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20230803134507.22660-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit d8d9ba8dc9c7 ("net: 802: remove dead leftover after ipx driver removal")
remove these implementations but leave the declarations.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230803135424.41664-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|