Age | Commit message (Collapse) | Author |
|
A few error paths are missing a kfree_skb.
Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Commit 5eb902b8e719 ("net/ipv6: Remove expired routes with a separated list
of routes.") introduced a separated list for managing route expiration via
the GC timer.
However, it missed adding exception routes (created by ip6_rt_update_pmtu()
and rt6_do_redirect()) to this GC list. As a result, these exceptions were
never considered for expiration and removal, leading to stale entries
persisting in the routing table.
This patch fixes the issue by calling fib6_add_gc_list() in
rt6_insert_exception(), ensuring that exception routes are properly tracked
and garbage collected when expired.
Fixes: 5eb902b8e719 ("net/ipv6: Remove expired routes with a separated list of routes.")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/837e7506ffb63f47faa2b05d9b85481aad28e1a4.1744134377.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For bnxt when the agg ring is used then tcp-data-split is automatically
reported to be enabled, but __net_mp_open_rxq() requires tcp-data-split
to be explicitly enabled by the user.
Enable tcp-data-split explicitly in io_uring zc rx selftest.
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250409163153.2747918-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.
Current release - regressions:
- core: hold instance lock during NETDEV_CHANGE
- rtnetlink: fix bad unlock balance in do_setlink()
- ipv6:
- fix null-ptr-deref in addrconf_add_ifaddr()
- align behavior across nexthops during path selection
Previous releases - regressions:
- sctp: prevent transport UaF in sendmsg
- mptcp: only inc MPJoinAckHMacFailure for HMAC failures
Previous releases - always broken:
- sched:
- make ->qlen_notify() idempotent
- ensure sufficient space when sending filter netlink notifications
- sch_sfq: really don't allow 1 packet limit
- netfilter: fix incorrect avx2 match of 5th field octet
- tls: explicitly disallow disconnect
- eth: octeontx2-pf: fix VF root node parent queue priority"
* tag 'net-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
ethtool: cmis_cdb: Fix incorrect read / write length extension
selftests: netfilter: add test case for recent mismatch bug
nft_set_pipapo: fix incorrect avx2 match of 5th field octet
net: ppp: Add bound checking for skb data on ppp_sync_txmung
net: Fix null-ptr-deref by sock_lock_init_class_and_name() and rmmod.
ipv6: Align behavior across nexthops during path selection
net: phy: allow MDIO bus PM ops to start/stop state machine for phylink-controlled PHY
net: phy: move phy_link_change() prior to mdio_bus_phy_may_suspend()
selftests/tc-testing: sfq: check that a derived limit of 1 is rejected
net_sched: sch_sfq: move the limit validation
net_sched: sch_sfq: use a temporary work area for validating configuration
net: libwx: handle page_pool_dev_alloc_pages error
selftests: mptcp: validate MPJoin HMacFailure counters
mptcp: only inc MPJoinAckHMacFailure for HMAC failures
rtnetlink: Fix bad unlock balance in do_setlink().
net: ethtool: Don't call .cleanup_data when prepare_data fails
tc: Ensure we have enough buffer space when sending filter netlink notifications
net: libwx: Fix the wrong Rx descriptor field
octeontx2-pf: qos: fix VF root node parent queue index
selftests: tls: check that disconnect does nothing
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- A simple fix adding the module description of the Xenbus frontend
module
- A fix correcting the xen-acpi-processor Kconfig dependency for PVH
Dom0 support
- A fix for the Xen balloon driver when running as Xen Dom0 in PVH mode
- A fix for PVH Dom0 in order to avoid problems with CPU idle and
frequency drivers conflicting with Xen
* tag 'for-linus-6.15a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: disable CPU idle and frequency drivers for PVH dom0
x86/xen: fix balloon target initialization for PVH dom0
xen: Change xen-acpi-processor dom0 dependency
xenbus: add module description
|
|
Pull block fixes from Jens Axboe:
- Add a missing ublk selftest script, from test additions added last
week
- Two fixes for ublk error recovery and reissue
- Cleanup of ublk argument passing
* tag 'block-6.15-20250410' of git://git.kernel.dk/linux:
ublk: pass ublksrv_ctrl_cmd * instead of io_uring_cmd *
ublk: don't fail request for recovery & reissue in case of ubq->canceling
ublk: fix handling recovery & reissue in ublk_abort_queue()
selftests: ublk: fix test_stripe_04
|
|
Pull io_uring fixes from Jens Axboe:
- Reject zero sized legacy provided buffers upfront. No ill side
effects from this one, only really done to shut up a silly syzbot
test case.
- Fix for a regression in tag posting for registered files or buffers,
where the tag would be posted even when the registration failed.
- two minor zcrx cleanups for code added this merge window.
* tag 'io_uring-6.15-20250410' of git://git.kernel.dk/linux:
io_uring/kbuf: reject zero sized provided buffers
io_uring/zcrx: separate niov number from pages
io_uring/zcrx: put refill data into separate cache line
io_uring: don't post tag CQEs on file/buffer registration failure
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix resource handling in gpio-tegra186
- fix wakeup source leaks in gpio-mpc8xxx and gpio-zynq
- fix minor issues with some GPIO OF quirks
- deprecate GPIOD_FLAGS_BIT_NONEXCLUSIVE and devm_gpiod_unhinge()
symbols and add a TODO task to track replacing them with a better
solution
* tag 'gpio-fixes-for-v6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: of: Move Atmel HSMCI quirk up out of the regulator comment
gpiolib: of: Fix the choice for Ingenic NAND quirk
gpio: zynq: Fix wakeup source leaks on device unbind
gpio: mpc8xxx: Fix wakeup source leaks on device unbind
gpio: TODO: track the removal of regulator-related workarounds
MAINTAINERS: add more keywords for the GPIO subsystem entry
gpio: deprecate devm_gpiod_unhinge()
gpio: deprecate the GPIOD_FLAGS_BIT_NONEXCLUSIVE flag
gpio: tegra186: fix resource handling in ACPI probe path
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"Two important fixes: the build of the SPI NAND layer with old GCC
versions as well as the fix of the Qpic Makefile which was wrong in
the first place.
There are also two smaller fixes about a missing error and status
check"
* tag 'mtd/fixes-for-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: spinand: Fix build with gcc < 7.5
mtd: rawnand: Add status chack in r852_ready()
mtd: inftlcore: Add error check for inftl_read_oob()
mtd: nand: Drop explicit test for built-in CONFIG_SPI_QPIC_SNAND
|
|
The 'read_write_len_ext' field in 'struct ethtool_cmis_cdb_cmd_args'
stores the maximum number of bytes that can be read from or written to
the Local Payload (LPL) page in a single multi-byte access.
Cited commit started overwriting this field with the maximum number of
bytes that can be read from or written to the Extended Payload (LPL)
pages in a single multi-byte access. Transceiver modules that support
auto paging can advertise a number larger than 255 which is problematic
as 'read_write_len_ext' is a 'u8', resulting in the number getting
truncated and firmware flashing failing [1].
Fix by ignoring the maximum EPL access size as the kernel does not
currently support auto paging (even if the transceiver module does) and
will not try to read / write more than 128 bytes at once.
[1]
Transceiver module firmware flashing started for device enp177s0np0
Transceiver module firmware flashing in progress for device enp177s0np0
Progress: 0%
Transceiver module firmware flashing encountered an error for device enp177s0np0
Status message: Write FW block EPL command failed, LPL length is longer
than CDB read write length extension allows.
Fixes: 9a3b0d078bd8 ("net: ethtool: Add support for writing firmware blocks using EPL payload")
Reported-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Closes: https://lore.kernel.org/netdev/20250402183123.321036-3-michael.chan@broadcom.com/
Tested-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250409112440.365672-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains a Netfilter fix and improved test coverage:
1) Fix AVX2 matching in nft_pipapo, from Florian Westphal.
2) Extend existing test to improve coverage for the aforementioned bug,
also from Florian.
netfilter pull request 25-04-10
* tag 'nf-25-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: add test case for recent mismatch bug
nft_set_pipapo: fix incorrect avx2 match of 5th field octet
====================
Link: https://patch.msgid.link/20250410103647.1030244-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Without 'nft_set_pipapo: fix incorrect avx2 match of 5th field octet"
this fails:
TEST: reported issues
Add two elements, flush, re-add 1s [ OK ]
net,mac with reload 0s [ OK ]
net,port,proto 3s [ OK ]
avx2 false match 0s [FAIL]
False match for fe80:dead:01fe:0a02:0b03:6007:8009:a001
Other tests do not detect the kernel bug as they only alter parts in
the /64 netmask.
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Given a set element like:
icmpv6 . dead:beef:00ff::1
The value of 'ff' is irrelevant, any address will be matched
as long as the other octets are the same.
This is because of too-early register clobbering:
ymm7 is reloaded with new packet data (pkt[9]) but it still holds data
of an earlier load that wasn't processed yet.
The existing tests in nft_concat_range.sh selftests do exercise this code
path, but do not trigger incorrect matching due to the network prefix
limitation.
Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
Reported-by: sontu mazumdar <sontu21@gmail.com>
Closes: https://lore.kernel.org/netfilter/CANgxkqwnMH7fXra+VUfODT-8+qFLgskq3set1cAzqqJaV4iEZg@mail.gmail.com/T/#t
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Ensure we have enough data in linear buffer from skb before accessing
initial bytes. This prevents potential out-of-bounds accesses
when processing short packets.
When ppp_sync_txmung receives an incoming package with an empty
payload:
(remote) gef➤ p *(struct pppoe_hdr *) (skb->head + skb->network_header)
$18 = {
type = 0x1,
ver = 0x1,
code = 0x0,
sid = 0x2,
length = 0x0,
tag = 0xffff8880371cdb96
}
from the skb struct (trimmed)
tail = 0x16,
end = 0x140,
head = 0xffff88803346f400 "4",
data = 0xffff88803346f416 ":\377",
truesize = 0x380,
len = 0x0,
data_len = 0x0,
mac_len = 0xe,
hdr_len = 0x0,
it is not safe to access data[2].
Reported-by: syzbot+29fc8991b0ecb186cf40@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=29fc8991b0ecb186cf40
Tested-by: syzbot+29fc8991b0ecb186cf40@syzkaller.appspotmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
Link: https://patch.msgid.link/20250408-bound-checking-ppp_txmung-v2-1-94bb6e1b92d0@arnaud-lcm.com
[pabeni@redhat.com: fixed subj typo]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When I ran the repro [0] and waited a few seconds, I observed two
LOCKDEP splats: a warning immediately followed by a null-ptr-deref. [1]
Reproduction Steps:
1) Mount CIFS
2) Add an iptables rule to drop incoming FIN packets for CIFS
3) Unmount CIFS
4) Unload the CIFS module
5) Remove the iptables rule
At step 3), the CIFS module calls sock_release() for the underlying
TCP socket, and it returns quickly. However, the socket remains in
FIN_WAIT_1 because incoming FIN packets are dropped.
At this point, the module's refcnt is 0 while the socket is still
alive, so the following rmmod command succeeds.
# ss -tan
State Recv-Q Send-Q Local Address:Port Peer Address:Port
FIN-WAIT-1 0 477 10.0.2.15:51062 10.0.0.137:445
# lsmod | grep cifs
cifs 1159168 0
This highlights a discrepancy between the lifetime of the CIFS module
and the underlying TCP socket. Even after CIFS calls sock_release()
and it returns, the TCP socket does not die immediately in order to
close the connection gracefully.
While this is generally fine, it causes an issue with LOCKDEP because
CIFS assigns a different lock class to the TCP socket's sk->sk_lock
using sock_lock_init_class_and_name().
Once an incoming packet is processed for the socket or a timer fires,
sk->sk_lock is acquired.
Then, LOCKDEP checks the lock context in check_wait_context(), where
hlock_class() is called to retrieve the lock class. However, since
the module has already been unloaded, hlock_class() logs a warning
and returns NULL, triggering the null-ptr-deref.
If LOCKDEP is enabled, we must ensure that a module calling
sock_lock_init_class_and_name() (CIFS, NFS, etc) cannot be unloaded
while such a socket is still alive to prevent this issue.
Let's hold the module reference in sock_lock_init_class_and_name()
and release it when the socket is freed in sk_prot_free().
Note that sock_lock_init() clears sk->sk_owner for svc_create_socket()
that calls sock_lock_init_class_and_name() for a listening socket,
which clones a socket by sk_clone_lock() without GFP_ZERO.
[0]:
CIFS_SERVER="10.0.0.137"
CIFS_PATH="//${CIFS_SERVER}/Users/Administrator/Desktop/CIFS_TEST"
DEV="enp0s3"
CRED="/root/WindowsCredential.txt"
MNT=$(mktemp -d /tmp/XXXXXX)
mount -t cifs ${CIFS_PATH} ${MNT} -o vers=3.0,credentials=${CRED},cache=none,echo_interval=1
iptables -A INPUT -s ${CIFS_SERVER} -j DROP
for i in $(seq 10);
do
umount ${MNT}
rmmod cifs
sleep 1
done
rm -r ${MNT}
iptables -D INPUT -s ${CIFS_SERVER} -j DROP
[1]:
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 10 PID: 0 at kernel/locking/lockdep.c:234 hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223)
Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs]
CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Not tainted 6.14.0 #36
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223)
...
Call Trace:
<IRQ>
__lock_acquire (kernel/locking/lockdep.c:4853 kernel/locking/lockdep.c:5178)
lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816)
_raw_spin_lock_nested (kernel/locking/spinlock.c:379)
tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350)
...
BUG: kernel NULL pointer dereference, address: 00000000000000c4
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Tainted: G W 6.14.0 #36
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:__lock_acquire (kernel/locking/lockdep.c:4852 kernel/locking/lockdep.c:5178)
Code: 15 41 09 c7 41 8b 44 24 20 25 ff 1f 00 00 41 09 c7 8b 84 24 a0 00 00 00 45 89 7c 24 20 41 89 44 24 24 e8 e1 bc ff ff 4c 89 e7 <44> 0f b6 b8 c4 00 00 00 e8 d1 bc ff ff 0f b6 80 c5 00 00 00 88 44
RSP: 0018:ffa0000000468a10 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ff1100010091cc38 RCX: 0000000000000027
RDX: ff1100081f09ca48 RSI: 0000000000000001 RDI: ff1100010091cc88
RBP: ff1100010091c200 R08: ff1100083fe6e228 R09: 00000000ffffbfff
R10: ff1100081eca0000 R11: ff1100083fe10dc0 R12: ff1100010091cc88
R13: 0000000000000001 R14: 0000000000000000 R15: 00000000000424b1
FS: 0000000000000000(0000) GS:ff1100081f080000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000c4 CR3: 0000000002c4a003 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816)
_raw_spin_lock_nested (kernel/locking/spinlock.c:379)
tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350)
ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
ip_local_deliver_finish (./include/linux/rcupdate.h:878 net/ipv4/ip_input.c:234)
ip_sublist_rcv_finish (net/ipv4/ip_input.c:576)
ip_list_rcv_finish (net/ipv4/ip_input.c:628)
ip_list_rcv (net/ipv4/ip_input.c:670)
__netif_receive_skb_list_core (net/core/dev.c:5939 net/core/dev.c:5986)
netif_receive_skb_list_internal (net/core/dev.c:6040 net/core/dev.c:6129)
napi_complete_done (./include/linux/list.h:37 ./include/net/gro.h:519 ./include/net/gro.h:514 net/core/dev.c:6496)
e1000_clean (drivers/net/ethernet/intel/e1000/e1000_main.c:3815)
__napi_poll.constprop.0 (net/core/dev.c:7191)
net_rx_action (net/core/dev.c:7262 net/core/dev.c:7382)
handle_softirqs (kernel/softirq.c:561)
__irq_exit_rcu (kernel/softirq.c:596 kernel/softirq.c:435 kernel/softirq.c:662)
irq_exit_rcu (kernel/softirq.c:680)
common_interrupt (arch/x86/kernel/irq.c:280 (discriminator 14))
</IRQ>
<TASK>
asm_common_interrupt (./arch/x86/include/asm/idtentry.h:693)
RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:744)
Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d c3 2b 15 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
RSP: 0018:ffa00000000ffee8 EFLAGS: 00000202
RAX: 000000000000640b RBX: ff1100010091c200 RCX: 0000000000061aa4
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff812f30c5
RBP: 000000000000000a R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
? do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325)
default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118)
do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325)
cpu_startup_entry (kernel/sched/idle.c:422 (discriminator 1))
start_secondary (arch/x86/kernel/smpboot.c:315)
common_startup_64 (arch/x86/kernel/head_64.S:421)
</TASK>
Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs]
CR2: 00000000000000c4
Fixes: ed07536ed673 ("[PATCH] lockdep: annotate nfs/nfsd in-kernel sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250407163313.22682-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A nexthop is only chosen when the calculated multipath hash falls in the
nexthop's hash region (i.e., the hash is smaller than the nexthop's hash
threshold) and when the nexthop is assigned a non-negative score by
rt6_score_route().
Commit 4d0ab3a6885e ("ipv6: Start path selection from the first
nexthop") introduced an unintentional difference between the first
nexthop and the rest when the score is negative.
When the first nexthop matches, but has a negative score, the code will
currently evaluate subsequent nexthops until one is found with a
non-negative score. On the other hand, when a different nexthop matches,
but has a negative score, the code will fallback to the nexthop with
which the selection started ('match').
Align the behavior across all nexthops and fallback to 'match' when the
first nexthop matches, but has a negative score.
Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
Fixes: 4d0ab3a6885e ("ipv6: Start path selection from the first nexthop")
Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Closes: https://lore.kernel.org/netdev/67efef607bc41_1ddca82948c@willemb.c.googlers.com.notmuch/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250408084316.243559-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
phylink-controlled PHY
DSA has 2 kinds of drivers:
1. Those who call dsa_switch_suspend() and dsa_switch_resume() from
their device PM ops: qca8k-8xxx, bcm_sf2, microchip ksz
2. Those who don't: all others. The above methods should be optional.
For type 1, dsa_switch_suspend() calls dsa_user_suspend() -> phylink_stop(),
and dsa_switch_resume() calls dsa_user_resume() -> phylink_start().
These seem good candidates for setting mac_managed_pm = true because
that is essentially its definition [1], but that does not seem to be the
biggest problem for now, and is not what this change focuses on.
Talking strictly about the 2nd category of DSA drivers here (which
do not have MAC managed PM, meaning that for their attached PHYs,
mdio_bus_phy_suspend() and mdio_bus_phy_resume() should run in full),
I have noticed that the following warning from mdio_bus_phy_resume() is
triggered:
WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY &&
phydev->state != PHY_UP);
because the PHY state machine is running.
It's running as a result of a previous dsa_user_open() -> ... ->
phylink_start() -> phy_start() having been initiated by the user.
The previous mdio_bus_phy_suspend() was supposed to have called
phy_stop_machine(), but it didn't. So this is why the PHY is in state
PHY_NOLINK by the time mdio_bus_phy_resume() runs.
mdio_bus_phy_suspend() did not call phy_stop_machine() because for
phylink, the phydev->adjust_link function pointer is NULL. This seems a
technicality introduced by commit fddd91016d16 ("phylib: fix PAL state
machine restart on resume"). That commit was written before phylink
existed, and was intended to avoid crashing with consumer drivers which
don't use the PHY state machine - phylink always does, when using a PHY.
But phylink itself has historically not been developed with
suspend/resume in mind, and apparently not tested too much in that
scenario, allowing this bug to exist unnoticed for so long. Plus, prior
to the WARN_ON(), it would have likely been invisible.
This issue is not in fact restricted to type 2 DSA drivers (according to
the above ad-hoc classification), but can be extrapolated to any MAC
driver with phylink and MDIO-bus-managed PHY PM ops. DSA is just where
the issue was reported. Assuming mac_managed_pm is set correctly, a
quick search indicates the following other drivers might be affected:
$ grep -Zlr PHYLINK_NETDEV drivers/ | xargs -0 grep -L mac_managed_pm
drivers/net/ethernet/atheros/ag71xx.c
drivers/net/ethernet/microchip/sparx5/sparx5_main.c
drivers/net/ethernet/microchip/lan966x/lan966x_main.c
drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
drivers/net/ethernet/freescale/ucc_geth.c
drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
drivers/net/ethernet/marvell/mvneta.c
drivers/net/ethernet/marvell/prestera/prestera_main.c
drivers/net/ethernet/mediatek/mtk_eth_soc.c
drivers/net/ethernet/altera/altera_tse_main.c
drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c
drivers/net/ethernet/meta/fbnic/fbnic_phylink.c
drivers/net/ethernet/tehuti/tn40_phy.c
drivers/net/ethernet/mscc/ocelot_net.c
Make the existing conditions dependent on the PHY device having a
phydev->phy_link_change() implementation equal to the default
phy_link_change() provided by phylib. Otherwise, we implicitly know that
the phydev has the phylink-provided phylink_phy_change() callback, and
when phylink is used, the PHY state machine always needs to be stopped/
started on the suspend/resume path. The code is structured as such that
if phydev->phy_link_change() is absent, it is a matter of time until the
kernel will crash - no need to further complicate the test.
Thus, for the situation where the PM is not managed by the MAC, we will
make the MDIO bus PM ops treat identically the phylink-controlled PHYs
with the phylib-controlled PHYs where an adjust_link() callback is
supplied. In both cases, the MDIO bus PM ops should stop and restart the
PHY state machine.
[1] https://lore.kernel.org/netdev/Z-1tiW9zjcoFkhwc@shell.armlinux.org.uk/
Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Reported-by: Wei Fang <wei.fang@nxp.com>
Tested-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250407094042.2155633-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In an upcoming change, mdio_bus_phy_may_suspend() will need to
distinguish a phylib-based PHY client from a phylink PHY client.
For that, it will need to compare the phydev->phy_link_change() function
pointer with the eponymous phy_link_change() provided by phylib.
To avoid forward function declarations, the default PHY link state
change method should be moved upwards. There is no functional change
associated with this patch, it is only to reduce the noise from a real
bug fix.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250407093900.2155112-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
- Fixes tpm2, futex, and mincore tests
- Create a dedicated .gitignore for tpm2 tests
* tag 'linux_kselftest-fixes-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/mincore: Allow read-ahead pages to reach the end of the file
selftests/futex: futex_waitv wouldblock test should fail
selftests: tpm2: test_smoke: use POSIX-conformant expression operator
selftests: tpm2: create a dedicated .gitignore
|
|
The ublk_ctrl_*() handlers all take struct io_uring_cmd *cmd but only
use it to get struct ublksrv_ctrl_cmd *header from the io_uring SQE.
Since the caller ublk_ctrl_uring_cmd() has already computed header, pass
it instead of cmd.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://lore.kernel.org/r/20250409012928.3527198-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
ubq->canceling is set with request queue quiesced when io_uring context is
exiting. USER_RECOVERY or !RECOVERY_FAIL_IO requires request to be re-queued
and re-dispatch after device is recovered.
However commit d796cea7b9f3 ("ublk: implement ->queue_rqs()") still may fail
any request in case of ubq->canceling, this way breaks USER_RECOVERY or
!RECOVERY_FAIL_IO.
Fix it by calling __ublk_abort_rq() in case of ubq->canceling.
Reviewed-by: Uday Shankar <ushankar@purestorage.com>
Reported-by: Uday Shankar <ushankar@purestorage.com>
Closes: https://lore.kernel.org/linux-block/Z%2FQkkTRHfRxtN%2FmB@dev-ushankar.dev.purestorage.com/
Fixes: d796cea7b9f3 ("ublk: implement ->queue_rqs()")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250409011444.2142010-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 8284066946e6 ("ublk: grab request reference when the request is handled
by userspace") doesn't grab request reference in case of recovery reissue.
Then the request can be requeued & re-dispatch & failed when canceling
uring command.
If it is one zc request, the request can be freed before io_uring
returns the zc buffer back, then cause kernel panic:
[ 126.773061] BUG: kernel NULL pointer dereference, address: 00000000000000c8
[ 126.773657] #PF: supervisor read access in kernel mode
[ 126.774052] #PF: error_code(0x0000) - not-present page
[ 126.774455] PGD 0 P4D 0
[ 126.774698] Oops: Oops: 0000 [#1] SMP NOPTI
[ 126.775034] CPU: 13 UID: 0 PID: 1612 Comm: kworker/u64:55 Not tainted 6.14.0_blk+ #182 PREEMPT(full)
[ 126.775676] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014
[ 126.776275] Workqueue: iou_exit io_ring_exit_work
[ 126.776651] RIP: 0010:ublk_io_release+0x14/0x130 [ublk_drv]
Fixes it by always grabbing request reference for aborting the request.
Reported-by: Caleb Sander Mateos <csander@purestorage.com>
Closes: https://lore.kernel.org/linux-block/CADUfDZodKfOGUeWrnAxcZiLT+puaZX8jDHoj_sfHZCOZwhzz6A@mail.gmail.com/
Fixes: 8284066946e6 ("ublk: grab request reference when the request is handled by userspace")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250409011444.2142010-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Octavian Purdila says:
====================
net_sched: sch_sfq: reject a derived limit of 1
Because sfq parameters can influence each other there can be
situations where although the user sets a limit of 2 it can be lowered
to 1:
$ tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 depth 1
$ tc qdisc show dev dummy0
qdisc sfq 1: dev dummy0 root refcnt 2 limit 1p quantum 1514b depth 1 divisor 1024
$ tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 10 depth 1 divisor 1
$ tc qdisc show dev dummy0
qdisc sfq 2: root refcnt 2 limit 1p quantum 1514b depth 1 divisor 1
As a limit of 1 is invalid, this patch series moves the limit
validation to after all configuration changes have been done. To do
so, the configuration is done in a temporary work area then applied to
the internal state.
The patch series also adds new test cases.
v3:
- remove a couple of unnecessary comments
- rearrange local variables to use reverse Christmas tree style
declaration order
v2: https://lore.kernel.org/all/20250402162750.1671155-1-tavip@google.com/
- remove tmp struct and directly use local variables
v1: https://lore.kernel.org/all/20250328201634.3876474-1-tavip@google.com/
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Because the limit is updated indirectly when other parameters are
updated, there are cases where even though the user requests a limit
of 2 it can actually be set to 1.
Add the following test cases to check that the kernel rejects them:
- limit 2 depth 1 flows 1
- limit 2 depth 1 divisor 1
Signed-off-by: Octavian Purdila <tavip@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is not sufficient to directly validate the limit on the data that
the user passes as it can be updated based on how the other parameters
are changed.
Move the check at the end of the configuration update process to also
catch scenarios where the limit is indirectly updated, for example
with the following configurations:
tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 depth 1
tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 divisor 1
This fixes the following syzkaller reported crash:
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:203:6
index 65535 is out of range for type 'struct sfq_head[128]'
CPU: 1 UID: 0 PID: 3037 Comm: syz.2.16 Not tainted 6.14.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x201/0x300 lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:231 [inline]
__ubsan_handle_out_of_bounds+0xf5/0x120 lib/ubsan.c:429
sfq_link net/sched/sch_sfq.c:203 [inline]
sfq_dec+0x53c/0x610 net/sched/sch_sfq.c:231
sfq_dequeue+0x34e/0x8c0 net/sched/sch_sfq.c:493
sfq_reset+0x17/0x60 net/sched/sch_sfq.c:518
qdisc_reset+0x12e/0x600 net/sched/sch_generic.c:1035
tbf_reset+0x41/0x110 net/sched/sch_tbf.c:339
qdisc_reset+0x12e/0x600 net/sched/sch_generic.c:1035
dev_reset_queue+0x100/0x1b0 net/sched/sch_generic.c:1311
netdev_for_each_tx_queue include/linux/netdevice.h:2590 [inline]
dev_deactivate_many+0x7e5/0xe70 net/sched/sch_generic.c:1375
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 10685681bafc ("net_sched: sch_sfq: don't allow 1 packet limit")
Signed-off-by: Octavian Purdila <tavip@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Many configuration parameters have influence on others (e.g. divisor
-> flows -> limit, depth -> limit) and so it is difficult to correctly
do all of the validation before applying the configuration. And if a
validation error is detected late it is difficult to roll back a
partially applied configuration.
To avoid these issues use a temporary work area to update and validate
the configuration and only then apply the configuration to the
internal state.
Signed-off-by: Octavian Purdila <tavip@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fixes from Shuah Khan:
- Fix the tool to report test count in case of a late test plan when
tests are specified before the test plan
- Fix spelling error
* tag 'linux_kselftest-kunit-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Spelling s/slowm/slow/
kunit: tool: fix count of tests if late test plan
|
|
page_pool_dev_alloc_pages could return NULL. There was a WARN_ON(!page)
but it would still proceed to use the NULL pointer and then crash.
This is similar to commit 001ba0902046
("net: fec: handle page_pool_dev_alloc_pages error").
This is found by our static analysis tool KNighter.
Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com>
Fixes: 3c47e8ae113a ("net: libwx: Support to receive packets in NAPI")
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250407184952.2111299-1-chenyuan0y@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Matthieu Baerts says:
====================
mptcp: only inc MPJoinAckHMacFailure for HMAC failures
Recently, during a debugging session using local MPTCP connections, I
noticed MPJoinAckHMacFailure was strangely not zero on the server side.
The first patch fixes this issue -- present since v5.9 -- and the second
one validates it in the selftests.
====================
Link: https://patch.msgid.link/20250407-net-mptcp-hmac-failure-mib-v1-0-3c9ecd0a3a50@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The parent commit fixes an issue around these counters where one of them
-- MPJoinAckHMacFailure -- was wrongly incremented in some cases.
This makes sure the counter is always 0. It should be incremented only
in case of corruption, or a wrong implementation, which should not be
the case in these selftests.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250407-net-mptcp-hmac-failure-mib-v1-2-3c9ecd0a3a50@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Recently, during a debugging session using local MPTCP connections, I
noticed MPJoinAckHMacFailure was not zero on the server side. The
counter was in fact incremented when the PM rejected new subflows,
because the 'subflow' limit was reached.
The fix is easy, simply dissociating the two cases: only the HMAC
validation check should increase MPTCP_MIB_JOINACKMAC counter.
Fixes: 4cf8b7e48a09 ("subflow: introduce and use mptcp_can_accept_new_subflow()")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250407-net-mptcp-hmac-failure-mib-v1-1-3c9ecd0a3a50@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When running the mincore_selftest on a system with an XFS file system, it
failed the "check_file_mmap" test case due to the read-ahead pages reaching
the end of the file. The failure log is as below:
RUN global.check_file_mmap ...
mincore_selftest.c:264:check_file_mmap:Expected i (1024) < vec_size (1024)
mincore_selftest.c:265:check_file_mmap:Read-ahead pages reached the end of the file
check_file_mmap: Test failed
FAIL global.check_file_mmap
This is because the read-ahead window size of the XFS file system on this
machine is 4 MB, which is larger than the size from the #PF address to the
end of the file. As a result, all the pages for this file are populated.
blockdev --getra /dev/nvme0n1p5
8192
blockdev --getbsz /dev/nvme0n1p5
512
This issue can be fixed by extending the current FILE_SIZE 4MB to a larger
number, but it will still fail if the read-ahead window size of the file
system is larger enough. Additionally, in the real world, read-ahead pages
reaching the end of the file can happen and is an expected behavior.
Therefore, allowing read-ahead pages to reach the end of the file is a
better choice for the "check_file_mmap" test case.
Link: https://lore.kernel.org/r/20250311080940.21413-1-qiuxu.zhuo@intel.com
Reported-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Testcase should fail if -EWOULDBLOCK is not returned when expected value
differs from actual value from the waiter.
Link: https://lore.kernel.org/r/20250404221225.1596324-1-edliaw@google.com
Fixes: 9d57f7c79748920636f8293d2f01192d702fe390 ("selftests: futex: Test sys_futex_waitv() wouldblock")
Signed-off-by: Edward Liaw <edliaw@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Fix a misspelling of "slow".
Link: https://lore.kernel.org/r/1f7ebf98598418914ec9f5b6d5cb8583d24a4bf0.1743089563.git.geert@linux-m68k.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
|
Fix test count with late test plan.
For example,
TAP version 13
ok 1 test1
1..4
Returns a count of 1 passed, 1 crashed (because it expects tests after
the test plan): returning the total count of 2 tests
Change this to be 1 passed, 1 error: total count of 1 test
Link: https://lore.kernel.org/r/20250319223351.1517262-1-rmoar@google.com
Signed-off-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
|
Use POSIX-conformant expression operator symbol '='.
The use of the non POSIX-conformant symbol '==' would work
in bash, but not in sh where the unexpected operator error
would result in test_smoke.sh being skipped.
Instead of changing the shebang to use bash, which may not be
available on all systems, use the POSIX-conformant expression
symbol '=' to test for equality.
Without this patch:
===================
# make -j8 TARGETS=tpm2 kselftest
# selftests: tpm2: test_smoke.sh
# ./test_smoke.sh: 9: [: 2: unexpected operator
ok 1 selftests: tpm2: test_smoke.sh # SKIP
With this patch:
================
# make -j8 TARGETS=tpm2 kselftest
# selftests: tpm2: test_smoke.sh
# Ran 9 tests in 9.236s
ok 1 selftests: tpm2: test_smoke.sh
Link: https://lore.kernel.org/r/37ztyakgrrtgvec344mg7mspchwjpxxtsprtjidso3pwkmm4f4@awsa5mzgqmtb
Signed-off-by: Ahmed Salem <x0rw3ll@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
The tpm2 selftests produce two logs: SpaceTest.log and
AsyncTest.log. Only SpaceTest.log was listed in selftests/.gitignore,
while AsyncTest.log remained untracked.
This change creates a dedicated .gitignore in the tpm2/ directory to
manage these entries, keeping tpm2-specific patterns isolated from
parent .gitignore.
Fixed white-space errors during commit
Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250126195147.902608-1-khaledelnaggarlinux@gmail.com
Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux@gmail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk
stage-1 page tables) to align with the architecture. This avoids
possibly taking an SEA at EL2 on the page table walk or using an
architecturally UNKNOWN fault IPA
- Use acquire/release semantics in the KVM FF-A proxy to avoid
reading a stale value for the FF-A version
- Fix KVM guest driver to match PV CPUID hypercall ABI
- Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM
selftests, which is the only memory type for which atomic
instructions are architecturally guaranteed to work
s390:
- Don't use %pK for debug printing and tracepoints
x86:
- Use a separate subclass when acquiring KVM's per-CPU posted
interrupts wakeup lock in the scheduled out path, i.e. when adding
a vCPU on the list of vCPUs to wake, to workaround a false positive
deadlock. The schedule out code runs with a scheduler lock that the
wakeup handler takes in the opposite order; but it does so with
IRQs disabled and cannot run concurrently with a wakeup
- Explicitly zero-initialize on-stack CPUID unions
- Allow building irqbypass.ko as as module when kvm.ko is a module
- Wrap relatively expensive sanity check with KVM_PROVE_MMU
- Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses
selftests:
- Add more scenarios to the MONITOR/MWAIT test
- Add option to rseq test to override /dev/cpu_dma_latency
- Bring list of exit reasons up to date
- Cleanup Makefile to list once tests that are valid on all
architectures
Other:
- Documentation fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits)
KVM: arm64: Use acquire/release to communicate FF-A version negotiation
KVM: arm64: selftests: Explicitly set the page attrs to Inner-Shareable
KVM: arm64: selftests: Introduce and use hardware-definition macros
KVM: VMX: Use separate subclasses for PI wakeup lock to squash false positive
KVM: VMX: Assert that IRQs are disabled when putting vCPU on PI wakeup list
KVM: x86: Explicitly zero-initialize on-stack CPUID unions
KVM: Allow building irqbypass.ko as as module when kvm.ko is a module
KVM: x86/mmu: Wrap sanity check on number of TDP MMU pages with KVM_PROVE_MMU
KVM: selftests: Add option to rseq test to override /dev/cpu_dma_latency
KVM: x86: Acquire SRCU in KVM_GET_MP_STATE to protect guest memory accesses
Documentation: kvm: remove KVM_CAP_MIPS_TE
Documentation: kvm: organize capabilities in the right section
Documentation: kvm: fix some definition lists
Documentation: kvm: drop "Capability" heading from capabilities
Documentation: kvm: give correct name for KVM_CAP_SPAPR_MULTITCE
Documentation: KVM: KVM_GET_SUPPORTED_CPUID now exposes TSC_DEADLINE
selftests: kvm: list once tests that are valid on all architectures
selftests: kvm: bring list of exit reasons up to date
selftests: kvm: revamp MONITOR/MWAIT tests
KVM: arm64: Don't translate FAR if invalid/unsafe
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- fprobe: remove fprobe_hlist_node when module unloading
When a fprobe target module is removed, the fprobe_hlist_node should
be removed from the fprobe's hash table to prevent reusing
accidentally if another module is loaded at the same address.
- fprobe: lock module while registering fprobe
The module containing the function to be probeed is locked using a
reference counter until the fprobe registration is complete, which
prevents use after free.
- fprobe-events: fix possible UAF on modules
Basically as same as above, but in the fprobe-events layer we also
need to get module reference counter when we find the tracepoint in
the module.
* tag 'probes-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: fprobe: Cleanup fprobe hash when module unloading
tracing: fprobe events: Fix possible UAF on modules
tracing: fprobe: Fix to lock module while registering fprobe
|
|
When validate_linkmsg() fails in do_setlink(), we jump to the errout
label and calls netdev_unlock_ops() even though we have not called
netdev_lock_ops() as reported by syzbot. [0]
Let's return an error directly in such a case.
[0]
WARNING: bad unlock balance detected!
6.14.0-syzkaller-12504-g8bc251e5d874 #0 Not tainted
syz-executor814/5834 is trying to release lock (&dev_instance_lock_key) at:
[<ffffffff89f41f56>] netdev_unlock include/linux/netdevice.h:2756 [inline]
[<ffffffff89f41f56>] netdev_unlock_ops include/net/netdev_lock.h:48 [inline]
[<ffffffff89f41f56>] do_setlink+0xc26/0x43a0 net/core/rtnetlink.c:3406
but there are no more locks to release!
other info that might help us debug this:
1 lock held by syz-executor814/5834:
#0: ffffffff900fc408 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
#0: ffffffff900fc408 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
#0: ffffffff900fc408 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0xd68/0x1fe0 net/core/rtnetlink.c:4064
stack backtrace:
CPU: 0 UID: 0 PID: 5834 Comm: syz-executor814 Not tainted 6.14.0-syzkaller-12504-g8bc251e5d874 #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_unlock_imbalance_bug+0x185/0x1a0 kernel/locking/lockdep.c:5296
__lock_release kernel/locking/lockdep.c:5535 [inline]
lock_release+0x1ed/0x3e0 kernel/locking/lockdep.c:5887
__mutex_unlock_slowpath+0xee/0x800 kernel/locking/mutex.c:907
netdev_unlock include/linux/netdevice.h:2756 [inline]
netdev_unlock_ops include/net/netdev_lock.h:48 [inline]
do_setlink+0xc26/0x43a0 net/core/rtnetlink.c:3406
rtnl_group_changelink net/core/rtnetlink.c:3783 [inline]
__rtnl_newlink net/core/rtnetlink.c:3937 [inline]
rtnl_newlink+0x1619/0x1fe0 net/core/rtnetlink.c:4065
rtnetlink_rcv_msg+0x80f/0xd70 net/core/rtnetlink.c:6955
netlink_rcv_skb+0x208/0x480 net/netlink/af_netlink.c:2534
netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
netlink_unicast+0x7f8/0x9a0 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x8c3/0xcd0 net/netlink/af_netlink.c:1883
sock_sendmsg_nosec net/socket.c:712 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:727
____sys_sendmsg+0x523/0x860 net/socket.c:2566
___sys_sendmsg net/socket.c:2620 [inline]
__sys_sendmsg+0x271/0x360 net/socket.c:2652
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f8427b614a9
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff9b59f3a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fff9b59f578 RCX: 00007f8427b614a9
RDX: 0000000000000000 RSI: 0000200000000300 RDI: 0000000000000004
RBP: 00007f8427bd4610 R08: 000000000000000c R09: 00007fff9b59f578
R10: 000000000000001b R11: 0000000000000246 R12: 0000000000000001
R13:
Fixes: 4c975fd70002 ("net: hold instance lock during NETDEV_REGISTER/UP")
Reported-by: syzbot+45016fe295243a7882d3@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=45016fe295243a7882d3
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250407164229.24414-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- A number of cpuset remote partition related fixes and cleanups along
with selftest updates.
- A change from this merge window made cgroup_rstat_updated_list()
called outside cgroup_rstat_lock leading to list corruptions. Fix it
by relocating the call inside the lock.
* tag 'cgroup-for-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Fix race between newly created partition and dying one
cgroup: rstat: call cgroup_rstat_updated_list with cgroup_rstat_lock
selftest/cgroup: Add a remote partition transition test to test_cpuset_prs.sh
selftest/cgroup: Clean up and restructure test_cpuset_prs.sh
selftest/cgroup: Update test_cpuset_prs.sh to use | as effective CPUs and state separator
cgroup/cpuset: Remove unneeded goto in sched_partition_write() and rename it
cgroup/cpuset: Code cleanup and comment update
cgroup/cpuset: Don't allow creation of local partition over a remote one
cgroup/cpuset: Remove remote_partition_check() & make update_cpumasks_hier() handle remote partition
cgroup/cpuset: Fix error handling in remote_partition_disable()
cgroup/cpuset: Fix incorrect isolated_cpus update in update_parent_effective_cpumask()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC cleanups from Eric Biggers:
"Finish cleaning up the CRC kconfig options by removing the remaining
unnecessary prompts and an unnecessary 'default y', removing
CONFIG_LIBCRC32C, and documenting all the CRC library options"
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crc: remove CONFIG_LIBCRC32C
lib/crc: document all the CRC library kconfig options
lib/crc: remove unnecessary prompt for CONFIG_CRC_ITU_T
lib/crc: remove unnecessary prompt for CONFIG_CRC_T10DIF
lib/crc: remove unnecessary prompt for CONFIG_CRC16
lib/crc: remove unnecessary prompt for CONFIG_CRC_CCITT
lib/crc: remove unnecessary prompt for CONFIG_CRC32 and drop 'default y'
|
|
A recent optimization change in LLVM [1] aims to transform certain loop
idioms into calls to strlen() or wcslen(). This change transforms the
first while loop in UniStrcat() into a call to wcslen(), breaking the
build when UniStrcat() gets inlined into alloc_path_with_tree_prefix():
ld.lld: error: undefined symbol: wcslen
>>> referenced by nls_ucs2_utils.h:54 (fs/smb/client/../../nls/nls_ucs2_utils.h:54)
>>> vmlinux.o:(alloc_path_with_tree_prefix)
>>> referenced by nls_ucs2_utils.h:54 (fs/smb/client/../../nls/nls_ucs2_utils.h:54)
>>> vmlinux.o:(alloc_path_with_tree_prefix)
Disable this optimization with '-fno-builtin-wcslen', which prevents the
compiler from assuming that wcslen() is available in the kernel's C
library.
[ More to the point - it's not that we couldn't implement wcslen(), it's
that this isn't an optimization at all in the context of the kernel.
Replacing a simple inlined loop with a function call to the same loop
is just stupid and pointless if you don't have long strings and fancy
libraries with vectorization support etc.
For the regular 'strlen()' cases, we want the compiler to do this in
order to handle the trivial case of constant strings. And we do have
optimized versions of 'strlen()' on some architectures. But for
wcslen? Just no. - Linus ]
Cc: stable@vger.kernel.org
Link: https://github.com/llvm/llvm-project/commit/9694844d7e36fd5e01011ab56b64f27b867aa72d [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There's a consistent pattern where the .cleanup_data() callback is
called when .prepare_data() fails, when it should really be called to
clean after a successful .prepare_data() as per the documentation.
Rewrite the error-handling paths to make sure we don't cleanup
un-prepared data.
Fixes: c781ff12a2f3 ("ethtool: Allow network drivers to dump arbitrary EEPROM data")
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250407130511.75621-1-maxime.chevallier@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The tfilter_notify() and tfilter_del_notify() functions assume that
NLMSG_GOODSIZE is always enough to dump the filter chain. This is not
always the case, which can lead to silent notify failures (because the
return code of tfilter_notify() is not always checked). In particular,
this can lead to NLM_F_ECHO not being honoured even though an action
succeeds, which forces userspace to create workarounds[0].
Fix this by increasing the message size if dumping the filter chain into
the allocated skb fails. Use the size of the incoming skb as a size hint
if set, so we can start at a larger value when appropriate.
To trigger this, run the following commands:
# ip link add type veth
# tc qdisc replace dev veth0 root handle 1: fq_codel
# tc -echo filter add dev veth0 parent 1: u32 match u32 0 0 $(for i in $(seq 32); do echo action pedit munge ip dport set 22; done)
Before this fix, tc just returns:
Not a filter(cmd 2)
After the fix, we get the correct echo:
added filter dev veth0 parent 1: protocol all pref 49152 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid not_in_hw
match 00000000/00000000 at 0
action order 1: pedit action pass keys 1
index 1 ref 1 bind 1
key #0 at 20: val 00000016 mask ffff0000
[repeated 32 times]
[0] https://github.com/openvswitch/ovs/commit/106ef21860c935e5e0017a88bf42b94025c4e511
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Frode Nordahl <frode.nordahl@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/2018500
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20250407105542.16601-1-toke@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
WX_RXD_IPV6EX was incorrectly defined in Rx ring descriptor. In fact, this
field stores the 802.1ad ID from which the packet was received. The wrong
definition caused the statistics rx_csum_offload_errors to fail to grow
when receiving the 802.1ad packet with incorrect checksum.
Fixes: ef4f3c19f912 ("net: wangxun: libwx add rx offload functions")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20250407103322.273241-1-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When running as a PVH dom0 the ACPI tables exposed to Linux are (mostly)
the native ones, thus exposing the C and P states, that can lead to
attachment of CPU idle and frequency drivers. However the entity in
control of the CPU C and P states is Xen, as dom0 doesn't have a full view
of the system load, neither has all CPUs assigned and identity pinned.
Like it's done for classic PV guests, prevent Linux from using idle or
frequency state drivers when running as a PVH dom0.
On an AMD EPYC 7543P system without this fix a Linux PVH dom0 will keep the
host CPUs spinning at 100% even when dom0 is completely idle, as it's
attempting to use the acpi_idle driver.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20250407101842.67228-1-roger.pau@citrix.com>
|
|
The current code configures the Physical Function (PF) root node at TL1
and the Virtual Function (VF) root node at TL2.
This ensure at any given point of time PF traffic gets more priority.
PF root node
TL1
/ \
TL2 TL2 VF root node
/ \
TL3 TL3
/ \
TL4 TL4
/ \
SMQ SMQ
Due to a bug in the current code, the TL2 parent queue index on the
VF interface is not being configured, leading to 'SMQ Flush' errors
Fixes: 5e6808b4c68d ("octeontx2-pf: Add support for HTB offload")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250407070341.2765426-1-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64: First batch of fixes for 6.15
- Rework heuristics for resolving the fault IPA (HPFAR_EL2 v. re-walk
stage-1 page tables) to align with the architecture. This avoids
possibly taking an SEA at EL2 on the page table walk or using an
architecturally UNKNOWN fault IPA.
- Use acquire/release semantics in the KVM FF-A proxy to avoid reading
a stale value for the FF-A version.
- Fix KVM guest driver to match PV CPUID hypercall ABI.
- Use Inner Shareable Normal Write-Back mappings at stage-1 in KVM
selftests, which is the only memory type for which atomic
instructions are architecturally guaranteed to work.
|
|
"Inspired" by syzbot test, pre-queue some data, disconnect()
and try to receive(). This used to trigger a warning in TLS's strp.
Now we expect the disconnect() to have almost no effect.
Link: https://lore.kernel.org/67e6be74.050a0220.2f068f.007e.GAE@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250404180334.3224206-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|