Age | Commit message (Collapse) | Author |
|
Commit b3ea416419c8 ("testing: net-drv: add basic shaper test")
removed the trailing backslash from the last entry. We have
a terminating comment here to avoid having to modify the last
line when adding at the end.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20241010211857.2193076-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Wei Fang says:
====================
net: enetc: fix some issues of XDP
We found some bugs when testing the XDP function of enetc driver,
and these bugs are easy to reproduce. This is not only causes XDP
to not work, but also the network cannot be restored after exiting
the XDP program. So the patch set is mainly to fix these bugs. For
details, please see the commit message of each patch.
v1: https://lore.kernel.org/bpf/20240919084104.661180-1-wei.fang@nxp.com/
v2: https://lore.kernel.org/netdev/20241008224806.2onzkt3gbslw5jxb@skbuf/
v3: https://lore.kernel.org/imx/20241009090327.146461-1-wei.fang@nxp.com/
====================
Link: https://patch.msgid.link/20241010092056.298128-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC
on LS1028A, it was found that if the command was re-run multiple times,
Rx could not receive the frames, and the result of xdp-bench showed
that the rx rate was 0.
root@ls1028ardb:~# ./xdp-bench tx eno0
Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
Summary 2046 rx/s 0 err,drop/s
Summary 0 rx/s 0 err,drop/s
Summary 0 rx/s 0 err,drop/s
Summary 0 rx/s 0 err,drop/s
By observing the Rx PIR and CIR registers, CIR is always 0x7FF and
PIR is always 0x7FE, which means that the Rx ring is full and can no
longer accommodate other Rx frames. Therefore, the problem is caused
by the Rx BD ring not being cleaned up.
Further analysis of the code revealed that the Rx BD ring will only
be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met.
Therefore, some debug logs were added to the driver and the current
values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx
BD ring was full. The logs are as follows.
[ 178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
[ 178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
[ 178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
From the results, the max value of xdp_tx_in_flight has reached 2140.
However, the size of the Rx BD ring is only 2048. So xdp_tx_in_flight
did not drop to 0 after enetc_stop() is called and the driver does not
clear it. The root cause is that NAPI is disabled too aggressively,
without having waited for the pending XDP_TX frames to be transmitted,
and their buffers recycled, so that xdp_tx_in_flight cannot naturally
drop to 0. Later, enetc_free_tx_ring() does free those stale, unsent
XDP_TX packets, but it is not coded up to also reset xdp_tx_in_flight,
hence the manifestation of the bug.
One option would be to cover this extra condition in enetc_free_tx_ring(),
but now that the ENETC_TX_DOWN exists, we have created a window at
the beginning of enetc_stop() where NAPI can still be scheduled, but
any concurrent enqueue will be blocked. Therefore, enetc_wait_bdrs()
and enetc_disable_tx_bdrs() can be called with NAPI still scheduled,
and it is guaranteed that this will not wait indefinitely, but instead
give us an indication that the pending TX frames have orderly dropped
to zero. Only then should we call napi_disable().
This way, enetc_free_tx_ring() becomes entirely redundant and can be
dropped as part of subsequent cleanup.
The change also refactors enetc_start() so that it looks like the
mirror opposite procedure of enetc_stop().
Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet processing")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241010092056.298128-5-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Tx BD rings are disabled first in enetc_stop() and the driver
waits for them to become empty. This operation is not safe while
the ring is actively transmitting frames, and will cause the ring
to not be empty and hardware exception. As described in the NETC
block guide, software should only disable an active Tx ring after
all pending ring entries have been consumed (i.e. when PI = CI).
Disabling a transmit ring that is actively processing BDs risks
a HW-SW race hazard whereby a hardware resource becomes assigned
to work on one or more ring entries only to have those entries be
removed due to the ring becoming disabled.
When testing XDP_REDIRECT feautre, although all frames were blocked
from being put into Tx rings during ring reconfiguration, the similar
warning log was still encountered:
fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
The reason is that when there are still unsent frames in the Tx ring,
disabling the Tx ring causes the remaining frames to be unable to be
sent out. And the Tx ring cannot be restored, which means that even
if the xdp program is uninstalled, the Tx frames cannot be sent out
anymore. Therefore, correct the operation order in enect_start() and
enect_stop().
Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet processing")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241010092056.298128-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When testing the XDP_REDIRECT function on the LS1028A platform, we
found a very reproducible issue that the Tx frames can no longer be
sent out even if XDP_REDIRECT is turned off. Specifically, if there
is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on,
the console may display some warnings like "timeout for tx ring #6
clear", and all redirected frames will be dropped, the detailed log
is as follows.
root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2
Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc)
[203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear
[204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
[204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
eno0->eno2 1420505 rx/s 1420590 err,drop/s 0 xmit/s
xmit eno0->eno2 0 xmit/s 1420590 drop/s 0 drv_err/s 15.71 bulk-avg
eno0->eno2 1420484 rx/s 1420485 err,drop/s 0 xmit/s
xmit eno0->eno2 0 xmit/s 1420485 drop/s 0 drv_err/s 15.71 bulk-avg
By analyzing the XDP_REDIRECT implementation of enetc driver, the
driver will reconfigure Tx and Rx BD rings when a bpf program is
installed or uninstalled, but there is no mechanisms to block the
redirected frames when enetc driver reconfigures rings. Similarly,
XDP_TX verdicts on received frames can also lead to frames being
enqueued in the Tx rings. Because XDP ignores the state set by the
netif_tx_wake_queue() API, so introduce the ENETC_TX_DOWN flag to
suppress transmission of XDP frames.
Fixes: c33bfaf91c4c ("net: enetc: set up XDP program under enetc_reconfigure()")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241010092056.298128-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The xdp_drops statistic indicates the number of XDP frames dropped in
the Rx direction. However, enetc_xdp_drop() is also used in XDP_TX and
XDP_REDIRECT actions. If frame loss occurs in these two actions, the
frames loss count should not be included in xdp_drops, because there
are already xdp_tx_drops and xdp_redirect_failures to count the frame
loss of these two actions, so it's better to remove xdp_drops statistic
from enetc_xdp_drop() and increase xdp_drops in XDP_DROP action.
Fixes: 7ed2bc80074e ("net: enetc: add support for XDP_TX")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241010092056.298128-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Hangbin Liu says:
====================
netdevsim: better ipsec output format
The first 2 patches improve the netdevsim ipsec debug output with better
format. The 3rd patch update the selftests.
v2: update rtnetlink selftest with new output format (Stanislav Fomichev, Jakub Kicinski)
====================
Link: https://patch.msgid.link/20241010040027.21440-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After the netdevsim update to use human-readable IP address formats for
IPsec, we can now use the source and destination IPs directly in testing.
Here is the result:
# ./rtnetlink.sh -t kci_test_ipsec_offload
PASS: ipsec_offload
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20241010040027.21440-4-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current code only copies the address for the in path, leaving the out
path address set to 0. This patch corrects the issue by copying the addresses
for both the in and out paths. Before this patch:
# cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
SA count=2 tx=20
sa[0] tx ipaddr=0.0.0.0
sa[0] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
sa[0] key=0x3167608a ca4f1397 43565909 941fa627
sa[1] rx ipaddr=192.168.0.1
sa[1] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
sa[1] key=0x3167608a ca4f1397 43565909 941fa627
After this patch:
= cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
SA count=2 tx=20
sa[0] tx ipaddr=192.168.0.2
sa[0] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
sa[0] key=0x3167608a ca4f1397 43565909 941fa627
sa[1] rx ipaddr=192.168.0.1
sa[1] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
sa[1] key=0x3167608a ca4f1397 43565909 941fa627
Fixes: 7699353da875 ("netdevsim: add ipsec offload testing")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241010040027.21440-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, IPSec addresses are printed in hexadecimal format, which is
not user-friendly. e.g.
# cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
SA count=2 tx=20
sa[0] rx ipaddr=0x00000000 00000000 00000000 0100a8c0
sa[0] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
sa[0] key=0x3167608a ca4f1397 43565909 941fa627
sa[1] tx ipaddr=0x00000000 00000000 00000000 00000000
sa[1] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
sa[1] key=0x3167608a ca4f1397 43565909 941fa627
This patch updates the code to print the IPSec address in a human-readable
format for easier debug. e.g.
# cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
SA count=4 tx=40
sa[0] tx ipaddr=0.0.0.0
sa[0] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
sa[0] key=0x3167608a ca4f1397 43565909 941fa627
sa[1] rx ipaddr=192.168.0.1
sa[1] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
sa[1] key=0x3167608a ca4f1397 43565909 941fa627
sa[2] tx ipaddr=::
sa[2] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
sa[2] key=0x3167608a ca4f1397 43565909 941fa627
sa[3] rx ipaddr=2000::1
sa[3] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
sa[3] key=0x3167608a ca4f1397 43565909 941fa627
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241010040027.21440-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The err value in mv88e6xxx_region_atu_snapshot is now potentially
uninitialised on return. Initialise err as 0.
Fixes: ada5c3229b32 ("net: dsa: mv88e6xxx: Add FID map cache")
Signed-off-by: Aryan Srivastava <aryan.srivastava@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241009212319.1045176-1-aryan.srivastava@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When port mirroring is added to a port, the bit position of the source
port, needs to be written to the register ANA_AC_PROBE_PORT_CFG. This
register is replicated for n_ports > 32, and therefore we need to derive
the correct register from the port number.
Before this patch, we wrongly calculate the register from portno /
BITS_PER_BYTE, where the divisor ought to be 32, causing any port >=8 to
be written to the wrong register. We fix this, by using do_div(), where
the dividend is the register, the remainder is the bit position and the
divisor is now 32.
Fixes: 4e50d72b3b95 ("net: sparx5: add port mirroring implementation")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-mirroring-fix-v1-1-9ec962301989@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix clock handle leak in probe() error path in gpio-aspeed
- add a dummy register read to ensure the write actually completed
* tag 'gpio-fixes-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: aspeed: Use devm_clk api to manage clock source
gpio: aspeed: Add the flush write to ensure the write complete.
|
|
Radhey Shyam Pandey says:
====================
net: xilinx: emaclite: Adopt clock support
This patchset adds emaclite clock support. AXI Ethernet Lite IP can also
be used on SoC platforms like Zynq UltraScale+ MPSoC which combines
powerful processing system (PS) and user-programmable logic (PL) into
the same device. On these platforms it is mandatory to explicitly enable
IP clocks for proper functionality.
====================
Link: https://patch.msgid.link/1728491303-1456171-1-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adapt to use the clock framework. Add s_axi_aclk clock from the processor
bus clock domain and make clk optional to keep DTB backward compatibility.
Signed-off-by: Abin Joseph <abin.joseph@amd.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1728491303-1456171-4-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use device managed ethernet device allocation to simplify the error
handling logic. No functional change.
Signed-off-by: Abin Joseph <abin.joseph@amd.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1728491303-1456171-3-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add s_axi_aclk AXI4 clock support. Traditionally this IP was used on
microblaze platforms which had fixed clocks enabled all the time. But
since its a PL IP, it can also be used on SoC platforms like Zynq
UltraScale+ MPSoC which combines processing system (PS) and user
programmable logic (PL) into the same device. On these platforms instead
of fixed enabled clocks it is mandatory to explicitly enable IP clocks
for proper functionality.
So make clock a required property and also define max supported clock
constraints.
Signed-off-by: Abin Joseph <abin.joseph@amd.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/1728491303-1456171-2-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull NFS client fixes from Anna Schumaker:
"Localio Bugfixes:
- remove duplicated include in localio.c
- fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
- fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
- fix nfsd_file tracepoints to handle NULL rqstp pointers
Other Bugfixes:
- fix program selection loop in svc_process_common
- fix integer overflow in decode_rc_list()
- prevent NULL-pointer dereference in nfs42_complete_copies()
- fix CB_RECALL performance issues when using a large number of
delegations"
* tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: remove revoked delegation from server's delegation list
nfsd/localio: fix nfsd_file tracepoints to handle NULL rqstp
nfs_common: fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies()
SUNRPC: Fix integer overflow in decode_rc_list()
sunrpc: fix prog selection loop in svc_process_common
nfs: Remove duplicated include in localio.c
|
|
After commit 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to
invalidate dst entries"), blackhole_netdev was introduced to invalidate
dst cache entries on the TX path whenever the cache times out or is
flushed.
When two UDP sockets (sk1 and sk2) send messages to the same destination
simultaneously, they are using the same dst cache. If the dst cache is
invalidated on one path (sk2) while the other (sk1) is still transmitting,
sk1 may try to use the invalid dst entry.
CPU1 CPU2
udp_sendmsg(sk1) udp_sendmsg(sk2)
udp_send_skb()
ip_output()
<--- dst timeout or flushed
dst_dev_put()
ip_finish_output2()
ip_neigh_for_gw()
This results in a scenario where ip_neigh_for_gw() returns -EINVAL because
blackhole_dev lacks an in_dev, which is needed to initialize the neigh in
arp_constructor(). This error is then propagated back to userspace,
breaking the UDP application.
The patch fixes this issue by assigning an in_dev to blackhole_dev for
IPv4, similar to what was done for IPv6 in commit e5f80fcf869a ("ipv6:
give an IPv6 dev to blackhole_netdev"). This ensures that even when the
dst entry is invalidated with blackhole_dev, it will not fail to create
the neigh entry.
As devinet_init() is called ealier than blackhole_netdev_init() in system
booting, it can not assign the in_dev to blackhole_dev in devinet_init().
As Paolo suggested, add a separate late_initcall() in devinet.c to ensure
inet_blackhole_dev_init() is called after blackhole_netdev_init().
Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/3000792d45ca44e16c785ebe2b092e610e5b3df1.1728499633.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
net: remove RTNL from fib_seq_sum()
This series is inspired by a syzbot report showing
rtnl contention and one thread blocked in:
7 locks held by syz-executor/10835:
#0: ffff888033390420 (sb_writers#8){.+.+}-{0:0}, at: file_start_write include/linux/fs.h:2931 [inline]
#0: ffff888033390420 (sb_writers#8){.+.+}-{0:0}, at: vfs_write+0x224/0xc90 fs/read_write.c:679
#1: ffff88806df6bc88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x1ea/0x500 fs/kernfs/file.c:325
#2: ffff888026fcf3c8 (kn->active#50){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x20e/0x500 fs/kernfs/file.c:326
#3: ffffffff8f56f848 (nsim_bus_dev_list_lock){+.+.}-{3:3}, at: new_device_store+0x1b4/0x890 drivers/net/netdevsim/bus.c:166
#4: ffff88805e0140e8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:1014 [inline]
#4: ffff88805e0140e8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x8e/0x520 drivers/base/dd.c:1005
#5: ffff88805c5fb250 (&devlink->lock_key#55){+.+.}-{3:3}, at: nsim_drv_probe+0xcb/0xb80 drivers/net/netdevsim/dev.c:1534
#6: ffffffff8fcd1748 (rtnl_mutex){+.+.}-{3:3}, at: fib_seq_sum+0x31/0x290 net/core/fib_notifier.c:46
====================
Link: https://patch.msgid.link/20241009184405.3752829-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After we made sure no fib_seq_read() handlers needs RTNL anymore,
we can remove RTNL from fib_seq_sum().
Note that after RTNL was dropped, fib_seq_sum() result was possibly
outdated anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241009184405.3752829-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mr_call_vif_notifiers() and mr_call_mfc_notifiers() already
uses WRITE_ONCE() on the write side.
Using RTNL to protect the reads seems a big hammer.
Constify 'struct net' argument of ip6mr_rules_seq_read()
and ipmr_rules_seq_read().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241009184405.3752829-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Using RTNL to protect ops->fib_rules_seq reads seems a big hammer.
Writes are protected by RTNL.
We can use READ_ONCE() when reading it.
Constify 'struct net' argument of fib6_tables_seq_read() and
fib6_rules_seq_read().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241009184405.3752829-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Using RTNL to protect ops->fib_rules_seq reads seems a big hammer.
Writes are protected by RTNL.
We can use READ_ONCE() when reading it.
Constify 'struct net' argument of fib4_rules_seq_read()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241009184405.3752829-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Using RTNL to protect ops->fib_rules_seq reads seems a big hammer.
Writes are protected by RTNL.
We can use READ_ONCE() on readers.
Constify 'struct net' argument of fib_rules_seq_read()
and lookup_rules_ops().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241009184405.3752829-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU fix from Neeraj Upadhyay:
"Fix rcuog kthread wakeup invocation from softirq context on a CPU
which has been marked offline.
This can happen when new callbacks are enqueued from a softirq on an
offline CPU before it calls rcutree_report_cpu_dead(). When this
happens on NOCB configuration, the rcuog wake-up is deferred through
an IPI to an online CPU. This is done to avoid call into the scheduler
which can risk arming the RT-bandwidth after hrtimers have been
migrated out and disabled.
However, doing IPI call from softirq is not allowed: Fix this by
forcing deferred rcuog wakeup through the NOCB timer when the CPU is
offline"
* tag 'rcu.fixes.6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux:
rcu/nocb: Fix rcuog wake-up from offline softirq
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"A fix for topology information of Xen PV guests"
* tag 'for-linus-6.12a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: mark boot CPU of PV guest in MSR_IA32_APICBASE
|
|
Masami reported a bug when running function graph tracing then the
function profiler. The following commands would cause a kernel crash:
# cd /sys/kernel/tracing/
# echo function_graph > current_tracer
# echo 1 > function_profile_enabled
In that order. Create a test to test this two to make sure this does not
come back as a regression.
Link: https://lore.kernel.org/172398528350.293426.8347220120333730248.stgit@devnote2
Link: https://lore.kernel.org/all/20241010165235.35122877@gandalf.local.home/
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Adapt the rseq.c/rseq.h code to follow GNU C library changes introduced by:
glibc commit 2e456ccf0c34 ("Linux: Make __rseq_size useful for feature detection (bug 31965)")
Without this fix, rseq selftests for mm_cid fail:
./run_param_test.sh
Default parameters
Running test spinlock
Running compare-twice test spinlock
Running mm_cid test spinlock
Error: cpu id getter unavailable
Fixes: 18c2355838e7 ("selftests/rseq: Implement rseq mm_cid field support")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
CC: Carlos O'Donell <carlos@redhat.com>
CC: Florian Weimer <fweimer@redhat.com>
CC: linux-kselftest@vger.kernel.org
CC: stable@vger.kernel.org
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Pull io_uring fixes from Jens Axboe:
- Explicitly have a mshot_finished condition for IORING_OP_RECV in
multishot mode, similarly to what IORING_OP_RECVMSG has. This doesn't
fix a bug right now, but it makes it harder to actually have a bug
here if a request takes multiple iterations to finish.
- Fix handling of retry of read/write of !FMODE_NOWAIT files. If they
are pollable, that's all we need.
* tag 'io_uring-6.12-20241011' of git://git.kernel.dk/linux:
io_uring/rw: allow pollable non-blocking attempts for !FMODE_NOWAIT
io_uring/rw: fix cflags posting for single issue multishot read
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These address two issues in the TPMI module of the Intel RAPL power
capping driver and one issue in the processor part of the Intel
int340x thermal driver, update a CPU ID list and register definitions
needed for RAPL PL4 support and remove some unused code.
Specifics:
- Fix the TPMI_RAPL_REG_DOMAIN_INFO register offset in the TPMI part
of the Intel RAPL power capping driver, make it ignore minor
hardware version mismatches (which only indicate exposing
additional features) and update register definitions in it to
enable PL4 support (Zhang Rui)
- Add Arrow Lake-U to the list of processors supporting PL4 in the
MSR part of the Intel RAPL power capping driver (Sumeet Pawnikar)
- Remove excess pci_disable_device() calls from the processor part of
the int340x thermal driver to address a warning triggered during
module unload and remove unused CPU hotplug code related to RAPL
support from it (Zhang Rui)"
* tag 'pm-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: int340x: processor: Add MMIO RAPL PL4 support
thermal: intel: int340x: processor: Remove MMIO RAPL CPU hotplug support
powercap: intel_rapl_msr: Add PL4 support for Arrowlake-U
powercap: intel_rapl_tpmi: Ignore minor version change
thermal: intel: int340x: processor: Fix warning during module unload
powercap: intel_rapl_tpmi: Fix bogus register reading
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"Address possible use-after-free scenarios during the processing of
thermal netlink commands and during thermal zone removal (Rafael
Wysocki)"
* tag 'thermal-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Free tzp copy along with the thermal zone
thermal: core: Reference count the zone in thermal_zone_get_by_id()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Reduce the number of ACPI IRQ override DMI quirks by combining quirks
that cover similar systems while making them cover additional models
at the same time (Hans de Goede)"
* tag 'acpi-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: resource: Fold Asus Vivobook Pro N6506M* DMI quirks together
ACPI: resource: Fold Asus ExpertBook B1402C* and B1502C* DMI quirks together
ACPI: resource: Make Asus ExpertBook B2502 matches cover more models
ACPI: resource: Make Asus ExpertBook B2402 matches cover more models
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
"pmdomain core:
- Fix alloc/free in dev_pm_domain_attach|detach_list()
pmdomain providers:
- qcom: Fix the return of uninitialized variable
pmdomain consumers:
- drm/tegra/gr3d: Revert conversion to dev_pm_domain_attach|detach_list()
OPP core:
- Fix error code in dev_pm_opp_set_config()"
* tag 'pmdomain-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
PM: domains: Fix alloc/free in dev_pm_domain_attach|detach_list()
Revert "drm/tegra: gr3d: Convert into dev_pm_domain_attach|detach_list()"
pmdomain: qcom-cpr: Fix the return of uninitialized variable
OPP: fix error code in dev_pm_opp_set_config()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Prevent splat from warning when setting maximum DMA segment
MMC host:
- mvsdio: Drop sg_miter support for PIO as it didn't work
- sdhci-of-dwcmshc: Prevent stale interrupt for the T-Head 1520
variant"
* tag 'mmc-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-of-dwcmshc: Prevent stale command interrupt handling
Revert "mmc: mvsdio: Use sg_miter for PIO"
mmc: core: Only set maximum DMA segment size if DMA is supported
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:
- Fix a hibernate regression where the disk was needlessly spun down
and then immediately spun up both when entering and when resuming
from hibernation (me)
- Update the MAINTAINERS file to remove remnants from Jens
maintainership of libata (Damien)
* tag 'ata-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata: Update MAINTAINERS file
ata: libata: avoid superfluous disk spin down + spin up during hibernation
|
|
Pull drm fixes from Dave Airlie:
"Weekly fixes haul for drm, lots of small fixes all over, amdgpu, xe
lead the way, some minor nouveau and radeon fixes, and then a bunch of
misc all over.
Nothing too scary or out of the unusual.
sched:
- Avoid leaking lockdep map
fbdev-dma:
- Only clean up deferred I/O if instanciated
amdgpu:
- Fix invalid UBSAN warnings
- Fix artifacts in MPO transitions
- Hibernation fix
amdkfd:
- Fix an eviction fence leak
radeon:
- Add late register for connectors
- Always set GEM function pointers
i915:
- HDCP refcount fix
nouveau:
- dmem: Fix privileged error in copy engine channel; Fix possible
data leak in migrate_to_ram()
- gsp: Fix coding style
v3d:
- Stop active perfmon before destroying it
vc4:
- Stop active perfmon before destroying it
xe:
- Drop GuC submit_wq pool
- Fix error checking with xa_store()
- Fix missing freq restore on GSC load error
- Fix wedged_mode file permission
- Fix use-after-free in ct communication"
* tag 'drm-fixes-2024-10-11' of https://gitlab.freedesktop.org/drm/kernel:
drm/fbdev-dma: Only cleanup deferred I/O if necessary
drm/xe: Make wedged_mode debugfs writable
drm/xe: Restore GT freq on GSC load error
drm/xe/guc_submit: fix xa_store() error checking
drm/xe/ct: fix xa_store() error checking
drm/xe/ct: prevent UAF in send_recv()
drm/radeon: always set GEM function pointer
nouveau/dmem: Fix vulnerability in migrate_to_ram upon copy error
nouveau/dmem: Fix privileged error in copy engine channel
drm/amd/display: fix hibernate entry for DCN35+
drm/amd/display: Clear update flags after update has been applied
drm/amdgpu: partially revert powerplay `__counted_by` changes
drm/radeon: add late_register for connector
drm/amdkfd: Fix an eviction fence leak
drm/vc4: Stop the active perfmon before being destroyed
drm/v3d: Stop the active perfmon before being destroyed
drm/i915/hdcp: fix connector refcounting
drm/nouveau/gsp: remove extraneous ; after mutex
drm/xe: Drop GuC submit_wq pool
drm/sched: Use drm sched lockdep map for submit_wq
|
|
The function read_alloc_one_name() does not initialize the name field of
the passed fscrypt_str struct if kmalloc fails to allocate the
corresponding buffer. Thus, it is not guaranteed that
fscrypt_str.name is initialized when freeing it.
This is a follow-up to the linked patch that fixes the remaining
instances of the bug introduced by commit e43eec81c516 ("btrfs: use
struct qstr instead of name and namelen pairs").
Link: https://lore.kernel.org/linux-btrfs/20241009080833.1355894-1-jroi.martin@gmail.com/
Fixes: e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Roi Martin <jroi.martin@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
As all changed_* functions need to return something, just return 0
directly here, as the verity status is passed via the context.
Reported by LKP: fs/btrfs/send.c:6877:5-8: Unneeded variable: "ret". Return "0" on line 6883
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202410092305.WbyqspH8-lkp@intel.com/
Signed-off-by: Christian Heusel <christian@heusel.eu>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The add_inode_ref() function does not initialize the "name" struct when
it is declared. If any of the following calls to "read_one_inode()
returns NULL,
dir = read_one_inode(root, parent_objectid);
if (!dir) {
ret = -ENOENT;
goto out;
}
inode = read_one_inode(root, inode_objectid);
if (!inode) {
ret = -EIO;
goto out;
}
then "name.name" would be freed on "out" before being initialized.
out:
...
kfree(name.name);
This issue was reported by Coverity with CID 1526744.
Fixes: e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Roi Martin <jroi.martin@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Since commit 4c39529663b9 ("slab: Warn on duplicate cache names when
DEBUG_VM=y"), slab complains about duplicate cache names. Hence this
patch. The approach is as follows:
- Maintain an xarray with the slab size as index and a reference count
and a kmem_cache pointer as contents. Use srpt-${slab_size} as kmem
cache name.
- Use 512-byte alignment for all slabs instead of only for some of the
slabs.
- Increment the reference count instead of calling kmem_cache_create().
- Decrement the reference count instead of calling kmem_cache_destroy().
Fixes: 5dabcd0456d7 ("RDMA/srpt: Add support for immediate data")
Link: https://patch.msgid.link/r/20241009210048.4122518-1-bvanassche@acm.org
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/linux-block/xpe6bea7rakpyoyfvspvin2dsozjmjtjktpph7rep3h25tv7fb@ooz4cu5z6bq6/
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
There is "accept*" misspelled as "accpet*" in the comments. Fix the
spelling.
Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager")
Link: https://patch.msgid.link/r/20241008161913.19965-1-green@qrator.net
Signed-off-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
ip_dev_find() always returns real net_device address, whether traffic is
running on a vlan or real device, if traffic is over vlan, filling
endpoint struture with real ndev and an attempt to send a connect request
will results in RDMA_CM_EVENT_UNREACHABLE error. This patch fixes the
issue by using vlan_dev_real_dev().
Fixes: 830662f6f032 ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address")
Link: https://patch.msgid.link/r/20241007132311.70593-1-anumula@chelsio.com
Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
While running ISER over SIW, the initiator machine encounters a warning
from skb_splice_from_iter() indicating that a slab page is being used in
send_page. To address this, it is better to add a sendpage_ok() check
within the driver itself, and if it returns 0, then MSG_SPLICE_PAGES flag
should be disabled before entering the network stack.
A similar issue has been discussed for NVMe in this thread:
https://lore.kernel.org/all/20240530142417.146696-1-ofir.gal@volumez.com/
WARNING: CPU: 0 PID: 5342 at net/core/skbuff.c:7140 skb_splice_from_iter+0x173/0x320
Call Trace:
tcp_sendmsg_locked+0x368/0xe40
siw_tx_hdt+0x695/0xa40 [siw]
siw_qp_sq_process+0x102/0xb00 [siw]
siw_sq_resume+0x39/0x110 [siw]
siw_run_sq+0x74/0x160 [siw]
kthread+0xd2/0x100
ret_from_fork+0x34/0x40
ret_from_fork_asm+0x1a/0x30
Link: https://patch.msgid.link/r/20241007125835.89942-1-showrya@chelsio.com
Signed-off-by: Showrya M N <showrya@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
We are using the logical address ("bytenr") of an extent as the key for
qgroup records in the dirty extents xarray. This is a problem because the
xarrays use "unsigned long" for keys/indices, meaning that on a 32 bits
platform any extent starting at or beyond 4G is truncated, which is a too
low limitation as virtually everyone is using storage with more than 4G of
space. This means a "bytenr" of 4G gets truncated to 0, and so does 8G and
16G for example, resulting in incorrect qgroup accounting.
Fix this by using sector numbers as keys instead, that is, using keys that
match the logical address right shifted by fs_info->sectorsize_bits, which
is what we do for the fs_info->buffer_radix that tracks extent buffers
(radix trees also use an "unsigned long" type for keys). This also makes
the index space more dense which helps optimize the xarray (as mentioned
at Documentation/core-api/xarray.rst).
Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Even though system user has a supplementary group, It gets
NT_STATUS_ACCESS_DENIED when attempting to create file or directory.
This patch add KSMBD_EVENT_LOGIN_REQUEST_EXT/RESPONSE_EXT netlink events
to get supplementary groups list. The new netlink event doesn't break
backward compatibility when using old ksmbd-tools.
Co-developed-by: Atte Heikkilä <atteh.mailbox@gmail.com>
Signed-off-by: Atte Heikkilä <atteh.mailbox@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
sysctl_tcp_l3mdev_accept is read from TCP receive fast path from
tcp_v6_early_demux(),
__inet6_lookup_established,
inet_request_bound_dev_if().
Move it to netns_ipv4_read_rx.
Remove the '#ifdef CONFIG_NET_L3_MASTER_DEV' that was guarding
its definition.
Note this adds a hole of three bytes that could be filled later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Link: https://patch.msgid.link/20241010034100.320832-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The system interface connection status register is not immediately
correct upon line side link up. This results in the status being read as
OFF and then transitioning to the correct host side link mode with a
short delay. This causes the phylink framework passing the OFF status
down to all MAC config drivers, resulting in the host side link being
misconfigured, which in turn can lead to link flapping or complete
packet loss in some cases.
Mitigate this by periodically polling the register until it not showing
the OFF state. This will be done every 1ms for 10ms, using the same
poll/timeout as the processor intensive operation reads.
If the phy is still expressing the OFF state after the timeout, then set
the link to false and pass the NA interface mode onto the phylink
framework.
Signed-off-by: Aryan Srivastava <aryan.srivastava@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241010004935.1774601-1-aryan.srivastava@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This fixes a regression which prevents parallel DIO reads.
Fixes: 0cac51185e65 ("f2fs: fix to avoid racing in between read and OPU dio write")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Konstantin reports the maintainer's address bounces.
There is no other maintainer and the driver is quite old.
There is a good chance nobody is using this driver any more.
Let's try to remove it completely, we can revert it back in
if someone complains.
Link: https://lore.kernel.org/20240925-bizarre-earwig-from-pluto-1484aa@lemu/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Denis Kirjanov <dkirjanov@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|