Age | Commit message (Collapse) | Author |
|
check_lru_key() wasn't using write buffer updates for deleting bad lru
entries - dating from before the lru btree used the btree write buffer.
And when possibly flushing the btree write buffer (to make sure we're
seeing a real inconsistency), we need to be using the modern
bch2_btree_write_buffer_maybe_flush().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Errors are getting marked as AUTOFIX once they've been (re)-tested and
audited.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Newly generated keys, in the transaction commit path or write path,
should not be AUTOFIX; those indicate bugs that we need to fail fast
for.
Fixes: 5612daafb764 ("bcachefs: Fix fsck warnings from bkey validation")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Ensure a copy of the lost+found inode exists in the snapshot that we're
reattaching, so that we don't trigger warnings in
lookup_inode_for_snapshot() later.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes two different bugs:
- Looser locking with the rhashtable means we need to recheck if the
inode is still hashed after prepare_to_wait(), and add a corresponding
wakeup after removing from the hash table.
- da18ecbf0fb6 ("fs: add i_state helpers") changed the bit waitqueues
used for inodes, and bcachefs wasn't updated and thus broke; this
updates bcachefs to the new helper.
Fixes: 112d21fd1a12 ("bcachefs: switch to rhashtable for vfs inodes hash")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Uwe Kleine-König says:
====================
net: Switch back to struct platform_driver::remove()
I already sent a patch last week that is very similar to patch #1 of
this series. However the previous submission was based on plain next.
I was asked to resend based on net-next once the merge window closed,
so here comes this v2. The additional patches address drivers/net/dsa,
drivers/net/mdio and the rest of drivers/net apart from wireless which
has its own tree and will addressed separately at a later point in time.
====================
Link: https://patch.msgid.link/cover.1727949050.git.u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After commit 0edb555a65d1 ("platform: Make platform_driver::remove()
return void") .remove() is (again) the right callback to implement for
platform drivers.
Convert all platform drivers below drivers/net after the previous
conversion commits apart from the wireless drivers to use .remove(),
with the eventual goal to drop struct platform_driver::remove_new(). As
.remove() and .remove_new() have the same prototypes, conversion is done
by just changing the structure member name in the driver initializer.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After commit 0edb555a65d1 ("platform: Make platform_driver::remove()
return void") .remove() is (again) the right callback to implement for
platform drivers.
Convert all platform drivers below drivers/net/mdio to use .remove(),
with the eventual goal to drop struct platform_driver::remove_new(). As
.remove() and .remove_new() have the same prototypes, conversion is done
by just changing the structure member name in the driver initializer.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/0b60d8bfc45a3de8193f953794dda241e11032a9.1727949050.git.u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After commit 0edb555a65d1 ("platform: Make platform_driver::remove()
return void") .remove() is (again) the right callback to implement for
platform drivers.
Convert all platform drivers below drivers/net/dsa to use .remove(),
with the eventual goal to drop struct platform_driver::remove_new(). As
.remove() and .remove_new() have the same prototypes, conversion is done
by just changing the structure member name in the driver initializer.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/36da477cb9fa0bffec32d50c2cf3d18e94a0e7e3.1727949050.git.u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After commit 0edb555a65d1 ("platform: Make platform_driver::remove()
return void") .remove() is (again) the right callback to implement for
platform drivers.
Convert all platform drivers below drivers/net/ethernet to use
.remove(), with the eventual goal to drop struct
platform_driver::remove_new(). As .remove() and .remove_new() have the
same prototypes, conversion is done by just changing the structure
member name in the driver initializer.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/18f7c585a1a8a8ac8b03a2fca7de19bd5c52ac2b.1727949050.git.u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The SF2 crossbar register is a packed bitfield, giving the index of the
external port selected for each of the internal ports. On BCM4908 (the
only currently-supported switch family with a crossbar), there are 2
internal ports and 3 external ports, so there are 2 bits per internal
port.
The driver currently conflates the "bits per port" and "number of ports"
concepts, lumping both into the `num_crossbar_int_ports` field. Since it
is currently only possible for either of these counts to have a value of
2, there is no behavioral error resulting from this situation for now.
Make the code more readable (and support the future possibility of
larger crossbars) by adding a `num_crossbar_ext_bits` field to represent
the "bits per port" count and relying on this where appropriate instead.
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241003212301.1339647-1-CFSworks@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
net: prepare pacing offload support
Some network devices have the ability to offload EDT (Earliest
Departure Time) which is the model used for TCP pacing and FQ
packet scheduler.
Some of them implement the timing wheel mechanism described in
https://saeed.github.io/files/carousel-sigcomm17.pdf
with an associated 'timing wheel horizon'.
In order to upstream the NIC support, this series adds :
1) timing wheel horizon as a per-device attribute.
2) FQ packet scheduler support, to let paced packets
below the timing wheel horizon be handled by the driver.
v1: https://lore.kernel.org/20240930152304.472767-2-edumazet@google.com
====================
Link: https://patch.msgid.link/20241003121219.2396589-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some network devices have the ability to offload EDT (Earliest
Departure Time) which is the model used for TCP pacing and FQ packet
scheduler.
Some of them implement the timing wheel mechanism described in
https://saeed.github.io/files/carousel-sigcomm17.pdf
with an associated 'timing wheel horizon'.
This patchs adds to FQ packet scheduler TCA_FQ_OFFLOAD_HORIZON
attribute.
Its value is capped by the device max_pacing_offload_horizon,
added in the prior patch.
It allows FQ to let packets within pacing offload horizon
to be delivered to the device, which will handle the needed
delay without host involvement.
Signed-off-by: Jeffrey Ji <jeffreyji@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241003121219.2396589-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some network devices have the ability to offload EDT (Earliest
Departure Time) which is the model used for TCP pacing and FQ
packet scheduler.
Some of them implement the timing wheel mechanism described in
https://saeed.github.io/files/carousel-sigcomm17.pdf
with an associated 'timing wheel horizon'.
This patch adds dev->max_pacing_offload_horizon expressing
this timing wheel horizon in nsec units.
This is a read-only attribute.
Unless a driver sets it, dev->max_pacing_offload_horizon
is zero.
v2: addressed Jakub feedback ( https://lore.kernel.org/netdev/20240930152304.472767-2-edumazet@google.com/T/#mf6294d714c41cc459962154cc2580ce3c9693663 )
v3: added yaml doc (also per Jakub feedback)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241003121219.2396589-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With the inclusion of commit c259acab839e ("ptp/ioctl: support
MONOTONIC{,_RAW} timestamps for PTP_SYS_OFFSET_EXTENDED") clock_gettime()
now allows retrieval of pre/post timestamps for CLOCK_MONOTONIC and
CLOCK_MONOTONIC_RAW timebases along with the previously supported
CLOCK_REALTIME.
This patch adds a command line option 'y' to the testptp program to
choose one of the allowed timebases [realtime aka system, monotonic,
and monotonic-raw).
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://patch.msgid.link/20241003101506.769418-1-maheshb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
tcp: add fast path in timer handlers
As mentioned in Netconf 2024:
TCP retransmit and delack timers are not stopped from
inet_csk_clear_xmit_timer() because we do not define
INET_CSK_CLEAR_TIMERS.
Enabling INET_CSK_CLEAR_TIMERS leads to lower performance,
mainly because del_timer() and mod_timer() happen from
different cpus quite often.
What we can do instead is to add fast paths to tcp_write_timer()
and tcp_delack_timer() to avoid socket spinlock acquisition.
====================
Link: https://patch.msgid.link/20241002173042.917928-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
delack timer is not stopped from inet_csk_clear_xmit_timer()
because we do not define INET_CSK_CLEAR_TIMERS.
This is a conscious choice : inet_csk_clear_xmit_timer()
is often called from another cpu. Calling del_timer()
would cause false sharing and lock contention.
This means that very often, tcp_delack_timer() is called
at the timer expiration, while there is no ACK to transmit.
This can be detected very early, avoiding the socket spinlock.
Notes:
- test about tp->compressed_ack is racy,
but in the unlikely case there is a race, the dedicated
compressed_ack_timer hrtimer would close it.
- Even if the fast path is not taken, reading
icsk->icsk_ack.pending and tp->compressed_ack
before acquiring the socket spinlock reduces
acquisition time and chances of contention.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241002173042.917928-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
retransmit timer is not stopped from inet_csk_clear_xmit_timer()
because we do not define INET_CSK_CLEAR_TIMERS.
This is a conscious choice : for active TCP flows, it is better
to only call mod_timer(), because there is more chances of
keeping the timer unchanged. Also inet_csk_clear_xmit_timer()
is often called from another cpu, and calling del_timer()
would cause false sharing and lock contention.
This means that very often, tcp_write_timer() is called
at the timer expiration, while there is nothing to retransmit.
This can be detected very early, avoiding the socket spinlock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241002173042.917928-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
icsk->icsk_pending can be read locklessly already.
Following patch in the series will add another lockless read.
Add smp_load_acquire() and smp_store_release() annotations
because following patch will add a test in tcp_write_timer(),
and READ_ONCE()/WRITE_ONCE() alone would possibly lead to races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241002173042.917928-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Justin Iurman says:
====================
selftests: net: ioam: add tunsrc support
TL;DR This patch comes from a discussion we had with Jakub and Paolo on
aligning the ioam selftests with its new "tunsrc" feature.
This patch updates the IOAM selftests to support the new "tunsrc"
feature of IOAM. As a consequence, some changes were required. For
example, the IPv6 header must be accessed to check some fields (i.e.,
the source address for the "tunsrc" feature), which is not possible
AFAIK with IPv6 raw sockets. The latter is currently used with
IPV6_RECVHOPOPTS and was introduced by commit 187bbb6968af ("selftests:
ioam: refactoring to align with the fix") to fix an issue. But, we
really need packet sockets actually... which is one of the changes in
this patch (see the description of the topology at the top of ioam6.sh
for explanations). Another change is that all IPv6 addresses used in the
topology are now based on the documentation prefix (2001:db8::/32).
Also, the tests have been improved and there are now many more of them.
Overall, the script is more robust.
====================
Link: https://patch.msgid.link/20241002162731.19847-1-justin.iurman@uliege.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch re-adds the (updated) ioam selftests with support for the
tunsrc feature.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20241002162731.19847-3-justin.iurman@uliege.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch entirely removes the ioam selftests to prepare for the next
patch in this series, which re-adds the new ioam selftests for better
readability.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20241002162731.19847-2-justin.iurman@uliege.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This adds control over the hardware LEDs in the Marvell
MV88E6xxx DSA switch and enables it for MV88E6352.
This fixes an imminent problem on the Inteno XG6846 which
has a WAN LED that simply do not work with hardware
defaults: driver amendment is necessary.
The patch is modeled after Christian Marangis LED support
code for the QCA8k DSA switch, I got help with the register
definitions from Tim Harvey.
After this patch it is possible to activate hardware link
indication like this (or with a similar script):
cd /sys/class/leds/Marvell\ 88E6352:05:00:green:wan/
echo netdev > trigger
echo 1 > link
This makes the green link indicator come up on any link
speed. It is also possible to be more elaborate, like this:
cd /sys/class/leds/Marvell\ 88E6352:05:00:green:wan/
echo netdev > trigger
echo 1 > link_1000
cd /sys/class/leds/Marvell\ 88E6352:05:01:amber:wan/
echo netdev > trigger
echo 1 > link_100
Making the green LED come on for a gigabit link and the
amber LED come on for a 100 mbit link.
Each port has 2 LED slots (the hardware may use just one or
none) and the hardware triggers are specified in four bits per
LED, and some of the hardware triggers are only available on the
SFP (fiber) uplink. The restrictions are described in the
port.h header file where the registers are described. For
example, selector 1 set for LED 1 on port 5 or 6 will indicate
Fiber 1000 (gigabit) and activity with a blinking LED, but
ONLY for an SFP connection. If port 5/6 is used with something
not SFP, this selector is a noop: something else need to be
selected.
After the previous series rewriting the MV88E6xxx DT
bindings to use YAML a "leds" subnode is already valid
for each port, in my scratch device tree it looks like
this:
leds {
#address-cells = <1>;
#size-cells = <0>;
led@0 {
reg = <0>;
color = <LED_COLOR_ID_GREEN>;
function = LED_FUNCTION_LAN;
default-state = "off";
linux,default-trigger = "netdev";
};
led@1 {
reg = <1>;
color = <LED_COLOR_ID_AMBER>;
function = LED_FUNCTION_LAN;
default-state = "off";
};
};
This DT config is not yet configuring everything: when the netdev
default trigger is assigned the hw acceleration callbacks are
not called, and there is no way to set the netdev sub-trigger
type (such as link_1000) from the device tree, such as if you want
a gigabit link indicator. This has to be done from userspace at
this point.
We add LED operations to all switches in the 6352 family:
6172, 6176, 6240 and 6352.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241001-mv88e6xxx-leds-v4-1-cc11c4f49b18@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Wesley reported an issue:
==================================================================
EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks
------------[ cut here ]------------
kernel BUG at fs/ext4/resize.c:324!
CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27
RIP: 0010:ext4_resize_fs+0x1212/0x12d0
Call Trace:
__ext4_ioctl+0x4e0/0x1800
ext4_ioctl+0x12/0x20
__x64_sys_ioctl+0x99/0xd0
x64_sys_call+0x1206/0x20d0
do_syscall_64+0x72/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
==================================================================
While reviewing the patch, Honza found that when adjusting resize_bg in
alloc_flex_gd(), it was possible for flex_gd->resize_bg to be bigger than
flexbg_size.
The reproduction of the problem requires the following:
o_group = flexbg_size * 2 * n;
o_size = (o_group + 1) * group_size;
n_group: [o_group + flexbg_size, o_group + flexbg_size * 2)
o_size = (n_group + 1) * group_size;
Take n=0,flexbg_size=16 as an example:
last:15
|o---------------|--------------n-|
o_group:0 resize to n_group:30
The corresponding reproducer is:
img=test.img
rm -f $img
truncate -s 600M $img
mkfs.ext4 -F $img -b 1024 -G 16 8M
dev=`losetup -f --show $img`
mkdir -p /tmp/test
mount $dev /tmp/test
resize2fs $dev 248M
Delete the problematic plus 1 to fix the issue, and add a WARN_ON_ONCE()
to prevent the issue from happening again.
[ Note: another reproucer which this commit fixes is:
img=test.img
rm -f $img
truncate -s 25MiB $img
mkfs.ext4 -b 4096 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $img
truncate -s 3GiB $img
dev=`losetup -f --show $img`
mkdir -p /tmp/test
mount $dev /tmp/test
resize2fs $dev 3G
umount $dev
losetup -d $dev
-- TYT ]
Reported-by: Wesley Hershberger <wesley.hershberger@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231
Reported-by: Stéphane Graber <stgraber@stgraber.org>
Closes: https://lore.kernel.org/all/20240925143325.518508-1-aleksandr.mikhalitsyn@canonical.com/
Tested-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Tested-by: Eric Sandeen <sandeen@redhat.com>
Fixes: 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()")
Cc: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240927133329.1015041-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result
in a fast-commit being done before the filesystem is effectively marked as
ineligible. This patch moves the call to this function so that an handle
can be used. If a transaction fails to start, then there's not point in
trying to mark the filesystem as ineligible, and an error will eventually be
returned to user-space.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240923104909.18342-3-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result
in a fast-commit being done before the filesystem is effectively marked as
ineligible. This patch fixes the calls to this function in
__track_dentry_update() by adding an extra parameter to the callback used in
ext4_fc_track_template().
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240923104909.18342-2-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
Current code allocates the pcpu_sum array with size num_possible_cpus().
This code assumes the cpu_possible_mask is dense, which is not true in
the general case per [1]. If cpu_possible_mask is sparse, the array
might be indexed by a value beyond the size of the array.
However, the configurations that Hyper-V provides to guest VMs on x86
and ARM64 hardware, in combination with how architecture specific code
assigns Linux CPU numbers, *does* always produce a dense cpu_possible_mask.
So the dense assumption is not currently causing failures. But for
robustness against future changes in how cpu_possible_mask is populated,
update the code to no longer assume dense.
The correct approach is to allocate and initialize the array using size
"nr_cpu_ids". While this leaves unused array entries corresponding to
holes in cpu_possible_mask, the holes are assumed to be minimal and hence
the amount of memory wasted by unused entries is minimal.
[1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20241003035333.49261-6-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This warning is emitted when a driver does not default populate an rss
key when one is not provided from userspace. Some devices do not
support individual rss keys per context. For these devices, it is ok
to leave the key zeroed out in ethtool_rxfh_context. Do not warn on
zeroed key when ethtool_ops.rxfh_per_ctx_key == 0.
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20241003162310.1310576-1-daniel.zahka@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-10-01 (ice)
This series contains updates to ice driver only.
Karol cleans up current PTP GPIO pin handling, fixes minor bugs,
refactors implementation for all products, introduces SDP (Software
Definable Pins) for E825C and implements reading SDP section from NVM
for E810 products.
Sergey replaces multiple aux buses and devices used in the PTP support
code with struct ice_adapter holding the necessary shared data.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Drop auxbus use for PTP to finalize ice_adapter move
ice: Use ice_adapter for PTP shared data instead of auxdev
ice: Initial support for E825C hardware in ice_adapter
ice: Add ice_get_ctrl_ptp() wrapper to simplify the code
ice: Introduce ice_get_phy_model() wrapper
ice: Enable 1PPS out from CGU for E825C products
ice: Read SDP section from NVM for pin definitions
ice: Disable shared pin on E810 on setfunc
ice: Cache perout/extts requests and check flags
ice: Align E810T GPIO to other products
ice: Add SDPs support for E825C
ice: Implement ice_ptp_pin_desc
====================
Link: https://patch.msgid.link/20241001201702.3252954-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"A couple of build/config issues and expanding the speculative SSBS
workaround to more CPUs:
- Expand the speculative SSBS workaround to cover Cortex-A715,
Neoverse-N3 and Microsoft Azure Cobalt 100
- Force position-independent veneers - in some kernel configurations,
the LLD linker generates position-dependent veneers for otherwise
position-independent code, resulting in early boot-time failures
- Fix Kconfig selection of HAVE_DYNAMIC_FTRACE_WITH_ARGS so that it
is not enabled when not supported by the combination of clang and
GNU ld"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Subscribe Microsoft Azure Cobalt 100 to erratum 3194386
arm64: fix selection of HAVE_DYNAMIC_FTRACE_WITH_ARGS
arm64: errata: Expand speculative SSBS workaround once more
arm64: cputype: Add Neoverse-N3 definitions
arm64: Force position-independent veneers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- PERF_TYPE_BREAKPOINT now returns -EOPNOTSUPP instead of -ENOENT,
which aligns to other ports and is a saner value
- The KASAN-related stack size increasing logic has been moved to a C
header, to avoid dependency issues
* tag 'riscv-for-linus-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix kernel stack size when KASAN is enabled
drivers/perf: riscv: Align errno for unsupported perf event
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix tp_printk command line option crashing the kernel
With the code that can handle a buffer from a previous boot, the
trace_check_vprintf() needed access to the delta of the address space
used by the old buffer and the current buffer. To do so, the
trace_array (tr) parameter was used. But when tp_printk is enabled on
the kernel command line, no trace buffer is used and the trace event
is sent directly to printk(). That meant the tr field of the iterator
descriptor was NULL, and since tp_printk still uses
trace_check_vprintf() it caused a NULL dereference.
- Add ptrace.h include to x86 ftrace file for completeness
- Fix rtla installation when done with out-of-tree build
- Fix the help messages in rtla that were incorrect
- Several fixes to fix races with the timerlat and hwlat code
Several locking issues were discovered with the coordination between
timerlat kthread creation and hotplug. As timerlat has callbacks from
hotplug code to start kthreads when CPUs come online. There are also
locking issues with grabbing the cpu_read_lock() and the locks within
timerlat.
* tag 'trace-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/hwlat: Fix a race during cpuhp processing
tracing/timerlat: Fix a race during cpuhp processing
tracing/timerlat: Drop interface_lock in stop_kthread()
tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline
x86/ftrace: Include <asm/ptrace.h>
rtla: Fix the help text in osnoise and timerlat top tools
tools/rtla: Fix installation from out-of-tree build
tracing: Fix trace_check_vprintf() when tp_printk is used
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
"Fixes for issues introduced in this merge window: kobject memory leak,
unsupressed warning and possible lockup in new slub_kunit tests,
misleading code in kvfree_rcu_queue_batch()"
* tag 'slab-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
mm, slab: suppress warnings in test_leak_destroy kunit test
rcu/kvfree: Refactor kvfree_rcu_queue_batch()
mm, slab: fix use of SLAB_SUPPORTS_SYSFS in kmem_cache_release()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix up the ACPI IRQ override quirk list and add two new entries
to it, add a new quirk to the ACPI backlight (video) driver, and fix
the ACPI battery driver.
Specifics:
- Add a quirk for Dell OptiPlex 5480 AIO to the ACPI backlight
(video) driver (Hans de Goede)
- Prevent the ACPI battery driver from crashing when unregistering a
battery hook and simplify battery hook locking in it (Armin Wolf)
- Fix up the ACPI IRQ override quirk list and add quirks for Asus
Vivobook X1704VAP and Asus ExpertBook B2502CVA to it (Hans de
Goede)"
* tag 'acpi-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: battery: Fix possible crash when unregistering a battery hook
ACPI: battery: Simplify battery hook locking
ACPI: video: Add backlight=native quirk for Dell OptiPlex 5480 AIO
ACPI: resource: Add Asus ExpertBook B2502CVA to irq1_level_low_skip_override[]
ACPI: resource: Add Asus Vivobook X1704VAP to irq1_level_low_skip_override[]
ACPI: resource: Loosen the Asus E1404GAB DMI match to also cover the E1404GA
ACPI: resource: Remove duplicate Asus E1504GAB IRQ override
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two cpufreq issues, one in the core and one in the
intel_pstate driver:
- Fix CPU device node reference counting in the cpufreq core (Miquel
Sabaté Solà)
- Turn the spinlock used by the intel_pstate driver in hard IRQ
context into a raw one to prevent the driver from crashing when
PREEMPT_RT is enabled (Uwe Kleine-König)"
* tag 'pm-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Avoid a bad reference count on CPU node
cpufreq: intel_pstate: Make hwp_notify_lock a raw spinlock
|
|
Vadim Fedorenko says:
====================
Add option to provide OPT_ID value via cmsg
SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
timestamps and packets sent via socket. Unfortunately, there is no way
to reliably predict socket timestamp ID value in case of error returned
by sendmsg. For UDP sockets it's impossible because of lockless
nature of UDP transmit, several threads may send packets in parallel. In
case of RAW sockets MSG_MORE option makes things complicated. More
details are in the conversation [1].
This patch adds new control message type to give user-space
software an opportunity to control the mapping between packets and
values by providing ID with each sendmsg.
The first patch in the series adds all needed definitions and implements
the function for UDP sockets. The explicit check of socket's type is not
added because subsequent patches in the series will add support for other
types of sockets. The documentation is also included into the first
patch.
Patch 2/4 adds support for TCP sockets. This part is simple and straight
forward.
Patch 3/4 adds support for RAW sockets. It's a bit tricky because
sock_tx_timestamp functions has to be refactored to receive full socket
cookie information to fill in ID. The commit b534dc46c8ae ("net_tstamp:
add SOF_TIMESTAMPING_OPT_ID_TCP") did the conversion of sk_tsflags to
u32 but sock_tx_timestamp functions were not converted and still receive
16b flags. It wasn't a problem because SOF_TIMESTAMPING_OPT_ID_TCP was
not checked in these functions, that's why no backporting is needed.
Patch 4/4 adds selftests for new feature.
[1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
====================
Link: https://patch.msgid.link/20241001125716.2832769-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend txtimestamp test to run with fixed tskey using
SCM_TS_OPT_ID control message for all types of sockets.
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241001125716.2832769-4-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The last type of sockets which supports SOF_TIMESTAMPING_OPT_ID is RAW
sockets. To add new option this patch converts all callers (direct and
indirect) of _sock_tx_timestamp to provide sockcm_cookie instead of
tsflags. And while here fix __sock_tx_timestamp to receive tsflags as
__u32 instead of __u16.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241001125716.2832769-3-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
timestamps and packets sent via socket. Unfortunately, there is no way
to reliably predict socket timestamp ID value in case of error returned
by sendmsg. For UDP sockets it's impossible because of lockless
nature of UDP transmit, several threads may send packets in parallel. In
case of RAW sockets MSG_MORE option makes things complicated. More
details are in the conversation [1].
This patch adds new control message type to give user-space
software an opportunity to control the mapping between packets and
values by providing ID with each sendmsg for UDP sockets.
The documentation is also added in this patch.
[1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241001125716.2832769-2-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix a potential NULL-pointer dereference in gpiolib core
- fix a probe() regression from the v6.12 merge window and an older bug
leading to missed interrupts in gpio-davinci
* tag 'gpio-fixes-for-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Fix potential NULL pointer dereference in gpiod_get_label()
gpio: davinci: Fix condition for irqchip registration
gpio: davinci: fix lazy disable
|
|
Tariq Toukan says:
====================
net/mlx5: hw counters refactor
This is a patchset re-post, see:
https://lore.kernel.org/20240815054656.2210494-7-tariqt@nvidia.com
In this patchset, Cosmin refactors hw counters and solves perf scaling
issue.
Series generated against:
commit c824deb1a897 ("cxgb4: clip_tbl: Fix spelling mistake "wont" -> "won't"")
HW counters are central to mlx5 driver operations. They are hardware
objects created and used alongside most steering operations, and queried
from a variety of places. Most counters are queried in bulk from a
periodic task in fs_counters.c.
Counter performance is important and as such, a variety of improvements
have been done over the years. Currently, counters are allocated from
pools, which are bulk allocated to amortize the cost of firmware
commands. Counters are managed through an IDR, a doubly linked list and
two atomic single linked lists. Adding/removing counters is a complex
dance between user contexts requesting it and the mlx5_fc_stats_work
task which does most of the work.
Under high load (e.g. from connection tracking flow insertion/deletion),
the counter code becomes a bottleneck, as seen on flame graphs. Whenever
a counter is deleted, it gets added to a list and the wq task is
scheduled to run immediately to actually delete it. This is done via
mod_delayed_work which uses an internal spinlock. In some tests, waiting
for this spinlock took up to 66% of all samples.
This series refactors the counter code to use a more straight-forward
approach, avoiding the mod_delayed_work problem and making the code
easier to understand. For that:
- patch #1 moves counters data structs to a more appropriate place.
- patch #2 simplifies the bulk query allocation scheme by using vmalloc.
- patch #3 replaces the IDR+3 lists with an xarray. This is the main
patch of the series, solving the spinlock congestion issue.
- patch #4 removes an unnecessary cacheline alignment causing a lot of
memory to be wasted.
- patches #5 and #6 are small cleanups enabled by the refactoring.
====================
Link: https://patch.msgid.link/20241001103709.58127-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It no longer serves any purpose and is identical to mlx5_fc_create upon
which it was originally based of.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241001103709.58127-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
num_counters is only used for deciding whether to grow the bulk query
buffer, which is done once more counters than a small initial threshold
are present. After that, maintaining num_counters serves no purpose.
This commit replaces that with an actual xarray traversal to count the
counters. This appears expensive at first sight, but is only done when
the number of counters is less than the initial threshold (8) and only
once every sampling interval. Once the number of counters goes above the
threshold, the bulk query buffer is grown to max size and the xarray
traversal is never done again.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241001103709.58127-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The mlx5_fc struct has a cache for values queried from hw, which is
cacheline aligned. On x86_64, this results in:
struct mlx5_fc {
u32 id; /* 0 4 */
bool aging; /* 4 1 */
/* XXX 3 bytes hole, try to pack */
struct mlx5_fc_bulk * bulk; /* 8 8 */
/* XXX 48 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
struct mlx5_fc_cache cache __attribute__((__aligned__(64)));
/* 64 24 */
u64 lastpackets; /* 88 8 */
u64 lastbytes; /* 96 8 */
/* size: 128, cachelines: 2, members: 6 */
/* sum members: 53, holes: 2, sum holes: 51 */
/* padding: 24 */
/* forced aligns: 1, forced holes: 1, sum forced holes: 48 */
} __attribute__((__aligned__(64)));
(output from pahole).
...So a 48+24=72 byte waste. As far as I can determine, this serves no
purpose other than maybe making sure that the values in the cache do not
span two cachelines in the worst case scenario, but that's not a valid
enough reason to waste 72 bytes per counter, especially since this code
is not performance-critical. There could potentially be hundreds of
thousands of counters (e.g. for connection-tracking), so this quickly
adds up to multiple MB wasted.
This commit removes the alignment, resulting in:
struct mlx5_fc {
[...]
/* size: 56, cachelines: 1, members: 6 */
/* sum members: 53, holes: 1, sum holes: 3 */
/* last cacheline: 56 bytes */
};
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241001103709.58127-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Previously, managing counters was a complicated affair involving an IDR,
a sorted double linked list, two single linked lists and a complex dance
between a non-periodic wq task and users adding/deleting counters.
Adding was done by inserting new counters into the IDR and into a single
linked list, leaving the wq to process the list and actually add the
counters into the double linked list, maintained sorted with the IDR.
Deleting involved adding the counter into another single linked list,
leaving the wq to actually unlink the counter from the other structures
and release it.
Dumping the counters is done with the bulk query API, which relies on
the counter list being sorted and unmutable during querying to
efficiently retrieve cached counter values.
Finally, the IDR data struct is deprecated.
This commit replaces all of that with an xarray.
Adding is now done directly, by using xa_lock.
Deleting is also done directly, under the xa_lock.
Querying is done from a periodic task running every sampling_interval
(default 1s) and uses the bulk query API for efficiency.
It works by iterating over the xarray:
- when a new bulk needs to be started, the bulk information is computed
under the xa_lock.
- the xa iteration state is saved and the xa_lock dropped.
- the HW is queried for bulk counter values.
- the xa_lock is reacquired.
- counter caches with ids covered by the bulk response are updated.
Querying always requests the max bulk length, for simplicity.
Counters could be added/deleted while the HW is queried. This is safe,
as the HW API simply returns unknown values for counters not in HW, but
those values won't be accessed. Only counters present in xarray before
bulk query will actually read queried cache values.
This cuts down the size of mlx5_fc by 4 pointers (88->56 bytes), which
amounts to ~3MB / 100K counters.
But more importantly, this solves the wq spinlock congestion issue seen
happening on high-rate counter insertion+deletion.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241001103709.58127-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The bulk query buffer starts out small (see [1]) and as soon as the
number of counters goes past the initial threshold grows to max
size (32K entries, 512KB) with a retry scheme.
This commit switches to using kvmalloc for the buffer, which has a near
zero likelihood of failing, and thus the explicit retry scheme becomes
superfluous and is taken out. On the low chance the allocation fails, it
will still be retried every sampling_interval, when the wq task runs.
[1] commit b247f32aecad ("net/mlx5: Dynamically resize flow counters
query buffer")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241001103709.58127-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The mlx5_fc_stats and mlx5_fc_pool structs are only used from
fs_counters.c. As such, make them private there.
mlx5_fc_pool is not used or referenced at all outside fs_counters.
mlx5_fc_stats is referenced from mlx5_core_dev, so instead of having it
as a direct member (which requires exporting it from fs_counters), store
a pointer to it, allocate it on init and clear it on destroy.
One caveat is that a simple container_of to get from a 'work' struct to
the outermost mlx5_core_dev struct directly no longer works, so an extra
pointer had to be added to mlx5_fc_stats back to the parent dev.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241001103709.58127-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Slightly high amount of changes in this round, partly because of my
vacation in the last weeks. But all changes are small and nothing
looks worrisome.
The biggest LOCs is MAINTAINERS updates, and there is a core change
for card-ID string creation for non-ASCII inputs. Others are rather
device-specific, such as new quirks and device IDs for ASoC, usual
HD-audio and USB-audio quirks and fixes, as well as regression fixes
in HD-audio HDMI audio and Conexant codec"
* tag 'sound-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (39 commits)
ALSA: hda/conexant: Fix conflicting quirk for System76 Pangolin
ALSA: line6: add hw monitor volume control to POD HD500X
ALSA: gus: Fix some error handling paths related to get_bpos() usage
ALSA: hda: Add missing parameter description for snd_hdac_stream_timecounter_init()
ALSA: usb-audio: Add native DSD support for Luxman D-08u
ALSA: core: add isascii() check to card ID generator
MAINTAINERS: ALSA: use linux-sound@vger.kernel.org list
Revert "ALSA: hda: Conditionally use snooping for AMD HDMI"
ASoC: intel: sof_sdw: Add check devm_kasprintf() returned value
ASoC: imx-card: Set card.owner to avoid a warning calltrace if SND=m
ASoC: dt-bindings: davinci-mcasp: Fix interrupts property
ASoC: qcom: sm8250: add qrb4210-rb2-sndcard compatible string
ASoC: dt-bindings: qcom,sm8250: add qrb4210-rb2-sndcard
ALSA: hda: fix trigger_tstamp_latched
ALSA: hda/realtek: Add a quirk for HP Pavilion 15z-ec200
ALSA: hda/generic: Drop obsoleted obey_preferred_dacs flag
ALSA: hda/generic: Unconditionally prefer preferred_dacs pairs
ALSA: silence integer wrapping warning
ASoC: Intel: soc-acpi: arl: Fix some missing empty terminators
ASoC: Intel: soc-acpi-intel-rpl-match: add missing empty item
...
|