|
Fix a shutdown WARNING in bch2_dev_free caused by active write I/O
references (ca->io_ref[WRITE]) on a device being freed.
The problem occurs when:
- The filesystem is marked read-only (BCH_FS_rw clear in c->flags).
- A subsequent operation (e.g., error handling for device removal)
incorrectly tries to grant write references back to a device.
- During final shutdown, the read-only flag causes the system to skip
stopping write I/O references (bch2_dev_io_ref_stop(ca, WRITE)).
- The leftover active write reference triggers the WARN_ON in
bch2_dev_free.
Prevent this by checking if the filesystem is read-only before
attempting to grant write references to a device in the problematic
code path. Ensure consistency between the filesystem state flag
and the device I/O reference state during shutdown.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
One of my 'git grep' searches tripped on this file listing
an already removed <linux/list.h> primitive.
Refresh it.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Syzkaller detected a use-after-free issue in ext4_insert_dentry that was
caused by out-of-bounds access due to incorrect splitting in do_split.
BUG: KASAN: use-after-free in ext4_insert_dentry+0x36a/0x6d0 fs/ext4/namei.c:2109
Write of size 251 at addr ffff888074572f14 by task syz-executor335/5847
CPU: 0 UID: 0 PID: 5847 Comm: syz-executor335 Not tainted 6.12.0-rc6-syzkaller-00318-ga9cda7c0ffed #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
__asan_memcpy+0x40/0x70 mm/kasan/shadow.c:106
ext4_insert_dentry+0x36a/0x6d0 fs/ext4/namei.c:2109
add_dirent_to_buf+0x3d9/0x750 fs/ext4/namei.c:2154
make_indexed_dir+0xf98/0x1600 fs/ext4/namei.c:2351
ext4_add_entry+0x222a/0x25d0 fs/ext4/namei.c:2455
ext4_add_nondir+0x8d/0x290 fs/ext4/namei.c:2796
ext4_symlink+0x920/0xb50 fs/ext4/namei.c:3431
vfs_symlink+0x137/0x2e0 fs/namei.c:4615
do_symlinkat+0x222/0x3a0 fs/namei.c:4641
__do_sys_symlink fs/namei.c:4662 [inline]
__se_sys_symlink fs/namei.c:4660 [inline]
__x64_sys_symlink+0x7a/0x90 fs/namei.c:4660
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
The following loop is located right above the 'if' statement.
	for (i = count-1; i >= 0; i--) {
		/* is more than half of this entry in 2nd half of the block? */
		if (size + map[i].size/2 > blocksize/2)
			break;
		size += map[i].size;
		move++;
	}
'i' in this case could go down to -1, in which case the sum of active
entries would not exceed half the block size. However, the previous
behaviour would also split in half when the sum was exceeded only at the
very last entry, which, with too many long file names in a single block,
could lead to an out-of-bounds access and a subsequent use-after-free.
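The split-point choice discussed above can be sketched in user space. This is not the kernel's do_split(): choose_split(), the plain sizes[] array and the "i >= 0" test that selects between the two results are assumptions made for illustration.

```c
#include <assert.h>

/*
 * Illustrative only: pick the index at which to split a directory block.
 * sizes[] plays the role of map[i].size in the loop quoted above.
 */
int choose_split(const unsigned int *sizes, int count, unsigned int blocksize)
{
	unsigned int size = 0;
	int move = 0;
	int i;

	assert(sizes && count > 0);

	for (i = count - 1; i >= 0; i--) {
		/* is more than half of this entry in 2nd half of the block? */
		if (size + sizes[i] / 2 > blocksize / 2)
			break;
		size += sizes[i];
		move++;
	}

	/*
	 * Assumed fix: if the loop stopped at any i >= 0 (including i == 0),
	 * split where it stopped. Only when i ran down to -1 did the entries
	 * never exceed half the block, so splitting by count is safe.
	 */
	if (i >= 0)
		return count - move;
	return count / 2;
}
```

With one long entry at index 0, the sketch splits after that entry instead of halving by count, which is the case the message describes as previously mishandled.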
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Cc: stable@vger.kernel.org
Fixes: 5872331b3d91 ("ext4: fix potential negative array index in do_split()")
Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250404082804.2567-3-a.sadovnikov@ispras.ru
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Block validity checks need to be skipped in case they are called
for journal blocks since they are part of system's protected
zone.
Currently, this is done by checking inode->ino against
sbi->s_es->s_journal_inum, which is a direct read from the ext4 sb
buffer head. If someone modifies this underneath us then the
s_journal_inum field might get corrupted. To protect against this,
change the check to directly compare the inode with journal->j_inode.
**Slight change in behavior**: During journal init path,
check_block_validity etc might be called for journal inode when
sbi->s_journal is not set yet. In this case we now proceed with
ext4_inode_block_valid() instead of returning early. Since system zones
have not been set yet, it is okay to proceed so we can perform basic
checks on the blocks.
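A minimal sketch of the pointer comparison described above, using hypothetical stand-in structures (toy_inode, toy_journal, toy_sb_info) rather than the real ext4 and jbd2 types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures named above. */
struct toy_inode { unsigned long ino; };
struct toy_journal { struct toy_inode *j_inode; };
struct toy_sb_info { struct toy_journal *s_journal; };

/*
 * True only when a journal is attached and the inode is the journal's own
 * in-memory inode. Nothing is read from the on-disk superblock buffer.
 */
bool is_journal_inode(const struct toy_sb_info *sbi,
		      const struct toy_inode *inode)
{
	assert(sbi && inode);
	return sbi->s_journal && sbi->s_journal->j_inode == inode;
}
```

While s_journal is still NULL the helper returns false, so a caller would fall through to the regular block checks, matching the behavior change noted above.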
Suggested-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/0c06bc9ebfcd6ccfed84a36e79147bf45ff5adc1.1743142920.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Use the `DEFINE_RAW_FLEX()` helper for an on-stack definition of
a flexible structure where the size of the flexible-array member
is known at compile-time, and refactor the rest of the code,
accordingly.
So, with these changes, fix the following warning:
fs/ext4/mballoc.c:3041:40: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
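The idea behind an on-stack flexible structure can be sketched in user space as follows. TOY_DEFINE_FLEX() is a simplified, hypothetical stand-in and not the kernel's DEFINE_RAW_FLEX(); struct id_list is likewise invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct id_list {
	unsigned int count;
	unsigned int ids[];	/* flexible-array member */
};

/*
 * Simplified stand-in: a union reserves room for the header plus n trailing
 * elements, and `name` points at the typed object inside that storage.
 */
#define TOY_DEFINE_FLEX(type, name, member, n)				\
	union {								\
		unsigned char bytes[offsetof(type, member) +		\
				    (n) * sizeof(((type *)0)->member[0])]; \
		type obj;						\
	} name##_u;							\
	type *name = &name##_u.obj

unsigned int sum_first_three(void)
{
	TOY_DEFINE_FLEX(struct id_list, list, ids, 3);
	unsigned int i, sum = 0;

	/* The byte array is exactly header + 3 elements. */
	assert(sizeof(list_u.bytes) ==
	       offsetof(struct id_list, ids) + 3 * sizeof(unsigned int));

	memset(&list_u, 0, sizeof(list_u));
	list->count = 3;
	for (i = 0; i < list->count; i++)
		list->ids[i] = i + 1;
	for (i = 0; i < list->count; i++)
		sum += list->ids[i];
	return sum;
}
```

Because the flexible structure is no longer followed by other members of an enclosing structure, this layout avoids the situation the warning complains about.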
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/Z-SF97N3AxcIMlSi@kspp
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Documentation and implementation of the ext4 super block have
slightly diverged: Padding has been removed in order to make room for
new fields that are still missing in the documentation.
Add the new fields s_encryption_level, s_first_error_errorcode,
s_last_error_errorcode to the documentation of the ext4 super block.
Fixes: f542fbe8d5e8 ("ext4 crypto: reserve codepoints used by the ext4 encryption feature")
Fixes: 878520ac45f9 ("ext4: save the error code which triggered an ext4_error() in the superblock")
Signed-off-by: Tom Vierjahn <tom.vierjahn@acm.org>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20250324221004.5268-1-tom.vierjahn@acm.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Hide get_vm_area() from MMUless builds
The function get_vm_area() is not defined when CONFIG_MMU is not
defined. Hide that function within #ifdef CONFIG_MMU.
- Fix output of synthetic events when they have dynamic strings
The print fmt of the synthetic event's format file used to have "%.*s"
for dynamic size strings even though the user space exported
arguments had only __get_str() macro that provided just a nul
terminated string. This was fixed so that user space could parse this
properly.
But the reason that it had "%.*s" was because internally it provided
the maximum size of the string as one of the arguments. The fix that
replaced "%.*s" with "%s" caused the trace output (when the kernel
reads the event) to write "(efault)" as it would now read the length
of the string as "%s".
As the string provided is always nul terminated, there's no reason
for the internal code to use "%.*s" anyway. Just remove the length
argument to match the "%s" that is now in the format.
- Fix the ftrace subops hash logic of the manager ops hash
The function_graph uses the ftrace subops code. The subops code is a
way to have a single ftrace_ops registered with ftrace to determine
what functions will call the ftrace_ops callback. More than one user
of function graph can register a ftrace_ops with it. The function
graph infrastructure will then add this ftrace_ops as a subops with
the main ftrace_ops it registers with ftrace. This is because the
functions will always call the function graph callback which in turn
calls the subops ftrace_ops callbacks.
The main ftrace_ops must add a callback to all the functions that the
subops want a callback from. When a subops is registered, it will
update the main ftrace_ops hash to include the functions it wants.
This is the logic that was broken.
The ftrace_ops hash has a "filter_hash" and a "notrace_hash" where
all the functions in the filter_hash but not in the notrace_hash are
attached by ftrace. The original logic would have the main ftrace_ops
filter_hash be a union of all the subops filter_hashes and the main
notrace_hash would be an intersection of all the subops filter hashes.
But this was incorrect because the notrace hash depends on the
filter_hash it is associated to and not the union of all
filter_hashes.
Instead, when a subops is added, just include all the functions of
the subops hash that are in its filter_hash but not in its
notrace_hash. The main subops hash should not use its notrace hash,
unless all of its subops hashes have an empty filter_hash (which
means to attach to all functions), and then, and only then, the main
ftrace_ops notrace hash can be the intersection of all the subops
hashes.
This not only fixes the bug, but also simplifies the code.
- Add a selftest to better test the subops filtering
Add a selftest that would catch the bug fixed by the above change.
- Fix extra newline printed in function tracing with retval
The function parameter code changed the output logic slightly and
called print_graph_retval() and also printed a newline. The
print_graph_retval() also prints a newline which caused blank lines
to be printed in the function graph tracer when retval was added.
This caused one of the selftests to fail if retvals were enabled.
Instead remove the new line output from print_graph_retval() and have
the callers always print the new line so that it doesn't have to do
special logic if it calls print_graph_retval() or not.
- Fix out-of-bound memory access in the runtime verifier
When rv_is_container_monitor() is called on the last entry of the
linked list, it references the next entry, which is the list head, and
causes an out-of-bound memory access.
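The subops hash merging rule from the list above can be modelled with bitmasks, one bit per traceable function. This is a toy model, not the ftrace implementation: struct toy_hash, toy_attached() and toy_merge() are hypothetical, and the handling of the mixed case (some subops with an empty filter, some without) is an assumption that simply attaches everywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_hash {
	bool     filter_all;	/* empty filter_hash: attach to all functions */
	uint64_t filter;	/* used only when !filter_all */
	uint64_t notrace;	/* functions to skip */
};

/* Functions a single hash wants callbacks from: filter minus notrace. */
uint64_t toy_attached(const struct toy_hash *h)
{
	uint64_t filter = h->filter_all ? UINT64_MAX : h->filter;

	return filter & ~h->notrace;
}

/* Build the manager hash from its subops. */
struct toy_hash toy_merge(const struct toy_hash *subops, size_t n)
{
	struct toy_hash mgr = { .filter_all = false, .filter = 0, .notrace = 0 };
	uint64_t common_notrace = UINT64_MAX;
	bool all_empty = true, any_empty = false;

	assert(subops && n > 0);

	for (size_t i = 0; i < n; i++) {
		if (subops[i].filter_all) {
			any_empty = true;
			common_notrace &= subops[i].notrace;
		} else {
			all_empty = false;
			/* add only what this subops really attaches to */
			mgr.filter |= toy_attached(&subops[i]);
		}
	}

	if (all_empty) {
		/* only now may the manager use a notrace hash */
		mgr.filter_all = true;
		mgr.filter = 0;
		mgr.notrace = common_notrace;
	} else if (any_empty) {
		/* assumption for the mixed case: attach to everything */
		mgr.filter_all = true;
		mgr.filter = 0;
	}
	return mgr;
}
```

In both pure cases the manager attaches to exactly the union of what each subops attaches to, which is the property the description above asks for.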
* tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv: Fix out-of-bound memory access in rv_is_container_monitor()
ftrace: Do not have print_graph_retval() add a newline
tracing/selftest: Add test to better test subops filtering of function graph
ftrace: Fix accounting of subop hashes
ftrace: Properly merge notrace hashes
tracing: Do not add length to print format in synthetic events
tracing: Hide get_vm_area() from MMUless builds
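The synthetic event item in the pull message above turns on how many arguments each format consumes. A small user-space illustration with snprintf(); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* "%.*s" consumes two arguments: an int precision, then the string. */
int fmt_with_len(char *buf, size_t size, const char *str, int max_len)
{
	assert(buf && str);
	return snprintf(buf, size, "%.*s", max_len, str);
}

/* "%s" consumes only the string, which must be nul terminated. */
int fmt_plain(char *buf, size_t size, const char *str)
{
	assert(buf && str);
	return snprintf(buf, size, "%s", str);
}
```

If a format is switched from "%.*s" to "%s" while the caller still passes the length first, the length is read where the string pointer is expected, which is why the argument list and the format have to change together.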
|
|
Pull bpf fixes from Alexei Starovoitov:
- Followup fixes for resilient spinlock (Kumar Kartikeya Dwivedi):
- Make res_spin_lock test less verbose, since it was spamming BPF
CI on failure, and make the check for AA deadlock stronger
- Fix rebasing mistake and use architecture provided
res_smp_cond_load_acquire
- Convert BPF maps (queue_stack and ringbuf) to resilient spinlock
to address long standing syzbot reports
- Make sure that classic BPF load instruction from SKF_[NET|LL]_OFF
offsets works when skb is fragmented (Willem de Bruijn)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Convert ringbuf map to rqspinlock
bpf: Convert queue_stack map to rqspinlock
bpf: Use architecture provided res_smp_cond_load_acquire
selftests/bpf: Make res_spin_lock AA test condition stronger
selftests/net: test sk_filter support for SKF_NET_OFF on frags
bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
selftests/bpf: Make res_spin_lock test less verbose
|
|
unreleased standalone Zen5 microcode patches
All Zen5 machines out there should get BIOS updates which update to the
correct microcode patches addressing the microcode signature issue.
However, silly people carve out random microcode blobs from BIOS
packages and think they are doing other people a service this way...
Block loading of any unreleased standalone Zen5 microcode patches.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Cc: Nikolay Borisov <nik.borisov@suse.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250410114222.32523-1-bp@kernel.org
|
|
When rv_is_container_monitor() is called on the last monitor in
rv_monitors_list, KASAN yells:
BUG: KASAN: global-out-of-bounds in rv_is_container_monitor+0x101/0x110
Read of size 8 at addr ffffffff97c7c798 by task setup/221
The buggy address belongs to the variable:
rv_monitors_list+0x18/0x40
This is because list_next_entry() is called on the last entry in the list.
It wraps around to the first list_head, and the first list_head is not
embedded in struct rv_monitor_def.
Fix it by checking if the monitor is last in the list.
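A user-space sketch of the "is this the last entry" check, using a minimal circular list. struct toy_monitor and its parent field are hypothetical stand-ins for struct rv_monitor_def, and the helpers are simplified versions of the <linux/list.h> primitives:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_monitor {
	const struct toy_monitor *parent;	/* hypothetical field */
	struct list_head list;
};

void toy_list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

void toy_list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* A monitor is a container if the next list entry names it as parent. */
bool toy_is_container(struct toy_monitor *mon, struct list_head *head)
{
	struct toy_monitor *next;

	assert(mon && head);

	/*
	 * On the last entry, ->next is the bare list head, which is not
	 * embedded in a struct toy_monitor, so it must not be converted.
	 */
	if (mon->list.next == head)
		return false;

	next = toy_container_of(mon->list.next, struct toy_monitor, list);
	return next->parent == mon;
}
```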
Cc: stable@vger.kernel.org
Cc: Gabriele Monaco <gmonaco@redhat.com>
Fixes: cb85c660fcd4 ("rv: Add option for nested monitors and include sched")
Link: https://lore.kernel.org/e85b5eeb7228bfc23b8d7d4ab5411472c54ae91b.1744355018.git.namcao@linutronix.de
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The retval and retaddr options for function_graph tracer will add a
comment at the end of a function for both leaf and non leaf functions that
looks like:
__wake_up_common(); /* ret=0x1 */
} /* pick_next_task_fair ret=0x0 */
The function print_graph_retval() adds a newline after the "*/". But if
that's not called, the caller function needs to make sure there's a
newline added.
This is confusing and when the function parameters code was added, it
added a newline even when calling print_graph_retval() as the fact that
the print_graph_retval() function prints a newline isn't obvious.
This caused an extra newline to be printed and that made it fail the
selftests when the retval option was set, as the selftests were not
expecting blank lines being injected into the trace.
Instead of having print_graph_retval() print a newline, just have the
caller always print the newline regardless if it calls print_graph_retval()
or not. This not only fixes this bug, but it also simplifies the code.
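The division of responsibility described above can be sketched with plain string formatting. toy_print_retval() and toy_print_leaf() are hypothetical names, not the tracer's functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Appends only the retval comment; it never emits the newline. */
static int toy_print_retval(char *buf, size_t size, unsigned long ret)
{
	return snprintf(buf, size, " /* ret=0x%lx */", ret);
}

/* The caller always terminates the line, with or without a retval. */
int toy_print_leaf(char *buf, size_t size, const char *func,
		   bool show_retval, unsigned long ret)
{
	int len;

	assert(buf && func && size > 0);

	len = snprintf(buf, size, "%s();", func);
	if (show_retval && len >= 0 && (size_t)len < size)
		len += toy_print_retval(buf + len, size - len, ret);
	if (len >= 0 && (size_t)len < size)
		len += snprintf(buf + len, size - len, "\n");
	return len;
}
```

With a single place that prints the newline, a caller cannot end up emitting two of them depending on which path it took.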
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250411133015.015ca393@gandalf.local.home
Reported-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/ccc40f2b-4b9e-4abd-8daf-d22fce2a86f0@sirena.org.uk/
Fixes: ff5c9c576e754 ("ftrace: Add support for function argument to graph tracer")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fixes from Uwe Kleine-König:
"A set of fixes for pwm core and various drivers
The first three patches handle clk_get_rate() returning 0 (which might
happen for example if the CCF is disabled). The first of these was
found because this triggered a warning with clang, the two others by
looking for similar issues in other drivers.
The remaining three fixes address issues in the new waveform pwm API.
Now that I worked on this a bit more, the finer details and corner
cases are better understood and the code is fixed accordingly"
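The clock-rate fixes mentioned in the message above guard a division against a rate of 0. A minimal sketch of that kind of guard; toy_ticks_to_ns() is a hypothetical helper and rate_hz stands in for whatever the clock API returned:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Convert a tick count to nanoseconds. A rate of 0 is reported as an
 * error instead of being used as a divisor.
 */
int toy_ticks_to_ns(uint64_t rate_hz, uint32_t ticks, uint64_t *ns)
{
	assert(ns);

	if (rate_hz == 0)
		return -EINVAL;

	/* ticks is 32 bit, so the product cannot overflow 64 bits */
	*ns = (uint64_t)ticks * NSEC_PER_SEC / rate_hz;
	return 0;
}
```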
* tag 'pwm/for-6.15-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: axi-pwmgen: Let .round_waveform_tohw() signal when request was rounded up
pwm: stm32: Search an appropriate duty_cycle if period cannot be modified
pwm: Let pwm_set_waveform() succeed even if lowlevel driver rounded up
pwm: fsl-ftm: Handle clk_get_rate() returning 0
pwm: rcar: Improve register calculation
pwm: mediatek: Prevent divide-by-zero in pwm_mediatek_config()
|
|
Jijie Shao says:
====================
There are some bugfixes for the hibmcge driver
v2: https://lore.kernel.org/20250403135311.545633-7-shaojijie@huawei.com/
v1: https://lore.kernel.org/20250402133905.895421-1-shaojijie@huawei.com/
====================
Link: https://patch.msgid.link/20250410021327.590362-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After detecting the np_link_fail exception,
the driver attempts to fix the exception by
using phy_stop() and phy_start() in the scheduled task.
However, hbg_fix_np_link_fail() and .ndo_stop()
may be concurrently executed. As a result,
phy_stop() is executed twice, and the following Calltrace occurs:
hibmcge 0000:84:00.2 enp132s0f2: Link is Down
hibmcge 0000:84:00.2: failed to link between MAC and PHY, try to fix...
------------[ cut here ]------------
called from state HALTED
WARNING: CPU: 71 PID: 23391 at drivers/net/phy/phy.c:1503 phy_stop...
...
pc : phy_stop+0x138/0x180
lr : phy_stop+0x138/0x180
sp : ffff8000c76bbd40
x29: ffff8000c76bbd40 x28: 0000000000000000 x27: 0000000000000000
x26: ffff2020047358c0 x25: ffff202004735940 x24: ffff20200000e405
x23: ffff2020060e5178 x22: ffff2020060e4000 x21: ffff2020060e49c0
x20: ffff2020060e5170 x19: ffff20202538e000 x18: 0000000000000020
x17: 0000000000000000 x16: ffffcede02e28f40 x15: ffffffffffffffff
x14: 0000000000000000 x13: 205d313933333254 x12: 5b5d393430303233
x11: ffffcede04555958 x10: ffffcede04495918 x9 : ffffcede0274fee0
x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 0000000000000001
x5 : 00000000002bffa8 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff20202e429480
Call trace:
phy_stop+0x138/0x180
hbg_fix_np_link_fail+0x4c/0x90 [hibmcge]
hbg_service_task+0xfc/0x148 [hibmcge]
process_one_work+0x180/0x398
worker_thread+0x210/0x328
kthread+0xe0/0xf0
ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
This patch adds the rtnl_lock to hbg_fix_np_link_fail()
to ensure that other operations are not performed concurrently.
In addition, np_link_fail exception can be fixed
only when the PHY link is up.
Fixes: e0306637e85d ("net: hibmcge: Add support for mac link exception handling feature")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410021327.590362-8-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MAC hardware supports receiving two types of
pause frames from link partner.
One is a pause frame with a destination address
of 01:80:C2:00:00:01.
The other is a pause frame whose destination address
is the address of the hibmcge driver.
01:80:C2:00:00:01 is supported by default.
In .ndo_set_mac_address(), the hibmcge driver calls
.hbg_hw_set_rx_pause_mac_addr() to set its mac address as the
destination address of the rx pause frame.
Therefore, pause frames with two types of MAC addresses can be received.
Currently, the rx pause addr is not restored after reset.
As a result, pause frames whose destination address is
the hibmcge driver address cannot be correctly received.
This patch restores the configuration by calling
.hbg_hw_set_rx_pause_mac_addr() after reset is complete.
Fixes: 3f5a61f6d504 ("net: hibmcge: Add reset supported in this module")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410021327.590362-7-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the debugfs file, the driver displays the np_link fail state
based on the HBG_NIC_STATE_NP_LINK_FAIL.
However, HBG_NIC_STATE_NP_LINK_FAIL is cleared in hbg_service_task(),
so the displayed value of np_link fail is always false.
This patch directly reads the related register to display the real state.
Fixes: e0306637e85d ("net: hibmcge: Add support for mac link exception handling feature")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410021327.590362-6-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A dbg log is generated when the driver modifies the MTU,
which is expected to trace the change of the MTU.
However, the log is recorded after WRITE_ONCE().
At this time, netdev->mtu has been changed to the new value.
As a result, netdev->mtu is the same as new_mtu.
This patch modifies the log location and records logs before WRITE_ONCE().
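The ordering issue described above can be shown in a few lines. struct toy_netdev and toy_change_mtu() are hypothetical; the real driver uses its own debug logging and WRITE_ONCE():

```c
#include <assert.h>
#include <stdio.h>

struct toy_netdev { unsigned int mtu; };

/* Record the transition first, then store the new value. */
int toy_change_mtu(struct toy_netdev *dev, unsigned int new_mtu,
		   char *log, size_t size)
{
	int len;

	assert(dev && log);

	/* dev->mtu still holds the old value at this point */
	len = snprintf(log, size, "change mtu from %u to %u",
		       dev->mtu, new_mtu);
	dev->mtu = new_mtu;	/* the kernel driver uses WRITE_ONCE() here */
	return len;
}
```

Swapping the two statements would make both numbers in the message identical, which is the symptom the commit describes.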
Fixes: ff4edac6e9bd ("net: hibmcge: Implement some .ndo functions")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410021327.590362-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
issue
hbg_irqs is a global array which contains irq statistics.
However, the irq statistics of different network ports
point to the same global array. As a result, the statistics are incorrect.
This patch allocates a statistics array for each network port
to prevent the statistics of different network ports
from affecting each other.
With the irq statistics removed from hbg_irq_info, the remaining
data in hbg_irq_info never changes. Therefore,
the input parameter of some functions is changed to const.
Fixes: 4d089035fa19 ("net: hibmcge: Add interrupt supported in this module")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410021327.590362-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver does not support multicast filtering,
so the mask must be set to 0xFFFFFFFF. Otherwise,
incorrect filtering occurs.
This patch fixes this problem.
Fixes: 37b367d60d0f ("net: hibmcge: Add unicast frame filter supported in this module")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410021327.590362-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver supports pause frames,
but does not pass pause frames based on rx pause enable configuration,
resulting in incorrect pause frame statistics.
like this:
mz eno3 '01 80 c2 00 00 01 00 18 2d 04 00 9c 88 08 00 01 ff ff' \
-p 64 -c 100
ethtool -S enp132s0f2 | grep -v ": 0"
NIC statistics:
rx_octets_total_filt_cnt: 6800
rx_filt_pkt_cnt: 100
The rx pause frames are filtered by the MAC hardware.
This patch configures the passing of pause frames based on the
rx pause enable status to ensure that
rx pause frames are not filtered.
mz eno3 '01 80 c2 00 00 01 00 18 2d 04 00 9c 88 08 00 01 ff ff' \
-p 64 -c 100
ethtool --include-statistics -a enp132s0f2
Pause parameters for enp132s0f2:
Autonegotiate: on
RX: on
TX: on
RX negotiated: on
TX negotiated: on
Statistics:
tx_pause_frames: 0
rx_pause_frames: 100
Fixes: 3a03763f3876 ("net: hibmcge: Add pauseparam supported in this module")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250410021327.590362-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reports a case of ethtool_ops->get_link being called without
ops lock:
ethtool_op_get_link+0x15/0x60 net/ethtool/ioctl.c:63
bond_check_dev_link+0x1fb/0x4b0 drivers/net/bonding/bond_main.c:864
bond_miimon_inspect drivers/net/bonding/bond_main.c:2734 [inline]
bond_mii_monitor+0x49d/0x3170 drivers/net/bonding/bond_main.c:2956
process_one_work kernel/workqueue.c:3238 [inline]
process_scheduled_works+0xac3/0x18e0 kernel/workqueue.c:3319
worker_thread+0x870/0xd50 kernel/workqueue.c:3400
kthread+0x7b7/0x940 kernel/kthread.c:464
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:153
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
Commit 04efcee6ef8d ("net: hold instance lock during NETDEV_CHANGE")
changed to lockless __linkwatch_sync_dev in ethtool_op_get_link.
All paths except bonding are coming via locked ioctl. Add necessary
locking to bonding.
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reported-by: syzbot+48c14f61594bdfadb086@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=48c14f61594bdfadb086
Fixes: 04efcee6ef8d ("net: hold instance lock during NETDEV_CHANGE")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250410161117.3519250-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Bind lookup runs under RCU, so ensure that a socket doesn't go away in
the middle of a lookup.
Fixes: 833ef3b91de6 ("mctp: Populate socket implementation")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://patch.msgid.link/20250410-mctp-rcu-sock-v1-1-872de9fdc877@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
rpl is passed as a pointer to ethtool_cmis_module_poll(), so the correct
size of rpl is sizeof(*rpl) which should be just 1 byte. Using the
pointer size instead can cause stack corruption:
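The difference between the two size expressions can be checked directly. The helper names below are hypothetical and only illustrate the sizeof behavior; they are not the ethtool functions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies one reply byte; sizeof(*rpl) is 1 whatever the pointer width. */
void toy_read_reply(const uint8_t *src, uint8_t *rpl)
{
	assert(src && rpl);
	memcpy(rpl, src, sizeof(*rpl));
}

/* What the buggy expression evaluates to: the size of the pointer itself. */
size_t toy_buggy_len(const uint8_t *rpl)
{
	return sizeof(rpl);
}
```

On a 64-bit build the buggy length is 8, so a copy of that size into a one-byte variable would overwrite seven adjacent bytes of the stack.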
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ethtool_cmis_wait_for_cond+0xf4/0x100
CPU: 72 UID: 0 PID: 4440 Comm: kworker/72:2 Kdump: loaded Tainted: G OE 6.11.0 #24
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Dell Inc. PowerEdge R760/04GWWM, BIOS 1.6.6 09/20/2023
Workqueue: events module_flash_fw_work
Call Trace:
<TASK>
panic+0x339/0x360
? ethtool_cmis_wait_for_cond+0xf4/0x100
? __pfx_status_success+0x10/0x10
? __pfx_status_fail+0x10/0x10
__stack_chk_fail+0x10/0x10
ethtool_cmis_wait_for_cond+0xf4/0x100
ethtool_cmis_cdb_execute_cmd+0x1fc/0x330
? __pfx_status_fail+0x10/0x10
cmis_cdb_module_features_get+0x6d/0xd0
ethtool_cmis_cdb_init+0x8a/0xd0
ethtool_cmis_fw_update+0x46/0x1d0
module_flash_fw_work+0x17/0xa0
process_one_work+0x179/0x390
worker_thread+0x239/0x340
? __pfx_worker_thread+0x10/0x10
kthread+0xcc/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2d/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Fixes: a39c84d79625 ("ethtool: cmis_cdb: Add a layer for supporting CDB commands")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250409173312.733012-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The memory allocated for intr_ctrl_regset, which is passed to
debugfs_create_regset32() may not be cleaned up when the driver is
removed. Fix that by using device managed allocation for it.
Fixes: 45d76f492938 ("pds_core: set up device and adminq")
Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250409054450.48606-1-abdun.nihaal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Disable hash request chaining in case a driver that copies an
ahash_request object by hand accidentally triggers chaining.
Reported-by: Manorit Chawdhry <m-chawdhry@ti.com>
Fixes: f2ffe5a9183d ("crypto: hash - Add request chaining API")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Manorit Chawdhry <m-chawdhry@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
In order to use scomp_free_streams to free the partially allocated
streams in the allocation error path, move the alg->stream assignment
to the beginning. Also check for error pointers in scomp_free_streams
before freeing the ctx.
Finally set alg->stream to NULL to not break subsequent attempts
to allocate the streams.
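The error-path pattern described above can be sketched in user space. struct toy_alg, toy_alloc_streams() and the fail_at failure-injection parameter are hypothetical, and a NULL check stands in for the kernel's error-pointer check:

```c
#include <assert.h>
#include <stdlib.h>

struct toy_stream { void *ctx; };
struct toy_alg { struct toy_stream *stream; size_t nr; };

/* Safe on a partially filled array: unset slots are NULL and skipped. */
void toy_free_streams(struct toy_alg *alg)
{
	size_t i;

	if (!alg->stream)
		return;
	for (i = 0; i < alg->nr; i++)
		free(alg->stream[i].ctx);	/* free(NULL) is a no-op */
	free(alg->stream);
	alg->stream = NULL;	/* lets a later allocation attempt start clean */
	alg->nr = 0;
}

/* fail_at injects a failure at that index; pass nr to disable it. */
int toy_alloc_streams(struct toy_alg *alg, size_t nr, size_t fail_at)
{
	size_t i;

	assert(alg && nr > 0);
	if (alg->stream)
		return 0;

	/* Assign the array first so the free helper sees partial progress. */
	alg->stream = calloc(nr, sizeof(*alg->stream));
	if (!alg->stream)
		return -1;
	alg->nr = nr;

	for (i = 0; i < nr; i++) {
		void *ctx = (i == fail_at) ? NULL : malloc(16);

		if (!ctx) {
			toy_free_streams(alg);
			return -1;
		}
		alg->stream[i].ctx = ctx;
	}
	return 0;
}
```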
Fixes: 3d72ad46a23a ("crypto: acomp - Move stream management into scomp layer")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Co-developed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
In the case of apply_to_existing_page_range(), apply_to_pte_range() is
reached with 'create' set to false. When !create, the loop over the PTE
page table is broken.
apply_to_pte_range() will only move to the next PTE entry if 'create' is
true or if the current entry is not pte_none().
This means that the user of apply_to_existing_page_range() will not have
'fn' called for any entries after the first pte_none() in the PTE page
table.
Fix the loop logic in apply_to_pte_range().
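A toy version of the corrected walk, with an array of integers in place of a PTE page table and 0 playing the role of pte_none(). The names are hypothetical and the loop is only shaped like the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*toy_fn)(unsigned long *entry, void *data);

int toy_apply_range(unsigned long *table, size_t nr, bool create,
		    toy_fn fn, void *data)
{
	unsigned long *pte = table;
	size_t left = nr;
	int err = 0;

	assert(table && fn && nr > 0);

	do {
		if (create || *pte != 0) {
			err = fn(pte, data);
			if (err)
				break;
		}
		/* advance here, so a skipped entry does not stall the walk */
	} while (pte++, --left != 0);

	return err;
}

/* Example callback: counts how many entries were visited. */
int toy_count_entry(unsigned long *entry, void *data)
{
	(void)entry;
	(*(size_t *)data)++;
	return 0;
}
```

With 'create' false the callback runs for every populated entry, including those after an empty one.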
There are no known runtime issues from this, but the fix is trivial enough
for stable@ even without a known buggy user.
Link: https://lkml.kernel.org/r/20250409094043.1629234-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: be1db4753ee6 ("mm/memory.c: add apply_to_existing_page_range() helper")
Cc: Daniel Axtens <dja@axtens.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The following build warning comes up for the cow test because the
'transferred' variable has not been initialized. Fix the warning by
zero-initializing the variable.
CC cow
cow.c: In function `do_test_vmsplice_in_parent':
cow.c:365:61: warning: `transferred' may be used uninitialized [-Wmaybe-uninitialized]
365 | cur = read(fds[0], new + total, transferred - total);
| ~~~~~~~~~~~~^~~~~~~
cow.c:296:29: note: `transferred' was declared here
296 | ssize_t cur, total, transferred;
| ^~~~~~~~~~~
CC compaction_test
CC gup_longterm
Link: https://lkml.kernel.org/r/20250409095006.1422620-1-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
alloc_pages_bulk_node() may partially succeed and allocate fewer than the
requested nr_pages. There are several conditions under which this can
occur, but we have encountered the case where CONFIG_PAGE_OWNER is enabled
causing all bulk allocations to always fall back to single page allocations
due to commit 187ad460b841 ("mm/page_alloc: avoid page allocator recursion
with pagesets.lock held").
Currently vm_module_tags_populate() immediately fails when
alloc_pages_bulk_node() returns fewer than the requested number of pages.
When this happens memory allocation profiling gets disabled, for example
[ 14.297583] [9: modprobe: 465] Failed to allocate memory for allocation tags in the module scsc_wlan. Memory allocation profiling is disabled!
[ 14.299339] [9: modprobe: 465] modprobe: Failed to insmod '/vendor/lib/modules/scsc_wlan.ko' with args '': Out of memory
This patch causes vm_module_tags_populate() to retry bulk allocations for
the remaining memory instead of failing immediately which will avoid the
disablement of memory allocation profiling.
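The retry loop described above can be sketched with a hypothetical bulk allocator callback that may return fewer items than requested. toy_populate() and toy_bulk_one() are illustrative names only:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bulk allocator: fills out[0..n) and returns n <= want. */
typedef size_t (*toy_bulk_fn)(size_t want, int *out);

/* Keep asking for the remainder until done or no progress is made. */
size_t toy_populate(toy_bulk_fn bulk, size_t want, int *out)
{
	size_t got = 0;

	assert(bulk && out);

	while (got < want) {
		size_t n = bulk(want - got, out + got);

		if (n == 0)
			break;	/* no progress: the caller handles the shortfall */
		got += n;
	}
	return got;
}

/* Mimics a bulk call that degrades to one item per call. */
size_t toy_bulk_one(size_t want, int *out)
{
	if (want == 0)
		return 0;
	out[0] = 1;
	return 1;
}
```

A caller would still treat a final count below the request as an allocation failure; the loop only avoids failing on the first partial result.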
Link: https://lkml.kernel.org/r/20250409225111.3770347-1-tjmercier@google.com
Fixes: 0f9b685626da ("alloc_tag: populate memory for module tags as needed")
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As recent contributions were made with the @ideasonboard.com email, any
reply would fail. Add the proper address to map this old one.
Link: https://lkml.kernel.org/r/20250328-mailmap-v2-v2-1-bdc69d2193ca@yoseli.org
Signed-off-by: Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We got a late smatch warning and some additional review feedback.
smatch warnings:
mm/memory.c:1428 copy_page_range() error: uninitialized symbol 'pfn'.
We actually use the pfn only when it is properly initialized; however, we
may pass an uninitialized value to a function. Although the function will
not use it, that is likely still UB in C.
So let's just fix it by always initializing pfn in the caller of
track_pfn_copy(), and improving the documentation of track_pfn_copy().
While at it, clarify the doc of untrack_pfn_copy(): internal checks
determine whether we actually have to untrack anything.
Link: https://lkml.kernel.org/r/20250408085950.976103-1-david@redhat.com
Fixes: dc84bc2aba85 ("x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202503270941.IFILyNCX-lkp@intel.com/
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
filemap_get_folios_contig() is supposed to return distinct folios found
within [start, end]. Large folios in the Xarray become multi-index
entries. xas_next() can iterate through the sub-indexes before finding a
sibling entry and breaking out of the loop.
This can result in a returned folio_batch containing an indeterminate
number of duplicate folios, which forces the callers to skeptically handle
the returned batch. This is inefficient and incurs a large maintenance
overhead.
We can fix this by calling xas_advance() after we have successfully added
a folio to the batch to ensure our Xarray is positioned such that it will
correctly find the next folio - similar to filemap_get_read_batch().
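The idea of advancing past a multi-index entry can be sketched with a plain array in userspace C. This is an illustration only: struct toy_folio, the slots[] array and collect_distinct() are hypothetical and do not use the XArray API.

```c
#include <stddef.h>

struct toy_folio {
	size_t index;	/* first slot covered by this folio */
	size_t nr;	/* number of slots covered (1 for a small folio) */
};

/*
 * Collect distinct folios from slots[start..end] into batch[].
 * After adding a folio, jump to its last slot so that the loop
 * increment lands on the next folio rather than on another slot
 * of the same one.
 */
static size_t collect_distinct(struct toy_folio *const *slots, size_t start,
			       size_t end, struct toy_folio **batch,
			       size_t batch_cap)
{
	size_t n = 0;

	for (size_t i = start; i <= end && n < batch_cap; i++) {
		struct toy_folio *f = slots[i];

		if (!f)
			break;	/* the contiguous run ends at a hole */
		batch[n++] = f;
		i = f->index + f->nr - 1;	/* skip the remaining slots */
	}
	return n;
}
```

Without the assignment to i, a folio covering four slots would be added to the batch four times, which is the kind of duplication described above.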
Link: https://lkml.kernel.org/r/Z-8s1-kiIDkzgRbc@fedora
Fixes: 35b471467f88 ("filemap: add filemap_get_folios_contig()")
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reported-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Closes: https://lkml.kernel.org/r/b714e4de-2583-4035-b829-72cfb5eb6fc6@gmx.com
Tested-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add the missing line break at the end of the format string.
Link: https://lkml.kernel.org/r/20250407103017.2979821-1-18810879172@163.com
Signed-off-by: wangxuewen <wangxuewen@kylinos.cn>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When running mincore test cases, I encountered the following failures:
"
mincore_selftest.c:359:check_tmpfs_mmap:Expected ra_pages (511) == 0 (0)
mincore_selftest.c:360:check_tmpfs_mmap:Read-ahead pages found in memory
check_tmpfs_mmap: Test terminated by assertion
FAIL global.check_tmpfs_mmap
not ok 5 global.check_tmpfs_mmap
FAILED: 4 / 5 tests passed
"
The reason for the test case failure is that my system automatically enabled
tmpfs large folio allocation by adding the 'transparent_hugepage_tmpfs=always'
cmdline. However, the test case still expects the tmpfs mounted on /dev/shm to
allocate small folios, which leads to assertion failures when verifying readahead
pages.
As discussed with David, there's no reason to continue checking the readahead
logic for tmpfs. Drop it to fix this issue.
Link: https://lkml.kernel.org/r/9a00856cc6a8b4e46f4ab8b1af11ce5fc1a31851.1744025467.git.baolin.wang@linux.alibaba.com
Fixes: d635ccdb435c ("mm: shmem: add a kernel command line to change the default huge policy for tmpfs")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In set_max_huge_pages(), min_count is computed taking into account surplus
huge pages, which in some cases can leave us unable to free huge pages and
end up accounting them as surplus instead.
One way to solve it is to subtract surplus_huge_pages directly, but we
cannot do it blindly because there might be surplus pages that are also
free pages, which might happen when we fail to restore the vmemmap for
optimized hvo pages. So we could be subtracting the same page twice.
To work around this, let us first compute the number of free
persistent pages, and use that along with surplus pages to compute
min_count.
Steps to reproduce:
1) create 5 hugetlb folios in Node0
2) run a program to use all the hugetlb folios
3) echo 0 > nr_hugepages for Node0 to free the hugetlb folios. Thus
the 5 hugetlb folios in Node0 are accounted as surplus.
4) create 5 hugetlb folios in Node1
5) echo 0 > nr_hugepages for Node1 to free the hugetlb folios
The result:
        Node0    Node1
Total       5        5
Free        0        5
Surp        5        5
The result with this patch:
        Node0    Node1
Total       5        0
Free        0        0
Surp        5        0
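The arithmetic behind the reproduction steps can be sketched in userspace C. All names here are hypothetical. min_count_old() models a computation that subtracts every free page from the total, and the min(free, persistent) estimate in min_count_new() is an assumption made for illustration, not necessarily the expression the patch uses.

```c
/* Global counters, mirroring the quantities named in the text. */
struct toy_hstate {
	unsigned long nr_huge_pages;
	unsigned long free_huge_pages;
	unsigned long surplus_huge_pages;
	unsigned long resv_huge_pages;
};

/* Persistent pages are all pages that are not accounted as surplus. */
static unsigned long persistent_pages(const struct toy_hstate *h)
{
	return h->nr_huge_pages - h->surplus_huge_pages;
}

/* Counts in-use surplus pages as if they had to be kept. */
static unsigned long min_count_old(const struct toy_hstate *h)
{
	return h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages;
}

/*
 * ASSUMPTION: estimate the free persistent pages as
 * min(free, persistent), then keep only the reserved pages plus the
 * persistent pages that are not free.
 */
static unsigned long min_count_new(const struct toy_hstate *h)
{
	unsigned long persistent = persistent_pages(h);
	unsigned long free_persistent = h->free_huge_pages < persistent ?
					h->free_huge_pages : persistent;

	return h->resv_huge_pages + persistent - free_persistent;
}
```

For the state just before step 5 (10 pages in total, 5 free, 5 surplus, none reserved), min_count_old() yields 5 while min_count_new() yields 0. In this model only the second value lets the five free pages on Node1 be released.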
Link: https://lkml.kernel.org/r/20250409055957.3774471-1-tujinjiang@huawei.com
Link: https://lkml.kernel.org/r/20250407124706.2688092-1-tujinjiang@huawei.com
Fixes: 9a30523066cd ("hugetlb: add per node hstate attributes")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The cma_declare_contiguous_nid code was refactored by commit c009da4258f9
("mm, cma: support multiple contiguous ranges, if requested"), so that it
could use an internal function to attempt a single range area first, and
then try a multi-range one.
However, that meant that the actual base address used for the !fixed case
(base == 0) wasn't available one level up to be printed in the
informational message, and it would always end up printing a base address
of 0 in the boot message.
Make the internal function take a phys_addr_t pointer to the base address,
so that the value is available to the caller.
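The pointer-based interface can be sketched as follows in userspace C. The names toy_declare_range() and toy_declare(), the toy_phys_addr_t typedef and the placeholder address are all hypothetical; only the pattern of passing the base by pointer comes from the description above.

```c
#include <stdint.h>

typedef uint64_t toy_phys_addr_t;

/*
 * Hypothetical internal helper. When *basep is 0 (no fixed base was
 * requested) it picks a base itself and writes it back through the
 * pointer, so the caller can see which address was actually used.
 */
static int toy_declare_range(toy_phys_addr_t *basep, toy_phys_addr_t size)
{
	toy_phys_addr_t base = *basep;

	if (size == 0)
		return -1;
	if (base == 0)
		base = 0x40000000;	/* placeholder for the chosen address */
	*basep = base;
	return 0;
}

/*
 * Caller one level up. *reported receives the value that an
 * informational message would print.
 */
static int toy_declare(toy_phys_addr_t base, toy_phys_addr_t size,
		       toy_phys_addr_t *reported)
{
	int ret = toy_declare_range(&base, size);

	if (ret == 0)
		*reported = base;
	return ret;
}
```

Had the helper taken the base by value, the caller would still hold 0 in the non-fixed case and would report that instead of the address that was chosen.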
[fvdl@google.com: v2]
Link: https://lkml.kernel.org/r/20250408164000.3215690-1-fvdl@google.com
Link: https://lkml.kernel.org/r/20250407165435.2567898-1-fvdl@google.com
Fixes: c009da4258f9 ("mm, cma: support multiple contiguous ranges, if requested")
Signed-off-by: Frank van der Linden <fvdl@google.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/linux-mm/CAMuHMdVWviQ7O9yBFE3f=ev0eVb1CnsQvR6SKtEROBbM6z7g3w@mail.gmail.com/
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
single pages from biggest buddy") as the root cause of a 56.4% regression
in vm-scalability::lru-file-mmap-read.
Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
freelist movement during block conversion"), as the root cause for a
regression in worst-case zone->lock+irqoff hold times.
Both of these patches modify the page allocator's fallback path to be less
greedy in an effort to stave off fragmentation. The flip side of this is
that fallbacks are also less productive each time around, which means the
fallback search can run much more frequently.
Carlos' traces point to rmqueue_bulk() specifically, which tries to refill
the percpu cache by allocating a large batch of pages in a loop. It
highlights how once the native freelists are exhausted, the fallback code
first scans orders top-down for whole blocks to claim, then falls back to
a bottom-up search for the smallest buddy to steal. For the next batch
page, it goes through the same thing again.
This can be made more efficient. Since rmqueue_bulk() holds the
zone->lock over the entire batch, the freelists are not subject to outside
changes; when the search for a block to claim has already failed, there is
no point in trying again for the next page.
Modify __rmqueue() to remember the last successful fallback mode, and
restart directly from there on the next rmqueue_bulk() iteration.
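A toy model in userspace C shows why remembering the fallback mode saves work. Everything here is hypothetical (the enum, struct toy_zone, the block size and the search counter); it only mimics the order of normal, claim and steal attempts described above.

```c
enum toy_mode { TOY_NORMAL, TOY_CLAIM, TOY_STEAL };

#define TOY_BLOCK_PAGES 4	/* pages gained by claiming one block */

struct toy_zone {
	int native_free;	/* pages on the native freelist */
	int claimable_blocks;	/* whole blocks that can be claimed */
	int stealable;		/* single pages that can be stolen */
	int searches;		/* freelist searches performed so far */
};

/*
 * Take one page, starting the search at *mode. On a successful
 * fallback, *mode records where the next call should start.
 */
static int toy_rmqueue(struct toy_zone *z, enum toy_mode *mode)
{
	switch (*mode) {
	case TOY_NORMAL:
		z->searches++;
		if (z->native_free > 0) {
			z->native_free--;
			return 1;
		}
		/* fall through */
	case TOY_CLAIM:
		z->searches++;
		if (z->claimable_blocks > 0) {
			z->claimable_blocks--;
			/* the rest of the block refills the native list */
			z->native_free += TOY_BLOCK_PAGES - 1;
			*mode = TOY_NORMAL;
			return 1;
		}
		/* fall through */
	case TOY_STEAL:
		z->searches++;
		if (z->stealable > 0) {
			z->stealable--;
			*mode = TOY_STEAL;
			return 1;
		}
	}
	return 0;
}

/* Batch loop; 'remember' selects whether the mode is kept across pages. */
static int toy_rmqueue_bulk(struct toy_zone *z, int count, int remember)
{
	enum toy_mode mode = TOY_NORMAL;
	int got = 0;

	for (int i = 0; i < count; i++) {
		if (!remember)
			mode = TOY_NORMAL;
		if (!toy_rmqueue(z, &mode))
			break;
		got++;
	}
	return got;
}
```

With an empty native freelist, no claimable blocks and eight stealable pages, a batch of four costs 12 searches when the mode is reset for every page and 6 when it is remembered.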
Oliver confirms that this improves beyond the regression that the test
robot reported against c2f6ea38fc1b:
commit:
f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch
f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
  25525364 ±  3%     -56.4%   11135467           -57.8%   10779336           +31.6%   33581409        vm-scalability.throughput
Carlos confirms that worst-case times are almost fully recovered
compared to before the earlier culprit patch:
2dd482ba627d (before freelist hygiene): 1ms
c0cd6f557b90 (after freelist hygiene): 90ms
next-20250319 (steal smallest buddy): 280ms
this patch : 8ms
[jackmanb@google.com: comment updates]
Link: https://lkml.kernel.org/r/D92AC0P9594X.3BML64MUKTF8Z@google.com
[hannes@cmpxchg.org: reset rmqueue_mode in rmqueue_buddy() error loop, per Yunsheng Lin]
Link: https://lkml.kernel.org/r/20250409140023.GA2313@cmpxchg.org
Link: https://lkml.kernel.org/r/20250407180154.63348-1-hannes@cmpxchg.org
Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Carlos Song <carlos.song@nxp.com>
Tested-by: Carlos Song <carlos.song@nxp.com>
Tested-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [6.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Modules without a description now cause a warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/tests/slub_kunit.o
Link: https://lkml.kernel.org/r/20250324173242.1501003-10-arnd@kernel.org
Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Pei Xiao <xiaopei01@kylinos.cn>
Cc: Rae Moar <rmoar@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Modules without a description now cause a warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in mm/kasan/kasan_test.o
[akpm@linux-foundation.org: update description text, per Andrey]
Link: https://lkml.kernel.org/r/20250324173242.1501003-9-arnd@kernel.org
Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Cc: Marco Elver <elver@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nihar Chaithanya <niharchaithanya@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Modules without a description now cause a warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/ucs2_string.o
Link: https://lkml.kernel.org/r/20250324173242.1501003-7-arnd@kernel.org
Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Modules without a description now cause a warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/zlib_inflate/zlib_inflate.o
Link: https://lkml.kernel.org/r/20250324173242.1501003-6-arnd@kernel.org
Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Modules without a description now cause a warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fpga/tests/fpga-bridge-test.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fpga/tests/fpga-mgr-test.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fpga/tests/fpga-region-test.o
Link: https://lkml.kernel.org/r/20250324173242.1501003-4-arnd@kernel.org
Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Hao Wu <hao.wu@intel.com>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Cc: Marco Pagani <marpagan@redhat.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Moritz Fischer <mdf@kernel.org>
Cc: Russ Weight <russ.weight@linux.dev>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tom Rix <trix@redhat.com>
Cc: Xu Yilun <yilun.xu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Every module should have a description, so add one for each of these modules.
[akpm@linux-foundation.org: match the livepatch-callbacks-mod.c description, per Petr]
Link: https://lkml.kernel.org/r/20250324173242.1501003-3-arnd@kernel.org
Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Easwar Hariharan <eahariha@linux.microsoft.com>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This is needed to avoid a build warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/asn1_decoder.o
Link: https://lkml.kernel.org/r/20250324173242.1501003-2-arnd@kernel.org
Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, if a VMA merge fails due to an OOM condition arising on commit
merge or a failure to duplicate anon_vma's, we report this so the caller
can handle it.
However there are cases where the caller is only ostensibly trying a
merge, and doesn't mind if it fails due to this condition.
Since we do not want to introduce an implicit assumption that we only
actually modify VMAs after OOM conditions might arise, add a 'give up on
oom' option and make an explicit contract that, should this flag be set, we
absolutely will not modify any VMAs should OOM arise and just bail out.
Since it'd be very unusual for a user to try to vma_modify() with this flag
set but be specifying a range within a VMA which ends up being split (which
can fail due to rlimit issues, not only OOM), we add a debug warning for
this condition.
The motivating reason for this is uffd release - syzkaller (and Pedro
Falcato's VERY astute analysis) found a way in which an injected fault on
allocation, triggering an OOM condition on commit merge, would result in
uffd code becoming confused and treating an error value as if it were a VMA
pointer.
To avoid this, we make use of this new VMG flag to ensure that this never
occurs, utilising the fact that, should we be clearing entire VMAs, we do
not wish an OOM event to be reported to us.
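The contract can be illustrated with a small userspace C sketch. It is not the VMA code: struct toy_vma, struct toy_merge_req, the inject_oom switch and the error-pointer helpers are hypothetical imitations used only to show the difference between returning an error value and bailing out without changes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_vma {
	unsigned long start, end;
};

/* Userspace imitation of encoding a negative errno in a pointer. */
static void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static bool toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

struct toy_merge_req {
	struct toy_vma *prev, *next;	/* adjacent ranges to merge */
	bool give_up_on_oom;		/* caller does not want OOM reported */
	bool inject_oom;		/* simulate an allocation failure */
};

/*
 * Returns the merged range, NULL if no merge took place, or an error
 * pointer. With give_up_on_oom set, an allocation failure is never
 * reported and nothing has been modified when NULL is returned.
 */
static struct toy_vma *toy_merge(struct toy_merge_req *req)
{
	if (req->inject_oom) {
		/* the failure happens before any range is touched */
		if (req->give_up_on_oom)
			return NULL;
		return toy_err_ptr(-ENOMEM);
	}
	req->prev->end = req->next->end;
	return req->prev;
}
```

A caller that treats any non-NULL result as a valid range is safe only when the flag is set; without it, the result must be checked with toy_is_err() before it is dereferenced.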
Many thanks to Pedro Falcato for his excellent analysis and Jann Horn for
his insightful and intelligent analysis of the situation, both of whom were
instrumental in this fix.
Link: https://lkml.kernel.org/r/20250321100937.46634-1-lorenzo.stoakes@oracle.com
Reported-by: syzbot+20ed41006cf9d842c2b5@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67dc67f0.050a0220.25ae54.001e.GAE@google.com/
Fixes: 47b16d0462a4 ("mm: abort vma_modify() on merge out of memory failure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Pedro Falcato <pfalcato@suse.de>
Suggested-by: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, if the filesystem for the cgroups version they want to use is not
mounted, the charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh tests
will attempt to mount it on the hard coded path /dev/cgroup/memory,
deleting that directory when the test finishes. This will fail if there
is not a preexisting directory at that path, and since the directory is
deleted, subsequent runs of the test will fail. Instead of relying on this
hard coded directory name, use mktemp to generate a temporary directory to
use as a mountpoint, fixing both the assumption and the disruption caused
by deleting a preexisting directory.
This means that if the relevant cgroup filesystem is not already mounted
then we rely on having coreutils (which provides mktemp) installed. I
suspect that many current users are relying on having things automounted
by default, and given that the script relies on bash it's probably not an
unreasonable requirement.
Link: https://lkml.kernel.org/r/20250404-kselftest-mm-cgroup2-detection-v1-1-3dba6d32ba8c@kernel.org
Fixes: 209376ed2a84 ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add Andrew as kexec/kdump maintainer because he has been helping review
and merge ready kexec/kdump patches.
And I would like to nominate myself as kexec maintainer because I always
try to review generic kexec code.
Link: https://lkml.kernel.org/r/20250328104402.16826-1-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Simon Horman <horms@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It's wrong to use nid directly since the nid may be changed in allocation.
Use folio_nid() to obtain the nid of folio instead.
Fixes: 2273dea6b1e1 ("mm/hugetlb: update nr_huge_pages and surplus_huge_pages together")
Link: https://lkml.kernel.org/r/20250403064138.2867929-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
spin_trylock followed by spin_lock will cause an extra write cache access.
If the lock is contended, it may cause unnecessary cache line bouncing and
will execute a redundant irq restore/save pair. Therefore, check
alloc/fpi_flags first and use either spin_trylock or spin_lock.
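The pattern of choosing the locking primitive up front can be sketched in userspace C with a toy spinlock built on C11 atomics. The names struct toy_lock, toy_trylock(), toy_lock_spin() and toy_acquire() are hypothetical, and the trylock_only parameter stands in for whatever flag check the caller performs.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_lock {
	atomic_flag held;
};

/* Single attempt: returns true if the lock was acquired. */
static bool toy_trylock(struct toy_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

/* Spin until the lock is acquired. */
static void toy_lock_spin(struct toy_lock *l)
{
	while (!toy_trylock(l))
		;
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Decide which primitive to use before touching the lock, instead of
 * always trying first and then falling back to a blocking acquire.
 */
static bool toy_acquire(struct toy_lock *l, bool trylock_only)
{
	if (trylock_only)
		return toy_trylock(l);
	toy_lock_spin(l);
	return true;
}
```

A caller that may spin makes exactly one kind of acquire attempt here, so the lock's cache line is not written by a failed trylock before the blocking path starts.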
Link: https://lkml.kernel.org/r/20250331002809.94758-1-alexei.starovoitov@gmail.com
Fixes: 97769a53f117 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The compaction code doesn't take references on pages until we're certain
we should attempt to handle them.
In the hugetlb case, isolate_or_dissolve_huge_page() may return -EBUSY
without taking a reference to the folio associated with our pfn. If our
folio's refcount drops to 0, compound_nr() becomes unpredictable, making
low_pfn and nr_scanned unreliable. The user-visible effect is minimal -
this should rarely happen (if ever).
Fix this by storing the folio statistics earlier on the stack (just like
the THP and Buddy cases).
Also revert commit 66fe1cf7f581 ("mm: compaction: use helper compound_nr
in isolate_migratepages_block") to make backporting easier.
Link: https://lkml.kernel.org/r/20250401021025.637333-1-vishal.moola@gmail.com
Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|