Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"As usual, it's scattered changes all over. Patches touching things
outside of our traditional areas in the tree have been Acked by
maintainers or were trivial changes:
- loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan
Vadivel)
- samples/check-exec: Fix script name (Mickaël Salaün)
- yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)
- lib/string_choices: Sort by function name (R Sundar)
- hardening: Allow default HARDENED_USERCOPY to be set at compile
time (Mel Gorman)
- uaccess: Split out compile-time checks into ucopysize.h
- kbuild: clang: Support building UM with SUBARCH=i386
- x86: Enable i386 FORTIFY_SOURCE on Clang 16+
- ubsan/overflow: Rework integer overflow sanitizer option
- Add missing __nonstring annotations for callers of
memtostr*()/strtomem*()
- Add __must_be_noncstr() and have memtostr*()/strtomem*() check for
it
- Introduce __nonstring_array for silencing future GCC 15 warnings"
* tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
compiler_types: Introduce __nonstring_array
hardening: Enable i386 FORTIFY_SOURCE on Clang 16+
x86/build: Remove -ffreestanding on i386 with GCC
ubsan/overflow: Enable ignorelist parsing and add type filter
ubsan/overflow: Enable pattern exclusions
ubsan/overflow: Rework integer overflow sanitizer option to turn on everything
samples/check-exec: Fix script name
yama: don't abuse rcu_read_lock/get_task_struct in yama_task_prctl()
kbuild: clang: Support building UM with SUBARCH=i386
loadpin: remove MODULE_COMPRESS_NONE as it is no longer supported
lib/string_choices: Rearrange functions in sorted order
string.h: Validate memtostr*()/strtomem*() arguments more carefully
compiler.h: Introduce __must_be_noncstr()
nilfs2: Mark on-disk strings as nonstring
uapi: stddef.h: Introduce __kernel_nonstring
x86/tdx: Mark message.bytes as nonstring
string: kunit: Mark nonstring test strings as __nonstring
scsi: qla2xxx: Mark device strings as nonstring
scsi: mpt3sas: Mark device strings as nonstring
scsi: mpi3mr: Mark device strings as nonstring
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs initramfs updates from Christian Brauner:
"This adds basic kunit test coverage for initramfs unpacking and cleans
up some buffer handling issues and inefficiencies"
* tag 'vfs-6.15-rc1.initramfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
MAINTAINERS: append initramfs files to the VFS section
initramfs: avoid static buffer for error message
initramfs: fix hardlink hash leak without TRAILER
initramfs: reuse name_len for dir mtime tracking
initramfs: allocate heap buffers together
initramfs: avoid memcpy for hex header fields
vsprintf: add simple_strntoul
initramfs_test: kunit tests for initramfs unpacking
init: add initramfs_internal.h
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
GCC has expanded support of the "nonstring" attribute so that it can be
applied to arrays of character arrays[1], which is needed to identify
correct static initialization of those kinds of objects. Since this was
not supported prior to GCC 15, we need to distinguish the usage of Linux's
existing __nonstring macro for the attribute for non-multi-dimensional
char arrays. Until GCC 15 is the minimum version, use __nonstring_array to
mark arrays of non-string character arrays. (Regular non-string character
arrays can continue to use __nonstring.) Once GCC 15 is the minimum
compiler version we can replace all uses of __nonstring_array with just
__nonstring and remove this macro.
This allows for changes like this:
-static const char table_sigs[][ACPI_NAMESEG_SIZE] __initconst = {
+static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst = {
ACPI_SIG_BERT, ACPI_SIG_BGRT, ACPI_SIG_CPEP, ACPI_SIG_ECDT,
Which will silence the coming -Wunterminated-string-initialization
warnings in GCC 15:
In file included from ../include/acpi/actbl.h:371, from ../include/acpi/acpi.h:26, from ../include/linux/acpi.h:26,
from ../drivers/acpi/tables.c:19:
../include/acpi/actbl1.h:30:33: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (5 chars into 4 available) [-Wunterminated-string-initialization]
30 | #define ACPI_SIG_BERT "BERT" /* Boot Error Record Table */
| ^~~~~~ ../drivers/acpi/tables.c:400:9: note: in expansion of macro 'ACPI_SIG_BERT' 400 | ACPI_SIG_BERT, ACPI_SIG_BGRT, ACPI_SIG_CPEP, ACPI_SIG_ECDT,
| ^~~~~~~~~~~~~ ../include/acpi/actbl1.h:31:33: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (5 chars into 4 available) [-Wunterminated-string-initialization]
31 | #define ACPI_SIG_BGRT "BGRT" /* Boot Graphics Resource Table */
| ^~~~~~
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178 [1]
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/r/20250310214244.work.194-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
The kernel cannot currently self-parse BTF containing Rust debug
information. pahole uses the language of the CU to determine whether to
filter out debug information when generating the BTF. When LTO is
enabled, Rust code can cross CU boundaries, resulting in Rust debug
information in CUs labeled as C. This results in a system which cannot
parse its own BTF.
Signed-off-by: Matthew Maurer <mmaurer@google.com>
Cc: stable@vger.kernel.org
Fixes: c1177979af9c ("btf, scripts: Exclude Rust CUs with pahole")
Link: https://lore.kernel.org/r/20250108-rust-btf-lto-incompat-v1-1-60243ff6d820@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
The dynamic error message printed if CONFIG_RD_$ALG compression support
is missing needn't be propagated up to the caller via a static buffer.
Print it immediately via pr_err() and set @message to a const string to
flag error.
Before:
text data bss dec hex filename
8006 1118 8 9132 23ac init/initramfs.o
After:
text data bss dec hex filename
7938 1022 8 8968 2308 init/initramfs.o
Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://lore.kernel.org/r/20250304061020.9815-9-ddiss@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Covered in Documentation/driver-api/early-userspace/buffer-format.rst ,
initramfs archives can carry an optional "TRAILER!!!" entry which serves
as a boundary for collecting and associating hardlinks with matching
inode and major / minor device numbers.
Although optional, if hardlinks are found in an archive without a
subsequent "TRAILER!!!" entry then the hardlink state hash table is
leaked, e.g. unfixed kernel, with initramfs_test.c hunk applied only:
unreferenced object 0xffff9405408cc000 (size 8192):
comm "kunit_try_catch", pid 53, jiffies 4294892519
hex dump (first 32 bytes):
01 00 00 00 01 00 00 00 00 00 00 00 ff 81 00 00 ................
00 00 00 00 00 00 00 00 69 6e 69 74 72 61 6d 66 ........initramf
backtrace (crc a9fb0ee0):
[<0000000066739faa>] __kmalloc_cache_noprof+0x11d/0x250
[<00000000fc755219>] maybe_link.part.5+0xbc/0x120
[<000000000526a128>] do_name+0xce/0x2f0
[<00000000145c1048>] write_buffer+0x22/0x40
[<000000003f0b4f32>] unpack_to_rootfs+0xf9/0x2a0
[<00000000d6f7e5af>] initramfs_test_hardlink+0xe3/0x3f0
[<0000000014fde8d6>] kunit_try_run_case+0x5f/0x130
[<00000000dc9dafc5>] kunit_generic_run_threadfn_adapter+0x18/0x30
[<000000001076c239>] kthread+0xc8/0x100
[<00000000d939f1c1>] ret_from_fork+0x2b/0x40
[<00000000f848ad1a>] ret_from_fork_asm+0x1a/0x30
Fix this by calling free_hash() after initramfs buffer processing in
unpack_to_rootfs(). An extra hardlink_seen global is added as an
optimization to avoid walking the 32 entry hash array unnecessarily.
The expectation is that a "TRAILER!!!" entry will normally be present,
and initramfs hardlinks are uncommon.
There is one user facing side-effect of this fix: hardlinks can
currently be associated across built-in and external initramfs archives,
*if* the built-in initramfs archive lacks a "TRAILER!!!" terminator. I'd
consider this cross-archive association broken, but perhaps it's used.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://lore.kernel.org/r/20250304061020.9815-8-ddiss@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We already have a nulterm-inclusive, checked name_len for the directory
name, so use that instead of calling strlen().
Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://lore.kernel.org/r/20250304061020.9815-7-ddiss@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
header_buf, symlink_buf and name_buf all share the same lifecycle so
needn't be allocated / freed separately. This change leads to a minor
reduction in .text size:
before:
text data bss dec hex filename
7914 1110 8 9032 2348 init/initramfs.o
after:
text data bss dec hex filename
7854 1110 8 8972 230c init/initramfs.o
A previous iteration of this patch reused a single buffer instead of
three, given that buffer use is state-sequential (GotHeader, GotName,
GotSymlink). However, the slight decrease in heap use during early boot
isn't really worth the extra review complexity.
Link: https://lore.kernel.org/all/20241107002044.16477-7-ddiss@suse.de/
Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://lore.kernel.org/r/20250304061020.9815-6-ddiss@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
newc/crc cpio headers contain a bunch of 8-character hexadecimal fields
which we convert via simple_strtoul(), following memcpy() into a
zero-terminated stack buffer. The new simple_strntoul() helper allows us
to pass in max_chars=8 to avoid zero-termination and memcpy().
Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://lore.kernel.org/r/20250304061020.9815-5-ddiss@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Provide some basic initramfs unpack sanity tests covering:
- simple file / dir extraction
- filename field overrun, as reported and fixed separately via
https://lore.kernel.org/r/20241030035509.20194-2-ddiss@suse.de
- "070702" cpio data checksums
- hardlinks
Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://lore.kernel.org/r/20250304061020.9815-3-ddiss@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The timer and hrtimer softirq processing has moved to dedicated threads
for kernels built with CONFIG_IRQ_FORCED_THREADING=y. This results in
timers not expiring until later in early boot, which in turn causes the
RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y,
which further causes the entire kernel to hang. One fix would be to
make timers work during this time, but there are no known users of RCU
Tasks grace periods during that time, so no justification for the added
complexity. Not yet, anyway.
This commit therefore moves the call to rcu_init_tasks_generic() from
kernel_init_freeable() to a core_initcall(). This works because the
timer and hrtimer kthreads are created at early_initcall() time.
Fixes: 49a17639508c3 ("softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: <linux-trace-kernel@vger.kernel.org>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
|
The new header only exports a single unpack function and a CPIO_HDRLEN
constant for future test use.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://lore.kernel.org/r/20250304061020.9815-2-ddiss@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
x86-64 was the only user.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250123190747.745588-16-brgerst@gmail.com
|
|
The percpu section is currently linked at absolute address 0, because
older compilers hard-coded the stack protector canary value at a fixed
offset from the start of the GS segment. Now that the canary is a
normal percpu variable, the percpu section does not need to be linked
at a specific address.
x86-64 will now calculate the percpu offsets as the delta between the
initial percpu address and the dynamically allocated memory, like other
architectures. Note that GSBASE is limited to the canonical address
width (48 or 57 bits, sign-extended). As long as the kernel text,
modules, and the dynamically allocated percpu memory are all in the
negative address space, the delta will not overflow this limit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250123190747.745588-9-brgerst@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Support multiple hook locations for maint scripts of Debian package
- Remove 'cpio' from the build tool requirement
- Introduce gendwarfksyms tool, which computes CRCs for export symbols
based on the DWARF information
- Support CONFIG_MODVERSIONS for Rust
- Resolve all conflicts in the genksyms parser
- Fix several syntax errors in genksyms
* tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (64 commits)
kbuild: fix Clang LTO with CONFIG_OBJTOOL=n
kbuild: Strip runtime const RELA sections correctly
kconfig: fix memory leak in sym_warn_unmet_dep()
kconfig: fix file name in warnings when loading KCONFIG_DEFCONFIG_LIST
genksyms: fix syntax error for attribute before init-declarator
genksyms: fix syntax error for builtin (u)int*x*_t types
genksyms: fix syntax error for attribute after 'union'
genksyms: fix syntax error for attribute after 'struct'
genksyms: fix syntax error for attribute after abstact_declarator
genksyms: fix syntax error for attribute before nested_declarator
genksyms: fix syntax error for attribute before abstract_declarator
genksyms: decouple ATTRIBUTE_PHRASE from type-qualifier
genksyms: record attributes consistently for init-declarator
genksyms: restrict direct-declarator to take one parameter-type-list
genksyms: restrict direct-abstract-declarator to take one parameter-type-list
genksyms: remove Makefile hack
genksyms: fix last 3 shift/reduce conflicts
genksyms: fix 6 shift/reduce conflicts and 5 reduce/reduce conflicts
genksyms: reduce type_qualifier directly to decl_specifier
genksyms: rename cvar_qualifier to type_qualifier
...
|
|
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.
Created this by running an spatch followed by a sed command:
Spatch:
virtual patch
@
depends on !(file in "net")
disable optional_qualifier
@
identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@
+ const
struct ctl_table table_name [] = { ... };
sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
Pull drm fixes from Simona Vetter:
"cgroup:
- fix Koncfig fallout from new dmem controller
Driver Changes:
- v3d NULL pointer regression fix in fence signalling race
- virtio: uaf in dma_buf free path
- xlnx: fix kerneldoc
- bochs: fix double-free on driver removal
- zynqmp: add missing locking to DP bridge driver
- amdgpu fixes all over:
- documentation, display, sriov, various hw block drivers
- use drm/sched helper
- mark some debug module options as unsafe
- amdkfd: mark some debug module options as unsafe, trap handler
updates, fix partial migration handling
DRM core:
- fix fbdev Kconfig select rules, improve tiled-based display
support"
* tag 'drm-next-2025-01-27' of https://gitlab.freedesktop.org/drm/kernel: (40 commits)
drm/amd/display: Optimize cursor position updates
drm/amd/display: Add hubp cache reset when powergating
drm/amd/amdgpu: Enable scratch data dump for mes 12
drm/amd: Clarify kdoc for amdgpu.gttsize
drm/amd/amdgpu: Prevent null pointer dereference in GPU bandwidth calculation
drm/amd/display: Fix error pointers in amdgpu_dm_crtc_mem_type_changed
drm/amdgpu: fix ring timeout issue in gfx10 sr-iov environment
drm/amd/pm: Fix smu v13.0.6 caps initialization
drm/amd/pm: Refactor SMU 13.0.6 SDMA reset firmware version checks
revert "drm/amdgpu/pm: add definition PPSMC_MSG_ResetSDMA2"
revert "drm/amdgpu/pm: Implement SDMA queue reset for different asic"
drm/amd/pm: Add capability flags for SMU v13.0.6
drm/amd/display: fix SUBVP DC_DEBUG_MASK documentation
drm/amd/display: fix CEC DC_DEBUG_MASK documentation
drm/amdgpu: fix the PCIe lanes reporting in the INFO IOCTL
drm/amdgpu: cache gpu pcie link width
drm/amd/display: mark static functions noinline_for_stack
drm/amdkfd: Clear MODE.VSKIP in gfx9 trap handler
drm/amdgpu: Refine ip detection log message
drm/amdgpu: Add handler for SDMA context empty
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs.
- "Allocate and free frozen pages" from Matthew Wilcox reorganizes
the page allocator so we end up with the ability to allocate and
free zero-refcount pages. So that callers (ie, slab) can avoid a
refcount inc & dec
- "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
use large folios other than PMD-sized ones
- "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
and fixes for this small built-in kernel selftest
- "mas_anode_descend() related cleanup" from Wei Yang tidies up part
of the mapletree code
- "mm: fix format issues and param types" from Keren Sun implements a
few minor code cleanups
- "simplify split calculation" from Wei Yang provides a few fixes and
a test for the mapletree code
- "mm/vma: make more mmap logic userland testable" from Lorenzo
Stoakes continues the work of moving vma-related code into the
(relatively) new mm/vma.c
- "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
Hildenbrand cleans up and rationalizes handling of gfp flags in the
page allocator
- "readahead: Reintroduce fix for improper RA window sizing" from Jan
Kara is a second attempt at fixing a readahead window sizing issue.
It should reduce the amount of unnecessary reading
- "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
addresses an issue where "huge" amounts of pte pagetables are
accumulated:
https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
Qi's series addresses this windup by synchronously freeing PTE
memory within the context of madvise(MADV_DONTNEED)
- "selftest/mm: Remove warnings found by adding compiler flags" from
Muhammad Usama Anjum fixes some build warnings in the selftests
code when optional compiler warnings are enabled
- "mm: don't use __GFP_HARDWALL when migrating remote pages" from
David Hildenbrand tightens the allocator's observance of
__GFP_HARDWALL
- "pkeys kselftests improvements" from Kevin Brodsky implements
various fixes and cleanups in the MM selftests code, mainly
pertaining to the pkeys tests
- "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
estimate application working set size
- "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
provides some cleanups to memcg's hugetlb charging logic
- "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
removes the global swap cgroup lock. A speedup of 10% for a
tmpfs-based kernel build was demonstrated
- "zram: split page type read/write handling" from Sergey Senozhatsky
has several fixes and cleaups for zram in the area of
zram_write_page(). A watchdog softlockup warning was eliminated
- "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
Brodsky cleans up the pagetable destructor implementations. A rare
use-after-free race is fixed
- "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
simplifies and cleans up the debugging code in the VMA merging
logic
- "Account page tables at all levels" from Kevin Brodsky cleans up
and regularizes the pagetable ctor/dtor handling. This results in
improvements in accounting accuracy
- "mm/damon: replace most damon_callback usages in sysfs with new
core functions" from SeongJae Park cleans up and generalizes
DAMON's sysfs file interface logic
- "mm/damon: enable page level properties based monitoring" from
SeongJae Park increases the amount of information which is
presented in response to DAMOS actions
- "mm/damon: remove DAMON debugfs interface" from SeongJae Park
removes DAMON's long-deprecated debugfs interfaces. Thus the
migration to sysfs is completed
- "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
Peter Xu cleans up and generalizes the hugetlb reservation
accounting
- "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
removes a never-used feature of the alloc_pages_bulk() interface
- "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
extends DAMOS filters to support not only exclusion (rejecting),
but also inclusion (allowing) behavior
- "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
introduces a new memory descriptor for zswap.zpool that currently
overlaps with struct page for now. This is part of the effort to
reduce the size of struct page and to enable dynamic allocation of
memory descriptors
- "mm, swap: rework of swap allocator locks" from Kairui Song redoes
and simplifies the swap allocator locking. A speedup of 400% was
demonstrated for one workload. As was a 35% reduction for kernel
build time with swap-on-zram
- "mm: update mips to use do_mmap(), make mmap_region() internal"
from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
mmap_region() can be made MM-internal
- "mm/mglru: performance optimizations" from Yu Zhao fixes a few
MGLRU regressions and otherwise improves MGLRU performance
- "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
Park updates DAMON documentation
- "Cleanup for memfd_create()" from Isaac Manjarres does that thing
- "mm: hugetlb+THP folio and migration cleanups" from David
Hildenbrand provides various cleanups in the areas of hugetlb
folios, THP folios and migration
- "Uncached buffered IO" from Jens Axboe implements the new
RWF_DONTCACHE flag which provides synchronous dropbehind for
pagecache reading and writing. To permite userspace to address
issues with massive buildup of useless pagecache when
reading/writing fast devices
- "selftests/mm: virtual_address_range: Reduce memory" from Thomas
Weißschuh fixes and optimizes some of the MM selftests"
* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm/compaction: fix UBSAN shift-out-of-bounds warning
s390/mm: add missing ctor/dtor on page table upgrade
kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
tools: add VM_WARN_ON_VMG definition
mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
seqlock: add missing parameter documentation for raw_seqcount_try_begin()
mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
mm/page_alloc: remove the incorrect and misleading comment
zram: remove zcomp_stream_put() from write_incompressible_page()
mm: separate move/undo parts from migrate_pages_batch()
mm/kfence: use str_write_read() helper in get_access_type()
selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
selftests/mm: vm_util: split up /proc/self/smaps parsing
selftests/mm: virtual_address_range: unmap chunks after validation
selftests/mm: virtual_address_range: mmap() without PROT_WRITE
selftests/memfd/memfd_test: fix possible NULL pointer dereference
mm: add FGP_DONTCACHE folio creation flag
mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Mainly individually changelogged singleton patches. The patch series
in this pull are:
- "lib min_heap: Improve min_heap safety, testing, and documentation"
from Kuan-Wei Chiu provides various tightenings to the min_heap
library code
- "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms
some cleanup and Rust preparation in the xarray library code
- "Update reference to include/asm-<arch>" from Geert Uytterhoeven
fixes pathnames in some code comments
- "Converge on using secs_to_jiffies()" from Easwar Hariharan uses
the new secs_to_jiffies() in various places where that is
appropriate
- "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
switches two filesystems to the new mount API
- "Convert ocfs2 to use folios" from Matthew Wilcox does that
- "Remove get_task_comm() and print task comm directly" from Yafang
Shao removes now-unneeded calls to get_task_comm() in various
places
- "squashfs: reduce memory usage and update docs" from Phillip
Lougher implements some memory savings in squashfs and performs
some maintainability work
- "lib: clarify comparison function requirements" from Kuan-Wei Chiu
tightens the sort code's behaviour and adds some maintenance work
- "nilfs2: protect busy buffer heads from being force-cleared" from
Ryusuke Konishi fixes an issues in nlifs when the fs is presented
with a corrupted image
- "nilfs2: fix kernel-doc comments for function return values" from
Ryusuke Konishi fixes some nilfs kerneldoc
- "nilfs2: fix issues with rename operations" from Ryusuke Konishi
addresses some nilfs BUG_ONs which syzbot was able to trigger
- "minmax.h: Cleanups and minor optimisations" from David Laight does
some maintenance work on the min/max library code
- "Fixes and cleanups to xarray" from Kemeng Shi does maintenance
work on the xarray library code"
* tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits)
ocfs2: use str_yes_no() and str_no_yes() helper functions
include/linux/lz4.h: add some missing macros
Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent
Xarray: remove repeat check in xas_squash_marks()
Xarray: distinguish large entries correctly in xas_split_alloc()
Xarray: move forward index correctly in xas_pause()
Xarray: do not return sibling entries from xas_find_marked()
ipc/util.c: complete the kernel-doc function descriptions
gcov: clang: use correct function param names
latencytop: use correct kernel-doc format for func params
minmax.h: remove some #defines that are only expanded once
minmax.h: simplify the variants of clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: update some comments
minmax.h: add whitespace around operators and after commas
nilfs2: do not update mtime of renamed directory that is not moved
nilfs2: handle errors that nilfs_prepare_chunk() may return
CREDITS: fix spelling mistake
...
|
|
Before SLUB initialization, various subsystems used memblock_alloc to
allocate memory. In most cases, when memory allocation fails, an
immediate panic is required. To simplify this behavior and reduce
repetitive checks, introduce `memblock_alloc_or_panic`. This function
ensures that memory allocation failures result in a panic automatically,
improving code readability and consistency across subsystems that require
this behavior.
[guoweikang.kernel@gmail.com: arch/s390: save_area_alloc default failure behavior changed to panic]
Link: https://lkml.kernel.org/r/20250109033136.2845676-1-guoweikang.kernel@gmail.com
Link: https://lore.kernel.org/lkml/Z2fknmnNtiZbCc7x@kernel.org/
Link: https://lkml.kernel.org/r/20250102072528.650926-1-guoweikang.kernel@gmail.com
Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The cpuset file is a legacy attribute that is bound primarily to cpuset
v1 hierarchy (equivalent information is available in /proc/$pid/cgroup path
on the unified hierarchy in conjunction with respective
cgroup.controllers showing where cpuset controller is enabled).
Followup to commit b0ced9d378d49 ("cgroup/cpuset: move v1 interfaces to
cpuset-v1.c") and hide CONFIG_PROC_PID_CPUSET under CONFIG_CPUSETS_V1.
Drop an obsolete comment too.
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Finish the move to custom FFI integer types started in the previous
cycle and finally map 'long' to 'isize' and 'char' to 'u8'. Do a
few cleanups on top thanks to that.
- Start to use 'derive(CoercePointee)' on Rust >= 1.84.0.
This is a major milestone on the path to build the kernel using
only stable Rust features. In particular, previously we were using
the unstable features 'coerce_unsized', 'dispatch_from_dyn' and
'unsize', and now we will use the new 'derive_coerce_pointee' one,
which is on track to stabilization. This new feature is a macro
that essentially expands into code that internally uses the
unstable features that we were using before, without having to
expose those.
With it, stable Rust users, including the kernel, will be able to
build custom smart pointers that work with trait objects, e.g.:
fn f(p: &Arc<dyn Display>) {
pr_info!("{p}\n");
}
let a: Arc<dyn Display> = Arc::new(42i32, GFP_KERNEL)?;
let b: Arc<dyn Display> = Arc::new("hello there", GFP_KERNEL)?;
f(&a); // Prints "42".
f(&b); // Prints "hello there".
Together with the 'arbitrary_self_types' feature that we started
using in the previous cycle, using our custom smart pointers like
'Arc' will eventually only rely in stable Rust.
- Introduce 'PROCMACROLDFLAGS' environment variable to allow to link
Rust proc macros using different flags than those used for linking
Rust host programs (e.g. when 'rustc' uses a different C library
than the host programs' one), which Android needs.
- Help kernel builds under macOS with Rust enabled by accomodating
other naming conventions for dynamic libraries (i.e. '.so' vs.
'.dylib') which are used for Rust procedural macros. The actual
support for macOS (i.e. the rest of the pieces needed) is provided
out-of-tree by others, following the policy used for other parts of
the kernel by Kbuild.
- Run Clippy for 'rusttest' code too and clean the bits it spotted.
- Provide Clippy with the minimum supported Rust version to improve
the suggestions it gives.
- Document 'bindgen' 0.71.0 regression.
'kernel' crate:
- 'build_error!': move users of the hidden function to the documented
macro, prevent such uses in the future by moving the function
elsewhere and add the macro to the prelude.
- 'types' module: add improved version of 'ForeignOwnable::borrow_mut'
(which was removed in the past since it was problematic); change
'ForeignOwnable' pointer type to '*mut'.
- 'alloc' module: implement 'Display' for 'Box' and align the 'Debug'
implementation to it; add example (doctest) for 'ArrayLayout::new()'
- 'sync' module: document 'PhantomData' in 'Arc'; use
'NonNull::new_unchecked' in 'ForeignOwnable for Arc' impl.
- 'uaccess' module: accept 'Vec's with different allocators in
'UserSliceReader::read_all'.
- 'workqueue' module: enable run-testing a couple more doctests.
- 'error' module: simplify 'from_errno()'.
- 'block' module: fix formatting in code documentation (a lint to catch
these is being implemented).
- Avoid 'unwrap()'s in doctests, which also improves the examples by
showing how kernel code is supposed to be written.
- Avoid 'as' casts with 'cast{,_mut}' calls which are a bit safer.
And a few other cleanups"
* tag 'rust-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (32 commits)
kbuild: rust: add PROCMACROLDFLAGS
rust: uaccess: generalize userSliceReader to support any Vec
rust: kernel: add improved version of `ForeignOwnable::borrow_mut`
rust: kernel: reorder `ForeignOwnable` items
rust: kernel: change `ForeignOwnable` pointer to mut
rust: arc: split unsafe block, add missing comment
rust: types: avoid `as` casts
rust: arc: use `NonNull::new_unchecked`
rust: use derive(CoercePointee) on rustc >= 1.84.0
rust: alloc: add doctest for `ArrayLayout::new()`
rust: init: update `stack_try_pin_init` examples
rust: error: import `kernel`'s `LayoutError` instead of `core`'s
rust: str: replace unwraps with question mark operators
rust: page: remove unnecessary helper function from doctest
rust: rbtree: remove unwrap in asserts
rust: init: replace unwraps with question mark operators
rust: use host dylib naming convention to support macOS
rust: add `build_error!` to the prelude
rust: kernel: move `build_error` hidden function to prevent mistakes
rust: use the `build_error!` macro, not the hidden function
...
|
|
Pull drm updates from Dave Airlie:
"There are two external interactions of note, the msm tree pull in some
opp tree, hopefully the opp tree arrives from the same git tree
however it normally does.
There is also a new cgroup controller for device memory, that is used
by drm, so is merging through my tree. This will hopefully help open
up gpu cgroup usage a bit more and move us forward.
There is a new accelerator driver for the AMD XDNA Ryzen AI NPUs.
Then the usual xe/amdgpu/i915/msm leaders and lots of changes and
refactors across the board:
core:
- device memory cgroup controller added
- Remove driver date from drm_driver
- Add drm_printer based hex dumper
- drm memory stats docs update
- scheduler documentation improvements
new driver:
- amdxdna - Ryzen AI NPU support
connector:
- add a mutex to protect ELD
- make connector setup two-step
panels:
- Introduce backlight quirks infrastructure
- New panels: KDB KD116N2130B12, Tianma TM070JDHG34-00,
- Multi-Inno Technology MI1010Z1T-1CP11
bridge:
- ti-sn65dsi83: Add ti,lvds-vod-swing optional properties
- Provide default implementation of atomic_check for HDMI bridges
- it605: HDCP improvements, MCCS Support
xe:
- make OA buffer size configurable
- GuC capture fixes
- add ufence and g2h flushes
- restore system memory GGTT mappings
- ioctl fixes
- SRIOV PF scheduling priority
- allow fault injection
- lots of improvements/refactors
- Enable GuC's WA_DUAL_QUEUE for newer platforms
- IRQ related fixes and improvements
i915:
- More accurate engine busyness metrics with GuC submission
- Ensure partial BO segment offset never exceeds allowed max
- Flush GuC CT receive tasklet during reset preparation
- Some DG2 refactor to fix DG2 bugs when operating with certain CPUs
- Fix DG1 power gate sequence
- Enabling uncompressed 128b/132b UHBR SST
- Handle hdmi connector init failures, and no HDMI/DP cases
- More robust engine resets on Haswell and older
i915/xe display:
- HDCP fixes for Xe3Lpd
- New GSC FW ARL-H/ARL-U
- support 3 VDSC engines 12 slices
- MBUS joining sanitisation
- reconcile i915/xe display power mgmt
- Xe3Lpd fixes
- UHBR rates for Thunderbolt
amdgpu:
- DRM panic support
- track BO memory stats at runtime
- Fix max surface handling in DC
- Cleaner shader support for gfx10.3 dGPUs
- fix drm buddy trim handling
- SDMA engine reset updates
- Fix doorbell ttm cleanup
- RAS updates
- ISP updates
- SDMA queue reset support
- Rework DPM powergating interfaces
- Documentation updates and cleanups
- DCN 3.5 updates
- Use a pm notifier to more gracefully handle VRAM eviction on
suspend or hibernate
- Add debugfs interfaces for forcing scheduling to specific engine
instances
- GG 9.5 updates
- IH 4.4 updates
- Make missing optional firmware less noisy
- PSP 13.x updates
- SMU 13.x updates
- VCN 5.x updates
- JPEG 5.x updates
- GC 12.x updates
- DC FAMS updates
amdkfd:
- GG 9.5 updates
- Logging improvements
- Shader debugger fixes
- Trap handler cleanup
- Cleanup includes
- Eviction fence wq fix
msm:
- MDSS:
- properly described UBWC registers
- added SM6150 (aka QCS615) support
- DPU:
- added SM6150 (aka QCS615) support
- enabled wide planes if virtual planes are enabled (by using two
SSPPs for a single plane)
- added CWB hardware blocks support
- DSI:
- added SM6150 (aka QCS615) support
- GPU:
- Print GMU core fw version
- GMU bandwidth voting for a740 and a750
- Expose uche trap base via uapi
- UAPI error reporting
rcar-du:
- Add r8a779h0 Support
ivpu:
- Fix qemu crash when using passthrough
nouveau:
- expose GSP-RM logging buffers via debugfs
panfrost:
- Add MT8188 Mali-G57 MC3 support
rockchip:
- Gamma LUT support
hisilicon:
- new HIBMC support
virtio-gpu:
- convert to helpers
- add prime support for scanout buffers
v3d:
- Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
vc4:
- Add support for BCM2712
vkms:
- line-per-line compositing algorithm to improve performance
zynqmp:
- Add DP audio support
mediatek:
- dp: Add sdp path reset
- dp: Support flexible length of DP calibration data
etnaviv:
- add fdinfo memory support
- add explicit reset handling"
* tag 'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel: (1070 commits)
drm/bridge: fix documentation for the hdmi_audio_prepare() callback
doc/cgroup: Fix title underline length
drm/doc: Include new drm-compute documentation
cgroup/dmem: Fix parameters documentation
cgroup/dmem: Select PAGE_COUNTER
kernel/cgroup: Remove the unused variable climit
drm/display: hdmi: Do not read EDID on disconnected connectors
drm/tests: hdmi: Add connector disablement test
drm/connector: hdmi: Do atomic check when necessary
drm/amd/display: 3.2.316
drm/amd/display: avoid reset DTBCLK at clock init
drm/amd/display: improve dpia pre-train
drm/amd/display: Apply DML21 Patches
drm/amd/display: Use HW lock mgr for PSR1
drm/amd/display: Revised for Replay Pseudo vblank control
drm/amd/display: Add a new flag for replay low hz
drm/amd/display: Remove unused read_ono_state function from Hwss module
drm/amd/display: Do not elevate mem_type change to full update
drm/amd/display: Do not wait for PSR disable on vbl enable
drm/amd/display: Remove unnecessary eDP power down
...
|
|
Previously, two things stopped Rust from using MODVERSIONS:
1. Rust symbols are occasionally too long to be represented in the
original versions table
2. Rust types cannot be properly hashed by the existing genksyms
approach because:
* Looking up type definitions in Rust is more complex than C
* Type layout is potentially dependent on the compiler in Rust,
not just the source type declaration.
CONFIG_EXTENDED_MODVERSIONS addresses the first point, and
CONFIG_GENDWARFKSYMS the second. If Rust wants to use MODVERSIONS, allow
it to do so by selecting both features.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Co-developed-by: Matthew Maurer <mmaurer@google.com>
Signed-off-by: Matthew Maurer <mmaurer@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
When adding the Device memory controller (DMEM), "select PAGE_COUNTER"
was added to CGROUP_RDMA, presumably instead of CGROUP_DMEM.
While commit e33b51499a0a6bca ("cgroup/dmem: Select PAGE_COUNTER") added
the missing select to CGROUP_DMEM, the bogus select is still there.
Remove it.
Fixes: b168ed458ddecc17 ("kernel/cgroup: Add "dmem" memory accounting cgroup")
Closes: https://lore.kernel.org/CAMuHMdUmPfahsnZwx2iB5yfh8rjjW25LNcnYujNBgcKotUXBNg@mail.gmail.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/b4d462f038a2f895f30ae759928397c8183f6f7e.1737020925.git.geert+renesas@glider.be
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
The dmem cgroup the page counting API implemented behing the
PAGE_COUNTER kconfig option. However, it doesn't select it, resulting in
potential build breakages. Select PAGE_COUNTER.
Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501111330.3VuUx8vf-lkp@intel.com/
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20250113092608.1349287-1-mripard@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
The `kernel` crate relies on both `coerce_unsized` and `dispatch_from_dyn`
unstable features.
Alice Ryhl has proposed [1] the introduction of the unstable macro
`SmartPointer` to reduce such dependence, along with a RFC patch [2].
Since Rust 1.81.0 this macro, later renamed to `CoercePointee` in
Rust 1.84.0 [3], has been fully implemented with the naming discussion
resolved.
This feature is now on track to stabilization in the language.
In order to do so, we shall start using this macro in the `kernel` crate
to prove the functionality and utility of the macro as the justification
of its stabilization.
This patch makes this switch in such a way that the crate remains
backward compatible with older Rust compiler versions,
via the new Kconfig option `RUSTC_HAS_COERCE_POINTEE`.
A minimal demonstration example is added to the
`samples/rust/rust_print_main.rs` module.
Link: https://rust-lang.github.io/rfcs/3621-derive-smart-pointer.html [1]
Link: https://lore.kernel.org/all/20240823-derive-smart-pointer-v1-1-53769cd37239@google.com/ [2]
Link: https://github.com/rust-lang/rust/pull/131284 [3]
Signed-off-by: Xiangfei Ding <dingxiangfei2009@gmail.com>
Reviewed-by: Fiona Behrens <me@kloenk.dev>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20241203205050.679106-2-dingxiangfei2009@gmail.com
[ Fixed version to 1.84. Renamed option to `RUSTC_HAS_COERCE_POINTEE`
to match `CC_HAS_*` ones. Moved up new config option, closer to the
`CC_HAS_*` ones. Simplified Kconfig line. Fixed typos and slightly
reworded example and commit. Added Link to PR. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
This won't be removed in 2021, no matter how hard we try.
Link: https://lkml.kernel.org/r/20241218123638.34907-1-martink@posteo.de
Signed-off-by: Martin Kepplinger <martink@posteo.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Joel Granados <joel.granados@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Introduce a separate initialization of kvfree_rcu() functionality.
For such purpose a kfree_rcu_batch_init() is renamed to a kvfree_rcu_init()
and it is invoked from the main.c right after rcu_init() is done.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com>
Tested-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
`bindgen` 0.71.0 regressed [1] on the "`--version` requires header"
issue which appeared in 0.69.0 first [2] and was fixed in 0.69.1. It
has been fixed again in 0.71.1 [3].
Thus document it so that, when we upgrade the minimum past 0.69.0 in the
future, we do not forget that we cannot remove the workaround until we
arrive at 0.71.1 at least.
Link: https://github.com/rust-lang/rust-bindgen/issues/3039 [1]
Link: https://github.com/rust-lang/rust-bindgen/issues/2677 [2]
Link: https://github.com/rust-lang/rust-bindgen/blob/main/CHANGELOG.md#v0711-2024-12-09 [3]
Reviewed-by: Fiona Behrens <me@kloenk.dev>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20241209212544.1977065-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
This code is based on the RDMA and misc cgroup initially, but now
uses page_counter. It uses the same min/low/max semantics as the memory
cgroup as a result.
There's a small mismatch as TTM uses u64, and page_counter long pages.
In practice it's not a problem. 32-bits systems don't really come with
>=4GB cards and as long as we're consistently wrong with units, it's
fine. The device page size may not be in the same units as kernel page
size, and each region might also have a different page size (VRAM vs GART
for example).
The interface is simple:
- Call dmem_cgroup_register_region()
- Use dmem_cgroup_try_charge to check if you can allocate a chunk of memory,
use dmem_cgroup__uncharge when freeing it. This may return an error code,
or -EAGAIN when the cgroup limit is reached. In that case a reference
to the limiting pool is returned.
- The limiting cs can be used as compare function for
dmem_cgroup_state_evict_valuable.
- After having evicted enough, drop reference to limiting cs with
dmem_cgroup_pool_state_put.
This API allows you to limit device resources with cgroups.
You can see the supported cards in /sys/fs/cgroup/dmem.capacity
You need to echo +dmem to cgroup.subtree_control, and then you can
partition device memory.
Co-developed-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Co-developed-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20241204143112.1250983-1-dev@lankhorst.se
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- The series "resource: A couple of cleanups" from Andy Shevchenko
performs some cleanups in the resource management code
- The series "Improve the copy of task comm" from Yafang Shao addresses
possible race-induced overflows in the management of
task_struct.comm[]
- The series "Remove unnecessary header includes from
{tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
small fix to the list_sort library code and to its selftest
- The series "Enhance min heap API with non-inline functions and
optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
min_heap library code
- The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
finishes off nilfs2's folioification
- The series "add detect count for hung tasks" from Lance Yang adds
more userspace visibility into the hung-task detector's activity
- Apart from that, singelton patches in many places - please see the
individual changelogs for details
* tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
gdb: lx-symbols: do not error out on monolithic build
kernel/reboot: replace sprintf() with sysfs_emit()
lib: util_macros_kunit: add kunit test for util_macros.h
util_macros.h: fix/rework find_closest() macros
Improve consistency of '#error' directive messages
ocfs2: fix uninitialized value in ocfs2_file_read_iter()
hung_task: add docs for hung_task_detect_count
hung_task: add detect count for hung tasks
dma-buf: use atomic64_inc_return() in dma_buf_getfile()
fs/proc/kcore.c: fix coccinelle reported ERROR instances
resource: avoid unnecessary resource tree walking in __region_intersects()
ocfs2: remove unused errmsg function and table
ocfs2: cluster: fix a typo
lib/scatterlist: use sg_phys() helper
checkpatch: always parse orig_commit in fixes tag
nilfs2: convert metadata aops from writepage to writepages
nilfs2: convert nilfs_recovery_copy_block() to take a folio
nilfs2: convert nilfs_page_count_clean_buffers() to take a folio
nilfs2: remove nilfs_writepage
nilfs2: convert checkpoint file to be folio-based
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
- Disable __counted_by in Clang < 19.1.3 (Jan Hendrik Farr)
- string_helpers: Silence output truncation warning (Bartosz
Golaszewski)
- compiler.h: Avoid needing BUILD_BUG_ON_ZERO() (Philipp Reisner)
- MAINTAINERS: Add kernel hardening keywords __counted_by{_le|_be}
(Thorsten Blum)
* tag 'hardening-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
Compiler Attributes: disable __counted_by for clang < 19.1.3
compiler.h: Fix undefined BUILD_BUG_ON_ZERO()
lib: string_helpers: silence snprintf() output truncation warning
MAINTAINERS: Add kernel hardening keywords __counted_by{_le|_be}
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Addition of faultable tracepoints
There's a tracepoint attached to both a system call entry and exit.
This location is known to allow page faults. The tracepoints are
called under an rcu_read_lock() which does not allow faults that can
sleep. This limits the ability of tracepoint handlers to page fault
in user space system call parameters. Now these tracepoints have been
made "faultable", allowing the callbacks to fault in user space
parameters and record them.
Note, only the infrastructure has been implemented. The consumers
(perf, ftrace, BPF) now need to have their code modified to allow
faults.
- Fix up of BPF code for the tracepoint faultable logic
- Update tracepoints to use the new static branch API
- Remove trace_*_rcuidle() variants and the SRCU protection they used
- Remove unused TRACE_EVENT_FL_FILTERED logic
- Replace strncpy() with strscpy() and memcpy()
- Use replace per_cpu_ptr(smp_processor_id()) with this_cpu_ptr()
- Fix perf events to not duplicate samples when tracing is enabled
- Replace atomic64_add_return(1, counter) with
atomic64_inc_return(counter)
- Make stack trace buffer 4K instead of PAGE_SIZE
- Remove TRACE_FLAG_IRQS_NOSUPPORT flag as it was never used
- Get the true return address for function tracer when function graph
tracer is also running.
When function_graph trace is running along with function tracer, the
parent function of the function tracer sometimes is
"return_to_handler", which is the function graph trampoline to record
the exit of the function. Use existing logic that calls into the
fgraph infrastructure to find the real return address.
- Remove (un)regfunc pointers out of tracepoint structure
- Added last minute bug fix for setting pending modules in stack
function filter.
echo "write*:mod:ext3" > /sys/kernel/tracing/stack_trace_filter
Would cause a kernel NULL dereference.
- Minor clean ups
* tag 'trace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits)
ftrace: Fix regression with module command in stack_trace_filter
tracing: Fix function name for trampoline
ftrace: Get the true parent ip for function tracer
tracing: Remove redundant check on field->field in histograms
bpf: ensure RCU Tasks Trace GP for sleepable raw tracepoint BPF links
bpf: decouple BPF link/attach hook and BPF program sleepable semantics
bpf: put bpf_link's program when link is safe to be deallocated
tracing: Replace strncpy() with strscpy() when copying comm
tracing: Add might_fault() check in __DECLARE_TRACE_SYSCALL
tracing: Fix syscall tracepoint use-after-free
tracing: Introduce tracepoint_is_faultable()
tracing: Introduce tracepoint extended structure
tracing: Remove TRACE_FLAG_IRQS_NOSUPPORT
tracing: Replace multiple deprecated strncpy with memcpy
tracing: Make percpu stack trace buffer invariant to PAGE_SIZE
tracing: Use atomic64_inc_return() in trace_clock_counter()
trace/trace_event_perf: remove duplicate samples on the first tracepoint event
tracing/bpf: Add might_fault check to syscall probes
tracing/perf: Add might_fault check to syscall probes
tracing/ftrace: Add might_fault check to syscall probes
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Print more precise information about the printk log buffer memory
usage.
- Make sure that the sysrq title is shown on the console even when
deferred.
- Do not enable earlycon by `console=` which is meant to disable the
default console.
* tag 'printk-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: add dummy printk_force_console_enter/exit helpers
tty: sysrq: Use printk_force_console context on __handle_sysrq
printk: Introduce FORCE_CON flag
printk: Improve memory usage logging during boot
init: Don't proxy `console=` to earlycon
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"A rather large update for timekeeping and timers:
- The final step to get rid of auto-rearming posix-timers
posix-timers are currently auto-rearmed by the kernel when the
signal of the timer is ignored so that the timer signal can be
delivered once the corresponding signal is unignored.
This requires to throttle the timer to prevent a DoS by small
intervals and keeps the system pointlessly out of low power states
for no value. This is a long standing non-trivial problem due to
the lock order of posix-timer lock and the sighand lock along with
life time issues as the timer and the sigqueue have different life
time rules.
Cure this by:
- Embedding the sigqueue into the timer struct to have the same
life time rules. Aside of that this also avoids the lookup of
the timer in the signal delivery and rearm path as it's just a
always valid container_of() now.
- Queuing ignored timer signals onto a seperate ignored list.
- Moving queued timer signals onto the ignored list when the
signal is switched to SIG_IGN before it could be delivered.
- Walking the ignored list when SIG_IGN is lifted and requeue the
signals to the actual signal lists. This allows the signal
delivery code to rearm the timer.
This also required to consolidate the signal delivery rules so they
are consistent across all situations. With that all self test
scenarios finally succeed.
- Core infrastructure for VFS multigrain timestamping
This is required to allow the kernel to use coarse grained time
stamps by default and switch to fine grained time stamps when inode
attributes are actively observed via getattr().
These changes have been provided to the VFS tree as well, so that
the VFS specific infrastructure could be built on top.
- Cleanup and consolidation of the sleep() infrastructure
- Move all sleep and timeout functions into one file
- Rework udelay() and ndelay() into proper documented inline
functions and replace the hardcoded magic numbers by proper
defines.
- Rework the fsleep() implementation to take the reality of the
timer wheel granularity on different HZ values into account.
Right now the boundaries are hard coded time ranges which fail
to provide the requested accuracy on different HZ settings.
- Update documentation for all sleep/timeout related functions
and fix up stale documentation links all over the place
- Fixup a few usage sites
- Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
clocks
A system can have multiple PTP clocks which are participating in
seperate and independent PTP clock domains. So far the kernel only
considers the PTP clock which is based on CLOCK TAI relevant as
that's the clock which drives the timekeeping adjustments via the
various user space daemons through adjtimex(2).
The non TAI based clock domains are accessible via the file
descriptor based posix clocks, but their usability is very limited.
They can't be accessed fast as they always go all the way out to
the hardware and they cannot be utilized in the kernel itself.
As Time Sensitive Networking (TSN) gains traction it is required to
provide fast user and kernel space access to these clocks.
The approach taken is to utilize the timekeeping and adjtimex(2)
infrastructure to provide this access in a similar way how the
kernel provides access to clock MONOTONIC, REALTIME etc.
Instead of creating a duplicated infrastructure this rework
converts timekeeping and adjtimex(2) into generic functionality
which operates on pointers to data structures instead of using
static variables.
This allows to provide time accessors and adjtimex(2) functionality
for the independent PTP clocks in a subsequent step.
- Consolidate hrtimer initialization
hrtimers are set up by initializing the data structure and then
seperately setting the callback function for historical reasons.
That's an extra unnecessary step and makes Rust support less
straight forward than it should be.
Provide a new set of hrtimer_setup*() functions and convert the
core code and a few usage sites of the less frequently used
interfaces over.
The bulk of the htimer_init() to hrtimer_setup() conversion is
already prepared and scheduled for the next merge window.
- Drivers:
- Ensure that the global timekeeping clocksource is utilizing the
cluster 0 timer on MIPS multi-cluster systems.
Otherwise CPUs on different clusters use their cluster specific
clocksource which is not guaranteed to be synchronized with
other clusters.
- Mostly boring cleanups, fixes, improvements and code movement"
* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
posix-timers: Fix spurious warning on double enqueue versus do_exit()
clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
clocksource/drivers/gpx: Remove redundant casts
clocksource/drivers/timer-ti-dm: Fix child node refcount handling
dt-bindings: timer: actions,owl-timer: convert to YAML
clocksource/drivers/ralink: Add Ralink System Tick Counter driver
clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
clocksource/drivers:sp804: Make user selectable
clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
hrtimers: Delete hrtimer_init_on_stack()
alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
io_uring: Switch to use hrtimer_setup_on_stack()
sched/idle: Switch to use hrtimer_setup_on_stack()
hrtimers: Delete hrtimer_init_sleeper_on_stack()
wait: Switch to use hrtimer_setup_sleeper_on_stack()
timers: Switch to use hrtimer_setup_sleeper_on_stack()
net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
futex: Switch to use hrtimer_setup_sleeper_on_stack()
fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
...
|
|
This patch disables __counted_by for clang versions < 19.1.3 because
of the two issues listed below. It does this by introducing
CONFIG_CC_HAS_COUNTED_BY.
1. clang < 19.1.2 has a bug that can lead to __bdos returning 0:
https://github.com/llvm/llvm-project/pull/110497
2. clang < 19.1.3 has a bug that can lead to __bdos being off by 4:
https://github.com/llvm/llvm-project/pull/112636
Fixes: c8248faf3ca2 ("Compiler Attributes: counted_by: Adjust name and identifier expansion")
Cc: stable@vger.kernel.org # 6.6.x: 16c31dd7fdf6: Compiler Attributes: counted_by: bump min gcc version
Cc: stable@vger.kernel.org # 6.6.x: 2993eb7a8d34: Compiler Attributes: counted_by: fixup clang URL
Cc: stable@vger.kernel.org # 6.6.x: 231dc3f0c936: lkdtm/bugs: Improve warning message for compilers without counted_by support
Cc: stable@vger.kernel.org # 6.6.x
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/all/20240913164630.GA4091534@thelio-3990X/
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409260949.a1254989-oliver.sang@intel.com
Link: https://lore.kernel.org/all/Zw8iawAF5W2uzGuh@archlinux/T/#m204c09f63c076586a02d194b87dffc7e81b8de7b
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jan Hendrik Farr <kernel@jfarr.cc>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://lore.kernel.org/r/20241029140036.577804-2-kernel@jfarr.cc
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Fixup and improve NLM and kNFSD file lock callbacks
Last year both GFS2 and OCFS2 had some work done to make their
locking more robust when exported over NFS. Unfortunately, part of
that work caused both NLM (for NFS v3 exports) and kNFSD (for
NFSv4.1+ exports) to no longer send lock notifications to clients
This in itself is not a huge problem because most NFS clients will
still poll the server in order to acquire a conflicted lock
It's important for NLM and kNFSD that they do not block their
kernel threads inside filesystem's file_lock implementations
because that can produce deadlocks. We used to make sure of this by
only trusting that posix_lock_file() can correctly handle blocking
lock calls asynchronously, so the lock managers would only setup
their file_lock requests for async callbacks if the filesystem did
not define its own lock() file operation
However, when GFS2 and OCFS2 grew the capability to correctly
handle blocking lock requests asynchronously, they started
signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check
for also trusting posix_lock_file() was inadvertently dropped, so
now most filesystems no longer produce lock notifications when
exported over NFS
Fix this by using an fop_flag which greatly simplifies the problem
and grooms the way for future uses by both filesystems and lock
managers alike
- Add a sysctl to delete the dentry when a file is removed instead of
making it a negative dentry
Commit 681ce8623567 ("vfs: Delete the associated dentry when
deleting a file") introduced an unconditional deletion of the
associated dentry when a file is removed. However, this led to
performance regressions in specific benchmarks, such as
ilebench.sum_operations/s, prompting a revert in commit
4a4be1ad3a6e ("Revert "vfs: Delete the associated dentry when
deleting a file""). This reintroduces the concept conditionally
through a sysctl
- Expand the statmount() system call:
* Report the filesystem subtype in a new fs_subtype field to
e.g., report fuse filesystem subtypes
* Report the superblock source in a new sb_source field
* Add a new way to return filesystem specific mount options in an
option array that returns filesystem specific mount options
separated by zero bytes and unescaped. This allows caller's to
retrieve filesystem specific mount options and immediately pass
them to e.g., fsconfig() without having to unescape or split
them
* Report security (LSM) specific mount options in a separate
security option array. We don't lump them together with
filesystem specific mount options as security mount options are
generic and most users aren't interested in them
The format is the same as for the filesystem specific mount
option array
- Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command
- Optimize acl_permission_check() to avoid costly {g,u}id ownership
checks if possible
- Use smp_mb__after_spinlock() to avoid full smp_mb() in evict()
- Add synchronous wakeup support for ep_poll_callback.
Currently, epoll only uses wake_up() to wake up task. But sometimes
there are epoll users which want to use the synchronous wakeup flag
to give a hint to the scheduler, e.g., the Android binder driver.
So add a wake_up_sync() define, and use wake_up_sync() when sync is
true in ep_poll_callback()
Fixes:
- Fix kernel documentation for inode_insert5() and iget5_locked()
- Annotate racy epoll check on file->f_ep
- Make F_DUPFD_QUERY associative
- Avoid filename buffer overrun in initramfs
- Don't let statmount() return empty strings
- Add a cond_resched() to dump_user_range() to avoid hogging the CPU
- Don't query the device logical blocksize multiple times for hfsplus
- Make filemap_read() check that the offset is positive or zero
Cleanups:
- Various typo fixes
- Cleanup wbc_attach_fdatawrite_inode()
- Add __releases annotation to wbc_attach_and_unlock_inode()
- Add hugetlbfs tracepoints
- Fix various vfs kernel doc parameters
- Remove obsolete TODO comment from io_cancel()
- Convert wbc_account_cgroup_owner() to take a folio
- Fix comments for BANDWITH_INTERVAL and wb_domain_writeout_add()
- Reorder struct posix_acl to save 8 bytes
- Annotate struct posix_acl with __counted_by()
- Replace one-element array with flexible array member in freevxfs
- Use idiomatic atomic64_inc_return() in alloc_mnt_ns()"
* tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
statmount: retrieve security mount options
vfs: make evict() use smp_mb__after_spinlock instead of smp_mb
statmount: add flag to retrieve unescaped options
fs: add the ability for statmount() to report the sb_source
writeback: wbc_attach_fdatawrite_inode out of line
writeback: add a __releases annoation to wbc_attach_and_unlock_inode
fs: add the ability for statmount() to report the fs_subtype
fs: don't let statmount return empty strings
fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel()
hfsplus: don't query the device logical block size multiple times
freevxfs: Replace one-element array with flexible array member
fs: optimize acl_permission_check()
initramfs: avoid filename buffer overrun
fs/writeback: convert wbc_account_cgroup_owner to take a folio
acl: Annotate struct posix_acl with __counted_by()
acl: Realign struct posix_acl to save 8 bytes
epoll: Add synchronous wakeup support for ep_poll_callback
coredump: add cond_resched() to dump_user_range
mm/page-writeback.c: Fix comment of wb_domain_writeout_add()
mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL
...
|
|
To prepare for handling posix timer signals on sigaction(SIG_IGN) properly,
add a list to task::signal.
This list will be used to queue posix timers so their signal can be
requeued when SIG_IGN is lifted later.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20241105064213.920101900@linutronix.de
|
|
Currently, cpuset is the only user of the union-find implementation.
Compiling union-find in all configurations unnecessarily increases the
code size when building the kernel without cgroup support. Modify the
build system to compile union-find only when CONFIG_CPUSETS is enabled.
Link: https://lore.kernel.org/lkml/1ccd6411-5002-4574-bb8e-3e64bba6a757@redhat.com/
Link: https://lkml.kernel.org/r/20241011141214.87096-1-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Suggested-by: Waiman Long <llong@redhat.com>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Xavier <xavier_qy@163.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The initramfs filename field is defined in
Documentation/driver-api/early-userspace/buffer-format.rst as:
37 cpio_file := ALGN(4) + cpio_header + filename + "\0" + ALGN(4) + data
...
55 ============= ================== =========================
56 Field name Field size Meaning
57 ============= ================== =========================
...
70 c_namesize 8 bytes Length of filename, including final \0
When extracting an initramfs cpio archive, the kernel's do_name() path
handler assumes a zero-terminated path at @collected, passing it
directly to filp_open() / init_mkdir() / init_mknod().
If a specially crafted cpio entry carries a non-zero-terminated filename
and is followed by uninitialized memory, then a file may be created with
trailing characters that represent the uninitialized memory. The ability
to create an initramfs entry would imply already having full control of
the system, so the buffer overrun shouldn't be considered a security
vulnerability.
Append the output of the following bash script to an existing initramfs
and observe any created /initramfs_test_fname_overrunAA* path. E.g.
./reproducer.sh | gzip >> /myinitramfs
It's easiest to observe non-zero uninitialized memory when the output is
gzipped, as it'll overflow the heap allocated @out_buf in __gunzip(),
rather than the initrd_start+initrd_size block.
---- reproducer.sh ----
nilchar="A" # change to "\0" to properly zero terminate / pad
magic="070701"
ino=1
mode=$(( 0100777 ))
uid=0
gid=0
nlink=1
mtime=1
filesize=0
devmajor=0
devminor=1
rdevmajor=0
rdevminor=0
csum=0
fname="initramfs_test_fname_overrun"
namelen=$(( ${#fname} + 1 )) # plus one to account for terminator
printf "%s%08x%08x%08x%08x%08x%08x%08x%08x%08x%08x%08x%08x%08x%s" \
$magic $ino $mode $uid $gid $nlink $mtime $filesize \
$devmajor $devminor $rdevmajor $rdevminor $namelen $csum $fname
termpadlen=$(( 1 + ((4 - ((110 + $namelen) & 3)) % 4) ))
printf "%.s${nilchar}" $(seq 1 $termpadlen)
---- reproducer.sh ----
Symlink filename fields handled in do_symlink() won't overrun past the
data segment, due to the explicit zero-termination of the symlink
target.
Fix filename buffer overrun by aborting the initramfs FSM if any cpio
entry doesn't carry a zero-terminator at the expected (name_len - 1)
offset.
Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://lore.kernel.org/r/20241030035509.20194-2-ddiss@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The HAVE_CFI_ICALL_NORMALIZE_INTEGERS option has some tricky conditions
when KASAN or GCOV are turned on, as in that case we need some clang and
rustc fixes [1][2] to avoid boot failures. The intent with the current
setup is that you should be able to override the check and turn on the
option if your clang/rustc has the fix. However, this override does not
work in practice. Thus, use the new RUSTC_LLVM_VERSION to correctly
implement the check for whether the fix is available.
Additionally, remove KASAN_HW_TAGS from the list of incompatible
options. The CFI_ICALL_NORMALIZE_INTEGERS option is incompatible with
KASAN because LLVM will emit some constructors when using KASAN that are
assigned incorrect CFI tags. These constructors are emitted due to use
of -fsanitize=kernel-address or -fsanitize=kernel-hwaddress that are
respectively passed when KASAN_GENERIC or KASAN_SW_TAGS are enabled.
However, the KASAN_HW_TAGS option relies on hardware support for MTE
instead and does not pass either flag. (Note also that KASAN_HW_TAGS
does not `select CONSTRUCTORS`.)
Link: https://github.com/llvm/llvm-project/pull/104826 [1]
Link: https://github.com/rust-lang/rust/pull/129373 [2]
Fixes: 4c66f8307ac0 ("cfi: encode cfi normalized integers + kasan/gcov bug in Kconfig")
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20241010-icall-detect-vers-v1-2-8f114956aa88@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
Each version of Rust supports a range of LLVM versions. There are cases where
we want to gate a config on the LLVM version instead of the Rust version.
Normalized cfi integer tags are one example [1].
The invocation of rustc-version is being moved from init/Kconfig to
scripts/Kconfig.include for consistency with cc-version.
Link: https://lore.kernel.org/all/20240925-cfi-norm-kasan-fix-v1-1-0328985cdf33@google.com/ [1]
Signed-off-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/r/20241011114040.3900487-1-gary@garyguo.net
[ Added missing `-llvm` to the Usage documentation. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
Use Tasks Trace RCU to protect iteration of system call enter/exit
tracepoint probes to allow those probes to handle page faults.
In preparation for this change, all tracers registering to system call
enter/exit tracepoints should expect those to be called with preemption
enabled.
This allows tracers to fault-in userspace system call arguments such as
path strings within their probe callbacks.
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
Link: https://lore.kernel.org/20241009010718.2050182-6-mathieu.desnoyers@efficios.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Today we are proxying the `console=` command line args to the
`param_setup_earlycon()` handler. This is done because the following are
equivalent:
console=uart[8250],mmio,<addr>[,options]
earlycon=uart[8250],mmio,<addr>[,options]
Both invocations enable an early `bootconsole`. `console=uartXXXX` is
just an alias for `earlycon=uartXXXX`.
In addition, when `earlycon=` (empty value) or just `earlycon`
(no value) is specified on the command line, we enable the earlycon
`bootconsole` specified by the SPCR table or the DT.
The problem arises when `console=` (empty value) is specified on the
command line. It's intention is to disable the `console`, but what
happens instead is that the SPRC/DT console gets enabled.
This happens because we are proxying the `console=` (empty value)
parameter to the `earlycon` handler. The `earlycon` handler then sees
that the parameter value is empty, so it enables the SPCR/DT
`bootconsole`.
This change makes it so that the `console` or `console=` parameters no
longer enable the SPCR/DT `bootconsole`. I also cleans up the hack in
`main.c` that would forward the `console` parameter to the `earlycon`
handler.
Signed-off-by: Raul E Rangel <rrangel@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20240911123507.v2.1.Id08823b2f848237ae90ce5c5fa7e027e97c33ad3@changeid
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
|
There is a bug in the LLVM implementation of KASAN and GCOV that makes
these options incompatible with the CFI_ICALL_NORMALIZE_INTEGERS option.
The bug has already been fixed in llvm/clang [1] and rustc [2]. However,
Kconfig currently has no way to gate features on the LLVM version inside
rustc, so we cannot write down a precise `depends on` clause in this
case. Instead, a `def_bool` option is defined for whether
CFI_ICALL_NORMALIZE_INTEGERS is available, and its default value is set
to false when GCOV or KASAN are turned on. End users using a patched
clang/rustc can turn on the HAVE_CFI_ICALL_NORMALIZE_INTEGERS option
directly to override this.
An alternative solution is to inspect a binary created by clang or rustc
to see whether the faulty CFI tags are in the binary. This would be a
precise check, but it would involve hard-coding the *hashed* version of
the CFI tag. This is because there's no way to get clang or rustc to
output the unhased version of the CFI tag. Relying on the precise
hashing algorithm using by CFI seems too fragile, so I have not pursued
this option. Besides, this kind of hack is exactly what lead to the LLVM
bug in the first place.
If the CFI_ICALL_NORMALIZE_INTEGERS option is used without CONFIG_RUST,
then we actually can perform a precise check today: just compare the
clang version number. This works since clang and llvm are always updated
in lockstep. However, encoding this in Kconfig would give the
HAVE_CFI_ICALL_NORMALIZE_INTEGERS option a dependency on CONFIG_RUST,
which is not possible as the reverse dependency already exists.
HAVE_CFI_ICALL_NORMALIZE_INTEGERS is defined to be a `def_bool` instead
of `bool` to avoid asking end users whether they want to turn on the
option. Turning it on explicitly is something only experts should do, so
making it hard to do so is not an issue.
I added a `depends on CFI_CLANG` clause to the new Kconfig option. I'm
not sure whether that makes sense or not, but it doesn't seem to make a
big difference.
In a future kernel release, I would like to add a Kconfig option similar
to CLANG_VERSION/RUSTC_VERSION for inspecting the version of the LLVM
inside rustc. Once that feature lands, this logic will be replaced with
a precise version check. This check is not being introduced here to
avoid introducing a new _VERSION constant in a fix.
Link: https://github.com/llvm/llvm-project/pull/104826 [1]
Link: https://github.com/rust-lang/rust/pull/129373 [2]
Fixes: ce4a2620985c ("cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERS")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409231044.4f064459-oliver.sang@intel.com
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20240925-cfi-norm-kasan-fix-v1-1-0328985cdf33@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
When enabling both KASAN and RETHUNK, objtool emits the following
warnings:
rust/core.o: warning: objtool: asan.module_ctor+0x13: 'naked' return found in MITIGATION_RETHUNK build
rust/core.o: warning: objtool: asan.module_dtor+0x13: 'naked' return found in MITIGATION_RETHUNK build
This is caused by the -Zfunction-return=thunk-extern flag in rustc not
informing LLVM about the mitigation at the module level (it does so at
the function level only currently, which covers most cases, but both
are required), which means that the KASAN functions asan.module_ctor
and asan.module_dtor are generated without the rethunk mitigation.
The other mitigations that we enabled for Rust (SLS, RETPOLINE) do not
have the same bug, as they're being applied through the target-feature
functionality instead.
This is being fixed for rustc 1.83.0, so update Kconfig to reject this
configuration on older compilers.
Link: https://github.com/rust-lang/rust/pull/130824
Fixes: d7868550d573 ("x86/rust: support MITIGATION_RETHUNK")
Reported-by: Miguel Ojeda <ojeda@kernel.org>
Closes: https://lore.kernel.org/all/CANiq72myZL4_poCMuNFevtpYYc0V0embjSuKb7y=C+m3vVA_8g@mail.gmail.com/
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240926093849.1192264-1-aliceryhl@google.com
[ Reworded to add the details mentioned in the list. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
The `-Zpatchable-function-entry` flag is available since Rust
1.81.0, not Rust 1.80.0, i.e. commit ac7595fdb1ee ("Support for -Z
patchable-function-entry") in upstream Rust.
Fixes: ca627e636551 ("rust: cfi: add support for CFI_CLANG with Rust")
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Fiona Behrens <me@kloenk.dev>
Link: https://lore.kernel.org/r/20240925141944.277936-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Support 'MITIGATION_{RETHUNK,RETPOLINE,SLS}' (which cleans up
objtool warnings), teach objtool about 'noreturn' Rust symbols and
mimic '___ADDRESSABLE()' for 'module_{init,exit}'. With that, we
should be objtool-warning-free, so enable it to run for all Rust
object files.
- KASAN (no 'SW_TAGS'), KCFI and shadow call sanitizer support.
- Support 'RUSTC_VERSION', including re-config and re-build on
change.
- Split helpers file into several files in a folder, to avoid
conflicts in it. Eventually those files will be moved to the right
places with the new build system. In addition, remove the need to
manually export the symbols defined there, reusing existing
machinery for that.
- Relax restriction on configurations with Rust + GCC plugins to just
the RANDSTRUCT plugin.
'kernel' crate:
- New 'list' module: doubly-linked linked list for use with reference
counted values, which is heavily used by the upcoming Rust Binder.
This includes 'ListArc' (a wrapper around 'Arc' that is guaranteed
unique for the given ID), 'AtomicTracker' (tracks whether a
'ListArc' exists using an atomic), 'ListLinks' (the prev/next
pointers for an item in a linked list), 'List' (the linked list
itself), 'Iter' (an iterator over a 'List'), 'Cursor' (a cursor
into a 'List' that allows to remove elements), 'ListArcField' (a
field exclusively owned by a 'ListArc'), as well as support for
heterogeneous lists.
- New 'rbtree' module: red-black tree abstractions used by the
upcoming Rust Binder.
This includes 'RBTree' (the red-black tree itself), 'RBTreeNode' (a
node), 'RBTreeNodeReservation' (a memory reservation for a node),
'Iter' and 'IterMut' (immutable and mutable iterators), 'Cursor'
(bidirectional cursor that allows to remove elements), as well as
an entry API similar to the Rust standard library one.
- 'init' module: add 'write_[pin_]init' methods and the
'InPlaceWrite' trait. Add the 'assert_pinned!' macro.
- 'sync' module: implement the 'InPlaceInit' trait for 'Arc' by
introducing an associated type in the trait.
- 'alloc' module: add 'drop_contents' method to 'BoxExt'.
- 'types' module: implement the 'ForeignOwnable' trait for
'Pin<Box<T>>' and improve the trait's documentation. In addition,
add the 'into_raw' method to the 'ARef' type.
- 'error' module: in preparation for the upcoming Rust support for
32-bit architectures, like arm, locally allow Clippy lint for
those.
Documentation:
- https://rust.docs.kernel.org has been announced, so link to it.
- Enable rustdoc's "jump to definition" feature, making its output a
bit closer to the experience in a cross-referencer.
- Debian Testing now also provides recent Rust releases (outside of
the freeze period), so add it to the list.
MAINTAINERS:
- Trevor is joining as reviewer of the "RUST" entry.
And a few other small bits"
* tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux: (54 commits)
kasan: rust: Add KASAN smoke test via UAF
kbuild: rust: Enable KASAN support
rust: kasan: Rust does not support KHWASAN
kbuild: rust: Define probing macros for rustc
kasan: simplify and clarify Makefile
rust: cfi: add support for CFI_CLANG with Rust
cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERS
rust: support for shadow call stack sanitizer
docs: rust: include other expressions in conditional compilation section
kbuild: rust: replace proc macros dependency on `core.o` with the version text
kbuild: rust: rebuild if the version text changes
kbuild: rust: re-run Kconfig if the version text changes
kbuild: rust: add `CONFIG_RUSTC_VERSION`
rust: avoid `box_uninit_write` feature
MAINTAINERS: add Trevor Gross as Rust reviewer
rust: rbtree: add `RBTree::entry`
rust: rbtree: add cursor
rust: rbtree: add mutable iterator
rust: rbtree: add iterator
rust: rbtree: add red-black tree implementation backed by the C version
...
|